next up previous contents index
Next: Unification of AAMs and Up: Active Appearance Models Previous: Discussion of Basic AAMs

Subsections

   
Extensions of the Basic AAM

Overview

Though the previous chapters have provided a description and discussion of AAMs, extensions can still be done of the basic AAM. The major contribution of this thesis is - aside from the discussion and documentation of AAMs - represented by the ideas and work described in this and the following chapter. Other extensions to the basic AAM are the work of Cootes et al. [22,13,14], Walker et al. [71], Wolstenholme et al. [72], Mitchell et al. [52]. Some of these extensions will be surveyed in chapter 17.

Enhanced Shape Representation

Basic AAMs using the piece-wise affine warping, relies on the  Delaunay triangulation of the shape points. This results in a triangular mesh covering the  convex hull of the point set. For  concave shapes this might not be the optimal solution. For example there could be substantial texture noise in the area outside the shape - but still inside the convex hull - thus leading to a less compact model. To avoid this we suggest removing the triangles outside the shape. This is trivial to accomplish. Simply traverse the triangles; test if the triangle centroid should be outside the shape polygon. If so, remove the triangle. See figure 9.1 for a pictorial of the problem. To test if a point is inside the shape polygon several approaches could be taken. The current implementation utilizes the geometrical fact that, if a line from the point, p, to infinity crosses the polygon an odd number of times, then the point p is inside the polygon.

  
Figure: Removal of unwanted triangles resulting from the Delaunay triangulation of concave shapes.
\begin{figure}
\begin{center}
\mbox{
\psfig{figure=figs/concav.eps, width=7cm}
}
\end{center}
\end{figure}


  
Figure: (a) Concave shape with convex triangles. (b) Concave shape with convex triangles removed.
\begin{figure}
\begin{center}
\mbox{
\subfigure[]{\psfig{figure=figs/meat...
...{figure=figs/meat_normal.eps,width=55mm} }
}
\end{center}
\end{figure}

Though the above method adds flexibility to the choice of the AAM shape structure, one can still think of problems where even greater shape flexibility is required. Objects can contain areas where the texture variation might be considered noise. One might want to exclude such areas due to arguments similar to the above given. Another situation is that of having several structured objects, but in between those, the texture is highly unstructured. Features to accommodate such situations are implemented in the current AAM framework. Shapes are defined in terms of paths, which is a subset of the shape points with a given point connectivity. Each path can have a combination the following properties:
This provides the builder of an AAM with a high degree of freedom, resulting in a more optimal AAM representation of the given problem. For further details of the representation, refer to appendix G.

   
Increasing Texture Specificity

While the enhanced shape representation of the previous section gave great flexibility it also increased the risk of what we coin the shrinking problem.  During matching of objects with a relative  homogeneous surface - such as the metacarpals - matches sometimes tends to lie inside the real object as illustrated on figure 9.3. This is due to the fact that the AAM evaluates the fit on the object texture only.

  
Figure: The shrinking problem.
\begin{figure}\begin{center}
\mbox{
\psfig{figure=figs/shrink.eps, width=6cm}
}
\end{center}
\end{figure}

To avoid this shortcoming we suggest including some neighboring region of the object. This will usually lead to more texture contrast in the model, since objects (often) are identified as something that stands out from its surroundings.
  
Figure: Shape neighborhood added using an artificial border placed along the normals.
\begin{figure}
\begin{center}
\mbox{
\psfig{figure=figs/artbord.eps, width=45mm}
}
\end{center}
\end{figure}

This neighborhood adding must be done carefully to preserve compactness of the AAM. Clearly, the texture PCA will suffer since the goal is to add new information, namely background pixels, of which one could expect a substantially higher degree of variation than across the object. However, regarding the shape PCA, points can be added without adding new shape information, thus retaining the eigenvalue distribution of the shape PCA. This is accomplished by generating new shape points correlated with the old ones, by adding new points on the normals of the original points in a distance proportional to the relative size of the shape. See figure 9.4.


In pseudo-code this is:
1.
Choose an artificial point distance, dmean = n pixels, relative
to the mean shape.
2.
Calculate the mean size of all n shapes, $s_{mean} = \frac{1}{n}\sum_{i=1}^n S(\mathbf{x}_i)$.
3.
For each shape $[1 \dots n]$
4.
For each outer path
5.
For each model point
6.
Calculate the normal of the point
7.
$d = d_{mean}\frac{S(\mathbf{x}_i)}{s_{mean}}$
8.
Add an artificial point d pixels along the point normal
9.
End
10.
End
11.
End




  
Figure: (a) Shape annotated using 150 landmarks. (b) Shape with a neighborhood region added resulting in $2\times 150=300$ landmarks.
\begin{figure}
\begin{center}
\mbox{
\subfigure[]{\psfig{figure=figs/meta...
...e=figs/meta_neighborhood.eps,width=45mm} }
}
\end{center}
\end{figure}

Refer to the experimental section for results on using this rationale. The metacarpals in figure 9.5 serves a good example of a shape with a relative homogeneous surface. By adding neighborhood the texture in 9.5 (b) is substantially more specific than the shape without, figure 9.5 (a).

   
Border AAMs

As earlier stated the AAM approach is a direct extension of the  Active Shape Models [15]. A major force of ASMs is the handling of objects with  heterogeneity inside shapes, in the meaning of inside areas with high texture variation that is not possible to capture by annotation. Since ASMs only model the texture variation in a local neighborhood around the landmarks, the heterogeneity issue never arises. To incorporate this force of ASMs into AAMs, the enhanced shape representation has been used to provide the current implementation with the feature of Border AAMs - i.e. AAMs that only capture texture information on the border of the object in question. Border AAMs can be thought of as ASMs with very  closely spaced landmarks.

  
Figure: ASM-like AAM generated by adding shape neighborhood and a hole.
\begin{figure}
\begin{center}
\mbox{
\psfig{figure=figs/border.eps, width=45mm}
}
\end{center}
\end{figure}

A Border AAM is achieved by adding an interior path, which defines a hole and by adding an outer path as described in the previous section. A schematic example on a  Border AAM is given figure 9.6.

  
Figure: (a) Shape annotated using 83 landmarks. (b) Border shape with $3\times 83=249$ landmarks.
\begin{figure}\begin{center}
\mbox{
\subfigure[]{\psfig{figure=figs/meat_no...
...{figure=figs/meat_border.eps,width=50mm} }
}
\end{center}
\end{figure}

By using this rationale AAMs can be made insensitive to  large-scale texture noise inside the shapes, which otherwise would lead to a poor texture fit and a low landmark accuracy. The pork carcasses of figure 9.7 constitutes a good example of this situation, due to the  heterogeneity of the structure of  fat and  meat from one training example to another. To conclude this section, we stress that Border AAMs also should be substantially faster than basic AAMs, since only a fraction of the original pixels is considered.

Constrained AAM Search

This section is merely a documentation of considerations for the AAM search, rather than an actual extension. However, this is included here because no accurate description exists to our knowledge. As described in section 7.2 the optimization is accomplished using a combination of pose and model parameter predictions. As seen earlier these predictions are only valid within some range. Consequently, it would make good sense to constrain the predictions within these ranges. For pose parameters we define the bounds as being a function of the training range $[-\alpha;\alpha]$:

\begin{displaymath}\mathrm{pose \ parameter \ bound:} \quad [ -\kappa \alpha;\kappa \alpha ]
\end{displaymath} (9.1)

For the model parameters we assume that these are independently normal distributed over the training set, thus making the bound proportional to the standard deviation over the training set - i.e. $\sqrt{\lambda_i}$. The bounds for the ith model parameter is then:

\begin{displaymath}\mathrm{model \ parameter \ bound:} \quad [ - \nu \sqrt{\lambda_i};\nu \sqrt{\lambda_i} ]
\end{displaymath} (9.2)

The current implementation uses $\kappa = 2$ and $\nu = 3$. Another - and theoretically better - approach could be to calculate the Mahalanobis distance of the model parameters and do a test in the  $\chi^2$-distribution. If this would fail, the model parameters are then projected onto the closest point of the hyper ellipsoid constituted by the Mahalanobis distance. Refer to the later section on Performance Assessment for a further treatment.

   
Initialization

The basic AAM optimization scheme is inherently dependent on good initialization. To accommodate this, we devise the following search-based scheme thus making the use of AAMs fully automated. The technique is inspired by the work of Cootes et al. [22] who uses a pixel difference evaluation criteria and a threshold estimation for detecting multiple object instances. The fact that the AAMs are self-contained is exploited in the initialization - i.e. they can fully synthesize (near) photo-realistic objects of the class that they represent concerning shape and textural appearance. Hence, the model, without any additional data, can be used to perform the initialization. The idea is to exploit an inherent property of the AAM-optimization - i.e. convergence within some range from the optimum. This is utilized to narrow down an exhaustive search from a dense to a sparse population of the hyperspace spanned by pose- and c-parameters. In other words, normal AAM-optimizations are performed sparsely over the image using perturbations of the pose and model parameters. This has proven to be both feasible and robust. A set of relevant search configuration ranges is established and the sampling within this set is done as sparsely as possible. Consider the graph given in figure 7.1, which demonstrates that it should be safe to sample the y-parameter with a frequency of at least 10 pixels. One could also claim that as long as the prediction preserves the right sign it is only a matter of a sufficient number of iterations. To evaluate the fit, the normal similarity measure of AAMs is used:

 \begin{displaymath}E_{fit} = \vert\delta\mathbf{g}\vert^2 = \sum (g_{i,model} - g_{i,image} )^2
\end{displaymath} (9.3)

As in the normal optimization, this could easily be improved by using more elaborate error measures. The crucial add-on to this algorithm is somewhat inspired from the class of  Genetic Algorithms9.1 (GA) which is based on the natural selection of the  Darwinian theory. The total set of search configurations constitutes the initial population. From this we let the n fittest survive. These are then reproduced into more evolved guesses. From these the best is drawn and deemed the initial configuration. This is sometimes referred to as a  multiple hypotheses technique. In pseudo-code, the initialization scheme for detecting one object per image is:

1.
Set m to a suitable low number (we use m=3)
2.
Allocate room for a candidate set, $\{ \mathbf{c} \}$, containing n result entries
3.
Obtain application specific search ranges within each parameter (e.g. $-\sigma_1 \leq c_1 \leq \sigma_1, \ x_{min} \leq x \leq x_{max}$, etc.)
4.
Populate the space spanned by the ranges - as sparsely as the linear regression allows - by a set of search configurations $\mathbf{V} = \{ \mathbf{v}_1, \dots ,\mathbf{v}_n \} $.
5.
For each vector in V
6.
Do AAM optimization (max m iterations)
7.
Calculate the fit, $E_{fit} = \vert\delta\mathbf{g}\vert^2$
8.
If $E_{fit} < \underset{E_{fit}}{\max}\ \{ \mathbf{c} \}$ add (Vi, Efit) and remove the element $\underset{E_{fit}}{\max}\ \{ \mathbf{c} \}$
9.
End
10.
For each element in $\{ \mathbf{c} \}$
11.
Do AAM optimization (max k iterations, k>m)
12.
Calculate the fit, $E_{fit} = \vert\delta\mathbf{g}\vert^2$
13.
Update Efit of the current entry
14.
End



The element in $\{ \mathbf{c} \}$ with the minimum Efit will now hold the initial configuration. Notice that the application specific search ranges in step 3 is merely a help to increase initialization speed and robustness than it is a requirement. If nothing is known beforehand, step 3 is eliminated and an  exhaustive search is performed. This scheme is readily extended into more than one object per image by a clustering of the candidate set using overlap tests. The approach in general can be accelerated substantially by searching in a multi-resolution (pyramidal) representation of the image. Another example on search-based  initialization of deformable models is [45]. Also [27,29,28] featured search-based initialization where multiple  rigid template matching using the  convolution theorem9.2 for a  FFT formulation was utilized.

   
Fine-tuning the Model Fit

Though the AAM search provides a fast way of optimizing the AAM using prior knowledge this might not lead to the optimal solution, primarily due to weakness in the assumption that the optimization problems in an AAM search is strictly similar to those observed in a training set. Therefore, it is suggested to fine-tune the model fit by using a general-purpose optimization method. This approach is feasible since it is reasonable to assume that we are close to the optimum after the traditional AAM search. Hence the optimization fine-tuning should be possible in an reasonable amount of time. Though one is not guaranteed that the hyperspace is totally smooth at the position where the AAM search has converged, it is still more probable that we are near a well-behaved  manifold in the hyperspace.

The considered optimization methods are:
         


Another noteworthy random-sampling technique is the Genetic algorithms which has proven successful and efficient in applications of other deformable models (e.g. [39]). However, this not considered in the following. Other popular optimization methods used in deformable models include  Marquardt-Levenberg as used in the Active Blobs framework [58] as a supplemental to difference decomposition [34]. Conjugate Gradient and BFGS are well known optimization methods (see e.g. [19]), which should have better properties than Steepest Descent. Pattern Search is a non-gradient based optimization algorithm, which has received increasing interest in the optimization community. Finally Simulated Annealing borrows ideas from a physical procedure called annealing where a substance is melted and then slowly cooled down in search of a low energy configuration. In a similar manner, probabilistic optimization is performed with a decreasing temperature that determines how greedy the procedure is in the search for a global minimum. However, since a detailed discussion of optimization methods is somewhat outside the scope of this thesis, the reader is referred to [19,31,61].
To assess the effect of such fine-tuning an AAM has been built upon 20 metacarpal images (238$\times$272$\times$8bit) where metacarpals 2, 3 & 4 were annotated using 50 points each. A model neighborhood of four mean shape pixels was added. The texture model consisted of approx. 13.000 pixels. Here after the model has been used to search four unseen images with a ground truth annotation using the automatic initialization approach devised previously. Results reported in table 9.7 are the mean of the four experiments. For a definition of pt. to crv., refer to the later section on Performance Assessment.
 
Table: Mean fit results using general-purpose optimization methods for fine-tuning.
Type Pt.-Crv. Pixel error  
  (pixels) (intensity)  
None 0.92 5.7  
BGFS 0.89 5.5  
SD 0.88 5.5  
CG 0.87 5.5  
PS 0.87 5.3  
SA 0.89 5.4  
 

This investigation is not conclusive. Nevertheless, the presented results indicate clearly that a fine-tuning of the AAM fit can be accomplished successfully using general-purpose optimization techniques. In the current case the simple method Pattern Search yielded the best results, reaching a 5 % improvement over the point accuracy and 7 % over the pixel error. However, since the performance of these methods is quite sensitive to configuration parameters - i.e. step sizes, standard deviations,  stop criteria - we stress that this might not be generally true. In agreement with this, the random-sampling optimization technique Simulated Annealing is chosen in the experimental chapters over the better Pattern Search in the example. This decision is motivated by the observation that our objective function - i.e. $\vert\delta \mathbf{g}\vert^2$ - most likely is non-convex. Thus deterministic optimization techniques has a high risk of getting caught in spurious minimas, whereas random-sampling techniques are more likely to escape.

         
Robust Similarity Measures

                      As mentioned earlier the basic AAM optimization is driven by texture differences. More accurately the square length of the normalized texture difference vector, $\vert\delta \mathbf{g}\vert^2$ is used. The measure with which the optimization evaluates itself with, is here forth named the similarity measure. This constitutes the essence of what one want: evaluate how similar the model is to the image.   However, the term similar is inherently vague in a mathematical sense. The term $\vert\delta \mathbf{g}\vert^2$ is thus only one interpretation of what similar actually means. In this section, we will dwell on other interpretations of the term similar. Namely, the one that mimics the human ability of compensating for small numbers of  gross errors, and thus achieving robustness in recognition. These are called robust similarity measures and stems from an increasingly popular statistical discipline in vision named  robust statistics, where the term robust refer to the insensitivity to outliers. The notation and treatment below is somewhat based on the work of Black and Rangarajan [3]. Cootes et al. [22] has previously extended the basic AAM with learning-based matching to obtain robustness. This is achieved using a threshold for each element in $\delta\mathbf{g}$ estimated from the training set. We suggest using robust similarity measures. To formalize the model fitting problem, a set of parameters, $\mathbf{c} = [ c_1, \dots , c_p ]^{\mathrm{T}}$, are adjusted to fit a set of measurements (e.g. an image), $\mathbf{g} = [ g_1, \dots, g_m]^{\mathrm{T}}$. This is done by a minimization of the residuals:

\begin{displaymath}E = \sum_{i=1}^m \rho( g_i - u( i, \mathbf{c} ), \sigma_s ) = \sum_{i=1}^m \rho( e_i, \sigma_s )
\end{displaymath} (9.4)

where u is a function that returns the model reconstruction of the ith measurement and $\sigma_s$ is the scale parameter that determines what should be deemed outliers. The $\rho$-function determines the weighting of the residuals, and is also called the  error norm. The most common error norm is  least squares or the  quadratic norm:

\begin{displaymath}\rho( e_i ) = e_i^2
\end{displaymath} (9.5)

This is often referred to as the L2 norm. Basic AAMs uses the quadratic norm (or simply the 2-norm) without any normalization:

\begin{displaymath}E = \sum_{i=1}^m (g_{model} - g_{image})^2 = \sum_{i=1}^m \delta (g_i)^2 = \vert\delta \mathbf{g}\vert^2
\end{displaymath} (9.6)

It is however easily seen, that the quadratic norm is notoriously sensitive to outliers, since these will contribute highly to the overall solution due the rapid growth of the x2 function. To quote [3] pp. 62:
"... an estimator must be more forgiving about outlying measurements ..."
A straightforward approach is to put an upper bound on the accepted residual. This is called the  truncated quadratic norm:
$\displaystyle \rho(e_i,\sigma_s) =
\left\{ \begin{array}{ccl}
e_i^2 & & e_i \le \sqrt{\sigma_s} \\
\sigma_s & & e_i > \sqrt{\sigma_s}
\end{array}\right.$     (9.7)

Another approach is to make the residual growth linear above a certain threshold. This results in the so-called  Huber's minimax estimator:
$\displaystyle \rho(e_i,\sigma_s) =
\left\{ \begin{array}{lll}
\frac{e_i^2}{2\si...
...} & & e_i \le \sigma_s \\
\vert e_i\vert & & e_i > \sigma_s
\end{array}\right.$     (9.8)

Another smooth norm - which falls off faster than the Huber norm - is the  Lorentzian estimator as used by Sclaroff and Pentland in the Active Blob framework [58]9.3:

\begin{displaymath}\rho(e_i,\sigma_s) = \log(1 + \frac{e_i^2}{2\sigma_s^2})
\end{displaymath} (9.9)

Finally we suggest to use a similarity measure with closed connection to the Mahalanobis distance, since it takes the variance of the training set into account and thus makes the similarity measure less sensitive to large residuals in areas of high variance (as observed in the training set). Instead of using the dependent Mahalanobis distance:

 \begin{displaymath}\rho(e_i) = (\mathbf{g}-\overline{\mathbf{g}})^{\mathrm{T}}\Sigma^{-1}(\mathbf{g}-\overline{\mathbf{g}})
\end{displaymath} (9.10)

where $\Sigma$ is the texture difference covariance matrix, we are compelled to assume independence9.4 between pixels, due to the length of the pixel vectors which makes the computation of (9.10) infeasible. This reaches a form similar to the  Mahalanobis distance norm of:

\begin{displaymath}\rho(e_i) = \frac{e_i^2}{\sigma_i^2}
\end{displaymath} (9.11)

where $\sigma_i^2$ is the maximum likelihood estimate of the variance of the ith pixel. Notice however, that this is not the Mahalanobis distance since $\sigma_i^2$ should be the variance of the difference to be so. Of the above similarity measures the Lorentzian and the ''Mahalanobis'' distance have been integrated into the basic AAM to supplement the quadratic norm of Cootes et al. Thereby robustness to outliers in the unseen image has been obtained. An example of this could be to detect the complete bone structure of human hands in radiographs. Since radiographs are 2D projections of density, people wearing finger rings will have high-white outliers on one or more  phalanges. Other examples include highlights in perspective images and absence of interior parts (for example due to partial  occlusion). However, one should notice that even though the AAM search evaluate its predictions using a robust measure, the predictions themselves are done using the pixel differences. To address this problem Cootes et al. [22] performs a thresholding of the texture vector before the prediction. This could be viewed upon as a robust preprocessment. The threshold limit is estimated from the training set. Consequently, to evaluate the proposed similarity measures the fine-tuning optimization was included in all experiments.

  
Figure: Example of AAM search and Simulated Annealing fine-tuning, without (left) and with (right) the use of a robust similarity measure (Lorentzian error norm). Landmark error decreased from 7.0 to 2.4 pixels (pt.-to-crv. error).
\begin{figure}
\begin{center}
\psfig{figure=figs/robresult.eps,width=10cm}
\end{center}
\end{figure}

To show the effect of robust error norm usage, an AAM search with fine-tuning using Simulated Annealing has been done with and without a robust similarity measure. In the case given in figure 9.8 the Lorentzian error norm was used. To simulate outliers the radiograph searched in has been manipulated so that it appears as if the metacarpal is wearing a finger ring.9.5 While not perfect, the robust AAM provides a substantially better fit compared to that of the basic AAM. Through preliminary experiments, it was observed that the Lorentzian error norm performed best. Hence, this is the only norm used in the experimental chapter. For a further treatment of  robust error norms and  line processes in vision refer to [3].

Summary

In this chapter several ideas for extending the basic AAM have been proposed and implemented. Each one addressing potential problems in certain cases of the basic AAM. Preliminary experiments have been conducted to 1) demonstrate the effect of these and 2) to form a basis for the selection of extensions in the experimental part.
next up previous contents index
Next: Unification of AAMs and Up: Active Appearance Models Previous: Discussion of Basic AAMs
Active Appearance Models (be)
2000-09-20