
Active Shape Models

The Active Shape Model is a parametric deformable model in which a statistical model of the global shape variation is built from a training set. This model, called the point distribution model (PDM), is then used to fit the model (or template) to unseen occurrences of the object annotated in the training set. Below, we briefly describe the construction of the PDM using principal component analysis (PCA); for a more detailed description of ASMs, refer to [6,15]. The shape itself is represented as an $n$-point polygon in image coordinates:

\begin{displaymath}\mathbf{X} = ( x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n, y_n )^\mathrm{T}
\end{displaymath} (1)

To measure the true shape variation, the shape $\mathbf{X}$ is transformed into a normalized frame of reference with respect to the pose parameters: $t_x, t_y$ (translation), $s$ (scaling) and $\theta$ (rotation).

\begin{displaymath}\mathbf{x} = T_{t_x,t_y, s, \theta}(\mathbf{X})
\end{displaymath} (2)
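As an illustration, the following is a minimal sketch in Python/NumPy of this normalization, assuming the shape vector is stored as a flat array $(x_1, y_1, \ldots, x_n, y_n)$ and the pose parameters are known beforehand; the helper name normalize_pose is hypothetical and not part of the original formulation:

\begin{verbatim}
import numpy as np

def normalize_pose(X, tx, ty, s, theta):
    """Map a shape X into the normalized frame by removing
    translation, scale and rotation, i.e. by applying the
    inverse of the similarity transform in Eq. (2)."""
    pts = X.reshape(-1, 2)                   # n x 2 point matrix
    pts = (pts - np.array([tx, ty])) / s     # undo translation and scale
    c, si = np.cos(-theta), np.sin(-theta)   # inverse rotation angle
    R = np.array([[c, -si], [si, c]])
    return (pts @ R.T).reshape(-1)           # rotate and flatten back
\end{verbatim}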

The mean shape in this aligned domain is given as:

\begin{displaymath}\mathbf{\overline{x}} = \frac{1}{m} \sum^m_{i=1} \mathbf{x}_i
\end{displaymath} (3)

The deviation of each shape from the mean shape is then:

\begin{displaymath}d\mathbf{x}_i = \mathbf{x}_i - \mathbf{\overline{x}}
\end{displaymath} (4)

The estimate of the covariance matrix can now be written as:

\begin{displaymath}\mathbf{\Sigma} = \frac{1}{m} \sum^m_{i=1} d\mathbf{x}_i d\mathbf{x}_i^\mathrm{T}
\end{displaymath} (5)

The principal axes of the $2n$-dimensional point cloud are now given as the eigenvectors, $\mathbf{p}_i$, of the covariance matrix. If the $i$th eigenvalue is denoted $\lambda_i$, the following identity holds:

\begin{displaymath}\mathbf{\Sigma p}_i = \lambda_i\mathbf{p}_i
\end{displaymath} (6)

The matrix $\mathbf{P}$ is then built from the eigenvectors, ordered by descending magnitude of the corresponding eigenvalues:

\begin{displaymath}\mathbf{P} = \left[
\begin{array}{cccc}
\mathbf{p}_1 & \mathbf{p}_2 & \cdots & \mathbf{p}_{2n}
\end{array}
\right]
\end{displaymath} (7)

A shape instance can then be generated by deforming the mean shape by a linear combination of eigenvectors:

\begin{displaymath}\mathbf{x} = \mathbf{\overline{x}} + \mathbf{Pb}
\end{displaymath} (8)
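To make Eqs. (3)-(8) concrete, a minimal sketch in Python/NumPy of the PDM construction is given below, assuming the $m$ aligned training shapes are stored as rows of an $m \times 2n$ matrix; the helper name build_pdm is hypothetical:

\begin{verbatim}
import numpy as np

def build_pdm(shapes):
    """Build a point distribution model from an (m x 2n) matrix
    of aligned training shapes, one shape per row."""
    m = shapes.shape[0]
    x_bar = shapes.mean(axis=0)            # Eq. (3): mean shape
    dx = shapes - x_bar                    # Eq. (4): deviations
    cov = dx.T @ dx / m                    # Eq. (5): covariance estimate
    lam, P = np.linalg.eigh(cov)           # Eq. (6): eigendecomposition
    order = np.argsort(lam)[::-1]          # descending eigenvalues
    return x_bar, P[:, order], lam[order]  # Eq. (7): ordered matrix P

# Eq. (8): a shape instance is then generated as
#   x = x_bar + P @ b
\end{verbatim}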

The $2n$-dimensional shape space is thereby spanned by its principal axes, i.e. the dimensions are ordered according to how much of the shape variance they explain. This gives a very convenient way to compare a candidate shape, $\mathbf{x}'$, to the training set: perform the orthogonal transformation into $\mathbf{b}$-parameter space and evaluate the shape probability there. A model instance is now defined by its model vector $\mathbf{v}$, which consists of the pose and shape parameters.

\begin{displaymath}\mathbf{v} = \{ t_x,t_y, s, \theta, \mathbf{b} \}
\end{displaymath} (9)

Choosing Modes of Variation

The primary goal of applying a principal component analysis (PCA) to the training set is to reduce the number of parameters in our model. In this way the model parameters can be limited to generate only shapes similar to those contained in the training set. By organizing the eigenvalues of the covariance matrix of the training shapes in descending order, $t$ modes of variation can be chosen to explain $V \times 100\%$ of the shape variation using:

\begin{displaymath}\sum_{i=1}^t \lambda_i \geq V \sum_{i=1}^{2n} \lambda_i
\end{displaymath} (10)

The remaining $2n-t$ modes are then considered shape noise. A suitable value for $V$ is 0.98; hence $98\%$ of the shape variation can be modelled.
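A minimal sketch of this selection in Python/NumPy, assuming the eigenvalues are already sorted in descending order; the helper name choose_modes is hypothetical:

\begin{verbatim}
import numpy as np

def choose_modes(lam, V=0.98):
    """Return the smallest t such that the first t eigenvalues
    explain at least a fraction V of the total variance (Eq. 10)."""
    ratios = np.cumsum(lam) / np.sum(lam)  # cumulative variance fractions
    return int(np.searchsorted(ratios, V)) + 1
\end{verbatim}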

Alignment of Training Shapes

To obtain a frame of reference for the alignment of shapes, our previous work [16] translated all shapes so that their centre of gravity coincides with the origin and scaled them to unit size, $|\mathbf{x}| = 1$. In this way the corners of a set of aligned rectangles with varying aspect ratio form a unit circle (see fig. 2; the unaligned shapes are shown in fig. 1). Due to this non-linearity the PCA must use two parameters to span the shape space, $\lambda_1 = 99.6\%$ and $\lambda_2 = 0.4\%$, even though variation exists in only one parameter (the aspect ratio). A closer look at figure 2 also shows that the overlaid mean shape does not correspond to an actual shape in the training set. To avoid these non-linearities in the aligned training set, each shape can be transformed into tangent space by scaling by $1/(\mathbf{x} \cdot \mathbf{\overline{x}})$ [3,4].
  
Figure 1: Training set of 100 unaligned artificially generated rectangles containing 16 points each.
\begin{figure}
\begin{center}
\mbox{
\psfig{figure=unaligned_squares.eps, width=5cm}
}
\end{center}
\end{figure}


  
Figure 2: Point cloud from aligned rectangles scaled to unit size, $|\mathbf{x}| = 1$. The mean shape is shown in full.
\begin{figure}
\begin{center}
\mbox{
\psfig{figure=no_tangent_space.eps, width=5cm}
}
\end{center}
\end{figure}

The transformation into tangent space aligns all rectangles with their corners on straight lines (see fig. 3), thus enabling the training set to be modelled using only linear displacements. Notice how the mean shape is now contained in the training set, since the PCA uses only one parameter, $\lambda_1 = 100\%$, to model the change in aspect ratio. In this way the distribution of $\mathbf{b}$-parameters is kept more compact and non-linearities are reduced. This leads to better and simpler models.
  
Figure 3: Point cloud from aligned rectangles scaled to unit size, $|\mathbf{x}| = 1$, and transformed into tangent space. The mean shape is shown in full.
\begin{figure}
\begin{center}
\mbox{
\psfig{figure=tangent_space.eps, width=5cm}
}
\end{center}
\end{figure}
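The alignment and the tangent space projection described above could be sketched as follows in Python/NumPy; the helper names are hypothetical, and the projection assumes the shapes have been aligned to unit size beforehand:

\begin{verbatim}
import numpy as np

def align(X):
    """Translate a shape to the origin and scale to unit size."""
    pts = X.reshape(-1, 2)
    pts = pts - pts.mean(axis=0)       # centre of gravity to origin
    x = pts.reshape(-1)
    return x / np.linalg.norm(x)       # unit size, |x| = 1

def to_tangent_space(x, x_bar):
    """Project an aligned shape into the tangent space of the
    mean shape by scaling with 1/(x . x_bar) [3,4]."""
    return x / np.dot(x, x_bar)
\end{verbatim}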

Generation of Plausible Shapes

In the process of matching the model to an unseen image, only shapes that are plausible with respect to the training set are of interest. One way to ensure this is to impose hard limits on the shape parameters, $\mathbf{b}$, under the model assumption that the $b$-parameters are independently Gaussian distributed with zero mean. Since the variance of the $i$th principal component is $\lambda_i$, and approximately $99.7\%$ of the distribution of $b_i$ is covered by the range $\pm3\sigma$, the limits can be chosen as:

\begin{displaymath}-3\sqrt{\lambda_i} \leq b_i \leq 3\sqrt{\lambda_i}
\end{displaymath} (11)

This was the approach used in our previous work. Due to the simple hypercube restriction, it allows every $b$-parameter to take the value $\pm 3\sqrt{\lambda_i}$ simultaneously, which is highly unlikely. To avoid this, the $b$-parameters can instead be restricted to a hyperellipsoid using the Mahalanobis distance:

\begin{displaymath}D^2_m = \sum_{i=1}^t \frac{b_i^2}{\lambda_i} \leq D_{max}^2
\end{displaymath} (12)

such that a $D_m$ smaller than a suitable $D_{max}$ corresponds to a plausible shape. A suitable value for $D_{max}$ is 3.0.
  
Figure 4: The effect of using the Mahalanobis distance in two dimensions. Shape B is valid; shape A is considered illegal and is rescaled to A'.
\begin{figure}
\begin{center}
\mbox{
\psfig{figure=maha.eps, width=55mm}
}
\end{center}
\end{figure}

If the shape fails this test, b is rescaled to lie on the closest point of the hyperellipsoid. This is illustrated in the two-dimensional case in figure 4.

\begin{displaymath}\mathbf{b} = \mathbf{b} \cdot \frac{D_{max}}{D_m}
\end{displaymath} (13)
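A minimal sketch of this constraint in Python/NumPy, assuming lam holds the $t$ retained eigenvalues; the helper name constrain is hypothetical:

\begin{verbatim}
import numpy as np

def constrain(b, lam, d_max=3.0):
    """Restrict the shape parameters b to a hyperellipsoid using
    the Mahalanobis distance, Eqs. (12)-(13)."""
    d_m = np.sqrt(np.sum(b ** 2 / lam))    # Eq. (12): Mahalanobis distance
    if d_m > d_max:
        b = b * (d_max / d_m)              # Eq. (13): rescale onto ellipsoid
    return b

# The hard-limit alternative of Eq. (11) would instead be:
#   b = np.clip(b, -3 * np.sqrt(lam), 3 * np.sqrt(lam))
\end{verbatim}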

If the shape class in question is separated into distinct subclasses between which no discrimination is needed, more complex methods must be used to model the distribution of the $b$-parameters. One approach is to approximate the distribution by a mixture of Gaussians [4]. This approach can represent any non-linear shape variation in the training set and thereby control the generation of plausible shapes in a much more general way.