Segmentation of faces using AAMs
[ see also the AAM Explorer page ]
The goal is to obtain precise locations of facial features such as the mouth, nose, eyes, eyebrows and jaw using Active Appearance Models (AAMs).
This small project serves as a tribute to the original work on faces by Cootes, Edwards, Taylor and Lanitis from Manchester University and as preliminary testing of the IMM face database designed and built by Michael Moesby Nordstrøm, Mads Larsen and Janusz Sierakowski, January 2001.
Applications for projects like this include MPEG-4 coding of Facial Animation Parameters (FAPs), face recognition (access control systems), assisted speech recognition and virtual character animation - just to mention a few examples.
The IMM face database
The IMM face database consists of 240 images of 40 different persons. Six images of each person are taken with different head angles (approx. +/- 30 degrees), different lighting conditions (spot/no spot) and different facial expressions. Every image has been manually annotated with 58 points of correspondence (landmarks).
Fig. 1. Annotated face using 58 landmarks.
The training set consisted of 35 frontal images of 35 different persons (640x480 RGB), each annotated with 58 landmarks. All images were converted to 8-bit gray scale (320x240) prior to processing by the AAM.
The texture model consisted of ~7600 pixels and the combined model consisted of 26 parameters (95% variation explained).
The variance explained by the first three eigenvalues of the combined shape and texture model was approximately 21%, 12% and 9%, respectively.
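Building such a combined model amounts to running PCA on the (weighted) shape and texture parameters and truncating the eigenvectors at 95% retained variance. A minimal sketch of the truncation step, using random toy data in place of the real 35 training shapes (all names here are hypothetical, not the actual implementation):

```python
import numpy as np

def pca(data, var_frac=0.95):
    """PCA on a (samples x variables) matrix; keep enough modes
    to explain var_frac of the total variance."""
    mean = data.mean(axis=0)
    centered = data - mean
    # SVD is numerically preferable to forming the covariance matrix
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = s ** 2 / (len(data) - 1)
    # first index where the cumulative variance fraction reaches var_frac
    k = int(np.searchsorted(np.cumsum(var) / var.sum(), var_frac)) + 1
    return mean, vt[:k], var[:k]

# Toy data standing in for 35 training shapes (58 landmarks -> 116 coords)
rng = np.random.default_rng(0)
shapes = rng.normal(size=(35, 116))
mean_s, modes_s, var_s = pca(shapes, 0.95)
print(modes_s.shape)  # (number_of_modes, 116)
```

With 35 examples the model can have at most 34 modes; the 95% cut-off is what reduced the combined model above to 26 parameters.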
Registration movie of all 35 faces (shows some errors in the annotation).
The first shape mode can be interpreted as the degree of 'nodding'. The first texture mode is a beard/no beard mode. Notice that, due to the small number of training examples, the +/- 3 std. dev. deformation of the first texture mode leads to an almost 'negative' beard.
Modes of variation (+/- 3 std. dev.)
1st Combined Mode (AVI movie)
1st Shape Mode (AVI movie)
1st Texture Mode (AVI movie)
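The mode movies above sweep one model parameter at a time between -3 and +3 standard deviations while holding the rest at zero, i.e. synthetic instances x = mean + b * mode with b in [-3*sqrt(lambda), +3*sqrt(lambda)]. A small sketch of generating such a sweep (the vectors here are toy stand-ins for a real mean shape and eigenvector):

```python
import numpy as np

def mode_sweep(mean, mode, eigval, n_frames=13):
    """Generate instances along one PCA mode at +/- 3 std. dev."""
    bs = np.linspace(-3.0, 3.0, n_frames) * np.sqrt(eigval)
    return np.array([mean + b * mode for b in bs])

mean = np.zeros(116)           # stand-in for the mean shape vector
mode = np.eye(116)[0]          # stand-in for a unit-length eigenvector
frames = mode_sweep(mean, mode, eigval=4.0)
print(frames.shape)            # (13, 116)
```

Rendering each frame (warping the mean texture to the deformed shape) and concatenating them is what produces the AVI movies.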
Color face AAM - Modes of variation (+/- 3 std. dev.)
These mode movies are from a nearly identical model, except that color information is included. Model size: ~100,000 pixels. Training set: 37 images.
1st Combined Mode (AVI movie)
2nd Combined Mode (AVI movie)
3rd Combined Mode (AVI movie)
Using an automatic search-based initialization followed by a traditional AAM optimization produced the segmentation results shown below.
Fig. 2. Left: Unseen input image. Middle: Segmentation result. Right: Resulting model mesh overlaid.
Fig. 3. Left: Initial model. Middle: Model after 2 iterations. Right: Converged model after 12 iterations.
Movie showing the optimization.
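The optimization shown in fig. 3 follows the standard AAM scheme: at each iteration the texture residual between the image and the model synthesis is mapped to a parameter update through a precomputed linear update matrix. A minimal sketch on a toy problem (`sample_texture`, `model_texture` and `R` are stand-ins for image sampling, model synthesis and the learned update matrix, not the actual implementation):

```python
import numpy as np

def aam_search(c0, sample_texture, model_texture, R, max_iter=12, tol=1e-9):
    """Basic AAM optimization: iteratively reduce the texture residual
    by linear updates of the model parameters c."""
    c = c0.copy()
    err = np.inf
    for _ in range(max_iter):
        r = sample_texture(c) - model_texture(c)   # texture residual
        new_err = float(r @ r)
        if abs(err - new_err) < tol:               # converged
            break
        err = new_err
        c = c - R @ r                              # linear parameter update
    return c, err

# Toy problem standing in for a real image/model pair
target = np.array([1.0, -2.0, 0.5])
c, err = aam_search(np.zeros(3),
                    sample_texture=lambda c: c,
                    model_texture=lambda c: target,
                    R=0.5 * np.eye(3))
```

On the toy problem the residual shrinks geometrically, mirroring the 12-iteration convergence seen in fig. 3.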
Fig. 3 shows a fairly good synthesis of the unseen input image, even though the training set consisted of only 35 faces. Further, the accuracy of the detected facial features in fig. 2 is highly acceptable: the mean distance to the associated border was 1.01 pixels. The optimization took ~400 ms (Athlon 1.2GHz).
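The reported accuracy is a point-to-associated-border measure: the distance from each model point to the nearest point on the annotated ground-truth border. A hedged sketch of such a measure, treating the border as a polyline of landmarks (all names hypothetical):

```python
import numpy as np

def point_to_segment(p, a, b):
    """Distance from point p to the line segment a-b."""
    ab = b - a
    t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))

def mean_pt_to_border(points, border):
    """Mean distance from each model point to the nearest segment of a
    ground-truth border polyline."""
    segs = list(zip(border[:-1], border[1:]))
    return float(np.mean([min(point_to_segment(p, a, b) for a, b in segs)
                          for p in points]))

# Toy example: two points one pixel above a horizontal border
border = np.array([[0.0, 0.0], [10.0, 0.0]])
pts = np.array([[2.0, 1.0], [7.0, 1.0]])
print(mean_pt_to_border(pts, border))  # 1.0
```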
Slide presentation on Extraction of MPEG-4 Facial Animation Parameters with Active Appearance Models as presented in the DTU course 52425 Digital Video Technology.
The IMM face database used in this experiment was designed and built by Michael Moesby Nordstrøm, Mads Larsen and Janusz Sierakowski, January 2001.
My office mate, Lars Pedersen, is also gratefully acknowledged for being Mr. Unseen in the example results above.
/Mikkel B. Stegmann