AAM Tracking

This page demonstrates that AAMs are applicable to various motion tracking applications. No explicit motion models are enforced below; tracking relies on simple frame-by-frame propagation. For optimal performance, a suitable motion model should be incorporated.

Rigid object tracking in 3D using an AAM


The goal: to perform real-time tracking in 3D of a rigid object using a low-cost web-cam (~$30), a PC and the AAM-API.


The project was done as a small exercise in testing the real-time capabilities of Active Appearance Models. It should not be viewed as a robust state-of-the-art tracking system, but merely as a demonstration of performance and another proof of the general nature of AAMs.

It is thus not expected that this setup will in any way outperform conventional tracking techniques tailored to handle the perspective projection of a planar object.

Training set

The training set consisted of five images of a DAT tape cassette (the nearest object on my desk at the time). The DAT cassette was annotated using 12 landmarks. The training images were acquired using the web-cam set to CIF (352x288) and subsequently resampled to QCIF (176x144).
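The page does not state how the resampling to QCIF was done; below is a minimal sketch, assuming simple 2x2 block averaging, of taking a CIF frame down to QCIF resolution:

```python
import numpy as np

def downsample2(img):
    """Halve both image dimensions by averaging 2x2 pixel blocks."""
    h, w = img.shape
    h2, w2 = h // 2, w // 2
    return img[:2 * h2, :2 * w2].reshape(h2, 2, w2, 2).mean(axis=(1, 3))

cif = np.zeros((288, 352))   # a CIF frame (rows x columns)
qcif = downsample2(cif)      # QCIF: 144 x 176
```

Averaging (rather than dropping pixels) acts as a crude low-pass filter before subsampling, which reduces aliasing in the smaller frame.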

Fig. 1. The training images for the AAM Tracker. Each one was annotated using 12 landmarks.


From the five training images shown in figure 1, a two-level multi-scale AAM was built. All images were converted to 8-bit grey scale prior to any processing by the AAM. The texture model consisted of 9100 pixels at level 0 and 2261 pixels at level 1.

The variance explained by the first three eigenvalues in the combined shape and texture model was approximately 69%, 22% and 7%.
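These percentages come from the eigenvalues of the combined PCA. A minimal sketch (not the AAM-API itself) of how such variance fractions are computed from a set of sample vectors:

```python
import numpy as np

def variance_explained(X):
    """Rows of X are samples; return the variance fraction per PCA mode."""
    Xc = X - X.mean(axis=0)                    # centre the samples
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values, descending
    ev = s ** 2                                # proportional to eigenvalues
    return ev / ev.sum()

# Hypothetical data: five concatenated shape/texture sample vectors
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 40))
ratios = variance_explained(X)   # descending fractions that sum to 1
```

Note that with only five training samples the model has at most four non-zero modes, which is why a handful of eigenvalues can carry nearly all the variance.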

Modes of variation

1st Combined Mode
1st Shape Mode
1st Texture Mode


Initialisation was performed on the first incoming frame by using a search-based initialisation on level 1 of the multi-level AAM. The result was then propagated down to level 0.
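In the simplest view, propagating a level-1 result down to level 0 amounts to scaling the fitted landmark coordinates up by the pyramid factor. An illustrative sketch (the AAM-API's internal scheme may differ):

```python
import numpy as np

PYRAMID_FACTOR = 2.0   # level 1 is half the resolution of level 0

def propagate_down(landmarks_l1):
    """Map a shape fitted at level 1 into level-0 pixel coordinates."""
    return np.asarray(landmarks_l1, dtype=float) * PYRAMID_FACTOR

# A fit found at level 1 becomes the starting guess for the level-0 search
guess_l0 = propagate_down([[10.0, 7.5], [20.0, 15.0]])
```

The coarse level sees a smoother, smaller image, so the search there is cheap and tolerant of a poor starting point; the fine level then only has to refine an already close guess.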

Tracking from frame to frame was accomplished simply by propagating the result of the previous frame to the current. Accuracy was traded for a higher frame rate by limiting the maximum number of iterations to three. However, because the fit continues to improve across consecutive frames, the results should still converge for moderate movements of the object.
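The propagation scheme can be sketched as a plain loop. `StubAAM` below is a toy stand-in for the AAM search (the real AAM-API is a C++ library); only the structure of the loop reflects the text:

```python
class StubAAM:
    """Toy stand-in: each 'iteration' halves the distance to the target."""
    def fit(self, frame, pose, max_iterations):
        for _ in range(max_iterations):
            pose = pose + 0.5 * (frame - pose)   # one search step
        return pose

def track(model, frames, initial_pose, max_iterations=3):
    """Use the previous frame's result as the start point for the next."""
    pose = initial_pose
    results = []
    for frame in frames:
        # Capping the iterations trades accuracy for frame rate; for
        # moderate motion the fit still converges over successive frames.
        pose = model.fit(frame, pose, max_iterations)
        results.append(pose)
    return results

poses = track(StubAAM(), frames=[1.0, 1.0, 1.0], initial_pose=0.0)
```

Even though each frame gets only three iterations, the residual error shrinks frame by frame, which is the convergence-over-time behaviour described above.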


Tracking was performed by a Windows application on a live QCIF (176x144) input from the web-cam. Four frames from an example tracking movie are given in figure 2.

The tracker reached a performance of 7-10 frames/sec. No temporal filtering was performed to increase the robustness of the tracking.

(December 2001)

Fig. 2. Four frames from an MPEG4 movie showing the AAM Tracker (49 sec. 498KB).


Web-cam input was provided by the Vision Wizard from the VisionSDK.

Tracking of a deformable object

Click here to see tracking results from an off-line AAM tracking experiment done by Martin Egholm Nielsen, Tue Lehn-Schiøler & Mark Wrobel in January 2001 using the AAM-API.

(January 2001)

Eye tracking

Dan Witzner is using the AAM-API to perform eye tracking. Click here to see preliminary tracking results.


The AAM Mickey

Morten Rufus Blas, Mads Fogtmann Hansen and Kasper Olesen used the AAM-API in spring 2002 to control a virtual character: the infamous AAM-Mickey(!)

(Spring 2002)

Real-time tracking of tv-speakers

Jacob Overgaard Hansen, Steffen Holmslykke, Steffen Andersen and Søren Riisgaard used the AAM-API to track a female and a male TV speaker. Later they coded their own optimised version of the AAM search, which enabled real-time tracking (sequences courtesy of TV2 and DR).

(Fall 2002)

AAMM tracking of cardiac ultrasound images

Guillaume Chatelet and Eric Saloux (Ecole Nationale Superieure D'ingenieurs De Caen & Centre De Recherche – ENSICAEN) extended their copy of the AAM-API to encode the temporal statistics of echocardiogram sequences using the Active Appearance Motion Model (AAMM) proposed by Mitchell et al. The aim of the project was to study the cardiac contractility function by estimating various prognostic and therapeutic indices from a localisation of the ventricle borders in ultrasound images. Click here to see an example of their results.

(Summer 2003)

Using the AAM-API to convert speech to face movements

The parameters of an AAM of the face are controlled by features extracted from the sound. Feeding the model recorded speech produces video sequences as seen in the examples [ 1 | 2 ]. For more information, see the home page of Tue Lehn-Schiøler.

(Summer 2004)