AAM Tracking

This page demonstrates that AAMs are applicable to a variety of motion tracking applications. No explicit motion models have been enforced below; the examples rely on simple frame-by-frame propagation only. For optimal performance, suitable motion models should be added.




Rigid object tracking in 3D using an AAM

Aim

To perform real-time 3D tracking of a rigid object using a low-cost web-cam (~$30), a PC and the AAM-API.

Motivation

The project was done as a small exercise in testing the real-time capabilities of Active Appearance Models. It should not be viewed as a robust, state-of-the-art tracking system, but merely as a demonstration of performance and further evidence of the general nature of AAMs.

This setup is thus not expected to outperform conventional tracking techniques tailored to handle the perspective projection of a planar object.

Training set

The training set consisted of five images of a DAT tape cassette (the nearest object on my desk at the time). The DAT cassette was annotated using 12 landmarks. The training images were acquired with the web-cam set to CIF (352x288) and subsequently resampled to QCIF (176x144).

Fig. 1. The training images for the AAM Tracker. Each one was annotated using 12 landmarks.

Model

From the five training images shown in figure 1, a two-level multi-scale AAM was built. All images were converted to 8-bit grey scale prior to any processing by the AAM. The texture model consisted of 9100 pixels at level 0 and 2261 pixels at level 1.

The variance explained by the first three eigenvalues in the combined shape and texture model was approximately 69%, 22% and 7%, respectively.
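For reference, each of these figures is simply the corresponding eigenvalue divided by the sum of all eigenvalues of the combined model. A minimal sketch of the computation (the eigenvalues below are placeholders chosen to reproduce the percentages quoted above, not the actual model values):

    import numpy as np

    # Placeholder eigenvalues of the combined shape/texture PCA; with five
    # training images at most four eigenvalues are non-zero.
    eigenvalues = np.array([3.45, 1.10, 0.35, 0.10])

    explained = eigenvalues / eigenvalues.sum()
    for i, fraction in enumerate(explained, start=1):
        print(f"combined mode {i}: {100 * fraction:.1f}% of the variance")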

Modes of variation

1st Combined Mode
1st Shape Mode
1st Texture Mode

Tracking

Initialisation was performed on the first incoming frame using a search-based initialisation on level 1 of the multi-level AAM. The result was then propagated down to level 0.

Tracking from frame to frame was accomplished simply by propagating the result of the previous frame to the current one. Accuracy was traded for a higher frame rate by limiting the maximum number of iterations to three. However, since the corrections to pose and model parameters accumulate over successive frames, the fit should still converge for moderate movements of the object.
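A minimal sketch of the loop described above, assuming hypothetical callables for frame capture, coarse-level initialisation and the AAM search (this is not the AAM-API interface, just an outline):

    MAX_ITERATIONS = 3  # accuracy traded for frame rate, as described above

    def track(grab_frame, initialise_on_level, aam_search):
        """Frame-to-frame AAM tracking with no explicit motion model.

        All three arguments are hypothetical callables standing in for the
        camera capture, the search-based initialisation and the AAM search.
        """
        # Search-based initialisation on the coarse level (level 1) of the
        # multi-level model; the result is propagated down to level 0.
        frame = grab_frame()
        params = initialise_on_level(frame, level=1)

        while frame is not None:
            # Propagate the previous result to the current frame and refine
            # it with at most three iterations of the AAM search.
            params = aam_search(frame, params, level=0,
                                max_iterations=MAX_ITERATIONS)
            yield params
            frame = grab_frame()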

Results

Tracking was performed by a Windows application on a live QCIF (176x144) input from the web-cam. Four frames from an example tracking movie are given in figure 2.

The tracker reached a performance of 7-10 frames/sec. No temporal filtering was performed to increase the robustness of the tracking.

(december, 2001)


Fig. 2. Four frames from an MPEG4 movie showing the AAM Tracker (49 sec. 498KB).

Acknowledgements

Web-cam input was provided by the Vision Wizard from the VisionSDK.


Tracking of a deformable object


Click here to see tracking results from an off-line AAM tracking experiment done by Martin Egholm Nielsen, Tue Lehn-Schiøler & Mark Wrobel in January 2001 using the AAM-API.

(january, 2001)

Eye tracking


Dan Witzner is using the AAM-API to perform eye tracking. Click here to see preliminary tracking results.

(2002-2003)

The AAM Mickey


Morten Rufus Blas, Mads Fogtmann Hansen and Kasper Olesen used the AAM-API in spring 2002 to control a virtual character: the infamous AAM-Mickey(!)

(spring 2002)

Real-time tracking of TV presenters



Jacob Overgaard Hansen, Steffen Holmslykke, Steffen Andersen and Søren Riisgaard used the AAM-API to track a female and a male TV presenter. They later coded their own optimised version of the AAM search, which provided real-time tracking (sequences courtesy of TV2 and DR).

(fall 2002)

AAMM tracking of cardiac ultrasound images


Guillaume Chatelet and Eric Saloux (École Nationale Supérieure d'Ingénieurs de Caen & Centre de Recherche, ENSICAEN) extended their copy of the AAM-API to encode the temporal statistics of echocardiogram sequences using the Active Appearance Motion Model (AAMM) proposed by Mitchell et al. The aim of the project is to study cardiac contractility by estimating various prognostic and therapeutic indices from the localisation of the ventricular borders in ultrasound images. Click here to see an example of their results.
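The key construction in the AAMM, as we understand it, is to treat a whole time-normalised cardiac cycle as a single observation: the per-frame shape (and texture) samples are concatenated into one long vector before the usual PCA, so the resulting modes encode typical motion patterns of the ventricle border across the cycle. A minimal sketch of that construction on synthetic data (all dimensions below are hypothetical):

    import numpy as np

    n_sequences = 20   # hypothetical number of training echo sequences
    n_frames = 16      # frames per time-normalised cardiac cycle
    n_shape = 2 * 30   # x,y coordinates of 30 hypothetical landmarks per frame

    rng = np.random.default_rng(0)
    # Each training example is one whole cycle: the per-frame shape vectors
    # are concatenated into a single observation of length n_frames * n_shape.
    cycles = rng.normal(size=(n_sequences, n_frames * n_shape))

    centred = cycles - cycles.mean(axis=0)

    # PCA over whole cycles: the eigenvectors are spatio-temporal modes
    # describing how the border typically moves through the cardiac cycle.
    _, singular_values, modes = np.linalg.svd(centred, full_matrices=False)
    variance = singular_values**2 / (n_sequences - 1)
    print("variance explained per motion mode:", variance / variance.sum())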

(summer 2003)

Using the AAM-API to convert speech to face movements


The parameters of a facial AAM are controlled by features extracted from the sound. Feeding the model with recorded speech, the output is video sequences as seen in the examples [ 1 | 2 ]. For more information, see the home page of Tue Lehn-Schiøler.
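One simple way to realise such a mapping, not necessarily the one used in this project, is to learn a least-squares linear regression from per-frame audio features to the synchronised AAM parameters, and then drive the face model with the predicted parameters for new speech. A minimal sketch on synthetic data (feature and parameter dimensions are assumptions):

    import numpy as np

    n_frames = 500   # hypothetical number of synchronised training frames
    n_audio = 13     # e.g. MFCC-like audio features per frame (assumption)
    n_params = 8     # number of AAM model parameters driving the face

    rng = np.random.default_rng(1)
    audio_features = rng.normal(size=(n_frames, n_audio))    # placeholder input
    aam_parameters = rng.normal(size=(n_frames, n_params))   # placeholder target

    # Least-squares linear mapping from audio features (plus a bias term)
    # to AAM model parameters.
    X = np.hstack([audio_features, np.ones((n_frames, 1))])
    W, *_ = np.linalg.lstsq(X, aam_parameters, rcond=None)

    # For new speech, each frame of predicted parameters would be fed to the
    # AAM to synthesise the corresponding face image.
    new_audio = rng.normal(size=(1, n_audio))
    predicted = np.hstack([new_audio, np.ones((1, 1))]) @ W
    print(predicted.shape)  # (1, n_params)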

(summer 2004)