@CONFERENCE{IMM2005-03142,
    author       = "T. Lehn-Schi{\o}ler and L. K. Hansen and J. Larsen",
    title        = "Mapping from Speech to Images Using Continuous State Space Models",
    year         = "2005",
    month        = "jan",
    pages        = "136--145",
    booktitle    = "Lecture Notes in Computer Science",
    volume       = "3361",
    series       = "",
    editor       = "",
    publisher    = "Springer",
    organization = "",
    address      = "",
    url          = "http://www2.compute.dtu.dk/pubdb/pubs/3142-full.html",
    abstract     = "In this paper a system that transforms speech waveforms to
 animated faces is proposed. The system relies on continuous state space models to perform the
 mapping; this makes it possible to ensure video with no sudden jumps and allows continuous control of the parameters in 'face space'.
The performance of the system is critically dependent on the
number of hidden variables: with too few variables the model
cannot represent the data, and with too many, overfitting is observed.
Simulations are performed on recordings of 3--5 second video
sequences with sentences from the TIMIT database. From a
subjective point of view the model is able to construct an image 
sequence from an unknown noisy speech sequence even though the 
number of training examples is limited."
}