@MASTERSTHESIS{IMM2012-06301,
    author   = "K. M. Larsen and L. M. Jeppesen",
    title    = "Analysis of Human Behaviour by Machine Learning",
    year     = "2012",
    school   = "Technical University of Denmark, {DTU} Informatics, {E-}mail: reception@imm.dtu.dk",
    address  = "Asmussens Alle, Building 305, {DK-}2800 Kgs. Lyngby, Denmark",
    note     = "Supervised by Professor Lars Kai Hansen, lkh@imm.dtu.dk, and Assistant Professor Morten M{\o}rup, mm@imm.dtu.dk, {DTU} Informatics",
    url      = "http://www.imm.dtu.dk/English.aspx",
    abstract = "This thesis deals with the automation of manual annotations for use in the analysis of the interaction pattern between mother and child. The data applied in the thesis are provided by Babylab at the Institute of Psychology, University of Copenhagen, and consist of three recording modalities: sound, motion capture and video. The focus of this thesis, with respect to the available data, is the recordings of 21 four-month-old children and their mothers. The aim of the thesis is to automatically regenerate, by the use of machine learning, labels that have so far been extracted manually at Babylab. This would relieve Babylab of a very time-consuming task. Furthermore, the human subjectivity of the labels would be removed by the objective replacement of a machine. The re-annotation of labels introduces the area of supervised classification, which is used in this thesis for the tasks of speaker identification and emotion recognition. A thorough investigation of different classification approaches forms the basis of the results of the two aforementioned tasks for the sound data provided by Babylab. These results have a reliability of the same order as that of the manual codings and are therefore considered very promising for future work at Babylab. It is also investigated whether the uniqueness of this particular data set, i.e. that three recording modalities are available, is beneficial to the two tasks of speaker identification and emotion recognition. This is tested by adding information from the motion capture data to the sound data. The results show no effect and even a marked deterioration of classifier performance for the two tasks, respectively. Besides being included in the two classification tasks, the motion capture data provide stable annotations of several aspects of the mother-child interaction; these have therefore been extracted in an automated way in this thesis. The video modality has also been superficially investigated with respect to the child's facial expressions, considered both as possible support to the two classification tasks and for direct application in the analyses of mother-child interaction performed at Babylab. This showed interesting prospects that should definitely be pursued by Babylab in the future."
}