@MASTERSTHESIS{IMM2011-06036,
  author   = "J. Madsen",
  title    = "Modeling of Emotions expressed in Music using Audio features",
  year     = "2011",
  school   = "Technical University of Denmark, {DTU} Informatics, {E-}mail: reception@imm.dtu.dk",
  address  = "Asmussens Alle, Building 305, {DK-}2800 Kgs. Lyngby, Denmark",
  note     = "Supervised by Professor Lars Kai Hansen, lkh@imm.dtu.dk, {DTU} Informatics",
  url      = "http://www.imm.dtu.dk/English.aspx",
  abstract = "This thesis presents an alternative method of organizing and rating music using the emotions expressed in the music. This measure can serve as a standalone parameter for searching for new music, or in combination with already established methods, e.g. happy jazz or sad rock. The approach is to create a mathematical model that can automatically predict labels of emotional expression in music based on audio content. The audio content is quantified using spectral, cepstral, temporal, musical and perceptual audio features computed from 7 different feature packs. To measure the emotions expressed in music, a listening experiment is developed using experimental design. Participants rate excerpts of 15 seconds on two {9-}point iconic scales (SAM) representing the dimensions of valence and arousal. All ratings are modeled using fitted beta distributions, where outliers are removed based on empirical measures. A thorough investigation into the consequences of the design and the resulting ratings is made. Furthermore, it is investigated whether participants' musical experience, their mood before starting the test, and their understanding of the test are connected to their emotional ratings. Using the audio features and emotional ratings, a mathematical model is designed; the best-performing model is a stepwise regression model trained on features selected by a sequential feature selection method using least squares and the root mean squared error. The features found most suitable for modeling emotions in music include {MFCC}, Pulse Clarity, Main Loudness, Spectral Flatness per band, Inharmonicity and {CENS}. Compared to a formulated baseline error measure, the model performs 15 \% and 47 \% better for valence and arousal, respectively, resulting in an average error of 0.727 ratings on the arousal scale and 0.887 ratings on the valence scale, given that participants rated on a {9-}point scale. The model can be used to predict emotional labels for larger datasets for future testing, or to predict ratings on a shorter time scale to group musical excerpts based on the dynamic emotional structure in music."
}