Modeling of Emotions expressed in Music using Audio features

Jens Madsen
Abstract	This thesis presents an alternative method of organizing and rating music based on the emotions it expresses. This measure can serve as a standalone parameter when searching for new music, or it can be combined with established methods, e.g. happy jazz or sad rock. The approach is to create a mathematical model that automatically predicts labels of emotional expression in music from the audio content. The audio content is quantified using spectral, cepstral, temporal, musical and perceptual features computed from 7 different feature packs. To measure the emotions expressed in music, a listening experiment is developed using experimental design. Participants rate 15-second excerpts on two 9-point iconic scales (SAM) representing the dimensions of valence and arousal. All ratings are modeled using fitted beta distributions, with outliers removed based on empirical measures. A thorough investigation is made into the consequences of the design and the resulting ratings. Furthermore, participants' musical experience, their mood before starting the test and their understanding of the test are investigated for a connection to their emotional ratings. Using the audio features and emotional ratings, a mathematical model is designed; the best-performing variant is a stepwise regression model trained on features selected by a sequential feature selection method using least squares and root mean squared error. The features found most suitable for modeling emotions in music include MFCC, Pulse Clarity, Main Loudness, Spectral Flatness per band, Inharmonicity and CENS. Compared to a formulated baseline error measure, the model performs 15 % and 47 % better for valence and arousal, respectively, resulting in an average error of 0.887 ratings on the valence scale and 0.727 ratings on the arousal scale, given that participants rated on a 9-point scale.
The model can be used to predict emotional labels for larger datasets for future testing, or to predict ratings on a shorter time scale in order to group musical excerpts based on the dynamic emotional structure in music.
Type	Master's thesis [Academic thesis]
Year	2011
Publisher	Technical University of Denmark, DTU Informatics, E-mail: reception@imm.dtu.dk
Address	Asmussens Alle, Building 305, DK-2800 Kgs. Lyngby, Denmark
Series	IMM-M.Sc.-2011-35
Note	Supervised by Professor Lars Kai Hansen, lkh@imm.dtu.dk, DTU Informatics
Electronic version(s)	[pdf]
Publication link	http://www.imm.dtu.dk/English.aspx
BibTeX data	[bibtex]
IMM Group(s)	Intelligent Signal Processing
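
The abstract mentions sequential feature selection using least squares and root mean squared error. As a rough illustration of that general idea, and not the thesis's actual implementation, the following Python sketch runs a greedy forward selection on synthetic data. Everything in it is an assumption made for illustration: the function names, the 5-fold cross-validation, the stopping rule, the mean-only starting point (which is not necessarily the "formulated baseline" of the thesis), and the random data standing in for audio features and ratings.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two 1-D arrays."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def fit_predict_ls(X_train, y_train, X_test):
    """Ordinary least squares with an intercept; returns test predictions."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef

def cv_rmse(X, y, n_folds=5):
    """Mean RMSE over contiguous folds (assumes rows are already shuffled)."""
    folds = np.array_split(np.arange(len(y)), n_folds)
    errors = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        pred = fit_predict_ls(X[train_mask], y[train_mask], X[test_idx])
        errors.append(rmse(y[test_idx], pred))
    return float(np.mean(errors))

def forward_select(X, y, max_features=5):
    """Greedy forward selection: add the feature that lowers CV RMSE most."""
    selected = []
    remaining = list(range(X.shape[1]))
    # Starting error: intercept-only model (hypothetical choice of baseline).
    best_err = cv_rmse(X[:, selected], y)
    while remaining and len(selected) < max_features:
        scores = [(cv_rmse(X[:, selected + [j]], y), j) for j in remaining]
        err, j = min(scores)
        if err >= best_err:  # stop when no candidate improves the error
            break
        selected.append(j)
        remaining.remove(j)
        best_err = err
    return selected, best_err

# Synthetic stand-in data: 200 "excerpts", 8 "features"; only columns 2 and 5
# actually influence the target. These are not real audio features or ratings.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = 5.0 + 1.5 * X[:, 2] - 1.0 * X[:, 5] + 0.3 * rng.normal(size=200)

baseline_err = cv_rmse(X[:, []], y)
selected, selected_err = forward_select(X, y)
print("selected feature columns:", selected)
print("baseline RMSE:", round(baseline_err, 3), "model RMSE:", round(selected_err, 3))
```

On real data, the columns of `X` would hold the extracted audio features and `y` the valence or arousal ratings; the sketch only shows how an RMSE criterion can drive the choice of features for a least squares model.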