Learning Combinations of Multiple Feature Representations for Music Emotion Prediction
|Jens Madsen, Bjørn Sand Jensen, Jan Larsen|
|Abstract||Music consists of several structures and patterns evolving through time, which greatly influence how humans decode higher-level cognitive aspects of music, such as the emotions it expresses.|
For tasks such as genre, tag, and emotion recognition, these structures have often been identified and used as individual, non-temporal features and representations.
In this work, we test the hypothesis that using multiple temporal and non-temporal representations of different features is beneficial for modeling musical structure, with the aim of predicting the emotions expressed in music.
We test this hypothesis by representing temporal and non-temporal structures using generative models of multiple audio features. The representations are then used in a discriminative setting via the Probability Product Kernel and the Gaussian Process model, enabling Multiple Kernel Learning, i.e., finding optimized combinations of both features and temporal/non-temporal representations.
We show increased predictive performance when combining different features and representations, along with the strong interpretive prospects of this approach.
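The kernel combination described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, each track's feature representation is simplified to a diagonal Gaussian, and the MKL weights are fixed here rather than optimized as in the paper.

```python
import numpy as np

def ppk_gaussian(mu1, var1, mu2, var2):
    # Probability product kernel (rho = 1, "expected likelihood" form)
    # between two diagonal Gaussians: the integral of
    # N(x; mu1, var1) * N(x; mu2, var2) dx has the closed form
    # N(mu1; mu2, var1 + var2).
    s = var1 + var2
    d = mu1 - mu2
    return np.exp(-0.5 * np.sum(d * d / s)) / np.sqrt(np.prod(2.0 * np.pi * s))

def combined_kernel(reps, weights):
    # reps: list of (means, variances) pairs, one per feature
    # representation, each array of shape (n_tracks, dim).
    # Returns the MKL-style weighted sum of the per-representation
    # PPK Gram matrices, which can serve as a GP covariance.
    n = reps[0][0].shape[0]
    K = np.zeros((n, n))
    for (mus, vars_), w in zip(reps, weights):
        for i in range(n):
            for j in range(n):
                K[i, j] += w * ppk_gaussian(mus[i], vars_[i],
                                            mus[j], vars_[j])
    return K
```

In the paper's setting the weights would be learned jointly with the Gaussian Process hyperparameters; the sketch above only shows how multiple generative feature representations enter a single combined kernel.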
|Keywords||Music emotion prediction; expressed emotions; pairwise comparisons; multiple kernel learning; Gaussian process|
|Type||Conference paper [With referee]|
|Conference||Affect and Sentiment in Multimedia (ASM) - an ACM MM'15 workshop|
|Year||2015|
|Month||October|
|IMM Group(s)||Intelligent Signal Processing|