@ARTICLE{IMM2015-06903,
  author   = "J. Madsen and B. S. Jensen and J. Larsen",
  title    = "Affective Modeling of Music using Probabilistic Feature Representations",
  journal  = "IEEE/ACM Transactions on Audio, Speech, and Language Processing",
  year     = "2015",
  month    = "jul",
  keywords = "Music emotion prediction; expressed emotions; pairwise comparisons; probabilistic feature representation",
  url      = "http://www2.compute.dtu.dk/pubdb/pubs/6903-full.html",
  abstract = "The temporal structure in music is an essential aspect when we as humans categorize and describe the cultural, perceptual, and cognitive aspects of music such as genre, emotions, preference, and similarity. Historically, however, temporal information has largely been disregarded when building automatic annotation and labeling systems for music, both in music navigation and in recommendation systems. This paper addresses this apparent discrepancy between common sense and the majority of modeling efforts, first by providing an analysis and survey of existing work and proposing a simple taxonomy of the many possible feature representations. Next, the different paths in the taxonomy are evaluated by testing the hypothesis that it is beneficial to include temporal information when predicting high-order aspects of music. We specifically look into the emotions expressed in music as a prototypical high-order aspect of audio. We test the hypothesis and the differences between representations using the following pipeline: 1) Extract features for each track, obtaining a multivariate feature time series. 2) Model each track-level time series with a probabilistic model: Gaussian mixture models, autoregressive models, linear dynamical systems, multinomial models, and Markov and hidden Markov models. 3) Apply the Probability Product Kernel to define a common correlation/similarity function between tracks. 4) Model the observations using a simple, well-known (kernel) logistic classification approach, specifically extended for two-alternative forced choice to ensure robustness. The evaluation is performed on two data sets covering two different aspects of emotions expressed in music. The results provide evidence that increased predictive performance is obtained using temporal information, thus supporting the overall hypothesis."
}
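For orientation, a minimal sketch of step 3 of the abstract's pipeline follows. This is not the authors' code: it assumes each track is summarized by a single Gaussian (the paper also uses GMMs, autoregressive models, linear dynamical systems, and Markov models) and uses the rho = 1 (expected likelihood) form of the Probability Product Kernel, k(p, q) = integral of p(x) * q(x) dx, which has a closed form for Gaussian densities. All names below (ppk_gaussian, the toy means and covariances) are hypothetical.

    import numpy as np
    from scipy.stats import multivariate_normal

    def ppk_gaussian(mu_p, cov_p, mu_q, cov_q):
        # PPK with rho = 1: the integral of the product of two Gaussian
        # densities equals N(mu_p | mu_q, cov_p + cov_q).
        return multivariate_normal.pdf(mu_p, mean=mu_q, cov=cov_p + cov_q)

    # Toy example: two hypothetical 2-D track-level feature models.
    mu_a, cov_a = np.array([0.0, 0.0]), np.eye(2)
    mu_b, cov_b = np.array([1.0, -0.5]), 0.5 * np.eye(2)
    print(ppk_gaussian(mu_a, cov_a, mu_b, cov_b))  # symmetric similarity score

Evaluating such a kernel over all track pairs yields the similarity matrix that step 4 of the pipeline would feed into a kernel logistic classifier.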