@inproceedings{IMM2014-06836,
  author    = {Kereliuk, C. M. and Sturm, B. and Larsen, J.},
  title     = {Are Deep Neural Networks Really Learning Relevant Features?},
  booktitle = {DMRN+9: Digital Music Research Network One-day Workshop 2014},
  year      = {2014},
  month     = dec,
  keywords  = {Deep neural networks, audio, feature learning, music information retrieval, genre recognition},
  note      = {Queen Mary University of London, Tuesday 16th December 2014},
  url       = {http://c4dm.eecs.qmul.ac.uk/dmrn/events/dmrnp9/},
  abstract  = {In recent years deep neural networks (DNNs) have become a popular choice for audio content analysis. This may be attributed to various factors including advancements in training algorithms, computational power, and the potential for DNNs to implicitly learn a set of feature detectors. We have recently re-examined two works that consider DNNs for the task of music genre recognition (MGR). These papers conclude that frame-level features learned by DNNs offer an improvement over traditional, hand-crafted features such as Mel-frequency cepstrum coefficients (MFCCs). However, these conclusions were drawn based on training/testing using the {GTZAN} dataset, which is now known to contain several flaws including replicated observations and artists. We illustrate how considering these flaws dramatically changes the results, which leads one to question the degree to which the learned frame-level features are actually useful for {MGR}. We make available a reproducible software package allowing other researchers to completely duplicate our figures and results.},
}