Are deep neural networks really learning relevant features?

Corey Mose Kereliuk, Bob Sturm, Jan Larsen
Abstract	In recent years, deep neural networks (DNNs) have become a popular choice for audio content analysis. This may be attributed to several factors, including advances in training algorithms, increased computational power, and the potential for DNNs to implicitly learn a set of feature detectors. We have recently re-examined two works that consider DNNs for the task of music genre recognition (MGR). These papers conclude that frame-level features learned by DNNs offer an improvement over traditional, hand-crafted features such as Mel-frequency cepstral coefficients (MFCCs). However, these conclusions were drawn from training and testing on the GTZAN dataset, which is now known to contain several flaws, including replicated observations and repeated artists. We illustrate how accounting for these flaws dramatically changes the results, which calls into question the degree to which the learned frame-level features are actually useful for MGR. We make available a reproducible software package that allows other researchers to fully duplicate our figures and results.
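The abstract's central point is that replicated observations and repeated artists in GTZAN can leak information between training and test sets. As an illustration only (not the authors' released package or their exact protocol), the sketch below first drops exact duplicates using a hypothetical per-excerpt `fingerprint` field and then performs an artist-filtered split, assigning every excerpt by a given artist to the same side. The field names, excerpt IDs, and artist labels are all hypothetical.

```python
import random
from collections import defaultdict


def deduplicate(excerpts):
    """Keep the first excerpt seen for each fingerprint.

    `fingerprint` is a hypothetical field standing in for whatever
    method is used to detect replicated observations.
    """
    seen = set()
    unique = []
    for ex in excerpts:
        if ex["fingerprint"] not in seen:
            seen.add(ex["fingerprint"])
            unique.append(ex)
    return unique


def artist_filtered_split(excerpts, test_fraction=0.3, seed=0):
    """Assign whole artists to train or test so no artist appears in both."""
    by_artist = defaultdict(list)
    for ex in excerpts:
        by_artist[ex["artist"]].append(ex)
    artists = sorted(by_artist)
    random.Random(seed).shuffle(artists)
    # test_fraction is applied to the number of artists, not excerpts
    n_test = max(1, round(len(artists) * test_fraction))
    test_artists = set(artists[:n_test])
    train = [ex for a in artists if a not in test_artists for ex in by_artist[a]]
    test = [ex for a in artists if a in test_artists for ex in by_artist[a]]
    return train, test


# Hypothetical toy data: "x2" is an exact replica of "x1".
excerpts = [
    {"id": "x0", "artist": "A", "genre": "blues", "fingerprint": "f0"},
    {"id": "x1", "artist": "A", "genre": "blues", "fingerprint": "f1"},
    {"id": "x2", "artist": "A", "genre": "blues", "fingerprint": "f1"},
    {"id": "x3", "artist": "B", "genre": "jazz", "fingerprint": "f3"},
    {"id": "x4", "artist": "C", "genre": "rock", "fingerprint": "f4"},
    {"id": "x5", "artist": "D", "genre": "pop", "fingerprint": "f5"},
]

unique = deduplicate(excerpts)
train, test = artist_filtered_split(unique, test_fraction=0.3, seed=0)
print("train:", [ex["id"] for ex in train])
print("test: ", [ex["id"] for ex in test])
```

Because whole artists are assigned to one side, `test_fraction` controls the share of artists rather than of excerpts; with few artists, the excerpt-level proportion can differ noticeably from the requested fraction.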
Keywords	Deep neural networks, audio, feature learning, music information retrieval, genre recognition
Type	Conference paper [Without referee]
Conference	DMRN+9: Digital Music Research Network One-day Workshop 2014
Year	2014
Month	December
Note	Queen Mary University of London, Tuesday 16th December 2014
Publication link	http://c4dm.eecs.qmul.ac.uk/dmrn/events/dmrnp9/
IMM Group(s)	Scientific Computing, Intelligent Signal Processing