Single-Channel Speech Separation using Sparse Non-Negative Matrix Factorization

Mikkel N. Schmidt, Rasmus K. Olsson

Abstract: We apply machine learning techniques to the problem of separating
multiple speech sources from a single microphone recording.
The method of choice is a sparse non-negative matrix factorization
algorithm, which in an unsupervised manner can learn sparse representations
of the data. This is applied to the learning of personalized
dictionaries from a speech corpus, which in turn are used
to separate the audio stream into its components. We show that
computational savings can be achieved by segmenting the training
data on a phoneme level. To split the data, a conventional speech
recognizer is used. Both the unsupervised and the supervised
adaptation schemes yield significant improvements in
terms of the target-to-masker ratio.
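
The pipeline described in the abstract (learn one sparse dictionary per speaker, then explain a mixture with the concatenated dictionaries) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a Euclidean cost with an L1 penalty on the activations, plain multiplicative updates, and unit-norm dictionary columns, and the function names (`learn_dictionary`, `encode`, `separate`) and the `sparsity` parameter are hypothetical. The inputs are assumed to be non-negative magnitude spectrograms of shape (frequency bins, frames).

```python
import numpy as np

EPS = 1e-9  # guards against division by zero


def learn_dictionary(V, n_atoms, sparsity=0.1, n_iter=200, seed=0):
    """Factor a non-negative spectrogram V ~ W @ H with a sparsity penalty on H.

    Simplified sparse NMF (an assumption of this sketch): multiplicative
    updates for 0.5 * ||V - W H||^2 + sparsity * sum(H), followed by a
    renormalization of the dictionary columns to unit L2 norm.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_atoms)) + EPS
    H = rng.random((n_atoms, n_frames)) + EPS
    for _ in range(n_iter):
        # The sparsity term in the denominator shrinks the activations.
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + EPS)
        W *= (V @ H.T) / (W @ H @ H.T + EPS)
        # Rescale so that W has unit-norm columns and W @ H is unchanged.
        norms = np.linalg.norm(W, axis=0) + EPS
        W /= norms
        H *= norms[:, None]
    return W, H


def encode(V, W, sparsity=0.1, n_iter=200, seed=0):
    """Estimate sparse activations H for V while keeping the dictionary W fixed."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + EPS)
    return H


def separate(V_mix, W_a, W_b, sparsity=0.1, n_iter=200):
    """Split a mixture spectrogram using two pre-trained speaker dictionaries."""
    W = np.hstack([W_a, W_b])          # concatenated dictionary
    H = encode(V_mix, W, sparsity, n_iter)
    k = W_a.shape[1]
    est_a = W_a @ H[:k]                # part explained by speaker A's atoms
    est_b = W_b @ H[k:]                # part explained by speaker B's atoms
    return est_a, est_b
```

In use, `learn_dictionary` would be run once per speaker on that speaker's training spectrogram, and `separate` would then be applied to the spectrogram of the mixed recording. Turning the two estimates back into audio (for example by masking the mixture and reusing its phase) is outside the scope of this sketch.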
Type: Conference paper [With referee]
Conference: Interspeech
Year: 2006    Month: September
IMM Group(s): Intelligent Signal Processing