Single-Channel Speech Separation using Sparse Non-Negative Matrix Factorization |
Mikkel N. Schmidt, Rasmus K. Olsson
|
Abstract | We apply machine learning techniques to the problem of separating
multiple speech sources from a single microphone recording.
The method of choice is a sparse non-negative matrix factorization
algorithm, which in an unsupervised manner can learn sparse representations
of the data. This is applied to the learning of personalized
dictionaries from a speech corpus, which in turn are used
to separate the audio stream into its components. We show that
computational savings can be achieved by segmenting the training
data on a phoneme level. To split the data, a conventional speech
recognizer is used. Both the unsupervised and supervised
adaptation schemes yield significant improvements in
terms of the target-to-masker ratio. |
Type | Conference paper [With referee] |
Conference | Interspeech |
Year | 2006 |
Month | September |
Electronic version(s) | [pdf] |
BibTeX data | [bibtex] |
IMM Group(s) | Intelligent Signal Processing |
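
The pipeline outlined in the abstract (learn a sparse dictionary per speaker, then decompose the mixture over the combined dictionaries) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the paper's implementation: the inputs are taken to be non-negative magnitude spectrograms, the names `sparse_nmf` and `separate` are hypothetical, the update is a simplified multiplicative rule with an L1 penalty on the activations and a heuristic column renormalization, and small synthetic matrices stand in for speech data.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero


def sparse_nmf(V, n_atoms, sparsity=0.1, n_iter=200, W=None, seed=0):
    """Approximate V by W @ H with non-negative factors and an L1 penalty on H.

    Simplified sketch: Euclidean-cost multiplicative updates, with the columns
    of W renormalized to unit length after each step (a heuristic).
    If W is supplied, it is held fixed and only H is inferred.
    """
    rng = np.random.default_rng(seed)
    learn_W = W is None
    if learn_W:
        W = rng.random((V.shape[0], n_atoms)) + EPS
        W /= np.linalg.norm(W, axis=0, keepdims=True)
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        # The sparsity term in the denominator shrinks the activations.
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + EPS)
        if learn_W:
            W *= (V @ H.T) / (W @ H @ H.T + EPS)
            W /= np.linalg.norm(W, axis=0, keepdims=True) + EPS
    return W, H


def separate(V_mix, dictionaries, sparsity=0.1, n_iter=200):
    """Decompose a mixture over concatenated per-speaker dictionaries and
    return one reconstructed spectrogram per speaker."""
    W = np.hstack(dictionaries)
    _, H = sparse_nmf(V_mix, W.shape[1], sparsity, n_iter, W=W)
    estimates, start = [], 0
    for D in dictionaries:
        stop = start + D.shape[1]
        estimates.append(D @ H[start:stop])  # this speaker's atoms only
        start = stop
    return estimates


# Synthetic stand-in for two speakers (assumption: disjoint frequency bands).
rng = np.random.default_rng(1)
n_freq, n_train, n_test = 16, 60, 40
basis_a = np.zeros((n_freq, 3))
basis_a[:8] = rng.random((8, 3))
basis_b = np.zeros((n_freq, 3))
basis_b[8:] = rng.random((8, 3))

# "Training corpus" per speaker, used only to learn the dictionaries.
D_a, _ = sparse_nmf(basis_a @ rng.random((3, n_train)), n_atoms=4)
D_b, _ = sparse_nmf(basis_b @ rng.random((3, n_train)), n_atoms=4, seed=1)

# Unseen test signals, observed only as their sum.
V_a = basis_a @ rng.random((3, n_test))
V_b = basis_b @ rng.random((3, n_test))
est_a, est_b = separate(V_a + V_b, [D_a, D_b])
```

With real recordings, the columns of `V` would come from a time-frequency transform of the audio, and the per-speaker estimates would typically be turned into masks applied to the mixture before resynthesis. Those steps are omitted here.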