Speech Reconstruction from Binary Masked Spectrograms Using Vector Quantized Speaker Models

Abstract	Several source separation techniques use binary masking on spectrograms to separate two or more speakers from each other. This thesis investigates how to obtain the best possible signal quality when speech is reconstructed from masked spectrograms with the help of vector quantized speaker models, and examines the advantages and disadvantages of this approach. In addition, several algorithms for re-estimating a signal from a spectrogram are investigated. Vector quantization of speakers can be used to improve on binary masked spectrograms, but the approach is not shown to produce high-quality speech. It is also concluded that phase information is very important for high-quality speech reconstruction, and parameters for optimal phase re-estimation are suggested.
Type	Master's thesis [Academic thesis]
Year	2006
Publisher	Informatics and Mathematical Modelling, Technical University of Denmark, DTU
Address	Richard Petersens Plads, Building 321, DK-2800 Kgs. Lyngby
Series	IMM-Thesis-2006-68
Note	Supervised by Lars Kai Hansen, IMM.
IMM Group(s)	Intelligent Signal Processing
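The abstract describes separating speakers by binary masking a spectrogram and then using a vector quantized (VQ) model of a speaker to improve the masked result. The record does not specify the thesis's exact algorithm, so the following is only a minimal sketch under stated assumptions: the mask is an "ideal" binary mask that keeps a time-frequency bin when the target's power exceeds the interferer's (a 0 dB local SNR criterion), the speaker codebook has already been trained (for example with k-means), and masked-out bins are filled from the codebook vector that best matches the reliable bins of each frame. All function names are hypothetical, and the sketch works on magnitudes only; the phase would still have to be re-estimated separately, which the abstract identifies as critical for quality.

```python
from typing import List, Sequence

Frame = List[float]        # magnitudes of one spectrogram frame, one value per frequency bin
Mask = List[List[int]]     # 1 = keep bin (target dominates), 0 = discard


def ideal_binary_mask(target: Sequence[Frame], interferer: Sequence[Frame],
                      threshold_db: float = 0.0) -> Mask:
    """Hypothetical helper: 1 where target power exceeds interferer power by threshold_db."""
    ratio = 10.0 ** (threshold_db / 10.0)   # dB threshold converted to a power ratio
    return [[1 if t * t > ratio * i * i else 0 for t, i in zip(t_row, i_row)]
            for t_row, i_row in zip(target, interferer)]


def apply_mask(mixture: Sequence[Frame], mask: Mask) -> List[Frame]:
    """Zero every mixture bin that the mask marks as unreliable."""
    return [[x * m for x, m in zip(row, m_row)] for row, m_row in zip(mixture, mask)]


def nearest_codeword(frame: Frame, mask_row: List[int], codebook: Sequence[Frame]) -> int:
    """Index of the codeword closest to `frame`, measured on reliable bins only.

    If no bin is reliable, every distance is 0 and codeword 0 is returned.
    """
    best, best_dist = 0, float("inf")
    for k, cw in enumerate(codebook):
        d = sum((x - c) ** 2 for x, c, m in zip(frame, cw, mask_row) if m)
        if d < best_dist:
            best, best_dist = k, d
    return best


def vq_fill(masked: Sequence[Frame], mask: Mask, codebook: Sequence[Frame]) -> List[Frame]:
    """Keep reliable bins and replace masked-out bins with the matched codeword's values."""
    out = []
    for frame, m_row in zip(masked, mask):
        cw = codebook[nearest_codeword(frame, m_row, codebook)]
        out.append([x if m else c for x, c, m in zip(frame, cw, m_row)])
    return out


if __name__ == "__main__":
    target = [[1.0, 0.1, 2.0], [0.0, 3.0, 0.5]]
    interferer = [[0.5, 1.0, 0.1], [1.0, 0.2, 2.0]]
    mixture = [[1.5, 1.1, 2.1], [1.0, 3.2, 2.5]]       # toy mixture magnitudes
    codebook = [[1.5, 0.2, 2.0], [0.1, 3.0, 0.4]]      # assumed pre-trained speaker codebook
    mask = ideal_binary_mask(target, interferer)
    print(vq_fill(apply_mask(mixture, mask), mask, codebook))
    # prints [[1.5, 0.2, 2.1], [0.1, 3.2, 0.4]]
```

Matching on the reliable bins only is the design choice that lets the speaker model "see through" the mask: the discarded bins carry no information about the target, so including them in the distance would bias the codeword choice toward the zeros the mask introduced.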