Speech Reconstruction from Binary Masked Spectrograms Using Vector Quantized Speaker Models

Michael K. Jensen, Søren Skou Nielsen

Abstract: Several source separation techniques use binary masking on spectrograms to separate two or more speakers from each other. This thesis investigates how to obtain the best-quality signal when reconstructing from masked spectrograms through vector quantized models of speakers. The advantages and disadvantages of such an approach are examined. Additionally, the task of signal reestimation from a spectrogram is investigated using several algorithms.

Vector quantization of speakers can be used to improve on binary masked spectrograms, but the approach is not shown to produce high-quality speech. It is also concluded that phase information is very important for high-quality speech reconstruction, and parameters for optimal phase reestimation are suggested.
Keywords: signal processing, data clustering, mel filtering, voiced unvoiced detection, k-means, vector quantization, signal estimation, phase reconstruction, spectrogram reconstruction.
Type: Master's thesis [Academic thesis]
Publisher: Informatics and Mathematical Modelling, Technical University of Denmark, DTU
Address: Richard Petersens Plads, Building 321, DK-2800 Kgs. Lyngby
Note: Supervised by Lars Kai Hansen, IMM.
IMM Group(s): Intelligent Signal Processing
