Robust Isolated Speech Recognition Using Binary Masks

Seliz Gulsen Karadogan, Jan Larsen, Michael Syskind Pedersen, Jesper Bunsow Boldt
Abstract	In this paper, we present a new approach for robust speaker-independent ASR using binary masks as feature vectors. The method is evaluated on an isolated digit database, TIDIGIT, in three noisy environments (car, bottle and cafe noise types taken from the DRCD Sound Effects Library). Discrete Hidden Markov Models are used for recognition, and the observation vectors are quantized with the K-means algorithm using a Hamming distance. A recognition rate as high as 92% for clean speech is achievable using Ideal Binary Masks (IBMs), where we assume prior target and noise information is available. We propose that using a Target Binary Mask (TBM), where only prior target information is needed, performs as well as using IBMs. We also propose a TBM estimation method based on target sound estimation using non-negative sparse coding (NNSC). The recognition results for TBMs with and without the estimation method under noisy conditions are evaluated and compared with those obtained using Mel Frequency Cepstral Coefficients (MFCCs). Binary mask feature vectors are observed to be robust to noisy conditions.
Keywords	Binary masks, speech recognition
Type	Conference paper [With referee]
Conference	European Signal Processing Conference
Year	2010
Month	August
IMM Group(s)	Intelligent Signal Processing
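
The abstract names three building blocks: the Ideal Binary Mask, the Target Binary Mask, and vector quantization under a Hamming distance. The following is a minimal Python sketch of those ideas, not the authors' implementation. It assumes the usual formulation in which a time-frequency unit is set to 1 when an energy ratio in dB exceeds a local criterion. The function names, the `lc_db` parameter, the frame-by-band list layout, and the per-band `reference_energy` used for the TBM are all illustrative assumptions; the abstract states only that the TBM needs prior target information. The quantizer shows only the nearest-codeword assignment step, not K-means codebook training.

```python
import math


def _exceeds(energy, reference, lc_db):
    """Return 1 if energy/reference in dB is above lc_db, else 0."""
    if energy <= 0:
        return 0  # no target energy in this unit
    if reference <= 0:
        return 1  # target present, nothing to compare against
    return 1 if 10.0 * math.log10(energy / reference) > lc_db else 0


def ideal_binary_mask(target_energy, noise_energy, lc_db=0.0):
    """IBM sketch: compare target and noise energy in each unit.

    Both inputs are lists of frames; each frame is a list of band energies.
    Requires access to the target and the noise separately.
    """
    return [
        [_exceeds(t, n, lc_db) for t, n in zip(t_frame, n_frame)]
        for t_frame, n_frame in zip(target_energy, noise_energy)
    ]


def target_binary_mask(target_energy, reference_energy, lc_db=0.0):
    """TBM sketch: compare target energy with a fixed per-band reference.

    reference_energy holds one value per band (an assumed reference level),
    so no knowledge of the actual noise is needed.
    """
    return [
        [_exceeds(t, r, lc_db) for t, r in zip(t_frame, reference_energy)]
        for t_frame in target_energy
    ]


def hamming(a, b):
    """Number of positions where two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def quantize(frame, codebook):
    """Index of the codeword closest to frame in Hamming distance."""
    return min(range(len(codebook)), key=lambda k: hamming(frame, codebook[k]))
```

Each mask frame is a binary vector, so `quantize` can map it to a discrete symbol of the kind a discrete Hidden Markov Model takes as an observation.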