Robust isolated speech recognition using binary masks

Seliz G. Karadogan, Jan Larsen, Michael Syskind Pedersen, Jesper B. Boldt
Abstract	In this paper, we present a new approach to robust speaker-independent automatic speech recognition (ASR) that uses binary masks as feature vectors. The method is evaluated on an isolated-digit database, TIDIGITS, in three noisy environments (car, bottle and cafe noise taken from the DRCD Sound Effects Library). A discrete Hidden Markov Model (HMM) is used for recognition, and the observation vectors are quantized with the K-means algorithm using the Hamming distance. We find that a recognition rate as high as 92% on clean speech is achievable using Ideal Binary Masks (IBMs), which assume that a priori target and noise information is available. We propose using a Target Binary Mask (TBM), which requires only a priori target information, and find that it performs as well as the IBM. We also propose a TBM estimation method based on estimating the target sound with non-negative sparse coding (NNSC). Recognition results in noisy conditions for TBMs, both with and without the estimation method, are evaluated and compared with those obtained using Mel Frequency Cepstral Coefficients (MFCCs). Binary mask feature vectors are observed to be robust to noisy conditions.
Keywords	robust, isolated, speech, recognition, binary mask
Type	Conference paper [With referee]
Conference	EUSIPCO-2010
Year	2010 Month August
Note	Supplementary material at http://www2.imm.dtu.dk/pubdb/p.php?5790
IMM Group(s)	Intelligent Signal Processing
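The abstract above compresses a three-stage pipeline: compute a binary mask over time-frequency units, vector-quantize each mask frame with K-means under the Hamming distance, then score the resulting codeword sequence with a discrete HMM per digit. The following is a minimal pure-Python sketch of those stages under stated assumptions. It is not the authors' implementation. The function names, the local criterion `lc_db`, and especially the TBM reference (here, hypothetically, the target's own long-term average energy per channel) are illustrative choices. The NNSC-based target estimation is not shown.

```python
import math
import random

EPS = 1e-12  # guards against log(0) and division by zero


def ideal_binary_mask(target, noise, lc_db=0.0):
    """IBM: 1 where the local target-to-noise ratio exceeds lc_db.
    Needs BOTH target and noise. target, noise: lists of frames,
    each frame a list of per-channel energies."""
    return [[1 if 10 * math.log10((t + EPS) / (n + EPS)) > lc_db else 0
             for t, n in zip(t_frame, n_frame)]
            for t_frame, n_frame in zip(target, noise)]


def target_binary_mask(target, lc_db=0.0):
    """TBM sketch: needs only the target. ASSUMPTION (hypothetical reference):
    each channel is compared with the target's long-term average energy there."""
    n_ch = len(target[0])
    ref = [sum(frame[c] for frame in target) / len(target) for c in range(n_ch)]
    return [[1 if 10 * math.log10((e + EPS) / (r + EPS)) > lc_db else 0
             for e, r in zip(frame, ref)]
            for frame in target]


def hamming(a, b):
    """Number of positions where two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def quantize(vectors, codebook):
    """Map each binary mask frame to the index of its nearest codeword."""
    return [min(range(len(codebook)), key=lambda j: hamming(v, codebook[j]))
            for v in vectors]


def kmeans_hamming(vectors, k, iters=20, seed=0):
    """K-means on binary vectors under the Hamming distance."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        labels = quantize(vectors, codebook)
        new = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if not members:  # keep an empty cluster's codeword unchanged
                new.append(codebook[j])
                continue
            # The bitwise majority vote minimises the summed Hamming distance.
            new.append([1 if 2 * sum(bits) > len(members) else 0
                        for bits in zip(*members)])
        if new == codebook:
            break
        codebook = new
    return codebook


def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | discrete HMM).
    pi[i]: initial probs, A[i][j]: transition probs, B[i][m]: emission probs."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        total += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return total


if __name__ == "__main__":
    # Toy energies: 6 frames x 4 channels (invented numbers, not from the paper).
    speech = [[9, 1, 1, 9], [8, 1, 2, 9], [9, 2, 1, 8],
              [1, 9, 9, 1], [2, 8, 9, 1], [1, 9, 8, 2]]
    frames = target_binary_mask(speech)
    codebook = kmeans_hamming(frames, k=2)
    symbols = quantize(frames, codebook)
    # Hypothetical 2-state left-to-right model over a 2-symbol codebook.
    pi, A = [1.0, 0.0], [[0.7, 0.3], [0.0, 1.0]]
    B = [[0.9, 0.1], [0.1, 0.9]]
    print(symbols, log_likelihood(symbols, pi, A, B))
```

In a recognizer along these lines, one such HMM would be trained per digit on quantized mask frames, and a test utterance would be assigned to the model with the highest log-likelihood. Taking the bitwise majority as the centroid is the natural K-means update here, because it is the binary vector that minimizes the total Hamming distance to a cluster's members.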