| Robust isolated speech recognition using binary masks | 
| Seliz G. Karadogan, Jan Larsen, Michael Syskind Pedersen, Jesper B. Boldt 
 
 | 
| Abstract | In this paper, we present a new approach for robust speaker-independent ASR using binary masks as feature vectors. This
 method is evaluated on an isolated digit database, TIDIGITS, in three
 noisy environments (car, bottle and cafe noise types taken from
 DRCD Sound Effects Library). A discrete Hidden Markov Model is
 used for the recognition, and the observation vectors are quantized
 with the K-means algorithm using Hamming distance. It is found
 that a recognition rate as high as 92% for clean speech is achievable
 using Ideal Binary Masks (IBM), where we assume a priori target
 and noise information is available. We propose that using a
 Target Binary Mask (TBM), where only a priori target information
 is needed, performs as well as using IBMs. We also propose a
 TBM estimation method based on target sound estimation using
 non-negative sparse coding (NNSC). The recognition results for
 TBMs with and without the estimation method for noisy conditions
 are evaluated and compared with those obtained using Mel Frequency
 Cepstral Coefficients (MFCC). It is observed that binary mask
 feature vectors are robust to noisy conditions.
 | 
| Keywords | robust, isolated, speech, recognition, binary mask | 
| Type | Conference paper [With referee] | 
| Conference | EUSIPCO-2010 | 
| Year | 2010 | Month | August |
| Note | Supplementary material at http://www2.imm.dtu.dk/pubdb/p.php?5790 | 
| Electronic version(s) | [pdf] | 
 | BibTeX data | [bibtex] | 
| IMM Group(s) | Intelligent Signal Processing |
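
As a rough illustration of the ideas named in the abstract, the following minimal sketch computes an IBM and a TBM from time-frequency energies and maps binary frames to codebook symbols by Hamming distance. It is not the authors' implementation. It assumes the commonly used definitions (IBM: target-to-noise ratio above a local criterion; TBM: target energy above a per-band reference, here taken as the target's own mean band energy), and all function names, the `lc_db` parameter, and the toy data are hypothetical. Only the nearest-codeword assignment step of the K-means quantization is shown, with a hand-picked codebook.

```python
import numpy as np

def ideal_binary_mask(target_energy, noise_energy, lc_db=0.0):
    """1 where the local target-to-noise ratio exceeds lc_db (uses target and noise)."""
    eps = 1e-12  # avoids taking the log of zero
    snr_db = 10.0 * np.log10((target_energy + eps) / (noise_energy + eps))
    return (snr_db > lc_db).astype(np.uint8)

def target_binary_mask(target_energy, reference_energy, lc_db=0.0):
    """1 where target energy exceeds a per-band reference (uses the target only)."""
    eps = 1e-12
    ratio_db = 10.0 * np.log10(
        (target_energy + eps) / (reference_energy[:, None] + eps)
    )
    return (ratio_db > lc_db).astype(np.uint8)

def quantize_hamming(frames, codebook):
    """Index of the nearest codeword, in Hamming distance, for each binary frame."""
    # distances[i, j] = number of differing bits between frame i and codeword j
    distances = (frames[:, None, :] != codebook[None, :, :]).sum(axis=2)
    return distances.argmin(axis=1)

# Toy energies, shape (bands, frames); values are illustrative only.
target = np.array([[4.0, 0.1, 9.0],
                   [0.5, 2.0, 0.2]])
noise = np.ones_like(target)

ibm = ideal_binary_mask(target, noise)
# Assumption: the per-band reference is the target's mean energy in that band.
tbm = target_binary_mask(target, target.mean(axis=1))

# Hypothetical two-word codebook; each row is one binary codeword over the bands.
codebook = np.array([[1, 0],
                     [0, 1]], dtype=np.uint8)
symbols = quantize_hamming(ibm.T, codebook)  # one discrete symbol per frame
```

The resulting `symbols` sequence is the kind of discrete observation sequence a discrete HMM would consume. In a full K-means setup the codebook would be learned from training masks, for example by updating each codeword with a per-bit majority vote over its assigned frames.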