Robust isolated speech recognition using binary masks 
Seliz G. Karadogan, Jan Larsen, Michael Syskind Pedersen, Jesper B. Boldt

Abstract  In this paper, we present a new approach for robust speaker-independent
ASR using binary masks as feature vectors. This
method is evaluated on an isolated digit database, TIDIGIT, in three
noisy environments (car, bottle, and cafe noise types taken from the
DRCD Sound Effects Library). A discrete Hidden Markov Model is
used for recognition, and the observation vectors are quantized
with the K-means algorithm using the Hamming distance. It is found
that a recognition rate as high as 92% for clean speech is achievable
using Ideal Binary Masks (IBMs), where we assume a priori target
and noise information is available. We propose that using a
Target Binary Mask (TBM), where only a priori target information
is needed, performs as well as using IBMs. We also propose a
TBM estimation method based on target sound estimation using
nonnegative sparse coding (NNSC). The recognition results for
TBMs with and without the estimation method for noisy conditions
are evaluated and compared with those obtained using Mel Frequency
Cepstral Coefficients (MFCCs). It is observed that binary mask
feature vectors are robust to noisy conditions.
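
The mask construction and Hamming-distance quantization summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0 dB threshold (`lc_db`), the use of the target's long-term per-band average as the TBM reference, and the toy codebook are all assumptions made for illustration. Inputs are assumed to be time-frequency power arrays of shape (bands, frames).

```python
import numpy as np

EPS = 1e-12  # avoids log of zero; an illustrative choice


def ideal_binary_mask(target_power, noise_power, lc_db=0.0):
    """IBM: 1 where the local target-to-noise ratio exceeds lc_db.

    Needs both target and noise power (a priori knowledge of both).
    """
    snr_db = 10.0 * np.log10((target_power + EPS) / (noise_power + EPS))
    return (snr_db > lc_db).astype(np.uint8)


def target_binary_mask(target_power, reference_power, lc_db=0.0):
    """TBM: 1 where the target exceeds a per-band reference by lc_db.

    Needs only target information. The reference (one value per band)
    is an assumption here, not taken from the abstract.
    """
    ratio_db = 10.0 * np.log10(
        (target_power + EPS) / (reference_power[:, None] + EPS)
    )
    return (ratio_db > lc_db).astype(np.uint8)


def quantize_hamming(mask, codebook):
    """Map each frame (mask column) to the nearest codeword index
    under the Hamming distance. codebook has shape (K, bands)."""
    frames = mask.T  # (frames, bands)
    dists = (frames[:, None, :] != codebook[None, :, :]).sum(axis=2)
    return dists.argmin(axis=1)


# Toy data: 2 frequency bands, 3 frames (hypothetical values).
target = np.array([[4.0, 1.0, 0.1],
                   [0.5, 2.0, 8.0]])
noise = np.array([[1.0, 2.0, 1.0],
                  [1.0, 1.0, 1.0]])

ibm = ideal_binary_mask(target, noise)
tbm = target_binary_mask(target, target.mean(axis=1))

# Hypothetical 2-word codebook; in practice it would be learned by
# K-means over binary training frames.
codebook = np.array([[1, 0],
                     [0, 1]], dtype=np.uint8)
ibm_symbols = quantize_hamming(ibm, codebook)
tbm_symbols = quantize_hamming(tbm, codebook)
```

The resulting symbol sequences are the kind of discrete observations a discrete HMM would consume; the codebook training and the HMM itself are left out of this sketch.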
Keywords  robust, isolated, speech, recognition, binary mask 
Type  Conference paper [With referee] 
Conference  EUSIPCO2010 
Year  2010 Month August 
Note  Supplementary material at http://www2.imm.dtu.dk/pubdb/p.php?5790 
IMM Group(s)  Intelligent Signal Processing 