Robust isolated speech recognition using binary masks

Seliz G. Karadogan, Jan Larsen, Michael Syskind Pedersen, Jesper B. Boldt

Abstract: In this paper, we present a new approach for robust
speaker-independent ASR using binary masks as feature vectors. This
method is evaluated on an isolated digit database, TIDIGIT, in three
noisy environments (car, bottle and cafe noise types taken from the
DRCD Sound Effects Library). A discrete Hidden Markov Model is
used for the recognition, and the observation vectors are quantized
with the K-means algorithm using Hamming distance. It is found
that a recognition rate as high as 92% for clean speech is achievable
using Ideal Binary Masks (IBM), where we assume a priori target
and noise information is available. We propose that using a
Target Binary Mask (TBM), where only a priori target information
is needed, performs as well as using IBMs. We also propose a
TBM estimation method based on target sound estimation using
non-negative sparse coding (NNSC). The recognition results for
TBMs with and without the estimation method for noisy conditions
are evaluated and compared with those obtained using Mel Frequency
Cepstral Coefficients (MFCC). It is observed that binary mask
feature vectors are robust to noisy conditions.
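
As an illustration of two ideas named in the abstract, the following is a minimal sketch of an ideal binary mask and of Hamming-distance quantization of binary frames. It is not the authors' implementation. The function names, the 0 dB threshold, the energy values and the hand-picked codebook are all assumptions made for the example; in the paper the codebook is obtained with K-means, which is not reproduced here.

```python
import numpy as np

def ideal_binary_mask(target_energy, noise_energy, lc_db=0.0):
    """Return 1 where the local SNR (in dB) exceeds lc_db, else 0.

    Both inputs are arrays of shape (n_frames, n_channels) holding
    time-frequency energies. lc_db = 0 dB is an assumed threshold.
    """
    eps = 1e-12  # avoids division by zero and log of zero
    snr_db = 10.0 * np.log10((target_energy + eps) / (noise_energy + eps))
    return (snr_db > lc_db).astype(np.uint8)

def hamming_quantize(frames, codebook):
    """Map each binary frame to the index of its nearest codeword.

    Distance is the Hamming distance (number of differing bits).
    frames: (n_frames, n_channels), codebook: (n_codes, n_channels).
    """
    dists = (frames[:, None, :] != codebook[None, :, :]).sum(axis=2)
    return dists.argmin(axis=1)

# Toy energies (hypothetical values): 2 frames, 2 frequency channels.
target = np.array([[4.0, 0.25], [0.5, 9.0]])
noise = np.ones_like(target)
mask = ideal_binary_mask(target, noise)

# Hand-picked codebook standing in for one learned by K-means.
codebook = np.array([[1, 0], [0, 1], [1, 1]], dtype=np.uint8)
symbols = hamming_quantize(mask, codebook)
```

The resulting `symbols` array is the kind of discrete observation sequence a discrete HMM would consume, one symbol per frame.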
Keywords: robust, isolated, speech, recognition, binary mask
Type: Conference paper [With referee]
Conference: EUSIPCO-2010
Year: 2010    Month: August
Note: Supplementary material at http://www2.imm.dtu.dk/pubdb/p.php?5790
IMM Group(s): Intelligent Signal Processing