Robust Isolated Speech Recognition Using Binary Masks |
Seliz Gulsen Karadogan, Jan Larsen, Michael Syskind Pedersen, Jesper Bunsow Boldt
|
Abstract | In this paper, we represent a new approach for robust speaker
independent ASR using binary masks as feature vectors. This
method is evaluated on an isolated digit database, TIDIGIT in
three noisy environments (car, bottle and cafe noise types taken
from the DRCD Sound Effects Library). Discrete Hidden Markov
Models are used for the recognition and the observation vectors are
quantized with the K-means algorithm using a Hamming distance.
It is found that a recognition rate as high as 92% for clean speech is
achievable using Ideal Binary Masks (IBM) where we assume prior
target and noise information is available. We propose that using a
Target Binary Mask (TBM), where only prior target information
is needed, performs as good as using IBMs. We also propose a
TBM estimation method based on target sound estimation using
non-negative sparse coding (NNSC). The recognition results for
TBMs with and without the estimation method for noisy conditions
are evaluated and compared with those of using Mel Frequency
Cepstral Coefficients (MFCC). It is observed that binary mask
feature vectors are robust to noisy conditions. |
Keywords | Binary masks, speech recognition |
Type | Conference paper [With referee] |
Conference | European Signal Porcessing Conference |
Year | 2010 Month August |
BibTeX data | [bibtex] |
IMM Group(s) | Intelligent Signal Processing |