@MASTERSTHESIS{IMM2007-05580,
    author       = "D. Bakmand-Mikalski and A. H. Rasmussen",
    title        = "Speaker identification",
    year         = "2007",
    school       = "Informatics and Mathematical Modelling, Technical University of Denmark, {DTU}",
    address      = "Richard Petersens Plads, Building 321, {DK-}2800 Kgs. Lyngby",
    type         = "",
    note         = "Supervised by Assoc. Prof. Niels-Ole Christensen, {IMM,} {DTU}.",
    url          = "http://www2.compute.dtu.dk/pubdb/pubs/5580-full.html",
    abstract     = "This master's thesis focuses on implementing a real-time speaker identification system. Compared to other projects in this field, the authors have taken a more product-oriented approach by giving the real-time implementation pride of place.

The real-time implementation not only motivated the authors but also introduced several interesting problems. When implementing a speaker recognition system, there is a clear difference between using speech signals recorded under perfect conditions and signals recorded in environments with ambient noise.

Front-end signal processing has been used to remove the {DC} component, noise and silence from the signals. This area has been a major challenge and an important factor in achieving high recognition rates. Ignoring these factors not only decreases the recognition rate but also increases the classification time, since, e.g., silent segments would also be classified.

Front-end processing has led to better conditions for the feature extraction methods. Mel-Frequency Cepstral Coefficients ({MFCC}) are the most commonly used features in speaker identification systems and have been shown to model the human voice more closely than any other method. The features used in this thesis are the {MFCC}s, their time derivatives (dMFCC) and the pitch period. Tests showed these features to be robust and well suited to the real-time speaker identification system.

Gaussian Mixture Models and Neural Networks are used as classification systems. Both achieve high recognition rates on speech signals recorded under perfect conditions, and no major differences were observed between them in recognition rate, training time or time per classification.

Since the two classification systems perform almost equally well, both have been implemented in the final application. The final application, written in C\#, achieved recognition rates of 95 to 100 percent on signals recorded under perfect conditions. On speech signals containing ambient noise the recognition rate decreases, but both classification systems still achieve a recognition rate above 90 percent."
}