English Language Speech Database for Speaker Recognition

Corpus Text

The corpus was divided into suggested training and test sets. The text suggested for the training subdivision was written in an attempt to capture all possible pronunciations of the English language, including its vowels, consonants, and diphthongs. Seven paragraphs of text, containing 11 sentences, were constructed and collected. The training text is the same for every speaker in the database. For the suggested test subdivision, forty-four sentences (two for each speaker) were collected from NOVA Home [2].


In summary, 154 (7 × 22) utterances were recorded for the training set, and 44 (2 × 22) utterances were provided for the test set. The average duration for reading the training data is 78.6 s for male speakers, 88.3 s for female speakers, and 83 s overall. The average duration for reading the test data is 16.1 s (male), 19.6 s (female), and 17.6 s (overall).
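
The arithmetic behind these figures can be reproduced with a short sketch. It assumes that the factor 22 in the utterance counts is the number of speakers and that the reported durations are averages per speaker; neither is stated outright in the text. The function names and constants are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: constants are read off the summary figures above.
NUM_SPEAKERS = 22                 # assumption: the "22" in 7*22 and 2*22
TRAIN_PARAGRAPHS_PER_SPEAKER = 7  # seven training paragraphs per speaker
TEST_SENTENCES_PER_SPEAKER = 2    # two test sentences per speaker


def corpus_counts(num_speakers=NUM_SPEAKERS):
    """Return the number of training and test utterances in the corpus."""
    return {
        "train": TRAIN_PARAGRAPHS_PER_SPEAKER * num_speakers,
        "test": TEST_SENTENCES_PER_SPEAKER * num_speakers,
    }


def total_minutes(mean_seconds_per_speaker, num_speakers=NUM_SPEAKERS):
    """Estimate total audio in minutes, assuming the mean is per speaker."""
    return mean_seconds_per_speaker * num_speakers / 60.0


if __name__ == "__main__":
    counts = corpus_counts()
    print(counts["train"], counts["test"])  # 154 44
    print(round(total_minutes(83.0), 1))    # approximate training audio, minutes
    print(round(total_minutes(17.6), 1))    # approximate test audio, minutes
```

Under these assumptions the corpus holds roughly half an hour of training speech and a few minutes of test speech in total; the estimate is only as good as the per-speaker reading of the averages.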