Enhanced Context Recognition by Sensitivity Pruned Vocabularies

Rasmus Elsborg Madsen, Sigurdur Sigurdsson, Lars Kai Hansen

Abstract: Language-independent 'bag-of-words' representations are
surprisingly effective for text classification. The generic BOW
approach is based on a high-dimensional vocabulary which may
reduce the generalization performance of subsequent classifiers,
e.g., based on ill-posed principal component transformations. In
this communication we study the effect of sensitivity-based
pruning of the bag-of-words representation. We use neural-network-based
sensitivity maps to determine term relevancy when pruning the
vocabularies. With the reduced vocabularies, documents are
classified using a latent semantic
indexing representation and a probabilistic neural network
classifier. Pruning the vocabularies to approximately 20% of the
original size, we find consistent context recognition enhancement
for two mid-size data sets over a range of training set sizes. We
also study the applicability of the sensitivity measure for
automated keyword generation.
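
The pipeline described in the abstract (score terms by sensitivity, keep roughly 20% of the vocabulary, then project with latent semantic indexing) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a softmax linear model stands in for the neural network, sensitivity is taken as the mean absolute derivative of the class posteriors with respect to each term, and the data, function names and parameter values are all hypothetical.

```python
import numpy as np


def softmax(z):
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_softmax(X, y, n_classes, lr=0.5, epochs=200):
    # Assumption: a plain softmax classifier trained by gradient descent
    # stands in for the neural network mentioned in the abstract.
    n, d = X.shape
    W = np.zeros((n_classes, d))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        P = softmax(X @ W.T + b)
        G = (P - Y) / n  # gradient of the cross-entropy w.r.t. the logits
        W -= lr * (G.T @ X)
        b -= lr * G.sum(axis=0)
    return W, b


def term_sensitivity(X, W, b):
    # For a softmax model: dP_k/dx_j = P_k * (W_kj - sum_m P_m W_mj).
    # The score of term j is the mean of |dP_k/dx_j| over documents and classes.
    P = softmax(X @ W.T + b)  # (n_docs, n_classes)
    mean_W = P @ W  # (n_docs, n_terms)
    dP = P[:, :, None] * (W[None, :, :] - mean_W[:, None, :])
    return np.abs(dP).mean(axis=(0, 1))  # (n_terms,)


def prune_vocabulary(X, sens, keep_fraction=0.2):
    # Keep the highest-scoring fraction of the terms (columns of X).
    n_keep = max(1, int(round(keep_fraction * X.shape[1])))
    keep = np.sort(np.argsort(sens)[::-1][:n_keep])
    return X[:, keep], keep


def lsi_project(X, n_components):
    # Latent semantic indexing: project onto the leading right singular vectors.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T


# Hypothetical toy data: 60 documents, 50 terms, 2 classes.
# The first 5 terms are made informative for class 1.
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(60, 50)).astype(float)
y = rng.integers(0, 2, size=60)
X[y == 1, :5] += 3.0
X /= np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)

W, b = train_softmax(X, y, n_classes=2)
sens = term_sensitivity(X, W, b)
X_pruned, kept_terms = prune_vocabulary(X, sens, keep_fraction=0.2)
Z = lsi_project(X_pruned, n_components=3)
```

Here Z holds the reduced document representation that a downstream classifier would consume. The highest-scoring entries of sens could also serve as candidate keywords, in the spirit of the keyword generation use mentioned above.
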
Keywords: sensitivity, neural networks, text, classification, dimensionality
Type: Conference paper [With referee]
Conference: Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004)
Year: 2004    Month: August    Vol.: 2    pp.: 483-486
Address: Cambridge, UK
IMM Group(s): Intelligent Signal Processing