Textual Similarity |
Aqeel Hussain
|
Abstract | The purpose of this thesis is to identify methods for measuring textual similarity. Many solutions to this problem have been proposed in the literature. Three of them are discussed in depth and implemented. Two focus on syntactic similarity and one focuses on semantic similarity. The two syntactic algorithms represent the edit distance and vector space model approaches; they use Levenshtein distance and n-grams, respectively. The semantic algorithm is ontology-based: it looks up words in WordNet and uses this to estimate the relatedness between two given texts. The performance of these implementations is tested and discussed.
The thesis concludes that the algorithms perform very differently, and that each performs well in its respective field. No single algorithm outshines the others, so an implementation has to be chosen based on the task at hand. |
Type | Bachelor thesis [Academic thesis] |
Year | 2012 |
Publisher | Technical University of Denmark, DTU Informatics, E-mail: reception@imm.dtu.dk |
Address | Asmussens Alle, Building 305, DK-2800 Kgs. Lyngby, Denmark |
Series | IMM-B.Sc.-2012-16 |
Note | Supervised by Professor Robin Sharp, ris@imm.dtu.dk, DTU Informatics |
Electronic version(s) | [pdf] |
Publication link | http://www.imm.dtu.dk/English.aspx |
BibTeX data | [bibtex] |
IMM Group(s) | Computer Science & Engineering |