@MASTERSTHESIS{IMM2007-05502, author = "M. Sigurdsson and S. C. Halling", title = "Zeeker: A topic-based search engine", year = "2007", keywords = "Search engine design, Information Retrieval, Natural Language Processing, Clustering, Wikipedia, Zeeker Search Engine", school = "Informatics and Mathematical Modelling, Technical University of Denmark, {DTU}", address = "Richard Petersens Plads, Building 321, {DK-}2800 Kgs. Lyngby", type = "", note = "Supervised by Prof. Lars Kai Hansen, {IMM,} {DTU}.", url = "http://www2.compute.dtu.dk/pubdb/pubs/5502-full.html", abstract = "The rapid growth and massive quantities of data on the Internet have increased the importance and complexity of Information Retrieval. The amount and diversity of data introduce shortcomings in the way search engines rank their results. As a result, Search Engine Optimization companies flourish by exploiting the search engine weaknesses in order to manipulate the search results. Furthermore, many search engines present several million results to queries and more often than not these results are biased toward blogs and on-line stores. This bias is due to the link analysis used to rank the search results. Internet link structure is the strength but at the same time the Achilles’ heel of these ranking algorithms. In this work it is proposed to push search engine behavior in a new direction, away from link analysis and toward actual content and topic analysis of web pages. With the use of clustering algorithms and the vast amount of information in Wikipedia the idea is to create categories that are good enough to be used to filter search results. The prospect of letting the user filter the search results by the push of a button would improve the relevance of the search results for a particular query. Categories will give the user fewer yet more relevant results. The well-defined categories and articles of Wikipedia are shown to be valuable as a training set when clustering Internet data.
The implemented Zeeker Search Engine has precise categories which can become even better by taking advantage of additional information available in Wikipedia. A user survey has revealed that the Zeeker Search Engine has good relevance when retrieving information, is easy to use and has great potential as a search engine. This work has suggested innovative ideas and ways of using the information in Wikipedia to produce good categories and retrieve more relevant search results." }