@MASTERSTHESIS{IMM2013-06593, author = "N. S. Johnsen", title = "Robust Financial Prediction by Learning from the Collective Intelligence of Experts", year = "2013", school = "Technical University of Denmark, {DTU} Compute, {E-}mail: compute@compute.dtu.dk", address = "Matematiktorvet, Building 303{-B,} {DK-}2800 Kgs. Lyngby, Denmark", type = "", note = "{DTU} supervisors: Ole Winther, olwi@dtu.dk, {DTU} Compute, and Co-supervisor: Sune Lehmann J{\o}rgensen, {DTU} Compute", url = "http://www.compute.dtu.dk/English.aspx", abstract = "Odds issued by bookmakers may contain generic biases enforced by typical gambling behaviour, which lead to market-inefficient odds. By employing a comprehensive dataset of odds from up to 51 bookmakers on the English Premier League and the Spanish La Liga, seasons 00/01-12/13, the existence of such biases is demonstrated. The biases are particularly prominent in the La Liga, suggesting a more irrational betting behaviour. A theoretical analysis of odds setting techniques reveals that market inefficiencies may also originate from bookmakers' inherent objective to balance their books. A neural network classifier, which applies the odds as input features, has been combined with a decision framework based on optimization of the standardized expected return per match to profit on the inefficiencies. Two modifications of the betting model have been proposed: firstly, to accommodate a model bias to engage odds selections with overestimated posteriors; secondly, to restrict the model to certain probabilistic regions, in which the odds segment evidently is more profitable. It has been demonstrated that the model has high probabilistic accuracy and profits significantly on the La Liga, although the returns are generally season dependent. With the inclusion of the posterior restrictions the model yields the highest and most robust annual return of 16\% on the La Liga.
The neural network's predictive accuracy is indifferent to whether 5, 9 or 37 bookmakers' odds are used as input features, indicating a low data complexity. Unsolved issues remain regarding the selection bias and refinements of the probabilistic restrictive model." }