Scientific discovery using genetic programming 
Maarten Keijzer

Abstract  Genetic programming is capable of automatically inducing symbolic computer programs on the basis of a set of examples or their performance in a simulation. Mathematical expressions are a well-defined subset of symbolic computer programs and are also suitable for optimization using the genetic programming paradigm. The induction of mathematical expressions based on data is called symbolic regression. In this work, genetic programming is extended not just to fit the data, i.e., get the numbers right, but also to get the dimensions right. For this, units of measurement are used. The main contribution of this work can be summarized as follows: the symbolic expressions produced by genetic programming can be made suitable for analysis and interpretation by using units of measurement to guide or restrict the search. To achieve this, the following has been accomplished: A standard genetic programming system is modified to be able to induce expressions that more or less abide by type constraints. This system is used to implement a preferential bias towards dimensionally correct solutions. A novel genetic programming system is introduced that is able to induce expressions in languages that need context-sensitive constraints. It is demonstrated that this system can be used to implement a declarative bias towards 1) the exclusion of certain syntactical constructs; 2) the induction of expressions that use units of measurement; 3) the induction of expressions that use matrix algebra; 4) the induction of expressions that are numerically stable and correct. A case study using four real-world problems in the induction of dimensionally correct empirical equations on data using the two different methods is presented to illustrate the use and limitations of these methods in a framework of scientific discovery.
Type  Ph.D. thesis [Academic thesis] 
Year  2001 
Publisher  Informatics and Mathematical Modelling, Technical University of Denmark, DTU 
Address  Richard Petersens Plads, Building 321, DK-2800 Kgs. Lyngby
Series  IMM-PHD-2001-92
Note  Supervised by Prof. Lars Kai Hansen, IMM, DTU 
Electronic version(s)  [pdf] [ps] 
BibTeX data  [bibtex] 
IMM Group(s)  Intelligent Signal Processing 
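
Illustration  The abstract's notion of "getting the dimensions right" can be made concrete with a small sketch. The code below is not taken from the thesis; it only assumes, for illustration, that an expression is a nested tuple, that each variable carries a unit written as exponents of (length, mass, time), and that the hypothetical helper `dimension` reports `None` when an expression adds or subtracts quantities with different units. A check of this kind is what a search procedure could use either as a penalty (a preferential bias) or as a hard constraint (a declarative bias).

```python
from typing import Dict, Optional, Tuple, Union

# A dimension is a tuple of integer exponents for (length, mass, time).
# This representation is an assumption made for this sketch.
Dim = Tuple[int, int, int]
# An expression is a variable name or a tuple (operator, left, right).
Expr = Union[str, tuple]


def dimension(expr: Expr, units: Dict[str, Dim]) -> Optional[Dim]:
    """Return the dimension of expr, or None if it is dimensionally inconsistent."""
    if isinstance(expr, str):
        return units[expr]
    op, left, right = expr
    dl = dimension(left, units)
    dr = dimension(right, units)
    if dl is None or dr is None:
        return None  # an inconsistent subexpression taints the whole expression
    if op in ("+", "-"):
        # Addition and subtraction require identical dimensions.
        return dl if dl == dr else None
    if op == "*":
        # Multiplication adds exponents.
        return tuple(a + b for a, b in zip(dl, dr))
    if op == "/":
        # Division subtracts exponents.
        return tuple(a - b for a, b in zip(dl, dr))
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical variables: distance d [m], time t [s], velocity v [m/s].
units = {"d": (1, 0, 0), "t": (0, 0, 1), "v": (1, 0, -1)}

consistent = ("+", "v", ("/", "d", "t"))  # v + d/t: both terms are m/s
inconsistent = ("+", "d", "t")            # d + t: metres plus seconds

print(dimension(consistent, units))
print(dimension(inconsistent, units))
```

Here the first expression is accepted because both terms share the unit of velocity, while the second is rejected because it adds a length to a time.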