Sequence Signals for Protein Expression

Benjamin Daniel Høyer

Abstract

This thesis was prepared at the Department of Informatics and Mathematical Modelling at the Technical University of Denmark in fulfilment of the requirements for acquiring an M.Sc. in Mathematical Modelling and Computation. It was carried out in collaboration with Novozymes A/S.
The thesis deals with codon optimisation as a method for increasing protein yields in an industrial setting. The first goal is to describe DNA-to-protein translation accurately and in enough detail for a mathematician to get a grasp of the topic. A parallel goal is to give biologists some insight into machine learning techniques in biotechnology. The final goal is to develop a model from a machine learning standpoint rather than a necessarily biological one.
This thesis is loosely divided into three parts.
Part 1 Introduction and biological background. Covers topics from DNA transcription through RNA translation into proteins.
Part 2 Prior works and machine learning methodology.
Part 3 Model building and validation.
