@MASTERSTHESIS{IMM2014-06792, author = "K. B. Albrektsen", title = "Image Based Characterization of Circulating Tumor Cells", year = "2014", school = "Technical University of Denmark, Department of Applied Mathematics and Computer Science", address = "Richard Petersens Plads, Building 324, {DK-}2800 Kgs. Lyngby, Denmark, compute@compute.dtu.dk", type = "", note = "The thesis was supervised by Professor Rasmus Larsen, rlar@dtu.dk, {DTU} Compute, Professor Knut Conradsen, knco@dtu.dk, {DTU} Compute, Postdoc Mark Lyksborg, {DTU} Compute, and Tom Hede Markussen PhD ({CEO} of CytoTrack Aps)", url = "http://www.compute.dtu.dk/English.aspx", abstract = "The assessment of circulating tumor cells (CTCs) in blood samples from cancer patients can help in determining the prognosis for the patient and can help in personalized treatment. The CytoTrack is a fluorescence microscope that can be used to image possible CTCs within a blood sample. These images are reviewed manually by a trained operator in order to determine which images are of CTCs and which are false positives. This is time-consuming and tedious work, and reducing this scoring time is the topic of this thesis. In this thesis, images from the CytoTrack are scored automatically. For this, different classification methods are tested, including random forests and support vector machines. The algorithms are tested on data from breast cancer patients and on data from spiked samples. The performance on the spiked data is significantly better than on the patient data. This is explained by larger variations within the patient samples compared with the spiked samples. Through cross validation, both high sensitivities and high specificities are computed. For this work, sensitivity is weighted over specificity, since it is important not to miss any true positives. To avoid missing true positives, thresholds are determined based on the receiver operating characteristic (ROC) curves. 
These thresholds are chosen so that the true positive rate is equal to one. After a threshold is chosen, the algorithms are tested on an unseen test set. This testing shows that it is possible to avoid false negatives completely and still classify a significant part of the data as negatives. That is, the amount of data to be scored manually is reduced, and hence the scoring time is reduced." }