Botnet Detection by Correlation Analysis

Andre Orvalho

Abstract
When a bot master infects a large number of hosts (bots) through well-known vulnerabilities and coordinates them through a command and control (C&C) mechanism, the result is a botnet. Botnets vary in C&C architecture (centralized C&C and P2P are the most common), in the communication protocols they use (IRC, HTTP or others such as P2P) and in their observable activities. They are currently one of the largest threats to cyber security, and it is important to characterize the different types of botnet in order to detect them, just as a hunter needs to know its prey before preparing ways to catch it. There are two important places to look for botnet activity: the network and the infected host.
This project presents a study that correlates behavior on the network with behavior on the host in order to aid detection. Studies such as [SLWL07] (based on network behavior) and [SM07] (based on host behavior) are two good starting points for this research. The architecture was chosen by considering botnet characteristics, especially their capacity to change and evolve, which makes misuse-based detection methods obsolete.
On the host side, the system looks at four features of each system call: which system call it is, the name of the application making the call, the time elapsed since the previous system call, and the sequence of the past three system calls. An unsupervised learning technique (the K-means algorithm) is used to calculate the threshold values from an unclassified training set. In deployment, the collected data is used to calculate values that are compared with this threshold. If a value passes the threshold, the relevant information is passed to the network evaluation block.
On the network side, before receiving any data from the host side, the system calculates a threshold for the flows in the training set; the feature used to calculate this threshold is the time between flows. It then uses the data from the host to narrow down the flows to examine and checks whether their values pass the threshold. If the network evaluation block finds flows that pass the threshold, it emits reports and alarms to the user.
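The abstract does not say how the categorical host features are turned into numbers, how K is chosen, how a threshold is derived from the clusters, or how flows are matched to host events. The following Python sketch fills those gaps with loudly stated assumptions, purely to make the two-stage pipeline concrete: the syscall name, application name and three-call history are frequency-encoded from the training trace; the threshold is the largest distance of any training point to its nearest K-means centroid; "passes the threshold" means the distance exceeds it; and the network block narrows flows by host IP and a time window around flagged host events. All function names (`build_vocab`, `host_check`, `network_check`, etc.) are hypothetical and not taken from the thesis.

```python
import math
import random
from collections import Counter

def nearest(p, centroids):
    """Index of, and Euclidean distance to, the closest centroid."""
    dists = [math.dist(p, c) for c in centroids]
    i = min(range(len(dists)), key=dists.__getitem__)
    return i, dists[i]

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means on equal-length numeric vectors."""
    centroids = [list(p) for p in random.Random(seed).sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)[0]].append(p)
        # An empty cluster keeps its previous centroid.
        new = [[sum(col) / len(c) for col in zip(*c)] if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids

def fit_threshold(points, k=2):
    """Assumed rule: threshold = largest training distance to its nearest centroid."""
    centroids = kmeans(points, k)
    return centroids, max(nearest(p, centroids)[1] for p in points)

# ---- Host side: events are (time, app, syscall) tuples (assumed format) ----
def build_vocab(events):
    """Relative frequencies of syscalls, apps and 3-call histories (assumed encoding)."""
    n, m = len(events), max(len(events) - 3, 1)
    calls = Counter(s for _, _, s in events)
    apps = Counter(a for _, a, _ in events)
    seqs = Counter(tuple(s for _, _, s in events[i - 3:i]) for i in range(3, n))
    return ({key: v / n for key, v in calls.items()},
            {key: v / n for key, v in apps.items()},
            {key: v / m for key, v in seqs.items()})

def encode(events, vocab):
    """Yield (time, app, vector) built from the four host features."""
    calls, apps, seqs = vocab
    for i in range(3, len(events)):
        t, app, call = events[i]
        prev3 = tuple(s for _, _, s in events[i - 3:i])
        gap = t - events[i - 1][0]  # time since the previous system call
        yield t, app, [calls.get(call, 0.0), apps.get(app, 0.0),
                       gap, seqs.get(prev3, 0.0)]

def host_check(events, vocab, centroids, threshold):
    """(time, app) of system calls whose distance exceeds the host threshold."""
    return [(t, app) for t, app, v in encode(events, vocab)
            if nearest(v, centroids)[1] > threshold]

# ---- Network side: flows are (host_ip, start_time) tuples (assumed format) ----
def flow_gaps(starts):
    """Time between consecutive flow start times, as 1-D vectors."""
    s = sorted(starts)
    return [[b - a] for a, b in zip(s, s[1:])]

def network_check(flows, host_ip, flagged_times, window, centroids, threshold):
    """Keep flows from the flagged host near flagged events, then test their gaps."""
    starts = sorted(t for ip, t in flows if ip == host_ip
                    and any(abs(t - f) <= window for f in flagged_times))
    return [t for (gap,), t in zip(flow_gaps(starts), starts[1:])
            if nearest([gap], centroids)[1] > threshold]

# ---- Toy usage with synthetic data ----
pattern = ["read", "write", "open", "close"]
train = [(0.1 * i, "browser", pattern[i % 4]) for i in range(40)]
vocab = build_vocab(train)
host_c, host_thr = fit_threshold([v for _, _, v in encode(train, vocab)])

test = train[:20] + [(2.0, "bot.exe", "connect")]
flagged = host_check(test, vocab, host_c, host_thr)
print(flagged)  # [(2.0, 'bot.exe')]

net_c, net_thr = fit_threshold(flow_gaps([0, 1, 3, 4, 6, 7, 9, 10]))
flows = [("10.0.0.5", t) for t in (0, 1, 3, 4, 34, 500)] + [("10.0.0.9", 35)]
alarms = network_check(flows, "10.0.0.5", [t for t, _ in flagged],
                       50, net_c, net_thr)
print(alarms)  # [34]
```

With these synthetic traces, only the unseen `bot.exe`/`connect` call is flagged on the host side. The network block then ignores the other host and the out-of-window flow, and reports only the flow that follows an unusually long gap. The features are not normalized in this sketch; a real prototype would probably need to scale the inter-call time against the frequency-encoded features before clustering.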
The small experiments carried out show promising signs for real-world use, although much more testing is needed, especially on the network side. The prototype has some limitations that could be overcome through further testing and by using other techniques to evolve it.
Type: Master's thesis [Academic thesis]
Year: 2012
Publisher: Technical University of Denmark, DTU Informatics, E-mail: reception@imm.dtu.dk
Address: Asmussens Alle, Building 305, DK-2800 Kgs. Lyngby, Denmark
Series: IMM-M.Sc.-2012-53
Note: Supervised by Professor Robin Sharp, ris@imm.dtu.dk, DTU Informatics
Publication link: http://www.imm.dtu.dk/English.aspx
IMM Group(s): Computer Science & Engineering