Processing Intrusion Data with Machine Learning and MapReduce

  • Brunner Csaba
doi: 10.32565/aarms.2017.1.4

Abstract

In recent years, cyber-attacks have become a daily issue for enterprises. One possible defence against this kind of threat is intrusion detection. A key challenge is extracting information from the large amount of logged data involved. My paper aims to identify cyber-attack types as patterns in log files using a parallel computing approach and machine learning techniques. The MapReduce programming model is used for parallel computing, and decision tree algorithms are used for machine learning.
I discuss two research questions in this paper. First, despite parallelization, can machine learning algorithms still provide results of acceptable quality, as measured by traditional data mining metrics (accuracy, precision, recall, and area under the receiver operating characteristic [ROC] curve [AUC])? Second, is it possible to achieve a significant performance improvement, as measured by the runtime of the algorithm at several measurement points?
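For readers unfamiliar with the four metrics named above, the following is a minimal sketch of how they can be computed for a two-category target. It is not taken from the paper: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration. AUC is computed here by the pairwise-ranking definition (the fraction of positive/negative pairs in which the positive record receives the higher score, with ties counted as one half).

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 marks the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp)


def recall(y_true, y_pred):
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking; tied scores count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative record")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = attack, 0 = normal traffic.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(accuracy(y_true, y_pred), precision(y_true, y_pred),
      recall(y_true, y_pred), auc(y_true, scores))
```

Accuracy, precision and recall depend on the chosen threshold, whereas AUC is computed from the raw scores and summarizes ranking quality across all thresholds.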
I showed that the machine learning model with two categories in the target variable is preferable to the one with five categories. On average, the whole algorithm ran 4–5 times faster than a single-core solution. I achieved most of this improvement during the data transfer phase.
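The idea of timing a MapReduce-style job at several measurement points can be sketched as follows. This is an illustrative toy, not the paper's implementation: the record format, the `normal`/`attack` labels, the phase names and all function names are assumptions. The map calls run sequentially here, whereas on a real cluster each chunk would be handled by a separate worker.

```python
import time
from collections import Counter
from functools import reduce


def split_into_chunks(records, n_chunks):
    """Split records into at most n_chunks contiguous chunks (n_chunks >= 1)."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_count_labels(chunk):
    """Map step: count how many records of each label one chunk contains."""
    return Counter(label for _, label in chunk)


def reduce_counts(left, right):
    """Reduce step: merge two partial label counts."""
    return left + right


def run_with_measurement_points(records, n_chunks):
    """Run split, map and reduce, recording the elapsed time of each phase."""
    t0 = time.perf_counter()
    chunks = split_into_chunks(records, n_chunks)
    t1 = time.perf_counter()
    partials = [map_count_labels(chunk) for chunk in chunks]
    t2 = time.perf_counter()
    total = reduce(reduce_counts, partials, Counter())
    t3 = time.perf_counter()
    timings = {"split": t1 - t0, "map": t2 - t1, "reduce": t3 - t2}
    return total, timings


def speedup(single_core_seconds, parallel_seconds):
    """Speedup factor of the parallel run relative to the single-core run."""
    return single_core_seconds / parallel_seconds


# Hypothetical log records: (raw line, label).
records = [
    ("line 1", "normal"), ("line 2", "attack"), ("line 3", "normal"),
    ("line 4", "attack"), ("line 5", "normal"),
]
total, timings = run_with_measurement_points(records, n_chunks=2)
print(total, timings)
```

Recording a timestamp between phases makes it possible to attribute a speedup to a particular phase; `speedup(10.0, 2.5)`, for example, corresponds to a run that is four times faster than its single-core baseline.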

Keywords:

intrusion detection, parallel processing, machine learning, network security

How to Cite

Brunner, C. (2017) “Processing Intrusion Data with Machine Learning and MapReduce”, AARMS – Academic and Applied Research in Military and Public Management Science. Budapest, 16(1), pp. 37–52. doi: 10.32565/aarms.2017.1.4.
