Intrusion Detection System using Fuzzy Logic

Intrusion detection plays an important role in today’s computer and communication technology. It is therefore important to design a time-efficient Intrusion Detection System (IDS) that is low in both False Positive Rate (FPR) and False Negative Rate (FNR) but high in attack detection precision. To achieve that, this paper proposes an IDS model based on fuzzy logic. The proposed model consists of three parts: an Input Reduction System (IRS), which uses Principal Component Analysis (PCA) to reduce the dimension of the input from 41 to 10; a Classification System, which uses Fuzzy C-Means to create data clusters from training data; and a Pattern Recognition System based on the Nearest Neighborhood method, which assigns newly arriving data records to their respective clusters. System performance differs by attack type: the best classification performance is achieved for the PROBE attack, with a 99.3% success rate, and the best pattern recognition performance is achieved for U2R, with a 58.8% success rate.


Introduction
Since the second half of the last century, computer networks have grown at tremendous speed, and with them the need for security mechanisms that protect data, privacy and computers. Many different security mechanisms have been designed, yet none has been reliable enough to protect a computer network from ever-evolving threats and attacks. Firewalls were built to protect networks from attacks that come from the outside world, but they do not stop intrusions that originate inside the network. Intrusion Detection Systems (IDS), on the other hand, monitor network packets in order to detect attacks, including those that come from within the network [1][2][3][4]. This work focuses on IDS, since existing commercial IDSs leave considerable room for improvement.
In general, IDSs may be designed to perform misuse detection or anomaly detection [1,5]. In misuse detection, all known abnormal behavior is defined and the system is trained to recognize it. It works by comparing each arriving packet with the features of known attack behavior. If a new attack that was not predefined arrives, the system recognizes it as a normal packet, which causes a high FNR [2]. To avoid a very high FNR, a misuse-based IDS must be retrained very often, sometimes causing delays in the network [6].
Anomaly detection is modeled on normal behavior [7], so any pattern that violates that behavior is treated as an attack [1,5]. Anomaly detection causes a high FPR, because even a new normal packet that is unknown to the system is identified as an attack. This degrades overall network performance, since some normal packets never reach their destination. For these reasons, most commercial IDSs are designed to perform misuse detection alone.
False alarms, whether false positives or false negatives, limit the performance of an IDS. It is therefore very important to reduce both types of alarm, and the best way to do so is to combine anomaly and misuse detection [5,8]. This paper proposes a fast and efficient IDS based on fuzzy logic, which first trains the system to group the data into clusters and then uses pattern recognition to assign each newly arriving packet to the appropriate cluster. The proposed model produces five outputs: normal packet and four attack types (DoS, U2R, R2L and Probe) [9]. The proposed model was tested in different ways, as shown in the last two sections of this paper.
The paper is organized as follows: Related works are discussed in Section 2, Section 3 gives an overview of methods and algorithms used in this work, Section 4 presents data used for experimentation, Section 5 describes the system model, Section 6 presents and discusses results, and Section 7 concludes the paper.

Related Works
Different machine learning mechanisms, including Artificial Neural Networks, Fuzzy Logic and Genetic Algorithms, have been applied to the KDD CUP 1999 data for intrusion detection [1][2][3][4][5][6][7][8][9][10], with neural networks as the main tool for this type of problem. Different neural network algorithms have been used, including Grey Neural Networks [4], RBF [10,11], Recirculation Neural Networks [2], PCA [6,12] and MLP [5], with MLP generally showing better results than the others [2]. These works focus mainly on misuse detection. In order to combine misuse and anomaly detection, many researchers have recently attempted hybrid methods that combine neural networks with other machine learning mechanisms, such as fuzzy logic or genetic algorithms [5,13,14,15,16]. Fuzzy logic tends to be a better clustering tool, as it is faster and more suitable for real-time systems. A summary of results based on intelligent methods is presented in Table 1.

Algorithms and Methods
Multiple methods are used in this work: PCA for feature reduction, Fuzzy C Means for data classification, and Nearest Neighborhood Method for pattern recognition.

Principal Component Analysis
PCA is a very useful mathematical algorithm, based on an orthogonal linear transformation, which is widely used for data compression, image processing and feature extraction [6,12]. The goal of PCA is to find a set of orthogonal components that minimize the error in the reconstructed data. An equivalent formulation of PCA is to find an orthogonal set of vectors that maximize the variance of the projected data [17].
In other words, PCA transforms the data into a different frame of reference with minimal error, using fewer features than the original data while preserving the variability of the data [18]. For a more detailed description of the PCA algorithm, refer to [17,18].
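The reduction described above can be illustrated with a minimal sketch. It assumes NumPy is available and computes the principal components from the singular value decomposition of the mean-centered data, which is one standard way to obtain them; it is not the PCA neural network used later in this paper. The function name pca_reduce and the synthetic 41-feature input are hypothetical and serve only as an example.

```python
import numpy as np

def pca_reduce(X, n_components=10):
    """Project the rows of X onto the top principal components.

    Returns the projected data, the component matrix, the feature means,
    and the fraction of total variance kept by the projection.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                      # PCA works on mean-centered data
    # Rows of Vt are orthonormal principal directions, ordered by
    # decreasing singular value (i.e. decreasing variance).
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    explained = float((s[:n_components] ** 2).sum() / (s ** 2).sum())
    return Xc @ components.T, components, mean, explained

# Synthetic stand-in for records with 41 features (not KDD data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 41))
Z, components, mean, explained = pca_reduce(X, n_components=10)
```

New records would be reduced with the same transform, for example (X_new - mean) @ components.T, so that training and test data share one frame of reference.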

Data Clustering and Classification
Data clustering and classification is the process of creating clusters from the initial training data. These clusters can then be used in combination with other methods, such as neural networks or fuzzy pattern recognition. The classification method used in this paper is Fuzzy C-Means, a clustering method based on minimization of the objective function $J_m$:
\[ J_m(U, v) = \sum_{j=1}^{n} \sum_{i=1}^{c} (\mu_{ij})^{m'} (d_{ij})^{2} \qquad (1) \]
where $U$ is the partition matrix, $v_i$ is the $i$-th cluster center, $d_{ij}$ is the Euclidean distance, in the $m$-dimensional feature space, between the $j$-th data sample $x_j$ and the $i$-th cluster center $v_i$, and $\mu_{ij}$ is the membership of the $j$-th data point in the $i$-th class.
The partition matrix $U$ groups a collection of $n$ data samples into $c$ classes, and each entry of the partition matrix is a membership value $\mu_{ij}$. The Euclidean distance and the cluster centers are given in equations (2) and (3):
\[ d_{ij} = \left[ \sum_{k=1}^{m} (x_{jk} - v_{ik})^{2} \right]^{1/2} \qquad (2) \]
\[ v_{ik} = \frac{\sum_{j=1}^{n} (\mu_{ij})^{m'} x_{jk}}{\sum_{j=1}^{n} (\mu_{ij})^{m'}} \qquad (3) \]
where $k$ indexes the features and $m'$ is the membership exponent, which controls the level of fuzziness.
Fuzzy C-Means iteratively tunes the partition matrix, centers and distances so that the objective function $J_m$ is minimized [26].
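The iteration can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the initial partition matrix is random, the membership exponent defaults to 2, and memberships are refreshed with the standard Fuzzy C-Means update, $\mu_{ij} = \left[ \sum_{l=1}^{c} (d_{ij}/d_{lj})^{2/(m'-1)} \right]^{-1}$. The function name fuzzy_c_means and the two-blob test data are hypothetical.

```python
import math
import random

def fuzzy_c_means(X, c, m_exp=2.0, tol=1e-8, max_iter=300, seed=0):
    """Minimal Fuzzy C-Means sketch.

    X is a list of feature vectors, c the number of clusters.
    Returns (centers, U), where U[i][j] is the membership of sample j
    in cluster i.
    """
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    # Assumed initialisation: random memberships, each column summing to 1.
    U = [[rng.random() for _ in range(n)] for _ in range(c)]
    for j in range(n):
        col = sum(U[i][j] for i in range(c))
        for i in range(c):
            U[i][j] /= col
    centers = None
    power = 2.0 / (m_exp - 1.0)
    for _ in range(max_iter):
        # Cluster centers as membership-weighted means of the samples.
        new_centers = []
        for i in range(c):
            w = [U[i][j] ** m_exp for j in range(n)]
            total = sum(w)
            new_centers.append(
                [sum(w[j] * X[j][k] for j in range(n)) / total
                 for k in range(dim)]
            )
        # Membership update from the distances to the new centers.
        for j in range(n):
            d = [math.dist(X[j], v) for v in new_centers]
            if min(d) == 0.0:
                # Sample coincides with a center: crisp membership.
                hits = [i for i in range(c) if d[i] == 0.0]
                for i in range(c):
                    U[i][j] = 1.0 / len(hits) if i in hits else 0.0
            else:
                for i in range(c):
                    U[i][j] = 1.0 / sum((d[i] / d[l]) ** power
                                        for l in range(c))
        # Stop once no center moves by more than tol.
        if centers is None:
            shift = float("inf")
        else:
            shift = max(math.dist(a, b)
                        for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, U

# Two well-separated synthetic blobs (illustrative data only).
gen = random.Random(1)
X = ([[gen.gauss(0, 0.3), gen.gauss(0, 0.3)] for _ in range(30)]
     + [[gen.gauss(5, 0.3), gen.gauss(5, 0.3)] for _ in range(30)])
centers, U = fuzzy_c_means(X, c=2)
```

Each column of U sums to one, so a sample near a cluster boundary shares its membership between clusters instead of being forced into one of them.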

Pattern Recognition
Pattern recognition is defined as the process of identifying structure in data by comparing it to some known structure, generally developed through methods of classification such as Fuzzy C-Means. Multiple methods for pattern recognition exist; this work focuses on the Nearest Neighborhood Method, which is suitable for multi-feature pattern recognition.
In the nearest neighbor classifier, the $m$ features of each data sample are treated as a vector. Assuming that the clusters already exist, incoming data samples can be assigned to their respective clusters by calculating the distance $d$ between the data sample and the center of each cluster. The data sample $x$ is then classified as belonging to the cluster whose center is nearest, as shown in equation (5) [26]:
\[ d(x, v_p) = \min_{1 \le i \le c} d(x, v_i) \;\Rightarrow\; x \in \text{cluster } p \qquad (5) \]
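The nearest-center rule can be written directly as a short function. The sketch below assumes Euclidean distance and cluster centers given as plain lists; the function name nearest_cluster and the example centers are hypothetical.

```python
import math

def nearest_cluster(x, centers):
    """Return the index of the cluster center closest to x (Euclidean)."""
    distances = [math.dist(x, v) for v in centers]
    return min(range(len(centers)), key=distances.__getitem__)

# Hypothetical centers of two previously trained clusters.
centers = [[0.0, 0.0], [5.0, 5.0]]
label = nearest_cluster([4.2, 4.9], centers)
```

Only the cluster centers are needed at recognition time, so the cost per incoming record grows with the number of clusters and features, not with the size of the training set.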

Data Description
The data used in this work is the widely used KDD CUP 1999 data set, which was created from the DARPA Intrusion Detection data set collected by MIT Lincoln Laboratory [9].
The data contains 41 features, specifying packet type, protocol and so on, and a class label specifying whether the packet is normal or an attack. The data set contains 22 attack types, which can be divided into four main categories [9], as follows:
- Denial of Service (DoS): denies service to legitimate users, most commonly by overloading existing resources. Six of the 22 attacks fall into this group.
- User-to-Root (U2R): a user with normal privileges tries to exploit vulnerabilities of the system in order to gain root access. Four of the 22 attacks fall into this group.
- Remote-to-Local (R2L): an unauthorized user on a remote machine tries to access a local machine by exploiting holes in the local machine. Eight of the 22 attacks fall into this group.
- Probing (Probe): an unauthorized user monitors the network in order to obtain information and discover the system's vulnerabilities. Four of the 22 attacks fall into this group.
The original KDD CUP 1999 training data, consisting of about 5 million records, was too large to analyze, so the concise set known as the '10% training set' was used instead. Out of this concise set of about 500 000 records, 4911 data records were selectively chosen to represent all possible types of packets and were used in training and testing the Fuzzy IDS presented here.
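The grouping of the 22 attack labels into four categories can be expressed as a lookup table. The individual attack names are not listed in this paper; the ones below are the labels as they appear in the public KDD CUP 1999 training data (where each label ends with a period) and should be checked against the copy of the data in use. The names ATTACK_CATEGORY and categorize are hypothetical.

```python
# Attack labels of the KDD CUP 1999 training data, grouped by category.
ATTACK_CATEGORY = {
    # Denial of Service (6)
    "back": "dos", "land": "dos", "neptune": "dos",
    "pod": "dos", "smurf": "dos", "teardrop": "dos",
    # User-to-Root (4)
    "buffer_overflow": "u2r", "loadmodule": "u2r",
    "perl": "u2r", "rootkit": "u2r",
    # Remote-to-Local (8)
    "ftp_write": "r2l", "guess_passwd": "r2l", "imap": "r2l",
    "multihop": "r2l", "phf": "r2l", "spy": "r2l",
    "warezclient": "r2l", "warezmaster": "r2l",
    # Probing (4)
    "ipsweep": "probe", "nmap": "probe",
    "portsweep": "probe", "satan": "probe",
}

def categorize(raw_label):
    """Map a raw record label such as 'smurf.' to one of five classes."""
    name = raw_label.strip().rstrip(".")
    if name == "normal":
        return "normal"
    return ATTACK_CATEGORY[name]  # KeyError signals an unknown label

category = categorize("smurf.")
```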

Intrusion Detection Model
The Intrusion Detection Model was designed to combine anomaly detection and misuse detection, with Fuzzy C-Means recognizing whether an attack exists and, if it does, which attack it is. To reduce the FPR and FNR, an update system helps update the classification clusters. The proposed Intrusion Detection Model thus consists of three main parts: the Input Reduction System, the Classification System and the Pattern Recognition System (Figure 1). Pattern recognition on new data, together with classification, forms the update system, that is, the system responsible for reducing FPR and FNR.

Input Reduction System
In systems with large data sets characterized by numerous features, input (feature) reduction should be performed whenever possible. This step helps remove distracting variance from the data set, which improves the performance of the classifier and speeds up the classification process. In this work, a single PCA neural network was chosen as the tool for feature reduction.
The PCA neural network takes the original 41 inputs and reduces the input size to 10. The number of principal components was initially set to 10, and the system performance was checked. The number of components was then increased to 15 and then to 41, but the overall performance of the algorithm did not improve; rather, it deteriorated. It was therefore decided to keep the initial number of principal components, i.e. 10. A general overview of the input reduction system is shown in Figure 4.

Clustering based on Fuzzy C Means
The Fuzzy Classification System is based on Fuzzy C-Means, which receives its inputs from the IRS, with each input having 10 features. Fuzzy C-Means was run to create both 2-cluster and 5-cluster systems. The five-cluster system, shown in Figure 5, is the fuzzy system able to recognize all five types of packets, namely normal, DoS, Probe, U2R and R2L. To test how performance changes when the number of clusters is reduced, four two-cluster systems were also simulated. The four two-cluster systems were trained separately to recognize the following four pairs: normal packet vs. DoS attack, normal packet vs. Probe attack, normal packet vs. U2R attack and normal packet vs. R2L attack. The testing criteria, the training and test data, and the performance of each of these systems are discussed in the following section.
The clusters and their centers were then used as the basis for the pattern recognition process.

Pattern Recognition
The Pattern Recognition System is based on the Nearest Neighborhood method, which relies on the previously generated clusters. Each input record is evaluated across all of its features, and its distance to each cluster center is calculated. The system then assigns the input to the cluster whose center is nearest.

Results and Discussion
The Fuzzy IDS was simulated and tested using two approaches: one clustered and recognized all types of packets, and the other clustered and recognized four pairs of the form "normal packet vs. one attack type". This was done to determine how successfully the proposed model classifies each type of attack.
Each approach was simulated as follows. First, the training and test input data underwent reduction via PCA, so that the number of input features was reduced from 41 to 10. The training data was then partitioned into initial clusters, and the mean of each cluster, together with the initial partition matrix U, was calculated. The cluster centers were then updated iteratively until the difference between the new and old centers fell below 10^-8. The values of the partition matrix and the new clusters were then recorded and used as the basis for the pattern recognition process.
Pattern recognition used the PCA-reduced test data. During that process, the distance between the input data record and each cluster center was determined, and the smallest distance determined the cluster to which the record belongs. The classification error was then calculated; the results are shown in Table 3. When all types of packets were used, i.e. normal packets and all four attack types, the training set contained 500 records and the test set 280. Fuzzy C-Means created five clusters, and the success rate was 20%. This low success rate can be attributed to the nature of the data records, which overlap heavily.
The best Fuzzy C-Means performance was observed when the system was trained to generate two clusters, one for normal packets and one for the PROBE attack; the success rate was 99.3%.
The best performance in the pattern recognition process, 58.8%, was achieved with two clusters and the U2R attack type. For this attack type, the classification success rate was 69.1%. These results are expected to improve with larger training data sets, but this was not tested because of the computational limits of the devices used in this simulation.
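The error calculation described in this section can be sketched as follows. Fuzzy C-Means produces unlabeled clusters, so the sketch assumes that each cluster is given the majority class label of the training records assigned to it; the paper does not state how clusters were mapped to labels, so this mapping is an assumption. FPR is taken as the share of normal records flagged as attacks, and FNR as the share of attack records passed as normal. All names and the toy data are hypothetical.

```python
from collections import Counter

def cluster_to_label(train_clusters, train_labels):
    """Assumed mapping: each cluster takes its majority training label."""
    votes = {}
    for cluster, label in zip(train_clusters, train_labels):
        votes.setdefault(cluster, Counter())[label] += 1
    return {c: cnt.most_common(1)[0][0] for c, cnt in votes.items()}

def evaluate(predicted, truth, normal="normal"):
    """Success rate, FPR and FNR for predicted vs. true class labels."""
    pairs = list(zip(predicted, truth))
    correct = sum(p == t for p, t in pairs)
    false_pos = sum(p != normal and t == normal for p, t in pairs)
    false_neg = sum(p == normal and t != normal for p, t in pairs)
    normals = sum(t == normal for t in truth)
    attacks = len(truth) - normals
    return {
        "success_rate": correct / len(truth),
        "fpr": false_pos / normals if normals else 0.0,
        "fnr": false_neg / attacks if attacks else 0.0,
    }

# Toy example (illustrative values, not results from the paper).
mapping = cluster_to_label(
    [0, 0, 0, 1, 1, 1],
    ["normal", "normal", "probe", "probe", "probe", "normal"],
)
test_clusters = [0, 1, 1, 0]
test_truth = ["normal", "normal", "probe", "probe"]
scores = evaluate([mapping[c] for c in test_clusters], test_truth)
```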

Conclusion and Recommendation
In this paper we have proposed a new model for an Intrusion Detection System. The model consists of three parts: an input reduction system, a classification system and a pattern recognition system.
The input reduction system is based on PCA and reduces the number of inputs from 41 to 10. The classification system is based on Fuzzy C-Means and produces the clusters used in the pattern recognition process. The pattern recognition system classifies new, unknown data into its corresponding cluster. The pattern recognition process proposed in the model should send feedback to the classification system in order to update the centers when new, untrained data arrives.
The overall system achieved a classification success rate of 35.7% for five clusters, and the best overall performance was achieved for the U2R attack, with 58.8% correctness. The classification system's best performance was 99.3%, for normal packets vs. the PROBE attack.
Future work could increase the size of the training data so that the cluster centers are tuned further. Another way to improve the system would be to add a neural network committee machine after the clustering process.

Figure 3: Fuzzy C Means clustering scheme

Table 2 shows FPR and FNR for different artificial intelligence classification algorithms.

Table 2: Summary of FPR and FNR for different classification algorithms
Table 2 shows that Fuzzy C-Means combined with MLP ANN has the lowest FPR and FNR, and as such this paper tries to examine the performance of a pure fuzzy logic-based IDS.