Detection of congestive heart failures using C4.5 Decision Tree

, Abdulhamit Subasi
Faculty of Engineering and Information Technologies, Francuske revolucije bb, Bosnia and Herzegovina

Southeast Europe Journal of Soft Computing
Available online: www.scjournal.com.ba NO.2 September 2013 - ISSN 2233-1859

Abstract
Automatic electrocardiogram (ECG) heartbeat classification is important for the diagnosis of heart failure. The purpose of this study is to evaluate the effectiveness of the C4.5 decision tree method in building a model that detects and separates normal and congestive heart failure (CHF) signals in long-term ECG time series. The research was conducted in two stages: feature extraction using an autoregressive (AR) model and classification using the C4.5 decision tree method. The ECG signals were obtained from the BIDMC Congestive Heart Failure database and classified in different experiments. The experimental results showed that the proposed method reached 99.86% classification accuracy (sensitivity 99.77%, specificity 99.93%, area under the ROC curve 0.998) and has potential for detecting congestive heart failure.

INTRODUCTION
Heart failure is a common syndrome that develops slowly and causes cardiac dysfunction because the heart is not strong enough to keep blood flowing through the body. It results from damage to the heart caused by heart attacks, long-term high blood pressure or an anomaly of one of the heart valves (Son, Kim, Kim, Park, & Kim, 2012). Because there is no single definitive diagnostic test for heart failure, medical diagnosis is mostly based on patient history, physical examination and tests such as electrocardiography, chest radiography or echocardiography. Accurate and timely diagnosis by physicians is important to avoid further damage and to identify appropriate measures and approaches (Son, Kim, Kim, Park, & Kim, 2012). However, heart failure is usually not recognized until it reaches a more advanced phase, referred to as congestive heart failure, in which fluid accumulates in the lungs, feet and abdominal cavity. According to the New York Heart Association (NYHA) (Association, 1964), heart failure is classified into four classes:
1. Class I (Mild): The patient has no limitation of physical activity.
2. Class II (Mild): The patient has slight limitation of physical activity.
3. Class III (Moderate): The patient experiences marked limitation of physical activity.
4. Class IV (Severe): The patient suffers from severe to complete limitation of activity.
According to the European Heart Network and the European Society of Cardiology (Townsend, Luengo-Fernandez, Leal, Gray, & Nichols, 2012), each year heart disease causes over 4 million deaths in Europe and over 1.9 million deaths in the European Union (EU), which is 47% of all deaths in Europe and 40% in the EU.
Electrocardiography is a noninvasive tool used to measure the electrical activity of the heart. The electrocardiogram (ECG) is a safe examination that records the electrical impulses that produce the heartbeats. The ECG shows whether the heart is damaged and whether the rhythm of the heartbeat is normal or irregular (Passanisi, 2004).
The ECG signals taken from different subjects consist of many data points; hence, the ECG signals should be reduced to a few features by performing feature extraction using autoregressive (AR) modeling. The ECG signals reduced in this way are used to detect different types of heart failure with the C4.5 decision tree classifier. (Goldberger, et al., 2000).

The Burg Method for Autoregressive (AR) parameter estimation
In this study, the autoregressive (AR) Burg algorithm was used for feature extraction from the ECG signals. The autoregressive model is a well-known feature extraction method for biological signals. An AR process of model order p describes each sample as a linear combination of the p preceding samples plus a prediction error term. The Burg method estimates the real-valued AR coefficients a_k recursively, using the coefficients of the previous order p-1. It is accurate because it uses many data points at a time while minimizing the forward and backward prediction errors (Palaniappan, 2010). At each order, the Burg algorithm also updates the prediction error power.
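The Burg recursion can be illustrated with a short Python sketch. It is a generic textbook version of the method, not the implementation used in the study. The function name `burg_ar`, the sign convention x[n] + a_1 x[n-1] + ... + a_p x[n-p] = e[n], and the synthetic test signal are assumptions made for illustration only.

```python
import random

def burg_ar(x, order):
    """Estimate AR coefficients with Burg's method.

    Returns (a, e): a = [1, a_1, ..., a_p] under the assumed convention
    x[n] + a_1*x[n-1] + ... + a_p*x[n-p] = e[n], and e is the prediction
    error power after the last order.
    """
    n = len(x)
    f = list(x[1:])    # forward prediction errors
    b = list(x[:-1])   # backward prediction errors, delayed by one sample
    a = [1.0]
    e = sum(v * v for v in x) / n   # order-0 error power
    for _ in range(order):
        # Reflection coefficient minimizing forward + backward error energy
        num = sum(fi * bi for fi, bi in zip(f, b))
        den = sum(fi * fi + bi * bi for fi, bi in zip(f, b))
        k = -2.0 * num / den
        # Levinson update: coefficients of order m from those of order m-1
        ext = a + [0.0]
        a = [ai + k * aj for ai, aj in zip(ext, reversed(ext))]
        # Prediction error power shrinks by (1 - k^2) at each order
        e *= 1.0 - k * k
        # Update the error sequences and keep them aligned for the next order
        f_new = [fi + k * bi for fi, bi in zip(f, b)]
        b_new = [bi + k * fi for fi, bi in zip(f, b)]
        f, b = f_new[1:], b_new[:-1]
    return a, e

# Synthetic AR(2) signal (hypothetical data, not ECG):
# x[n] = 0.75*x[n-1] - 0.5*x[n-2] + unit-variance white noise
random.seed(0)
x = [0.0, 0.0]
for _ in range(5000):
    x.append(0.75 * x[-1] - 0.5 * x[-2] + random.gauss(0.0, 1.0))

coeffs, err_power = burg_ar(x, 2)
print(coeffs, err_power)
```

Under the assumed sign convention, the estimates should be close to [1, -0.75, 0.5] with an error power near 1. For an ECG segment, the coefficients a_1 ... a_p would form the feature vector passed to the classifier.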

C4.5 Decision Tree Algorithm
A decision tree classifies instances by sorting them down the tree from the root to a leaf node. The root is the node without incoming edges; every other node has exactly one incoming edge. Each internal node represents a test on some attribute and has outgoing edges, one for each outcome of the test. Nodes with no outgoing edges are called leaves. In decision tree learning, instances are classified starting from the root and moving down the tree to the leaves, according to the outcome of each test. Each leaf is assigned to a specified class, called the target value (Mitchell, 1997), (Maimon & Rokach, 2005).
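The root-to-leaf classification described above can be sketched in a few lines of Python. The `Node` structure, the feature indices, the thresholds and the toy tree are hypothetical and are not taken from the tree learned in the study.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    label: Optional[str] = None     # class label, set on leaves only
    feature: Optional[int] = None   # index of the tested attribute (internal nodes)
    threshold: float = 0.0          # test: instance[feature] <= threshold
    left: Optional["Node"] = None   # branch taken when the test is true
    right: Optional["Node"] = None  # branch taken when the test is false

def classify(node, instance):
    """Follow the test outcomes from the root until a leaf is reached."""
    while node.label is None:
        if instance[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.label

def count_nodes(node):
    """Return (number of leaves, total number of nodes)."""
    if node.label is not None:
        return 1, 1
    left_leaves, left_total = count_nodes(node.left)
    right_leaves, right_total = count_nodes(node.right)
    return left_leaves + right_leaves, left_total + right_total + 1

# Hypothetical tree over two features (e.g. two AR coefficients)
tree = Node(feature=0, threshold=0.5,
            left=Node(label="N"),
            right=Node(feature=1, threshold=-0.2,
                       left=Node(label="CHF"),
                       right=Node(label="N")))

print(classify(tree, [0.3, 0.0]))
print(classify(tree, [0.9, -0.5]))
print(count_nodes(tree))
```

In a strictly binary tree with L leaves, the total number of nodes is always 2L - 1, so a binary tree with 8 leaves has 15 nodes.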
In this research, the C4.5 decision tree algorithm was used to generate the decision tree. The C4.5 algorithm, presented by Quinlan (Quinlan, 1993), is based on ID3, a very simple decision tree algorithm. The algorithm builds the tree by visiting each node and selecting the optimal split. The split is selected by using the gain ratio, represented by the following formula:

$$\mathrm{GainRatio}(S, A) = \frac{\mathrm{InformationGain}(S, A)}{\mathrm{Entropy}(S, A)}$$

where information gain is the impurity-based criterion that uses entropy as the impurity measure, for some training set S with respect to the attribute A, and the entropy in the denominator is the term that describes how equally the attribute splits the data (Mitchell, 1997), (Maimon & Rokach, 2005), (Quinlan, 1993).
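As a rough illustration of the gain ratio criterion, the sketch below computes the information gain of a split and divides it by the entropy of the split itself. The function names and the toy data are hypothetical. A continuous attribute such as an AR coefficient is first turned into a binary test against a threshold, which is how C4.5 treats numeric attributes.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of the value distribution in items."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(items).values())

def gain_ratio(values, labels):
    """Information gain of splitting labels by values, divided by the
    entropy of the split (how evenly the attribute divides the data)."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_entropy = entropy(values)
    return info_gain / split_entropy if split_entropy > 0 else 0.0

# Hypothetical feature values and class labels
feature = [0.1, 0.2, 0.8, 0.9]
labels = ["N", "N", "CHF", "CHF"]

# Binary test on a numeric attribute at an assumed threshold of 0.5
binary_split = [v <= 0.5 for v in feature]
ratio_binary = gain_ratio(binary_split, labels)

# Treating every distinct value as its own branch gives the same
# information gain but a larger split entropy, hence a lower gain ratio
ratio_many = gain_ratio(feature, labels)
print(ratio_binary, ratio_many)
```

The second call shows why the ratio is used instead of the raw information gain: an attribute that fragments the data into many small branches is penalized by the larger denominator.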

EXPERIMENTAL RESULTS
In this research, an automated classifier was designed to classify heartbeat signals into two categories: N (normal heartbeats) and CHF (congestive heart failure). A total of 1300 ECG signal segments were taken from the MIT-BIH Arrhythmia database and 1500 signal segments from the BIDMC Congestive Heart Failure (CHF) database. The whole data set is divided into a training subset, used to create a model for classifying the ECG signals, and a testing subset, used to evaluate the performance of the model.
The 10-fold cross-validation method, presented by Salzberg (Salzberg, 2007), is applied to the whole data set: the data are divided into 10 folds, the model is trained and tested 10 times, each time with a different fold held out for testing, and the average cross-validation accuracy is computed. Furthermore, the efficiency of the C4.5 decision tree algorithm in classifying the ECG signals was calculated. The designed tree has 8 leaves and a size of 15. The accuracy of 99.86% shows that the created model is effective in identifying and classifying ECG signals.
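The cross-validation procedure can be sketched as follows. This is a plain, unstratified version written for illustration. The helper names, the threshold-based stand-in classifier and the toy data are all hypothetical, and the stand-in replaces the C4.5 tree only to keep the example short.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_val_accuracy(features, labels, train_fn, predict_fn, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, average the accuracies."""
    folds = k_fold_indices(len(labels), k, seed)
    accuracies = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        model = train_fn([features[j] for j in train_idx],
                         [labels[j] for j in train_idx])
        correct = sum(predict_fn(model, features[j]) == labels[j]
                      for j in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)

# Stand-in classifier: a single threshold halfway between the two classes
def train_stump(xs, ys):
    max_n = max(x for x, y in zip(xs, ys) if y == "N")
    min_chf = min(x for x, y in zip(xs, ys) if y == "CHF")
    return (max_n + min_chf) / 2.0

def predict_stump(threshold, x):
    return "CHF" if x > threshold else "N"

# Hypothetical, well-separated one-dimensional data
toy_x = [float(v) for v in range(-50, 0)] + [float(v) for v in range(10, 60)]
toy_y = ["N"] * 50 + ["CHF"] * 50

folds = k_fold_indices(len(toy_y), k=10)
mean_accuracy = cross_val_accuracy(toy_x, toy_y, train_stump, predict_stump, k=10)
print(mean_accuracy)
```

Every sample is used for testing exactly once, so the averaged accuracy reflects performance on data not seen during training in that round.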
In the study, besides the accuracy, two more statistical indices, the area under the Receiver Operating Characteristic (ROC) curve and the F-measure, were computed for both classes (N and CHF), as shown in Table 1. A ROC curve is an evaluation metric of observer performance and is created by plotting the true positive rate on the vertical axis against the false positive rate on the horizontal axis (Witten & Frank, 2005). ROC curves for both classes (N and CHF) are presented in Figure 1 and Figure 2, respectively. The F-measure, the harmonic mean of precision and recall, is an evaluation metric suited to imbalanced problems. The area under the ROC curve is 0.998 and the average F-measure is 0.999, which demonstrates that the C4.5 decision tree classifier obtains high accuracy in the classification of ECG heartbeats.
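The evaluation indices mentioned above can be computed from predictions as in the following sketch. The function names, the choice of CHF as the positive class, and the toy labels and scores are assumptions for illustration and do not reproduce the study's data. The area under the ROC curve is computed as the fraction of positive/negative pairs in which the positive instance receives the higher score.

```python
def binary_metrics(y_true, y_pred, positive="CHF"):
    """Accuracy, sensitivity, specificity and F-measure for one positive class.
    Assumes both classes occur and at least one positive is predicted."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
    }

def roc_auc(y_true, scores, positive="CHF"):
    """Area under the ROC curve via pairwise comparison; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels, predictions and classifier scores for the CHF class
y_true = ["CHF", "CHF", "CHF", "N", "N", "N", "N", "CHF"]
y_pred = ["CHF", "CHF", "N", "N", "N", "CHF", "N", "CHF"]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Swapping the `positive` argument to "N" gives the per-class values for the other class, which is how per-class ROC and F-measure figures are usually tabulated.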

DISCUSSION
Similar studies on detecting and separating normal heartbeats and heartbeats with congestive heart failure can be found in other articles (Baim, et al., 1986), (Ubeyli, 2009). Comparing the results of this study with previous results shows that the C4.5 decision tree algorithm plays an important part in the area of ECG signal classification.
Based on the results, the high classification accuracy of the C4.5 decision tree classifier gives an understanding of the features representing the ECG signals. Also, the high values of the area under the ROC curve and the F-measure confirm that the C4.5 decision tree method is an applicable classification method. An important feature of this classifier is its high classification speed.

CONCLUSIONS
In this study, an automated heartbeat classification system was developed for detecting and separating normal heartbeat signals and signals with congestive heart failure. The autoregressive (AR) Burg method was applied to extract features, and the C4.5 decision tree classifier was used to classify the ECG signals in the data set.
Experimental results showed that the C4.5 decision tree algorithm has a significant role in the identification and classification of ECG heartbeat signals, as confirmed by the accuracy of 99.86%.