Southeast Europe Journal of Soft Computing

Diagnosis of Parkinson's Disease Using Fuzzy C-means Clustering and Pattern Recognition

Parkinson's disease (PD) is a global public health problem of enormous dimension. In this study, we aimed to discriminate between healthy people and people with PD. Various studies have revealed that voice is one of the earliest indicators of PD; for that reason, a Parkinson dataset containing biomedical voice measurements is used. The main goal of this paper is to automatically detect whether the speech/voice of a person is affected by PD. We examined the performance of fuzzy c-means (FCM) clustering and pattern recognition methods on the Parkinson's disease dataset. The first method aims to separate the data into two classes, normal speakers and speakers with PD. This method can be greatly improved by classifying data first and then testing new data against the two resulting patterns. Thus, the second method used here is pattern recognition. The experimental results demonstrate that the combination of the fuzzy c-means method and pattern recognition yields promising results for the classification of PD.


I. INTRODUCTION
Parkinson's disease is a chronic progressive neurological disease that affects a small group of nerve cells (neurons) in an area of the brain called the substantia nigra. These cells normally produce dopamine, a chemical (neurotransmitter) that transmits signals between areas in the brain that, when working normally, coordinate smooth and balanced muscle movement. Parkinson's disease causes these nerve cells to die, and as a result, body movements are affected. The exact cause of this cell death is still unknown. PD usually affects people over the age of 60, and it is more common in men than in women [16], [17].
The first signs are likely to be barely noticeable: a feeling of weakness or stiffness in one limb, perhaps, or a fine trembling of one hand when it is at rest (activity causes the tremor to disappear). Eventually, the shaking worsens and spreads, muscles tend to stiffen, and balance and coordination deteriorate. Thus, the most obvious symptoms are shaking, rigidity, slowness of movement, and difficulty with walking, gait, and communication. As symptoms get worse, people with the disease may have trouble walking, talking or doing simple tasks. Other symptoms may include depression and other emotional changes; difficulty in swallowing and chewing; urinary problems or constipation; skin problems; and sleep disruptions [17].
The causes of Parkinson's disease are unknown, but genetics, aging, and toxins are being researched. There are currently no blood or laboratory tests that have been proven to help in diagnosing PD. Particularly in its early stages, the disease can be difficult to diagnose accurately. Doctors may sometimes request brain scans or laboratory tests in order to rule out other diseases. Blood tests and brain imaging techniques such as magnetic resonance imaging (MRI), positron emission tomography (PET), and single photon emission computed tomography (SPECT) may be used to help doctors exclude other medical conditions, such as stroke or brain tumors, that produce symptoms similar to those of Parkinson's disease [10]. Among other approaches, one method for diagnosis is detecting and analyzing voice disorders using acoustic tools that record the changes in pressure at the lips or inside the vocal tract. Recently, using signal processing, a group of experts found features in the voices of people with Parkinson's disease that can be used as discriminatory measures to differentiate those who have the disease from those who do not. After a diagnosis, treatments are given to help relieve symptoms. Although there is presently no cure, treatment options such as medication and surgery can manage the symptoms. Since no cure is available, early diagnosis is critical for maximizing the effect of treatment and improving the quality of the patient's life.
Scientists are conducting extensive research into the causes of the disease. They are studying many possible causes, including aging and poisons in the environment. Abnormal genes seem to lead to Parkinson's disease in some people, but so far there is not enough evidence to show that it is always inherited. Studies in medical biometrics on detecting PD at an early stage are under way and have drawn a lot of attention from the biometrics community in recent years.
Max Little, from the University of Oxford, has been developing software that learns to detect differences in voice patterns in order to spot distinctive clues associated with Parkinson's. He uses machine learning, collecting a large amount of voice data from people whose disease status is known and training the software on it. He introduced a new measure of dysphonia, pitch period entropy (PPE), which is robust to many uncontrollable confounding effects, including noisy acoustic environments and normal, healthy variations in voice frequency. He collected sustained phonations from 31 people, 23 with PD, and, using a kernel support vector machine (SVM), obtained an overall correct classification performance of 91.4%. He concluded that nonstandard methods in combination with traditional harmonics-to-noise ratios are best able to separate healthy from PD subjects [2]. The proposal in [1] gave an accuracy of around 90% in the diagnosis of PD using artificial neural networks (ANNs) and support vector machines (SVMs). An adaptive neuro-fuzzy classifier with linguistic hedges gave recognition results of 95.38% in training and 94.72% in testing [5]. Paper [12] proposes an algorithm, based on a review synthesis, that can tackle real-time constraints in pathological voice recognition for the assessment of Parkinson's disease severity. Pause detection, peak-to-average power rate clipping and zero thresholding rate calculations produce rich voice features in real time. These features may be further processed using wavelet transforms and used with a neural network for detection and quantification of speech anomalies related to Parkinson's disease. Paper [13] deals with the application of an artificial immune system to discriminate between healthy people and people with Parkinson's disease (PWP). Taking inspiration from natural immune systems, the authors try to capture useful properties such as automatic recognition, memorization and adaptation.
Medical data mining also has great potential for exploring hidden patterns in medical data sets. Paper [14] provides a survey of current techniques of knowledge discovery in databases, using data mining techniques that are in use today for the classification of Parkinson's disease. The Random Tree algorithm classifies the Parkinson disease dataset with a reported accuracy of 100%. Linear Discriminant Analysis, C4.5, CS-MC4 and K-NN yield accuracies above 90%; the K-NN error rate is only 0.0256. Among all of them, the C-PLS algorithm classifies the dataset with the lowest accuracy, 69.74%. The C-RT and CS-CRT algorithms produce the same error rate of 0.0462. Polat distinguished people with PD from healthy people using a combination of a feature weighting method called FMCFW and a k-nearest neighbour (k-NN) classifier [8]. Bhattacharya and Kharma combined genetic programming and the expectation maximization algorithm (GP EM) to create learning feature functions on the basis of ordinary feature data (features of voice) [6]. In [10], a full investigation into the features extracted from voice signals of people with and without Parkinson's disease was performed, and three different classifiers were used: Support Vector Machine (SVM), K-Nearest Neighbor (KNN) and some discrimination-function-based (DBF) classifiers. Based on the correct rate, KNN proved to be the best of these classifiers at differentiating between people with PD and those without it.
As shown above, many methods are in use today in medical research and public health for recognizing Parkinson's disease. Classification systems have the potential to support the expert well. They can help increase the accuracy and reliability of diagnosis and minimize possible errors, as well as make the diagnosis more time efficient.
This paper deals with the application of fuzzy c-means classification and pattern recognition to a medical dataset concerning PD, with the aim of automatically classifying patients as PD or non-PD. To do so, the cluster centers of the features are found first. Then, the distance of each data point to these centers is calculated, so that the distinction between classes is increased in the classification of PD datasets. After classification, test data are assigned to one of the two known patterns. To test the performance, the classification accuracy, sensitivity, specificity, and positive and negative predictive values were used. The paper is organized as follows. Section 2 describes the Parkinson dataset used in this research work. Section 3 presents the proposed classification model used to classify a patient as PD or non-PD. Section 4 explains the pattern recognition method. Section 5 introduces the performance parameters. Section 6 presents the experimental results and Section 7 discusses the classifier algorithms. Section 8 concludes the paper.

II. PARKINSON DATASET
Voice measurement has contributed greatly to progress in the detection of Parkinson's disease. About 90% of people with Parkinson's disease present some kind of vocal deterioration. Hence, a dataset that focuses mainly on speech signals was chosen for this paper. This dataset is taken from the UCI machine learning database [15]. The features of the dataset are given in Table 1. The Parkinson's disease dataset used for classification was created by Max Little of the University of Oxford, in collaboration with the National Centre for Voice and Speech, Denver, Colorado, which recorded the speech signals.
The dataset is composed of a range of biomedical voice measurements from 31 people, 23 with Parkinson's disease (PD). The time since diagnosis ranged from 0 to 28 years, and the ages of the subjects ranged from 46 to 85 years. An average of six phonations were recorded from each subject, ranging from 1 to 36 s in length. There are 195 instances in the dataset, comprising 48 normal and 147 PD cases. The main aim of the data is to discriminate healthy people from those with PD. Thus, the dataset is divided into two classes according to its "status" column, which is set to 0 for healthy subjects and 1 for those with PD. It is a two-class (binary) classification problem.
Little applied a correlation filter, and 12 of the 23 attributes were removed. Each attribute whose correlation coefficient is less than 0.95 is considered not to contribute to classification accuracy and is therefore removed. A total of 11 attributes are kept after the correlation filter has been applied. Table 2 indicates which features are kept. The first 10 are used as inputs to the classifiers.

III. CLUSTERING
In fields dealing with diagnosis, we often seek to find structure in the data obtained from observation. Finding the structure in data is the essence of classification. Our experimental observations lead us to develop relationships between the inputs and outputs of an experiment. As we are able to conduct more experiments, we see the relationships forming some recognizable, or classifiable, structure. By finding structure, we are classifying the data according to similar patterns, attributes, features, and other characteristics. The general area is known as classification, also termed clustering.
Clustering is an unsupervised learning task that aims at decomposing a set of objects into subgroups, or clusters, based on similarity. The goal is to divide the dataset in such a way that objects belonging to the same cluster are as similar as possible, whereas objects belonging to different clusters are as dissimilar as possible. In nonfuzzy or hard clustering, data is divided into crisp clusters, where each data point belongs to exactly one cluster. In fuzzy clustering, the data points can belong to more than one cluster, and associated with each of the points are membership grades which indicate the degree to which the data points belong to the different clusters.
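The difference between crisp and fuzzy membership can be illustrated with a minimal sketch. The membership grades below are hypothetical values chosen for illustration, and the helper names `harden` and `column_sums` are assumptions, not part of the method described in this paper. The sketch assumes the usual convention that the grades of each data point sum to 1 across clusters.

```python
# Hypothetical membership grades for 3 data points and 2 clusters.
# Rows are clusters, columns are data points; values are illustrative only.
fuzzy_U = [
    [0.9, 0.4, 0.2],  # degree of membership in cluster 0
    [0.1, 0.6, 0.8],  # degree of membership in cluster 1
]


def harden(U):
    """Turn a fuzzy partition into a crisp one: each point goes to the
    cluster in which it has the highest membership grade."""
    n_clusters, n_points = len(U), len(U[0])
    labels = []
    for k in range(n_points):
        grades = [U[i][k] for i in range(n_clusters)]
        labels.append(grades.index(max(grades)))
    return labels


def column_sums(U):
    """Sum the membership grades of each point over all clusters."""
    return [sum(U[i][k] for i in range(len(U))) for k in range(len(U[0]))]


labels = harden(fuzzy_U)   # crisp cluster index per data point
sums = column_sums(fuzzy_U)  # each entry should be (close to) 1
```

The second point has grades 0.4 and 0.6, so hard clustering discards the information that it lies almost between the two clusters, which fuzzy clustering retains.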
Clustering can also be thought of as a form of data compression, where a large number of samples are converted into a small number of representative prototypes or clusters. Depending on the data and the application, different types of similarity measures may be used to identify classes, where the similarity measure controls how the clusters are formed. Some examples of values that can be used as similarity measures include distance, connectivity, and intensity.

A. Overview of Fuzzy C-Means (FCM)
Fuzzy c-means (FCM) clustering is one of the most popular fuzzy clustering algorithms. It is an unsupervised classification or clustering algorithm that has been applied successfully to a number of problems involving feature analysis, clustering and classifier design, in fields such as agricultural engineering, remote sensing, astronomy, chemistry, geology, image analysis, medical diagnosis and shape analysis. Clustering refers to identifying the number of subclasses of c clusters in a data universe X comprising n data samples, and partitioning X into c clusters (2 ≤ c < n). Two important issues to consider in this regard are how to measure the similarity between pairs of observations and how to evaluate the partitions once they are formed. One of the simplest similarity measures is the distance between pairs of feature vectors in the feature space. If one can determine a suitable distance measure and compute the distance between all pairs of observations, then one may expect that the distance between points in the same cluster will be considerably less than the distance between points in different clusters.
To describe a method to determine the fuzzy c-partition matrix U for grouping a collection of n data sets into c classes, we define an objective function $J_m$ for a fuzzy c-partition. The FCM algorithm is based on an iterative optimization of this objective function:

$$J_m(U, v) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^{m} (d_{ik})^{2} \qquad (1)$$

The function given in equation (1) is a least squares function, where n is the number of data sets and c is the number of classes (partitions) into which one is trying to classify the data sets. The squared distance $d_{ik}^{2}$ is weighted by a measure, $(u_{ik})^{m}$, of the membership of $x_k$ in the ith cluster. Equation (1) also introduces a weighting parameter, m > 1, which controls the amount of fuzziness in the classification process.
The value of $J_m$ is then a measure of the sum of all the weighted squared errors; this value is minimized subject to two requirements. First, $J_m$ is minimized with respect to the squared errors within each cluster, that is, for each specific value of c. Simultaneously, the distance between cluster centers is maximized.
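The FCM method uses concepts in multidimensional Euclidean space to determine the geometric closeness of data points by assigning them to various clusters or classes and then determining the distance between the clusters:

$$d_{ik} = d(x_k, v_i) = \left[ \sum_{j=1}^{p} (x_{kj} - v_{ij})^{2} \right]^{1/2}$$

The distance measure $d_{ik}$ is the Euclidean distance between $v_i$, the ith cluster center, and the kth data set (a data point in p-dimensional feature space, where p denotes the number of features and is distinct from the fuzzy exponent m). Each coordinate of the center of each class can be calculated as:

$$v_{ij} = \frac{\sum_{k=1}^{n} (u_{ik})^{m} x_{kj}}{\sum_{k=1}^{n} (u_{ik})^{m}}$$

where j is a variable on the feature space, that is, j = 1, 2, …, p. The optimum fuzzy c-partition is the one that attains the smallest value of the objective function in equation (1) [3].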
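As a small worked illustration of the center and objective computations (the numbers are hypothetical and not taken from the Parkinson dataset), consider three one-dimensional points $x_1 = 0$, $x_2 = 2$, $x_3 = 4$, two clusters, fuzzy exponent m = 2, and membership rows $(u_{11}, u_{12}, u_{13}) = (1, 0.5, 0)$ and $(u_{21}, u_{22}, u_{23}) = (0, 0.5, 1)$. Terms with zero membership are omitted from the sum for $J_m$:

$$
\begin{aligned}
v_1 &= \frac{1^2 \cdot 0 + 0.5^2 \cdot 2 + 0^2 \cdot 4}{1^2 + 0.5^2 + 0^2} = \frac{0.5}{1.25} = 0.4, \\
v_2 &= \frac{0^2 \cdot 0 + 0.5^2 \cdot 2 + 1^2 \cdot 4}{0^2 + 0.5^2 + 1^2} = \frac{4.5}{1.25} = 3.6, \\
J_m &= 1^2 (0 - 0.4)^2 + 0.5^2 (2 - 0.4)^2 + 0.5^2 (2 - 3.6)^2 + 1^2 (4 - 3.6)^2 \\
    &= 0.16 + 0.64 + 0.64 + 0.16 = 1.6 .
\end{aligned}
$$

The shared point $x_2$ pulls both centers toward the middle of the data, which is the effect of partial membership on the weighted means.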
In general, FCM algorithms depend on certain assumptions in order to define the subgroups present in a dataset. These assumptions include the optimal number of classes, c, the initial centroid values, the initial partition, $U^{(0)}$, the optimal value of the fuzzy exponent, m, and the iteration termination threshold value. The optimal number of classes, c, may be known a priori or determined by a cluster validity process.
In this paper, the number of clusters is fixed at c = 2, so the dataset is partitioned into two clusters. Many past works have proposed methods of determining the optimal fuzzy exponent m; however, many researchers simply use m = 2 [3]. Different termination thresholds can also be chosen. In this paper, two iteration termination thresholds were used for the two methods: a difference between two successive partition matrices of less than $10^{-10}$, and a difference between the cluster centers of two successive iterations of less than $10^{-6}$.
As with many optimization processes, the solution to equation (1) cannot be guaranteed to be a global optimum, that is, the best of the best. What we seek is the best solution available within a prespecified level of accuracy. An effective algorithm for fuzzy classification, called iterative optimization, is explained in the following section.

B. FCM Algorithm
The algorithm works as follows. Initially, a random fuzzy partition matrix U is chosen, and the centers are calculated. The centers $v_i$ and the membership strengths are calculated using the update equations. Once the cluster centers are found, points are classified according to their Euclidean distance to each of the centers identified. A point is assigned to the cluster whose center is at minimum distance from it. Usually, there is a stopping criterion with a threshold, and the algorithm terminates when the change between iterations falls below that threshold. The fuzzy c-means clustering algorithm can be described in the following steps:

1. Consider a set of n data points to be clustered, $x_i$.
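The iteration described in this section can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for the experiments: the function name `fcm`, the parameters `tol`, `max_iter` and `seed`, the toy data, and the handling of a point that coincides exactly with a center are all choices made here for the example. The stopping rule compares successive partition matrices, which is one of the two criteria mentioned in the text.

```python
import math
import random


def fcm(X, c=2, m=2.0, tol=1e-6, max_iter=300, seed=0):
    """Fuzzy c-means on a list of feature vectors X (illustrative sketch).

    Returns (centers, U), where U[i][k] is the membership of point k in
    cluster i. Stops when no membership changes by more than tol.
    """
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])

    # Random initial partition; the memberships of each point sum to 1.
    U = [[0.0] * n for _ in range(c)]
    for k in range(n):
        raw = [rng.random() + 1e-12 for _ in range(c)]
        total = sum(raw)
        for i in range(c):
            U[i][k] = raw[i] / total

    centers = []
    for _ in range(max_iter):
        # Cluster centers: membership-weighted means of the data.
        centers = []
        for i in range(c):
            w = [U[i][k] ** m for k in range(n)]
            w_sum = sum(w)
            centers.append([sum(w[k] * X[k][j] for k in range(n)) / w_sum
                            for j in range(dim)])

        # Euclidean distance d[i][k] between center i and point k.
        d = [[math.dist(centers[i], X[k]) for k in range(n)] for i in range(c)]

        # Membership update from the distances.
        new_U = [[0.0] * n for _ in range(c)]
        for k in range(n):
            zero = [i for i in range(c) if d[i][k] == 0.0]
            if zero:
                # Assumption: a point lying on a center belongs fully to it.
                for i in zero:
                    new_U[i][k] = 1.0 / len(zero)
            else:
                for i in range(c):
                    new_U[i][k] = 1.0 / sum(
                        (d[i][k] / d[j][k]) ** (2.0 / (m - 1.0))
                        for j in range(c))

        change = max(abs(new_U[i][k] - U[i][k])
                     for i in range(c) for k in range(n))
        U = new_U
        if change < tol:
            break
    return centers, U


# Toy data (hypothetical): two well-separated groups of three points each.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
centers, U = fcm(X, c=2, m=2.0, tol=1e-6)
labels = [max(range(2), key=lambda i: U[i][k]) for k in range(len(X))]
```

Which cluster receives index 0 depends on the random initial partition, so only the grouping of the points, not the label values, is meaningful.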

IV. PATTERN RECOGNITION
Pattern recognition can be defined as a process of identifying structure in data by comparisons to known structure; the known structure is developed through methods of classification. In the statistical approach to numerical pattern recognition, each input observation is represented as a multidimensional data vector (feature vector), where each component is called a feature. The purpose of the pattern recognition system is to assign each input to one of c possible pattern classes (or data clusters).
Presumably, different input observations should be assigned to the same class if they have similar features and to different classes if they have dissimilar features. The data used to design a pattern recognition system are usually divided into two categories: design (or training) data and test data. Basically, classification establishes (or seeks to determine) the structure in data, whereas pattern recognition attempts to take new data and assign them to one of the classes defined in the classification process. Simply stated, classification defines the patterns and pattern recognition assigns data to a class.
A typical problem in pattern recognition is to collect data from a physical process and classify them into known patterns. Suppose we have c typical patterns represented as fuzzy sets $A_i$ on X (i = 1, 2, …, c), and a new piece of data, perhaps consisting of a group of observations, is represented by a fuzzy set B on X. The task now is to find which $A_i$ the sample B most closely matches. To address this issue, we use fuzzy vectors. Let us define a and b as fuzzy vectors of length n, and define

$$a \bullet b^{T} = \bigvee_{i=1}^{n} (a_i \wedge b_i)$$

as the fuzzy inner product of a and b, and

$$a \oplus b^{T} = \bigwedge_{i=1}^{n} (a_i \vee b_i)$$

as the fuzzy outer product of a and b, where ∨ and ∧ denote the maximum and minimum operators. These two norms, the inner product and the outer product, can be used simultaneously in pattern recognition studies because they measure closeness or similarity.
We can extend fuzzy vectors to the case of fuzzy sets. Let P(X) be a group of fuzzy sets. Now we define two fuzzy sets from this family of sets, that is, A, B ∈ P(X); then, either of the expressions (9) and (10) describes a metric to assess the degree of similarity of the two sets A and B:

$$(A, B)_1 = (A \bullet B) \wedge \overline{(A \oplus B)} \qquad (9)$$

$$(A, B)_2 = \tfrac{1}{2} \left[ (A \bullet B) + \overline{(A \oplus B)} \right] \qquad (10)$$

where the overbar denotes the complement, $\overline{(A \oplus B)} = 1 - (A \oplus B)$. In particular, when either of the values of (A, B) approaches 1, the two fuzzy sets A and B are "more closely similar"; when either of the values of (A, B) approaches 0, the two fuzzy sets are "more far apart" (dissimilar). The metric in equation (9) uses a minimum property to describe similarity, and the expression in equation (10) uses an arithmetic metric to describe similarity [3].
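The inner product, outer product and the two similarity metrics can be sketched as follows. The sketch assumes the max-min and min-max forms of the two products and takes the metrics of equations (9) and (10) to be the minimum and the arithmetic mean of the inner product and the complemented outer product, as commonly given for this approach. The function names and the membership vectors `a` and `b` are hypothetical.

```python
def inner_product(a, b):
    """Fuzzy inner product: maximum over i of min(a_i, b_i)."""
    return max(min(ai, bi) for ai, bi in zip(a, b))


def outer_product(a, b):
    """Fuzzy outer product: minimum over i of max(a_i, b_i)."""
    return min(max(ai, bi) for ai, bi in zip(a, b))


def similarity_min(a, b):
    """Metric using the minimum property (assumed form of equation (9))."""
    return min(inner_product(a, b), 1.0 - outer_product(a, b))


def similarity_mean(a, b):
    """Metric using the arithmetic mean (assumed form of equation (10))."""
    return 0.5 * (inner_product(a, b) + (1.0 - outer_product(a, b)))


# Hypothetical membership vectors of two fuzzy sets on a 4-element universe.
a = [0.0, 1.0, 0.5, 0.25]
b = [0.0, 0.75, 0.5, 0.5]

s_min = similarity_min(a, b)
s_mean = similarity_mean(a, b)
```

Both values lie in [0, 1], and a sample would be assigned to the known pattern for which the chosen metric is largest.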
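Another way to measure the similarity between two fuzzy sets A and B is the Euclidean distance:

$$d(A, B) = \left[ \sum_{i=1}^{n} (a_i - b_i)^{2} \right]^{1/2} \qquad (12)$$

where $a_i$ and $b_i$ are the components of the vectors representing A and B. This distance is smaller when the two sets are closer to each other.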
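Assigning a new sample to the closest known pattern by Euclidean distance can be sketched as below. The function name `nearest_pattern` and the two prototype vectors are hypothetical and serve only to illustrate the idea.

```python
import math


def nearest_pattern(sample, patterns):
    """Return the index of the pattern closest to sample (Euclidean distance)."""
    distances = [math.dist(sample, p) for p in patterns]
    return distances.index(min(distances))


# Hypothetical prototypes of two known patterns in a 2-feature space.
patterns = [[0.0, 0.0], [4.0, 4.0]]

assigned = nearest_pattern([3.0, 3.5], patterns)
```

Unlike the similarity metrics, where a larger value means a closer match, the sample is assigned to the pattern with the smallest distance.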

V. PERFORMANCE MEASURES
As performance measures, the classification accuracy, sensitivity, specificity, and positive and negative predictive values have been used; they are explained as follows. A confusion matrix contains information about the actual and predicted classifications made by a classification system. The confusion matrix is shown in Table 3 (actual vs. predicted), and the other parameters, which are computed from the confusion matrix, are given by the following formulas, where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives:

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

$$\text{Specificity} = \frac{TN}{TN + FP}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Positive predictive value} = \frac{TP}{TP + FP}$$

$$\text{Negative predictive value} = \frac{TN}{TN + FN}$$

Sensitivity, specificity, and accuracy are the terms most commonly associated with a binary classification test, and they statistically measure the performance of the test. In a binary classification, a given data set is divided into two categories on the basis of whether its members share common properties; as the name conveys, a binary classification test deals with two categories. In general, sensitivity indicates how well the test predicts one category and specificity measures how well the test predicts the other, whereas accuracy measures how well the test predicts both categories.

In our case, test sensitivity is the ability of a test to correctly identify those with the disease (true positive rate), whereas test specificity is the ability of the test to correctly identify those without the disease (true negative rate). Accuracy indicates the total success over both positive and negative cases. Positive predictive value, or precision rate, is the proportion of positive test results that are true positives (such as correct diagnoses). Similarly, negative predictive value is defined as the proportion of subjects with a negative test result who are correctly diagnosed.
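The five measures can be computed from the four cells of a confusion matrix as in the following sketch. The function name `binary_metrics` and the counts used in the example are hypothetical; they are not the results reported in this paper.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute the five performance measures from confusion-matrix counts.

    tp: sick subjects classified as sick      fn: sick classified as healthy
    fp: healthy subjects classified as sick   tn: healthy classified as healthy
    """
    return {
        "sensitivity": tp / (tp + fn),            # true positive rate
        "specificity": tn / (tn + fp),            # true negative rate
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "ppv": tp / (tp + fp),                    # positive predictive value
        "npv": tn / (tn + fn),                    # negative predictive value
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=40, fn=10, fp=5, tn=45)
```

With unbalanced classes, accuracy alone can be misleading, which is why the predictive values are reported alongside it.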

VI. RESULTS
The main goal of this paper was to understand how different classifiers behave on the chosen data and to compare their performance. In this study, fuzzy c-means clustering and pattern recognition have been used for the classification of Parkinson's disease.
The first method used here is the fuzzy c-means classification method, with c = 2 and a randomly chosen initial partition matrix U. As the stopping criterion in this case, we used a difference of $10^{-10}$ between two successive partition matrices. The true/false classification counts for this case are listed in column 2 of Table 4. Because the data are unbalanced, FCM clustering applied to the Parkinson's dataset yields a relatively poor classification result of 58.46% success. The low percentage of false positives still makes it usable in the diagnosis of the disease. To improve the results, neural networks can be used.
In the second part of the research, we used the pattern recognition method. The whole dataset is divided into training and testing sets. The first half of the positive data is used for training and the second half for testing, and the negative data are split in the same way. Therefore, the training and test sets contain the same number of positive and negative samples. The FCM algorithm is used for prior training on representative samples of both healthy and sick people, generating two different clusters. We ended the simulation when the difference between the cluster centers of two successive iterations was less than $10^{-6}$.
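The half-and-half split per class can be sketched as follows. The function names are hypothetical, and the text does not say how an odd number of samples is divided; the sketch assumes integer division, so the extra sample goes to the test set. Sample identifiers stand in for the feature vectors, using the class sizes of the dataset (147 PD and 48 healthy instances).

```python
def split_halves(samples):
    """Split a list into its first half and its second half.

    Assumption: with an odd number of samples, the extra one goes to testing.
    """
    half = len(samples) // 2
    return samples[:half], samples[half:]


def split_by_class(positives, negatives):
    """Build training and test sets from the first and second half of each class."""
    pos_train, pos_test = split_halves(positives)
    neg_train, neg_test = split_halves(negatives)
    return pos_train + neg_train, pos_test + neg_test


# Identifiers standing in for the 147 PD and 48 healthy instances.
positives = list(range(147))
negatives = list(range(147, 195))

train, test = split_by_class(positives, negatives)
```

Under this assumption the training set holds 73 PD and 24 healthy instances and the test set 74 PD and 24 healthy instances, with no instance appearing in both.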
After the two classes have been identified, the Euclidean distance between the mean of the centers of these two classes and the new testing data is used for pattern recognition. In this paper we also used the similarity measures in (9) and (10) for pattern recognition, but we obtained the best results using the Euclidean distance.
The detection performances on the combined training and test sets are illustrated in Table 5. By definition, sensitivity relates to the test's ability to identify positive results. For example, a sensitivity of 100% means that the test recognizes all actual positives, i.e. all sick people are recognized as being ill. Specificity plays the same role for negative results: a high specificity shows the ability of the test to rule out the disease in a subject. The highest possible rate is desired for all the criteria given in Table 6. Table 6 provides the performance measures for the FCM and pattern recognition methods with the same number of training and test data, and without repeating any data. As can be seen, the basic FCM classifier applied to the PD dataset provides high performance in positive predictive value, meaning it can be used to diagnose positive cases.

VII. DISCUSSION
Comparing the presented results with those reported in other studies, one can see that FCM clustering and pattern recognition yield results comparable to those of other methods. Even though this approach produced a slightly lower classification rate, the benefits accruing from its simplicity and repeatability may far outweigh this in real applications. The algorithms showing higher performance are quite complicated. The methods used here thus seem to have excellent reproducibility besides being much simpler.
Thus, fuzzy c-means clustering and pattern recognition can be widely used for segregating objects according to some measure of similarity and for revealing dissimilarity among the pattern vectors of a dataset. Consequently, they can be used for the detection of diseases such as Parkinson's disease.

VIII. CONCLUSION
Early detection of any kind of disease is essential, since it allows the patient to be treated well ahead of time. In this research paper, we aimed to design a system that would assist doctors in medical diagnosis. The paper presents a diagnostic system based on fuzzy c-means clustering and pattern recognition to help in the diagnosis of Parkinson's disease using a set of speech signals. It is intended to verify the effectiveness of applying these classifiers to the Parkinson dataset. This dataset comprises 11 attributes with various ranges of values. The combination of FCM and pattern recognition with randomly chosen test and training data improves the classification success. The best result is in positive predictive value, which is 80.88%. This result is quite satisfactory, given that detecting PD is a very complex problem and the methods used are very simple. The results obtained here are very encouraging and open the door to future research on the detection of Parkinson's disease.

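FCM algorithm, steps 2 to 8 (continued from Section III.B):

2. Assume the number of clusters (classes), c, is known, 2 ≤ c < n.
3. Choose an appropriate level of cluster fuzziness, m ∈ R, m > 1.
4. Initialize the (c × n) sized membership matrix U to random values such that $u_{ik} \in [0, 1]$ and $\sum_{i=1}^{c} u_{ik} = 1$.
5. Determine the cluster centers $v_i$ for i = 1, …, c.
6. Compute the Euclidean distance $d_{ik}$ for cluster i = 1, …, c and data point k = 1, …, n.
7. Update the fuzzy membership matrix U according to the $d_{ik}$ data. If $d_{ik} > 0$ then

$$u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{d_{ik}}{d_{jk}} \right)^{2/(m-1)} \right]^{-1}$$

8. Repeat from (5) until the change measured by the stopping criterion is less than a given tolerance.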

TABLE II. FEATURES USED IN THIS PAPER

Table gives the results obtained from FCM and pattern recognition methods.