HUMAN CHROMOSOME CLASSIFICATION USING COMPETITIVE SUPPORT VECTOR MACHINE TEAMS

Classification of chromosomes is a challenging task that requires a very precise autonomous classifier. This paper proposes to employ competing support vector machines (SVMs) placed in a grid. Each agent in a cell of the grid is responsible for distinguishing two classes, and the overall output is determined by simple majority voting among the SVMs. Relying on the same principle as the work by Palalic and Can [17], we compare the results obtained and show that the proposed algorithm delivers better accuracy.


INTRODUCTION
Human chromosome classification represents a very important part of the human chromosome analysis process. The classification of chromosomes into classes from medical images is called karyotyping, and it aims to diagnose possible diseases.
Counting and karyotyping of chromosomes is a very time-consuming process. Since the 1980s, researchers have been trying to develop reliable systems for automatic chromosome karyotyping and analysis. Most of these systems are semiautomated, requiring manual assistance from laboratory technicians, especially in images containing touching and overlapping chromosomes.
Classification of human chromosomes based on chromosome features is the most studied area in chromosome analysis. For classification purposes, several databases have been used: the Copenhagen, Edinburgh, Philadelphia, and Denver databases.

LITERATURE OVERVIEW
Many classification algorithms and their performances on these databases have been reported, among them: neural network (ANN) based [1], [2], distance and statistical based [3]-[5], neuro-fuzzy based [6], nearest-neighbor based [7], rule based [8], and fuzzy-rule based [9]. The greatest success among all is attributed to ANNs, with the most accurate classification rate of 93.5% on the Copenhagen data set, as reported in [10].
Recently, Ventura et al. [11] experimented with a classification approach that pairs chromosomes using a set of geometric and band-pattern features and an algorithm based on the Bayesian framework. This method resulted in a maximum average pairing rate of 92.8%.
As medical diagnosis is a challenging task that requires a very precise classification algorithm, the correct classification rates mentioned above are not truly satisfactory. This necessitates the design of more powerful algorithms.

DATASET
The data used in this work is taken from the Copenhagen database, with 4400 samples in total and 200 samples for each chromosome type. A randomly selected 80% of the data is used for training, while the remaining 20% is kept for testing. The dataset is composed of 28 input features, the first two of which are the length of the chromosome and the position of the centromere, called the centromere index (CI). Based on preliminary experiments, another dimension contributing to the success of the algorithm is taken as the ratio of the first two components, i.e., the ratio of the CI to the total length. The remaining 25 features are the first 25 principal components obtained from the standardized gray levels of a chromosome perpendicular to the main axis.
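The feature construction and 80/20 split described above can be sketched as follows. This is a minimal illustration: the feature values are random placeholders standing in for the actual Copenhagen data, and the column layout is an assumption based on the description.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder stand-in for the Copenhagen data: 22 classes x 200 samples.
# Assumed columns: 0 = chromosome length, 1 = centromere index (CI),
# 2..26 = first 25 principal components of the band-profile gray levels.
X = rng.random((4400, 27)) + 0.1          # offset keeps lengths nonzero
y = np.repeat(np.arange(22), 200)

# Extra feature from the text: ratio of the CI to the total length.
X = np.hstack([X, (X[:, 1] / X[:, 0])[:, None]])   # now 28 features

# Random 80/20 train/test split.
idx = rng.permutation(len(X))
n_train = int(0.8 * len(X))
X_tr, y_tr = X[idx[:n_train]], y[idx[:n_train]]
X_te, y_te = X[idx[n_train:]], y[idx[n_train:]]
```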
One classification scheme is the set of so-called seven Denver groups (A-G), as seen in Table 1.

Table 1: Classification of chromosomes in Denver classification

SUPPORT VECTOR MACHINES
Support vector machines (SVMs) have gained importance in the last decade due to specific properties that favor them as an efficient classification tool. SVM is based on the statistical learning theory pioneered by Vapnik [12] and can be used for pattern classification and nonlinear regression.
In the case of separable patterns, the main idea of SVM is to construct a hyperplane as the decision surface in such a way that the margin of separation between the two classes is maximized [13]. Hence, the classification task is performed directly based on the support vectors. While separating the classes, SVM performs two main mathematical operations on the data set.
1. Nonlinear mapping of the input data into a high-dimensional feature space in which the data can be separated linearly.
2. Construction of an optimal hyperplane for separating the features discovered in step 1 [13].

We can state the motivation for employing SVM as a classification tool as follows: the support vector classifier chooses the particular solution with the highest generalization ability, namely the classifier which separates the classes with maximal margin [14].

[Figure 3: Linear support vector classifier [14]]

The training set for the SVM is given as P = {(x_i, y_i) | x_i ∈ R^n, y_i ∈ {−1, 1}}, i = 1, …, N, where x_i stands for the n-dimensional input vector and y_i is the corresponding class label. Remembering that SVM tries to build a decision boundary with the maximum possible margin between the classes, for linearly nonseparable input data a transformation φ(x) to a feature space of higher dimension is performed. Vapnik [12] suggested selecting the transformation function from a family of universal approximators with a linearly separable character.
The aim is to find the values of w and b for the optimal hyperplane, maximizing the margin of separation, given the training set P. The maximization of the margin is realized by minimizing the Euclidean norm of the weight vector w. For nonseparable problems, the constraints in Eq. (1) can be weakened by replacing them with a soft margin involving slack variables ε_i. That is, for the non-separable case, we allow that some data points may lie inside the margin area, while the sum of the slack variables is used to limit the extent of the violations. In this case, the optimization problem can be expressed as:

Min  (1/2) ||w||^2 + C Σ_i ε_i

subject to:  y_i (w^T φ(x_i) + b) ≥ 1 − ε_i,  ε_i ≥ 0,

where C is a user-defined penalty parameter controlling the tradeoff between the complexity of the machine and the number of nonseparable points [13].

Radial basis function (RBF), Gaussian, and polynomial kernels are the most commonly used kernel functions in the literature. Basically, they permit us to construct a decision surface that is nonlinear in the input space but whose image is linear in the feature space [13].
Practically, it is simpler to optimize the dual problem. With a specific kernel function K(·, ·), the optimization problem can be formulated in the dual form as follows: given the training set P,

Max  Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j K(x_i, x_j)

subject to the constraints:  0 ≤ α_i ≤ C  and  Σ_i α_i y_i = 0.

While designing an efficient SVM with RBFs as kernels, two parameters are of paramount importance: (C, σ²), the penalty parameter and the width of the RBF, respectively [15]. In the formulation given in (6), C is also referred to as the box constraint parameter, serving as an upper bound for the Lagrange multipliers α_i.
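A soft-margin RBF-kernel SVM of this form can be sketched with scikit-learn's SVC, which solves the dual problem internally (the library choice and toy data are assumptions; the paper does not name an implementation). Note that SVC parameterizes the RBF width through gamma = 1/(2σ²):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Two toy Gaussian classes standing in for one chromosome pair.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# C is the box constraint on the multipliers; gamma encodes the RBF width.
sigma2 = 1.0
clf = SVC(kernel="rbf", C=10.0, gamma=1.0 / (2.0 * sigma2))
clf.fit(X, y)
acc = clf.score(X, y)   # training accuracy on the toy data
```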
In order to find the optimal values of the pair (C, σ²) for each SVM, we employed a numerical optimization method, pattern search (PS). PS is an efficient direction-based deterministic search algorithm with no requirement for derivative information. The general scheme of all PS algorithms involves the construction of a mesh of points around the current solution. If the current solution remains unimproved, the mesh is refined and the process is repeated [16].
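The poll-and-refine scheme above can be sketched as a minimal coordinate pattern search. The objective below is a smooth stand-in with a known minimum, not the actual validation error over (C, σ²):

```python
def pattern_search(f, x0, step=1.0, tol=1e-6, max_iter=500):
    """Minimal pattern search: poll +/- step along each axis and accept
    improvements; shrink the mesh when no polled point improves f."""
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        if step <= tol:
            break
        improved = False
        for i in range(len(x)):
            for d in (step, -step):
                cand = list(x)
                cand[i] += d
                fc = f(cand)
                if fc < fx:
                    x, fx, improved = cand, fc, True
        if not improved:
            step *= 0.5        # refine the mesh around the current point
    return x, fx

# Hypothetical objective: in the paper this would be validation error as a
# function of (C, sigma^2); here a smooth surrogate with minimum at (2, 3).
obj = lambda p: (p[0] - 2.0) ** 2 + (p[1] - 3.0) ** 2
best, val = pattern_search(obj, [0.0, 0.0])
```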

COMPETITIVE SVM TEAMS
Although SVMs are not designed for classification problems with multiple classes, as in the case of chromosome sorting, one-against-one (OAO) SVM classifiers can be trained to distinguish any pair of classes {i, j}. However, the OAO approach comes with the difficulty of combining the outputs of all SVMs into a single final output.
Let us define a single support vector machine as SVM_{i,j}, where i, j = A, B, …, G and i ≠ j. SVM_{i,j} is trained to distinguish classes of type i and type j, producing the output i if new data is of type i and the output j if it is of type j. Thus, the overall structure of the classifier can be represented as a 7x7 matrix, as seen in Table 2. Clearly, there is no SVM assigned on the diagonal.
After the training stage, assume new data of unknown type is to be classified. If the new data belongs to a chromosome of type k, then all SVMs in the k-th column, SVM_{i,k}, i = A, B, …, G, are expected to give an output of k. Likewise, all SVMs in the k-th row, SVM_{k,j}, j = A, B, …, G, are expected to give an output of k. Hence, the output will be a 7x7 matrix with zeros on the diagonal, whose entries represent the votes of the SVMs assigned to the classes.
The idea of competing classifier teams was implemented by Palalic and Can [17], who referred to the designed classifier as Competitive Artificial Neural Network Teams (CANNT). In this work, the same structure of competing classifiers is implemented with SVMs. That is, a competition takes place between the teams of SVMs trained for a target class. Unlike [17], the overall output of the classifier is determined by majority voting instead of Euclidean distance.
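The pairwise training and majority-voting scheme described above can be sketched as follows. The data here are seven well-separated toy blobs standing in for the Denver groups, and the use of scikit-learn is an assumption; only the structure (one SVM per class pair, one vote each) mirrors the text:

```python
import numpy as np
from itertools import combinations
from sklearn.svm import SVC

rng = np.random.default_rng(2)
classes = list("ABCDEFG")                     # the seven Denver groups

# Toy stand-in data: one Gaussian blob per class.
X = np.vstack([rng.normal(3 * k, 1.0, (30, 2)) for k in range(7)])
y = np.repeat(np.arange(7), 30)

# One SVM per unordered class pair {i, j}: 21 machines for 7 classes.
pair_svms = {}
for i, j in combinations(range(7), 2):
    mask = (y == i) | (y == j)
    pair_svms[(i, j)] = SVC(kernel="rbf", C=10.0, gamma=0.5).fit(X[mask], y[mask])

def classify(x):
    """Each pairwise SVM casts one vote; the class with most votes wins."""
    votes = np.zeros(7, dtype=int)
    for clf in pair_svms.values():
        votes[int(clf.predict(x[None, :])[0])] += 1
    return classes[int(np.argmax(votes))]
```

Since the SVM trained for the pair {i, j} always votes for either i or j, the true class can collect up to six votes, one from each machine it participates in, which is what makes simple majority voting effective.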

RESULTS
The results obtained during the training and testing stages are given in Table 3 and Table 4, where the high correct-classification rate of the proposed algorithm is clearly visible. On the training data, the correct classification ratio is 99.66%, while the algorithm was able to identify 99.54% of the test samples correctly. Comparing the same indicators obtained with CANNT on the same dataset, we can conclude that the SVM-based classifier proposed here outperforms that algorithm.