Denver Groups Classification of Human Chromosomes Using CANN Teams

Unbanded human chromosomes can be classified into seven Denver Groups (A-G) based on their lengths and on the ratio of the length of the shorter arm to the whole length of the chromosome, which is called the centromere index (CI). In this article, the novel artificial neural network committee machines technique (CANNT) developed earlier is applied to the Denver Groups, and the correct classification rate in Denver Groups classification of human chromosomes is raised from 96% to 98%.


INTRODUCTION
Human chromosome analysis plays an important role in medical diagnostics, since any disorder or abnormality in the chromosomes of a cell may be a powerful indicator in the diagnosis of leukemia, skin and breast cancers, Down's syndrome and other genetic diseases. For the chromosomes to be analyzable, a sample of blood smear or amniotic fluid must be cultured and fixed at the metaphase stage. Then, by standard practice, a cytologist must find twenty cells in the sample and make their karyograms, which are images of the correctly classified chromosomes. A healthy human cell consists of 46 chromosomes: 22 pairs of autosomes and two sex chromosomes, which come as a pair, XX or XY. Thus there are 24 classes of chromosomes in total (22+X+Y). Fig. 1 shows a healthy female cell and its karyogram.
If performed manually, human chromosome karyotyping is a very tedious and time-consuming job. Computer-aided systems greatly increase the speed of karyotyping and classification. Since the 1980s, automated chromosome classification has been an extensively researched area. While automatic metaphase finding, automatic segmentation and feature extraction are more challenging areas, most researchers have dealt only with the classification stage of chromosome analysis. As there are publicly available databases containing chromosome feature sets (the most famous being the Copenhagen, Edinburgh and Philadelphia datasets), many articles have proposed different algorithms and classifiers. Among them, Artificial Neural Networks (ANN) are the most popular tool, owing to their capability of modeling the human brain's decision-making process to recognize objects based on incomplete or partial information, as well as their simple topology and easier training process (Mitchell, 1997; Haykin, 2009).
A large number of different ANNs have been tested in the classification of human chromosomes. Supervised architectures include multi-layer neural networks (Lu and Ya, 1989; Wu et al., 1989; Erington and Graham, 1993; El Emary, 2006; Wang et al., 2009; Can and Palalic, 2012), the Hopfield network (Ruan, 2000) and fuzzy neural techniques (Ruspini, 1973ab; Ramstein et al., 1992; Keller et al., 1995; Sjahputera et al., 1999); unsupervised architectures include nonlinear maps (Sarosa et al., 2005), self-organizing feature maps (Kyan et al., 1999) and a training method based on mutual information maximization (Mousavi et al., 1999).
In chromosome classification and pairing, the back-propagation training method is used to train ANNs. In multi-layer feed-forward ANNs, the number of output neurons is equal to the number of human chromosome types, and the number of input neurons is equal to the dimension of the input data, which is the number of features used for classification. The number of hidden layers, the number of hidden neurons, the steepness of the activation function, the learning rate, the momentum factor, the number of learning iterations and the upper bound of the training error are chosen by the user experimentally. While the proper choice of these parameters is important for the performance and robustness of an ANN used in chromosome classification (Cho, 2000), studies indicated that ANN performance was slightly lower than that obtained using simpler statistical methods (Granum and Thomason, 1990; Sweeney and Mousavi, 1993; Conroy et al., 2000). Unnecessary complexity of the ANN architecture and overtraining dramatically reduce the robustness of ANNs in chromosome classification. One study (Mitchell, 1997) using a multi-layer perceptron based ANN obtained a 0% error rate on the training data set but a 24.2% error rate on the testing data set. To increase ANN performance, another study showed that by reducing the complexity of an ANN, its testing accuracy can be increased from 75.8% to 88.3% [22]. Recently, one study showed that by using a two-layer ANN, a correct classification rate of 93.7% can be achieved. In the first layer of the proposed method, a single ANN was employed to classify the 24 chromosome types into seven classes. In the second layer, seven ANNs were adaptively optimized (using training-testing-validation) for the seven classes to identify individual chromosomes (Delshadpour, 2003).
Another, more sophisticated neural network proposed and tested in this area is the fuzzy Hopfield neural network. It combines a fuzzy clustering capability with a learning mechanism that acquires knowledge about human chromosomes from noisy inputs. In a test involving 100 human chromosomes, Ruan (2000) achieved a very high identification rate of 96.67%.
Recently, Gagula-Palalic and Can (2013) developed a novel committee of neural network machines, the competing artificial neural network teams technique (CANNT), which outperforms almost all previous human chromosome classifiers. The rest of the article is organized as follows. In the second part, the dataset used in the experiment is described. The third part gives the classification results when the nearest neighbor (NN) technique is applied. In the fourth section, a new method called competing ANNs is applied to the data set, and the correct classification rates in the training, validation and testing stages are shown. Part 5 describes the improvement of the results obtained in Part 4 when mixed signals are used. The article finishes with a conclusion and a discussion of the results.

CHROMOSOMES DATA SET
According to the Denver system of chromosome classification, chromosomes can be classified into seven groups (A-G) (H.C. S. Group, 1960), as seen in Table 1. Denver Group classification is mainly based on (1) the length or size of each chromosome and (2) the ratio of the length of the shorter arm to the whole length of the chromosome, which is called the centromere index (CI). Based on these two features, we differentiate metacentric, submetacentric and acrocentric chromosomes (see Table 1). The classification of chromosomes into these 7 groups is the first stage of the classification process, and its performance directly influences the correct classification of chromosomes into 22 classes, which is the second stage of the classification process. In this article, only the first stage, classification into the seven Denver classes, is performed. The data used in this work are taken from the Copenhagen database. We omitted the gray-level features and kept only (1) the length of each chromosome and (2) the centromere index (CI).
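The two retained features can be sketched as follows. This minimal Python sketch assumes that the lengths of the two arms of a chromosome have already been measured, takes the chromosome length as their sum, and computes the CI as the shorter arm length divided by that total. The cut-off values used to label a chromosome as metacentric, submetacentric or acrocentric are hypothetical placeholders chosen only for illustration; they are not the values of Table 1.

```python
from enum import Enum


class CentromereType(Enum):
    METACENTRIC = "metacentric"
    SUBMETACENTRIC = "submetacentric"
    ACROCENTRIC = "acrocentric"


# Hypothetical CI cut-offs for illustration only; they are NOT the values of Table 1.
METACENTRIC_MIN_CI = 0.40
SUBMETACENTRIC_MIN_CI = 0.25


def centromere_index(short_arm: float, long_arm: float) -> float:
    """CI = length of the shorter arm / whole length of the chromosome."""
    if short_arm < 0 or long_arm < 0:
        raise ValueError("arm lengths must be non-negative")
    total = short_arm + long_arm
    if total == 0:
        raise ValueError("chromosome length must be positive")
    return min(short_arm, long_arm) / total


def centromere_type(ci: float) -> CentromereType:
    """Label a chromosome from its CI using the hypothetical cut-offs above."""
    if ci >= METACENTRIC_MIN_CI:
        return CentromereType.METACENTRIC
    if ci >= SUBMETACENTRIC_MIN_CI:
        return CentromereType.SUBMETACENTRIC
    return CentromereType.ACROCENTRIC


def feature_vector(short_arm: float, long_arm: float) -> tuple[float, float]:
    """The two features kept in this study: (length, CI); length assumed = sum of arms."""
    return (short_arm + long_arm, centromere_index(short_arm, long_arm))
```

Each chromosome is thus reduced to a point (length, CI) in the plane, which is the two-dimensional input used by the classifiers below.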

NEAREST NEIGHBOR CLASSIFIER
The input data are two-dimensional and form two-dimensional clusters, as seen in Figure 2. A natural classifier for such a set is the nearest neighbor technique.
From each of the seven Denver classes, 100 chromosomes were chosen as the training set and 100 chromosomes as the testing set. Test data are classified according to their Euclidean distance to the seven training clusters. The correct classification rates of the seven test clusters are given in Table 2.
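The nearest neighbor step can be sketched in Python with NumPy. The sketch assumes a 1-nearest-neighbor rule over all individual training samples (the text does not say whether distances are measured to individual training points or to cluster centers) and reports the correct classification rate separately for each group, in the manner of Table 2. The function names and the toy samples are illustrative only.

```python
import numpy as np


def nearest_neighbor_predict(train_x, train_y, test_x):
    """Label each test sample with the class of its closest training sample (Euclidean)."""
    train_x = np.asarray(train_x, dtype=float)
    test_x = np.asarray(test_x, dtype=float)
    # Squared Euclidean distances between every test and training sample: (n_test, n_train).
    diff = test_x[:, None, :] - train_x[None, :, :]
    dist2 = np.sum(diff ** 2, axis=2)
    return np.asarray(train_y)[np.argmin(dist2, axis=1)]


def per_class_rates(true_y, pred_y):
    """Correct classification rate (%) for each class present in true_y."""
    true_y = np.asarray(true_y)
    pred_y = np.asarray(pred_y)
    return {
        str(c): 100.0 * float(np.mean(pred_y[true_y == c] == c))
        for c in np.unique(true_y)
    }


# Toy (length, CI) samples for two groups; real data would hold 100 samples per group.
train_x = [[10.0, 0.45], [9.0, 0.47], [2.0, 0.20], [2.2, 0.18]]
train_y = ["A", "A", "G", "G"]
test_x = [[9.5, 0.46], [2.1, 0.19], [6.0, 0.30]]
test_y = ["A", "G", "G"]

pred = nearest_neighbor_predict(train_x, train_y, test_x)
rates = per_class_rates(test_y, pred)
```

Because length and CI have very different numeric ranges, length dominates the raw Euclidean distance in this sketch; whether the features were normalized before computing distances is not stated in the text.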

Architecture of ANN
We represent each network as consisting of 2 inputs x[i], i = 1, 2, 12 neurons in the hidden layer and one neuron in the output layer, as shown in Fig 1. A specially organized committee of 42 simple perceptrons is used to improve the rate of correct classification of the 7 types of unbanded human chromosomes. Each of these simple perceptrons is trained to distinguish between two types of chromosomes. These multilayer perceptrons are trained with the back-propagation algorithm.
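A single committee member and the committee itself can be sketched as a 2-12-1 multilayer perceptron trained by plain batch back-propagation, with one network per ordered pair of types. This NumPy sketch assumes tanh activations, a squared-error loss, targets of +1 for type i and -1 for type j, no momentum term, and an arbitrary learning rate and epoch count; none of these choices is specified in the text, and the names PairPerceptron and train_committee are hypothetical.

```python
import numpy as np


class PairPerceptron:
    """Hypothetical 2-12-1 MLP separating chromosome type i (+1) from type j (-1)."""

    def __init__(self, n_inputs=2, n_hidden=12, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.5, size=(n_inputs, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.5, size=(n_hidden, 1))
        self.b2 = np.zeros(1)

    def _forward(self, x):
        h = np.tanh(x @ self.w1 + self.b1)    # hidden layer, 12 neurons
        out = np.tanh(h @ self.w2 + self.b2)  # single output neuron in (-1, 1)
        return h, out

    def train(self, x, t, lr=0.05, epochs=2000):
        """Batch back-propagation on the squared error; t holds targets +1 / -1."""
        x = np.asarray(x, dtype=float)
        t = np.asarray(t, dtype=float).reshape(-1, 1)
        n = len(x)
        for _ in range(epochs):
            h, out = self._forward(x)
            # Error signal at the output, through the derivative of tanh.
            d_out = (out - t) * (1.0 - out ** 2)
            # Back-propagate to the hidden layer (uses weights before this update).
            d_h = (d_out @ self.w2.T) * (1.0 - h ** 2)
            self.w2 -= lr * h.T @ d_out / n
            self.b2 -= lr * d_out.mean(axis=0)
            self.w1 -= lr * x.T @ d_h / n
            self.b1 -= lr * d_h.mean(axis=0)

    def vote(self, x):
        """Hard vote: +1 for type i, -1 for type j."""
        _, out = self._forward(np.atleast_2d(np.asarray(x, dtype=float)))
        return np.where(out[:, 0] >= 0.0, 1, -1)


def train_committee(samples_by_type, epochs=2000):
    """Train one PairPerceptron per ordered pair (i, j), i != j (7 types -> 42 nets)."""
    committee = {}
    types = sorted(samples_by_type)
    for i in types:
        for j in types:
            if i == j:
                continue
            xi = np.asarray(samples_by_type[i], dtype=float)
            xj = np.asarray(samples_by_type[j], dtype=float)
            x = np.vstack([xi, xj])
            t = np.concatenate([np.ones(len(xi)), -np.ones(len(xj))])
            p = PairPerceptron()
            p.train(x, t, epochs=epochs)
            committee[(i, j)] = p
    return committee


# Toy example: type i clustered near (1, 1), type j near (-1, -1).
x_toy = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
                  [-1.0, -1.0], [-0.9, -1.2], [-1.1, -0.8]])
t_toy = np.array([1, 1, 1, -1, -1, -1])
p_toy = PairPerceptron()
p_toy.train(x_toy, t_toy)
```

Training both P(i, j) and P(j, i) is what brings the count to 7 × 6 = 42; the second network is redundant in principle but gives each row and column of the decision grid its own vote.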

Assembling votes
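Let P(i, j) be the simple perceptron which is trained to distinguish chromosomes of type i from chromosomes of type j, giving output 1 for type i and -1 for type j, and let team T(x) consist of the perceptrons P(x, j) and P(i, x). When a chromosome of type x is entered, the perceptrons P(x, j) of team T(x) create mostly an output 1, while the perceptrons P(i, x) of the same team T(x) create mostly an output -1. The perceptrons of other teams also create outputs of either 1 or -1. But since the other teams are not trained to distinguish chromosomes of type x from other chromosome types, their consensus will be weaker than the consensus of team T(x). So we expect that team T(x) will be the winner of the competition.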
For completeness, the dummy perceptrons P(j, j), j = 1, 2, …, 7, which always give output 0, are added. When the 7×7 perceptrons are arranged in a 7×7 grid, the votes of the teams appear as crosses. The score of each team is its distance to its consensus. In Figure 3, the score of team T(3) is zero, while the score of its nearest competitor T(4) is four. The team with the smallest score is the winner of the competition, and the newly entered chromosome is assigned the chromosome type of the winner's label.
Another representation of the winning team can be visualized by attaching gray levels to the team members in proportion to their scores, as seen in Figure 4.
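The scoring of the decision matrix can be sketched as follows. The sketch assumes that P(i, j) outputs 1 for type i and -1 for type j, so the consensus of team T(x) is +1 along row x and -1 along column x, with the dummy diagonal ignored. It further assumes that the distance to consensus is the sum of absolute differences over the cross; under that assumption a runner-up that disagrees on exactly two ±1 votes scores four. Indices are 0-based, so index 2 corresponds to T(3). The function names and the example matrix are hypothetical.

```python
import numpy as np


def team_scores(decision):
    """Score of T(x): L1 distance of its cross (row x, column x) to the consensus.

    decision[i, j] holds the vote of P(i, j) (+1 or -1); the diagonal holds the
    dummy outputs 0 and is ignored.
    """
    d = np.asarray(decision, dtype=float)
    k = d.shape[0]
    scores = np.zeros(k)
    for x in range(k):
        others = [j for j in range(k) if j != x]
        row_gap = np.abs(d[x, others] - 1.0).sum()  # row x should vote +1
        col_gap = np.abs(d[others, x] + 1.0).sum()  # column x should vote -1
        scores[x] = row_gap + col_gap
    return scores


def winner(decision):
    """Index of the team with the smallest score (ties go to the lowest index)."""
    return int(np.argmin(team_scores(decision)))


def example_decision(true_type, k=7):
    """Hypothetical 7x7 votes for a chromosome of type true_type."""
    d = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            if i == j:
                continue                 # dummy perceptron P(j, j) outputs 0
            if i == true_type:
                d[i, j] = 1.0            # P(x, j) recognizes type x
            elif j == true_type:
                d[i, j] = -1.0           # P(i, x) also points to type x
            else:
                d[i, j] = 1.0 if i < j else -1.0  # arbitrary votes of untrained pairs
    return d


scores = team_scores(example_decision(2))
best_team = winner(example_decision(2))
```

In this toy matrix the true team scores zero and the runner-up scores four, because every other team is guaranteed to disagree with its consensus on at least the two votes it shares with the winner.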

RESULTS
During the training of the 42 simple multilayer perceptrons, it is possible to reach zero training error. However, this leads to overtraining, which lowers the correct classification rates in testing.
From each chromosome type, 50 random samples are chosen for training. The same number of random samples is also chosen for validation and for testing. We have seen that it is possible to exceed 97% correct classification with this special committee of perceptrons. When a new sample is entered, the votes of these 42 simple perceptrons and the additional 7 dummy perceptrons create a 7×7 decision matrix. By a special assembling of these votes, we obtain a higher rate of correct classification of the 7 Denver types of human chromosomes, with an average of 98.00% correct classification when tested on the Copenhagen Chromosome Dataset.
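The per-type sampling can be sketched with NumPy's random generator. The sketch assumes that every chromosome type has at least 150 samples and that the training, validation and testing sets must be disjoint; the function name split_per_type and the seed are illustrative.

```python
import numpy as np


def split_per_type(labels, n_per_set=50, seed=0):
    """Draw disjoint train/validation/test index sets with n_per_set samples per type."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if len(idx) < 3 * n_per_set:
            raise ValueError(f"type {c} has only {len(idx)} samples")
        # Shuffle this type's indices and cut three consecutive blocks.
        chosen = rng.permutation(idx)[: 3 * n_per_set]
        train.extend(chosen[:n_per_set])
        val.extend(chosen[n_per_set: 2 * n_per_set])
        test.extend(chosen[2 * n_per_set:])
    return np.array(train), np.array(val), np.array(test)
```

Sampling within each type, rather than from the pooled data, keeps the seven Denver groups equally represented in all three sets.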

Denver Groups Classification of Human Chromosomes Using CANN Teams Supplemented by a Nearest Neighbor Technique (CANNT-S)
University of Sarajevo, Faculty of Engineering and Natural Sciences, Hrasnicka Cesta 15, Ilidža
Human chromosomes can be classified into seven Denver groups (A-G) based on the position of the centromere. This classification is an important stage of human chromosome classification, as its output influences the second stage of classification, the correct classification of 24 classes of chromosomes. In this article, the novel artificial neural network committee machines technique (CANNT) developed earlier is supplemented by a nearest neighbor technique, S, and the correct classification rate in Denver Groups classification is raised to a level of 98%.

Fig 2. The distribution of 2200 human chromosomes into the seven Denver Group classes, A to G.

Fig 2: Neural network architecture for a simple multilayer perceptron

Fig. 3: Example of the decision matrix. Team T(3) is the winner of the competition; the nearest competitor is T(4).

Fig 4: Competing teams. The darkest cross, which consists of the 3rd row and the 3rd column, wins the competition. The nearest competitor to team 3 is team 4.

Table 1: The classification of chromosomes based on the Denver system

Table 2: The classification of Denver Groups by the nearest neighbor technique

Table 1. Correct classification rates during training and testing. Using a validation data set, overtraining is prevented.

CONCLUSION
In this study we presented a specially organized committee of 42 simple perceptrons used to improve the rate of correct classification of the 7 Denver types of unbanded human chromosomes. Each of these simple perceptrons is trained to distinguish between two types of chromosomes.