Historical Criteria for Structural Classes of Proteins in Percentages: After 20 Years

Two decades ago, scientists proposed percentage-based criteria for the structural classes of proteins. Nakashima et al. gave a classification criterion. P.Y. Chou also proposed another method to classify proteins according to their residue contents in three conformations (helix, sheet, and coil), and later revised his method. Today SCOP lists around 100,000 proteins with their structural classes, each assigned manually by experts, through inspection and observation, to one of four structural classes. In this paper, two datasets are used to reveal the percentages of residues in α-helices, β-sheets, and parallel and antiparallel β-sheets in proteins of classes all α, all β, α+β, and α/β, according to the classifications made by experts in SCOP. The first dataset is PDBselect25, which contains 1670 twilight-zone proteins whose similarity is less than 25%. The second dataset, BF30, consists of 10294 proteins picked from the PDB database with a similarity threshold of 30%. The structural classes of these proteins are taken from the SCOP database. It is seen that there is a very poor correlation between the historical criteria and SCOP scientists’ intuition in classifying proteins into structural classes.


Engineering and Natural Sciences, Hrasnicka Cesta 15, Ilidža

INTRODUCTION
In the SCOP database, a protein is mainly classified into one of the following five structural classes: all α, all β, α+β, α/β, and irregular proteins (Levitt and Chothia, 1976; Richardson and Richardson, 1989). Experts at SCOP rely on intuition to decide the structural class of a new protein. However, no explicit formulation of this intuition has been revealed to date. The structural class of a protein is hypothesized to be somehow correlated with its amino acid composition. Various efforts have been made to find coherent criteria that help one determine the structural class of a given protein. This article addresses the start and progress in this field.
There are two theoretical approaches to predicting the structure of a protein. One is the free-energy minimization method, which is based on empirical atomic potentials (see, e.g., Scheraga, 1968, 1987; Weiner and Kollman, 1981; Levitt, 1983; Gilson and Honig, 1988; Mackay et al., 1989; Rogers, 1989; McCammon et al., 1989; Chou et al., 1990; Karplus and Shakhnovich, 1992). The other is the statistical method, which was developed from various statistical data extracted from structure-known proteins (see, e.g., Fasman, 1974, 1978; Lim, 1974; Garnier et al., 1978; Cid et al., 1982; Jones et al., 1994; Orengo et al., 1994). Various physical theories of protein structures at different levels have also been proposed for improving the prediction of protein structure (Ptitsyn and Rashin, 1975; Ptitsyn and Finkelstein, 1979, 1980; Ptitsyn et al., 1985; Finkelstein and Ptitsyn, 1987; Chothia and Finkelstein, 1990; Kuwajima et al., 1993; Kolinski and Skolnick; Vieth et al., 1994; Mitchell et al., 1994; McDonald and Thornton, 1994).
Secondary structures of proteins are obtained from X-ray analyses in three conformations: helix "h", sheet "s", and others ".". Others are interpreted as coils "c". The earlier ideas for classification were based on the percentage of secondary structure components, although there was no unified quantitative criterion. Today scientists at SCOP have some consensus that helps them classify proteins manually by inspection. In the sequel it will be shown that their common sense fits the previous percentage-based criteria very poorly.
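The percentage-based view described above can be illustrated with a short sketch. It assumes the secondary structure is available as a plain string over the three symbols "h", "s", and "." mentioned in the text; the function name `conformation_percentages` is hypothetical and is not taken from the original work.

```python
from collections import Counter


def conformation_percentages(ss):
    """Return the percentage of helix, sheet, and coil residues.

    `ss` is assumed to be a string over "h" (helix), "s" (sheet),
    and "." (others, interpreted as coil).
    """
    if not ss:
        raise ValueError("empty secondary-structure string")
    counts = Counter(ss)
    unknown = set(counts) - {"h", "s", "."}
    if unknown:
        raise ValueError(f"unexpected symbols: {sorted(unknown)}")
    n = len(ss)
    return {
        "helix": 100.0 * counts["h"] / n,
        "sheet": 100.0 * counts["s"] / n,
        "coil": 100.0 * counts["."] / n,
    }


# Toy string, not a real protein: 4 helix, 2 sheet, 4 coil residues.
print(conformation_percentages("hhhh..ss.."))
```

Any percentage-based criterion would then compare these three numbers against its thresholds.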
P.Y. Chou (1989) classified proteins according to the criterion shown in tabulated form in Table 2. The proteins classified by P.Y. Chou included 19 all α, 15 all β, and 14 α+β proteins, but no irregular proteins.
Table 2. Classification criterion of P.Y. Chou (1989).
P.Y. Chou (1995) corrected his 1989 criterion; the corrected criterion is shown in tabulated form in Table 3.

3. SCOP EXPERTS' INTUITIVE CRITERIA
Today SCOP lists around 100,000 proteins with their structural classes. In this paper, two datasets are used to reveal the percentages of residues in α-helices, β-sheets, and parallel and antiparallel β-sheets in proteins of classes all α, all β, α+β, and α/β, according to the classifications made by experts in SCOP.

Figure 1. The four classes of protein structure
The first dataset is PDBselect25, which contains 1670 twilight-zone proteins whose similarity is less than 25% (Hobohm and Sander, 1994; Kurgan and Homaeian, 2006). The second dataset, BF30, consists of 10294 proteins picked from the PDB database with a similarity threshold of 30%. The structural classes of these proteins are taken from the SCOP database.

Features to Specify the Percentages
Since the first proposed standard for protein structure classification is the content of the secondary structural elements (Chou, 2005), ConH and ConE were proposed to reflect the contents of H and E residues, respectively (Kurgan et al., 2008a, 2008b).
The remaining secondary structure features were proposed on the basis of the structural characteristics of proteins from the α/β and α+β classes. β-strands are usually separated by α-helices, forming parallel β-sheets, in α/β proteins, but are usually joined only by coils, forming anti-parallel β-sheets, in α+β proteins. Because the β-strands in α/β proteins usually form parallel β-sheets while those in α+β proteins usually form anti-parallel β-sheets, the third and fourth features are based on the number of residues in β-strands that form parallel β-sheets (Pr) and the number of residues in β-strands that form anti-parallel β-sheets (Apr), respectively (Fig. 2a). It was proposed that if two β-strands (segments of E) are separated by an α-helix (a segment of H), these two β-strands form parallel β-sheets; otherwise, they form anti-parallel β-sheets. Taking the secondary structure sequence in the example (Fig. 2b), β-strand 1 and β-strand 2 are supposed to form parallel β-sheets, and β-strand 3 and β-strand 4 are supposed to form anti-parallel β-sheets (Fig. 2c). So there are three β-strands that form parallel β-sheets (Pr), and two β-strands that form anti-parallel β-sheets (Apr) in the secondary structure sequence.
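The four features can be sketched in code as follows. This is only an illustration under stated assumptions, not the original implementation: the input is assumed to be a string over "H", "E", and "C"; two consecutive β-strands are treated as parallel when the gap between them contains at least one "H" and as anti-parallel otherwise; a strand that takes part in both kinds of pairs is counted in both Pr and Apr. The function name `secondary_structure_features` is hypothetical.

```python
import re


def secondary_structure_features(ss):
    """Compute ConH, ConE, Pr, and Apr from a string over H/E/C.

    Assumption: consecutive strands separated by a gap containing "H"
    are parallel; all other consecutive strand pairs are anti-parallel.
    """
    if not ss:
        raise ValueError("empty secondary-structure string")
    n = len(ss)
    # Each strand is a maximal run of "E", stored as (start, end).
    strands = [m.span() for m in re.finditer(r"E+", ss)]
    parallel, antiparallel = set(), set()
    for k in range(len(strands) - 1):
        gap = ss[strands[k][1]:strands[k + 1][0]]
        group = parallel if "H" in gap else antiparallel
        group.update((k, k + 1))

    def residues(group):
        # Total number of residues in the strands of the given group.
        return sum(strands[k][1] - strands[k][0] for k in group)

    return {
        "ConH": ss.count("H") / n,
        "ConE": ss.count("E") / n,
        "Pr": residues(parallel),
        "Apr": residues(antiparallel),
    }


# Toy sequence, not a real protein: strand, helix, strand, coil, strand.
features = secondary_structure_features("CEEEECHHHHCEEEECCEEEC")
print(features["Pr"], features["Apr"])  # 8 7
```

In the toy sequence the middle strand is paired with the first strand across a helix and with the last strand across a coil, so its four residues contribute to both Pr and Apr.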
Fig. 2. Graphical representation of the proposed determination of β-strands composing parallel β-sheets or anti-parallel β-sheets directly from protein secondary structural sequences: (a) a sample protein; (c) anti-parallel β-sheets. In the secondary structure sequence of the protein in the example, β-strands are labeled from 1 to 4 (Liu and Jia, 2010).
It is clearly seen that SCOP experts' manual class estimations for the 25PDB data do not support any of the historical sets of criteria for defining the structural classes of proteins, especially for parallel and antiparallel β-sheets.
3.3. Percentages of Residues in α-Helices, β-sheets, Parallel and Antiparallel β-sheets in BF30 data
The BF30 data covers 10294 proteins, of which 2438 are all α, 2160 are all β, 2887 are α+β, and 2809 are α/β proteins. The average percentages of residues in these structural elements for each class of the BF30 data are shown in Table 5.
Table 5. Average percentages of the number of residues in α-helices, β-sheets, and parallel and antiparallel β-sheets in proteins of classes all α, all β, α+β, and α/β of the BF30 data.
It is clearly seen that SCOP experts' manual class estimations for the BF30 data do not support any of the historical sets of criteria for defining the structural classes of proteins either. The SCOP experts' manual class estimations appear to be consistent across both datasets.
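The per-class averages reported for the two datasets amount to grouping proteins by their SCOP class and averaging each feature within the group. A minimal sketch is given below; the function name `class_averages` is hypothetical, and the records are made-up toy values used only to show the computation, not data from 25PDB or BF30.

```python
from collections import defaultdict


def class_averages(records):
    """Average each feature within each structural class.

    `records` is an iterable of (class_label, {feature_name: percentage}).
    Every record of a class is assumed to carry the same feature names.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for label, features in records:
        counts[label] += 1
        for name, value in features.items():
            sums[label][name] += value
    return {
        label: {name: total / counts[label] for name, total in feats.items()}
        for label, feats in sums.items()
    }


# Made-up toy records for illustration only.
toy_records = [
    ("all α", {"helix": 60.0, "sheet": 0.0}),
    ("all α", {"helix": 40.0, "sheet": 10.0}),
    ("all β", {"helix": 5.0, "sheet": 45.0}),
]
print(class_averages(toy_records))
```

Comparing such class averages with the thresholds of a historical criterion is one way to check whether the criterion agrees with the expert assignments.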

Conclusion
The percentages of the four features in the historical classification criteria are shown to be neither consistent with the percentages obtained from the SCOP structural class assignments nor very effective in protein structural class prediction. The number of residues in β-strands that form anti-parallel β-sheets (Apr), which was designed to improve the prediction accuracy for proteins of the α+β class, does not do so and instead causes confusion, since there are more anti-parallel β-sheets in the all β class than in the α+β class. In addition to the helix, sheet, and coil contents of proteins, recently developed features, especially those derived from the position-specific scoring matrix (PSSM), have improved the prediction accuracy of protein classes enormously (Zhang et al., 2011; Zhang et al., 2014; Zhang, 2015; Ding et al., 2014; Liu and Jia, 2010; Liu et al., 2012).