Wavelet Transform-Based Phylogenetic Analysis of Protein Sequences

Cagin Kandemir-Cavas


As gene sequencing accelerates, large volumes of biological data are emerging. Analyzing these data contributes greatly to understanding metabolic disorders in organisms and to improving the efficacy of drugs. To that end, it is critical to classify the data accurately, quickly, and at low cost according to their characteristics and relationships. Alongside experimental methods, machine learning and bioinformatics methods are used; artificial neural networks, support vector machines, and soft computing methods are among the most common. However, the effectiveness of these methods on biosequence data depends on choosing the most appropriate parameters and on how protein sequences are converted into numerical sequences. When sequences are encoded by amino acid frequencies alone, the properties of the amino acids are ignored. Taking the physicochemical properties of amino acids (hydrophobicity, hydrophilicity, ...) into account therefore improves the performance of classification techniques. The phylogenetic tree is the best method for visualizing the classification among species. In this project, the wavelet transform, which is used in the analysis of digital signals, was adapted to protein sequences represented by hydrophobicity values. Each protein sequence was treated as a signal and decomposed by the wavelet transform into approximation and detail components; the similarities between these components were calculated, and the phylogenetic tree of the species was constructed. As an application, phylogenetic trees of the ND5 protein sequences of 22 species were built in MATLAB R2017 using the Neighbor-Joining (NJ) and Unweighted Pair Group Method with Arithmetic Mean (UPGMA) methods.
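The pipeline summarized above (sequence to hydrophobicity signal, wavelet decomposition, pairwise similarity, tree) can be illustrated with a minimal Python sketch. It is not the author's MATLAB implementation, and several choices are assumptions made only for illustration: the Kyte-Doolittle hydropathy scale, a Haar wavelet with three decomposition levels, per-band mean energy as the feature vector, Euclidean distance as the dissimilarity, and UPGMA only (Neighbor-Joining is omitted for brevity). All function names are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy index (assumed scale; the abstract does not name one).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}

def to_signal(seq):
    """Map each residue to its hydropathy value; unknown symbols are skipped."""
    return [KD[aa] for aa in seq.upper() if aa in KD]

def haar_step(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(x) % 2:                      # pad odd-length input by repeating the last sample
        x = x + [x[-1]]
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

def features(seq, levels=3):
    """Mean energy of each detail band, followed by that of the final approximation band."""
    approx = to_signal(seq)
    feats = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        feats.append(sum(d * d for d in detail) / len(detail))
    feats.append(sum(a * a for a in approx) / len(approx))
    return feats

def distance_matrix(seqs, levels=3):
    """Pairwise Euclidean distances between the wavelet feature vectors."""
    vecs = [features(s, levels) for s in seqs]
    return [[math.dist(u, v) for v in vecs] for u in vecs]

def upgma(dist, names):
    """Average-linkage (UPGMA) clustering; returns the tree topology as nested tuples."""
    clusters = {i: (names[i], 1) for i in range(len(names))}    # id -> (subtree, size)
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(names)
    while len(clusters) > 1:
        a, b = min(d, key=d.get)        # closest pair of clusters
        (ta, na), (tb, nb) = clusters.pop(a), clusters.pop(b)
        for c in clusters:              # size-weighted average distance to the merged cluster
            dac = d.pop((min(a, c), max(a, c)))
            dbc = d.pop((min(b, c), max(b, c)))
            d[(c, next_id)] = (na * dac + nb * dbc) / (na + nb)
        del d[(a, b)]
        clusters[next_id] = ((ta, tb), na + nb)
        next_id += 1
    return next(iter(clusters.values()))[0]

# Toy, made-up sequences purely for demonstration (not ND5 data).
names = ["sp1", "sp2", "sp3"]
seqs = ["MKLLIVAVLL", "MKLLIVAVLL", "DDEEKKRRNN"]
D = distance_matrix(seqs)
tree = upgma(D, names)
```

Because the features are band energies, sequences of different lengths yield vectors of the same size, which is what lets the distance matrix be formed without alignment. In this toy input the two identical sequences have zero distance and are merged first. A Daubechies wavelet or a different similarity measure would be a drop-in change to `haar_step` and `distance_matrix`.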


Bioinformatics; Protein sequence; Phylogenetic tree; Wavelet transform





DOI: http://dx.doi.org/10.21533/scjournal.v11i1.221



Copyright (c) 2022 Cagin Kandemir-Cavas

ISSN 2233-1859

Digital Object Identifier DOI: 10.21533/scjournal

This work is licensed under a Creative Commons Attribution 4.0 International License