Regression Analysis to Predict the Secondary Structure of Proteins

Betul Akcesme, Faruk Berat Akcesme


A method is presented for protein secondary structure prediction based on multidimensional regression. Two hundred proteins are chosen from the RCSB Protein Data Bank, and their secondary structures, obtained through X-ray crystallography analyses, are downloaded from the same source. The primary and secondary structures of the proteins are concatenated separately to create a sequence of 169 026 residues. The first 150 000 amino acid residues and their corresponding secondary structures are used to build a regression model; the remaining 19 026 residues are used for testing. Since we expect three outputs, α-helices "S", β-sheets "H", and coiled coils "C", our regression model consists of  parameters. These parameters are tuned, and a correct classification rate of 62.50% is achieved on the test data. Furthermore, the performance of the regression model is compared with online secondary structure estimation algorithms on 14 unused proteins and is found to be comparable with the online estimation tools.
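The pipeline described in the abstract (concatenate, split into training and test residues, fit a regression with three outputs, score the test set) can be illustrated with a minimal sketch. The abstract does not state which input features the model uses, so this sketch assumes the simplest possible encoding: each residue is one-hot encoded on its own, with no neighbouring window and no intercept. Under that assumption the least-squares weights have a closed form that can be computed by counting. The function names, the toy sequences, and the split point of 7 are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from collections import defaultdict

# The three label symbols used in the abstract; their order here is arbitrary.
CLASSES = ("H", "S", "C")


def split(sequence, n_train):
    """Split a concatenated sequence into a training head and a test tail."""
    return sequence[:n_train], sequence[n_train:]


def fit_indicator_regression(residues, labels):
    """Least-squares fit from one-hot residues to one-hot class indicators.

    With one-hot inputs and no intercept, the least-squares weight for a
    (residue, class) pair is the mean of that class indicator over all
    occurrences of the residue, so it reduces to a ratio of counts.
    """
    counts = defaultdict(lambda: [0] * len(CLASSES))
    for aa, ss in zip(residues, labels):
        counts[aa][CLASSES.index(ss)] += 1
    return {aa: [c / sum(row) for c in row] for aa, row in counts.items()}


def predict(weights, residues, default="C"):
    """Assign each residue the class with the largest regression output."""
    out = []
    for aa in residues:
        row = weights.get(aa)
        # Residues never seen in training fall back to an assumed default.
        out.append(default if row is None else CLASSES[row.index(max(row))])
    return "".join(out)


def accuracy(predicted, actual):
    """Fraction of residues whose predicted label matches the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Toy data standing in for the concatenated primary/secondary sequences.
primary = "AAGVAGVVAG"
secondary = "HHCSHCSSHC"

train_x, test_x = split(primary, 7)
train_y, test_y = split(secondary, 7)

weights = fit_indicator_regression(train_x, train_y)
predicted = predict(weights, test_x)
print(predicted, accuracy(predicted, test_y))
```

With the sizes given in the abstract, the same `split` call with `n_train=150000` on a 169 026-residue sequence leaves 19 026 residues for testing. A model that takes a window of neighbouring residues as input would need a general least-squares solver rather than the counting shortcut used here.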

Copyright (c) 2015 SouthEast Europe Journal of Soft Computing

ISSN 2233-1859

Digital Object Identifier DOI: 10.21533/scjournal

This work is licensed under a Creative Commons Attribution 4.0 International License