Protein Secondary Structure Prediction Using Super-chains in PDB

Faruk Berat Akcesme, Mehmet Can


This study examines whether the protein structures currently in the Protein Data Bank (PDB) are complete enough to support secondary structure prediction for proteins of unknown structure. To address this question, several batches of 1000 protein chains each are chosen at random from the PDB. For each query chain in a batch, the PDB chains that contain the query as a subsequence are identified and called super-chains. The secondary structure of the query protein is then predicted from the corresponding subsequences of the secondary structure sequences of these super-chains. The technique is also applied to well-known datasets such as CB513, FC699, 640, 25PDB, SCOP, and 1189. The sequences of around 18% of the proteins in a batch are found to be present in other chains of the PDB dataset, and the average prediction accuracy of the method is 80%. Therefore, an unknown protein has a chance of roughly 20% of having a super-chain in the PDB, and if it does, its secondary structure can be predicted with around 80% accuracy.
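The super-chain idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy library, the three-state H/E/C alphabet, and the per-position majority vote used to combine several super-chains are all assumptions made for illustration, since the abstract does not say how multiple hits are merged. The sketch also assumes that "subsequence" means a contiguous stretch of residues and that the library does not contain the query chain itself.

```python
from collections import Counter


def find_super_chains(query, library):
    """Return (chain_id, offset) for every place the query occurs as a
    contiguous stretch inside a library chain (a hypothetical 'super-chain')."""
    hits = []
    for chain_id, (sequence, _structure) in library.items():
        start = sequence.find(query)
        while start != -1:
            hits.append((chain_id, start))
            start = sequence.find(query, start + 1)
    return hits


def predict_from_super_chains(query, library):
    """Predict the query's secondary structure from its super-chains.

    Assumption: when several super-chains exist, each position is decided
    by majority vote over the aligned structure segments.
    Returns None when the query has no super-chain in the library.
    """
    hits = find_super_chains(query, library)
    if not hits:
        return None
    length = len(query)
    segments = [library[cid][1][off:off + length] for cid, off in hits]
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*segments))


def per_residue_accuracy(predicted, observed):
    """Fraction of positions where the predicted state equals the observed one."""
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Toy library (made-up data): chain id -> (amino acid sequence, structure string)
toy_library = {
    "A": ("MKVLAAGIVG", "CCHHHHHEEC"),
    "B": ("GGAAGIVGTT", "CCHHEEECCC"),
    "C": ("TAAGIVK", "CHHHEEC"),
}

prediction = predict_from_super_chains("AAGIV", toy_library)
print(prediction)  # HHHEE
```

In this toy example all three chains contain the query, and the vote resolves the single position on which the aligned segments disagree. A query with no super-chain returns `None`, which corresponds to the majority of chains in the abstract (roughly four in five) for which this method cannot make a prediction.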


Keywords: Protein Secondary Structure Prediction; PDB; Super-chains


Copyright (c) 2016 Faruk Berat Akcesme, Mehmet Can

ISSN 2233-1859

Digital Object Identifier DOI: 10.21533/scjournal

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License