Secondary Structure Segments are Much More Conserved than Primary Sequence Segments

Faruk Berat Akcesme, Mehmet Can


To be biologically functional, all proteins must adopt specific folded three-dimensional structures. Some believe that the genetic information for a protein specifies only its primary structure, the linear sequence of amino acids in the polypeptide backbone. Because most purified proteins can spontaneously refold in vitro after being completely unfolded, the three-dimensional structure must be determined by the primary structure (Creighton, 1990). How this occurs has come to be known as "the protein folding problem". As part of the protein folding problem, the existence of similar substrings in diverse proteins is remarkable. Some scientists call such a substring a "conserved core", which echoes the claim that all proteins diversified from a common ancestor protein and that the similar pieces shared by two or more proteins are the substrings that resisted evolutionary pressure. Owing to naturally occurring errors (DNA fails to copy accurately) and external influences such as ultraviolet radiation, electromagnetic fields, and atomic radiation, protein-coding genes and proteins may change over time in response to mutations. The rate of these mutations is strongly correlated with the intensity of the environmental conditions, and it is not possible to estimate a constant rate as in the case of radioactive decay. There is also little evidence that the diversity of proteins relies on these mutations alone. For this reason we prefer the term "similar substrings". In this paper we focus on the relation between primary and secondary structure mismatches in substrings of length seventeen residues. We observe that the number of mismatches in the corresponding secondary structure substrings of the same length lags behind the number of primary mismatches. We construct a conditional probability landscape that represents the conditional probability of a certain number of secondary substring mismatches given the number of primary substring mismatches.
This landscape shows that even when 6-7 mismatches exist between two primary substrings of length 17 that belong to two different proteins, the probability of a full match between the corresponding secondary structure substrings is remarkably high. We downloaded the primary and secondary sequences of all 303,524 proteins in the Protein Data Bank (PDB). After eliminating duplicates and proteins shorter than 30 residues, we obtained a non-redundant database of 80,592 proteins. We developed a search algorithm, FIND-SIM, to find similar primary sequence substrings in a query protein and target proteins. We give some examples of full secondary structure matches between short substrings whose corresponding primary structure substrings have many mismatches.
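
The mismatch counting and the conditional probability landscape described above can be illustrated with a minimal Python sketch. It is not the FIND-SIM algorithm, whose details are not given here. It assumes ungapped window-to-window comparison, counts mismatches as a Hamming distance, and estimates the conditional probability from raw counts. The function names (`mismatches`, `window_pairs`, `conditional_landscape`) are hypothetical.

```python
from collections import Counter

K = 17  # substring length used in the paper


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("substrings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def window_pairs(query, target, k=K):
    """Yield (primary_mismatches, secondary_mismatches) for every pair of
    length-k windows, one from the query and one from the target.

    Each protein is a tuple (primary_sequence, secondary_sequence) of
    equal-length strings. Assumption: windows are compared without gaps.
    """
    q_seq, q_ss = query
    t_seq, t_ss = target
    for i in range(len(q_seq) - k + 1):
        for j in range(len(t_seq) - k + 1):
            yield (
                mismatches(q_seq[i:i + k], t_seq[j:j + k]),
                mismatches(q_ss[i:i + k], t_ss[j:j + k]),
            )


def conditional_landscape(pairs):
    """Estimate P(secondary mismatches = s | primary mismatches = p)
    as joint count divided by the count of all pairs with that p."""
    joint = Counter(pairs)
    marginal = Counter()
    for (p, _s), n in joint.items():
        marginal[p] += n
    return {(p, s): n / marginal[p] for (p, s), n in joint.items()}
```

With real data, each protein would be passed as a pair of equal-length strings, and an entry such as `landscape[(6, 0)]` would estimate the probability of a full secondary structure match given six primary mismatches.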


Keywords: protein amino acid sequence; secondary structure; FIND-SIM


Copyright (c) 2016 Faruk Berat Akcesme, Mehmet Can

ISSN 2233-1859

Digital Object Identifier DOI: 10.21533/scjournal

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License