Protein Conformational Features

The following designations are systematically used to describe the geometrical features of pentapeptides:

A is the CA-atom of the 1st residue;
B is the CA-atom of the 2nd residue;
C is the CA-atom of the 3rd residue;
D is the CA-atom of the 4th residue;
E is the CA-atom of the 5th residue;


K is the center of mass of the 1st, 2nd and 3rd residues;
I is the center of mass of the 2nd, 3rd and 4th residues;
J is the center of mass of the 3rd, 4th and 5th residues;


M is the center of mass of the pentapeptide;
P is the center of mass of the protein analyzed;
L is the center of mass of the 10-peptide preceding the given pentapeptide;
R is the center of mass of the 10-peptide following the given pentapeptide;
G is the center of mass of all residues in a 25Å shell around the pentapeptide analyzed.
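
The centers of mass K, I, J and M can be sketched in a few lines of Python. This is only an illustration: it assumes each residue is represented by its CA-atom with equal mass (the text states this for K, I and J, but not for M), and the names centroid and pentapeptide_centers, as well as the coordinates, are made up for the example.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float, float]


def centroid(points: Sequence[Point]) -> Point:
    """Unweighted mean of 3D points (assumes equal mass per point)."""
    n = len(points)
    if n == 0:
        raise ValueError("centroid of an empty point set is undefined")
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def pentapeptide_centers(ca: Sequence[Point]) -> Dict[str, Point]:
    """Return K, I, J and M for five CA positions given in the order A..E."""
    if len(ca) != 5:
        raise ValueError("expected exactly five CA positions")
    return {
        "K": centroid(ca[0:3]),  # residues 1, 2, 3
        "I": centroid(ca[1:4]),  # residues 2, 3, 4
        "J": centroid(ca[2:5]),  # residues 3, 4, 5
        "M": centroid(ca),       # whole pentapeptide
    }


# Made-up CA coordinates (in Å), for illustration only.
ca = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0),
      (9.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
centers = pentapeptide_centers(ca)
```

If true centers of mass over all atoms are wanted, the same helper could be given atomic coordinates with mass weights instead of CA positions.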




Local conformational features of a pentapeptide take into account:

(a) the distances, {AC, AD, AE, BE, CE, AB, BC, BD, CD and DE} defined for the 5 CA-atoms, ABCDE, of a pentapeptide (here: XY is for the distance between X and Y);

(b) the distances, {MA, MB, MC, MD and ME}, between the center of mass, M, of the pentapeptide and each of the CA-atoms;

(c) all possible plane angles, {ABE, ACD, ACE, ADE, BCE, ABC, ABD, BCD, BDE, CDE}, defined by the pentapeptide ABCDE (here: XYZ is for the angle between vectors XY and YZ);

(d) all possible plane angles, {AMC, AMD, AME, BME, CME, AMB, BMC, BMD, CMD, DME}, at the center of mass, M, of the pentapeptide ABCDE;

(e) all possible dihedral angles, {ABCD, ABCE, ABDE, ACDE, BCDE}, defined for the pentapeptide ABCDE (here: XYZU is for the angle between the 1st plane defined by points XYZ and the 2nd plane defined by points YZU);

(f) all possible dihedral angles, {MBDA, MBDC, MBDE}, that are used to estimate deviations of the pentapeptide conformation from the plane defined by points M, B and D;

(g) torsion angles, {AMCE and BMCD}, that are used to estimate the twist of the pentapeptide around its symmetry axis defined by points M and C;

(h) phi angles for residues in the pentapeptide;

(i) psi angles for residues in the pentapeptide;

(j) exposure values for side chains of residues in the pentapeptide;

(k) polarity values for the side chains of residues in the pentapeptide;

(l) secondary structure values for residues in the pentapeptide defined by the Kabsch-Sander algorithm.
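
The geometric items (a), (c) and (e) above can be sketched as follows. This is a minimal illustration, not the original implementation. Two readings are assumptions: the plane angle XYZ is taken as the angle at the middle point Y between the directions Y-to-X and Y-to-Z, and the dihedral XYZU is taken as a signed angle in degrees. The function names and coordinates are hypothetical.

```python
import math
from itertools import combinations


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(dot(a, a))


def distance(x, y):
    """Euclidean distance XY."""
    return norm(sub(x, y))


def plane_angle(x, y, z):
    """Angle XYZ in degrees, measured at the middle point y (assumed reading)."""
    u, v = sub(x, y), sub(z, y)
    c = dot(u, v) / (norm(u) * norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))


def dihedral(x, y, z, u):
    """Signed angle in degrees between plane xyz and plane yzu."""
    b1, b2, b3 = sub(y, x), sub(z, y), sub(u, z)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    return math.degrees(math.atan2(norm(b2) * dot(b1, n2), dot(n1, n2)))


def local_features(ca):
    """Distances, plane angles and dihedrals for CA points in order A..E."""
    pts = dict(zip("ABCDE", ca))
    feats = {}
    for k, fn in ((2, distance), (3, plane_angle), (4, dihedral)):
        for names in combinations("ABCDE", k):
            feats["".join(names)] = fn(*(pts[n] for n in names))
    return feats


# Made-up, non-collinear CA coordinates (in Å), for illustration only.
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0),
      (0.0, 3.8, 0.0), (0.0, 3.8, 3.8)]
feats = local_features(ca)
```

Enumerating combinations of the labels in sequence order yields the 10 distances, 10 plane angles and 5 dihedral angles listed in (a), (c) and (e). The angles at M in (d), (f) and (g) could be obtained with the same helpers by passing M as one of the points.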



Sequence similarity by mutational matrix takes into account the standard PAM matrix (Dayhoff, 1972) of "accepted point mutations" that is used to compare amino acid sequences of aligned proteins. For pentapeptides, residue types at all 5 positions are used as a conformational feature. These features describe a "substitution ability" of side chains in the pentapeptide.

Properties of amino acids take into account physical, chemical and statistical features of amino acid residues for the pentapeptide analysed. These features are fully defined by the amino acid sequence and describe an "optimal conformation" of both main and side chains of a pentapeptide. Properties include:
  1. exposure, polarity, hydrophilicity, isoelectric point, volume, number of chemical bonds, molecular weight, Chou-Fasman alpha- and beta-structural coefficients;
  2. observed frequencies of residues in the protein analysed;
  3. observed frequencies of residues in a fragment of 25Å centered at the pentapeptide.
When applied to measuring protein similarity, these features can be combined.

Neighboring 10-peptide conformational features take into account the 10-peptides preceding and following the pentapeptide analysed. These features are used to characterize the orientation of the main chain of a pentapeptide relative to its nearest environment and to reflect the spatial positioning of this pentapeptide in the main chain of the protein analyzed. Each of the 10-mers is described by one point at its center of mass (L is the center of mass of the preceding 10-peptide; R is the center of mass of the following 10-peptide). The conformational features described here can be obtained by applying the same calculations as for the globular conformational features, but replacing the protein center of mass P with L and R, the centers of mass of the 10-peptides preceding and following the pentapeptide, respectively.
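
A possible way to locate the two flanking centers is sketched below, taking L for the preceding and R for the following 10-peptide as in the designations at the top. Several points are assumptions rather than statements from the text: residues are represented by their CA-atoms with equal mass, the 10-peptides are taken to be immediately adjacent to the pentapeptide, and a center is reported as None when fewer than 10 residues are available. The function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def centroid(points: Sequence[Point]) -> Point:
    """Unweighted mean of 3D points (assumes equal mass per point)."""
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def flanking_centers(
    ca_chain: Sequence[Point], start: int, flank: int = 10, length: int = 5
) -> Tuple[Optional[Point], Optional[Point]]:
    """Return (L, R) for the pentapeptide beginning at 0-based index start.

    L is the centroid of the flank residues just before the pentapeptide,
    R the centroid of the flank residues just after it. A value is None
    when the chain is too short on that side (assumed convention).
    """
    before = ca_chain[max(0, start - flank):start]
    after = ca_chain[start + length:start + length + flank]
    left = centroid(before) if len(before) == flank else None
    right = centroid(after) if len(after) == flank else None
    return left, right


# Made-up chain of 25 CA positions spaced 1 Å apart along the x axis.
ca_chain = [(float(i), 0.0, 0.0) for i in range(25)]
L, R = flanking_centers(ca_chain, start=10)
L_short, R_short = flanking_centers(ca_chain, start=3)
```

How pentapeptides near the chain termini are handled is not specified in the text; returning None is just one convention that keeps incomplete 10-peptides out of the feature set.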

Conformational features about a 25Å centroid take into account the residues of the protein that are located in a 25Å shell around the pentapeptide. These features are used to represent the location of the pentapeptide relative to its environment in the globular protein. All residues in the shell are described by their center of mass, G. The conformational features described here can be obtained by applying the same calculations as for the globular conformational features, but replacing the protein center of mass P with G.
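
One way to obtain G is sketched below. The text does not say exactly how the 25Å shell is measured, so this sketch assumes that a residue belongs to the shell when its CA-atom lies within 25 Å of the pentapeptide center of mass M, that the pentapeptide's own residues are included, and that all residues have equal mass. The name shell_center and the coordinates are made up.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def shell_center(ca_chain: Sequence[Point], m: Point, radius: float = 25.0) -> Point:
    """Centroid G of all CA positions within radius of m (assumed shell rule)."""
    inside = [p for p in ca_chain if math.dist(p, m) <= radius]
    if not inside:
        raise ValueError("no residues found inside the shell")
    n = len(inside)
    return (
        sum(p[0] for p in inside) / n,
        sum(p[1] for p in inside) / n,
        sum(p[2] for p in inside) / n,
    )


# Made-up coordinates (in Å): three residues inside the shell, one outside.
m = (0.0, 0.0, 0.0)
ca_chain = [(10.0, 0.0, 0.0), (-10.0, 0.0, 0.0), (0.0, 20.0, 0.0), (0.0, 0.0, 30.0)]
G = shell_center(ca_chain, m)
```

If the shell is instead meant to be measured from any atom of the pentapeptide, only the membership test in the list comprehension would need to change.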

Topological conformational features take into account the above-mentioned centers of mass, {L, M, P, G, R}, and three additional centers of mass. These features are defined to take into account some of the topological properties of the main-chain fragment centered at the pentapeptide. The following three additional centers of mass are introduced here:
  1. K defined by the 3 CA-atoms A, B and C of the pentapeptide;
  2. I defined by the 3 CA-atoms B, C and D of the pentapeptide;
  3. J defined by the 3 CA-atoms C, D and E of the pentapeptide.

Analogous to the local conformational features of the pentapeptide, where the pentapeptide ABCDE and its center of mass M are used, topological conformational features are calculated for 2 pseudo-pentapeptides and centers of mass chosen as follows:
  1. pseudo-pentapeptide LKMJR and center of mass G;
  2. pseudo-pentapeptide MLIRG and center of mass P.
All conformational features provided for local conformational features of the pentapeptide are likewise calculated for topological conformational features.
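
The substitution of pseudo-pentapeptides for ABCDE can be sketched as below. The sketch only assembles the two point sequences with their centers and computes the center-to-point distances (the analogue of the distances from M to each CA-atom); the function names and all coordinates are hypothetical and serve only as an illustration.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def pseudo_pentapeptides(c: Dict[str, Point]) -> Dict[str, Tuple[List[Point], Point]]:
    """Map each pseudo-pentapeptide name to (its five points, its center).

    c maps the one-letter labels L, K, I, J, M, R, G and P to 3D points.
    """
    return {
        "LKMJR": ([c[n] for n in "LKMJR"], c["G"]),
        "MLIRG": ([c[n] for n in "MLIRG"], c["P"]),
    }


def center_distances(points: List[Point], center: Point) -> List[float]:
    """Distances from the center to each of the five points."""
    return [math.dist(p, center) for p in points]


# Made-up centers of mass (in Å), for illustration only.
c = {
    "L": (-10.0, 0.0, 0.0), "K": (-3.0, 0.0, 0.0), "I": (0.0, 1.0, 0.0),
    "M": (0.0, 0.0, 0.0), "J": (3.0, 0.0, 0.0), "R": (10.0, 0.0, 0.0),
    "G": (0.0, 4.0, 0.0), "P": (0.0, 0.0, 5.0),
}
pseudo = pseudo_pentapeptides(c)
points, center = pseudo["LKMJR"]
dists = center_distances(points, center)
```

Any routine written for the five CA-atoms and M can then be reused by passing these five points and the corresponding center in place of ABCDE and M.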

Globular conformational features take into account the center of mass, P, of the protein analysed. This feature describes how a pentapeptide is located with respect to the center of mass of the protein. The following conformational features are used:
  1. the distances, {PA, PB, PC, PD and PE}, between the center of mass, P, and each of the CA-atoms of the pentapeptide;
  2. the distance function d2(XY,ZU) = (d(XY) - d(ZU)) / (d(XY) + d(ZU)) (where d(XY) is the Euclidean distance between X and Y) applied to the PX and MX distances defined above (here X is for A, B, C, D or E);
  3. distance function d2 (see 2.) applied to the distances PX and PM (from CA-atoms to the center of mass of both the protein and the pentapeptide analysed; here X is for A, B, C, D or E);
  4. all possible plane angles, {APC, APD, APE, BPE, CPE, APB, BPC, BPD, CPD, DPE}, at the center of mass of the protein, P, and the CA-atoms of the pentapeptide ABCDE;
  5. dihedral angles, {PABE, PACD, PACE, PADE, PBCE, PABC, PABD, PBCD, PBDE, PCDE, APMC, APMD, APME, BPME, CPME, APMB, BPMC, BPMD, CPMD, DPME, PACM, PADM, PAEM, PBEM, PCEM, PABM, PBCM, PBDM, PCDM, PDEM} defined by combinations of four points from the following: protein center of mass P, pentapeptide center of mass M and CA-atoms A, B, C, D, E.
  6. plane angles defined by the vector P→M between the protein's center of mass and the pentapeptide and by each of the vectors {A→C, A→D, A→E, B→E, C→E, A→B, B→C, B→D, C→D, D→E}.
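
Items 1 to 3 of this list can be sketched as follows, reading the distance function as d2(XY,ZU) = (d(XY) - d(ZU)) / (d(XY) + d(ZU)). The function and key names are hypothetical, the coordinates are made up, and returning 0 when both distances are zero is an assumption, since the text does not cover that case.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float, float]


def d2(d_xy: float, d_zu: float) -> float:
    """Normalized difference of two distances, in the range [-1, 1]."""
    total = d_xy + d_zu
    if total == 0.0:
        return 0.0  # assumed convention for two zero distances
    return (d_xy - d_zu) / total


def globular_distance_features(
    ca: Sequence[Point], m: Point, p: Point
) -> Dict[str, float]:
    """Distances PX plus d2(PX, MX) and d2(PX, PM) for X in A..E."""
    feats: Dict[str, float] = {}
    pm = math.dist(p, m)
    for name, x in zip("ABCDE", ca):
        px = math.dist(p, x)
        mx = math.dist(m, x)
        feats["P" + name] = px                          # item 1
        feats["d2(P%s,M%s)" % (name, name)] = d2(px, mx)  # item 2
        feats["d2(P%s,PM)" % name] = d2(px, pm)           # item 3
    return feats


# Made-up coordinates (in Å): P at the origin, M 4 Å above it.
p = (0.0, 0.0, 0.0)
m = (0.0, 0.0, 4.0)
ca = [(3.0, 0.0, 4.0), (0.0, 3.0, 4.0), (0.0, 0.0, 4.0),
      (0.0, -3.0, 4.0), (-3.0, 0.0, 4.0)]
feats = globular_distance_features(ca, m, p)
```

Because d2 divides the difference by the sum, it stays between -1 and 1 regardless of protein size, which makes values from small and large proteins comparable.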