Conformational Likeness - Tutorial

Last Update August 19, 1997




Contents

Overview
Methodology
Examples
Comparison to Other Methods


Overview

Conformational Likeness (CL) is a methodology for finding proteins with common 3-D motifs, where a motif is a complete polypeptide chain or a fragment of a polypeptide chain. Motifs are defined by a set of geometric, physicochemical and sequence-based properties. CL can be used in one of two modes:

  1. A complete database search, using a user-submitted motif or a motif from the PDB as a probe against a target database consisting of the complete PDB;
  2. A detailed analysis comparing one motif against another.

The methodology is differentiated from other methods by the variety of properties that can be used in the alignment and the speed with which it operates. The speed makes it Web accessible.

The Methodology - A Step by Step Qualitative Explanation

Step 1 Define likeness profiles for each structure in the PDB based upon pentamer values of:

A total of 495 properties are stored.

Step 2 Characterize these properties in a way that makes biological sense when querying:

  1. Local - local features of the pentapeptide (geometry, properties of amino acids, secondary structure)
  2. Neighbor - with respect to decamers on either side of the pentamer
  3. Environment - within a 25 Å radius
  4. Topology - with respect to centers of mass of neighbors and environment
  5. Protein - with respect to center of mass of the protein
  6. Feature - Environmental and static properties with respect to the sequence
  7. Sequence - PAM matrix (Pearson)
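As an illustration of the Environment group, the following minimal sketch collects the residues that lie within 25 Å of a pentamer. It assumes one representative coordinate per residue (for example the C-alpha atom) and measures distance from the unweighted centroid of the pentamer; neither choice is stated in this tutorial, and the function names are hypothetical.

```python
import math

def centroid(points):
    """Unweighted mean position of a list of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))

def environment_indices(coords, start, radius=25.0, size=5):
    """Indices of residues within `radius` of the pentamer starting at `start`.

    Assumption: the environment is measured from the pentamer centroid and
    the pentamer's own residues are excluded.
    """
    center = centroid(coords[start:start + size])
    return [
        i for i, p in enumerate(coords)
        if not (start <= i < start + size) and math.dist(center, p) <= radius
    ]

# Toy chain: 20 residues on a straight line, 3.8 angstroms apart.
coords = [(3.8 * i, 0.0, 0.0) for i in range(20)]
print(environment_indices(coords, start=0))
```

Properties of the residues returned by such a query could then be summarized to describe the surroundings of the pentamer.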

For efficiency, each query uses a subset of properties from the 7 groups.

At this point you have a conformational likeness database for all protein structures in the PDB. Our goal is to update this nightly so that it remains current with the primary source of structural data.

There are now two options:

  1. A detailed one-on-one comparison (Step 3) or
  2. A fast comparison against the whole database (Step 4).

Step 3 (for detailed one-on-one comparison)

Step 3a Calculate partial differences - done differently for different properties

Step 3b Convert partial differences to an absolute scale - a profile - by comparison to random pentamers.
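One plausible way to place a raw difference on an absolute scale is to rank it against a background of differences measured between random pentamer pairs. The sketch below is an assumption about how such a conversion could work, not the formula used by CL: it reports the fraction of background differences that are larger than the observed one, so a difference of zero maps to a likeness of 1.0. The function name and the toy background values are hypothetical.

```python
from bisect import bisect_right

def likeness_from_difference(diff, random_diffs_sorted):
    """Fraction of random-pentamer differences strictly larger than `diff`.

    `random_diffs_sorted` must be sorted in ascending order.
    """
    n = len(random_diffs_sorted)
    if n == 0:
        raise ValueError("need at least one background difference")
    # bisect_right counts the background values that are <= diff.
    not_larger = bisect_right(random_diffs_sorted, diff)
    return (n - not_larger) / n

# Toy background: differences observed between random pentamer pairs.
background = sorted([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
print(likeness_from_difference(0.0, background))  # identical pentamers
print(likeness_from_difference(2.0, background))  # a typical random pair
```

A rank-based scale like this makes properties with different units comparable, which is one reason to calibrate against random pentamers.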

Step 3c Use dynamic programming to align profiles
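The tutorial does not say which dynamic programming variant is used. As a minimal sketch, the code below applies a global (Needleman-Wunsch style) alignment with a constant gap penalty to a matrix of pairwise scores, where score[i][j] stands for the likeness between pentamer i of one structure and pentamer j of the other. The gap value and the function name are assumptions made for illustration.

```python
def align_profiles(score, gap=-0.5):
    """Globally align two profiles given a pairwise score matrix.

    Returns (total_score, pairs), where pairs lists the matched (i, j) positions.
    """
    n = len(score)
    m = len(score[0]) if n else 0
    # best[i][j]: best score aligning the first i positions of A with the first j of B.
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = i * gap
    for j in range(1, m + 1):
        best[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + score[i - 1][j - 1],  # match i with j
                best[i - 1][j] + gap,                      # skip a position in A
                best[i][j - 1] + gap,                      # skip a position in B
            )
    # Trace back from the corner to recover the matched positions.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j - 1] + score[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif best[i][j] == best[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best[n][m], pairs

# Toy matrix: A has 2 positions, B has 3; B's middle position has no partner.
total, pairs = align_profiles([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
print(total, pairs)
```

A local alignment variant would suit cases where only part of each structure is expected to match.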

Step 3d Display the results as:

Step 4 (for comparison against the whole database)

The methodology used depends on whether you are searching for a fragment or a complete polypeptide chain. For meaningful results (i.e. not too many and not too few hits), fragments demand a more exact match than whole polypeptide chains.

Step 4a (for protein fragments)

A fast search is performed for fragments of the same size whose likeness exceeds a user-defined likeness threshold between 0 and 1. The search uses only a subset of the properties.
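The fragment search can be pictured as a sliding window over each chain in the database. The sketch below is a simplified illustration under stated assumptions: every pentamer position is described by a short vector of properties already scaled to the range 0 to 1, and the likeness of two positions is taken as 1 minus their mean absolute difference. Both the scoring rule and the function names are hypothetical stand-ins for the actual CL properties.

```python
def position_likeness(p, q):
    """Likeness of two property vectors scaled to [0, 1]."""
    return 1.0 - sum(abs(a - b) for a, b in zip(p, q)) / len(p)

def search_fragments(probe, chain, threshold):
    """Return (start, likeness) for each window of `chain` matching `probe`.

    Windows have the same length as the probe. A window is reported when its
    mean likeness reaches the threshold; >= is used so that an exact match is
    still returned when the threshold is 1.0.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie between 0 and 1")
    width = len(probe)
    hits = []
    for start in range(len(chain) - width + 1):
        window = chain[start:start + width]
        score = sum(position_likeness(p, q) for p, q in zip(probe, window)) / width
        if score >= threshold:
            hits.append((start, score))
    return hits

probe = [[0.0, 1.0], [1.0, 0.0]]
chain = [[0.5, 0.5], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
print(search_fragments(probe, chain, threshold=0.8))
```

Raising the threshold toward 1 narrows the hit list to near-exact matches, which is the behaviour fragments require.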

Step 4b (for complete polypeptide chains)

This is more difficult since you would like to find matches when only part of the chain matches part of other chains in the database - you do not want common features lost because of parts of the chains that do not match. This introduces the idea of frequencies of particular conformational features within a given polypeptide chain. If the given chain has no alpha helix, i.e. a frequency of zero, this is a strong indicator for use in comparison. That is, while it will not distinguish between a beta barrel type structure and a beta sandwich type structure, it will reduce the number of possible structures significantly. The frequency of other properties can be used in a similar way.
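The frequency idea can be sketched as a cheap pre-filter applied before any detailed comparison. In the example below each chain is reduced to a list of per-pentamer labels (H for helix, E for strand, C for coil), and a candidate is kept only if its frequency for the selected features lies within a tolerance of the probe's. The label alphabet, the tolerance value and the function names are assumptions for illustration; CL may compute and compare frequencies differently.

```python
from collections import Counter

def feature_frequencies(labels):
    """Fraction of pentamers carrying each conformational label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}

def passes_prefilter(probe_freq, candidate_freq, features, tolerance=0.1):
    """True if the candidate's frequencies match the probe's for `features`."""
    return all(
        abs(probe_freq.get(f, 0.0) - candidate_freq.get(f, 0.0)) <= tolerance
        for f in features
    )

probe = feature_frequencies("EEEECCEEEE")      # all-beta probe, no helix
all_beta = feature_frequencies("EEECCCEEEE")   # also has no helix
mixed = feature_frequencies("HHHHHCCEEE")      # half helix

print(passes_prefilter(probe, all_beta, features=["H"]))
print(passes_prefilter(probe, mixed, features=["H"]))
```

As the text notes, a helix frequency of zero does not separate a beta barrel from a beta sandwich, but it removes every helical chain from further consideration.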

For both steps 4a and 4b, results are displayed as a list of entities (polypeptide chains for the most part), each with a likeness index. At the very least, the starting chain or fragment will be returned with a likeness index of 1.0.


Examples

Example 1

What structures are conformationally similar to the cAMP-dependent protein kinase (PDB code 1ATP)?

We have found that NeighDist (distances associated with the decamers preceding and following the pentamer) is a good parameter to use in finding structures with a similar overall topology. A database search with:

revealed the following 31 structures (based upon a June 1997 database):

1)  1.000 # 1ATP:E # $C-/AMP$-DEPENDENT PROTEIN KINASE (E.C.2.7.1.37) 
2)  0.856 # 2CPK:E # $C-/AMP$-DEPENDENT PROTEIN KINASE (E.C.2.7.1.37) 
3)  0.854 # 1YDR:E # MOL_ID: 1; MOLECULE: C-AMP-DEPENDENT PROTEIN KINA 
4)  0.838 # 1YDS:E # MOL_ID: 1; MOLECULE: C-AMP-DEPENDENT PROTEIN KINA 
5)  0.819 # 1APM:E # $C-/AMP$-DEPENDENT PROTEIN KINASE (E.C.2.7.1.37) 
6)  0.817 # 1YDT:E # MOL_ID: 1; MOLECULE: C-AMP-DEPENDENT PROTEIN KINA 
7)  0.750 # 1CDK:B # MOL_ID: 1; MOLECULE: CAMP-DEPENDENT PROTEIN KINAS 
8)  0.723 # 1CDK:A # MOL_ID: 1; MOLECULE: CAMP-DEPENDENT PROTEIN KINAS 
9)  0.518 # 1CMK:E # CAMP-DEPENDENT PROTEIN KINASE CATALYTIC SUBUNIT ( 
10) 0.511 # 1CSN:_ # MOLECULE: CASEIN KINASE-1; EC: 2.7.1.-; HETEROGEN 
11) 0.440 # 2CSN:_ # MOL_ID: 1; MOLECULE: CASEIN KINASE-1; CHAIN: NULL 
12) 0.421 # 1CTP:E # CAMP-DEPENDENT PROTEIN KINASE (E.C.2.7.1.37) (CAP 
13) 0.388 # 1BMF:E # MOL_ID: 1; MOLECULE: BOVINE MITOCHONDRIAL F1-ATPA 
14) 0.384 # 1GOL:_ # MOL_ID: 1; MOLECULE: EXTRACELLULAR REGULATED KINA 
15) 0.374 # 1CXT:A # MOL_ID: 1; MOLECULE: DIMETHYLSULFOXIDE REDUCTASE; 
16) 0.367 # 1BMF:D # MOL_ID: 1; MOLECULE: BOVINE MITOCHONDRIAL F1-ATPA 
17) 0.365 # 1EFR:E # MOL_ID: 1; MOLECULE: BOVINE MITOCHONDRIAL F1-ATPA 
18) 0.364 # 1FIN:A # MOL_ID: 1; MOLECULE: CYCLIN-DEPENDENT KINASE 2; C 
19) 0.363 # 1KOB:B # MOL_ID: 1; MOLECULE: TWITCHIN; CHAIN: A, B; FRAGM 
20) 0.362 # 1EFR:D # MOL_ID: 1; MOLECULE: BOVINE MITOCHONDRIAL F1-ATPA 
21) 0.360 # 1COW:D # MOL_ID: 1; MOLECULE: BOVINE MITOCHONDRIAL F1-ATPA 
22) 0.360 # 1CKJ:B # MOL_ID: 1; MOLECULE: RECOMBINANT CASEIN KINASE I 
23) 0.350 # 1COW:E # MOL_ID: 1; MOLECULE: BOVINE MITOCHONDRIAL F1-ATPA 
24) 0.349 # 1CXS:A # MOL_ID: 1; MOLECULE: DIMETHYLSULFOXIDE REDUCTASE; 
25) 0.343 # 1GPM:A # MOL_ID: 1; MOLECULE: GMP SYNTHETASE; CHAIN: A, B, 
26) 0.342 # 1KOB:A # MOL_ID: 1; MOLECULE: TWITCHIN; CHAIN: A, B; FRAGM 
27) 0.342 # 1TCO:A # MOL_ID: 1; MOLECULE: SERINE/THREONINE PHOSPHATASE 
28) 0.333 # 1JST:C # MOL_ID: 1; MOLECULE: CYCLIN-DEPENDENT KINASE-2; C 
29) 0.333 # 1FIN:C # MOL_ID: 1; MOLECULE: CYCLIN-DEPENDENT KINASE 2; C 
30) 0.332 # 1BMF:F # MOL_ID: 1; MOLECULE: BOVINE MITOCHONDRIAL F1-ATPA 
31) 0.325 # 1JST:A # MOL_ID: 1; MOLECULE: CYCLIN-DEPENDENT KINASE-2; C 

This is the list of structures that would be expected, and it corresponds to the list found with DALI, with two exceptions: 1IRK, the tyrosine receptor kinase, and 1PHK, the phosphorylase kinase. (Note that DALI did not detect the mitogen-activated kinase, 1GOL.)

Notice the difference in the likeness values of, say, 1ATP:E (1.000) and 1CTP:E (0.421). The difference reflects the shift between the open and closed conformations that occurs on substrate binding: 1ATP:E is in a closed conformation, whereas 1CTP:E is in an open conformation. This difference is clearly seen in the detailed alignment by conformational likeness using:

The stereo view clearly shows the displacement of the smaller upper lobe relative to the more stable larger lobe and illustrates the sensitivity of topological similarity to small changes in overall conformation.

Example 2

How sensitive is the method in detecting the similarity reported by Vriend and Sander (Proteins 11:52-58, 1991) between ferredoxin (2Fe-2S) and ubiquitin?

Ubiquitin is involved in protein breakdown via covalent conjugates, whereas ferredoxin is an electron carrier in the photoreduction of cytochrome c. That is, there is no apparent functional similarity and no significant sequence homology.

Here we use a local and topological comparison between the two proteins to detect a similarity. The cutoff is based upon pentapeptide likeness. The structure superposition is on the best fragment and not the whole structure.

The following indicates the strong structural homology that exists between these two proteins.

In fact, the structure alignment agrees closely with that of Vriend and Sander.

The structure superposition shown at the beginning of all CL pages illustrates the homology between these two structures.

Example 3

How good a job can we do of detecting an EF hand, the classic calcium-binding motif? Consider two well-known calcium-binding proteins:

Both have two calcium-binding domains, as shown in the following:

The approximate motifs are:

These intersections are quite clear on the following conformational likeness plot:

Comparison to Other Methods

This is reported in a Proteins paper currently in the review stage.