Pseudo-symmetry in SCOP domains

Introduction

Proteins are highly symmetric molecules. This observation goes back to the early days of crystallography, when hemoglobin was found to be built from symmetric pairs of subunits. In the four decades since, more than 70,000 protein structures have been solved, many of which form symmetric oligomeric complexes. Symmetry has proved important for understanding protein evolution [1], DNA binding [2], allosteric regulation [3,4], and cooperative enzyme effects [5]. Symmetry is relatively easy to observe when it is part of the quaternary structure, where it reflects the duplication of polypeptide chains.

Method

The CE algorithm has been described elsewhere [6,7]. Here we apply two simple modifications to CE. The first enables the detection of internal symmetry and is applicable to other structure alignment algorithms as well. When a set of Cα atoms is aligned against itself, any alignment algorithm finds the obvious result: an exact match of every residue with itself. To detect internal similarity, this trivial match must be prevented, which can be achieved by masking the main diagonal of the alignment matrix. Here a window of size W (by default 8 residues) along the diagonal is masked with unfavourable scores. This forces the alignment onto alternative paths and yields the best non-trivial similarity of the structure to itself. The second modification is the detection of circular permutations. This can be done by duplicating the alignment matrix and searching for paths that cross the duplication boundary, in a fashion analogous to [8].
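The two modifications can be illustrated with a minimal Python sketch. It is not the CE-Symm implementation. The function names (mask_diagonal, duplicate_columns, best_offset) are hypothetical, the band is assumed to cover |i - j| < W, the toy score matrix is built from letters rather than Cα geometry, and a plain ungapped diagonal scan stands in for CE's path search.

```python
NEG_INF = float("-inf")


def mask_diagonal(scores, window=8, penalty=NEG_INF):
    """Return a copy of a square self-alignment score matrix in which the
    band |i - j| < window is set to `penalty` (assumed band definition),
    so the trivial identity alignment can no longer be chosen."""
    n = len(scores)
    return [
        [penalty if abs(i - j) < window else scores[i][j] for j in range(n)]
        for i in range(n)
    ]


def duplicate_columns(scores):
    """Return an n x 2n matrix whose right half repeats the left half, so a
    path can run past column n - 1 and wrap around to the chain start."""
    return [row + row for row in scores]


def best_offset(dup, n):
    """Ungapped stand-in for the path search: score every diagonal offset
    k = 1 .. n - 1 of the duplicated matrix and return (offset, score)."""
    totals = [(k, sum(dup[i][i + k] for i in range(n))) for k in range(1, n)]
    return max(totals, key=lambda item: item[1])


# Toy "structure": two copies of the motif ABC, scored 1 for equal letters.
chain = "ABCABC"
n = len(chain)
raw = [[1 if a == b else 0 for b in chain] for a in chain]

masked = mask_diagonal(raw, window=2)  # small window for the toy example
dup = duplicate_columns(masked)        # masking first also masks the copy
offset, score = best_offset(dup, n)
print(offset, score)                   # prints: 3 6
```

In this toy case the best diagonal starts at offset 3 and continues across the duplication boundary, so all six positions are matched against their repeat. Masking before duplicating means the second copy of the identity diagonal is masked as well.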

[1] Lee and Blaber. PNAS (2011) vol. 108 (1) pp. 126-30
[2] Juo et al. J Mol Biol (1996) vol. 261 (2) pp. 239-54
[3] Monod et al. J Mol Biol (1965) vol. 12 pp. 88-118
[4] Changeux and Edelstein. Science (2005) vol. 308 (5727) pp. 1424-8
[5] Goodsell and Olson. Annu Rev Biophys Biomol Struct (2000) vol. 29 pp. 105-53
[6] Shindyalov and Bourne. Protein Eng (1998) vol. 11 (9) pp. 739-47
[7] Prlic et al. Bioinformatics (2010) vol. 26 (23) pp. 2983-5
[8] Uliel et al. Bioinformatics (1999) vol. 15 (11) pp. 930-6

Results

Applied to the superfamilies in SCOPe 2.0, CE-Symm gives the following results:
SCOP class               Number of superfamilies   % symmetric
Overall                  1766                      18%
Class a - alpha          503                       18.5%
Class b - beta           354                       24.6%
Class c - alpha/beta     244                       16.8%
Class d - alpha+beta     549                       14.3%
membrane                 108                       23.8%