Molecular Evolution of protein COMplexes

Identifying the properties that influence protein evolutionary rate is an important objective in biology. Structural determinants deserve particular attention, because different regions within a protein can evolve at different rates. This is particularly relevant for polypeptides that are part of large multimeric complexes. We have therefore developed a new Perl program, MECOM, which combines sequence analyses with structural information to test whether different subsets of residues (defined according to structural criteria) evolve at different paces. The workflow implemented by MECOM is as follows:
  1. First, for each subunit of a given protein complex, structural data are processed so that each residue is labelled as “Exposed” or “Buried” depending on its accessibility to the solvent. Likewise, each residue is labelled as “Contact” or “NonContact” depending on its physical proximity to residues from a different chain. The identity of the contacted residue (and its chain) is retrieved and can be used to further characterize the contacting residue. For instance, in the cytochrome c oxidase complex, which is formed by nDNA- and mtDNA-encoded subunits, one can distinguish the residues that contact mtDNA-encoded amino acids from those that interact with nDNA-encoded residues.
  2. Second, once each residue has been labelled, the program sorts the codons of a nucleotide multiple sequence alignment provided by the user and returns one file per subset (for instance, “Exposed NonContact”, “Buried NonContact”, etc.), each containing the multiple sequence alignment for that subset.

    Figure 1: An example illustrating the workflow implemented by MECOM. Each residue of COX I (an mtDNA-encoded subunit of cytochrome c oxidase) is classified into one of six different categories or subsets. Afterwards, new multiple alignments are built following that classification.

  3. Third, using these alignment subsets, MECOM calls the program ‘yn00’ from the PAML package to calculate, among other statistics, the synonymous (dS) and nonsynonymous (dN) divergences for all the pairwise comparisons within a subset. The sum of these nonsynonymous divergences over all the pairwise comparisons is denoted as ΣdN[i], where i indicates the subset considered.
  4. Finally, MECOM calculates the so-called interaction ratio, ΣdN[i]/ΣdN[j], and runs a Z-test to assess the significance of any departure from 1.
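
The codon sorting of step 2 and the ratio of step 4 can be illustrated with a minimal Python sketch. It is not MECOM's Perl code: the function names `sort_codons` and `interaction_ratio`, the toy sequences and the dN values are all hypothetical. It assumes an in-frame alignment with exactly one label per codon column, and it takes the pairwise dN values as given rather than calling yn00. The Z-test is left out because it needs a variance estimate for the ratio that is not described here.

```python
def sort_codons(alignment, labels):
    """Split a codon-aligned nucleotide alignment into one sub-alignment per label.

    alignment: dict mapping sequence name -> in-frame nucleotide string
    labels:    list with one structural label per codon column
               (assumed to be already mapped onto the alignment columns)
    """
    expected_length = 3 * len(labels)
    if any(len(seq) != expected_length for seq in alignment.values()):
        raise ValueError("each sequence must contain exactly one codon per label")

    subsets = {}
    for name, seq in alignment.items():
        for i, label in enumerate(labels):
            codon = seq[3 * i:3 * i + 3]
            # Collect the codons of this sequence that carry this label.
            subsets.setdefault(label, {}).setdefault(name, []).append(codon)

    # Join the collected codons back into one sequence per name and label.
    return {
        label: {name: "".join(codons) for name, codons in seqs.items()}
        for label, seqs in subsets.items()
    }


def interaction_ratio(dn_i, dn_j):
    """Return sum(dN[i]) / sum(dN[j]) from two lists of pairwise dN values."""
    total_j = sum(dn_j)
    if total_j == 0:
        raise ValueError("the ratio is undefined when the sum of dN[j] is zero")
    return sum(dn_i) / total_j


# Toy data: two sequences of four codons each, two hypothetical subsets.
alignment = {"seqA": "ATGGCCTTTAAA", "seqB": "ATGGCTTTCAAG"}
labels = ["Buried NonContact", "Exposed Contact",
          "Buried NonContact", "Exposed Contact"]
subsets = sort_codons(alignment, labels)

# Made-up pairwise dN values for two subsets i and j.
ratio = interaction_ratio([0.02, 0.03, 0.05], [0.01, 0.02, 0.02])
```

Here `subsets["Buried NonContact"]` holds the first and third codons of each sequence, and `subsets["Exposed Contact"]` the second and fourth. Each sub-alignment keeps the original sequence names, so it could be written to its own file and analysed separately.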

As open source software, this program can be improved through collaborative development. Other structural, evolutionary and statistical analyses could easily be implemented to gain deeper insights into the molecular evolution of protein complexes.