User Manual

MECOM is a command line program. It can be launched just by typing mecom in the command line. However, several parameters are required in order to carry out a proper analysis.

Once MECOM is correctly installed, this manual can be obtained by typing:

$ man mecom

You can also obtain a summarized help by typing:

$ mecom --help

General usage

$ mecom [--pdb <pdbfile> --contactfile <strfile>] --chain <chainid> --alignment <msafile>
[ --proximityth <float> --exposureth <float> --exposuretherror <float> --informat <msaformat>
--oformat <msaformat> --gc <int> --ocontact <filepath> --report <htmpath> --struct]

Options summary

Command line option Description
--help Display a summarized help document
--pdb [1] A valid PDB file path
--contactfile [1] A valid *.str file path. See Contact File section for further explanation.
--alignment (*) A valid multiple sequence alignment (MSA) file path
--chain (*) Chain ID as annotated in the PDB file
--ocontact A valid file path where output structural analysis will be written (default 'data.str'). See Contact File section for further explanation.
--exposureth Exposure threshold. The value used to distinguish between exposed and buried residues (default 0.05)
--exposuretherror Exposure threshold margin (default 0)
--proximityth The value in Angstroms for the maximum distance between two atoms to be considered as contact pair (default 4)
--informat File format for multiple alignment provided by the user (default 'fasta'). See MSA Valid format section for a complete list of readable MSA formats
--oformat File format for multiple alignment retrieved by MECOM (default 'clustalw'). See MSA Valid format section for a complete list of readable MSA formats
--gc Genetic code for sequences within the input alignment (default 0). See Genetic codes section for a complete list of genetic codes available
--report File path where the program will write an HTML report with the results (default ./report.html)
--struct Carry out just the structural analysis

(*) Required arguments
[1] Just one of them is required

Data preparation

Before running MECOM, input data (PDB or MSA files) must meet certain criteria:

  1. For PDB files: chain IDs must be unique. That is, in PDBs with more than one atomic model, such as homo-dimers, each chain must be uniquely identified. If necessary, the user may edit the PDB file to satisfy this requirement.
  2. For MSA files: the alignment must be ungapped. Unexpected errors may occur if the MSA file contains gaps.
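The second requirement can be checked before launching MECOM. The following Python sketch is not part of MECOM; it assumes the alignment is in FASTA format and that gaps are written as '-' or '.', and the function names are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {sequence name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            # Fall back to a positional name if the header line is empty
            name = header[0] if header else "seq%d" % (len(records) + 1)
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def find_gapped_sequences(records, gap_chars="-."):
    """Return the names of the sequences that contain a gap character.

    The gap characters are an assumption; adjust them to your alignment.
    """
    return [name for name, seq in records.items()
            if any(ch in gap_chars for ch in seq)]


example = """>seq1
ATGGCCAAA
>seq2
ATG---AAA
"""
gapped = find_gapped_sequences(read_fasta(example))
print("Sequences with gaps:", gapped)
```

If the returned list is not empty, the gapped columns should be removed from the alignment before it is passed to --alignment.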

Contact file (*.str)

The so-called "Contact file", which usually uses the extension .str, is a plain text file that contains a table with the results of the structural analysis carried out during the first step of the program. It contains the information regarding which subunits and residues are involved in intermolecular contacts, as well as information about exposure and residue type.

Example:

Raw Table for subunit M

ChainID ChainID2 Res num. AA AA2 Contact (th=4) Exposition (th=0.05)
M A 1 I V(A)|K(A)|D(A)|T(A) 1 1
M A 2 T V(A)|L(A)|T(A) 1 1
M A 3 A L(A)|T(A)|E(A) 1 1
M A|D 4 K S(D)|E(A)|R(A)|L(A) 1 1
M -- 5 P - 0 1
M A 6 A K(A)|S(A) 1 1
M A 7 K K(A)|S(A)|E(A) 1 1
M A|L 8 T S(A)|K(A)|A(A)|K(L) 1 1
  • ChainID: The identity of the subunit currently being analysed
  • ChainID2: The identity of the subunit (or subunits) being contacted
  • Res num.: The PDB annotated residue number
  • AA: Contacting residue
  • AA2: Contacted residue(s) and, in brackets, the corresponding chain(s)
  • Contact (th=prox. th.): If the value is 1, the current residue is in close proximity (< --proximityth) to another subunit; the value is 0 in any other case
  • Exposition (th=exp. th.): If the value is 1, the current residue is exposed (> --exposureth); the value is 0 if the residue is buried

This information is used as a conventional database. MECOM extracts subsets of rows from this table in order to build multiple sequence alignments.

MECOM writes this file to the path specified by the argument --ocontact, or to data.str by default.

For subsequent analyses, the user may wish to use this contact file, instead of the PDB file, as input. In this way, the most computationally demanding step is bypassed.
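For users who want to inspect a contact file outside MECOM, the following Python sketch reads the data rows of the table shown in the example above. It is not MECOM's own parser: it assumes that the columns are separated by whitespace, as in the example, and the names ContactRow and parse_contact_rows are hypothetical.

```python
from collections import namedtuple

ContactRow = namedtuple(
    "ContactRow",
    "chain partner_chains resnum aa partner_residues contact exposed",
)


def parse_contact_rows(text):
    """Parse the data rows of a contact table, skipping titles and headers."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A data row is assumed to have exactly seven whitespace-separated fields
        if len(fields) != 7:
            continue
        chain, chain2, resnum, aa, aa2, contact, exposed = fields
        try:
            resnum = int(resnum)
        except ValueError:
            continue  # not a data row (or a residue number with a suffix)
        if contact not in ("0", "1") or exposed not in ("0", "1"):
            continue
        # "--" and "-" mark residues without any contact
        partners = [] if chain2 == "--" else chain2.split("|")
        residues = [] if aa2 == "-" else aa2.split("|")
        rows.append(ContactRow(chain, partners, resnum, aa, residues,
                               contact == "1", exposed == "1"))
    return rows


example = """Raw Table for subunit M
ChainID ChainID2 Res num. AA AA2 Contact (th=4) Exposition (th=0.05)
M A|D 4 K S(D)|E(A)|R(A)|L(A) 1 1
M -- 5 P - 0 1
"""
rows = parse_contact_rows(example)
for row in rows:
    print(row.resnum, row.aa, row.partner_chains, row.contact, row.exposed)
```

Each parsed row keeps the contacted chains and residues as lists, so that residues can be filtered by category (for example, exposed and in contact with chain D).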

Genetic codes

There are 11 different genetic codes available, corresponding to the transl_table qualifier of GenBank. As the value for the argument --gc, the user must provide one of the following integers to specify the genetic code used to translate DNA alignments. The default value is 0 (Standard).

Value Genetic code
0 Standard
1 Vertebrate mitochondrial
2 Yeast mitochondrial
3 Mold mitochondrial
4 Invertebrate mitochondrial
5 Ciliate nuclear
6 Echinoderm mitochondrial
7 Euplotid mitochondrial
8 Alternative yeast nuclear
9 Ascidian mitochondrial
10 Blepharisma nuclear

If the selected genetic code does not correspond to the origin of the user-provided MSA, stop codons may be introduced during translation. If that occurs, the program will not work correctly and an unexpected error will be raised.
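A quick way to detect this problem in advance is to look for in-frame stop codons under the chosen code. The Python sketch below is not part of MECOM and covers only the values 0 and 1; the stop codon sets are the usual ones for the standard and vertebrate mitochondrial codes, and the function name is hypothetical.

```python
# Stop codons for two of the genetic codes listed above,
# keyed by the value that would be passed to --gc.
STOP_CODONS = {
    0: {"TAA", "TAG", "TGA"},         # Standard
    1: {"TAA", "TAG", "AGA", "AGG"},  # Vertebrate mitochondrial
}


def find_stop_codons(seq, gc):
    """Return the 1-based codon positions of in-frame stop codons in seq."""
    stops = STOP_CODONS[gc]
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing incomplete codon
    return [i // 3 + 1
            for i in range(0, usable, 3)
            if seq[i:i + 3] in stops]


# TGA is a stop codon in the standard code but not in the
# vertebrate mitochondrial code.
print(find_stop_codons("ATGTGAGCC", gc=0))
print(find_stop_codons("ATGTGAGCC", gc=1))
```

A sequence that reports stop codons with --gc 0 but none with --gc 1 is a strong hint that the alignment is mtDNA-encoded and that the option --gc should be changed accordingly.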

MSA Valid formats

Through the arguments --informat and/or --oformat, the user must give a valid MSA format (see above). The valid MSA formats are listed below:

Format Description
bl2seq Bl2seq Blast output
clustalw clustalw (.aln) format
emboss EMBOSS water and needle format
fasta FASTA format
maf Multiple Alignment Format
mase mase (seaview) format
mega MEGA format
meme MEME format
msf msf (GCG) format
nexus Swofford et al NEXUS format
pfam Pfam sequence alignment format
phylip Felsenstein PHYLIP format
prodom prodom (protein domain) format
psi PSI-BLAST format
selex selex (hmmer) format
stockholm stockholm format

Specifically, mase, stockholm and prodom have only been implemented for input.
If no format is specified and a filename is given, then the module will attempt to deduce the format from the filename suffix. If this is unsuccessful, a fasta format is assumed.

The format name is case insensitive; FASTA, Fasta and fasta are all treated equivalently.
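The rules of this section (valid names, input-only formats and case insensitivity) can be summarised in a small validation routine. The Python sketch below only restates the table and notes above; it is not the code used by MECOM or Bioperl, and the function name is hypothetical.

```python
VALID_FORMATS = {
    "bl2seq", "clustalw", "emboss", "fasta", "maf", "mase", "mega", "meme",
    "msf", "nexus", "pfam", "phylip", "prodom", "psi", "selex", "stockholm",
}
# Formats described above as implemented for input only
INPUT_ONLY_FORMATS = {"mase", "stockholm", "prodom"}


def normalise_format(name, for_output=False):
    """Return the lower-case format name, or raise ValueError if unusable."""
    fmt = name.strip().lower()  # format names are case insensitive
    if fmt not in VALID_FORMATS:
        raise ValueError("unknown MSA format: %r" % name)
    if for_output and fmt in INPUT_ONLY_FORMATS:
        raise ValueError("format %r is available for input only" % fmt)
    return fmt


print(normalise_format("FASTA"))
print(normalise_format("Clustalw", for_output=True))
```

With for_output=True the routine mirrors the restriction that applies to --oformat; with the default it mirrors --informat.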

Single subunit analysis examples

  • The workflow for a single subunit analysis is carried out as explained on the home page. The simplest way to launch it, using the datasets provided as examples here, is to leave all the optional values at their defaults and provide only the required arguments:

    $ mecom --pdb 2OCC.pdb --chain M --alignment ChainM_alignment.fas

    In this example, MECOM will carry out the analysis for the subunit M (COX 8B) from the cytochrome c oxidase complex (also referred to as complex IV).

  • In complex IV of the respiratory chain, the subunits A, B and C are encoded by the mitochondrial genome. Thus, the option --gc, whose default value is 0 (Standard), must be set to 1 (Vertebrate mitochondrial) in the case of the example alignments. To analyse a mtDNA-encoded subunit, the command line instruction must be as follows:

    $ mecom --pdb 2OCC.pdb --chain A --alignment ChainA_alignment.fas --gc 1

  • The execution of the previous examples may be somewhat slow because the structural analyses involved are computationally demanding. However, once these calculations have been carried out, a new file with the structural results is written to the path specified by the option --ocontact (default data.str). Thus, for subsequent analyses, this file can be provided by the user as an input file through the option --contactfile, which makes the analysis faster:

    $ mecom --pdb 2OCC.pdb --contactfile data_for_chain_A.str --chain A --alignment ChainA_alignment.fas --gc 1

    In this way, the option --pdb becomes optional.

  • Another useful option is --contactwith. This option restricts the analysis to interactions with residues belonging to the specified chains. For example, if the user wants to ignore the contacts of chain A with the other mtDNA-encoded subunits (B and C), and restrict the analysis to contacts with residues from other chains, then the user can specify the chains to be included in the analysis. For instance, to analyse the contacts of chain A with nDNA-encoded chains, we must type:

    $ mecom --pdb 2OCC.pdb --contactfile data_for_chain_A.str --chain A --alignment ChainA_alignment.fas --gc 1 --contactwith "D E F G H I J K L M Q R S T U V W X Y Z"

    This command ignores chains A, B, C, N, O and P, which are encoded by the mitochondrial genome.
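Typing a long chain list by hand is error-prone. As a minimal Python sketch (not part of MECOM), the value for --contactwith in the example above can be generated by removing the unwanted chains from the full set of chain identifiers. The sketch assumes that the chains of the structure are labelled A to Z, as in the example; the function name is hypothetical.

```python
import string


def chains_excluding(excluded, all_chains=string.ascii_uppercase):
    """Return a space-separated list of all chains except the excluded ones."""
    excluded = set(excluded)
    return " ".join(c for c in all_chains if c not in excluded)


# Exclude the mtDNA-encoded chains A, B, C, N, O and P
contactwith = chains_excluding("ABCNOP")
print(contactwith)
# prints: D E F G H I J K L M Q R S T U V W X Y Z
```

The resulting string can be pasted, between quotes, after the option --contactwith.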

Multiple subunits analysis example

MECOM implements a Bioperl method to concatenate alignments. Thus, the program can carry out the evolutionary analysis of several subunits simultaneously.

  • The following example addresses the evolutionary behaviour of the mitochondrial-encoded subunits as a whole. The contacts with nDNA-encoded residues are analysed:

    $ mecom --pdb 2OCC.pdb --contactfile data_for_chains_ABC.str --chain "A B C" --alignment "ChainA_alignment.fas ChainB_alignment.fas ChainC_alignment.fas" --gc 1 --contactwith "D E F G H I J K L M Q R S T U V W X Y Z" --report reportABC_nu.html

To avoid problems during the execution of the process, it is important to note the following: i) the order of the values given between quotes to the options --chain and --alignment must not be altered, so that each chain identifier corresponds to its MSA file; ii) the same number of chain identifiers and MSA files must be provided. In this example, the results will be written to the file reportABC_nu.html.
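Both conditions can be enforced by building the command from paired lists. The following Python sketch is not part of MECOM; it only assembles the options documented in this manual, and the function name and its parameters are hypothetical.

```python
import shlex


def build_mecom_command(chains, alignments, pdb=None, contactfile=None,
                        gc=None, contactwith=None, report=None):
    """Build the mecom argument list, keeping chains and MSA files paired."""
    if len(chains) != len(alignments):
        raise ValueError("the number of chains and MSA files must be equal")
    if pdb is None and contactfile is None:
        raise ValueError("either a PDB file or a contact file is required")
    args = ["mecom"]
    if pdb is not None:
        args += ["--pdb", pdb]
    if contactfile is not None:
        args += ["--contactfile", contactfile]
    # Chains and alignments are joined in the same order
    args += ["--chain", " ".join(chains),
             "--alignment", " ".join(alignments)]
    if gc is not None:
        args += ["--gc", str(gc)]
    if contactwith:
        args += ["--contactwith", " ".join(contactwith)]
    if report is not None:
        args += ["--report", report]
    return args


pairs = [("A", "ChainA_alignment.fas"),
         ("B", "ChainB_alignment.fas"),
         ("C", "ChainC_alignment.fas")]
args = build_mecom_command(
    chains=[chain for chain, _ in pairs],
    alignments=[msa for _, msa in pairs],
    pdb="2OCC.pdb",
    contactfile="data_for_chains_ABC.str",
    gc=1,
    contactwith=list("DEFGHIJKLMQRSTUVWXYZ"),
    report="reportABC_nu.html",
)
# shlex.join (Python 3.8 or later) quotes the multi-valued arguments
print(shlex.join(args))
```

Keeping each chain next to its MSA file in a single list of pairs makes it hard to reorder one option without the other.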


Output

This program provides multiple output files in order to give a detailed view of each step. Four different classes of files are created after running:

  1. Structural file: A file (usually with the extension *.str) containing the results of the structural analysis (surface exposure and residue proximity). Its path can be set by the option --ocontact
  2. Sub-alignments: Several MSA files, one for each existing category, in the format specified by the option --informat (default fasta)
  3. Evolutionary results: Several files, one for each existing category, with the extension *.dat, which contain the results of the evolutionary analysis carried out by PAML (yn00)
  4. HTML report: An HTML file with a summary of the input data and parameters, the paths to the newly created files, statistical results and the list of codon positions corresponding to each category. The path of this HTML output file can be set by the option --report

Contact with authors

Any feedback from users will be very welcome. For this purpose, a contact form can be found here. The authors will take every question and suggestion into account.