LcaMap: Simultaneous Identification of Duplications, Losses and Lateral Gene Transfers


This website provides a software package, called LcaMap, for the simultaneous identification of duplications, losses, and lateral gene transfers. LcaMap takes as input a gene tree G, a species tree S, and the costs of a lateral gene transfer, a gene duplication, and a gene loss. It outputs all minimum-cost LCA-reconciliations between G and S.
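The page does not spell out how the cost of a reconciliation is computed. Assuming the additive cost model that is usual for duplication-transfer-loss reconciliation, the cost of a reconciliation M would be the event costs weighted by how often each event occurs. The symbols below are introduced here only for illustration: c_T, c_D and c_L are the three input costs, T(M) and D(M) are the transfer edges and duplication vertices induced by M, and \ell(M) is the number of losses.

```latex
\mathrm{cost}(M) = c_T \, \lvert T(M) \rvert + c_D \, \lvert D(M) \rvert + c_L \, \ell(M)
```

Under this assumption, with all three costs set to 1 the cost is simply the total number of events, and the reconciliations reported are those for which no other LCA-reconciliation has a smaller total.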



Download

The package is implemented in C. We provide executables for download for Windows XP (x86), Windows 7 (x64), and Linux.

LcaMap: Windows XP (x86), Windows 7 (x64), Linux.


How to Use

When you start the program, it prompts you for:

(1) the file name of a gene tree

(2) the file name of a species tree

(3) the costs of a lateral gene transfer, a gene duplication and a gene loss

(4) the name of the file to hold the output in Format 1

(5) the name of the file to hold the output in Format 2


To run the program, prepare two files, geneTreeFile and speciesTreeFile, containing a gene tree G and a species tree S, respectively, on the same set of species. Both trees must be in Newick format, terminated by a semicolon. Each species name must be a string of characters from {a,...,z,A,...,Z,0,...,9,_,.}. Here are an example gene tree and species tree. One output file for this example (obtained by setting the cost of each event to 1) is here, and the other is here. You can view the gene tree and the species tree with Dendroscope by Daniel H. Huson.
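Malformed tree files are a common source of trouble, so a quick pre-check before starting LcaMap may help. The C function below is a minimal sketch and is not part of LcaMap. The name count_newick_leaves is hypothetical, and rejecting branch lengths and internal-node labels is an assumption about what LcaMap accepts. It checks the terminating semicolon, balanced parentheses, and the allowed name characters, and returns the number of leaves, or -1 on error.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if c may appear in a species name: a-z, A-Z, 0-9, '_' or '.'. */
static bool is_name_char(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '_' || c == '.';
}

/* Hypothetical pre-check for the contents of a tree file: the text must end
 * with ';' (trailing whitespace ignored), parentheses must balance, and every
 * name must use only the allowed characters. Branch lengths (":0.5") and
 * internal-node labels are rejected (an assumption of this sketch).
 * Returns the number of leaves, or -1 on error. */
int count_newick_leaves(const char *s)
{
    size_t n;
    int leaves = 0, depth = 0;
    char last = '\0';   /* previous token: '(', ')', ',' or 'n' for a name */

    assert(s != NULL);
    n = strlen(s);
    while (n > 0 && (s[n - 1] == '\n' || s[n - 1] == '\r' || s[n - 1] == ' '))
        n--;
    if (n == 0 || s[n - 1] != ';')
        return -1;                         /* missing terminating semicolon */
    for (size_t i = 0; i + 1 < n; i++) {   /* stop before the ';' */
        char c = s[i];
        if (c == ' ' || c == '\n' || c == '\r')
            continue;
        if (c == '(' || c == ',') {
            if (c == '(')
                depth++;
            last = c;
        } else if (c == ')') {
            if (--depth < 0)
                return -1;                 /* unmatched ')' */
            last = c;
        } else if (is_name_char(c)) {
            if (last == ')' || last == 'n')
                return -1;                 /* internal label or split name */
            while (is_name_char(s[i + 1])) /* s[n - 1] == ';' stops this */
                i++;
            leaves++;
            last = 'n';
        } else {
            return -1;                     /* e.g. ':' or another symbol */
        }
    }
    return depth == 0 ? leaves : -1;
}
```

For instance, count_newick_leaves("((a,b),c);") returns 3, while "((a:0.5,b),c);" and "((a,b),c)" are both rejected with -1.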


Output and Visualization

LcaMap provides two output formats.

Format 1: For each minimum-cost LCA-reconciliation M between G and S, the output file lists the duplication vertices and transfer edges with respect to M. Each vertex u is identified by its pre-order number in G. We provide a GUI (Java application) for visualizing G together with the pre-order number of each vertex.
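To relate the vertex numbers in Format 1 back to G without the GUI, one can number the vertices directly from the Newick string. The sketch below is not LcaMap code, and it rests on assumptions the page does not confirm: pre-order visits children in the left-to-right order of the Newick string, and numbering starts at 0 (add 1 if LcaMap counts from 1). The function name newick_preorder_parents and the MAX_DEPTH limit are hypothetical. The key observation is that reading Newick text left to right meets each internal vertex at its '(' and each leaf at its name, which is exactly a pre-order walk.

```c
#include <assert.h>
#include <string.h>

#define MAX_DEPTH 1024  /* hypothetical nesting limit for this sketch */

/* Hypothetical helper: numbers the vertices of a plain Newick tree (leaf
 * names only, no branch lengths or internal labels) in pre-order from 0 and
 * stores in parent[k] the pre-order number of the parent of vertex k
 * (-1 for the root). Returns the number of vertices, or -1 on malformed
 * input or if more than max vertices are found. */
int newick_preorder_parents(const char *s, int parent[], int max)
{
    int stack[MAX_DEPTH];  /* currently open internal vertices */
    int depth = 0, count = 0;

    assert(s != NULL && parent != NULL);
    for (size_t i = 0; s[i] != '\0' && s[i] != ';'; i++) {
        char c = s[i];
        if (c == '(') {                        /* new internal vertex */
            if (count >= max || depth >= MAX_DEPTH)
                return -1;
            parent[count] = depth > 0 ? stack[depth - 1] : -1;
            stack[depth++] = count++;
        } else if (c == ')') {                 /* its subtree is complete */
            if (depth == 0)
                return -1;
            depth--;
        } else if (c == ',' || c == ' ' || c == '\n') {
            continue;                          /* separators */
        } else {                               /* first character of a leaf */
            if (count >= max)
                return -1;
            parent[count] = depth > 0 ? stack[depth - 1] : -1;
            count++;
            while (s[i + 1] != '\0' && strchr("(),; \n", s[i + 1]) == NULL)
                i++;                           /* skip the rest of the name */
        }
    }
    return depth == 0 ? count : -1;
}
```

For ((a,b),c); the function finds 5 vertices: the root (0), the parent of a and b (1), a (2), b (3) and c (4), with parent array {-1, 0, 1, 1, 0}.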

Format 2: For each minimum-cost LCA-reconciliation M between G and S, the output file contains a string in a modified Newick format derived from the Newick format of G, defined as follows. For each transfer edge (v,u) associated with M, if u is a leaf of G, we add an asterisk (*) immediately after u; otherwise, we add an asterisk immediately after the closing parenthesis corresponding to u in the original Newick format of G. Then, for each duplication vertex u, we add a caret (^) immediately after the closing parenthesis corresponding to u. We also provide a GUI (Java application) for visualizing the modified Newick format, in which a filled red circle indicates a duplication and a dashed line indicates a lateral gene transfer.
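Format 2 strings can also be post-processed programmatically. The C function below is a minimal sketch, not LcaMap code; the name read_format2_marks and the flag names are hypothetical. It walks a Format 2 string and records, for each vertex in left-to-right pre-order from 0 (an assumption that may differ from LcaMap's own numbering), whether it is marked with an asterisk or a caret. It assumes plain Newick without branch lengths or internal-node labels. Because the definition above does not fix the order of the two marks when one vertex is both a duplication and the head of a transfer edge, the sketch accepts either order.

```c
#include <assert.h>
#include <string.h>

#define MAX_DEPTH 1024  /* hypothetical nesting limit for this sketch */

/* Hypothetical bit flags for the two marks of Format 2. */
enum { MARK_TRANSFER = 1, MARK_DUPLICATION = 2 };

/* Stores in marks[k] the flags of the k-th vertex in pre-order:
 * MARK_TRANSFER if a '*' follows it (it is the head u of a transfer edge),
 * MARK_DUPLICATION if a '^' follows its closing parenthesis. Both ")*^" and
 * ")^*" are accepted. Returns the number of vertices, or -1 on malformed
 * input or if more than max vertices are found. */
int read_format2_marks(const char *s, int marks[], int max)
{
    int stack[MAX_DEPTH];  /* currently open internal vertices */
    int depth = 0, count = 0;
    int last = -1;         /* vertex that a following mark belongs to */

    assert(s != NULL && marks != NULL);
    for (size_t i = 0; s[i] != '\0' && s[i] != ';'; i++) {
        char c = s[i];
        if (c == '(') {                     /* new internal vertex */
            if (count >= max || depth >= MAX_DEPTH)
                return -1;
            marks[count] = 0;
            stack[depth++] = count++;
            last = -1;
        } else if (c == ')') {
            if (depth == 0)
                return -1;
            last = stack[--depth];          /* vertex just closed */
        } else if (c == '*' || c == '^') {
            if (last < 0)
                return -1;                  /* mark not attached to a vertex */
            marks[last] |= (c == '*') ? MARK_TRANSFER : MARK_DUPLICATION;
        } else if (c == ',' || c == ' ' || c == '\n') {
            last = -1;
        } else {                            /* first character of a leaf */
            if (count >= max)
                return -1;
            marks[count] = 0;
            last = count++;
            while (s[i + 1] != '\0' && strchr("(),;*^ \n", s[i + 1]) == NULL)
                i++;                        /* skip the rest of the name */
        }
    }
    return depth == 0 ? count : -1;
}
```

For example, in ((a,b*)^,c); the parent of a and b (vertex 1) is flagged as a duplication and leaf b (vertex 3) as the head of a transfer edge; all other vertices carry no mark.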

When the program finishes, it writes its results to the two files named by the user at startup, one in each output format.