LIden


1. What does LIden do?

LIden determines the allele sharing status among family members from high-density SNP data. It takes SNP genotype data, the pedigree structure and a physical locus file as input, and outputs files for the graphical display of allele sharing as well as the linked regions. For details, see the paper (submitted).

2. Download

The program is written in Java, and we provide the compiled Java class files for download. The main class is in "Haplotyping.class". Currently the program supports Windows only. Here is the package, including a sample case.

3. How to use

A Java Runtime Environment (JRE) is required to run the program. Please first download and install JRE 1.5.0_15-b04 (from www.sun.com).

Run the program from the command line. First change the working directory to the directory containing the program, then run it with the following options:

    [-h]  <-- help

    -p    <-- pedigree filename

    -g    <-- genotype filename

    -l    <-- physical locus filename

    [-n]  <-- number of markers; default: 30000

    [-d]  <-- whether to delete useless markers: 0, do not delete; 1, delete; default (recommended): 1

    [-e]  <-- estimated error rate

    [-m]  <-- inheritance model: 0, dominant; 1, recessive; default: dominant

    [-u]  <-- penetrance (maximum number of unaffected individuals allowed to share the mutant allele)

    [-a]  <-- phenocopy (maximum number of affected individuals allowed not to share the mutant allele)

    [ ]   <-- optional parameter


A typical command is:

>java -ms64m -mx256m -cp Haplotyping.jar Haplotyping -p sample_pedigree.txt -g sample_genotype.txt -l sample_physical_locus.txt

(Note: "-ms64m" and "-mx256m" are Java Virtual Machine options that set the initial heap size to 64 MB and the maximum heap size to 256 MB.)
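For scripted runs, the typical command can also be assembled programmatically. The sketch below is a hypothetical helper (`LIdenCommand` is not part of the LIden package) that builds the same argument list as the example above and rejects values that the option list does not allow for `-m` and `-d`. It assumes `java` is on the PATH, that `Haplotyping.jar` is in the working directory, and that each optional value is passed as a separate argument after its flag. It only builds the list; handing it to `ProcessBuilder` would start the program.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: LIdenCommand is not part of the LIden package.
// It assembles the command line described in "3. How to use".
final class LIdenCommand {

    // Builds the argument list for one LIden run.
    // model: 0 = dominant, 1 = recessive (the -m option)
    // deleteMarkers: 0 = do not delete, 1 = delete (the -d option)
    static List<String> build(String pedigree, String genotype, String locus,
                              int model, int deleteMarkers) {
        if (model != 0 && model != 1) {
            throw new IllegalArgumentException("-m must be 0 (dominant) or 1 (recessive)");
        }
        if (deleteMarkers != 0 && deleteMarkers != 1) {
            throw new IllegalArgumentException("-d must be 0 or 1");
        }
        List<String> cmd = new ArrayList<>();
        // JVM memory options and class path taken from the typical command
        cmd.add("java");
        cmd.add("-ms64m");
        cmd.add("-mx256m");
        cmd.add("-cp");
        cmd.add("Haplotyping.jar");
        cmd.add("Haplotyping");
        // Required input files
        cmd.add("-p"); cmd.add(pedigree);
        cmd.add("-g"); cmd.add(genotype);
        cmd.add("-l"); cmd.add(locus);
        // Optional parameters, passed explicitly (assumed "flag value" form)
        cmd.add("-m"); cmd.add(Integer.toString(model));
        cmd.add("-d"); cmd.add(Integer.toString(deleteMarkers));
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("sample_pedigree.txt", "sample_genotype.txt",
                                 "sample_physical_locus.txt", 0, 1);
        System.out.println(String.join(" ", cmd));
        // To launch it: new ProcessBuilder(cmd).inheritIO().start();
    }
}
```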

4. Input files

LIden requires three input files:

(1) Pedigree file:

Each row contains the information for one individual in six columns: person name, father name, mother name, gender, disease status, and availability, with '-1' indicating missing genotype data. The following small example contains three individuals, D, E and F.

        D    -1    -1    male      affected      A
        E    -1    -1    female    unaffected    A
        F    D     E     male      affected      A

A complete sample is included in the package as "sample_pedigree.txt".
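Before running LIden, a pedigree file can be checked against the six-column layout described above. The following is a minimal sketch of a hypothetical reader (`PedigreeReader` is not part of LIden). It assumes columns are separated by whitespace, and it assumes, from the D/E/F example only, that '-1' in both the father and mother columns marks a founder whose parents are not in the pedigree.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for the six-column pedigree file (not part of LIden).
// Assumptions: whitespace-separated columns; "-1" in both parent columns
// marks a founder, as for D and E in the example.
final class PedigreeReader {

    static final class Person {
        final String name, father, mother, gender, status, availability;

        Person(String[] c) {
            name = c[0]; father = c[1]; mother = c[2];
            gender = c[3]; status = c[4]; availability = c[5];
        }

        boolean isFounder() {
            return father.equals("-1") && mother.equals("-1");
        }

        boolean isAffected() {
            return status.equals("affected");
        }
    }

    // Parses one row; rejects rows that do not have exactly six columns.
    static Person parseLine(String line) {
        String[] cols = line.trim().split("\\s+");
        if (cols.length != 6) {
            throw new IllegalArgumentException(
                "expected 6 columns but found " + cols.length + ": " + line);
        }
        return new Person(cols);
    }

    // Parses all rows, skipping blank lines.
    static List<Person> parse(List<String> lines) {
        List<Person> people = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                people.add(parseLine(line));
            }
        }
        return people;
    }

    public static void main(String[] args) throws IOException {
        for (Person p : parse(Files.readAllLines(Paths.get(args[0])))) {
            System.out.println(p.name + (p.isFounder() ? ": founder" : ": non-founder")
                + (p.isAffected() ? ", affected" : ", unaffected"));
        }
    }
}
```

After compiling with `javac`, running `java PedigreeReader sample_pedigree.txt` would list each individual with its founder and affection status, or stop at the first malformed row.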

(2) Genotype file:

The first row contains the individual names, as used in the pedigree file. Each remaining row contains the genotypes of all individuals at one SNP marker. Each cell holds the genotype of the corresponding individual at that locus, written with 'A', 'B' or any other letters describing the genotypes, while '?' indicates a missing value. The following small example contains three individuals, D, E and F, with genotype data for five SNP sites.

    D     E     F
    AA    AA    AA
    BB    BB    BB
    AA    AA    AA
    BA    AA    AA
    BA    BB    BB

A complete sample is included in the package as "sample_genotype.txt".
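The genotype layout (a header row of names, then one row per SNP) can be loaded and sanity-checked as in the sketch below. `GenotypeTable` is a hypothetical class, not part of LIden; it assumes cells are separated by whitespace, as in the example, and treats any cell containing '?' as missing. It rejects SNP rows whose cell count differs from the number of names in the header.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reader for the genotype file (not part of LIden):
// header row of individual names, then one whitespace-separated row per SNP.
final class GenotypeTable {
    final List<String> individuals = new ArrayList<>();
    final List<String[]> markers = new ArrayList<>();      // one String[] per SNP row
    private final Map<String, Integer> column = new HashMap<>();

    GenotypeTable(List<String> lines) {
        boolean header = true;
        for (String line : lines) {
            if (line.trim().isEmpty()) continue;             // skip blank lines
            String[] cells = line.trim().split("\\s+");
            if (header) {
                for (String name : cells) {
                    column.put(name, individuals.size());
                    individuals.add(name);
                }
                header = false;
            } else if (cells.length != individuals.size()) {
                throw new IllegalArgumentException("SNP row " + (markers.size() + 1)
                    + " has " + cells.length + " cells, expected " + individuals.size());
            } else {
                markers.add(cells);
            }
        }
    }

    int markerCount() { return markers.size(); }

    // Genotype of one individual at one SNP (0-based index), e.g. "BA".
    String genotype(String individual, int snp) {
        Integer col = column.get(individual);
        if (col == null) throw new IllegalArgumentException("unknown individual " + individual);
        return markers.get(snp)[col];
    }

    // Number of SNPs at which this individual's cell contains '?'.
    int missingCount(String individual) {
        int missing = 0;
        for (int snp = 0; snp < markers.size(); snp++) {
            if (genotype(individual, snp).indexOf('?') >= 0) missing++;
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        GenotypeTable t = new GenotypeTable(Files.readAllLines(Paths.get(args[0])));
        System.out.println(t.markerCount() + " SNPs for " + t.individuals);
        for (String id : t.individuals) {
            System.out.println(id + ": " + t.missingCount(id) + " missing");
        }
    }
}
```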

(3) Physical locus file:

This file contains the physical positions of the SNP markers in the genotype file, one per row, so the number of rows must equal the number of genotyped loci. The following sample contains five SNP sites.

886727
2948696
2960027
2974991
2996733

A complete sample is included in the package as "sample_physical_locus.txt".
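Because the number of rows in the physical locus file must equal the number of genotyped loci, a mismatch between the two files is easy to catch before a run. The sketch below is a hypothetical pre-flight check (`LocusCheck` is not part of LIden). It assumes one integer base-pair position per non-blank line of the locus file, and counts SNP rows in the genotype file as its non-blank lines minus the header row.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check (not part of LIden): the physical locus file
// must have exactly one position per SNP row of the genotype file.
final class LocusCheck {

    // Parses one base-pair position per non-blank line.
    static List<Long> readPositions(List<String> lines) {
        List<Long> positions = new ArrayList<>();
        for (String line : lines) {
            String s = line.trim();
            if (!s.isEmpty()) positions.add(Long.parseLong(s));
        }
        return positions;
    }

    // Counts SNP rows in the genotype file: non-blank lines minus the header row.
    static int countSnpRows(List<String> genotypeLines) {
        int nonBlank = 0;
        for (String line : genotypeLines) {
            if (!line.trim().isEmpty()) nonBlank++;
        }
        return Math.max(0, nonBlank - 1);
    }

    // Returns null if the two files agree, otherwise a description of the mismatch.
    static String check(List<String> genotypeLines, List<String> locusLines) {
        int snps = countSnpRows(genotypeLines);
        int loci = readPositions(locusLines).size();
        return snps == loci ? null
            : "genotype file has " + snps + " SNP rows but locus file has " + loci + " positions";
    }

    public static void main(String[] args) throws IOException {
        String problem = check(Files.readAllLines(Paths.get(args[0])),
                               Files.readAllLines(Paths.get(args[1])));
        System.out.println(problem == null ? "OK" : problem);
    }
}
```

For the package samples, the check would be run as `java LocusCheck sample_genotype.txt sample_physical_locus.txt` after compiling.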

5. Generating graphical figures

The current version produces .plt files and text files for the founder maternal and founder paternal alleles; both the .plt files and the text files are required for generating the graphical display files for allele sharing status. Users need to download the Windows version of gnuplot (http://www.gnuplot.info) and generate .eps files from the .plt files: start "wgnuplot.exe" by double-clicking it, click the "open" tab in the window that pops up, and select a .plt file. An .eps file with the same name will then be generated.
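When there are many .plt files, opening each one in the GUI gets tedious. The sketch below is a hypothetical batch alternative (`PltBatch` is not part of LIden). It assumes a command-line `gnuplot` executable is on the PATH and that each .plt script itself selects the .eps output, as the GUI procedure above implies; it runs `gnuplot <file>` for every .plt file in a directory, in name order, from inside that directory.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical batch alternative to opening each .plt file in wgnuplot.exe.
// Assumes a command-line "gnuplot" executable is on the PATH and that each
// .plt script sets its own .eps output, as the GUI procedure implies.
final class PltBatch {

    // Builds one "gnuplot <file>" command per .plt file in dir, sorted by name.
    static List<List<String>> commands(File dir) {
        File[] files = dir.listFiles((d, name) -> name.toLowerCase().endsWith(".plt"));
        List<List<String>> cmds = new ArrayList<>();
        if (files == null) return cmds;                  // not a directory, or I/O error
        Arrays.sort(files);
        for (File f : files) {
            cmds.add(Arrays.asList("gnuplot", f.getName()));
        }
        return cmds;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (List<String> cmd : commands(dir)) {
            // Run inside dir so relative paths in the script resolve there.
            Process p = new ProcessBuilder(cmd).directory(dir).inheritIO().start();
            System.out.println(String.join(" ", cmd) + " -> exit " + p.waitFor());
        }
    }
}
```

The GUI route described above remains the documented one; the batch form only helps when gnuplot's command-line program is installed.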

6. Output

The program outputs the inferred haplotypes of each family member, generates the graphical display of founder allele sharing status, and reports the linked regions.

(1) The inferred haplotypes of every individual are stored in "haplotype1.txt". A complete sample is included in the package as "haplotype1.txt".

(2) Allele sharing status is shown in the .eps files generated from the program's .plt output (see 5. Generating graphical figures). For the meaning of the figures, please refer to the paper (submitted).

(3) The physical positions of the linked regions are reported in the file "linked_region.txt". For each dataset, three solutions are provided, each produced by a different method. The first one is always the best. An example follows.

    Solution 1:
    169174855 ---> 197295627
    Solution 2:
    165007734 ---> 197295627
    Solution 3:
    38792697 ---> 54898706
    165007734 ---> 197295627
    205198475 ---> 213511064
    236005585 ---> 245120412
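Downstream scripts may want the regions as numbers rather than text. The following is a minimal sketch of a hypothetical parser (`LinkedRegions` is not part of LIden), based only on the example above: it assumes "Solution N:" header lines, each followed by one "start ---> end" line per region. It also sums end minus start over each solution's regions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for "linked_region.txt", based only on the example:
// "Solution N:" headers, each followed by "start ---> end" lines.
final class LinkedRegions {

    // Maps each solution header (e.g. "Solution 1") to its [start, end] intervals,
    // preserving file order.
    static Map<String, List<long[]>> parse(List<String> lines) {
        Map<String, List<long[]>> solutions = new LinkedHashMap<>();
        List<long[]> current = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) continue;
            if (line.startsWith("Solution")) {
                current = new ArrayList<>();
                solutions.put(line.replace(":", "").trim(), current);
            } else if (current != null && line.contains("--->")) {
                String[] ends = line.split("--->");
                current.add(new long[] {
                    Long.parseLong(ends[0].trim()), Long.parseLong(ends[1].trim()) });
            }
        }
        return solutions;
    }

    // Sum of (end - start) over a solution's intervals, in base pairs.
    static long totalLength(List<long[]> intervals) {
        long total = 0;
        for (long[] iv : intervals) total += iv[1] - iv[0];
        return total;
    }

    public static void main(String[] args) throws IOException {
        String file = args.length > 0 ? args[0] : "linked_region.txt";
        Map<String, List<long[]>> solutions = parse(Files.readAllLines(Paths.get(file)));
        for (Map.Entry<String, List<long[]>> e : solutions.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().size()
                + " region(s), " + totalLength(e.getValue()) + " bp");
        }
    }
}
```

Since the first solution is always the best, a script would usually read only the "Solution 1" entry; for the example above that is a single region spanning 28120772 bp.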
