1. What does LIden do?
LIden determines allele sharing status among family members using high-density SNP data. It takes SNP genotype data, a pedigree structure, and a physical locus file as input, and outputs graphical display files of allele sharing and the linked regions. For details, see the paper (submitted).
The program is written in Java, and we provide compiled Java class files for download. The main class is in "Haplotyping.class". Currently the program is available for Windows users only. Here is the package, including a sample case.
3. How to use
A Java Runtime Environment (JRE) is required to run the program. Please first download and install JRE1.5.0_15-b04 (from www.sun.com) on the machine.
Please use command line options to run the program. First set the working path to the directory of the program. Then use command line options as follows:
[-h] <-- for help
-p <-- pedigree filename
-g <-- genotype filename
-l <-- physical locus filename
[-n] <-- number of markers; default: 30000
[-d] <-- whether to delete unuseful markers: 0, do not delete; 1, delete; default (recommended): 1
[-e] <-- estimated error rate
[-m] <-- inheritance model: 0 for dominant, 1 for recessive; default: dominant
[-u] <-- penetrance (maximum number of unaffected individuals allowed to share the mutation allele)
[-a] <-- phenocopy (maximum number of affected individuals allowed not to share the mutation allele)
(Options in square brackets are optional parameters.)
A typical command is:
>java -ms64m -mx256m -cp Haplotyping.jar Haplotyping -p sample_pedigree.txt -g sample_genotype.txt -l sample_physical_locus.txt
(Note: "-ms64m" and "-mx256m" are parameters for the Java Virtual Machine; they set the memory allocated to the program to a minimum of 64 MB and a maximum of 256 MB.)
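For users who want to launch LIden from their own Java code rather than typing the command, the typical command above can be assembled as an argument list. This is only a sketch: the class name LidenCommandSketch and the method buildCommand are hypothetical, it assumes "java" is on the PATH and "Haplotyping.jar" is in the working directory, and it assumes the optional -m flag takes its value as a separate argument (as -p, -g and -l do).

```java
import java.util.ArrayList;
import java.util.List;

class LidenCommandSketch {
    // Builds the argument list for the typical command shown above.
    // model: 0 = dominant, 1 = recessive; pass null to omit -m and use the default.
    static List<String> buildCommand(String pedigree, String genotype,
                                     String locus, Integer model) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-ms64m");            // JVM minimum memory, as in the typical command
        cmd.add("-mx256m");           // JVM maximum memory, as in the typical command
        cmd.add("-cp");
        cmd.add("Haplotyping.jar");
        cmd.add("Haplotyping");       // main class
        cmd.add("-p"); cmd.add(pedigree);
        cmd.add("-g"); cmd.add(genotype);
        cmd.add("-l"); cmd.add(locus);
        if (model != null) {
            if (model != 0 && model != 1) {
                throw new IllegalArgumentException("model must be 0 or 1, got " + model);
            }
            // Assumption: the value follows the flag as a separate argument.
            cmd.add("-m"); cmd.add(String.valueOf(model));
        }
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand("sample_pedigree.txt", "sample_genotype.txt",
                                        "sample_physical_locus.txt", null);
        System.out.println(String.join(" ", cmd));
        // To actually run it from the program directory, one could use:
        // new ProcessBuilder(cmd).inheritIO().start();
    }
}
```

Keeping the arguments in a list, rather than one long string, avoids quoting problems when file names contain spaces.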
4. Input files
LIden requires three input files:
(1) Pedigree file:
Each row contains information for one individual. There are 6 columns: person-name, father-name, mother-name, gender, disease status, and availability, with '-1' indicating missing genotype data. The following is a small example containing three individuals, D, E and F.
-1 -1 male
E -1 -1 female unaffected A
F D E male affected A
A complete sample is in the package named "sample_pedigree.txt".
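To check a pedigree file before running LIden, a row can be split into the six columns described above. The following is a minimal sketch, not LIden's own reader: the class PedigreeSketch and its methods are hypothetical, columns are assumed to be separated by whitespace, and treating "-1" in both parent columns as marking a founder is an assumption based on the example rows.

```java
import java.util.ArrayList;
import java.util.List;

class PedigreeSketch {
    /** One row of the pedigree file; fields follow the six columns listed above. */
    static final class Person {
        final String name, father, mother, gender, status, availability;

        Person(String[] f) {
            name = f[0]; father = f[1]; mother = f[2];
            gender = f[3]; status = f[4]; availability = f[5];
        }

        // Assumption: "-1" in both parent columns marks a founder, as for E in the example.
        boolean isFounder() {
            return father.equals("-1") && mother.equals("-1");
        }
    }

    // Splits one row on whitespace and rejects rows that do not have exactly 6 columns.
    static Person parseRow(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length != 6) {
            throw new IllegalArgumentException(
                "expected 6 columns, got " + f.length + ": " + line);
        }
        return new Person(f);
    }

    // Parses all non-blank rows.
    static List<Person> parse(List<String> lines) {
        List<Person> people = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                people.add(parseRow(line));
            }
        }
        return people;
    }

    public static void main(String[] args) {
        Person f = parseRow("F D E male affected A");
        System.out.println(f.name + ": father=" + f.father
            + ", mother=" + f.mother + ", founder=" + f.isFounder());
    }
}
```

Rejecting rows with the wrong number of columns early gives a clearer error than a failure later in the analysis.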
(2) Genotype file:
The first row contains the individual names, as used in the pedigree file. Each remaining row contains the genotype information of all individuals for one SNP marker. Each cell in a row holds the genotype of the corresponding individual at that locus, represented by 'A', 'B', or any other letters describing the genotypes; '?' indicates a missing value. The following is a small example containing three individuals, D, E and F, with genotype data for five SNP sites.
AA AA AA
BB BB BB
AA AA AA
BA AA AA
BA BB BB
A complete sample is in the package named "sample_genotype.txt".
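A quick consistency check of a genotype file can be sketched as follows: every marker row should have one cell per individual named in the first row, and cells containing '?' are counted as missing. The class GenotypeSketch and the method countMissing are hypothetical, whitespace-separated cells are an assumption, and the header row "D E F" used in main is assumed from the description above.

```java
import java.util.Arrays;
import java.util.List;

class GenotypeSketch {
    // Returns, for each individual in header order, the number of marker rows
    // whose cell contains '?'. Throws if a row has the wrong number of cells.
    static int[] countMissing(List<String> lines) {
        String[] names = lines.get(0).trim().split("\\s+");
        int[] missing = new int[names.length];
        for (int r = 1; r < lines.size(); r++) {
            String row = lines.get(r).trim();
            if (row.isEmpty()) {
                continue; // tolerate blank lines
            }
            String[] cells = row.split("\\s+");
            if (cells.length != names.length) {
                throw new IllegalArgumentException("marker row " + r + " has "
                    + cells.length + " cells, expected " + names.length);
            }
            for (int i = 0; i < cells.length; i++) {
                if (cells[i].indexOf('?') >= 0) {
                    missing[i]++;
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Header row is an assumption; the five marker rows are the example above.
        List<String> lines = Arrays.asList(
            "D E F", "AA AA AA", "BB BB BB", "AA AA AA", "BA AA AA", "BA BB BB");
        System.out.println(Arrays.toString(countMissing(lines)));
    }
}
```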
(3) Physical locus file:
This file contains the physical loci for the SNP markers in the genotype file. Thus the number of rows should equal the number of genotyped loci. The following is a sample containing five SNP sites.
A complete sample is in the package named "sample_physical_locus.txt".
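The row-count requirement for the physical locus file can be checked before a run. The sketch below only counts rows and does not interpret their content; the class LocusCountSketch and its methods are hypothetical, and it assumes the genotype file has exactly one header row while the physical locus file has none.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

class LocusCountSketch {
    // Counts non-blank rows.
    static int countRows(List<String> lines) {
        int n = 0;
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                n++;
            }
        }
        return n;
    }

    // True when the locus file has one row per marker row of the genotype file.
    // Assumption: the genotype file has one header row, the locus file has none.
    static boolean rowCountsMatch(List<String> genotypeLines, List<String> locusLines) {
        return countRows(genotypeLines) - 1 == countRows(locusLines);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java LocusCountSketch <genotype file> <physical locus file>");
            return;
        }
        List<String> genotype = Files.readAllLines(Paths.get(args[0]));
        List<String> locus = Files.readAllLines(Paths.get(args[1]));
        System.out.println(rowCountsMatch(genotype, locus)
            ? "row counts match"
            : "row counts differ: " + (countRows(genotype) - 1)
                + " markers vs " + countRows(locus) + " loci");
    }
}
```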
5. Generating graphical figures:
The current version produces .plt files and text files for the founder maternal and founder paternal alleles; both files are required to generate the graphical display files for allele sharing status. Users need to download the Windows version of gnuplot (http://www.gnuplot.info) and generate an .eps file from each .plt file. Start the "wgnuplot.exe" program by double-clicking it. A window will pop up. Click the "open" tab and select a .plt file; an .eps file with the same name will then be generated.
The program can output the inferred haplotype for each member, generate the graphical display for founder allele sharing status, and report the linked regions.
(1). The inferred haplotype for every individual is stored in "haplotype1.txt". A complete sample is in the package, named "haplotype1.txt".
(2). Allele sharing status is shown in the .eps file generated by the program (see 5. Generating graphical figures). For the meaning of the figure, please refer to the paper (submitted).
(3). The physical position of the linked region is reported in the file "linked_region.txt". For each dataset, we provide three solutions, each produced by a different method. The first one is always the best. An example is shown as follows.
169174855 ---> 197295627
165007734 ---> 197295627
38792697 ---> 54898706
165007734 ---> 197295627
205198475 ---> 213511064
236005585 ---> 245120412
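For downstream processing, each "start ---> end" line of "linked_region.txt" can be read into a pair of positions. The sketch below handles single lines only, since the overall layout of the file (how lines are grouped into the three solutions) is not described here; the class LinkedRegionSketch and its methods are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkedRegionSketch {
    // Matches lines such as "169174855 ---> 197295627".
    private static final Pattern REGION =
        Pattern.compile("^\\s*(\\d+)\\s*--->\\s*(\\d+)\\s*$");

    // Returns {start, end} for one region line, or throws if the line does not match.
    static long[] parseRegion(String line) {
        Matcher m = REGION.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a region line: " + line);
        }
        return new long[] {Long.parseLong(m.group(1)), Long.parseLong(m.group(2))};
    }

    // Span of the region, in the same units as the physical positions.
    static long length(long[] region) {
        return region[1] - region[0];
    }

    public static void main(String[] args) {
        String[] lines = {"169174855 ---> 197295627", "38792697 ---> 54898706"};
        for (String line : lines) {
            long[] r = parseRegion(line);
            System.out.println(r[0] + " to " + r[1] + ", span " + length(r));
        }
    }
}
```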