
1. Input

Data type

LocRepeat accepts three kinds of data: DNA, RNA, or protein. A protein string is written over a 23-letter alphabet {A, R, N, D, C, Q, E, G, H, I, L, K, M, F, P, S, T, W, Y, V, B, Z, X}, where B represents D or N, Z represents E or Q, and X represents an unknown or nonstandard amino acid.
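As an illustration of the protein alphabet described above, the following minimal Python sketch lists the 23 letters and resolves the ambiguity codes B and Z. The names `PROTEIN_ALPHABET`, `AMBIGUITY`, and `possible_residues` are hypothetical and are not part of LocRepeat; how the program handles these letters internally is not documented here.

```python
# Hypothetical helper, not part of LocRepeat: the 23-letter protein alphabet
# from the text, with the ambiguity codes B and Z resolved to their residues.
PROTEIN_ALPHABET = "ARNDCQEGHILKMFPSTWYVBZX"

# B stands for D or N; Z stands for E or Q (as stated in the text).
AMBIGUITY = {"B": ("D", "N"), "Z": ("E", "Q")}


def possible_residues(letter):
    """Return the concrete residues a letter may stand for.

    X (unknown or nonstandard) returns an empty tuple because no specific
    residue can be named for it.
    """
    if letter not in PROTEIN_ALPHABET:
        raise ValueError(f"{letter!r} is not in the protein alphabet")
    if letter == "X":
        return ()
    return AMBIGUITY.get(letter, (letter,))


if __name__ == "__main__":
    for letter in "ABZX":
        print(letter, possible_residues(letter))
```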

Data format

The input data file is a text file containing a string over the specified alphabet. A number may be used to indicate the position of the first letter in a line. Spaces, numbers, and any other characters outside the alphabet are ignored when the file is processed.
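The filtering rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `clean_sequence` is hypothetical, the DNA and RNA alphabets are assumed to be ACGT and ACGU, and lowercase letters are assumed to be accepted and converted to uppercase. None of these details are confirmed by this manual.

```python
# Hypothetical sketch of the input filtering described above; not LocRepeat's code.
# Assumptions: DNA = ACGT, RNA = ACGU, and lowercase input is upper-cased.
ALPHABETS = {
    "DNA": "ACGT",
    "RNA": "ACGU",
    "Protein": "ARNDCQEGHILKMFPSTWYVBZX",
}


def clean_sequence(text, data_type):
    """Keep only letters of the chosen alphabet; drop spaces, digits, and the rest."""
    alphabet = set(ALPHABETS[data_type])
    return "".join(ch for ch in text.upper() if ch in alphabet)


if __name__ == "__main__":
    # Each line starts with the position of its first letter, as the format allows.
    raw = "1 ACGTT GCA\n9 acgt-acgt\n"
    print(clean_sequence(raw, "DNA"))
```

Position numbers, blanks, line breaks, and the `-` character are all discarded, so the two lines above collapse into a single 16-letter DNA string.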

Input data

There are two ways to input data to the program.

1. Click the “New Data” button and enter the data in the edit box.

2. Click “Input Data from File” and select the input file.

2. Set Parameters for the Scoring Scheme

For DNA or RNA sequences, there are four parameters: match, mismatch, indel (insertion and deletion), and granularity factor. For protein sequences, set the granularity factor and select a similarity matrix, for example a PAM matrix.

Granularity factor is a negative integer between -100 and -1. The default value is -1.

Match is a positive integer between 1 and 100. The default value is 6.

Mismatch is a negative integer between -100 and -1. The default value is -10.

Indel (insertion and deletion) is a negative integer between -100 and -1. The default value is -10. The indel value is assumed to be less than the granularity factor.

Scoring Matrix: Six matrices are provided with the software: PAM40, PAM120, PAM250, BLOSUM45, BLOSUM62 and BLOSUM80. If SELF DEFINED is selected, a matrix file must be provided. The default file is “matrix.txt”. The matrix file is a text file; “matrix.txt” is an example of the expected format.
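The ranges and defaults of the DNA/RNA parameters listed above can be summarized in a short Python sketch. The class name `DnaScoreParams` is hypothetical and only restates the documented constraints; it does not show how LocRepeat uses the granularity factor, and it reads "less than" as a strict inequality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DnaScoreParams:
    """Hypothetical container for the DNA/RNA scoring parameters (not LocRepeat's API)."""

    match: int = 6          # positive, 1..100
    mismatch: int = -10     # negative, -100..-1
    indel: int = -10        # negative, -100..-1
    granularity: int = -1   # negative, -100..-1

    def __post_init__(self):
        if not 1 <= self.match <= 100:
            raise ValueError("match must be between 1 and 100")
        for name in ("mismatch", "indel", "granularity"):
            value = getattr(self, name)
            if not -100 <= value <= -1:
                raise ValueError(f"{name} must be between -100 and -1")
        # The manual assumes the indel value is less than the granularity factor.
        if not self.indel < self.granularity:
            raise ValueError("indel must be less than the granularity factor")


if __name__ == "__main__":
    print(DnaScoreParams())                      # documented defaults
    print(DnaScoreParams(match=4, indel=-20, granularity=-5))
```

With the documented defaults, the indel value (-10) is already less than the granularity factor (-1), so the default configuration satisfies the assumption.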

3. Run

To run the program, click the “Start” button.

4. Save Output

To save the result to a file:

1. Click the “Save Output” button.

2. Enter the file name in the file dialog.

3. Click “OK” in the file dialog.