MSBE Online Document

Introduction
Installation
   Linux
   Windows XP
Input data
Running MSBE
   Constant bi-cluster
   Additive bi-cluster
Output

Introduction

MSBE is a tool for analyzing gene expression data using a new bi-clustering method. It can find constant bi-clusters and additive bi-clusters. MSBE is available for Linux and Windows XP. It is implemented in Java and requires Sun's Java Runtime Environment (JRE) 5.0. If the JRE is not installed, download and install it from http://java.sun.com/.

Installation

Linux

1. Download the file MSBE_linux_1.0.5.tar.gz.
2. Extract the downloaded file.

tar xvfz MSBE_linux_1.0.5.tar.gz
cd MSBE_linux_1.0.5

3. Make sure that execute permission is set on the setup shell script:

chmod +x setup.sh

4. Run the setup shell script:

./setup.sh

5. Move the two generated shell scripts, constantBi and additiveBi, to a bin directory.
For example:

mv constantBi ~/bin
mv additiveBi ~/bin

Windows XP

1. Download the file MSBE_win_1.0.5.zip.
2. Extract the file "MSBE_win_1.0.5.zip", then double-click the batch file "setup.bat" to install.

Input data

The expression values in the input file are the logarithms of the raw expression values. The input file is tab-delimited. The first line contains the condition names, and the first column contains the gene names. Non-missing elements are written as real numbers; missing elements are written as empty strings. For example, see the file input_example.dat.
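
As a rough illustration of this layout, the following Python sketch formats a small tab-delimited table from raw expression values. The function name, the gene and condition names, the numbers, the label in the top-left cell and the choice of a base-2 logarithm are all assumptions made for this example; the description here does not state the logarithm base or the content of the top-left cell, so check them against input_example.dat.

```python
import math


def format_input_table(conditions, raw_rows, corner="gene"):
    """Return the text of a tab-delimited expression table.

    conditions: list of condition names (written on the first line).
    raw_rows:   list of (gene_name, values) pairs, where each value is a
                positive raw expression value or None if missing.
    corner:     label for the top-left cell (an assumption; compare with
                input_example.dat before relying on it).
    """
    lines = ["\t".join([corner] + list(conditions))]
    for gene, values in raw_rows:
        if len(values) != len(conditions):
            raise ValueError(f"{gene}: expected {len(conditions)} values")
        cells = [gene]
        for v in values:
            # Missing elements become empty strings; others are log-transformed.
            # Base 2 is an assumption; the text only says "logarithm".
            cells.append("" if v is None else f"{math.log2(v):.4f}")
        lines.append("\t".join(cells))
    return "\n".join(lines) + "\n"


# Hypothetical data: two genes, three conditions, two missing values.
table = format_input_table(
    ["cond1", "cond2", "cond3"],
    [("geneA", [8.0, 4.0, None]), ("geneB", [1.0, None, 2.0])],
)
print(table, end="")
```

Writing the returned text to a file gives one line per gene, with two consecutive tabs (or a trailing tab) wherever a value is missing.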

Running MSBE

1. Constant bi-cluster

Run constantBi with seven arguments: (1) input file name, (2) alpha, (3) beta, (4) gamma, (5) reference gene mode, (6) reference gene, (7) result file name. The input file name and result file name are self-explanatory. Alpha, beta and gamma are three real-valued parameters discussed in the paper. The fifth and sixth arguments designate the reference genes; their usage is illustrated by the examples below. Suppose the input file name is "input_example.dat", the result file name is "result_example.txt", alpha = 0.4, beta = 0.5 and gamma = 1.2.

One reference gene is known.

Suppose the reference gene is the 1st gene in the expression data. We use

constantBi input_example.dat 0.4 0.5 1.2 single 1 result_example.txt

where the fifth argument "single" indicates there is only one reference gene, and the sixth argument "1" indicates the first gene is the reference gene.

Several reference genes are known.

Suppose the reference genes are the 10th, 20th, 30th, 40th and 50th genes in the expression data. First, save the reference genes in a gene list file (for example, gene_list_example.txt). Then, use

constantBi input_example.dat 0.4 0.5 1.2 list gene_list_example.txt result_example.txt

where the fifth argument "list" indicates that there is a list of reference genes, and the sixth argument "gene_list_example.txt" indicates that the gene list is in the file "gene_list_example.txt".

No reference gene is known.

In this case, a randomized algorithm is used. Suppose 100 reference genes are to be considered. Use

constantBi input_example.dat 0.4 0.5 1.2 random 100 result_example.txt

where the fifth argument "random" indicates that the reference genes are randomly selected, and the sixth argument "100" indicates that 100 genes are randomly selected.
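
When constantBi is called from a script, the seven arguments can be assembled programmatically. The following Python sketch only builds the command line in the order described in this section; the helper name constant_bi_command is hypothetical and not part of MSBE.

```python
REFERENCE_MODES = ("single", "list", "random")


def constant_bi_command(input_file, alpha, beta, gamma, mode, reference, result_file):
    """Build the seven-argument constantBi command line as a list of strings."""
    if mode not in REFERENCE_MODES:
        raise ValueError(f"unknown reference gene mode: {mode!r}")
    return [
        "constantBi", input_file,
        str(alpha), str(beta), str(gamma),
        mode, str(reference),
        result_file,
    ]


# The "no reference gene is known" case from this section.
cmd = constant_bi_command(
    "input_example.dat", 0.4, 0.5, 1.2, "random", 100, "result_example.txt"
)
print(" ".join(cmd))

# To actually run it (requires constantBi to be installed and on the PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems if a file name contains spaces.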

2. Additive bi-cluster

Run additiveBi with nine arguments: (1) input file name, (2) alpha, (3) beta, (4) gamma, (5) reference gene mode, (6) reference gene, (7) reference condition mode, (8) reference condition, (9) result file name. Apart from the seventh and eighth arguments, the other seven arguments are the same as for constantBi. The seventh and eighth arguments designate the reference conditions. As with the reference genes, there are three cases. Suppose the input file name is "input_example.dat", the result file name is "result_example.txt", alpha = 0.4, beta = 0.5, gamma = 1.2 and the reference gene is the 1st gene.

One reference condition is known.

Suppose the reference condition is the 1st condition in the expression data. We use

additiveBi input_example.dat 0.4 0.5 1.2 single 1 single 1 result_example.txt

where the seventh argument "single" indicates there is only one reference condition, and the eighth argument "1" indicates the first condition is the reference condition.

Several reference conditions are known.

Suppose the reference conditions are the 1st, 2nd, 3rd, 4th and 5th conditions in the expression data. First, save the reference conditions in a condition list file (for example, condition_list_example.txt). Then, use

additiveBi input_example.dat 0.4 0.5 1.2 single 1 list condition_list_example.txt result_example.txt

where the seventh argument "list" indicates that there is a list of reference conditions, and the eighth argument "condition_list_example.txt" indicates that the condition list is in the file "condition_list_example.txt".

No reference condition is known.

In this case, a randomized algorithm is used. Suppose 100 reference conditions are to be considered. Use

additiveBi input_example.dat 0.4 0.5 1.2 single 1 random 100 result_example.txt

where the seventh argument "random" indicates that the reference conditions are randomly selected, and the eighth argument "100" indicates that 100 conditions are randomly selected.
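
The three reference condition cases differ only in the seventh and eighth arguments. As a small sketch, the following Python code builds the nine-argument additiveBi command line for each case in this section; the helper name additive_bi_command is hypothetical and not part of MSBE.

```python
REFERENCE_MODES = ("single", "list", "random")


def additive_bi_command(input_file, alpha, beta, gamma,
                        gene_mode, gene_ref, cond_mode, cond_ref, result_file):
    """Build the nine-argument additiveBi command line as a list of strings."""
    for mode in (gene_mode, cond_mode):
        if mode not in REFERENCE_MODES:
            raise ValueError(f"unknown reference mode: {mode!r}")
    return [
        "additiveBi", input_file,
        str(alpha), str(beta), str(gamma),
        gene_mode, str(gene_ref),
        cond_mode, str(cond_ref),
        result_file,
    ]


# Reference condition cases from this section: single, list, random.
condition_cases = [
    ("single", 1),
    ("list", "condition_list_example.txt"),
    ("random", 100),
]
commands = [
    additive_bi_command("input_example.dat", 0.4, 0.5, 1.2,
                        "single", 1, cond_mode, cond_ref, "result_example.txt")
    for cond_mode, cond_ref in condition_cases
]
for command in commands:
    print(" ".join(command))
```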

Output

In the bi-clustering result file, the discovered bi-clusters are arranged in decreasing order of size. Each bi-cluster is described using three lines. For constant bi-clusters, the first line contains the serial number, the number of genes, the number of conditions, the reference gene and the average similarity of the bi-cluster. For additive bi-clusters, the first line contains the serial number, the number of genes, the number of conditions, the reference gene, the reference condition and the average similarity of the bi-cluster. The second line is the list of genes in the bi-cluster, and the third line is the list of conditions in the bi-cluster. For example, see constant_result_example.txt and additive_result_example.txt.
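
The three-line structure can be read back with a short script. The following Python sketch is only an illustration under strong assumptions: it assumes that fields are separated by whitespace, that the first line holds bare values in the order listed here, and that names contain no spaces. None of this is confirmed by the description, and the sample text is made up, so compare it with constant_result_example.txt or additive_result_example.txt before relying on it.

```python
def parse_biclusters(text):
    """Group non-empty lines in threes and split each line into fields.

    Assumption: whitespace-separated fields, with the first line of each
    bi-cluster in the documented order. Check a real result file first.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("expected three lines per bi-cluster")
    clusters = []
    for i in range(0, len(lines), 3):
        head = lines[i].split()
        clusters.append({
            "serial": head[0],
            "n_genes": int(head[1]),
            "n_conditions": int(head[2]),
            # One entry for constant bi-clusters (reference gene), two for
            # additive ones (reference gene, reference condition).
            "reference": head[3:-1],
            "similarity": float(head[-1]),
            "genes": lines[i + 1].split(),
            "conditions": lines[i + 2].split(),
        })
    return clusters


# Made-up sample in the assumed layout (not real MSBE output).
sample = (
    "1 3 2 geneA 0.91\n"
    "geneA geneB geneC\n"
    "cond1 cond2\n"
    "2 2 2 geneD 0.85\n"
    "geneD geneE\n"
    "cond2 cond3\n"
)
clusters = parse_biclusters(sample)
print(clusters[0]["n_genes"], clusters[0]["genes"])
```

Because the reference fields are taken as everything between the third value and the last one, the same function covers both the constant and the additive first-line layouts under these assumptions.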