Bioinformatics Project

Protein Subcellular Localization Using Evolutionary Information

 




TEAM MEMBERS



INTRODUCTION

Subcellular location is a key functional characteristic of proteins. However, experimental subcellular localization analysis is time-consuming and cannot be performed on a genome-wide scale. With the rapidly increasing number of sequences in databases, an accurate, reliable and efficient system is needed to automate the prediction of protein subcellular locations.

A number of subcellular localization methods have been introduced so far. These methods may be grouped into three categories. The first category is based on the existence of sorting signals in N-terminal sequences, including signal peptides, mitochondrial targeting peptides and chloroplast transit peptides. Emanuelsson et al. proposed an integrated prediction system using an artificial neural network based on individual sorting signal predictions. This system can be used to find cleavage sites in sorting signals and to simulate the real sorting process to a certain extent. Nevertheless, the prediction accuracy of methods based on sorting signals depends heavily on the quality of the protein N-terminal sequence assignment. Unfortunately, annotation of the N-terminus by known gene identification methods is usually unreliable. As a result, prediction accuracy and reliability decrease when signals are missing or only partially included.

The second category uses whole-sequence information such as amino acid composition, dipeptide composition and residue couples. This category traces back to the study by Nakashima & Nishikawa, who found that intracellular and extracellular proteins could be accurately discriminated by amino acid composition alone. Since then, various statistical and machine learning methods have been used to improve prediction accuracy based on amino acid composition, dipeptide composition or residue-couple composition.
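As an illustration of the whole-sequence features mentioned above, the following is a minimal Python sketch of amino acid composition and dipeptide composition. The function names, the fixed alphabet order and the choice to skip non-standard residues are assumptions made for this example; they are not taken from any of the cited methods.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids in a fixed (alphabetical) order.
# The order is an assumption; any consistent order works.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return 20 fractions, one per standard amino acid."""
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    counts = Counter(residues)
    total = len(residues)
    return [counts[a] / total for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 fractions, one per ordered pair of adjacent residues."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Skip pairs that contain a non-standard residue (assumption).
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not valid:
        raise ValueError("sequence contains no standard dipeptides")
    counts = Counter(valid)
    total = len(valid)
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    return [counts[k] / total for k in keys]
```

Either vector (or their concatenation) can then be passed to a statistical or machine learning classifier.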

The third category is integrative. It combines signal peptide information or whole-sequence information with other features such as physicochemical properties, protein domain information and n-grams. A number of multi-predictors have also been introduced in recent years. PSORT-B integrates amino acid composition, similarity to proteins of known localization, presence of a signal peptide, transmembrane alpha-helices and motifs corresponding to specific localizations. Bhasin & Raghava developed another integrative method that combines amino acid composition, composition of physicochemical properties, dipeptide composition, residue couples and EuPSI-BLAST.



OBJECTIVE



SYSTEM DESCRIPTION

 

1. Input the primary sequence of a protein whose subcellular compartment is unknown.
2. Use the input sequence as a query to search the SWISSPROT protein database with the PSI-BLAST program.
3. Collect the homologous proteins found in SWISSPROT and generate a position-specific scoring matrix (PSSM) from the multiple alignment of these sequences.
4. Calculate the amino acid composition from the N-terminal part of the PSSM and from the whole PSSM.
5. Input the resulting feature vector into a probabilistic neural network classifier.
6. Output the predicted subcellular compartment (location).
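
Steps 4 to 6 above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual implementation: the PSSM is assumed to be already parsed into a list of rows (one row per sequence position, one column per amino acid), "composition" is taken to mean the column-wise mean of the PSSM scores, the N-terminal length `n_term` and the smoothing parameter `sigma` are hypothetical values, and the classifier is the textbook Gaussian-kernel (Parzen window) form of a probabilistic neural network.

```python
import math


def column_means(rows):
    """Average each column of a non-empty list of equal-length rows."""
    n = len(rows)
    width = len(rows[0])
    return [sum(r[j] for r in rows) / n for j in range(width)]


def pssm_features(pssm, n_term=20):
    """Concatenate N-terminal and whole-matrix column means.

    n_term is a hypothetical N-terminal window length (assumption).
    """
    if not pssm:
        raise ValueError("empty PSSM")
    return column_means(pssm[:n_term]) + column_means(pssm)


def pnn_predict(x, train_x, train_y, sigma=1.0):
    """Classify x with a Gaussian-kernel probabilistic neural network.

    Each class score is the mean kernel response over that class's
    training vectors; the class with the highest score is returned.
    """
    sums, counts = {}, {}
    for xi, yi in zip(train_x, train_y):
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        sums[yi] = sums.get(yi, 0.0) + math.exp(-d2 / (2 * sigma ** 2))
        counts[yi] = counts.get(yi, 0) + 1
    scores = {label: sums[label] / counts[label] for label in sums}
    return max(scores, key=scores.get), scores
```

In use, `pssm_features` would be applied to the PSSM of every training protein and of the query protein, and `pnn_predict` would be called with the query's feature vector, the training vectors and their known compartments. The value of `sigma` controls how smooth the class density estimates are and would normally be tuned on held-out data.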



CONTACT

Any suggestions or comments are welcome. Please send them to Howard Leung.
