Function prediction system based on oligopeptide frequency distance



Overview

We have developed a method that assigns proteins of unknown function to COG categories based solely on oligopeptide frequency distance (OPD). The OPD method is suitable for predicting the functions of proteins with low sequence homology.
The function prediction procedure is as follows.

  1. Protein fragmentation
     Each protein is fragmented so that the analysis does not depend on sequence length.
  2. Calculation of di-, tri-, and tetrapeptide frequencies
     Dipeptide frequencies are calculated over the 20 amino acids.
     (The resulting 400-dimensional (= 20^2) vector is abbreviated as Di20.)
     Tripeptide frequencies are calculated over 11 degenerate residue groups, in which amino acids with similar physicochemical properties are treated as the same residue: {V, L, I}, {T, S}, {N, Q}, {E, D}, {K, R, H}, {Y, F, W}, {M}, {P}, {C}, {A} and {G}.
     (The resulting 1331-dimensional (= 11^3) vector is abbreviated as Tri11.)
     Tetrapeptide frequencies are calculated over 6 degenerate residue groups, in which amino acids with similar physicochemical properties are treated as the same residue: {V, L, I, M}, {T, S, P, G, A}, {E, D, N, Q}, {K, R, H}, {Y, F, W} and {C}.
     (The resulting 1296-dimensional (= 6^4) vector is abbreviated as Tetra6.)
  3. Direct comparison of the di-, tri-, and tetrapeptide frequencies with the oligopeptide frequency data of the proteins in the database
     The Euclidean distance between each fragment and each database protein is calculated, and the protein with the smallest distance becomes the prediction candidate.
  4. Selection of final candidate functions
     Of the functions selected for the individual fragments, those accounting for more than 60% of the selections become final candidates.
  5. Final prediction
     The protein function is predicted from the final candidates according to the selected final prediction condition.
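
The procedure above can be sketched in Python as follows. This is only a minimal illustration, not the program distributed with this system. The function names (`group_map`, `frequency_vector`, `nearest_function`, `final_candidates`), the normalization of counts to relative frequencies, the skipping of windows that contain non-standard residues, and the reading of the 60% rule as "a function assigned to more than 60% of the fragments" are all assumptions. The example covers Di20 and Tri11; Tetra6 would use the same function with the 6 residue groups and k = 4.

```python
from collections import Counter
from itertools import product
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Degenerate 11-group alphabet for Tri11, as listed in step 2.
TRI11_GROUPS = ["VLI", "TS", "NQ", "ED", "KRH", "YFW", "M", "P", "C", "A", "G"]


def group_map(groups):
    """Map every residue to the first letter of its group."""
    return {aa: g[0] for g in groups for aa in g}


def frequency_vector(seq, k, mapping=None):
    """Return the k-peptide relative frequencies of seq as a list of floats.

    Assumption: windows containing a residue outside the alphabet are skipped.
    """
    if mapping is None:
        mapping = {aa: aa for aa in AMINO_ACIDS}
    letters = sorted(set(mapping.values()))
    counts = Counter()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if all(aa in mapping for aa in window):
            counts["".join(mapping[aa] for aa in window)] += 1
    total = sum(counts.values())
    keys = ("".join(p) for p in product(letters, repeat=k))
    return [counts[key] / total if total else 0.0 for key in keys]


def nearest_function(vec, database):
    """Return the function label of the database entry closest to vec.

    database is a list of (function_label, vector) pairs (hypothetical layout).
    """
    return min(database, key=lambda entry: math.dist(vec, entry[1]))[0]


def final_candidates(fragment_functions, threshold=0.6):
    """Keep functions assigned to more than `threshold` of the fragments."""
    counts = Counter(fragment_functions)
    n = len(fragment_functions)
    return [f for f, c in counts.items() if c / n > threshold]


# Usage with a made-up fragment; "J" and "K" stand in for COG category labels.
di20 = frequency_vector("MKVLAAGIVG", k=2)                            # 400 values
tri11 = frequency_vector("MKVLAAGIVG", k=3, mapping=group_map(TRI11_GROUPS))  # 1331 values
candidates = final_candidates(["J", "J", "J", "K"])                   # ["J"]
```

In this sketch a function assigned to 3 of 4 fragments (75%) passes the threshold, while a 50/50 split yields no final candidate.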


Prediction of novel protein function

Protein function is predicted from the data in the input file.
Please upload a file in FASTA format and select the final prediction condition, fragmentation size, and step size.
File
※Please upload a FASTA file containing up to 1000 sequences written in one-letter amino acid codes
※The fragmentation size and step size are used for fragmentation
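
How the input file, fragmentation size, and step size fit together can be illustrated with a short Python sketch. It is not the code of this system. The function names `read_fasta` and `fragment`, the default step of 50, the handling of sequences shorter than the fragmentation size (returned as a single fragment), and the dropping of trailing residues that do not fill a whole window are assumptions made for illustration.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def fragment(seq, size=100, step=50):
    """Cut seq into windows of `size` residues, one window every `step` residues.

    Assumptions: a sequence no longer than `size` is returned as one fragment,
    and residues after the last full window are not covered.
    """
    if len(seq) <= size:
        return [seq]
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, step)]


# Usage with made-up sequences.
records = read_fasta(">protein1 example\nMKVLAA\nGIVG\n>protein2\nMSTNPKPQRK\n")
pieces = fragment("A" * 250, size=100, step=50)  # windows start at 0, 50, 100, 150
```

A step size smaller than the fragmentation size produces overlapping fragments, so each residue contributes to more than one fragment.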

Confirmation of protein function prediction result

Displays the protein function prediction result page.
Enter the function prediction result reference ID that was displayed when the prediction of novel protein function was submitted.

Function prediction system Download

You can download the program that implements the method used in this system and run it yourself.
Downloading is recommended in the following cases.
  • When you have a better analysis environment than this system
    CPU used by this service:
    Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz (cpu cores: 6, processors: 12) × 2
  • When many jobs are waiting to be processed by this system
  • When you want to analyze more than 1000 sequences at the same time
  • When you want to set the fragmentation size to an arbitrary value (other than 50, 100, or 200)
The download includes a file called README.docx.
Please refer to this file for an explanation of the method and of how to use the program.