MalSig - Malaria Secretory Signal Predictions


Help File

  1. Enter one or more sequences in FASTA format. Each entry consists of an identification line beginning with ">", followed by the amino acid sequence of the protein. The sequences can be typed in, or a file containing FASTA sequences can be uploaded using the "Browse..." button.

    Sequences should begin with M. Because the aim of this program is to predict classical and non-canonical secretory signals at or near the N-terminus of a protein, the N-terminal region of the protein sequence must be present.

    Any characters in the protein sequence that are not among the 20 single-letter amino acid codes 'acdefghiklmnpqrstvwy' are ignored by the program (this includes numbers and spaces).

  2. Select any advanced options required.

  3. Click on the "Submit sequences" button.

  4. A results table will be displayed showing the predictions made for your protein sequences. Note that this may take some time if advanced options are selected or many sequences are entered.
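The input handling described in steps 1 to 4 can be sketched in Python as follows. This is a minimal illustration, not the MalSig source: the function names and error wording are hypothetical, and treating lowercase and uppercase letters alike is an assumption.

```python
# Hypothetical sketch of MalSig-style input handling (not the server's code).

# The 20 standard one-letter amino acid codes listed in the help text.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text):
    """Split FASTA text into (identifier, raw_sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # text before the first ">" is ignored
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def clean_sequence(raw):
    """Keep only the 20 amino acid codes (assumed case-insensitive)."""
    return "".join(ch for ch in raw.upper() if ch in AMINO_ACIDS)


def check_sequence(seq):
    """Return an error message, or None if the sequence can be analysed."""
    if not seq:
        return "empty sequence"
    if not seq.startswith("M"):
        return "sequence does not begin with M"
    return None


fasta = ">prot1 example\nMKLL 123 VAF\n>prot2\nakllv"
for name, raw in parse_fasta(fasta):
    seq = clean_sequence(raw)
    print(name, seq, check_sequence(seq))
# prot1 example MKLLVAF None
# prot2 AKLLV sequence does not begin with M
```

Digits and spaces disappear in the cleaning step, so only the M check can reject a sequence here.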

Advanced Options

External predictions
As an additional source of information about possible secretory signals, predictions can also be obtained from PSORT, SignalP and PlasmoAP. The protein sequences are sent to these external servers, and the predicted cleavage sites (or, in the case of PlasmoAP, apicoplast targeting peptide predictions) are returned for each sequence and displayed in a table alongside the fuzzy logic predictions. Comparing the two sources makes the overall predictions more reliable, but increases the time the web page takes to display the results.

N-terminal cleavage
Recessed secretory signals are not predicted well by standard secretory signal prediction algorithms. To improve these predictions, the N-terminal region upstream of the hydrophobic region can be removed before the sequence is sent to the external prediction servers. Only use this option if you are interested in the presence of recessed secretory signals in your protein sequences.
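A minimal sketch of the N-terminal cleavage option, assuming the start of the hydrophobic region is already known as a 0-based residue index (the function names are hypothetical). Because an external server numbers residues from the start of whatever sequence it receives, a cleavage site reported for the trimmed sequence would need the removed length added back to refer to the full protein; whether MalSig itself performs this adjustment is not stated here.

```python
def trim_n_terminus(seq, hydrophobic_start):
    """Drop the residues upstream of the hydrophobic region.

    hydrophobic_start is a hypothetical 0-based index of the first residue
    of the hydrophobic region.
    """
    if not 0 <= hydrophobic_start < len(seq):
        raise ValueError("hydrophobic_start lies outside the sequence")
    return seq[hydrophobic_start:]


def to_full_length_position(site_in_trimmed, hydrophobic_start):
    """Convert a 1-based position in the trimmed sequence to the full sequence."""
    return site_in_trimmed + hydrophobic_start


full = "MAAAKLLLLV"
trimmed = trim_n_terminus(full, 4)
print(trimmed)                        # KLLLLV
print(to_full_length_position(3, 4))  # 7: the same L residue in the full sequence
```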

Explanation of Output

Fuzzy Logic Predictions
Four columns are present in the fuzzy logic section of the output table. The first three give the likelihood that the protein contains a classic signal, a recessed signal, and a transmembrane domain, respectively. Each ranges from 0 to 1; a score of 0.5 or more is considered significant and is highlighted. These three predictions are summarised textually in the column titled "Prediction".
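The 0.5 threshold and textual summary can be illustrated with the short sketch below. The labels and summary wording are hypothetical; the exact text MalSig writes in the "Prediction" column is not documented here.

```python
# Hypothetical sketch: the Prediction wording used by MalSig is not documented here.
THRESHOLD = 0.5  # scores at or above this are treated as significant

LABELS = ("classic signal", "recessed signal", "transmembrane domain")


def summarise(classic, recessed, transmembrane):
    """Return per-column highlight flags and a short textual summary."""
    scores = dict(zip(LABELS, (classic, recessed, transmembrane)))
    for label, score in scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{label} score {score} is outside [0, 1]")
    highlighted = {label: score >= THRESHOLD for label, score in scores.items()}
    hits = [label for label in LABELS if highlighted[label]]
    text = ", ".join(hits) if hits else "no signal predicted"
    return highlighted, text


flags, text = summarise(0.82, 0.10, 0.50)
print(text)  # classic signal, transmembrane domain
```

Note that a score of exactly 0.5 counts as significant, matching "0.5 or more".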

Low Complexity and Hydrophobicity
The low complexity scale gives the percentage of the sequence lying within low complexity regions, identified using SEG (Wootton and Federhen, 1996). The average hydrophobicity is calculated from the sum of the hydrophobicity values (on the Kyte and Doolittle scale) of the residues that remain after membrane-spanning regions and signal sequences have been removed. These scales were used in our analysis to identify potential antigens.
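The two scales can be sketched as below, assuming the low complexity regions (for example from SEG) and the membrane-spanning and signal regions are already available as 0-based, half-open (start, end) index pairs; that input format and the function names are hypothetical. Dividing the sum by the number of remaining residues is also an assumption about how the average is formed. The hydropathy values are the published Kyte and Doolittle scale.

```python
# Kyte and Doolittle (1982) hydropathy values for the 20 amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def covered(regions, length):
    """Set of 0-based residue indices inside half-open (start, end) regions."""
    indices = set()
    for start, end in regions:
        indices.update(range(max(0, start), min(end, length)))
    return indices


def percent_low_complexity(length, lc_regions):
    """Percentage of residues inside low complexity regions (e.g. from SEG)."""
    return 100.0 * len(covered(lc_regions, length)) / length


def average_hydrophobicity(seq, excluded_regions):
    """Mean Kyte-Doolittle value after removing the excluded regions."""
    excluded = covered(excluded_regions, len(seq))
    values = [KYTE_DOOLITTLE[aa] for i, aa in enumerate(seq) if i not in excluded]
    # Assumption: the sum is divided by the number of residues that remain.
    return sum(values) / len(values) if values else 0.0


print(percent_low_complexity(10, [(0, 3), (2, 5)]))  # 50.0
print(average_hydrophobicity("MILK", [(0, 1)]))      # mean of I, L and K
```

Overlapping regions are counted once, so the percentage cannot exceed 100.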

External Predictions
Cleavage site predictions are made by PSORT and SignalP. SignalP makes two predictions, one using neural networks (NN) and one using a hidden Markov model (HMM). The values shown for these cleavage sites are either amino acid positions or - / *, the latter indicating a problem in collecting results from the prediction server. In some cases SignalP does not predict a cleavage site: for the NN this results in a blank cell or a -, and for the HMM a -1 is displayed.
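The rules above for reading a SignalP cell can be collected into one function. This is a hypothetical helper, not part of MalSig. Because a - in the NN column can mean either "no site predicted" or "retrieval problem", the sketch reports it as ambiguous rather than guessing.

```python
def interpret_signalp_cell(value, method):
    """Classify a SignalP cell from the MalSig results table.

    Returns one of ("site", position), ("none", None), ("error", None)
    or ("ambiguous", None). method must be "NN" or "HMM".
    """
    v = value.strip()
    if v == "*":
        return ("error", None)  # results could not be collected
    if method == "HMM":
        if v == "-1":
            return ("none", None)  # HMM predicted no cleavage site
        if v == "-":
            return ("error", None)
    elif method == "NN":
        if v == "":
            return ("none", None)  # blank: NN predicted no cleavage site
        if v == "-":
            return ("ambiguous", None)  # no site, or a retrieval problem
    else:
        raise ValueError(f"unknown method: {method!r}")
    if v.isdigit():
        return ("site", int(v))  # amino acid position of the cleavage site
    raise ValueError(f"unrecognised {method} value: {value!r}")


for cell, method in [("23", "NN"), ("", "NN"), ("-", "NN"), ("-1", "HMM"), ("*", "HMM")]:
    print(method, repr(cell), interpret_signalp_cell(cell, method))
```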

PlasmoAP Symbol Definition
PlasmoAP makes apicoplast targeting predictions using the symbols ++, +, 0 and -. These are defined as:
++   very likely apicoplast
+    likely apicoplast
0    unknown
-    unlikely apicoplast
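When processing downloaded results, the PlasmoAP symbol definitions can be applied with a simple lookup, sketched here with a hypothetical function name. One caveat follows from this help file itself: - is also used for problems with external predictions, so an "unlikely apicoplast" reading may be worth confirming on the PlasmoAP site.

```python
# Meanings of the PlasmoAP symbols, as defined in this help file.
PLASMOAP_SYMBOLS = {
    "++": "very likely apicoplast",
    "+": "likely apicoplast",
    "0": "unknown",
    "-": "unlikely apicoplast",
}


def describe_plasmoap(symbol):
    """Translate a PlasmoAP cell into words."""
    s = symbol.strip()
    if s == "*":
        return "problem retrieving the PlasmoAP result"
    # Caution: "-" is also used for external-server problems, so an
    # "unlikely apicoplast" reading may be worth rechecking on PlasmoAP itself.
    return PLASMOAP_SYMBOLS.get(s, f"unrecognised symbol {s!r}")


print(describe_plasmoap("++"))  # very likely apicoplast
```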

If there is a problem with the input, such as the sequence not beginning with M, a * is displayed in the fuzzy prediction columns and the error message appears in the "Prediction" column.
If there is a problem with the predictions from an external source, a - or * is displayed in the column for that prediction. For affected proteins, try running your query directly on the web page of the external program (see the Links page for the address).

System details
This prediction server uses a fuzzy logic algorithm with Mamdani-style fuzzy inference (Mamdani and Assilian, 1975). The system is composed of six membership functions, three for input and three for output, and 20 rules, and it uses centroid defuzzification. The fuzzy logic prediction is made via a stand-alone fuzzy inference C file provided with MATLAB, and Python is used to process the sequences, obtain predictions, and present the results.
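To illustrate the inference style named above, here is a minimal Mamdani sketch: AND is taken as the minimum, rule outputs are clipped (min implication) and combined with max, and the result is defuzzified by taking the centroid over a discretised 0 to 1 output range. The two inputs, their triangular membership functions and the four rules are invented for illustration; they are not MalSig's membership functions or its 20 rules.

```python
def trimf(x, a, b, c):
    """Triangular membership function with feet a, c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical input terms: hydrophobic region length and maximum hydropathy.
INPUT_MFS = {
    "length": {"short": (0, 5, 12), "medium": (8, 15, 22), "long": (18, 25, 40)},
    "maxh": {"low": (-1.0, 0.5, 2.0), "high": (1.5, 3.0, 4.6)},
}
# Hypothetical output terms for a single "signal likelihood" score in [0, 1].
OUTPUT_MFS = {"unlikely": (-0.5, 0.0, 0.5), "likely": (0.5, 1.0, 1.5)}

# Each rule: (antecedents combined with AND, output term).
RULES = [
    ({"length": "medium", "maxh": "high"}, "likely"),
    ({"length": "short"}, "unlikely"),
    ({"length": "long"}, "unlikely"),
    ({"maxh": "low"}, "unlikely"),
]


def mamdani(inputs, rules=RULES, n=201):
    """Mamdani inference with centroid defuzzification on n grid points."""
    xs = [i / (n - 1) for i in range(n)]
    aggregated = [0.0] * n
    for antecedents, out_term in rules:
        # Rule firing strength: AND is taken as the minimum membership.
        strength = min(
            trimf(inputs[var], *INPUT_MFS[var][term])
            for var, term in antecedents.items()
        )
        a, b, c = OUTPUT_MFS[out_term]
        for i, x in enumerate(xs):
            # Clip the output set at the firing strength, aggregate with max.
            aggregated[i] = max(aggregated[i], min(strength, trimf(x, a, b, c)))
    total = sum(aggregated)
    if total == 0.0:
        return None  # no rule fired
    return sum(x * m for x, m in zip(xs, aggregated)) / total


print(round(mamdani({"length": 15, "maxh": 3.0}), 3))  # 0.835
print(round(mamdani({"length": 5, "maxh": 0.5}), 3))   # 0.165
```

A medium-length, strongly hydrophobic region fires only the "likely" rule and defuzzifies near the top of the range, while a short, weakly hydrophobic one lands near the bottom.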
When a protein sequence is entered, the program first generates a Kyte and Doolittle hydropathy plot with a window size of 15 (Kyte and Doolittle, 1982). From this plot it calculates where the hydrophobic region starts, its length, and the maximum hydrophobicity within it. These three values are then used as inputs to the fuzzy logic prediction.
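The feature extraction step can be sketched as follows. The 15-residue window comes from the text above; everything else is an assumption: the 1.6 cut-off that defines "hydrophobic" is only a placeholder (Kyte and Doolittle suggested it for membrane-spanning segments with a longer window), each window score is indexed by its first residue rather than its centre, and the region length counts the residues covered by a run of consecutive above-cut-off windows. The function names are hypothetical.

```python
# Kyte and Doolittle (1982) hydropathy values.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Assumption: MalSig's actual cut-off is not stated; 1.6 is only a placeholder.
THRESHOLD = 1.6


def hydropathy_profile(seq, window=15):
    """Mean hydropathy of each full window; entry i covers residues i..i+window-1."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq]
    if len(values) < window:
        return []
    total = sum(values[:window])
    profile = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]  # slide the window by one residue
        profile.append(total / window)
    return profile


def hydrophobic_region(seq, window=15, threshold=THRESHOLD):
    """Return (start, length, peak) of the first hydrophobic region, or None."""
    profile = hydropathy_profile(seq, window)
    for start, score in enumerate(profile):
        if score >= threshold:
            break
    else:
        return None  # no window reaches the threshold
    end = start
    while end + 1 < len(profile) and profile[end + 1] >= threshold:
        end += 1
    length = end - start + window  # residues covered by the run of windows
    peak = max(profile[start:end + 1])
    return start, length, peak


seq = "M" + "K" * 10 + "L" * 20 + "K" * 10
start, length, peak = hydrophobic_region(seq)
print(start, length, round(peak, 2))  # 7 28 3.8
```

The three returned values correspond to the three fuzzy inputs described above: where the region starts, its length, and its maximum hydrophobicity.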

Kyte, J. and Doolittle, R.F. (1982) A simple method for displaying the hydropathic character of a protein. Journal of Molecular Biology, 157, 105-132.
Mamdani, E.H. and Assilian, S. (1975) An experiment in linguistic synthesis with a fuzzy logic controller. International Journal of Man-Machine Studies, 7, 1-13.
Wootton, J.C. and Federhen, S. (1996) Analysis of compositionally biased regions in sequence databases. Methods in Enzymology, 266, 554-571.