Explanation of Output
Fuzzy Logic Predictions
The output table from the fuzzy logic prediction has four columns. The first three give the likelihood that the protein contains a classic signal, a recessed signal, and a transmembrane domain. Each value ranges from 0 to 1. A value of 0.5 or more is considered significant and is highlighted. The fourth column, titled "Prediction", summarises these three predictions as text.
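As an illustration of the 0.5 cut-off, the sketch below picks out the significant values from one row of the table. The function name, the dictionary keys and the example scores are invented for this example and are not taken from the server's code.

```python
# Threshold taken from the text: 0.5 or more is considered significant.
SIGNIFICANCE_THRESHOLD = 0.5


def significant_predictions(scores):
    """Return the names of the predictions that reach the threshold.

    `scores` maps a (hypothetical) column name to a likelihood in [0, 1].
    """
    return [name for name, value in scores.items()
            if value >= SIGNIFICANCE_THRESHOLD]


# Example row with made-up values.
row = {
    "classic signal": 0.82,
    "recessed signal": 0.10,
    "transmembrane domain": 0.50,
}
print(significant_predictions(row))
```

Note that a value of exactly 0.5 counts as significant, matching "0.5 or more" above.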
Low Complexity and Hydrophobicity
The low complexity scale represents the percentage of low complexity regions, identified using SEG (Wootton and Federhen, 1996). The average hydrophobicity is calculated from the sum of the hydrophobicity values (Kyte and Doolittle scale; Kyte and Doolittle, 1982) after removing membrane-spanning regions and signal sequences. These scales were used in our analysis to identify potential antigens.
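The two scales can be sketched roughly as follows. This is an illustration under stated assumptions, not the server's code: it assumes the average is the sum divided by the number of residues that remain after masking, that the low complexity percentage is the share of residues covered by SEG regions, and that regions are given as 0-based half-open (start, end) pairs. The function names are hypothetical.

```python
# Kyte and Doolittle (1982) hydropathy values for the 20 standard residues.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def average_hydrophobicity(sequence, excluded=()):
    """Mean hydropathy of the residues outside the excluded regions.

    `excluded` holds (start, end) pairs, 0-based and half-open, standing in
    for membrane-spanning regions and signal sequences (assumed format).
    Residues missing from the scale are skipped (also an assumption).
    """
    masked = set()
    for start, end in excluded:
        masked.update(range(start, end))
    values = [KD[aa] for i, aa in enumerate(sequence)
              if i not in masked and aa in KD]
    return sum(values) / len(values) if values else 0.0


def low_complexity_percentage(length, regions):
    """Percentage of residues covered by low complexity regions.

    Overlapping regions are counted once.
    """
    covered = set()
    for start, end in regions:
        covered.update(range(max(start, 0), min(end, length)))
    return 100.0 * len(covered) / length if length else 0.0


print(average_hydrophobicity("AAKK"))
print(average_hydrophobicity("AAKK", excluded=[(0, 2)]))
print(low_complexity_percentage(200, [(10, 30), (20, 60)]))
```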
Cleavage site predictions are made by PSORT and SignalP. SignalP makes two predictions, one using neural networks (NN) and one using a hidden Markov model (HMM). The values presented for these cleavage sites are either amino acid positions or the symbols - or *, which indicate a problem collecting results from the prediction server. In some cases SignalP does not predict a cleavage site. For the NN this results in a blank output or a -, and for the HMM a -1 is displayed.
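If you process the table with a script, the cleavage site cells can be normalised along the lines below. This is a sketch only: the function name, the method labels ("PSORT", "NN", "HMM") and the status strings are invented here. Because a - in the NN column can mean either "no site predicted" or a collection problem, the sketch reports that case as ambiguous rather than guessing.

```python
def parse_cleavage_site(raw, method):
    """Turn one table cell into (position, status).

    `method` is a hypothetical label: "PSORT", "NN" or "HMM".
    `position` is an int when a site was reported, otherwise None.
    """
    text = "" if raw is None else str(raw).strip()
    if text == "*":
        return None, "error"            # problem collecting results
    if text == "-":
        # For the NN a "-" may also mean that no site was predicted.
        return None, "ambiguous" if method == "NN" else "error"
    if text == "" and method == "NN":
        return None, "no_site"          # blank NN output
    if text == "-1" and method == "HMM":
        return None, "no_site"          # HMM reports -1 when no site is found
    if text.isdigit():
        return int(text), "ok"          # amino acid position
    return None, "unrecognised"


for cell, method in [("23", "PSORT"), ("-1", "HMM"), ("", "NN"), ("*", "NN")]:
    print(method, repr(cell), parse_cleavage_site(cell, method))
```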
PlasmoAP Symbol Definition
PlasmoAP makes apicoplast targeting predictions in terms of +, - and 0. These are defined as:
++   very likely apicoplast
+     likely apicoplast
-     unlikely apicoplast
If there is a problem with the input, such as the sequence not beginning with M, a * will be displayed in the fuzzy predictions and the error message will appear in the prediction column.
If there is a problem with predictions from an external source, a - or * will be displayed in the column for that prediction. For proteins where this occurs, try going to the web page of the external program and running your query from there (for the address see the Links page).
This prediction server uses a fuzzy logic algorithm with Mamdani-style fuzzy inference (Mamdani and Assilian, 1975). The system is composed of six membership functions, three for input and three for output, and 20 rules. It uses centroid defuzzification. The fuzzy logic prediction is made via a stand-alone fuzzy C-file provided by MATLAB, and Python is used to process sequences, obtain predictions, and present results.
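For readers unfamiliar with Mamdani inference, the toy example below shows the general mechanism: fuzzify the inputs, fire each rule with min, clip the output sets, combine them with max, and take the centroid. It is not the server's system. The two inputs, the triangular membership functions, their parameters and the two rules are all invented for illustration; the real system has 20 rules and its own membership functions.

```python
def triangle(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def mamdani_predict(max_hydro, length, steps=101):
    """Toy Mamdani inference returning a likelihood in [0, 1], or None.

    Hypothetical rules:
      1. IF max_hydro is high AND length is short THEN likelihood is high
      2. IF max_hydro is low THEN likelihood is low
    """
    # Fuzzification of the crisp inputs (made-up membership parameters).
    hydro_low = triangle(max_hydro, -2.0, 0.0, 2.0)
    hydro_high = triangle(max_hydro, 1.0, 3.0, 5.0)
    length_short = triangle(length, 0.0, 10.0, 20.0)

    # Rule firing strengths: AND is taken as the minimum.
    fire_high = min(hydro_high, length_short)
    fire_low = hydro_low

    # Clip each output set at its firing strength, aggregate with max,
    # then take the centroid over a sampled output universe [0, 1].
    numerator = 0.0
    denominator = 0.0
    for i in range(steps):
        y = i / (steps - 1)
        mu = max(min(fire_high, triangle(y, 0.5, 1.0, 1.5)),
                 min(fire_low, triangle(y, -0.5, 0.0, 0.5)))
        numerator += y * mu
        denominator += mu
    if denominator == 0.0:
        return None  # no rule fired, so the centroid is undefined
    return numerator / denominator


print(mamdani_predict(3.0, 10.0))   # strongly hydrophobic, short region
print(mamdani_predict(0.0, 10.0))   # weakly hydrophobic region
```

The centroid is approximated by sampling the output range at evenly spaced points, which is a common way to implement centroid defuzzification numerically.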
When a protein sequence is entered, the program first generates a hydrophobicity plot and uses this to calculate three values: where the hydrophobic region starts, its length, and the maximum hydrophobicity in the region. This uses a Kyte and Doolittle hydropathy plot with window size 15 (Kyte and Doolittle, 1982). These values are then used as inputs to generate a prediction using fuzzy logic.
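A windowed Kyte and Doolittle profile and the three derived values might be computed roughly as below. The window size of 15 comes from the text. Everything else is an assumption made for this sketch: the page does not say how the hydrophobic region is delimited, so the code takes the first contiguous run of windows scoring above a hypothetical threshold of 1.6, scores unknown residues as 0.0, and reports the start as a 0-based window index.

```python
# Kyte and Doolittle (1982) hydropathy values for the 20 standard residues.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=15):
    """Mean hydropathy of every full window along the sequence."""
    values = [KD.get(aa, 0.0) for aa in sequence]  # unknown residues -> 0.0
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def hydrophobic_region(profile, threshold=1.6):
    """Return (start, length, peak) of the first run above `threshold`.

    Returns None when no window exceeds the threshold. The threshold and
    the "first run" rule are assumptions, not the server's definition.
    """
    start = None
    for i, score in enumerate(profile):
        if score > threshold:
            if start is None:
                start = i
        elif start is not None:
            run = profile[start:i]
            return start, len(run), max(run)
    if start is not None:
        run = profile[start:]
        return start, len(run), max(run)
    return None


# Made-up sequence: charged N-terminus, hydrophobic stretch, polar tail.
sequence = "MKKKKK" + "L" * 20 + "D" * 20
profile = hydropathy_profile(sequence)
print(hydrophobic_region(profile))
```

The returned tuple corresponds to the three inputs described above (region start, region length, maximum hydrophobicity).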
Kyte, J. and Doolittle, R.F. (1982) A Simple Method for Displaying the Hydropathic Character of a Protein. Journal of Molecular Biology, 157, 105-132.
Mamdani, E.H. and Assilian, S. (1975) An experiment in linguistic synthesis with a fuzzy logic controller. International Journal of Man-Machine Studies, 7, 1-13.
Wootton, J.C. and Federhen, S. (1996) Analysis of compositionally biased regions in sequence databases. Methods in Enzymology, 266, 554-571.