PROSPERous: an integrative tool for predicting protease cleavage sites

 
General

The PROSPERous server facilitates the in silico identification of cleavage sites of various proteases (see the full list below). It covers four major protease families, Aspartic (A), Cysteine (C), Metallo (M) and Serine (S), encompassing 75 individual proteases. A number of cleavage site scoring functions are provided, based on different cleavage site windows P4-Pn', where n = 1, 2, 3 or 4.

A complete analysis of a submitted substrate sequence involves the following steps:

  1. Input sequence(s): Submit the substrate sequence(s) in the FASTA format.
  2. Select protease: Users need to specify a protease of interest in order to submit the sequence and predict potential cleavage sites of that protease.
  3. Select cleavage site P4-Pn': PROSPERous calculates a score at each position of the cleavage site window P4-Pn' (n = 1, 2, 3 or 4) using the selected scoring function. The choice of window therefore affects prediction performance. Depending on the protease family of interest, performance differs by 1 to 4% between window sizes.
  4. Select scoring function: Users need to specify one of the seven scoring functions, or the logistic regression model, to make the prediction. The individual scoring functions are Nearest Neighbor Similarity (NNS), Amino Acid Frequency (AAF), WebLogo-based Sequence conservation (WLS) and BLOSUM62 Substitution Index (BSI); the combined scoring functions are AAF+NNS, WLS+BSI and NNS+WLS.
  5. Select top ranking results: PROSPERous lets users list the top 1, 3, 5, 10 or 20 predicted results, which appear on the first results page.

 

Table 1. Statistics of the substrate datasets used to develop the PROSPERous server. All protease substrates were extracted from the MEROPS database (Rawlings et al., 2006; 2008). The substrate dataset of each protease can be downloaded by clicking the hyperlink of the MEROPS ID of the corresponding protease family in this table.

 

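Protease class | Protease family (MEROPS ID) | Number of substrate sequences | Number of cleavage sites
Aspartic protease | pepsin A (A01.001) | 11 | 34
Aspartic protease | cathepsin D (A01.009) | 38 | 141
Aspartic protease | cathepsin E (A01.010) | 17 | 60
Aspartic protease | phytepsin (A01.020) | 5 | 22
Aspartic protease | nemepsin-2 (A01.068) | 4 | 123
Aspartic protease | HIV-1 retropepsin (A02.001) | 284 | 473
Cysteine protease | papain (C01.001) | 5 | 28
Cysteine protease | cathepsin L (C01.032) | 21 | 63
Cysteine protease | cathepsin L1 (Fasciola sp.) (C01.033) | 6 | 172
Cysteine protease | cathepsin S (C01.034) | 6 | 23
Cysteine protease | cathepsin K (C01.036) | 99 | 115
Cysteine protease | falcipain-2 (C01.046) | 3 | 120
Cysteine protease | cathepsin B (C01.060) | 22 | 45
Cysteine protease | falcipain-3 (C01.063) | 2 | 97
Cysteine protease | peptidase 1 (mite) (C01.073) | 7 | 20
Cysteine protease | cathepsin B-like peptidase, nematode (C01.101) | 4 | 43
Cysteine protease | calpain-1 (C02.001) | 45 | 87
Cysteine protease | calpain-2 (C02.002) | 38 | 125
Cysteine protease | caspase-1 (C14.001) | 44 | 54
Cysteine protease | caspase-3 (C14.003) | 304 | 426
Cysteine protease | caspase-7 (C14.004) | 81 | 96
Cysteine protease | caspase-6 (C14.005) | 64 | 174
Cysteine protease | caspase-8 (C14.009) | 43 | 61
Metallopeptidase | matrix metallopeptidase-1 (M10.001) | 28 | 59
Metallopeptidase | matrix metallopeptidase-8 (M10.002) | 22 | 76
Metallopeptidase | matrix metallopeptidase-2 (M10.003) | 705 | 1661
Metallopeptidase | matrix metallopeptidase-9 (M10.004) | 47 | 225
Metallopeptidase | matrix metallopeptidase-3 (M10.005) | 56 | 155
Metallopeptidase | matrix metallopeptidase-7 (M10.008) | 47 | 105
Metallopeptidase | matrix metallopeptidase-12 (M10.009) | 27 | 119
Metallopeptidase | matrix metallopeptidase-13 (M10.013) | 29 | 94
Metallopeptidase | membrane-type matrix metallopeptidase-1 (M10.014) | 38 | 116
Metallopeptidase | membrane-type matrix metallopeptidase-3 (M10.016) | 4 | 20
Metallopeptidase | matrix metallopeptidase-20 (M10.019) | 6 | 26
Metallopeptidase | membrane-type matrix metallopeptidase-6 (M10.024) | 6 | 40
Metallopeptidase | mirabilysin (M10.057) | 2 | 28
Metallopeptidase | meprin beta subunit (M12.004) | 13 | 32
Metallopeptidase | procollagen C-peptidase (M12.005) | 18 | 20
Metallopeptidase | ADAM10 peptidase (M12.210) | 12 | 20
Metallopeptidase | ADAM17 peptidase (M12.217) | 23 | 37
Metallopeptidase | ADAMTS4 peptidase (M12.221) | 17 | 57
Metallopeptidase | ADAMTS5 peptidase (M12.225) | 15 | 37
Metallopeptidase | insulysin (M16.002) | 12 | 56
Metallopeptidase | mitochondrial processing peptidase beta-subunit (M16.003) | 52 | 54
Metallopeptidase | eupitrilysin (M16.009) | 10 | 34
Metallopeptidase | aminopeptidase Ap1 (M28.002) | 9 | 38
Serine protease | chymotrypsin A (cattle-type) (S01.001) | 221 | 531
Serine protease | granzyme B (Homo sapiens-type) (S01.010) | 265 | 318
Serine protease | kallikrein-related peptidase 5 (S01.017) | 11 | 21
Serine protease | kallikrein-related peptidase 14 (S01.029) | 15 | 34
Serine protease | elastase-2 (S01.131) | 191 | 321
Serine protease | cathepsin G (S01.133) | 168 | 270
Serine protease | myeloblastin (S01.134) | 8 | 21
Serine protease | granzyme A (S01.135) | 210 | 261
Serine protease | granzyme B, rodent-type (S01.136) | 156 | 162
Serine protease | chymase (Homo sapiens-type) (S01.140) | 25 | 33
Serine protease | kallikrein-related peptidase 2 (S01.161) | 13 | 27
Serine protease | kallikrein-related peptidase 3 (S01.162) | 12 | 45
Serine protease | coagulation factor Xa (S01.216) | 14 | 27
Serine protease | thrombin (S01.217) | 91 | 113
Serine protease | plasmin (S01.233) | 45 | 100
Serine protease | glutamyl peptidase I (S01.269) | 512 | 959
Serine protease | HtrA2 peptidase (S01.278) | 18 | 55
Serine protease | subtilisin Carlsberg (S08.001) | 5 | 27
Serine protease | high alkaline protease (Alkaliphilus transvaalensis) (S08.028) | 1 | 24
Serine protease | peptidase K (S08.054) | 6 | 39
Serine protease | kexin (S08.070) | 37 | 58
Serine protease | furin (S08.071) | 78 | 90
Serine protease | proprotein convertase 1 (S08.072) | 30 | 61
Serine protease | proprotein convertase 2 (S08.073) | 24 | 45
Serine protease | cucumisin (S08.092) | 1 | 20
Serine protease | prolyl oligopeptidase (S09.001) | 14 | 22
Serine protease | signal peptidase I (S26.001) | 291 | 291
Serine protease | thylakoidal processing peptidase (S26.008) | 49 | 50
Serine protease | signalase (animal) 21 kDa component (S26.010) | 359 | 359

[P4-P4' sequence logo column: images not shown]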

 

Detailed explanations of the individual fields of the input form are given below. Some fields contain default values.

Detailed explanations

Input sequence

Please input and submit the substrate sequence(s) in the FASTA format. PROSPERous accepts a maximum of 1000 substrate sequences at a time.

An example of two substrate sequences in the FASTA format is shown below:

>P55957
MDCEVNNGSSLRDECITNLLVFGFLQSCSDNSFRRELDALGHELPVLAPQWEGYDELQTDGNRSSHSRLGRIEADSESQEDIIRNIARHLAQVGDSMDRSIPPGLVNGLALQLRNTSRSEEDRNRDLATALEQLLQAYPRDMEKEKTMLVLALLLAKKVASHTPSLLRDVFHTTVNFINQNLRTYVRSLARNGMD
>O75496
MNPSMKQKQEEIKENIKNSSVPRRTLKMIQPSASGSLVGRENELSAGLSKRKHRNDHLTSTTSSPGVIVPESSENKNLGGVTQESFDLMIKENPSSQYWKEVAEKRRKALYEALKENEKLHKEIEQKDNEIARLKKENKELAEVAEHVQYMAELIERLNGEPLDNFESLDNQEFDSEEETVEDSLVEDSEIGTCAEGTVSSSTDAKPCI

If you have a larger number of substrate sequences to predict, we recommend that you contact us.
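Before submitting, it can help to check locally that the input is valid FASTA and within the 1000-sequence limit. The following is a minimal sketch only; the function name `parse_fasta` and the constant `MAX_SEQUENCES` are illustrative and are not part of PROSPERous.

```python
from typing import Dict

MAX_SEQUENCES = 1000  # per-submission limit stated on this page


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence} (hypothetical helper)."""
    records: Dict[str, str] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            if not header:
                raise ValueError("empty FASTA header")
            current = header.split()[0]  # identifier is the first token
            if current in records:
                raise ValueError(f"duplicate identifier: {current}")
            records[current] = ""
        else:
            if current is None:
                raise ValueError("sequence data found before any '>' header")
            records[current] += line.upper()  # join multi-line sequences
    if len(records) > MAX_SEQUENCES:
        raise ValueError(f"{len(records)} sequences exceed the limit of {MAX_SEQUENCES}")
    return records
```

For example, `parse_fasta(">P55957\nMDCEVNNGSS\nLRDECITNLL\n")` returns a dictionary with the single key `P55957` whose value is the two lines joined into one sequence.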

Select protease

Please select one of the proteases from the drop-down menu in order to submit your query sequence. PROSPERous can predict the substrate cleavage sites for 90 proteases.

 

Select cleavage site P4-Pn':

Users select one of the following cleavage site windows: P4-P1', P4-P2', P4-P3' or P4-P4'.
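To make the window notation concrete: the scissile bond lies between P1 and P1', so a P4-Pn' window spans four residues before the bond and n residues after it. The sketch below enumerates every such window in a sequence. The names `WINDOWS` and `iter_windows` are illustrative assumptions, not the server's code.

```python
from typing import Iterator, Tuple

# Number of primed-side residues (P1'..Pn') for each selectable window.
WINDOWS = {"P4-P1'": 1, "P4-P2'": 2, "P4-P3'": 3, "P4-P4'": 4}


def iter_windows(seq: str, window: str = "P4-P2'") -> Iterator[Tuple[int, str]]:
    """Yield (P1 position, window string) for every complete window.

    The P1 position is 1-based; the candidate bond is between P1 and P1'.
    """
    n_prime = WINDOWS[window]
    for p1 in range(4, len(seq) - n_prime + 1):
        # Four residues up to and including P1, then n_prime residues after it.
        yield p1, seq[p1 - 4 : p1 + n_prime]
```

Positions too close to either terminus to fill the whole window are skipped in this sketch; how the server treats such terminal positions is not stated here.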

Cleavage site scoring functions:

The scoring function is a critical determinant of the prediction performance of the tool. PROSPERous offers several types of scoring functions: Nearest Neighbor Similarity (NNS), Amino Acid Frequency (AAF), WebLogo-based Sequence conservation (WLS) and BLOSUM62 Substitution Index (BSI), as well as three pairwise combinations, namely AAF+NNS, WLS+BSI and NNS+WLS. For each protease, the scoring functions used to assess and rank the potential cleavage sites are further used as input features for a logistic regression model.
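The exact formulas are not given on this page, so the following is only a rough sketch of the general idea behind a frequency-based score and a logistic combination of scores. It assumes that an AAF-style score is the mean position-specific frequency of a candidate window's residues among known cleavage sites, and that the logistic model applies a weighted sum of scores; the function names, weights and bias are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def position_frequencies(known_sites: Sequence[str]) -> List[Dict[str, float]]:
    """Residue frequencies at each position of equal-length known cleavage windows."""
    length = len(known_sites[0])
    if any(len(site) != length for site in known_sites):
        raise ValueError("all known sites must have the same window length")
    freqs = []
    for pos in range(length):
        counts = Counter(site[pos] for site in known_sites)
        freqs.append({aa: c / len(known_sites) for aa, c in counts.items()})
    return freqs


def aaf_score(window: str, freqs: List[Dict[str, float]]) -> float:
    """Assumed AAF-style score: mean frequency of the window's residues (0 to 1)."""
    return sum(f.get(aa, 0.0) for aa, f in zip(window, freqs)) / len(freqs)


def logistic_combine(scores: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Map several scores to a probability with hypothetical, pre-fitted weights."""
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))
```

In this sketch, a window that matches the most common residue at every position scores highest, and a window containing only residues never seen at those positions scores 0.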

 

Flowchart

Fig. 1. The flowchart of the PROSPERous web server

 

The flowchart of PROSPERous is shown in Fig. 1. Processing a query sequence using the server involves several steps. First, users need to choose a suitable cleavage site window P4-Pn' (n = 1, 2, 3 or 4) for scoring the potential cleavage sites. The choice of window is important for the prediction and primarily depends on expert knowledge. In the absence of such knowledge, we recommend that users choose the P4-P2' window, as previous studies have indicated that this window gives the best overall performance for predicting cleavage sites of a number of proteases. Second, users need to choose an appropriate scoring function, or a combination of two such functions. Upon query submission, the submitted sequence is scanned against the known cleavage site database. The score of each potential P4-Pn' cleavage site is calculated with the selected scoring function, and the top-ranking results are displayed on the screen.
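The scan, score and rank steps can be sketched as follows. This is an illustration under stated assumptions rather than the server's implementation: `top_cleavage_sites` is a hypothetical helper, the scoring function is passed in as a plain callable, and `toy_score` is a made-up stand-in that merely counts matches to an example P4-P1 motif.

```python
import heapq
from typing import Callable, List, Tuple


def top_cleavage_sites(
    seq: str,
    score_fn: Callable[[str], float],
    n_prime: int = 2,  # 2 corresponds to the recommended P4-P2' window
    top_n: int = 5,
) -> List[Tuple[float, int, str]]:
    """Return the top_n (score, P1 position, window) candidates, best first."""
    candidates = (
        (score_fn(seq[p1 - 4 : p1 + n_prime]), p1, seq[p1 - 4 : p1 + n_prime])
        for p1 in range(4, len(seq) - n_prime + 1)  # p1 is the 1-based P1 position
    )
    return heapq.nlargest(top_n, candidates, key=lambda c: c[0])


def toy_score(window: str) -> float:
    """Made-up score: fraction of P4-P1 residues matching the motif DEVD."""
    return sum(a == b for a, b in zip(window[:4], "DEVD")) / 4
```

Calling `top_cleavage_sites(seq, toy_score, top_n=10)` on a parsed sequence would list the ten highest-scoring candidate bonds, mirroring the "top ranking results" option of the input form.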

 

If you have any queries about PROSPERous or suggestions to improve it, please email Jiangning.Song@monash.edu