BactPC is a user-friendly tool that classifies your sequences in just two steps.
1.1 Upload your sequence
Upload the sequences through the input fields. You can either enter or paste the sequences manually in the text box (in FASTA format) or upload a file that contains the sequence information. Supported formats are .fasta, .fna, .ffn, .faa, .frn, .fa, and .txt.
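If you want to check your input before uploading it, the sketch below parses FASTA-formatted text locally. It is only an illustration: the function name `parse_fasta`, the example headers, and the example sequences are hypothetical, and BactPC's own input validation may differ. The sketch does not check the sequence alphabet, because the accepted residue set is not specified here.

```python
from typing import Dict, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into a {header: sequence} mapping."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            if not header:
                raise ValueError("empty FASTA header")
            if header in records:
                raise ValueError(f"duplicate header: {header}")
            records[header] = ""
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Sequences may span several lines; join them into one string.
            records[header] += line.upper()
    for name, sequence in records.items():
        if not sequence:
            raise ValueError(f"no sequence given for: {name}")
    return records


# Made-up example input, not real bacteriocin sequences.
example = """>query_1 hypothetical peptide
MKKIEKLTEK
EMANIIGG
>query_2
MSTKDFNLDL
"""

records = parse_fasta(example)
for name, sequence in records.items():
    print(name, len(sequence))
```

A file in any of the supported formats can be checked the same way by reading it as text and passing the contents to `parse_fasta`.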
1.2 Click the Submit button to run the query
Your results will be displayed in the classification table, which has the following columns:
1. Sequence - your query sequence
2. Class - Prediction result (Bacteriocin / Non-Bacteriocin)
3. Probability - Probability of the prediction being correct
4. Subclass - Subclassification result if the sequence is a bacteriocin (Lantibiotic, Class II, Channel forming colicin family, Colicin/Pyocin nuclease family, Colicins ColE2/ColE8/ColE9 and pyocins S1/S2 family, Other)
5. Probability - Probability of the subclassification being correct
Note: the Other category covers Class III, Class IV, Class V, unclassified bacteriocins, and thiocillin.
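The columns above imply a two-stage flow: every sequence gets a class and a probability, and only sequences predicted as bacteriocins get a subclass and a second probability. The sketch below illustrates that flow with placeholder models. The names `ResultRow`, `classify`, `stub_class_model`, and `stub_subclass_model` are hypothetical, and the stub rules and probabilities are arbitrary stand-ins, not BactPC's trained models.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# A model is represented as a callable returning (label, probability).
Model = Callable[[str], Tuple[str, float]]


@dataclass
class ResultRow:
    sequence: str
    predicted_class: str
    probability: float
    subclass: Optional[str] = None
    subclass_probability: Optional[float] = None


def classify(sequence: str, class_model: Model, subclass_model: Model) -> ResultRow:
    """Run stage 1 for every sequence; run stage 2 only for bacteriocins."""
    label, probability = class_model(sequence)
    row = ResultRow(sequence, label, probability)
    if label == "Bacteriocin":
        row.subclass, row.subclass_probability = subclass_model(sequence)
    return row


def stub_class_model(sequence: str) -> Tuple[str, float]:
    # Placeholder rule with made-up probabilities; it has no biological meaning.
    if len(sequence) >= 6:
        return ("Bacteriocin", 0.97)
    return ("Non-Bacteriocin", 0.88)


def stub_subclass_model(sequence: str) -> Tuple[str, float]:
    # Placeholder: always returns the same subclass.
    return ("Class II", 0.91)


for query in ["MKKIEKLTEK", "MSTK"]:
    print(classify(query, stub_class_model, stub_subclass_model))
```

For a sequence predicted as Non-Bacteriocin, the subclass fields stay empty, which matches a table in which the Subclass column is filled only for bacteriocins.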
The prediction (Bacteriocin / Non-Bacteriocin) and the subclassification of bacteriocins are done by two different SVM models trained on different datasets. A support vector classifier (SVC) was selected based on its performance in a grid search against other ML algorithms (Logistic Regression, Random Forest, GaussianNB). Both models use the RBF kernel. The prediction model achieved an accuracy of 99.29% on the training data and 92.14% on the test data. The classification model achieved an accuracy of 100% on the training data and 94.64% on the test data.
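The model selection described above can be sketched with scikit-learn's `GridSearchCV`. This is a minimal illustration, not the actual BactPC training code: the data are synthetic (from `make_classification`), and the parameter grids, the 80/20 split, the feature scaling, and `probability=True` are all assumptions, since the features and settings used for BactPC are not described here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a sequence-derived feature matrix (assumption).
X, y = make_classification(n_samples=700, n_features=20, n_informative=8,
                           random_state=0)
# Assumed 80/20 stratified split; the real split is not stated.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Candidate models with small, illustrative parameter grids (assumptions).
candidates = {
    "SVC": (
        make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True)),
        {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01, 0.1]},
    ),
    "LogisticRegression": (
        make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        {"logisticregression__C": [0.1, 1, 10]},
    ),
    "RandomForest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [100, 200]},
    ),
    "GaussianNB": (
        GaussianNB(),
        {"var_smoothing": [1e-9, 1e-7]},
    ),
}

results = {}
for name, (model, grid) in candidates.items():
    # 5-fold cross-validated grid search on the training split only.
    search = GridSearchCV(model, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "cv_accuracy": search.best_score_,
        "test_accuracy": search.score(X_test, y_test),
    }

for name, result in results.items():
    print(f"{name}: cv={result['cv_accuracy']:.3f} "
          f"test={result['test_accuracy']:.3f} {result['best_params']}")
```

Setting `probability=True` lets the fitted SVC expose `predict_proba`, which is one way to obtain a per-prediction probability like the one shown in the results table.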
Prediction model

| Class | Precision (training) | Recall (training) | F1-score (training) | Precision (test) | Recall (test) | F1-score (test) |
|---|---|---|---|---|---|---|
| Bacteriocins | 0.99 | 1.00 | 0.99 | 0.96 | 0.89 | 0.92 |
| Non-Bacteriocins | 1.00 | 0.99 | 0.99 | 0.89 | 0.96 | 0.92 |

Accuracy: 0.99 (training data), 0.92 (test data)
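The F1-score in these tables is the harmonic mean of precision and recall. As a quick worked check, the sketch below recomputes the test-data F1-scores of the prediction model from the reported precision and recall values. The helper name `f1_from_precision_recall` is hypothetical, and because the inputs are already rounded to two decimals, the result is only expected to agree after rounding.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) on the test data, as reported for the prediction model.
test_rows = {
    "Bacteriocins": (0.96, 0.89),
    "Non-Bacteriocins": (0.89, 0.96),
}

for label, (precision, recall) in test_rows.items():
    f1 = f1_from_precision_recall(precision, recall)
    print(f"{label}: F1 = {f1:.2f}")
```

Both rows round to 0.92, matching the reported F1-scores.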
Classification model

| Class | Precision (training) | Recall (training) | F1-score (training) | Precision (test) | Recall (test) | F1-score (test) |
|---|---|---|---|---|---|---|
| Lantibiotic | 1.00 | 1.00 | 1.00 | 0.96 | 0.92 | 0.94 |
| Class II | 1.00 | 1.00 | 1.00 | 0.91 | 0.95 | 0.93 |
| Channel forming colicin family | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 | 0.99 |
| Colicin/Pyocin nuclease family | 1.00 | 1.00 | 1.00 | 0.97 | 0.97 | 0.97 |
| Colicins ColE2/ColE8/ColE9 and pyocins S1/S2 family | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Other | 1.00 | 1.00 | 1.00 | 0.78 | 0.82 | 0.80 |

Accuracy: 1.00 (training data), 0.95 (test data)
| Model | Class | No. of sequences per class | Total no. of sequences |
|---|---|---|---|
| Prediction model | Bacteriocins | 350 | 700 |
| | Non-Bacteriocins | 350 | |
| Classification model | Lantibiotic | 150 | 836 |
| | Class II | 150 | |
| | Channel forming colicin family | 150 | |
| | Colicin/Pyocin nuclease family | 150 | |
| | Colicins ColE2/ColE8/ColE9 and pyocins S1/S2 family | 150 | |
| | Other | 86 | |
The sequences for training and testing the models have been collected from the UniProtKB database. The Bactibase database was used for cross-verification.
UniProtKB - https://www.uniprot.org/
Bactibase - http://bactibase.hammamilab.org/