
Learn more about BactPC

Gain insights into the user manual, the algorithm behind BactPC, and information on the datasets.

User Manual


1. Submission

BactPC is a user-friendly tool: you can classify your sequences in just two steps.
1.1 Upload your sequence
Upload your sequences through the input fields. You can either enter or paste the sequences manually into the text box (in FASTA format), or upload a file that contains the sequences. Supported file formats are .fasta, .fna, .ffn, .faa, .frn, .fa and .txt.
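A minimal sketch of how a FASTA submission can be read and checked before it is sent, assuming plain FASTA text (a header line starting with ">", followed by one or more sequence lines). The names parse_fasta and is_supported_file are hypothetical helpers, not part of BactPC; only the list of extensions comes from this page, and exact case-sensitive extension matching is an assumption.

```python
from __future__ import annotations

from pathlib import Path

# Extensions listed on this page as accepted for file upload
SUPPORTED_EXTENSIONS = {".fasta", ".fna", ".ffn", ".faa", ".frn", ".fa", ".txt"}


def is_supported_file(filename: str) -> bool:
    """Return True if the extension is in the supported list (exact, case-sensitive match assumed)."""
    return Path(filename).suffix in SUPPORTED_EXTENSIONS


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}.

    A header line starts with '>'; the lines after it, up to the next header,
    are joined into one sequence. Blank lines are skipped, and a repeated
    header overwrites the earlier record with the same name.
    """
    records: dict[str, str] = {}
    header: str | None = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Arbitrary example strings, not real bacteriocins
query = """>query_1 arbitrary example peptide
MKTAYIAKQR
QISFVKSHFS
>query_2
GGLLKKWWKK
"""
print(parse_fasta(query))
# {'query_1 arbitrary example peptide': 'MKTAYIAKQRQISFVKSHFS', 'query_2': 'GGLLKKWWKK'}
```

Multi-line sequences are joined, so a sequence wrapped at any line width is read the same way as one written on a single line.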


1.2 Click the submit button to run the query.


2. Results

Your results will be displayed in a classification table.



3. Column description

1. Sequence - your query sequence
2. Class - prediction result (Bacteriocin / Non-Bacteriocin)
3. Probability - the model's estimated probability that the prediction is correct
4. Subclass - subclassification result, given only if the sequence is predicted to be a bacteriocin (Lantibiotic, Class II, Channel forming colicin family, Colicin/Pyocin nuclease family, Colicins ColE2/ColE8/ColE9 and pyocins S1/S2 family, or Other)
5. Probability - the model's estimated probability that the subclassification is correct

Note: the Other category covers class III, IV and V bacteriocins, unclassified bacteriocins, and thiocillin.
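Each table row can be thought of as the output of a two-step cascade: the prediction model labels the sequence, and the subclassification model runs only when the sequence is predicted to be a bacteriocin. The sketch below illustrates that flow under stated assumptions: predictor and subclassifier are hypothetical callables that map a feature vector to a {label: probability} dict, and because this page does not describe how BactPC encodes sequences, the features are passed in ready-made.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence

# Hypothetical model interface: feature vector -> {label: probability}
Model = Callable[[Sequence[float]], Dict[str, float]]


@dataclass
class ResultRow:
    """One row of the classification table (columns 1-5 above)."""
    sequence: str
    label: str                                   # Bacteriocin / Non-Bacteriocin
    probability: float                           # probability of that label
    subclass: Optional[str] = None               # set only for predicted bacteriocins
    subclass_probability: Optional[float] = None


def _top(probabilities: Dict[str, float]) -> tuple[str, float]:
    """Return the most probable label and its probability."""
    label = max(probabilities, key=probabilities.get)
    return label, probabilities[label]


def classify(sequence: str, features: Sequence[float],
             predictor: Model, subclassifier: Model) -> ResultRow:
    label, probability = _top(predictor(features))
    row = ResultRow(sequence, label, probability)
    if label == "Bacteriocin":
        # Second stage runs only when the first stage predicts a bacteriocin
        row.subclass, row.subclass_probability = _top(subclassifier(features))
    return row


# Stub models with fixed outputs, for illustration only
predictor = lambda f: {"Bacteriocin": 0.91, "Non-Bacteriocin": 0.09}
subclassifier = lambda f: {"Lantibiotic": 0.72, "Class II": 0.18, "Other": 0.10}
print(classify("MKTAYIAKQR", [0.0] * 20, predictor, subclassifier))
# ResultRow(sequence='MKTAYIAKQR', label='Bacteriocin', probability=0.91, subclass='Lantibiotic', subclass_probability=0.72)
```

For a sequence predicted as Non-Bacteriocin, the subclass fields stay None, which corresponds to an empty Subclass column.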


Algorithm


Prediction and classification of bacteriocins are performed by two separate SVM models, each trained on its own dataset. The support vector classifier (SVC) was selected after a grid search comparing its performance with other machine learning algorithms (Logistic Regression, Random Forest and Gaussian Naive Bayes). Both models use an SVC with an RBF kernel. The prediction model achieved an accuracy of 99.29% on the training data and 92.14% on the test data, while the classification model achieved 100% on the training data and 94.64% on the test data.
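The model-selection step described above can be sketched with scikit-learn as follows. This is an illustration, not BactPC's code: the features are synthetic placeholders from make_classification (the page does not describe how sequences are encoded), the parameter grids are hypothetical, and the 80/20 stratified split is an assumption. Setting probability=True makes SVC fit an internal Platt-scaling step so it can report class probabilities, which a Probability column would need.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Placeholder features: BactPC's actual sequence encoding is not described here
X, y = make_classification(n_samples=700, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0  # assumed 80/20 split
)

# Hypothetical search spaces; the grids actually used are not listed on this page
candidates = {
    "SVC": (SVC(kernel="rbf", probability=True, random_state=0),
            {"C": [1, 10], "gamma": ["scale", 0.01]}),
    "LogisticRegression": (LogisticRegression(max_iter=1000), {"C": [0.1, 1, 10]}),
    "RandomForest": (RandomForestClassifier(random_state=0), {"n_estimators": [100, 200]}),
    "GaussianNB": (GaussianNB(), {"var_smoothing": [1e-9, 1e-8]}),
}

results = {}
for name, (model, grid) in candidates.items():
    # 5-fold cross-validated grid search, then refit the best setting on all training data
    search = GridSearchCV(model, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    results[name] = (search.best_score_, search.score(X_test, y_test))

# Rank candidates by cross-validated accuracy
for name, (cv_acc, test_acc) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(f"{name:20s} cv={cv_acc:.3f} test={test_acc:.3f}")
```

Ranking by cross-validated accuracy rather than test accuracy keeps the test set out of the selection decision; which algorithm wins on these synthetic features says nothing about BactPC's data.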



Prediction model

                     Training data                 Test data
                     Precision  Recall  F1-score   Precision  Recall  F1-score
Bacteriocins         0.99       1.00    0.99       0.96       0.89    0.92
Non-Bacteriocins     1.00       0.99    0.99       0.89       0.96    0.92
Accuracy             0.99                          0.92
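The F1-scores in the table are the harmonic mean of precision and recall, F1 = 2 × P × R / (P + R). Recomputing them from the rounded table values is a quick consistency check; small last-digit differences would be expected in general because the inputs are already rounded.

```python
def harmonic_f1(precision: float, recall: float) -> float:
    """F1-score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Test-set values from the prediction-model table
print(round(harmonic_f1(0.96, 0.89), 2))  # Bacteriocins     → 0.92
print(round(harmonic_f1(0.89, 0.96), 2))  # Non-Bacteriocins → 0.92
# Training-set value for Bacteriocins
print(round(harmonic_f1(0.99, 1.00), 2))  # → 0.99
```

Because the harmonic mean is symmetric in P and R, swapping the two values (as happens between the two test-set rows) gives the same F1-score.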


Classification model

                                                     Training data                 Test data
                                                     Precision  Recall  F1-score   Precision  Recall  F1-score
Lantibiotic                                          1.00       1.00    1.00       0.96       0.92    0.94
Class II                                             1.00       1.00    1.00       0.91       0.95    0.93
Channel forming colicin family                       1.00       1.00    1.00       1.00       0.98    0.99
Colicin/Pyocin nuclease family                       1.00       1.00    1.00       0.97       0.97    0.97
Colicins ColE2/ColE8/ColE9 and pyocins S1/S2 family  1.00       1.00    1.00       1.00       1.00    1.00
Other                                                1.00       1.00    1.00       0.78       0.82    0.80
Accuracy                                             1.00                          0.95
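Per-class tables like the two above (precision, recall and F1-score per class, plus overall accuracy) are what scikit-learn's classification_report produces. The labels below are a tiny made-up example, not BactPC data, chosen so the numbers are easy to verify by hand.

```python
from sklearn.metrics import classification_report

# Made-up labels for illustration only (not BactPC results)
y_true = ["Lantibiotic", "Lantibiotic", "Class II", "Class II", "Other", "Other"]
y_pred = ["Lantibiotic", "Class II", "Class II", "Class II", "Other", "Lantibiotic"]

# Text table with per-class precision, recall, F1-score and overall accuracy
print(classification_report(y_true, y_pred, digits=2))

# The same numbers as a dict, e.g. for further processing
report = classification_report(y_true, y_pred, output_dict=True)
# Class II: 2 of 3 "Class II" predictions are correct, and both true Class II items are found
print(report["Class II"]["precision"], report["Class II"]["recall"])
# 4 of the 6 labels are predicted correctly
print(report["accuracy"])
```

As in the table above, a class can have high recall but lower precision (Class II here) when other classes are misassigned to it.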

Data information


Sequence samples

Number of sequences per class:

Prediction model (700 sequences in total)
  Bacteriocins: 350
  Non-Bacteriocins: 350
Classification model (836 sequences in total)
  Lantibiotic: 150
  Class II: 150
  Channel forming colicin family: 150
  Colicin/Pyocin nuclease family: 150
  Colicins ColE2/ColE8/ColE9 and pyocins S1/S2 family: 150
  Other: 86
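The totals follow from the per-class counts (2 × 350 = 700 and 5 × 150 + 86 = 836). This page does not state how the sequences were split into training and test data; the sketch below assumes a hypothetical 80/20 split using scikit-learn's rounding convention for a fractional test size (the test set gets ceil(0.2 × n) sequences). Under that assumption, the accuracies reported in the Algorithm section correspond to whole numbers of correctly classified sequences.

```python
from __future__ import annotations

import math

# Per-class counts copied from the table above
prediction_counts = {"Bacteriocins": 350, "Non-Bacteriocins": 350}
classification_counts = {
    "Lantibiotic": 150,
    "Class II": 150,
    "Channel forming colicin family": 150,
    "Colicin/Pyocin nuclease family": 150,
    "Colicins ColE2/ColE8/ColE9 and pyocins S1/S2 family": 150,
    "Other": 86,
}


def split_sizes(n_samples: int, test_fraction: float = 0.2) -> tuple[int, int]:
    """Train/test sizes for a fractional test size, rounding the test set up
    (the convention scikit-learn's train_test_split uses). The 0.2 default is
    an assumption; BactPC's actual split is not documented here."""
    n_test = math.ceil(test_fraction * n_samples)
    return n_samples - n_test, n_test


n_pred = sum(prediction_counts.values())      # 700
n_cls = sum(classification_counts.values())   # 836
pred_train, pred_test = split_sizes(n_pred)   # (560, 140) under the assumption
cls_train, cls_test = split_sizes(n_cls)      # (668, 168) under the assumption

# Number of correct predictions implied by each reported accuracy
for name, accuracy, n in [
    ("prediction, train", 0.9929, pred_train),
    ("prediction, test", 0.9214, pred_test),
    ("classification, train", 1.0, cls_train),
    ("classification, test", 0.9464, cls_test),
]:
    correct = round(accuracy * n)
    print(f"{name}: {correct}/{n} = {correct / n:.2%}")
```

With these assumed sizes, 129/140 reproduces 92.14% and 159/168 reproduces 94.64%; this only shows the figures are arithmetically consistent with such a split, not that it was the split used.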

Databases

The sequences used to train and test the models were collected from the UniProtKB database, and the Bactibase database was used for cross-verification.

UniProtKB - https://www.uniprot.org/

Bactibase - http://bactibase.hammamilab.org/
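Collecting candidate sequences from UniProtKB can be scripted against the public UniProt REST API. The sketch below only builds a search URL that returns FASTA and fetches one page of results; the helper names are hypothetical, the query string in the usage note is an example assumption rather than the query BactPC's authors used, and pagination for result sets larger than one page is not handled.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def uniprot_fasta_url(query: str, size: int = 500) -> str:
    """Build a UniProtKB search URL that returns matching entries as FASTA."""
    params = {"query": query, "format": "fasta", "size": size}
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"


def fetch_fasta(query: str, size: int = 500) -> str:
    """Download one page of search results as FASTA text (requires network access)."""
    with urlopen(uniprot_fasta_url(query, size)) as response:
        return response.read().decode("utf-8")
```

For example, fetch_fasta("bacteriocin AND reviewed:true", size=10) would request up to ten reviewed (Swiss-Prot) entries matching the free-text term "bacteriocin"; the returned FASTA text can then be split into records with any FASTA parser.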