The metaDBSite algorithm

Our metaDBSite approach integrates six popular web servers: DISIS, DNABindR, BindN, BindN-RF, DP-Bind and DBS-PRED. Python programs automatically submit the query sequence to each web server, retrieve the result pages and parse the predicted binding residues. Each of the six servers therefore predicts, for every residue, whether it is a DNA-binding residue, and these six predictions serve as the six input features of a support vector machine (LibSVM). MetaDBSite decides whether a residue is a DNA-binding residue using LibSVM models trained on a non-redundant dataset of 316 protein-DNA complexes derived from the PDB. The result page lists, for each residue of the query sequence, the predictions of all six methods together with the metaDBSite prediction. Because the servers are queried in parallel on our Ubuntu 9.04 system, a query normally takes no more than 10 minutes.
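The data flow described above can be sketched in Python, the language of the original submission scripts. This is a minimal illustration under stated assumptions, not the metaDBSite code: `query_server` is a hypothetical stub standing in for submitting the sequence and parsing a result page, and `meta_predict` uses a placeholder linear decision rule with made-up weights in place of a trained LibSVM model (which may use a nonlinear kernel).

```python
from concurrent.futures import ThreadPoolExecutor

# Fixed feature order: position k of every vector always comes from the same server.
SERVERS = ["DISIS", "DNABindR", "BindN", "BindN-RF", "DP-Bind", "DBS-PRED"]


def query_server(name: str, sequence: str) -> list[int]:
    """Hypothetical stub for one web server: the real step would submit
    `sequence`, fetch the result page and parse one 0/1 label per residue.
    For illustration, every stubbed server flags Arg (R) and Lys (K)."""
    return [1 if aa in "RK" else 0 for aa in sequence]


def collect_predictions(sequence: str) -> dict[str, list[int]]:
    # Query the six servers concurrently; waiting on remote servers dominates runtime.
    with ThreadPoolExecutor(max_workers=len(SERVERS)) as pool:
        futures = {name: pool.submit(query_server, name, sequence) for name in SERVERS}
        return {name: future.result() for name, future in futures.items()}


def feature_vectors(preds: dict[str, list[int]], length: int) -> list[list[int]]:
    # One six-dimensional 0/1 vector per residue, in SERVERS order.
    return [[preds[name][i] for name in SERVERS] for i in range(length)]


def meta_predict(vector: list[int], weights: list[float], bias: float) -> int:
    # Placeholder linear rule (w . x + b > 0) standing in for a trained LibSVM model.
    score = sum(w * x for w, x in zip(weights, vector)) + bias
    return 1 if score > 0 else 0


if __name__ == "__main__":
    seq = "MKRLAGS"
    X = feature_vectors(collect_predictions(seq), len(seq))
    weights, bias = [1.0] * 6, -3.5  # hypothetical: "at least four servers agree"
    print([meta_predict(x, weights, bias) for x in X])  # → [0, 1, 1, 0, 0, 0, 0]
```

With equal weights, the hypothetical bias of -3.5 reduces the rule to "at least four of the six servers say binding". A trained SVM learns its weights from data instead, so it can rely more heavily on some servers than a plain majority vote would.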

The metaDBSite algorithm is illustrated in the figure below.

The DISIS algorithm

DISIS was published in Bioinformatics in 2007. Rather than a set of proteins, DISIS constructed its own residue-level dataset: 23,862 residues in contact with DNA and 103,202 residues not in contact with DNA. Its input features include evolutionary profiles, sequence conservation, predicted secondary structure and predicted solvent accessibility. The distinctive feature of its machine learning approach is that it combines a support vector machine (SVM) and a neural network (NN) to make predictions. Because evolutionary profiles must be computed, a prediction takes about five minutes or more. The web server can also deliver prediction results by email.
Reference: Ofran Y, Mysore V, Rost B: Prediction of DNA-binding residues from sequence. Bioinformatics 2007, 23(13):i347-353.
The DISIS server is available here.
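The description above does not say how DISIS merges its two classifiers. Purely as a hedged illustration, the sketch below shows two common ways to combine per-residue outputs of an SVM and a neural network: requiring both to agree, or averaging their scores after mapping SVM margins into (0, 1). The functions, the 0.5 thresholds and the example scores are hypothetical, not taken from DISIS.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def combine_and(svm_margins, nn_probs, nn_threshold=0.5):
    """Call a residue binding only if the SVM margin is positive AND the
    NN probability exceeds the threshold (a conservative rule)."""
    return [int(m > 0 and p > nn_threshold) for m, p in zip(svm_margins, nn_probs)]


def combine_average(svm_margins, nn_probs, threshold=0.5):
    """Squash SVM margins into (0, 1) with a logistic function, average them
    with the NN probabilities and threshold the mean (a consensus-style rule)."""
    return [int((sigmoid(m) + p) / 2 > threshold) for m, p in zip(svm_margins, nn_probs)]


if __name__ == "__main__":
    svm = [1.2, -0.4, 0.3, -2.0]  # hypothetical SVM decision values
    nn = [0.9, 0.7, 0.4, 0.1]     # hypothetical NN output probabilities
    print(combine_and(svm, nn))      # → [1, 0, 0, 0]
    print(combine_average(svm, nn))  # → [1, 1, 0, 0]
```

The agreement rule trades sensitivity for precision, whereas averaging lets a confident model outvote an uncertain one, as happens for the second residue.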

The DNABindR algorithm

DNABindR was developed in 2006. The authors built a dataset of 171 protein-DNA complexes from the PDB using PISCES, requiring sequence identity ≤ 30%, resolution better than 3.0 Å and at least 40 amino acid residues. The machine learning method is a Naïve Bayes classifier. The features used in DNABindR are relative solvent accessibility, sequence entropy, secondary structure, electrostatic potential and hydrophobicity. Predictions are very fast, taking only a few seconds. According to the original article, the method achieves 71% overall accuracy, a correlation coefficient of 0.24, 35% specificity and 53% sensitivity in identifying DNA-binding sites.
Reference: Yan C, Terribilini M, Wu F, Jernigan RL, Dobbs D, Honavar V: Predicting DNA-binding sites of proteins from amino acid sequence. BMC Bioinformatics 2006, 7:262.
The DNABindR server is available here.
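Performance figures like those quoted above come from a per-residue confusion matrix. The sketch below shows the standard formulas, with the correlation coefficient taken to be the Matthews correlation coefficient, as is usual in this field. Note that "specificity" is defined differently across papers: the conventional definition is TN/(TN+FP), but some binding-site studies report TP/(TP+FP) (precision) under that name, so both are computed here. The counts in the example are made up for illustration.

```python
import math


def binding_site_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Per-residue metrics from confusion-matrix counts (positives = binding residues)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # fraction of true binding residues recovered
    specificity = tn / (tn + fp)   # conventional definition
    precision = tp / (tp + fp)     # called "specificity" in some binding-site papers
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation coefficient
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts: 100 binding and 400 non-binding residues.
    print(binding_site_metrics(tp=50, fp=50, tn=350, fn=50))
    # → {'accuracy': 0.8, 'sensitivity': 0.5, 'specificity': 0.875, 'precision': 0.5, 'mcc': 0.375}
```

Because binding residues are a small minority, accuracy and conventional specificity can look high even when few binding residues are found, which is why the correlation coefficient and sensitivity are usually reported alongside them.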

The BindN algorithm

BindN is a web-based tool for efficient prediction of DNA- and RNA-binding sites in amino acid sequences; here we focus only on its DNA-binding site prediction. It displays prediction results directly on the website, and a prediction takes only a few minutes. BindN uses a support vector machine with three features: the side-chain pKa value, the hydrophobicity index and the molecular mass of an amino acid. Two datasets were used. The first is PDNA-62, which contains 62 protein-DNA complexes and has been used in several studies. The second, PRINR25, was collected from the PDB by the authors using the criteria of protein sequence identity ≤ 25% and resolution better than 3.5 Å; it contains 174 protein-DNA complexes.
Reference: Wang L, Brown SJ: BindN: a web-based tool for efficient prediction of DNA and RNA binding sites in amino acid sequences. Nucleic Acids Res 2006, 34(Web Server issue):W243-248.
The BindN server is available here.
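Sequence-based predictors such as BindN describe each residue by physicochemical properties of that residue and its neighbours. The sketch below is a generic sliding-window encoder: each residue receives the property values of every residue in a window centred on it, with zero padding past the chain ends. Only the Kyte-Doolittle hydrophobicity scale is filled in; which scales, window size and padding BindN actually uses are assumptions here, and pKa or molecular-mass tables would be passed in the same way.

```python
# Kyte-Doolittle hydrophobicity values (one possible "hydrophobicity index").
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_features(sequence: str, scales: list[dict], window: int = 5) -> list[list[float]]:
    """Return one vector of length window * len(scales) per residue,
    ordered position by position, then scale by scale."""
    if window % 2 == 0:
        raise ValueError("window size must be odd so it can be centred on a residue")
    half = window // 2
    features = []
    for i in range(len(sequence)):
        row = []
        for j in range(i - half, i + half + 1):
            for scale in scales:
                # Positions beyond either chain end are padded with 0.0;
                # unknown residue codes also map to 0.0.
                row.append(scale.get(sequence[j], 0.0) if 0 <= j < len(sequence) else 0.0)
        features.append(row)
    return features


if __name__ == "__main__":
    for vec in window_features("RKA", [KYTE_DOOLITTLE], window=3):
        print(vec)
```

Each vector then becomes one training or test example for the SVM, labelled binding or non-binding.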

The BindN-RF algorithm

BindN-RF was developed by the first author of BindN and differs from it in several ways. A new dataset, PDC25t, was collected; its proteins are not included in PDNA-62 and share less than 25% sequence identity with one another. The machine learning method was changed to random forest. In addition to the input features used in BindN, three new types of features were added: BLAST-based conservation, biochemical features and the position-specific scoring matrix (PSSM). Although BindN is already included, we also include BindN-RF because it performs quite well within metaDBSite.
Reference: Wang L, Yang MQ, Yang JY: Prediction of DNA-binding residues from protein sequence information using random forests. BMC Genomics 2009, 10 Suppl 1:S1.
The BindN-RF server is available here.
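To show what replacing an SVM with a random forest means, here is a deliberately tiny, simplified random forest: each "tree" is a one-split decision stump trained on a bootstrap sample using one randomly chosen feature, and the forest predicts by majority vote. Real random forests grow full trees and sample several candidate features at every split; the stumps, the toy data and the number of trees are simplifications for illustration, not BindN-RF's configuration.

```python
import random
from collections import Counter


def fit_stump(X, y, feature):
    """Return (feature, threshold, polarity) with the fewest training errors.
    polarity 1 predicts binding above the threshold, -1 predicts it below."""
    values = sorted(set(row[feature] for row in X))
    # Candidate thresholds: one below every value, then midpoints between values.
    candidates = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in candidates:
        for polarity in (1, -1):
            errors = sum(
                int((1 if polarity * (row[feature] - t) > 0 else 0) != label)
                for row, label in zip(X, y)
            )
            if best is None or errors < best[0]:
                best = (errors, t, polarity)
    return feature, best[1], best[2]


def stump_predict(stump, row):
    feature, t, polarity = stump
    return 1 if polarity * (row[feature] - t) > 0 else 0


def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample (with replacement)
        feature = rng.randrange(len(X[0]))          # random feature for this tree
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def forest_predict(forest, row):
    # Majority vote over all stumps; ties go to the non-binding class.
    votes = Counter(stump_predict(s, row) for s in forest)
    return 1 if votes[1] > votes[0] else 0


if __name__ == "__main__":
    X = [[0, 0.0], [1, 0.5], [2, 1.0], [3, 1.5], [4, 2.0],
         [6, 3.0], [7, 3.5], [8, 4.0], [9, 4.5], [10, 5.0]]
    y = [0] * 5 + [1] * 5
    forest = fit_forest(X, y)
    print(forest_predict(forest, [-5, -2]), forest_predict(forest, [15, 8]))
```

Averaging many models trained on resampled data is what makes random forests robust to noisy features, which is one reason they pair well with large feature sets such as PSSM windows.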

The DP-Bind algorithm

DP-Bind is a web server for sequence-based prediction of DNA-binding residues in DNA-binding proteins, published as an Applications Note in Bioinformatics in 2007. It uses three learning methods: support vector machine (SVM), kernel logistic regression (KLR) and penalized logistic regression (PLR). The server offers two feature encodings: a sequence-based encoding using BLOSUM62 and a PSSM-based encoding. A prediction takes several minutes.
Reference: Hwang S, Gou Z, Kuznetsov IB: DP-Bind: a web server for sequence-based prediction of DNA-binding residues in DNA-binding proteins. Bioinformatics 2007, 23(5):634-636.
The DP-Bind server is available here.
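Penalized logistic regression, one of DP-Bind's three learners, fits a logistic model while penalizing large weights to limit overfitting. The minimal sketch below assumes an L2 (ridge) penalty and plain batch gradient descent; the penalty type, learning rate, penalty strength and toy data are illustrative assumptions, not DP-Bind's settings.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_plr(X, y, lam=0.1, lr=0.1, epochs=500):
    """Minimise mean logistic loss + (lam / 2) * ||w||^2 by batch gradient descent.
    The bias b is not penalised."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - label
            for j in range(d):
                grad_w[j] += err * row[j] / n
            grad_b += err / n
        # The penalty adds lam * w to the gradient, shrinking weights toward zero.
        w = [wj - lr * (gj + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_proba(w, b, row):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b)


if __name__ == "__main__":
    X = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
    y = [1, 1, 0, 0]
    w, b = fit_plr(X, y)
    print([round(predict_proba(w, b, row), 3) for row in X])
```

Raising `lam` shrinks the learned weights, so the model's probabilities stay closer to 0.5; the penalty strength is normally tuned by cross-validation.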

The DBS-PRED algorithm

DBS-PRED is a DNA-binding site prediction tool developed by Shandar Ahmad and colleagues. Three datasets were used in DBS-PRED. The first is the widely used PDNA-62. The second, NRTF-915, is a non-redundant, representative set of transcription factors collected from the SWISS-PROT database. The third, CNTR-3332, is a control set of sequences other than transcription factors, also drawn from SWISS-PROT. The machine learning method is a neural network, and the input features include protein sequence information, solvent accessibility and secondary structure. The server can deliver prediction results by email, and computation is very fast, taking only a few seconds.
Reference: Ahmad S, Gromiha MM, Sarai A: Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information. Bioinformatics 2004, 20(4):477-486.
The DBS-PRED server is available here.
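A common way to present sequence information to a neural network is a one-hot encoded window around each residue. The sketch below encodes such a window and runs one forward pass through a single-hidden-layer network. The window size, hidden-layer size and randomly initialised weights are assumptions for illustration; DBS-PRED's actual architecture and its solvent-accessibility and secondary-structure inputs are not modelled, and an untrained network's scores carry no meaning.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_window(sequence, i, window=5):
    """Concatenate 20-dimensional one-hot vectors for residues i-half .. i+half.
    Out-of-range positions and unknown residues become all-zero blocks."""
    half = window // 2
    vec = []
    for j in range(i - half, i + half + 1):
        block = [0.0] * len(AMINO_ACIDS)
        if 0 <= j < len(sequence) and sequence[j] in AMINO_ACIDS:
            block[AMINO_ACIDS.index(sequence[j])] = 1.0
        vec.extend(block)
    return vec


def init_network(n_in, n_hidden, seed=0):
    # Small random weights; a real predictor would learn these by backpropagation.
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
    return w1, b1, w2, 0.0


def forward(net, x):
    w1, b1, w2, b2 = net
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    z = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-z))  # binding score in (0, 1)


if __name__ == "__main__":
    seq = "MKRLAGS"
    net = init_network(n_in=5 * len(AMINO_ACIDS), n_hidden=8)
    scores = [forward(net, one_hot_window(seq, i)) for i in range(len(seq))]
    print([round(s, 3) for s in scores])
```

One-hot encoding treats all amino acids as equally different from one another; profile-based encodings such as PSSM rows, used by some of the other servers, instead carry evolutionary information about which substitutions are tolerated.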

If this server is useful for your work, please cite:

JingNa Si, Zengming Zhang, Biaoyang Lin, Michael Schroeder, Bingding Huang (2011). metaDBSite: a meta approach to improve protein DNA-binding site prediction. BMC Systems Biology, 5:S7.

Contact us

Bingding Huang (Project leader), Email: bhuang@biotec.tu-dresden.de
Zengming Zhang, Email: zmzhang@mail.systemsbiozju.org
Jingna Si, Email: jingna@mail.systemsbiozju.org

Report Bugs

If you find any bugs in this server, please help us improve it by reporting them to Zengming Zhang. Any help is greatly appreciated!

Acknowledgement

We acknowledge funding from MOST China (grant no. 2008DFA11320) and the EU 7th Framework Programme Marie Curie Actions IRSES project (grant no. 247097).