We label all structural units of the PDB in the following way:
Only amino acid residues and water molecules located in the intersection of structural unit shapes are potential interactors. We apply atom type and distance criteria to compute interactions between structural unit pairs at the physicochemical level. For hydrogen bonds, we apply a donor-acceptor distance of ≤ 3.6 Å. For salt bridges, we apply a distance criterion of ≤ 4 Å. Van der Waals interactions are defined between hydrophobic atoms at van der Waals radii distance.
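The distance criteria above can be illustrated with a minimal sketch. The atom representation (a dict with `xyz`, `role` and `vdw_radius` keys), the role labels, and the `VDW_TOLERANCE` slack are assumptions made for illustration only; they are not the atom typing used by SCOWLP. Only the 3.6 Å and 4 Å cut-offs come from the text.

```python
import math

HBOND_MAX = 3.6        # donor-acceptor cut-off from the text, in angstroms
SALT_BRIDGE_MAX = 4.0  # salt-bridge cut-off from the text, in angstroms
VDW_TOLERANCE = 0.5    # ASSUMPTION: slack added to the sum of vdW radii


def classify_contact(a, b):
    """Return the interaction type for two atoms, or None if they do not interact.

    Each atom is a dict with hypothetical keys:
      'xyz'        -> (x, y, z) coordinates in angstroms
      'role'       -> 'donor', 'acceptor', 'positive', 'negative' or 'hydrophobic'
      'vdw_radius' -> van der Waals radius in angstroms
    """
    d = math.dist(a["xyz"], b["xyz"])
    roles = {a["role"], b["role"]}
    if roles == {"donor", "acceptor"} and d <= HBOND_MAX:
        return "hydrogen_bond"
    if roles == {"positive", "negative"} and d <= SALT_BRIDGE_MAX:
        return "salt_bridge"
    # Two hydrophobic atoms collapse to a single-element set of roles.
    if roles == {"hydrophobic"} and d <= a["vdw_radius"] + b["vdw_radius"] + VDW_TOLERANCE:
        return "van_der_waals"
    return None


def atom(x, role, radius=1.7):
    """Small helper to build a test atom on the x axis (illustrative only)."""
    return {"xyz": (x, 0.0, 0.0), "role": role, "vdw_radius": radius}
```

For example, `classify_contact(atom(0.0, "donor"), atom(3.0, "acceptor"))` is classified as a hydrogen bond, while the same pair placed 3.7 Å apart is not.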
We performed all-against-all pairwise structural alignments (PSAs) of the contacting units for each family in order to measure the similarity among binding regions. The alignments were performed with the MAMMOTH program, taking the Cα atoms into account and using a gap penalty function for opening and extension. The root-mean-squared deviation (RMSD) was not considered for measuring the similarity between two interfaces, as the superimposed members of the same family share a common structure.
The residues described in SCOWLP to be forming an interface were mapped onto the domain-pair structural alignment. We calculated a similarity index (Si) based on the number of interacting residues that overlap and the length of both interacting regions. We exclude the interacting residues located in gap regions in the structural alignment.
We cluster the binding regions of each SCOP family using the agglomerative hierarchical algorithm following several steps:
To re-compute the distances we used the complete-linkage method, which takes the similarity between two clusters to be the minimum similarity found between any pair of their members.
The result of the clustering can be represented in an intuitive tree or dendrogram, which shows how the individual contacting domains are successively merged at greater distances into larger and fewer clusters. The final PBRs depend on the Si cut-off that is chosen. We pre-calculated the results for Si cut-offs of 0, 0.1, 0.2, 0.3 and 0.4 to offer a range of values that allows flexibility in the final analysis of PBRs. The SCOWLP web application offers the possibility to display the classification at any of these cut-off values.
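The clustering described above can be sketched as follows. This is a minimal pure-Python illustration of complete-linkage agglomerative clustering on a precomputed Si matrix, not the SCOWLP implementation. The function name and the rule that two clusters merge only while their linkage similarity is strictly greater than the cut-off are assumptions.

```python
def complete_linkage_clusters(sim, cutoff):
    """Cluster items 0..n-1 from a symmetric similarity matrix `sim`.

    Complete linkage: the similarity between two clusters is the minimum
    pairwise similarity between their members. Merging stops when the best
    available linkage similarity is not strictly above `cutoff` (ASSUMPTION).
    """
    clusters = [[i] for i in range(len(sim))]

    def linkage(c1, c2):
        return min(sim[i][j] for i in c1 for j in c2)

    while len(clusters) > 1:
        best = None  # (similarity, index of first cluster, index of second)
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = linkage(clusters[x], clusters[y])
                if best is None or s > best[0]:
                    best = (s, x, y)
        if best[0] <= cutoff:
            break
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Toy Si matrix for four binding regions (made-up values).
si = [
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.3, 0.0],
    [0.1, 0.3, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
strict = complete_linkage_clusters(si, 0.4)  # [[0, 1], [2], [3]]
loose = complete_linkage_clusters(si, 0.0)   # [[0, 1, 2], [3]]
```

With the toy matrix, the 0.4 cut-off keeps region 2 separate because its weakest link to the merged pair is only 0.1, whereas the cut-off of 0 merges every region that shares any overlap.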
In order to differentiate binding regions with a single interface from those with multiple interfaces, we identified the partner of each contacting domain in each binding region. Each binding region was divided into sub-clusters when different domain families interacted in the same binding region.
SCOWLP binding region clusters are taken at zero similarity for the first five classes of SCOP. A protein representative is taken per family that includes all family binding regions mapped on its sequence.
For each binding region, the interacting residues are mapped onto the structure-based sequence alignment. We also calculate the solvent accessibility for both representative proteins to distinguish the residues located in the core region from the solvent-exposed ones, since core residues cannot participate in recognition. We used NACCESS to calculate the solvent accessibility of each residue using a probe sphere of radius 1.4 Å. A residue is considered accessible if its total relative accessible surface area (RSA) is more than 5%. We calculate the binding region conservation (BRC) as the ratio between the number of interacting residues located in structurally aligned regions that are also solvent exposed and the total number of interacting residues.
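As a rough illustration of the BRC ratio, the sketch below counts the interacting residues that are structurally aligned and solvent exposed and divides by the total number of interacting residues. The data layout (an `alignment` dict from source residue to aligned target residue, and two RSA dicts in percent) is hypothetical, and requiring exposure in both representative proteins is an assumption about how the criterion is applied.

```python
RSA_EXPOSED_MIN = 5.0  # percent; a residue is accessible if RSA is more than this


def binding_region_conservation(interacting, alignment, rsa_source, rsa_target):
    """Return the BRC for one binding region mapped from a source to a target protein.

    interacting : list of source residue ids forming the binding region
    alignment   : dict source residue id -> aligned target residue id
                  (residues in gap regions are simply absent)
    rsa_source  : dict source residue id -> relative accessible surface area (%)
    rsa_target  : dict target residue id -> relative accessible surface area (%)
    """
    interacting = list(interacting)
    if not interacting:
        return 0.0
    conserved = 0
    for res in interacting:
        partner = alignment.get(res)
        if partner is None:
            continue  # residue falls in a gap of the structural alignment
        # ASSUMPTION: the residue must be exposed in both representatives.
        if (rsa_source.get(res, 0.0) > RSA_EXPOSED_MIN
                and rsa_target.get(partner, 0.0) > RSA_EXPOSED_MIN):
            conserved += 1
    return conserved / len(interacting)


# Made-up example: four interacting residues, one aligned and exposed in both.
example_brc = binding_region_conservation(
    interacting=[1, 2, 3, 4],
    alignment={1: 10, 2: 11, 3: 12},
    rsa_source={1: 30.0, 2: 2.0, 3: 40.0, 4: 50.0},
    rsa_target={10: 20.0, 11: 30.0, 12: 3.0},
)
```

In this example only residue 1 counts: residue 2 is buried in the source, residue 3 aligns to a buried target residue, and residue 4 lies in a gap.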
We assess the statistical significance of the BRC by estimating the p-values under the null hypothesis that two random protein families do not contain conserved binding regions. The estimation was carried out by calculating the BRC of 105 randomly selected samples of protein representative pairs and a binding region for each pair. The distribution of these scores was used to estimate the p-values obtained as (r+1)/(n+1), where n is the number of samples that have been simulated (105), and r is the number of these replicates that have a score greater than or equal to the BRC value for which we are estimating the p-value [41]. Note that, as the sampling procedure can possibly contain undetermined cases of similar binding regions from the alternate distribution, these p-values are likely to be an underestimate of true significance, i.e. in some instances the real p-values will be much more significant. In a pairwise ns-SA, a binding region is inferred from one protein family to another if the conservation significance has a p-value ≤ 0.05.
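The (r+1)/(n+1) estimate and the 0.05 threshold can be written down directly. The sketch below is a minimal illustration with hypothetical function names; the null scores are made-up values, not SCOWLP data.

```python
def empirical_p_value(observed_brc, null_scores):
    """Estimate the p-value of an observed BRC as (r + 1) / (n + 1).

    n is the number of simulated null scores and r is the number of those
    scores that are greater than or equal to the observed BRC.
    """
    n = len(null_scores)
    r = sum(1 for score in null_scores if score >= observed_brc)
    return (r + 1) / (n + 1)


def is_inferred(observed_brc, null_scores, alpha=0.05):
    """A binding region is inferred when its p-value is at most `alpha`."""
    return empirical_p_value(observed_brc, null_scores) <= alpha


# Made-up null distribution of four random BRC scores.
null = [0.1, 0.2, 0.3, 0.5]
p_mid = empirical_p_value(0.3, null)   # two scores >= 0.3, so (2 + 1) / (4 + 1)
p_high = empirical_p_value(0.9, null)  # no score >= 0.9, so (0 + 1) / (4 + 1)
```

Adding 1 to both the numerator and the denominator means the estimate can never be exactly zero, so its smallest possible value is 1/(n+1) and depends on the number of simulated samples.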
The inferred binding regions (iBR) and the known family binding regions (kBR) were collected for each family. Since binding inferences may occupy equivalent surface regions in the family, we re-clustered the binding regions in a similar way as described above for SCOWLP. To make sure that the obtained kBR clusters from SCOWLP are not modified in this process, we set the similarity between these initial kBR to zero. Three distinguishable cluster types were obtained: 1) those that only contained one kBR, 2) those that only contained iBR (putative binding regions), 3) those that contained both.
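The three cluster types listed above can be told apart by counting the known and inferred members of each cluster. The sketch below assumes a hypothetical representation in which every cluster member is a `(kind, identifier)` tuple with `kind` equal to `"kBR"` or `"iBR"`; the returned labels are illustrative names, not SCOWLP terminology.

```python
def cluster_type(cluster):
    """Classify a re-clustered binding region by its kBR/iBR content.

    cluster: non-empty list of (kind, identifier) tuples,
             where kind is "kBR" (known) or "iBR" (inferred).
    """
    n_known = sum(1 for kind, _ in cluster if kind == "kBR")
    n_inferred = sum(1 for kind, _ in cluster if kind == "iBR")
    if n_known and n_inferred:
        return "known_with_inferred"  # type 3: both kBR and iBR
    if n_known:
        return "known_only"           # type 1: a kBR without inferences
    if n_inferred:
        return "inferred_only"        # type 2: putative binding region
    raise ValueError("cluster must contain at least one kBR or iBR")


# Made-up clusters for one family.
clusters = [
    [("kBR", "A")],
    [("iBR", "x"), ("iBR", "y")],
    [("kBR", "B"), ("iBR", "z")],
]
types = [cluster_type(c) for c in clusters]
```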
Structural Bioinformatics Group. SCOWLP has been developed by Dr. Joan Teyra, Sven Schreiber and Dr. MT Pisabarro at BIOTEC of the TU-Dresden, Germany. All comments, suggestions, corrections and advice should be sent to:
The tables composing the SCOWLP database can be downloaded here:
Here, you can download a compressed file in SQL format of the results obtained using the methodology explained in the paper: "Studies on the inference of protein binding regions across fold space based on structural similarities".
Please enter one of the following:
You can use the filter to apply several restrictions to the choices shown below so that they better match your interests.
You can filter by the following categories:
Selecting all of a specific category will remove any existing restrictions for this category. To remove all existing restrictions at once, just click "reset all filters".
Note: If you filter by more than one category, the results you get will have to match all of these restrictions.
Hint: To optimize the page space for analyzing the displayed information, you can minimize the filter without losing the applied restrictions.
This is the linear representation of your current position within the SCOWLP hierarchy, starting with the Structural Classification of Proteins (SCOP) levels:
complemented with the interacting levels:
This line will always display your current "position" within this hierarchy. You can easily go back to a specific previous selection step by clicking on the corresponding link. Selecting root will bring you back to the starting point of your search.
Note: If you click on one of the links, existing filters will remain active until you select root.
Use these inputs to control the marking of the interacting residues within the amino acid sequences and also in the viewer below.
Click here to switch between a more readable and a more compressed view of the amino acid sequences.
Click here to choose a one-page layout optimized for interacting with the viewer.
Click here to choose the standard one-page layout.
Click here to choose a one-page layout optimized for analyzing the amino acid sequences.
Click here to choose a scrollable-page layout.