The algorithm of LIGSITEcsc.
LIGSITEcsc is an extension of LIGSITE. Instead of defining
protein-solvent-protein events on the basis of atom coordinates, it
uses the Connolly surface and defines surface-solvent-surface events.
First, the protein is projected onto a 3D grid. In order to minimize
the necessary grid size, we apply principal component analysis so that
the principal axis of the protein aligns with the x-axis, the second
principal axis with the y-axis and the third with the z-axis. For
the grid we use a step size of 1.0 Angstrom. The rotation does not affect
the quality of the results (data not shown); it only minimizes the
necessary grid size. Second, grid points are labelled as protein, surface,
or solvent using the following rules:
A grid point is marked as protein if there is at least one atom
within 1.6 Angstrom. Next, the solvent excluded surface is calculated
using the Connolly algorithm and the surface
vertices' coordinates are stored. In the Connolly algorithm, a
hypothetical probe sphere (usual radius 1.4 Angstrom) rolls over the
protein. The Connolly surface is a combination of the van der Waals
surface of the protein and the probe sphere's surface, if the probe is
in contact with more than one atom.
A grid point is marked as surface if a surface vertex is
within 1.0 Angstrom. Note that the distance thresholds ensure that all
surface grid points are also labelled as protein.
All other grid points are marked as solvent.
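As an illustration of the alignment and labelling steps above, the following Python sketch uses NumPy. The function names, the integer label constants, the margin parameter, and the brute-force distance test are assumptions made for this example and are not taken from the LIGSITEcsc implementation. The surface vertices are assumed to be supplied by an external Connolly surface program, computed on the already aligned atom coordinates.

```python
import numpy as np

# Hypothetical label constants for this sketch.
SOLVENT, PROTEIN, SURFACE = 0, 1, 2


def align_to_principal_axes(coords):
    """Center the coordinates and rotate them so that the first principal
    axis lies along x, the second along y, and the third along z."""
    centered = coords - coords.mean(axis=0)
    # Eigenvectors of the 3x3 covariance matrix are the principal axes;
    # eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered.T))
    axes = eigvecs[:, np.argsort(eigvals)[::-1]]
    # Keep a proper rotation (no mirror image of the protein).
    if np.linalg.det(axes) < 0:
        axes[:, 2] *= -1
    return centered @ axes


def label_grid(atoms, surface_vertices, step=1.0,
               protein_cutoff=1.6, surface_cutoff=1.0, margin=2.0):
    """Label each grid point as SOLVENT, PROTEIN, or SURFACE.

    `margin` (an assumption of this sketch) pads the bounding box of the
    atoms so that solvent points surround the protein.
    """
    lo = atoms.min(axis=0) - margin
    hi = atoms.max(axis=0) + margin
    grid_axes = [np.arange(a, b + step, step) for a, b in zip(lo, hi)]
    points = np.stack(np.meshgrid(*grid_axes, indexing="ij"), axis=-1)
    flat = points.reshape(-1, 3)

    def within(targets, cutoff):
        # Brute-force test: is any target within `cutoff` of each grid point?
        # Fine for a small example; a spatial index would scale better.
        d2 = ((flat[:, None, :] - targets[None, :, :]) ** 2).sum(axis=-1)
        return (d2 <= cutoff ** 2).any(axis=1)

    labels = np.full(len(flat), SOLVENT, dtype=np.int8)
    labels[within(atoms, protein_cutoff)] = PROTEIN
    # SURFACE overrides PROTEIN, so one label per grid point is enough.
    labels[within(surface_vertices, surface_cutoff)] = SURFACE
    return labels.reshape(points.shape[:3]), points
```

Since surface grid points are also protein grid points, the sketch stores a single label per point and lets the surface label take precedence; keeping two separate boolean masks would be an equally reasonable choice.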
A sequence of grid points, which starts and ends with surface
grid points and which has solvent grid points in between, is called a
surface-solvent-surface event.
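The definition of a surface-solvent-surface event can be made concrete for a single line of grid points. The following Python sketch is only an illustration: the function name and the integer label constants are assumptions of this example, not part of LIGSITEcsc. It returns the positions of the solvent points that lie between two surface points with nothing but solvent in between.

```python
# Hypothetical label constants for this sketch.
SOLVENT, PROTEIN, SURFACE = 0, 1, 2


def enclosed_solvent(line):
    """Return the indices of solvent points on `line` that belong to a
    surface-solvent-surface event."""
    enclosed = []
    run = []         # current run of consecutive solvent points
    opened = False   # True if the current run was preceded by a surface point
    for i, label in enumerate(line):
        if label == SOLVENT:
            run.append(i)
            continue
        # A non-solvent point ends the run; it counts as an event only if
        # the run is bounded by surface points on both sides.
        if label == SURFACE and opened and run:
            enclosed.extend(run)
        run = []
        opened = (label == SURFACE)
    return enclosed


print(enclosed_solvent([SOLVENT, SURFACE, SOLVENT, SOLVENT, SURFACE, SOLVENT]))
# -> [2, 3]
```

In a three-dimensional grid the same test would be applied to every line of grid points along each scan direction, with a per-point counter recording how many events each solvent point takes part in.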
LIGSITEcsc scans the x, y, z directions and four cubic diagonals for
such surface-solvent-surface events. If a solvent grid point is part of at least
five surface-solvent-surface events, it is marked as pocket. Finally, all pocket
grid points are clustered according to their spatial proximity. That is, if a pocket grid point
is within 3.0 Angstrom of a pocket grid point cluster, it is added to this cluster.
Otherwise, it becomes a new cluster. Next, the clusters are ranked by the number of grid points in the
cluster. The top three clusters are retained and re-ranked according to the degree of conservation of
the involved surface residues. To be precise, the conservation score is the average conservation of
all residues within 8 Angstrom of the pocket's surface grid points according to the