The algorithm of LIGSITEcsc.

The Algorithm:
LIGSITEcsc is an extension of LIGSITE. Instead of defining protein-solvent-protein events on the basis of atom coordinates, it uses the Connolly surface and defines surface-solvent-surface events.

First, the protein is projected onto a 3D grid. To minimize the necessary grid size, we apply principal component analysis so that the first principal axis of the protein aligns with the x-axis, the second with the y-axis, and the third with the z-axis. The grid uses a step size of 1.0 Angstrom. The rotation does not affect the quality of the results (data not shown); it only minimizes the necessary grid size. Second, grid points are labelled as protein, surface, or solvent using the following rules. A grid point is marked as protein if there is at least one atom within 1.6 Angstrom. Next, the solvent-excluded surface is calculated using the Connolly algorithm, and the coordinates of the surface vertices are stored. In the Connolly algorithm, a hypothetical probe sphere (usual radius 1.4 Angstrom) rolls over the protein. The Connolly surface is a combination of the van der Waals surface of the protein and, where the probe is in contact with more than one atom, the surface of the probe sphere.
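The alignment and the protein labelling described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the LIGSITEcsc implementation: the function names, the padding around the bounding box, and the choice to ignore atom types are all hypothetical, and the Connolly surface calculation is not covered.

```python
import numpy as np


def align_to_principal_axes(coords):
    """Rotate atom coordinates so the principal axes lie along x, y, z.

    The axis of largest variance ends up on x, the smallest on z.
    """
    centered = coords - coords.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    axes = eigvecs[:, np.argsort(eigvals)[::-1]]  # sort by descending variance
    if np.linalg.det(axes) < 0:  # keep a proper rotation (no mirror image)
        axes[:, -1] *= -1
    return centered @ axes


def label_protein_grid(coords, step=1.0, cutoff=1.6):
    """Return (grid, origin); grid is True where an atom lies within cutoff.

    The padding of cutoff + step around the bounding box is an assumption.
    """
    pad = cutoff + step
    origin = coords.min(axis=0) - pad
    shape = np.ceil((coords.max(axis=0) + pad - origin) / step).astype(int) + 1
    grid = np.zeros(tuple(shape), dtype=bool)
    reach = int(np.ceil(cutoff / step))  # index radius that can contain hits
    for atom in coords:
        centre = np.round((atom - origin) / step).astype(int)
        lo = np.maximum(centre - reach, 0)
        hi = np.minimum(centre + reach + 1, shape)
        idx = np.stack(
            np.meshgrid(*[np.arange(a, b) for a, b in zip(lo, hi)], indexing="ij"),
            axis=-1,
        )
        near = np.linalg.norm(origin + idx * step - atom, axis=-1) <= cutoff
        grid[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]] |= near
    return grid, origin
```

Calling `label_protein_grid(align_to_principal_axes(coords))` on an `(N, 3)` array of atom coordinates in Angstrom yields a boolean grid plus the coordinates of grid index `(0, 0, 0)`. Only a small block of grid points around each atom is tested, so the cost grows with the number of atoms rather than with the grid volume.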

A grid point is marked as surface if a surface vertex is within 1.0 Angstrom. Note that the distance thresholds ensure that all surface grid points are also labelled as protein. All other grid points are marked as solvent. A sequence of grid points that starts and ends with surface grid points and has only solvent grid points in between is called a surface-solvent-surface event. LIGSITEcsc scans the x, y, and z directions and the four cubic diagonals for such surface-solvent-surface events. If a solvent grid point is part of at least five surface-solvent-surface events, it is marked as pocket. Finally, all pocket grid points are clustered according to their spatial proximity: if a pocket grid point is within 3.0 Angstrom of a pocket grid point cluster, it is added to this cluster; otherwise, it becomes a new cluster. Next, the clusters are ranked by the number of grid points in the cluster. The top three clusters are retained and re-ranked according to the degree of conservation of the involved surface residues. To be precise, the conservation score is the average conservation of all residues within 8 Angstrom of the pocket's surface grid points according to the ConSurf-HSSP database.
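The scan for surface-solvent-surface events and the subsequent clustering can be sketched as follows. This is a simplified illustration, not the original code: the integer label codes, the function names, and the single-pass clustering order are assumptions, and the conservation-based re-ranking is omitted because it depends on external ConSurf-HSSP data.

```python
import numpy as np

# Hypothetical label codes for the grid (not taken from LIGSITEcsc itself).
SOLVENT, PROTEIN, SURFACE = 0, 1, 2

# Three axis directions plus the four cubic diagonals.
DIRECTIONS = [(1, 0, 0), (0, 1, 0), (0, 0, 1),
              (1, 1, 1), (1, 1, -1), (1, -1, 1), (1, -1, -1)]


def hits_surface(labels, start, step):
    """Walk from start through solvent; True if the first other label is surface."""
    pos = np.array(start) + step
    while all(0 <= p < n for p, n in zip(pos, labels.shape)):
        value = labels[tuple(pos)]
        if value != SOLVENT:
            return value == SURFACE
        pos = pos + step
    return False  # left the grid without meeting the surface


def find_pocket_points(labels, min_events=5):
    """Return indices of solvent points enclosed in at least min_events directions."""
    pockets = []
    for point in zip(*np.nonzero(labels == SOLVENT)):
        events = sum(
            hits_surface(labels, point, np.array(d))
            and hits_surface(labels, point, -np.array(d))
            for d in DIRECTIONS
        )
        if events >= min_events:
            pockets.append(point)
    return np.array(pockets, dtype=int).reshape(-1, 3)


def cluster_pocket_points(points, step=1.0, cutoff=3.0):
    """Greedy single-pass clustering; clusters are returned largest first."""
    clusters = []
    for p in points:
        for cluster in clusters:
            dists = np.linalg.norm((np.array(cluster) - p) * step, axis=1)
            if dists.min() <= cutoff:
                cluster.append(p)
                break
        else:
            clusters.append([p])  # no cluster close enough: start a new one
    return sorted(clusters, key=len, reverse=True)
```

The clustering follows the rule as stated in the text, so its result depends on the order in which points are visited and nearby clusters are never merged afterwards. The first three entries of the returned list correspond to the top three clusters by size.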

Bioinformatics group Biotec TU Dresden