clusterdihedral

Assign frames into clusters based on binning of backbone dihedral angles in amino acids.

clusterdihedral [phibins <N>] [psibins <M>] [out <outfile>] [dihedralfile <dfile> | <mask>] [framefile <framefile>] [clusterinfo <infofile>] [clustervtime <cvtfile>] [cut <CUT>]

Cluster frames in a trajectory using dihedral angles. The dihedral angles used for clustering are defined either by an atom mask or by an input file given with the dihedralfile keyword. If a dihedral file is used, each line of the file should describe one dihedral to be binned, in the format:

ATOM#1   ATOM#2   ATOM#3   ATOM#4   #BINS

where the ATOM arguments are the atom numbers (starting from 1) defining the dihedral and #BINS is the number of bins to be used (so if #BINS=10 the width of each bin will be 360°/10 = 36°). If an atom mask is specified, only protein backbone dihedrals (Phi and Psi, defined using atom names C-N-CA-C and N-CA-C-N respectively) within the mask will be used, with the number of bins for each specified by the phibins and psibins keywords (default for each is 10 bins).
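The dihedral file format and the bin-width arithmetic above can be illustrated with a short Python sketch. This is not the program's own code: the function names are hypothetical, and the choice of -180° as the start of the first bin is an assumption made only for illustration, since the text does not state where the bins begin.

```python
def parse_dihedral_line(line):
    """Parse one 'ATOM#1 ATOM#2 ATOM#3 ATOM#4 #BINS' line.

    Returns a (atoms, nbins) pair, where atoms is a tuple of four
    1-based atom numbers.
    """
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    *atoms, nbins = (int(f) for f in fields)
    if nbins < 1:
        raise ValueError("number of bins must be positive")
    return tuple(atoms), nbins


def bin_width(nbins):
    """Width of each bin in degrees: the full circle divided by #BINS."""
    return 360.0 / nbins


def bin_index(angle_deg, nbins, start=-180.0):
    """Map an angle in degrees to a bin index in the range [0, nbins).

    Assumption: the first bin starts at `start` (-180 degrees here).
    The actual bin origin used by the program may differ.
    """
    shifted = (angle_deg - start) % 360.0
    # The final modulo guards against floating-point edge cases.
    return int(shifted // bin_width(nbins)) % nbins


# Illustrative atom numbers only, not taken from a real topology.
atoms, nbins = parse_dihedral_line("5 7 9 15 10")
print(atoms, nbins, bin_width(nbins), bin_index(-60.0, nbins))
```

Under these assumptions, a frame is described by one bin index per dihedral, and frames that share the same combination of bin indices fall into the same cluster.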

Output will be written either to STDOUT or to the file specified by the out keyword. First, information about which dihedrals were clustered will be printed. Then the number of clusters will be printed, followed by detailed information about each cluster. The clusters are sorted from most populated to least populated. Each cluster line has the format:

Cluster   CLUSTERNUM   CLUSTERPOP   [ dihedral1bin, dihedral2bin ... dihedralNbin ]

followed by a list of the frame numbers that belong to that cluster. If a cutoff is specified by cut, only clusters with a population greater than CUT will be printed. If the clustervtime keyword is specified, the number of clusters for each frame will be printed to <cvtfile>. If the framefile keyword is specified, a file containing cluster information for each frame will be written to <framefile> with the format:

Frame     CLUSTERNUM      CLUSTERSIZE      DIHEDRALBINID

where DIHEDRALBINID is a number that identifies the unique combination of dihedral bins this cluster belongs to (specifically, it is a number 3 × (number of dihedrals) characters long, composed of the individual dihedral bins at three characters per dihedral). If the clusterinfo keyword is specified, a file containing information on each dihedral and each cluster will be written to <infofile>. This file can be read by SANDER for use with REMD with a structure reservoir (-rremd=3). The file, which is essentially a simplified version of the main output file, has the following format:

#DIHEDRALS
dihedral1_atom1 dihedral1_atom2 dihedral1_atom3 dihedral1_atom4
...
#CLUSTERS
CLUSTERNUM1 CLUSTERSIZE1 DIHEDRALBINID1
...
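
As an illustration of the layout above, the following Python sketch reads a clusterinfo-style file. It rests on several assumptions that the text does not confirm: #DIHEDRALS and #CLUSTERS are taken to be integer counts on their own lines, fields are assumed to be whitespace-separated, and DIHEDRALBINID is assumed to use three characters per dihedral. The function names and the sample contents are hypothetical and are not output from a real run.

```python
def parse_clusterinfo(text):
    """Parse clusterinfo-style text into (dihedrals, clusters).

    dihedrals: list of 4-tuples of atom numbers.
    clusters:  list of (clusternum, clustersize, binid) tuples, where
               binid is kept as a string so leading zeros survive.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ndihedrals = int(lines[0])
    dihedrals = [
        tuple(int(x) for x in ln.split()[:4])
        for ln in lines[1:1 + ndihedrals]
    ]
    nclusters = int(lines[1 + ndihedrals])
    first = 2 + ndihedrals
    clusters = []
    for ln in lines[first:first + nclusters]:
        num, size, binid = ln.split()[:3]
        clusters.append((int(num), int(size), binid))
    return dihedrals, clusters


def split_binid(binid, width=3):
    """Split a bin ID string into per-dihedral bin indices.

    Assumption: each dihedral occupies `width` characters of the ID.
    """
    return [int(binid[i:i + width]) for i in range(0, len(binid), width)]


# Made-up sample: two dihedrals, two clusters.
sample = """2
5 7 9 15
7 9 15 17
2
0 120 003005
1 30 004005
"""

dihedrals, clusters = parse_clusterinfo(sample)
for num, size, binid in clusters:
    print(num, size, split_binid(binid))
```

Keeping DIHEDRALBINID as a string rather than converting it to an integer preserves any leading zeros, which matters if the ID is later split back into per-dihedral bins.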