Comparing the structural difference between a series of structures.

You want to compare  the structural difference caused by the binding of 6 different ligands to the same protein receptor.

First we need to create a reference structure that has no ligand that will serve to compare the differences, we can create an average structure using the average command. A brief example to create the reference structure is:

parm nowater.topo
trajin 1/nowater.nc 1 last 1
trajin 2/nowater.nc 1 last 1
strip :WAT
strip :Na+,Cl-
rms fit :1-156
 average reference.pdb pdb :1-156
go

The script will read the topology file ‘nowater.topo’ and read two trajectory files named ‘nowater.nc’ starting from frame 1, ending in the last frame and will read all the files (offset is 1 frame). It will strip all residues with the name ‘WAT, Na+, Cl-‘.  Next we use the rms command with the fit keyword to remove global rotational/translational motion, since we do not want any of that in our average structure. Then we call the average command which will generate a file called ‘reference.pdb’ and will use the residues 1 to 156.

With our reference structure, we then load each of the structure we want to compare with it.  In this example, we have six PDB files that contain the same protein as the reference but they also have six different ligands in each file.  Each file was extracted as the last frame of a molecular dynamics simulations and represents a snapshot that we can use to compare how much the ligand influences or changes the protein structure.  The CPPTRAJ code is:

parm reference.pdb [average]
reference reference.pdb parm [average]
parm protein-ligand1.pdb [ligand1]
trajin protein-ligand1.pdb parm [ligand1]
parm protein-ligand2.pdb [ligand2]
trajin protein-ligand2.pdb parm [ligand2]
parm protein-ligand3.pdb [ligand3]
trajin protein-ligand3.pdb parm [ligand3]
parm protein-ligand4.pdb [ligand4]
trajin protein-ligand4.pdb parm [ligand4]
parm protein-ligand5.pdb [ligand5]
trajin protein-ligand5.pdb parm [ligand5]
parm protein-ligand6.pdb [ligand6]
trajin protein-ligand6.pdb parm [ligand6]
rmsd All-atoms :1-156 reference :1-156 out rms.dat
rmsd Backbone :1-156@C,N,O,CA,CB reference :1-156@C,N,O,CA,CB out rms.dat
go

In this case we only have PDB files.  When we load them in CPPTRAJ we have to load a topology file and name that topology  using the [ ] syntax.  Then we load the same file again but now using trajin so that CPPTRAJ reads the atomic coordinates. For each trajin command we use, we need to assign its proper topology file with the use of 'parm [name]‘, so CPPTRAJ know what topology to use with each trajin call. Notice that the second line assigns the reference to the reference.pdb.

At this point we can use the commands: list parm and list trajin to see currently loaded topologies and trajectories respectively, and review for any possible errors.

The rmsd command will run twice, the first one:

rmsd All-atoms :1-156 reference :1-156 out rms.dat

This line calls the rmsd command and will assign it the name ‘All-atoms’. It will read residues 1 through 156 and will compare the root mean square deviation using the reference atoms 1 to 156. The results will be written to rms.dat

The second rmsd command:

rmsd Backbone :1-156@C,N,O,CA,CB reference :1-156@C,N,O,CA,CB out rms.dat

Calls the rmsd command and assigns the name ‘Bachbone’, then it will read the coordinates of the atoms C, N, O, CA and CB of residues 1 to 156. Those coordinates will be compared to the same atoms of the reference and the results will be written in the same file, with the name rms.dat. It is very important that the number of atoms and atom ordering in the target and reference structures match exactly in order for RMSD values to be meaningful. CPPTRAJ will check the number of atoms, but cannot currently detect bad atom ordering.

The file rmsd.dat is:

#Frame   All-atoms   Backbone
1        8.0020      7.2520
2        6.1288      5.4200
3        7.4288      6.7713
4        9.6327      9.0838
5        7.0681      6.3620
6        4.4302      3.9824

Notice that the file has 3 columns. The first states the frame number, which in our case represents the 6 input files that we loaded and the second and third column represents the rmsd calculation. The names of the column follow the name to the dataset that we declared (‘All-atoms’ and ‘Backbone’). We can see from these results that using only the backbone atoms to compare structures generates lower results since we are not including any sidechains from the amino acids of the protein. Also, we can see that frame #4, which represents our ligand4.pdb structure displays the highest rms deviation with respect to our loaded reference.

All data is presented in angstroms. The files to reproduce this tutorial are available here as a compressed file.