Comparing the structural difference between a series of structures.
You want to compare the structural difference caused by the binding of 6 different ligands to the same protein receptor.
First we need to create a reference structure that has no ligand that will serve to compare the differences, we can create an average structure using the average command. A brief example to create the reference structure is:
parm nowater.topo trajin 1/nowater.nc 1 last 1 trajin 2/nowater.nc 1 last 1 strip :WAT strip :Na+,Cl- rms fit :1-156 average reference.pdb pdb :1-156 go
The script will read the topology file ‘nowater.topo’ and read two trajectory files named ‘nowater.nc’ starting from frame 1, ending in the last frame and will read all the files (offset is 1 frame). It will strip all residues with the name ‘WAT, Na+, Cl-‘. Next we use the rms command with the fit keyword to remove global rotational/translational motion, since we do not want any of that in our average structure. Then we call the average command which will generate a file called ‘reference.pdb’ and will use the residues 1 to 156.
With our reference structure, we then load each of the structure we want to compare with it. In this example, we have six PDB files that contain the same protein as the reference but they also have six different ligands in each file. Each file was extracted as the last frame of a molecular dynamics simulations and represents a snapshot that we can use to compare how much the ligand influences or changes the protein structure. The CPPTRAJ code is:
parm reference.pdb [average] reference reference.pdb parm [average] parm protein-ligand1.pdb [ligand1] trajin protein-ligand1.pdb parm [ligand1] parm protein-ligand2.pdb [ligand2] trajin protein-ligand2.pdb parm [ligand2] parm protein-ligand3.pdb [ligand3] trajin protein-ligand3.pdb parm [ligand3] parm protein-ligand4.pdb [ligand4] trajin protein-ligand4.pdb parm [ligand4] parm protein-ligand5.pdb [ligand5] trajin protein-ligand5.pdb parm [ligand5] parm protein-ligand6.pdb [ligand6] trajin protein-ligand6.pdb parm [ligand6] rmsd All-atoms :1-156 reference :1-156 out rms.dat rmsd Backbone :1-156@C,N,O,CA,CB reference :1-156@C,N,O,CA,CB out rms.dat go
In this case we only have PDB files. When we load them in CPPTRAJ we have to load a topology file and name that topology using the [ ] syntax. Then we load the same file again but now using trajin
so that CPPTRAJ reads the atomic coordinates. For each trajin
command we use, we need to assign its proper topology file with the use of 'parm [name]
‘, so CPPTRAJ know what topology to use with each trajin
call. Notice that the second line assigns the reference
to the reference.pdb.
At this point we can use the commands: list parm
and list trajin
to see currently loaded topologies and trajectories respectively, and review for any possible errors.
The rmsd
command will run twice, the first one:
rmsd All-atoms :1-156 reference :1-156 out rms.dat
This line calls the rmsd
command and will assign it the name ‘All-atoms’. It will read residues 1 through 156 and will compare the root mean square deviation using the reference atoms 1 to 156. The results will be written to rms.dat
The second rmsd command:
rmsd Backbone :1-156@C,N,O,CA,CB reference :1-156@C,N,O,CA,CB out rms.dat
Calls the rmsd
command and assigns the name ‘Bachbone’, then it will read the coordinates of the atoms C, N, O, CA and CB of residues 1 to 156. Those coordinates will be compared to the same atoms of the reference
and the results will be written in the same file, with the name rms.dat. It is very important that the number of atoms and atom ordering in the target and reference structures match exactly in order for RMSD values to be meaningful. CPPTRAJ will check the number of atoms, but cannot currently detect bad atom ordering.
The file rmsd.dat is:
#Frame All-atoms Backbone 1 8.0020 7.2520 2 6.1288 5.4200 3 7.4288 6.7713 4 9.6327 9.0838 5 7.0681 6.3620 6 4.4302 3.9824
Notice that the file has 3 columns. The first states the frame number, which in our case represents the 6 input files that we loaded and the second and third column represents the rmsd calculation. The names of the column follow the name to the dataset that we declared (‘All-atoms’ and ‘Backbone’). We can see from these results that using only the backbone atoms to compare structures generates lower results since we are not including any sidechains from the amino acids of the protein. Also, we can see that frame #4, which represents our ligand4.pdb structure displays the highest rms deviation with respect to our loaded reference.
All data is presented in angstroms. The files to reproduce this tutorial are available here as a compressed file.