Hydrogen bond analysis between a protein and a small molecule

You want to know the most populated hydrogen bonds between a small molecule and a protein.

We can use the hbond command that will use a geometric consideration (distance of 3.0 Å and an angle cutoff of 135°) to consider a hydrogen bond is formed.

For more information regarding the hbond command, follow the Introductory tutorial in the official AMBER website.

For this recipe  we are going to use a 1 microsecond simulation of the human papilloma virus E6 protein and the natural product kappa-carrageenan. A short trajectory of the system is presented in the following video:

We can see from the trajectory that the ligand (using spacefill representation) is interacting with the binding site of the protein throughout the simulation. We will use the topology file vph.prmtop and the trajectory file vph.nc. The system consists of a protein receptor with 151 amino acids and two zinc ions. The ligand residue id is 154. The trajectory consists of 5074 frames and has been pre-processed to remove the waters and re-oriented with the autoimage command.

parm vph.prmtop
trajin vph.nc 1 last 1
hbond contacts :1-154 avgout avg.dat series uuseries hbond.gnu nointramol 
lifetime contacts[solutehb] out contacts-lifetime.dat

The CPPTRAJ scripts reads the topology file and the trajectory file using every frame.  Then we call the hbond command that will have the name ‘contacts’. The command will look for Hbond donors/acceptors in the regions specified by the mask that includes residues one through 154.  Very important here is the nointramol keyword, which will make the hbond command to only look for hbonds in different molecules. If we forget the keyword, the command will look for hbonds between all the 154 residues. The avgout keyword will write solute-solute hydrogen bond averages to a file called avg.dat. The series keyword will save hydrogen bond formed (1) or not formed (0) per frame for any detected hydrogen bond. The uuseries keyword will write the raw hydrogen bond time series data to hbond.gnu, which can be visualized using gnuplot. The nointramol keyword will make sure that only hydrogen bonds between different molecules are calculated.

We have to run the command to actually look for hydrogen bonds between the protein and the ligand using the go keyword.  As an extra analysis, we will calculate the lifetime of those hydrogen bonds found.  The lifetime will use as input the previously calculated data from the hbond command and print the results to contacts-lifetime.dat

Running the CPPTRAJ script will generate the files: avg.dat, hbond.gnu, contacts-lifetime.dat and crv.contacts-lifetime.dat. The avg.dat file will print the average of each solute-solute hydrogen bond (sorted by population) formed over the course of the trajectory. Looking at the first 10 lines of the avg.dat, we have:

#Acceptor       DonorH        Donor       Frames   Frac     AvgDist   AvgAng 
CYS_51@O        LIG_154@HO2   LIG_154@O20     4536     0.8940   2.7367    163.5635
LIG_154@O19     TYR_32@HH     TYR_32@OH   3686     0.7264   2.7974    163.7224
LIG_154@O22     CYS_51@H      CYS_51@N    1367     0.2694   2.8457    162.6448
LIG_154@O24     CYS_51@H      CYS_51@N    1243     0.2450   2.8504    162.6914
LIG_154@O7      TYR_54@H      TYR_54@N    882      0.1738   2.9019    161.8689
LIG_154@O23     CYS_51@H      CYS_51@N    872      0.1719   2.8483    162.5888
TYR_54@O        LIG_154@HO    LIG_154@O1  370      0.0729   2.7324    162.1249
LIG_154@O10     ARG_55@HH11   ARG_55@NH1  369      0.0727   2.8210    155.3269
LIG_154@O14     ARG_129@HH22  ARG_129@NH2      351      0.0692   2.8229    153.6910

We can see from the fraction column that for 89% of the simulation time there is a hydrogen bond between the oxygen atom of the residue CYS51 and the hydrogen atom of atom #2 of the ligand. Following this there is a hydrogen bond present for 72% of the simulation between the O19 of the ligand and the OH group of TYR32. In addition, 26% and 24% of the time of the simulation, a hydrogen bond is populated between oxygen atoms 22 and 24 of the ligand and the same CYS51 residue. The rest of the hydrogen bonds corresponds to 17% of the simulation between the ligand’s O7 and O23 oxygen atoms and the protein residues TYR54 and CYS51. From the file we can see the the rest of the hydrogen bonds are very short lived and we can assume that do not provide significant stability of the ligand-protein interaction.

The lifetime analysis allows the study of the time-dependent behavior of the hydrogen bonds. The first 10 lines of the file contacts-lifetime.dat are:

#Set lifetime lifetime[max] lifetime[avg] lifetime[frames] lifetime[name]
1    5        3             1.4000        7                LIG_154@O4-GLN_6@NE2-HE21
2    1        1             1.0000        1                LIG_154@HC7-GLN_6@NE2-HE21
3    1        1             1.0000        1                LIG_154@HC9-GLN_6@NE2-HE21
4    3        1             1.0000        3                LIG_154@O4-GLN_6@NE2-HE22
5    4        1             1.0000        4                LIG_154@HC6-GLN_6@NE2-HE22
6    1        1             1.0000        1                LIG_154@HC7-GLN_6@NE2-HE22
7    36       2             1.0556        38               LIG_154@O4-ARG_8@NH2-HH21
8    4        1             1.0000        4                LIG_154@HC7-ARG_8@NH2-HH21
9    46       2             1.0217        47               LIG_154@O4-ARG_8@NH2-HH22

Using the bash command: sort -g contacts-lifetime.dat -k5 to sort the file using the 5th column (lifetime frames) returns:

38   266     17          3.2782         872    LIG_154@O23-CYS_51@N-H
      41   515     10          1.7126         882    LIG_154@O7-TYR_54@N-H
      39   329     24          3.7781         1243   LIG_154@O24-CYS_51@N-H
      37   383     24          3.5692         1367   LIG_154@O22-CYS_51@N-H
      25   760     35          4.8500         3686   LIG_154@O19-TYR_32@OH-HH
     244  276     105        16.4348         4536   CYS_51@O-LIG_154@O20-HO2

This information show that the longest consecutive number of frames containing the CYS_51@O-LIG_154@O20-HO2 hydrogen bond is 105 (in the lifetime[max] column, which still appears to be the longest). So the longest “lifetime” for this hydrogen bond is 105. It’s a subtle but important  difference, since it could occur to have a hydrogen bond that is formed for a long time but maybe only 1 or 2 times (low frequency), or a hydrogen bond that is formed for most of the simulation but maybe for only 1 or 2 frames at a time (high frequency).