Hydrogen bond analysis between a protein and a small molecule
You want to know the most populated hydrogen bonds between a small molecule and a protein.
We can use the hbond command that will use a geometric consideration (distance of 3.0 Å and an angle cutoff of 135°) to consider a hydrogen bond is formed.
For more information regarding the hbond command, follow the Introductory tutorial in the official AMBER website.
For this recipe we are going to use a 1 microsecond simulation of the human papilloma virus E6 protein and the natural product kappa-carrageenan. A short trajectory of the system is presented in the following video:
We can see from the trajectory that the ligand (using spacefill representation) is interacting with the binding site of the protein throughout the simulation. We will use the topology file vph.prmtop and the trajectory file vph.nc. The system consists of a protein receptor with 151 amino acids and two zinc ions. The ligand residue id is 154. The trajectory consists of 5074 frames and has been pre-processed to remove the waters and re-oriented with the autoimage command.
parm vph.prmtop trajin vph.nc 1 last 1 hbond contacts :1-154 avgout avg.dat series uuseries hbond.gnu nointramol go lifetime contacts[solutehb] out contacts-lifetime.dat go
The CPPTRAJ scripts reads the topology file and the trajectory file using every frame. Then we call the hbond command that will have the name ‘contacts’. The command will look for Hbond donors/acceptors in the regions specified by the mask that includes residues one through 154. Very important here is the
nointramol keyword, which will make the hbond command to only look for hbonds in different molecules. If we forget the keyword, the command will look for hbonds between all the 154 residues. The avgout keyword will write solute-solute hydrogen bond averages to a file called avg.dat. The series keyword will save hydrogen bond formed (1) or not formed (0) per frame for any detected hydrogen bond. The uuseries keyword will write the raw hydrogen bond time series data to hbond.gnu, which can be visualized using gnuplot. The nointramol keyword will make sure that only hydrogen bonds between different molecules are calculated.
We have to run the command to actually look for hydrogen bonds between the protein and the ligand using the go keyword. As an extra analysis, we will calculate the lifetime of those hydrogen bonds found. The lifetime will use as input the previously calculated data from the hbond command and print the results to contacts-lifetime.dat
Running the CPPTRAJ script will generate the files: avg.dat, hbond.gnu, contacts-lifetime.dat and crv.contacts-lifetime.dat. The avg.dat file will print the average of each solute-solute hydrogen bond (sorted by population) formed over the course of the trajectory. Looking at the first 10 lines of the avg.dat, we have:
#Acceptor DonorH Donor Frames Frac AvgDist AvgAng CYS_51@O LIG_154@HO2 LIG_154@O20 4536 0.8940 2.7367 163.5635 LIG_154@O19 TYR_32@HH TYR_32@OH 3686 0.7264 2.7974 163.7224 LIG_154@O22 CYS_51@H CYS_51@N 1367 0.2694 2.8457 162.6448 LIG_154@O24 CYS_51@H CYS_51@N 1243 0.2450 2.8504 162.6914 LIG_154@O7 TYR_54@H TYR_54@N 882 0.1738 2.9019 161.8689 LIG_154@O23 CYS_51@H CYS_51@N 872 0.1719 2.8483 162.5888 TYR_54@O LIG_154@HO LIG_154@O1 370 0.0729 2.7324 162.1249 LIG_154@O10 ARG_55@HH11 ARG_55@NH1 369 0.0727 2.8210 155.3269 LIG_154@O14 ARG_129@HH22 ARG_129@NH2 351 0.0692 2.8229 153.6910
We can see from the fraction column that for 89% of the simulation time there is a hydrogen bond between the oxygen atom of the residue CYS51 and the hydrogen atom of atom #2 of the ligand. Following this there is a hydrogen bond present for 72% of the simulation between the O19 of the ligand and the OH group of TYR32. In addition, 26% and 24% of the time of the simulation, a hydrogen bond is populated between oxygen atoms 22 and 24 of the ligand and the same CYS51 residue. The rest of the hydrogen bonds corresponds to 17% of the simulation between the ligand’s O7 and O23 oxygen atoms and the protein residues TYR54 and CYS51. From the file we can see the the rest of the hydrogen bonds are very short lived and we can assume that do not provide significant stability of the ligand-protein interaction.
The lifetime analysis allows the study of the time-dependent behavior of the hydrogen bonds. The first 10 lines of the file contacts-lifetime.dat are:
#Set lifetime lifetime[max] lifetime[avg] lifetime[frames] lifetime[name] 1 5 3 1.4000 7 LIG_154@O4-GLN_6@NE2-HE21 2 1 1 1.0000 1 LIG_154@HC7-GLN_6@NE2-HE21 3 1 1 1.0000 1 LIG_154@HC9-GLN_6@NE2-HE21 4 3 1 1.0000 3 LIG_154@O4-GLN_6@NE2-HE22 5 4 1 1.0000 4 LIG_154@HC6-GLN_6@NE2-HE22 6 1 1 1.0000 1 LIG_154@HC7-GLN_6@NE2-HE22 7 36 2 1.0556 38 LIG_154@O4-ARG_8@NH2-HH21 8 4 1 1.0000 4 LIG_154@HC7-ARG_8@NH2-HH21 9 46 2 1.0217 47 LIG_154@O4-ARG_8@NH2-HH22
Using the bash command:
sort -g contacts-lifetime.dat -k5 to sort the file using the 5th column (lifetime frames) returns:
38 266 17 3.2782 872 LIG_154@O23-CYS_51@N-H 41 515 10 1.7126 882 LIG_154@O7-TYR_54@N-H 39 329 24 3.7781 1243 LIG_154@O24-CYS_51@N-H 37 383 24 3.5692 1367 LIG_154@O22-CYS_51@N-H 25 760 35 4.8500 3686 LIG_154@O19-TYR_32@OH-HH 244 276 105 16.4348 4536 CYS_51@O-LIG_154@O20-HO2
This information show that the longest consecutive number of frames containing the CYS_51@O-LIG_154@O20-HO2 hydrogen bond is 105 (in the lifetime[max] column, which still appears to be the longest). So the longest “lifetime” for this hydrogen bond is 105. It’s a subtle but important difference, since it could occur to have a hydrogen bond that is formed for a long time but maybe only 1 or 2 times (low frequency), or a hydrogen bond that is formed for most of the simulation but maybe for only 1 or 2 frames at a time (high frequency).