Article Index

The Production Simulation

After the position restrained equilibration has finished it is time for the real simulation where we will collect data to analyze. We will disable position restraints, generate more output data, and use the velocities from the end of the last simulation (saved in the .gro file) instead of generating new ones. The contents of run.mdp is:

integrator  = md
nsteps      = 50000
dt          = 0.002
nstxout     = 10000
nstvout     = 10000
nstxtcout   = 100
nstlog      = 1000
nstenergy   = 100
energygrps  = protein sol
rlist       = 1.0
coulombtype = cut-off
rcoulomb    = 1.0
vdwtype     = cut-off
rvdw        = 1.0
tcoupl      = berendsen
tc-grps     = system
tau-t       = 0.1
ref-t       = 298
constraints = h-bonds

The total simulation time is 100 ps (50,000 steps), which is still very short but enough for a tutorial. Every 10,000 steps we save full precision coordinates and velocities to the trajectory run.trr so we can restart the simulation if something goes wrong, and every 100 steps we will write compressed coordinates to run.xtc (10 times smaller than general trajectories). Energies and similar data like pressure is written every 100 steps to run.edr, and energy terms are written separately for all combinations between protein and solvent groups. Generate the run input file:

grompp -f run.mdp -p topol.top -c pr.gro -o run.tpr -np 4

This simulation will take a while, so let's create a script so it can be submitted as a batch job. The PBS/Torque options can be embedded as a comment in the script, and since most clusters have SMP nodes we ask for 2 nodes and 2 processors per node:

#PBS -l nodes=2:ppn=2
lamboot -v $PBS_NODEFILE
cd /your/project/directory
mpirun -np 4 /path/to/mdrun -deffnm run
wipe

Make the scriptfile executable, and submit the job as qsub scriptfile. Go and have a cup of coffee while you swear over other users competing for resources, and when you are back the output files should be ready in your project directory. All Gromacs binary files are portable, so if you want to you can copy the results and do all post-run analysis on your workstation if Gromacs is installed there.

Analyzing The Energy File

The Gromacs energy files are a bit special in that they store partial sums of energy terms. If you used nstenergy=100 the instantaneous values are written every 100 steps, but the averages and drift statistics we extract from the file are still exact and based on every single step in the simulation. This even works if you are analyzing parts of a simulation, for instance the interval 5.0-10.0 ps in our production run. Use the Gromacs program g_energy to extract data from energy files:

g_energy -f run.edr

You will be presented with a table of different energy terms, pressures, temperature, etc. Select a couple of terms you are interested in and finish with 0. g_energy will write the average, fluctuation and systematic drift to the screen, and generate the plotfile energy.xvg. Remember that we talked about the Grace program before? If you installed it you can simply type

xmgrace -nxy energy.xvg

and you will get a finished graph with captions and legends. Figure Two shows the total potential energy, all we did in Grace was to make the graph a little wider. You can even use the command line options to xmgrace and produce postscript or PDF plots automatically!

al Lysozyme and water system, generated directly by g_energy
Figure Two: Total potential energy for the tutorial Lysozyme and water system, generated directly by g_energy

Analyzing Trajectories

What else can we do? Well, a common property to be interested in is the RMSD variation of atoms with respect to the initial structure. Let's generate it from the trajectory:

g_rms -s em.tpr -f run.xtc

The reference structure is taken from the tpr file, so we use the energy minimization run input file that started from the PDB structure. You will first be asked which group should be used for the fitting (select C-alpha), then how many groups you want to calculate RMSD for, and which these groups should be. The result will be written to rmsd.xvg, and just as for the energy file plot you can directly fire it up in xmgrace. For Figure Three we selected the three groups C-alpha, the whole protein (except for hydrogens), and only sidechains (except for hydrogens).

Root-mean-square displacement of alpha carbons, the whole protein, and sidechains for the Lysozyme production run. The system was fitted to minimize C-alpha RMSD, and hydrogens were not used in the calculation. Plot generated by g_rmsd
Figure Three: Root-mean-square displacement of alpha carbons, the whole protein, and sidechains for the Lysozyme production run. The system was fitted to minimize C-alpha RMSD, and hydrogens were not used in the calculation. Plot generated by g_rmsd

Another useful measure is the number of hydrogen bonds, both inside the protein and between protein and water atoms. Of course, there is a Gromacs program for this too:

g_hbond -s run.tpr -f run.xtc
 

Select “protein” for one group and "SOLA" for the other. Rename the output file hbnum.xvg, and then run the program again, but this time with "€œprotein" for both groups. If you open one file in Grace and then import the other one as an additional set you will get something like Figure Four. That wasn't too hard, was it? Don't forget that most Gromacs programs have a huge list of options to alter their behavior, and if the predefined groups don't match your needs you can easily create a custom index file with the program make_ndx!

Number of hydrogen bonds between protein atoms and between protein and water atoms for Lysozyme. This plot was created by reading two g_hbond results
Figure Four: Number of hydrogen bonds between protein atoms and between protein and water atoms for Lysozyme. This plot was created by reading two g_hbond results

Exporting the Trajectory for Pymol

Pymol is a great visualization program, but we need to massage the trajectory a bit before we can create really nice movies. First we will apply a lowpass filter to the trajectory to remove nice and make the atoms move more smoothly. As usual, Gromacs has a special program to do this:

g_filter -s run.tpr -f run.xtc -ol lowpass.xtc -nx 10 -all

This time we won't tell you what all the options do, find out yourself instead by executing g_filter -h! Pymol cannot read Gromacs xtc trajectories (yet), and we also want to remove all the water in the trajectory to concentrate on the proteins. This is easy to fix by using another Gromacs program to convert the trajectory to PDB format and only select one group:

trjconv -s run.tpr -f lowpass.xtc -o lzmtraj.pdb

Select “protein” when asked for an output group, and visualize the resulting trajectory with the command pymol lzmtraj.pdb. You can push the VCR-style buttons in Pymol to play back the trajectory, and if you really want to impress the audience at a talk you should choose to raytrace/render all frames from the movie menu. If you are using a Macintosh, Pymol can save the movie directly in quicktime format. Otherwise you will need to export a series of images and use an external program to create a movie. Figure Five shows the last frame of a movie we made of the tutorial simulation!

Raytraced Lysozyme structure from the tutorial, created with Pymol
Figure Five: Raytraced Lysozyme structure from the tutorial, created with Pymol

Other Programs

Gromacs is a pretty big package with more than 70 analysis programs, so we've really only scratched the surface here. Make sure you check out the options to the existing programs before you start writing your own code from the included template source. Some of the tools we find most useful are:
  • g_gyrate to calculate radii of gyration, both average and around the x/y/z axes.
  • g_bond and g_angle to calculate values and averages of arbitrary distances and angles over time in a trajectory.
  • trjorder to reorder the atoms in an arbitrary group in a trajectory with respect to the distance to another arbitrary group. This way you can create a trajectory with only a first hydration shell of water, or the water close to an active site.
  • g_rama to calculate values of phi/psi backbone dihedrals as a function of time.
  • g_cluster to cluster structures in a trajectory with several different algorithms, and to calculate the distances between these clusters.
  • do_dssp is a wrapper around the dssp program to create secondary structure plots as a function of time for proteins.
  • g_msd calculates mean square displacements and does automatic fitting to calculate diffusion coefficients for different groups.
  • g_sas to compute hydrophobic, hydrophilic and total solvent accessible surface area.
  • g_order if you need to determine order parameters, in particular for lipid bilayers.
  • g_velacc for autocorrelation functions of atoms or molecules, both velocity and momentum.

You have no rights to post comments

Search

Login And Newsletter

Create an account to access exclusive content, comment on articles, and receive our newsletters.

Feedburner


This work is licensed under CC BY-NC-SA 4.0

©2005-2023 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.