Tutorial 5: Molecular Dynamics and Water Models - Pair Correlation Functions in TIP3P and TIP4P Water
Preface:
This example simulates one of the most common test systems imaginable, viz., a box of water in the canonical (NVT) ensemble. We will test two different water models: the ubiquitous three-site TIP3P model and the slightly less common four-site TIP4P model both developed by the Jorgensen group (references).Water is a dense (about 1g·cm-3), polar liquid characterized by high cohesion, i.e., very strong intermolecular forces through hydrogen bonds. It has many unique properties experimentally that are not touched upon here. So far, all the tutorials used Monte Carlo simulation techniques. While in the early days of simulation work it was rather common to simulate dense simple liquids using Monte Carlo, it is important to understand that fundamentally this is close to a worst-case scenario in terms of applications for MC due to the complete coupling of all degrees of freedom in the system. For more details on this, consult this recent reference. Conversely, correlation between degrees of freedom is most naturally introduced by gradient-based methods, such as molecular dynamics. Therefore, we will use this tutorial as an introduction to the use of dynamics methods in CAMPARI. Given the simplicity of the system, you should feel free to run this as a Monte Carlo simulation as well.
The structure of liquids is most commonly analyzed by site-site pair correlation functions. Here, we will introduce you to CAMPARI's analysis facility and show how to generate data that is directly comparable to published reference data for these water models.
Step-by-Step:
Step 1 - Set size and determine number of water molecules to be usedFirst, create two empty directories (one for each water model). Let us assume you start with TIP4P. Here, we choose a 32.0x32.0x32.0Å periodic, cubic box:
FMCSC_BOUNDARY 1
FMCSC_SHAPE 1
FMCSC_SIZE 32.0 32.0 32.0
FMCSC_BASENAME TIP4P
FMCSC_SEQFILE <full path to working directory for TIP4P>/T4P.seq
You should be able to compute the number of water molecules yourself with the knowledge that the in silico density at room temperature is extremely close to 1g·cm-3 for both TIP3P and TIP4P. However, remember that in case the density is only approximately known, it is always safer to i) use overall larger boxes, and ii) err in the setup of an NVT simulation on the lower density side (take a few molecules less). This reduces the impact of pressure artifacts. Now create two sequence files in the respective directories with the corresponding number of entries for either "T3P" or "T4P" and both terminated by "END". Call them T3P.seq and T4P.seq, respectively.
Step 2 - Define interaction model and cutoff scheme
The original TIP3P and TIP4P models are available - for instance - through the OPLS-AA/L parameter file. Therefore, we can copy and paste straight from the documentation:
PARAMETERS <full path to CAMPARI directory>/params/oplsaal.prm
FMCSC_INTERMODEL 2
FMCSC_ELECMODEL 1
FMCSC_SC_IPP 1.0
FMCSC_SC_ATTLJ 1.0
FMCSC_EPSRULE 2
FMCSC_SIGRULE 2
FMCSC_SC_POLAR 1.0
Note that some keywords were dropped that are not directly applicable to the present example. One tricky aspect about the use of periodic boundary conditions is that the interaction model has to be truncated at a specific distance (less than half the shortest box length, i.e. 16Å in our case). This is commonly achieved by a cutoff scheme. The reason lies in the typically used minimum-image convention, i.e. the convention that each particle interacts only with a single copy of another particle present in the actual simulation cell, but that its copy is chosen as the nearest image. If the interaction model were truncated at a longer distance, it would become necessary to compute interactions with multiple images so as to avoid completely artificial cutoff asymmetries from occurring:
FMCSC_N2LOOP 0 # avoid evaluations of full N^2 sums for energies
FMCSC_CUTOFFMODE 3
FMCSC_NBCUTOFF 12.0
FMCSC_ELCUTOFF 12.0
FMCSC_LREL_MD 3
FMCSC_NBL_UP 5
FMCSC_GRIDDIM 8 8 8
FMCSC_GRIDMAXGPNB 512
The above instructs CAMPARI to use grid-based cutoffs. During dynamics, all non-bonded interactions will be truncated at 12.0Å with electrostatic contributions being "switched" to be exactly zero at that distance via the reaction-field method. The implied dielectric of 78.2 (default) is not enough to avoid force discontinuities at the cutoff distance, however. There is no twin-range (distance regime for interactions to be computed less frequently) because this technique is not compatible with the reaction-field method. Grid- or cell-based methods underlie all common and efficient ways of maintaining neighbor lists during molecular simulations in dense fluids, and these are what makes cutoffs work efficiently. One of the results examined in this tutorial depends on the choice for the overall cutoff.
Step 3 - Set OpenMP options
While small, the system we use can benefit from CAMPARI's OpenMP-based threads parallelization a lot. You may have used the threads parallelization in earlier tutorials for MC simulations. For gradient-based calculations, dynamic load balancing (DLB) becomes important as it applies to most aspects of the major CPU time consumer, viz., the force calculation.
FMCSC_NRTHREADS 4 # if you can allow it, set it to the number of physical or even logical cores in your system (hyperthreading can be beneficial)
FMCSC_THREADS_VERBOSE 1 # level of diagnostic output
FMCSC_THREADS_DLB_FREQ 500 # restarts DLB segments every X steps
FMCSC_THREADS_DLB_STOP 300 # stops DLB for each segment after X steps
FMCSC_THREADS_DLB_EXT 1 # accumulate load data over X steps for DLB adjustments
As mentioned in the comment above, you should set the number of threads to a value appropriate for your system. Modern many-core architectures can provide tens of cores in a single shared-memory environment. The settings for dynamic load balancing are chosen such that it operates pretty much continuously. This is because the loads experienced by individual threads often vary through factors that can be controlled poorly, e.g., cache use, hardware mapping and scheduling, or the presence of competing processes.
Step 4 - Prepare starting structure
The numerical integration of equations of motion relies on the use of forces that can be extremely large for conformations with steric overlap. In MC simulations, we do not use forces, and can start the algorithm from arbitrary starting conformations. In dynamics, such a protocol would lead to "explosion" of the simulation (integration becomes unstable until eventually a critical numerical overflow error crashes the program). Therefore, we will first run MC for a while to obtain a pdb file with a sufficiently equilibrated water box:
FMCSC_NRSTEPS 200000
FMCSC_EQUIL 500000
FMCSC_TEMP 300.0
FMCSC_RANDOMIZE 1
FMCSC_RANDOMATTS 5000
FMCSC_RANDOMTHRESH 3.0 # the threshold is a mean residue-residue value
FMCSC_RIGIDFREQ 1.0
FMCSC_RIGIDRDFREQ 0.1
FMCSC_TRANSSTEPSZ 1.0
FMCSC_ROTSTEPSZ 10.0
FMCSC_POLOUT 100000000
FMCSC_ACCOUT 100000000
FMCSC_ENOUT 1000
You should be familiar with the move set instructions and general simulation specifications by now. We also started adding some keywords that will prevent output not interesting to us. After pasting all the above keywords into a file called "water.key", execute:
campari_threads -k water.key >& log
This will only take a few minutes. After the run is complete, have a look at the final structure snapshot written out by CAMPARI as "TIP4P_END.pdb" (or "TIP3P_END.pdb", respectively). It should look like a box of water with homogeneous density and no discernible steric overlaps between molecules. Rename this file to "TIP4P_MC.pdb" or "TIP3P_MC.pdb", respectively. It is important that energetically this file is relaxed enough to be used as an input for a gradient-based simulation. If not (consult log-output), repeat or extend the above MC calculation.
Step 5 - Run the molecular dynamics simulation
In molecular dynamics, we have to consider additional implementation issues. In order for the integrated equations of motion to yield an ensemble other than the microcanonical one (NVE), we need to introduce a coupling device to an external bath. These coupling devices and "baths" are almost always utterly conceptual and may appear confusing at first. Append the existing "water.key" as follows:
FMCSC_ENSEMBLE 1
FMCSC_DYNAMICS 2
FMCSC_TIMESTEP 0.004
FMCSC_TSTAT 4
FMCSC_TSTAT_TAU 1.0
Here, we specify explicitly the ensemble to be the canonical one, request Newtonian (rather than stochastic) dynamics, set a time step of 4fs, and instruct CAMPARI to couple the system to an external bath by means of a stochastic velocity rescaling thermostat with a coupling time of 1ps. Note that it is, by default, implied that the dynamics are performed in rigid-body space (→ FMCSC_CARTINT). This is important because the use of rigid body degrees of freedom as opposed to Cartesian degrees of freedom means that we do not have to deal with keeping the water molecules rigid through other means. The same tutorial could of course be run in Cartesian space by using appropriate constraints and analytical or iterative solvers (see FMCSC_SHAKEMETHOD for an overview). The use of constraints is of critical importance since these water models were parameterized as rigid models using primarily MC simulations in the early phases of development. Details of the rigid-body integrator are provided elsewhere and a complete explanation is given in the reference publication that uses liquid TIP4P as one of its validation examples. To improve stability for the challenging case of rigid rotation of water (very small inertia), set the following:
FMCSC_TMD_INTEGRATOR 1
FMCSC_TMD_INT2UP 4
With these settings, the time step could be pushed up to about 5fs for this system without major losses in accuracy. In general, the effective degrees of freedom with very small inertia can be automatically scaled with the help of keyword FMCSC_FUDGE_DYN. This is an advantage of the diagonal formalism used for the mass matrix as explained in the paper. The expected consequence of such scaling operations is simply that the rotational (or torsional) autocorrelation time will increase. However, this is not necessarily a large or even measurable effect if these motions are limited by the potential energy (through interactions or bonded potentials), which usually dominate. Here, we choose to keep the true inertial masses so that any dynamical properties you might measure can be compared directly to literature data.
That said, for analysis, the tutorial focuses on static (rather than dynamic) properties alone, in particular the mean solution structure, and these properties do not depend on scale factors applied to masses. To proceed, we instruct CAMPARI to use the MC-generated structure as input, and to perform only the necessary analyses on-the-fly:
FMCSC_PDBFILE TIP4P_MC.pdb
FMCSC_DISABLE_ANALYSIS 1
FMCSC_PCCALC 500
FMCSC_PCCODEFILE <full path to CAMPARI directory>/examples/tutorial5/tutorial5.t4p
FMCSC_ENSOUT 100
FMCSC_CHECKFREQ 100
FMCSC_FLUSHTIME 1.0 # in minutes
FMCSC_DISABLE_ANALYSIS is a meta-keyword that is a shortcut for explicitly disabling all running output and/or all on-the-fly analyses (here: both). Because of CAMPARI's logic in keeping only the last value for each keyword, the subsequent use of specific analysis flags (e.g., FMCSC_PCCALC) overrides this choice specifically for the analysis in question. Thus, the above settings mean that only ensemble output (FMCSC_ENSOUT) and pair correlation analyses (FMCSC_PCCALC) are performed. The latter use an input file that has different options for formatting. Here, we take advantage of the matrix input format (that is perfect for this type of application) to instruct CAMPARI to collect statistics on three possible intermolecular site-site correlation functions (O-O, O-H, H-H). Make sure to change the path for FMCSC_PCCODEFILE; a missing input file will simply disable the analysis.
Lastly, we of course have to adjust the simulation length parameters, so change the following keywords that are already present in "water.key":
FMCSC_NRSTEPS 1250000
FMCSC_EQUIL 250000
We will simulate for a total of 5ns with the first 1ns being discarded as equilibration (i.e., not entering for instance the pair correlation analysis). Now type:
campari_threads -k water.key >& log2 &
The runtime will depend on the number of threads available (check output file THREADS.log for this). If you want to keep it short, lower both parameters (FMCSC_NRSTEPS and FMCSC_EQUIL) by one or even two orders of magnitude. This should still give sufficient data to obtain pair correlation functions (but not most other properties) at sufficient quality.
Step 6 - Analyze the results
After the simulation has successfully completed, first plot the total system potential energy (column 2 in ENSEMBLE.dat) as a function of step number (column 1 in ENSEMBLE.dat). The energy should be stable (reach a plateau) and, after this, there should be symmetric fluctuations around the mean, i.e., there should be no extreme spikes, no sudden drops, etc. Any such behavior would be indicative of a problem with the simulation itself (bad settings). In general, an energy trace is of course a poor way of analyzing a simulation, but for this homogeneous system it serves well as a first sanity check. An additional such check would be to plot the instantaneous system temperature by plotting column 5 of ENSEMBLE.dat. System temperature will almost always be a direct or indirect reporter on erroneous simulations (you can also find the average system temperature over the production portion of the run in the log-file ["log2" above]).
From the potential energy plot, extract the average system potential energy and divide it by the number of molecules you specified in the sequence file. The resultant value, the mean potential energy per particle, is a typical hallmark quantity for water models and should under ambient conditions be close to -10kcal/mol. Note, however, that as hinted at above the precise value will be dependent on the cutoff settings and the cutoff implementation and that values listed in the literature are typically obtained with a particular scheme that is most likely not identical to the one employed here. In addition, this is an NVT simulation, for which the actual density might deviate slightly from the "correct" density for the water model in question. Still, drastic deviations will indicate a fundamental flaw in the program or parameters.
To analyze solution structure, plot the obtained site-site correlation functions (O-O, O-H, and H-H) and compare them to published values (see references). They can be found in the output file GENERAL_PC.dat as columns 2-4 (with the first column indicating distance). Note that these pair correlation functions (gOO(r) and so on) have been obtained by normalizing raw distance distributions by an analytical prior (maximum entropy distribution). This so-called volume element can normally be found in RBC_PC.dat, and is non-trivial for distances exceeding half the smallest box length. Here, we did not obtain these data since the output to RBC_PC.dat was completely suppressed given that all water molecules were regarded as solvent from the default assignment of the analysis group parsing. This could easily be remedied of course.
When comparing to the literature, all prominent short-range features should match the published data accurately (since these are fairly robust). The most variation can be expected for the peak heights of sharp peaks (due to variations in the binning). After you have run both simulations by making the appropriate modifications above, TIP3P and TIP4P must be easily distinguishable in gOO(r) by the much more pronounced long-range features in TIP4P compared to the very flat curve observed for TIP3P.
Note, however, that two artifacts will be introduced that become visible when plotting the pair correlation functions out to larger distances. The first will show up mostly in gOO(r) and is a cutoff artifact generated by the reaction-field method (i.e., it is physically present). You will note very subtle undulations precisely near the 12.0Å reaction-field cutoff used in this tutorial. The second are purely numerical artifacts due to the normalization. As discussed elsewhere, the analytical volume element assumes that all sites are completely independent of one another. However, for polyatomic molecules, image shift vectors are determined not on a per-atom basis, but on a per-residue basis. This leads to artifacts at distances corresponding to geometrical characteristics of the simulation cell (for a cubic box, at half the side length and at half the lengths of the two possible diagonals, which includes the distance of maximal separation). These latter artifacts could be partially eliminated by the use of a numerical prior specific to each pair. They show up here for gOH(r) and gHH(r) because the hydrogen is about 1.0Å away from the site used for computing the image shift vectors.