Tutorial 5: Molecular Dynamics and Water Models - Pair Correlation Functions in TIP3P and TIP4P Water


This example simulates one of the most common test systems imaginable, viz., a box of water in the canonical (NVT) ensemble. We will test two different water models: the ubiquitous three-site TIP3P model and the slightly less common four-site TIP4P model both developed by the Jorgensen group (references).
Water is a dense (about 1g·cm-3), polar liquid characterized by high cohesion, i.e., very strong intermolecular forces through hydrogen bonds. It has many unique properties experimentally that are not touched upon here. So far, all the tutorials used Monte Carlo simulation techniques. While in the early days of simulation work it was rather common to simulate dense simple liquids using Monte Carlo, it is important to understand that fundamentally this is close to a worst-case scenario in terms of applications for MC due to the complete coupling of all degrees of freedom in the system. For more details on this, consult this recent reference. Conversely, correlation between degrees of freedom is most naturally introduced by gradient-based methods, such as molecular dynamics. Therefore, we will use this tutorial as an introduction to the use of dynamics methods in CAMPARI. Given the simplicity of the system, you should feel free to run this as a Monte Carlo simulation as well.
The structure of liquids is most commonly analyzed by site-site pair correlation functions. Here, we will introduce you to CAMPARI's analysis facility and show how to generate data that is directly comparable to published reference data for these water models.


Step 1 - Set size and determine number of water molecules to be used

First, create two empty directories (one for each water model). Let us assume you start with TIP4P. Here, we choose a 32.0x32.0x32.0Å periodic, cubic box:

FMCSC_SIZE 32.0 32.0 32.0
FMCSC_SEQFILE <full path to working directory for TIP4P>/T4P.seq

You should be able to compute the number of water molecules yourself with the knowledge that the in silico density at room temperature is extremely close to 1g·cm-3 for both TIP3P and TIP4P. However, remember that in case the density is only approximately known, it is always safer to i) use overall larger boxes, and ii) err in the setup of an NVT simulation on the lower density side (take a few molecules less). This reduces the impact of pressure artifacts. Now create two sequence files in the respective directories with the corresponding number of entries for either "T3P" or "T4P" and both terminated by "END". Call them T3P.seq and T4P.seq, respectively.

Step 2 - Define interaction model and cutoff scheme

The original TIP3P and TIP4P models are available - for instance - through the OPLS-AA/L parameter file. Therefore, we can copy and paste straight from the documentation:

PARAMETERS <full path to CAMPARI directory>/params/oplsaal.prm

Note that some keywords were dropped that are not directly applicable to the present example. One tricky aspect about the use of periodic boundary conditions is that the interaction model has to be truncated at a specific distance (less than half the shortest box length, i.e. 16Å in our case). This is commonly achieved by a cutoff scheme. The reason lies in the typically used minimum-image convention, i.e. the convention that each particle interacts only with a single copy of another particle present in the actual simulation cell, but that its copy is chosen as the nearest image. If the interaction model were truncated at a longer distance, it would become necessary to compute interactions with multiple images so as to avoid completely artificial cutoff asymmetries from occurring:

FMCSC_N2LOOP 0 # avoid evaluations of full N^2 sums for energies

The above instructs CAMPARI to use grid-based cutoffs. During dynamics, all non-bonded interactions will be truncated at 12.0Å with electrostatic contributions being "switched" to be exactly zero at that distance via the reaction-field method. The implied dielectric of 78.2 (default) is not enough to avoid force discontinuities at the cutoff distance, however. There is no twin-range (distance regime for interactions to be computed less frequently) because this technique is not compatible with the reaction-field method. Grid- or cell-based methods underlie all common and efficient ways of maintaining neighbor lists during molecular simulations in dense fluids, and these are what makes cutoffs work efficiently. One of the results examined in this tutorial depends on the choice for the overall cutoff.

Step 3 - Set OpenMP options

While small, the system we use can benefit from CAMPARI's OpenMP-based threads parallelization a lot. You may have used the threads parallelization in earlier tutorials for MC simulations. For gradient-based calculations, dynamic load balancing (DLB) becomes important as it applies to most aspects of the major CPU time consumer, viz., the force calculation.

FMCSC_NRTHREADS 4 # if you can allow it, set it to the number of physical or even logical cores in your system (hyperthreading can be beneficial)
FMCSC_THREADS_VERBOSE 1 # level of diagnostic output
FMCSC_THREADS_DLB_FREQ 500 # restarts DLB segments every X steps
FMCSC_THREADS_DLB_STOP 300 # stops DLB for each segment after X steps
FMCSC_THREADS_DLB_EXT 1 # accumulate load data over X steps for DLB adjustments

As mentioned in the comment above, you should set the number of threads to a value appropriate for your system. Modern many-core architectures can provide tens of cores in a single shared-memory environment. The settings for dynamic load balancing are chosen such that it operates pretty much continuously. This is because the loads experienced by individual threads often vary through factors that can be controlled poorly, e.g., cache use, hardware mapping and scheduling, or the presence of competing processes.

Step 4 - Prepare starting structure

The numerical integration of equations of motion relies on the use of forces that can be extremely large for conformations with steric overlap. In MC simulations, we do not use forces, and can start the algorithm from arbitrary starting conformations. In dynamics, such a protocol would lead to "explosion" of the simulation (integration becomes unstable until eventually a critical numerical overflow error crashes the program). Therefore, we will first run MC for a while to obtain a pdb file with a sufficiently equilibrated water box:

FMCSC_RANDOMTHRESH 3.0 # the threshold is a mean residue-residue value
FMCSC_POLOUT 100000000
FMCSC_ACCOUT 100000000

You should be familiar with the move set instructions and general simulation specifications by now. We also started adding some keywords that will prevent output not interesting to us. After pasting all the above keywords into a file called "water.key", execute:

campari_threads -k water.key >& log

This will only take a few minutes. After the run is complete, have a look at the final structure snapshot written out by CAMPARI as "TIP4P_END.pdb" (or "TIP3P_END.pdb", respectively). It should look like a box of water with homogeneous density and no discernible steric overlaps between molecules. Rename this file to "TIP4P_MC.pdb" or "TIP3P_MC.pdb", respectively. It is important that energetically this file is relaxed enough to be used as an input for a gradient-based simulation. If not (consult log-output), repeat or extend the above MC calculation.

Step 5 - Run the molecular dynamics simulation

In molecular dynamics, we have to consider additional implementation issues. In order for the integrated equations of motion to yield an ensemble other than the microcanonical one (NVE), we need to introduce a coupling device to an external bath. These coupling devices and "baths" are almost always utterly conceptual and may appear confusing at first. Append the existing "water.key" as follows:


Here, we specify explicitly the ensemble to be the canonical one, request Newtonian (rather than stochastic) dynamics, set a time step of 4fs, and instruct CAMPARI to couple the system to an external bath by means of a stochastic velocity rescaling thermostat with a coupling time of 1ps. Note that it is, by default, implied that the dynamics are performed in rigid-body space (→ FMCSC_CARTINT). This is important because the use of rigid body degrees of freedom as opposed to Cartesian degrees of freedom means that we do not have to deal with keeping the water molecules rigid through other means. The same tutorial could of course be run in Cartesian space by using appropriate constraints and analytical or iterative solvers (see FMCSC_SHAKEMETHOD for an overview). The use of constraints is of critical importance since these water models were parameterized as rigid models using primarily MC simulations in the early phases of development. Details of the rigid-body integrator are provided elsewhere and a complete explanation is given in the reference publication that uses liquid TIP4P as one of its validation examples. To improve stability for the challenging case of rigid rotation of water (very small inertia), set the following:


With these settings, the time step could be pushed up to about 5fs for this system without major losses in accuracy. For analysis, we will focus here on static (rather than dynamic) properties alone, in particular the mean solution structure.
To proceed, we instruct CAMPARI to use the MC-generated structure as input, and to perform only the necessary analyses on-the-fly:

FMCSC_PCCODEFILE <full path to CAMPARI directory>/examples/tutorial5/tutorial5.t4p
FMCSC_FLUSHTIME 1.0 # in minutes

FMCSC_DISABLE_ANALYSIS is a meta-keyword that is a shortcut for explicitly disabling all running output and/or all on-the-fly analyses (here: both). Because of CAMPARI's logic in keeping only the last value for each keyword, the subsequent use of specific analysis flags (e.g., FMCSC_PCCALC) overrides this choice specifically for the analysis in question. Thus, the above settings mean that only ensemble output (FMCSC_ENSOUT) and pair correlation analyses (FMCSC_PCCALC) are performed. The latter use an input file that has different options for formatting. Here, we take advantage of the matrix input format (that is perfect for this type of application) to instruct CAMPARI to collect statistics on three possible intermolecular site-site correlation functions (O-O, O-H, H-H). Make sure to change the path for FMCSC_PCCODEFILE; a missing input file will simply disable the analysis.
Lastly, we of course have to adjust the simulation length parameters, so change the following keywords that are already present in "water.key":


We will simulate for a total of 5ns with the first 1ns being discarded as equilibration (i.e., not entering for instance the pair correlation analysis). Now type:

campari_threads -k water.key >& log2 &

The runtime will depend on the number of threads available (check output file THREADS.log for this). If you want to keep it short, lower both parameters (FMCSC_NRSTEPS and FMCSC_EQUIL) by two orders of magnitude. This should still give sufficient data to obtain pair correlation functions (but not most other properties) at sufficient quality.

Step 6 - Analyze the results

After the simulation has successfully ended, first plot the total system potential energy (column 2 in ENSEMBLE.dat) as a function of step number (column 1 in ENSEMBLE.dat). The energy should be stable (reach a plateau) and, after this, there should be symmetric fluctuations around the mean, i.e., there should be no extreme spikes, no sudden drops, etc. Any such behavior would be indicative of a problem with the simulation itself (bad settings). In general, an energy trace is of course a poor way of analyzing a simulation, but for this homogeneous system it serves well as a first sanity check. An additional such check would be to plot the instantaneous system temperature by plotting column 5 of ENSEMBLE.dat. System temperature will almost always be a direct or indirect reporter on erroneous simulations (you can also find the average system temperature over the production portion of the run in the log-file ["log2" above]).
From the potential energy plot, extract the average system potential energy and divide it by the number of molecules you specified in the sequence file. The resultant value, the mean potential energy per particle, is a typical hallmark quantity for water models and should under ambient conditions be close to -10kcal/mol. Note, however, that as hinted at above the precise value will be dependent on the cutoff settings and the cutoff implementation and that values listed in the literature are typically obtained with a particular scheme that is most likely not identical to the one employed here. In addition, this is an NVT simulation, for which the actual density might deviate slightly from the "correct" density for the water model in question. Still, drastic deviations will indicate a fundamental flaw in the program or parameters.
To analyze solution structure, plot the obtained site-site correlation functions (O-O, O-H, and H-H) and compare them to published values (see references). They can be found in the output file GENERAL_PC.dat as columns 2-4 (with the first column indicating distance). Note that these pair correlation functions (gOO(r) and so on) have been obtained by normalizing raw distance distributions by an analytical prior (maximum entropy distribution). This so-called volume element can normally be found in RBC_PC.dat, and is non-trivial for distances exceeding half the smallest box length. Here, we did not obtain these data since the output to RBC_PC.dat was completely suppressed given that all water molecules were regarded as solvent from the default assignment of the analysis group parsing. This could easily be remedied of course.
When comparing to the literature, all prominent short-range features should match the published data accurately (since these are fairly robust). The most variation can be expected for the peak heights of sharp peaks (due to variations in the binning). After you have run both simulation by making the appropriate modifications above, TIP3P and TIP4P must be easily distinguishable in gOO(r) by the much more pronounced long-range features in TIP4P compared to the very flat curve observed for TIP3P.
Note, however, that two artifacts will be introduced that become visible when plotting the pair correlation functions out to larger distances. The first will show up mostly in gOO(r) and is a cutoff artifact generated by the reaction-field method (i.e., it is physically present). You will note very subtle undulations precisely near the 12.0Å RF cutoff used in this tutorial. The second are purely numerical artifacts due to the normalization. As discussed elsewhere, the analytical volume element assumes that all sites are completely independent of one another. However, for polyatomic molecules, image shift vectors are determined not on a per-atom basis, but on a per-residue basis. This leads to artifacts at distances corresponding to geometrical characteristics of the simulation cell (for a cubic box, at half the side length and at half the lengths of the two possible diagonals, which includes the distance of maximal separation). These latter artifacts could be partially eliminated by the use of a numerical prior specific to each pair. They show up here for gOH(r) and gHH(r) because the hydrogen is about 1.0Å away from the site used for computing the image shift vectors.

Design downloaded from free website templates.