Tutorial 3: An Advanced Monte Carlo Example - The Helix-Coil Transition in the FS-peptide and the Utilization of Replica Exchange
Preface:
In previous tutorials we have seen how to use Monte Carlo (MC) sampling within CAMPARI to study homopolypeptides under conditions such that the peptides were intrinsically disordered. Here, we extend this approach to a system that undergoes a well-defined order-to-disorder transition, viz. the α-helical FS-peptide, an alanine-rich peptide doted with three arginine residues. The control parameter to scan the transition will be temperature, as there are experimental data available (circular dichroism for instance) that are directly comparable.Within this tutorial, you will be familiarized with two different new concepts. The first is a sampling tool, specifically the use of the replica-exchange (RE) technique. In RE, we set up - for the same topology/sequence - a set of simulations in parallel which differ in one or more system parameters. These individual simulations (replicas) exchange information while they are running, thereby effectively creating a multicanonical ensemble. Movement within this multicanonical approach is governed by trials that are accepted or rejected based on a Metropolis acceptance criterion very similar to what happens in a normal Metropolis Monte Carlo simulation. It is extremely common to do the following:
- The ways in which information exchange can occur are usually restricted to neighboring conditions, i.e., those replicas which are most similar to each other. This is done for several reasons that are not elaborated upon here.
- By far the most common exchange parameter is temperature.
- Exchanges (swaps) are attempted with some fixed and large enough interval.
The second aspect is to use on-the-fly analysis of polypeptide secondary structure. This allows us to define measures of overall helicity as a function of the control parameter (temperature) that can be directly compared to experimental data. We will use the ABSINTH implicit solvation model to describe the effects of water. We have shown previously that the temperature-dependent helix-coil transition for this peptide is well-described by the model (reference). In that reference, we perform a rigorous assessment of sampling quality by testing for hysteresis when simulations are started from either the "folded" (fully helical) or "unfolded" (coil-like) states. This tutorial focuses on the latter, and we leave it as a possible extension to the user to introduce the appropriate modifications to test for hysteresis.
Step-by-Step:
Step 1 - Set up systemTo have a more faithful mimicking of experimental conditions, we will simulate the FS-peptide in the presence of explicitly represented ions that neutralize the peptide's net charge and also constitute an electrolyte bath at a given concentration (here around 150mM). The corresponding sequence input file can be found in the example directory. For such a scenario, the droplet boundary condition is perfect:
FMCSC_SEQFILE <full path to CAMPARI directory>/examples/tutorial3/tutorial3.seq
FMCSC_BOUNDARY 4
FMCSC_SHAPE 2
FMCSC_SIZE 40.0
FMCSC_RANDOMIZE 1
The resultant concentration cannot be set explicitly, but rather emerges implicitly from the combination of volume and particle numbers. There is a little tool in the tools-directory ("box2conc.m") to help with this (if you have access to Matlab). We will also set an appropriate length for the simulations:
FMCSC_NRSTEPS 40000000
FMCSC_EQUIL 10000000
Note that it is the default to use Monte Carlo sampling in the NVT ensemble in rigid-body/torsional space when working with CAMPARI. We therefore - just like in the previous tutorials - do not need to specify this explicitly (check later tutorials for examples where sampling deviates from these defaults). Note as well that there are no a priori rules one can use to set an appropriate overall length and equilibration period for a given simulation. The above numbers can be called appropriate only in light of i) the complexity of analysis we wish to perform, and ii) prior experience with this very system under these very conditions. This caveat holds not only for most tutorials presented here but indeed for any unbiased simulation attempting to compute ensemble averages (and even for most biased sampling methods).
Step 2 - Set up interaction parameters
We want to use the published ABSINTH implicit solvent model with one important modification. The first requirement is to turn on the necessary energy terms (the excluded volume terms are turned on by default):
FMCSC_SC_ATTLJ 1.0 # enables dispersive interactions
FMCSC_SC_POLAR 1.0 # enables (partial) charge interactions
FMCSC_SC_IMPSOLV 1.0 # enables direct mean-field interactions and charge screening
The one change we want to introduce is to check how robust the helix-coil transition can be described when changing the parent force field. Most ABSINTH-based publications have used a specific parameter file, viz., "abs3.2_opls.prm". Here, we investigate what happens if we use CHARMM36 as the source for partial charges and (the few required) bonded parameters. The remainder of the settings, which match exactly the published ones, can be taken straight from the documentation (although it is of course recommended that users approximately understand each of the following keywords):
PARAMETERS <full path to CAMPARI directory>/params/abs3.2_charmm36.prm
FMCSC_UAMODEL 0
FMCSC_INTERMODEL 1 # avoid computing redundant interactions
FMCSC_ELECMODEL 2 # use the ABSINTH exclusion model for short-range electrostatics
FMCSC_MODE_14 1
FMCSC_FUDGE_EL_14 1.0
FMCSC_FUDGE_ST_14 1.0
FMCSC_EPSRULE 2
FMCSC_SIGRULE 1
FMCSC_SC_BONDED_B 0.0
FMCSC_SC_BONDED_A 0.0
FMCSC_SC_BONDED_T 1.0
FMCSC_SC_BONDED_I 0.0
FMCSC_SC_EXTRA 0.0
FMCSC_FOSMID 0.1
FMCSC_FOSTAU 0.25
FMCSC_SCRMID 0.9
FMCSC_SCRTAU 0.5
FMCSC_SAVPROBE 2.5
FMCSC_IMPDIEL 78.2
FMCSC_SCRMODEL 2
Many of the settings above, in particular the ABSINTH-related settings, are actually the default settings. Additional keywords are required to specify the exact nature of the truncation scheme used for interactions (cutoffs). This was not discussed in detail in Tutorial1 or Tutorial2 so we spend some more time on it here:
FMCSC_CUTOFFMODE 4 # topology-assisted cutoffs
FMCSC_MCCUTMODE 2
FMCSC_NBCUTOFF 10.0 # cutoff value in Angstrom for IPP/ATTLJ/DMFI
FMCSC_ELCUTOFF 14.0 # cutoff value in Angstrom for POLAR
FMCSC_LREL_MC 3
Choosing option 4 for FMCSC_CUTOFFMODE means that we use a simplified neighbor search that relies on molecular topology for efficiency. This is an appropriate choice for systems having hundreds of residues (but, unlike grid-based cutoffs, it does not fundamentally solve the scaling problem). Note that the two cutoff values differ, which in Monte Carlo simulations means that long-range terms (here Coulomb dipole-dipole and dipole-monopole terms) are truncated at a longer distance. Monopole-monopole terms are an exception. In general, the accounting for these terms at large separation is handled by keyword FMCSC_LREL_MC, and the chosen value implies that monopole-monopole terms are calculated in a point-to-point approximation at large distance (this is conceptually resemblant of fast multipole methods). The choice of 2 for FMCSC_MCCUTMODE means that we do not truncate the short-range potentials at the cutoff distance exactly but rather calculate all interactions given by the neighbor lists. In general, we do not recommend setting FMCSC_NBCUTOFF and FMCSC_ELCUTOFF to different values. One important reasons for avoiding too small values for FMCSC_NBCUTOFF is that in implicit solvent the dispersion term is often anisotropic due to the inhomogeneous density (whereas in a dense, mono-phasic system, it is typically isotropic, which makes it relatively safe to disregard for energy/force calculations). As a final note of caution, it should be pointed out that the truncation scheme above can (of course) give rise to artifacts on account of missing monopole-dipole terms. An example would be a block copolymer with both a polyelectrolytic and a polar (but neutral) stretch.
Step 3 - Pick an appropriate move set
The efficiency of any MC simulations relies on its move set. Fortunately, the FS-peptide is a relatively simple system ("smooth" energy landscape) and is well-sampled even when using a very simple move set as follows:
FMCSC_RIGIDFREQ 0.2
FMCSC_RIGIDRDFREQ 0.4
FMCSC_RIGIDRDBUF 1.1
FMCSC_CHIFREQ 0.05
FMCSC_CHIRDFREQ 0.4
FMCSC_CHISTEPSZ 20.0
FMCSC_OMEGAFREQ 0.1
FMCSC_OMEGARDFREQ 0.1
FMCSC_PIVOTRDFREQ 0.2
FMCSC_PIVOTSTEPSZ 10.0
All other move set-relevant settings are implied as their defaults. We use infrequent sampling of side chain dihedral angles, as there are only the three arginine side chains to worry about. Unlike the polyglutamine example, here we need sampling of rigid-body degrees of freedom to be able to equilibrate the relative positions of the ions and the peptide amongst and to each other. We choose 20% of our moves as rigid-body moves, keeping in mind that the dilute ionic bath will equilibrate in almost barrier-less fashion. The fully randomizing moves (40% of all rigid-body moves) are particularly effective at this. However, they do require an additional setting (keyword FMCSC_RIGIDRDBUF) to avoid minor artifacts at the boundary (the occupancy of ions at distances larger than 40Å from the origin would be slightly underestimated if this parameter were left at its default setting of 1.0). Sampling of the ω-angle is needed but covers only a small volume in phase space due to the tight rotational barriers around the C-N bond. The setting for FMCSC_OMEGARDFREQ is crucial to allow for cis/trans-isomerization of the ω-bond. The bulk of our sampling will be in φ/ψ-moves of the pivot variety since the flexible dihedral angles on the peptide backbone are what describes peptide conformation, and ultimately the helix-coil transition.
Step 4 - Instruct CAMPARI to collect only those data we need
We are specifically interested here to investigate the helix-coil transition, so we will disable most other analyses:
FMCSC_DISABLE_ANALYSIS 1
FMCSC_FLUSHTIME 2.0 # in minutes
FMCSC_SEGCALC 50
FMCSC_BBSEGFILE <full path to CAMPARI directory>/data/bbseg2.dat
FMCSC_DSSPCALC 50
FMCSC_INSTDSSP 0
FMCSC_ANGCALC 50
FMCSC_ENOUT 10000
FMCSC_XYZOUT 2000
FMCSC_XYZMODE 2
FMCSC_XYZPDB 3
FMCSC_RSTOUT 1000000
The above keywords instruct CAMPARI to collect the segment statistics we are using for the helix-coil-analysis every 50 elementary steps. These segments provide a torsion-based inference regarding α-helicity. As an alternative we also collect data for a hydrogen bond-based inference using DSSP. These two alternatives were explored in detail in a 2012 reference publication. In addition, Ramachandran data are also collected every 50 steps (FMCSC_ANGCALC), and trajectory data are written out every 2000 steps. The latter will enable us to reanalyze the simulation later in case the on-the-fly analyses are deemed insufficient. Lastly, running output for energies is produced to be able to easily track simulation progress and sanity.
Step 5 - Setup up replica-exchange Monte Carlo (REMC)
The last section to our key-file we need is the one defining that we wish to perform a REMC calculation. The details of such a run are mostly controlled through a dedicated input file. Here, we specify the values for the exchange parameters for each condition (such as temperature):
FMCSC_REFILE <full path to CAMPARI directory>/examples/tutorial3/tutorial3.rex
FMCSC_REPLICAS 16
FMCSC_REDIM 1 # the number of exchange dimensions
FMCSC_REMC 1 # enables the replica exchange method in an MPI-parallel simulation run
FMCSC_RESWAPS 15
FMCSC_RENBMODE 2 # only swap with neighboring replicas
FMCSC_REFREQ 2000
This will implement a temperature REMC calculation attempting swaps between neighboring conditions every 2000 elementary MC Steps. There are 16 different conditions ranging in the only RE parameter (dimension), viz., temperature, from 250K to 420K (see input file). In each swap cycle, 15 pairs are randomly picked and their structures are attempted to be exchanged. It is perfectly possible for two such swaps to be exactly compensating at finite temperature meaning that the total impact of such a random scheme is less than that of the more typical scheme in which neighbor pairs are systematically suggested for exchange. You may wonder what temperatures outside of the liquid range of water mean. The answer is manifold, but here it suffices to say that no simulation force field, in particular not implicit solvent models are parameterized in temperature-dependent fashion. Simulation temperatures that deviate substantially from 300K are therefore almost always approximations and would have to be remapped to a more realistic range (unfortunately, such a quantitative remapping is not usually possible).
Note that conceptually speaking it is irrelevant whether the system parameter is exchanged between two simulations or whether the current structures are exchanged. In CAMPARI, RE calculations are organized such that each node always remains associated with a fixed condition (i.e., in the present example, the very first node always corresponds to the condition of 240K). This simplifies analysis since we can approximate the correctly re-weighted multicanonical simulation average at any temperature by the pseudo-canonical average obtained by taking data from just a single condition (here: temperature). The latter are what is automatically provided by CAMPARI (as if the replicas where independent). As an additional remark, it should be pointed out that trajectory data will therefore include structure "jumps" upon successful RE swaps. They can be disentangled by a subsequent trajectory analysis run processing the printed exchange trace.
Combine all the key-file sections mentioned thus far and create a file called "fspep.key" in an empty directory.
Step 6 - Run the simulation
This run requires 16 logical cores to be available on your system(s) in order to run with normal efficiency. The deployment of MPI jobs onto cluster architectures is touched upon in Tutorial7 and Tutorial9, and here we just provide the most basic MPI instructions. Using the MPI code requires a working MPI installation and it requires CAMPARI to be compiled with MPI support (see installation instructions). Assuming OpenMPI syntax, in the execution directory, you should then be able to run:
mpirun -np 16 campari_mpi -k fspep.key >& mpilog
If you compiled with and linked against a different MPI library, the MPI syntax may be different as well. Note that this executable has no OpenMP support (OpenMP and OpenMPI are not at all related but unfortunately tend to look like typos of each other). In certain cases, additional flags may have to be provided to OpenMPI as well, but those issues are not related to CAMPARI. If you cannot spend this much CPU time, you can lower the values for FMCSC_NRSTEPS and FMCSC_EQUIL by an order of magnitude each, and you should still obtain interpretable (albeit noisier) results.
Step 7 - Analyze the results
The first recommended thing to do is to pick a temperature in the transition regime and visually inspect the simulation. You can for example use VMD and the automatic scripts CAMPARI provides for this purpose. You should be seeing that the FS-peptide adopts different α-helical states throughout the simulation and that the ion bath remains relatively inert. Typically, the C-terminal portion will have a higher likelihood of being in coil-like states than the N-terminal portion. The single rod-like helix is favored over bundle-like conformations with two shorter stretches of helix.
After this visual inspection, proceed to analyze the system quantitatively. If you have access to the free statistical package R, you can use the script provided in the examples directory after copying it to the working directory (holding all the output files):
doread <- TRUE
docalc <- TRUE
source("tutorial3.R")
This takes a while due to the employed MC search used to determine helix-coil parameters in the Lifson-Roig formalism. The latter is not explained further in this tutorial, and the reader is referred to the recent literature for an analysis of the difficulties associated with this formalism when applied to simulation data. If everything works, this will create plots of various metrics of the helicity as a function of temperature according to the two independent structural determinants we collected data for - backbone torsional residency in the α-basin (via BB_SEGMENTS_NORM.dat) and formation of i→i+4 hydrogen bonds (via DSSP_NORM.dat). You should observe a well-defined transition from helix-rich states to helix-poor states in the first three plots. Both data sets should agree relatively well, although DSSP-based data will show fewer helical segments than the torsional-based estimate. Differences arise because DSSP measures α-helical H-bonds directly rather than having to infer them from torsional statistics (it is possible both for DSSP-continuous segments to be interrupted according to torsional statistics, and vice versa). Moreover, DSSP cannot report segments shorter than a certain minimum length (of six consecutive residues involving two i→i+4 hydrogen bonds, the central four form the shortest possible α-helix segment according to DSSP). This is technically inconsistent with the Lifson-Roig model, and several corrections would have to be applied in order to make the data directly applicable. This is left to the user to explore/research if so desired. As a direct consequence, the helix nucleation and propagation parameters reported based on DSSP assignment are to be treated as qualitative/speculative in nature.
As the most important consideration, you should find that our choice of parameter file influences the helix-coil transition only weakly with respect to the data in the original ABSINTH publication. Feel free to inspect and modify the script and input files. The system is simple enough such that meaningful sampling can be achieved, and it is therefore possible to design meaningful derivatives of this tutorial, for example testing the influence of additional parameter files or changes to the parameters of the ABSINTH solvation model itself. Note lastly that any 1:1 comparison to prior data should make sure that all settings, for example those in the R-script, should be/remain 100% identical. Otherwise, results are not going to be exactly comparable.