Tutorial 3: An Advanced Monte Carlo Example - The Helix-Coil Transition in the FS-peptide and the Utilization of Replica Exchange

Preface:

In previous tutorials we have seen how to use Monte Carlo (MC) sampling within CAMPARI to study homopolypeptides under conditions such that the peptides were intrinsically disordered. Here, we extend this approach to a system that undergoes a well-defined order-to-disorder transition, viz. the α-helical FS-peptide, an alanine-rich peptide doted with three arginine residues. The control parameter to scan the transition will be temperature, as there are experimental data available (circular dichroism for instance) that are directly comparable.
Within this tutorial, you will be familiarized with two different new concepts. The first is a sampling tool, specifically the use of the replica-exchange (RE) technique. In RE, we set up - for the same topology/sequence - a set of simulations in parallel which differ in one or more system parameters. These individual simulations (replicas) exchange information while they are running, thereby effectively creating a multicanonical ensemble. Movement within this multicanonical approach is governed by trials that are accepted or rejected based on a Metropolis acceptance criterion very similar to what happens in a normal Metropolis Monte Carlo simulation. It is extremely common to do the following:
  1. The ways in which information exchange can occur are usually restricted to neighboring conditions, i.e., those replicas which are most similar to each other. This is done for several reasons that are not elaborated upon here.
  2. By far the most common exchange parameter is temperature.
  3. Exchanges (swaps) are attempted with some fixed and large enough interval.
The resultant parallel simulation needs multiple processors, but very limited communication bandwidth. If CAMPARI is compiled with MPI support, it offers wide support for RE calculations. The reader is referred to the documentation for details.
The second aspect is to use on-the-fly analysis of polypeptide secondary structure. This allows us to define measures of overall helicity as a function of the control parameter (temperature) that can be directly compared to experimental data. We will use the ABSINTH implicit solvation model to describe the effects of water. We have shown previously that the temperature-dependent helix-coil transition for this peptide is well-described by the model (reference). In that reference, we perform a rigorous assessment of sampling quality by testing for hysteresis when simulations are started from either the "folded" (fully helical) or "unfolded" (coil-like) states. This tutorial focuses on the latter, and we leave it as a possible extension to the user to introduce the appropriate modifications to test for hysteresis.

Step-by-Step:

Step 1 - Set up system

To have a more faithful mimicking of experimental conditions, we will simulate the FS-peptide in the presence of explicitly represented ions that neutralize the peptide's net charge and also constitute an electrolyte bath at a given concentration (here around 150mM). The corresponding sequence input file can be found in the example directory. For such a scenario, the droplet boundary condition is perfect:

FMCSC_SEQFILE <full path to CAMPARI directory>/examples/tutorial3/tutorial3.seq
FMCSC_BOUNDARY 4
FMCSC_SHAPE 2
FMCSC_SIZE 40.0
FMCSC_RANDOMIZE 1

The resultant concentration cannot be set explicitly, but rather emerges implicitly from the combination of volume and particle numbers. There is a little tool in the tools-directory ("box2conc.m") to help with this (if you have access to Matlab). We will also set an appropriate length for the simulations:

FMCSC_NRSTEPS 40000000
FMCSC_EQUIL 10000000

Note that it is the default to use Monte Carlo sampling in the NVT ensemble in rigid-body/torsional space when working with CAMPARI. We therefore - just like in the previous tutorials - do not need to specify this explicitly (check later tutorials for examples where sampling deviates from these defaults). Note as well that there are no a priori rules one can use to set an appropriate overall length and equilibration period for a given simulation. The above numbers can be called appropriate only in light of i) the complexity of analysis we wish to perform, and ii) prior experience with this very system under these very conditions. This caveat holds not only for most tutorials presented here but indeed for any unbiased simulation attempting to compute ensemble averages (and even for most biased sampling methods).

Step 2 - Set up interaction parameters

We want to use the published ABSINTH implicit solvent model. The first requirement is to turn on the necessary energy terms:

FMCSC_SC_ATTLJ 1.0 # enables dispersive interactions
FMCSC_SC_POLAR 1.0 # enables (partial) charge interactions
FMCSC_SC_IMPSOLV 1.0 # enables direct mean-field interactions and charge screening

The remainder of the settings that make the implicit solvent model exactly the published one can be taken straight from the documentation, although it is of course recommended that users approximately understand each of the following keywords:

PARAMETERS <full path to CAMPARI directory>/params/abs3.2_opls.prm
FMCSC_UAMODEL 0
FMCSC_INTERMODEL 1
FMCSC_ELECMODEL 2
FMCSC_MODE_14 1
FMCSC_FUDGE_EL_14 1.0
FMCSC_FUDGE_ST_14 1.0
FMCSC_EPSRULE 2
FMCSC_SIGRULE 1
FMCSC_SC_BONDED_B 0.0
FMCSC_SC_BONDED_A 0.0
FMCSC_SC_BONDED_T 1.0
FMCSC_SC_BONDED_I 0.0
FMCSC_SC_EXTRA 0.0
FMCSC_FOSMID 0.1
FMCSC_FOSTAU 0.25
FMCSC_SCRMID 0.9
FMCSC_SCRTAU 0.5
FMCSC_SAVPROBE 2.5
FMCSC_IMPDIEL 78.2
FMCSC_SCRMODEL 2

Additional keywords are required to specify the exact nature of the truncation scheme used for interactions (cutoffs):

FMCSC_CUTOFFMODE 4
FMCSC_NBCUTOFF 10.0
FMCSC_ELCUTOFF 14.0
FMCSC_LREL_MC 3

In an effort to speed up the simulations, we use a trick to simplify the treatment of long-range ionic interactions between the arginine sidechains and the explicit salt ions which constitute a background electrolyte solution (see elsewhere for details).

Step 3 - Pick an appropriate move set

The efficiency of any MC simulations relies on its move set. Fortunately, the FS-peptide is a relatively simple system ("smooth" energy landscape) and is well-sampled even when using a very simple move set as follows:

FMCSC_RIGIDFREQ 0.2
FMCSC_RIGIDRDFREQ 0.4
FMCSC_CHIFREQ 0.05
FMCSC_CHIRDFREQ 0.4
FMCSC_CHISTEPSZ 20.0
FMCSC_OMEGAFREQ 0.1
FMCSC_OMEGARDFREQ 0.1
FMCSC_PIVOTRDFREQ 0.2
FMCSC_PIVOTSTEPSZ 10.0

All other move set-relevant settings are implied as their defaults. We use infrequent sampling of sidechain dihedral angles, as there are only the three arginine sidechains to worry about. Unlike the polyglutamine example, here we need sampling of rigid-body degrees of freedom to be able to equilibrate the relative positions of the ions and the peptide amongst and to each other. We choose 20% of our moves as rigid-body moves, keeping in mind that the dilute ionic bath will equilibrate in almost barrier-less fashion. Sampling of the ω-angle is needed but covers only a small volume in phase space due to the tight rotational barriers around the C-N bond. The setting for FMCSC_OMEGARDFREQ is crucial to allow for cis/trans-isomerization of the ω-bond. The bulk of our sampling will be in φ/ψ-moves of the pivot variety since the flexible dihedral angles on the peptide backbone are what describes peptide conformation, and ultimately the helix-coil transition.

Step 4 - Instruct CAMPARI to collect only those data we need

We are specifically interested here to investigate the helix-coil transition, so we will disable most other analyses:

FMCSC_SEGCALC 50
FMCSC_BBSEGFILE <full path to CAMPARI directory>/data/bbseg2.dat
FMCSC_DSSPCALC 50
FMCSC_INSTDSSP 0
FMCSC_ANGCALC 50
FMCSC_CONTACTCALC 200000000
FMCSC_POLCALC 200000000
FMCSC_DIPCALC 200000000
FMCSC_DIFFRCALC 200000000
FMCSC_SAVCALC 200000000
FMCSC_COVCALC 200000000
FMCSC_PCCALC 200000000
FMCSC_HOLESCALC 200000000
FMCSC_INTCALC 200000000
FMCSC_TOROUT 200000000
FMCSC_POLOUT 200000000
FMCSC_ENSOUT 200000000
FMCSC_ENOUT 10000
FMCSC_ACCOUT 10000
FMCSC_XYZOUT 2000
FMCSC_XYZMODE 2
FMCSC_XYZPDB 3
FMCSC_RSTOUT 1000000

The above keywords instruct CAMPARI to collect the segment statistics we are using for the helix-coil-analysis every 50 elementary steps. In addition, Ramachandran data are also collected every 50 steps, and trajectory data are written out every 2000 steps. The latter will enable us to reanalyze the simulation later, in case the on-the-fly analyses are deemed insufficient. Lastly, more running output is produced to be able to easily track simulation progress and sanity.

Step 5 - Setup up replica-exchange Monte Carlo (REMC)

The last section to our key-file we need is the one defining that we wish to perform a REMC calculation. The details of such a run are mostly controlled through a dedicated input file. Here, we specify the values for the exchange parameters for each condition (such as temperature):

FMCSC_REFILE <full path to CAMPARI directory>/examples/tutorial3/tutorial3.rex
FMCSC_REPLICAS 16
FMCSC_REDIM 1
FMCSC_REMC 1
FMCSC_RESWAPS 15
FMCSC_RENBMODE 2
FMCSC_REFREQ 2000

This will implement a temperature REMC calculation attempting swaps between neighboring conditions every 2000 elementary MC Steps. There are 16 different conditions ranging in the only RE parameter (dimension), viz., temperature, from 250K to 420K (see input file). In each swap cycle, 15 pairs are randomly picked and their structures are attempted to be exchanged. It is perfectly possible for two such swaps to be exactly compensating at finite temperature meaning that the total impact of such a random scheme is less than that of the more typical scheme in which neighbor pairs are systematically suggested for exchange. You may wonder what temperatures outside of the liquid range of water mean. The answer is manifold, but here it suffices to say that no simulation force field, in particular not implicit solvent models are parameterized in temperature-dependent fashion. Simulation temperatures that deviate substantially from 300K are therefore almost always approximations and would have to be remapped to a more realistic range (unfortunately, such a quantitative remapping is not usually possible).
Note that conceptually speaking it is irrelevant whether the system parameter is exchanged between two simulations or whether the current structures are exchanged. In CAMPARI, RE calculations are organized such that each node always remains associated with a fixed condition (i.e., in the present example, the very first node always corresponds to the condition of 240K). This simplifies analysis since we can approximate the correctly re-weighted multicanonical simulation average at any temperature by the pseudo-canonical average obtained by taking data from just a single condition (here: temperature). The latter are what is automatically provided by CAMPARI (as if the replicas where independent). As an additional remark, it should be pointed out that trajectory data will therefore include structure "jumps" upon successful RE swaps.
Combine all the key-file sections mentioned thus far and create a file called "fspep.key" in an empty directory.

Step 6 - Run the simulation

In the execution directory, you should now be able to run:

mpirun -np 16 campari_mpi -k fspep.key >& mpilog

This assumes that you compiled CAMPARI with MPI-support (see installation instructions) and that you are using OpenMPI or a similar MPI implementation. Otherwise, the MPI syntax may be different. In certain cases, additional flags may have to be provided to OpenMPI as well, but those issues are not related to CAMPARI.

Step 7 - Analyze the results

The first recommended thing to do is to pick a temperature in the transition regime and visually inspect the simulation. You can for example use VMD and the automatic scripts CAMPARI provides for this purpose. You should be seeing that the FS-peptide adopts different α-helical states throughout the simulation and that the ion bath remains relatively inert. Typically, the C-terminal portion will have a higher likelihood of being in coil-like states than the N-terminal portion. The single rod-like helix is favored over bundle-like conformations with two shorter stretches of helix.
After this visual inspection, proceed to analyze the system quantitatively. If you have access to the free statistical package R, you can use the script provided in the examples directory after copying it to the working directory (holding all the output files):

doread <- TRUE
docalc <- TRUE
source("tutorial3.R")

This takes a while due to the employed MC search used to determine helix-coil parameters in the Lifson-Roig formalism. The latter is not explained further in this tutorial, and the reader is referred to the recent literature for an analysis of the difficulties associated with this formalism when applied to simulation data. If everything works, this will create plots of various metrics of the helicity as a function of temperature according to two independent structural determinants - backbone torsional residency in the α-basin (via BB_SEGMENTS.dat) and formation of i→i+4 hydrogen bonds (via DSSP.dat). You should observe a well-defined transition from helix-rich states to helix-poor states in the first three plots. Both data sets should agree relatively well, although DSSP-based data will show fewer helical segments than the torsional-based estimate. Differences arise because DSSP measures α-helical H-bonds directly rather than having to infer them from torsional statistics (it is possible both for DSSP-continuous segments to be interrupted according to torsional statistics, and vice versa). Moreover, DSSP cannot report segments shorter than a certain minimum length (of six consecutive residues involving two i→i+4 hydrogen bonds, the central four form the shortest possible α-helix segment according to DSSP). This is technically inconsistent with the Lifson-Roig model, and several corrections would have to be applied in order to make the data directly applicable. This is left to the user to explore/research if so desired. As a direct consequence, the helix nucleation and propagation parameters reported based on DSSP assignment are to be treated as qualitative/speculative in nature.
Feel free to inspect and modify the script and input files. The system is simple enough such that meaningful sampling can be achieved, and it is therefore possible to design meaningful derivatives of this tutorial. Note lastly that the settings in the script may not always be/remain 100% identical to the settings used in published work, so that results are not going to necessarily be exactly comparable (this can always be fixed, though, by simple modifications to the R-script).

Design downloaded from free website templates.