Tutorial 12: Simulations of Systems Containing Entities Not Supported Natively by CAMPARI
Preface:
Tutorial 10 has already illustrated a variety of concepts useful in dealing with entities (residues) that are not supported natively by CAMPARI. While it is going to be tedious simulating completely new polymer types or complex assemblies such as lipid bilayers using the ideas presented here, simulations of proteins containing nonstandard amino acids or nucleotides and exotic small molecules are extremely common and can be accomplished without much difficulty. As with any simulation of nonstandard systems in any software package, parameterization is a concern. CAMPARI's small molecule screening facility provides a partially automatic way of dealing with this for the specific case of a single unsupported small molecule placed at the end of the sequence input. In the general case, the first step should be to figure out what parameters are going to be needed and how to obtain them. The latter is not part of this tutorial as it is a task that is largely independent. We will merely demonstrate how to incorporate parameters into a CAMPARI simulation using the extensive patch facility. Before going into the details of the system, it is useful to raise awareness of the fact that some parameters are or can be guessed by CAMPARI. Specifically, these are "simple" bonded parameters (angles and bond lengths) that can be set with flat force constants based on the initial geometry (see FMCSC_GUESS_BONDED), and automatic guesses for the Lennard-Jones atom type (based on matching valency, symbol, etc), and certain types of dihedral angle potentials (see option 3 for FMCSC_GUESS_BONDED).It is also crucial to know whether simulations are meant to happen in internal coordinate space and/or Cartesian space. Both are fraught with different types of difficulties. In Cartesian space, all atoms are theoretically mobile requiring extensive incorporation of bonded parameters (via FMCSC_BPATCHFILE) and/or the setting up of custom constraint sets. Since this work flow is typical for the majority of simulation software packages, it may be worthwhile to think about creating a script to translate parameters generated by or assigned within a different package. In addition to all other parameters, additional difficulties can be associated with 14-fudge factors and similar exceptions meant to correct errors with nonbonded interactions applied to atoms that are topologically close. In general, the standard models for the general interaction model and short-range electrostatic interactions will behave most straightforwardly. The other interaction models need to know specifics of the system, e.g., regarding local rigidity of ring systems. CAMPARI tries to infer as many of these properties as it can. The goal is to pass a stringent test, i.e., that results for a run starting from a pdb input file containing supported residues are not altered upon renaming residues such that CAMPARI treats them as unsupported ones. Ideally, this should hold for a wide range of simulation settings. In reality, it cannot be expected to work perfectly under all instances. Consider for example the unique GROMOS exclusion rules for nucleotide bases. There is basically no way to transfer the underlying "concept" automatically to analogous systems, since these adjustments are more or less ad-hoc.
A treatment in rigid-body / torsional space may obviate the need for a number of bonded parameters (for rigid ring systems). Here, the main additional complication is that one has to rely on a correct (or meaningful) identification of the degrees of freedom. Because the Z-matrix is inferred from the input structure, reordering atoms in the template may sometimes be the only way to achieve the desired result (this is one area where the small molecule screening facility is substantially more general and convenient to use). The situation is worse if the desired degrees of freedom are not generally suitable for simple sampling algorithms in torsional space, i.e., if there are flexible rings. This, however, is not a problem specific to the scope of this tutorial (e.g., proline and ribose rings are currently frozen during internal coordinate space dynamics runs). By fiddling with atom names in the template file, it is possible to direct CAMPARI to assign appropriate atom types but in general it will be safer to set these explicitly with an atom type patch or a biotype patch.
The tutorial will proceed through a series of steps that would be typical for setting up a simulation with unsupported residues. First, we relax the input geometry, which requires appropriate bonded parameters. Next, we test whether the system can be sampled successfully with CAMPARI's Monte Carlo engine. With this in place, we implement a crude parameterization for simulations with a physically realistic model (ABSINTH implicit solvation model). Finally, we perform a short production simulation.
Step-by-Step:
Step 1 - Basic settings and geometry relaxation
The specific system we wish to simulate is defined in tutorial12.pdb. It consists of a small molecule not natively supported by CAMPARI (2-hydroxyethyl ammonium ion), a short peptide with a single residue of unsupported type ((5S)-5-hydroxylysine), and ADP represented as AMP (RPA) with an unsupported 5' phosphate cap. This parsing into residues is one possible choice, and it would also have been possible to define ADP as a single residue. If there are components of the system natively supported by CAMPARI, however, there usually is little advantage in masking this. The sequence file naturally needs to reflect the choice we made. Note that the small molecule (EAM) requires explicit termini settings (unlike the built-in small molecules, i.e., it must read "EAM_N_C"). If you visualize tutorial12.pdb, it is possible to see that the geometry is not very accurate in parts. Note that this is not necessarily a safe starting point, i.e., if the system is meant to contain rings or is in a conformation creating relatively short nonbonded distances, then the topology may be inferred incorrectly. This is why it is always recommended to check the Z-matrix file generated initially.Our first objective is to perform a short relaxation run (minimization) in Cartesian space. For this, we do not want to use any constraints, so correct bonded potentials are required for all system components. Given a suitable parameter file, the supported residues are covered, but the unsupported entities including their potential linkages to supported entities are not. While possible, the manual patching of bonded parameters with the help of the corresponding patch file becomes tedious even for systems as tiny as this one. Unless scripts are in place to handle this semiautomatically, it is usually easier to take advantage of the fact that the parameter file itself may already provide bonded parameters for various types of chemical linkages. If so, then a biotype patch will be much more efficient. Biotype patches are the highest-level patches CAMPARI offers, and change all atomic parameters assigned via the parameter file (atom type for Lennard-Jones interactions, partial charge, bonded interactions, mass, etc).
To provide a more thorough overview of various keywords involved, we write the key-file from scratch. Here, we take advantage of many default settings, so if you are missing something you think should have been specified, have a look at the summary portions of the simulation outputs and/or consult the documentation of the keyword in question.
As usual, create an empty directory and copy the files in the example directory for Tutorial 12 over. In addition, make a copy of parameter file abs4.2_charmm36.prm in the local directory (it is going to be modified → link to the actual file). We then start as follows:
PARAMETERS abs4.2_charmm36.prm
FMCSC_SC_IPP 0.0
FMCSC_SIZE 40. 40. 40.
FMCSC_ORIGIN 0. 0. 0.
FMCSC_SC_BONDED_B 1.0
FMCSC_SC_BONDED_A 1.0
FMCSC_SC_BONDED_I 1.0
FMCSC_SC_BONDED_T 1.0
FMCSC_PDBFILE tutorial12.pdb
FMCSC_SEQFILE tutorial12.in
FMCSC_CARTINT 2
FMCSC_DYNAMICS 6
FMCSC_MINI_GRMS 0.0001
FMCSC_MINI_MODE 3
FMCSC_NRSTEPS 10000
FMCSC_EQUIL 10000000
FMCSC_PDB_OUTPUTSTRING "a6,i5,1x,a4,a1,a3,1x,a1,i4,a1,3x,3f18.10"
FMCSC_UNSAFE 1 # frequently required to relegate errors to warnings when working with unsupporteds
This defines our system to simulate, selects an appropriate box size, and creates a Hamiltonian encompassing only bonded terms. It asks for a short minimization run in Cartesian space (in CAMPARI, it is not possible to relax or sample covalent geometries using Monte Carlo). You should feel free to read up on the minimization options in case they are not clear. If we tried to run this simulation, it would most likely run through. However, the presence of covalent bonds (inferred from initial topology) in conjunction with the absence of any potentials acting on them would lead to the unsupported atoms drifting away, which is obviously not what was meant by "relax the geometry." The continuous evaluation of these terms may also lead to numerical instabilities (bond angles near 180°). Warnings would be printed to log output indicating the missing support for the atoms in unsupported residues and/or valence errors (the latter can always be skipped by using FMCSC_UNSAFE). This is because, by default, CAMPARI will create a new biotype for every unique atom type occurring in unsupported residues. These types only have the most basic support, i.e., one of the existing atom (Lennard Jones) types in the parameter file has been assigned to every unsupported atom by CAMPARI. Since our system does not contain chemical moieties or linkages that are fundamentally different from what is found in supported residues, we will go the route of reassigning biotypes to fix the majority of these issues.
Inspect the corresponding patch file. It contains a simple list of assignment operations. Since the order of biotypes is the same for all CAMPARI parameter files, these types of patches are also more portable (some dummy atom types in CHARMM-based parameter files appended at the end may require a constant shift when switching between CHARMM-based or other parameter sets). From the file you can see, for example, that the majority of biotype assignments for the modified lysine residue correspond to the standard lysine biotypes, which is obviously the best starting point. The three largest biotype values in the file should not be supported by the parameter file (but it may already have been modified → adjustments needed). You should append the parameter file directly with:
biotype 1281 CD "Hydroxylysine" 2 0 15
biotype 1282 OE2 "Hydroxylysine" 6 0 45
biotype 1283 HE2 "Hydroxylysine" 4 0 1
Index numbers may have to be adjusted. Note that we are only setting LJ atom types and bonded types. It would also be possible to repurpose existing biotypes for this task, but new biotypes are the cleaner solution. Next, append the key-file with:
FMCSC_BIOTYPEPATCHFILE tutorial12.btpi
Now try running the simulation:
campari -k tutorial12.key > try.log
The minimization should finish normally. If you inspect "try.log", you should first of all notice the reporting on the matching of unsupported residues in the input pdb file. CAMPARI expects, for every unsupported residue in the sequence, a residue with that name (offset from neighboring residues by differences in residue number) surrounded by the same nearest-neighbor sequence context if it is a polymer residue (and this is the range of atoms it reports). This means that, for the example here, a pdb file containing "EAM" followed by the stretch "GLY-HLY-SER", and finally "ADC-RPA" would be sufficient. The difference would of course be that the remaining (which here would only be ACE and NME) parts are constructed randomly (we will return to this issue below). Further inspection of "try.log" reveals some information about inferences made regarding the unsupported residues, a report of the applied biotype patch, and, finally, warnings about missing bond angle terms. The relevant terms are listed in detail. In this case, they all come from the presence of the tertiary carbon atom at the alcohol site (Cδ). Ignoring them may lead to an unacceptable geometry that is also conceptually unusable as input for future runs. We have two options of dealing with this problem. First, we can explicitly patch the missing terms (via FMCSC_BPATCHFILE). Second, we can instruct CAMPARI to derive default terms from the input geometry (via FMCSC_GUESS_BONDED). The latter is not generally a useful option in Cartesian space sampling, because input geometries read from pdb can be inaccurate to begin with (this may be more tolerable if these bond lengths and angles are frozen anyway). Here, we decide to patch 3 of the 4 terms with the supplied patch file. These are the ones for which the parent force field (CHARMM) does not appear to make distinctions between CH, CH2, or CH3 moieties. For the sake of simplicity in this tutorial, the last term will be inferred from the input geometry. Append the key-file with:
FMCSC_BPATCHFILE tutorial12.bpi
FMCSC_GUESS_BONDED 1 # only guess missing essential terms
Try running the simulation with these additions. The output we are interested in is the final pdb file of the minimization, which will be printed in an extended format to avoid errors due to pdb precision.:
campari -k tutorial12.key >&minim.log
This should finish in less than a second. Store the final structure for use in subsequent steps:
mv yourcalc_END.pdb tutorial12_minimized.pdb
The file "tutorial12_minimized.pdb" cannot be visualized correctly by standard molecular viewers due to the extended precision. For visualization, you could either use a script to convert it back to standard precision, or you could simply run the calculation again after commenting the line, which we will have to do in any case to proceed below:
# FMCSC_PDB_OUTPUTSTRING "a6,i5,1x,a4,a1,a3,1x,a1,i4,a1,3x,3f18.10"
You should visually compare "tutorial12.pdb" with the structure after minimization. Note that molecular visualization software may have problems inferring the correct bonds in the 5'-phosphate due to the unusual residue name (the "CONECT" records are not maintained in the CAMPARI-generated pdb-files). Since we intend to reuse this CAMPARI-generated pdb file, it should be properly formatted. There are a few nuisances specific to unsupported residues to watch out for, such as to avoid having two atoms with the same pdb name when using biotype patches (these patches supplant the original names with the biotype-derived ones). In our example here, everything should be appropriate to begin with.
As a note of caution, it should be pointed out that the parameter file contains only stiff bonded terms. Even so, however, it is possible that these parameters were in reality coparameterized with other (bonded or nonbonded) terms and/or that they represent a frustrated landscape, i.e., not all minimum positions of the harmonic terms are attainable at the same time. Both issues can in theory lead to undesired results for a protocol as used here. Lastly, the question as to what could have gone wrong at this stage is of relevance. The most likely scenario involves erroneous ring closures in the inferred Z-matrix. These will result from steric clashes, i.e., distances between two nonbonded atoms that are too short. The consequences of assuming rings where there are none can be dramatic. Specifically, the setup of nonbonded interactions can be compromised (leading to nonsensical energies reported), the identification of affected dihedral angle degrees of freedom will fail, and the ability to use appropriate bonded potentials will be limited. Users are advised to check the input geometry before suspecting a different type of error. Fortunately, most visualization software packages include structure editing/manipulation tools by now, which are capable of making the required adjustments to an input structure.
Step 2 - Setting up the system for Monte Carlo sampling
The degrees of freedom of the system in unsupported residues (in terms of dihedral angles) are determined automatically from the input structure (→ details). We will first enable only two types of moves for sampling this system: rigid-body moves and single dihedral angle pivot moves. Append the key-file as follows:FMCSC_RIGIDFREQ 0.2
FMCSC_NUCFREQ 0.0
FMCSC_CHIFREQ 0.0
FMCSC_OMEGAFREQ 0.0
FMCSC_OTHERFREQ 1.0
FMCSC_OTHERUNKFREQ 0.5
FMCSC_OTHERNATFREQ 0.8
This turns on the two move types mentioned with remaining parameters left at default values. Of course, we also need to make modifications to existing keywords:
FMCSC_SC_IPP 1.0
FMCSC_CARTINT 1
FMCSC_DYNAMICS 1
FMCSC_NRSTEPS 100000
FMCSC_EQUIL 10000
## FMCSC_PDBFILE tutorial12.pdb
This disables the original read-in from tutorial12.pdb, changes the sampler to MC in internal coordinate space, extends the run length, and enables excluded volume interactions and on-the-fly analyses. Of course, we still need to supply CAMPARI with the topology information about the unsupported residues. However, we want to use the minimized structure for this purpose, and we want the initial structure to be randomized:
FMCSC_RANDOMIZE 1
FMCSC_PDB_TEMPLATE tutorial12_minimized.pdb
FMCSC_PDB_INPUTSTRING "a6,a5,1x,a4,a1,a3,1x,a1,i4,a1,3x,3f18.10"
Here, we instruct CAMPARI to just extract a functional Z-matrix for all unsupported entities from the structural input file, and to then randomize a well-defined set of degrees of freedom (details). The switch from using FMCSC_PDBFILE to FMCSC_PDB_TEMPLATE is what enables this. This implies that the supported residues are built in their well-defined covalent geometries in the same way as if no structural input file was present. Since nonbonded interactions are now in use, we will select settings consistent with the final model we have in mind (the ABSINTH implicit solvent model). In addition, we want a cutoff treatment in place.
FMCSC_INTERMODEL 1
FMCSC_SIGRULE 1
FMCSC_EPSRULE 2
FMCSC_CUTOFFMODE 4
FMCSC_NBCUTOFF 12.
FMCSC_ELCUTOFF 12.
Lastly, we are not interested in all types of analyses, which means that as in other tutorials we are disabling them globally followed by a specific enabling of the ones of interest:
FMCSC_DISABLE_ANALYSIS 1
FMCSC_INTCALC 10
FMCSC_WHICHINT 0 1 1 1
FMCSC_ENOUT 100
FMCSC_TOROUT 100
In this first step, we want to assess how the sampler is working, and for this purpose internal coordinate statistics are generated, both instantaneously and in histogram form. Additionally, we monitor the evolution of individual terms to the potential energy. Execute:
campari -k tutorial12.key >&mctest.log
For the given output mode, the file with instantaneous torsion angles (FYC.dat) lists all those torsional degrees of freedom that are natively supported by CAMPARI. If you inspect the file, it becomes apparent that we do not yet sample the φ/ψ angles in the peptide (except for one ψ angle, which has a very technical reason not elaborated upon here) or the sugar pucker in ADP. None of these degrees of freedom are accessible to the current sampler, which means we have to extend/alter the move set:
FMCSC_NUCFREQ 0.2 # modify
FMCSC_SUGARFREQ 1.0 # add
FMCSC_SUGARRDFREQ 0.2 # add
FMCSC_OTHERFREQ 0.6 # modify
Sugar pucker moves are part of a different branch of the move type decision tree, which mandates that FMCSC_NUCFREQ is nonzero. The reduction in the value for FMCSC_OTHERFREQ achieves that the default move type is included (polypeptide backbone φ/ψ pivot moves). Execute again:
campari -k tutorial12.key >&mctest.log
Inspecting FYC.dat, it becomes apparent that all the desired degrees of freedom are now indeed sampled. Furthermore, instantaneous energy values in ENERGY.dat reveal that the system does not appear to suffer from any major sources of energetic frustration. This means that we are now ready to prepare the system for simulations with a realistic Hamiltonian.
Step 3 - Setting up the system for the ABSINTH implicit solvation model
The full Hamiltonian has two missing ingredients, viz., partial charges and the setup of solvation groups for the ABSINTH DMFI. For the former, inspect the corresponding patch file. The patch accomplishes that the atoms having been assigned new biotypes receive an explicit partial charge, and that the partial charges on the terminal phosphate are slightly modified to produce the correct net charge on both nucleotide residues.For the setup of the ABSINTH DMFI, we have to take a few more things into account. It appears reasonable to decompose the hydroxyammonium in both the small molecule and the peptide into two groups each (NH3 and OH), whereas the additional phosphate should constitute its own solvation group. Solvation groups across residue boundaries may be compromised by the presence of unsupported residues, and we will (re)assign them explicitly to be safe. The decomposition strategy of the ABSINTH model may not always be obvious, and it can be useful to first look at how it works for the supported residues. Append the key-file as follows:
FMCSC_SC_POLAR 1.0
FMCSC_SC_IMPSOLV 1.0
FMCSC_SC_ATTLJ 1.0
FMCSC_FOSREPORT 1
FMCSC_CPATCHFILE tutorial12.cpi
The report flag will ensure that we obtain a diagnostic output file named FOS_GROUPS.vmd. Execute:
campari -k tutorial12.key >&absinthmc.log
vmd -e FOS_GROUPS.vmd
The simulation will take longer than the excluded volume calculation above due to the much more complex energy function. In the visualization of solvation groups (for every residue, the different solvation groups are shown as transparent envelopes to the relevant atoms always using the same color sequence of the VMD default starting at "1" for "ColorID"), you will note that the 5'-phosphate and the small molecules have no groups set up. Conversely, the hydroxylysine residue is already set up with a standard backbone amide solvation group because it is identified as a proper polypeptide residue. If you now inspect the DMFI patch file, whose format is explained elsewhere, you can see that we set up a total of 6 additional solvation groups. The 5'-phosphate is actually realized as a sum of two contributions (OH vs. OPO-). The reference free energy of solvation for the group carrying the net negative charge is -85 kcal/mol, i.e., it is decreased (more negative) in the same way as is normal for charged moieties on polymers in these newer-generation ABSINTH parameters. Still, the value is unlikely to be appropriate given that the diphosphate has characteristics of a divalent anion (rather than that of two distinct monovalent, anionic functionalities) with typical free energies of solvation closer to -250kcal/mol (i.e., values do not scale linearly with ion valence). As mentioned in the preface, the process of parameterization is not part of this tutorial, and values are assigned by analogy and without any amount of careful thought. The patch also shows that the hydroxyammonium moiety is split identically in both cases. Note that we use uneven atom weights for the primary amines. These weights (third column) are generally not expected to have much influence because the solvation states of nearby atoms will be highly correlated (this is no longer true if solvation groups are very large or if the assumed solvation shell thickness is reduced dramatically). The standard ABSINTH convention is to always use equal weights for all atoms in a solvation group. It is important to realize that additivity of solvation groups holds at all levels in ABSINTH, i.e. it is possible to further subdivide a solvation group corresponding to a small molecule with a reference free energy that is known experimentally. As long as the net total for all constituent groups reproduces the reference number, the fully solvated and fully desolvated reference states continue to be well-defined (this is used, for example, for the adenine base). More details are provided elsewhere. If you visualize "yourcalc_END.pdb" you should see that the lack of DMFI has most likely caused the formally charged moieties to cluster (the hydroxyammonium, the diphosphate, and the hydroxylysine side chain).
The output in "absinthmc.log" also tells you details of the solvation model including the important piece of information that most parameters are set to match the published model by default (note that we have set them explicitly in all prior tutorials using the ABSINTH model). However, there is one crucial error and that is the setting for the short-range electrostatic interaction model. To fix this error and to enable the DMFI patch, append the key-file:
FMCSC_FOSPATCHFILE tutorial12.fos
FMCSC_ELECMODEL 2
FMCSC_INTERMODEL 1
It is highly recommended to specify explicit settings for FMCSC_ELECMODEL and FMCSC_INTERMODEL in every CAMPARI key-file you use for actual simulations. Now re-run:
campari -k tutorial12.key >&absinthmc.log
vmd -e FOS_GROUPS.vmd
The visualization of the solvation groups should reveal quickly that the patch works as intended. This information is also reported in detail in "absinthmc.log." The final structure serves as the starting point for the production run.
Step 4 - Performing and analyzing a brief production simulation
There are a few additional warnings in "absinthmc.log" that we need to take into account. The first relevant one pertains to the simplified computation of additional atom-level parameters for the solvation model (adjustable by a separate patch). We make no further adjustments here, but it is worthwhile to keep in mind that for this approach the input geometry should ideally reproduce a state of maximum solvent accessibility for each rigid fragment yielded by the solvation group decomposition. The second group of interesting warnings is related to non-spherical symmetries of the cutoff scheme in conjunction with a periodic box, and we will adjust this by switching to a droplet boundary condition. Change existing or add missing keywords as follows:FMCSC_SIZE 35. # modify
FMCSC_ORIGIN 20. 20. 20. # modify
FMCSC_NRSTEPS 6000000 # modify
FMCSC_EQUIL 1000000 # modify
FMCSC_SHAPE 2 # add
FMCSC_BOUNDARY 4 # add
FMCSC_SOFTWALL 0.1 # add
By the above, we also increase simulation length to a level that will allow making statements of statistical relevance for a system of this size. We do not use a structural input file meaning that the initial conformation will be random again. While it is possible to supply separate pdb files for inference of topology (FMCSC_PDB_TEMPLATE) and structural input (FMCSC_PDBFILE), these must use the same format (FMCSC_PDB_INPUTSTRING). When working with unsupported residues, it is important not to change the template between runs because some of the parameters may depend on the initial geometry (meaning that data might theoretically not be explicitly reproducible for runs with different initial conditions). Our system contains ionic groups, so a complete absence of counterions (except for the hydroxyammonium ion) providing screening could be problematic, and there is indeed a warning to this effect in "absinthmc.log". Insert the following into the local copy of "tutorial12.in" (just before the line saying "END"; the placement is important because the patch files uses explicit atom indices, which would have to reflect any shifts resulting from residue insertions earlier):
K+
K+
K+
K+
K+
K+
K+
K+
K+
K+
CL-
CL-
CL-
CL-
CL-
CL-
CL-
CL-
CL-
CL-
Because the ionic interactions can create a very rugged energy landscape, Monte Carlo algorithms alone may not be efficient at sampling this system. We therefore use a hybrid scheme, in which the focus of the Monte Carlo moves is on large-scale rigid body motion and on the sugar sampling. Change existing keywords:
FMCSC_DYNAMICS 5
FMCSC_NUCFREQ 1.0
And append the key-file:
FMCSC_TSTAT 2
FMCSC_TSTAT_TAU 5.
FMCSC_TMD_INTEGRATOR 1
FMCSC_TMD_INT2UP 4
FMCSC_TMD_UNKMODE 3
FMCSC_TIMESTEP 0.005
FMCSC_FUDGE_DYN 1 # help with stability
FMCSC_MCCUTMODE 2
FMCSC_N2LOOP 0
The above keywords define the sampling engine for the (Newtonian) dynamics in mixed rigid body and dihedral angle coordinate space. Specifically, we use the adapted Andersen thermostat (this and this paper), which is more appropriate for dilute systems with mixed degrees of freedom than the default choice. We specify a coupling time of 5ps and instruct CAMPARI to sample all dihedral angles including those in unsupported residues (the setting for FMCSC_TMD_UNKMODE). The keywords related to the TMD integrator are not discussed here except for mentioning that an integration time step of 5fs is made safe by keyword FMCSC_FUDGE_DYN in case there are torsional or other rotational degrees of freedom with miniscule inertia (like the rotation around the C-O bond in serine and hydroxylysine). In a hybrid calculation, particular care should be taken in making sure that the Hamiltonians (energy functions and truncation schemes) in both MC and MD are the same. This is what the (nondefault) specifications for keywords FMCSC_MCCUTMODE and FMCSC_N2LOOP accomplish (in addition to the settings for FMCSC_LREL_MD and FMCSC_LREL_MC, which match up by default). It is also important that the sampled degrees of freedom are the same, which is not achievable fully here (because of the TMD engine's inability to sample flexible rings, which you can also find a warning about in "absinthmc.log"). As a third equivalence, the thermodynamic ensembles should match, which is satisfiable only approximately in CAMPARI's hybrid scheme (because discretization errors are unavoidable in the dynamics engine).
A CAMPARI hybrid sampling protocol consists of running alternating stretches of MD and MC segments of controllable length. Because the initial structures may be poorly relaxed, the initial segment is always an MC one, which has a separate parameter:
FMCSC_CYCLE_MC_FIRST 50000
FMCSC_CYCLE_MC_MIN 500
FMCSC_CYCLE_MC_MAX 1000
FMCSC_CYCLE_DYN_MIN 5000
FMCSC_CYCLE_DYN_MAX 10000
FMCSC_RANDOMATTS 5000
FMCSC_RANDOMTHRESH 1.
The above settings yield, stochastically, 90% TMD and 10% MC sampling (in terms of numbers of elementary steps). In addition, we want to minimize the risk of early failures by choosing tighter randomization settings and relaxing initially for 50000 steps.
Our last concern is what data to collect from the production run. Ideally, one would save all information at every snapshot, but in reality storage requirements and the ease of data processing and analysis mean that we need to have at least a vague idea beforehand what we are going to look for. Here, the molecules are so small that we focus our analysis on contact formation between the various species. Change and or append the key-file:
FMCSC_TOROUT 10000000 # modify
FMCSC_ENOUT 500 # modify
FMCSC_CONTACTCALC 100 # add this and the rest
FMCSC_CLUSTERCALC 1
FMCSC_CONTACTOFF 0
FMCSC_CONTACTMIN 5.0
FMCSC_CONTACTCOM 0.5
FMCSC_FLUSHTIME 2.0
FMCSC_ANGRPFILE tutorial12.agp
FMCSC_SEQREPORT 1
FMCSC_MPIAVG 1
Note that FMCSC_CLUSTERCALC refers to the analysis of spatial clusters of molecules, and must not be confused with the much more complex functionality controlled by FMCSC_CCOLLECT (see Tutorial 11). We are only interested in the minimum distance criterion for a contact (threshold set by FMCSC_CONTACTMIN), which is why the other threshold is unreasonably small. The analysis group file is needed to include the small molecule and the ions in the contact analysis. We have also disabled or reduced some instantaneous output. This brings us to the production simulation. If you have compiled an MPI version, run:
mpirun -np [NP] campari_mpi -k tutorial12.key >&mpi.log
Here, [NP] should be a number of processes you can use comfortably on the architecture at hand (e.g., 3 for a desktop with a 4-core CPU). By adding the keyword FMCSC_MPIAVG, we have asked CAMPARI to perform a run consisting of multiple independent copies of the same simulation with automatic data collection from all replicas at the end (to increase statistics). With just the serial version, run instead:
campari -k tutorial12.key >&production.log
There is a tiny chance that this simulation will fail. An unstable simulation produces erroneous heat through integrator (meaning discretization) errors, which might be specific to certain types of degrees of freedom, and this heat is dissipated to all other (coupled) degrees of freedom, which eventually causes the simulation to crash. Note that the crash behavior is stochastic because our system is dilute (heterogeneous levels of energetic coupling) and the coordinate representation is nonuniform. Note that the simulation would be quite stable with a 5fs integration time step without using FMCSC_FUDGE_DYN. An alternative workaround would be to explicitly patch masses of specific atoms (set FMCSC_MPATCHFILE to "tutorial12.mpi"). This patch file increases hydrogen mass for the atoms in question to 3 Da but this is both more tedious and more invasive with the internal coordinate-space integrator (because of its assumption of a diagonal mass matrix, see reference). If you are in doubt about that impact either of these workarounds has on rotational and translational degrees freedom, you can use keyword FMCSC_TMDREPORT to answer this question.
Once completed successfully, we can turn to the analysis. You should first check the output written to the MPI log-files (N_000.log, N_001.log, and so on) or to "production.log" (if run in serial mode) after the actual simulation steps have finished. There is useful information regarding sampling and ensemble statistics that can help diagnose the basic soundness of our approach. For example, the average temperatures for individual degrees of freedom should be close to the target ensemble temperature of 300K (if all or some are not, this could indicate a simulation that was at least partially unstable). By not saving any coordinates, the types of analyses we can perform are of course restricted to the output we requested. The tutorial is accompanied by an R-file that helps visualizing the contact map. We expect that the ADP molecule will probably be associated with two monovalent cations for most of the time. Due to the additional associativity resulting from the presence of other groups, it is likely that the hydroxylysine residue in the peptide is one of these. Open R and execute:
source("tutorial12.R")
The resultant plot is likely to reveal that the peptide and ADP associate strongly. We use a minimum distance criterion of 5Å to identify contacts, and thresholds of this type can always lead to misinterpretation of data. For the specific example here, we have to keep the overall spatial proximity in mind, i.e., many contacts are indirect symptoms of other contacts and must not be misinterpreted as causative (and causal relationships will generally be impossible to discern from an equilibrium simulation at a single condition). The contact map may reveal sporadic contacts of potassium ions to ADP. You should also have a look at output file MOLCLUSTERS.dat. Here, each row corresponds to probabilities per molecule to be part of a solute cluster of the given size. This file should point out that ADP and the peptide are never in isolation. However, these data do also rely on the setting for FMCSC_CONTACTMIN.
Epilogue - Variations on the theme
There are a few things you could try as follow-ups to this tutorial. The first is to figure out how to make the simulation use resources more efficiently (balance of Monte Carlo move set, balance of Monte Carlo vs. dynamics). This should involve an assessment of the appropriateness of sampling on a small subset of degrees of freedom with Monte Carlo (output in INTHISTS_DI.dat may reveal unwanted initial state biases). The second is to change the simulation settings such that trajectory output is written. As noted above, simulation analysis requires some prior knowledge to inform choices, and the setting for FMCSC_CONTACTMIN is a good example here. It would certainly be useful to reanalyze an existing trajectory with different values for FMCSC_CONTACTMIN, and you should consider creating a separate key-file for this with the help of the key-file builder. As demonstrated in Tutorial 10, the requirements for analyzing data containing unsupported residues are much less stringent than the ones for simulating unsupported residues correctly. The third is to think about the parameterization, in particular the data in the solvation group patch file.The fourth and final suggestion is also the most instructive one. We suggest as an exercise to change the simulated sequence such that the peptide has an additional 5-10 residues (do not change the immediate neighbors of HLY, though) and that we use the hydroxyammonium ion as the counterion instead of potassium ("K+") above. CAMPARI will have no problem producing random structures for this setup from the template in "tutorial12_minimized.pdb". However, as mentioned above, the patch facilities do rely on absolute atom indices. Some of them (for example, the DMFI patch and the biotype patch) have type-generalized input modes so that the patches at least do not have to be repeated over and over again. For others, support through the parameter file could be added, and this is ultimately the preferable option (here, the use of charge patches and bonded potential patches could be replaced with such a solution for HLY and the ADP cap). Of course, the by far most convenient, long-term solution is to add unsupported residues being used frequently to the residue database at the code level (and there are some detailed hints available for how to do this).