Tutorial 12: Simulations of Systems Containing Entities Not Supported Natively by CAMPARI
Preface:Tutorial 10 has already illustrated a variety of concepts useful in dealing with entities (residues) that are not supported natively by CAMPARI. While it is going to be tedious simulating completely new polymer types or complex assemblies such as lipid bilayers using the ideas presented here, simulations of proteins containing nonstandard amino acids or nucleotides and exotic small molecules are extremely common and can be accomplished without much difficulty. As with any simulation of nonstandard systems in any software package, parameterization is a concern. Therefore, the first step should be to figure out what parameters are going to be needed and how to obtain them. The latter is not part of this tutorial as it is a task that is largely independent. We will merely demonstrate how to incorporate parameters into a CAMPARI simulation using the extensive patch facility. Before going into the details of the system, you should remind yourself that some parameters are or can be guessed by CAMPARI. Specifically, these are bonded parameters (angles and bond lengths) that can be set with flat force constants based on the initial geometry (see FMCSC_GUESS_BONDED) and automatic guesses for the Lennard-Jones atom type (based on matching valency, symbol, etc).
It is also crucial to know whether simulations are meant to happen in internal coordinate space and/or Cartesian space. Both are fraught with different types of difficulties. In Cartesian space, all atoms are theoretically mobile requiring extensive incorporation of bonded parameters (via FMCSC_BPATCHFILE) and/or the setting up of custom constraint sets. Since this work flow is typical for the majority of simulation software packages, it may be worthwhile to think about creating a script to translate parameters generated by or assigned within a different package. In addition to all other parameters, additional difficulties can be associated with 14-fudge factors and similar exceptions meant to correct errors with nonbonded interactions applied to atoms that are topologically close. In general, the standard models for the general interaction model and short-range electrostatic interactions will behave most straightforwardly. The other treatments need to know specifics of the system, e.g., regarding local rigidity of ring systems. CAMPARI tries to infer as many of these properties as possible. The goal is to pass a stringent test, i.e., that results for a run starting from a pdb input file containing supported residues are not altered upon renaming residues such that CAMPARI treats them as unsupported ones. Ideally, this should hold for a wide range of simulation settings. In reality, it cannot be expected to work perfectly under all instances. Consider for example the unique GROMOS exclusion rules for nucleotide bases. There is basically no way to transfer the underlying "concept" automatically to analogous systems, since these adjustments are more or less adhoc.
A treatment in rigid-body / torsional space may obviate the need for a number of bonded parameters (for rigid ring systems). Here, the main additional complication is that one has to rely on a correct (or meaningful) identification of the degrees of freedom. Because the Z-matrix is inferred from the input structure, reordering atoms in the template may sometimes be the only way to achieve the desired result. The situation is worse if the desired degrees of freedom are not generally suitable for simple sampling algorithms in torsional space, i.e., if there are flexible rings. This, however, is not a problem specific to the scope of this tutorial (e.g., proline and ribose rings are currently frozen during internal coordinate space dynamics runs). By fiddling with atom names in the template file, it is possible to direct CAMPARI to assign appropriate atom types, but in general it will be safer to set these explicitly with an atom type patch or a biotype patch.
The tutorial will proceed through a series of steps that would be typical for setting up a simulation with unsupported residues. First, we relax the input geometry, which requires appropriate bonded parameters. Next, we test whether the system can be sampled successfully with CAMPARI's Monte Carlo engine. With this in place, we implement a crude parameterization for simulations with a physically realistic model (ABSINTH implicit solvation model). Finally, we perform a short production simulation.
Step 1 - Basic settings and geometry relaxationThe specific system we wish to simulate is defined in tutorial12.pdb. It consists of a small molecule not natively supported by CAMPARI (2-hydroxyethyl ammonium ion), a short peptide with a single residue of unsupported type ((5S)-5-hydroxylysine), and ADP represented as AMP (RPA) with an unsupported 5' phosphate cap. This parsing into residues is one possible choice, and it would also have been possible to define ADP as a single residue. If there are components of the system natively supported by CAMPARI, however, there usually is little advantage in masking this. The sequence file naturally needs to reflect the choice we made. Note that the small molecule (EAM) requires explicit termini settings (unlike the built-in small molecules, i.e., it must read "EAM_N_C"). If you visualize tutorial12.pdb, it is possible to see that the geometry is not very accurate in parts. Note that this is not necessarily a safe starting point, i.e., if the system is meant to contain rings or is in a conformation creating relatively short nonbonded distances, then the topology may be inferred incorrectly. This is why it is always recommended to check the Z-matrix file generated initially.
Our first objective is to perform a short relaxation run (minimization) in Cartesian space. For this, we do not want to use any constraints, so correct bonded potentials are required for all system components. Given a suitable parameter file, the supported residues are covered, but the unsupported entities including their potential linkages to supported entities are not. While possible, the manual patching of bonded parameters with the help of the corresponding patch file becomes tedious even for systems as tiny as this one. Unless scripts are in place to handle this semiautomatically, it is usually easier to take advantage of the fact that the parameter file itself may already provide bonded parameters for various types of chemical linkages. If so, then a biotype patch will be much more efficient. Biotype patches are the highest-level patches CAMPARI offers, and change all atomic parameters assigned via the parameter file (atom type for Lennard-Jones interactions, partial charge, bonded interactions, mass, etc).
To provide a more thorough overview of various keywords involved, we write the key-file from scratch. Here, we take advantage of many default settings, so if you are missing something you think should have been specified, have a look at the summary portions of the simulation outputs and/or consult the documentation of the keyword in question.
As usual, create an empty directory and copy the files in the example directory for Tutorial 12 over. In addition, make a copy of parameter file abs3.2_charmm.prm in the local directory (it is going to be modified → link to the actual file). We then start as follows:
FMCSC_SIZE 40. 40. 40.
FMCSC_ORIGIN 0. 0. 0.
This defines our system to simulate, selects an appropriate box size, and creates a Hamiltonian encompassing only bonded terms. It asks for a short minimization run in Cartesian space (it is not possible to relax or sample covalent geometries using Monte Carlo). You should feel free to read up on the minimization options in case they are not clear. If we tried to run this simulation, it would most likely run through. However, the presence of bonded interactions (inferred from initial topology) in conjunction with the absence of any potentials citing on them would lead to the unsupported atoms drifting away, which is obviously not what was meant by "relax the geometry." The continuous evaluation of these terms may also lead to numerical instabilities (bond angles near 180°). Warnings would be printed to log output indicating the missing support for the atoms in unsupported residues and/or valence errors (the latter can always be skipped by using FMCSC_UNSAFE). This is because, by default, CAMPARI will create a new biotype for every unique atom type occurring in unsupported residues. These types only have the most basic support, i.e., one of the existing atom (Lennard Jones) types in the parameter file has been assigned to every unsupported atom by CAMPARI. Since our system does not contain chemical moieties or linkages that are fundamentally different from what is found in supported residues, we will go the route of reassigning biotypes to fix the majority of these issues.
Inspect the corresponding patch file. It contains a simple list of assignment operations. Since the order of biotypes is the same for all CAMPARI parameter files, these types of patches are also more portable (some dummy atom types in CHARMM-based parameter files appended at the end may require a constant shift when switching between CHARMM-based or other parameter sets). From the file you can see, for example, that the majority of biotype assignments for the modified lysine residue correspond to the standard lysine biotypes, which is obviously the best starting point. The three largest biotype values in the file should not be supported by the parameter file (but it may already have been modified → adjustments needed). You should append the parameter file directly with:
biotype 1098 CD "Hydroxylysine" 2 0 15
biotype 1099 OE2 "Hydroxylysine" 6 0 45
biotype 1100 HE2 "Hydroxylysine" 4 0 1
Index numbers may have to be adjusted. Note that we are only setting LJ atom types and bonded types. It would also be possible to repurpose existing biotypes for this task, but new biotypes are the cleaner solution. Next, append the key-file with:
Now try running the simulation:
campari -k tutorial12.key > try.log
The minimization appears to finish normally, but inspection of "try.log" reveals some warnings about missing bond angle terms. The relevant terms are listed in detail. In this case, they all come from the presence of the tertiary carbon atom at the alcohol site (Cδ). Ignoring them may lead to an unacceptable geometry that is also conceptually unusable as input for future runs. We have two options of dealing with this problem. First, we can explicitly patch the missing terms (via FMCSC_BPATCHFILE). Second, we can instruct CAMPARI to derive default terms from the input geometry (via FMCSC_GUESS_BONDED). The latter is rarely a useful option, because input geometries read from pdb are inaccurate to begin with (lacking precision). Here, we decide to patch 3/4 terms with the supplied patch file. These are the ones for which the parent force field (CHARMM) does not appear to make distinctions between CH, CH2, or CH3 moieties. Fur the purpose of this tutorial, the last term will be inferred from the input geometry. Append the key-file with:
Now try running the simulation:
campari -k tutorial12.key >&minim.log
This should finish in less than a second. Store the final structure for use in subsequent steps:
mv yourcalc_END.pdb tutorial12_minimized.pdb
You should visually compare the two structures. Note that molecular visualization software may have problems inferring the correct bonds in the 5'-phosphate due to the unusual residue name (the "CONECT" records are not maintained in the CAMPARI-generated pdb-files). As a note of caution, it should be pointed out that the parameter file contains only stiff bonded terms. Even so, however, it is possible that these parameter were in reality coparameterized with other (bonded or nonbonded) terms and/or that they represent a frustrated landscape, i.e., not all minimum positions of the harmonic terms are attainable at the same time. Both issues can in theory lead to undesired results for a protocol as used here. Lastly, the question as to what could have gone wrong at this stage is of relevance. The most likely scenario involves erroneous ring closures in the inferred Z-matrix. These will result from steric clashes, i.e., distances between two nonbonded atoms that are too short. The consequences of assuming rings where there are none can be dramatic. Specifically, the setup of nonbonded interactions can be compromised (leading to nonsensical energies reported), the identification of affected dihedral angle degrees of freedom will fail, and the ability to use appropriate bonded potentials will be limited. Users are advised to check the input geometry before suspecting a different type of error. Fortunately, most visualization software packages include structure editing/manipulation tools by now, which are capable of making the required adjustments to an input structure.
Step 2 - Setting up the system for Monte Carlo samplingThe degrees of freedom of the system in unsupported residues (in terms of dihedral angles) are determined automatically from the input structure (→ details). We will first enable only two types of moves for sampling this system: rigid-body moves and single dihedral angle pivot moves. Append the key-file as follows:
This turns on the two move types mentioned with remaining parameters left at default values. Of course, we also need to make modifications to existing keywords:
## FMCSC_PDBFILE tutorial12.pdb
This disables the original read-in from tutorial12.pdb, changes the sampler to MC in internal coordinate space, extends the run length, and enables excluded volume interactions and on-the-fly analyses. Of course, we still need to supply CAMPARI with the topology information about the unsupported residues. However, we want to use the minimized structure for this purpose, and we want the initial structure to be randomized:
Here, we instruct CAMPARI to just extract a functional Z-matrix for all unsupported entities from the structural input file, and to then randomize a well-defined set of degrees of freedom (details). The switch from using FMCSC_PDBFILE to FMCSC_PDB_TEMPLATE is what enables this. This implies that the supported residues are built in their well-defined covalent geometries in the same way as if no structural input file was present. Since nonbonded interactions are now in use, we will select settings consistent with the final model we have in mind (the ABSINTH implicit solvent model). In addition, we want a cutoff treatment in place.
Lastly, we are not interested in all types of analyses enabled by default at this point, which means that some of them are going to be disabled:
FMCSC_WHICHINT 0 1 1 1
In this first step, we want to assess how the sampler is working, and for this purpose internal coordinate statistics are generated, both instantaneously and in histogram form. Additionally, we monitor the evolution of individual terms to the potential energy. Execute:
campari -k tutorial12.key >&mctest.log
For the given output mode, the file with instantaneous torsion angles (FYC.dat) lists all those torsional degrees of freedom that are natively supported by CAMPARI. If you inspect the file, it becomes apparent that we do not yet sample the φ/ψ angles in the peptide (except for one ψ angle, which has a very technical reason not elaborated upon here) or the sugar pucker in ADP. None of these degrees of freedom are accessible to the current sampler, which means we have to extend/alter the move set:
Sugar pucker moves are part of a different branch of the move type decision tree, which mandates that FMCSC_NUCFREQ is nonzero. The reduction in the value for FMCSC_OTHERFREQ achieves that the default move type is included (polypeptide backbone φ/ψ pivot moves). Execute again:
campari -k tutorial12.key >&mctest.log
Inspecting FYC.dat, it becomes apparent that all the desired degrees of freedom are now indeed sampled. Furthermore, instantaneous energy values in ENERGY.dat reveal that the system does not appear to suffer from any major sources of energetic frustration. This means that we are now ready to prepare the system for simulations with a realistic Hamiltonian.
Step 3 - Setting up the system for the ABSINTH implicit solvation modelThe full Hamiltonian has two missing ingredients, viz., partial charges and the setup of solvation groups for the ABSINTH DMFI. For the former, inspect the corresponding patch file. The patch accomplishes that the atoms having been assigned new biotypes receive an explicit partial charge, and that the partial charges on the terminal phosphate are slightly modified to produce the correct net charge on both nucleotide residues.
For the setup of the ABSINTH DMFI, we have to take a few more things into account. It appears reasonable to decompose the hydroxyamine in both the small molecule and the peptide into two groups each (NH3 and OH), whereas the additional phosphate should constitute its own solvation group. Solvation groups across residue boundaries may be compromised by the presence of unsupported residues, and we will (re)assign them explicitly to be safe. The decomposition strategy of the ABSINTH model may not always be obvious, and it can be useful to first look at how it works for the supported residues. Append the key-file as follows:
The report flag will ensure that we obtain a diagnostic output file named FOS_GROUPS.vmd. Execute:
campari -k tutorial12.key >&absinthmc.log
vmd -e FOS_GROUPS.vmd
The simulation will take longer than the excluded volume calculation above due to the much more complex energy function. In the visualization of solvation groups (for every residue, the different solvation groups are shown as transparent envelopes to the relevant atoms always using the same color sequence of the VMD default starting at "1" for "ColorID"), you will note that the 5'-phosphate and the small molecules have no groups set up. Conversely, the hydroxylysine residue is already set up with a standard backbone amide solvation group because it is identified as a proper polypeptide residue. If you now inspect the DMFI patch file, whose format is explained elsewhere, you can see that we set up a total of 6 additional solvation groups. The 5'-phosphate is actually realized as a sum of two contributions (OH vs. OPO-). The reference free energy of solvation for the group carrying the net negative charge is -100 kcal/mol, i.e., it is decreased (more negative) in the same way as is normal for charged moieties on polymers in the published ABSINTH model. Still, the value is unlikely to be appropriate given that the diphosphate has characteristics of a divalent anion (rather than that of two distinct monovalent, anionic functionalities) with typical free energies of solvation closer to -250kcal/mol (i.e., values do not scale linearly with ion valence). As mentioned in the preface, the process of parameterization is not part of this tutorial, and values are assigned by analogy and without any amount of careful thought. The patch also shows that the hydroxyamine is split identically in both cases. Note that we use uneven atom weights for the primary amines. These weights (third column) are generally not expected to have much influence because the solvation states of nearby atoms will be highly correlated (this is no longer true if solvation groups are very large or if the assumed solvation shell thickness is reduced). The standard ABSINTH convention is to always use equal weights for all atoms in a solvation group. It is important to realize that in ABSINTH additivity of solvation groups holds at all levels, i.e. it is possible to subdivide a solvation group corresponding to a small molecule with a reference free energy that is known experimentally further. As long as the net total for all constituent groups reproduces the reference number, the fully solvated and fully desolvated reference states continue to be well-defined (this is used, for example, for the adenine base). More details are provided elsewhere.
The output in "absinthmc.log" also tells you details of the solvation model including the important piece of information that most parameters are set to match the published model by default (note that we have set them explicitly in all prior tutorials using the ABSINTH model). However, there is one crucial error and that is the setting for the short-range electrostatic interaction model. To fix this error and to enable the patch, append the key-file:
It is highly recommended to specify explicit settings for FMCSC_ELECMODEL and FMCSC_INTERMODEL in every CAMPARI key-file you use for actual simulations. Now re-run:
campari -k tutorial12.key >&absinthmc.log
vmd -e FOS_GROUPS.vmd
mv yourcalc_END.pdb tutorial12_eq.pdb
The visualization of the solvation groups should reveal quickly that the patch works as intended. This information is also reported in detail in "absinthmc.log." The final structure serves as the starting point for the production run.
Step 4 - Performing and analyzing a brief production simulationThere are a few additional warnings in "absinthmc.log" that we need to take into account. The first relevant one pertains to the simplified computation of additional atom-level parameters for the solvation model (adjustable by a separate patch). We make no further adjustments here, but it is worthwhile to keep in mind that for this approach the input geometry should ideally reproduce a state of maximum solvent accessibility for each rigid fragment yielded by the solvation group decomposition. The second group of interesting warnings is related to non-spherical symmetries of the cutoff scheme in conjunction with a periodic box, and we will adjust this by switching to a droplet boundary condition. Change existing or add missing keywords as follows:
FMCSC_ORIGIN 20. 20. 20.
By the above, we also increase simulation length to a level that will allow making statements of statistical relevance for a system of this size. The change to "tutorial12_eq.pdb" as structural input implies that topology continues to be inferred from the file passed to keyword FMCSC_PDB_TEMPLATE, i.e., "tutorial12_minim.pdb." When working with unsupported residues, it is important not to change the template between runs, because some of the parameters may depend on the initial geometry (meaning that data could not be explicitly reproducible for runs with different initial conditions). The change to FMCSC_RANDOMIZE is required to make sure that the simulation does indeed start from the arrangement in "tutorial12_eq.pdb." Visualization of "tutorial12_eq.pdb" may reveal that an ionic cluster has formed. This is not unexpected for this system, but the details may of course differ from run to run. Part of the reason for this behavior could be an absence of counterions providing screening, and there is a warning in this regard in "absinthmc.log". Insert the following into the local copy of "tutorial12.in" (just before the line saying "END"):
We will use CAMPARI's ability to generate initial positions for these ions automatically without editing "tutorial12_eq.pdb." Because the ionic interactions can create a very rugged energy landscape, Monte Carlo algorithms alone may not be efficient at sampling this system. We therefore use a hybrid scheme, in which the focus of the Monte Carlo moves is on large-scale rigid body motion and on the sugar sampling. Change existing keywords:
And append the key-file:
The above keywords define the sampling engine for the (Newtonian) dynamics in mixed rigid body and dihedral angle coordinate space. Specifically, we use the default thermostat (Bussi et al.) with a coupling time of 5ps, and instruct CAMPARI to sample all dihedral angles including those in unsupported residues (the setting for FMCSC_TMD_UNKMODE). The other keywords are not discussed here. Our last concern is what data to collect from the production run. Ideally, one would save all information at every snapshot, but in reality storage requirements and the ease of data processing and analysis mean that we need to have at least a vague idea beforehand what we are going to look for. Here, the molecules are so small that we focus our analysis on contact formation between the various species. Change and or append the key-file:
Note that FMCSC_CLUSTERCALC refers to the analysis of spatial clusters of molecules, and must not be confused with the much more complex functionality controlled by FMCSC_CCOLLECT (see Tutorial 11). We are only interested in the minimum distance criterion for a contact (threshold set by FMCSC_CONTACTMIN), which is why the other threshold is unreasonably small. The analysis group file is needed to include the small molecule and the ions in the contact analysis. We have also disabled or reduced some instantaneous output. This brings us to the production simulation. If you have compiled an MPI version, run:
mpirun -np [NP] campari_mpi -k tutorial12.key >&mpi.log
Here, [NP] should be a number of processes you can use comfortably on the architecture at hand (e.g., 3 for a desktop with a 4-core CPU). By adding the keyword FMCSC_MPIAVG, we have asked CAMPARI to perform a run consisting of multiple independent copies of the same simulation with automatic data collection from all replicas at the end (to increase statistics). With just the serial version, run instead:
campari -k tutorial12.key >&production.log
It is possible that this simulation will fail. When working with molecular dynamics, we have to keep in mind the fastest degrees of freedom in relation to the chosen integration time step of 5fs. Here, our system features several hydroxyl groups, which are associated with very low effective mass. If the terminal hydrogen atoms encounter a hard potential (e.g., contact with another atom or the droplet boundary), their velocity may increase catastrophically. This erroneous heat is normally dissipated to other degrees of freedom, which eventually causes the simulation crash. Note that the crash behavior is stochastic because our system is dilute (heterogeneous levels of energetic coupling) and the coordinate representation is nonuniform. In case of a crash, append the key-file with:
This patch file increases hydrogen mass for the atoms in question to 3 Da. Start a new simulation using the same command as above.
Once completed successfully, we can turn to the analysis. You should first check the output written to the MPI log-files (N_000.log, N_001.log, and so on) or to "production.log" (if run in serial mode) after the actual simulation steps have finished. There is useful information regarding sampling and ensemble statistics that can help diagnose the basic soundness of our approach. For example, the average temperatures for individual degrees of freedom should be close to the target ensemble temperature of 300K (if all or some are not, this could indicate a simulation that was at least partially unstable). By not saving any coordinates, the types of analyses we can perform are of course restricted to the output we requested. The tutorial is accompanied by an R-file that helps visualizing the contact map. We expect that the ADP molecule will probably be associated with two monovalent cations for most of the time. Due to the additional associativity resulting from the presence of other groups, it is likely that the hydroxylysine residue in the peptide is one of these. Open R and execute:
The resultant plot is likely to reveal that the peptide and ADP associate strongly. We use a minimum distance criterion of 5Å to identify contacts, and thresholds of this type can always lead to misinterpretation of data. For the specific example here, we have to keep the overall spatial proximity in mind, i.e., many contacts are indirect symptoms of other contacts and must not be misinterpreted as causative (and causal relationships will generally be impossible to discern from an equilibrium simulation at a single condition). The contact map may reveal sporadic contacts of potassium ions to ADP. You should also have a look at output file MOLCLUSTERS.dat. Here, each row corresponds to probabilities per molecule to be part of a solute cluster of the given size. This file should point out that ADP and the peptide are never in isolation. However, these data do also rely on the setting for FMCSC_CONTACTMIN.
There are a few things you could try as follow-ups to this tutorial. The first is to figure out how to make the simulation use resources more efficiently (balance of Monte Carlo move set, balance of Monte Carlo vs. dynamics). This should involve an assessment of the appropriateness of sampling on a small subset of degrees of freedom with Monte Carlo (output in INTHISTS_DI.dat may reveal unwanted initial state biases). The second is to change the simulation settings such that trajectory output is written. As noted above, simulation analysis requires some prior knowledge to inform choices, and the setting for FMCSC_CONTACTMIN is a good example here. It would certainly be useful to reanalyze an existing trajectory with different values for FMCSC_CONTACTMIN. The third is to think about the parameterization, in particular the data in the solvation group patch file.