CAMPARI Keywords


Full Keywords Index:

  1. Parameter File:
  2. Random Number Generator:
  3. Simulation Setup:
  4. Box Settings:
  5. Integrator Controls (MD/BD/LD/Minimization):
  6. Move Set Controls (MC):
  7. Files and Directories:
  8. Structure Input and Manipulation:
  9. Energy Terms:
  10. Cutoff Settings:
  11. Parallel Settings (Replica Exchange (RE) and MPI Averaging):
  12. Output and Analysis:


Preamble


The overall setup of simulations becomes more and more involved and complicated with increasing numbers of options offered by simulation software, and CAMPARI is no exception here. Not all settings are relevant in all circumstances (in fact, often very few are), and a complete understanding of all keywords is clearly not required to use subsets of CAMPARI's functionality. Users should keep the following points in mind:
  • Most keywords have default choices. In case of doubt, check parsekey.f90 to locate the variable associated with the selection, and then initial.f90, allocate.f90, and sometimes other files for default assignments.
  • Not all keywords can be connected and arranged such that they group nicely. The documentation here groups keywords into a small number of sections, some of which end up being very large. This has both advantages and disadvantages.
  • For navigation, it is highly recommended a) to search for terms within the page with the help of the browser (all keywords are described within a single html-page), and b) to follow the links that are provided everywhere.
  • If an option is unclear, but easily testable, it is probably fastest to just try it out. If it is difficult to test, post a question on the SF forums.
  • Understanding many of the implemented, standard methodologies requires consulting the corresponding literature; this is why a bibliography is provided.
  • The fastest way to learn how to run basic simulations or perform trajectory analyses is to consult the various tutorials. Tutorials offer the chance to group information in a more natural workflow compared to the documentation here. They cannot explain all options in detail, though, and it is crucial to follow the links within the tutorial pages that point back to this and the other documentation pages.
Many important and fundamental keywords are grouped in the section on simulation setup. A few keywords in other sections can be equally important and serve as hubs for finding associated keywords, but are not necessarily quick to find: CARTINT (choice of degrees of freedom), SEQFILE (definition of the system), RANDOMIZE (initial structure generation), INTERMODEL, LREL_MC, and LREL_MD (all related to energetics), and REMC (replica exchange).

Notes on Nomenclature and File Parsing:

All keywords used by CAMPARI are named FMCSC_* where the different possible strings for "*" are explained below. This means that in your key-file the correct keyword to use to specify the simulation temperature is FMCSC_TEMP and not just "TEMP". There are only two exceptions to this, viz. keywords PARAMETERS and RANDOMSEED. This is for purely historical reasons (as is the ad libitum acronym "FMCSC").
The beginning of log output will print some information regarding the parsing of information in the key-file. Superfluous lines should be masked as comments using the hash character ("#"). Lines that are neither empty nor comments will be pointed out unless they correspond to the two exceptional keywords just mentioned or unless they begin with the canonical prefix "FMCSC_". The keyword parser operates hierarchically, meaning that some legitimate keywords will not be processed because the required base functionality has not been enabled (e.g., thermostat settings are not processed unless a gradient-based method is in use). This is done mostly to avoid needless warnings from popping up. All apparent keywords that have not been processed will be reported by the parser. However, the hierarchical dependence is not enforced stringently, which means that a keyword not reported in this list but appearing in the key-file does not automatically control a setting relevant to the attempted calculation. It is important to realize that the list of unprocessed keywords can also include misspelled ones. To make the detection of typos easier, it is recommended to comment out or remove unused keywords from the key-file.
Finally, most read operations of simulation settings are prone to data type mismatch errors. Supplying a character value to a numerical setting will trigger a Fortran I/O error. The error message is usually informative, yet the relevant position in the key-file is not reported. I/O in general (also for input files) may be made less error-sensitive in the future, but for now we apologize for this limitation.
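
As an illustrative key-file fragment (the parameter file name and all values are placeholders):

PARAMETERS abs3.2_opls.prm     # one of only two keywords without the FMCSC_ prefix
RANDOMSEED 1234567             # the other exception
FMCSC_TEMP 298.0               # a regular keyword carrying the canonical prefix
# superfluous lines are masked as comments with the hash character
TEMP 298.0                     # no prefix: this line is pointed out by the parser and NOT processed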



Parameter File Keywords:


PARAMETERS

This keyword allows the user to provide the location and name of the parameter file to be used for the simulation. The different files offered by default (shipped with CAMPARI) are listed below:

Custom Parameter Sets:  

The parameter sets fmsmc*.prm are outdated and should be used with utmost caution. They contain no bonded parameters except dummy declarations and are therefore only suitable for torsional space calculations.
In general, the Lennard-Jones parameters for ions in these files require a cautionary note, as they are simply those from Aqvist's work. They have not been specifically parameterized to work together with the ABSINTH continuum solvation model in case a full Hamiltonian is used (they merely have been shown to reside on the "safe" side). This is a matter of ongoing development. It may be more appropriate to use parameters for ions that feature harder cores and better congruence between σ_ii parameters and actual contact distances.


fmsmc.prm:
These are basic parameters fit for simulations in the excluded volume ensemble. As Lennard-Jones parameters, they employ Hopfinger radii with generic (and generally small) interaction parameters. They contain a reduced charge set derived from the OPLS brand of force fields but are thoroughly unsuitable for simulations with "complete" Hamiltonians, if only for the fact that they lack support in many places.

fmsmc_exp.prm:
This file is identical to fmsmc.prm except that pairwise LJ terms (σ_ij) for pairs involving a polar atom and a polar hydrogen are specifically reduced. It also lacks support for phosphorus.

fmsmc_exp3.prm:
This file is identical to fmsmc_exp.prm except that LJ interaction parameters (ε_ii) are raised for polar heavy atoms (nitrogen and oxygen).

fmsmc_exp2.prm:
This file is identical to fmsmc_exp3.prm except that LJ size parameters (σ_ii) for common atoms are inflated to approximately 107% of their original values, which makes the parameter set more OPLS-AA-like in terms of LJ parameters.

abs3.2_opls.prm:
This file combines ABSINTH LJ parameters with the full OPLS-AA/L charges including the Kaminski et al. revision. OPLS-AA/L's bonded parameters are only retained inasmuch as they are required to maintain quasi-rigid geometries (i.e., bond length and angle potentials, improper dihedral potentials, and torsional potentials around bonds with hindered rotation). Comparison to the reference parameter set may be useful. In addition, the free energies of solvation are reduced by ~30 kcal/mol for ionic groups on biomolecules. This is the file used for most published work employing the ABSINTH implicit solvation model thus far.

abs3.1_opls.prm:
This file is identical to abs3.2_opls.prm except that the free energies of solvation are not artificially lowered by ~30 kcal/mol for ionic groups on biomolecules.

abs3.2_charmm.prm:
This file combines ABSINTH LJ parameters with the full CHARMM charges from version 22 (polypeptides) and 27 (polynucleotides), respectively. CHARMM's bonded parameters are only retained inasmuch as they are required to maintain quasi-rigid geometries (i.e., bond length and angle potentials, improper dihedral potentials, and torsional potentials around bonds with hindered rotation). Comparison to the reference parameter set may be useful. In addition, the free energies of solvation are reduced by ~30 kcal/mol for ionic groups on biomolecules. In conjunction with the ABSINTH implicit solvent model, CHARMM parameters probably offer the best combination of simplicity (small enough dipole groups) and completeness (support for both nucleotides and peptides as well as most terminal groups and some small molecules).

abs3.1_charmm.prm:
This file is identical to abs3.2_charmm.prm except that the free energies of solvation are not artificially lowered by ~30 kcal/mol for ionic groups on biomolecules.

abs3.2_a94.prm:
This file combines ABSINTH LJ parameters with the full AMBER charge set from the '94-revision (Cornell et al.). AMBER charges are generally not well-suited to be used in conjunction with the ABSINTH paradigm since the latter is most meaningful for small dipole groups with local neutrality. AMBER charges are determined by a more or less unconstrained QM-fit and spread polarization across the (arbitrary) unit of each residue (see FMCSC_ELECMODEL). AMBER's bonded parameters are only retained inasmuch as they are required to maintain quasi-rigid geometries (i.e., bond length and angle potentials, improper dihedral potentials, and torsional potentials around bonds with hindered rotation). Comparison to the reference parameter set may be useful. In addition, the free energies of solvation are reduced by ~30 kcal/mol for ionic groups on biomolecules. Please refer to the details provided for AMBER reference force fields below in order to obtain answers concerning AMBER-specific implementation details of force field parameters.

abs3.1_a94.prm:
This file is identical to abs3.2_a94.prm except that the free energies of solvation are not artificially lowered by ~30 kcal/mol for ionic groups on biomolecules.

abs3.2_a99.prm, abs3.1_a99.prm, abs3.2_a03.prm, abs3.1_a03.prm:
These files are analogous to abs3.2_a94.prm and abs3.1_a94.prm except that they incorporate AMBER parameters of revisions '99 (Wang et al., abs3.2_a99.prm, abs3.1_a99.prm) and '03 (Duan et al., abs3.2_a03.prm, abs3.1_a03.prm), respectively.

abs3.2_GR53a6.prm:
This file combines ABSINTH LJ parameters with full GROMOS53a6 charges. Note that GROMOS53 is a united atom model and that aliphatic hydrogens (which do exist here) therefore carry no charge. This appears inconsistent - at least compared to other force fields, in which aliphatic hydrogens almost universally carry a small positive charge of less than 0.1e - but speeds up simulations with screened electrostatic interactions considerably. Bonded parameters are only retained inasmuch as they are required to maintain quasi-rigid geometries (i.e., bond length and angle potentials, improper dihedral potentials, and torsional potentials around bonds with hindered rotation). Comparison to the reference parameter set may be useful. In addition, the free energies of solvation are reduced by ~30 kcal/mol for ionic groups on biomolecules.

abs3.1_GR53a6.prm:
This file is identical to abs3.2_GR53a6.prm except that the free energies of solvation are not artificially lowered by ~30 kcal/mol for ionic groups on biomolecules.

abs3.2_GR53a5.prm and abs3.1_GR53a5.prm:
These files are analogous to abs3.2_GR53a6.prm and abs3.1_GR53a6.prm but use the a5-revision of the GROMOS53 charge set.

Some recommended settings for using any of these custom parameter files are listed below. Note that these are also the settings required to achieve an exact match with the ABSINTH reference.

FMCSC_UAMODEL 0
FMCSC_INTERMODEL 1
FMCSC_ELECMODEL 2
FMCSC_MODE_14 1
FMCSC_FUDGE_EL_14 1.0
FMCSC_FUDGE_ST_14 1.0
FMCSC_EPSRULE 2
FMCSC_SIGRULE 1
FMCSC_SC_BONDED_B 0.0
FMCSC_SC_BONDED_A 0.0
FMCSC_SC_BONDED_T 0.0
FMCSC_SC_BONDED_I 0.0
FMCSC_SC_EXTRA 1.0

We do, however, recommend replacing FMCSC_SC_EXTRA set to unity with FMCSC_SC_BONDED_T set to unity, since the above files will typically contain (unless otherwise noted) the required and "native" bonded potentials for each parent force field. This ensures better parameter coherence (the terms used by SC_EXTRA are taken from OPLS-AA/L) and - more importantly - control over all torsional potentials (and bonded potentials in general) through the parameter file. If the system to be sampled contains proline residues, other flexible rings, or chemical crosslinks, it will also be necessary to set FMCSC_SC_BONDED_A, FMCSC_SC_BONDED_B, and FMCSC_SC_BONDED_I to 1.0 to avoid obtaining nonsensical results.
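
As a sketch, the corresponding modification of the block above would read (all other settings unchanged; the three additional terms are needed for proline residues, other flexible rings, or chemical crosslinks):

FMCSC_SC_BONDED_T 1.0
FMCSC_SC_EXTRA 0.0
FMCSC_SC_BONDED_B 1.0
FMCSC_SC_BONDED_A 1.0
FMCSC_SC_BONDED_I 1.0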



Reference Parameter Sets:  

The parameter sets below attempt to be as complete as possible for the biopolymer types supported by CAMPARI. In general, support for small molecules (which often use derived parameters) is limited (but can easily be added by the user). In addition, rare and generally poorly parameterized biopolymer constructs (such as zwitterionic amino acids or free nucleosides) may have incomplete parameter portings, in particular of bonded parameters. If a perfect match of a certain parameter set paradigm cannot be achieved (against the reference implementation), this is stated explicitly.

oplsaal.prm (reference implementation: GROMACS 4.5.2)
This file provides full OPLS-AA/L parameters, i.e., it includes the Kaminski et al. revision of peptide torsions and sulphur parameters. Note that GROMACS 4.5.2 was used as the reference implementation (and not BOSS or MCPRO).
Required settings for emulating reference standard:

FMCSC_UAMODEL 0
FMCSC_INTERMODEL 2
FMCSC_ELECMODEL 1
FMCSC_MODE_14 2
FMCSC_FUDGE_EL_14 0.5
FMCSC_FUDGE_ST_14 0.5
FMCSC_SC_IPP 1.0
FMCSC_SC_ATTLJ 1.0
FMCSC_EPSRULE 2
FMCSC_SIGRULE 2
FMCSC_SC_POLAR 1.0
FMCSC_SC_BONDED_B 1.0
FMCSC_SC_BONDED_A 1.0
FMCSC_SC_BONDED_T 1.0
FMCSC_SC_BONDED_I 1.0
FMCSC_SC_EXTRA 0.0
FMCSC_IMPROPER_CONV 2

GROM53a6.prm, GROM53a5.prm (reference implementation: GROMACS 4.0.5)
These files provide full GROMOS53 parameters. Torsional potentials for which the same biotype is attached multiple times to an axis atom are only approximately supported: the potential acting on just an arbitrary, single one of those atoms in the GROMACS reference implementation is replaced with proportionally reduced potentials acting on all of those atoms. This should be chemically more correct but prevents exact matches of torsional terms. The choice within GROMOS is motivated by computational efficiency, but the evaluation of torsional terms is not a time-critical execution component in almost all present-day simulations (and is trivially parallelizable). Moreover, cap and terminal residues may have been adjusted to use more consistent parameters (terminal and cap residues are generally not specifically parameterized in GROMOS from what we can tell, in particular for polynucleotides). GROMOS uses a rather specific interaction model and represents aliphatic CHn moieties in united-atom representation. Note that revisions a5 and a6 only differ in a few partial charge parameters.
Required settings for emulating reference standard:

FMCSC_UAMODEL 1
FMCSC_INTERMODEL 3
FMCSC_ELECMODEL 1
FMCSC_MODE_14 2
FMCSC_FUDGE_EL_14 1.0
FMCSC_FUDGE_ST_14 1.0
FMCSC_SC_IPP 1.0
FMCSC_SC_ATTLJ 1.0
FMCSC_EPSRULE 2
FMCSC_SIGRULE 2
FMCSC_SC_POLAR 1.0
FMCSC_SC_BONDED_B 1.0
FMCSC_SC_BONDED_A 1.0
FMCSC_SC_BONDED_T 1.0
FMCSC_SC_BONDED_I 1.0
FMCSC_SC_EXTRA 0.0

amber94.prm, amber99.prm, amber03.prm (reference implementation: AMBER port in GROMACS 4.5.2)
These files provide full AMBER parameters in three different revisions which differ mostly in their parameterization of torsional potentials for polypeptides. Note that support for terminal amino acid residues through the parameter file is marginal since AMBER's charge set is so detailed that each atom in each terminal residue would have to be an independent biotype. Normal polypeptide caps are fully supported, however. To allow a more accurate emulation of the AMBER standard for terminal polypeptide residues, the charge patch functionality within CAMPARI can be used. We have tested this for a few examples, and recovered 100% accurate matches to the AMBER standard that way. Keep in mind as well that the parameterization of terminal polymer residues is often the "sloppiest" component in a biomolecular force field since their impact on overall conformational equilibria is deemed small. Note that we did not use the actual AMBER software in the porting.
Required settings for emulating reference standard (ignoring any charge patches):

FMCSC_UAMODEL 0
FMCSC_INTERMODEL 2
FMCSC_ELECMODEL 1
FMCSC_MODE_14 2
FMCSC_FUDGE_EL_14 0.833
FMCSC_FUDGE_ST_14 0.5
FMCSC_SC_IPP 1.0
FMCSC_SC_ATTLJ 1.0
FMCSC_EPSRULE 2
FMCSC_SIGRULE 1
FMCSC_SC_POLAR 1.0
FMCSC_SC_BONDED_B 1.0
FMCSC_SC_BONDED_A 1.0
FMCSC_SC_BONDED_T 1.0
FMCSC_SC_BONDED_I 1.0
FMCSC_SC_EXTRA 0.0
FMCSC_IMPROPER_CONV 2

charmm.prm (reference implementations: CHARMM35b2 and CHARMM38b1)
This file provides access to simulations employing the full CHARMM parameters as provided in parameter set 27 for polypeptides and polynucleotides. CMAP corrections for polypeptides are supported and included. Note that <ABSINTH_HOME> should be the exact same directory specified in the localization of the Makefile (see installation instructions). To simulate polynucleotides with 5'-phosphate groups using 100% authentic CHARMM parameters for the terminal phosphate, the charge patch functionality within CAMPARI has to be used. The same applies to the polarization on the hydrogen atoms of the NH2 groups in guanine and cytosine (this is a much smaller effect, though; also compare FMCSC_AMIDEPOL). Similarly, the use of the amidated (NH2) C-terminus in polypeptides requires use of the biotype patch and other patch functionalities. CAMPARI's port of CHARMM parameters generally offers the most complete support for the systems supported natively by CAMPARI, e.g., for phosphorylated amino acid side chains.
Required settings for emulating reference standard:

FMCSC_UAMODEL 0
FMCSC_INTERMODEL 2
FMCSC_ELECMODEL 1
FMCSC_MODE_14 2
FMCSC_FUDGE_EL_14 1.0
FMCSC_FUDGE_ST_14 1.0
FMCSC_SC_IPP 1.0
FMCSC_SC_ATTLJ 1.0
FMCSC_EPSRULE 2
FMCSC_AMIDEPOL 0.01 # or -0.01
FMCSC_SIGRULE 1
FMCSC_SC_POLAR 1.0
FMCSC_SC_BONDED_B 1.0
FMCSC_SC_BONDED_A 1.0
FMCSC_SC_BONDED_T 1.0
FMCSC_SC_BONDED_I 1.0
FMCSC_SC_BONDED_M 1.0
FMCSC_CMAPDIR <ABSINTH_HOME>/data
FMCSC_SC_EXTRA 0.0

charmm36.prm (reference implementations: CHARMM38b1 and CHARMM39b1)
This file incorporates the various revisions of the CHARMM force field contained in parameter set 36. All other comments made for parameter set 27 apply here as well.
Required settings for emulating reference standard:

FMCSC_UAMODEL 0
FMCSC_INTERMODEL 2
FMCSC_ELECMODEL 1
FMCSC_MODE_14 2
FMCSC_FUDGE_EL_14 1.0
FMCSC_FUDGE_ST_14 1.0
FMCSC_SC_IPP 1.0
FMCSC_SC_ATTLJ 1.0
FMCSC_EPSRULE 2
FMCSC_AMIDEPOL 0.01 # or -0.01
FMCSC_SIGRULE 1
FMCSC_SC_POLAR 1.0
FMCSC_SC_BONDED_B 1.0
FMCSC_SC_BONDED_A 1.0
FMCSC_SC_BONDED_T 1.0
FMCSC_SC_BONDED_I 1.0
FMCSC_SC_BONDED_M 1.0
FMCSC_CMAPDIR <ABSINTH_HOME>/data
FMCSC_SC_EXTRA 0.0

In order to create a new parameter file, it is advisable to start with "template.prm". For details on the paradigms underlying the construction of a parameter file consult the detailed documentation on this topic.



Random Number Generator Keywords:



RANDOMSEED

This keyword allows the user to provide a specific seed for the PRNG. This is usually relevant in two contexts:
  1. Reproducibility:
    Rule out mismatches between different versions of the program (for example) via the stringent test that results must be exactly identical if the PRNG is seeded with the same seed. Such tests may occasionally be hampered by a lack of precision in input files and, in particular, by different compiler/architecture optimization levels.
  2. Timing:
    Eliminate identical calculations if jobs are submitted simultaneously. Normally the PRNG uses a seed derived from system time, which can be identical if jobs are submitted exactly in parallel. Avoiding this behavior by specifying different values for RANDOMSEED is only adequate if the jobs are indeed submitted as individual, serial jobs. Conversely, in intrinsically parallel applications (MPI), CAMPARI uses the node number to vary the seed across different nodes unless RANDOMSEED is specified. This means that a provided value for RANDOMSEED will homogenize the PRNG across all replicas, which is almost always undesirable.
(references)
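
As a usage sketch for the reproducibility test described under point 1, both key-files would simply contain the same literal seed (the value shown is arbitrary):

RANDOMSEED 1234567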



Simulation Setup:



UAMODEL

This keyword is a simple but very important switch. It allows the user to control whether non-polar hydrogens are going to be part of the system's topology or not. In particular in earlier simulation work, it was a common and convenient trick to improve simulation efficiency by uniting all atoms of a methyl or methylene group into a single, coarse-grained "united atom". Different force fields use (or used) different varieties of this trick. In the GROMOS line of force fields, for instance, all aliphatic hydrogen atoms are merged into the carbon atoms they are bonded to. The CHARMM19 protein force field additionally eliminates non-polar hydrogens bound to sp2-hybridized carbon atoms in aromatic rings.
Unlike other simulation software, CAMPARI maintains a complete internal "knowledge" of biomolecular topology of those systems it allows the user to build from scratch. Therefore, choosing between all- or united-atom models is not simply a matter of parameter files (although it is possible to create inefficient united-atom variants of force fields by disabling all interaction parameters pertaining to the required hydrogens). Instead, the software itself requires knowledge of this choice.
Choices are:
  1. Use an all-atom model for those molecules represented explicitly.
  2. Use a united-atom model according to GROMOS convention, i.e., all aliphatic hydrogen atoms are merged into the carbon atoms they are linked to (this does include terminal aldehyde hydrogen atoms).
  3. Use a united-atom model according to CHARMM19 convention, i.e., all aliphatic and all aromatic hydrogens bound to carbon atoms are merged into the latter.
Currently, the only natively supported united-atom force field in CAMPARI is GROMOS53a5/6 (see above). One technical caveat is that using the latter two options may require setting UNSAFE to 1. This is because the atom type parsing reads in atom valences which are altered and - more importantly - may vary between atoms that for simplicity were assigned the same type.
Outside of simulations using the GROMOS force field, and outside of future extensions to support CHARMM19, this keyword is most useful when using CAMPARI to analyze trajectory data generated by other software using such a united-atom force field. Such a run would not tolerate atom number mismatches between the internal representation of the system and what is found in the binary trajectory files (mismatches are acceptable only if the input format is pdb → see below). Note that this keyword has no impact on systems involving residues not supported natively by CAMPARI (→ sequence input and PDB input).
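
For example, a GROMOS-style united-atom setup might be requested as follows (a sketch; whether the UNSAFE setting is actually needed depends on the parameter file in use):

FMCSC_UAMODEL 1   # merge aliphatic hydrogens into their parent carbon atoms (GROMOS convention)
FMCSC_UNSAFE 1    # may be required because parsed atom valences can vary within one atom type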

PDBANALYZE

This keyword is a simple but very important logical. It specifies whether the proposed simulation is a trajectory analysis run: in these, a pdb- (or xtc-, dcd-, or NetCDF-) trajectory is read from file and analyzed with CAMPARI's internal analysis routines. The desired format is chosen with keyword PDB_FORMAT. All outputs and parameters are completely analogous to normal calculations. Essentially, the snapshot read-in replaces the sampling step. This means that low analysis frequencies (i.e., frequent analysis) will be desirable, since usually the number of snapshots will be relatively small compared to the number of simulation steps in a typical simulation. Note that - in particular for large systems (> 10^4 atoms) - the analysis run may be slowed down by:
  1. Certain time-consuming analyses scale poorly with the number of atoms (solution structure analyses, see for example PCCALC or CLUSTERCALC).
  2. At each step, the global system energy is calculated using - depending on the setting for DYNAMICS - either CAMPARI's energy (MC) or force (MD/LD) routines and making little to no simplifying assumptions. To ensure decent speed, this may require setting the system Hamiltonian to zero (see below) and/or using an efficient cutoff / neighbor-list routine (see CUTOFFMODE).
  3. Very large files, in particular in pdb-format, may cause memory shortages which slow down the machine entirely. In general, binary trajectory files in conjunction with an optional template file are the preferred and much faster way of performing analysis runs.
Note that it is important in analysis runs to set NRSTEPS and EQUIL to the required values (the number of steps generally becomes the number of snapshots in the trajectory file) and that a fair number of CAMPARI's simulation options are (naturally) not supported in such a run. Some sanity checks specific to pdb trajectories can be disabled with the help of keyword UNSAFE. The analysis can also be restricted to a subset of simulation snapshots by using a frames file, which can carry analysis weights per snapshot. This is commonly necessary when trying to reweight simulation data to different conditions or Hamiltonians. Frames files can also be used to extract, reorder, or duplicate parts of trajectories. Both weighted analyses and analyses performed on arbitrary subsets of the input data carry some intricacies with the built-in analysis routines, which are described elsewhere.
When using an MPI executable of CAMPARI in parallel, it is also possible to perform trajectory analysis across many processors. This uses the replica exchange setup and is described in detail elsewhere. The three primary applications are the simultaneous analysis of several trajectories, the unscrambling of replica exchange trajectories that are normally output continuously for a given condition, and the post facto computation of energetic overlap distributions. Specific analysis routines (such as DSSP analysis) may be restricted to specific types of residues, and this may limit the utility of these routines for entities that are not natively supported by CAMPARI (see sequence input). In general, analysis runs on systems featuring unsupported residues should be relatively straightforward, at least as long as no energetic analyses are required (which naturally entail the complex issue of parameterization).
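
A minimal key-file sketch for a plain, single-trajectory analysis run might look as follows (the format code, step counts, and zeroed Hamiltonian term are illustrative assumptions only):

FMCSC_PDBANALYZE 1    # this is a trajectory analysis run
FMCSC_PDB_FORMAT 1    # hypothetical value; choose according to the trajectory format (see PDB_FORMAT)
FMCSC_NRSTEPS 10000   # should correspond to the number of snapshots in the trajectory
FMCSC_EQUIL 0         # analyze all snapshots
FMCSC_SC_IPP 0.0      # one way of zeroing (part of) the Hamiltonian for speed (see above)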

NRSTEPS

This keyword sets the total number of simulation steps including equilibration.

EQUIL

This keyword specifies the total number of equilibration steps. This implies that no analysis is performed as long as the current step number does not exceed this value. Note that this also means that no structural output (trajectory) is produced. Conversely, certain necessary diagnostics are provided irrespective of equilibration (see for example ENOUT or ACCOUT).

TEMP

This keyword sets the absolute (target) temperature in K.

PRESSURE

This keyword allows the user to specify the absolute (target) pressure in bar (not yet in use).

ENSEMBLE

This crucial keyword determines which ensemble to simulate the system in. The options available are limited in that they depend strongly on the type of sampler (i.e., there is no NVE (microcanonical) ensemble if sampling is done via Monte Carlo → DYNAMICS).

The options are as follows:
   1) NVT (Constant Particle Number, Constant Volume, Constant Temperature):
Always available, this is the canonical ensemble and currently the only option available for pure Monte Carlo runs.
   2) NVE (Constant Particle Number, Constant Volume, Constant Energy):
The microcanonical ensemble (adiabatic conditions) is only supported (and possible) for non-dissipative, i.e., Newtonian dynamics (see option 2 in DYNAMICS).
   5) μ_iVT (Constant Chemical Potential(s), Constant Volume, Constant Temperature):
This requests the grand canonical ensemble where the number of particles in the system is allowed to fluctuate. Subscript i indicates that not all particle types may be subject to number fluctuation (as is typical, for example, in the simulation of a macromolecule in a (co-)solvent atmosphere, for which only the small molecules would be treated in "grand" fashion). This implies that technically incorrect hybrid ensembles are populated (sometimes referred to as "partially grand" ensembles). The rigorous grand canonical ensemble would require all particle types to be permitted to fluctuate in number. Such partially grand ensembles are not to be confused with the "semigrand" ensemble (see below). Technically, the GC ensemble is realized in CAMPARI by allowing molecules to transfer between a real and a shadow existence, the latter also serving as the reference state. The discreteness of transitions between shadow and real existence implies that currently the grand ensemble is only available in pure Monte Carlo simulations. Note that currently the reference state is modeled in the infinite dilution limit (there are no intermolecular interactions). This is consistent with the default implementation choice (→ GRANDMODE), in which the bath communicates with the system via an expected bulk concentration and an excess chemical potential correcting for the interactions arising from that finite bulk concentration.
   6) Δμ_iN_tVT (Constant Chemical Potential Difference(s), Constant Total Particle Number, Constant Volume, Constant Temperature):
This requests the semigrand ensemble as originally formulated by Kofke and Glandt (1988), in which particle types are allowed to fluctuate in number under the constraint that the total particle number (N_t) remains constant. Just like for the μ_iVT-ensemble, CAMPARI allows the definition of partial semigrand ensembles in which - for example - a bath of water and methanol solvating a macromolecule is subjected to moves attempting to transmute methanol into water or vice versa. Note that the number of real-world applications for which such an ensemble is appropriate is very small. Technically, the constraint of keeping N_t fixed may improve acceptance rates in dense fluid mixtures. For both options (5 and 6), please refer to the documentation for the particle fluctuation file, specified using PARTICLEFLUCFILE, for details. Note that the sanity of results obtained with any partial grand or semigrand ensemble must be investigated with utmost care.

 To be added in the future:
   3) NPT (Constant Particle Number, Constant Pressure, Constant Temperature):
May eventually be made available for MC and Newtonian MD runs
   4) NPE (Constant Particle Number, Constant Pressure, Constant Enthalpy):
May eventually be made available for Newtonian MD runs
  
Note to developers: there is rudimentary support for NPT and NPE ensembles in CAMPARI right now but those branches are completely disabled.
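
As a sketch, a microcanonical run would combine the corresponding option of this keyword with Newtonian dynamics (numerical values follow the option numbering above):

FMCSC_ENSEMBLE 2   # NVE; only possible with non-dissipative dynamics
FMCSC_DYNAMICS 2   # Newtonian molecular dynamics (see DYNAMICS)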

GRANDREPORT

If an ensemble is chosen that allows particle number fluctuations, this keyword acts as a simple logical whether or not to write out a summary of the grand-canonical setup, i.e., which particle types are allowed to fluctuate in numbers, what the initial numbers (bulk concentrations) are, and what (excess) chemical potentials are associated with those.

GRANDMODE

If an ensemble is chosen that allows particle number fluctuations, this keyword chooses between two different implementation modes. In the first (choice 1), file input is used to provide CAMPARI with the initial numbers and absolute chemical potentials of fluctuating particle types. This is generally inconvenient for cases with realistic interaction potentials and/or multiple fluctuating particle types that require coupled chemical potentials (such as individual ionic species). The bulk concentrations are set implicitly by the chemical potentials. This formulation involves the "thermal volume" of particles, meaning that a monoatomic ideal gas will require a mass-dependent chemical potential. In the second option (choice 2, which is the default), the same file input is used to set the bulk concentration explicitly (based on the initial particle number provided), and the chemical potentials listed are merely the excess terms. This formulation involves no mass-dependent terms, is numerically more stable (accuracy of exponentials), and provides an easy reference limit for dilute solutions (zero excess chemical potential).
To illustrate the difference in implementation, consider the additional contribution to the acceptance probability (term c_b in the description of keyword MC_ACCEPT) of a particle insertion attempt:
Mode 1:
c_b = e^(βμ_ideal) · e^(βμ_excess) · V · (N+1)^(-1) · ζ^(-1)
Here, V is the system volume, N is the current number of particles of the type to be inserted, μ_ideal and μ_excess are the components of the chemical potential, and ζ is the aforementioned thermal volume.
Mode 2:
c_b = e^(βμ_excess) · <N> · (N+1)^(-1)
This equation contains the expected bulk concentration as <N>.
While numerically the two cases can be made equivalent, the latter contains a self-consistency check by allowing the measured <N> to be compared to the assumed <N> given the chosen μ_excess. In the former, the assumed <N> is unknown, because the partitioning between μ_ideal and μ_excess is not explicit. For a single-component system (or a system with multiple independent components), the measured <N> can be used to derive the μ_excess that the simulation essentially corresponded to. With dependent components, however, this becomes very difficult to adjust. For general calibration strategies of excess chemical potentials and background, see references.
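
A hypothetical key-file fragment for a (partially) grand canonical Monte Carlo run might read (the particle fluctuation file name is a placeholder):

FMCSC_ENSEMBLE 5                 # grand canonical (μ_iVT); currently Monte Carlo only
FMCSC_GRANDMODE 2                # default: bulk concentrations plus excess chemical potentials
FMCSC_PARTICLEFLUCFILE fluc.in   # hypothetical input file (see PARTICLEFLUCFILE)
FMCSC_GRANDREPORT 1              # write a summary of the grand canonical setup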

DYNAMICS

This is one of the core keywords and specifies how to sample the system:
   1) Pure Monte Carlo sampling (see keyword MC_ACCEPT and section on Monte Carlo move sets)
   2) Molecular Dynamics:
Integration of Newton's equations of motion. In internal coordinate space (see CARTINT), this is fully supported, but it is based upon an unpublished algorithm. Some more details are found in the documentation of keyword TMD_INTEGRATOR.

          Specifically:
  1. Dynamics are performed on internal degrees of freedom which are assumed to be independent (rigid-body translation, rotation around the cardinal x, y, and z axes of the static laboratory frame centered at the center of mass of each molecule, and torsional degrees of freedom).
  2. Dynamics for polymers vary along the chain (faster at the termini) as they should, but not in any fashion proven to comply rigorously with a specific type of dynamics. By altering the chain alignment mode, more exotic dynamics can be produced. This is because the building directions of any polymer chains represent an arbitrary choice in the method.
  3. By assuming a diagonal mass (inertia) matrix (viz., a block of the mass metric tensor), applicability of simple integrators is a given. In the absence of interaction-based forces, the goal is to preserve rotational kinetic energy (but not angular momentum) by considering the effective masses associated with various rotational degrees of freedom as time-dependent variables in a discrete integration scheme. This treatment is intrinsically consistent, and agreement with data obtained from Monte Carlo simulations has been shown (for select cases). However, no generalized proof exists for thermodynamic averages obtained with this method. CAMPARI provides a simple diagnostic of the impact of assuming a diagonal mass matrix by printing kinetic energies in both internal and Cartesian coordinates to log-output.
  4. Because the algorithm does not produce dynamics that obey Gauss' principle of least constraint or conserve angular momentum, integrator stability can be inferior to that for a case of identical constraints realized as holonomic constraints in Cartesian molecular dynamics. This effect cannot always be quantified since the holonomic constraints implied by the internal coordinate space treatment often become too highly coupled for linear solvers to converge (→ SHAKEMETHOD). Select cases with quickly varying masses highlight the effect, and the most significant example is probably the rigid-body simulation of water (water has tiny rotational inertia and is a prototypical test case for rigid-body integrators). Quantification of relative integrator stabilities for such a case can be performed.
  5. Subtle equipartition artifacts (i.e., some individual or collective degrees of freedom heating up at the expense of others because they are either more susceptible to integration error or weakly coupled to the rest of the system) can always occur. Effects differ between internal coordinate and Cartesian treatments. This is because dihedral angles will generally have a rather different level of energetic coupling and integration stability than the positional coordinates of an atom embedded in a polyatomic molecule.
Sampling all Cartesian coordinates of all atoms represents the more canonical approach to molecular dynamics. These algorithms are conceptually much simpler and hence - from a theoretical point of view - more robust. In practice, however, an entire construct of additional procedures is needed in almost all cases, for example the enforcement of holonomic constraints through appropriate algorithms such as SHAKE or LINCS. Most three-point water models are explicitly calibrated as rigid models, and it is therefore necessary to maintain water geometry as a set of holonomic constraints throughout a Cartesian dynamics simulation. However, most of these procedures aim at improving simulation efficiency and overcoming inherent time step limitations in the Cartesian treatment.
   3) Langevin Dynamics:
Integration of the Langevin equation of motion. This is supported via the impulse integrator due to Izaguirre and Skeel (see references). With respect to the torsional dynamics implementation, the same caveats apply as for Newtonian dynamics. There is an additional limitation in that the only implementation supported currently is an approximate scheme (corresponding to keywords TMD_INTEGRATOR being 2 and TMD_INT2UP being 0). This is because the structure of the impulse integrator is more complex, thus allowing a straightforward extension to our torsional dynamics only for the simplest case (research in progress). Note that all LD simulations work in the fluctuation-dissipation limit, which means that all degrees of freedom are automatically coupled to a heat bath, and which assumes an underlying continuum providing frequent collisions as the source of the stochastic term as well as the frictional damping. In addition, note that hydrodynamic interactions are neglected and that currently there is only a single, uniform frictional parameter for all degrees of freedom (see FRICTION). The latter is a major and non-obvious assumption in internal coordinate spaces featuring polymers with flexible dihedral angles. This is because it is not clear what the frictional drag incurred by rotations around molecular bonds is and what the results of ignoring communication between these drag effects are.
   5) Mixed Monte Carlo and Newtonian (Molecular) Dynamics:
This hybrid method mixes MC with MD sampling and assumes consistency of ensembles at all times. Since MC sampling only supports the canonical ensemble at the moment, this means that Newtonian MD has to be performed with a thermostat preserving the correct ensemble, e.g., the Andersen or Bussi et al. schemes. Then, the entire trajectory should be treatable as a Markov chain and analysis is performed as if the sampling engine were one of the two.
A potential caveat lies in velocity autocorrelation. The method is implemented such that segments of MC sampling alternate with MD segments. Upon switching from MC to MD, new velocities are assigned from the proper Boltzmann distribution. This may introduce some amount of noise. Aside from this particular concern, all independent concerns about both Monte Carlo and dynamics-based methods apply. It is up to the user to ensure that either sampler yields the required ensemble rigorously.
A particular concern lies with the selection of degrees of freedom. In general, it will be highly desirable for the set of sampled degrees of freedom to be exactly identical between the two samplers. This is not always possible, however, e.g., when sampling sugar pucker angles in MC, but not in dynamics. In these scenarios it will be desirable to use short segment lengths in order to improve the chances of convergence (in the given example, convergence is unlikely if long dynamics segments only "see" few frozen conformations of the sugar pucker states in the system). This issue is particularly difficult in mixed Cartesian/internal coordinate space simulations, attainable by selecting a hybrid scheme here and 2 for CARTINT. Some improvement can be made by including geometric constraints in Cartesian space, but a rigorous match will generally be out of reach.
Technically, the simulation simply alternates between MC-based and dynamics-based segments whose minimum and maximum lengths are controllable by the user (→ keywords CYCLE_MC_FIRST, CYCLE_MC_MIN, CYCLE_MC_MAX, CYCLE_DYN_MIN, and CYCLE_DYN_MAX).
   6) Minimization:  
This uses the potential energy gradient to steer the system to a near minimum through a variety of techniques (see MINI_MODE). Minimization is not a technique to sample phase space in terms of a well-defined ensemble, and the closest approximation of its results is probably that of a locally sampled constant-volume (NVT) condition at extremely low temperature. In general, minimizers are apt at finding local, but not global minima. Note that these algorithms are still numerically discrete schemes, i.e., they employ finite step sizes. This means that irrespective of any theoretical guarantees or expectations an algorithm offers, results may not always be as straightforward. In addition, minimizers are poor tools if the basic step sizes should be heterogeneous for different degrees of freedom, e.g., for a dilute phase of Lennard-Jones atoms or clusters.
   7) Mixed Monte Carlo and Langevin Dynamics:
This is analogous to 5) except that Newtonian dynamics are replaced with Langevin dynamics (see 3).

  To be added in the future are:
   4) Brownian Dynamics

Note that in all of the above methods relying on forces (options 2-7), it is very likely that optimized loops will be used (depending on settings for the Hamiltonian). These currently have the property of using stack-allocated array variables that may become large if cutoff settings are very generous or if no cutoffs are in use. This may lead to unannotated segmentation faults (depending on compiler, architecture, and local settings). There are several workarounds (on Unix systems, the shell command "ulimit" can for example be used to increase stack size for the local environment), some of which will be compiler-specific (for example, forcing the compiler to always allocate local arrays from the heap). Stack access is faster and therefore generally desirable in the speed-critical portions of the code.
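
On Unix systems running bash, for example, the stack size limit can be raised in the shell used to launch CAMPARI (exact syntax and permissible limits vary by shell and system):

ulimit -s unlimited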

MC_ACCEPT

If the simulation uses (at least partially) Monte Carlo sampling, this very important keyword allows the user to choose between (currently) three different types of acceptance rules for MC moves that are as follows:
  1. The Metropolis criterion is used. A random number sampled uniformly over the unit interval is compared to the term c_b·e^(-βΔU). Here, ΔU is the difference in (effective) energy of the new vs. the original conformation (U_new - U_old), β is the inverse temperature, and c_b is a bias correction factor that is specific to the move type. If the random number is less than the term above, the move is accepted. Note that c_b can encompass different types of bias. It is also important to keep in mind that some advanced move types may incorporate biasing terms during the picking of a new conformation (see TORCRMODE), and these then no longer show up in c_b. The Metropolis criterion has the advantage that it is rejection-free in the limit of no energetic or other biases. With a non-zero energy function in place, the distribution sampled from is the Boltzmann distribution.
  2. A Fermi criterion is used. A random number sampled uniformly over the unit interval is compared to the term (1 + c_b^(-1)·e^(βΔU))^(-1). If the random number is less than the term above, the move is accepted. The Fermi criterion's only advantage over the Metropolis criterion is that it defines an actual probability on the interval [0,1]. The downside is that the limiting acceptance rate is only 50%. However, the impact is much weaker if ΔU is relatively large on average (in absolute magnitude). The sampled distribution is again the Boltzmann distribution.
  3. A Wang-Landau / Metropolis criterion is used. A random number sampled uniformly over the unit interval is compared to the term c_b·e^(-Δln T) or to the term c_b·e^(-βΔU - Δln T) (see keyword WL_MODE). Here, Δln T is the difference in the logarithms of the current and proposed estimates of the target distribution (e.g., the density of states), i.e., Δln T = ln T_new - ln T_old. The Wang-Landau algorithm is explained in detail elsewhere, but it should be pointed out that the sampled distribution is no longer the Boltzmann distribution (instead it is ill-defined, and the simulation results require snapshot-based reweighting), that the simulation does not satisfy detailed balance (the estimate of the density of states changes continuously), and that convergence/errors are much more difficult to assess (since the method is essentially an iteration and not an equilibrium sampling scheme). It is crucial to keep in mind that the standard Metropolis criterion is used while the simulation has not exceeded the number of equilibration steps. This is mostly to avoid range problems when starting from random initial configurations.
Note that replica-exchange swap moves are currently not affected by this choice (they always utilize the Metropolis (default) choice to determine move acceptance).

FRICTION

This keyword allows the user to specify the uniform damping coefficient acting on all degrees of freedom. The value is interpreted to be in ps^(-1). Currently, this is only relevant if DYNAMICS is set to either 3 or 7. In Langevin dynamics, the velocity damping through friction is given by e^(-γ·δt). Here, γ is the damping coefficient, and δt is the integration time step (see TIMESTEP). Note that in Cartesian dynamics (see CARTINT) each degree of freedom is an orthogonal direction of the Cartesian movement of each atom. Typically, Langevin dynamics integrators may make the friction on those degrees of freedom dependent on atom mass, but CAMPARI does not support this at the moment since the hydrodynamic properties of individual atoms are poorly described in any case. Conversely, in torsional dynamics, the rigid-body and torsional degrees of freedom of each molecule are integrated and the friction is applied uniformly to all of those. This means that hydrodynamic properties are - again - ill-represented. Bias torques on account of variable effective masses for most dihedral angle degrees of freedom will continue to be in effect (see elsewhere).
When applying Stokes' law (which should be inapplicable when the diffusing object is strongly aspherical and/or of similar size compared to the molecules comprising the surrounding fluid) to the self-diffusion of water, the measured diffusion constant of around 2.3·10^(-9) m^2·s^(-1) is roughly consistent through the Einstein-Stokes equation with the measured viscosity of about 8.9·10^(-4) kg·m^(-1)·s^(-1) (both at 25°C). By dividing by the mass, a damping constant of about 90 ps^(-1) can be obtained from the Stokes approximation. When performing stochastic dynamics simulations of large, spherical rigid bodies, such a value may be appropriate. For molecular simulations, however, it is not. First, in conjunction with typical time steps, the value is so large that the impulse integrator in use (→ DYNAMICS) can no longer sample the correct ensemble (it becomes overdamped, implying temperature artifacts). Second, in a Cartesian treatment, unless one samples a monoatomic fluid of inert particles, the correlations between particles are so high that a treatment as independently diffusing spheres is not just inaccurate, but nonsensical in the absence of hydrodynamic interactions. Third, in internal coordinate spaces, the individual degrees of freedom hardly ever fit the Stokes approximation. Torsional and rigid-body rotational degrees of freedom would require a completely different model of friction. Furthermore, unlike in a Cartesian treatment, the degrees of freedom are not all similar to one another. The above means that the damping constant should be understood as an empirical parameter. Better control over values for individual degrees of freedom will be implemented in the future. It defaults to a value of 1.0 ps^(-1), on par with the coupling times of thermostats in molecular dynamics (→ TSTAT_TAU).
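
A minimal sketch for requesting Langevin dynamics with an explicit damping coefficient (the value shown is illustrative; the default is 1.0 ps^(-1)):

FMCSC_DYNAMICS 3     # Langevin dynamics
FMCSC_FRICTION 5.0   # uniform damping coefficient in ps^(-1)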

CYCLE_MC_FIRST

If a hybrid MC/M(B,L)D method is used (see DYNAMICS), this keyword controls the length of the first segment (in number of steps), which is always an MC segment. This is to ensure that hybrid runs can safely be started from poorly equilibrated (random) structures where forces are large and integrators quickly become unstable.

CYCLE_MC_MIN

If a hybrid MC/M(B,L)D method is used (see DYNAMICS), this keyword controls the minimum length of MC segments (in number of steps) with the exception of the first segment.

CYCLE_MC_MAX

If a hybrid MC/M(B,L)D method is used (see DYNAMICS), this keyword controls the maximum length of MC segments (in number of steps) with the exception of the first segment.

CYCLE_DYN_MIN

If a hybrid MC/M(B,L)D method is used (see DYNAMICS), this keyword controls the minimum length of dynamics-based segments (in number of steps). This should probably be significantly larger than the velocity autocorrelation time of the system.

CYCLE_DYN_MAX

If a hybrid MC/M(B,L)D method is used (see DYNAMICS), this keyword controls the maximum length of dynamics-based segments (in number of steps).
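
Putting the five segment-length keywords together, a hybrid MC/MD run might be configured as follows (all lengths in numbers of steps; the values are illustrative only):

FMCSC_DYNAMICS 5             # hybrid Monte Carlo / molecular dynamics (see DYNAMICS)
FMCSC_CYCLE_MC_FIRST 10000   # long initial MC segment for equilibration
FMCSC_CYCLE_MC_MIN 500
FMCSC_CYCLE_MC_MAX 2000
FMCSC_CYCLE_DYN_MIN 5000     # should exceed the velocity autocorrelation time
FMCSC_CYCLE_DYN_MAX 20000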

PH

This keyword sets the assumed simulation pH which currently possesses significance for titration moves only → PHFREQ. This keyword may later be extended to represent the assumed (bath) pH in constant-pH simulations.

IONICSTR

This keyword sets the assumed simulation ionic strength for simplified pKa computations. The units are molar (M). The ionic strength is used in a grossly simplified Debye-Hückel approach to estimate cross-influences between multiple ionizable sidechains on a polypeptide (see PHFREQ). Note that this keyword cannot be used to set an assumed ionic strength for the generalized reaction-field method (see RFMODE).

RESTART

A simple logical whether to restart a previously discontinued run:

This keyword tells the program to attempt to restart a simulation which was accidentally or intentionally terminated. The program writes out ASCII files containing all relevant information in comparatively high precision (see RSTOUT). This file (one for each node in MPI calculations) is called {basename}.rst (see elsewhere). If it is successfully read, the simulation is extended from the simulation step the file was last written for. Non-synchronous MPI runs are synchronized to the step number of the slowest node. Note that instantaneous output of the crashed run should be saved separately (i.e., moved to another directory) since, with the exception of running trajectory pdb/xtc/dcd-output, new files will replace the old ones. All non-instantaneous analysis of the crashed run is unfortunately lost. The simulation will then proceed starting effectively at that step, so the same key-file (with the exception of the RESTART keyword itself, of course) can be used. If it is past the equilibration step, on-the-fly analysis will begin immediately. Final output will reflect only the restarted portion of the run. The program will acknowledge in the log-file that it is restarting, and will post a warning message if the energies of the structures reported in the restart file and re-computed by the program are inconsistent. Note that it is - rigorously speaking - only safe to restart the exact same calculation, since the information contained in the restart file will depend on the type of calculation performed. It will often be possible to start MC runs (see DYNAMICS) from a non-MC restart file, however. For the opposite and all other cases, consider using the auxiliary keyword RST_MC2MD. Finally, it should be noted that in dynamics calculations the restart is not fully deterministic (i.e., it deviates from the original run (which is typically unknown) after a few thousand steps, depending on the system). The reasons for this behavior mostly lie in the lack of precision of the data in the restart file.
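
In practice, a restarted run reuses the original key-file with a single change (a sketch):

FMCSC_RESTART 1   # read {basename}.rst and continue from the stored step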

RST_MC2MD

This is a rather specialized keyword meant for the specific case of (re)starting a dynamics run from a restart file generated by an MC run. In this case, the restart file is shorter and only contains atomic positions, the Z-matrix, and whatever else is necessary. When set to 1, this keyword instructs the restart-file reader to assume the MC format even though the run is set to be a dynamics run (see DYNAMICS). Initial velocities are then generated from a Boltzmann distribution using the bath temperature (see TEMP). If this keyword is not set, an attempt to read mismatched restart files will crash the program (most likely in a segmentation fault). This is due to the assumed rigid formatting. The inverse procedure (reading a restart file generated by a dynamics run as the starting point for an MC run) is currently not supported. Note that the typical application for this is to use MC for equilibration of a system and to continue the run using a dynamics sampler. In single-CPU calculations, this simplifies the overall procedure and avoids using the generally low-precision pdb format as an intermediate step (although this can be adjusted with keyword PDB_OUTPUTSTRING). For some replica-exchange runs (see REMC), restart files are actually the only option which allows starting the individual nodes from individual, non-random conformations stored in an input file. The primary application for this keyword therefore probably lies in replica-exchange molecular dynamics runs which use replica-exchange Monte Carlo runs for equilibration purposes.
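
A sketch of the typical use case, i.e., continuing an MC equilibration with a Newtonian MD sampler:

FMCSC_DYNAMICS 2    # dynamics run (see DYNAMICS)
FMCSC_RESTART 1     # restart from file
FMCSC_RST_MC2MD 1   # interpret the restart file in MC format; velocities are drawn at TEMP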

DYNREPORT

This minor keyword is a simple logical which ensures that, in calculations with different temperature-coupling groups, a summary of the corresponding partitioning is provided.

CHECKGRAD

This keyword is a simple logical which instructs CAMPARI to test the gradients for the current calculation given the Hamiltonian, system, and starting structure. It tests Cartesian gradients first, followed by the transformed gradients acting on the internal degrees of freedom (if settings allow that: see CARTINT). It is mostly for developers' usage and creates at most two undocumented output files (NUM_GRAD_TEST_XYZ.dat and NUM_GRAD_TEST_INT.dat). The procedure works by numerically computing gradients using pure energy routines (finite differencing) and juxtaposing them with the analytical solution. It is slow and can sometimes be misleading or uninformative for the following reasons:
  1. For just a single molecule, rigid-body gradients are always net zero (outside of boundary contributions).
  2. The dynamics Hamiltonian must be identical to the MC Hamiltonian (in particular see LREL_MC and LREL_MD).
  3. For Cartesian gradients to be accurate, no strictly torsional space Hamiltonian terms should be used (see for example SC_ZSEC and SC_TOR). For those, Cartesian gradients are circumvented unless CARTINT is 2.
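
A minimal sketch (mostly of interest to developers):

FMCSC_CHECKGRAD 1   # numerically test gradients; writes NUM_GRAD_TEST_XYZ.dat (and possibly NUM_GRAD_TEST_INT.dat)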

UNSAFE

This keyword is a simple logical (default off) which allows selected fatal errors to be transformed into warnings (for example the simulation of systems which are not net-neutral). It should be used with caution (obviously) and the log-output should always be studied meticulously. In addition, enabling unsafe execution may skip some costly sanity checks, e.g., when reading in trajectories in pdb format.

CRLK_MODE

CAMPARI currently provides limited support for dealing with chemical crosslinks, which either create one (or multiple) intramolecular loops or link multiple molecules together. For force-based sampling in Cartesian space only (see CARTINT and DYNAMICS), this functionality matters exclusively for the following reasons:
  1. A chemical crosslink can be thought of as a branch in the main-chain. Such non-linear polymers violate CAMPARI's model of identifying topologically connected sequence neighbors purely based upon primary sequence. Therefore, non-bonded interactions have to be corrected if the two residues in question are crosslinked to each other (to comply with the settings provided via INTERMODEL and ELECMODEL). This is supported by CAMPARI independent of crosslink type (even though currently only disulfide linkages are supported → sequence input).
  2. A single intermolecular crosslink essentially merges two molecules into a single one. However, CAMPARI continues to treat both chains as if they were independent molecules. This has a variety of reasons most of which pertain to the consistency of internal data representation and to the support of internal analysis routines. One area where this is tricky is for simulations in periodic boundary conditions (→ BOUNDARY), as shift vectors are generally applied only to intermolecular contacts. For two crosslinked molecules, this continues to be the case thereby allowing - given a poor simulation system setup - the theoretical possibility of one of the two crosslinked molecules to interact with parts of different images of the other molecule. Trajectory output may also appear confusing for the same reason.
  3. New bonded interactions are created which have to be correctly accounted for. In accordance with the previous point, this implies that distance vectors have to be image-corrected in periodic boundary conditions even for those. For the crosslink to actually be established, it is necessary that the parameter file offer support for the required bond length, angle, and dihedral terms. This is of course true for any topological interaction in a Cartesian treatment. A report can be requested to obtain more information at the beginning of the simulation.
  4. For random initial structures it will be necessary for the crosslink to be satisfied to allow stable integration of the equations of motion. This is elaborated upon elsewhere.
  5. If the ABSINTH implicit solvation model is used (→ SC_IMPSOLV), the crosslink usually modifies two solvation groups (one on each "side") to yield a single new unit. CAMPARI will typically split this group such that the solvation groups may remain associated with their "host residue".
This keyword becomes relevant only if sampling occurs in rigid body / torsional space, either through Monte Carlo- or torsional dynamics-based sampling. There are two modes of operation:
  1. The crosslink is treated as a set of restraints, and the sampler is unaware of its explicit existence.
  2. The crosslink is treated as a set of (hard) constraints and the sampler is adjusted to preserve these constraints. This mode is currently under development and not yet supported.
This means that in order to use mode 1, bonded terms have to be defined in the parameter file which keep the crosslink intact. New internal degrees of freedom are implicitly defined: usually, these will comprise a single bond length, two bond angles, and three dihedral angles (using disulfide bonds as an example: the -S-S- bond, the two -Cβ-S-S- bond angles, the two χ2-torsions in both cysteine sidechains, and the dihedral angle defined by atoms -Cβ-S-S-Cβ-). The former are three non-canonical degrees of freedom in rigid body / torsional space, and this mixing certainly is an approximation defensible for technical reasons only. Beyond that, the sampler is expected to continue to be able to explore conformational space for the new phase space. In detail, this means that sampling in torsional / rigid-body space using force-based algorithms (see CARTINT and DYNAMICS) will be vastly favored over MC sampling (although a hybrid setting may be advantageous). This is so because the MC move set is not adjusted to reflect the restraints and will suffer from very poor acceptance rates. For example, consider an intermolecular crosslink and rigid body moves trying to displace one of the two crosslinked moieties. Alternatively, consider the ineffectiveness of pivot-style moves for residues within the loop in the presence of an intramolecular crosslink.
The latter is the primary reason for supporting mode 2 in the future. Here, the move set will be explicitly adjusted to only allow moves which automatically satisfy the crosslink exactly. For torsional dynamics this option will be less useful as CAMPARI does not possess the capability to enforce high-level loop closure constraints in torsional space and consequently all residues within the loop region would have to be completely constrained for the crosslink to remain intact exactly.
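To make the restraint treatment of mode 1 concrete, here is a conceptual sketch (plain Python; the force constants, equilibrium values, and torsional form below are placeholders chosen for illustration, not recommended or CAMPARI-derived parameters) of the bonded terms that keep a disulfide crosslink intact:

    # Sketch: mode 1 keeps a crosslink intact solely through bonded terms
    # defined in the parameter file (all numbers below are placeholders).
    import math

    def harmonic(value, ref, k):
        return k * (value - ref) ** 2

    def disulfide_restraint_energy(r_ss, ang1, ang2, chi2_a, chi2_b, dih_ss):
        e = harmonic(r_ss, 2.03, 300.0)       # -S-S- bond length (Angstrom)
        e += harmonic(ang1, 104.0, 50.0)      # first -Cb-S-S- angle (degrees)
        e += harmonic(ang2, 104.0, 50.0)      # second -Cb-S-S- angle
        for phi in (chi2_a, chi2_b, dih_ss):  # torsions, e.g., periodic terms
            e += 1.0 * (1.0 + math.cos(math.radians(2.0 * phi)))
        return e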

BIOTYPEPATCHFILE

This simple keyword lets the user provide the location and name of an optional input file that can be used to (re)set the assigned biotypes for specific atoms or groups of related atoms in the system. The corresponding biotype number has to be available (listed) within the parameter file in use. Biotypes are the most fundamental assignment for atoms within CAMPARI and can indirectly set many other properties such as charge, mass, etc. This is explained in detail elsewhere. However, there are parameters not affected by biotype assignment, specifically the default geometries and parameters derived from them. This means that it is generally impossible to, for example, mutate a molecule into a different molecule using such patches. Applications of this type may be more feasible for simulations in Cartesian space.
The main domains of application for biotype patches are twofold. First, they provide the fastest and most convenient route to include parameter support for atoms in residues not supported natively by CAMPARI (→ sequence input). Second, they make it possible to diversify a parameter file regarding natively supported residues, e.g., by maintaining multiple parameterizations for a small molecule or by including extra distinctions for atoms in terminal polymer residues. Biotype patches are applied first and may be largely overridden by successive application of other patches, e.g., atom type patches, charge patches, etc.

MPATCHFILE

This simple keyword allows the user to provide the location and name of an optional input file that can be used to alter the masses of specific atoms in the system (in g/mol). Normally, masses are chosen for atoms based on the assigned atom types in the parameter file, and this behavior can be overridden by this keyword specifically for atomic mass. Note that this is different from changing the atom type of the atom itself, for which a dedicated patch facility is in place. Some more details are given elsewhere.

RPATCHFILE

Similar to keyword MPATCHFILE, this simple keyword allows the user to provide the location and name of an optional input file that can be used to alter specifically the radii of individual atoms in the system (in Å). By default, these radii are inferred either from the assigned atom types, i.e., computed from the Lennard-Jones size parameters, or they are overridden at the level of the parameter file by the "radius" specifications. Because the latter still operate at the resolution of assigned atom types, this keyword offers an atom-specific override facility. Note that there is a distinct hierarchy to this. Specifically, changing the radius via a patch does not change the atom type for that atom. It does, however, alter the default values of parameters that depend on radius, such as maximum SAV fractions or atomic volume reduction factors, which are then again patchable themselves. Furthermore, a radius patch overrides a radius inferred by applying a patch to the Lennard-Jones parameters of a specific atom. Details on the input are given elsewhere.

WL_MODE

The WL method is enabled by specifying the Wang-Landau acceptance criterion for a (partial) Monte Carlo run. This keyword defines the reaction coordinate of choice and the coupled pair to be iterated (see below). Suppose we have an augmented Hamiltonian as follows:

H = K + λE + X(Y)/β

Here, K and E are kinetic and potential energies, β is the inverse temperature, and X(Y) is an unknown function of a selected reaction coordinate. The factor λ can be either 0 or 1. Assuming that the Hamiltonian is separable, expected sampling weights from the Boltzmann distribution for the augmented Hamiltonian are:

w(Y1)/w(Y2) = (pλ(Y1)/ pλ(Y2)) exp[X(Y2)−X(Y1)]

Here, pλ(Y) is the expected probability (usually treated numerically as the integral over a finite interval, i.e., by binning). If λ is 1, it corresponds to the equilibrium (Boltzmann) probability for the original Hamiltonian. Conversely, if it is 0, pλ(Y) corresponds to the density of states (distribution as T→∞). If Y=E, p(E) can be written simply as p(E) = g(E) exp(-λβE), with g(E) being the density of (energy) states. This simple form is not available for other reaction coordinates. The Wang-Landau method's key ingredient is choosing X(Y) such that w(Yi)/w(Yj) = 1 ∀ i,j over an interval of interest. This statement is equivalent to the definition of a flat walk in the space of Y. A flat walk eliminates all barriers in the projected space of Y and should therefore be efficient at exploring phase space (see associated keywords for details on this). The main use of the flatness is as a diagnostic, however, and the Wang-Landau algorithm uses X(Y) and the apparent distribution in Y as a coupled pair to iteratively build up X(Y). If the apparent distribution becomes flat, confidence rises that X(Y) corresponds to the target distribution of interest. The target distribution is set by this keyword:
  1. The target distribution is ln g(E) (arbitrary offset). This is achieved by letting λ be zero and Y=E. This is also the implementation chosen in the original publication. Interest in the density of states comes from the fact that it (theoretically) enables reweighting of the flat-walk ensemble to any condition of interest. This is the default.
  2. The target distribution is ln p(Z) or ln p(Za,Zb) (arbitrary offset), where the Z are geometric reaction coordinates (→ WL_RC) restricted to specific molecules (→ WL_MOL). By letting λ be unity, the target distribution is actually the potential of mean force (PMF) for that (pair of) reaction coordinate(s). Unlike for umbrella sampling (see, e.g., Tutorial 9), it is obtained without further post-processing. This variant was introduced here. As stated, it is possible to estimate a two-dimensional target distribution.
  3. The target distribution is ln p(E) or ln p(E,Z) (arbitrary offset). This is achieved by letting λ be unity and Y=E. In comparison to the first option, this will oversample low likelihood states rather than low degeneracy states. It can be combined with a geometric reaction coordinate (Z) in a two-dimensional approach.
The iteration proceeds by simply incrementing an estimate of X(Y) based on the apparent probability. The increment is lowered successively toward convergence. This corresponds to a hierarchical approach in which coarse features of the target distribution are established first before the fine details are added. The three main issues with the Wang-Landau method are as follows (and much literature has been dedicated to these, which is well beyond the scope of this documentation). First, all Wang-Landau results require snapshot-based reweighting techniques to recover quantities not identical with or trivially related to the target distribution. Reweighting techniques have different error properties than standard sampling techniques due to the appearance of exponential weights. Second, it is difficult and system-dependent to guide and assess the convergence of the method. This is partially a result of fundamental problems (degeneracy in Y) and method-specific issues (see keywords WL_HUFREQ, WL_FLATCHECK, WL_FREEZE, etc). Third, the method suffers from discretization errors in particular at steep boundaries of the target distribution. This often necessitates a rather arbitrary restriction of the interval to establish the flat walk on, and this limitation can easily corrode the confidence in the results (see keywords WL_BINSZ, WL_MAX, WL_EXTEND, etc).
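A bare-bones sketch of this coupled iteration (plain Python; illustration only, with the recurrence criterion and update schedule heavily simplified relative to the keywords discussed below) might look as follows:

    # Sketch: core Wang-Landau loop. X holds the running (logarithmic)
    # estimate of the target distribution, hist the visitation histogram.
    import math, random

    def wl_sketch(step, to_bin, nbins, nsteps, state, f0=1.0):
        X = [0.0] * nbins
        hist = [0] * nbins
        f = f0                                   # current increment (ln f)
        b = to_bin(state)
        for _ in range(nsteps):
            new = step(state)
            bn = to_bin(new)
            # WL acceptance: favor bins with small accumulated X
            if random.random() < math.exp(min(0.0, X[b] - X[bn])):
                state, b = new, bn
            X[b] += f                            # increment at visited bin
            hist[b] += 1
            if min(hist) > 0:                    # crude recurrence criterion
                hist = [0] * nbins
                f *= 0.5                         # reduce ln f
        return X

    # toy usage: a bounded 1D random walk; X converges to a flat-walk bias
    walk = lambda s: min(9.99, max(0.0, s + random.uniform(-1.0, 1.0)))
    X = wl_sketch(walk, lambda s: int(s), 10, 200000, state=5.0)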
A few technical comments are necessary. First, the Wang-Landau acceptance criterion can be combined with a hybrid sampling technique. In such a case, the dynamics segments will propagate the system as usual, but will contribute in no way to the Wang-Landau histograms. They merely serve to evolve the system to find new states that may be hard to access given the Monte Carlo sampler. The MC segments will utilize the Wang-Landau criterion and increment the histograms. As a result, it may be possible that a dynamics segment starts in a high energy state. This may make the integrator unstable initially, and cause unforeseen crashes. Second, Wang-Landau sampling is also supported in parallel runs. For pure Monte Carlo simulations, the MPI averaging technique implies a parallel Wang-Landau implementation, i.e., an implementation in which the histograms are updated globally. Wang-Landau sampling is also supported in conjunction with the replica-exchange method, but here each replica is confined to its own iterative Wang-Landau procedure (since the Hamiltonians are most likely different).

WL_MOL

If a Wang-Landau acceptance criterion is used for a (partial) Monte Carlo run, and if a molecular reaction coordinate was chosen as the histogram to consider (→ WL_MODE), this keyword allows the user to select the molecule that the reaction coordinate is computed on. The numbering of molecules follows the user-selected sequence in sequence input. Note that it is up to the user to ensure that the chosen reaction coordinate is defined and has a meaningful range for the chosen molecule (see WL_MAX, WL_EXTEND, and WL_BINSZ). If a two-dimensional variant with two geometric reaction coordinates is chosen, it is theoretically possible to supply two different molecules here. Note that the effective coupling is likely to be low in this scenario, which may lead to poor convergence properties in the 2D space. In conjunction with WL_MODE being 3, specification of a legal entry for WL_MOL will extend the WL estimation of ln p(E) to a two-dimensional case with an additional, geometric reaction coordinate (ln p(E,Z)). Note that this keyword is the only way to control the dimensionality for WL_MODE being either 2 or 3.

WL_RC

If a Wang-Landau acceptance criterion is used for a (partial) Monte Carlo run, and if a molecular reaction coordinate was chosen as the histogram (or as one or both axes of the 2D histogram) to consider (→ WL_MODE), this keyword allows the user to select amongst a few geometric reaction coordinates as follows:
  1. The molecule's radius of gyration is used (default). The range of this quantity is difficult to predict and depends on the constraints in the system. For example, in Cartesian space, it will be advisable to restrict the range of the histograms (→ WL_MAX and WL_EXTEND) to those values that do not coincide with steric overlap (low end) or stretching of bonds (high end).
  2. The molecule's mean α-content is used as defined for the global secondary structure biasing potential. The quantity always has finite range, but for small systems and typical settings, it exhibits sharp spikes connected by low likelihood regions that may challenge the discretization of the WL scheme.
  3. The molecule's mean β-content is used. See previous option for details and caveats.
Note that it is up to the user to ensure that the chosen reaction coordinate is defined and has a meaningful range for the chosen molecule (see WL_MAX and WL_BINSZ). For two-dimensional cases with two geometric reaction coordinates, the user may specify up to two numbers. Note that the exploration space grows exponentially with dimensionality, with the benefit of resolving degeneracies projected onto the same range of values when considering just one of the two axes at a given time. The total number of relevant bins is a key quantity to keep in mind when directing and assessing convergence. In conjunction with WL_MODE being 3, specification of a legal entry for WL_RC is not sufficient to extend the WL estimation of ln p(E) to a two-dimensional case with an additional, geometric reaction coordinate (ln p(E,Z)). Use keyword WL_MOL for this purpose.

WL_HUFREQ

This is one of the keywords that controls the convergence properties of a Wang-Landau run. The target distribution in question is accumulated as a histogram (always logarithmic), and this keyword sets the frequency (step interval) for updating it with the current value of the f parameter, i.e., the current increment size (equivalent to multiplication by f in the linear space). The accumulation of the target distribution begins only after the equilibration phase has passed. Naturally, a small setting here will quickly increment the histogram, which may accelerate convergence (in case the effective "mobility" of the system defined by system properties and sampling engine is good enough). However, a small setting may also interfere with convergence because it emphasizes the noise in initial estimates of the target distribution (in absolute magnitude), and this may make it harder to refine the guess upon reductions of the f parameter (see WL_HVMODE and WL_FREEZE). The default choice is 10 elementary steps. Note that if the parallel Wang-Landau implementation is used, the step number provided refers to the sampling amount for each individual node.

WL_HVMODE

This is one of the keywords that controls the convergence properties of a Wang-Landau run. It has been argued that the flatness of the accumulated histogram for the target distribution in question (usually tested via some maximum relative deviation criterion) is not generally useful as a criterion for considering a switch to the next stage of refinement (by lowering the f parameter), and can be replaced with a recurrence (minimum visitation) criterion (discussed for example in Zhou and Bhatt). This keyword selects between two different options for such a recurrence criterion. Option 2 requires each (relevant) bin to be visited exactly once in every stage, whereas option 1 mandates that each bin be visited the nearest integer of 1/sqrt(f) times (at least once, though). In the parallel Wang-Landau implementation, the condition will always be checked against the combined data. If the condition is fulfilled, and if the number of post-equilibration Wang-Landau steps exceeds the buffer setting, ln f will be reduced (initial value set by keyword WL_F0) by a factor of 2. Note that the f parameter is implied to operate on a logarithmic scale (same as the target distribution) of counts to avoid numerical issues with large numbers. The rule used here is equivalent to the square root update rule suggested in the original publication. Belardinelli and Pereyra suggest that the exponential update becomes inappropriate for small f, and CAMPARI implements their suggestion to switch over to f ∝ 1/Nsteps, where Nsteps is the current number of WL steps having been executed. In the parallel Wang-Landau implementation, this implies the combined total of WL steps from all replicas. This modified update rule is implemented irrespective of the fulfillment of the criterion defined by WL_HVMODE.
It is useful to keep in mind that option 1 will initially lead to fewer reductions of the f parameter, which may be beneficial for establishing correctness, and at the same time may be harmful for the rate of convergence. Very-low-likelihood bins are an issue that often affects convergence adversely. In this context, it should be emphasized that the relevance of a bin toward defining flatness is partially controlled by keyword WL_FREEZE, which consequently serves two purposes, and partially controlled by the general range settings (WL_MAX, WL_EXTEND, and WL_BINSZ).
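The rules above can be restated compactly as follows (plain Python; a sketch of one plausible reading, not CAMPARI source):

    # Sketch: recurrence criterion and f-parameter schedule.
    import math

    def min_visits(hvmode, f):
        # option 2: every relevant bin visited once per stage;
        # option 1: roughly 1/sqrt(f) visits per bin (at least one)
        return 1 if hvmode == 2 else max(1, round(1.0 / math.sqrt(f)))

    def next_f(f, nsteps, criterion_met):
        # late stage (Belardinelli/Pereyra): f follows 1/Nsteps regardless
        # of the visitation criterion; otherwise ln f is halved on success
        if f <= 1.0 / nsteps:
            return 1.0 / nsteps
        return 0.5 * f if criterion_met else f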

WL_FLATCHECK

If a Wang-Landau acceptance criterion is used for a (partial) Monte Carlo run, this keyword can be used to control the step interval at which the evaluation of the visitation criterion for the temporary histogram is performed. If the parallel Wang-Landau implementation is used, this coincides with the requirement to (at least temporarily) combine the data from all replicas and therefore imposes a communication requirement. Should a check return a positive result, the temporary histogram is added to the overall estimate, the temporary histogram is reset to zero, and the f parameter is altered as described elsewhere. In the parallel version, additional operations are performed to broadcast the new total (combined) histogram identically to all replicas. In case the criterion is not fulfilled, the temporary histogram(s) is (are) left unchanged.
The technical use of this keyword is twofold: First, to reduce communication requirements for the parallel implementation; second, to artificially delay the progression of the iteration. The latter can sometimes be useful for complex systems with strong degeneracy in the chosen reaction coordinate (also see WL_RC). Note that for the parallel code the step number provided refers to the sampling amount for each individual node.

WL_F0

If a Wang-Landau acceptance criterion is used for a (partial) Monte Carlo run, this keyword defines the starting value for the f parameter (logarithmic). The f parameter is meant to decay from some positive number to 0, which corresponds to multiplicative factors larger than 1 reducing to 1 in the linear space. The default is 1.0. The number of reductions of the f parameter by the exponential rule (see elsewhere) is printed to log output. Depending on the properties of the system and the resultant convergence rate, the rule may change as described for WL_HVMODE.

WL_MAX

If a Wang-Landau acceptance criterion is used for a (partial) Monte Carlo run, this keyword sets the (initial) upper bound (given as the bin center of the last bin) of the energy or reaction coordinate histogram (→ WL_MODE and WL_RC). At the beginning, 100 bins of equivalent size are created. Depending on the choice for WL_EXTEND, the histogram and its upper limit may be extended throughout the simulation. It is safe to extend the histogram to values that are impossible to realize for the system in question, since bins that are strictly empty do not meaningfully contribute to the algorithm (see WL_FREEZE). CAMPARI accepts two separate entries for any 2D histogram. Note that the choice for this keyword may be overwritten if a dedicated input file is used to set an initial guess for the target histogram (→ WL_GINITFILE). The maximum value that will not trigger a range exception or an automatic histogram extension is of course the value given here plus half the relevant bin size.

WL_BINSZ

If a Wang-Landau acceptance criterion is used for a (partial) Monte Carlo run, this keyword sets the fixed bin size for the energy or reaction coordinate histogram (→ WL_MODE and WL_RC). At the beginning, 100 bins are created. Depending on the choice for WL_EXTEND, the histogram and its lower and upper limits may be extended throughout the simulation. However, the bin size will remain fixed. CAMPARI accepts two separate entries for any 2D histogram. Note that the histogram bin size and the initial number of bins may be overwritten if a dedicated input file is used to set an initial guess for the target histogram (→ WL_GINITFILE).
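The implied bin layout is simple arithmetic and can be sketched as follows (plain Python; an illustration of the conventions described for WL_MAX and WL_BINSZ):

    # Sketch: initial histogram implied by WL_MAX and WL_BINSZ. WL_MAX is
    # the bin *center* of the last of the (initially) 100 bins, so values
    # up to WL_MAX + WL_BINSZ/2 are still in range.
    def initial_bins(wl_max, wl_binsz, nbins=100):
        centers = [wl_max - (nbins - 1 - i) * wl_binsz for i in range(nbins)]
        in_range_max = wl_max + 0.5 * wl_binsz  # beyond: exception/extension
        return centers, in_range_max

    centers, top = initial_bins(wl_max=0.0, wl_binsz=0.5)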

WL_EXTEND

If a Wang-Landau acceptance criterion is used for a (partial) Monte Carlo run, this keyword controls whether the energy or geometric reaction coordinate histogram (→ WL_MODE) is allowed to grow in range during the simulation. Choices are as follows:
  1. The histogram is fixed. Note that any Wang-Landau simulation performed over a restricted interval bears the danger of generating incorrect results even after reweighting. For common interaction potentials and standard energy-based Wang-Landau sampling, this is particularly true for truncation of the energy histogram on the lower end.
  2. The histogram is allowed to grow only towards lower (more negative) values. This can be useful for energy histograms, where the initial energy range is not known.
  3. The histogram is allowed to grow in both directions. It is strongly recommended not to use this feature for energy histograms with a realistic interaction potential (since the energy is unbounded on the positive side, and memory exceptions / segmentation faults are likely). This option is meant primarily for histograms defined purely on geometric reaction coordinates (→ WL_MODE).
Note that this keyword only controls changes to the physical dimensions of the histogram arrays in memory that are originally set either by keywords WL_BINSZ and WL_MAX (with 100 bins by default) or by file input (arbitrary number of bins). For issues pertaining to the discarding of unpopulated, terminal bins for convergence, see WL_FREEZE.

WL_GINITFILE

If a Wang-Landau acceptance criterion is used for a (partial) Monte Carlo run, this keyword allows the user to replace the default initial guess for the (logarithmic) target distribution with a user-supplied one. The default guess is flat. Supplying a nonflat guess can be useful in several scenarios: i) ongoing refinement of a WL run; ii) cases where a more useful "zero order guess" is available, e.g., an exponentially growing function for a condensed phase system with inverse power potentials; iii) convergence tests. The details regarding the format of this input file are provided elsewhere.

WL_FREEZE

If a Wang-Landau acceptance criterion is used for a (partial) Monte Carlo run, this keyword controls whether the range of bins in the energy or reaction coordinate histogram (→ WL_MODE) that is considered for proceeding to the next iteration stage (updating the value of the f-parameter) is fixed after the first such update or not. The update procedure is described for keywords WL_HUFREQ, WL_HVMODE, and WL_FLATCHECK.
Any positive integer specified here will prescribe a minimum number of preliminary simulation steps beyond equilibration that must be exceeded before an update of the f-parameter is considered. After such an update, the range of bins considered for the histograms is the continuous one (and it must be continuous on account of the update rule) currently populated. If during further simulation steps additional bins were to be visited, those moves are instead considered as range exceptions and are rejected (the summary statistics provided in log-output for range exceptions can therefore contain results from two different contributions → WL_EXTEND). Any negative number provided will specify by its absolute value the aforementioned minimum number of preliminary steps in identical fashion. However, in this case, CAMPARI is instructed to allow further bins to be added for consideration during later stages of the algorithm. Note that this violates the refinement idea behind the Wang-Landau scheme, and can lead to severe convergence problems due to the numerical mismatch created by the extra bin "missing out" on f-increments during early stages of the algorithm. It is therefore strongly recommended to choose a relatively large and positive number for this keyword (to ensure that appropriate coverage of the accessible range has been reached).
Note that if the parallel Wang-Landau implementation is used, the step number provided refers to sampling amount for each individual node.

WL_DEBUG

If a Wang-Landau acceptance criterion is used for a (partial) Monte Carlo run, this simple logical allows the user to request debugging information regarding the Wang-Landau iterative algorithm. If turned on, CAMPARI will report in log-output the progression through the various updating stages and may - depending on settings - also write temporary output files for the relevant histograms.



Box Settings:


(back to top)

BOUNDARY

Every simulation has to occur within an explicitly or implicitly defined volume. CAMPARI presently supports three ways of defining this volume, listed below. For constant volume ensembles (→ ENSEMBLE), the volume naturally remains exactly constant throughout the simulation. Regarding the type of boundary condition to use, there are currently three available choices and one obsolete one:
  1. Periodic boundary conditions (PBC):
    This is the most commonly used boundary condition in molecular simulations. Here, the polyhedral simulation cell is assumed to be replicated as a - theoretically infinite - periodic system around the central one (which constitutes the actual, physical simulation container). The implementation is such that all distance calculations are amended by determining the smallest distance amongst those between a particle and any of the replicated images of another particle. This so-called minimum image convention implies that for normal pairwise interaction potentials (for example SC_IPP) a particle only interacts with at most one "version" of another particle, never two or more. The idea of PBC is borrowed from crystals in which the assumption of periodicity is justified given that the simulation volume can be chosen such that it coincides with the crystal's unit cell (or exact multiples thereof). Conversely, in liquids there is no persistent long-range order (homogeneous density, no pair correlations), and the approximation of a system of thermodynamic size by infinite replication of a nanoscopic system is at least questionable. Given typical cutoff schemes, however, the contribution of longer-range interactions is often exactly zero unless explicit techniques are used enumerating the periodic sum (→ Ewald summation). This means that the actual impact of PBC is often just to mimic a continuous environment for particles close to the edge of the physical simulation volume. Note that no real-space interaction cutoff should exceed half the shortest linear dimension realizable in the simulation volume since otherwise it becomes possible for multiple images of the same particle to be within interaction distance. In conjunction with the minimum image convention cited above, this invariably leads to artefactual results (reference). Note that in CAMPARI the convention of using the nearest image operates at the molecule level, i.e., the general rule is that intramolecular distances always refer to atoms in the same image of a molecule. CAMPARI will occasionally warn users about cases where an image interaction would be within the cutoff distance, but these warnings are not part of all routines (for efficiency reasons). Enabling box-consistent trajectory output may help in diagnosing such issues independently. A schematic sketch of the minimum image distance calculation is given after this list.
  2. Hard-wall boundary condition (HWBC):
    This option is obsolete and cannot be selected. It may be reactivated in the future to enable simulations in containers with hard, particle momentum-conserving (i.e., reflective) walls.
  3. Residue-based soft-wall boundary condition (RSWBC):
    In simulations employing a continuum description of solvent, the resultant density is almost always low, in particular in the limit of simulating just a single macromolecule. In those cases, it may neither be meaningful nor beneficial to introduce additional replicas of the simulation cell. For such a scenario, CAMPARI offers the definition of a system volume via a soft wall. Here, the simulated particles are prevented from leaving a simulation container (most often a spherical droplet) by an applied boundary potential modeled as follows.
    Spherical case:

    EBNDSphere = Σi kBND·H(ri-rD)·(ri-rD)2

    Here, ri is the distance from a suitable reference point on residue i to the simulation sphere's origin, rD is the sphere's radius, kBND is the force constant and H(x) is the Heaviside step function.
    Rectangular box case:

    EBNDBox = Σi Σj=1..3 kBND·H(|di,j|-Lj/2)·(|di,j|-Lj/2)2

    Here, di,j is the jth element of the distance vector of the reference point on residue i to the center point of the box (note that by convention the lower left corner serves as origin of the box), and the Lj are the side lengths.
    In general, hard-wall boundaries may be approximated by letting kBND → ∞. This will deteriorate integrator stability in gradient-based simulations, however. Choosing a RSWBC means that the boundary penalty is imposed on the reference atom of each residue (for peptide residues this is always Cα). This can lead to boundary artifacts with parts of large residues sticking out of the sphere and hence being deprived of interactions with smaller residues. Additionally, it must be pointed out that soft-wall boundary conditions lead to somewhat ill-defined system volumes since the code assumes the fixed volume inside the boundary to be the system volume whereas realistically it should be slightly extended depending on temperature and stiffness. The latter is not easily computed, however, since 1) the purely kinetic (entropic) pressure may be altered by the presence of non-rigid molecules, and 2) the virial pressure is generally unaccounted for. Hence, an exact volume is only recovered in the limit of an infinitely stiff boundary (HWBC). A schematic sketch of the spherical penalty term is likewise given after this list.
  4. Atom-based soft-wall boundary condition (ASWBC):
    This option is analogous to the previous (RSWBC) option only that the boundary term is computed for each atom in the system (formulas are not repeated). This will minimize artifacts of the aforementioned type, but it is also the most expensive droplet BC to compute. Because multiple atoms will contribute to the boundary penalty for each residue, it is generally recommended to use smaller force constants than for the RSWBC.
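To make the two main options concrete, the following sketch (plain Python, not CAMPARI source; function names and the choice of per-residue reference points are illustrative assumptions) shows a minimum image distance in a rectangular periodic cell and the residue-based spherical soft-wall penalty defined above:

    # Sketch: minimum image convention in a rectangular PBC cell.
    import math

    def minimum_image_dist(p1, p2, box):
        d2 = 0.0
        for k in range(3):
            d = p2[k] - p1[k]
            d -= box[k] * round(d / box[k])   # wrap into [-L/2, L/2)
            d2 += d * d
        return math.sqrt(d2)

    # Sketch: RSWBC penalty for a spherical droplet centered at the origin.
    def soft_wall_sphere(ref_points, r_sphere, k_bnd):
        e = 0.0
        for p in ref_points:                  # one reference point per residue
            ri = math.sqrt(sum(c * c for c in p))
            if ri > r_sphere:                 # Heaviside step H(ri - rD)
                e += k_bnd * (ri - r_sphere) ** 2
        return e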

SHAPE

This keyword lets the user specify the shape of the simulation container the system is enclosed in. At the moment, there are limited choices as follows:
  1. Rectangular cuboid (= rectangular parallelepiped)
  2. Sphere
All simulations in PBC have to use rectangular cuboids (extensions to general triclinic boxes may be considered in the future) with full 3D periodicity. Conversely, simulations in droplet BCs can use either option. Note that the geometry of a finite (nonperiodic) cuboid is fundamentally mismatched with the radial nature of most nonbonded interactions.

ORIGIN

This keyword lets the user set the origin of the simulation system as a vector of three elements (xyz). The reference point depends on the container's shape and is its origin for a sphere and its lower left corner for a cuboid. Note that for simulations started from "scratch" (no structural input), this keyword is mostly irrelevant. There are two things to consider, though:
  1. Structural output may be compromised if values far away from zero are used. This is because binary trajectory files and in particular the strictly formatted PDB-files have finite representation widths and fixed units (Å or nm). It is therefore recommended to adjust this keyword such that the minimum and maximum values for Cartesian coordinates (largest dimension) are symmetric around the origin of the coordinate system.
  2. If structural input is used, it is strongly recommended to match the settings for ORIGIN to those implied in whatever structural input is provided. In droplet BCs, it may otherwise occur that parts of the system overlap with the ill-placed droplet boundary such that their internal arrangement is destroyed or the simulation explodes during the first few steps.
Note that the input vector is simply specified as three blank-separated floating point numbers.
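For the cuboid case, the recommendation above amounts to nothing more than the following arithmetic (a sketch under the stated convention that the lower left corner is the reference point):

    # Sketch: choose ORIGIN such that the box is centered on the origin of
    # the coordinate system (cuboid case).
    def centered_origin(side_lengths):
        return [-0.5 * L for L in side_lengths]

    print(centered_origin([60.0, 60.0, 90.0]))  # -> [-30.0, -30.0, -45.0]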

SIZE

This keyword allows the user to define the size of the simulation container. Depending on its shape, SIZE takes on alternative meanings. If the system volume is spherical, just one real number is needed that specifies the sphere's radius. Conversely, if the container is a rectangular cuboid, a vector of three floating-point numbers is read in that specifies the three side lengths of the cuboid in the x, y, and z-directions, respectively. Note that highly asymmetric boxes often place very stringent settings on cutoffs since it is generally the shortest dimension that matters.

SOFTWALL

This keyword sets the harmonic force constant both for the residue-based and the atom-based SWBCs (see BOUNDARY). It is to be provided in units of kcal·mol-1·Å-2 and corresponds to parameter kBND in the equations above.



Integrator Controls (MD/BD/LD/Minimization):


(back to top)

TIMESTEP

If any dynamics-based (including hybrid methods of course) method is used, this keyword lets the user set the integration time step for the integrator in units of ps.

CARTINT

This keyword determines - at a very fundamental level - the choice of degrees of freedom that CAMPARI shall sample. The "native" CAMPARI degrees of freedom are the rigid-body coordinates of all molecules and a subset of internal coordinates (almost exclusively freely rotatable dihedral angles). This option is the default and specified by choosing 1 for this keyword. Alternatively, the Cartesian positions of all atoms in the system may serve as the underlying degrees of freedom as is commonly the case in molecular dynamics calculations (option 2). There are several very important limitations and considerations that are mentioned throughout the documentation and reiterated here.
  1. CAMPARI does not support the direct sampling of Cartesian degrees of freedom in Monte Carlo simulations. This applies to the MC portion of hybrid simulations as well. While it is trivial to design and implement simple move sets doing precisely that, their efficiency is negligible due to the large amount of motional correlation present between an atom and its immediate molecular environment.
  2. Internal space simulations do not require the full amount of bonded interaction parameters that are typically part of molecular mechanics force fields, specifically no bond length terms, and typically no or very few improper dihedral and bond angle terms (→ PARAMETERS).
  3. For freely rotatable dihedral angles, there is a distinction between those deemed important vs. those deemed unimportant. Details are listed in the documentation for providing sequence input. These choices generally pertain to methyl groups and/or to bonds describing electronically hindered rotations with identical groups. The resultant sets of degrees of freedom are not always entirely consistent (e.g., between polypeptide sidechains and their respective small molecule model compounds). Related keywords are OTHERFREQ (MC) and TMD_UNKMODE (dynamics).
  4. While unsupported residues pose no problems in the setup of Cartesian coordinates, internal coordinate space simulations need to infer which dihedral angles are rotatable from the input topology. This happens automatically and is described elsewhere. For eligible dihedral angles not identified with standard polypeptide or polynucleotide backbone angles, relevant keywords are again OTHERFREQ (MC) and TMD_UNKMODE (dynamics).
  5. The choice of degrees of freedom in internal coordinate space simulations can be customized rather flexibly by introducing additional constraints (see corresponding input file). For MC simulations, the preferential sampling utility offers an additional level of control.
  6. Conversely, algorithms to enforce holonomic constraints in Cartesian space simulations are often limited to weakly coupled constraints (see SHAKEMETHOD for details). This means that it is not (yet) possible to mimic torsional space constraints in a Cartesian space run but that it is possible to follow a typical MD protocol by simulating a flexible macromolecule with some bond length constraints in a bath of rigid water molecules.
  7. The existence of virtual sites (effectively atoms with no mass) poses stringent requirements to Cartesian dynamics, in that those sites have to be constrained exactly relative to real atoms. At each integration time step, the forces acting on these sites are transferred to the surrounding atoms, and their positions are rebuilt post facto (see elsewhere for more details). Virtual sites in internal coordinate space simulations can only cause issues if a degree of freedom's effective mass depends solely on such sites. Then, CAMPARI will automatically freeze the corresponding degree of freedom.

TSTAT

This keyword lets the user choose the thermostat to be used to generate an NVT (or NVT-like) ensemble in dynamics simulations using a Newtonian formalism (option 2 or 5 in DYNAMICS). Currently, three options are fully supported:
  1. Berendsen weak-coupling scheme (reference):
    This is a deterministic and global velocity rescaling scheme which creates an exponential relaxation toward the target temperature. The velocity rescaling factor is computed for each coupling group (see TSTAT_FILE) according to:
    fv,i2 = 1.0 + (δt/τT)·[ (Ttarget/Ti) - 1.0 ]
    As is apparent, whenever the instantaneous group temperature (Ti) matches the ensemble target (Ttarget), velocities are not rescaled (fv,i is unity). Any deviations from Ttarget will lead to a systematic re-scaling of all velocities that are part of the coupling group toward the target with a relaxation time of τT (→ TSTAT_TAU). If τT approaches the discrete time step (δt), the relaxation becomes instantaneous. Note that the coupling of subparts of the system to essentially different thermostats is an obsolete method used in the early days of simulations to prevent obscure freezing events sometimes encountered when the system is effectively partitioned into subsystems with very different levels of integrator stability, noise, and inherent relaxation. In such cases this approach may circumvent the most dramatic pitfalls resulting from the inherent incorrectness of the weak-coupling scheme (and mask said incorrectness in the process). It is crucially important to realize that the Berendsen thermostat does not generate a well-defined ensemble and that the method only relaxes "safely" to the microcanonical one for τT approaching infinity. The quenched fluctuations observed in the Berendsen scheme may severely distort results on fluctuation-sensitive computations such as free energy growth calculations (see GHOST). A numerical sketch of this rescaling factor (together with that of the Bussi et al. scheme below) is given after this list.
  2. Andersen scheme (reference):
    The Andersen thermostat is a stochastic thermostat which introduces "collisions" re-randomizing the velocity associated with a given degree of freedom to one drawn from the ensemble at the given temperature. This method is shown to sample the canonical ensemble and is one of the recommended options for any calculation sensitive to the details of ensemble fluctuations. Implementation-wise, it works by re-assigning the velocity for each degree of freedom at each time step with a probability equivalent to δt/τT. This effectively gives rise to a "bath"-induced relaxation over a timescale τT. Note that some prior implementations in other software packages may have synchronized the application of these velocity resets. This is not the case in CAMPARI where each degree of freedom is treated independently (whichever those may be → CARTINT). Much like in Langevin dynamics, a concern here can be the artificial loss of velocity correlations between multiple particles which may slow down large-scale dynamics.
  3. Extended ensemble methods:
    Methods such as those by Nosé-Hoover, Martyna-Tobias-Klein, or Stern are currently not supported, but may be in the future. They often show poor relaxation behavior due to coupled oscillations in particular in the NPT ensemble, which they are most useful for.
  4. Bussi et al. scheme (reference):
    This thermostat can be thought of as a hybrid approach of the Nosé-Hoover and Berendsen thermostats. It preserves the exponential relaxation kinetics of the weak-coupling scheme if the ensemble target is far away but introduces fluctuations to the kinetic energy such that at equilibrium the global re-scaling does not quench fluctuations. The implementation is that of evolving the kinetic energy via an auxiliary stochastic dynamics much like the Langevin piston for pressure coupling does. Here:
    fv,i2 = exp(-δt/τT) + fT,i·(1 - exp(-δt/τT))·(R12 + RΓ,Nf,i-1) + 2·exp(-δt/(2τT))·R1·[ fT,i·(1 - exp(-δt/τT)) ]0.5
    With:
    fT,i = (Ttarget/Ti)/Nf,i
    Here, Nf,i is the number of degrees of freedom in the respective coupling group, R1 is a normal random number with mean of zero and unit variance, and RΓ,Nf,i-1 is a random number drawn from the gamma distribution with a scale factor of 2.0 and a shape of (Nf,i-1)/2.
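The two rescaling factors defined above translate into the following numerical sketch (plain Python; illustration only, assuming Nf,i > 1 so that the gamma variate is defined):

    # Sketch: velocity rescaling factors for the Berendsen (option 1) and
    # Bussi et al. (option 4) schemes; dt and tau in ps, temperatures in K.
    import math, random

    def berendsen_factor(Ti, Ttarget, dt, tau):
        return math.sqrt(1.0 + (dt / tau) * (Ttarget / Ti - 1.0))

    def bussi_factor(Ti, Ttarget, dt, tau, nf):
        c = math.exp(-dt / tau)
        ft = (Ttarget / Ti) / nf
        r1 = random.gauss(0.0, 1.0)
        # gamma variate replaces the sum of (nf-1) squared normal numbers
        rg = random.gammavariate(0.5 * (nf - 1), 2.0)
        fsq = (c + ft * (1.0 - c) * (r1 * r1 + rg)
               + 2.0 * math.exp(-0.5 * dt / tau) * r1
               * math.sqrt(ft * (1.0 - c)))
        return math.sqrt(fsq)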
The user is reminded again that this keyword is ignored for all simulations not relying on explicit thermostats (like Monte Carlo or stochastic dynamics runs → DYNAMICS).

TSTAT_TAU

If the simulation is performed in the NVT ensemble and if Newtonian dynamics are used, this keyword allows the user to set the key parameter of the employed thermostat, i.e., its coupling (decay) time, τT, in units of ps (the default is 1.0 ps). Note that it is really the ratio of the time step δt (see TIMESTEP) and this number that matters; hence, TSTAT_TAU cannot be less than the integration time step.

TSTAT_FILE

If the simulation is performed in the NVT ensemble and if Newtonian dynamics are used, this keyword sets the name and location of an optional input file for defining thermostat coupling groups. These are meaningful only if the Berendsen weak-coupling or the Bussi et al. scheme is used (options 1 or 4 for TSTAT). For details, the user is referred to the description of the input file itself.

SYSFRZ

This keyword controls the removal of net drift artifacts in dynamics runs (which are primarily relevant for fully ballistic MD). Predominantly in periodic boundary conditions (see BOUNDARY), it can happen that all kinetic energy is transferred into global translations or rotations of the system. This collective "degree of freedom" is typically friction-free and therefore represents a stable trap for the system's kinetic energy to accumulate in. Such behavior will give rise to grossly misleading results (the effective ensemble sampled has a much lower temperature). This can be avoided by periodically removing such global motions. For translational displacements, this is easy, but for rotational motion problems arise if subensembles have access to modes that are quasi friction-free themselves. This is often the case in mixed rigid-body/torsional dynamics and at the moment not dealt with properly.
Choices are as follows:
  1. No removal of global motions is performed (the safest setting for most applications).
  2. CAMPARI will attempt to only remove translational motion of the system.
  3. CAMPARI will attempt to remove both global translation and global rotation (this option should be used with caution and is also automatically disabled if certain types of constraints are in use).
Note that CAMPARI writes drift values (fractions of kinetic energy consumed by global motions) to the log output in dynamics calculations. This output can be used to diagnose whether drift removal is necessary and/or working. Importantly, these are the values before application of the instantaneous correction, i.e., they will not typically be completely negligible (this depends on the dynamics equation in use, temperature coupling, the directionality of forces, etc.).
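Removal of net translation (option 2) is the simple operation sketched below (plain Python; illustration only):

    # Sketch: subtract the mass-weighted mean (center-of-mass) velocity
    # from every particle; rotation removal (option 3) is more involved.
    def remove_com_velocity(masses, velocities):
        mtot = sum(masses)
        for k in range(3):
            vcom = sum(m * v[k] for m, v in zip(masses, velocities)) / mtot
            for v in velocities:
                v[k] -= vcom

    vels = [[0.1, 0.0, -0.2], [0.0, 0.3, 0.1]]
    remove_com_velocity([18.0, 18.0], vels)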

TMD_INTEGRATOR

If a simulation is performed in mixed torsional/rigid-body space that contains a Newtonian dynamics portion, then this keyword allows the user to choose between (currently) two basic integrator variants. All integrators are derived from the following discrete scheme that relies on the aforementioned assumptions, i.e., a diagonal mass matrix (equations of motion formally decoupled) and the accuracy/correctness of the total kinetic energy expressed in terms of this diagonal mass matrix. Then, we can define pseudo-symplectic conditions as shown below for a (rotational) degree of freedom with index k:

Ik(t2)·ωk(t2)2 - Ik(t1)·ωk(t1)2 - δt·[ ωk(t1) + ωk(t2) ]·Fk(t1.5) = 0

Here, δt is the integration time step, Ik denotes the diagonal element of the mass matrix for the kth degree of freedom (function of time), ωk is the associated angular velocity, and Fk denotes the deterministic force projected onto this degree of freedom (torque). The projection yielding the torques and the mass matrix elements are computed with recursive schemes, i.e., they operate in linear time with the number of atoms in the molecule (more or less irrespective of how many rotatable bonds there are). More information on this recursive scheme can be obtained indirectly with the help of keyword TMDREPORT. The above scheme defines a quadratic equation that has a maximum of two solutions for ωk(t2) (formula omitted). The correct one must be picked (which may be difficult), and an alternative must be defined if no solutions are possible. For both purposes, we use a well-defined approximation to the full solution that yields:

ωk(t2) ≈ [Ik(t1)/Ik(t2)]1/2 ωk(t1) + δt Fk(t1.5)/Ik(t2)

This solution is always available and can be used to pick the correct solution amongst two alternatives for the full quadratic equation (simply as the closer one). The setting for TMD_INTEGRATOR determines whether the correct solution to the quadratic equation should be used whenever possible (option 1), or whether the approximation is used exclusively (option 2, which is the default for historical reasons). As written, the equations still contain the problem that they require knowledge of Ik(t2), whereas only the half-step mass matrix elements (which are structural quantities) are available in a typical leapfrog scheme. If the Ik are slowly varying functions in time, a simple approximation solving this problem is to allow a lag of half a time step:

ωk(t2) ≈ [Ik(t0.5)/Ik(t1.5)]1/2 ωk(t1) + δt Fk(t1.5)/Ik(t1.5)

This is again written for the approximative version (setting 2). The resultant leapfrog integrator is extremely simple and efficient, and it is obtained by setting the related keyword TMD_INT2UP to 0. However, at each integration time step, we can also take a half-step guess using a similar approximation to obtain a value for all the Ik(t2). This is done by explicitly perturbing the coordinates and recomputing just the mass matrix elements (little additional cost for all but tiny or trivial systems). With the values obtained, we can integrate the second equation above as written (this is obtained by setting TMD_INT2UP to 1). While theoretically more accurate, this variant can be noisy due to the extrapolation of the masses. In practice, for systems with very small and quickly varying Ik (such as rigid water molecules), performance is similar for all four pairings (TMD_INTEGRATOR 1 or 2, TMD_INT2UP 0 or 1), which suggests that additional corrections are recommended if the rate of change of the Ik is high (see below). Conversely, if the rate of change is negligible, all possible settings obtainable by combinations of the two keywords mentioned here relax to the exact same integrator (standard leap-frog in rotational space). This covers the special case of linear (translational) degrees of freedom, which have constant mass.
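The approximative, half-step-lagged update (TMD_INTEGRATOR 2 with TMD_INT2UP 0) is then a one-line operation per degree of freedom, sketched here for clarity (plain Python; illustration only):

    # Sketch: approximate leapfrog update for one rotational degree of
    # freedom with half-step-lagged mass matrix elements.
    def omega_update(omega_t1, I_t05, I_t15, F_t15, dt):
        return (I_t05 / I_t15) ** 0.5 * omega_t1 + dt * F_t15 / I_t15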
Note that this keyword is currently irrelevant for stochastic dynamics (always uses a derivation analogous to the last equation above), but that it is relevant for the stochastic minimizer. Another crucial keyword relevant to TMD integrators in general is ALIGN. Specifically for the Newtonian case, coupling parameters become relevant as well (TSTAT and TSTAT_TAU in particular).

TMD_INT2UP

If a simulation is performed in mixed torsional/rigid-body space that contains a Newtonian dynamics portion, then this keyword allows the user to control the number of incremental velocity update steps used to improve integrator stability for cases with quickly varying elements of the mass matrix (see above). The cases of 0 and 1 have already been covered in the documentation on TMD_INTEGRATOR. The remaining options assume that values for the diagonal elements of the mass matrix at times t1, t1.5, and t2 are available explicitly (as in: computed directly from coordinates) when trying to compute the updated angular velocity for a degree of freedom at time t2. Rather than solving the velocity update in one step, the interval from t1 to t2 is instead divided into TMD_INT2UP subintervals, and the velocity is updated incrementally for each subinterval. If TMD_INT2UP is larger than 2, additional values are obtained by linearly interpolating between the explicit values at the three times. This is why it is recommended to set this keyword to multiples of 2, and this is also why the added benefit becomes successively smaller. A recommended value is 4. Note that this only matters for velocity updates, and that the torque is assumed constant over the entire interval (Fk(t1.5) above). As a result, this option does not notably alter speed for a system of appreciable size and is not at all equivalent to a change in integration time step.
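One plausible reading of this incremental scheme is sketched below (plain Python; not CAMPARI source, and the interpolation details are assumptions made for illustration):

    # Sketch: split the velocity update from t1 to t2 into n subintervals
    # with piecewise-linearly interpolated mass matrix elements; the torque
    # F(t1.5) is held constant throughout.
    def omega_update_incremental(omega, I_t1, I_t15, I_t2, F_t15, dt, n):
        def I_at(x):                 # x in [0,1] across t1..t2, via t1.5
            if x <= 0.5:
                return I_t1 + 2.0 * x * (I_t15 - I_t1)
            return I_t15 + 2.0 * (x - 0.5) * (I_t2 - I_t15)
        h = dt / n
        for i in range(n):
            Ia, Ib = I_at(i / n), I_at((i + 1) / n)
            omega = (Ia / Ib) ** 0.5 * omega + h * F_t15 / Ib
        return omega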

TMD_UNKMODE

If a simulation is performed in mixed torsional/rigid-body space with a gradient-based sampler (including minimization), then this keyword controls default constraints operating on certain rotatable dihedral angles. A second function of this keyword occurs in structural clustering using a distance function based on dihedral angles (see end of description). As described for sequence input, there is a selection of "native" CAMPARI torsional degrees of freedom that does not include every rotatable dihedral angle in natively supported residues, and for obvious reasons does not include any degrees of freedom within unsupported residues. This keyword therefore controls how to deal with these two categories of additional degrees of freedom. Options are as follows:
  1. Only native CAMPARI degrees of freedom are sampled. This will leave any unsupported residues and molecules completely rigid.
  2. In addition to native CAMPARI degrees of freedom, all identified degrees of freedom in unsupported residues and molecules will be sampled.
  3. In addition to native CAMPARI degrees of freedom, all torsional degrees of freedom in natively supported residues, which are frozen by default, are sampled. This will leave any unsupported residues and molecules completely rigid.
  4. All aforementioned classes of degrees of freedom are sampled.
This keyword has a related set of keywords in Monte Carlo simulations (→ OTHERFREQ etc). Note that the keyword is only relevant if either additional set of degrees of freedom is nonempty, e.g., simulations of a water/methanol mixture are not affected by different settings for TMD_UNKMODE. At the residue level, additional control is offered by virtue of user-requested constraints. The combined information on which degrees of freedom are free and which ones are frozen is obtained with keyword TMDREPORT.
In the context of structural clustering, this keyword co-controls which dihedral angles are eligible as dimensions for a distance function. This applies to cases where a custom request is made through the appropriate input file and to cases where the full dimensionality is meant to be used. Further information is provided elsewhere.

TMDREPORT

This simple logical keyword enables the printing out of information regarding internal degrees of freedom (rigid body, torsional). This file is particularly useful for constructing input for a specific input mode for custom constraints. For every molecule it lists the index of the first atom in that molecule ("Ref."), the total number of atoms ("Atoms"), the total mass ("Mass") after applying all patches, and, if a gradient-based sampler in torsional space is in use, information on whether the (up to) 6 rigid body degrees of freedom are frozen or not ("Frozen"). The order is translation in x, y, and z followed by rotation around x, y, and z axes.
The output on the dihedral angles in the molecule provides the following information. For each atom that, in the Z matrix, corresponds to the definition of a relevant (rotatable) dihedral angle, the structure of the rotation list setup is provided. Specifically, the number of rotating (swiveling) atoms ("Rotat.") is printed out along with their total mass ("Mass"). It is specified how many of the rotating atoms are unique ("Unique") with respect to that atom's rotation list a level above in the hierarchy that contains the rotation list for the present atom entirely ("Parent"). The hierarchy is understood by considering the polymer as a branched chain with a number of tips and a base of motion. This base of motion is defined by keyword ALIGN. Degrees of freedom at the tip have minimal rank (starting at 0) whereas those near the base have maximal rank in the hierarchy ("Rank"). The hierarchy necessitates a particular sequence of processing the individual degrees of freedom ("Order"). The report also provides information on the chemical elements ("Ele.") of the 4 constituent atoms for the dihedral angle (the atom defining the dihedral angle comes last) and on whether the degree of freedom is frozen in torsional space molecular dynamics ("Frozen"). The last bit of information is available only if a gradient-based sampler in torsional space is in use. The report is available irrespective of the type of calculation being performed. Note that keyword ALIGN, while conceptually controlling the same thing, is implemented differently in Monte Carlo moves. This means that most of the columns are representative for the Monte Carlo part only if ALIGN is set to 1. In hybrid samplers, the dynamics portion takes precedence.

SHAKESET

It is standard practice in molecular dynamics simulations in Cartesian space to employ holonomic constraints such that the system evolves according to Gauss's principle of least constraint. The reader is referred to the literature as to what exactly constitutes a time-reversible, symplectic integrator if holonomic constraints are enforced. In general, it will be possible to formulate an algorithm that is at least drift-free, has some target precision for the constraints, and is approximately symplectic when the microcanonical ensemble is in use.
The idea behind holonomic constraints in molecular dynamics is to eliminate fast vibrational modes in the system to allow for a larger integration time step to be used. This keyword allows the user different choices for which holonomic constraints to employ as follows:
  1. No holonomic constraints are used.
  2. All "native" bonds to terminal atoms with a mass of less than 3.5 a.m.u. are constrained in length. A terminal atom is defined as any atom bound to exactly one other atom. "Native" means that only bonds consistent with the assumed molecular topology (code-internal) are considered. This selection will usually constrain all bonds to hydrogen atoms.
  3. All "native" bonds of any type are constrained in length. This does include bonds formed by virtue of chemical crosslinks.
  4. All "native" bonds of any type are constrained in length as in mode 3. In addition, several bond angles are constrained explicitly. For a molecule free of rings of size 6 or less, all bond angles are constrained (this also constrains improper dihedral angles at trigonal centers). For molecules with rings of size 6 or less, ring-internal bond angles are generally omitted. Note that more bond angles can be formulated at a tetrahedral site than are needed as constraints, and that - system-dependent - redundant constraints may be created (which may be harmful). This option is only supported for the standard SHAKE constraint algorithm at the moment.
  5. This is nearly identical to option 4. However, bond angles are constrained by additional distance constraints rather than explicitly. This means this option is theoretically available for constraint algorithms other than SHAKE.
  6. An input file is read and used to derive the list of constraints. Note that it is possible to derive intra- and intermolecular long-distance constraints that way (geometric information will be taken from the starting structure), but that those will very easily cause CAMPARI to crash.
After assembling the initial list, CAMPARI will parse this list and identify all sets of constraints that are coupled; two constraints are coupled if they share any atom (see the sketch below). Choices for how the actual values for the constrained distances and angles are set are provided by keyword SHAKEFROM. Groups of coupled constraints have to be solved simultaneously using some numerical scheme, and analytical solutions are generally not available. An individual uncoupled constraint can always be solved analytically and poses no particular difficulties. A specific case are the three coupled constraints in rigid water molecules, for which the analytical SETTLE algorithm has been derived. In these and similar cases, CAMPARI will substitute the numerical scheme with the analytical version for reasons of efficiency and accuracy, regardless of specifications otherwise. However, in most cases, the constraint set specified via this keyword will not make water molecules rigid per se, and for this case a dedicated keyword exists (→ SETTLEH2O). Note that the use of virtual sites (sites with no mass) is only permissible if they have rigid geometry and if their positions can be constructed from real atoms with smaller index values. Choices for this keyword and the auxiliary keywords SETTLEH2O and SHAKEFILE will have to reflect this requirement.
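To illustrate the grouping step, the following minimal Python sketch (not CAMPARI code; the input list is hypothetical) partitions a set of distance constraints into coupled groups using a union-find structure, with two constraints coupled whenever they share an atom:

  # Minimal sketch: group constraints (tuples of atom indices) that share atoms.
  def coupled_groups(constraints):
      parent = {}
      def find(x):
          while parent.setdefault(x, x) != x:
              parent[x] = parent[parent[x]]    # path halving
              x = parent[x]
          return x
      def union(x, y):
          parent[find(x)] = find(y)
      for con in constraints:                  # join all atoms within a constraint
          for a in con[1:]:
              union(con[0], a)
      groups = {}
      for k, con in enumerate(constraints):    # constraints sharing a root couple
          groups.setdefault(find(con[0]), []).append(k)
      return list(groups.values())

  # A rigid 3-site water (atoms 0, 1, 2) yields one group of three constraints:
  print(coupled_groups([(0, 1), (0, 2), (1, 2), (3, 4)]))  # -> [[0, 1, 2], [3]]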
The cost, accuracy, and applicability of constraint algorithms all scale poorly with the level of coupling. Options 4 and 5 from the list above will therefore be usable only in special cases (→ SHAKEMETHOD) such as systems without any rings or planar, trigonal centers. For specific applications using angle constraints, we strongly recommend defining a minimum set of distance-based constraints via option 6 above. This has the best chance to succeed.

SHAKEFILE

If SHAKESET is set to 6, this keyword specifies the name and location of the file defining user-selected holonomic constraints to be enforced during the simulation. Its format and requirements are documented elsewhere.

SETTLEH2O

This keyword allows the user to modify the constraint set selected via SHAKESET by replacing all preexisting constraints acting on three-, four-, or five-site water molecules (SPC, TIP3P, TIP4P, TIP4P-Ew, or TIP5P) with constraints that completely rigidify each water molecule. It acts as a simple logical and is turned on by default, since CAMPARI as of now does not explicitly support any inherently flexible water models. This means that a setting of 2 or 3 for SHAKESET in a calculation in explicit water will still constrain waters to be rigid, and therefore correspond to a standard (and - for the supported water models - correct) simulation setup. Specifying this keyword and setting it to anything other than 1 will disable this override. Note that for water models possessing virtual sites (all four- and five-site models), it is assumed that the extra sites have no mass (see below). If this is not the case, the use of the analytical SETTLE algorithm for water is no longer possible, and the more complex set of constraints may no longer be solved efficiently (or may no longer be solved at all).

SHAKEFROM

This keyword allows the user to control how the actual values for the set of holonomic constraints are determined at the beginning of simulations in Cartesian space. There are currently the following options:
  1. Irrespective of structural input, all distance constraints of atoms bound covalently to one another are taken directly from the (hard-coded) CAMPARI default geometries that use, wherever possible, databases of high-resolution crystallographic structures of biomolecules (see for example the reference by Engh and Huber). For indirect angle constraints, i.e., constrained distances of atoms separated by two covalent bonds, the bond angle and two bond lengths in question are used to compute the effective length in similar fashion. For explicit angle constraints the reference value can be used directly. Lastly, simulations of unsupported residues require the structural input to be used directly (see comments on option 3 below for implied caveats). This option is currently the default.
  2. Irrespective of structural input, CAMPARI will try to reconstruct the required constraint lengths from the minimum positions of bonded potentials (see SC_BONDED_B and SC_BONDED_A) that are provided by the force field in use. As for option 1, this extends to indirect and explicit angle constraints. If terms are missing in the force field, covalent distances are taken from the default CAMPARI geometry as for option 1. This option is the recommended one for "standard" molecular dynamics calculations. Note that patches to bonded parameters are recognized and respected in this context.
  3. CAMPARI takes all reference values for constrained degrees of freedom directly from the structure in which the simulation is started. This requires no adjustments but comes with caveats. Since input in pdb format is of limited precision, the various bond lengths and angles can only be extracted to the same precision. This means that constraints that are chemically identical will be set to slightly different values (e.g., the C-H bonds in a methyl group), which can cause small artifacts. For bond lengths involving hydrogen, rebuilding is an alternative to circumvent this problem (see PDB_HMODE). A second problem arises due to the lack of reproducibility caused by the limited precision. Specifically, simulations started from two different and minimized conformations of the system will end up using different values for the constraints. A more extreme version of this problem is encountered when starting simulations from snapshots of other simulations in which the constrained degrees of freedom had been left free to move (this exceeds the mere precision effect). Due to these caveats, it is not recommended to use this option. Note that CAMPARI will have to use values defined by structural input for cases where no other information is available.
With file-based input to SHAKESET, it is possible to constrain distances between atoms that are not connected by just one or two covalent bonds. Since neither CAMPARI nor force fields prescribe "correct" distances to such pairs, a constraint of this type is always dependent on structural input (if input is absent, the initial geometry generated by CAMPARI according to settings for RANDOMIZE is used, which is rarely useful).
When restarting simulations, this keyword should generally be left unchanged. In case of option 3 being in use, it is recommended to either never supply an input pdb file or to always use the same one supplied as a template. For a simulation started normally, options 1 and 2 above entail the possibility that constrained degrees of freedom are adjusted before the simulation begins. This adjustment is reflected in the reference structure files written at the beginning of each run ({basename}_START.pdb and {basename}_START.int).

SHAKEMETHOD

This keyword allows the user to choose which of the currently implemented algorithms CAMPARI should use to enforce the chosen set of holonomic constraints during a molecular dynamics simulation in Cartesian space. Options are as follows:
  1. The standard, iterative SHAKE procedure is used. Coupled constraints are solved iteratively by assuming independence and linearity (Newton's method). SHAKE may converge in very few steps to good accuracy if the coupling is weak (the coupling matrix is sparse). This is the only method that currently supports explicit constraints on bond angles (see SHAKESET). Due to the use of Newton's method, SHAKE is not guaranteed to converge if the underlying "landscape" becomes non-linear through the coupling between constraints; convergence is then guaranteed only within a sufficiently small neighborhood of the actual solution. Therefore, SHAKE places an upper limit on the time step that can be used even though it is meant to allow increases of precisely that time step. Nonetheless, in canonical applications (bond length constraints only), SHAKE will be a reasonably efficient solution (a schematic sketch of the basic iteration follows this list). The main weakness of SHAKE and related algorithms is their inherent inability to enforce planarity at a given site. This is because at a planar site all bond vectors, which form the basis set for the application of iterative corrections, are part of the same plane, i.e., it is impossible to correct an out-of-plane motion using those vectors. Depending on the exact set of constraints used, SHAKE may require many steps, fail to converge or converge with limited accuracy, and occasionally crash if bond length and angle constraints deem a site to be perfectly planar.
  2. A mix of the SHAKE and P-SHAKE (see below) algorithms is used in which P-SHAKE is applied only to those constraint groups which are internally entirely rigid.
  3. The so-called P-SHAKE (preconditioned SHAKE) procedure is used. In P-SHAKE, SHAKE is augmented by a preconditioning step that changes the convergence rate from linear to quadratic. The preconditioning step is a matrix multiplication essentially forming linear combinations of the bond vectors in the constraint vectors. Corrections employed along those new directions minimize the linear error by decoupling the constraints (within the bounds of a linear theory → hence the quadratic and not instantaneous convergence). Unfortunately, this method currently is implemented either inefficiently or incorrectly and does not usually offer a discernible improvement. Beyond this, it is also fundamentally limited for large constraint groups due to the full matrix multiplication required to increment the coordinates at each iteration step: this operation costs 3·n_p·n_c in P-SHAKE but only 6·n_c in standard SHAKE. In addition, the matrix used to precondition the procedure has to be recalculated frequently if a molecule undergoes significant conformational changes (currently hard-coded to every 100 integration steps). P-SHAKE is therefore suitable only for enforcing holonomic constraints in small rigid or quasi-rigid molecules that can be solved by SHAKE as well. Just like SHAKE, it fails badly for planar sites (see above). In such a case, CAMPARI may crash without any indicative messages due to failures in the LAPACK routines used by the P-SHAKE algorithm (see installation).
  4. The LINCS method is used. LINCS is a linear constraint solver that uses a projection approach. In the end, a matrix equation needs to be solved, which requires the inversion of a matrix related to the coupling matrix of the constraints in the group. This is the critical step and grossly inefficient as a general procedure. For sparse matrices, however, the inversion can be performed approximately by a series expansion. It is the order of this expansion and its applicability that will determine the success and accuracy of LINCS. LINCS is generally inapplicable to anything involving bond angle constraints, in particular in all-atom representation. It will work well for loosely coupled groups of constraints. Since the accuracy depends on the unknown convergence properties of an infinite sum, the accuracy of LINCS cannot be tuned to yield a specific tolerance for satisfying the constraints.
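For orientation, the core of the standard SHAKE iteration referenced under option 1 can be summarized in the following Python sketch (a schematic illustration assuming unit masses, not CAMPARI's actual implementation):

  import numpy as np

  def shake(x, x_ref, constraints, tol=1.0e-8, maxiter=1000):
      # x: trial positions after the unconstrained step; x_ref: positions
      # before the step; constraints: list of (i, j, d) distance constraints
      for _ in range(maxiter):
          converged = True
          for i, j, d in constraints:
              r = x[i] - x[j]
              diff = np.dot(r, r) - d * d
              if abs(diff) > tol * d * d:
                  converged = False
                  r_ref = x_ref[i] - x_ref[j]           # reference bond vector
                  g = diff / (4.0 * np.dot(r, r_ref))   # Lagrange-type multiplier
                  x[i] -= g * r_ref                     # corrections are applied
                  x[j] += g * r_ref                     # along r_ref only
          if converged:
              return x
      raise RuntimeError("SHAKE did not converge")

Because the corrections act only along the reference bond vectors, the sketch also illustrates the planarity problem discussed above: at a perfectly planar site, all available correction directions lie within that plane.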
Unfortunately, the constraint algorithm often becomes the weakest link in the integration scheme if the dynamics itself is unstable (for example due to too large an integration time step). Therefore, not all reported failures of the SHAKE or LINCS procedures are due to those procedures. Conversely, if highly coupled constraints are selected via SHAKESET, CAMPARI may produce many warnings pertaining to failed constraint solving attempts that are in fact due to the limitations of those algorithms. In such cases, CAMPARI attempts to subtly change the targets for the constraint values to potentially steer the system toward conditions where solutions are obtained more easily. This - as may be expected - will not usually solve the problem (only delay critical errors), in particular for very large constraint groups. Ultimately, it will be more useful to consider internal coordinate space simulations if high-level constraints are desired (→ CARTINT).
There is an additional issue that arises when virtual sites (technically atoms with no mass) are used, for example in rigid water models like TIP4P. Such sites have to be circumvented by the integration scheme (displacement is dependent on inverse mass), and therefore they have to be exactly constrained with respect to the positions of atoms with finite mass. These constraints cannot be solved within the standard framework (also dependent on inverse mass). Instead, the least-constraint solution is obtained by simply rebuilding the positions of these sites with fixed internal geometry. For this to yield a correct integrator, however, the forces acting on the sites need to be remapped to the atoms they are connected to. This is done by decomposing the Cartesian force acting on the site into internal forces, for which compensating terms are added to all the atoms comprising the respective internal degree of freedom. This exactly cancels the net force on the site and makes the integration symplectic. Virtual sites cannot occur in constraint groups that are handled by a method other than standard SHAKE or SETTLE.

SHAKETOL

If SHAKE or P-SHAKE are in use (→ SHAKEMETHOD), this keyword allows the user to set the target tolerance for satisfying distance constraints. The tolerance is relative to the target value of the constraint. As soon as the maximum deviation is less than this value, the iteration stops unless it is terminated earlier for other reasons (→ SHAKEMAXITER).
If LINCS is in use, this keyword still has meaning even though the tolerance cannot be set explicitly. Should CAMPARI find that LINCS with the given settings satisfies the constraints significantly worse than defined by this keyword, it will adjust the open parameter of the method (→ LINCSORDER) in an attempt to remedy this situation. Similarly, should the opposite occur (LINCS satisfies constraints significantly more accurately than the desired tolerance), the parameter will be adjusted in the opposite direction. All this happens within sane bounds.

SHAKEATOL

If SHAKE (→ SHAKEMETHOD) is in use with explicit bond angle constraints (→ SHAKESET), this keyword allows the user to set the target tolerance for satisfying angular constraints. The tolerance is absolute and applies to the unitless cosine of the respective angle. As soon as both maximum deviations drop below the threshold tolerances (see also SHAKETOL) the iteration stops unless it is terminated earlier for other reasons (→ SHAKEMAXITER).

SHAKEMAXITER

If SHAKE or P-SHAKE are in use (→ SHAKEMETHOD), this keyword allows the user to alter the maximum number of iterations permissible to the algorithm. Since poor convergence properties are generally indicative of a more fundamental problem, increasing the value for SHAKEMAXITER will rarely be useful. After exceeding this many steps, the algorithm will simply continue with its current solution, meaning that - in a benign case - constraints will be violated slightly more than specified by SHAKETOL and eventually SHAKEATOL. Note that CAMPARI will then adjust the constraint targets in an attempt to rescue a simulation otherwise doomed. This may not always work and may also lead to unwanted drift. Appropriate warnings are provided.

LINCSORDER

If LINCS is in use (→ SHAKEMETHOD), this keyword allows the user to define the initial expansion order for the approximate matrix inversion technique. As mentioned above, the convergence properties of this approximation are not really known and prevent LINCS from satisfying an exact tolerance explicitly. Should CAMPARI find that constraints are satisfied significantly better or worse than what is provided through SHAKETOL, the expansion order will be adjusted automatically. This is to prevent unnecessarily inefficient or unnecessarily inaccurate maintenance of constraints.

MINI_MODE

If a minimization run is performed, this keyword lets the user select the method of choice. CAMPARI currently supports three canonical and one nonstandard minimizer. All minimizers can operate either in mixed rigid-body/torsional space, i.e., the "native" CAMPARI degrees of freedom, or in Cartesian space (→ CARTINT). An algorithmic restriction is that the canonical minimizers (options 1-3 below) support only trivial constraints (see FMCSC_FRZFILE), which is a limitation in Cartesian space (rigid water models, etc.).
Let us define γ as a vector of base increment sizes suitable for each of the degrees of freedom (partitioned into three classes: rigid-body translation, rigid-body rotation, and dihedral angles; keywords MINI_XYZ_STEPSIZE, MINI_ROT_STEPSIZE, and MINI_INT_STEPSIZE are used to specify each element γ_i). Also, let f_m be an outside scaling factor in units of mol/kcal set by keyword MINI_STEPSIZE. Lastly, we introduce a unitless dynamic step length factor λ. If we now denote the heterogeneous vector of phase space coordinates as x, and the Hamiltonian is written as U(x), then the system is evolved according to one of four different protocols as follows:
  1. Steepest-descent:
    x_(i+1) = x_i - λ·f_m·(γ•∇U(x_i))
    Here, "•" denotes the Hadamard (Schur) product, i.e., simply the element-by-element multiplication. Should the new conformation have overstepped in the direction of steepest descent, λ is iteratively reduced by a constant factor until a valid step is found (lower energy). In case of successful steps, λ is iteratively increased to improve the efficiency of the procedure if the underlying landscape is relatively smooth and flat (a sketch of this step-size management is given after this list). Successful steps are used as well to construct an appropriate guess for the initial step size should a complete reset be necessary. This mimics a line search.
  2. Conjugate-gradient:
    x_(i+1) = x_i - λ·f_m·[ γ•∇U(x_i) + f_(CG,i)·d_(i-1) ]
    f_(CG,i) = [ ∇U(x_i)·∇U(x_i) ] / [ ∇U(x_(i-1))·∇U(x_(i-1)) ]
    d_(i-1) = γ•∇U(x_(i-1)) + f_(CG,i-1)·d_(i-2)
    This conjugate-gradient method follows the Polak-Ribiere scheme and augments the steepest-descent prediction by an additional term that is estimated according to the suggestion by Fletcher and Reeves. Much like in steepest-descent, should the new conformation have overstepped, λ is iteratively reduced by a constant factor until a valid step is found (lower energy). In case of successful steps, λ is iteratively increased analogously to what is described above.
  3. Memory-efficient Broyden-Fletcher-Goldfarb-Shanno method (L-BFGS) according to Nocedal (reference):
    x_(i+1) = x_i - λ·[ H^(-1)·(γ•∇U(x_i)) ]
    This quasi-Newton approach technically employs the inverse of the Hessian, which is typically unknown. However, the L-BFGS method constructs a numerical estimate directly for the matrix product H^(-1)·(γ•∇U(x_i)) from the recent history of the minimization process. This widely used recursive two-loop scheme (a sketch is given at the end of this keyword's description) has the advantage of i) only requiring very few floating point operations, and ii) not requiring a running guess for the complete Hessian (inverse or not) due to the recursive formulation. Note that the inverse Hessian in our implementation is constructed from γ•∇U(x_i), i.e., it has units of mol/kcal throughout, irrespective of which degree of freedom is considered. This means that the factor f_m does not show up in the L-BFGS equation except for the first step (initially or after a reset) when the steepest-descent approximation is used (see mode 1). The usage of (estimated) second derivative information should generally help inform the minimizer of more useful directions to pursue, but step size limitations and inadequate guesses of the Hessian may render this potential benefit ineffectual. The reader is referred to the literature for further details.
  4. Thermal noise quasi-stochastic (akin to simulated, thermal annealing):
    This minimizer couples the system to a variable temperature bath. By changing the coupling parameters, the degrees of freedom are successively brought to a state consistent with a very low temperature ensemble. A similar quench in conditions is used in simulated annealing, a general solution strategy for optimization problems.
    Initially, the system uses a heat bath as defined by the settings for TSTAT and TEMP. The system is then evolved using NVT molecular dynamics in either mixed rigid-body/torsional space or Cartesian space. Depending on initial conditions, this may heat up the system to a variable extent, and the maximum temperature is recorded. After a prescribed fraction of the total simulation steps, the target temperature is successively lowered to the value specified by keyword MINI_SC_TBATH. This interpolation uses a Gaussian function on the normalized time axis such that all interpolation curves can be rescaled in temperature to exactly coincide. Simultaneously, the algorithm measures the rate of change of temperature from the recorded maximum toward MINI_SC_TBATH. If the actual rate appears too slow or too fast, the time constant, τ_T, of the thermostat in use (→ TSTAT_TAU) is successively altered so as to cool the system down to a negligible temperature within the remaining number of available iterations. These alterations happen within bounds of 10 times the integration time step on the low end and the original setting for TSTAT_TAU on the high end.
    This minimization approach employs two convergence criteria as soon as the number of steps specified via MINI_SC_HEAT has passed. During the cooling schedule, the procedure will stop either because the RMS gradient fell below the threshold (→ MINI_GRMS) or because the target temperature (MINI_SC_TBATH) was reached, which - per se - does not provide information on the local gradient. Of course, it may be possible to minimize such a structure further using a canonical approach. Both temperature and RMS gradient are written to log output to allow for easy inspection of whether the parameters are set reasonably well. As an additional note, it must be pointed out that - much like in standard molecular dynamics - runs starting from very unfavorable structures will cause large accelerations, which may lead to a catastrophic blow-up of the system. This behavior can be avoided by performing a number of steepest descent minimization moves upfront. This number is set by keyword MINI_SC_SDSTEPS.
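The step-size management referenced under modes 1 and 2 can be sketched as follows in Python (x, gamma, and the gradient are assumed to be NumPy arrays so that * is the element-wise product; energy and gradient are placeholder callables, and the shrink and grow factors are hypothetical values):

  def steepest_descent(x, energy, gradient, gamma, fm, nsteps,
                       lam=1.0, shrink=0.5, grow=1.2):
      # Schematic mode-1 minimizer: overstepping shrinks lambda until a step
      # lowers the energy; successful steps cautiously grow lambda again.
      e_cur = energy(x)
      for _ in range(nsteps):
          g = gradient(x)
          x_new = x - lam * fm * gamma * g        # element-wise (Hadamard) product
          e_new = energy(x_new)
          while e_new >= e_cur:                   # overstepped: reduce lambda
              lam *= shrink
              x_new = x - lam * fm * gamma * g
              e_new = energy(x_new)
          x, e_cur = x_new, e_new
          lam *= grow                             # successful: increase lambda
      return x, e_cur

A real implementation would additionally guard against λ underflowing near a minimum and would handle the reset logic described above.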

In general, a minimization run will terminate after either the maximum number of iterations has passed (see NRSTEPS) or after convergence is achieved (see MINI_GRMS). Note that bad combinations of the various step sizes and the convergence criterion can easily lead to runs that fail to terminate even though convergence has de facto been achieved.
In general, minimizations are unlikely to be interesting for on-the-fly analysis. This is because the conformations encountered do not correspond to a meaningful ensemble, neither in terms of coverage nor in terms of relative weights. Nevertheless, all analysis routines are supported and will work under the assumption that a single step corresponds to a single successful perturbation during minimization (due to overstepping, the number of energy/gradient evaluations in minimization is usually larger than the actual number of steps: keyword NRSTEPS sets the maximum for the former).
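The two-loop recursion mentioned for mode 3 is compact enough to reproduce; the following sketch gives its standard textbook form (not CAMPARI's exact code), estimating the product H^(-1)·g from the most recent position and gradient differences:

  import numpy as np

  def lbfgs_direction(g, s_list, y_list):
      # s_list: recent position differences; y_list: corresponding gradient
      # differences (newest last); returns an estimate of H^(-1)·g
      q = g.copy()
      stack = []
      for s, y in zip(reversed(s_list), reversed(y_list)):   # first loop
          rho = 1.0 / np.dot(y, s)
          a = rho * np.dot(s, q)
          q -= a * y
          stack.append((rho, a, s, y))
      if s_list:                                             # initial guess for
          s, y = s_list[-1], y_list[-1]                      # the inverse Hessian
          q *= np.dot(s, y) / np.dot(y, y)
      for rho, a, s, y in reversed(stack):                   # second loop
          b = rho * np.dot(y, q)
          q += (a - b) * s
      return q   # the step is then x - lambda * q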

MINI_STEPSIZE

If a canonical minimization run is performed, this keyword acts as a scale factor applied to all conformational increments applied during minimization. It therefore sets the global step size and corresponds to factor f_m in the equations above. For technical reasons, it has units of mol/kcal to eliminate the energy units of the normalized gradients. There are no canonical rules one can formulate, but values significantly less than unity will typically be most appropriate to prevent the algorithm from frequently overstepping in a subset of the degrees of freedom and then having to iteratively reduce the step size. However, step size management is dynamic (see factor λ introduced in the equations for minimization modes 1-2 (and partially 3) above). This means that the impact this keyword has may be less than what one would generally expect.

MINI_GRMS

If a minimization run is performed, this keyword allows the user to set the convergence criterion in units of kcal/mol. Since minimization runs can occur in torsional and rigid-body space, the "raw" gradient over all degrees of freedom is unsuitable. CAMPARI utilizes a simple workaround by normalizing all gradients by a basic step size for the respective types of degrees of freedom (see keywords MINI_XYZ_STEPSIZE, MINI_ROT_STEPSIZE, and MINI_INT_STEPSIZE). The root mean square (→ GRMS) of the resultant, normalized gradient is compared to the convergence criterion provided here. Since the normalized gradients depend on the base step sizes, this parameter depends on them as well. For unit values for all three base step sizes, values around 10^-2 are recommended. Conversely, in Cartesian space, only MINI_XYZ_STEPSIZE is relevant for the gradient criterion.
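In Python terms, the normalization amounts to the following sketch (variable names are hypothetical; normalization is taken here as multiplication by the base step size so that all entries carry uniform units of kcal/mol):

  import numpy as np

  def grms(grad_xyz, grad_rot, grad_int, sz_xyz, sz_rot, sz_int):
      # the three base step sizes correspond to the keywords MINI_XYZ_STEPSIZE,
      # MINI_ROT_STEPSIZE, and MINI_INT_STEPSIZE
      g = np.concatenate((grad_xyz * sz_xyz,
                          grad_rot * sz_rot,
                          grad_int * sz_int))
      return np.sqrt(np.mean(g * g))   # compared against MINI_GRMS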

MINI_XYZ_STEPSIZE

If a minimization run is performed, this keyword determines a basic step size to be considered for all rigid-body translations of molecules and for all Cartesian displacements of atoms. This value is to be provided in units of Å. Note that this keyword determines the effective initial translation step size in conjunction with MINI_STEPSIZE and that it is mostly needed to be able to handle the different units occurring when minimizing in mixed rigid-body and torsional space. All translational gradients are normalized by this number such that numerical estimates of the Hessian (→ the L-BFGS method under MINI_MODE) or even a meaningful root mean square (→ MINI_GRMS) can be formed. Note that for simulations in (effective) Cartesian space, it would be possible to combine this parameter with MINI_STEPSIZE into a single step size parameter.

MINI_ROT_STEPSIZE

If a minimization run in mixed rigid-body and torsional space is performed, this keyword determines a basic step size to be considered for all rigid-body rotations. This value is to be provided in units of degrees (compare MINI_XYZ_STEPSIZE).

MINI_INT_STEPSIZE

If a minimization run in mixed rigid-body and torsional space is performed, this keyword determines a basic step size to be considered for all dihedral angles. This value is to be provided in units of degrees (compare MINI_XYZ_STEPSIZE).

MINI_UPTOL

If a minimization run is performed, and if the BFGS method is used, this keyword lets the user choose a tolerance criterion in kcal/mol for accepting uphill steps. At most ten or MINI_MEMORY (whichever is smaller) such steps will be tolerated until a reset of the estimate of the Hessian occurs. This reset will reorient the (multidimensional) direction back onto a steepest descent path, and the procedure can start anew. This feature is included since the curvature-based estimate of the direction in the BFGS method does not always guarantee a downhill direction, i.e., the energy resulting from a perturbation in such a direction may be larger than the current one for all step lengths within a finite interval, including arbitrarily small ones. This is a different problem from "overstepping", for which step size reductions are employed.

MINI_MEMORY

If a minimization run is performed, and if the BFGS method is used, this keyword lets the user choose the memory length for the running estimate of the Hessian. Since the system will evolve throughout the minimization, the estimate of the Hessian is of course a moving target, and it will only be useful to include points from the immediate vicinity in its numerical, gradient-based estimate. This keyword simply gives the (integer) number of immediately preceding steps to consider. Note that very large values will typically be irrelevant since the BFGS procedure will - in rough landscapes - frequently propose an ill-fated (uphill) direction (see MINI_UPTOL for comparison). Such moves will eventually lead to a reset of the estimate of the Hessian, which includes "forgetting" all the memory. Hence, the effective usable memory length will be limited by the system as well. Note that the resets are necessary for the BFGS method to find any minima.

MINI_SC_SDSTEPS

If a stochastic minimization run is performed, this keyword allows the user to request the program to first run the specified number of steps as canonical steepest-descent (SD) minimization. These SD moves will follow the same parameter settings as described above and are completely independent of the stochastic steps. Note that these steps are always skipped if the settings request the use of holonomic constraints when minimizing in Cartesian space.

MINI_SC_HEAT

If a stochastic minimization run is performed, this keyword specifies the fraction of the total number of steps (NRSTEPS) that are going to be used to perform NVT dynamics at the user-supplied initial temperature and thermostat settings. Generally, for an efficient annealing protocol, it is probably advisable to combine a large value for this keyword with a high enough temperature and/or a comparatively large value for the thermostat's time constant, τ_T, such that NVE dynamics are mimicked over short periods of time (this will lead to heating in itself). Conversely, for straight minimization, it will be more appropriate to supply small values in conjunction with tight thermostat settings and low initial temperature.

MINI_SC_TBATH

If a stochastic minimization run is performed, this keyword lets the user specify the target temperature of the bath the system will be coupled to at the very end of the run. From the simulation step defined by MINI_SC_HEAT onward, the target temperature is interpolated between TEMP and MINI_SC_TBATH using a Gaussian function operating on a normalized time axis. For the protocol to work as intended, it will not be useful to specify anything but values close to (but not exactly) zero here.
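The documentation does not spell out the exact functional form, but one plausible Gaussian interpolation on a normalized time axis could look like the following sketch (purely illustrative; the width parameter is a hypothetical choice):

  import math

  def target_temperature(step, nsteps, heat_frac, T0, T_final, width=0.25):
      # hold the bath at T0 during the heating phase (MINI_SC_HEAT), then
      # lower the target toward T_final (MINI_SC_TBATH) along a Gaussian
      t0 = heat_frac * nsteps
      if step <= t0:
          return T0
      t = (step - t0) / (nsteps - t0)      # normalized time in [0, 1]
      return T_final + (T0 - T_final) * math.exp(-0.5 * (t / width) ** 2)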



Move Set Controls (MC):


(back to top)

Preamble (this is not a keyword)

A Monte Carlo simulation is a series of biased or unbiased random perturbation attempts to the system, in which some moves will be accepted (the Markov chain transitions to a new microstate) and the others rejected (the Markov chain remains in place) dependent on some criterion. This acceptance criterion is designed to sample a specific distribution, and the most common example is the Metropolis criterion designed to produce Boltzmann-distributed ensembles.
The types of random perturbation attempts possible constitute the move set, and the resultant microstate transitions are usually very different from those observed in molecular dynamics (MD). In dynamics, all unconstrained degrees of freedom evolve simultaneously (high correlation), but in small increments (low effective step size). In Monte Carlo, one or a few degrees of freedom evolve at a given time, but in step sizes of varying amplitudes. It is not required that individual degrees of freedom are all sampled with equal weight (nor would it be clear how to establish this). The effective sampling weight is determined by three components:
  1. The overall picking frequencies for move types (e.g., OTHERFREQ) are implemented by CAMPARI through a binary decision tree invoked at each step of the MC simulation (sketched after this list). This means that the decisions taken at the root will influence the actual number of attempted moves of types chosen further up the tree, and that it may be complicated to calculate the expected numbers of attempts for those moves. This is why formulas are provided. Some totals (attempted and accepted moves) are reported in the log output at the end.
  2. The organizational unit for a move is often a residue, but not all residues possess equal numbers of degrees of freedom. For instance, sidechain moves sample a variable number of degrees of freedom (→ NRCHI), and the actual sampling per degree of freedom will not be uniformly distributed since different residues may have different numbers of χ-angles.
  3. Sampling weights can be adjusted explicitly with the help of the preferential sampling utility.
If one takes into account that selected degrees of freedom may be eligible for different move types (while others are not), it should be abundantly clear that the details of a Monte Carlo move set are ultimately ad hoc or empirical quantities (see Tutorial 4 for additional, theoretical considerations on move sets).
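The root of this decision tree can be sketched schematically in Python (keyword names stand in for their values; deeper branches are omitted):

  import random

  def pick_move(PARTICLEFLUCFREQ, RIGIDFREQ, ROTFREQ, COUPLERIGID):
      # schematic first levels of the move selection tree
      if random.random() < PARTICLEFLUCFREQ:
          return "particle fluctuation move"
      if random.random() < RIGIDFREQ:
          if COUPLERIGID:
              return "coupled rigid-body move (single or multiple molecules)"
          if random.random() < ROTFREQ:
              return "rigid-body rotation move"
          return "rigid-body translation move"
      return "internal coordinate move (further branches of the tree)"

Each nested decision multiplies its frequency into the expected count of the moves below it, which is exactly how the formulas given with the individual keywords arise.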



PARTICLEFLUCFREQ

This keyword is relevant only when ENSEMBLE is set to either 5 or 6, i.e., those ensembles which allow numbers of particles to fluctuate. In this case, the keyword defines the fraction of all moves that attempt to sample the particle number dimension of the thermodynamic state of the system. For the semi-grand ensemble, this corresponds to attempting to transmute one particle type into another while preserving the position of the target particle. For the grand ensemble, it will with 50% probability try to insert a particle of permissible type in a random location in the simulation container and with 50% probability attempt to delete a permissible particle. These moves are applied at the molecule level and most closely related to rigid-body moves in terms of complexity (→ RIGIDFREQ).
Technically, the GC ensemble is supported in CAMPARI by maintaining a set of ghost particles for each fluctuating type which work as "stand-ins". This framework entails certain limitations which are detailed elsewhere.
Expected numbers of such moves overall are calculated trivially as:
NRSTEPS · PARTICLEFLUCFREQ
Note that the default picking probabilities are such that every molecule type allowed to fluctuate in numbers receives equal weight. In case of particle permutation moves, which are implemented as joint insertion/deletion, there is no way to adjust these: the implementation mandates the molecule types to be different, and an adjustment would require additional corrections in the acceptance probability that would cancel out the preferential sampling weights. For independent insertion and deletion available in the grand ensemble, the preferential sampling utility allows the user to at least adjust the picking probabilities on a per-type basis. This can be relevant, for example, in electrolyte mixtures with disparate target concentrations (and correspondingly disparate bath particle numbers), for which it would make sense to preferentially insert and delete those particle types with overall larger numbers. Such an adjustment would also bring the sampling weights in line with the default picking probabilities for rigid-body moves, which are flat on a per-molecule basis.

RIGIDFREQ

This keyword specifies what fraction of all remaining moves (i.e., 1.0 - PARTICLEFLUCFREQ) is to perturb rigid-body degrees of freedom. This encompasses translations and rotations of individual molecules as well as of groups of molecules (the latter are only available in case rotation and translation are coupled → COUPLERIGID). The default picking probabilities are even for all molecules regardless of type, size, or other properties. They can be adjusted via the preferential sampling utility, and this may be relevant in dense or semi-dilute systems with different molecule types of vastly different size (e.g., proteins and inorganic ions). In such a case, the acceptance rates for the macromolecules will be noticeably smaller, and this could be compensated for by sampling them preferentially.

COUPLERIGID

This keyword is a simple logical deciding whether or not to couple translational and rotational rigid-body moves for single molecules. Like any type of move coupling, this means that up to six independent perturbations of individual degrees of freedom are employed (translation in x,y,z, rotation around three axes) before energies and the acceptance criterion are evaluated. Note that molecules with no rotational degrees of freedom will have their moves counted as pure translation moves in the log-output.

ROTFREQ

This keyword can be used to set the sub-frequency for purely rotational moves if uncoupled moves are used (→ COUPLERIGID is false). It then determines the fraction of rigid-body moves that are purely rotational. The total number of such moves is:
NRSTEPS · (1.0-PARTICLEFLUCFREQ) · RIGIDFREQ · ROTFREQ.
And the total number of purely translational moves will be:
NRSTEPS · (1.0-PARTICLEFLUCFREQ) · RIGIDFREQ · (1.0-ROTFREQ)
Note that the above formulas do not account for the choice between randomizing and stepwise perturbations (→ RIGIDRDFREQ), which would introduce an additional factor into the above product.

RIGIDRDFREQ

This keyword sets a terminal choice in the selection tree that is common to many of the moves in CAMPARI (see similar keywords PIVOTRDFREQ, NUCRDFREQ, and so on). Amongst the available rigid-body moves (it applies to three separate branches: coupled single-molecule moves, coupled multiple-molecule moves, and decoupled single-molecule moves), the keyword chooses the fraction of moves that completely randomize the underlying degrees of freedom. For example, the complete randomization of translational degrees of freedom would displace the molecule's reference center to an arbitrary point in the simulation container. The remaining fraction will correspond to stepwise perturbations in which a usually small random increment is added to the degrees of freedom in question. For example, such a move would displace a molecule's reference center by a random vector small in absolute magnitude.
As an example consider single-molecule translation moves. The total number of expected randomizing translation moves would be (assuming COUPLERIGID is false):
NRSTEPS · (1.0-PARTICLEFLUCFREQ) · RIGIDFREQ · (1.0-ROTFREQ) · RIGIDRDFREQ
And the number of stepwise translation moves would be:
NRSTEPS · (1.0-PARTICLEFLUCFREQ) · RIGIDFREQ · (1.0-ROTFREQ) · (1.0-RIGIDRDFREQ)
The same modifications apply to any other branch of rigid-body moves as explained above (see the sketch below for how the factors combine). As an additional complication, the decision about randomization vs. stepwise perturbations is itself decoupled in coupled rigid-body moves. Also note that the log output does not distinguish between the stepwise and randomizing varieties for any move type.
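Because every decision multiplies in, the expected counts are straightforward to tabulate; a small Python helper for the uncoupled single-molecule branch (example values are hypothetical):

  def expected_rigid_moves(NRSTEPS, PARTICLEFLUCFREQ, RIGIDFREQ,
                           ROTFREQ, RIGIDRDFREQ):
      base = NRSTEPS * (1.0 - PARTICLEFLUCFREQ) * RIGIDFREQ
      trans, rot = base * (1.0 - ROTFREQ), base * ROTFREQ
      return {"randomizing translation": trans * RIGIDRDFREQ,
              "stepwise translation": trans * (1.0 - RIGIDRDFREQ),
              "randomizing rotation": rot * RIGIDRDFREQ,
              "stepwise rotation": rot * (1.0 - RIGIDRDFREQ)}

  # e.g., 10^6 steps, no particle moves, 10% rigid-body moves, even
  # rotation/translation split, 20% randomizing:
  print(expected_rigid_moves(1000000, 0.0, 0.1, 0.5, 0.2))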

ROTSTEPSZ

For any stepwise perturbation of rotational rigid-body degrees of freedom, this keyword sets the maximum step size in degrees. It is implemented such that the actual step size is drawn with uniform probability from the interval from 0.0° to ROTSTEPSZ°.

TRANSSTEPSZ

For any stepwise perturbation of translational rigid-body degrees of freedom, this keyword sets the maximum step size in Å. Analogous to ROTSTEPSZ, it is implemented such that the actual step size is drawn with uniform probability from the interval from 0.0 to TRANSSTEPSZ Å.

CLURBFREQ

This keyword sets the fraction of all available coupled rigid-body moves that simultaneously perturb the rigid-body degrees of freedom of more than one molecule in concerted fashion. In other words, these moves allow the concerted translation (by the same vector) and rotation (around the "cluster" center of mass) of several molecules in one shot.
The expected number of multi-molecule moves would be (assuming COUPLERIGID is true):
NRSTEPS · (1.0-PARTICLEFLUCFREQ) · RIGIDFREQ · CLURBFREQ
And that of coupled single-molecule moves would be:
NRSTEPS · (1.0-PARTICLEFLUCFREQ) · RIGIDFREQ · (1.0-CLURBFREQ)
Currently, the picking of the molecules in a "cluster" is completely random. Note that cluster moves can easily become tricky: in periodic boundary conditions, the nearest image and hence the internal structure of the cluster may actually change upon rotation of a cluster, whereas in droplet boundary conditions rotations and translations of clusters formed by distal molecules may incur significant boundary penalties and hence be inefficient overall. Like all other rigid-body moves, cluster moves can be stepwise or completely randomizing (still in concerted fashion). This is all regulated by the previously introduced keywords RIGIDRDFREQ, ROTSTEPSZ, and TRANSSTEPSZ. The picking frequencies are regulated at the molecule level. With the preferential sampling utility, it is possible to alter the picking weights on a per-molecule basis. Note that this should yield either zero or reasonably large weights for all molecules, because the weights combine in a product sense during the picking process. This also means that it is tedious to compute the expected sampling probabilities for all possible "clusters" of molecules of sizes 2 to the maximum value.

CLURBMAX

This keyword sets the maximum "cluster" size for concerted multi-molecule rigid-body moves (see CLURBFREQ). The assignment is completely random at any given step such that detailed balance is maintained. Note that the number of possible "clusters" grows as binomial coefficients with increasing size of the cluster until CLURBMAX reaches half the number of molecules in the system. It is important to point out that picking values close to the number of molecules can cause search problems that CAMPARI actively avoids. Specifically, if the total sampling weight of available molecules remaining is less than 10%, a new molecule has not been found to add to the "cluster" in 100 tries, and the current size is at least 2, then the value picked initially for CLURBMAX is decreased to the current size. This is to avoid the code spending an excessive amount of time in an inefficient search procedure. The control on total sampling weight is particularly relevant for cases where the picking weights have been altered on account of the preferential sampling utility.
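The bail-out logic just described can be paraphrased in Python (a sketch of the documented behavior, not the actual source; the weights dictionary is hypothetical, and how the target size is drawn is an assumption here):

  import random

  def build_cluster(weights, clurbmax):
      cluster, tries = [], 0
      total = sum(weights.values())
      target = random.randint(2, clurbmax)     # size picked for this move
      while len(cluster) < target:
          mol = random.choices(list(weights), list(weights.values()))[0]
          if mol in cluster:
              tries += 1
              left = sum(w for m, w in weights.items() if m not in cluster)
              if left < 0.1 * total and tries >= 100 and len(cluster) >= 2:
                  target = len(cluster)        # give up on the original size
              continue
          cluster.append(mol)
      return cluster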

ALIGN

This keyword is an integer indicating how to handle the fact that lever arm effects can be asymmetric in multimolecular simulations. A brief explanation is in order. Consider a macromolecule with multiple dihedral angles along the backbone. A perturbation of an individual one of those dihedral angles may be implemented in two basic ways corresponding to the two building directions of the (unbranched) main chain: either one of the ends will swivel around (lever arm) while the other remains fixed in place. In a simulation with just a single molecule, the new conformations for either type will be identical except for an implied rotation of the reference frame. In a simulation with multiple molecules, however, the two conformations will be explicitly different since the other molecules define the now static reference frame. Moves with longer lever arms have lower acceptance rates, are slower to evaluate, and should generally be avoided. For MC, this affects polypeptide pivot moves (coupled and uncoupled, see COUPLE), ω-moves (see OMEGAFREQ), Favrin et al. inexact CR moves (see CRMODE), pivot-type nucleic acid moves (see NRNUC), sugar pucker moves (see SUGARFREQ), and polypeptide cyclic residue pucker moves (see PKRFREQ). It affects single torsion pivot moves (see OTHERFREQ) in a slightly different manner, and this is described there. It is also relevant for torsional dynamics, for which it similarly determines the assumed building direction for the chains. Options are as follows:
  1. Always leave N-terminus unperturbed (C-terminus swings around).
  2. Always leave C-terminus unperturbed (N-terminus swings around). This is only recommended in special applications since the C-terminal alignment requires the whole molecule to be rotated around, which makes this mode more expensive than, but analogously asymmetric to, mode 1.
  3. Always leave the longer end unperturbed (shorter lever-arm is chosen). This is the default (and a good) choice as it should be the most efficient one for simulations with multiple chains of significant length. It is also the recommended setting for torsional dynamics in which the kinetics at one of the termini will otherwise be artificially slowed (note that the criterion determining lever arm length uses number of atoms rotated rather than number of residues in dynamics).
  4. A stochastic modification of mode 3 only available in MC: The probability with which the longer end swivels around is equal to:
    p_lt = (L_st + 1) / (L_st + L_lt + 2)
    And conversely:
    p_st = (L_lt + 1) / (L_st + L_lt + 2)
    Here, L_st is the smaller number of residues beyond the pivot point towards the nearer terminus and L_lt is the larger number of residues beyond the pivot point towards the more distant terminus such that L_st+L_lt+1 yields the total number of residues in the molecule. For example, a molecule with six residues would yield probabilities for doing C-terminal alignment (the N-terminus swings around) of 6/7 for residue 1, 5/7 for residue 2, and so on down to 1/7 for residue 6.
    This choice represents the most flexible move set and should normally be preferred in MC when sampling problems are encountered.
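For concreteness, a few lines of Python reproduce the six-residue example given above (residue numbering is 1-based):

  def align4_probs(nres, pivot):
      # numbers of residues beyond the pivot towards either terminus
      n_nterm, n_cterm = pivot - 1, nres - pivot
      L_st, L_lt = min(n_nterm, n_cterm), max(n_nterm, n_cterm)
      p_lt = (L_st + 1) / (L_st + L_lt + 2)    # longer end swivels around
      p_st = (L_lt + 1) / (L_st + L_lt + 2)    # shorter end swivels around
      return p_lt, p_st

  # Six residues, pivot at residue 1: the N-terminal side is the shorter end,
  # so the N-terminus swings around with probability p_st = 6/7.
  print(align4_probs(6, 1))   # -> (0.1428..., 0.8571...)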
Note that the "N"-terminus in polynucleotides is the 5'-end and the "C"-terminus the 3'-end. In general, when set to 2 or 3, this keyword may (currently) have slightly different effects in torsional dynamics vs. MC calculations. This is because the former considers each polymer as a generic branched chain, which means that, e.g., in a lysine peptide the sidechain becomes the "terminus" owing to its greater length. As a general rule, terminal dihedral angles rotating only one or more hydrogen atoms are not considered for alignment regardless of setting. Additional terminal dihedral angles may also be excluded in MC. It is relatively straightforward to extract the convention in use by running a short test simulation without rigid-body movement and comparing initial and final structures. It should be mentioned that the N-terminal alignment corresponds exactly to the way molecules are built (hierarchically) in CAMPARI from their Z-matrix entries, and that this is a purely technical choice that should not have an impact on the thermodynamics of a system.

COUPLE

If this keyword is set to 1 (logical true), all polypeptide pivot moves are coupled to sidechain moves on the same residue (→ PIVOTMODE). This means that new conformations for the φ- and ψ-angles as well as for all of the sidechain χ-angles are proposed before the energy and acceptance criterion are evaluated. Like any other unbiased move perturbing multiple degrees of freedom, this procedure drastically increases the chance of generating an unacceptable conformation (assuming a typical excluded-volume interaction potential is used). Consequently, acceptance rates will be very low and it is generally not recommended to use this option. Note that it is still possible to use independent sidechain moves but that it is impossible to do independent pivot moves for residues with sidechains. In other words, all frequency settings are used as normal but all standard polypeptide pivot moves (the default move type of the decision tree) are coupled to a mandatory sidechain move (of all sidechain angles in that residue). Keywords PIVOTRDFREQ, PIVOTSTEPSZ, CHIRDFREQ, CHISTEPSZ are observed in the respective parts of the coupled moves while NRCHI and CHICYCLES are not.
The expected number of those coupled moves would be:
NRSTEPS · (1.0-PARTICLEFLUCFREQ) · (1.0-RIGIDFREQ) · (1.0-CHIFREQ) · (1.0-CRFREQ) · (1.0-OMEGAFREQ) · (1.0-NUCFREQ) · (1.0-PKRFREQ) · (1.0-OTHERFREQ)
Note that the same formula applies to uncoupled polypeptide pivot moves.

PIVOTMODE

Polypeptide pivot moves are historically the oldest move type in CAMPARI. Therefore, they are placed at the outermost branch of the move selection tree and possess no frequency selection keyword. In general, pivot moves simultaneously sample the φ- and ψ-angles of a single polypeptide residue unless the residue is ring-constrained (such as proline or hydroxyproline) in which case only the unconstrained degree of freedom (ψ for proline) is sampled. See PKRFREQ for "pivot" moves which sample the φ-angles of proline and analogous residues. The default picking probabilities for polypeptide moves are even for all residues with peptide φ/ψ-angles. They can be adjusted with the help of the preferential sampling utility. An example where this can be useful is in reducing the picking weight of proline and similar residues, for which the number of degrees of freedom is smaller.
Mostly for historical reasons, this keyword allows the selection of different modes for pivot moves as follows:
  1. Blind backbone sampling, i.e., all angles have equal likelihood (unbiased and the default)
  2. Using grids (requires GRIDDIR), i.e., sampled angle pairs come from within an approximate envelope derived from the space available to the corresponding dipeptide if one assumes typical excluded volume interactions (biased).
The second option is an attempt to improve conformational space sampling but introduces bias. It is not fully supported at the moment and may be removed entirely in the future.

PIVOTRDFREQ

Much like for other move types, CAMPARI allows the user to mix two types of polypeptide pivot moves: the first randomizing the φ- and ψ-angles of the residue in question (for proline only the ψ-angle, for coupled moves also the sidechain χ-angles → COUPLE), the second perturbing them by a small increment whose size is set by the auxiliary keyword PIVOTSTEPSZ. Note that randomizing moves may be extremely ineffective for the sampling of dense phases (collapsed states of macromolecules) and that the only accepted moves will be those realizing small displacements by chance.
To calculate the expected number of randomizing and stepwise polypeptide pivot moves, the user may employ the formula listed under COUPLE and multiply it with PIVOTRDFREQ and 1.0-PIVOTRDFREQ, respectively.

PIVOTSTEPSZ

This keyword sets the step size in degrees for local perturbation attempts to the φ- and ψ-angles of polypeptide residues (see PIVOTRDFREQ). Note that this step size encompasses the entire symmetric interval around the original position, i.e., a value of 10° will attempt uniformly distributed random displacements within the interval of -5° to 5°.
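Note the difference in conventions relative to the rigid-body step sizes; in Python terms (illustrative only):

  import random

  # ROTSTEPSZ / TRANSSTEPSZ: the step magnitude is drawn from [0, max_step]
  def rigid_body_step(max_step):
      return random.uniform(0.0, max_step)

  # PIVOTSTEPSZ: the displacement is drawn from the full symmetric interval,
  # e.g., a value of 10 degrees yields displacements between -5 and +5 degrees
  def pivot_step(stepsz):
      return random.uniform(-0.5 * stepsz, 0.5 * stepsz)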

GRDWINDOW

This keyword sets a parameter used in conjunction with the external input files that assist conformational space sampling in biased fashion when PIVOTMODE is set to 2. GRDWINDOW then needs to specify half the bin size for the steric grids (see GRIDDIR). The files are supplied in the data directory, and the default value to be used here would be 5.0°. Note that grid-assisted sampling is not a fully supported option in CAMPARI and may be removed entirely in the future.

OMEGAFREQ

In polypeptides, the dihedral angle along the actual peptide bond (ω) is different from the φ- and ψ-angles since the carbon and nitrogen atoms have partial sp2-character. This inhibits free rotation around the bond due to electronic effects and means that only a very narrow range of conformations is typically available to the ω-angle. The two dominant states are the planar cis- and trans-conformations, with the latter being almost exclusively seen for non-proline residues and both contributing for proline. In molecular mechanics force fields, these effects are typically represented via strong torsional potentials (see SC_BONDED_T and SC_EXTRA). From a sampling point of view, this means that it would be unwise to couple the sampling of such a stiff degree of freedom to any other degree of freedom. ω-moves therefore perturb nothing but the ω-angle of an individual polypeptide residue. They are technically equivalent to pivot moves in that the "free" end will swivel around, which additionally lowers acceptance rates if the perturbations are large (→ ALIGN).
To calculate the number of expected ω-moves use:
NRSTEPS · (1.0-PARTICLEFLUCFREQ) · (1.0-RIGIDFREQ) · (1.0-CHIFREQ) · (1.0-CRFREQ) · OMEGAFREQ
Note that the moves are additionally split up into those attempting to completely randomize the ω-angle and those that attempt stepwise perturbations (→ OMEGARDFREQ). It should be emphasized that the randomizing move will typically be the only way of converting between cis- and trans-conformations due to the height of the barrier separating the two. The default picking probabilities are identical for all residues with ω-type bonds. They can be adjusted with the help of the preferential sampling utility, and such adjustment could be useful in mixed systems with small molecule amides and polypeptides, where it may be beneficial to preferentially sample the polypeptide ω-bonds.

OMEGARDFREQ

This keyword is completely analogous to PIVOTRDFREQ but applies to ω-moves instead of φ/ψ-moves.

OMEGASTEPSZ

This keyword is completely analogous to PIVOTSTEPSZ but applies to ω-moves instead of φ/ψ-moves.

PKRFREQ

This keyword sets the fraction of all pivot-type polypeptide backbone moves that selectively alter the dihedral angles around the N-Cα bond in proline or similar residues. These rotations are hindered by the presence of the ring, and hence they cannot be sampled independently. Moves of this type therefore simultaneously alter the pucker state of the amino acid sidechain belonging to the chosen residue and the backbone conformation of the polypeptide (pivot-type move). These moves are analogous to sugar pucker moves for polynucleotides (see SUGARFREQ).
The expected number of polypeptide pucker moves would be:
NRSTEPS · (1.0-PARTICLEFLUCFREQ) · (1.0-RIGIDFREQ) · (1.0-CHIFREQ) · (1.0-CRFREQ) · (1.0-OMEGAFREQ) · (1.0-NUCFREQ) · PKRFREQ
Note that these moves are split up into two variants: a non-ergodic one which inverts the pucker state, and one which introduces new degrees of freedom (bond angles) but allows sampling of most of the relevant phase space (bond length changes remain quenched). This is determined by PKRRDFREQ. When analyzing high-resolution structural databases, it can be seen that proline residues occupy two dominant pucker states separated by a barrier. The non-ergodic move can jump across this barrier but is unable to explore the basin around its current position. The latter requires bond angle changes, as otherwise the problem is overconstrained. This introduction of new degrees of freedom is generally undesirable (see discussion under ANGCRFREQ) but is of small impact in this particular case since none of the bond angles along the main chain are allowed to change. This keeps the effects of bond angle changes local while allowing exploration of the continuous manifold of conformations of the five-membered ring.
The exact set of degrees of freedom used to sample the ergodic move type is explained in detail elsewhere, and an implementation reference is given in the literature. The default picking probabilities for this move type are flat for all polypeptide residues possessing ring pucker degrees of freedom. The probabilities can be adjusted by the preferential sampling utility, and this could be used to fine-tune sampling weights in polymers. For example, puckering equilibria for central residues in polyproline are expected to be both more relevant and more difficult to sample than those for terminal residues and may benefit from being sampled preferentially.

PKRRDFREQ

As pointed out above, finding arbitrary conformations of a five-membered ring while keeping all bond lengths and angles constant is an overconstrained problem (→ PKRFREQ). Therefore, CAMPARI releases the constraint on bond angle rigidity for those systems which include proline and similar polypeptide residues. This necessitates the use of bond angle potentials (see SC_BONDED_A) to keep local geometries reasonable. To sample different ring conformers effectively, CAMPARI combines a non-ergodic reflection of the pucker state (non-local) with stepwise but unbiased excursions away from the current state. This keyword regulates the fraction of pucker moves that are of the former type (reflection). The formulas listed under PKRFREQ multiplied with PKRRDFREQ and (1.0-PKRRDFREQ), respectively, give the expected numbers for either type. Note that it is typically not a good idea to set this keyword to either zero or unity. A value of unity would create an effective two-state model (with fixed bond angles), while a value of zero would make it very difficult for the gross pucker state to switch due to the barrier separating the two states (this last statement assumes typical interaction potentials).

PUCKERSTEP_DI

This keyword applies to the second type of pucker sampling (see PKRRDFREQ) and controls the maximum step size for dihedral angles in degrees for the random stepwise excursions from the current state. It simultaneously applies to the problem of sugar pucker sampling (→ SUGARFREQ). In both cases, four of the seven freely sampled degrees of freedom are dihedral angles.

PUCKERSTEP_AN

This keyword applies to the second (stepwise) type of pucker sampling (see PKRRDFREQ) and controls the maximum step size for bond angles in degrees for the random stepwise excursions from the current state. Much like PUCKERSTEP_DI, this keyword simultaneously applies to the problem of sugar pucker sampling (→ SUGARFREQ). In both cases, two of the seven freely sampled degrees of freedom are bond angles and one bond angle is derived to correctly close the loop.

NUCFREQ

This keyword controls the frequency of all types of polynucleotide moves excepting those sampling just sidechain degrees of freedom. This set includes algorithms to sample stretches of polynucleotides with end-constraints (concerted rotation → NUCCRFREQ), dedicated algorithms to sample the constrained dihedral angles around the sugar bond (→ SUGARFREQ), and simple polynucleotide backbone pivot moves. The description below applies only to the latter type which does not possess a dedicated keyword but is the default fall-through choice for this branch of the decision tree.
Non-terminal polynucleotides have six backbone degrees of freedom, one of which is not sampled by this type of move. Much like for proline, the rotation around the sugar bond is hindered, and a dedicated algorithm is needed to sample this dihedral angle (→ SUGARFREQ). An overview of the backbone degrees of freedom for terminal and non-terminal nucleotides can be gleaned from the description of sequence input. Nucleotide pivot moves are physically analogous to polypeptide φ/ψ-moves in that they sample the backbone of a single nucleotide residue. The new conformation will imply the rotation of a lever arm, which will render large-scale perturbations very unlikely to be accepted (→ ALIGN). Technically, these moves are implemented slightly differently in that the number of sampled degrees of freedom may vary (→ NRNUC). This makes it possible to fine-tune sampling efficiency. As with any move coupling the sampling of independent degrees of freedom blindly, efficiency will typically be unacceptably low for more than two backbone dihedral angles given a realistic interaction potential and the complicated topology of polynucleotides. In the future, these moves are intended to cover any type of non-polypeptide polymer, and the flexible setup was implemented partially with that in mind.
Expected numbers for all polynucleotide pivot moves may be calculated as follows:
NRSTEPS · (1.0-PARTICLEFLUCFREQ) · (1.0-RIGIDFREQ) · (1.0-CHIFREQ) · (1.0-CRFREQ) · (1.0-OMEGAFREQ) · NUCFREQ · (1.0-NUCCRFREQ) · (1.0-SUGARFREQ)
Remember that NUCFREQ does not control the fraction of polynucleotide pivot moves directly but only sets the expected number for all polynucleotide moves. Note that the moves are additionally split up into those attempting to completely randomize the nucleotide backbone angles and those that attempt stepwise perturbations (→ NUCRDFREQ). The default picking probabilities for these pivot moves are flat on a per-residue basis. They can be adjusted by the preferential sampling utility, and this could become routinely relevant in future applications in which other polymer types are subjected to pivot moves through this facility. In such a case, it would almost certainly be desirable to make the picking frequencies (at the very least) proportional to the number of backbone degrees of freedom in each residue, which need not be homogeneous across residues.

NRNUC

This keyword allows the user to set the maximum number of nucleic acid backbone angles to be sampled within a pivot polynucleotide move. The dihedral angles will always come from the same residue. The implementation has the following features:
  • Whenever NRNUC is equal to or larger than the number of backbone angles on a certain residue, all backbone angles on that residue will be sampled simultaneously.
  • Whenever NRNUC is smaller than the number of backbone angles on a certain residue, the move aims to sample NRNUC of the available angles on average. The actual average will be slightly larger, however, since at least one angle always has to be sampled (in other words, the number of angles chosen is stochastic, and the asymmetry is introduced by the constraint to always have at least one angle in the set; see the sketch below).
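To see why the realized average exceeds NRNUC, consider the following hypothetical sketch, which assumes each of the available angles is selected independently with probability NRNUC divided by the number of angles and that the draw is repeated until the set is non-empty (one plausible reading of the constraint, not necessarily CAMPARI's exact selection scheme).
```python
import random

def pick_angles(n_angles, nrnuc):
    """Select a subset of angles; redraw until the set is non-empty."""
    p = min(1.0, nrnuc / n_angles)
    while True:
        chosen = [i for i in range(n_angles) if random.random() < p]
        if chosen:
            return chosen

# With 6 backbone angles and NRNUC = 2, the naive expectation is 2 angles
# per move, but conditioning on a non-empty set pushes the average up.
samples = [len(pick_angles(6, 2)) for _ in range(100000)]
print(sum(samples) / len(samples))  # slightly above 2.0 (about 2.19)
```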

NUCRDFREQ

This keyword is completely analogous to PIVOTRDFREQ but applies to polynucleotide backbone pivot moves instead of φ/ψ-moves.

NUCSTEPSZ

This keyword is completely analogous to PIVOTSTEPSZ but applies to polynucleotide backbone pivot moves instead of φ/ψ-moves.

NUCCRFREQ

This keyword sets the fraction of exact nucleic acid concerted rotation (CR) moves amongst all nucleotide moves. Concerted rotation algorithms are provided both for polypeptides and polynucleotides and function generally analogously, although there are important implementation differences. Important general information for this type of move is provided elsewhere, along with parameters that apply to all variants of exact CR moves (such as UJCRBIAS, UJCRSTEPSZ, and UJCRWIDTH). The reader is referred to both the literature and the documentation on CR moves for polypeptides (→ CRFREQ and TORCRFREQ), in particular with regards to the interpretation of auxiliary keywords (NUCCRMIN and NUCCRMAX) and the handling of picking probabilities and their alteration by user-level constraints and preferential sampling weights.
The general idea of a concerted rotation move is to sample a stretch of polymer without changing the absolute positions and relative orientation of the termini. Six degrees of freedom are required to solve this constrained problem. Note that for nucleic acid CR moves the rotation around the sugar bond (C4*-C3*) is always excluded from the algorithm (treated as a rigid segment). The order of angles is as follows:
  1. Any number of consecutive and permissible backbone dihedral angles immediately preceding nuc_bb_4 on residue i
  2. O5P-C5*-C4*-C3* (nuc_bb_4 on residue i)
  3. C4*-C3*-O3P-P (nuc_bb_5 on residue i)
  4. C3*-O3P-P-O5P (nuc_bb_1 on residue i+1)
  5. O3P-P-O5P-C5* (nuc_bb_2 on residue i+1)
  6. P-O5P-C5*-C4* (nuc_bb_3 on residue i+1)
  7. O5P-C5*-C4*-C3* (nuc_bb_4 on residue i+1)
The first entry is the so-called pre-rotation segment, and the remaining six degrees of freedom are those for which to find a numerical solution that closes the chain exactly without changing the position of any atom from C3* on residue i+1 onward. The solution is obtained through a 1D numerical root search in an algebraically transformed set of equations. This part follows the work of Aaron Dinner exactly, and its parameters are explained in further detail below. The pre-rotation segment uses its own biasing scheme for improved efficiency based on the work by Favrin et al. Both types of biases are removed through a modified acceptance criterion. However, it must be pointed out that the reliability of the root search is nonetheless a critical component in obtaining an unbiased Markov chain observing detailed balance. If unfamiliar, the user should read all of the documentation on exact concerted rotation methods below in addition to consulting the literature.
The expected number of nucleic acid concerted rotation moves may be calculated as follows:
NRSTEPS · (1.0-PARTICLEFLUCFREQ) · (1.0-RIGIDFREQ) · (1.0-CHIFREQ) · (1.0-CRFREQ) · (1.0-OMEGAFREQ) · NUCFREQ · NUCCRFREQ
The user is reminded again that some of the parameters required for this move type apply universally to all exact CR methods while some apply specifically to the nucleic acid variant.

SUGARFREQ

This keyword sets the fraction of polynucleotide backbone moves to selectively alter the dihedral angles around the sugar bond (C4*-C3*) amongst all polynucleotide moves not of the CR variety. Exactly analogous to the case for proline and similar cyclic residues in polypeptides (→ PKRFREQ), these rotations are hindered by the presence of the ring and cannot be sampled blindly. Moves of this type will therefore alter the pucker state of the sugar belonging to the chosen nucleotide and the backbone conformation of the polynucleotide (including lever arm) simultaneously.
The expected number may be calculated as follows:
NRSTEPS · (1.0-PARTICLEFLUCFREQ) · (1.0-RIGIDFREQ) · (1.0-CHIFREQ) · (1.0-CRFREQ) · (1.0-OMEGAFREQ) · NUCFREQ · (1.0-NUCCRFREQ) · SUGARFREQ
The approach chosen to sample sugars is identical to the one for proline. There are two basic move types: one which inverts the pucker state by flipping the sign of two dihedral angles, and a second one which perturbs the bond angles and dihedral angles defining the five-membered ring by small random increments while maintaining bond lengths exactly (→ SUGARRDFREQ). The default picking probabilities for this move type are flat for all eligible, sugar-containing residues. They can be adjusted by the preferential sampling utility. An example application could be to preferentially sample sugars close to the binding interface of a well-defined protein-DNA complex rather than those in the rigid portion of the DNA.

SUGARRDFREQ

This keyword is exactly analogous to PKRRDFREQ but applies to sugar pucker moves in polynucleotides instead of to polypeptide pucker moves.

CHIFREQ

Most biologically relevant polymers possess at least minor branches off the main chain. These sidechains are typically short and usually encode the alphabet underlying, for instance, polypeptides and polynucleotides. From a technical point of view, such short branches are much easier to sample than the backbone of a polymer, since a change in conformation of the branch affects only the branch itself (lever arm effects are minimal, and the assumed direction is always from the main chain outward towards the end of the branch). Since the perturbation is local, energy evaluations are much less costly and acceptance rates generally higher. There is no need for advanced algorithms, and simple pivot-style moves re-setting or perturbing the dihedral angles in such a sidechain branch are sufficient to explore phase space. This keyword sets the fraction of all sidechain moves, including a specialized move type used for analysis only (→ PHFREQ).
Expected numbers for actual sampling moves (denoted as χ-moves) are:
NRSTEPS · (1.0-PARTICLEFLUCFREQ) · (1.0-RIGIDFREQ) · CHIFREQ · (1.0-PHFREQ)
And for moves trying to determine the pK-values of ionizable polypeptide sidechains:
NRSTEPS · (1.0-PARTICLEFLUCFREQ) · (1.0-RIGIDFREQ) · CHIFREQ · PHFREQ
Note that the former are decomposed further into those randomizing the contributing degrees of freedom and those applying stepwise perturbations (→ CHIRDFREQ). The default picking probabilities for this move type give equal weight to all residues with at least one χ-angle independent of the number of χ-angles. This can be adjusted by the preferential sampling utility, which as an example would allow making all residue picking probabilities directly proportional to the number of χ-angles for each residue.
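As a hypothetical illustration of that example, the sketch below converts per-residue χ-angle counts into normalized picking weights; the residue data and the printed layout are made up, and the actual input syntax of the preferential sampling utility is documented elsewhere.
```python
# Hypothetical chi-angle counts per residue index; the mapping and the
# output format are illustrative, not the actual CAMPARI input syntax.
n_chi = {1: 4, 2: 1, 3: 2, 4: 0, 5: 3}   # residue 4 has no chi-angles

eligible = {res: n for res, n in n_chi.items() if n > 0}
total = sum(eligible.values())

# Default behavior: equal weight per eligible residue.
# Adjusted behavior: weight proportional to the number of chi-angles.
for res, n in sorted(eligible.items()):
    print(f"residue {res}: default {1.0/len(eligible):.3f}, "
          f"proportional {n/total:.3f}")
```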

CHIRDFREQ

This keyword is completely analogous to PIVOTRDFREQ but applies to χ-moves instead of φ/ψ-moves.

CHISTEPSZ

This keyword is completely analogous to PIVOTSTEPSZ but applies to χ-moves instead of φ/ψ-moves.

NRCHI

Different sidechains have different numbers of χ-angles, and the complexity of a move depends on the number of such angles sampled concurrently. Therefore, this keyword allows the user to set the maximum number of χ-angles to be sampled within a sidechain move. The dihedral angles will always come from the same sidechain on the same residue. Analogously to NRNUC, the implementation has the following features:
  • Whenever NRCHI is equal to or larger than the number of χ-angles on a certain residue, all χ-angles on that residue will be sampled simultaneously.
  • Whenever NRCHI is smaller than the number of sidechain angles on a certain residue, the move aims to sample NRCHI of the available angles on average. The actual average will be slightly larger, however, since at least one angle always has to be sampled (in other words, the number of angles chosen is stochastic, and the asymmetry is introduced by the constraint to always have at least one angle in the set).

OTHERFREQ

MC move sets are highly specialized tools that have to reflect the choice of the system's degrees of freedom, its density, etc. Some of the choices enforced by the "standard" CAMPARI move sets and mandated by the default parameterization of the ABSINTH implicit solvent model are somewhat arbitrary. The first issue concerns degrees of freedom describing rotations around electronically hindered bonds and rotations around terminal bonds between heavy atoms (methyl and ammonium spins). For example, the amide bond in secondary amides is allowed to vary with dedicated moves, but these are not available for primary amides (the reasoning behind this is connected to the vanishing relevance of cis/trans isomerization in the latter case). However, these choices may not always be desirable. Second, when attempting to simulate entities that CAMPARI does not support natively, the majority of "standard" move types may not be available (exceptions apply if the entities are recognized as conforming to a supported biopolymer type). This would limit simulations containing such entities to pure rigid-body sampling.
To address both issues, CAMPARI offers a separate class of dihedral angle pivot moves that can be applied to any freely rotatable torsion angle in any of the system's components. There is a requirement that the Z-matrix be constructed such that only a single Z-matrix angle needs to be edited to describe the perturbation, and this is true for all candidate dihedral angles in residues supported natively by CAMPARI that are frozen by default (e.g., C-N bond in the lysine sidechain, all C-N bonds in primary amides, CA-CB bond in alanine, and so on). For unsupported residues, the Z-matrix is inferred from the input structure, and it may require some reordering of atoms to achieve the desired results (see a tutorial relevant in this context). In addition, these moves can also sample torsional degrees of freedom supported by other move sets as long as they fulfill the Z-matrix criterion (this currently excludes the polypeptide φ/ψ-angles, which are supported by the widest range of specialized move sets).
In terms of parameters, some care has to be taken that torsional potentials describing electronic effects (e.g., in primary amides) are included. Technically, moves of this type are unique in that they always sample only a single degree of freedom. Chain alignment works slightly differently for these moves. Specifically, for options 3 and 4, the number of atoms (rather than the number of residues) moving is critical in determining alignment. Also, all degrees of freedom are eligible for an inverted alignment including sidechain degrees of freedom. Even for option 3, this may consequently lead to the absence of a "base of motion" that would stay rigorously in place in the absence of rigid body moves. For option 2, CAMPARI attempts to preserve a well-defined base of motion at the C-terminus, but this may not work as expected, in particular for polynucleotides and/or very short chains.
To calculate the expected number of all moves of type OTHER, use:
NRSTEPS · (1.0-PARTICLEFLUCFREQ) · (1.0-RIGIDFREQ) · (1.0-CHIFREQ) · (1.0-CRFREQ) · (1.0-OMEGAFREQ) · (1.0-NUCFREQ) · (1.0-PKRFREQ) · OTHERFREQ
Note that these moves are additionally split up into three basic types (see OTHERUNKFREQ and OTHERNATFREQ for choosing different subsets of degrees of freedom), each of which is again split into two variants, i.e., those completely randomizing the dihedral angle and those that attempt stepwise perturbations (→ OTHERRDFREQ). The default picking probabilities for OTHER moves are different from other move types in CAMPARI, since they are identical for all eligible degrees of freedom (and not identical for all residues containing at least one eligible degree of freedom). For each subcategory of degrees of freedom, sampling weights can be adjusted individually with the preferential sampling utility. Details and examples are given for the individual subcategories.

OTHERUNKFREQ

If single dihedral angle pivot (OTHER) moves are in use, and if the simulation utilizes entities (residues, molecules) that are not natively supported by CAMPARI, this keyword allows the user to choose the bulk sampling weight for degrees of freedom in those unsupported residues. The use of unsupported residues in simulations is explained in a dedicated tutorial.
To calculate the number of expected moves acting on single dihedral angles in unsupported residues, use:
NRSTEPS · (1.0-PARTICLEFLUCFREQ) · (1.0-RIGIDFREQ) · (1.0-CHIFREQ) · (1.0-CRFREQ) · (1.0-OMEGAFREQ) · (1.0-NUCFREQ) · (1.0-PKRFREQ) · OTHERFREQ · OTHERUNKFREQ
As mentioned above, these moves are additionally split up into two subtypes, i.e., those completely randomizing the dihedral angle and those that attempt stepwise perturbations (→ OTHERRDFREQ). The default picking probabilities for OTHER moves are different from other move types in CAMPARI, since they are identical for all eligible degrees of freedom (and not identical for all residues containing at least one eligible degree of freedom). They can be adjusted at the level of individual degrees of freedom by the preferential sampling utility. As an example, this can be useful when sampling an unsupported polymer (e.g., a polyester) and greater sampling emphasis should be placed on backbone degrees of freedom.

OTHERNATFREQ

If single dihedral angle pivot (OTHER) moves are in use, and if not all OTHER moves are consumed on unsupported residues (→ OTHERUNKFREQ), this keyword allows the user to choose the bulk sampling weight amongst remaining OTHER moves for degrees of freedom that are supported natively by CAMPARI.
To calculate the number of expected moves acting on single dihedral angles natively supported, use:
NRSTEPS · (1.0-PARTICLEFLUCFREQ) · (1.0-RIGIDFREQ) · (1.0-CHIFREQ) · (1.0-CRFREQ) · (1.0-OMEGAFREQ) · (1.0-NUCFREQ) · (1.0-PKRFREQ) · OTHERFREQ · (1.0 - OTHERUNKFREQ) · OTHERNATFREQ
This keyword also controls the fraction of moves acting on dihedral angles that are frozen by default but located in residues supported natively by CAMPARI. The expected number is computed as:
NRSTEPS · (1.0-PARTICLEFLUCFREQ) · (1.0-RIGIDFREQ) · (1.0-CHIFREQ) · (1.0-CRFREQ) · (1.0-OMEGAFREQ) · (1.0-NUCFREQ) · (1.0-PKRFREQ) · OTHERFREQ · (1.0 - OTHERUNKFREQ) · (1.0 - OTHERNATFREQ)
Both subclasses are additionally split up into two subtypes, i.e., those completely randomizing the dihedral angle and those that attempt stepwise perturbations (→ OTHERRDFREQ). The default picking probabilities for OTHER moves are different from other move types in CAMPARI, since they are identical for all eligible degrees of freedom (and not identical for all residues containing at least one eligible degree of freedom). They can be adjusted at the level of individual degrees of freedom by the preferential sampling utility. For the natively supported degrees of freedom, this could be useful in order to aid sampling of backbone degrees of freedom, whereas for the natively frozen degrees of freedom it could be used to selectively enable a few of those degrees of freedom (e.g., enable flexibility of arginine sidechains, but keep suppressing the methyl spins in hydrophobic residues).
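Taken together, OTHERUNKFREQ and OTHERNATFREQ thus implement a two-level split of the OTHER move budget into three subcategories, as the following schematic sketch (with illustrative numbers) makes explicit.
```python
# Schematic two-level split of OTHER moves into the three subcategories;
# all frequency values are illustrative only, not CAMPARI defaults.
other_moves = 5000.0         # expected OTHER moves from the decision tree
OTHERUNKFREQ = 0.5
OTHERNATFREQ = 0.6

unsupported      = other_moves * OTHERUNKFREQ
native_supported = other_moves * (1.0 - OTHERUNKFREQ) * OTHERNATFREQ
native_frozen    = other_moves * (1.0 - OTHERUNKFREQ) * (1.0 - OTHERNATFREQ)

# The three subcategories exhaust the OTHER budget.
assert abs(unsupported + native_supported + native_frozen - other_moves) < 1e-6
print(unsupported, native_supported, native_frozen)
```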

OTHERRDFREQ

This keyword is completely analogous to PIVOTRDFREQ but applies to all moves of type OTHER instead of polypeptide backbone pivot moves.

OTHERSTEPSZ

This keyword is completely analogous to PIVOTSTEPSZ but applies to all moves of type OTHER instead of polypeptide backbone pivot moves.

CRFREQ

This keyword is a global frequency setting which controls an entire branch of Monte Carlo moves, all sharing the feature that they are of the concerted rotation (CR) type and apply to polypeptides. The general idea of a CR move is to sample a stretch of polymer without changing the absolute positions and relative orientation of the termini. Six degrees of freedom are required to solve this constrained problem exactly, but simpler methods exist that use more degrees of freedom to solve it approximately (→ CRMODE). The reader is referred to NUCCRFREQ for CR moves on polynucleotides.
There are four different types of CR moves for polypeptides provided in CAMPARI:
  1. Exact CR moves utilizing both bond angles and dihedral angles along the polypeptide backbone to solve the closure problem exactly given fixed end constraints: these moves are based on the work of Ulmschneider and Jorgensen (→ ANGCRFREQ). (reference)
  2. Exact CR moves utilizing φ-, ψ-, and ω-angles along the polypeptide backbone to solve the closure problem exactly given fixed end constraints: these moves are primarily based on the work of Dinner (→ TORCRFREQ and TORCROFREQ). (reference)
  3. Exact CR moves utilizing just φ- and ψ-angles along the polypeptide backbone to solve the closure problem exactly given fixed end constraints: these moves are also based on the work of Dinner (→ TORCRFREQ and TORCROFREQ).
  4. Inexact CR moves utilizing just φ- and ψ-angles along the polypeptide backbone to approximate a solution to the closure problem by linear response: these moves are based on the work of Favrin, Irbäck, and Sjunnesson (default fall-through for this branch). (references)
Note that the default picking probabilities for all CR moves (including those for polynucleotides) are chosen such that every final residue in a CR-eligible stretch of minimal length has equal probability. There are many factors influencing the eligibility; most notably, terminal residues are excluded to different extents in the four algorithms described above, and cyclic residues (proline, etc.) are only supported in the exact torsional variants (caveats apply). The minimal length aspect is also handled differently in different algorithms. Specifically, the exact torsional variants will allow dynamic shortening of the entire stretch to include the minimal number of degrees of freedom (see TORCRMIN_DJ and TORCRMIN_DO; this also holds for the analogous move type for nucleotides → NUCCRMIN). This means that more residues are eligible. Conversely, both the exact bond angle variant and the approximate torsional variant enforce the lower limit specified for the stretch length, leading to a more restrictive scenario (see UJCRMIN and CRDOF). Finally, the inexact torsional moves may also require additional buffer residues (see CRMODE). It is important to point out that this does not lead to equivalent numbers of attempted CR moves for all eligible residues. This is because eligible stretches generally overlap, leading to a bias toward sampling residues that are part of many stretches. This is particularly true if the maximum possible stretch length is large (it results from a combination of the length of the polymer and keywords UJCRMAX, TORCRMAX_DO, TORCRMAX_DJ, NUCCRMAX, and CRDOF). The default picking probabilities for moves of this general type can be adjusted via the preferential sampling utility. Note that there is a very important difference compared to the application of user-level constraints. Whereas the latter render all possible CR stretches ineligible that contain the constrained residue, the preferential sampling utility will only alter the probability for a residue being picked as the final (most C-terminal) residue in a CR stretch. This means that the residue is generally not actually constrained even if the picking weight is chosen as zero.
The general appeal of exact CR methods partially lies in the reduced complexity of energy evaluations since the move only perturbs conformation locally and large parts of the polymer (assuming sufficient length) will remain static with respect to each other. This is never true for pivot-type moves applied to residues at the center of the chain. The other aspect which makes CR moves appealing is that they introduce correlation into the MC move set (the reader is referred to Vitalis and Pappu for further reading).
To compute expected numbers, use (same numbering as above):
  1. NRSTEPS · (1.0-PARTICLEFLUCFREQ) · (1.0-RIGIDFREQ) · (1.0-CHIFREQ) · CRFREQ · ANGCRFREQ
  2. NRSTEPS · (1.0-PARTICLEFLUCFREQ) · (1.0-RIGIDFREQ) · (1.0-CHIFREQ) · CRFREQ · (1.0-ANGCRFREQ) · TORCRFREQ · TORCROFREQ
  3. NRSTEPS · (1.0-PARTICLEFLUCFREQ) · (1.0-RIGIDFREQ) · (1.0-CHIFREQ) · CRFREQ · (1.0-ANGCRFREQ) · TORCRFREQ · (1.0-TORCROFREQ)
  4. NRSTEPS · (1.0-PARTICLEFLUCFREQ) · (1.0-RIGIDFREQ) · (1.0-CHIFREQ) · CRFREQ · (1.0-ANGCRFREQ) · (1.0-TORCRFREQ)

ANGCRFREQ

This keyword selects the (sub-)fraction of Ulmschneider-Jorgensen (UJ) CR moves (see J. Chem. Phys. 118 (9), pp. 4261-4271 (2003)) according to the formulas shown above. Like any other exact CR move implemented in CAMPARI, UJ-CR moves combine two strategies for efficient conformational sampling: the approach of Favrin et al. (→ CRMODE) is used to obtain a variable-length pre-rotation that biases the end of the pre-rotation segment toward a position with a high chance of yielding at least one real solution when attempting to close the chain. The closure problem is solved exactly using a numerical root search for an algebraically transformed equation for the following six degrees of freedom:
  1. Dihedral angle C(i-2), N(i-1), Cα(i-1), C(i-1) (φ(i-1))
  2. Bond angle N(i-1), Cα(i-1), C(i-1)
  3. Dihedral angle N(i-1), Cα(i-1), C(i-1), N(i) (ψ(i-1))
  4. Bond angle Cα(i-1), C(i-1), N(i)
  5. Bond angle C(i-1), N(i), Cα(i)
  6. Dihedral angle C(i-1), N(i), Cα(i), C(i) (φ(i))
This assumes that the C-terminal constraint is defined by the "latter half" of residue i. The pre-rotation segment may be of variable length but must employ degrees of freedom immediately preceding the closure segment (→ UJCRMIN and UJCRMAX). The use of this method in general and specifically within CAMPARI is marred by the following issues:
  1. The chain closure algorithm relies on a search process to locate roots of a complicated equation, which necessitates repeated matrix operations and generates considerable computational overhead for a single UJ-CR move. This is true for all exact CR methods, and even much more so for exact torsional variants than for UJ-CR moves (→ TORCRFREQ).
  2. The inclusion of bond angles in the pre-rotation stretch is not a particularly useful extension but is required for reasons of ergodicity. Additional parameters are needed to manage this aspect properly (→ UJCRSCANG). The inclusion of bond angles in the closure segment simplifies the root search procedure by eliminating branches of the solution space and generally reducing the number of possible solutions. This makes the algorithm faster than comparable methods using dihedral angles only. However, varying bond angles causes two crucial issues:
    1. Allowing bond angles to change violates CAMPARI's typical paradigm of fixed geometry in MC calculations and therefore might invalidate some of the force field calibration done under this assumption. In general, it is very important to match the degrees of freedom chosen for the calibration phase of a force field with that for the application phase. The commonly held belief that the introduction of constraints does not alter the positions and relative weights of basins but merely influences barriers in the free energy landscape is not correct.
    2. CAMPARI currently has no way of independently sampling bond angles in Monte Carlo simulations. This means that effectively a subset of all bond angles is introduced as new degrees of freedom, for which there is no a priori justification whatsoever (in other words: selectively sampling a few bond angles makes unjustified assumptions about the remaining bond angles). It is therefore recommended to use this feature with the utmost caution until a more sound implementation surrounding it is added. Presently, it may be most suitable as part of the MC move set in hybrid runs (see DYNAMICS) employing Cartesian sampling in the dynamics portions (see CARTINT), although this approach has its own caveats.
Note that there currently is no analogous implementation of this approach for nucleic acids even though Ulmschneider and Jorgensen developed one. The only CR moves available to nucleotides are exact torsional CR moves (→ NUCCRFREQ). Finally, for details on picking probabilities, see above.

TORCRFREQ

Aside from the UJ-CR moves which employ bond angles (see ANGCRFREQ), analogous methods have been formulated that instead employ exclusively dihedral angles in both the closure and pre-rotation stretches. This keyword sets the frequency with which both subtypes of those moves occur during the simulation according to the formulas listed above. The preceding discussion has outlined the appeal of exact CR methods, and it is not repeated here. Much like Ulmschneider and Jorgensen, CAMPARI employs a hybrid scheme of biased pre-rotations according to Favrin et al. (see CRMODE) and of exact closures according to Dinner. The latter half of the algorithm is the cost-intensive one. The algebraically transformed equation requires a numerical root search, for which a modified Newton scheme outlined below is used. Typically, multiple solutions need to be found, and a careful weighting and bias-removal strategy has to be employed to choose solutions with the proper probabilities (→ TORCRMODE). These comments apply equally to exact polynucleotide CR moves (see NUCCRFREQ). For polypeptides, there are two variants available which differ in which peptide torsions are used to close the chain (described below).
Note that proline (or any other cyclic residue with constrained flexibility around any of the backbone dihedral angles) causes additional problems. In theory, one could formulate algebraic solutions which skip the proline φ-torsion. Since the number and positions of proline residues in the closure stretch are not known a priori, this appears impractical. We therefore provide a coupling to (weakly biased and simplified) pucker moves (see PKRFREQ) which will simultaneously determine and propose a new pucker state while solving the chain closure problem. This means that:
  1. Sampling of the φ-angle becomes coupled to the proline sidechain conformation (as it should be).
  2. The acceptance rate for CR moves will be significantly lower due to the extra degrees of freedom included.
  3. The sampling of the sidechain conformation will be weakly biased towards proper pucker states. In detail, some of the proposed closures will yield φ-angle values incompatible with sidechain closure and those will be discarded. For those which yield a sane φ-angle, a corresponding χ1-value is proposed with bias toward closable states. One of two free bond angles is perturbed slightly in random fashion and the last one is given by the closure as usual.
  4. Due to the above, it will be advantageous to not rely overly on CR-sampling for proline-rich systems - both for reasons of efficiency and accuracy. Conversely, it should be difficult to find a statistically significant impact of the sampler on global chain properties for polypeptides with low proline content.
Finally, for details on picking probabilities, see above.

TORCROFREQ

This keyword lets the user set the fraction, amongst exact torsional polypeptide CR moves, of moves that include ω-angles in the formulation of the closure problem. Conversely, the remaining moves will use only φ/ψ-angles to close the chain. Expected numbers for either type are listed above. In detail, the ω-variant uses the following six degrees of freedom:
  1. Dihedral angle Cα(i-2), C(i-2), N(i-1), Cα(i-1) (ω(i-1))
  2. Dihedral angle C(i-2), N(i-1), Cα(i-1), C(i-1) (φ(i-1))
  3. Dihedral angle N(i-1), Cα(i-1), C(i-1), N(i) (ψ(i-1))
  4. Dihedral angle Cα(i-1), C(i-1), N(i), Cα(i) (ω(i))
  5. Dihedral angle C(i-1), N(i), Cα(i), C(i) (φ(i))
  6. Dihedral angle N(i), Cα(i), C(i), N(i+1) (ψ(i))
This implies that the carbonyl or carboxylate carbon atom of residue i with its three covalent bonding partners defines the C-terminal constraint. The pre-rotation segment consists of a certain number of torsional degrees of freedom (including ω-angles) immediately preceding the closure segment.
Conversely, for the non-ω-variant we have:
  1. Dihedral angle C(i-3), N(i-2), Cα(i-2), C(i-2) (φ(i-2))
  2. Dihedral angle N(i-2), Cα(i-2), C(i-2), N(i-1) (ψ(i-2))
  3. Dihedral angle C(i-2), N(i-1), Cα(i-1), C(i-1) (φ(i-1))
  4. Dihedral angle N(i-1), Cα(i-1), C(i-1), N(i) (ψ(i-1))
  5. Dihedral angle C(i-1), N(i), Cα(i), C(i) (φ(i))
  6. Dihedral angle N(i), Cα(i), C(i), N(i+1) (ψ(i))
Again, this implies that the carbonyl or carboxylate carbon atom of residue i with its three covalent bonding partners defines the C-terminal constraint, and the pre-rotation segment consists of a certain number of torsional degrees of freedom (including ω-angles) immediately preceding the closure segment. The minimal peptide needed for these moves to be applicable therefore differs between the two variants: for those including ω-bond sampling, the minimal peptide is an uncapped peptide of three amino acids, where the first ψ-angle serves as the pre-rotation segment, and the ω-, φ-, and ψ-angles of residues two and three serve to close the chain. For those not including ω-sampling, the minimal peptide is an N-capped peptide of three amino acids in length. Here, the ω-angle of the first amino acid residue serves as the pre-rotation segment, and the φ- and ψ-angles of the three peptide residues serve to close the chain. For details on picking probabilities, see above.
The need for different implementations arises because the two closure problems differ algebraically and because the stiffness of the ω-bond may make moves using ω-bonds in the closure particularly ineffective. This is not the only reason, however, to favor the non-ω-variant, which is also better-behaved in terms of reliably finding solutions to the closure. Note that several diagnostics of the performance of exact CR methods are reported during the simulation and after its completion in the log-file.

CRMODE

This keyword defines the mode to use for concerted rotation moves roughly according to the Favrin et al. reference (J. Chem. Phys. 114 (18), 8154-8158 (2001)). In general, this type of move attempts to introduce correlation into an MC move by coupling several consecutive backbone angles (only φ/ψ are considered) together to minimize a cost function, which in this case is the displacement of the last atom in the stretch from its original position. Larger biases lead to smaller moves and higher acceptance. More often than not, this algorithm suffers from its computational inefficiency: because the loop is only approximately closed, energy evaluations of high complexity (even more expensive than for a pivot move) are necessary. It is not recommended to use moves of this type extensively.
There are two modes available:
  1. A matrix relating changes in the degrees of freedom to changes in the cost function (dr/dφ) is computed by considering effective lever arms. In this implementation six effective restraints are imposed through the three reference atoms (N, Cα, C) on the residue following the last one of those whose torsions are sampled (note, though, that algorithmically all nine Cartesian positions are used). Note that this mode therefore requires an additional buffer residue at the C-terminus. Specifically, sampling is possible only within an interval from the third residue (in addition to the ineligible terminal residues, there is a symmetry-creating N-terminal buffer residue as well) to the third last residue in each polypeptide chain. In that sense, these moves are trivially non-ergodic since they fail to sample a subset of the chosen degrees of freedom (i.e., those within terminal residues).
  2. The dr/dφ matrix is computed by nested rotation matrices (propagating changes via matrix multiplication). This directly accounts for peptide geometry within the reference atoms and yields six actual restraints. Here, the reference atoms are Cα, C, and O on the last residue of which torsions are to be sampled. The implementation with nested rotation matrices is costlier and this mode is only marginally supported, i.e., offers very limited adaptability through the keywords below.
These inexact CR moves are supported exclusively for polypeptides and sample only their φ/ψ-angles. For details on picking probabilities, see above.
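For orientation, a single column of a lever-arm-based dr/dφ matrix of the kind used in the first mode can be written as the cross product of the unit rotation axis with the vector from a point on that axis to the reference atom. The sketch below shows only this generic construction; the atom choices and bookkeeping of the actual implementation differ.
```python
import numpy as np

def lever_arm_column(axis_point, axis_dir, ref_atom):
    """d(reference position)/d(phi) for rotating ref_atom about an axis."""
    u = axis_dir / np.linalg.norm(axis_dir)    # unit rotation axis
    return np.cross(u, ref_atom - axis_point)  # effective lever arm

# One column of the dr/dphi matrix: reference atom at (3, 1, 0) rotating
# about the z-axis through the origin (all coordinates schematic).
col = lever_arm_column(np.zeros(3), np.array([0.0, 0.0, 1.0]),
                       np.array([3.0, 1.0, 0.0]))
print(col)   # -> [-1.  3.  0.]
```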

CRDOF

If inexact concerted rotation moves for polypeptides are in use (→ CRMODE), this keyword allows the user to specify the exact number of torsions to use each time such a move is performed. The default value is eight, but a different number may be chosen as long as the chain is long enough to accommodate these moves. A minimum of seven degrees of freedom applies, since the linear equations are otherwise overdetermined and only trivial solutions are (asymptotically) found. Note that this keyword is only supported if CRMODE is set to 1. Extensions to support mode 2 or to allow random, variable lengths during the simulation are currently not anticipated. This is due to the overall inefficiency of the Favrin et al. approach (see discussion here).

CRWIDTH

This keyword gives the standard deviation in radians of the random normal distribution underlying inexact concerted rotation moves for polypeptides (→ CRMODE), from which the (unbiased) displacement vectors are implicitly drawn. This corresponds to parameter "a" in the reference but is specified here as its inverse (a = 1/CRWIDTH). Note that the actual resultant distribution width is only set by this keyword if the bias toward minimizing the cost function is zero. If the latter is non-zero, the resultant distribution width will be co-controlled by the setting for CRBIAS. Note that only values up to π/2 may be specified to avoid wrap-around artifacts, which could upset the procedure of removing the bias from these moves.

CRBIAS

This keyword specifies the strength of the bias for inexact concerted rotation moves for polypeptides (→ CRMODE) and corresponds to parameter "b" in the reference. It essentially controls how close the end of the rotated segment will end up to its original position (satisfying the restraints). Unfortunately, this also co-regulates the step size, hence the need for parameter optimization (i.e., the variance of the resultant biased distribution cannot be controlled easily). Intuitively, the reason is that - in a linear response-type theory - tiny step sizes always represent one way of satisfying the restraints. Note that with a choice of zero for this keyword, these inexact CR moves relax to random pivot moves of multiple residues in a row (→ CRDOF) with a sampling width controlled by CRWIDTH. Conversely, when choosing very large numbers for this keyword, it should be kept in mind that the evaluation of the acceptance criterion requires inclusion of an exponential factor, exp[−(Δφ^T·A·Δφ) + (Δφ'^T·A'·Δφ')]. Here, the primed quantities are for the reverse move. Matrix A is diagonal if this keyword is set to zero, which implies A = A', and the bias correction is unity. For large values of CRBIAS, the two terms within the exponential become disparate in magnitude very quickly, and the exponential may exceed numerical limits even for double-precision variables. This may cause some compilers to throw exceptions. Note that the complete bias correction formula includes the determinant of matrix A as well.
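A sketch of evaluating this factor in log space (which sidesteps the overflow issue just mentioned) is shown below. The construction of A from a schematic quadratic form a·I + b·GᵀG is an assumption for illustration, not the exact recipe of the Favrin et al. scheme, and the determinant contribution mentioned above is noted but omitted.
```python
import numpy as np

def cr_bias_log_factor(dphi, A, dphi_rev, A_rev):
    """Log of the exponential factor exp[-(dphi^T A dphi) + (dphi'^T A' dphi')]
    from the text; primed (_rev) quantities belong to the reverse move. The
    complete correction also involves det(A), which is omitted here."""
    return -(dphi @ A @ dphi) + (dphi_rev @ A_rev @ dphi_rev)

# Schematic usage for an 8-torsion move (CRDOF = 8); A = a*I + b*G^T G is an
# assumed, illustrative form of the bias matrix, not the exact construction.
rng = np.random.default_rng(1)
a, b = 1.0, 10.0
G = rng.normal(size=(6, 8))              # 6 restraints, 8 torsions
A = a * np.eye(8) + b * (G.T @ G)
dphi = rng.normal(scale=0.05, size=8)    # forward displacement
dphi_rev = rng.normal(scale=0.05, size=8)
# Combine in log space with the energy difference before exponentiating once.
print(cr_bias_log_factor(dphi, A, dphi_rev, A))
```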

UJCRBIAS

Despite its name, this keyword regulates the biasing strength for the pre-rotation steps in all exact CR methods, i.e., nucleic acid CR moves, UJ-CR moves, and both types of exact polypeptide CR moves (→ ANGCRFREQ, TORCRFREQ, and NUCCRFREQ). The strength of the bias controls how close the end of the pre-rotation segment remains to its original position, thereby improving the chances for successful closure. This parameter is strongly co-dependent with the default distribution width in the absence of any bias (→ UJCRWIDTH). It is analogous to CRBIAS in the Favrin et al. scheme and is called "c2" in the UJ reference. It should be stressed that all caveats outlined above apply here as well.

UJCRWIDTH

Despite its name, this keyword regulates the general (in the absence of bias) width of the distribution (in degrees) sampled in the pre-rotation segment for all exact CR methods (→ ANGCRFREQ, TORCRFREQ, and NUCCRFREQ). As in the Favrin et al. scheme (which is practically embedded in all exact CR methods implemented in CAMPARI), the resultant width is co-dependent on the bias factor (see UJCRBIAS and for comparison: CRBIAS and CRWIDTH). It corresponds to "1/c1" in the UJ reference and therefore larger values give wider distributions.

UJCRSTEPSZ

The chain closure algorithm works in most exact CR implementations by reducing a multi-dimensional variable search to a 1D root search, which is then solved by some form of step-through protocol and subsequent bisection. This keyword allows the user to choose the step size in degrees for that root search in all exact CR methods. Currently, the UJ-CR method (→ ANGCRFREQ) uses a simple, non-adaptive stepping protocol (see also UJCRINTERVAL). Larger step sizes there increase the speed of the algorithm significantly, but also increase the fraction of attempts in which no solution is found at all (a quantity reported at the end of the log-file). The value recommended by the authors is 0.05°. Conversely, the exact torsional CR methods for both polypeptides and polynucleotides (→ TORCRFREQ and NUCCRFREQ) employ a modified Newton scheme to map out the complete solution space in three hierarchical steps. In those cases, this keyword merely defines the largest step size ever to be used (i.e., if target function and derivative indicate that no root is near, the step size is not adjusted to very large values but instead capped at the value given by this keyword). For these methods, a setting of around 1.0 appears much more appropriate. In the future, the implementation of the UJ-CR method may be adjusted to use the same protocol as the torsional methods. For clarity, it shall be repeated that this keyword applies to all exact CR methods (but is inapplicable to inexact CR moves: → CRMODE). It is very important to understand that the numerical root search will invariably be somewhat unreliable, i.e., there are conformations for which the target function approaches zero asymptotically while also approaching imaginary solution space. This implies that with such a technique it will be nearly impossible to eliminate all biases rigorously, although it will be possible to reduce their amplitude below that of statistical noise, even with settings that provide satisfactory computational efficiency (which of course is a crucial element to consider for expensive algorithms such as exact CR methods).
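A generic step-through-plus-bisection protocol of the kind described here can be sketched as follows (purely illustrative, not the actual CAMPARI routine). Note how roots that do not produce a sign change between neighboring grid points are missed entirely, which is the failure mode that grows with larger step sizes.
```python
import math

def scan_roots(f, lo, hi, step, tol=1e-10):
    """Step through [lo, hi] with a fixed step size and refine every detected
    sign change by bisection. Roots between two grid points that do not
    produce a sign change are missed entirely."""
    roots = []
    x0, f0 = lo, f(lo)
    while x0 < hi:
        x1 = min(x0 + step, hi)
        f1 = f(x1)
        if f0 * f1 < 0.0:               # bracketed a root -> bisect
            a, b = x0, x1
            while b - a > tol:
                m = 0.5 * (a + b)
                if f(a) * f(m) <= 0.0:
                    b = m
                else:
                    a = m
            roots.append(0.5 * (a + b))
        x0, f0 = x1, f1
    return roots

print(scan_roots(math.cos, 0.0, 10.0, 0.05))  # roots near pi/2, 3pi/2, 5pi/2
```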

UJCRMIN

Specifically for the bond angle-based Ulmschneider-Jorgensen algorithm (→ ANGCRFREQ), this specifies the minimum requested length (in terms of number of residues) for the pre-rotation segment in the implementation. Note that if no molecule in the system is at least UJCRMIN+4 residues long (two for closure, two terminal buffer residues that can be caps), CR moves will be disabled entirely. Due to the problems outlined above, this suboptimal implementation has not yet been improved. Note that UJCRMIN and UJCRMAX are analogous to keywords TORCRMIN_DO and TORCRMAX_DO, but use residue numbers instead of numbers of degrees of freedom. Another restriction is that - unlike for TORCRMIN_DO and analogous keywords - UJCRMIN is enforced strictly, i.e., candidate residues are only those that provide the correct padding on either side (for the exact, torsional variants, the specified minimum padding is generally adjusted to the absolute minimum for stretches that would otherwise be too short). Therefore, the implementation of the angular UJ-CR moves generally offers less flexibility.

UJCRMAX

Specifically for the bond angle-based Ulmschneider-Jorgensen algorithm (→ ANGCRFREQ), this keyword specifies the maximum requested length (in numbers of residues) for the pre-rotation segment in those moves. Note that this parameter is automatically reduced if a move is attempted for a molecule which is too short to allow the full range of segment lengths (but long enough to satisfy UJCRMIN of course). This will make it difficult to predict the resultant distribution of pre-rotation segment lengths (compare TORCRMIN_DO).

UJCRINTERVAL

Specifically for the bond angle-based Ulmschneider-Jorgensen algorithm (→ ANGCRFREQ), this keyword lets the user choose the size of the search interval for the one-dimensional root search (see UJCRSTEPSZ). The algebraically isolated degree of freedom is scanned over the interval [φ-d;φ+d], where φ is the original value and d is the (half-)interval size specified by this keyword. The recommended value is 20.0°. Note that this implementation is unique to the bond angle UJ-CR method and offers much reduced overhead cost per CR move compared to the exhaustive search performed by exact torsional methods. The efficiency and justifiability of the method both rely on the crucial assumption that - given a typical pre-rotation - approximately one solution will be found in the scanned interval. If the number of solutions is often zero or larger than one, the algorithm violates detailed balance and the resultant distributions will be strongly biased. It is generally recommended to analyze the performance of the algorithm beforehand by checking for proper Boltzmann weights in the distributions of both torsional and angular degrees of freedom. This is most easily and meaningfully done employing only bond angle potentials (→ SC_BONDED_A) but no other terms in the Hamiltonian. Then, the distributions of the dihedral angles must be flat, and those for the angular degrees of freedom must be such that −kBT·ln(p(α)) equals the bond angle potential acting on α.
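Such a check could look like the following sketch: histogram a sampled bond angle, convert to a potential of mean force via −kB·T·ln p(α), and compare against the bond angle potential in use (the harmonic form and all numbers here are assumptions for illustration, using synthetic samples rather than simulation output).
```python
import numpy as np

kB = 0.0019872041          # kcal/(mol*K); value used here for illustration
T = 300.0

def pmf_from_samples(angles_deg, nbins=72):
    """Potential of mean force -kB*T*ln p(alpha) from sampled bond angles."""
    hist, edges = np.histogram(angles_deg, bins=nbins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mask = hist > 0
    pmf = -kB * T * np.log(hist[mask])
    return centers[mask], pmf - pmf.min()    # shift the minimum to zero

# Synthetic test data: sampling a harmonic bond angle potential
# U(alpha) = 0.5*k*(alpha - alpha0)^2 yields a Gaussian distribution.
k, alpha0 = 0.1, 111.0     # illustrative force constant and equilibrium angle
rng = np.random.default_rng(7)
samples = rng.normal(alpha0, np.sqrt(kB * T / k), size=200_000)
centers, pmf = pmf_from_samples(samples)
# pmf should track 0.5*k*(centers - alpha0)**2 up to statistical noise.
print(np.max(np.abs(pmf - 0.5 * k * (centers - alpha0) ** 2)))
```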

UJCRSCANG

This keyword applies exclusively to the bond angle-based Ulmschneider-Jorgensen CR algorithm for polypeptides (→ ANGCRFREQ). It lets the user set a scaling factor to reduce the magnitude of pre-rotation perturbations of bond angle degrees of freedom (in the absence of pre-rotation bias, the resultant width will be proportional to UJCRWIDTH·UJCRSCANG → values less than unity are desirable). Large perturbations on those bond angles would reduce the efficacy of the method considerably due to the stiff potentials typically used to keep bond angles in the valid regimes. Note that the UJ-CR method never considers ω-angles for conformational sampling and that they are consequently excluded from pre-rotation sampling in their entirety. This is a bit of an arbitrary choice - in particular when considering the problems introduced by the bond angle sampling in the first place (discussion here) - and remedied in exact but purely torsional CR methods (→ TORCRFREQ). The parameter specified here corresponds to "1/c3" in the UJ reference.

TORCRMODE

Unlike standard MC moves (such as φ/ψ-pivot moves), exact CR methods do not constitute an ergodic move set beyond the subspace satisfying the constraint (which is of course invariant toward sampling on that manifold). This necessitates mixing exact CR moves with other types of moves to achieve sampling of the entire phase space. Moreover, they solve an analytical problem numerically with finite error rate, i.e., not all solutions are always found. If these errors are dependent on the "position" of the constraint, i.e., on polymer conformation, the resultant sampling is biased even though Jacobian corrections are applied. This small bias is nearly impossible to remove entirely. CAMPARI supports two implementations for exact, torsional CR methods:
  • When set to 1, at each step, a superset of solutions is created containing the original solution, a set of alternative closures given the original pre-rotation state, and a set of new conformations with a given, altered pre-rotation state and a set of closures for that altered state. For each solution, the Jacobian determinants with respect to the closure constraint and the pre-rotation constraint are evaluated, multiplied, and a solution is picked using the net Jacobian as a weight factor. The chosen move is then evaluated via the acceptance criterion given the additional bias correction of evaluating the randomness of the pre-rotation move forward and backward as in the Favrin et al. scheme. In the absence of any pre-rotation bias, this algorithm is conceptually rejection-free. It also (in theory) satisfies detailed balance on account of the construction of the solution superset.
  • When set to 2, at each step, a finite number of trials (see UJMAXTRIES) of pre-rotations according to the Favrin et al. scheme is performed. Closure is attempted and in case solutions are found, the possible closures along with the sampled pre-rotation constitute the set of possible moves. A random one is chosen (uniform probability) and the new conformation is evaluated via Metropolis with the Jacobian corrections for the proposed vs. the current state (with respect to both types of constraints) and the randomness correction for the pre-rotation step. Because solutions only need to be found given the pre-rotation, this algorithm is usually twice as fast as the one above given sane pre-rotation settings. This implementation does not satisfy detailed balance even in theory but attempts to remain globally balanced.
Note that this keyword applies to all exact, torsional CR methods, i.e., to torsional polypeptide CR (→ TORCRFREQ and TORCROFREQ) as well as to nucleic acid CR (→ NUCCRFREQ) but that it is not (yet) supported for the Ulmschneider-Jorgensen bond-angle CR algorithm (→ ANGCRFREQ) and inapplicable to the Favrin et al. scheme (→ CRMODE).
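In schematic pseudocode, the first implementation could look roughly like the sketch below; all callables are placeholders invented for illustration and do not correspond to actual CAMPARI routines.
```python
import random

def torcr_mode1_step(state, propose_prerotation, solve_closures,
                     net_jacobian, accept):
    """Schematic outline of TORCRMODE 1: assemble a superset of solutions,
    pick one with its net Jacobian (closure Jacobian times pre-rotation
    Jacobian) as the weight, then apply the acceptance criterion including
    the pre-rotation randomness correction. All callables are placeholders."""
    new_pre = propose_prerotation(state)    # biased pre-rotation proposal
    superset = [state]                      # the original solution
    superset += solve_closures(state)       # alternative closures, old pre-rotation
    superset += solve_closures(new_pre)     # closures for the altered pre-rotation
    weights = [net_jacobian(s) for s in superset]
    candidate = random.choices(superset, weights=weights, k=1)[0]
    return candidate if accept(state, candidate) else state
```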

TORCRMIN_DO

This specifies the minimum requested number of degrees of freedom for the pre-rotation segment for exact CR moves for polypeptides utilizing ω-angles during closure (→ TORCRFREQ). Note that this minimum number is not rigorously enforced but will be ignored if closure residues too close to the N-terminus are used. This is done in the interest of generality and to prevent the code from disabling these types of moves frequently. It is therefore not as straightforward as one may think to compute the expected distribution of pre-rotation segment lengths (and which residues are part of them with what probability) for each polypeptide. Note that here numbers of degrees of freedom are specified whereas for the bond angle UJ method, numbers of residues are specified (→ UJCRMIN).

TORCRMAX_DO

This specifies the maximum requested number of degrees of freedom for the pre-rotation segment for exact CR moves for polypeptides utilizing ω-angles during closure (→ TORCRFREQ). Note that this maximum number is in fact a rigorous upper limit and never exceeded but that the length of some polypeptides in the system may be such that it is never realizable. In the latter case, there will be an additional complication in predicting the resultant distribution of pre-rotation segment lengths (see TORCRMIN_DO as well).

TORCRMIN_DJ

This keyword is exactly analogous to TORCRMIN_DO but applies to exact CR moves for polypeptides without using ω-angles in the closure.

TORCRMAX_DJ

This keyword is exactly analogous to TORCRMAX_DO but applies to exact CR moves for polypeptides without using ω-angles in the closure.

TORCRSCOME

This parameter is analogous to UJCRSCANG and scales down the magnitude of the step-size for ω-bonds in the pre-rotation segment of exact torsional CR methods for polypeptides. Since stiff torsional potentials usually act on ω-bonds (→ OMEGAFREQ), the likelihood of obtaining rejected moves mostly on account of excursions of the ω-angle is high. This unwanted behavior may be alleviated by employing small values for TORCRSCOME. Remember, however, that the pre-rotation step size will often be relatively small in general.

UJMAXTRIES

Despite its name, this keyword regulates the maximum number of pre-rotation sampling events to consider in exact, torsional CR methods with TORCRMODE set to 2. If no solution is found within UJMAXTRIES attempts, the move is counted as rejected. Naturally, detailed balance is maintained only if at least one solution is always found for the new pre-rotation (i.e., if this keyword is rendered obsolete). As alluded to above, this is never the case for the entirety of a simulation, and it is difficult to predict which setting would then best preserve global balance. The main utility of this keyword, however, lies in different sampling applications, e.g., in the efficient and exhaustive sampling of different loop conformations given a fixed constraint.

NUCCRMIN

This keyword is analogous to TORCRMIN_DO but applies to exact CR moves for polynucleotides. Note that the sugar bond (C3*-C4*) is always excluded from pre-rotation sampling.

NUCCRMAX

This keyword is analogous to TORCRMAX_DO but applies to exact CR moves for polynucleotides. Note that the sugar bond (C3*-C4*) is always excluded from pre-rotation sampling.

PHFREQ

This keyword sets the frequency, out of all sidechain moves (see CHIFREQ), of (de)ionization MC moves. These moves will be turned off automatically in case there are no titratable residues in the system (currently, only the polypeptide residues D, E, R, K, and H (use the neutral form) are supported). Note that these are pseudo-MC moves, i.e., they do not interface with the rest of the MC code in the usual way. This means that the guidance criterion for accepting / rejecting titration moves is based on a distinct and simplified energy evaluation which has no impact on the actual Markov chain. These moves therefore analyze (on-the-fly) an independently generated Markov chain (using whatever Hamiltonian was specified) but do not perturb the conformational ensemble generated by said Markov chain in any way. This essentially corresponds to the assumption that the generated ensemble is independent of titration states - an assumption which is always wrong but may - in certain circumstances such as extreme denaturing conditions - nonetheless be justified. These moves rely on environmental settings (PH and IONICSTR) and are required for obtaining output in PHTIT.dat. The default picking probabilities for ionizable residues are flat and cannot be altered.
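For orientation only, a textbook constant-pH guidance criterion of the kind alluded to here couples a simplified energy change to the deviation of the pH from a reference pKa. The sketch below shows that generic form; CAMPARI's actual criterion (including how PH and IONICSTR enter) may differ.
```python
import math, random

def titration_accept(dE, d_nprot, pH, pKa, T=300.0):
    """Generic constant-pH Metropolis criterion, for illustration only.
    dE      - simplified energy change of the (de)ionization (kcal/mol)
    d_nprot - change in number of bound protons (+1 protonation,
              -1 deprotonation)
    """
    kBT = 0.0019872041 * T                 # kcal/mol at temperature T
    dG = dE + kBT * math.log(10.0) * d_nprot * (pH - pKa)
    if dG <= 0.0:
        return True
    return random.random() < math.exp(-dG / kBT)

# Example: attempted protonation of a histidine-like site (pKa ~ 6.0) at pH 7.4.
print(titration_accept(dE=0.5, d_nprot=+1, pH=7.4, pKa=6.0))
```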

FRZFILE

This keyword specifies the name and location (full or relative path) of the input file for the selection of molecules or residues for which selected degrees of freedom are to be excluded from sampling, by explicit removal from Monte Carlo sampling lists and/or by not integrating the equations of motion for them. This means that only degrees of freedom that are in fact explicit degrees of freedom of the sampling scheme in use (see DYNAMICS and CARTINT) can be constrained. If this keyword is not present, no constraints will be used beyond the system-imposed ones, which may be sampler-dependent. Note that restricting the Monte Carlo move set defines effective constraints not covered here. In Cartesian space, explicit constraints on the x, y, and z coordinates of selected atoms are possible. However, indirect geometric constraints are also supported (differently and independently via SHAKESET).
The input for explicit constraints is described in detail elsewhere. Hard constraints may be necessary for specialized applications, for example when one attempts to just re-equilibrate the sidechains in a folded protein while leaving the fold intact. In general, it will be possible to use restraints (see for example SC_TOR or SC_DREST) as alternatives. Those allow the selected degrees of freedom to respond and fluctuate around a stable equilibrium position.
Note that constraint requests are not entirely arbitrary, and that the level of control being offered depends on the sampling engine. It is not possible - for instance - to constrain just one out of several χ-angles in a protein sidechain in Monte Carlo simulations. In general, custom constraints in combination with a hybrid sampling approach may prove challenging when trying to match the sampled sets of degrees of freedom between Monte Carlo and dynamics segments. Furthermore, introducing constraints may prohibit certain MC samplers from being applied not just to the residues carrying the constraints but to surrounding ones as well (such as concerted rotation methods → CRFREQ) due to underlying and conflicting assumptions. Lastly, CAMPARI will exit with an error if user-selected constraints deplete the sampling list for a given move type entirely. In this case, the user is asked to explicitly adjust the move set, since otherwise these moves would have to be converted to another, not necessarily desirable type (note that this conversion still happens if moves are requested that the system simply does not support).

FRZREPORT

If constraints are used (→ FRZFILE) in torsional space simulations, this keyword acts as a simple logical controlling whether or not to write a summary of the constraints in the system to the log-file.

SKIPFRZ

If constraints are used (→ FRZFILE) in torsional space simulations, this keyword gives the user control over the calculation of effectively frozen interactions due to constraints. In Monte Carlo simulations (see DYNAMICS), incremental energies are computed by only considering the parts of the system that move relative to one another. This automatically addresses constraints. Conversely, in dynamics the total system energy and forces are calculated at each step. If this keyword is set, interactions between parts which have no chance of moving relative to one another (relative orientation completely constrained) will no longer be considered. Note that the potentials rigorously have to be at most pairwise decomposable for this option to be available (e.g., the polar term in the ABSINTH implicit solvation model is not strictly pairwise decomposable; → SC_IMPSOLV and SCRMODEL). Usage of this keyword can significantly accelerate dynamics or minimization runs in heavily constrained systems (such as ligand optimizations within a rigid protein binding site). Note that any reported energies do not contain the frozen contributions either if this option is chosen.
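For a heavily constrained run, a key-file excerpt combining these keywords might look as follows (the path is a placeholder; the format of the constraints input file itself is described elsewhere):
  FMCSC_FRZFILE /path/to/constraints.in
  # write a summary of the resultant constraints to the log-file
  FMCSC_FRZREPORT 1
  # skip pairwise interactions between completely frozen parts
  FMCSC_SKIPFRZ 1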

PSWFILE

This keyword specifies name and location (full or relative path) of an optional input file parsed to alter the default picking probabilities for all types of moves in CAMPARI at most down to the residue level (but not further). In general, the idea of preferential sampling rests on the realization that any ergodic and unbiased move set is theoretically capable of producing a Markov chain yielding the correct phase space distribution. This means that the sampling weights given to degrees of freedom of the system need not be equivalent, but rather can be chosen arbitrarily (as long as a choice of zero somewhere does not eliminate ergodicity). Of course, the convergence properties of a Monte Carlo simulation are an exceptionally complicated function of the move set, and therefore deviation from default choices should be properly justified. Examples have been listed above, e.g. in the discussion of sidechain sampling.
CAMPARI generally allows the preferential sampling facility to overlap with user-level constraints. Constraints are applied first, and then picking probabilities are altered. In the process, it is possible to effectively introduce additional constraints on account of setting selected sampling weights to zero. This is tolerated as long as it does not deplete the pool for a class of moves entirely. In such a case, the program terminates with an error. There is a notable difference between zero sampling weights and constraint requests for concerted rotation moves of polymers (described elsewhere). Note that it is not possible to control frequencies in ways that would lead to incorrect sampling. In particular, it is impossible to control picking probabilities for particle permutation moves, and particle insertion and deletion moves can only be controlled down to the molecule type level. Rigid-body moves are generally limited to the scope of molecules, not residues. The format of the input file is described elsewhere.

PSWREPORT

If the default picking probabilities are altered (→ PSWFILE) in torsional space Monte Carlo simulations, this keyword acts as a simple logical controlling whether or not to write a summary of the resultant picking frequencies to the log-file for every move type that is active and has been modified.



Files and Directories:


(back to top)

Preamble (this is not a keyword)

In general, files and directories should be provided using absolute paths. This is often advantageous in deployment-based computing where relative directory structures and/or shortcuts may change or not exist. However, CAMPARI may fail to read strings longer than 200 characters, which leads to truncation and subsequent failure; this should be kept in mind. Also, this section is merely a list of the auxiliary files potentially required by CAMPARI. The functionalities themselves (including the file formats) are usually explained elsewhere.


BASENAME

This keyword allows the user to pick a name for the simulation/system that is going to be used in the names of all structural output files. All other output files produced by CAMPARI use generic names, however, and will be overwritten if simulations are repeatedly run in the same directory.

SEQFILE

This is the most important input file as it instructs CAMPARI which system to simulate. Its format and possible entries are explained in detail elsewhere.

SEQREPORT

This keyword is a simple logical (specifying 1 means "true", everything else means "false") that controls whether CAMPARI initially writes out a summary of some of the system's features. In detail, it will provide an overview of the identified molecule types, the number of molecules of each type present, the first instance of each, and their high-level suitability for performing CAMPARI-internal analyses. The latter would, for example, report that urea molecules are not suitable for peptide-centric analyses such as secondary structure analysis.

ANGRPFILE

See below.

BBSEGFILE

This keyword lets the user choose an input file containing a map annotating φ/ψ-space for polypeptides with canonical secondary structure regions. This mapping is used to perform segment-based analyses of polypeptide secondary structure. CAMPARI provides two such files already (in the data/ subdirectory). These and the files' format are explained in detail elsewhere.

GRIDDIR

This keyword sets the directory CAMPARI browses to find input files for grid-assisted sampling (see above). CAMPARI provides by default sample input files in $CAMPARI_HOME/data/grids/. The code assumes filenames to follow a systematic naming convention "xyz_grid.dat", where xyz is the lower-case, three-letter code of the standard 20 amino acids.
This functionality is de facto obsolete and should not be used. It may be removed entirely in the future.

TORFILE

 See below.

POLYFILE

 See below.

TABCODEFILE

 See below.

TABPOTFILE

 See below.

TABTANGFILE

 See below.

REFILE

 See below.

PCCODEFILE

 See below.

SAVATOMFILE

 See below.

ALIGNFILE

 See below.

TRAJIDXFILE

 See below.

FRAMESFILE

 See below.

CFILE

 See below.

TRAJBREAKSFILE

 See below.

DRESTFILE

 See below.

FEGFILE

This keyword lets the user specify name and location of the input file from which CAMPARI extracts which residues and/or molecules to subject to scaled interaction potentials with the rest of the system in free energy growth (ghosting) calculations.

BIOTYPEPATCHFILE

 See above.

MPATCHFILE

 See above.

RPATCHFILE

 See above.

BPATCHFILE

 See below.

LJPATCHFILE

 See below.

CPATCHFILE

 See below.

FOSPATCHFILE

 See below.

SAVPATCHFILE

 See below.

ASRPATCHFILE

 See below.

NCPATCHFILE

 See below.

FRZFILE

 See above (note that this is not just relevant in Monte Carlo simulations).

PSWFILE

 See above.

SHAKEFILE

 See above.

TRACEFILE

 See below.

PARTICLEFLUCFILE

This keyword is relevant only when ENSEMBLE is set to either 5 or 6 (ensembles with fluctuating particle numbers). It provides the location of the file that specifies the particle types that are allowed to fluctuate, the numbers of particles of those types to initially include in the system, and the chemical potentials of each fluctuating particle type (see here).

WL_GINITFILE

 See above.



Structure Input and Manipulation:


(back to top)

RANDOMIZE

This keyword determines the randomization aspects of initial structure generation. The possible degrees of freedom being randomized are the backbone dihedral angles of flexible chains and the rigid-body coordinates of the various molecules. If one of the excluded volume potentials (see SC_IPP and SC_WCA) is in use, this weakly biases the system toward conformations with little or no steric overlap; otherwise the configurations are completely random. The only exception to this are possible boundary potentials that restrict the randomization of rigid-body coordinates. The excluded volume bias is heuristic, however, and does not occur in any meaningful quantitative fashion. It utilizes a finite number of attempts for each degree of freedom (see RANDOMATTS) and applies a universal energy threshold criterion (see RANDOMTHRESH). The resultant system conformations can serve to avoid initial structure biases when running multiple copies of the same simulation. For dense systems (including longer, randomized polymers) in the presence of excluded volume interactions, the resultant configurations will generally be so high in initial energy that a short Monte Carlo-based simulation is indispensable in order to relax the system to a configuration from which gradient-based simulations can be started. These errors can be particularly severe if the molecule(s) contain(s) chemical crosslinks.
In detail, the options are:
  0. No randomization is performed. This option is the default and only available if the complete system configuration is provided by file, generally via PDBFILE. If no file is given, the choice will be changed automatically to option 2 below.
  1. Full randomization is performed. Technically, any polymers' internal conformations are randomized, possibly with an excluded volume bias. This happens independently for all molecules and only for those polymer stretches that are not constrained by a crosslink. For every residue in a stretch, the randomization occurs in three phases (1/3 each of the total attempts per residue). In the first, only freely rotatable backbone angles (excluding all pucker and ω-angles) are considered, e.g., the φ/ψ-angles of polypeptides or any backbone-like angles in unsupported residues. Energies are evaluated for residue pairs involving the current residue vs. all residues further toward the N-terminus of the stretch (already processed) and the single residue immediately following in the stretch (not yet processed). If the sum of these energies is less than the threshold, the algorithm proceeds to the next residue. The second phase only comes into play if 1/3 of the attempts are exceeded, and now involves rotatable sidechain angles (excluding those in native CAMPARI residues that are frozen by default) of the current residue as well. The last phase is triggered analogously, and additionally includes all aforementioned degrees of freedom for the residue immediately prior in the sequence. If no satisfactory solution is found, the energetically most favorable one is picked. Similarly, rigid-body coordinates (position of centroid and rotational orientation) are randomized for every molecule. Here, there is only a single phase with the same number of total attempts (now per molecule). Energies are evaluated in pairwise fashion for all molecules occurring prior in sequence input vs. the current molecule. CAMPARI attempts to close intramolecular crosslinks with a potentially larger number of trials than set by RANDOMATTS. This is because simultaneous stretch randomization and successful closure are required (also see below). Intermolecular crosslinks are satisfied by moving the second molecule after rigid-body randomization and randomizing the crosslink itself. Both types of crosslinks can easily lead to repeated failures during randomization and/or very high initial energies.
    If structural input is provided, this choice will be changed automatically to option 2 below with the exception that any missing parts (structural input is truncated) are reconstructed randomly with respect to internal degrees of freedom. Thus, there can be a difference between specifying option 2 explicitly in conjunction with structural input (missing parts are built internally in default conformation) vs. letting CAMPARI downgrade option 1 to option 2 (missing parts are internally random).
  2. Partial randomization of rigid-body coordinates is performed as described for option 1. If a structural input file is provided, CAMPARI will extract the internal arrangements of all molecules from file and only randomize rigid-body coordinates. This can be useful for generating random starting structures for studies of the assembly of a protein complex from rigid components. If no file is given, polymers will be built in their default configurations (mostly extended), which will rarely be useful.
  3. Partial randomization of internal arrangements of flexible molecules is performed. This uses the same degrees of freedom and protocols as option 1 above, and works only in conjunction with a structural input file. Here, the positions of the N-termini of all flexible polymers are taken as in the input file, but the chains are then randomized. This option is useful only for highly specialized applications due to the poor relationship between the original rigid-body arrangement and the resultant one for polymers of appreciable length. If no structural input file is provided, this choice will be changed automatically to option 1 above. Note that this option destroys existing intermolecular crosslinks (those present in the input structure). It is also the only option that will change an existing intramolecular crosslink found in the input structure.
  4. Full randomization of any missing parts is performed. This uses the same degrees of freedom and protocols as option 1 above, and, like option 3, works only in conjunction with a structural input file. This mode will preserve all information in the structural input file and randomize any missing parts. If there are no missing parts, this option is the same as option 0. If there is no structural input file, this option is changed automatically to option 1 above.
Some more technical information is provided next. The assumed building direction (N→C) is an arbitrary choice, but can have an impact on what structures are available, in particular when using option 3 above. The issue may be severe in the presence of boundary potentials. In dense and large systems, the computational cost of randomization may become significant if the excluded volume bias is present. In all cases, performance of the randomization can be tuned by altering the settings for RANDOMATTS and RANDOMTHRESH.
It is a very important restriction that initial structure randomization does not observe user-level constraints. In order for a degree of freedom that is accessible to randomization to start out in a well-defined state, randomization of the corresponding class of degrees of freedom must be disabled entirely (in which case the initial state comes either from the CAMPARI default or - more likely - from structural input).
For systems containing intramolecular crosslinks, randomization of an individual chain follows a hierarchical procedure ensuring that all crosslinks can (theoretically) be satisfied. First, the loop stretch is randomized (with an additional bias from half-harmonic restraints on every bracketing crosslink pair), and then CAMPARI attempts to satisfy the crosslink constraint by finding suitable values for a minimal set of degrees of freedom encompassing the entire linkage and as few main-chain dihedral angles as possible. This successive randomization of "free" degrees of freedom and subsequent closure of the crosslink are repeated until a solution is found. This solution does not observe the excluded volume bias in the same way that unconstrained randomization does. The entire procedure is repeated for all crosslinks as long as they can be arranged such that they remain weakly coupled. Note that for systems containing crosslinks, intramolecular randomization is necessary for the crosslinks to be satisfied initially (i.e., if RANDOMIZE is 0 or 2, and if no structural input is provided, the default geometries are used irrespective of the presence of any crosslinks as requested in the sequence file). For partial structural input, crosslinks contained entirely in missing parts will be constructed randomly as described above. Crosslinks entirely represented in the input structure will be preserved unless the option chosen is 3. It is not possible to randomly generate an intramolecular crosslink between neighboring residues. Finally, crosslinks spanning both the PDB input and missing parts are always ignored (this is almost certainly undesirable).
Depending on whether the user provided a request for structural input (→ PDBFILE or FYCFILE), the setting for RANDOMIZE may be altered as described above. The importance of initial structure randomization lies in avoiding initial structure biases that may be difficult to detect (for example, when starting identical replicas of a simulation all from full extended polymers). Note that in replica exchange or MPI averaging runs, all replicas will start from different conditions unless RANDOMSEED is given explicitly by the user.
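As a brief illustration, a key-file excerpt requesting fully randomized initial structures might read as follows (the values shown are placeholders rather than recommendations):
  # full randomization of internal and rigid-body degrees of freedom
  FMCSC_RANDOMIZE 1
  # up to 1000 attempts per residue or molecule
  FMCSC_RANDOMATTS 1000
  # universal energy threshold in kcal/mol
  FMCSC_RANDOMTHRESH 5000.0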

RANDOMATTS

If any type of initial structure randomization is requested, this keyword sets the general number of maximum attempts in randomizing the permissible degrees of freedom for a single residue or molecule. Large numbers (> 10000) may produce unacceptably slow performance when trying to randomize a long, complex polymer and/or a dense fluid. Large numbers can also be counterproductive in the presence of intramolecular constraints since they limit the search space.

RANDOMTHRESH

If any type of initial structure randomization is requested, this keyword sets the universal energy threshold applied to the energetic penalties from excluded volume, boundary potentials (rigid-body moves only), and intramolecular crosslink bias terms. For every residue or molecule being processed, the sum of the (at most) three terms above, taken over all pairwise terms containing the residue or molecule in question as described above, is compared to this threshold (see the information on option 1). All these terms are pure penalty terms and cannot yield negative energies. The strength of the half-harmonic crosslink restraints is also set to 0.1·RANDOMTHRESH (in kcal·mol^-1·Å^-2), whereas their maximum distance is a function of the spacing between the residue in question and the position of the crosslink constraint. Small values (less than 1000 kcal/mol) may be harmful in the presence of intramolecular constraints since they may overemphasize the excluded volume bias.

FYCFILE

This near-obsolete keyword allows the user to provide an input file to re-build the coordinates of a single macromolecule based on a list of its "native" CAMPARI degrees of freedom. This serves to start a simulation from a non-random, file-encoded structure. CAMPARI provides an output file suitable for this task (→ TOROUT), and the format has to match exactly. Note that FYC.dat does not encode rigid-body coordinates, which is why only single-molecule structural input is supported via this method. Only if this keyword is present will CAMPARI attempt to read any such torsional input. This functionality is overridden should the user provide a valid specification for PDBFILE as well. Note that this form of input is not supported for analysis runs (→ PDBANALYZE) and that the systems of course have to be identical.

PDBFILE

This keyword provides the (base)name and location of a structure or trajectory input file for reading in Cartesian coordinates from a file in pdb-convention. There are two possible interpretations:
  1. PDBANALYZE is false:
    In this case, PDBFILE operates analogously to FYCFILE in that it attempts to read an external file to construct an initial non-random conformation for the system. Depending on the setting for RANDOMIZE, only some of the information may be used. Naturally, the system (sequence) in the pdb-file has to be consistent with the choices made via SEQFILE. Note that parallel runs can use multiple input structures (→ PDB_MPIMANY). Depending on the choice for PDB_READMODE, the program then follows either of the following approaches:
    1. It extracts dihedral angles, pucker states, and rigid-body coordinates (i.e., the "native" CAMPARI degrees-of-freedom) using the coordinates of the appropriate heavy atoms in the file. This preserves the CAMPARI-default geometry (based on high-resolution crystallographic databases) for the remaining degrees of freedom (most bond lengths, angles, improper and some proper dihedral angles). This option is often unsuitable for reading in larger polymers due to mismatch propagation along the backbone. It is generally unavailable for simulations featuring residues not natively supported by CAMPARI.
    2. It reads all Cartesian coordinates from the file and reconstructs the internal coordinate values from the Cartesian ones, i.e., CAMPARI uses the implicitly pdb-encoded covalent geometry instead of its inherent default one for the constant elements of the Z-matrix. This option is recommended for systems containing longer polymers. Note that CAMPARI is able to correct for a fair amount of conflicting atom naming conventions in pdb-files (see PDB_R_CONV as well) but that it may sometimes be necessary to change atom names in a given trajectory file. Missing coordinates will generally lead to multiple warnings but may not stop a simulation from running regardless.
    Should the user have attempted to also supply an input file via FYCFILE, only PDBFILE will be relevant (i.e., FYCFILE is ignored). Randomization options may be downgraded as described.
  2. PDBANALYZE is true:
    Since the point here is to analyze a given trajectory (whatever structural data it may encode), and not to start a simulation from a suitable input file, the setting for PDB_READMODE is ignored if PDBANALYZE is true, and all structural input is processed by trying to extract all Cartesian coordinates (option 2 for PDB_READMODE). A pdb trajectory file has to fulfill the requirement that it conforms to the MODEL / ENDMDL syntax (the actual numbering is ignored, however). Naturally, for CAMPARI to run and terminate properly, the number of snapshots in the file has to equal or exceed the request for NRSTEPS. Alternatively, CAMPARI offers the option to read in individual pdb files (one snapshot each) that employ a systematic numbering scheme (plain numbers, or numbers with leading zeros). In this scenario, the first of such files should be provided; CAMPARI will then try to extract the numbering scheme and open NRSTEPS-1 consecutive snapshots. Note that in this mode the filename must not contain any additional numeric characters (i.e., foo_001.pdb is permissible while ala7_001.pdb is not). To choose between single-file and multiple-file formats, keyword PDB_FORMAT is used.
Note that it is not possible to start a simulation from a structure provided in a binary trajectory file format. In this case, however, CAMPARI can be used to extract a suitable pdb file from the trajectory with the help of keywords PDBANALYZE, XYZPDB, XYZOUT, XYZMODE, and - for example - DCDFILE.
Lastly it is important to mention that PDBFILE provides some functionality that is overlapping with that provided by PDB_TEMPLATE. Specifically, runs containing residues not natively supported by CAMPARI require the topology of those moieties to be inferred from file. If an analysis run operates on a single pdb file, a trajectory file in pdb format or a series of pdb files, or if a simulation run is supposed to start from a specific structure supplied via PDBFILE, then PDBFILE can (but need not) serve the function of topology inference as described for PDB_TEMPLATE.
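For example, a minimal key-file excerpt for starting a simulation from a complete structure in pdb format could look as follows (the file name is illustrative):
  FMCSC_PDBFILE start.pdb
  # read all Cartesian coordinates and adopt the encoded covalent geometry
  FMCSC_PDB_READMODE 2
  # no randomization (the input structure is complete)
  FMCSC_RANDOMIZE 0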

XTCFILE

This is only relevant if PDBANALYZE is true: it then specifies name and location of the trajectory (xtc format) to analyze. Like all other xtc-related options, this is only available if the code was in fact compiled and linked with XDR support (→ installation instructions). See PDB_TEMPLATE for instructions on how to convert binary trajectory files with non-CAMPARI atom order. If the analysis run is parallel (→ REMC), an example is given elsewhere.

DCDFILE

Analogous to XTCFILE, this keyword is only relevant if PDBANALYZE is true: it then specifies name and location of the trajectory (dcd format) to analyze. See PDB_TEMPLATE for instructions on how to convert binary trajectory files with non-CAMPARI atom order.

NETCDFFILE

Analogous to XTCFILE, this keyword is only relevant if PDBANALYZE is true: it then specifies name and location of the trajectory (NetCDF format) to analyze. Like all other NetCDF-related options, this is only available if the code was in fact compiled and linked with NetCDF support (→ installation instructions). See PDB_TEMPLATE for instructions on how to convert binary trajectory files with non-CAMPARI atom order.

FRAMESFILE

If PDBANALYZE is true, it is possible for CAMPARI to analyze a specific set of frames from the trajectory file (see PDB_FORMAT) rather than the entire trajectory. It is also possible to give every analyzed snapshot a sampling weight, and both functionalities are implemented by this keyword. Example applications are the extraction of structural clusters from a trajectory or the reweighting of biased simulations.
Most input trajectories currently need to be processed sequentially (this applies to xtc, dcd, and pdb trajectory files, i.e., PDB_FORMAT is 1, 3, or 4). For these, the list of requested frames is sorted first, and duplicates are removed. This means that any newly written trajectory files (→ XYZOUT) will have exactly the same order of snapshots as the input. Conversely, the snapshots encoded in individual pdb files and NetCDF trajectory files (PDB_FORMAT is 2 or 5) can be accessed in arbitrary order. For these two settings, the frames file is processed "as is" unless there are floating point weights per snapshot or unless this is a parallel trajectory analysis run. Frames files processed "as is" have the advantage that they can arbitrarily reorder and duplicate individual simulation snapshots, which is relevant, for example, in the construction of synthetic trajectories.
It is important to note that the settings for NRSTEPS and EQUIL and all related frequency settings for analysis routines (see corresponding section) lose their straightforward interpretations if not all snapshots in the original trajectory are processed exactly once and in sequence. For the case of a processed frames file (sorted and free of duplicates), the analysis frequencies will still refer to the original, full trajectory file. This means that CAMPARI will read all frames sequentially and increment step counters accordingly. However, all the frames that are not part of the list are simply skipped. This implies that it is possible for a selection of 20 frames from a larger trajectory to fail to produce any output for polymeric quantities if POLCALC is set to 10, 5, or even 2 (simply on account of chance). It will therefore generally be easier to set such frequency flags to 1 if processed frame lists are used (this is the only setting that guarantees that the number of analyzed snapshots will be exactly proportional to the size of the list). Conversely, for a frames file used "as is," the unused frames are never read and no step counters are incremented. This means that the effective step becomes the processing of the frames file itself. Returning to the above example, a selection of 20 (possibly duplicated) frames from a larger trajectory will in this case always produce output for polymeric quantities if POLCALC is set to any value of 20 or less.
As mentioned above, the frames file allows the user to alter the type of averaging that is normally assumed for CAMPARI analysis functions. By default, each data point (trajectory snapshot) contributes the same weight to computed averages or histograms (distribution functions). This implies that the input trajectory conforms to (was sampled from) the distribution and ensemble of interest. If, however, the input trajectory does not correspond to a well-defined ensemble (or corresponds to a different one), it is common and possible to apply snapshot-reweighting techniques based on analyses of system energies or coupled parameters using weighted histogram methods. The result is a set of weights for each snapshot, which allows simulation averages and distribution functions to conform to the target distribution and ensemble. As an example, one may combine all data from a replica-exchange run (data that no longer conform to a canonical ensemble at a given temperature), use a technique such as T-WHAM to derive a set of snapshot weights for a target temperature that was not part of the replica-exchange set, and then use this input file containing the weights to compute proper simulation averages at the target temperature.
The input file for this functionality is very simple and explained elsewhere. There are three important points of caution. First, floating-point weights imply that floating-point precision errors may occur. The implied summation of weights of very different sizes may then become inaccurate. CAMPARI provides a warning if it expects such errors to be large (based purely on the weights themselves). Second, snapshot weights do not influence the values reported for instantaneous output such as POLYMER.dat or for analyses that do not imply averaging (such as structural clustering). Third, reweighting techniques have associated errors that are difficult to predict. Simultaneous assessment of statistical errors via block averaging or similar techniques is therefore strongly recommended.
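As an illustration, a key-file excerpt for analyzing a selected set of frames from a dcd trajectory might look as follows (the file names are placeholders; the format of the frames file itself is described elsewhere):
  FMCSC_PDBANALYZE 1
  FMCSC_PDB_FORMAT 4
  FMCSC_DCDFILE traj.dcd
  FMCSC_FRAMESFILE frames.dat
  # with processed frame lists, frequency flags are easiest set to 1
  FMCSC_POLCALC 1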

PDB_FORMAT

This simple keyword lets the user select the file format for a trajectory analysis run:
  1. CAMPARI expects a single trajectory file in pdb-format using the MODEL / ENDMDL syntax to denote the individual snapshots.
  2. CAMPARI expects to find multiple pdb files with one snapshot each that are systematically numbered starting from the file provided via PDBFILE.
  3. CAMPARI expects to find a single trajectory in binary xtc-format (GROMACS style).
  4. CAMPARI expects to find a single trajectory in binary dcd-format (CHARMM/NAMD style).
  5. CAMPARI expects to find a single trajectory in binary NetCDF-format (AMBER style). (reference)
Note that .xtc, .nc, and .dcd trajectory files are not annotated and that the order of atoms in the file must be consistent with CAMPARI's inner workings. Since this is almost never true for binary trajectory files obtained with other software, CAMPARI offers the user the option to provide a pdb template which contains the order of atoms in the binary file in annotated form (see PDB_TEMPLATE).
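For instance, analyzing an xtc trajectory written by another package would typically require such a template (all file names are illustrative):
  FMCSC_PDBANALYZE 1
  FMCSC_PDB_FORMAT 3
  FMCSC_XTCFILE traj.xtc
  # annotated template providing the atom order used in the binary file
  FMCSC_PDB_TEMPLATE template.pdb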

PDB_READMODE

This integer is only used if a pdb-file is sought to provide the starting structure for an actual simulation (→ PDBFILE). In this scenario, two options are available:
  1. CAMPARI attempts to read in the Cartesian coordinates of heavy atoms from the pdb file, proceeds to extract the values for CAMPARI's "native" degrees of freedom (i.e., those corresponding to the unconstrained ones in Monte Carlo or torsional molecular dynamics runs → CARTINT), and lastly rebuilds the entire structure using the determined values as well as internal geometry parameters for the constrained internal degrees of freedom (extracted from high resolution crystallographic databases). This hybrid approach will often lead to a propagation of error along the backbone of longer polymers and is therefore unsuitable for reading larger proteins or particularly for macromolecular complexes.
  2. CAMPARI attempts to read in the Cartesian coordinates of all atoms from the pdb file and uses those explicitly (i.e., it implicitly adopts the encoded geometry even for degrees of freedom that are normally constrained within CAMPARI). This will produce warnings if very unusual bond lengths or angles are encountered (see PDB_TOLERANCE_A and PDB_TOLERANCE_B), which are most often indicative of missing atoms in the pdb-file (in particular termini and hydrogens). Some of these problems will be dealt with automatically, but it is highly recommended to check the file {basename}_START.pdb and to make sure that no drastic deviations occur. If deviations due to CAMPARI rebuilding atoms along the backbone do occur, it is recommended to increase the thresholds for PDB_TOLERANCE_B and PDB_TOLERANCE_A.
In either case, it is extremely important that the first three atoms (in CAMPARI's assumed atom order) are given explicitly in the input file since CAMPARI does not possess the capability to extract rigid-body coordinates otherwise and will terminate prematurely.

PDB_INPUTSTRING

This is an experimental keyword to use at your own risk. It allows changing the assumed PDB formatting string (Fortran) for PDB files. This is required to make CAMPARI able to read in altered PDB files produced by the analogous keyword PDB_OUTPUTSTRING. The default is "a6,a5,1x,a4,a1,a3,1x,a1,i4,4x,3f8.3" (with the quotes). The letters (a, i, f) give the type of variable, which must not change. The numbers give the field lengths, and these can be customized for variables of type integer ("i") or real ("f"). It is also possible to modify the field widths of string variables ("a"), but it is not possible to read extra content this way, i.e., the resultant behavior is undefined. The only exception to this is the second variable (atom number), which is of the "wrong" type here because these values are ignored on input. This particular field width can be increased without harm. It is of course intended and required that the corresponding output string format uses an integer field here, by default "i5" instead of "a5".
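As a purely illustrative example, widening only the (ignored) atom number field from "a5" to "a7" would correspond to the key-file line:
  FMCSC_PDB_INPUTSTRING "a6,a7,1x,a4,a1,a3,1x,a1,i4,4x,3f8.3"
Such files would typically have been written with a correspondingly widened integer field ("i7") via PDB_OUTPUTSTRING.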

PDB_HMODE

If structural input from a pdb file is requested in mode 2 (see PDB_READMODE and PDBFILE) or if a trajectory analysis run is being performed, this keyword offers two choices for dealing with hydrogen atoms (which may be missing from the input file and/or may be ill-defined):
  1. CAMPARI will attempt to read in the Cartesian coordinates of all hydrogen atoms directly and only rebuild those hydrogen (and other) atoms which cause a geometry violation defined by keywords PDB_TOLERANCE_B and PDB_TOLERANCE_A.
  2. CAMPARI will rebuild all hydrogen atoms according to its underlying default models for local geometry in chemical building blocks. This is most useful if hydrogen atoms are missing entirely from the input file.
Note that the second option is not available for analysis runs operating on binary trajectory files, in which the atoms have to exactly match sequence input anyway, and which are typically only generated by computer simulations themselves.

PDB_NUCMODE

For processing structural input, keyword PDB_NUCMODE (explained below) is ignored. It is listed here nonetheless to explain what CAMPARI actually does when reading in a pdb file supplied via PDBFILE or via PDB_TEMPLATE:
If the input file is in CAMPARI convention, i.e., the O3* oxygen atom is part of the same residue as the phosphate it belongs to, read-in is consistent with internal convention. If, however, the input file is in pdb convention (also used by almost all other simulation software), i.e., the O3* oxygen atom is always part of the same residue as the sugar it belongs to, a heuristic is used to avoid an incorrect assignment. This heuristic relies on the geometry of the input structure being sane, as it checks the bond distance to the appropriate phosphorus atom.
As long as atom names can be parsed (see below), the user should therefore not have to worry about the convention used in pdb input files. This implies that it is possible to supply a binary trajectory file (for example via DCDFILE) written in the non-CAMPARI convention of assigning the O3*-atom to the residue carrying the sugar it is attached to by the use of an appropriate template.

PDB_R_CONV

CAMPARI can in general process different conventions for the formatting of PDB files. A large fraction of simple atom naming convention multiplicities is handled automatically without the use of any keywords. PDB_R_CONV allows the user to select the format a read-in pdb-file is assumed to be in, so that more severe discrepancies can be dealt with. Possible choices currently consist of:
  1. CAMPARI format (of course suitable for reading back in any CAMPARI-generated output even if PDB_NUCMODE was used → see above).
  2. GROMOS format (nucleotide naming). This option offers very little unique functionality since most of the supported conversions are handled automatically regardless of the setting for this keyword. It is primarily used to handle the GROMOS residue names for nucleotides (ADE, DADE, and so on).
  3. CHARMM format (in particular atom naming, cap and nucleotide residue names and numbering (patches), ...). Note that there are two exceptions pertaining to C-terminal cap residues (NME and NH2) which arise due to non-unique naming in CHARMM: 1), NH2 atoms need to be called NT2 (instead of NT) and HT21, HT22 (instead of HT1, HT2), and 2), NME methyl hydrogens need to be called HAT1, HAT2, HAT3 (instead of HT1, HT2, HT3). For nucleotides, there is an additional exception to 5'-residues carrying a 5'-terminal phosphate (the hydrogen in the terminal POH unit needs to be called "5HO*" instead of " H5T"). This is again due to nonunique naming conventions within CHARMM.
  4. AMBER format (atom and residue naming in particular for nucleotides). Note that this option is the least tested one. Please let the developers know of any additional issues you may encounter.
As mentioned above, this setting is relevant for files requiring more advanced reinterpretation (such as the patch conventions in CHARMM / NAMD). It should allow the user to keep the manual processing of pdb files to a minimum. However, support is generally limited to standard biomacromolecules supported within CAMPARI, and in most cases does not extend to small molecules or unusual polymer residues. This is the twin keyword to PDB_W_CONV below.

PDB_TOLERANCE_A

This setting allows the user to override CAMPARI's built-in defaults for the tolerances it applies on a read-in structure (usually xyz from pdb). Since it is not always easy to distinguish distorted structures from missed input, the code applies a tolerance when comparing read-in bond angles to the internal reference value (which is derived from crystallographic databases). The default is an interval to either side of 20.0° and this setting can be expanded or contracted using this keyword. If a violation is found, the code usually overrides the faulty value with the default since it assumes that atomic positions were missing. This can sometimes lead to unwanted effects which can be avoided by setting this to a large number.

PDB_TOLERANCE_B

This is analogous to PDB_TOLERANCE_A, but allows the user to change the interval for considering bond length exceptions. The difference here is that two numbers are required: a lower fractional number (down to 0.0) and an upper fractional number (preferably larger than 1.0, of course). This is because bond length ranges are inherently not normalized and in addition nonlinear (exceptions with too long bond lengths are much more frequent). The default is an interval between 80% and 125% of the crystallographic reference value (settings 0.8 and 1.25).
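A brief key-file sketch widening both tolerance windows might read as follows (values illustrative; this assumes the two fractional bounds are supplied on the same keyword line):
  # accept bond angles within ±30.0° of the internal reference
  FMCSC_PDB_TOLERANCE_A 30.0
  # accept bond lengths between 70% and 140% of the reference value
  FMCSC_PDB_TOLERANCE_B 0.7 1.4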

PDB_TEMPLATE

This keyword allows the user to provide name and location of a pdb file that serves in possibly several auxiliary functions. A template pdb file is relevant in the following circumstances:
  • In a trajectory analysis run, it can serve as a map to correct a mismatch in atom ordering between a binary trajectory file (dcd, xtc, NetCDF) and CAMPARI's intrinsic convention. Typically, a pdb file provided by the program having generated the binary file will serve this purpose. In order for the map to work, it is crucial to ensure that every single atom to be read-in has a proper match (by atom name) in the pdb file, i.e., it is not tolerable to provide a pdb template with missing atoms or with atom names that CAMPARI cannot parse. In general, CAMPARI's pdb parser is relatively flexible and allows additional control via PDB_R_CONV. It is typically not possible, however, to correct mismatches in the grouping of atoms into residues.
  • The template pdb file can simultaneously serve as a reference structure if alignment is requested in trajectory analysis runs (→ ALIGNCALC).
  • In all types of runs, the template pdb file can be used to infer the topology of residues not natively supported by CAMPARI. This is crucial for handling these systems. Importantly, using the template for this purpose decouples the topology determination from structural input for simulation runs, which allows initial randomization of systems containing such unsupported residues. The content of the template must match the sequence file, and there are some precise requirements for both input files. They are listed in the corresponding documentation for both the sequence file and the pdb file. Assuming both files to be properly formatted, CAMPARI then does the following:
    1. From the sequence file, the number of unknown residues and their linkages are extracted.
    2. The template is read and the atomic indices delimiting all unknown residues are extracted. Basic parameters such as the effective residue radius and the reference atom are inferred. It is therefore important that the conformation of the residue in the pdb file is somewhat representative.
    3. The remainder of the system topology is constructed. All atomic positions are set to the corresponding values in the template. Internal order of atoms for unsupported residues always reflects the order in the input pdb file exactly.
    4. From the PDB atom names, the chemical element is guessed (C, O, N, H, P, S, halogens, various metals and metalloids) and the mass is set to that of an appropriate atom type in the parameter file (identification by attempts to match mass and valence). The assignment will be poor if the parameter file does not support the chemical element in question. Further details are found elsewhere. This can later be overridden by a biotype patch and/or a combination of other patches.
    5. A new biotype is created for every new atom type encountered. This biotype is initialized to be empty with the exception of keeping the atom name and the (already) assigned atom type. The numbering of these new biotypes continues from the highest number in the parameter file. It is therefore not possible to use the parameter file for these assigned biotypes directly. Instead, it is recommended to use a biotype patch or specialized patches. The assignment of an atom type is sufficient to provide basic support, so for certain applications no patches may be required.
    6. The covalent bond information is used to infer the molecular topology (including a detection of rings). This defines the Z-matrix entries (internal coordinate representation) for unsupported residues. Similarly, the linkage to covalently bound residues that are either supported or also unsupported is inferred. In the process, rotatable dihedral angles are detected automatically. This procedure, which explicitly tests for bond angle or length variations upon rotation, is critical to most subsequent assignments.
    7. Given a set of PDB names, atom types, valences, and a topology, CAMPARI attempts to conclude by analogy whether the residue conforms to the backbone of one of the supported polymer types (currently, polypeptides and polynucleotides). If it does, as many internal pointers as possible are set to identify the residue accordingly (this does not work for single-residue molecules).
    8. If a residue is recognized as being part of a supported polymer type, the topology itself is corrected (the goal is that it should make no difference to CAMPARI whether a residue is supported or whether it is masked (by changing the name) as an unsupported one and all the information has to be inferred from the input structure). Further corrections pertain to the setup of interactions, etc. Note that the match cannot always be perfect, e.g., fudge factors that are not zero or unity in conjunction with MODE_14 being 2 and INTERMODEL being 1 may lead to energetic inconsistencies. The interaction setup relies on determining local rigidity via its knowledge of which dihedral angles are rotatable. Due to code-specific reasons (scanning for short-range exceptions, exclusions, etc.), it is highly recommended to parse the chain into residues such that any pair of atoms in residues i and i+2 is separated by at least four rotatable bonds.
    9. All flexible dihedral angles may be made part of basic sampling routines if the simulation is in internal coordinate space. These are the torsional dynamics sampler (→ TMD_UNKMODE for details) and the basic Monte Carlo moves for degrees of freedom of this type (→ OTHERUNKFREQ). Furthermore, access will be granted to the specialized samplers if the residue is detected as eligible. This, however, may sometimes lead to an altered interpretation of the absolute values of certain dihedral angles or even alter details of the sampler slightly, e.g., the pucker sampling of proline-like, unsupported residues may end up perturbing different sets of auxiliary bond angles.
    10. If analyses are requested, these routines will respond to the unsupported residue according to the values set in the previous steps. Basically, the better the match to natively supported entities is, the more analysis functionalities will be available. Straightforward cases depend only on Cartesian coordinates (e.g., RHCALC or CONTACTCALC), whereas polymer type-specific analyses (e.g., DSSPCALC) require an unsupported residue to be recognized as the corresponding polymer type. Care must be taken in mixed polymers or other exotic cases, and it may occasionally be necessary to disable certain analysis routines.
    The success of the inference depends very strongly on the input pdb file. Atom names and in particular atom order should be made as compliant with CAMPARI's internal convention as possible. The code will report if analogy-based settings have been applied. In general, inference regarding polymer type is restricted to the backbone level with the small exception of detecting residues eligible for pucker sampling (see PKRFREQ and SUGARFREQ). The inferred topology is available from the starting Z-matrix coordinate file. Except in output pdb files, unsupported residues will be referenced with the three-letter code "UNK" (for example in the header to the contact map output file). Note that simulation runs featuring unsupported residues will generally require the application of patches to the energy function (see BIOTYPEPATCHFILE, MPATCHFILE, LJPATCHFILE, BPATCHFILE, NCPATCHFILE, CPATCHFILE, RPATCHFILE, ASRPATCHFILE, SAVPATCHFILE, and FOSPATCHFILE).
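As a sketch, a simulation run containing unsupported residues might combine the template with energy patches roughly as follows (all file names are placeholders; the formats of the patch files themselves are documented elsewhere):
  FMCSC_SEQFILE system.seq
  # template providing topology and reference geometry for unsupported residues
  FMCSC_PDB_TEMPLATE template.pdb
  # assign meaningful parameters to the newly created biotypes
  FMCSC_BIOTYPEPATCHFILE biotypes.patch
  FMCSC_LJPATCHFILE lj.patch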

PDB_MPIMANY

For certain types of parallel runs, this logical keyword (1 means "on") allows the user to provide different starting structures via pdb-files for different replicas. The keyword is irrelevant in trajectory analysis mode. The four eligible classes of calculations are as follows:
The reason for not always allowing this option for replica exchange runs in internal coordinate space is the limited accuracy of pdb files. The lack of accuracy leads to distortions of covalent geometries that can result in changes to short-range energy terms, in particular bonded ones (such as bond angles potentials). These energetic offsets should be absent if the covalent geometries are not actually read from file (second class above). Caution must be applied when simulating systems with flexible rings. If present, these energetic offsets have the potential to severely bias the results of a replica exchange calculation unless the degrees of freedom are relaxed (third class above) or unless no exchanges are performed (fourth class above).
If this option is active, CAMPARI expects to find systematically named pdb files with the base name given via keyword PDBFILE. The naming is analogous to the convention CAMPARI uses for outputs of parallel runs and also identical to what parallel trajectory analysis runs require. It is explained elsewhere. A list of keywords specific to running CAMPARI in parallel is found below.



Energy Terms:


(back to top)

HSSCALE

This keyword controls a generic scaling factor for size parameters (Lennard-Jones σ_ii and σ_ij) that were read in from the parameter file. This fundamentally alters the excluded volume properties of the system. Motivation for using this keyword (which naturally defaults to 1.0) may arise during parameter development or in specialized calculations.

SC_IPP

This keyword allows the user to specify the linear scaling factor controlling the strength of the inverse power potential (IPP) defined as:

E_IPP = c_IPP · 4.0 · ΣΣ_{i,j} ε_ij · f_{1-4,ij} · (σ_ij / r_ij)^t

Here, the σ_ij and ε_ij are the size and interaction parameters for atom pair i,j, the f_{1-4,ij} are potential 1-4 fudge factors (see FUDGE_ST_14) that generally will be unity, r_ij is the interatomic distance, t is the exponent, and the (double) sum runs over all interacting pairs of atoms. Lastly, c_IPP is the linear scaling factor controlled by this keyword which - unlike most other scaling factors for energy terms - defaults to 1.0. In most applications, the inverse power potential will be the repulsive arm of the Lennard-Jones potential (t = 12 → 12th power, see IPPEXP). The interpretation and application of the provided parameters (see documentation and keyword PARAMETERS) can be controlled through keywords SIGRULE, EPSRULE, INTERMODEL, FUDGE_ST_14, and MODE_14. Note that the use of the Weeks-Chandler-Andersen (WCA) potential (→ SC_WCA) is mutually exclusive with inverse power potentials.

IPPEXP

This keyword allows the user to adjust the exponent (an even integer that defaults to 12) for the inverse power potential. An important restriction is that many of the optimized loops in dynamics calculations do not support any other choice except 12. Note that very large numbers will of course - possibly in compiler-dependent fashion - slow down code execution due to the increasing complexity of expensive operations in innermost loops. By (formally) setting this to a value greater than 100, CAMPARI is instructed to replace the IPP potential with a hard-sphere (HS) potential, which is only available in pure Monte Carlo runs. In this case the scaling factor is ignored, the "infinity"-value (penalty for nuclear fusion) is determined by the setting for BARRIER, and the use of a size reduction factor (HSSCALE) is strongly recommended. In hard-sphere potentials, any energy readout for the IPP term should now be in multiples of BARRIER, and all persisting non-zero values would indicate a frustrated (non-relaxable) system. The actual value specified for IPPEXP is then irrelevant.
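As an illustration, a key-file excerpt requesting the hard-sphere variant might read as follows (the numerical values are placeholders):
  # any value > 100 selects the hard-sphere potential (pure MC only)
  FMCSC_IPPEXP 101
  # penalty assigned to overlaps (the "infinity"-value)
  FMCSC_BARRIER 1000.0
  # a size reduction factor is strongly recommended for HS potentials
  FMCSC_HSSCALE 0.9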

SIGRULE

This keyword defines the combination rule for combining the size parameters of Lennard-Jones (and WCA) potentials, i.e., how to construct σ_ij from σ_ii and σ_jj if σ_ij is not provided as a specific override in the parameter file (for details see PARAMETERS).
The choices are:
 1) σ_ij = 0.5·(σ_ii + σ_jj) (arithmetic mean)
 2) σ_ij = (σ_ii · σ_jj)^0.5 (geometric mean)
 3) σ_ij = 2.0·(σ_ii^-1 + σ_jj^-1)^-1 (harmonic mean)
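For example, with σ_ii = 3.0 Å and σ_jj = 4.0 Å, these rules yield σ_ij = 3.50 Å (arithmetic), ≈ 3.46 Å (geometric), and ≈ 3.43 Å (harmonic); for unequal sizes, the arithmetic mean always gives the largest and the harmonic mean the smallest combined parameter.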

EPSRULE

Analogous to SIGRULE, this keyword defines the combination rule for interaction parameters of Lennard-Jones potentials. The same options are available and the same caveats apply with respect to overrides in the parameter file.

SC_ATTLJ

This keyword allows the user to specify the linear scaling factor controlling the strength of the dispersive (van der Waals) interactions defined as:

E_ATTLJ = -c_ATTLJ · 4.0 · ΣΣ_{i,j} ε_ij · f_{1-4,ij} · (σ_ij / r_ij)^6

Here, the σ_ij and ε_ij are the size and interaction parameters for atom pair i,j, the f_{1-4,ij} are potential 1-4 fudge factors (see FUDGE_ST_14) that generally will be unity, r_ij is the interatomic distance, and the (double) sum runs over all interacting pairs of atoms. Together with an inverse power potential with scaling factor 1.0 and exponent 12 (see SC_IPP), the canonical Lennard-Jones potential is constructed if the scaling factor, c_ATTLJ, is set to unity.
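In key-file form, this canonical 12-6 Lennard-Jones setup corresponds to (the first two settings are also the defaults):
  FMCSC_SC_IPP 1.0
  FMCSC_IPPEXP 12
  FMCSC_SC_ATTLJ 1.0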

INTERMODEL

This very important keyword controls the exclusion rules for short-range interactions of the excluded volume and dispersion types (see SC_IPP, SC_ATTLJ, and SC_WCA). For Monte Carlo or torsional dynamics simulations assuming rigid geometries, the computation of spurious (constant) LJ interactions is inefficient. Conversely, in Cartesian sampling, bonded interactions are almost always parametrized with all 1-4-, and certainly with all 1-5-interactions in place. The latter refer to intramolecular atom pairs separated by either exactly three (1-4) or four (1-5) bonds. The ABSINTH implicit solvation model, which is one of the core features of CAMPARI, was parametrized with a reduced interaction model. Hence, this keyword allows three choices:
  1. Consider only interactions which are not rigorously or effectively frozen when using internal coordinate space sampling. This setting for example excludes all interactions within aromatic rings. As for determining 1-4-interactions, the rules outlined under MODE_14 apply.
  2. Consider all interactions separated by at least three bonds to be valid. This is the default setting for molecular mechanics force fields. Note, however, that many of these interactions are quasi-rigid and that their computation is somewhat nonsensical even in a full Cartesian description. Also note that, due to the inherent assumption that every bond is rotatable, the setting for MODE_14 does not matter if INTERMODEL is set to 2. All atoms separated by exactly three bonds will be considered 1-4. It is important to point out that the setting chosen for INTERMODEL affects the setting for ELECMODEL as well (see ELECMODEL).
  3. The GROMOS force field uses a very specific set of non-bonded exclusions which is supported by choosing this option for INTERMODEL. It is essentially a weakened version of the first (sane) option. Note that to reproduce the GROMOS force field exactly, ELECMODEL (which remains an independent setting) has to be set to 2 and INTERMODEL to 3 (see the excerpt following this list).
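In key-file form, the GROMOS-style setup just described corresponds to:
  FMCSC_INTERMODEL 3
  FMCSC_ELECMODEL 2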

LJPATCHFILE

This keyword can be used to provide the location and name of an input file that allows reassigning the size exclusion and dispersion parameters used in describing generic short-range potentials of the Lennard-Jones (see SC_ATTLJ and SC_IPP) or WCA types. The parameter file that CAMPARI parses will contain atom entries that specify general atom types. These types have associated with them entries of the contact and epsilon types specifying the Lennard-Jones σ_ij and ε_ij parameters (see the equations provided with the scale factor keywords). Within the list of biotypes, each biotype is assigned an atom type, and the patch functionality described here allows the user to change this to a different atom type for a specific instance of a biotype. Note that the reassignment is restricted to the Lennard-Jones parameters, but excludes other atomic parameters specified by atom types such as mass, proton number, description, or valence. Conversely, parameters derived from Lennard-Jones parameters are altered. This is particularly important for the derived atomic radii and volumes used in the continuum solvation model and analysis. If those parameters are meant to be left unchanged or set to yet another set of values, either the radius facility of the parameter file must be employed (if it is not already in use for the original atom type in question), or a patch of atomic radii must be applied in addition. Because size exclusion and dispersion parameters rely on combination rules and/or many overrides for special cases, it can be tedious to patch them: a patch will often require the user to define a new atom type, which, for example, for the GROMOS force fields can be a lot of work. Some more details are given elsewhere.

SC_EXTRA

This (somewhat obsolete) keyword specifies a linear scaling factor for certain structural correction potentials. Assuming the typical set of torsional space constraints (see CARTINT), these are applied to rotatable bonds with electronic effects which cannot be captured by atomic pairwise contributions. These consist of:
  1. Secondary amides: The rotation around the C-N bond is hindered due to the partial double-bond character present in amides. Corrections are therefore applied to residues which have an ω-angle (all non-N-terminal peptide residues excluding NH2 as well as the secondary amides NMF and NMA → sequence input). These keep the peptide bond roughly planar while allowing for cis/trans-isomerization and increased overall flexibility. The potentials are directly ported from OPLS-AA.
  2. Phenolic polar hydrogens: The rotation around the C-O bond in phenols is hindered due to its partial double bond character and in-plane arrangements of the attached hydrogen are favored. Corrections are applied to tyrosine (TYR) and p-cresol (PCR). These keep the polar hydrogen in their favored position. The potential is not overly stiff so that out-of-plane arrangements will be populated as well. The parameters are again ported directly from OPLS-AA.
Like all other linear energy scaling factors with the exception of SC_IPP, the parameter controlled by this keyword defaults to zero. Its use is recommended for rigorous backward compatibility in certain cases only. We recommend employing the torsional potentials specified via the parameter file itself (see PARAMETERS) instead. In that case, concurrent use of canonical torsional potentials (SC_BONDED_T) and this term will produce a warning since this is nonsensical. For all cases, the use of the correct OPLS-based hybrid parameter files with SC_BONDED_T should provide an exact match to the potentials available by this keyword.
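To follow the recommendation just given, i.e., to rely on torsional potentials from the parameter file rather than on this term, a key-file would contain something like:

FMCSC_SC_EXTRA 0.0 # the default; structural correction potentials disabled
FMCSC_SC_BONDED_T 1.0 # use canonical torsional potentials from the parameter file instead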

SC_BONDED_B

This keyword gives the linear scaling factor for all bond length potentials. Their usage is permissible in all simulations but not meaningful unless bond lengths are actually allowed to vary, i.e., typically unless sampling happens fully in Cartesian degrees of freedom (see CARTINT). It is important to remember, however, that even in rigid-body / torsional space simulations, specific move types and systems will require setting this to unity (so we recommend it throughout). For bond length potentials, the only such exceptions are crosslinked molecules (see CRLK_MODE). Note that the parameter file has to provide the necessary support for this energy term (see PARAMETERS for details), and that simulations relying on those terms will otherwise fail, crash, or produce nonsensical results. Use GUESS_BONDED to circumvent those issues for incomplete parameter files.
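As an illustrative sketch of the remarks above, a minimal set of keywords for running initial tests in Cartesian space with a parameter file lacking some bonded terms might be:

FMCSC_CARTINT 2 # sample in Cartesian degrees of freedom
FMCSC_SC_BONDED_B 1.0 # bond length potentials
FMCSC_SC_BONDED_A 1.0 # bond angle potentials
FMCSC_GUESS_BONDED 1 # guess missing bond length and angle terms from default geometries
FMCSC_BONDREPORT 1 # report found, missing, and guessed terms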

SC_BONDED_A

Similar to SC_BONDED_B for all bond angle potentials. Bond angle potentials (see PARAMETERS for details) matter for sampling in Cartesian space (see CARTINT), for crosslinked molecules (see CRLK_MODE), and for the sampling of five-membered, flexible rings (see PKRFREQ and SUGARFREQ). The coordinate derivatives for bond angles diverge at the extreme values of both 0° and 180°. This means that care must be taken in setting up the Z-matrix such that no terms are included that would explicitly demand these values. In other software, this is sometimes overcome by the use of dummy atoms. In CAMPARI, this is unlikely to be problematic in Monte Carlo simulations. In dynamics, forces are buffered to avoid program crashes due to floating point errors, but the actual values are no longer meaningful. This issue is primarily relevant when modifying the code or when simulating unsupported residues, for which the Z-matrix is inferred from input (see elsewhere for details).

SC_BONDED_I

Similar to SC_BONDED_A for all improper dihedral angle potentials.

SC_BONDED_T

Similar to SC_BONDED_B for all torsional potentials. Note that these do in fact encompass degrees of freedom sampled in all types of simulations supported within CAMPARI and hence are always relevant. As alluded to above, torsional potentials can be easily set up to cover the same correction terms as the ones applied within SC_EXTRA. If that is the case, we therefore recommend not using SC_EXTRA (otherwise energy terms will in fact be applied twice, which is effectively scaling up those torsions; in such a case, CAMPARI produces an appropriate warning as well).

SC_BONDED_M

Similar to SC_BONDED_B for all CMAP potentials. These grid-based correction potentials are part of the CHARMM force field and explained in PARAMETERS. This keyword specifies the "outside" scaling factor. Note that CMAP corrections can theoretically be relevant for all possible simulations of biopolymers within CAMPARI since they act on consecutive dihedral angles. The default CMAP corrections from CHARMM only apply to polypeptides, however, and are only contained within the reference CHARMM parameter file.

IMPROPER_CONV

If improper dihedral potentials are in use (→ SC_BONDED_I), this very specialized keyword can be used to force a reinterpretation of the input sequence for the assignment of improper dihedral angle potentials to bonded types (see elsewhere). When set to 2, this keyword forces CAMPARI to switch the meaning of the first and third specified bonded type when it comes to energy and force evaluations. This allows a more or less exact match to the convention used in the AMBER set of force fields (and by extension: in OPLS-AA). For any other value specified, CAMPARI will use the CAMPARI-native convention (that is the same as in the CHARMM and GROMOS force fields).

CMAPORDER

If CMAP corrections are used (→ SC_BONDED_M), this keyword sets the interpolation order for cardinal splines (assuming those are chosen through parameter input → PARAMETERS). A higher spline order will yield a smoother surface. Since the splines are non-interpolating, however, rapidly varying or coarsely tabulated functions may not be well approximated in such cases. The only interpolating cardinal B spline is the linear one which requires a choice of 2 for this keyword. This keyword is irrelevant should bicubic splines be chosen.

CMAPDIR

If CMAP corrections are used (→ SC_BONDED_M), this keyword lets the user specify the absolute path of the directory in which the CMAP files are to be found (by default they are in the data/-subdirectory of the main distribution tree).

BPATCHFILE

This keyword can be used to provide the location and name of an input file that allows reassigning or adding bonded potential terms (see bond length potentials, bond angle potentials, improper dihedral angle potentials, torsional potentials, and CMAP potentials). At the level of the parameter file that CAMPARI parses to generate default assignments based on biotypes (see elsewhere), there are limitations to how finely the system can be parsed. For instance, it is technically not possible to have different bond length potentials acting on the N→Cα bonds of two non-terminal glycine residues (because biotypes are identical). Of course, even providing bonded parameter assignments exactly at biotype resolution would generally be inordinately complicated, which is the reason for grouping biotypes into so-called bonded types in the parameter file. In cases where specific alterations to a given system are desired, the patch functionality provided by this input file will generally be the most convenient (and often the only) route to take. For bond length and angle terms, CAMPARI can also guess values based on initial geometries. Applied patches to bonded interactions are always printed to log-output. In order to diagnose their correctness more easily, it is recommended to use the report functionality for bonded potential terms. Note that the most critical limitation is that extra or alternative bonded potentials can only be applied to such internal coordinates that are eligible for default assignments themselves, e.g., it is not possible to apply a bond angle potential to atoms a-b-c if a is not covalently bound to b or if b is not covalently bound to c.

GUESS_BONDED

This keyword is a simple logical whether to construct a set of bonded parameters from the default molecular geometries (high-resolution crystal structures). Force constants are flat and made-up and the potentials are always harmonic. This is useful if a parameter file is meant to work with Cartesian dynamics but lacks the necessary support to carry out initial tests.

BONDREPORT

This report flag allows the user to request a summary of the bonded potentials found and not found during processing of the parameter file. This is primarily useful as a sanity and debugging tool for creating parameter files. Note that missing but necessary parameters (necessary ones are all bond length and angle potentials if and only if CARTINT is 2) as well as guessed parameters (see GUESS_BONDED) are always reported upon.

SC_WCA

As a mutually exclusive alternative to the Lennard-Jones potential, CAMPARI allows using the extended Weeks-Chandler-Andersen (WCA) potential, which is defined as:

EWCA = 4.0·cWCA·ΣΣi,j εij·f1-4,ij·[(σij/rij)12 - (σij/rij)6 + 0.25·(1.0 - c2)]    if rij < σij·21/6
EWCA = c2·cWCA·ΣΣi,j εij·f1-4,ij·[0.5·cos(c3·(rij/σij)2 + c4) - 0.5]    if σij·21/6 ≤ rij < σij·c1
EWCA = 0.0    else

with:

c3 = π·(c12 - 21/3)-1
c4 = π - c3·21/3

(reference)

Here, the size, interaction, and fudge parameters are used as defined before. c1 is the interaction cutoff (in units of σij) while c2 is the depth of the attractive well to be spliced in (in units of εij). c1 and c2 can be set by keywords CUT_WCA and ATT_WCA, respectively. The potential provides a continuous function mimicking a LJ potential in which the dispersive term can be spliced in without shifting the position of the minimum. cWCA denotes the linear scaling factor specified by this keyword.
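As an illustration (the values for the scaling factor and well depth are arbitrary), a key-file enabling the extended WCA potential in place of the Lennard-Jones terms could contain:

FMCSC_SC_IPP 0.0 # disable LJ repulsion (the WCA potential is mutually exclusive with LJ)
FMCSC_SC_ATTLJ 0.0 # disable LJ dispersion
FMCSC_SC_WCA 1.0 # cWCA
FMCSC_CUT_WCA 1.5 # c1 in units of σij (the minimum allowed value)
FMCSC_ATT_WCA 0.5 # c2, depth of the spliced-in attractive well in units of εij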

ATT_WCA

This allows the user to specify the well depth (positive number) for the attractive part of the WCA potential in units of εij (parameter c2 under SC_WCA).

CUT_WCA

This allows the user to specify the cutoff value for the extended WCA potential in units of σij (parameter c1 under SC_WCA). Note that the minimum allowed choice here is 1.5.

VDWREPORT

This keyword is a simple logical deciding whether or not to print out a summary of the Lennard-Jones (size exclusion and dispersion) parameters, i.e., to report the base values (σii and εii), the combination rules, and in particular all "special" values which overwrite the default combination rule-derived result.

INTERREPORT

Mostly for debugging purposes, this simple logical allows the user to demand a summary of short-range interactions. Naturally, this output can be very large and the keyword should only be used when absolutely needed, for example to understand the settings for MODE_14 and FUDGE_ST_14.

SC_POLAR

CAMPARI only supports fixed-charge atom-based electrostatic interactions which work by defining partial charges for each atom and then writing the potential as:

EPOLAR = cPOLAR·ΣΣi,j [ (f1-4,C,ij·qiqj) / (4.0πε0·rij)]

Here, the qi are the atomic partial charges, f1-4,C,ij are potential 1-4 fudge factors (see FUDGE_EL_14) that generally will be unity, ε0 is the vacuum permittivity, rij is the interatomic distance, and the (double) sum runs over all eligible atom pairs (see ELECMODEL). cPOLAR is the linear scaling factor for all polar interactions specified by this keyword. Since electrostatic interactions are characterized by their potential to yield long-range effects (distance scaling ranges from r-1 for monopole-monopole terms to r-6 for dipole-dipole terms between molecules tumbling freely), the Coulombic term can employ a different cutoff in MC calculations (see below) than the short-range potentials. The correct long-range treatment of electrostatic interactions is one of the most investigated areas in molecular simulations and the user is referred to current literature and keywords LREL_MC and LREL_MD for details. All required partial charges are read either through the parameter file or can be set by a dedicated patch.
Note that the functional form given above is only correct if no implicit solvation model is in use. In such a scenario, Coulomb interactions are usually modified by an extra term sij which can be a complicated function of interatomic distance and/or the positions of all nearby atoms. The reader is referred to Vitalis and Pappu for the exact functional forms used in the ABSINTH implicit solvation model.

ELECMODEL

This important keyword is somewhat analogous to INTERMODEL and allows the user to set the interaction model for electrostatic interactions:
  1. Depending on the setting for INTERMODEL, interactions are either screened for connectivity and frozen interactions are excluded (INTERMODEL is 1), or are purely considered based on the number of bonds separating two atoms (INTERMODEL is 2). In any case, partial charges interact without considerations of net neutrality (see below), which is problematic for short-range interactions. Consider for example the ω-bond in polypeptides and assume that CO and NH both form neutral groups supposed to indicate dipole moments. If INTERMODEL is 2 and ELECMODEL is 1, the interaction between O and H is considered (1-4) but none of the others as they are topologically too close. This leads to spurious (and very strong) Coulomb interactions between what essentially are fractional, net charges. This is an inherent weakness of the point charge model which is typically addressed by extensive co-parameterization of bonded parameters, 1-4-fudge factors, etc. (see FUDGE_EL_14).
  2. The partial charge set in the parameter file is read and the assigned charges are screened for (generally) net neutral charge groups. These charge groups are determined largely automatically and are currently not patchable per se. The automatic charge group generation operates by trying to group partial charges into groups of minimum size and spanning a minimum number of covalent bonds satisfying a target net charge. The default target net charges are derived from knowledge of every CAMPARI-supported residue and assumptions about their titration states (if any). This means, for example, that a nonterminal lysine residue will be processed by first looking for a charge group with a net charge of +1 before trying to identify as many net neutral groups as possible. While CAMPARI does not allow grouping charges arbitrarily, there is a dedicated patch which allows defining a series of (arbitrary) target values for the net charges of charge groups in a given residue. This is required to deal with charge sets that do not group at all, or to deal with residues that contain multiple ionic moieties. For example, depending on the charge set in use, one may want to partition free, zwitterionic alanine either as multiple groups with +1, -1, and 0 charges, or simply as one or more net neutral groups. For multiple targets, failure of the grouping algorithm at a given stage will lead to this stage being skipped. Conversely, failure at the last stage will result in all remaining atoms in the residue being members of a single group. Groups that are not well-defined charge groups according to CAMPARI's standards will be reported on in log output. With the groups in place, only interactions between those groups, for which all possible atom-atom pairs are separated by at least one significant degree of freedom, are computed. Interactions within a group are always excluded. What constitutes a significant degree of freedom is predetermined by the choice for INTERMODEL, and the reader is encouraged to read up on this if necessary. Essentially, INTERMODEL will define the maximum set of short-range interaction pairs that can also be considered for polar interactions. As an example, for the 6 net neutral CH units in benzene, if INTERMODEL is 1, no intramolecular polar interactions can be considered (the maximum set is empty). Conversely, if INTERMODEL is 2, several group-group interactions are now permissible (C1H-C4H, C2H-C5H, C3H-C6H). Depending on the charge set and on the choice for INTERMODEL, setting ELECMODEL to 2 can lead to a massive depletion of short-range electrostatics. This paradigm clashes heavily with traditional force field development, but - in the authors' opinion - is the only sane treatment of dipole-dipole interactions if the latter are represented by point charges.
The partitioning into charge groups for option 2 is currently relevant for two additional areas (independent of the setting for ELECMODEL):
  • The charge groups are important for deciding how long-range electrostatic interactions between ionic groups are computed exactly (see options 1, 2, and 3 for LREL_MC and options 4 and 5 for LREL_MD).
  • The charge groups are used as the basis for computing group-averaged screening factors for certain screening models in the ABSINTH framework (see options 1, 3, 5, and 7 for SCRMODEL).
It should be noted that the typical combinations here will be setting ELECMODEL to 1 and INTERMODEL to 2 (or 3) (the standard molecular mechanics force field paradigm) versus setting ELECMODEL to 2 and INTERMODEL to 1 (the ABSINTH paradigm). In particular, the combination of both being set to 1 will not be particularly meaningful. This is because the interactions isolated via INTERMODEL must always remain a superset of those isolated here.
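To make these two typical combinations concrete, the corresponding key-file fragments would read (only one of the two would appear in an actual key-file):

# standard molecular mechanics force field paradigm:
FMCSC_INTERMODEL 2 # (or 3)
FMCSC_ELECMODEL 1
# ABSINTH paradigm:
FMCSC_INTERMODEL 1
FMCSC_ELECMODEL 2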

AMIDEPOL

One "flaw" in the biotype setup in CAMPARI (see PARAMETERS) is the fact that the two polar hydrogens on primary amides are treated as chemically equivalent which - on a typical simulation timescale - they are not. Instead of creating yet more biotypes, this keyword simply allows to add a small polarization term for partial charges on those hydrogens. The value specified will be added to the hydrogen cis to the oxygen (the electronegative atom nearby increase the partial positive charge) and subtracted from the trans-H to keep them both at the same total charge. For example, if both hydrogens have a charge of +0.36, a specification of 0.05 here will yield charges of 0.41 (cis-O) and 0.31 (trans-O). It will be useful to track these changes using ELECREPORT. It is very important to note, however, that fundamentally a sampling algorithm may isomerize the amide bond and hence render the correction incorrect and - moreover - that reading in a structure may flip the two hydrogens to start out with (because of inconsistent numbering between two software packages). Hence, this keyword should be used only when absolutely necessary (and its sign may have to be flipped to achieve the desired effect).
This correction to primary amides is a specific example for the occasional need to overwrite partial charge parameters for atoms due to "biotype splitting". The more general approach provided by CAMPARI for this explicit purpose is to "patch" the partial charge set by a dedicated input file.

CPATCHFILE

If the polar potential is in use, this keyword can be used to provide the location and name of an input file that allows overriding some or all of the partial charge parameters CAMPARI obtains from the parameter file (see elsewhere). This can be required to match the exact standard given by a force field with a finer biotype parsing. Note that such corrections are error-prone and should only be used when absolutely needed. In any case, the user is recommended to use ELECREPORT for a detailed summary of final partial charges in the system.

DIPREPORT

This simple logical will - when turned on by the user - produce two summary files (see DIPOLE_GROUPS.vmd and MONOPOLES.vmd), which allow the user to graphically assess the automatically determined charge groups. The former will visualize all charge groups in the system (not just the net neutral ones) by highlighting all atoms belonging to each group. The latter will visualize the "center" atom of all groups carrying a net charge (the meaning of this is defined by the value for POLTOL). Note that - naturally - this option is not available if SC_POLAR is zero.

NCPATCHFILE

If the polar potential is in use, CAMPARI automatically determines charge groups, i.e., groups of atoms within a residue that are topologically close and whose partial charges sum up to zero or to an integer net charge. If LREL_MD is 4 or 5 and/or LREL_MC is 1, 2, or 3, this information is used to flag residues as carrying ionic groups, which leads to the computation of additional interactions even if residues are not in each others' neighbor lists. A residue is flagged if it contains at least one charge group with a total, absolute charge greater than a tolerance that is zero by default (and there should be very good reasons for increasing this tolerance).
This keyword allows the user to specify location and name of an optional input file that can perform two important tasks:
  • It allows removal of the net charge flag for specific residues, thereby altering the overall interaction model (if the corresponding options for LREL_MD and/or LREL_MC are selected).
  • It allows the manual specification of sequential target values for the total charges of charge groups to be identified. This is currently the only way to manually alter the charge group partitioning, and can be crucial when simulating unsupported residues and/or when dealing with charge sets that do not group naturally (such as those within the AMBER family of force fields).
The details of the input and an example application are described elsewhere.

POLTOL

If the polar potential is in use, CAMPARI automatically determines charge groups, i.e., groups of atoms within a residue that are topologically close and whose partial charges sum up to zero or to an integer net charge. As described above, these net charge values can be patched. This may, for example, be used to obtain a grouping into approximately neutral groups for partial charge sets that include complex polarization patterns. In order to avoid having the resultant groups cause CAMPARI to flag the corresponding residue as carrying a net charge (i.e., they are treated like ions), this keyword allows the user to define an increased tolerance for what is considered "approximately neutral". This is relevant because treatment of residues as ions can have substantial implications for the interaction model in particular in terms of computational efficiency (see LREL_MC and LREL_MD). Note that this keyword operates at the charge group level, whereas patches via NCPATCHFILE can (also) disable the ionic flag status of residues. Therefore, both offer different levels of control. The numerical value specified here (in units of e) is compared to the total charge of a given charge group.
As an example, consider a terminal nucleotide residue carrying a 5'-phosphate with an integer negative charge. Suppose that the partial charges on the phosphate linker to the next residue are such that - in addition to the terminal phosphate - this leaves a charge group with a small, fractional charge. In this case, the residue-level patch could only remove the net charge flag for the entire residue (probably undesirable), whereas the tolerance setting described here could specifically eliminate the group with the fractional charge from the list of ionic groups. The default tolerance is set to be zero within reasonable numerical precision. Note that this keyword has no impact on the charge group partitioning itself and is relevant only if LREL_MD is 4 or 5 and/or LREL_MC is 1, 2, or 3.

FUDGE_ST_14

This keyword provides a flat 1-4 scaling factor for interatomic, non-bonded interactions of specific types. 1-4 interactions are defined according to the choice for MODE_14 and depend on the setting for INTERMODEL as well. The value for FUDGE_ST_14 is applied to all steric and dispersion potentials, i.e., the potentials turned on by SC_IPP, SC_ATTLJ, and SC_WCA. The only other 1-4-scaled interaction potential is the electrostatic one for which a separate 1-4-scaling factor is in use (see FUDGE_EL_14). All other pairwise, non-bonded potentials are never subjected to 1-4-corrections (see for example SC_TABUL or SC_DREST). Note that the value for FUDGE_ST_14 is applied in addition to corrections applied at the parameter level by providing 1-4-specific σ- and ε-parameters in the parameter file (see PARAMETERS).

FUDGE_EL_14

Similar to FUDGE_ST_14, this keyword specifies a scale factor for 1-4-interactions. Here, the provided value will be applied specifically to electrostatic interactions (see SC_POLAR) only. If ELECMODEL is set to 2, any charge group interaction will be scaled as a whole by this factor, as soon as any of the possible atom pairs fulfill the 1-4-criterion (see MODE_14).

MODE_14

This keyword's relevance is limited to the case in which INTERMODEL is 1. Then, this essentially defines what a 1-4-interaction is, specifically whether anything separated by exactly three bonds or by exactly one relevant rotatable bond should be considered 1-4:
  1. Only two interacting atoms separated by exactly three bonds are treated as 1-4.
  2. Two interacting atoms separated by exactly one relevant, freely rotatable bond are always treated as 1-4.
The difference only matters for rigid systems and is best illustrated in an example:
Take a phenylalanine residue and consider the CA-CB-CG-CD1 stretch (from Cα to one of the Cδ). This is exactly three bonds and the bond CB-CG is the only relevant rotatable one (CA-CB is also rotatable but irrelevant, since CA lies on the axis, while CG-CD1 is not rotatable). CA and CD1 are treated as 1-4 in both modes. Now consider the CA-CB-CG-CD1-CE1 stretch. These are four bonds and CA and CE1 are not considered 1-4 in mode 1. However, there is still only one relevant rotatable bond in between (CB-CG, since CG-CD1 is rigid), so CA and CE1 are in fact treated as 1-4 in mode 2.
Note that CAMPARI allows specific modifications of 1-4-interactions, either through the use of fudge factors (see FUDGE_ST_14 and FUDGE_EL_14) or through specific parameters provided in the parameter file. If neither of those indicates a deviation from normal interaction rules, then this keyword becomes irrelevant as well.

ELECREPORT

If the polar potential is in use, this simple logical allows the user to request a summary for the close-range electrostatic interactions in the system. Similarly to INTERREPORT, this keyword mostly serves debugging purposes and should only be needed to understand the details of the short-range interaction setup.

SC_IMPSOLV

This keyword serves two functions. First, as a logical it enables the ABSINTH implicit solvent model, i.e., it will compute the direct mean-field interaction (DMFI) of each solute with the continuum and enable screening of polar interactions (if turned on → SC_POLAR). For the former (the DMFI) it simultaneously serves as the linear scaling factor. Note that the amount of screening of polar interactions is not dependent on this keyword and is solely determined by other parameters (in particular IMPDIEL). The DMFI is defined as:

EDMFI = cDMFI·Σk ΔGFES,k·[ Σi ζik·υik ]

Here, υik is the solvation state of the ith atom in the kth solvation group and ζik is its weight factor. The solvation states are computed by CAMPARI and vary throughout the simulation whereas the weight factors are constant. The reference free energies of solvation for each solvation group (ΔGFES,k) are provided through the parameter file and are constant as well (for the latter see PARAMETERS). Note that the computation of the DMFI given the υik is of negligible cost and that CAMPARI obtains the υik while computing short-range non-bonded interactions at a moderate additional cost. This implies that the ABSINTH implicit solvation model is speed-limited almost exclusively by the complications incurred by the screening of polar interactions. The user is referred to Vitalis and Pappu for further details (reference).
To employ the ABSINTH implicit solvent model as published use:

FMCSC_FOSMID 0.1
FMCSC_FOSTAU 0.25
FMCSC_SCRMID 0.9
FMCSC_SCRTAU 0.5
FMCSC_SAVPROBE 2.5
FMCSC_IMPDIEL 78.2
FMCSC_SCRMODEL 2 # (or 1)

Note that as of 2013 the more rigorous screening model (option 1) appears in published literature only for the work on arginine-rich peptides (Mao et al.). Finally, note that the DMFI can be made temperature-dependent by additions to the parameter file and use of keyword FOSMODE.

SAVPATCHFILE

This keyword can be used to provide the location and name of an input file that allows overriding the default, topology-derived values for the maximum fractions of the solvent-accessible volume, ηi,max. Because values depend on hard-coded parameters (geometry) and user-level settings (choice of parameters and keyword FMCSC_SAVPROBE), CAMPARI (re)computes these values at the beginning of each run. This utilizes the default local geometries (not input structures) and works by decomposing the molecule into suitably small model compound units. The patch prints a summary of all successful changes, and results can also be assessed via column 4 in output file SAV_BY_ATOM.dat. Note that these values rely on other patchable quantities, most notably atomic radii. Patches follow a hierarchy, and a patched value for the ηi,max overrides values derived from radii that could themselves be patched (here, RPATCHFILE overrides indirect reassignment via LJPATCHFILE), without touching the atomic radii. This means that it is possible for the patched values of ηi,max to be grossly inconsistent with the underlying set of radii.

ASRPATCHFILE

This keyword can be used to provide the location and name of an input file that allows overriding the default, topology-derived values for the pairwise reduction factor for atomic volumes used in most computations using the atomic volume, most prominently the ABSINTH implicit solvation model. Reduction factors are needed because the exclusion volumes of covalently bound atoms overlap. The reduction factors are computed in linear approximation, and - by default - the overlap volume is subtracted evenly from the remaining atomic volume of each partner. These values depend on various parameters (parameters and hard-coded geometry), and CAMPARI (re)computes them at the beginning of each run. The patch prints a summary of all successful changes, and results can also be assessed via column 7 in output file SAV_BY_ATOM.dat. See SAVPATCHFILE for remarks on the hierarchy of patches of atomic parameters.

FOSPATCHFILE

Since there is no external way to control details of the solvation group assignments relevant to the computation of the DMFI (→ SC_IMPSOLV) through the parameter file, CAMPARI offers users the ability to alter the default group partitioning and to control reference free energies of solvation on a per-moiety basis through a dedicated input file. This also supports alterations to transfer enthalpies and heat capacities at the patch level if a temperature-dependent DMFI is in use. This keyword is used to provide the location and name of this input file. There are some underlying restrictions to the freedom of choices, but in principle it is possible to completely redesign the underlying DMFI model using this facility. Restrictions and formatting are explained elsewhere. The applied patch implies that CAMPARI will keep the built-in default partitioning along with the default reference values from the parameter file (see elsewhere) for unpatched residues and molecules. As with other force field patches, these corrections are error-prone and CAMPARI output should always be double-checked against the intended input. For this purpose, keyword FOSREPORT and associated output file FOS_GROUPS.vmd will be of particular use.

FOSMODE

Simulation temperature is used frequently in biomolecular sampling both to explicitly probe temperature-dependent behavior and to enhance sampling. For the former, the correctness of fixed force field parameters becomes questionable. If the DMFI of the ABSINTH implicit solvation model is in use, this keyword allows the user to make some of the parameters of the model temperature-dependent themselves. There are currently two options:
  1. All values for ΔGFES in the equation above are fixed to the reference values specified in the parameter file independent of temperature or any other environmental parameters. This is the default.
  2. CAMPARI tries to extract, from the parameter file, values for temperature-independent enthalpies and heat capacities of the transfer process of a given model compound from a fixed conformation in the gas phase into water. By default, CAMPARI parameter files do not contain these parameters. The temperature-dependent values are computed as:

    ΔGFES(T) = (ΔGFES(T0) - ΔHFES)T/T0 + ΔHFES + ΔCp,FES[T[1 - ln(T/T0)] - T0]

    Here, ΔHFES and ΔCp,FES are the aforementioned enthalpies and heat capacities of transfer, whereas T denotes the simulation temperature and T0 denotes the reference temperature for the listed free energy value. T0 is set by keyword FOSREFT.
Note that this keyword is irrelevant unless the parameter file actually contains entries at least for the solvation enthalpies. If no parameters are present, the values are adjusted such that ΔGFES(T) equals ΔGFES(T0) for all temperatures. Furthermore, it is important to keep in mind that a temperature-dependent model may not be more meaningful if other parameters remain fixed. For the screened ABSINTH part, it is at least possible to adjust the reference dielectric for water in a temperature-dependent manner.
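As a sketch (this assumes the parameter file has actually been extended with the required enthalpy and, optionally, heat capacity entries), a temperature-dependent DMFI would be requested as:

FMCSC_FOSMODE 2 # use ΔHFES and ΔCp,FES entries from the parameter file
FMCSC_FOSREFT 298.0 # reference temperature T0 (the default)
FMCSC_FOSREPORT 1 # document the resultant parameters in log output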

FOSREFT

If the DMFI of the ABSINTH implicit solvation model is in use, and if a temperature-dependent model has been requested, this keyword sets the assumed reference temperature for transfer free energies of solvation listed in the corresponding section of the parameter file. It defaults to 298K.

FOSREPORT

This simple logical allows the user to request CAMPARI to print a summary of the group-based reference free energies, enthalpies, and heat capacities of solvation read from the parameter file. The latter two terms are only relevant if a temperature-dependent model has been selected. In general, the reference free energies will correspond exactly to the terms ΔGFES,k above. Note, however, that this initial output is not a summary of the system but rather of the parameters, i.e., it is more like VDWREPORT and unlike ELECREPORT or INTERREPORT. If some solvation group assignments and parameters are changed via a corresponding patch file, this keyword will also ensure that the applied patch is documented in detail in CAMPARI's log output. The actual group partitioning for the system at hand (but not the associated numerical parameters) is available from output file FOS_GROUPS.vmd.

SAVPROBE

This keyword is crucial for the ABSINTH implicit solvent model and specifies the size of the solvation shell around individual atoms. The input value is interpreted to be the radius in Å of a solvent sphere rolled around each atom and consequently twice the value of SAVPROBE will yield the thickness of the assumed first solvation layer. The resultant solvation shell volume is the starting point for determining solvent-accessible volume fractions (ηi) which are then mapped to yield atomic solvation states (υi) which are relevant for the DMFI and screened electrostatic interactions (→ SCRMODEL). It is important to note that SAVPROBE is the only keyword directly controlling the ηi which are otherwise purely functions of atomic parameters (see PARAMETERS). Lastly, note that this keyword is still relevant for SAV analysis even though the implicit solvent model might not be used (→ SAVCALC).

FOSTAU

The atomic solvent-accessible volume fractions, ηi, are mapped to solvation states by two different sets of parameters, the first being responsible for obtaining υi,f which are the solvation states describing the change in DMFI with changes in conformation (the second set is responsible for obtaining υi,s which describe the change in dielectric response with change in conformation). The details of the mapping function are complicated by the requirement to normalize the υi,f to the well-defined interval [0:1] but in essence it holds:

υi,f ~ [ 1.0 + exp[ -(ηi-h(χf))/τf ] ]-1

Here, τf is the parameter set by this keyword; it determines the steepness of the sigmoidal interpolation. Large values will yield an approximately linear re-mapping between the natural limits of ηi, which are derived from closest packing of spheres (lower limit) and model compound topology (upper limit). This case is not obvious from the above equation but is obtained via τf-dependent re-scaling to match the target interval. Conversely, very small values yield a step function-like interpolation. h is a linear function shifting the mid-point parameter χf (set by FOSMID) such that symmetry between the two natural limits of ηi is obtained.

FOSMID

As explained for FOSTAU, the mapping from solvent-accessible volumes ηi to solvation states υi,f relies on a mid-point parameter, χf. In the functional form given above, the mid-point of the sigmoidal function (i.e., the point of maximal slope) can be shifted toward either one of the natural limits of ηi by varying this keyword between zero and unity. Since the sigmoidal nature of the interpolation disappears in the limit of large values chosen for FOSTAU, FOSMID is only relevant for sufficiently small values of FOSTAU and its impact deteriorates progressively with growing FOSTAU. Note that the default is 0.5 but that it is easily possible to generate fairly asymmetric interpolation functions in the process (i.e., at values close to zero atoms are considered solvated at almost all times while at values close to unity the opposite is true). There is a Matlab script in the tools-directory (sigmainterpol.m) that helps assess the effect FOSTAU and FOSMID have, given values for the natural limits of ηi.

IMPDIEL

This keyword lets the user set the assumed continuum dielectric. Primarily, this is used in the ABSINTH solvation model to treat the screening of electrostatic interactions. The dielectric constant enters the equation for the modified Coulomb sum in different ways depending on the choice for SCRMODEL. In general, the solvent-accessible volumes will be mapped to yield solvation states υi,s for dielectric screening. The mapping process is identical to the one described for FOSTAU but relies on parameters τs (→ SCRTAU) and χs (→ SCRMID) instead. In the published ABSINTH model, the screening factor for the polar interaction is given as:

sij = [ 1 - aυi,s ]·[ 1 - aυj,s ]
a = (1 - εr-1/2)

Here, εr is the relative dielectric constant set by this keyword. The above equation corresponds rigorously only to using screening model 2. Note how the functional form ensures an interpolation between the vacuum (υi,j = 0.0 → εeff = 1.0) and the fully screened cases (υi,j = 1.0 → εeff = εr).
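As a quick numerical check of this interpolation: for εr = 78.2, a = 1 - 78.2-1/2 ≈ 0.887. For two fully solvated atoms (υi,s = υj,s = 1.0), this yields sij = (1 - a)2 = εr-1 ≈ 0.0128, i.e., the full dielectric screening of water, whereas two fully buried atoms (υi,s = υj,s = 0.0) yield sij = 1.0, i.e., unscreened vacuum electrostatics.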
In a completely different context, this keyword also sets the assumed continuum dielectric outside the cutoff sphere when treating electrostatic interactions with reaction-field methods (→ LREL_MD). For this latter purpose, it may be advantageous to set a very large value.

SCRMODEL

This keyword has several options which allow the user to control how dielectric screening of charges is done, specifically what functional form is used for the pairwise screening factor sij for a pair of interacting atoms i and j. The electrostatic framework within ABSINTH aims specifically at ensuring that only moieties with well-defined net charges interact (this is discussed in a different context for ELECMODEL). This means that for every base functional form of sij there will be two variants, one in which the υi,s are used directly (atom-based) and one in which a charge group-based υk,s is pre-computed for each group k out of its constituent atoms' solvation states υi,sk. Only the latter ensures rigorously that two formally neutral charge groups interacting will not create effective charge imbalances by atom-specific screening. The downside of those models (and the reason we generally do not recommend using them) is the higher associated computational cost and the dependence on local neutrality in the partial charge set (i.e., should the base parameters not yield any locally neutral subgroups within a residue, the relevant charge group may be as large as an entire polynucleotide residue, and dielectric responses of fairly distant moieties may become coupled, which suggests a length scale for the solvent response vastly inconsistent with the setting for SAVPROBE). In the latter case, it may be necessary to attempt to patch the charge groups so that an approximate grouping is obtained.
  1. For every charge group, the solvation states for the individual sites are averaged in charge-weighted fashion (group-based → see above). The resultant group solvation state υk,s is used to screen all the charges belonging to this group:
    sij = [ 1 - aυk,s ]·[ 1 - aυl,s ]
    a = (1 - εr-1/2)
    Here, we assume atom i is part of the kth charge group and atom j is part of the lth charge group. εr is provided by IMPDIEL.
  2. This is the published atom-based model and explained above (→ IMPDIEL). The atom-specific screening via atomic solvation states υi,s will break the neutral paradigm somewhat but localizes and strengthens specific interactions.
  3. Since electrostatic interactions tend to be somewhat weak with the aforementioned options, this model extends the default model (1) by an important change. If the distance of atoms i and j, rij, approaches the length-scale of the first solvation shell, the dielectric is augmented by a distance-dependent contribution intended to strengthen specific interactions. This yields a very complicated (although computationally not much more expensive) model:
    sij = senv,ij if rij ≥ (r0,ij+dW) or senv,ij > [ εc·r0,ij ]-1
    sij = [ 1 - fMIX·[1 - dw-1(rij-r0,ij)] ]·senv,ij + fMIX·[1 - dw-1(rij-r0,ij)]·[ εc·r0,ij ]-1 if rij < (r0,ij+dW) and rij > r0,ij
    sij = (1 - fMIX)·senv,ij + fMIX·[ εc·r0,ij ]-1 if rij < r0,ij
    senv,ij = [ 1 - aυk,s ]·[ 1 - aυl,s ]
    a = (1 - εr-1/2)
    Here, dW is the thickness of the solvation shell (2·SAVPROBE) and r0,ij is given by the sum of the atomic radii of atoms i and j. fMIX is the impact of the distance-dependent contribution and set by keyword SCRMIX. εc is set by CONTACTDIEL (compare model 4). Note that the distance-dependence is achieved by the interpolation performed in the distance regime r0,ij < rij < (r0,ij+dW) but that no explicit distance-dependence is introduced otherwise. Furthermore, the contact dielectric εc·r0,ij is generally overridden if the environmental dielectric senv,ij would lead to a stronger interaction (less screening). Importantly, model 3 operates on the group-consistent solvation states (as model 1 does). The atom-specific modification corresponds to model 9. It should be noted that these models are largely untested and were part of initial calibration studies with the ABSINTH implicit solvent model. They are fully supported by CAMPARI, however.
  4. This model implements a (more or less) pure distance-dependent dielectric:
    sij = [ εc·rij ]-1 if rij > r0,ij
    sij = [ εc·r0,ij ]-1 else
    Here, εc is the strength of the distance increase of the dielectric constant and r0,ij is the contact distance below which no further distance dependence to sij is applied. The resultant effective dielectric constant is εc·r0,ij which should never be less than unity. εc is set by CONTACTDIEL and r0,ij is defined by the sum of the atomic radii of atoms i and j. This means that the derivative of the potential is discontinuous at the contact point. Note that distance-dependent dielectric models break for a variety of limiting cases, in particular for anything involving net charged species. They also rely on a cutoff criterion since they otherwise do not converge upon a meaningful limiting dielectric. In this way, distance-dependent dielectrics may be seen as somewhat analogous to reaction-field treatments (see LREL_MD).
  5. This model is a group-based variant and therefore similar to option 1. It attempts to take a different route toward computing an effective dielectric. Whereas models 1, 2, 3, and 9 use an effective charge approach, this model (just like models 6, 7, and 8) employs an effective dielectric approach. The former implies that the solvation state enters the potential energy for Coulombic interactions as υi,s·υj,s, i.e., EPOLAR will scale with changes in the υi,s differently than the DMFI. Consequently, screening model 5 implies:
    sij = M( [1 - a·υk,s], [1 - a·υl,s] )
    a = (1 - εr-1)
    Here, we assume atom i is part of the kth charge group and atom j is part of the lth charge group, and M is a function corresponding to a generalized mean whose exact form is determined by the choice for ISQM. The latter can give rise to fundamentally different scaling behavior of EPOLAR with the υi,s, as illustrated for example by taking the arithmetic mean. This can more closely approximate the behavior seen for the DMFI and may allow using much more similar parameter sets τs and χs compared to τf and χf than is the case with models 1 or 2.
  6. This model is the atom-based variant of model 5:
    sij = M( [ 1 - a·υi,s], [ 1 - a·υj,s] )
    a = (1 - εr-1)
  7. This model is an equivalent modification to model 5 as model 3 is to model 1.
  8. This model is an equivalent modification to model 6 as model 3 is to model 1.
  9. This model is an equivalent modification to model 2 as model 3 is to model 1.

CONTACTDIEL

For certain screening models (SCRMODEL = 3, 4, 7, 8, or 9), a value for the effective dielectric at an interatomic distance exactly matching the sum of the two atomic radii is postulated to have the limiting value of εc·r0,ij (see equations above). This keyword provides the value for the parameter εc.

SCRTAU

As explained before (see IMPDIEL and FOSTAU), the ABSINTH implicit solvent model employs two sets of solvation states υi,f and υi,s. This keyword determines τs, the steepness of the sigmoidal interpolation that yields the υi,s from the ηi. The functional form is identical to the one described for FOSTAU. The υi,s determine the effective dielectric acting between polar atoms (see equations above).

SCRMID

This is the specification analogous to FOSMID but provides χs rather than χf.

SCRMIX

Several of the screening models (choice of 3, 7, 8, or 9 for SCRMODEL) splice a distance-dependent term into the environmental charge-screening over a well-defined length scale. The impact of this contribution is set by this keyword which corresponds to the parameter fMIX in the equations above. If set to values close to zero, the model approaches its unmodified base model, e.g. model 3 essentially converges to model 1. Conversely, a value close to 1.0 would yield maximum impact and let - for example - model 3 approximate model 4 for distances close to the contact distance r0,ij. The choice here is naturally tightly coupled to that for CONTACTDIEL.

ISQM

In those screening models postulating an effective dielectric rather than effective charges, the generalized mean function M(x,y) was introduced (see equations above). The specification here defines the order m for the generalized mean; it can be an integer from -10 to 10, but large absolute values slow down the computation drastically and are not recommended:

M(x,y) = [0.5·( xm + ym ) ]1/m if m ≠ 0

With the limiting case of:

M(x,y) = (x·y)1/2 if m = 0

Common cases aside from the geometric (m=0) are the arithmetic (m=1) or the harmonic (m=-1) mean. Any m>1 will favor large values in an asymmetric pair, i.e., let both participating atoms appear desolvated leading to stronger interactions, while any m<1 will favor small values in an asymmetric pair, i.e., let both participating atoms appear solvated and weaken such interactions (it is the derived screening factors and not the solvation states that enter the mean). The former scenario (m>1) would rarely seem desirable as it means that - for instance in solutions of small, polar molecules - the cooperativity for converting between fully dissociated and fully associated states becomes overly pronounced on account of the positive coupling between adding more and more species to a growing cluster and the enthalpic benefit offered by that process.
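A small numerical example illustrates this asymmetry. Take screening factors x = 0.2 (a well-solvated atom) and y = 0.8 (a largely buried atom): the arithmetic mean (m=1) gives M = 0.5, the geometric mean (m=0) gives M = (0.16)1/2 = 0.4, and the harmonic mean (m=-1) gives M = [0.5·(5.0 + 1.25)]-1 = 0.32. Larger orders thus yield larger screening factors (weaker screening and hence stronger interactions), whereas smaller orders let the solvated partner dominate and weaken the interaction.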

SC_TOR

This keyword specifies the linear scaling factor controlling the "outside" scaling of torsional bias terms, VTOR. Such a potential allows the user to either harmonically restrain virtually all freely rotatable dihedral angles to specific target values or to softly bias them toward such target values. The setup for these is handled through an input file (details of the format are described elsewhere). Note that a particularly useful application of VTOR is to apply torsional restraints according to structural input, which is useful for equilibrating molecules meant to remain in a specific, internal arrangement.

TORFILE

This keyword specifies the location and name (absolute paths preferable) of the input file for individual backbone torsional bias potentials, VTOR (see elsewhere for description).

TORREPORT

This is a simple logical allowing the user to instruct CAMPARI to write out a complete summary of the torsional bias terms contributing to VTOR (naturally parsed by residue) in the system. In addition to the annotated log-output, this will also create the output file SAMPLE_TORFILE.dat, which is a rewriting of the current input specifications to a fully explicit and residue-based version. This is useful primarily for preserving definitions of torsional bias potentials that were derived from structural input. It is recommended to utilize this option.

SC_ZSEC

This keyword gives the linear scaling factor for a global secondary structure bias term. For values larger than zero, a harmonic bias is applied on two order parameters, fα and fβ, which measure the secondary structure content of the chain. fα and fβ are calculated as the sequence-averaged (excluding termini) values of a mapping function defined for each residue:
zα = exp[ -τα·(dα-rα)2 ] if dα > rα
zα = 1.0 else
The radius of the (spherical) α-region, rα, is provided by ZS_RAD_A and its center φ/ψ-position by keyword ZS_POS_A. The distance dα is taken from the center of the circle and corrected for periodic wraparounds in φ/ψ-space. zβ is defined analogously. This function represents a smooth "top hat" function which is continuous and differentiable. By tuning the parameters τα and τβ through keywords ZS_STP_A and ZS_STP_B, the Gaussian decay beyond the limits of the spherical plateau region can be turned from very shallow to step function-like. The default definitions (all of which can be overridden) are:
α
Center: φ/ψ=(-60.0,-50.0)°; rα = 35.0°; 1.0/τα1/2 ≅ 22.36°
β
Center: φ/ψ=(-155.0,160.0)°; rβ = 35.0°; 1.0/τβ1/2 ≅ 22.36°

The global values (if there are multiple polypeptide chains in the system, the average is over all of them) are then restrained:

VZSEC = cZSEC·(kα·(fα - fα0)2 + kβ·(fβ - fβ0)2)

Here, cZSEC is the linear scaling factor specified by this keyword. The other parameters are explained below. Note that it may not be a good idea to use such a residue-based restraint potential for very short sequences. Here, the net content idea breaks down and (for typical choices of τα/β) the chain will have access only to values in the vicinity of those given by a discrete residue content. This may lead to a specific sampling of the ring regions around the plateaus to satisfy intermediate target values, which runs counter to the intent of the potential.
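As an illustrative key-file fragment (target value and spring constants are arbitrary), biasing a system toward an α-content of 0.6 using the keywords documented below would look like:

FMCSC_SC_ZSEC 1.0 # cZSEC
FMCSC_ZS_FR_A 0.6 # target fα0
FMCSC_ZS_FR_KA 10.0 # kα in kcal/mol
FMCSC_ZS_FR_B 0.0 # no β-target
FMCSC_ZS_FR_KB 0.0 # no restraint on fβ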

ZS_FR_A

This keyword specifies the target α-content for the global secondary structure bias (fα0) potential (values [0.0:1.0]).

ZS_FR_B

This keyword specifies the target β-content for the global secondary structure bias (fβ0) potential (values [0.0:1.0]). Note that the sum of fβ0 and fα0 (see ZS_FR_A) should usually not exceed unity, especially in conjunction with stiff spring constants. Doing so would generate a frustrated system for which results will often be irrelevant.

ZS_FR_KA

Through this keyword, (twice) the spring constant (in kcal/mol) operating on fα is provided (kα) if the global secondary structure bias potential is in use.

ZS_FR_KB

Analogous to ZS_FR_KA, this keyword lets the user specify the spring constant (in kcal/mol) operating on fβ (kβ) if the global secondary structure bias potential is in use. If both parameters are meant to be restrained, it usually would not seem meaningful to choose very different values for the two spring constants. In doing so, one would essentially create a primary bias (stiffer term) and a secondary bias (softer term) operating "within" the primary restraint.

ZS_POS_A

This is one of the few keywords that requires two floating point numbers as input. It allows the user to override the default location of the α-basin (see SC_ZSEC). The two numbers are interpreted to be the φ- and ψ-values (in degrees) for the center of the (spherical) basin. The setting is relevant for the corresponding restraint potential and the output in ZSEC_HIST.dat, ZAB_2DHIST.dat, and ZBETA_RG.dat.

ZS_POS_B

See ZS_POS_A, only for the β-basin.

ZS_RAD_A

This keyword requires one floating point number to be specified. It allows overriding the default radius of the α-basin (see SC_ZSEC) and is assumed to be given in degrees. The setting is relevant for the corresponding restraint potential and the output in ZSEC_HIST.dat, ZAB_2DHIST.dat, and ZBETA_RG.dat.

ZS_RAD_B

See ZS_RAD_A, only for the β-basin.

ZS_STP_A

This keyword requires one floating point number. It allows overriding the default steepness of the decay (τα) of the order parameter value beyond the spherical plateau region defining the α-basin (see SC_ZSEC). It is assumed to be provided in inverse degrees squared. The setting is relevant for the corresponding restraint potential and the output in ZSEC_HIST.dat, ZAB_2DHIST.dat, and ZBETA_RG.dat.

ZS_STP_B

See ZS_STP_A, only for the β-basin.

SC_DSSP

This keyword provides the outside scaling factor, cDSSP, on a biasing potential acting on order parameters derived from the secondary structure annotation of polypeptides in the simulation system using the DSSP algorithm. In essence, this allows biasing the system to populate more and stronger hydrogen bonds characteristic of either α-helices (H) or β-sheets - whether parallel, antiparallel, multi-pleated, or hairpins (E). Since secondary structure annotation is essentially a discretized, on/off variable, it may seem surprising that a restraint potential can be applied in a meaningful fashion; the restraint takes the form:

VDSSP = cDSSP·(kH·(fH - fH0)2 + kE·(fE - fE0)2)

Here, the kH and kE are (twice) the spring constants for the harmonic restraints applied to the secondary structure scores, fH and fE. The spring constants are set by keywords DSSP_HSC_K and DSSP_ESC_K for H-score and E-score, respectively. fH and fE are exactly identical to the H-score and E-score defined below and rely on the same base parameters (→ DSSP_MODE). Essentially, they correspond to a multiplicative function of the assignment and the quality of the hydrogen bonds giving rise to the assignment. They can - depending on system and DSSP settings - be continuous and approximately smooth order parameters over a large part of the accessible regime. The target values fH0 and fE0 are set via keywords DSSP_HSC and DSSP_ESC. There are a few noteworthy peculiarities which the user should keep in mind:
  1. DSSP E-assignments can rely both on intra- and intermolecular hydrogen bonds rendering the DSSP term a true system-wide potential. Currently, CAMPARI only allows restraining global E- and H-scores which may make calculations with multiple polypeptides more difficult to interpret.
  2. In the limit of no hydrogen bonds, the order parameters will always be discontinuous since the discrete assignment score has to be non-zero for the quality score to matter.
  3. Due to the potential discontinuities, dynamics calculations utilizing the DSSP biasing potential may suffer from substantial noise, in particular for stiff restraints and small systems.
  4. Again, due to the functional form, there is no direct driving force to form new hydrogen bonds of the right type. The potential relies on random encounters and the cooperativity of secondary structure elements.
  5. Lastly, in case some proper hydrogen bonds are formed, the resultant energy landscape is often very rugged and sampling may be severely hampered by the presence of the restraints. It is therefore advisable - at the very least - to perform multiple independent simulations when using DSSP restraints.
As mentioned above, the parameters determining criteria for hydrogen bonds are explained further below (see DSSP_MODE). It is easy to see that these types of restraints complement the ones introduced above (→ SC_ZSEC).
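In analogy to the example given for SC_ZSEC, an illustrative key-file fragment (values are arbitrary) biasing a system toward β-content would be:

FMCSC_SC_DSSP 1.0 # cDSSP
FMCSC_DSSP_ESC 0.75 # target fE0
FMCSC_DSSP_ESC_K 5.0 # kE in kcal/mol
FMCSC_DSSP_HSC 0.0 # no H-target
FMCSC_DSSP_HSC_K 0.0 # no restraint on fH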

DSSP_HSC

In case DSSP restraints are used (→ SC_DSSP), this keyword allows the user to set the target H-score (α-content, f_H^0 above). Its value is limited to the interval from zero to unity. A large value will steer the system toward forming many i→i+4 hydrogen bonds.

DSSP_ESC

In case DSSP restraints are used (→ SC_DSSP), this keyword lets the user set the target E-score (β-content, f_E^0 above). Just like for DSSP_HSC, values are restricted to the interval [0.0:1.0]. A large value will bias the system toward forming characteristic β-hydrogen bonds but does not distinguish between parallel or anti-parallel arrangements. Note that the sum of DSSP_HSC and DSSP_ESC should probably never approach unity. Also note that the E-score can never be exactly unity for a monomeric, finite-length polypeptide even when discarding termini (turn requirement).

DSSP_HSC_K

If DSSP restraints are in use (→ SC_DSSP), this keyword sets (twice) the spring constant (in kcal/mol) operating on the DSSP H-score, i.e., it sets the value of k_H above.

DSSP_ESC_K

If DSSP restraints are in use (→ SC_DSSP), this keyword sets (twice) the spring constant (in kcal/mol) operating on the DSSP E-score, i.e., it sets the value of k_E above.

SC_POLY

In studies of generic polymers, coarse descriptors like the size and shape of the macromolecule may be more relevant than structural characteristics tailored specifically to polypeptides. CAMPARI supports restraint potentials on such coarse descriptors, specifically the parameters t and δ (see description of output file POLYAVG.dat), which measure size and shape asymmetry, respectively. Two-dimensional histograms of these quantities can be computed and written by CAMPARI (see output file RDHIST.dat). These molecule-based restraint potentials yield a bias term to the total potential energy, V_POLY, and this keyword provides its "outside" scaling factor c_POLY. Note that with the exception of the scaling factor, requests are generally handled through a dedicated input file (see elsewhere for details).

POLYFILE

This keyword should point to the location of the input file for individual molecular polymeric biasing potentials (→ elsewhere for description).

POLYREPORT

Like other report flags, this keyword is a simple logical which allows the user to obtain a complete summary of the polymeric bias terms (by molecule) in the system. It is only meaningful if polymeric biasing terms are in use (→ SC_POLY).

SC_TABUL

CAMPARI has an extensive facility to supply tabulated non-bonded potentials which are then applied to the system. This keyword specifies the "outside" linear scaling factor c_TABUL according to:

E_TABUL = c_TABUL·ΣΣ_{i,j} I(V_ij^k, V_ij^{k+1}, m_ij^k, m_ij^{k+1}, d_ij)

Here, the sum runs over all atom pairs i,j which have a tabulated potential specified for them, V_ij^k is the k-th tabulated value of the acting potential, and d_ij is the interatomic distance. d_ij is located uniquely within the interval given by the k-th and (k+1)-th tabulated values. I(...) is the interpolation function, and CAMPARI currently performs only cubic interpolation with cubic Hermite splines:

I(V_ij^k, V_ij^{k+1}, m_ij^k, m_ij^{k+1}, d_ij) = (2t^3 - 3t^2 + 1)·V_ij^k + (3t^2 - 2t^3)·V_ij^{k+1} + (d_{k+1} - d_k)·[(t^3 - 2t^2 + t)·m_ij^k + (t^3 - t^2)·m_ij^{k+1}]
t = (d_ij - d_k)/(d_{k+1} - d_k)

Here, t is the relative position in the interval from k to k+1 normalized to unit length. The m_ij^k are the tangents to (slopes at) the control points (tabulated values) of the potentials. The spline is set up to recover both values and tangents at the control points. This means that the resultant function is continuously differentiable regardless of the values used for the tangents. Tangents are either read from file (without error checks → description of dedicated input file) or estimated numerically via finite differences from the potential input (see description of dedicated input file). In the latter case, some options are available to tune the spline (see TABIBIAS and TABITIGHT).
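
As an illustration, here is a minimal Python sketch (hypothetical helper code, not CAMPARI source) that evaluates the interpolant above from ascending distance nodes d, tabulated values V, and tangents m:

import bisect

def hermite_eval(d, V, m, dij):
    # locate k such that d[k] <= dij < d[k+1]
    k = bisect.bisect_right(d, dij) - 1
    h = d[k+1] - d[k]
    t = (dij - d[k]) / h  # relative position, normalized to unit length
    return ((2*t**3 - 3*t**2 + 1)*V[k] + (3*t**2 - 2*t**3)*V[k+1]
            + h*((t**3 - 2*t**2 + t)*m[k] + (t**3 - t**2)*m[k+1]))

# example: the control value is recovered exactly at a node
print(hermite_eval([1.0, 2.0, 3.0], [0.5, -0.1, 0.0], [0.0, 0.1, 0.0], 2.0))  # -0.1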
There are a few additional characteristics of the implementation of tabulated potentials in CAMPARI:
  1. Aside from Coulombic terms, these potentials are the only ones captured by the longer of the non-bonded cutoffs in MC runs (→ ELCUTOFF).
  2. When used concurrently with other non-bonded potentials, a lot of wasteful distance calculations may be performed. This is because tabulated potentials have to use their own data structures to be able to function efficiently both in cases of universal use and of very sparse use.
  3. Atom pairs that are in close proximity and are excluded from all other non-bonded potentials are not excluded from tabulated potentials.
All further implementation details are provided in the description of the corresponding input files (index input, potential input, and optional tangent input). As an example for the utility of this facility, consider knowledge-based (effective) potentials acting on solutes in an implicit solvent. The potentials themselves may have been derived from binned distance distributions and are practically defined in the form of tabulated potentials. The possibilities are such that - in theory - each individual atom pair could interact via its own unique potential.

TABCODEFILE

This keyword provides the index input file which determines which tabulated potential to use for which atom pair (see elsewhere for format description). Naturally, this is only relevant if the tabulated potential is in use.

TABPOTFILE

This keyword should give the name and location of the actual input file for the tabulated potentials (see elsewhere for format description). Naturally, this is only relevant if the tabulated potential is in use.

TABTANGFILE

This keyword should give the name and location of the optional input file for providing derivatives of the tabulated potentials specified via another keyword. If this file is not provided, the derivatives are estimated numerically to generate the necessary tangents for the cubic interpolation scheme. If the file is provided, however, no checks are performed on the supplied values (see elsewhere for format description). Naturally, this is only relevant if the tabulated potential is in use.

TABITIGHT

If tabulated potentials are in use, and if the input file providing derivatives of the potentials is either missing or incomplete, the cubic interpolation scheme applied to the discrete input data (using cubic Hermite splines) utilizes numerical estimates of the tangents (slopes) at the nodes (control points). The shape and nature of the resulting spline can be varied somewhat with two control parameters, the first controlling the "tightness", and the second (see below) controlling a left/right-sided bias with respect to the control points. The control parameters are used in the construction of the tangents as follows:

m_ij^k = [ (1 - t_t)·(1 + t_b)·(V_ij^k - V_ij^{k-1}) + (1 - t_t)·(1 - t_b)·(V_ij^{k+1} - V_ij^k) ] / (d_{k+1} - d_{k-1})

This is essentially a simplified Kochanek-Bartels spline scheme skipping the continuity parameter and assuming identical distance spacings. The V_ij^k are the potential values at the specified distances, d_k, supplied via the required input file. t_t is the tightness parameter controlled by this keyword, and t_b is the bias parameter controlled by TABIBIAS. If both parameters are zero, the well-known Catmull-Rom spline is obtained. Regardless of the choices for t_t and t_b (allowed values span the interval from -1 to 1), the resultant interpolation scheme will yield a function that is continuous and smooth (i.e., continuously differentiable). However, unless the control points are very sparse with respect to the features of the potentials, any non-zero settings for t_t and/or t_b will most likely lead to undesirable effects, in particular at the level of derivatives.
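
In the same hypothetical Python notation as the sketch above, the tangent estimate at an interior node k reads:

def tangent(V, d, k, tt=0.0, tb=0.0):
    # tt: tightness (this keyword); tb: bias (TABIBIAS); tt = tb = 0 gives Catmull-Rom
    return ((1.0-tt)*(1.0+tb)*(V[k] - V[k-1]) +
            (1.0-tt)*(1.0-tb)*(V[k+1] - V[k])) / (d[k+1] - d[k-1])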

TABIBIAS

If tabulated potentials are in use, and if the input file providing derivatives of the potentials is either missing or incomplete, the cubic interpolation scheme applied to the discrete input data (using cubic Hermite splines) utilizes numerical estimates of the tangents (slopes) at the nodes (control points). The shape of the resulting spline utilizes a bias parameter, t_b, that is specified by this keyword. Its exact interpretation is explained above. Simply speaking, positive values lead to a lag (along the distance axis) in the interpolated, piecewise polynomial compared to the control points, whereas negative values do the opposite.

TABREPORT

If tabulated potentials are in use (see SC_TABUL), this keyword lets the user instruct CAMPARI to print out a report of all the tabulated interactions in the system. This output can be quite large and is written to a separate output file (see TABULATED_POT.idx).

SC_DREST

Many experimental techniques (in particular NMR or FRET) can yield distance restraints on the relative positions of two sites in a biomolecule. Hence, several computational techniques are able to utilize such restraints (prominent, for example, in the computational determination of protein structures via NMR). CAMPARI offers a simple facility to harmonically restrain the distance between pairs of atoms which otherwise need not have any particular relationship. These restraints can be made one-sided, i.e., they can also restrain a distance to simply be within or beyond a certain threshold, which is usually a more appropriate treatment for incorporating experimental results. Such requests are handled and processed through a dedicated input file (see FMCSC_DRESTFILE), and details are provided there. The keyword discussed here simply provides the "outside" scaling factor c_DREST for the V_DREST term.

DRESTFILE

This keyword should give the location and name of the input file containing specific atom-atom distance restraint requests (see elsewhere for format description). Naturally, this is only relevant if custom distance restraints are in use.

DRESTREPORT

If distance restraint potentials are in use (see SC_DREST), this keyword allows the user to request a summary of the active distance restraint terms in the system.

SC_EMICRO

This keyword sets the global scaling factor for a spatial density restraint potential. The method was introduced recently (Vitalis and Caflisch), and the user is referred there for additional details. The potential relies on reading and quantitatively interpreting an input density map. The interpreted density for a given lattice cell with indices i, j, and k is denoted Ξ_ijk and is meant to correspond to some atomic property such as mass (→ EMPROPERTY). The potential itself is as follows:

E_EMICRO = f_EMICRO·Σ_ijk (ρ_ijk - Ξ_ijk)^2

The value of f_EMICRO is set by this keyword. The potential is extensive with the number of grid cells. If it is the dominant contribution to the CPU time of energy evaluations, the use of Monte Carlo sampling is currently quite wasteful since the values for ΔE_EMICRO are not actually computed incrementally. The sum implied in the above equation is over all lattice cells of an evaluation grid reduced in resolution to exactly that of the input density map. Note that the dimensions of the evaluation grid are controlled by system size and shape, and that its formal resolution is either assumed to be that of the input map or set explicitly by keyword EMDELTAS (although the resultant lattice is required to have cell boundaries that align exactly with those of the input map). If the resolution of the evaluation grid is finer, the values for its cells are summed up to give the coarser resolution. Furthermore, the evaluation grid may extend beyond the input map, and in such a case the summation also includes (coarse) cells where the input is assumed to be exactly the background density. Taken together, these caveats mean that it is rarely useful not to match the input lattice exactly. Importantly, the spatial density restraint provides an absolute reference in space, which means that it is most likely incorrect to use drift removal techniques. Another unusual aspect of this potential is that it only applies to physically present molecules in simulations in ensembles with fluctuating particle numbers. This is despite it not being a pairwise interaction term, and it distinguishes the potential from ones affecting the bath particles as well (such as bonded potentials). Because the potential is strictly a penalty term, this creates an effective mismatch that must be lumped manually into the excess chemical potential. This is neither pretty nor clean, meaning that concurrent use of this technique should be accompanied by the appropriate skepticism.
Depending on the choice for EMMODE, E_EMICRO can also be written using an average of the simulation density that is typically not equivalent to the canonical ensemble average:

E_EMICRO = f_EMICRO·Σ_ijk (⟨ρ_ijk⟩ - Ξ_ijk)^2

Here, the angular brackets indicate an average that depends on keyword EMIWEIGHT and is explained there. Further details as to why the canonical average is not used are given below. Note that the potential utilizing this average no longer corresponds to a unique Hamiltonian, i.e., every time the average is updated, the energy landscape changes. This means that the ensembles generated are no longer straightforward to interpret. The obvious benefits of using an ensemble-averaged restraint are twofold. First, explicit heterogeneity can explain data that would be inconsistent with a unique structure. Second, sampling is aided by the fact that "stuck" conformations will tend to become unstable in terms of E_EMICRO over time. As a final remark, users should keep in mind that the actual ensemble average generated may not agree with the input given that this quantity was never actually restrained during the simulation.

EMMODE

If the density restraint potential is in use, this keyword allows the user to choose between two options. Setting this keyword to 1 computes the restraint term by comparing the instantaneous simulation density to the input density map, whereas a choice of 2 computes the restraint term by comparing an ensemble-averaged simulation density to the input density map.
While the first option is straightforward, the second one requires some additional considerations as follows. Irrespective of whether a run is in parallel or not, the ensemble average is currently obtained over the previous sampling history (beyond equilibration) of the exact trajectory in question. Note that any average is created in terms of numbers of steps, which may cause inconsistencies in hybrid sampling runs due to the different average phase space increments. Choosing an appropriate type of average is not trivial (see, e.g., this reference), because the naive approach of including the entire sampling history leads to a continuously decreasing impact of the restraint term. There are currently two ways to address this. First, the accumulation frequency for the ensemble average can be reduced by keyword EMCALC. This slows down the reduction in impact and effectively gives the system more time to explore, because it results in concatenated runs of length EMCALC, during which the potential is in fact constant. Second, CAMPARI uses a fixed weight for the instantaneous component of the average while evaluating the potential. This fixed weight is set by keyword EMIWEIGHT and provides a way to utilize the entire history without degrading the impact of the restraint potential. A third route would be to use an appropriate kernel function in the time averaging, but this is inconvenient and potentially inefficient for spatial density analysis due to the large number of terms that would have to be stored and processed to recompute the kernel-based average.
A third option for this keyword may be added in the future that allows a lateral ensemble average to be restrained in MPI averaging calculations.

EMIWEIGHT

If the density restraint potential is in use, and if the potential acts on some ensemble-averaged simulation density, this keyword allows the user to set a fixed weight for the constructed average:

⟨ρ_ijk⟩ = (1 - w_inst)·N_steps^-1·Σ_i ρ_ijk(i) + w_inst·ρ_ijk(current)

Here, the factor w_inst is set by this keyword and bound to the interval from 0 to 1. The ρ_ijk(i) are the N_steps values contributing to the running, canonical average of the density, and ρ_ijk(current) is the density produced by the current conformation at that given lattice cell. The limiting case of w_inst being 1.0 recovers the instantaneous treatment (→ EMMODE). The limiting case of w_inst being 0.0 does not, however, produce a meaningful restraint (since it is independent of the current conformation). Both limiting cases are therefore forbidden. Note that it is currently not possible to recover the naive approach of a restraint that continuously decreases in relevance.
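
A minimal Python sketch of this update for a single lattice cell (hypothetical names, not CAMPARI source):

def biased_average(rho_hist, rho_now, winst):
    # rho_hist: past instantaneous densities (beyond equilibration) for this cell
    # rho_now: density of the current conformation; winst: set by FMCSC_EMIWEIGHT
    assert 0.0 < winst < 1.0  # both limiting cases are forbidden
    running = sum(rho_hist) / len(rho_hist)  # running canonical average
    return (1.0 - winst)*running + winst*rho_now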

EMMAPFILE

This keyword provides the location and name of the mandatory density input file when using the density restraint potential. The file format is described in detail elsewhere, and here it suffices to say that the external NetCDF library is needed, and that currently no other common density file formats (.ccp4, .mrc, ...) are read directly by CAMPARI. UCSF Chimera is able to convert between various density-based file formats, and does read and write NetCDF files.
The most common application is likely that of a simulation with 3D periodic boundary conditions and a rectangular cuboid simulation volume. Here, the cells of the input lattice should align exactly with those of the analysis and evaluation lattice CAMPARI uses, and generally it will be easiest to match both origin and dimensions exactly. By default, CAMPARI will obtain the lattice cell dimensions from the input map. For nonperiodic boundaries (including simulation systems with curved boundaries), it will be required, however, to deviate from such an exact match. Here, keyword EMBUFFER can be used to define the buffer size for the evaluation grid at any nonperiodic boundaries. Furthermore, keyword EMDELTAS can always be used to request the analysis and evaluation lattice to have cells of a smaller size, which, with the restraint potential in place, has to yield the exact input cell size by integer multiplication for all three dimensions. Lastly, keyword EMREDUCE can be used to average the input map to a lower resolution by re-binning.
Assuming no further transformations are applied (→ keywords EMREDUCE, EMTRUNCATE, EMFLATTEN), the interpreted density based on the input file is as follows:

Ξ_ijk = ρ_sol + c·(ω_ijk - ω_bg)

Here, the final density for a given lattice cell, Ξ_ijk, has units of physical density, c is a scale factor explained below, ω_ijk is the original input density for the same lattice cell, and ρ_sol and ω_bg are the assumed physical and input background signals, respectively. ρ_sol is set by keyword EMBGDENSITY, and ω_bg can be set by keyword EMBACKGROUND if the value determined automatically from the histogram of input densities is not appropriate. Factor c is given as follows:

c = [ M_M - ρ_sol·Σ_ijk V_ijk·H(ω_ijk - ω_t) ] · [ Σ_ijk (ω_ijk - ω_bg)·V_ijk·H(ω_ijk - ω_t) ]^-1

Here, the first term in square brackets is a hypothetical excess signal (mass) using the apparent macromolecular volume (the sum of the volumes of all lattice cells with signals exceeding the threshold, ω_t) and the assumed total mass, M_M. The V_ijk are the volumes of individual lattice cells and currently all have to be equal, and H(x) denotes the Heaviside step function. The second term in square brackets is the actual excess signal (mass) derived from the input map obtained by analogous summation. Factor c has units that convert optical density (input) to physical density. It is important to note the crucial impact of keywords EMTHRESHOLD and EMTOTMASS on the quantitative interpretation of the map. In particular, many combinations of values will be rejected by CAMPARI, because they cannot produce an excess signal larger than the background. The resultant interpreted map is written to a dedicated output file at the beginning of each run. Note that this includes all optional transformations controlled by keywords EMREDUCE, EMTRUNCATE, and EMFLATTEN.
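
To make the transform concrete, here is a minimal Python/NumPy sketch (hypothetical names; CAMPARI performs this internally, including additional sanity checks):

import numpy as np

def interpret_map(omega, v_cell, mm, rho_sol, omega_bg, omega_t):
    # omega: 3D array of input densities; v_cell: uniform cell volume
    # mm, rho_sol, omega_bg, omega_t correspond to EMTOTMASS, EMBGDENSITY,
    # EMBACKGROUND, and EMTHRESHOLD, respectively
    mask = omega > omega_t  # Heaviside selection of signal cells
    excess_hypo = mm - rho_sol * v_cell * mask.sum()  # hypothetical excess mass
    excess_map = ((omega - omega_bg) * v_cell)[mask].sum()  # actual excess signal
    c = excess_hypo / excess_map  # converts input (optical) to physical density
    return rho_sol + c * (omega - omega_bg)  # interpreted density, Xi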

EMREDUCE

If the density restraint potential is in use, this keyword can be used to change the formal resolution of the input density map. This is accomplished by simple re-binning, i.e., the target and original lattices are aligned at the origin, and the original signal for each cell is distributed to the target cells by simple overlap. Because the input is assumed to be a density, volume renormalization is performed. Note that it is generally meaningless to create a finer grid this way, because no new information is available, and CAMPARI distributes signal assuming a flat distribution inside each original input cell. Similar to keyword EMDELTAS, this keyword requires the specification of three floating point numbers that set the target lattice cell sizes of the re-binned input map in Å for the x, y, and z dimensions, respectively. Note that the exact values will generally be slightly different because of the requirement to have the outer dimensions of both grids align exactly. Finally, users should keep in mind that physical resolution and formal resolution of the lattice used to represent the data are two distinct quantities.

EMBACKGROUND

If the density restraint potential is in use, this optional keyword can be used to override the value determined to correspond to background in the input density map (ω_bg in the equation above). This value is commonly set by binning the densities in all cells and identifying a well-resolved peak in the histogram. If the map does not encode much background signal, the histogram-based determination may be inappropriate, and this is when this keyword is useful. Note that values refer to the original input density map.

EMTHRESHOLD

If the density restraint potential is in use, this important keyword controls the linear transform used to interpret the input density map in terms of a physical mass density. Specifically, it sets a threshold level, in the units of the (potentially re-binned) input, that distinguishes signal from background. Since measurements often have low contrast, the threshold is not an obvious property of the input map. The threshold set here corresponds to parameter ω_t in the equation above. It is primarily responsible for the overall scaling factor, i.e., larger threshold values will generally produce interpreted maps with a wider spectrum of physical density values. Using the apparent molecular volume and the total mass, the chosen threshold directly determines the apparent physical density (reported in log-output). This quantity poses constraints on the chosen value, because the integrated signal must yield a density larger than the assumed physical background density.

EMTOTMASS

If the density restraint potential is in use, this keyword sets the mass in g/mol assumed to correspond to the signal in the input density map exceeding the threshold. In general, this can be set to correspond exactly to the explicitly represented matter in the simulation (this is the default), but there are exceptions where an override may be desired, e.g., when simulating only a part of the system without wanting to distort the interpretation of the map. The parameter corresponds to M_M in the equation above.

EMTRUNCATE

If the density restraint potential is in use, this keyword enables truncation of the input map below the chosen value as long as it is higher than the minimum and lower than the assumed threshold level (ω_t in the equation above). Truncation implies that the spectrum of values for the interpreted density is completely depleted below the specified level, because all values are simply assigned the background level, ω_bg. This technique can be used to eliminate noise from the input that may hamper sampling. Note that values refer to the original input density map. This keyword is the exact complement to EMFLATTEN.

EMFLATTEN

Depending on how a density map is generated, the signal may cover a wide spectrum of values. This is particularly true if the contrast to the background is generally low, and the lack of contrast is compensated for by averaging over similar, but heterogeneous conformations. In such cases, the ratio of peak to barely detectable signals may be impossible to describe by physical densities of instantaneous conformations. If the density restraint potential is in use, this keyword therefore allows the user to flatten an input density map at a given level specified by this keyword. The requirement is that the value be larger than the assumed threshold level. This keyword is the exact complement to EMTRUNCATE, and using both concurrently can produce an interpreted map that is purely an envelope of homogeneous density.

EMHEURISTIC

The evaluation of the density restraint potential involves the summation of contributions from all the grid cells. Each cell contributes a squared difference of the input density and the actual density for the current conformation of explicit matter in the system. If the formal resolution is high, the evaluation of the potential can be costly. Occasionally, it may be possible to save some CPU time by applying dedicated heuristics, and this is what is controlled by this keyword. Choices are as follows:
  1. No heuristic is used. At each global evaluation of the density restraint potential, all grid cells are recomputed and summed up.
  2. When spreading the atomic masses in the system onto the analysis and evaluation grid, CAMPARI keeps track of whether any given xz-slice of the input map actually received a contribution from any atom. If not, the cells constituting this xz-slice are not recomputed, but instead a precomputed value for the entire slice is used. This is possible because the simulation densities in all the cells of the slice will be equivalent to the assumed background density. Efficacy of this heuristic obviously depends on the details of the system.
  3. This works identically to the previous option, except that x-lines are considered rather than xz-slices.
  4. This works identically to the previous options, except that local rectangular supercells are used rather than xz-slices or x-lines. Here, the algorithm will try to combine existing grid cells to yield approximately 1000 supercells. This option is probably the most successful in general, because it can match arbitrary arrangements of explicit matter best.
Note that heuristics of the above type offer only moderate savings if on average the explicit matter does in fact cover the evaluation and analysis grid to a large fraction.

GHOST

This keyword is a simple logical that determines whether or not to (partially) "ghost" the interactions of selected particles (see FEGFILE) with the rest of the system (and eventually amongst themselves → FEG_MODE). Such scaling of interactions creates artificial systems which can be used to interpolate between two well-defined end states. The most common need for such an application arises in cases where the two end states are significantly different and one is interested in the free energy difference. For example, to calculate the aqueous free energy of solvation of a small molecule in water, one could scale the interactions of the small molecule with water from zero to their full value. Such growth-based calculations are usually complicated to set up and perform since i) trajectories evolved at a given Hamiltonian have to be evaluated (on-the-fly usually) assuming different Hamiltonians, and ii) it is difficult to maintain an internally consistent system of interactions such that all changes induced by the ghosting can be mapped to atomic parameters of the ghosted species. In CAMPARI, FEG (free energy growth/ghosting) calculations are therefore supported in conjunction with limited Hamiltonians only: the only potentials allowed are IPP, ATTLJ, POLAR, and the bonded interactions. In other cases, it may be possible to extract the same or related quantities through other techniques realizable in CAMPARI. As an example, the free energy of solvation for a flexible (single) solute immersed in the ABSINTH continuum solvation model can be obtained by simultaneously scaling the dielectric from 1.0 to 78.0 and the DMFI from 0.0 to 1.0. The default settings for the auxiliary keywords to GHOST are such that the molecules or residues listed in FEGFILE will be completely ghosted (i.e., invisible to the system).

FEG_MODE

In FEG calculations (see GHOST), interactions are always scaled between the ghosted species and the rest of the system. A natural question arises as to what happens to interactions between or within ghosted species (if any are present). If they are not scaled but instead use the background Hamiltonian, it will be impossible to map the effect of the scaling to a change in atomic parameters, which is desirable from the viewpoint of rigor. As an example, consider polar interactions between a single ghosted butane molecule and a bath of non-ghosted water. A scaling of the atomic charges on the ghost butane by a factor f would give rise to interactions with the bath scaled by f and self-interactions scaled by f^2. This type of scaling is enforced in CAMPARI if a method requires it, such as treating electrostatics with the reaction-field method (see LREL_MD). In general, however, it is impossible to find a unique mapping while leaving the background Hamiltonian intact. It is therefore left to the user to determine which of two options to choose:
 1) Interactions between/within ghosted species use the full background Hamiltonian.
 2) Interactions between/within ghosted species use the scaled Hamiltonian.
The choice made here is important only if such interactions are present in the system. If so, however, the raw results will usually depend strongly on it and corrections may have to be applied. As an example, consider the butane-water example from above. The fact that intramolecular interactions are scaled will contribute toward the apparent free energy obtained when interpolating between the fully ghosted and the fully present states. Hence, gas phase corrections have to be applied. They are obtained by repeating the calculation in the absence of water to compute the thermodynamic cycle which then allows isolating the free energy of solvation. Additional complications may arise if molecules are constrained (see FRZFILE).

FEG_IPP

This keyword specifies the "outside" scaling factor for the ghosted inverse power potential. Note that depending on the choice for FEG_LJMODE this is not as simple as SC_IPP and that additional parameters may determine the impact this keyword has. The setting here corresponds to the parameter s_gIPP below. Note as well that the inverse power potential supported in calculations with ghosted interactions always uses an exponent of 12 (i.e., setting IPPEXP to anything but the default of 12 will cause CAMPARI to abort). This keyword is only relevant if GHOST is true.

FEG_ATTLJ

This keyword is analogous to FEG_IPP but controls the "outside" scaling of the attractive r^-6 dispersive term. The setting here corresponds to the parameter s_gattLJ below. Note that scaling this up while FEG_IPP is set to zero (or - depending on the mode - even set to something smaller) will potentially lead to numerical instabilities.

FEG_LJMODE

The exact functional form of the scaled (ghosted) Lennard-Jones potential is as follows:

E_gLJ = 4.0·ΣΣ_{i,j} ε_ij·f_{1-4,ij}·[ g(s_gIPP)·[α·h(s_gIPP) + (r_ij/σ_ij)^6]^-2 - g(s_gattLJ)·[α·h(s_gattLJ) + (r_ij/σ_ij)^6]^-1 ]

Here, the ε_ij and σ_ij are the standard pairwise Lennard-Jones parameters (see PARAMETERS), the f_{1-4,ij} are potential 1-4 fudge factors (see FUDGE_ST_14) that will generally be unity, g(s) and h(s) are auxiliary functions whose functional form depends on the choice for this keyword, and α is the so-called soft-core radius (unitless). The two scaling factors s_gIPP and s_gattLJ are provided by keywords FEG_IPP and FEG_ATTLJ. There are three possible choices determining g(s) and h(s):
  1. g(s) = s
    h(s) = 0
  2. g(s) = s^f1
    h(s) = 1.0 - s^f2
  3. g(s) = (1.0 - e^(-s·f1))/(1.0 - e^(-f1))
    h(s) = (1.0 - s)^f2
Option 1 corresponds to a simple linear scaling, which is unsuitable for most purposes since it does not remove the singularity at r_ij→0. Mode 2 is the most common workaround, which introduces polynomial scaling via additional parameters f1 (→ FEG_LJEXP) and f2 (→ FEG_LJSCEXP). The non-zero values of h(s) also make the soft-core radius α a relevant parameter. The latter is specified through keyword FEG_LJRAD. Essentially, interpolation between the singularity-containing native potential and a zero potential is obtained by successively ramping up a barrier proportional to (h(s)·α)^-1, which for tiny values of h(s) (s→1) approaches the singularity. Mode 3 is very similar but uses exponential rather than polynomial scaling for g(s) and a slightly different functional form for h(s). The parameters are reinterpreted accordingly. In all cases, setting s to zero will eliminate the potential entirely, while setting it to unity will recover the native Lennard-Jones potential. This keyword is only relevant if GHOST is true.
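
As a sketch of the three modes (hypothetical Python, not CAMPARI source; the default values for f1 and f2 shown here are arbitrary):

import math

def g_h(mode, s, f1, f2):
    # auxiliary scaling functions per FEG_LJMODE
    if mode == 1:
        return s, 0.0
    if mode == 2:
        return s**f1, 1.0 - s**f2
    return (1.0 - math.exp(-s*f1))/(1.0 - math.exp(-f1)), (1.0 - s)**f2

def ghosted_lj(rij, sigma, eps, s_ipp, s_att, alpha, mode=2, f1=2.0, f2=2.0):
    g_r, h_r = g_h(mode, s_ipp, f1, f2)
    g_a, h_a = g_h(mode, s_att, f1, f2)
    x = (rij/sigma)**6
    return 4.0*eps*(g_r/(alpha*h_r + x)**2 - g_a/(alpha*h_a + x))

Note that for s_ipp = s_att = 1.0, h(s) vanishes in modes 2 and 3 as well, and the expression reduces to the native Lennard-Jones potential.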

FEG_LJRAD

This keyword allows the user to specify the parameter α in the above equations (see FEG_LJMODE), i.e., the soft-core "radius" for the modified Lennard-Jones potential. It is generally of limited utility to set this to zero since in that case the scaled potential could just as well be created by setting FEG_LJMODE to 1, in which case this parameter becomes meaningless. Conversely, for large soft-core radii, the potential is modified at large distances, which generally represents unnecessary modification and may slow down convergence in free energy calculations relying on interpolation via ghosting. Generally speaking, values around 0.5 are recommended for either mode 2 or 3. This keyword is only relevant if GHOST is true.

FEG_LJEXP

This keyword sets the parameter f1 in the above equations (see FEG_LJMODE). It represents a simple way to alter the weight of change experienced by the system depending on the choices of FEG_IPP and FEG_ATTLJ. In that sense, it is very closely tied to the design of the interpolation schedule (i.e., both address the exact same issue). There are no gold-standard rules for picking this value, and the user is referred to the literature for further details. In the case of free energy calculations, it will be best to inspect the schedule empirically by metrics such as the statistical precision of the pairwise estimates or overlap metrics such as (theoretical) swap probabilities, and to then refine either the schedule itself or the global settings accordingly. This keyword is only relevant if GHOST is true.

FEG_LJSCEXP

This keyword sets the parameter f2 in the above equations (see FEG_LJMODE). Much of the same discussion applies here as already mentioned for keywords FEG_LJRAD and FEG_LJEXP. This keyword is only relevant if GHOST is true.

FEG_POLAR

The only other non-bonded potential besides Lennard-Jones supported in FEG calculations is the polar potential (see SC_POLAR). This keyword provides a scaling factor (s_gPOLAR) for the soft-core Coulomb potential. Much like the case of scaled LJ interactions (see above), this may involve three additional parameters (see FEG_CBMODE). Note that it would be most common to only scale this up while FEG_IPP is set to unity so as to avoid potential numerical instabilities.

FEG_CBMODE

In analogy to FEG_LJMODE, this keyword determines the exact functional form CAMPARI uses for the scaled (ghosted) Coulomb potential with the "outside" scaling factor s_gPOLAR set by FEG_POLAR:

E_gC = (4.0·π·ε_0)^-1·ΣΣ_{i,j} g(s_gPOLAR)·q_i·q_j·f_{1-4,C,ij}·[α_C·h(s_gPOLAR) + r_ij]^-1

Here, the atomic partial charges are represented as q_i and q_j, ε_0 is the vacuum permittivity, and r_ij is the interatomic distance. f_{1-4,C,ij} denotes potential fudge factors acting on 1-4-separated atom pairs (see FUDGE_EL_14) but will generally assume a value of unity. g(s) and h(s) are the same auxiliary functions defined above for the Lennard-Jones potential (→ FEG_LJMODE), and α_C is the soft-core radius (unitless) specific to the Coulomb potential (controlled by keyword FEG_CBRAD). For completeness, the options are listed again in detail:
  1. g(s) = s
    h(s) = 0
  2. g(s) = s^f_C,1
    h(s) = 1.0 - s^f_C,2
The "outside" scaling factor sPOLAR is controlled through keyword FEG_POLAR. Additional parameters in mode 2 are fC,1 (→ FEG_CBEXP) and fC,2 (→ FEG_CBSCEXP).

FEG_CBRAD

This keyword is analogous to FEG_LJRAD and allows the user to choose the value for the soft-core radius specific to the Coulomb potential (α_C in the equations under FEG_CBMODE). The specification is meaningless if FEG_CBMODE is set to 1.

FEG_CBEXP

This keyword is analogous to FEG_LJEXP and allows the user to choose the value for the polynomial scaling exponent to the Coulomb potential (f_C,1 in the equations under FEG_CBMODE). The specification is meaningless if FEG_CBMODE is set to 1.

FEG_CBSCEXP

This keyword is analogous to FEG_LJSCEXP and allows the user to choose the value for the soft-core scaling exponent to the Coulomb potential (f_C,2 in the equations under FEG_CBMODE). The specification is meaningless if FEG_CBMODE is set to 1.

FEG_BONDED_B

Non-bonded interactions provide a straightforward interpretation for parsing the energetics of the system into solute-solvent, solute-solute, and solvent-solvent contributions. This is used in a thermodynamic cycle argument when computing - for instance - the free energy of solvation of a solute in solvent via FEG methods. Sometimes (as alluded to under FEG_MODE), it may be desirable to scale intramolecular non-bonded interactions as well. But what about intramolecular bonded interactions? This keyword allows the FEG-like scaling of bonded terms associated with a ghosted species but not of those associated with non-ghosted particles. Beyond that, this keyword operates just like SC_BONDED_B. Note that this almost certainly creates a pathological situation if bond length potentials are allowed to approach zero and naturally relies on bond lengths being allowed to vary (see CARTINT) to be meaningful. Note that for all bonded parameters the assignment of terms to individual residues in a multi-residue molecule is somewhat arbitrary if atoms from two different residues participate.

FEG_BONDED_A

This is analogous to FEG_BONDED_B only for bond angle potentials. Note that this may lead to a pathological simulation if bond angle potentials are allowed to approach 0° or 180° and - again - relies on bond angles actually being varied throughout the simulation to be meaningful.

FEG_BONDED_I

This is analogous to FEG_BONDED_B only for improper dihedral angle potentials. Note that this may lead to a pathological simulation if improper dihedral angle potentials are allowed to approach zero and - again - relies on these degrees of freedom actually being varied throughout the simulation to be meaningful.

FEG_BONDED_T

This is analogous to FEG_BONDED_B only for proper dihedral angle potentials. Note that this relies on torsional angles actually being varied throughout the simulation to be meaningful (there may be subsets).

FEGREPORT

This simple logical keyword lets the user instruct CAMPARI to write out a summary of the ghosted particles (residues or molecules) in free energy growth/ghosting calculations.

SCULPT

The accelerated molecular dynamics method of Hamelberg et al. offers a general (parameter-dependent) way to modify the potential energy landscape or individual terms thereof (torsional potentials and 1-4 interactions have been used most often). The idea is that a controlled modification of the landscape that leads to reduced barrier heights is capable of massively accelerating the effective dynamics without reducing the ensemble overlap dramatically. CAMPARI offers a generalization of this approach as follows:

E_ELS = Σ_i (E_i + ΔE_i,ELS)
ΔE_i,ELS = 0 if V_i^f < E_i < V_i^s
ΔE_i,ELS = (V_i^f - E_i)^2/(V_i^f - E_i + α_i^f) if E_i < V_i^f
ΔE_i,ELS = (V_i^s - E_i)^2/(V_i^s - E_i - α_i^s) if E_i > V_i^s

Here, the sum runs over all active terms of the Hamiltonian. These are generally the terms CAMPARI offers a global scaling factor for, e.g., the total DMFI of the ABSINTH model, E_DMFI, the total sum of improper torsional potentials, E_BONDED_I, etc. Limitations are discussed below. By default, the threshold energy parameters for every energy term, V_i^f and V_i^s, are initialized such that ΔE_i,ELS is always zero, i.e., no sculpting occurs. They can be modified with the auxiliary keywords ELS_FILLS and ELS_SHAVES. Naturally, V_i^f must always be less than or equal to V_i^s. The parameters α_i^f and α_i^s must always be greater than or equal to zero. They serve as buffer parameters. The modified energy landscape for a given term has two possible modifications. First, its low energy states (local minima) can be filled up. Setting α_i^f to zero flattens all low energy states to the specified threshold, V_i^f. Larger values for α_i^f preserve the unbiased shape of the landscape more and more, and the limit of α_i^f reaching infinity recovers the unbiased potential exactly. Second, its high energy states (barrier regions) can be shaved off, and setting α_i^s to zero flattens all barrier regions to the value of V_i^s exactly. The effect of larger values is exactly analogous. Note, however, that potentials allowing for large positive energy values must be treated with caution (notably inverse power potentials). The value of ΔE_i,ELS for large negative values of (V_i^s - E_i) obviously approaches (V_i^s - E_i) itself, which means that the barriers are more or less completely eliminated. This can be dangerous in conjunction with attractive nonbonded interactions (numerically speaking) and can also lead to poor behavior during reweighting (see below).
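
A minimal Python sketch of the transformation applied to a single energy term (hypothetical names, not CAMPARI source):

def sculpted(E, v_fill, v_shave, a_fill, a_shave):
    # v_fill/v_shave: thresholds (ELS_FILLS/ELS_SHAVES); a_*: buffer parameters
    if E < v_fill:   # fill low-energy states (minima)
        return E + (v_fill - E)**2/(v_fill - E + a_fill)
    if E > v_shave:  # shave high-energy states (barriers)
        return E + (v_shave - E)**2/(v_shave - E - a_shave)
    return E         # landscape unmodified in between

For a_fill = a_shave = 0.0, this flattens everything below v_fill to exactly v_fill and everything above v_shave to exactly v_shave.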
This keyword (SCULPT) allows the user to specify one or more terms to be sculpted (list of integers). The choices available correspond exactly to the columns of output file ENERGY.dat (click the link for a list). It includes the total energy (choice 2), which is mutually exclusive with any other term. There are further limitations as follows:
  • In gradient-based simulations (including hybrid runs), nonbonded interactions can only be controlled as two joint terms (sums), viz., the sum of all active short-range steric interactions (see SC_IPP, SC_ATTLJ, and SC_WCA), and the sum of polar and tabulated interactions (see SC_POLAR and SC_TABUL). The correct codes to use for these joint terms are 3 and 8, respectively.
  • In gradient-based simulations, the evaluation of all forces cannot use some of the optimized gas-phase loops (these will be disabled if relevant terms are picked by the user for sculpting). This is because these loops combine the evaluation of all standard nonbonded interactions.
  • In gradient-based simulations, energy landscape sculpting is incompatible with any twin-range cutoff setup (using different values for NBCUTOFF and ELCUTOFF) if either group of nonbonded terms is picked specifically by the user for sculpting.
  • If either of the groups of nonbonded interactions is sculpted selectively, the Ewald or reaction-field methods (see LREL_MD) can no longer be used.
Most of the above caveats apply to the specific sculpting of nonbonded energy terms. Conversely, the total energy (option 2) can always be sculpted (but must be the only term, as mentioned above). Note that any energy term will be completely disabled by choosing both buffer parameters as zero (keywords ELS_ALPHA_F and ELS_ALPHA_S) and by setting the threshold energies to the same value (keywords ELS_FILLS and ELS_SHAVES). Lastly, note that the energy output in ENERGY.dat will continue to report unperturbed energies, whereas all other files with energies (e.g., ENSEMBLE.dat) will give the sculpted values.

ELS_FILLS

If the energy landscape sculpting method is in use, this keyword supplies the parameters V_i^f described above. Values are to be provided in kcal/mol. For example, if the choice for SCULPT is "20 22", then a choice for ELS_FILLS of "5.0 5.0" would provide lower threshold energies of 5.0 kcal/mol each to both proper dihedral angle potentials and to CMAP potentials. It is not possible to skip values, i.e., the length of the list supplied here should be identical to that for SCULPT. To disable the basin filling aspect of sculpting, it is generally safe to supply a very large negative energy here.
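
Put together as a hypothetical key-file excerpt (the shaving thresholds shown are arbitrary illustration values):

FMCSC_SCULPT 20 22          # sculpt proper dihedral angle and CMAP potentials
FMCSC_ELS_FILLS 5.0 5.0     # fill minima below 5.0 kcal/mol each (V_i^f)
FMCSC_ELS_SHAVES 20.0 20.0  # shave barriers above 20.0 kcal/mol each (V_i^s)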

ELS_SHAVES

If the energy landscape sculpting method is in use, this keyword supplies the parameters V_i^s described above. Values are to be provided in kcal/mol. The interpretation is identical to keyword ELS_FILLS above. To disable the barrier shaving aspect of sculpting, it is generally safe to supply a very large positive energy here.

ELS_ALPHA_F

If the energy landscape sculpting method is in use, this keyword supplies the parameters α_i^f described above. Values are to be provided in kcal/mol and must be zero or positive. Note that a choice of zero inevitably leads to force discontinuities. In addition, the absence of any force (flat surface) will lead to the natural shape of the landscape being completely forgotten, which can deteriorate the statistical significance of the reweighted results.

ELS_ALPHA_S

If the energy landscape sculpting method is in use, this keyword supplies the parameters α_i^s described above. Values are to be provided in kcal/mol and must be zero or positive. The keyword is interpreted identically to ELS_ALPHA_F above and applies to the barrier shaving aspect.

ELS_PRINT_WEIGHTS

If the energy landscape sculpting method is in use, this keyword controls the output frequency for output file ELS_WFRAMES.dat, which contains the corresponding simulation step numbers (that will of course increase in steps of ELS_PRINT_WEIGHTS) and the associated weights. These weights are derived from knowledge of the applied net sculpting potential for each snapshot as w_i = exp(β·Σ_i ΔE_i,ELS). They can be used in a trajectory analysis run with user-supplied frame weights. Note that large positive values of the sculpting potential will make the reweighting susceptible to shot-like noise (due to few conformations receiving very large weights).
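
As a usage sketch (hypothetical Python; x holds some observable sampled at the printed frames and w the weights taken from ELS_WFRAMES.dat), a reweighted average would be computed as:

import numpy as np

def reweighted_mean(x, w):
    w = np.asarray(w, dtype=float)
    w = w / w.sum()  # normalize the frame weights
    return float(np.sum(w * np.asarray(x, dtype=float)))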

EWALD

CAMPARI supports using the Ewald decomposition technique to compute long-range electrostatic interactions in periodic systems (see LREL_MD). There are two supported approaches to computing the reciprocal space sums in the Ewald formalism:
  1. Particle-Mesh Ewald (PME): This elegant and vastly popular method introduced by Darden et al. uses discrete Fourier transforms (DFFTs) and cardinal B-splines to simplify the computation of the reciprocal space sum. Due to the DFFTs, CAMPARI needs to be linked against the free, open-source library FFTW for this option to be available. Briefly, PME reciprocal space sums have different scaling components: i) the number of charges; ii) the number of grid points; iii) the interpolation order for the cardinal B-splines. It depends strongly on the system which of these components is the speed-limiting factor, in particular since the accuracy of the reciprocal sum depends on the simultaneous optimization of the spline order (see BSPLINE) and the grid size (EWFSPAC), given that the real-space part co-determines the Ewald parameter (EWPRM). Note, however, that the fundamental scaling with the number of charges is O(N). PME is almost always the recommended (since fastest) implementation of Ewald sums.
  2. Standard Ewald: A straightforward computation of the reciprocal part of the original decomposition introduced by Ewald is supported by CAMPARI as well. This method is slow and scales poorly (K^3) with the (linear) cutoff size in the reciprocal dimension. Much like PME, however, the reciprocal sum fundamentally scales as O(N) with the number of charges, such that it might be a reasonably efficient alternative should tight cutoffs in reciprocal space be permissible (or should PME be slowed down due to a dominant cost imposed by DFFTs, such as in dilute systems).
Note that the Ewald method is only accessible in specialized dynamics calculations at the moment. This is due to the underlying assumptions as well as due to the lack of decomposability of the algorithm for the purpose of MC calculations ( → LREL_MD for details).

BSPLINE

When using the PME method (see LREL_MD and EWALD), this keyword determines the order of the cardinal B-splines to be used. The order can be increased at a moderate cost, such that it is sometimes advantageous to choose a higher interpolation order coupled to a relatively coarse mesh (see EWFSPAC) instead of a lower interpolation order coupled to a finer mesh. The default order is 8, and currently only even numbers are permitted.

EWFSPAC

When using the PME method (see LREL_MD and EWALD), this keyword determines the grid spacing for the mesh in Å. A smaller value yields a finer mesh, which in turn yields more accuracy. The cost associated with finer grids easily becomes substantial (K^3-scaling), though, even when using the DFFTs provided by FFTW. The code will occasionally adjust too coarse a value since the interpolation order (BSPLINE) requires a certain minimum number of available mesh points in each dimension. When using the standard Ewald method, this keyword determines the reciprocal space cutoff directly as the ratio of the longest box side length and EWFSPAC.

EWPRM

When using the Ewald method (see LREL_MD and EWALD), this keyword can be used to overwrite the automatically determined value for the Ewald parameter. The Ewald parameter is given in units of Å^-1 (but can just as well be defined as a dimensionless parameter). It determines the relative weight of the real-space and the reciprocal sum in determining the total electrostatic energy of the system. The larger EWPRM is, the more weight shifts to the reciprocal sum. Note that the accuracy of the Ewald method is highly sensitive to this parameter in conjunction with the real-space and reciprocal space cutoffs, and that a catastrophic lack of accuracy can easily be realized. Therefore, the code tries to determine a reasonable value for the Ewald parameter based on the (hard) settings for the real-space cutoff (NBCUTOFF) as well as EWFSPAC and - in the case of the PME method - BSPLINE. Unfortunately, the accuracy predictor formulas in use are currently somewhat flawed (they are based on the mean force error estimates presented by Petersen). They should be more accurate for the standard Ewald method than for PME since in the latter certain error contributions from the spline-based interpolation are missing. Hence, the automatically chosen parameter should by no means be considered an optimal one, merely one which - given the cutoff settings - provides comparatively small errors in forces and energies. Should the procedure be deemed inadequate, or should there be an independent estimate of the error, this keyword comes into play.

RFMODE

When using the Reaction-Field method (see LREL_MD), this keyword determines whether the corrections include a continuum electrolyte assumption (generalized reaction field) or not:
  1. The generalized reaction-field correction is used. The code determines the concentration of net charges (including those which are part of macromolecules) and derives an effective ionic strength. This bulk electrolyte concentration is used to model the dielectric response outside of the cutoff sphere for an individual charge in a Poisson-Boltzmann sense.
  2. The standard reaction-field correction is used. Irrespective of the existence of free, net charges in the system, the dielectric response is simply an approximate solution to the Poisson equation.
Note that for a system devoid of net charges (purely polar), the two methods are identical. Also note that the bulk dielectric of the dipolar medium outside of the cutoff sphere is set by IMPDIEL. The reaction-field method produces a force discontinuity at the outer cutoff distance for all assumed dielectric constants smaller than infinity. The use of the method in conjunction with twin-range cutoffs (inner cutoff distance less than outer cutoff distance and neighbor list update frequency larger than one) is explicitly disallowed due to unpredictable and generally poor stability.



Cutoff Settings:


(back to top)

NBCUTOFF

This keyword is interpreted differently depending on the type of calculation:
  1. For MC calculations (see DYNAMICS), it simply sets the non-bonded (IPP, ATTLJ, WCA, and IMPSOLV) cutoff in Å. Neighbor lists are populated based on this value, and exact truncation is performed unless keyword MCCUTMODE is set differently from the default. All the potentials governed by NBCUTOFF should conceptually be short-range in nature.
  2. For MD/LD/BD calculations, it defines the short-range regime, within which all interactions and forces are computed at every time step. It never truncates said interactions at a distance of NBCUTOFF Å (i.e., there is no equivalent of keyword MCCUTMODE for gradient-based calculations). These types of cutoffs are residue-based and use buffered neighbor lists, which means that the true interaction range is larger than the value specified (unless the simulation involves only monoatomic molecules).
Note that because of the differences in interpretation, hybrid calculations (→ DYNAMICS) are mutually consistent only with a subset of possible choices, i.e., by setting NBCUTOFF and ELCUTOFF to the same value and by letting MCCUTMODE be 2.
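
For example, a hypothetical key-file excerpt rendering the MC and dynamics portions of a hybrid run mutually consistent (the cutoff value itself is arbitrary here) would be:

FMCSC_NBCUTOFF 10.0  # short-range cutoff in Angstrom
FMCSC_ELCUTOFF 10.0  # identical to NBCUTOFF (no twin-range regime)
FMCSC_MCCUTMODE 2    # residue-level truncation as in dynamics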

ELCUTOFF

Similarly to NBCUTOFF, this keyword is interpreted differently depending on the type of calculation:
  1. For MC calculations (see DYNAMICS), it simply sets the second non-bonded (TABUL and POLAR) cutoff in Å. All the potentials governed by ELCUTOFF are potentially long-range in nature. Note that interactions beyond this second cutoff, which are Coulomb terms involving moieties flagged as carrying a net charge, are potentially still calculated (see LREL_MC).
  2. For MD/LD/BD calculations, it defines the mid-range regime, within which all interactions and forces are computed accurately, but only every nth time step, i.e., at a lower frequency which is set by the neighbor list update frequency (see NBL_UP). It truncates said interactions at a distance of ELCUTOFF unless they involve long-range electrostatic corrections (in particular in cases involving Coulomb terms involving moieties flagged as carrying a net charge). The twin-range terms (forces and energies stemming from particle pairs with distances between NBCUTOFF and ELCUTOFF Å) are assumed to be approximately constant for the number of steps between neighbor list updates. Twin-range cutoffs are explicitly disallowed for the Ewald and reaction-field methods. If CAMPARI computes additional interactions, i.e., if LREL_MD is either 4 or 5, these interactions are subjected to the same assumption for forces and energies (particle pairs with distances beyond ELCUTOFF Å).
Note that because of the differences in interpretation, hybrid calculations (→ DYNAMICS) are mutually consistent only with a subset of possible choices, i.e., by setting ELCUTOFF to the same value as NBCUTOFF. The long-range interactions covered by this keyword are never truncated exactly (with the exception of the reaction-field method available in dynamics) but rely on a residue-level parsing (buffered neighbor lists). This ensures that cutoff artifacts due to breaking neutral groups (→ POLTOL) are avoided, but it does increase the effective interaction range (based on the buffer sizes, which are calculated from the effective residue radii).

MCCUTMODE

When using nonbonded interaction potentials in conjunction with cutoffs, Monte Carlo calculations typically truncate short-range interactions (IPP, ATTLJ, WCA, and - usually not an issue - the DMFI) exactly at the cutoff distance. Conversely, in gradient-based calculations, the cutoff on short-range terms is always used exclusively to populate the corresponding neighbor lists. These two settings are not identical except for pairs of single-atom residues. Keyword MCCUTMODE can be used to switch from the default (mode 1) to the same residue-level exclusion approach as in dynamics calculations (mode 2). This is essential for achieving exactly the same Hamiltonian in hybrid MC/MD calculations or for gradient testing.

NBL_UP

This keyword provides the update frequency for neighbor lists in MD/LD/BD calculations. Every NBL_UPth step, it is recalculated which residues are within a distance of NBCUTOFF Å (short-range) and which ones are within a distance of ELCUTOFF Å (mid-range). Interactions with the former are computed at every time step explicitly and those with the latter are computed only every NBL_UPth step explicitly. For interactions outside of either cutoff, truncation occurs unless the electrostatic model chosen provides a long-range term (see LREL_MD). These latter interactions will then be recomputed at the same frequency as the mid-range ones (with the exception of the reciprocal space sum in Ewald methods which is always computed at every step). Note that this keyword is irrelevant if CUTOFFMODE is set to 1, a setting useful only for debugging purposes.
The assumptions made by this keyword are rather aggressive, and it is therefore recommended to use it with caution. Specifically, the neighbor lists here should not be thought of as "buffered" in any way. The integrator noise accumulated by setting this keyword to a large value can be quite substantial and should probably be offset by a large choice for the outer cutoff distance (→ ELCUTOFF). Conversely, the use of residue-level neighbor lists with large effective radii tends to bloat the effective cutoff radius, which creates something akin to an effective buffer zone. This implementation may be changed in the future.
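As a hedged illustration of a twin-range setup for a pure dynamics run (all values are placeholders; recall that twin-range cutoffs are disallowed for Ewald and reaction-field electrostatics):

  # short-range regime, recomputed at every step
  FMCSC_NBCUTOFF 10.0
  # mid-range regime, recomputed only at neighbor list updates
  FMCSC_ELCUTOFF 14.0
  # update neighbor lists (and mid-range terms) every 5th step
  FMCSC_NBL_UP 5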

LREL_MC

This keyword determines CAMPARI's method of handling long-range electrostatic interactions in MC calculations. There are currently several options for this with more being added in the future. A general complication is that MC calculations have to be able to compute relative energies of drastically different configurations at every step, such that self-similarity assumptions cannot be used to speed up the calculations as is the case in MD/LD/BD.
  1. All monopole-dipole and monopole-monopole interactions are computed explicitly (at full atomic resolution). By default, the governing factor is the parser for the partial charge sets which determines the individual charge groups (see option 2 for ELECMODEL and output files DIPOLE_GROUPS.vmd and MONOPOLES.vmd). Those with a total charge exceeding a threshold (usually zero) are considered "net charges", and those without are considered "dipoles". The flagging is at the residue level, and can be overwritten by a dedicated patch facility. Interactions between dipole groups are skipped even if one or both of the participating residues are flagged.
  2. All monopole-monopole interactions are computed explicitly (at full atomic resolution). As in the option above, the flagging is at the residue level, and here both residues are required to be flagged. Dipole-dipole and dipole-monopole interactions are skipped even if both of the participating residues are flagged.
  3. This is identical to the previous option except that monopole-monopole terms are computed at a reduced resolution, viz., polyatomic monopole groups are represented by collapsing the total charge onto a single atom, which is nearest to the true monopole center. This choice is currently the default. The same caveats as for option 2 apply.
  4. No additional interactions are computed (rigorous truncation).
Note that for systems having no residues that are flagged as carrying a net charge this keyword is irrelevant. Note also that none of the approaches is rigorously feasible for a system in which the dielectric response of the system is primarily described by explicitly represented dipolar units (such as a calculation in explicit water) if charges are present. The reason is that dipole-dipole terms and in particular monopole-dipole terms are crucial in modelling the dielectric response, and options like 2 or 3 above will hence generate a dielectric mismatch outside of the second cutoff. For straight truncation (option 4), drastic artifacts are observed in which density of like-charged species accumulates just outside of the second cutoff (ELCUTOFF). Conversely, for 1), 2) or 3) above, inverted artifacts are observed in which like-charged species accumulate inside the cutoff sphere and form favorable long-range interactions with other quasi-macroions (charge separation occurs). Such problems usually disappear if there is no dipolar fluid exerting a dielectric response (such as in a plasma) or if the dielectric response is exerted through a continuum.
Note that periodic boundary conditions are mutually inconsistent with any of the above treatments with the exception of truncation. This is because in PBC the largest effective cutoff value for nonbonded interactions must not exceed half of the smallest linear dimension of the box. In case of a hybrid sampler, the values for LREL_MC and LREL_MD should be matched to achieve a consistent Hamiltonian. Compatible pairings (LREL_MC/LREL_MD) are 1/5, 3/4, and 4/1.
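To make the matching explicit, a hedged key-file fragment requesting the default pairing for a hybrid sampler might read:

  # reduced-resolution monopole-monopole terms in MC segments (the MC default)
  FMCSC_LREL_MC 3
  # the matching treatment for dynamics segments (see LREL_MD, option 4)
  FMCSC_LREL_MD 4
  # the two cutoff criteria should additionally be set equal (→ NBCUTOFF, ELCUTOFF)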

LREL_MD

Much like LREL_MC, this keyword controls how CAMPARI handles long-range electrostatic interactions in MD/LD/BD calculations. There are currently several options for this, which are generally different from those available for Monte Carlo runs since two core assumptions hold for dynamics calculations: i) only global energy/force evaluations are needed; and ii) the system remains self-similar over several integration steps. The options are as follows:
  1. No additional interactions are computed, i.e., everything beyond the mid-range cutoff is discarded. This setting can be used along with LREL_MC set to 4 and ELCUTOFF being equal to NBCUTOFF to create an exact match between dynamics and MC Hamiltonians which may be relevant for hybrid calculations (→ DYNAMICS).
  2. Ewald summation is used, which relies on periodic boundary conditions and (currently) cubic boxes (→ BOUNDARY and SHAPE). This technique relies on the decomposition of an infinite sum over all periodic images into two quickly convergent contributions, a real-space and a reciprocal-space part. The real-space part involves a modified Coulomb interaction, which therefore requires separate loops. Hence, support for Ewald sums is currently limited to "gas-phase"-type calculations with Lennard-Jones and polar interactions only. The reciprocal-space part can be solved in a number of different ways (see EWALD and associated keywords). Note that the two cutoffs are collapsed into the shorter one (there is no mid-range regime) when using Ewald techniques. Both the real-space and the reciprocal sums are recomputed at every step. Ewald summation replaces the standard Coulomb term and is relevant for all polar interactions even in the absence of full charges.
  3. The (generalized) reaction-field correction is used. The mode is picked with keyword RFMODE. This involves a modified Coulomb sum and relies on the assumption that truncation can be dealt with by assuming that a low-dielectric cutoff sphere is embedded in a high-dielectric medium, which gives rise to a reaction-field correction that lets the force on a charge vanish at the cutoff distance if the difference in dielectric constants is large. The high dielectric is set with keyword IMPDIEL, and the size of the cutoff sphere is given by ELCUTOFF. This method requires modified Coulomb interactions, and support is limited similarly to Ewald sums. Note that reaction-field corrections assume dielectric homogeneity, i.e., the underlying theory breaks down if the effective dielectric inside or outside the cutoff sphere becomes inhomogeneous. The latter is always the case if, for example, a large enough macromolecule is present or if the system is non-periodic. Note that algorithmically this is not a long-range correction and that (G)RF-corrected terms are computed with the same frequency as short- and mid-range terms (see NBCUTOFF and ELCUTOFF). Due to stability issues, twin-range cutoffs are not allowed for reaction-field methods. Even then, the force discontinuity at the cutoff distance (it vanishes only if the outer dielectric is assumed to be infinite) may cause more noise than a simple truncation scheme (option 1). The reaction-field solution replaces the standard Coulomb term, i.e., it is relevant for all polar interactions even in the absence of full charges.
  4. The same option as 3) in LREL_MC. The same rules and caveats apply. By matching the methods this way and setting the two cutoff criteria equal to one another, this allows a consistent choice of Hamiltonian in hybrid runs (→ DYNAMICS). This option is currently the default choice.
  5. The same option as 1) in LREL_MC. The same rules and caveats apply. By matching the methods this way and setting the two cutoff criteria equal to one another, this allows a consistent choice of Hamiltonian in hybrid runs (→ DYNAMICS).
Note that periodic boundary conditions are mutually inconsistent with the above options 4 and 5 because in PBC the largest effective cutoff value for nonbonded interactions must not exceed half of the smallest linear dimension of the box.
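As a hedged sketch of a reaction-field setup (the dielectric constant and cutoff values are illustrative placeholders; the choice for RFMODE is left to the user):

  # (generalized) reaction-field correction (mode selected separately via RFMODE)
  FMCSC_LREL_MD 3
  # dielectric constant of the surrounding high-dielectric medium
  FMCSC_IMPDIEL 78.2
  # twin-range cutoffs are disallowed here, so the two cutoffs are set equal
  FMCSC_NBCUTOFF 12.0
  FMCSC_ELCUTOFF 12.0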

CUTOFFMODE

For determining spatial neighbors, different modes are available:
  1. If - for whatever reason - cutoffs are undesirable, the code will assume that all residues are spatial neighbors and compute all interactions at every step. Note that not all MD/LD/BD calculations might support this option since optimized loops relying on neighbor lists are often employed (and/or the method may rely in its formulation on a cutoff).
  2. This option is obsolete.
  3. This option instructs CAMPARI to employ grid-based cutoffs. The grid association is governed at the residue level by the position of the residues' reference atoms. All grid-based methods (with a uniform mesh) are difficult/inefficient for systems with very asymmetric density (such as a single very long extended chain in a large periodic box) since those systems would either require grids that are too large (inefficient and memory-consuming) or grids so coarse that no efficient pre-screening can occur. Grid-based cutoffs are a good choice for systems with homogeneous density and many small (few atoms) residues. They are absolutely indispensable for simulations of large explicit water systems, as any other cutoff mode supported by CAMPARI will critically slow down simulations in such scenarios.
  4. The last available option instructs CAMPARI to employ topology-assisted cutoffs. Here, interatomic distances are simply pre-screened by a master value for the two reference atoms of residue pairs. This takes advantage of molecular topology to simplify the generation of spatial neighbor lists since only residues which pass the pre-screen are assumed to be spatial neighbors. Note that the program will compare the distance between the two reference atoms to the sum of the cutoff and the effective radii of the two residues in question. These radii are currently hard-coded. This mode is the method of choice for systems with heterogeneous density and/or large (many atoms) but relatively few (<1000) residues. Note that in the presence of non-bonded interactions, methods 3 and 4 reduce the scaling of CPU time with system size from N² to something considerably faster.

GRIDDIM

If grid-based cutoffs are in use (→ CUTOFFMODE), this keyword allows the user to specify the three integers determining the x,y,z dimensions for the rectangular cutoff grid. The origin and the size of the grid are determined by the box parameters (see BOUNDARY and SHAPE). In a droplet boundary condition, the grid cannot be aligned with the simulation container exactly, and parts of it are wasteful. The extra buffer space is computed automatically, and this may lead to crashes with CAMPARI complaining that a part of the system is "off the grid". This most often occurs with an unstable (exploding) simulation but can also happen if a residue-based boundary condition is used in conjunction with bulky residues or if the restraining force is very small.
The total number of grid points should not be so large that operations scaling linearly with this number become a contribution of significant computational cost. Setting the size of the grid cells equal to the cutoff is typically not an effective strategy due to the requirement of having large margins. The latter are a result of the residue-based grid association CAMPARI uses which requires accounting for the effective residue radii in determining spatial neighbor relationships via the grid.
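A hedged fragment for enabling grid-based cutoffs (the grid dimensions are placeholders, and it is assumed here that FMCSC_GRIDDIM takes its three integers on a single line):

  # grid-based cutoffs
  FMCSC_CUTOFFMODE 3
  # x,y,z dimensions of the rectangular cutoff grid
  FMCSC_GRIDDIM 12 12 12
  # print initial grid occupation statistics (see GRIDREPORT)
  FMCSC_GRIDREPORT 1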

GRIDMAXRSNB

If grid-based cutoffs are in use (→ CUTOFFMODE), this keyword allows the user to specify an initial limit for the maximum number of residues associated with a single grid point. Arrays are dynamically re-sized during the simulation, but if the initial setup already fails, an error is returned (see also GRIDMAXGPNB). This keyword exists mostly so that CAMPARI has a realistic estimate of the required memory at the beginning.

GRIDMAXGPNB

If grid-based cutoffs are in use (→ CUTOFFMODE), static grid-point neighbor lists are set up initially and used to simplify the generation of neighbor-lists using the grid. This keyword specifies the maximum number of grid-point neighbors each grid-point may possess. If the number is too small, the program will fail during the initial setup. This is again to avoid inadvertent memory emergencies (as for GRIDMAXRSNB).

GRIDREPORT

If grid-based cutoffs are in use (→ CUTOFFMODE), this simple logical instructs CAMPARI to write out a summary of the initial grid occupation statistics.

CHECKFREQ

This keyword is interpreted differently depending on the type of calculation. Firstly, for an MC calculation, it specifies the interval for how often to recompute the total energy assuming a lack of cutoffs (N² sum). A simple reason for this is that incremental energy updates may accumulate drift errors even in the absence of any algorithmic simplifications. Depending on boundary conditions, the N² sum may not be a particularly meaningful reference state. Note that the reset to the N² energy has no implications for the Markov chain, but that it can affect absolute energy values, which may be relevant for certain free energy calculations, for comparisons of simulation results obtained with different cutoff lengths, etc. In addition, if cutoffs are turned on, a sanity check is performed as well, i.e., given the current structure, are the derived interactions in fact complete given the chosen maximum cutoff distance set by ELCUTOFF? If not, this would most likely mean that the parameters used for deriving the list of relevant interactions (specifically, the maximum residue radii) are inappropriate (this can happen for simulations of unsupported residues). Other than that, the usefulness of this check lies mostly in debugging the code itself. Because both the N² energy evaluation and the cutoff check can be extremely slow for large systems, low frequencies are highly recommended for these cases. In order to track the progress of a longer simulation, it is recommended to rely on the instantaneous output files (e.g., ENERGY.dat) rather than on what is printed to log output. Secondly, for MD/LD/BD runs, CHECKFREQ simply sets the interval for how often to report global ensemble variables to log output. In hybrid runs, the functionality varies depending on what type of segment the simulation is currently in.

N2LOOP

This keyword is a simple logical which gives the user control over whether or not to initially compute the full N²-loop of non-bonded interactions (on by default). Disabling it is extremely useful for simulations of very large systems, for which this initial energy may take a considerable amount of time to compute and be largely uninformative (in particular in periodic systems). As an auxiliary function in Monte Carlo calculations, it determines whether the sanity check procedure for cutoffs is performed every CHECKFREQ steps. Since N2LOOP is turned on by default, it needs to be explicitly disabled for the cutoff checks to be skipped.
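For a very large (e.g., periodic, explicit-solvent) system, a hedged fragment that skips the initial full sum and performs the expensive checks only rarely might look like:

  # skip the initial N² energy evaluation (also disables the MC cutoff sanity check)
  FMCSC_N2LOOP 0
  # recompute/report reference quantities only rarely (placeholder value)
  FMCSC_CHECKFREQ 100000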

USESCREEN

This logical keyword applies to all Monte Carlo elementary moves (except particle deletion moves). The normal sequence of events in CAMPARI is:
  1. Perturb configuration.
  2. Compute short-range terms for moving parts for new conformation.
  3. Compute corresponding long-range terms.
  4. Restore original conformation.
  5. Compute short-range terms for moving parts for original conformation.
  6. Compute corresponding long-range terms.
  7. Evaluate Metropolis criterion.
  8. Process acceptance or rejection.
Often it will be possible to detect already after step 2 that the new conformation produces a steric clash. Therefore, when USESCREEN is turned on, CAMPARI will build in an energy screen after step 2, which can serve to quickly eliminate those conformations (the acceptance probability will often be numerically too small to even be representable in floating-point format). This premature termination of the sequence above is meant to shorten the overall runtime. The height of the screen is set by keyword BARRIER. Note that the above sequence of events is incomplete for certain types of concerted rotation moves.
From the above, it is clear that at step 2 we do not yet have access to a difference in energies (which becomes available only after step 5). Consequently, the screen height is simply compared to the net value of the short-range energy terms (→ SC_IPP, SC_ATTLJ, SC_WCA, boundary interactions, SC_BONDED_B, SC_BONDED_A, SC_BONDED_I, SC_BONDED_T, SC_EXTRA) and certain bias terms (→ SC_ZSEC, SC_POLY, SC_DSSP, SC_EMICRO, SC_DREST). With the exception of SC_ATTLJ, SC_WCA, SC_BONDED_T, and SC_BONDED_I, these are all strictly penalty terms that can only yield positive contributions to the total energy. Because of the above, the screen is most useful if SC_IPP is used. Inverse power potentials diverge at small distances and can yield arbitrarily large values, which allow meaningful choices for the associated keyword BARRIER. If all aforementioned terms are either zero or negative, the screen will not have any effect. Harmonic potentials (as used in most of the bias terms) can also yield very large values, but the likelihood of this happening during simple MC moves is very small except for SC_DREST, SC_BONDED_B, and SC_BONDED_A (for the latter two terms, this only holds in the presence of soft crosslinks). Therefore, the difficult cases are those for which the penalty terms are generally high but do not necessarily vary quickly or strongly upon MC moves. It may then become impossible to use a simplification of this type, i.e., if the chosen screen height is too small, the Markov chain will be corrupted, and if it is made larger, the screen no longer has any effect. To buffer against incorrect use of the method, there is an additional criterion that the incremental energy must exceed twice the total system energy (for typical interaction potentials and an equilibrated system, the latter is often a negative number, and this condition becomes trivially fulfilled).
Note that this technique assumes that the Markov chain remains unperturbed even though the actual acceptance criterion is circumvented. Depending on the setting for BARRIER, this will often be rigorously true for a finite-length simulation. Because the same threshold is used for all types of moves, the efficacy of the screen is likely move type-dependent. Finally, simulations using the Wang-Landau acceptance criterion may not be able to use this technique (a warning is printed in any case).

BARRIER

This keyword is used in two different contexts. First, Monte Carlo moves can take advantage of a cutoff-like screen eliminating proposed conformations after only a partial evaluation of the relevant energy terms (this is enabled with USESCREEN). BARRIER then sets the energy threshold (screen height, cutoff value, barrier) in kcal/mol.
Second, the value of BARRIER in kcal/mol is used as the hard-sphere penetration penalty in the hard-sphere excluded-volume implementation (enabled by setting IPPEXP to a sufficiently large value).
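A hedged example of enabling the screen (the barrier height is a placeholder and must be chosen with the caveats above in mind):

  # enable the early rejection screen after step 2 of the MC sequence
  FMCSC_USESCREEN 1
  # screen height in kcal/mol
  FMCSC_BARRIER 500.0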



Parallel Settings (Replica exchange (RE) and MPI Averaging):


(back to top)

Preamble (this is not a keyword)

Most biomolecular simulation software packages allow a form of parallelization which one may refer to as domain decomposition. Here, the system is partitioned into a number of subsystems corresponding to the number of processor cores available to the parallel computation. Each core then - more or less - computes only interactions of its own subsystem. The main requirements for an efficient implementation are to keep the communication load as small as possible and the workload even (refer for example to publications on the parallel model within NAMD). This option is not yet supported within CAMPARI. The only parallel algorithms supported by CAMPARI are sparse communication algorithms such as replica exchange. Like most simulation software, CAMPARI uses the MPI standard for handling inter-processor communication. Here, each slave process only has direct access to its own memory image. With modern multi-core processors and multi-CPU machines, dedicated parallel architectures and corresponding software development (for example GPU computing) have become ubiquitous in many fields. However, scalability and efficiency of parallelization (whether shared memory, e.g., thread-based, or MPI) remain challenging to achieve without sacrificing generality, e.g., many GPU adaptations of scientific code offer restricted functionality and limited control compared to their CPU counterparts.


REMC

This logical keyword - when set to 1 - instructs CAMPARI to perform a replica exchange (RE) calculation. If and only if the code was compiled with MPI (and the right executable is used), this keyword activates the RE method (see FMCSC_REFILE to learn how to set up a RE run) employing REPLICAS separate conditions (processes). Irrespective of whether the base sampler is pure Monte Carlo (see DYNAMICS), a dynamics-based method, or any hybrid method, restrictions apply in that the sampled ensemble must be the canonical (NVT) one (see ENSEMBLE). This can either be achieved by running constant particle number MC, Newtonian dynamics with a proper thermostat (see TSTAT), or stochastic (Langevin) dynamics (which inherently tempers the ensemble). Performing RE swaps is optional (simply set REFREQ to something larger than the simulation length to disable them). While counterintuitive at first, this makes it possible to set up parallel runs in which the system is evaluated for different Hamiltonians (so-called "foreign" energies: see REOLCALC), thereby allowing simple free energy calculations which use a well-established ("safe") sampling method as their engine without worrying about the intricacies of the RE method itself (reference).
In CAMPARI, each replica and its output will correspond to instantaneous and averaged information from the associated condition, i.e., the underlying trajectory is no longer continuous. The typical assumption is that, depending on the settings for REFREQ, RESWAPS, RENBMODE, and RE_VELMODE, and given a suitable arrangement of replicas in the RE input file, the resultant ensemble averages and distributions are, for finite samples, indistinguishable within error from a correct reference simulation for the same condition that does not utilize exchange moves. This issue is not trivial, however, and the more general and precise approach to the analysis of replica exchange data is to reweight all samples to a given target condition that should either have been part of the original replica space or that can be obtained by interpolation (rather than extrapolation). This reweighting is technically possible in CAMPARI (→ FRAMESFILE) for almost all analysis features in trajectory analysis mode, but the weights have to be determined externally (e.g., by the weighted histogram analysis method, WHAM).
As an entirely separate issue, it may sometimes be desirable to perform trajectory analysis in parallel. One motivation can be to simply speed up analysis of large data sets and/or obtain data suitable for error estimates via block averaging. Parallel trajectory analyses are possible and require the RE setup, specifically keywords REFILE, REPLICAS, and REDIM. All other simulation-related keywords are ignored. Conversely, analysis keywords REOLCALC, REOLINST, and REOLALL are respected. This can be useful in post-processing simulation data for free energy growth or related calculations requiring "foreign" energies. There is another complication with RE data, and that is the question of how to evaluate a possible sampling benefit. Users should always keep in mind that a RE trajectory with swaps inherently averages over data from several coupled trajectories. A simple consequence of this is that data tend to look smoother and better converged if the number of replicas is increased. An assessment of the actual purpose of the method, i.e., increased barrier crossing rates by excursions into conditions amenable to barrier crossing, is more feasibly obtained by unscrambling trajectories, i.e., by looking at trajectories continuous in conformation (and not in condition). This is why CAMPARI allows the user to supply an input file with the swap history of a set of trajectories with the goal of transcribing the set of trajectories to a new set that are all continuous in conformation. The input file needs to be similar in format to the analogous output file created by CAMPARI during RE simulations. If this option is enabled, auxiliary keywords RE_TRAJSKIP and RE_TRAJOUT may become relevant.
Technically, parallel trajectory analysis requires that the REPLICAS individual trajectories are systematically named and numbered in a fashion similar to how CAMPARI writes trajectories in RE simulations. This means that every file is prefixed with "N_XXX_", where XXX gives the replica number (starting from "000"). Since there is only a single key-file, the input trajectory name specified should not include this prefix (it will be added automatically). An example is given elsewhere. Frame-specific analyses (and thereby frame weights) are not yet supported in parallel trajectory analysis runs.
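To illustrate with a hypothetical trajectory name: if the key-file requests an input trajectory called traj.dcd, a four-replica parallel analysis run would expect the files:

  N_000_traj.dcd
  N_001_traj.dcd
  N_002_traj.dcd
  N_003_traj.dcd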
Note that a replica-exchange run that contains Monte Carlo moves, and that uses the Wang-Landau acceptance criterion with WL_MODE being set to 1 may result in identical copies of Wang-Landau runs if the exchanged parameters do not alter the Hamiltonian (since environmental conditions are irrelevant to the Wang-Landau sampler in such a case). In any case, the Wang-Landau iterations will proceed independently for each replica. This implies that it may yield results that are difficult to interpret if replica-exchange swap moves are allowed (because those - currently - always follow a Boltzmann criterion).

REFREQ

For any multi-replica simulation that supports structure transfer between replicas, this keyword sets a fixed interval for attempting these structure transfers. It is an important parameter of both the replica exchange method and the PIGS method. Unlike frequencies supplied to define Monte Carlo move sets described above, this parameter is a deterministic interval, i.e., a setting of 10⁴ will imply that possible exchanges are attempted exactly every 10⁴ elementary steps. This is because, in general, the communication requirement will mandate that all replicas remain synchronized regardless. For replica exchange, a swap cycle counts as a single (Monte Carlo) step in the trajectory. For PIGS, the reseeding does not count as a step. Instead, it is performed exactly between the steps corresponding to multiples of REFREQ and the respective next ones (see elsewhere for details).
All structure exchange is implemented in peer-to-peer mode. For generality reasons, the head node is always involved in decision making for structure exchange. This imposes an unfavorable (centralized) communication structure for some data (e.g., reassignment maps).
For replica exchange runs, structure transfer is realized as swaps between conditions. Viewed as a Monte Carlo move, such a swap attempt is defined in the context of a multicanonical ensemble. This means that any analysis should consider the entire set of simulation data and employ appropriate reweighting protocols to obtain canonical averages corresponding to the individual or even interpolated conditions. It is not immediately clear how justified it is to assume that the individual replicas in a replica exchange run can be analyzed as if they satisfied the canonical distribution for each condition individually. For a large fraction of published replica exchange simulations, swap attempts are restricted to the immediate neighbors along a one-dimensional temperature coordinate, and the data coming from replicas are treated independently. Keyword RENBMODE allows the user to choose between neighbor-only and global swap protocols. We emphasize again that CAMPARI does support the computation of reweighted averages and distributions by adding floating-point weights to a frames file.
It is difficult to provide guidelines for useful settings for this keyword. In replica exchange, very small values for this exchange attempt interval can lead to relaxation problems. With dynamics samplers, the treatment of velocities becomes an important consideration (see RE_VELMODE). There is a considerable body of literature on this subject (some of it is cited in this reference).

RESWAPS

If the replica exchange method is in use, this keyword specifies the number of swaps within a swap cycle. Each time a step is encountered that is a multiple of REFREQ, CAMPARI will collect the data from all replicas, construct the required energy matrix, and randomly pick pairs of eligible replicas (see RENBMODE) for which the swap move Boltzmann acceptance criterion is evaluated. This process is repeated RESWAPS times, and the map matrix (structure to condition) is updated after every successful swap. This means that it is possible for no pairs of replicas to effectively swap structures despite the presence of accepted moves. This stochastic implementation differs from that seen in other software and requires a careful choice for this keyword. For exchanges between all replicas (see RENBMODE), this should probably be at least Nrep·(Nrep-1)/2, where Nrep is the number of replicas in the simulation. For neighbor swaps only, it should be Nrep-1. The reason for choosing a number proportional to or larger than the number of unique possible exchanges is that the computational cost of computing the necessary cross-energies (in Hamiltonian replica exchange) and of communicating the information required for the aforementioned matrix is, in our implementation, independent of the final number of accepted swaps. This means that the cost of a swap cycle would be largely wasted by exchanging just a single pair chosen from a much larger number of replicas. For neighbor swaps, the set of possible swaps is limited because the required energy matrix is only a tridiagonal band matrix. This means that "secondary" swaps may be rejected due to lack of information rather than the Boltzmann criterion, which can introduce biases.
Note that the acceptance rates become very small once there is hardly any overlap between different replicas (in turn, the acceptance is always strictly unity if the conditions are the same - regardless of the two structures). A large number of attempted swaps in conjunction with all-against-all exchange corresponds to an equilibration of current structures across conditions. In the limit of tiny acceptance rates, the impact of the replica exchange method is no longer felt, and it reduces to a set of independent canonical simulations at different conditions (the same limit is achieved explicitly by setting REFREQ to be very large). Because of this, a reasonable swap acceptance rate is often taken as the primary diagnostic tool for the choice of conditions (see output file for swap probabilities).

REFILE

This keyword defines location and name of the file containing the specifications for the replica exchange method (see elsewhere for details).

RENBMODE

As alluded to above, the replica exchange method represents a rigorous sampling technique if one considers the multicanonical ensemble it defines. This can cause problems in the interpretation of data obtained for an individual condition. Moreover, the energetic overlap between distant conditions is often small leading to negligible swap likelihood for all but the replicas most similar in condition. This is the typical scenario for temperature replica exchange calculations in explicit solvent. Here, it is very common to restrict swap attempts to the (at most) two neighboring replicas for a series of conditions. In Hamiltonian replica exchange, the same idea might actually be more useful as it also restricts the computation of the energy matrix to neighboring conditions. Recomputing energy values for many different conditions can be costly. Therefore, the available options are:
  1. Swaps are attempted with all available replicas.
  2. Only the (at most two) neighboring replicas are eligible for swap moves, and neighbor relationships are determined by the sequence of conditions as they appear in the input file (this is the default).
The replica exchange moves can be seen as attempts to stochastically distribute structures across conditions according to their energies and a Boltzmann criterion. This attempt is conditional upon the spectrum of energies available. The spectrum of energies is changed exclusively by the base sampler and is associated with a relaxation time scale. This creates a possible race condition with the setting for REFREQ. The exchange pattern defined by RENBMODE influences this rate of structure exchange. Choosing neighbor-only exchange often gives quantifiable biases in the terminal replicas. Beyond that, it is the simpler and computationally more efficient choice that is also more compliant with standard literature practice.
Note that almost all exchange-related problems naturally disappear in the limit of few attempted swaps (→ REFREQ) or in the limit of poor overlap and consequently few accepted swaps. This limit is very easily reached for large, condensed-phase systems with typical interaction potentials (fluctuations decrease with increasing size).

REPLICAS

This keyword sets the number of subprocesses intended to be created by a multi-copy simulation. For replica exchange calculations, this has to rigorously correspond to the number of processes granted by the system, and a large enough number of different conditions has to be present in the corresponding input file (→ FMCSC_REFILE). For MPI averaging calculations, which include PIGS runs and parallel Wang-Landau runs, this number will be altered to match the actual processor number granted by the system.

REDIM

If the replica exchange method is in use (→ REMC), this keyword sets the number of dimensions specifying the conditions to be expected in the dedicated input file (→ FMCSC_REFILE). Note that replica exchange calculations may rely on neighbor relations (see RENBMODE), and that those may be difficult to define if multiple dimensions are used to specify each condition.
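Pulling these keywords together, a hedged skeleton for a one-dimensional replica exchange run with neighbor-only swaps (the file name and all values are placeholders) might be:

  # enable replica exchange
  FMCSC_REMC 1
  # number of conditions (processes)
  FMCSC_REPLICAS 8
  # input file listing the 8 conditions
  FMCSC_REFILE re.in
  # one exchange dimension (e.g., temperature)
  FMCSC_REDIM 1
  # neighbor-only swaps (the default)
  FMCSC_RENBMODE 2
  # attempt a swap cycle every 1000 steps
  FMCSC_REFREQ 1000
  # Nrep-1 swaps per cycle, as suggested for neighbor-only mode
  FMCSC_RESWAPS 7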

REMC_DOXYZ

For any multi-replica simulation that uses pure MC sampling and supports structure transfer between replicas, this simple logical keyword lets the user choose Cartesian rather than torsional/rigid-body coordinates for the transfer. The keyword is ignored if the propagator is fully or partially reliant on a dynamics method. This keyword can be useful if internal degrees of freedom not sampled by MC diverge in any node-specific input files (for example, through rare scenarios when trying to restart an MC run from (modified) restart files produced by MD).

RE_VELMODE

This keyword selects how to deal with velocities for any multi-replica calculation allowing structural transfer between replicas. As such, it is relevant for replica-exchange molecular dynamics runs and for PIGS runs using a molecular dynamics propagator (see DYNAMICS). One of the complications of these types of calculations arises in the necessity to pass on or re-assign velocities upon any successful structure change. The options for handling this difficulty are as follows:
  1. All velocities are always randomly re-assigned upon receiving a new structure. This is equivalent to an instantaneous, global action of an Andersen-type thermostat (see TSTAT). It might be the safest option to use for pure Hamiltonian replica-exchange, especially if the Andersen thermostat is used in conjunction with Newtonian dynamics. It is also the required option for PIGS runs with propagators lacking a stochastic component.
  2. Velocities are re-scaled by a factor equivalent to (Ti/Tj)^(1/2), where Ti is the temperature of the current node, and Tj the temperature of the node the received structure originated from. Note that this does not scale the instantaneous temperature to a specific value, but rather by a specific factor. Unlike the first option, it preserves directions and relative magnitudes of all velocities. This mode relaxes to the third option if temperature is not one of the replica exchange dimensions, or if the run is using the PIGS protocol instead of replica exchange.
  3. Velocities are taken directly from the node the incoming structure originated from, i.e., always remain associated with "their" structure. This will almost certainly lead to small artifacts for replica exchange calculations with temperature as one of its dimensions. It is the preferred choice for PIGS runs with stochastic propagators.

RETRACE

This keyword is only relevant in MPI replica exchange or in MPI PIGS calculations with swaps or reseedings performed. It requests that an instantaneous integer trace is written that allows the user to recapitulate the complete history of structure transfer between replicas. For replica exchange, the trace indicates which condition is (after each swap or reseeding cycle) associated with which initial starting conformation (see N_000_REXTRACE.dat). For PIGS, the trace indicates every reseeding event in an even simpler form (see N_000_PIGSTRACE.dat). For replica exchange, these data can be used to reconstruct trajectories continuous in geometric variables rather than continuous in exchange condition (the latter being the CAMPARI default). This is useful to be able to estimate the sampling enhancement provided by replica exchange in terms of conformational decorrelation or similar metrics.

MPIAVG

This logical keyword - when set to 1 - instructs CAMPARI to perform a multi-replica calculation that is not replica exchange. If and only if the code was compiled with MPI (and the right executable is used), this keyword, without any further deviations from default settings, activates the so-called MPI averaging method. This means that the chosen system is simply replicated REPLICAS times onto as many processing units (typically processor cores). Additional keywords can activate methods that utilize a similar framework, viz., parallel Wang-Landau runs (via MC_ACCEPT) and the PIGS protocol (via MPI_PIGS).
MPI averaging
The individual copies (replicas) are strictly independent (no communication requirement) until the very end when on-the-fly analysis data are automatically collected and processed by the head node (see OUTPUTFILES for details). Some analysis functions or simulation algorithms may not be supported. This is primarily a mode to save time for the user since it essentially reproduces a common mode of running molecular simulations, i.e., running multiple trajectories in parallel and analyzing the results together. Starting conditions (see RANDOMIZE and PDBFILE) and stochasticity of the propagator are important here to avoid a situation in which multiple replicas differ only on account of numerical drift.
Parallel Wang-Landau runs
If the simulation is a pure Monte Carlo simulation, and if the Wang-Landau acceptance criterion is used, the behavior changes. Wang-Landau runs are essentially iterative, and in such a case keyword MPIAVG will create a parallel version of the Wang-Landau scheme. At an interval set by WL_FLATCHECK, the histograms are recombined over the individual nodes. The combined histogram is then what determines the move acceptance and what is used to evaluate whether to update the convergence parameter or not. The value of the convergence parameter and all other relevant settings remain synchronized throughout. In between update steps, the individual replicas evolve according to the last global histogram, incremented locally since then. This means that the value chosen for WL_FLATCHECK is a delicate quantity since both too small and too large values may impede convergence. While the former may remove the bias for an individual replica to traverse phase space faster than a canonical simulation, the latter may result in several replicas exploring the same area of phase space, thereby amplifying a lack of global convergence. Note that the communication routines used in the parallel Wang-Landau implementation can be fine-tuned using keywords MPICOLLS and MPIGRANULESZ.
PIGS runs
PIGS runs are explained in detail below. Here, CAMPARI collects information from all replicas over a specified interval. Rather than biasing the potential energy surface, this information is used to make decisions on whether to truncate some of the trajectories and to restart them from more interesting points corresponding to the current states of other replicas. PIGS stands for Progress Index-Guided Sampling and utilizes information from a method described elsewhere (please refer also to the published articles: progress index; PIGS).

MPI_PIGS

If a multi-replica simulation is requested via keyword MPIAVG, this keyword allows the user to enable the PIGS enhanced sampling scheme. We refer the user to the literature for a detailed overview. Technically, PIGS utilizes parts of the infrastructure of both replica-based parallel simulation protocols (keywords MPIAVG and REMC as described above). Briefly, PIGS works as follows:
Each of the REPLICAS processes running in parallel propagates a copy of the same system. After an interval of REFREQ steps has elapsed, the algorithm evaluates a heuristic that is used to selectively terminate some of the trajectories and to reseed them from the current states of other replicas. To avoid bit-identical trajectories, the propagator must have a stochastic component to it, e.g., Langevin dynamics, Monte Carlo, or Newtonian molecular dynamics with suitable thermostats. Unlike in replica exchange, the conditions associated with each replica are identical, and swaps would be redundant. The termination and reseeding of trajectories implies a loss of information and is justified only if the reseeding point ultimately leads to better sampling. The notion of "better" is not general. For PIGS, it consists of the desire to diversify individual replicas, e.g., to prevent sampling of overlapping regions of phase space. The truncation and selective reseeding of simulations is used in many methods such as distributed computing or transition path sampling.
To evaluate the heuristic, PIGS collects data from every trajectory over every interval of size REFREQ. To remain scalable, it is a memory-free algorithm, i.e., the slice of data determining the reseeding is always of the same size. From the composite data slice, the so-called progress index is constructed (see option 4 to CMODE). The size of the data slice is therefore set by the combination of keywords REFREQ, CCOLLECT, and REPLICAS (or the actual number of replicas available). Construction of the progress index requires as essential input only the definition of a representation and distance between snapshots, which is provided by CDISTANCE and possibly CFILE. Again, for scalability reasons, the approximate progress index is constructed, and this entails additional parameters of minor importance (see keyword CPROGINDMODE for details).
With the complete progress index in hand, it is possible to locate the current snapshots for all replicas in the index. This requires that REFREQ be a multiple of CCOLLECT. The progress index is an ordered sequence of snapshots that arranges geometrically similar snapshots close to one another without using a reference state. Every snapshot is associated with a specific distance that corresponds to the length of an edge in an underlying spanning tree. From this information, a composite rank of three individual ranks is constructed. The latter are:
  1. Position in the progress index (larger is better as snapshots from low likelihood regions are more likely to appear there).
  2. Length of the associated edge (larger is better as distances tend to be larger in low likelihood regions).
  3. Distance from any other current snapshot in terms of progress index position (larger distances are better as they indicate more unique sampling domains).
From the composite rank (sum of the individual ranks), a specified number of top-ranked replicas is identified and serves as a database for reseeding the remaining replicas. This number is provided as keyword MPI_GOODPIGS. For each of the remaining, lower-ranked replicas, one of the MPI_GOODPIGS top-ranked replicas is chosen with uniform probability, and the following expression is evaluated:
p(X → Y) = [ ζ(X)-ζ(Y) ] / (Δζmax+1)
Here, ζ(X) and ζ(Y) are the composite (summed) ranks for replicas X and Y, respectively. Δζmax is the maximum realizable difference in composite rank. The result is compared to a uniformly distributed random number between 0 and 1. A reseeding is accepted putatively if this random number is smaller than the evaluated expression, which biases acceptance toward cases with large rank differences. It is only putatively accepted because every replica can be protected on account of a uniqueness criterion. This is evaluated by finding the first and third quartiles of the snapshots coming from the replica in question in terms of progress index position. If they are tightly clustered, their separation is small, and it is inferred that this replica samples a relatively unique area of phase space. Conversely, if they are spread out, this indicates sampling overlap. The difference in the positions of the first and third quartiles is compared to the number REFREQ/CCOLLECT, which is twice the minimum value. If it is less than this number, any putative reseeding is rejected for the replica in question.
A positive reseeding decision incurs the same mechanism as that of accepting a new structure in replica exchange. Principally, all required settings and variables for the propagator are transferred. This excludes the seed of the random number generator, i.e., otherwise identical trajectories can diverge quickly on account of the different sequences of random numbers. This is how the stochastic component of the propagator mentioned above becomes relevant. For molecular dynamics propagators, velocities can be kept or reassigned, i.e., both meaningful controls supplied by RE_VELMODE are supported. For pure MC propagators, keyword REMC_DOXYZ is also supported. The history of reseeding decisions can be recorded in a trace file. This file is similar to the corresponding output file for the replica exchange method and can be obtained with keyword RETRACE. It is strongly recommended to always write this file for subsequent diagnosis and analysis.
With the exception of structural clustering, on-the-fly data analysis is supported by PIGS in the same way as it is by the default multi-replica (MPI averaging) setup. Data are gathered across replicas, combined, and total averages and distributions are provided. In general, however, PIGS leads to biased distributions, and it may therefore be more useful to adopt a standard protocol that stores trajectories individually for each replica (with MPIAVG_XYZ), disables most on-the-fly analyses, and performs all further analyses strictly in post-processing (with PDBANALYZE).
Structural clustering uses the same infrastructure that PIGS requires to evaluate the heuristic, but the data are deleted after each interval of length REFREQ. Note that this implies that the memory requirement of the head node can be large if REFREQ/CCOLLECT and the number of replicas are both large. Scalability of the protocol with respect to the number of replicas requires parallelization of the progress index computation itself, and this is not implemented yet. It is therefore recommended to ensure through appropriate parameter settings that the cost added by the heuristic is kept manageable. Keep in mind that the data are communicated by the replicas to the head node instantly after each collection event. Some aspects of the structural clustering facility are not available in the context of the PIGS heuristic. Obviously, CMODE is not selectable, nor are CPROGINDSTART or CPROGINDMODE controllable (they default to 4, -2, and 2, respectively). Keyword CPROGMSTFOLD has no effect (its use would be undesirable in the context of the first ranking criterion mentioned above). All keywords related to editing or utilizing the link structure of the network are irrelevant. Data preprocessing and the utilization of weights are both supported, but the application of linear transforms is not (yet). The use of weights can entail a number of associated parameters that reflect or describe a (time) locality in the sequence of snapshots, e.g., a lag time. It is important to keep in mind that the PIGS algorithm simply concatenates the data from all replicas, which can lead to artificial periodicities or spikes in locally estimated fluctuations, which may or may not be desired. Lastly, the technical parameters controlling the tree-based clustering and the short spanning tree construction for the progress index are of course relevant in PIGS (→ CRADIUS, CMAXRAD, BIRCHHEIGHT, BIRCHMULTI, CREFINE, CLEADER, CPROGINDRMAX, CPROGRDEPTH, CPROGRDBTSZ).
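A hedged PIGS skeleton (all values are placeholders; note that REFREQ must be a multiple of CCOLLECT, and a snapshot representation must still be defined via CDISTANCE and possibly CFILE):

  # multi-replica (non-RE) framework
  FMCSC_MPIAVG 1
  # enable the PIGS heuristic
  FMCSC_MPI_PIGS 1
  # number of parallel copies
  FMCSC_REPLICAS 16
  # protect the top half (REPLICAS/2) from reseeding
  FMCSC_MPI_GOODPIGS 8
  # reseeding interval
  FMCSC_REFREQ 1000
  # snapshot collection interval (REFREQ/CCOLLECT = 100 snapshots per replica per interval)
  FMCSC_CCOLLECT 10
  # record the reseeding history (strongly recommended)
  FMCSC_RETRACE 1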

MPI_GOODPIGS

This, along with REFREQ, CCOLLECT, and CDISTANCE, is one of the main parameters of the PIGS protocol (see link for details). It determines how many of the parallel replicas are protected from being reseeded and serve as database for the remaining replicas. If MPI_GOODPIGS equals the actual number of replicas (normally set by REPLICAS), the PIGS algorithm relaxes to the propagation of independent, identical copies of the system (basic functionality of MPIAVG). There is no consensus rule for good choices for this parameter, but a reasonable starting point is usually given by setting it to REPLICAS/2.

MPIAVG_XYZ

If the MPI averaging technique is in use (→ MPIAVG), this simple logical keyword lets the user choose to obtain trajectory data for each of the independent, identical replicas separately (which is also the default). If this keyword is explicitly set to zero (logical false), only a single trajectory file will be written with entries cycling not only through the time or equivalent axis but also through replica space (see elsewhere for details). The choice here is mostly a matter of convenience for post-processing, but note that with individual trajectories, REPLICAS times as much structural data are written as with a single file. Lastly, note that very frequent write operations by different processes to a shared output file may occasionally cause race conditions and/or be inefficient due to long waiting times.

MPICOLLS

This keyword acts as a simple logical (turned off by default) that allows the user to enable the usage of collective communication routines defined by the MPI standard for selected communication operations in CAMPARI (routines such as MPI_ALLREDUCE, MPI_BCAST, etc). These routines should at all times be functionally equivalent to what CAMPARI would use otherwise, i.e., collective primitives constructed exclusively from blocking send and receive operations (MPI_SEND and MPI_RECV).
The reason for having such a keyword is twofold. First, buggy code in conjunction with these MPI-defined collective communication routines can be difficult to diagnose and debug, because the MPI standard requires an outcome, but not a specific implementation. Essentially, developers and users cannot make any assumptions about the underlying communication flow. In general, this is of course desired (especially from a performance point of view), since it leaves the optimization of said communication to the MPI library rather than forcing the calling program to address these issues. Second, there are enough reports on the web of potentially faulty implementations of these routines in common MPI libraries. In conjunction with additional concerns regarding thread safety, etc, it could prove advantageous to developers to have modifiable implementations in place.

MPIGRANULESZ

If custom CAMPARI routines for collective communications are in use (→ MPICOLLS), and if a calculation is performed that relies on such collective communication operations, this keyword lets the user alter the communication flow structure CAMPARI sets up to handle these cases. The keyword specifies a number of processes, amongst which communication is presumed fast (most often the number of CPU cores on a single board). The communication flow is then set up in a way that minimizes the required communication between such blocks of processes (they are generally assumed to be in sequence and to all be of identical size). This keyword is therefore unlikely to be useful for heterogeneous allocations (different numbers of cores granted on different machines or processes distributed non-sequentially). Between blocks, communication attempts to minimize latency (tree topology), whereas within blocks communication is (currently) strictly hierarchical and sequential with a single head process for each block. This means that (currently) setting MPIGRANULESZ to the number of processes granted by MPI will generate a global hierarchical flow with a single master, whereas setting it to 1 will generate a global tree-like flow.
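As a hedged example for a homogeneous allocation of machines with 16 cores each, one might keep the custom (default) communication routines and declare the block size:

  # keep CAMPARI's own collective communication primitives (the default)
  FMCSC_MPICOLLS 0
  # processes per fast-communication block, e.g., cores per node
  FMCSC_MPIGRANULESZ 16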

NRTHREADS

Version 3 of CAMPARI will include a full implementation of a parallel execution mode using OpenMP. Version 2 includes (at the level of source code and not at the level of supported features) a fundamentally different and stalled development toward the same goal (offering very limited coverage of algorithms). This is not documented in INSTALL. If you are interested in parallel CAMPARI, please contact the developers for the current development version or wait for the release of CAMPARI v3.



Output and Analysis:


(back to top)

Preamble (this is not a keyword)

Unlike most other simulation software, CAMPARI offers the option to analyze certain quantities while the simulation is being performed ("on-the-fly"). This has the advantage that the frequency of dumping raw trajectory data to disk does not have to control the frequency of analyses. This can save time and money by circumventing expensive write operations to disk. Of course, in a typical simulation setting, the user will still want to obtain trajectory data: for visualization, for not-yet-defined analyses, and so on. However, the built-in analyses can still prove beneficial by utilizing as much data as possible. This is generally controlled by several interval settings: analysis X should be performed or instantaneous data Y should be reported every N steps. Such keywords (see for example ANGCALC) are interpreted the same way unless otherwise noted. For example, if ANGCALC is 250 and NRSTEPS is 1000, the analysis would be performed at steps #250, 500, 750, and 1000. There is only one other keyword affecting this: the number of equilibration steps. If in the above example EQUIL is 400, the analysis would only be performed at steps #500, 750, and 1000 (i.e., the count is always relative to the 0th step).
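Expressed as a key-file fragment, the worked example above corresponds to:

  FMCSC_NRSTEPS 1000
  FMCSC_EQUIL 400
  # analysis then runs at steps 500, 750, and 1000
  FMCSC_ANGCALC 250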
Note that some analyses can be costly. Their scaling with system size will usually be stated. At the very end, the log-output will typically report the fraction of CPU time spent performing analysis routines. This may help assess whether some of the frequency settings should be reduced.
CAMPARI often groups statistics together. For example, for a melt of identical polymers, CAMPARI would by default compute only a single histogram of end-to-end distances. This grouping is at times undesired and is overcome by the concept of analysis groups. Unfortunately, the opposite task of grouping information from different species together is currently not supported for such analyses.
For long CAMPARI runs, the instantaneous analysis has the downside that (currently) no intermediate results are produced (everything remains in memory until the very end). In this case, it can be useful to utilize the restart functionality (→ RSTOUT) to produce simulation blocks of equivalent sizes, each with complete analysis output files. This strategy also serves to preserve more information in case of unexpected crashes or job terminations. The alternative is to follow the classical route of shifting the entire analysis burden to post-processing by only saving instantaneous trajectory output. As mentioned above, this has the downside of dealing with larger amounts of data and, more importantly, with a loss of coordinate precision (for example, when using the xtc compression library). Another issue that can prove problematic with long runs is that some instantaneous output files (such as the file with running energy terms) are subject to buffered I/O. This depends on the compiler, the operating system, and the file system, and it means that information can be lost in case of unclean terminations, which makes it harder to diagnose errors.


RSTOUT

This keyword sets the interval specifying how often to write out a restart file. Such a file will allow continuing both crashed and normally terminated runs without losing significant accuracy due to truncation of significant digits (such as in pdb-files). Note that restarts are not bitwise perfect, however. The concept is described elsewhere. Restart data are written to two files that replace themselves on an alternating schedule, such that even if a crash occurs during a write operation, at least one sane restart file should exist. These files are generally named {basename}_1(2).rst. Settings for EQUIL are (of course) irrelevant for this output.

ANGRPFILE

This keyword sets the path and name of the input file for determining analysis groups by custom request rather than by molecule type (→ ANGRPFILE). By default, CAMPARI will often combine collected analysis data for molecules of identical type. This is not always the desired behavior. For example, CAMPARI fails to recognize differences introduced to molecules of the same type by virtue of molecule-specific constraints or biasing potentials. Analysis groups alleviate this and similar problems by allowing the user to group molecules of identical type into arbitrary analysis groups. Note that it is never possible to combine data for molecules of chemically different type or to split a single molecule into multiple groups (although the latter may be implemented in the future). Systems employing chemical crosslinks (please refer to sequence input for details) pose a special case: here, intermolecular crosslinks do not conjoin two molecules in terms of data structures and analysis, i.e., it will for example (currently) not be possible to obtain the net radius of gyration of two crosslinked polypeptide chains. Instead, both chains will be analyzed and treated as if they were separate molecules.

ENOUT

This keyword defines the interval at which current potential energy data are written to a file called ENERGY.dat. Note that the total energy is decomposed into the individual terms controllable by keywords of the type SC_XYZ (for example SC_IPP). It is presently not possible to obtain energy decompositions based on subcomponents of the system. The data in ENERGY.dat are the only direct print-out of unperturbed energy values if energy landscape sculpting is in use. Settings for EQUIL are ignored for this output.

ENSOUT

This keyword sets the interval at which current ensemble data are written to a file called ENSEMBLE.dat. It is only relevant if DYNAMICS is not set to 1 or 6 (pure Monte Carlo sampling or minimization). The reported quantities are informative ensemble variables (limited output presently) including - most prominently - potential and kinetic energies. Settings for EQUIL are ignored for this output.

ACCOUT

If pure Monte Carlo or hybrid sampling is used (→ DYNAMICS), this keyword sets the interval at which cumulative acceptance data are reported to a file called ACCEPTANCE.dat. Note that these data are only mildly informative in that they do not directly allow computing acceptance rates. They are mostly useful for analyzing a running simulation and assessing the performance of the move set. CAMPARI will report acceptance statistics as well as residue- and molecule-resolved acceptance counts at the very end of the simulation to log-output. The data in ACCEPTANCE.dat are only resolved by move type. Settings for EQUIL are ignored for this output.

TOROUT

This keyword lets the user decide how often to write sets of internal coordinate space degrees of freedom to a file FYC.dat in a one-structure-per-line format. These files can easily become large because the number of degrees of freedom generally scales linearly with system size. There are two modes, selected by supplying a positive (mode 1) or negative (mode 2) integer for TOROUT.
  1. Native CAMPARI degrees of freedom are written with a header providing residue-level information. These generally correspond to the unconstrained degrees of freedom in Monte Carlo or torsional dynamics calculations (see sequence input for details). All but rigid-body coordinates are written to FYC.dat, and much more information is provided there. Because rigid-body coordinates are missing, the information in the file is never enough to completely reconstruct the system, even when assuming the default covalent geometries.
  2. Sampled, dihedral angle degrees of freedom are written with a header that provides atomic indices corresponding to the various Z-matrix lines describing these dihedral angles. This mode excludes degrees of freedom that are actually frozen, and can include degrees of freedom that are not native to CAMPARI. All values are again written to FYC.dat, and more details are provided there. This mode never includes bond angles and/or dihedral angles that have no explicit Z-matrix entry.
Note that regardless of mode, bond angles altered by the Ulmschneider-Jorgensen concerted rotation moves are never reported in FYC.dat.
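As a hypothetical usage example, the following key-file line would request mode 2 output (note the negative sign) every 100 steps:

   # negative value selects mode 2 (Z-matrix dihedrals, frozen ones excluded)
   FMCSC_TOROUT -100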

XYZOUT

This very important keyword sets the frequency with which snapshots containing (at least) the Cartesian coordinates of the system (or selected subsystem) are written to a new file or appended to an existing trajectory file (→ OUTPUTFILES and XYZPDB). Part of the filename(s) will be determined by keyword BASENAME. This is the fundamental saving frequency for obtaining trajectory data and should be chosen carefully whenever the proposed simulation is resource-intensive.

XYZPDB

If structural output is requested (→ XYZOUT), this keyword chooses the output file format (see OUTPUTFILES). It is an integer [1-3(4,5)] interpreted as:
  1. Tinker-style .arc-files (ASCII)
  2. ASCII .pdb-files (default option) in various output conventions (see PDB_W_CONV and PDB_OUTPUTSTRING)
  3. CHARMM-style binary .dcd-files (these include the box information for each snapshot and have a CHARMM-style header - note that the header is written only once by CAMPARI and contains the number of snapshots in the file. This may not always be correct if simulations are prematurely terminated or trajectory files appended)
  4. Compressed binary .xtc-files as used in GROMACS: note that this option is only available if the program is linked against a proper version of XDR (see INSTALL)
  5. Compressed binary .nc-files as defined by the NetCDF format in AMBER convention: note that this option is only available if the program is linked against a proper NetCDF library (see INSTALL).
When simulating large systems with significant computational expense, it is typical to preserve as much trajectory information as possible which is why file-sizes become important. The ratio of sizes for typical settings may be something similar to 15:15:3:1:3 for .arc vs. .pdb vs. .dcd vs. .xtc vs. .nc. Note that CAMPARI always prints out atoms in its intrinsic assumed order which may be different from that used by other programs. The exception to this is a specific setting changing that order across residue boundaries for polynucleotides (see PDB_NUCMODE). Unannotated binary files such as .xtc-, .dcd-, and .nc-files may therefore not be compatible with other software unless a template file is used (for the analogous option within CAMPARI see PDB_TEMPLATE).
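For instance (hypothetical settings), requesting compressed .xtc-output every 500 steps would look like this in the key-file:

   # save a snapshot every 500 steps
   FMCSC_XYZOUT 500
   # option 4: GROMACS-style .xtc (requires linking against XDR)
   FMCSC_XYZPDB 4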

XYZMODE

If structural output is requested (→ XYZOUT), this integer [1-2] keyword determines whether to write to a series of numbered files (1) or a single file (2, the default). Note, however, that this choice currently takes effect for pdb-output only (specifically, .arc-output always goes to multiple files, and the binary formats always write to (append) a single file).

XTCPREC

If structural output is requested (→ XYZOUT) and the chosen output format is the binary .xtc-format (option 4 for XYZPDB), this keyword can be used to specify the multiplicative factor determining the accuracy of compressed xtc-trajectories (the minimum is 100.0). It is also required for proper reading of xtc-trajectories in xtc-analysis mode (see PDBANALYZE and XTCFILE).

PDB_NUCMODE

CAMPARI's internal representation of polynucleotides has one peculiarity. It assigns the entire PO4- functional group to the same nucleotide residue whereas most other programs seem to assign the 3'-oxygen atom to the residue carrying the sugar. This causes a non-trivial inconsistency when trying to use CAMPARI-generated pdb-files as input for other software. Therefore, this keyword defines how to assign the O3*-atom of nucleic acids in pdb-output only. There are two options:
  1. The O3*-atom is assigned to the residue carrying the 5'-phosphate it is part of, i.e., it is the very first atom in that residue. This is the CAMPARI-inherent convention and reflects the authentic structure of arrays in CAMPARI (which is relevant for any analysis requiring atom numbers, see for example PCCODEFILE in INPUTFILES).
  2. The O3*-atom is assigned to the residue carrying the sugar it is part of; this is the PDB-typical convention. Note that this inherently disrupts the 1:1-correspondence between numbering in the pdb-file and how nucleic acids are represented internally. It is recommended if CAMPARI-output is sought to be compatible with other software working in this latter convention.
Note that option 2 also destroys the 1:1 correspondence between binary (unannotated) trajectory files or TINKER .arc-files on the one side and pdb-files on the other side. This may lead to misrepresentations when using the automatic visualization scripts. This option may be extended in the future to support other trajectory file formats - both for reading and writing - as well.

PDB_W_CONV

CAMPARI can in general process different atom and residue naming conventions for the formatting of PDB files. This keyword selects the format for written files. Choices are:
  1. CAMPARI format
  2. GROMACS format (atom naming, nucleotide and cap residue names, ...)
  3. CHARMM format (atom naming, cap residue names and numbering (patches), ...): Note that there are two exceptions pertaining to C-terminal cap residues (NME and NH2) which arise due to non-unique naming in CHARMM: 1), NH2 atoms are called NT2 (instead of NT) and HT21, HT22 (instead of HT1, HT2), and 2), NME methyl hydrogens are called HAT1, HAT2, HAT3 (instead of HT1, HT2, HT3).
  4. AMBER format (atom naming, nucleotide residue names, ...)
Note that for subsequent work with CAMPARI it is of course safest and easiest to just use the default option (1). This keyword is meant to provide cross-compatibility with other simulation software. Note that support is limited to standard biomacromolecules and typically does not extend to small molecules or unusual polymer residues (extensions may be provided in the future). Also note that for polynucleotides, the setting for PDB_NUCMODE is an independent modifier of pdb-output. This is the twin keyword to PDB_R_CONV above.

PDB_OUTPUTSTRING

This is an experimental keyword to use at your own risk. It allows changing the formatting string (Fortran) used for PDB files written by CAMPARI. This can be useful for making PDB files of very large systems and, in particular, for changing the precision of PDB files. In order for CAMPARI to read these files back in, the analogous keyword PDB_INPUTSTRING has to be used. The default is "a6,i5,1x,a4,a1,a3,1x,a1,i4,4x,3f8.3" (with the quotes). The letters (a, i, f) give the type of variable, which must not change. The numbers give the field widths, and these can be customized for variables of type integer ("i") or real ("f"). It is also possible to modify the field widths of string variables ("a") but that is likely harmful because the variables in use are tied to the default format.
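As a hedged, untested example (field widths chosen purely for illustration), widening the atom serial field and increasing coordinate precision could look as follows; the matching input string is then required to read such files back in:

   # wider atom serials (i7) and five decimals for coordinates (3f10.5)
   FMCSC_PDB_OUTPUTSTRING "a6,i7,1x,a4,a1,a3,1x,a1,i4,4x,3f10.5"
   # analogous setting needed for reading these files back in
   FMCSC_PDB_INPUTSTRING "a6,i7,1x,a4,a1,a3,1x,a1,i4,4x,3f10.5"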

XYZ_SOLVENT

If structural output is requested (→ XYZOUT), this logical keyword allows the user to suppress trajectory output for molecules labeled as solvent. This can be useful to down-convert trajectory files from explicit solvent runs or - more generally - to isolate certain parts of the system from existing trajectory data (employing PDBANALYZE and ANGRPFILE). It may also be used to save space during actual simulations, but it should be kept in mind that information about the solvent may be lost irrevocably and that the resultant trajectories may no longer be straightforward to analyze.

TRAJIDXFILE

Usage of keyword XYZ_SOLVENT in conjunction with the concept of analysis groups allows the user some amount of fine control over what is written to the trajectory file. In some scenarios this may not be enough (for example, if external scripts or software or even CAMPARI itself are meant to analyze non-trivial subsets of the system). Then, the user has the option to supply a simple index file providing per-atom control over what coordinate information is written to the trajectory file. Note that this will be useful for subsequent trajectory analysis runs only if the selected subset preserves the integrity of all molecules that remain in the output, or if the output format is pdb such that missing atoms can be rebuilt. For example, consider a block copolymer consisting of two blocks. The full trajectory could be re-analyzed using an index file to yield a reduced trajectory in pdb-format (keywords XYZOUT, XYZPDB, and XYZMODE) that contains only one of the two blocks. With a properly adjusted sequence input file, it may then be possible to perform intrinsic CAMPARI analyses on the isolated block even though it really was part of a larger molecule. In this process, almost certainly some terminal atoms would have to be rebuilt at the break point (but those may not influence the analyses). For a description of the input file format, see here. Note that all other output selection settings are ignored if an index set is used via this keyword.

XYZ_FORCEBOX

If a system is simulated or analyzed that utilizes periodic boundary conditions, this keyword can be used to alter the standard CAMPARI way of placing atoms with respect to the unit cell. By default, CAMPARI will never break up molecules in trajectory output, which implies that the absolute coordinates in the trajectory file(s) can extend significantly beyond the formal boundary of the unit cell. Sometimes (for example, for visualization or for certain analyses), it may be desired to instead have all atoms inside the unit cell, and this is what this keyword accomplishes. It currently works as a simple logical, and setting it to 1 will make sure that, in all trajectory and other structural output, all atoms selected for output are indeed inside the formal unit cell. Naturally, this will generally split molecules into two or more parts, which may interfere with molecular representations relying on bonds, etc.
Note that trajectory files created in such manner are currently not understood by CAMPARI when trying to read them back in. It is therefore recommended to utilize this feature only to transform preexisting trajectories (via trajectory analysis mode) rather than using it during the actual simulations.
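Following that recommendation, a hypothetical key-file fragment for re-imaging a preexisting trajectory might contain (trajectory input settings omitted here):

   # trajectory analysis mode
   FMCSC_PDBANALYZE 1
   # wrap all output atoms into the formal unit cell
   FMCSC_XYZ_FORCEBOX 1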

XYZ_REFMOL

If a system is simulated or analyzed that utilizes periodic boundary conditions, this keyword can be used to alter the standard CAMPARI way of placing molecules in two contexts, viz., trajectory output and structural clustering using absolute Cartesian coordinates (RMSD). The use of this keyword is explained primarily in the context of the former.
By default, CAMPARI will never allow a molecule's geometric center (or its center of mass in gradient-based runs) to "leave" the central unit cell. When looking at intermolecular interfaces, this can lead to the undesirable effect of the interface being broken across the periodic boundary. These images often flicker back and forth, which makes visual inspection difficult unless periodic images are explicitly replicated. XYZ_REFMOL allows the user to specify a reference molecule whose center serves as the reference point for all images, i.e., the coordinates of all other molecules printed to trajectory output are those of the nearest image with respect to the chosen reference. This operation does not destroy information (i.e., it does not center or align anything) but leads to molecules being displayed that are outside of the central unit cell. In fact, the reference molecule is the only one that is guaranteed to reside in the central cell at all times.
Note that this keyword does not actually alter coordinates used internally, and therefore has no impact on the majority of analysis functions, etc. The only exception is structural clustering relying on absolute Cartesian coordinates (options 5, 6, and 10 for CDISTANCE). It is also ignored for the pdb files written at the beginning and end of a simulation. Along similar lines, trajectory files created in such a manner can be read back by CAMPARI without problems (internally, every molecule is translated to the central unit cell upon read-in as long as the box information (BOUNDARY, SIZE, and ORIGIN) is preserved).

ALIGNCALC

In trajectory analysis runs, CAMPARI offers the option to structurally superpose the current Cartesian coordinates onto a suitable reference. Note that this functionality is conveniently available through almost all molecular visualization software packages. CAMPARI provides automatically generated visualization scripts designed to work with VMD. If these options are unavailable or inconvenient, this keyword lets the user set the interval at which CAMPARI performs structural alignment. For example, to create - from an original trajectory - a superposed trajectory of every 10th frame, XYZOUT would have to be 10 and ALIGNCALC would have to be 10 or a factor of 10 (5, 2, 1). For convenience, the root mean square deviation over the alignment set after alignment can be written to an instantaneous output file. This can be enabled by specifying a negative number for ALIGNCALC, which is, except for the sign, interpreted in the same way.
Alignment happens before any of the analysis routines are called and works by first defining a reference set of atom indices (→ ALIGNFILE). Using a quaternion-based algorithm, an optimal translation and rotation is determined that minimizes - when applied to the current coordinates - the deviation between the transformed current coordinates and the reference coordinates (i.e., a set of coordinates for all atoms in the reference set). Note that this procedure will always preserve the internal state of molecules and - except for certain cases in periodic boundary conditions - the relative arrangement of molecules. It will not, however, preserve the relative position of the system boundary. This may lead to artifacts in energetic analyses of aligned trajectories or any analyses that rely upon relative, intermolecular coordinates.
There are two ways of defining the reference coordinate set. The first is via an external file. Here, CAMPARI reuses the pdb-template functionality. If keyword PDB_TEMPLATE is specified and successfully read, then the reference coordinate set is extracted from this file using the reference set defined via ALIGNFILE. Note that the template may serve a double purpose in this scenario as it may still provide the atom numbering map needed to read binary trajectory formats with non-CAMPARI atom order. If no template is specified, the reference set will be given by the previously aligned structure. This successive alignment therefore uses a different reference coordinate set each time and will consequently lead to drift.
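A hypothetical key-file fragment for such an analysis run (the file name align.idx is chosen arbitrarily) could read:

   # superpose every 10th frame; the negative sign also writes the RMSD trace
   FMCSC_XYZOUT 10
   FMCSC_ALIGNCALC -10
   # index file defining the alignment set (at least three atoms)
   FMCSC_ALIGNFILE align.idx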

ALIGNFILE

If system alignment is possible and requested (→ ALIGNCALC), this keyword allows the user to supply the path and name of a mandatory input file containing an atomic index list defining the reference set to be used for the alignment algorithm. For example, in the simulation of a macromolecule with co-solutes it will not be meaningful to use the entire set of atoms in the system as the alignment set since the randomly dispersed co-solutes will dominate the alignment. Instead, one will typically want to only supply nonsymmetric protein atoms here.
This keyword serves a second purpose, viz., if structural clustering is requested, and if an RMSD distance criterion with differing alignment and distance atom index sets is desired, this keyword lets the user specify the input file with the alignment set. Simultaneous use of both functionalities is permitted. Lastly, note that any set used for alignment must consist of at least three atoms.

POLOUT

This keyword sets the interval at which current system-wide polymeric variables are computed and written (→ POLYMER.dat). This instantaneous output can be useful to easily monitor structural changes (such as dimerization events) in dilute systems with heterogeneous density. It is completely uninformative for systems with homogeneous density. For simulations of a single polymer chain, distributions of polymeric order parameters as well as correlation functions can be computed from the output in POLYMER.dat.

POLCALC

This keyword lets the user specify the frequency with which values for polymeric properties incurring low computational cost are computed. These data are collected and reported resolved by analysis group and include characteristic values for shape and size, histograms of end-to-end distances, etc. If this keyword is set such that polymeric analyses are performed, several output files are generated (→ POLYAVG.dat, RGHIST.dat, RETEHIST.dat, and RDHIST.dat). Furthermore, POLCALC controls the interval for data collection to obtain averages of the suitably defined angular correlation function along the polymer backbone, which may be related to the intrinsic stiffness or persistence length of the polymer (→ PERSISTENCE.dat and TURNS_RES.dat). Lastly, this keyword controls the frequency for the computation and averaging of molecular, radial density profiles, i.e., the mass distribution function along the radial coordinate originating from each molecule's center of mass, considering only atoms belonging to that molecule (→ DENSPROF.dat). This quantity is used in Lifshitz-type polymer theories.

RHCALC

Since the computation of comprehensive polymer-internal distances is more expensive, this dedicated keyword controls the data collection interval for analyses relying on such data. A comprehensive set of internal distances in CAMPARI is used to compute three quantities:
  1. An alternative estimate of the polymer's spatial size which is sometimes related to the hydrodynamic radius (→ corresponding entry in POLYAVG.dat; note that should RHCALC be set such that no analysis is performed but POLCALC be chosen such that the other quantities in POLYAVG.dat are computed and provided, the corresponding column must be ignored).
  2. A scaling profile of the internal distances with distance of separation in primary sequence (→ INTSCAL.dat).
  3. The scattering (Kratky) profile of the polymer (→ KRATKY.dat; this relies on the additional frequency setting SCATTERCALC).
Note that the scaling of internal distances can be used very efficiently to assess solvent quality from the simulation of just a single polymer. All mentioned data are obtained resolved by analysis group.

SCATTERCALC

As alluded to above, this keyword sets an auxiliary frequency for the calculation of scattering properties resolved by analysis group (→ KRATKY.dat). This requires computing Fourier transforms of internal distances for a series of wave vectors and is consequently a very expensive calculation. Due to the coupling to the computation of internal distances (see RHCALC), this keyword is not interpreted like the other interval keywords (???CALC). Instead, SCATTERCALC sets the calculation interval amongst only those steps chosen already via RHCALC. For example, if RHCALC is 10 and SCATTERCALC is 20, then scattering data will be accumulated every 200 steps. The data in KRATKY.dat can be used to compare simulation data directly to experiment. In a double-logarithmic plot, it may also be possible to identify linear regimes ("power law regime" in contrast to the "Guinier regime" for smaller wave vectors) which can be fit to yield the scaling exponent for fractal objects. Conversely, for globular polymers, Porod's law may hold.
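To make the dependent interval concrete, a hypothetical key-file fragment reproducing the example above would be:

   # internal distance analysis every 10 steps
   FMCSC_RHCALC 10
   # scattering accumulated at every 20th such analysis, i.e., every 200 steps
   FMCSC_SCATTERCALC 20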

SCATTERRES

Since the required number of points and range of wave vectors for the prediction of scattering profiles may be system-dependent, this keyword allows the user to adjust the spacing of wave vectors assuming scattering data are being calculated at all (→ RHCALC and SCATTERCALC). The first wave vector's absolute magnitude q=|q| will always be 0.5·SCATTERRES with units of Å^-1. In general, the larger the chain, the smaller the absolute magnitudes of wave vectors needed.

SCATTERVECS

Since the required number of points and range of wave vectors for the prediction of scattering profiles may be system-dependent, this keyword allows the user to adjust the total number of employed wave vectors assuming scattering data are being calculated at all (→ RHCALC and SCATTERCALC). Together with SCATTERRES, this determines the range of the wave vectors. Note that generally a coarse resolution (and hence a small number of vectors) is sufficient as scattering profiles tend to be very smooth functions.

HOLESCALC

For polymers it may be interesting to analyze the distribution of "internal" void spaces. In CAMPARI, a rudimentary analysis routine exists which attempts to place spheres of varying size at different distances from the molecule's center-of-mass and to record whether any overlap with part of the polymer is encountered. This analysis is recorded in instantaneous output (HOLES.dat), and the latter needs to be post-processed. Note that this analysis is restricted to simulations of monomeric polymers.

RGBINSIZE

If standard polymeric analyses are performed (→ POLCALC), this keyword sets the size of the bins in Å for the three output files RGHIST.dat, RETEHIST.dat, and DENSPROF.dat. It therefore determines the resolution along the radius of gyration or related axes.

POLRGBINS

If standard polymeric analyses are performed (→ POLCALC), this keyword can be used to set the number of bins of size RGBINSIZE for the three output files RGHIST.dat, RETEHIST.dat, and DENSPROF.dat. Since quantities like the radius of gyration or end-to-end distances are strongly system-dependent, it is up to the user to ensure the appropriate number of bins. Note that - just like all other histograms in CAMPARI - terminal bins will be overstocked should range exceptions occur.
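Because the covered range is simply the product of bin size and bin count, a hypothetical setting covering values from 0 to 100 Å at 0.25 Å resolution would be:

   # 400 bins of 0.25 Å cover 0-100 Å in RGHIST.dat, RETEHIST.dat, DENSPROF.dat
   FMCSC_RGBINSIZE 0.25
   FMCSC_POLRGBINS 400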

PHOUT

This keyword controls the frequency at which ionization states of certain ionizable residues are output. Currently, this analysis relies on pseudo-Monte Carlo moves (see PHFREQ) to work and is therefore only available in straight MC runs. Further limitations are listed in the descriptions of the sampler and output file.

ANGCALC

This keyword lets the user define the interval at which polypeptide backbone torsion angle statistics are extracted, i.e., how often to go through all non-terminal polypeptide residues and bin values for the φ/ψ-angles into a two-dimensional histogram. This keyword also controls the data collection frequency for the estimation of vicinal NMR J-coupling constants between HN and Hα (→ JCOUPLING.dat). The Ramachandran analysis itself is reported globally in a file called RAMACHANDRAN.dat. Due to the system-wide averaging (including over molecules of different type), this is probably most meaningful for simulations of single homopolymers. For more detailed control, further output files may be obtained: residue-specific as well as analysis group-specific maps should requests have been provided via keywords RAMARES and RAMAMOL, respectively.

ANGRES

This keyword matters only if ANGCALC is chosen such that polypeptide backbone φ/ψ-statistics are accumulated. If so, it sets the resolution in degrees for such angular distribution functions. The smallest permissible value at the moment is 1.0°.

RAMARES

This keyword matters only if polypeptide φ/ψ-analysis is requested (→ ANGCALC). If so, it allows the user to monitor the distributions specifically for selected polypeptide residues in the system. The first entry, which defaults to zero, specifies the number of such specific requests. The user then has to provide the appropriate number of integer values (residue numbers as defined per sequence input) on that same line in the key-file. The maximum number for individually monitored residues is limited to 1000. Successful requests (those pointing to non-polypeptide, non-existing, or terminal residues will be ignored) will create output files like "RESRAMA_00024.dat".
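For example (residue numbers are hypothetical), the following line requests maps for three individual residues and would produce files such as RESRAMA_00024.dat:

   # 3 requests follow on the same line: residues 24, 25, and 102
   FMCSC_RAMARES 3 24 25 102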

RAMAMOL

This keyword is exactly analogous to RAMARES except that it operates not on residues but on analysis groups (all residues of all molecules in a given analysis group are pooled; numbering is as reported initially in the log-output). It will create files like "MOLRAMA_00002.dat".

INTCALC

This keyword sets the interval at which comprehensive statistics are computed for typical internal coordinates of the system, i.e., all bond lengths, bond angles, torsional angles, as well as improper torsional angles (trigonal-planar centers - consult PARAMETERS for further details). Note that molecular topology defines which atom pairs - for example - share a bond. With this analysis, it is therefore not possible to analyze arbitrarily defined distances, angles, and torsion angles in the system. If turned on, up to five different output files are provided, namely INTERNAL_COORDS.idx, INTHISTS_BL.dat, INTHISTS_BA.dat, INTHISTS_DI.dat, and INTHISTS_IM.dat.

WHICHINT

This is one of the few keywords expecting multiple inputs and matters only if internal coordinate analysis is requested (→ INTCALC). Four integers should be provided, and each one is interpreted as a logical to turn on an individual group of internal coordinate analyses. The first turns on the calculation of bond length histograms, the second that of bond angle histograms, the third that of improper dihedral angle histograms, and the fourth that of proper torsional angle histograms. Note that the number of possible internal coordinates quickly exceeds the number of atoms for any complex molecule. These analyses can therefore easily become fairly time-consuming as well as data-rich (in terms of the sizes of the output files). This is one of the reasons for introducing this selection mechanism. The other lies simply in the fact that in any simulation using CAMPARI-typical torsional space constraints (see CARTINT), analyses of bond length, bond angle, and improper dihedral angle distributions are meaningless.

SEGCALC

This keyword lets the user specify the interval at which the polypeptide backbone is scanned for stretches of similar secondary structure (as defined in the file specified through BBSEGFILE). The annotation - in contrast to DSSP - is based purely on torsional criteria and relies on defining consensus regions within φ/ψ-space. These consensus definitions are found in a supplied data file (→ BBSEGFILE). At the end of the simulation, results are written to files named BB_SEGMENTS_NORM.dat, BB_SEGMENTS_NORM_RES.dat, BB_SEGMENTS.dat, and BB_SEGMENTS_RES.dat. This analysis is resolved by analysis group and useful to identify coarse secondary structure propensities in polypeptides. As an example, the data in BB_SEGMENTS_NORM_RES.dat can be used to compute parameters of the helix-coil transition according to the Lifson-Roig formalism (see for example Tutorial 3 or this reference). SEGCALC also controls the computation of global (at a molecular level) secondary structure order parameters fα and fβ (which are also used for the corresponding bias potentials → SC_ZSEC used in Tutorial 9 or this reference). Various distribution histograms are written to files ZSEC_HIST.dat, ZAB_2DHIST.dat, and ZBETA_RG.dat. Analysis of these order parameters is similarly performed in analysis group-resolved fashion.

DSSPCALC

This keyword specifies how frequently to perform DSSP analysis. DSSP is a secondary structure assignment procedure for proteins (reference). All eligible (i.e., full peptide) residues are scanned for backbone-backbone hydrogen bond patterns, and various statistics and running output are provided if so desired (see DSSP_NORM_RES.dat, DSSP_NORM.dat, DSSP.dat, DSSP_RES.dat, DSSP_HIST.dat, DSSP_EH_HIST.dat, and DSSP_RUNNING.dat). The DSSP results typically complement the results from backbone segment statistics (see for example BB_SEGMENTS_NORM_RES.dat) well, as the former are based exclusively on hydrogen bond patterns while the latter are based exclusively on dihedral angles.

INSTDSSP

If DSSP analysis is requested (→ DSSPCALC), this keyword is interpreted as a simple logical whether to write out running traces of the full DSSP assignment for the current snapshot (see DSSP_RUNNING.dat). This can be useful when analyzing pdb-trajectories or even individual pdb-structures with CAMPARI. Instantaneous DSSP output is currently not supported for MPI-averaging calculations (see MPIAVG).

DSSP_MODE

Based on DSSP analysis (→ DSSPCALC), the code computes two order parameters to measure canonical secondary structure content. The E-score corresponds to the β-content and the H-score to the α-content. They are system-wide quantities and are computed as follows:

$$\textrm{E-score} = \textrm{E-fraction} \cdot \left( \textrm{H-bond-Score}_E \right)^{1/n}$$
$$\textrm{H-score} = \textrm{H-fraction} \cdot \left( \textrm{H-bond-Score}_H \right)^{1/n}$$

Here, E-fraction and H-fraction are simply the fractions of residues which are assigned E or H according to DSSP. n is an arbitrary scaling exponent (see DSSP_EXP). H-bond-Score_E is a continuous variable which measures the mean quality of the hydrogen bonds forming the β-sheets in the system and H-bond-Score_H is the analog for α-helices. In principle, all the hydrogen bond energies are collected and divided by the value for the same number of good hydrogen bonds (see DSSP_GOODHB). The quantity can be capped, however, based on the choice for DSSP_MODE:
  1. Every hydrogen bond can maximally contribute the value of DSSP_GOODHB. Therefore, H-bond-Score_X is always less than unity and only approaches unity if each and every relevant H-bond is at least as favorable as the cutoff given by DSSP_GOODHB. This is the most stringent score. The resultant X-scores will always be less than or equal to the corresponding X-fractions.
  2. Every hydrogen bond can maximally contribute DSSP_MINHB which is always more negative than DSSP_GOODHB. The value of H-bond-Score_X, however, is capped to be at most unity. In this score, very strong H-bonds can compensate the effects of a few weak ones but the value for X-score still is capped by the corresponding X-fraction.
  3. Every hydrogen bond can maximally contribute DSSP_MINHB. The value of H-bond-Score_X is not capped and can adopt values larger than unity. The X-score is capped, however, to never exceed unity. This is the most lenient score and the only one in which X-score can exceed the value of X-fraction.
Through the H-bond weighting, two things are achieved: i) instead of the inherently discrete X-fractions, smooth and continuous distributions are obtained for the X-scores, and ii) a redundancy is removed by which secondary structure elements with predominantly weak H-bonds are indistinguishable from perfect, canonical secondary structure elements when using X-fraction alone. The limit of X-score ≡ X-fraction is of course obtained for n → ∞.
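As a brief numerical illustration (arbitrary values): with n = 2, an E-fraction of 0.5, and H-bond-Score_E = 0.81 under mode 1, the E-score would be 0.5·0.81^(1/2) ≈ 0.45, i.e., slightly below the raw E-fraction.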

DSSP_EXP

For the DSSP analysis in CAMPARI (→ DSSPCALC), this keyword chooses the integer scaling exponent n for the H-bond term in computing E- and H-scores (see DSSP_MODE).

DSSP_GOODHB

For the DSSP analysis in CAMPARI (→ DSSPCALC), this keyword defines the standard energy for a "good" hydrogen bond. This is used to evaluate the smoothed E- and H-scores (see DSSP_MODE) and not part of the original DSSP standard. Permissible values lie between -1.0 and -4.0 kcal/mol.

DSSP_MINHB

For DSSP analysis (→ DSSPCALC), this keyword specifies the minimal (= lowest possible = most favorable) energy for any hydrogen bond. Since the DSSP-formula is based on inverse distances it is useful to introduce this lower cap such that conformations with steric overlap do not overly bias the analysis (for example in pdb-analyses → PDBANALYZE). Permissible values lie between -10.0 and -4.0 kcal/mol.

DSSP_MAXHB

For DSSP analysis (→ DSSPCALC), this keyword allows the user to define the maximal (= highest possible = least favorable) energy of any hydrogen bond. This is the fundamental cutoff for DSSP to consider H-bonds and therefore a very important quantity for the analysis to be meaningful. The recommended value is -0.5 kcal/mol, but values between -1.0 and 0.0 kcal/mol are allowed.

DSSP_CUT

For DSSP analysis (→ DSSPCALC), this keyword defines the distance cutoff applied to the Cα-atoms of two peptide residues to consider them for hydrogen bonds. This can be relatively short (defaults to 10 Å) but the accuracy hinges on the choice for DSSP_MAXHB. Consistency has to be ensured by the user. Using a Cα cutoff for pre-screening of residue pairs significantly reduces the computation time needed by the DSSP analysis.

CONTACTCALC

This keyword specifies the interval at which contact analysis is performed, i.e., how often to get information about which and how many solute residues are close to each other. See CONTACTMAP.dat and CONTACT_HISTS.dat for more details. Note that this analysis is restricted to residues of molecules tagged as solutes (→ ANGRPFILE) in order to facilitate frequent contact analysis even if solvent molecules are explicitly represented (which may be prohibitively expensive otherwise).

CLUSTERCALC

This keyword (along with CONTACTCALC) controls the computation frequency for solute cluster statistics (i.e., cluster sizes, cluster contact orders, and molecule-resolved cluster statistics) where a cluster is defined through the minimum atom-atom distance contact definition (between any pair of residues). Note that this is the interval at which to perform cluster analysis from within the calculation of contacts (i.e., CLUSTERCALC is relative to CONTACTCALC, as SCATTERCALC is to RHCALC). The reason is that the cluster detection algorithm relies on the determination of contacts but that it may not always be a meaningful analysis to perform (see CLUSTERS.dat, MOLCLUSTERS.dat, and COOCLUSTERS.dat for further details on the output).

CONTACTOFF

If contact analysis is requested (→ CONTACTCALC), this keyword defines a sequence-space offset to exclude neighboring residues from the analysis. For topologically connected systems (i.e., polymer chains), data for near-neighbor contacts such as i↔i+1 may be uninformative as those residues will always be in contact on account of the underlying topology. Note that the omission only applies to intramolecular contacts. Setting this to zero includes everything (even i↔i), and any larger integer lets the analysis start only at the corresponding sequence separation. The default here is zero, and there is rarely a reason to change it.

CONTACTMIN

For contact and cluster analysis (→ CONTACTCALC), this keyword provides the threshold value for a residue-residue contact in Å. Here, the threshold is applied to the minimum distance between any arbitrary pair of atoms formed by the two residues in question. This defaults to 5.0 Å. Note that this computationally more expensive definition has the advantage of rendering the contact probabilities more or less size-independent for polyatomic residues. In the presence of excluded volume interactions, monoatomic residues (ions) of different size will still yield contact statistics that include physically meaningless biases, however.

CONTACTCOM

For contact and cluster analysis (→ CONTACTCALC), this keyword gives the alternative threshold value for a residue-residue contact in Å. Here, the threshold applies to the distance between the centers of mass of the two residues in question. It also defaults to 5.0 Å. Note that (in the presence of excluded volume interactions) contact probabilities obtained this way are by design dependent on the size of the interacting residues, and results may be misleading if contact statistics between pairs of residues with highly variable size are compared.

PCCALC

This keyword allows the user to specify how often to perform pair correlation analysis, i.e., get distance counts for a variety of intra- and intermolecular distances and - in the case of intermolecular distances - proper normalization by the current volume element. It controls the computation frequency for three different classes of distance distributions:
  1. Generic intramolecular amide-amide distributions (various acceptor-donor pairs, as well as centroid-centroid) (→ AMIDES_PC.dat in OUTPUTFILES), only relevant for polypeptide systems.
  2. Generic intermolecular pair correlation functions for solutes (→ RBC_PC.dat in OUTPUTFILES), only relevant for systems with more than one solute. Note that this option can consume inordinate amounts of memory should a lot of different solute types be present. Workarounds consist of disabling this analysis or of using the analysis group feature to redeclare most of those as solvent molecule types and to use specific atom-atom distributions instead.
  3. Specific atom-atom distributions and/or pair correlation functions as defined through an index file (see FMCSC_PCCODEFILE) (→ GENERAL_PC.dat in OUTPUTFILES).

DO_AMIDEPC

If pair correlation analysis is requested (→ PCCALC), this keyword enables the user to disable the computation of intramolecular amide-amide distance distribution functions (→ AMIDES_PC.dat) by setting it to zero.

PCBINSIZE

This keyword specifies the distance bin size in Å for pair correlation analysis (→ PCCALC).

PCCODEFILE

This keyword specifies the path and filename of the input file for requesting specific pair correlation or distance distribution analyses (see FMCSC_PCCODEFILE). It is also possible to generate instantaneous traces for the selected distances with keyword INSTGPC. In general, the input is rather flexible, and it is possible to pool many analogous or even unrelated atom-atom distances under a certain code or to use unique codes for very specific requests. Upon successful parsing of the input, and given that pair correlation analysis is globally requested (→ PCCALC), the output file GENERAL_PC.dat is created.

GPCREPORT

This logical keyword instructs CAMPARI whether or not to write out a summary of the terms requested through FMCSC_PCCODEFILE (→ GENERAL_PC.idx). It is only available if distance distribution / pair correlation analysis is in use (→ PCCALC).

INSTGPC

This keyword lets the user instruct CAMPARI how often to print out instantaneous values for all the specific distances selected via FMCSC_PCCODEFILE. Note that this does not include the generic distances CAMPARI analyzes, and consequently the keyword has no effect if no usable input has been provided via FMCSC_PCCODEFILE or of course if pair correlation analysis is not in use. This keyword is understood as a dependent frequency, i.e., a setting of 1 will print instantaneous values for every PCCALCth step. Note that this feature is disabled by default and that the output in GENERAL_DIS.dat can easily become large.

SAVCALC

This keyword specifies how often to compute solvent-accessible volume (SAV) fractions and solvation states for the system. If the ABSINTH implicit solvent model is in use (→ SC_IMPSOLV), this analysis can rely on the current values for those quantities (no additional computational cost); otherwise, computing atomic SAV fractions incurs a moderate computational cost. The solvent-accessible volume will globally depend on the choice for the thickness of the assumed solvation shell (→ SAVPROBE). The mapped solvation states as reported for individual atoms (please refer to the ABSINTH publication for details) will depend on further ABSINTH parameters. Some of these can be adjusted through patches, e.g., user-supplied values for overlap reduction factors.
SAV analysis creates at most three output files: an instantaneous one (SAV.dat) that depends on auxiliary keyword INSTSAV, an atom-resolved output file that reports simulation averages (→ SAV_BY_ATOM.dat), and finally a file containing distribution functions (histograms) of those quantities for selected atoms (→ SAV_HISTS.dat). The latter file depends on another auxiliary keyword, i.e., SAVATOMFILE. The instantaneous output is primarily useful as a diagnostic tool for the system while the simulation is running, and for computing correlation functions, multidimensional histograms, etc. for quantities related to the solvation of specific sites on macromolecules. Please refer to the descriptions of the output files for further details.

INSTSAV

If analysis of solvent-accessible volume fractions is requested (→ SAVCALC), this keyword allows the user to have a quantity related to the total SAV along with a running average being printed to a dedicated output file (→ SAV.dat). In addition, the values for SAV fractions for selected atoms (via SAVATOMFILE) are written out. The latter allows the construction of correlation functions, multidimensional histograms, etc. The keyword (positive integer) is interpreted as a print-out frequency relative to the frequency with which SAV analysis is performed per se. This means that the effective print-out frequency will be SAVCALC·INSTSAV.
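As a concrete (hypothetical) illustration of this dependent frequency:

   # SAV analysis every 50 steps
   FMCSC_SAVCALC 50
   # instantaneous output at every 4th analysis, i.e., every 200 steps
   FMCSC_INSTSAV 4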

SAVATOMFILE

If analysis of solvent-accessible volume fractions is requested (→ SAVCALC), this keyword specifies the location and name of a simple input file (list of atomic indices, format is described elsewhere) that allows the user to select a subset of the system's atoms for creating histograms of both SAV fraction and resultant solvation state (see above). These histograms are written to a dedicated output file (→ SAV_HISTS.dat). In addition, if instantaneous output of SAV-related quantities is requested (→ INSTSAV), the values for the SAV fractions for the selected atoms are written to the corresponding output file (SAV.dat). Note that instantaneous values for the SAV fractions allow manual computing (during post-processing) of solvation states (using parameters set in the key-file and/or reported in SAV_BY_ATOM.dat, and using the reference publication to retrieve the necessary expressions). It should be kept in mind that with normal settings for SAVPROBE, SAV fractions of nearby atoms are tightly coupled. This means for example that requesting information for atoms that are covalently bound will rarely yield additional information. Lastly, the binning for the histograms is fixed and uses 100 bins across the interval from zero to unity (both quantities are restricted to this interval).

NUMCALC

This keyword is relevant only when the chosen thermodynamic ensemble allows for particle number fluctuations (i.e., the simulation is performed in the (semi-)grand canonical ensemble). It then specifies the number of Monte Carlo steps between successive accumulations of number-present histograms for each fluctuating particle type. For a description of the corresponding output file, please refer to PARTICLENUMHIST.dat.

COVCALC

This simple keyword instructs CAMPARI to collect raw data (signal trains) for select degrees of freedom in the system (currently this is restricted to all flexible dihedral angles → TRCV_xxx.tmp) every COVCALC steps. This is a near-obsolete functionality that has large overlaps with the output written to FYC.dat via TOROUT. It was meant to provide intrinsic support for variance/covariance analyses, e.g., with the ultimate goal of performing dimensionality reduction. Given that merely raw data are provided and that dihedral angle data are generally circular (periodic) variables requiring the use of circular statistics (not as trivial as it may sound), usage of this facility is generally not recommended. This option is available in different modes (see COVMODE) and may eventually be revived or extended later. Note that CAMPARI can perform intrinsic principal component analysis (PCA) as part of the structural clustering facility (→ CCOLLECT and PCAMODE).

COVMODE

This keyword chooses between (currently) two types of raw data to be provided by CAMPARI in output files TRCV_xxx.tmp. It can be set to:
  1. Internal degrees of freedom (i.e., torsions) directly in torsional space (radian)
  2. Internal degrees of freedom (i.e., torsions) expressed as their cosine and sine components
As alluded to above (COVCALC), more options may be added in the future. In all cases, more documentation is found for TRCV_xxx.tmp.

DIPCALC

This keyword specifies how often to compute molecular and residue-wise dipole moments for net-neutral molecules (or residues). Because the analysis relies on atomic partial charges, dipole analysis requires SC_POLAR to be set to a value larger than zero as charges are otherwise not assigned. The (somewhat preliminary) analysis produces output files MOLDIPOLES.dat and RESDIPOLES.dat.

EMCALC

This keyword specifies how often to compute spatial density distributions for the simulated system. If the density restraint potential is in use, this analysis is automatically performed at every step given that it is computed regardless. The result is an averaged density on a three-dimensional grid of dimensions controlled generally by keywords EMDELTAS and SIZE. For nonperiodic boundaries, the evaluation grid is not (or cannot be) mapped exactly to the system dimensions, and keyword EMBUFFER becomes relevant. When using the density restraint potential, the grid serves both the purpose of analysis as described here and the purpose of evaluating the potential itself, which implies that it is an option to adopt the grid dimensions from the input density map. This is the default behavior for a cuboid system with 3D periodic boundary conditions when EMDELTAS is not provided.
The resultant spatial density is that of a given atomic property selected by keyword EMPROPERTY. It is written to an output file in NetCDF format, which requires an external library to use this feature. The details of the file format CAMPARI uses are described elsewhere. The spatial density is computed as follows:

$$\rho_{ijk} = \rho_{\mathrm{sol}} + V_{ijk}^{-1} \sum_{n=1}^{N} \left[ X_n - \gamma_n V_n \rho_{\mathrm{sol}} \right] \prod_{d=1}^{3} B_A\left( r_{nd} - P_{ijkd} \right)$$

Here, $V_{ijk}$ is the volume of the grid cell with indices i, j, and k, $N$ is the number of atoms in the system, $X_n$ is the target property of the atom with index n, $V_n$ is that atom's volume, and $r_{nd}$ are the three components of its position vector. The parameter $\gamma_n$ is a pairwise volume overlap reduction factor that corrects the atomic volume for overlap with covalently bound atoms. It is explained in some detail elsewhere. The parameter $\rho_{\mathrm{sol}}$ sets a physical background density for the property in question, which is relevant when not all matter contributing to the property density in the system is represented explicitly. In such a case, an assumed vacuum would lead to severe errors. Note that atomic volumes and volume reduction factors are no longer relevant if $\rho_{\mathrm{sol}}$ is zero in the above equation. Finally, the product in the above equation utilizes cardinal B-spline functions of order "A", $B_A$, which are assumed centered at the center of each grid cell (vector $P_{ijk}$ with components $P_{ijkd}$ for each dimension). This technique of distributing a property on a lattice is shared with the particle-mesh Ewald method.
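A minimal, hypothetical setup for pure spatial density analysis (no restraint potential in use) could look like this in the key-file:

   # accumulate the density grid every 100 steps
   FMCSC_EMCALC 100
   # distribute atomic mass (option 1) on the lattice
   FMCSC_EMPROPERTY 1
   # mandatory here: cubic lattice cells of 1 Å edge length
   FMCSC_EMDELTAS 1.0 1.0 1.0
   # order-3 B-splines (parameter A in the equation above)
   FMCSC_EMBSPLINE 3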

EMDELTAS

If the density restraint potential is not in use, but spatial density analysis is requested, this keyword is mandatory and sets the lattice cell size of the analysis grid by providing three floating point numbers corresponding to the lattice cell sizes in Å for the x, y, and z dimensions, respectively.
Conversely, if the density restraint potential is in use, this keyword is optional and allows the user to set a lattice cell size different from the one used by the input density map. The keyword again requires the specification of three floating point numbers that set the lattice cell sizes in Å for the x, y, and z dimensions of the analysis and evaluation grid, respectively. Note that acceptable choices require that it be possible to superpose the cells of the input density map exactly with the analysis grid after reducing its resolution to that of the input map. Minor adjustments may be made automatically to system size and/or the origin of the input map. If, for example, in the x-dimension the input map has 10 cells of width 2Å, and the evaluation grid has 26 cells of width 1Å, then the system origin has to be chosen such that the left boundary of the first cell of the input density aligns with the left boundary of the first, third, or fifth cell of the evaluation grid (but not any others). In the same example, CAMPARI would reject a system size of 25Å, because the resultant number of cells in the x-dimension would not be divisible by the integer factor corresponding to the differences in resolution (here 2). It would also reject an origin aligning the first input cell to the seventh evaluation grid cell, because this would mean that the input map extends beyond the system boundaries. Finally, implied boundary conditions of the input map are not made to correspond to system boundary conditions automatically. For any periodic boundaries of the system, the evaluation grid is and must be fit exactly to the system dimensions.

EMPROPERTY

If spatial density analysis is requested, or if the density restraint potential is in use, this keyword lets the user pick an atomic property to be distributed on a lattice. If this is supposed to work as a density restraint, there are only two options available at the moment:
  1. Use atomic mass (resultant units are g/cm^3)
  2. Use atomic number, i.e., proton mass (resultant units are also g/cm^3 for convenience)
Note that the density calculation is controlled by formal grid resolution and by the function used to represent the point property in question. If there is an assumed physical background level, the effective atomic volumes become relevant as well. If the density restraint potential is not in use, there currently is one further option as follows:
  3. Use atomic charge (resultant units are e/Å^3)

Note that additional options may be made available in the future.

EMBGDENSITY

If spatial density analysis is requested, or if the density restraint potential is in use, this keyword sets an assumed background level for the atomic property in question. In general, the value should be zero if all relevant matter in the system is represented explicitly, i.e., if empty space is indeed meant to correspond to a vacuum. If not, the value should be given in appropriate units depending on the property the density is derived from. These are g/cm^3 for mass and proton densities (atomic number), and e/Å^3 for charge.

EMBUFFER

If spatial density analysis is requested, or if the density restraint potential is in use, this keyword sets a ratio for how much to extend the evaluation grid for spatial densities beyond any nonperiodic boundaries of the system. In the direction of a nonperiodic boundary, CAMPARI takes the maximum dimension (e.g., the diameter of a sphere) and multiplies it with this factor to obtain the (approximate) size of the rectangular cuboid grid. Alignment with a potential input grid is achieved by shifting the origin of the evaluation grid slightly. Note that the behavior will generally be undefined for cases where solute material samples positions off the evaluation grid. It is up to the user to ensure that the buffer spacing is big enough for the stiffness of the boundaries to prevent this from happening.

EMBSPLINE

If spatial density analysis is requested, or if the density restraint potential is in use, this keyword sets the order of B-splines used to distribute the atomic property of interest on the lattice. This setting corresponds to parameter "A" in the equation above. B-splines of order 3 or higher lead to functions with smooth derivatives and are appropriate for gradient-based methods. B-splines have finite support, and the cost per atom will increase with A^3 for a three-dimensional lattice. The limiting case of A being unity corresponds to a simple binning function, whereas for large A, a Gaussian function is recovered. The effective width does not grow linearly with A; it is rather the tails of the functions that grow. This implies that very large values for A are probably not a useful investment of CPU time. Note that the effective width of the B-spline can be thought of as setting an inherent resolution or averaging scale for a given atom, since it replaces a point function with a distribution. The choice for this keyword should therefore be made in concert with the choice of formal grid resolution.

DIFFRCALC

This keyword specifies how often to compute approximate fiber diffraction patterns for the whole system (excluding ghost particles in GC simulations → ENSEMBLE). The system is aligned according to an assumed fiber axis (see DIFFRAXIS), and amorphous diffraction patterns in cylindrical coordinates (via Fourier-Bessel transform) are computed. The code currently assumes atomic scattering cross sections proportional to atomic mass, with the additional modification that all hydrogen atoms are excluded from the diffraction calculation. Specifically, the atomic scattering function for heavy atom i is proportional to m_i/m_C, with a proportionality constant yielding units of the square root of scattering intensity; it is zero for hydrogen atoms. See DIFFRACTION.dat for more details. As a cautionary comment, it should be noted that these calculations are somewhat untested and that output should be carefully examined.

DIFFRRMAX

For diffraction calculations (→ DIFFRCALC), this specifies the maximum number of bins in the reciprocal radial dimension (r in cylindrical coordinates). The resultant bins will be centered around zero.

DIFFRZMAX

For diffraction calculations (→ DIFFRCALC), this specifies the maximum number of bins in the reciprocal axial dimension (z in cylindrical coordinates). The resultant bins will be centered around zero.

DIFFRRRES

For diffraction calculations (→ DIFFRCALC), this gives the resolution in the reciprocal radial dimension (r in cylindrical coordinates) in Å⁻¹.

DIFFRZRES

For diffraction calculations (→ DIFFRCALC), this gives the resolution in the reciprocal axial dimension (z in cylindrical coordinates) in Å⁻¹.

DIFFRJMAX

This defines the maximum order of Bessel functions to use in the Fourier-Bessel (Hankel) transform that generates the (fiber) diffraction pattern (→ DIFFRCALC). Note that the transform takes the product of actual and reciprocal radial coordinates as its argument. Hence, the maximum order determines up to which values of the reciprocal radial coordinate the generated information remains meaningful. This soft cutoff scales reciprocally with the size of the system in the radial dimension. These features arise because Bessel functions of order n only contribute non-zero values beyond a (unitless) argument value of ca. n. Also note that the input file for the Bessel functions (see FMCSC_BESSELFILE) needs to provide the tabulated functions up to the necessary order.
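
Based on the property just mentioned (J_n(x) is negligible for arguments x smaller than roughly n), a rough upper bound for the useful order can be estimated as follows (an assumption consistent with the 2π convention in the sketch above, not a CAMPARI formula):

    import math

    def max_useful_order(r_max, R_max):
        # r_max: radial extent of the system (Å)
        # R_max: largest reciprocal radial coordinate of interest (Å⁻¹)
        return math.ceil(2.0 * math.pi * r_max * R_max)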

DIFFRAXIS

For diffraction calculations (→ DIFFRCALC), it is possible (and usually meaningful and necessary) to use a fixed system axis as the assumed fiber axis. This is (naturally) particularly appropriate for single-point calculations on specific structures. The x-, y-, and z-components of the axis have to be provided as three floating point numbers. The length of the vector is not important. The axis will pass through the point defined by DIFFRAXON. If this keyword is not specified, the program will identify the longest possible atom-atom distance in the system and use the resultant axis. Note that the latter axis will not be constant with respect to the absolute (lab) coordinates; it is meant to cover cases where changes in configuration are allowed (especially if rigid-body movement is permitted).

DIFFRAXON

This keyword specifies the point the (constant) axis (see DIFFRAXIS) for diffraction analysis (→ DIFFRCALC) will pass through. This will define the zero-point in the z-coordinate, and hence the origin of the cylindrical coordinate system. If this keyword is not provided, CAMPARI will assume the {0.0 0.0 0.0}-point for this (independent of specifications for the system origin).

REOLCALC

This keyword is only relevant in MPI replica exchange calculations (or parallel trajectory analysis runs using the same setup). It instructs CAMPARI to compute various overlap measures between the different Hamiltonians employed in the REMC/D run (see N_XXX_OVERLAP.dat). Note that this relies on the evaluation of the system energy at different conditions, i.e., Hamiltonians. Unless the only exchange dimension is temperature, CAMPARI assumes that the energy has to be fully reevaluated for each condition, which means that there is a significant cost associated with the overlap calculation. Cutoffs and long-range corrections (see keywords CUTOFFMODE, LREL_MC, and LREL_MD) are always respected by these additional evaluations of cross- (or foreign) energies. In dynamics runs, an additional complication arises if neighbor list updates are performed infrequently (see NBL_UP). Here, CAMPARI enforces an extra update of neighbor lists that is always out-of-sync with the schedule of the simulation propagation (this is for technical reasons). The unfortunate consequence is that, for an identical random seed, trajectories will not be identical if NBL_UP is greater than 1 and overlap calculations are performed with different frequencies.
The user controls whether to calculate foreign energies across all replicas (see REOLALL). If only neighboring conditions are requested, output in N_XXX_OVERLAP.dat may be truncated or uninformative. Lastly, it is important to mention that the MC branch of the energy functions is used only in plain REMC calculations, and that in all other cases (including hybrid methods → DYNAMICS) the dynamics branch is used. This is important since cutoff and long-range treatments can easily be inconsistent between the two (see LREL_MC and LREL_MD).

REOLINST

This keyword is only relevant in MPI-Replica Exchange calculations (or parallel trajectory analysis runs using the same setup). This keyword requests instantaneous "foreign" energies to be written (see N_XXX_EVEC.dat). "Foreign" or "cross"-energies are simply the energies of the current structure evaluated at Hamiltonians different from the one generating the ensemble. Note that the user controls whether to calculate foreign energies across all replicas (see REOLALL). If only neighboring conditions are requested, a truncated vector (length 2 or 3) is provided in N_XXX_EVEC.dat. To facilitate frequent overlap analysis with sparser instantaneous output, this keyword is interpreted as a subordinated frequency for REOLCALC (as SCATTERCALC is relative to RHCALC).

REOLALL

This keyword is only relevant in MPI-Replica Exchange calculations (or parallel trajectory analysis runs using the same setup). It is interpreted as a simple logical which determines whether "foreign" energies are computed over all other or just the neighboring replicas (see N_XXX_EVEC.dat and N_XXX_OVERLAP.dat).

TRACEFILE

This optional keyword is relevant for the post-processing of two types of parallel simulation runs. First, if a parallel trajectory analysis run is performed (→ details elsewhere), it allows the user to supply a file with a running map of replicas to starting conditions. Details of format and interpretation are given elsewhere. The default map assumed by CAMPARI is the identity mapping 1..REPLICAS. If a trace file is provided, sets of step number and an updated map for that specific step are read. This is primarily meant to make replica exchange trajectories that are continuous in condition (i.e., have conformational jumps in them) continuous in conformation (i.e., afterwards they have jumps in condition in them). In such a case, the trace file is the history of replica exchange moves such as output by CAMPARI itself. CAMPARI will then recombine information from the input trajectories according to the trace. This means that all analyses performed are on the unscrambled trajectory that can of course also be written (→ XYZOUT). Naturally, this keyword can also be used to specify any other map for other applications, e.g., to create trajectories for obtaining bootstrap-type error estimates. The relation of step numbers in the trace file to frames in the trajectories is handled by keywords RE_TRAJOUT and RE_TRAJSKIP.
Second, if a serial trajectory analysis is performed, it can be used to post-process data from a parallel PIGS run. PIGS runs provide their own trace file. For this, the trajectories from individual replicas must be concatenated in numerical order. Unless the trace file itself is edited (its first column holds the step number), keywords RE_TRAJTOTAL, RE_TRAJOUT, and RE_TRAJSKIP define the output settings for the original simulation run, and these settings must be matched exactly. For example, with 4 replicas, XYZOUT 50, NRSTEPS 1000, and EQUIL 500, each trajectory will have 10 snapshots. In trajectory analysis mode, the concatenated trajectory (40 snapshots) can then be supplied with settings of RE_TRAJTOTAL 10, RE_TRAJOUT 50, RE_TRAJSKIP 500, and NRSTEPS 40 (see the sketch below). The trace file is processed exclusively in the context of network-based analyses (see CCOLLECT, CMODE, output file STRUCT_CLUSTERING.graphml, and so on). Reading in the PIGS trace accomplishes the automatic removal and addition of (conformational) network links incurred by the PIGS protocol. Overlapping functionality is provided by keywords TRAJBREAKSFILE and TRAJLINKSFILE.
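
The arithmetic of the example above can be checked with a few lines of Python (a hypothetical helper, not part of CAMPARI):

    def snapshots_per_replica(nrsteps, equil, xyzout):
        # frames are written only after the equilibration period,
        # one every xyzout steps
        return (nrsteps - equil) // xyzout

    per_rep = snapshots_per_replica(1000, 500, 50)  # -> 10
    total = 4 * per_rep                             # -> 40 frames in the
    # concatenated trajectory, to be analyzed with RE_TRAJTOTAL 10,
    # RE_TRAJOUT 50, RE_TRAJSKIP 500, and NRSTEPS 40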

RE_TRAJOUT

If a parallel trajectory analysis run is performed (→ details elsewhere) and a file with the replica exchange history (trace) has been provided, or if a serial trajectory analysis run is performed and a file with the PIGS reseeding history (trace) has been provided, this keyword lets the user set the trajectory output frequency CAMPARI is supposed to assume for the supplied input trajectories (separate or concatenated). This is important because the trace is meant to use simulation step numbers that are not preserved in trajectory analysis mode (no step number or time information from input trajectories is read and used). For replica exchange, a successful unscrambling of the trajectories requires that the exchange trace is exhaustive at the level of the output frequency set by this keyword. This means that it is sufficient to provide the current map of condition to starting structure for every snapshot in the input trajectories (more information can be supplied without harm; less information will lead to errors). For PIGS, the trace must contain information about all reseedings. In the case of a PIGS run being analyzed, keyword RE_TRAJTOTAL is also essential.

RE_TRAJSKIP

If a parallel trajectory analysis run is performed (→ details elsewhere) and a file with the replica exchange history (trace) has been provided, or if a serial trajectory analysis run is performed and a file with the PIGS reseeding history (trace) has been provided, this keyword lets the user set the equilibration period for trajectory output that CAMPARI is supposed to assume for the supplied input trajectories (separate or concatenated). This is important because the trace is meant to use simulation step numbers that are not preserved in trajectory analysis mode (no step number or time information from input trajectories is read and used). Both RE_TRAJOUT and this keyword are required for CAMPARI to correctly relate the frames in the trajectories to the step numbers in the trace file. Of course, it is also possible to edit the file with the trace to match the saved trajectory data exactly, and to then set RE_TRAJOUT and RE_TRAJSKIP to 1 and 0, respectively. In the case of a PIGS run being analyzed, keyword RE_TRAJTOTAL is also essential.

RE_TRAJTOTAL

If a serial trajectory analysis run is performed and a file with the PIGS reseeding history (trace) has been provided, this keyword lets the user set the length in numbers of snapshots for the contribution of an individual replica to the concatenated trajectory that CAMPARI is supposed to analyze (→ elsewhere). This is important because the trace is meant to use simulation step numbers that are not preserved in trajectory analysis mode (no step number or time information from the input trajectory is read and used). RE_TRAJOUT, RE_TRAJSKIP, and this keyword are required for CAMPARI to correctly relate the frames in the trajectories to the step numbers in the trace file.
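
Keywords RE_TRAJOUT and RE_TRAJSKIP thus imply a simple frame-to-step relation per replica, sketched below, while RE_TRAJTOTAL delimits each replica's contribution in the concatenated case (an assumption consistent with the example given for TRACEFILE, not code extracted from CAMPARI):

    def step_of_frame(frame_idx, re_trajout, re_trajskip):
        # Simulation step assumed for the 1-based trajectory frame
        # frame_idx of a given replica.
        return re_trajskip + frame_idx * re_trajout

    # e.g., with RE_TRAJOUT 50 and RE_TRAJSKIP 500, frames 1..10
    # map to steps 550, 600, ..., 1000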

CCOLLECT

This keyword controls the frequency with which a selected subset (see CDISTANCE and CFILE) of the trajectory data (typically in a trajectory analysis run → PDBANALYZE) is stored in a large array in memory for post-processing. Such analysis currently consists of different algorithms (→ CMODE) to identify structural clusters in the data and is performed after the last step of the run has completed. If set to something larger than the number of simulation steps (NRSTEPS), the clustering analysis is disabled (also the default).
Various output will be produced aside from information written directly to standard out or the log-file. The first and foremost is a list of cluster annotations per analyzed snapshot (→ STRUCT_CLUSTERING.clu) along with a helper script for the visualization software VMD (→ STRUCT_CLUSTERING.vmd). Furthermore, CAMPARI will produce a file representing the clustering as a graph in an xml-based (so-called "graphml") format (→ STRUCT_CLUSTERING.graphml). Taken together these files allow further analyses of the clustering, primarily those that take advantage of the fact that the clustering yields a complex network/graph. All clustering algorithms (→ CMODE) will write a summary of the determined clusters (usually involving at least the number of contained snapshots and a measure of size) to log-output. Also, they will - normally at the very end - give an empirical assessment of clustering quality. Currently, these numbers are meant primarily for developer consumption and the reader is referred to the source code to understand how they are computed. The exact progress index method is an exception as it does not explicitly record a clustering (the aforementioned output files are missing).
Note that structural clustering breaks the typical CAMPARI paradigm of "on-the-fly" analysis since the bulk of the CPU time for analysis will be invested only at the very end. Therefore, structural clustering will most often be used in trajectory analysis runs as it will be highly undesirable to risk an unclean termination of an actual simulation (certain algorithms for structural clustering require large amounts of memory and/or CPU time). Note as well that structural clustering should not be confused with the analysis of molecular clusters (see CLUSTERCALC and its corresponding output files).
A special remark is required for simulation runs using the MPI averaging technique. Similar to any use of the clustering functionality "on-the-fly", trajectory output should be generated in accordance with the setting for CCOLLECT (most easily by using MPIAVG_XYZ and a matching value of XYZOUT). This is so the clustering results can be annotated and understood at all. In an MPI averaging run, CAMPARI will then at each collection step gather data from all replicas and store them in an array allocated exclusively by the master process. The data arrangement is such that trajectories will be continuous and ordered by increasing replica number. The concatenation introduces spurious transitions that may affect subsequent computations. Data collection causes a synchronization and communication requirement absent in other types of MPI averaging calculations. At the end of the simulation, the resultant concatenated trajectory is analyzed exclusively by the master process, which - depending on settings and algorithms in use - may lead to severe imbalances in terms of both memory consumption and CPU time requirements. This should be kept in mind when using this approach across machines not sharing any memory. To enforce the complementary behavior of every identical replica analyzing its own trajectory, it may be possible to use a fake replica exchange run by using a single dummy (or nearly invariant) parameter for exchange.
Because the chosen set of degrees of freedom often is a superset of an unknown subspace of particular interest to the user, CAMPARI offers two common routes for dimensionality reduction. These rely on standard linear algebra techniques and are available if i) the chosen proximity metric is not circular (this excludes options 1-2 for CDISTANCE); ii) the code was linked to a linear algebra library (LAPACK-compliant, see installation instructions for general information on linking libraries); and iii) there are more samples than variables (degrees of freedom). The reason that circular (periodic) data are currently not supported is that the required measures, variance and in particular covariance, become somewhat empirical and laborious to compute. If this type of transformation is performed (→ PCAMODE), CAMPARI produces up to two output files, one containing the eigenvectors themselves (PRINCIPAL_COMPONENTS.evs) and another, optional one containing the data matrix in the transformed space (PRINCIPAL_COMPONENTS.dat). The latter can be used to derive probability or free energy surfaces in reduced-dimensional spaces.

PCAMODE

If data for structural clustering are collected (→ CCOLLECT), this keyword instructs CAMPARI to calculate and perform a linear transformation on the collected data. As mentioned above, this option is not available for all measures of conformational distance. The linear algebra works straightforwardly for options 3 and 7 for keyword CDISTANCE and always involves centering the data first (subtraction of dimension-wise means). For options 4, 9, and 10 (local weights), the local weights are averaged, and the input data are scaled by dimension-wise average weights. The same scaling idea is used for option 8 (global weights). Lastly, the possibility of alignment of 3D coordinates (options 5, 6, and 10 → CALIGN) causes additional complications. The general strategy here is to first align all snapshots to the last one (static alignment), which may or may not provide a meaningful description.
Five options are currently available:
  1. No transformation is performed
  2. Principal component analysis (PCA) is performed via singular value decomposition (SVD), and the eigenvectors of the covariance matrix are written to a dedicated output file. PCA works by identifying linear transforms of the centered data that collect maximal sample variance in as few components as possible. The principal components are normalized and orthogonal, i.e., have unit length and zero (linear) covariance. The latter should not be equated with a lack of correlation: many nonlinear correlations between variables yield zero covariance. The amount of variance contained in the first few components can differ dramatically between data sets. The printed eigenvectors and eigenvalues are the only result of this analysis, i.e., the transform is not actually used.
  3. This is the same as the previous option with an important difference. Here, the original sample data are transformed to centered data in PCA space. The transformed data set is written to an additional output file. If keyword CREDUCEDIM is not zero, the original data are overwritten and lost, and any algorithm relying on conformational distance evaluations thereafter will treat these as the simplest case (CDISTANCE becomes 7). This is because any weighting or alignment requests were taken care of beforehand. The benefit of using CREDUCEDIM is the ability to obtain a more informative representation in a space of reduced dimensionality in an unsupervised fashion.
  4. Time structure-based independent component analysis (tICA) is performed, which is based on original work from the 1990s. tICA solves the matrix equation T·F = Σ·F·Λ, where Σ is the covariance matrix, T is a time-lagged and symmetrized covariance matrix (the lag time is set by keyword CLAGTIME), F is the matrix of eigenvectors, and Λ is a diagonal matrix of eigenvalues, which correspond to the values of the autocorrelation function at the specified lag time for the transformed variables. Unlike in PCA, the eigenvectors do not form an orthonormal basis (rather, they satisfy F^T·Σ·F = I). This means that, unlike in PCA, the transformed data do not preserve Euclidean distances between points even if the full dimensionality is used. As in option 2, the printed eigenvectors and eigenvalues are the only result of this analysis, i.e., the transform is not actually used.
  5. This is the same to option 4 as option 3 is to option 2, i.e., the original sample data are transformed to tICA space and centered with the aforementioned options, implications, and consequences.
The default option is 1. Note that options 3 and 5 can generate extremely large output files. While PCA is performed via SVD of the data matrix, the covariance matrices are computed explicitly for tICA, which scales unfavorably with dimensionality. As mentioned above, when choosing options 3 or 5, CAMPARI also offers to perform a simplified clustering in a space of reduced dimensionality (→ CREDUCEDIM). Supplying data dimensions that are linearly dependent on other dimensions or have zero variance may cause the analysis to fail, in particular for tICA. An unusual exit from the respective LAPACK routine will be indicated by a warning.
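
The linear algebra behind options 2-5 can be summarized in a short numpy/scipy sketch (a simplified illustration, without the weighting, alignment, or circularity handling discussed above):

    import numpy as np
    from scipy.linalg import eigh

    def pca(X):
        # PCA via SVD of the centered data matrix X (snapshots x dimensions);
        # returns variances (eigenvalues) and eigenvectors as columns.
        Xc = X - X.mean(axis=0)
        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        return s**2 / (X.shape[0] - 1), Vt.T

    def tica(X, lag):
        # tICA: solve T F = Sigma F Lambda with a symmetrized, time-lagged
        # covariance matrix T; eigenvectors satisfy F^T Sigma F = I.
        Xc = X - X.mean(axis=0)
        Sigma = (Xc.T @ Xc) / (Xc.shape[0] - 1)
        C = (Xc[:-lag].T @ Xc[lag:]) / (Xc.shape[0] - lag - 1)
        T = 0.5 * (C + C.T)
        lam, F = eigh(T, Sigma)          # generalized symmetric eigenproblem
        order = np.argsort(lam)[::-1]    # sort by autocorrelation at the lag
        return lam[order], F[:, order]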

CDISTANCE

If data for structural clustering are collected (→ CCOLLECT), this keyword defines what type of data to collect and how to define structural proximity. There are currently 10 supported options:
  1. This option is tailored toward the intrinsic degrees of freedom of a typical CAMPARI simulation that are also the essential internal degrees of freedom of most molecular systems, i.e., the molecules' dihedral angles. The values {φ_k} for a set of K dihedral angles are collected throughout the run. A list can be provided by using a dedicated input file (→ CFILE), otherwise most of CAMPARI's internal degrees of freedom are used (excluding those pertaining to the conformation of five-membered rings). The details of the set of eligible dihedral angles are controllable by keyword TMD_UNKMODE. More information can be found in the description of the input file. The distance between two states is given as:
    d_{l↔m} = [ (1/K) · Σ_{k=1}^{K} ( (φ_k^l − φ_k^m) mod 2π )^2 ]^{1/2}
    Because dihedral angles are periodic (circular) quantities, a meaningful metric of proximity must account for boundary conditions, hence the "mod 2π" term. Dihedral angle-based clustering poses - aside from periodicity - the challenge that all considered degrees of freedom are bounded and that the strongest contribution to the signal will come from those torsions with large variance, which unfortunately are often the ones of least interest (for example, sidechain torsions). Therefore, a careful selection of the subset to use is critical for an informative clustering. Like any other method, dihedral angle-based clustering is vulnerable to Euclidean distances in high-dimensional spaces becoming uninformative. Note that all dihedral angle-based proximity criteria are useful primarily for single molecules since relative intermolecular orientations are not representable whatsoever.
  2. This is identical to the previous option except that each dihedral angle is weighted by the combined effective masses (the associated diagonal elements in the mass matrix, i.e., mass-metric tensor) of that very dihedral angle in the respective states l and m, i.e., {IM_k^l + IM_k^m}. The distance between two states is then given as:
    d_{l↔m} = [ ( Σ_{k=1}^{K} (IM_k^l + IM_k^m) )^{-1} · Σ_{k=1}^{K} (IM_k^l + IM_k^m) · ( (φ_k^l − φ_k^m) mod 2π )^2 ]^{1/2}
    This option attempts to remedy the problem with the previous one regarding the impact of "uninteresting" degrees of freedom. The weighting with the effective masses ensures that slow degrees of freedom (e.g., central backbone torsions) will contribute much more to the overall signal than sidechain torsions. These mass matrix-based weights are affected by the choice for ALIGN. An additional complication or feature incurred by this is that the weights are now variables themselves when considering sets of conformational distances. We call these types of weights locally adaptive. Dihedral angles describing disulfide bonds are supported, but the presence of disulfide bonds destroys the notion of the effective masses (see CRLK_MODE for some background). The default weights for the Cα-Cβ-S-S and Cβ-S-S-Cβ torsions are simply set to 1.0. This means that a meaningful use of this option in the presence of disulfide bonds requires setting CMODWEIGHTS to something other than 0.
  3. This option is largely identical to option 1. It carries all the same caveats with the exception of the periodicity of dihedral angles. Here, we expand each dihedral angle into its sine and cosine terms to construct a distance metric as follows:
    d_{l↔m} = [ (0.5/K) · Σ_{k=1}^{K} ( (sin φ_k^l − sin φ_k^m)^2 + (cos φ_k^l − cos φ_k^m)^2 ) ]^{1/2}
  4. This is the analogous modification of the previous option by introducing weights composed from the effective masses:
    d_{l↔m} = [ 0.5 · ( Σ_{k=1}^{K} (IM_k^l + IM_k^m) )^{-1} · Σ_{k=1}^{K} (IM_k^l + IM_k^m) · ( (sin φ_k^l − sin φ_k^m)^2 + (cos φ_k^l − cos φ_k^m)^2 ) ]^{1/2}
    The same caveats and comments apply.
  5. This option is probably the most commonly used variant, the positional RMSD. The Cartesian position vectors {r_k} for a set of K atoms are collected throughout the run. A list can be provided by using a dedicated input file (→ CFILE), otherwise all atoms in the system are used. The distance between two states is then given as:
    d_{l↔m} = [ (1/K) · Σ_{k=1}^{K} ( r_k^l − RoTr(r_k^m) )^2 ]^{1/2}
    Here, RoTr indicates rotation and translation operators that optimally superpose the {r_k}^m with the frame provided by the {r_k}^l. This alignment uses the same quaternion-based algorithm mentioned elsewhere. Superposition (alignment) implies that the atomic RMSD is not necessarily a bona fide metric of distance, as it is not guaranteed to satisfy the triangle inequality, d_{l↔m} ≤ d_{l↔p} + d_{p↔m}. This is because the operator RoTr is different for computing d_{p↔m} than it is for computing the other two distances. In practice, for similar structures, this is rarely a problem in the context of clustering. RMSD-based clustering is - like any other method - vulnerable to Euclidean distances in high-dimensional spaces becoming uninformative and - in particular - to obscuring of the signal by uneven variances (a reason why terminal parts of polymers are very commonly excluded from such analyses). The alignment step for both this and the next option can be disabled with the help of keyword CALIGN (RoTr is then simply the identity operator). Without alignment, external degrees of freedom become part of the distance criterion. The coordinate-based RMSD is generally difficult to use for sets of atoms spanning multiple molecules since intermolecular motion can easily provide most of the variance in the signal. In periodic boundary conditions, there is the particular difficulty of deciding which image of a molecule to use. Keyword XYZ_REFMOL is supported in this context and can be used to circumvent this problem (although it should be kept in mind that there is no unique solution for assemblies of more than 2 molecules).
  6. This is similar to the previous option, and is only relevant if alignment is performed. Then, this option allows the user to split the atomic index sets used for alignment and distance computation, i.e., the alignment operator, RoTr, minimizes pairwise distances computed over an independent set of atoms that can either be a superset, subset or completely different set of atoms than the one specified via CFILE. Then, if we term the distance set {D} and the alignment set {A}, with {A} to be provided via ALIGNFILE, the distance between two states will be given as:
    d_{l↔m} = [ (1/|D|) · Σ_{d=1}^{|D|} ( r_d^l − RoTr_{A}(r_d^m) )^2 ]^{1/2}
    Note that choosing disparate sets can easily destroy the fundamental meaning of alignment, i.e., the removal of differences caused purely by external (rigid-body) degrees of freedom. This in turn would almost certainly lead to violations of the assumption that members of different clusters are dissimilar, and can also eliminate the notion of similarity amongst members of the same cluster. Conversely, it can be useful in improving the signal-to-noise ratio for cases where one is interested in states populated by a specific part of a much larger system that moves as a single entity (specifically, states characterized by relative arrangements of parts of a system may emerge more clearly if alignment is performed on the whole entity, but distances are computed only over a small portion of interest). Note that errors in calculations relying on mean cluster properties, computed for example in the tree-based algorithm or in hierarchical clustering (→ CMODE) using mean linkage, can easily become large if the two atom sets have little overlap. Specifically, for a cluster of snapshots that are similar in terms of the distance set but differ strongly in the alignment set, derived quantities such as a snapshot's mean distance to that cluster will deteriorate in accuracy. This is because the heterogeneity of the alignment operator is masked by the simplified algebra used to compute these properties in constant time. The general caveats for RMSD-based clustering mentioned for option 5 above remain relevant as well.
  7. Let us define a set of K interatomic distances, {r_ij}, over unique atom pairs i and j. These distances are collected throughout the run. A list can be provided by using a dedicated input file (→ CFILE), otherwise a subset of randomly selected but unique interatomic distances is used. The number of randomly selected degrees of freedom is usually set to 3N, where N is the number of atoms (it can be smaller for small N). In any case, the distance between two states will then be given as:
    d_{l↔m} = [ (1/K) · Σ_{k=1}^{K} ( r_{i(k)j(k)}^l − r_{i(k)j(k)}^m )^2 ]^{1/2}
    I.e., the chosen distance metric is simply the root mean square deviation across the set of interatomic distances. Distance-based clustering inherently removes external degrees of freedom from the proximity measure, and it is therefore suitable to most applications. As with any other measure, Euclidean distances in high-dimensional spaces may become uninformative and results may be obscured by uneven variances.
  8. This is identical to the previous option only that each distance is weighted by the combined mass of the constituting atoms. The distance between two states will then be given as:
    d_{l↔m} = [ ( Σ_{k=1}^{K} (m_{i(k)} + m_{j(k)}) )^{-1} · Σ_{k=1}^{K} (m_{i(k)} + m_{j(k)}) · ( r_{i(k)j(k)}^l − r_{i(k)j(k)}^m )^2 ]^{1/2}
    Here, m_i denotes the mass of atom i. These (static) weights can be altered by changing masses, e.g., by a suitable patch, or by the dedicated facility (keyword CMODWEIGHTS).
  9. This is identical to option 7 above only that each interatomic distance is subjected to a locally adaptive weight. These weights increase the corresponding memory demands by a factor of 2 and are all initialized to be unity. It is necessary to use the dedicated facility (keyword CMODWEIGHTS) to make them meaningful. All localized weights available for interatomic distances require at least a window size parameter and a rule for how to combine weights from different snapshots. The latter is expressed as a function g(x₁,x₂) specified by keyword CWCOMBINATION. The resultant functional form for the pairwise distance between snapshots is:
    d_{l↔m} = [ ( Σ_{k=1}^{K} g(Ω_k^l, Ω_k^m) )^{-1} · Σ_{k=1}^{K} g(Ω_k^l, Ω_k^m) · ( r_{i(k)j(k)}^l − r_{i(k)j(k)}^m )^2 ]^{1/2}
    The same general caveats apply as for options 2 and 4 above.
  10. This is similar to options 5 and 9 above. Here, each of the 3K Cartesian coordinates, X, of a system of K selected atoms is subjected to a separate, locally adaptive weight. Due to the presence of these weights, pairwise alignment is currently not supported for this option. CAMPARI computes the Euclidean distance between snapshots, which means that any type of input data can be analyzed straightforwardly by transcribing the data set into a fake trajectory of atoms with each Cartesian coordinate corresponding to an input data dimension. The locally adaptive weights increase the corresponding memory demands by a factor of 2 and are all initialized to be unity. It is necessary to use the dedicated facility (keyword CMODWEIGHTS) to make them meaningful. As for option 9, weights require at least a window size parameter and a rule for how to combine them for different snapshots. The latter is expressed as a function g(x₁,x₂) specified by keyword CWCOMBINATION. The resultant functional form for the pairwise distance between snapshots is:
    d_{l↔m} = [ 3 · ( Σ_{k=1}^{3K} g(Ω_k^l, Ω_k^m) )^{-1} · Σ_{k=1}^{3K} g(Ω_k^l, Ω_k^m) · ( X_k^l − X_k^m )^2 ]^{1/2}
    For the weighting aspect, the same caveats apply as for options 2, 4, and 9 above. Due to the distance definition relying on absolute coordinates, the caveats mentioned for option 5, which relate to atom sets encompassing multiple molecules, remain relevant as well.
The CAMPARI units for options 1-2 above are °, while the units for options 5-10 are all Å (more generally, in trajectory analysis mode, the units for options 5-10 will be those of the input data). This is relevant in understanding values of cluster sizes and similar parameters reported to standard log-output. Options 3-4 have no formal unit.
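
To make the different functional forms concrete, the following numpy sketch implements three representative options (1, 3, and 7) for precollected per-snapshot data vectors (a simplified illustration; wrapping the angle difference to the minimum image is one reading of the "mod 2π" term above):

    import numpy as np

    def dist_torsions(phi_l, phi_m):
        # option 1: RMS deviation over K dihedral angles (in radians)
        d = np.mod(phi_l - phi_m + np.pi, 2.0 * np.pi) - np.pi
        return np.sqrt(np.mean(d**2))

    def dist_sincos(phi_l, phi_m):
        # option 3: the same angles expanded into sine/cosine terms
        ds = np.sin(phi_l) - np.sin(phi_m)
        dc = np.cos(phi_l) - np.cos(phi_m)
        return np.sqrt(0.5 * np.mean(ds**2 + dc**2))

    def dist_pairs(r_l, r_m):
        # option 7: RMS deviation across K precomputed interatomic distances
        return np.sqrt(np.mean((r_l - r_m)**2))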

CFILE

If data for structural clustering or related analyses are to be collected (→ CCOLLECT), this keyword provides the path and location to an input file selecting a subset of the possible coordinates. For options 1-4 of the proximity measure, this file is a single column list of indices specifying specific system torsions (see elsewhere). For options 5, 6, and 10 it is a single column list of atomic indices (see elsewhere). Lastly, for options 7-9, it is a list of pairs of atomic indices (two columns, see elsewhere).

CALIGN

If structural clustering is performed (→ CCOLLECT), and an atomic RMSD variant is chosen as the proximity measure (→ CDISTANCE), this keyword can be used to specifically disable the alignment step that occurs before the actual RMSD of the two coordinate sets is computed. To achieve this, provide any value other than 1 (the default) for this on/off-type keyword. Note that alignment must be disabled for option 10 to be available for CDISTANCE.

CWCOMBINATION

If data for structural clustering or related analyses are collected (→ CCOLLECT) and locally adaptive weights are in use, this keyword sets the function to be used for combining locally adaptive weights from different snapshots. This is relevant for options 2, 4, 9, and 10 for CDISTANCE. The input is interpreted identically to that for keyword ISQM, i.e., values of -1, 0, and 1 give harmonic, geometric, and arithmetic means, respectively. Values outside of this range can be expected to degrade performance due to expensive powers being evaluated. Special options avoiding most arithmetics altogether simply use the smaller or larger of the two values. In reality, these correspond to the limits of negative and positive infinity, and they are available by selecting -999 and 999, respectively.
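
A literal reading of this prescription is the following sketch (a hypothetical helper; values other than the special cases are interpreted here as a generalized power mean, consistent with the note about expensive powers above):

    def combine(w1, w2, mode):
        # g(x1,x2): -1, 0, 1 give harmonic, geometric, and arithmetic means;
        # -999 and 999 give the smaller and larger of the two values.
        if mode == -999:
            return min(w1, w2)
        if mode == 999:
            return max(w1, w2)
        if mode == 0:
            return (w1 * w2) ** 0.5
        return (0.5 * (w1**mode + w2**mode)) ** (1.0 / mode)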

CPREPMODE

If data for structural clustering or related analyses are collected (→ CCOLLECT), this keyword offers the user a choice to perform simple data preprocessing operations. Specific options are as follows and are all applied independently for all data dimensions:
  0. The data are untouched.
  1. The data are centered (subtraction of the means).
  2. The data are centered and scaled by the inverse standard deviation. The resultant data are often referred to as standard or Z-scores.
  3. The data are smoothed by cardinal B-splines of specified order. This operation scales linearly with this order, and it is therefore computationally wasteful to specify very large values (the long tails of the polynomial functions contribute little to the smoothing). Note that virtually no result obtainable from these data is preserved upon smoothing (except the mean), which means that results may become difficult to interpret.
  4. The data are centered and smoothed.
  5. The data are converted to Z-scores and then smoothed.
Note that centering leaves distance relations untouched. Centered data generally offer improved precision when computing dependent quantities relying on summation.

CSMOOTHORDER

If data for structural clustering or related analyses are collected (→ CCOLLECT), data smoothing may be in use. It is enabled by certain choices for keywords CMODWEIGHTS and CPREPMODE. Smoothing currently relies on cardinal B-splines, and this keyword lets the user specify the order of these functions. Cardinal B-splines are also used elsewhere (keywords BSPLINE for the PME method and EMBSPLINE for structural density restraints), but the keywords are completely independent.

CMODWEIGHTS

If data for structural clustering or related analyses are collected (→ CCOLLECT), and either static or locally adaptive weights are in use (options 2, 4, 8-10 for CDISTANCE), it is possible to override the default weights with data-derived information obtained in post-processing. This is required for options 9 and 10 of CDISTANCE to be meaningful (the locally adaptive weights for these cases are all initialized to be 1.0). Depending on the chosen option, additional parameters may be required. A detailed list is as follows:
  0. This leaves all weights unchanged.
  1. This option computes local estimates of the root mean square fluctuation (RMSF) and takes the inverse as the resultant, locally adaptive weight. The window size is chosen by the user. The definition of "local" by proximity in the trajectory itself implies that the data are ordered, usually along a time or similar progress axis. Note that this option is invariant only to data translation (centering). The windowed MSFs are computed using an incremental algorithm whose cost is constant with window size.
  2. This replaces weights with weights derived from the autocorrelation function (ACF) evaluated at fixed lag time. The weights are static, i.e., they can be understood as a pre-scaling of the data. For dimensions with a negative ACF at the chosen lag time, the weight is explicitly adjusted to zero, which means that the effective dimensionality can be reduced considerably. As second moments, ACF values are noisy and generally more reliable at short lag times. For options 2 and 4 for CDISTANCE, the resultant weight is always the larger of the two obtained for sine and cosine terms. The ACF is invariant under data translation and global scaling operations.
  3. This option computes a composite weight by taking the square root of the product of the ACF at fixed lag time (as for option 2) and the inverse RMSF over a window of specified size (as for option 1).
  4. This option defines locally adaptive weights based on crossings of the global mean. Specifically, for each dimension, the global data mean is computed. Over a window of a user-defined size, it is then counted how many times the value of that dimension crosses the mean. Each data point receives a weight based on a window centered at this point in terms of the trajectory. The definition of "local" by proximity in the trajectory itself implies that the data are ordered, which is most often but not necessarily by time. Because it is possible that the count is zero, the resultant, locally adaptive weights are computed as (n_cross+a)^{-1}, where "n_cross" is the aforementioned number of crossings of the global mean and "a" is a user-defined buffer parameter (see keyword CTRANSBUF). For options 2 and 4 for CDISTANCE, the resultant weight is always the larger of the two obtained for sine and cosine terms. The idea behind this type of weight is to deemphasize data dimensions sampling roughly symmetric distributions with a single peak and to emphasize data dimensions sampling multimodal distributions with locally small variance. False negatives can be produced if the global mean happens to coincide with one of the peaks of a multimodal distribution. These weights are exceptionally simple, can be computed efficiently and with high accuracy for large data sets, and require no additional parameters beyond the window size. They are also invariant under data translation and global scaling.
  5. This option is the same as the previous one (#4) except that the data are smoothed for the purpose of generating weights. This leaves the original data untouched, i.e., it does not imply data smoothing in general (see CPREPMODE for the latter). The smoothing entails an additional parameter, viz., CSMOOTHORDER.
  6. This option is a combination of options #2 and #4. The final, locally adaptive weights correspond to the square root of the product of the static weights derived from the ACF at fixed lag time and the weights derived from crossings of the global means within windows of user-defined size.
  7. This option is the same as the previous one (#6) except that the data are smoothed for the purpose of generating the local component of the weights (based on crossings of the mean). This does not imply data smoothing in general. The smoothing entails an additional parameter, viz., CSMOOTHORDER.
  8. Similar to option #4, this defines locally adaptive weights based on counting crossings. Here, a histogram is created for each data dimension (fixed number of 100 bins). From the histogram, CAMPARI automatically locates minima (at least 3 bins to either side have to have larger counts). Over a window of user-defined size, crossings of any of these minima are counted, and the weight is constructed as w_max/(n_cross+1). Here, w_max is an adjusting weight. Each minimum splits the data into two fractions of unequal size, and w_max is the maximum of the smaller fractional populations across all minima. If no minima are found, this option reverts to option #4 for the dimension in question. The histogram construction and minima parsing mask many parameters that cannot be controlled by the user at the moment. Histograms are constructed in a way that makes these weights invariant for shifted and scaled data. This option is marred primarily by the lack of both robustness and significance of the minima detection procedure. The meaning of keyword CTRANSBUF is preserved in exactly the same way as for option #4.
  9. This option is the same as the previous one (#8) except that the data are smoothed for the purpose of generating weights. This leaves the original data untouched, i.e., it does not imply data smoothing in general. The smoothing entails an additional parameter, viz., CSMOOTHORDER.
Note that the conversion of locally adaptive weights to static ones by this keyword will not reduce CAMPARI's memory footprint. When using any of the above options computing locally adaptive weights in conjunction with option 8 for CDISTANCE, the assumed window size is taken to be the entire trajectory, i.e., the weights remain static.
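
As an illustration of the simplest locally adaptive variant (option #4), the following Python sketch computes crossing-based weights over a centered window, including the edge behavior described for CWINDOWSIZE below (a simplified reading, not CAMPARI code):

    import numpy as np

    def crossing_weights(x, window, a=1.0):
        # w = 1/(n_cross + a), with n_cross counted in a window of the given
        # size centered at each snapshot; snapshots near the edges reuse the
        # first/last complete window.
        x = np.asarray(x, dtype=float)
        n = len(x)
        s = np.sign(x - x.mean())
        cross = (s[1:] * s[:-1]) < 0      # True wherever the mean is crossed
        w = np.empty(n)
        for i in range(n):
            lo = min(max(i - window // 2, 0), n - window)
            w[i] = 1.0 / (cross[lo:lo + window - 1].sum() + a)
        return w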

CWINDOWSIZE

If data for structural clustering or related analyses are collected (→ CCOLLECT), and certain types of locally adaptive weights are in use, this keyword sets the window size (in numbers of snapshots) from which to obtain the weight. Each snapshot is given a weight derived from data in a window centered around that point. This makes sense primarily if the data are in a specific order, most often they are assumed to be sorted by time. Points toward the beginning (or end) of the data set all obtain the same weight as the first (or last) snapshot to have access to a complete window. This implies that windows should generally be much smaller than the data set length (they can at most extend to half the data set length). This keyword is relevant for locally adaptive weights based on variances and transition counts that can be selected via CMODWEIGHTS.

CLAGTIME

If data for structural clustering or related analyses are collected (→ CCOLLECT), the autocorrelation function (ACF) at fixed lag time can play a role, and this lag time is set by CLAGTIME. This is relevant if either static or locally adaptive weights are in use (options 2, 4, 8-10 for CDISTANCE), or if time structure-based independent component analysis (tICA) is performed (see PCAMODE). This keyword sets the time (in numbers of snapshots) to be used for this purpose.
In the case of weighted distance functions, the ACF is evaluated for each dimension independently and assumes a single generating process:
ACF_k(τ) = [ Σ_{n=1}^{N−τ} (X_k(n) − μ_k) · (X_k(n+τ) − μ_k) ] / [ σ_k^2 · (N − τ) ]
Here, the global data mean and variance for dimension k, μ_k and σ_k^2, are estimated directly from the data. Note that fewer data are available for large τ. Importantly, negative values for the ACF are all set exactly to zero, meaning that these data dimensions are eliminated from distance evaluations. When applied to dihedral angles (options 2 or 4), the ACF is always evaluated separately for sine and cosine terms to avoid ambiguous definitions of variance for circular variables. The weight is then set to the larger of the two values. In case of tICA, the ACF features as a time-lagged covariance matrix that is computed for simple, centered data (no circular variables, no pairwise alignment, no locally adaptive weights). No corrections or truncations are applied to this matrix.
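
A direct transcription of the per-dimension estimator (with the clamping of negative values) might look as follows (a sketch, not CAMPARI code):

    import numpy as np

    def acf_weight(x, lag):
        # ACF at fixed lag for one data dimension; negative values are set
        # to zero, which eliminates the dimension from distance evaluations.
        x = np.asarray(x, dtype=float)
        mu, var = x.mean(), x.var()
        acf = np.sum((x[:-lag] - mu) * (x[lag:] - mu)) / (var * (len(x) - lag))
        return max(acf, 0.0)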

CREDUCEDIM

If data for structural clustering are collected (→ CCOLLECT), and a linear transformation is computed and applied (→ PCAMODE), this keyword allows the user to elect to run all further post-processing (→ CMODE) on a dataset of reduced dimensionality that corresponds to the first NV data vectors in the transformed space, where NV is set by the choice for this keyword. The components are sorted from largest to smallest eigenvalues such that the maximum amount of variance (PCAMODE is 3) or autocorrelation (PCAMODE is 5) is included.
Note that the transformed data are interpreted as simple, aperiodic signals, i.e., none of the peculiarities for different choices of CDISTANCE are considered any longer (CAMPARI internally converts everything to CDISTANCE being 7, which may lead to confusing output regarding units, etc). Specifically, for options 4, 9, and 10 for CDISTANCE, the underlying locally adaptive weights are averaged, and the data are pre-scaled by these averages. This means that use of this keyword for those cases changes more than just the dimensionality. Similarly, for options 5 and 6, if alignment is requested, this alignment is performed as a preprocessing step, and the last snapshot of the data is used as reference. Furthermore, for option 6, only the atom set chosen for distance evaluations is retained, and this is the set to eliminate further dimensions from with the help of this keyword. Note that Euclidean distances are invariant for the full-dimensional transformed data set relative to the original data set in PCA (PCAMODE is 3) but not in tICA (PCAMODE is 5). This of course applies only to the linear transformation and not to any possible preprocessing operations.
If no linear transform is computed, or if the choice for PCAMODE implies that the data transform is not actually computed, this keyword can be used to simply discard dimensions at the end of the internal list of dimensions. This is supported for specialized applications and should not be used unless absolutely needed (use CFILE to control dimensionality precisely). This option does not work with any distance measure requiring alignment. In all cases, if CREDUCEDIM is not specified or set to too large a value, data processing will proceed with the original data and the original size. If linear transforms have been computed, the transformed data are simply written to output file PRINCIPAL_COMPONENTS.dat but not used otherwise.

CTRANSBUF

If data for structural clustering or related analyses are collected (→ CCOLLECT), and weights are in use (options 2, 4, 8-10 for CDISTANCE), it is possible to alter the definition of weights relying on counts of crossings (see option #4 and following for CMODWEIGHTS). In the general functional form w = (n+a)^{-1}, the offset or buffer parameter a is set by this keyword. The default value is 1. Large values will lead to weights with low sensitivity. The limit of CTRANSBUF approaching zero will lead to cases with n=0 receiving all the weight, which is not generally useful.

CMODE

If data for structural clustering are to be collected (→ CCOLLECT), this keyword allows the user to specify the algorithm by which the accumulated data are to be clustered. Before going into detailed options, a few general words are in order:
  1. CAMPARI strives to allow the geometric and other net quantities of a collection of snapshots to be computable irrespective of which metric of proximity is chosen (→ CDISTANCE). For options 3 and 7 this is trivial. For option 1, periodicity has to be accounted for. This is solved approximately by i) making sure the proper image of an added snapshot is considered, and ii) adding an appropriate periodic shift to the geometric center increments each time a boundary violation is found after updating. Other transforms are corrected accordingly. Options 2 and 4 incur the use of several additional cluster sums (means) due to the changing weights (for details, the reader is referred to the source code). For option 5 (atomic RMSD), the first member of a cluster defines a reference frame. This frame is used for alignment of all subsequently added frames (therefore, the definition and all derived quantities are approximate, although the error is usually negligible for small clusters). Options 6 and 8 are the corresponding mass-weighted equivalents of 5 and 7 and work by storing the mass-weighted geometric center, i.e., for option 6, M_t^{-1}·(m • x_k) is aligned and accumulated. Here, "•" denotes the Hadamard product (element-by-element multiplication), m is the mass vector (three entries per atom with identical mass), and M_t is the sum of the masses of all atoms in the coordinate set.
  2. With the geometric center being defined, certain properties of a cluster are computable at constant cost with respect to cluster size. For example, the mean squared distance from the center (the squared "radius") is given as:
    R^2 = N^{-2} · [ N · Σ_{k=1}^{N} x_k^2 − ( Σ_{k=1}^{N} x_k )^2 ]
    Here, x_k denotes the coordinate vector belonging to the kth member of the cluster. Other properties such as the mean snapshot-to-snapshot distance ("diameter") are similarly available. All that is required is that (in the simplest case) each cluster accumulates its linear sum vector and squared sum during construction.
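    The bookkeeping this requires is minimal, as the following Python sketch shows (a hypothetical class, illustrating the running sums only):

        import numpy as np

        class ClusterSums:
            # running sums that make size properties computable at
            # constant cost with respect to cluster size
            def __init__(self, dim):
                self.n = 0
                self.lin = np.zeros(dim)   # linear sum vector
                self.sq = 0.0              # sum of squared norms

            def add(self, x):
                self.n += 1
                self.lin += x
                self.sq += np.dot(x, x)

            def radius_sq(self):
                # R^2 = N^-2 [ N * sum |x_k|^2 - |sum x_k|^2 ], as above
                return (self.n * self.sq
                        - np.dot(self.lin, self.lin)) / self.n**2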
The above is crucial for any clustering algorithm attempting to provide an assessment/annotation of the results and/or to implement refinement steps. The currently implemented options are as follows:
  1. The data are clustered according to the leader algorithm. This is a very simple algorithm that sequentially scans the data. Each new snapshot is compared to the center snapshots of preexisting clusters and added to the first one for which a provided distance threshold is satisfied (→ CRADIUS). If no such cluster is found, a new cluster is spawned (a minimal sketch of this pass is given after this list). Results will be input-order dependent, and clusters will have ill-defined "centers" since the central snapshot is set at the time the cluster is spawned and remains unchanged. Processing direction(s) can be chosen with the auxiliary keyword CLEADER.
  2. The data are clustered according to a modified leader algorithm. This works very similarly to the standard leader algorithm with two important modifications. First, each new snapshot is compared to the current geometric center of preexisting clusters to evaluate the threshold criterion. Second, the result is (optionally → CREFINE) post-processed and snapshots belonging to smaller clusters that would also satisfy the threshold criterion for a larger cluster are transferred to that larger cluster. There are exactly two passes over the data of this refinement step (iteration is difficult and time-consuming due to continuously changing cluster centers). Processing direction(s) can be chosen with the auxiliary keyword CLEADER and the threshold criterion is set via CRADIUS. Modified leader-based clustering tends to generate fewer clusters compared to the standard leader algorithm due to better cluster centers. Due to centers changing position, the maximum snapshot-to-snapshot distance is no longer guaranteed to be below twice the value for CRADIUS (although in typical scenarios violations are very rare).
  3. The data are clustered according to a hierarchical algorithm. In theory, a hierarchical algorithm works by first creating a sorted list of all N(N-1)/2 unique snapshot-to-snapshot distances. Starting with the shortest distance, the two constituting snapshots do one of the following:
    1. They spawn a new cluster (if they are both unassigned and the threshold criterion is fulfilled).
    2. They merge the two clusters they belong to (if they are both assigned and the threshold criterion is fulfilled).
    3. The cluster the previously assigned snapshot is part of is appended with the unassigned snapshot (if one of them is unassigned and the threshold criterion is fulfilled).
    4. They terminate the algorithm (if the threshold criterion is not fulfilled).
    In theory, it is possible to terminate the algorithm after a certain number of clusters has been generated, but CAMPARI does not offer this (yet). Termination occurs strictly by size threshold, i.e., as soon as the next considered distance is larger than twice the value for CRADIUS. Note that the threshold criteria mentioned above for what happens to the two constituting snapshots rely on the setting for CLINKAGE. All snapshots that remain unassigned after clustering are interpreted to spawn their own clusters of size 1.
    Because the problem as stated is intractable for large datasets, CAMPARI uses a dedicated scheme to help keep the computation as feasible as possible. In the first step, a snapshot neighbor list is generated that uses a truncation cutoff set by CCUTOFF. The neighbor list generation uses a pre-processing trick that aims to minimize the number of required distance calculations. This pre-processing step relies on a truncated leader algorithm whose target (threshold) cluster size is set by the (borrowed) keyword CMAXRAD. The resultant clusters are then used to screen groups of snapshot pairs and to exclude them from distance computations. Unfortunately, the problem of dimensionality often renders this procedure worthless. In high-dimensional spaces (→ CFILE), volume grows with distance so quickly that the distance spectrum becomes increasingly δ function-like and, in turn, becomes unsuitable for exploiting additive relationships. This stems from conformational distances having a rigorous upper bound for systems in finite volume and with fixed topology. The situation is obfuscated further if many of the dimensions are tightly correlated (such that the effective number of dimensions is indeed lower). Alternatively, this neighbor list can be read in from a previously obtained file (→ NBLFILE). The neighbor list is then further truncated to exactly match the size threshold specified via CRADIUS. For the algorithm to work properly, CCUTOFF has to be at least twice the value of CRADIUS. From this truncated list, a global list is created and sorted according to size. This can be quite memory-demanding. The global list is then fed into the algorithm as described. The results of hierarchical clustering depend very strongly on the linkage criterion (→ CLINKAGE).
  4. The data are arranged according to a method described in detail elsewhere (→ reference). Briefly, by using a specified criterion of distance, either the exact minimum spanning tree (MST) or an approximation to it is constructed for a graph. This graph is the complete graph constituted by all trajectory snapshots (vertices) and the N·(N-1)/2 unique, pairwise distances (weighted edges). Provided a certain starting snapshot, the spanning tree is used to generate a sequence of snapshots (progress index) in which each added snapshot has the minimum distance to any snapshot that has already been added, i.e., it is the object nearest to the current set of objects in the sense of minimum linkage (as discussed in a different context elsewhere). The complete progress index is then simply a sequence of snapshots that is likely to group similar objects together. This assumes that the phase space density is sufficiently inhomogeneous, i.e., that there are enclosed regions (basins) that are sampled preferentially and that consequently have higher point density than the regions connecting them. It is important to keep in mind that the chosen distance may transform the full, underlying phase space.
    The idea of this method is to provide an annotation function for the progress index that contains kinetic (or effectively kinetic) information. This assumes that the evolution of the system is incremental and happens on a continuous manifold. Therefore, apparent jumps in phase space such as those introduced by the replica-exchange methodology may diminish the interpretability of the results obtained with this algorithm (unlike purely structural clustering algorithms). There are a few possible annotation functions, and they are discussed further in the documentation of the corresponding output file. For practical concerns, there is a methodological choice to pick either the exact or the approximate scheme (→ CPROGINDMODE) in addition to providing a starting snapshot (→ CPROGINDSTART). There are further keywords associated exclusively with this methodology (see CPROGINDRMAX, CPROGMSTFOLD, CPROGINDWIDTH, CBASINMIN, and CBASINMAX) as well as auxiliary keywords overlapping with other approaches (see CPROGINDMODE for details).
  5. The data are clustered according to a tree-based algorithm (→ reference) that shares architectural similarities with the BIRCH clustering algorithm. The BIRCH scheme is focused on achieving i) minimal IO cost and processing of datasets that exceed the available memory in size; ii) time that increases quasi-linearly with the size of the dataset; iii) stable clusterings (invariant to input size or repeated application) that require only local information. To do so, a hierarchical tree is constructed consisting of several levels down to the outermost leaf nodes which represent the final clustering. The tree is built incrementally while the data are scanned. In the first pass, each new data point will propagate up the tree (from the root) and be associated with the closest member of the set considered or (at the leaf level) spawn a new cluster in case the (single) threshold criterion cannot be satisfied. Upon page size violations, a split is induced that may propagate all the way down to the root. The criteria for the splits are governed by considerations of consumed memory and disk space ("page size"). The initial result is an approximate clustering that is no longer representing individual data points, but represents sets of points as so-called clustering feature (CF) vectors (size, linear sum, squared sum, see above). The clustering is made stable by post-processing the initial tree in two steps: first, leaf nodes are reclustered with a hierarchical scheme; second, all data points are sequentially redistributed into the tree resulting from the previous step (this makes the clustering stable and eliminates a specific type of error of an identical data point ending up in two different clusters).
    The tree algorithm implemented in CAMPARI differs from the BIRCH scheme - amongst other changes - by dropping requirements i) and iii) mentioned above. The entire dataset is kept in memory. The tree is assumed to be of a set height (number of hierarchical levels → BIRCHHEIGHT) with levels that span a provided range of threshold criteria (the upper bound is set by CMAXRAD). The algorithm then proceeds in the first pass by choosing - at each level up to the penultimate one - the closest cluster and subsequently scanning only the child clusters of the chosen one. If the threshold at the given level is violated, a new cluster is created at that level. Path searching will still continue using the children of the nearest cluster even if the threshold is not satisfied. This defines a unique path through the tree (a minimal sketch of this insertion logic is given after this list). Because the children of a parent are collectively able to probe a larger phase space volume than the parent itself, it can occur that failed assignments recover further up the tree. In such a case, the recovery child is linked with the newly created clusters and now possesses an extra parent. The first pass provides a fixed tree up to the penultimate level. In additional passes, the procedure is repeated with modifications for the leaf level and any additional levels requested by BIRCHMULTI (moving in toward the root). The modification is that the clustering at the level in question is reset (if existing) and then populated while leaving the relevant rest of the tree (toward the root) fixed. This ensures that the input order dependency remains weak (clustering stability) and that outliers are unlikely to occur at all levels of interest. Note that levels not refined this way should not be viewed as clusterings of the same quality (they are also not reported anywhere).
    The employed definition of proximity is that of the distance of the snapshot to the geometric center of the cluster. This leads to centroid drift while clusters are created at all levels, and means that assignments can still fail even in the second pass (handled identically to the first pass in such a case). We do keep a list of indices into the dataset associated with each cluster to be able to later quickly access that information. As for refinement, the challenge is to find protocols that do not exceed the time/space complexity of the algorithm itself. Currently, there is only one type of optional refinement step that will locally merge leaf clusters that have different, but proximal parent clusters, if the diameter of the joint cluster decreases upon merging (relative to the individual values).
    The tree-based algorithm is extremely fast (especially if refinement is skipped), and it will generate more clusters than the leader algorithm with the same setting for CRADIUS. The cluster distribution is, however, altered nonuniformly: the largest clusters in the tree-based algorithm will often be larger, but the number of very small clusters (1-5 snapshots) will increase substantially, especially for large heights. Overall, the clusters tend to be substantially tighter. In essence, the multiple hierarchical levels act as a layered array of filters that creates a resultant net pore size smaller than that of any one of the filters on its own.
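
To make the first-pass assignment logic of option 5 concrete, the following is a minimal sketch in Python; all names are illustrative and not part of CAMPARI, a Euclidean distance stands in for the chosen proximity measure, and the recovery linking, splitting, and later passes described above are omitted.

  import math

  def dist(a, b):
      # Euclidean distance; stands in for whatever measure CDISTANCE selects
      return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

  class Cluster:
      def __init__(self, point):
          self.n = 1                   # size
          self.lsum = list(point)      # linear sum -> geometric center
          self.children = []
      def center(self):
          return [s / self.n for s in self.lsum]
      def absorb(self, point):
          self.n += 1
          self.lsum = [s + x for s, x in zip(self.lsum, point)]

  def insert(root, point, thresholds):
      # walk from the root; at each level, search only the children of
      # the cluster that was closest at the previous (coarser) level
      path = root
      for thr in thresholds:           # coarse -> fine (CMAXRAD -> CRADIUS)
          if not path.children:
              path.children.append(Cluster(point))
              path = path.children[-1]
              continue
          best = min(path.children, key=lambda c: dist(c.center(), point))
          if dist(best.center(), point) <= thr:
              best.absorb(point)       # threshold satisfied -> assign here
          else:
              path.children.append(Cluster(point))  # spawn new cluster
          path = best                  # path continues via the nearest cluster

  # usage (illustrative): root = Cluster(snapshots[0]), then call
  # insert(root, s, thresholds) for the remaining snapshots s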

CCUTOFF

If data for structural clustering are to be collected (→ CCOLLECT), and an algorithm is used that requires a rigorous snapshot neighbor list (currently either hierarchical clustering or the exact variant of the progress index-based scheme → CMODE), this keyword defines the cutoff distance for said neighbor list. It is critical to choose an appropriate (as small as possible) value for this parameter, as otherwise CAMPARI may both run out of (virtual) memory and create excessively large files on disk. Note that even with a minimal setting, the problem of computing and storing the neighbor list can very easily become intractable. Simulation data in high-dimensional spaces are often clustered very unevenly in space, meaning that multiple "length scales" in distance space matter. This is detrimental to a neighbor list that relies on defining a single, specific length scale through CCUTOFF.
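
As a minimal illustration of what this neighbor list entails (function and variable names are hypothetical, not CAMPARI's), a brute-force construction makes the memory concern obvious: the number of stored pairs grows steeply with the cutoff.

  def neighbor_list(snapshots, ccutoff, dist):
      # brute-force O(N^2) scan; only pairs within CCUTOFF are stored
      nbl = [[] for _ in snapshots]
      for i in range(len(snapshots)):
          for j in range(i + 1, len(snapshots)):
              d = dist(snapshots[i], snapshots[j])
              if d <= ccutoff:
                  nbl[i].append((j, d))
                  nbl[j].append((i, d))
      return nbl   # memory use is dictated by the choice of CCUTOFF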

NBLFILE

If data for structural clustering are to be collected (→ CCOLLECT), and an algorithm is used that requires a rigorous snapshot neighbor list (currently either hierarchical clustering or the exact variant of the progress index-based scheme → CMODE), this keyword can be used to provide the name and location of an input file in the appropriate format. CAMPARI uses the versatile binary NetCDF format for this purpose, and consequently the code needs to be linked to the NetCDF library for this option to be available (see installation instructions). Most commonly, this type of file will have been created by CAMPARI itself (it is written automatically if the code is linked against NetCDF and an algorithm is used that requires a neighbor list → corresponding documentation). This keyword is primarily meant to circumvent the costly neighbor list generation in subsequent applications of the algorithm (for instance, with different settings for CRADIUS).

CRADIUS

If structural clustering is performed (→ CCOLLECT), and an algorithm is used that relies on a distance (span) threshold criterion (→ CMODE), this keyword sets the value for said threshold criterion. For leader-based clustering, this is either the distance from the center snapshot (standard leader) or from the current geometric center (modified leader) and therefore constitutes a maximum cluster radius. For hierarchical clustering, twice this value is the maximum distance allowed between any two snapshots that are part of the same cluster, so again CRADIUS controls the maximum cluster radius. For tree-based clustering, this keyword again sets the maximum distance from the current geometric center. Values are to be provided in Å for proximity measures 5-10, are unitless for 3-4, and are in degrees for 1-2 (→ CDISTANCE).
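
A minimal sketch of the standard leader criterion may help (hypothetical names; 'dist' stands for the measure chosen via CDISTANCE): each snapshot joins the first cluster whose center snapshot lies within CRADIUS, otherwise it spawns a new cluster. The search direction over existing clusters is what CLEADER (below) controls.

  def leader_cluster(snapshots, cradius, dist):
      clusters = []   # each cluster: (center snapshot, member indices)
      for i, snap in enumerate(snapshots):
          # search clusters backward, i.e., most recently spawned first
          # (this corresponds to option 1 for CLEADER)
          for rep, members in reversed(clusters):
              if dist(rep, snap) <= cradius:   # standard leader criterion
                  members.append(i)
                  break
          else:
              clusters.append((snap, [i]))     # spawn a new cluster
      return clusters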

CREFINE

If structural clustering is performed (→ CCOLLECT), this simple logical keyword lets the user control whether to apply any possible refinement strategies to the initial clustering results. Currently, there are two such procedures: for the modified leader algorithms, a refinement procedure is available which redistributes polyvalent snapshots to larger clusters. For the tree-based algorithm (for descriptions of these methods see elsewhere), a possible refinement consists of a (noniterative) merging of clusters with sufficient overlap.

CLEADER

If structural clustering is performed (→ CCOLLECT), and a leader-based algorithm is used (→ CMODE), this keyword allows the user to alter the processing directions of the leader algorithm by the following codes:
  1. The collected trajectory data are processed forward. Clusters are searched backward (starting with the most recently spawned one).
  2. The collected trajectory data are processed forward. Clusters are searched forward (starting with the one spawned first).
  3. The collected trajectory data are processed backward. Clusters are searched backward (starting with the most recently spawned one).
  4. The collected trajectory data are processed backward. Clusters are searched forward (starting with the one spawned first).
Note that the default is option 1, i.e. to process trajectory data forward, but to search clusters backwards. If the underlying data are continuous in time this may be more appropriate as it favors kinetic proximity in addition to the structural proximity criterion defined by CRADIUS. Note that this keyword also controls the trajectory processing direction in tree-based clustering (→ CMODE).

CLINKAGE

If structural clustering is performed (→ CCOLLECT), and the hierarchical algorithm is used (→ CMODE), this keyword allows the user to choose between different linkage criteria:
  1. Maximum linkage: Appending a snapshot to a cluster implies that the new snapshot is less than twice the value of CRADIUS away from all snapshots currently part of the cluster. For merging two clusters, maximum linkage implies that all possible inter-cluster distances satisfy the threshold condition. This creates clusters with an exact upper bound for their diameter (maximum intra-cluster distance), and the scheme therefore resembles leader clustering.
  2. Minimum linkage: Appending a snapshot to a cluster implies that the new snapshot is within a distance of twice the value of CRADIUS of at least one snapshot already contained in the cluster. Merging two clusters implies that at least one inter-cluster distance satisfies the threshold condition. With a minimum linkage criterion, clusters no longer have a well-defined radius and tend to get very large unless tiny values are used for CRADIUS. This is rarely a useful option for molecular simulation data.
  3. Mean linkage: Appending a snapshot to a cluster implies that the snapshot is within a distance of CRADIUS of the current geometric center of the cluster. Merging two clusters implies that their respective geometric centers are within a distance of CRADIUS of one another. This creates clusters that no longer have a rigorous upper bound for the intra-cluster distance, and the scheme therefore resembles the modified leader algorithm.
Note that the default is option 1, i.e., maximum linkage. A minimal sketch of the three criteria is given below.
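
The three criteria can be summarized for the cluster merging case as follows (hypothetical names; brute-force evaluation purely for illustration):

  def centroid(cluster):
      # geometric center of a list of points
      n = len(cluster)
      return [sum(p[k] for p in cluster) / n for k in range(len(cluster[0]))]

  def can_merge(c1, c2, cradius, mode, dist):
      if mode == 1:   # maximum linkage: all pairs within 2*CRADIUS
          return all(dist(a, b) <= 2 * cradius for a in c1 for b in c2)
      if mode == 2:   # minimum linkage: at least one pair within 2*CRADIUS
          return any(dist(a, b) <= 2 * cradius for a in c1 for b in c2)
      if mode == 3:   # mean linkage: geometric centers within CRADIUS
          return dist(centroid(c1), centroid(c2)) <= cradius
      return False    # unknown mode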

CMAXRAD

If structural clustering is performed (→ CCOLLECT), and the tree-based algorithm or the approximate progress index-based scheme is used (→ CMODE), this keyword sets the upper distance threshold value for the hierarchical tree, i.e., it corresponds to the coarsest threshold used outside of the (virtual) root (see BIRCHHEIGHT for additional details).

BIRCHHEIGHT

If structural clustering is performed (→ CCOLLECT), and the tree-based algorithm or the approximate progress index-based scheme is used (→ CMODE), this keyword sets the number of hierarchy levels in the algorithm. Briefly, the tree-based algorithm works by defining a series of threshold criteria (set by interpolating between CRADIUS and CMAXRAD) that define hierarchical levels. Each snapshot follows a specific path through the tree structure that is defined by identifying the closest existing cluster at each hierarchy level, where only those clusters are searched that belong to the parent cluster chosen at the next coarser level. The base of the tree is never counted as it always encloses all snapshots, so by specifying 1 for BIRCHHEIGHT one can recover an algorithm that is - in its basic outline - very similar to the modified leader scheme (see CMODE).
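
The exact interpolation CAMPARI uses to place the intermediate thresholds is not restated here; purely for illustration, a geometric spacing between CMAXRAD (coarsest level, directly below the virtual root) and CRADIUS (leaf level) could look as follows:

  def threshold_ladder(cradius, cmaxrad, height):
      # one threshold per hierarchy level, coarse -> fine; the geometric
      # spacing is an assumption made only for this illustration
      if height == 1:
          return [cradius]
      ratio = (cradius / cmaxrad) ** (1.0 / (height - 1))
      return [cmaxrad * ratio ** k for k in range(height)]

  # e.g., threshold_ladder(1.0, 8.0, 4) -> [8.0, 4.0, 2.0, 1.0]
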
Larger numbers of levels generally lead to the formation of more clusters. This is because of a specific type of error that is linked to the children of a cluster (i.e., a set of clusters at the next finer level) overlapping with the children of a nearby cluster. If a snapshot proceeds through such a hierarchy on a path exploring only the children of a single cluster, the chances inevitably increase that an actual, appropriate target cluster at the finest level is missed. Then, a new cluster at the finest level is likely to be spawned. In terms of the snapshots contained, this new cluster could theoretically be combined with other clusters without the maximum intracluster distance ever exceeding the distance threshold.
To combat these errors, CAMPARI refines the results obtained via tree-based clustering by applying a merging scheme to all pairs of leaf clusters whose parent clusters at the next higher level themselves have sufficiently close geometric centers. However, the merging requirement is extremely stringent: the average intracluster distance has to decrease upon merging. While it would be possible to apply alternative merging criteria, those would either be too expensive to compute (remember that the average intracluster distance is available in constant time with respect to the cluster sizes) or would run the risk of diluting the threshold criterion and creating clusters that contain severe outliers.

BIRCHMULTI

If structural clustering is performed (→ CCOLLECT), and the tree-based algorithm or the approximate progress index-based scheme is used (→ CMODE), this keyword sets the number of hierarchy levels to refine during the second stage of the algorithm. Normally, only the most fine-grained level is populated in a second pass through the data. This leaves all levels closer toward the root in a less refined state. By specifying a value for BIRCHMULTI that is larger than the default of zero, the user requests CAMPARI to perform additional passes proceeding from the leaf level toward the root. The (virtual) root and the level with the coarsest actual threshold are both excluded from this refinement. The output in output file STRUCT_CLUSTERING.clu is adjusted to provide the correct number of coarse-grained trajectory annotations. Other analyses, unless specified otherwise, are only performed for the leaf level clustering (network).

CADDLINKMODE

If structural clustering is performed (→ CCOLLECT), this keyword allows the user to request different modifications of the link (edge) structure of the derived network (graph). This can be useful because the transition counts usually suffer from poor statistics for many if not most links. This can cause problems, e.g., by splitting the graph into several strongly connected components or by creating dramatic sensitivities of network-derived properties (such as the steady state) to very few elements of the transition matrix.
The available options are as follows:
  1. The network is left as is.
  2. Strongly connected components are identified using Tarjan's algorithm. Multiple components can result, e.g., from supplying a file with trajectory breaks or a trace file for an MPI PIGS calculation. Any one-way links between different components are augmented with the reverse transition. The floating-point weight for this reverse link is set by keyword CLINKWEIGHT. If there is no link in either direction, multiple components will remain.
  3. Any clusters (vertices) without any observed self-transitions (self-loops) are augmented with a self-transition with a floating point weight of CLINKWEIGHT. In a Markov model sense, this will increase residence times and populations for the augmented nodes.
  4. This is a combination of options 2 and 3.
  5. The count matrix is symmetrized. If one of the two corresponding elements is zero, this creates a new reverse link with the same properties as the existing forward one. If both directions are already populated, this means that the transition with a lower number of observed counts is augmented to match the exact count number of the more populated one. This option ignores keyword CLINKWEIGHT. This is different from symmetrization achieved by adding the entire transition count matrix obtained from the same trajectory reversed in time. Both variants should imply detailed balance.
  6. This is a combination of options 2 and 5.
Note that some properties reported in output file STRUCT_CLUSTERING.graphml will be inferred for added links. In particular, the reported mean square displacement values are copied (minimally required reverse transitions in options 2 and 4), set to half the value of CRADIUS squared (self-transitions in options 3, 4, and 6), or constructed as a weighted average (reverse transitions in options 5 and 6). Most importantly, any substantial change to the link structure means that the network-derived equilibrium distribution (steady state) is altered. It is therefore recommended to use CEQUILIBRATE in conjunction with this keyword. This ensures that information about the updated steady state is contained in output files such as STRUCT_CLUSTERING.graphml or MFPT_CFEP_xxxxxxxx.dat. A minimal sketch of the symmetrization in option 5 is given below.
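
As an illustration of the symmetrization in option 5 (hypothetical names; the matrix is assumed to hold raw transition counts):

  import numpy as np

  def symmetrize_counts(C):
      # raise the less populated of each pair of reciprocal transitions
      # to match the more populated one; zero entries acquire a new
      # reverse link with the same count as the existing forward one
      C = np.asarray(C, dtype=float).copy()
      for i in range(C.shape[0]):
          for j in range(i + 1, C.shape[0]):
              C[i, j] = C[j, i] = max(C[i, j], C[j, i])
      return C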

CLINKWEIGHT

If structural clustering is performed (→ CCOLLECT), and the addition of links (edges) to the derived network (graph) is requested (→ CADDLINKMODE), this keyword sets the floating-point weight for some of the added links (see above for details). Note that the basic unit is an (integer) count of observed transitions in the input trajectory. The default is therefore 1.0.

CPROGINDMODE

If structural clustering is performed (→ CCOLLECT), and the progress index-based algorithm is used (→ CMODE), this keyword allows the user to choose between the exact (1) and the approximate scheme (2 = default). The two cases differ as follows:
  1. In the exact scheme, CAMPARI attempts to construct the true minimum spanning tree (MST) for the trajectory of interest. This is achieved by following the same setup procedure used in hierarchical clustering (described under option 3 to CMODE), i.e., a heuristics-based scheme is used to construct a neighbor list in snapshot space up to a certain hard cutoff. Alternatively, the neighbor list can be read from a dedicated input file. From this list, a globally sorted list of near distances is constructed. This setup work provides the foundation to construct the MST without additional parameters via Kruskal's algorithm. The high cost (both in terms of time and memory) makes the exact scheme impractical for large data sets. Note that the neighbor list must be sufficient for the algorithm to run. This means that all the edges of the MST have to occur in the neighbor list, which is unfortunately not guaranteed even if each snapshot has multiple neighbors listed. Potential failures are therefore difficult to predict.
  2. In the approximate scheme, CAMPARI utilizes a two-stage approach. The goal is to improve upon the large computational cost associated with the exact scheme without sacrificing too much of the information encoded in the progress index. First, the trajectory is structurally clustered using the highly efficient, tree-based algorithm (described under option 5 to CMODE). This hierarchical tree of groups of snapshots (clusters) is not to be confused with the approximate MST we wish to generate. Because the tree-based clustering is used, keywords CRADIUS, CMAXRAD, and BIRCHHEIGHT are all relevant. The hierarchical tree is then used to grow a set of individual components of the approximate MST. The candidate edges for the spanning tree are those between unconnected components and will be formed primarily between snapshots in the same clusters at the finest level. As the components grow, they are successively merged, and levels further toward the root of the hierarchical tree are utilized. This procedure emulates Borůvka's algorithm with a search space limited by the hierarchical tree. Because the spanning tree thus constructed is not strictly minimal, it is important to update component memberships after each merging operation. The algorithm depends on two parameters. The first regulates the maximum number of search attempts used for finding the next-nearest and eligible neighbor for any snapshot (the minimum across a spanning tree component then becomes the candidate edge for that component). It is set by keyword CPROGINDRMAX. The clusters at the finest level of the hierarchical tree that offer any eligible candidate edges may not provide enough candidates for CPROGINDRMAX guesses. In this case, the second parameter becomes relevant. It controls a depth as to how many additional levels of the hierarchical tree to descend into in order to satisfy the maximum number of guesses. This second parameter is set by keyword CPROGRDEPTH. There is a third parameter, CPROGRDBTSZ, which is a technical setting controlling how a cluster is searched randomly. This is only necessary if the number of eligible candidates in a cluster exceeds the number of missing guesses requested by CPROGINDRMAX. Then, CPROGRDBTSZ can be used to reduce the number of required random numbers. Depending on the settings, the algorithm is expected to run in approximately N·log(N) time with the constant prefactor determined by the clustering and the choice for CPROGINDRMAX. Similarly, the quality of the generated spanning tree depends nontrivially on both aforementioned search parameters as well as on the properties of the tree-based clustering. It is of course unlikely that the approximate MST is in fact the true MST for trajectories of appreciable length. Using appropriately large values for both CPROGINDRMAX and CPROGRDEPTH creates an asymptotic limit in which the true MST is recovered. However, this limit is of little practical use, as a guaranteed MST computed this way requires at least O(N²) time (worse than the time complexity of the exact scheme, which is aided by safe heuristics).
In both cases, the resultant (approximate or exact) spanning tree is used to determine the progress index per se. This is achieved using Jarník's (Prim's) algorithm on the spanning tree (note that a slightly different class of algorithms is used to construct it in the first place) either with a starting vertex (snapshot) selected via CPROGINDSTART or on an automatically determined set of starting vertices. At this stage, keyword CPROGMSTFOLD may become relevant. It allows preprocessing of the spanning tree to alter the edge lengths considered by Jarník's algorithm in a way that incorporates leaf vertices before any other connected vertex. Application of Jarník's algorithm will usually require linear time.
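
The final sweep shared by both schemes is compact enough to sketch (hypothetical names; 'tree' maps each snapshot to a list of (neighbor, edge length) pairs of the spanning tree, and 'start' plays the role of CPROGINDSTART):

  import heapq

  def progress_index(tree, start):
      # Prim-style sweep over the (approximate or exact) spanning tree:
      # the next snapshot added is always the one nearest to the set of
      # snapshots already added (minimum linkage)
      order, added = [], set()
      heap = [(0.0, start)]
      while heap:
          d, v = heapq.heappop(heap)
          if v in added:
              continue
          added.add(v)
          order.append(v)
          for w, length in tree[v]:
              if w not in added:
                  heapq.heappush(heap, (length, w))
      return order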

CPROGINDSTART

If structural clustering is performed (→ CCOLLECT), and the progress index-based algorithm is used (→ CMODE), this keyword allows the user to pick a specific snapshot to serve as starting point for the generation of the progress index (the default is the first snapshot). Note that the snapshot indexing refers to the sequence of analyzed snapshots, and not to general simulation settings or the input trajectory itself in trajectory analysis mode, i.e., it depends on the choice for CCOLLECT.
As a special option, specifying zero instructs CAMPARI to find a set of suitable starting snapshots. These are generally found by generating a sample profile (discussed elsewhere) that is then scanned for extrema using an automated detection system that can be tuned with two additional keywords, CBASINMAX and CBASINMIN. The idea behind this is to generate profiles starting from a complete set of putative basins. If this automatic detection is unsuccessful, CAMPARI will revert to using the first snapshot as a starting point.
As a further option that is only available in the approximate scheme (→ CPROGINDMODE), a specified value of "-1" instructs CAMPARI to use as starting snapshot the central snapshot of the largest cluster found during the preparatory tree-based clustering.
This keyword can serve an alternative function for specifying the target cluster (ordered by size) for the generation of a cut-based pseudo free energy profile (→ CMSMCFEP). For this second function, requests corresponding to the special choices above (-1 or 0) also work in different ways. If CPROGINDSTART is -1, CAMPARI will set the reference cluster as the one that contains the first snapshot in the accumulated data for clustering. The algorithm will proceed as if CPROGINDSTART had been given as the correct positive number. If CEQUILIBRATE is 1, and if CPROGINDSTART is 0, CAMPARI will find all strongly connected components of the underlying graph and use the largest cluster within each component (subgraph) as reference for multiple, distinct cut profiles (separate output files). Lastly, if CEQUILIBRATE is 0, and if CPROGINDSTART is also 0 (implying that the entire graph is analyzed irrespective of connectedness) CAMPARI defaults to using the largest cluster as the (only) target cluster.
There are two compatibility options for the case where the approximate progress index method is used, and the user is also interested in obtaining a similar cut profile from the auxiliary clustering itself. Setting -2 homogenizes the reference as the largest cluster (clustering) or its representative snapshot (progress index), and setting -3 homogenizes the reference as the first snapshot (progress index) or the cluster containing it (clustering).

CPROGMSTFOLD

If structural clustering is performed (→ CCOLLECT), and the progress index-based algorithm is used (→ CMODE), this keyword allows the user to modify the spanning tree underlying the progress index before the index is computed. The modification consists of "folding" or collapsing the leaves into their parent vertex, which means that they are added first as soon as the index encounters the parent in question. By specifying a positive integer, the user requests CPROGMSTFOLD applications of this inward folding procedure (each of which scales linearly with the number of snapshots in terms of computational cost). After each iteration, the identity of vertices as leaf vertices is updated, which means that branches are continuously folded inward. Note that already a single iteration will fold a large number of edges (the actual number is reported to log output). For multiple folded vertices connected to the same parent CAMPARI preserves the expected order (shortest distance first).
The reasoning behind this modification is the following. When operating on the (minimum) spanning tree, Prim's algorithm proceeds by always finding the shortest distance available. As long as basins are sampled densely and transitions are rare, this has the desired effect of arranging snapshots in a way that allows identification of basins by suitable annotation. However, it is extremely common for basins to have "fringe" regions where sampling density becomes low (and distances are large). Points in these regions will often be missed by the progress index and placed at the end (far away from "their" parent basin). Points in these regions are also likely to correspond to leaf vertices in the spanning tree. Therefore, it can be assumed that collapsing them into their parent will partially ameliorate this issue (they will occur in the correct basin). Users should keep in mind that this breaks the rule that the progress index is built to track local density as much as possible.
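
A minimal sketch of a single folding iteration follows (hypothetical names; the precise way CAMPARI alters the considered edge lengths is not restated here, and shrinking them to zero is an assumption made only for illustration):

  def fold_leaves(tree):
      # tree: {v: {w: length, ...}}, modified in place; returns the
      # number of folded edges (CAMPARI reports this number to log output)
      leaves = [v for v, nbrs in tree.items() if len(nbrs) == 1]
      for v in leaves:
          (parent,) = tree[v]          # the single neighbor of a leaf
          # a zero length makes the leaf the immediate next pick once the
          # parent is encountered; the order of several leaves under the
          # same parent (shortest original distance first) is omitted here
          tree[v][parent] = tree[parent][v] = 0.0
      return len(leaves)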

CPROGRDEPTH

If structural clustering is performed (→ CCOLLECT), and the approximate progress index-based algorithm is used (→ CMODE and CPROGINDMODE), this keyword allows the user to control the maximum search depth for random guesses. In this method, a hierarchical tree is used in conjunction with a parameter, CPROGINDRMAX, to restrict the search space for finding edges of a short spanning tree. The hierarchical tree is based on the tree-based clustering algorithm, and its height is set by keyword BIRCHHEIGHT. For each snapshot, the algorithm will start searching for putative edges within the cluster the snapshot is part of at the finest level offering any eligible candidates. Often, the number of candidates is smaller than the setting for CPROGINDRMAX. Then, CAMPARI will descend the hierarchical tree toward the root by at most CPROGRDEPTH levels to fulfill the requested number of guesses per snapshot. The reason for offering this restriction is that the search at additional levels is often inefficient. This is because it introduces additional redundancy (the same candidates are evaluated more than once), and the candidates at a coarser-than-necessary level are unlikely to be better guesses than the ones at the finest available level. The default for CPROGRDEPTH is zero. Note that, with a meaningful clustering in place, the default setting will prevent the spanning tree from approaching the correct minimum spanning tree in almost all cases. This is because of the hard search space restrictions. At considerable cost, this keyword can overcome the impact of these restrictions.

CBASINMAX

If structural clustering is performed (→ CCOLLECT), the progress index-based algorithm is used (→ CMODE), and an automatic determination of multiple starting snapshots for profiles is requested (→ CPROGINDSTART), this keyword controls how a test profile using the standard annotation function described elsewhere is parsed to automatically identify minima in this function. Specifically, around each eligible point in the profile, environments of varying sizes are considered, and the following criteria are used:
  • The sum of values to the left over a stretch of ne points must be greater than the sum of values over a stretch of ne points centered at the point currently considered.
  • The sum of values to the right over a stretch of ne points must be greater than the sum of values over a stretch of ne points centered at the point currently considered.
  • The sum of values to the left and right over a stretch of ne points each must be greater than a reference sum that is given as twice the sum of values over a stretch of ne points centered at the point currently considered plus 4ne.
  • The left (far) half of the sum of values to the left over a stretch of ne points must be greater than the right (near) one.
  • The right (far) half of the sum of values to the right over a stretch of ne points must be greater than the left (near) one.
  • No point toward the left over a stretch of ne points must be greater than or equal to the point currently considered.
  • No point toward the right over a stretch of ne points must be greater than the point currently considered.
CBASINMAX controls the maximum value considered for ne in the above rules. A literal transcription of the rules into code is given below.
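
The following sketch transcribes the rules literally for a single environment size ne (hypothetical names; the handling of the profile edges, the centering of the ne-point stretch, and the split into halves for odd ne are assumptions made only for illustration):

  def is_minimum(f, i, ne):
      if i < ne or i + ne >= len(f):
          return False                 # point too close to the profile edges
      half = ne // 2
      left = f[i - ne:i]               # far half first, near half last
      right = f[i + 1:i + 1 + ne]      # near half first, far half last
      centered = f[i - half:i - half + ne]
      sl, sr, sc = sum(left), sum(right), sum(centered)
      return (sl > sc and
              sr > sc and
              sl + sr > 2 * sc + 4 * ne and
              sum(left[:half]) > sum(left[half:]) and    # far > near
              sum(right[half:]) > sum(right[:half]) and  # far > near
              all(x < f[i] for x in left) and
              all(x <= f[i] for x in right))

  def find_minima(f, cbasinmin, cbasinmax):
      # whether a single passing environment size suffices is an assumption
      return [i for i in range(len(f))
              if any(is_minimum(f, i, ne)
                     for ne in range(cbasinmin, cbasinmax + 1))]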

CBASINMIN

If structural clustering is performed (→ CCOLLECT), the progress index-based algorithm is used (→ CMODE), and an automatic determination of multiple starting snapshots for profiles is requested (→ CPROGINDSTART), this keyword controls the minimum value considered for ne as explained in the documentation of keyword CBASINMAX.

CPROGINDRMAX

If structural clustering is performed (→ CCOLLECT), the progress index-based algorithm is used (→ CMODE), and the approximate version is chosen (→ CPROGINDMODE), this keyword controls the maximum number of attempts for a search of the next correct spanning tree neighbor of a growing spanning tree component. Depending on the choice for keyword CPROGRDEPTH, such a search first exhausts the possibilities within a given cluster of the hierarchical tree underlying the approximate algorithm and will only consider a limited number of clusters at coarser-than-necessary levels. Therefore, the parameter is interpreted as a maximum and not generally an actual value. Whenever the number of eligible candidate snapshots in a cluster is less than the missing number of guesses for the snapshot in question, the search becomes deterministic. Otherwise, it is random (with replacement). In both cases, the eligible snapshot with the minimum distance to the spanning tree component under consideration is used to create the next link of the approximate MST.

CPROGRDBTSZ

If structural clustering is performed (→ CCOLLECT), the progress index-based algorithm is used (→ CMODE), and the approximate version is chosen (→ CPROGINDMODE), this keyword controls the structure of the random search in a cluster with a number of eligible candidates that exceeds the remaining number of required guesses. This keyword is primarily meant for developer use and defaults to 1. Values larger than 1 imply that the random search proceeds in systematic stretches of length CPROGRDBTSZ in the contiguous stretch of eligible candidates starting from a member selected with uniform probability. The specified value is an upper limit, i.e., the number of guesses is never exceeded.

CPROGINDWIDTH

If structural clustering is performed (→ CCOLLECT), and the progress index-based algorithm is used (→ CMODE), this keyword controls the auxiliary annotation function defined elsewhere. Specifically, it corresponds to the parameter lp in the documentation found by following the link.

CMSMCFEP

If structural clustering is performed (→ CCOLLECT), which includes the case of the approximate progress index method, this keyword allows the user to select a type of cut-based pseudo free energy profile to be computed (reference). The target node for this profile can be chosen with keyword CPROGINDSTART. Currently, there is only one fully supported option (some hidden, unsupported options exist that are not disabled by the code):
  1. The transition matrix is inferred from the simulation trajectory and the associated coarse-graining (clustering). The mean first-passage times to the target node (the largest cluster by default) in the Markov state model approximation are computed iteratively (see the sketch after this list). After sorting all clusters according to these mean first-passage times, partitions can be defined as a function of a threshold time. The cut-based pseudo free energy profile associates each threshold time with the total weight of edges (number of transitions) crossing this threshold along the trajectory, and plots the normalized weight in logarithmic fashion (see elsewhere for details).
Note that the code does not alter or restrict the network in any way, i.e., an ill-defined (nonergodic) network will cause the iterative procedure to be ill-defined.
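
The iterative part amounts to solving the standard mean first-passage time relation on a row-stochastic transition matrix, sketched here with hypothetical names and a unit lag time assumed for illustration:

  import numpy as np

  def mfpt_to_target(T, target, tol=1e-9, maxiter=1000000):
      # iterate m_i <- 1 + sum_j T_ij * m_j with m_target fixed at zero;
      # convergence presupposes that the target is reachable from every
      # cluster (see the note on nonergodic networks above)
      T = np.asarray(T, dtype=float)
      m = np.zeros(T.shape[0])
      for _ in range(maxiter):
          m_new = 1.0 + T @ m
          m_new[target] = 0.0
          if np.max(np.abs(m_new - m)) < tol:
              break
          m = m_new
      return m_new

  # sorting clusters by these times defines the order along which the
  # cut-based profile is accumulated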

TRAJBREAKSFILE

If any type of structural clustering is performed (→ CCOLLECT), or if the exact progress index-based algorithm is used (→ CMODE), the resultant trajectory is used to infer the properties of a network. This is relevant for output file STRUCT_CLUSTERING.graphml (the mesostate (cluster) network itself), for cut-based free energy profiles (kinetic information derived from network properties), and for the output of the progress index method. Essentially, the sequence of events in the trajectory defines a transition matrix. However, not all transitions in a trajectory may be equally valid, as they may be caused by trajectory concatenation (e.g., when using structural clustering with the MPI averaging technique), by replica exchange swaps, by nonlocal Monte Carlo moves, and so on. It may therefore be appropriate to remove such spurious transitions from the analysis in order to keep inferences regarding the underlying dynamics accurate. This is what this input file accomplishes, and the input and its interpretation are described in detail elsewhere. There are two additional notes. First, CAMPARI will not remove any transitions by default, and it may sometimes be difficult to obtain or preserve the required information (e.g., the replica exchange trace file must be used to extract the exact history of accepted swaps). Second, there is no guarantee that the graph remains intact (it may fracture into multiple, disconnected subgraphs), and this may impact the interpretability of the data in the aforementioned output files.

TRAJLINKSFILE

If any type of structural clustering is performed (→ CCOLLECT), or if the exact progress index-based algorithm is used (→ CMODE), the resultant trajectory is used to infer the properties of a network. This is relevant for output file STRUCT_CLUSTERING.graphml (the mesostate (cluster) network itself), for cut-based free energy profiles (kinetic information derived from network properties), and for the output of the progress index method. Essentially, the sequence of events in the trajectory defines a transition matrix. However, trajectory concatenation may give rise to scenarios where some links are spurious (→ TRAJBREAKSFILE) and others are missing, e.g., if multiple trajectories are branched off from a common starting point and simply appended for analysis purposes. This keyword can be used to add such missing links at the snapshot (frame) level. This function can overlap with keyword CADDLINKMODE, which operates at the cluster level. It also overlaps with the use of keyword TRACEFILE for managing the reseeding operations of a PIGS calculation, which is a type of simulation yielding such a set of branched trajectories. The input format is described in detail elsewhere. We emphasize that considerable care is required to manage the links in a conformational space network through the aforementioned keywords (TRAJLINKSFILE, TRAJBREAKSFILE, CADDLINKMODE, and TRACEFILE). This is mostly because data generation and post-processing are usually (and necessarily) separate operations, which makes it difficult to achieve a compromise between controllability and ease of use.

CEQUILIBRATE

If structural clustering is performed (→ CCOLLECT), which includes the case of the approximate progress index method, the resultant coarse-grained trajectory serves to define a network (graph) of clusters (vertices). Strongly connected components within this network, when treated as Markov state models, can have a well-defined steady state that corresponds to the network-based prediction for the equilibrium population of clusters. This keyword (set to 1) requests CAMPARI to compute the steady state and use the corresponding population statistics in subsequent output files such as STRUCT_CLUSTERING.graphml or files containing cut-based free energy profiles.
As alluded to, the complete graph may not be strongly connected or may even be fractured (→ TRAJBREAKSFILE). Any modification to the link (edge) structure of the network (→ CADDLINKMODE) can profoundly influence the steady state and any other network-derived properties. Even for a single continuous trajectory, the observed probability distribution in cluster space does not exactly agree with the network-derived prediction due to the imbalance caused by having a beginning and an end. Consequently, by default (setting of 0), some of the network-derived output may be inconsistent with the network itself. In case of multiple strongly connected components, use of CEQUILIBRATE requests that all the individual components be equilibrated separately (preserving their relative weights as set directly by the trajectory). In addition, the generation of cut-based free energy profiles will be reduced to the strongly connected component that the reference cluster resides in. Special rules for CPROGINDSTART can be used to request the computation of such profiles for all relevant components, always using the largest cluster within each component as reference.
The computation of the steady state uses an iterative algorithm that can become quite time-consuming due to slow convergence. The algorithm is numerically weak in that the convergence measure cannot accurately estimate the deviation of the current solution from the exact one, and in that the convergence properties can differ across the network. The algorithm does detect periodicity, which generally prevents convergence (the simplest example is a system of two mutually connected states with no self-transitions), and will eventually report this and terminate. Lastly, if CEQUILIBRATE is not used, CAMPARI will include the entire set of clusters with their observed weights in any subsequent analysis (which might become ill-defined as a result).
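
A minimal sketch of such a steady-state iteration follows (hypothetical names; CAMPARI's actual scheme and its convergence measure may differ), including the failure mode for periodic chains mentioned above:

  import numpy as np

  def steady_state(T, tol=1e-12, maxiter=1000000):
      # power iteration on a row-stochastic transition matrix of one
      # strongly connected component
      T = np.asarray(T, dtype=float)
      p = np.full(T.shape[0], 1.0 / T.shape[0])
      for _ in range(maxiter):
          p_new = p @ T
          if np.max(np.abs(p_new - p)) < tol:
              return p_new
          p = p_new
      # e.g., two mutually connected states without self-transitions
      # oscillate indefinitely and never satisfy the tolerance
      raise RuntimeError("no convergence; the chain may be periodic")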


