CAMPARI Keywords
Full Keywords Index:
 Parameter File:
 Random Number Generator:
 Simulation Setup:
 Box Settings:
 Integrator Controls (MD/BD/LD/Minimization):
 Move Set Controls (MC):
 Files and Directories:
 Structure Input and Manipulation:
 Energy Terms:
 Cutoff Settings:
 Parallel Settings (Replica Exchange (RE) and MPI Averaging):
 Output and Analysis:
Preamble
The overall setup of simulations becomes more involved as the number of options offered by simulation software grows, and CAMPARI is no exception. Not all settings are relevant in all circumstances (in fact, often very few are), and a complete understanding of all keywords is clearly not required to use subsets of CAMPARI's functionality. Users should keep the following points in mind:
 Most keywords have default choices. In case of doubt, check parsekey.f90 to locate the variable associated with the selection, and then initial.f90, allocate.f90, and sometimes other files for default assignments.
 Not all keywords can be connected and arranged such that they group nicely. The documentation here groups keywords into a small number of sections, some of which end up being very large. This has both advantages and disadvantages.
 For navigation, it is highly recommended to a) search for terms within the page with the help of the browser (all keywords are described within a single HTML page), and b) follow the links that are provided throughout.
 If an option is unclear, but easily testable, it is probably fastest to just try it out. If it is difficult to test, post a question on the SF forums.
 The understanding of many implemented, standard methodologies requires the corresponding literature. This is why a bibliography is provided.
 The fastest way to learn how to run basic simulations or perform trajectory analyses is to consult the various tutorials. Tutorials offer the chance to group information in a more natural workflow compared to the documentation here. They cannot explain all options in detail, though, and it is crucial to follow the links within the tutorial pages that point back to this and the other documentation pages.
Note on Nomenclature:
All keywords used by CAMPARI are named FMCSC_* where the different possible strings for "*" are explained below. This means that in your keyfile the correct keyword to use to specify the simulation temperature is FMCSC_TEMP and not just "TEMP". There are only two exceptions to this, viz. keywords PARAMETERS and RANDOMSEED. This has purely historical reasons (as does the ad libitum acronym "FMCSC").
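For illustration, a minimal key-file fragment might look as follows (the parameter file path and all values are placeholders):

```
PARAMETERS /path/to/campari/params/abs3.2_opls.prm
RANDOMSEED 12345
FMCSC_TEMP 298.0
FMCSC_NRSTEPS 100000
```

Note that only the first two keywords lack the FMCSC_ prefix.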
Parameter File Keywords:
PARAMETERS
This keyword allows the user to provide the location and name of the parameter file to be used for the simulation. The different files offered by default (shipped with CAMPARI) are listed below.
Custom Parameter Sets:
The parameter sets fmsmc*.prm are outdated and should be used with utmost caution. They contain no bonded parameters except dummy declarations and are therefore only suitable for torsional space calculations.
In general, the Lennard-Jones parameters for ions in these files require a cautionary note, as they are simply those from Åqvist's work. They have not been specifically parameterized to work together with the ABSINTH continuum solvation model in case a full Hamiltonian is used (they have merely been shown to reside on the "safe" side). This is a matter of ongoing development. It may be more appropriate to use parameters for ions that feature harder cores and better congruence between σ_{ii} parameters and actual contact distances.
fmsmc.prm:
These are basic parameters fit for simulations in the excluded volume ensemble. As Lennard-Jones parameters, they employ Hopfinger radii with generic (and generally small) interaction parameters. They contain a reduced charge set derived from the OPLS brand of force fields but are thoroughly unsuitable for simulations with "complete" Hamiltonians, if only because they lack support in many places.
fmsmc_exp.prm:
This file is identical to fmsmc.prm except that pairwise LJ terms (σ_{ij}) for pairs involving a polar atom and a polar hydrogen are specifically reduced. It also lacks support for phosphorus.
fmsmc_exp3.prm:
This file is identical to fmsmc_exp.prm except that LJ interaction parameters (ε_{ii}) are raised for polar heavy atoms (nitrogen and oxygen).
fmsmc_exp2.prm:
This file is identical to fmsmc_exp3.prm except that LJ size parameters (σ_{ii}) for common atoms are inflated to approximately 107%, which makes the parameter set more OPLS-AA-like in terms of LJ parameters.
abs3.2_opls.prm:
This file combines ABSINTH LJ parameters with the full OPLS-AA/L charges, including the Kaminski et al. revision. OPLS-AA/L's bonded parameters are only retained inasmuch as they are required to maintain quasi-rigid geometries (i.e., bond length and angle potentials, improper dihedral potentials, and torsional potentials around bonds with hindered rotation). Comparison to the reference parameter set may be useful. In addition, the free energies of solvation are reduced by ~30 kcal/mol for ionic groups on biomolecules. This is the file used for most published work employing the ABSINTH implicit solvation model thus far.
abs3.1_opls.prm:
This file is identical to abs3.2_opls.prm except that the free energies of solvation are not artificially lowered by ~30 kcal/mol for ionic groups on biomolecules.
abs3.2_charmm.prm:
This file combines ABSINTH LJ parameters with the full CHARMM charges from versions 22 (polypeptides) and 27 (polynucleotides), respectively. CHARMM's bonded parameters are only retained inasmuch as they are required to maintain quasi-rigid geometries (i.e., bond length and angle potentials, improper dihedral potentials, and torsional potentials around bonds with hindered rotation). Comparison to the reference parameter set may be useful. In addition, the free energies of solvation are reduced by ~30 kcal/mol for ionic groups on biomolecules. In conjunction with the ABSINTH implicit solvent model, CHARMM parameters probably offer the best combination of simplicity (small enough dipole groups) and completeness (support for both nucleotides and peptides as well as most terminal groups and some small molecules).
abs3.1_charmm.prm:
This file is identical to abs3.2_charmm.prm except that the free energies of solvation are not artificially lowered by ~30 kcal/mol for ionic groups on biomolecules.
abs3.2_a94.prm:
This file combines ABSINTH LJ parameters with the full AMBER charge set from the '94 revision (Cornell et al.). AMBER charges are generally not well-suited for use in conjunction with the ABSINTH paradigm, since the latter is most meaningful for small dipole groups with local neutrality. AMBER charges are determined by a more or less unconstrained QM fit and spread polarization across the (arbitrary) unit of each residue (see FMCSC_ELECMODEL). AMBER's bonded parameters are only retained inasmuch as they are required to maintain quasi-rigid geometries (i.e., bond length and angle potentials, improper dihedral potentials, and torsional potentials around bonds with hindered rotation). Comparison to the reference parameter set may be useful. In addition, the free energies of solvation are reduced by ~30 kcal/mol for ionic groups on biomolecules. Please refer to the details provided for the AMBER reference force fields below for answers concerning AMBER-specific implementation details of force field parameters.
abs3.1_a94.prm:
This file is identical to abs3.2_a94.prm
except that the free energies of solvation are
not artificially lowered by ~30 kcal/mol for ionic groups on
biomolecules.
abs3.2_a99.prm, abs3.1_a99.prm, abs3.2_a03.prm, abs3.1_a03.prm:
These files are analogous to abs3.2_a94.prm and abs3.1_a94.prm except that they incorporate AMBER parameters of revisions '99 (Wang et al.; abs3.2_a99.prm, abs3.1_a99.prm) and '03 (Duan et al.; abs3.2_a03.prm, abs3.1_a03.prm), respectively.
abs3.2_GR53a6.prm:
This file combines ABSINTH LJ parameters with full GROMOS 53a6 charges. Note that GROMOS 53 is a united-atom model and that aliphatic hydrogens (which do exist here) therefore carry no charge. This appears inconsistent, at least compared to other force fields, in which aliphatic hydrogens almost universally carry a small positive charge of less than 0.1e, but it speeds up simulations with screened electrostatic interactions considerably. Bonded parameters are only retained inasmuch as they are required to maintain quasi-rigid geometries (i.e., bond length and angle potentials, improper dihedral potentials, and torsional potentials around bonds with hindered rotation). Comparison to the reference parameter set may be useful. In addition, the free energies of solvation are reduced by ~30 kcal/mol for ionic groups on biomolecules.
abs3.1_GR53a6.prm:
This file is identical to abs3.2_GR53a6.prm
except that the free energies of solvation are
not artificially lowered by ~30 kcal/mol for ionic groups on
biomolecules.
abs3.2_GR53a5.prm and abs3.1_GR53a5.prm:
These files are analogous to abs3.2_GR53a6.prm and abs3.1_GR53a6.prm but for the a5 revision of the GROMOS 53 charge set.
Some recommended settings for using any of these custom parameter files are listed below. Note that these are also the settings required to achieve an exact match with the ABSINTH reference.
FMCSC_UAMODEL 0
FMCSC_INTERMODEL 1
FMCSC_ELECMODEL 2
FMCSC_MODE_14 1
FMCSC_FUDGE_EL_14 1.0
FMCSC_FUDGE_ST_14 1.0
FMCSC_EPSRULE 2
FMCSC_SIGRULE 1
FMCSC_SC_BONDED_B 0.0
FMCSC_SC_BONDED_A 0.0
FMCSC_SC_BONDED_T 0.0
FMCSC_SC_BONDED_I 0.0
FMCSC_SC_EXTRA 1.0
We do, however, recommend replacing FMCSC_SC_EXTRA being unity with FMCSC_SC_BONDED_T set to unity, since the above files will typically contain (unless otherwise noted) the required and "native" bonded potentials for each parent force field. This ensures better parameter coherence (the potentials used by SC_EXTRA are taken from OPLS-AA/L) and, more importantly, control over all torsional potentials (and bonded potentials in general) through the parameter file. If the system to be sampled contains proline residues, other flexible rings, or chemical cross-links, it will also be necessary to set FMCSC_SC_BONDED_A, FMCSC_SC_BONDED_B, and FMCSC_SC_BONDED_I to 1.0 to avoid obtaining nonsensical results.
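In key-file form, this recommended modification could read as follows (only the keywords that change relative to the list above are shown; the last three lines are only needed for prolines, other flexible rings, or chemical cross-links):

```
FMCSC_SC_EXTRA 0.0
FMCSC_SC_BONDED_T 1.0
FMCSC_SC_BONDED_A 1.0
FMCSC_SC_BONDED_B 1.0
FMCSC_SC_BONDED_I 1.0
```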
Reference Parameter Sets:
The parameter sets below attempt to be as complete as possible for the biopolymer types supported by CAMPARI. Support for small molecules (which often use derived parameters) will often be limited, but can easily be added by the user. In addition, rare and generally poorly parameterized biopolymer constructs (such as zwitterionic amino acids or free nucleosides) may have incomplete parameter portings, in particular of bonded parameters. If a perfect match with the reference implementation of a given parameter set cannot be achieved, this is stated explicitly.
oplsaal.prm (reference implementation: GROMACS 4.5.2)
This file provides full OPLS-AA/L parameters, i.e., it includes the Kaminski et al. revision of peptide torsions and sulphur parameters. Note that GROMACS 4.5.2 was used as the reference implementation (and not BOSS or MCPRO).
Required settings for emulating reference standard:
FMCSC_UAMODEL 0
FMCSC_INTERMODEL 2
FMCSC_ELECMODEL 1
FMCSC_MODE_14 2
FMCSC_FUDGE_EL_14 0.5
FMCSC_FUDGE_ST_14 0.5
FMCSC_SC_IPP 1.0
FMCSC_SC_ATTLJ 1.0
FMCSC_EPSRULE 2
FMCSC_SIGRULE 2
FMCSC_SC_POLAR 1.0
FMCSC_SC_BONDED_B 1.0
FMCSC_SC_BONDED_A 1.0
FMCSC_SC_BONDED_T 1.0
FMCSC_SC_BONDED_I 1.0
FMCSC_SC_EXTRA 0.0
FMCSC_IMPROPER_CONV 2
GROM53a6.prm, GROM53a5.prm (reference implementation: GROMACS 4.0.5)
These files provide full GROMOS 53 parameters. Torsional potentials for which the same biotype is attached multiple times to an axis atom are only approximately supported: the potential acting on just a single, arbitrary one of those atoms in the GROMACS reference implementation is replaced with proportionally reduced potentials acting on all of those atoms. This should be chemically more correct but prevents exact matches of torsional terms. The choice within GROMOS is motivated by computational efficiency, but the evaluation of torsional terms is not a time-critical execution component in almost all present-day simulations (and is trivially parallelizable). Moreover, cap and terminal residues may have been adjusted to use more consistent parameters (terminal and cap residues are generally not specifically parameterized in GROMOS from what we can tell, in particular for polynucleotides). GROMOS uses a rather specific interaction model and represents aliphatic CH_{n} moieties in united-atom representation. Note that revisions a5 and a6 only differ in a few partial charge parameters.
Required settings for emulating reference standard:
FMCSC_UAMODEL 1
FMCSC_INTERMODEL 3
FMCSC_ELECMODEL 1
FMCSC_MODE_14 2
FMCSC_FUDGE_EL_14 1.0
FMCSC_FUDGE_ST_14 1.0
FMCSC_SC_IPP 1.0
FMCSC_SC_ATTLJ 1.0
FMCSC_EPSRULE 2
FMCSC_SIGRULE 2
FMCSC_SC_POLAR 1.0
FMCSC_SC_BONDED_B 1.0
FMCSC_SC_BONDED_A 1.0
FMCSC_SC_BONDED_T 1.0
FMCSC_SC_BONDED_I 1.0
FMCSC_SC_EXTRA 0.0
amber94.prm, amber99.prm, amber03.prm (reference implementation: AMBER port in GROMACS 4.5.2)
These files provide full AMBER parameters in three different revisions, which differ mostly in their parameterization of torsional potentials for polypeptides. Note that support for terminal amino acid residues through the parameter file is marginal, since AMBER's charge set is so detailed that each atom in each terminal residue would have to be an independent biotype. Normal polypeptide caps are fully supported, however. To allow a more accurate emulation of the AMBER standard for terminal polypeptide residues, the charge patch functionality within CAMPARI can be used. We have tested this for a few examples and recovered 100% accurate matches to the AMBER standard that way. Keep in mind as well that the parameterization of terminal polymer residues is often the "sloppiest" component in a biomolecular force field, since their impact on overall conformational equilibria is deemed small. Note that we did not use the actual AMBER software in the porting.
Required settings for emulating reference standard (skipping any charge patches):
FMCSC_UAMODEL 0
FMCSC_INTERMODEL 2
FMCSC_ELECMODEL 1
FMCSC_MODE_14 2
FMCSC_FUDGE_EL_14 0.833
FMCSC_FUDGE_ST_14 0.5
FMCSC_SC_IPP 1.0
FMCSC_SC_ATTLJ 1.0
FMCSC_EPSRULE 2
FMCSC_SIGRULE 1
FMCSC_SC_POLAR 1.0
FMCSC_SC_BONDED_B 1.0
FMCSC_SC_BONDED_A 1.0
FMCSC_SC_BONDED_T 1.0
FMCSC_SC_BONDED_I 1.0
FMCSC_SC_EXTRA 0.0
FMCSC_IMPROPER_CONV 2
charmm.prm (reference implementation: CHARMM35b2)
This file provides access to simulations employing the full CHARMM parameters as provided in parameter set 27 for polypeptides and polynucleotides. CMAP corrections for polypeptides are supported and included. Note that <ABSINTH_HOME> should be the exact same directory specified in the localization of the Makefile (see installation instructions). To simulate polynucleotides with 5'-phosphate groups using 100% authentic CHARMM parameters for the terminal phosphate, the charge patch functionality within CAMPARI has to be used. The same applies to the polarization of the hydrogen atoms in the NH_{2} groups of guanine and cytosine (this is a much smaller effect, though; also compare FMCSC_AMIDEPOL). This has been tested thoroughly.
Required settings for emulating reference standard:
FMCSC_UAMODEL 0
FMCSC_INTERMODEL 2
FMCSC_ELECMODEL 1
FMCSC_MODE_14 2
FMCSC_FUDGE_EL_14 1.0
FMCSC_FUDGE_ST_14 1.0
FMCSC_SC_IPP 1.0
FMCSC_SC_ATTLJ 1.0
FMCSC_EPSRULE 2
FMCSC_AMIDEPOL -0.01 # or 0.01
FMCSC_SIGRULE 1
FMCSC_SC_POLAR 1.0
FMCSC_SC_BONDED_B 1.0
FMCSC_SC_BONDED_A 1.0
FMCSC_SC_BONDED_T 1.0
FMCSC_SC_BONDED_I 1.0
FMCSC_SC_BONDED_M 1.0
FMCSC_CMAPDIR <ABSINTH_HOME>/data
FMCSC_SC_EXTRA 0.0
In order to create a new parameter file, it is advisable to start
with "template.prm". For details on the paradigms underlying the
construction of a parameter file, consult the detailed documentation on this topic.
Random Number Generator Keywords:
(back to top)
RANDOMSEED
This keyword allows the user to provide a specific seed for the PRNG. This is usually relevant in two contexts:
Reproducibility:
Eliminate mismatches between different versions of the program (for example) by performing the stringent test that the results must be exactly the same if the PRNG is seeded with the same value. Such tests may occasionally be hampered by a lack of precision in input files and, in particular, by different compiler/architecture optimization levels.
Timing:
Eliminate identical calculations if jobs are submitted simultaneously. Normally, the PRNG uses a seed derived from system time, which can be identical if jobs are submitted exactly in parallel. Avoiding this behavior by specifying different values for RANDOMSEED is only adequate if the jobs are indeed submitted as individual, serial jobs. Conversely, in intrinsically parallel applications (MPI), CAMPARI uses the node number to vary the seed across different nodes unless RANDOMSEED is specified. This means that a provided value for RANDOMSEED will homogenize the PRNG across all replicas, which is almost always undesirable.
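For example, two serial jobs submitted at the same time could use key-files differing only in their seeds (the values are arbitrary):

```
# job1.key
RANDOMSEED 1001

# job2.key
RANDOMSEED 2002
```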
Simulation Setup:
(back to top)
UAMODEL
This keyword is a simple but very important switch. It allows the user to control whether non-polar hydrogens are going to be part of the system's topology or not. In particular in earlier simulation work, it was a common and convenient trick to improve simulation efficiency by uniting all atoms of a methyl or methylene group into a single, coarse-grained "united atom". Different force fields used or use different varieties of this trick. In the GROMOS line of force fields, for instance, all aliphatic hydrogen atoms are merged into the carbon atoms they are bonded to. The CHARMM19 protein force field additionally eliminates non-polar hydrogens bound to sp^{2}-hybridized carbon atoms in aromatic rings.
Unlike other simulation software, CAMPARI maintains a complete internal "knowledge" of the biomolecular topology of those systems it allows the user to build from scratch. Therefore, choosing between all- or united-atom models is not simply a matter of parameter files (although it is possible to create inefficient united-atom variants of force fields by disabling all interaction parameters pertaining to the required hydrogens). Instead, the software itself requires knowledge of this choice.
Choices are:
 Use an all-atom model for those molecules represented explicitly.
 Use a united-atom model according to the GROMOS convention, i.e., all aliphatic hydrogen atoms are merged into the carbon atoms they are linked to (this does include terminal aldehyde hydrogen atoms).
 Use a united-atom model according to the CHARMM19 convention, i.e., all aliphatic and all aromatic hydrogens bound to carbon atoms are merged into the latter.
Outside of simulations using the GROMOS force field, and outside of future extensions to support CHARMM19, this keyword is most useful when using CAMPARI to analyze trajectory data generated by other software with such a united-atom force field. Such a run would not tolerate atom number mismatches between the internal representation of the system and what is found in the binary trajectory files (mismatches are acceptable only if the input format is pdb → see below). Note that this keyword has no impact on systems involving residues not supported natively by CAMPARI (→ sequence input and PDB input).
PDBANALYZE
This keyword is a simple but very important logical. It specifies whether the proposed simulation is a trajectory analysis run: in these, a pdb (or xtc, dcd, NetCDF) trajectory is read from file and analyzed with CAMPARI's internal analysis routines. The desired format is chosen with keyword PDB_FORMAT. All outputs and parameters are completely analogous to normal calculations. Essentially, the snapshot read-in replaces the sampling step. This means that low analysis frequencies will be desirable, since the number of snapshots will usually be small compared to the number of steps in a typical simulation. Note that, in particular for large systems (> 10^{4} atoms), the analysis run may be slowed down by:
 Certain time-consuming analyses scale poorly with the number of atoms (solution structure analyses, see for example PCCALC or CLUSTERCALC).
 At each step, the global system energy is calculated using, depending on the setting for DYNAMICS, either CAMPARI's energy (MC) or force (MD/LD) routines, making little to no simplifying assumptions. To ensure decent speed, this may require setting the system Hamiltonian to zero (see below) and/or using an efficient cutoff / neighbor-list routine (see CUTOFFMODE).
 Very large files, in particular in pdb format, may cause memory shortages which slow down the machine entirely. In general, binary trajectory files in conjunction with an optional template file are the preferred and much faster way of performing analysis runs.
When using an MPI executable of CAMPARI in parallel, it is also possible to perform trajectory analysis across many processors. This uses the replica exchange setup and is described in detail elsewhere. The three primary applications are the simultaneous analysis of several trajectories, the unscrambling of replica exchange trajectories that are normally output continuously for a given condition, and the post facto computation of energetic overlap distributions. Specific analysis routines (such as DSSP analysis) may be restricted to specific types of residues, and this may limit the utility of these routines for entities that are not natively supported by CAMPARI (see sequence input). In general, analysis runs on systems featuring unsupported residues should be relatively straightforward, at least as long as no energetic analyses are required (which naturally entail the complex issue of parameterization).
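As a sketch, the keywords above could be combined into a trajectory analysis key-file fragment as follows (the format choice and snapshot count are placeholders, and zeroing FMCSC_SC_IPP is shown merely as one way of disabling parts of the Hamiltonian if no energetic analyses are needed):

```
FMCSC_PDBANALYZE 1
FMCSC_PDB_FORMAT 1     # placeholder; see PDB_FORMAT for the pdb/xtc/dcd/NetCDF numbering
FMCSC_NRSTEPS 10000    # total number of snapshots in the trajectory
FMCSC_EQUIL 0          # analyze all snapshots
FMCSC_SC_IPP 0.0
```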
NRSTEPS
This keyword sets the total number of simulation steps including equilibration.
EQUIL
This keyword specifies the total number of equilibration steps. This implies that no analysis is performed as long as the current step number does not exceed this value. Note that this also means that no structural output (trajectory) is produced. However, certain necessary diagnostics are provided irrespective of equilibration (see for example ENOUT or ACCOUT).
TEMP
This keyword sets the absolute (target) temperature in K.
PRESSURE
This keyword allows the user to specify the absolute (target) pressure in bar (not yet in use).
ENSEMBLE
This crucial keyword determines which ensemble to simulate the system in. The options available are limited in that they depend strongly on the type of sampler (i.e., there is no NVE (microcanonical) ensemble if sampling is done via Monte Carlo → DYNAMICS). The options are as follows:
1) NVT (Constant Particle Number, Constant Volume, Constant Temperature):
Always available, this is the canonical ensemble and currently the only option available for pure Monte Carlo runs.
2) NVE (Constant Particle Number, Constant Volume, Constant Energy):
The microcanonical ensemble (adiabatic conditions) is only supported (and possible) for non-dissipative, i.e., Newtonian dynamics (see option 2 in DYNAMICS).
5) μ_{i}VT (Constant Chemical Potential(s), Constant Volume, Constant Temperature):
This requests the grand canonical ensemble, where the number of particles in the system is allowed to fluctuate. Subscript i indicates that not all particle types may be subject to number fluctuation (typical, for example, in the simulation of macromolecules and a (co)solvent atmosphere for which only the small molecule would be treated in "grand" fashion). This implies that technically incorrect hybrid ensembles are populated (sometimes referred to as "partially grand" ensembles). The rigorous grand canonical ensemble would require all particle types to be permitted to fluctuate in number. Such partially grand ensembles are not to be confused with the "semi-grand" ensemble (see below). Technically, the GC ensemble is realized in CAMPARI by allowing molecules to transfer between a real and a shadow existence, the latter also serving as the reference state. The discreteness of transitions between shadow and real existence implies that currently the grand ensemble is only available in pure Monte Carlo simulations. Note that currently the reference state is modeled in the infinite dilution limit (there are no intermolecular interactions). This is consistent with the default implementation choice (→ GRANDMODE), in which the bath communicates with the system via an expected bulk concentration and an excess chemical potential correcting for the interactions arising from that finite bulk concentration.
6) Δμ_{i}N_{t}VT (Constant Chemical Potential Difference(s), Constant Total Particle Number, Constant Volume, Constant Temperature):
This requests the semi-grand ensemble as originally formulated by Kofke and Glandt (1988), in which particle types are allowed to fluctuate in number under the constraint that the total particle number (N_{t}) remains constant. Just like for the μ_{i}VT ensemble, CAMPARI allows the definition of partial semi-grand ensembles in which, for example, a bath of water and methanol solvating a macromolecule is subjected to moves attempting to transmute methanol into water or vice versa. Note that the number of real-world applications for which such an ensemble is appropriate is very small. Technically, the constraints to keep N_{t} fixed may improve acceptance rates in dense fluid mixtures. For both options (5 and 6), please refer to the documentation for the particle fluctuation file, specified using PARTICLEFLUCFILE, for details. Note that the sanity of results obtained with any partial grand or semi-grand ensemble must be investigated with utmost care.
To be added in the future:
3) NPT (Constant Particle Number, Constant Pressure, Constant Temperature):
May eventually be made available for MC and Newtonian MD runs.
4) NPE (Constant Particle Number, Constant Pressure, Constant Enthalpy):
May eventually be made available for Newtonian MD runs.
Note to developers: there is rudimentary support for NPT and NPE ensembles in CAMPARI right now but those branches are completely disabled.
GRANDREPORT
If an ensemble is chosen that allows particle number fluctuations, this keyword acts as a simple logical determining whether or not to write out a summary of the grand canonical setup, i.e., which particle types are allowed to fluctuate in number, what the initial numbers (bulk concentrations) are, and which (excess) chemical potentials are associated with them.
GRANDMODE
If an ensemble is chosen that allows particle number fluctuations, this keyword chooses between two different implementation modes. In the first (choice 1), file input is used to provide CAMPARI with the initial numbers and absolute chemical potentials of fluctuating particle types. This is generally inconvenient for cases with realistic interaction potentials and/or multiple fluctuating particle types that require coupled chemical potentials (such as individual ionic species). The bulk concentrations are set implicitly by the chemical potentials. This formulation involves the "thermal volume" of particles, meaning that a monoatomic ideal gas will require a mass-dependent chemical potential. In the second option (choice 2, which is the default), the same file input is used to set the bulk concentration explicitly (based on the initial particle number provided), and the chemical potentials listed are merely the excess terms. This formulation involves no mass-dependent terms, is numerically more stable (accuracy of exponentials), and provides an easy reference limit for dilute solutions (zero excess chemical potential).
To illustrate the difference in implementation, consider the additional contribution to the acceptance probability (term c_{b} in the description of keyword MC_ACCEPT) of a particle insertion attempt:
Mode 1:
c_{b} = e^{βμ_{ideal}} · e^{βμ_{excess}} · V · (N+1)^{-1} · ζ^{-1}
Here, V is the system volume, N is the current number of particles of the type to be inserted, μ_{ideal} and μ_{excess} are the components of the chemical potential, and ζ is the aforementioned thermal volume.
Mode 2:
c_{b} = e^{βμ_{excess}} · <N> · (N+1)^{-1}
This equation contains the expected bulk concentration as <N>.
While numerically the two cases can be made equivalent, the latter contains a self-consistency check in that the measured <N> can be compared to the assumed <N> given the chosen μ_{excess}. In the former, the assumed <N> is unknown, because the partitioning between μ_{ideal} and μ_{excess} is not explicit. For a single-component system (or a system with multiple independent components), the measured <N> can be used to derive the μ_{excess} that the simulation effectively corresponded to. With dependent components, however, this becomes very difficult to adjust. For general calibration strategies of excess chemical potentials and background, see references.
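To make the comparison concrete, the two acceptance factors can be sketched in a few lines of Python (a minimal illustration; the function names and numerical values are invented for this example, and the thermal volume ζ is simply passed in as a number):

```python
import math

def cb_mode1(beta, mu_ideal, mu_excess, V, N, zeta):
    # Mode 1: absolute chemical potential split into ideal and excess parts;
    # the thermal volume zeta makes the ideal part mass-dependent.
    return (math.exp(beta * mu_ideal) * math.exp(beta * mu_excess)
            * V / ((N + 1) * zeta))

def cb_mode2(beta, mu_excess, n_expected, N):
    # Mode 2 (default): the bath enters through the expected bulk particle
    # number <N>; only the excess chemical potential appears explicitly.
    return math.exp(beta * mu_excess) * n_expected / (N + 1)

# The two modes coincide when <N> = exp(beta*mu_ideal) * V / zeta:
beta, mu_id, mu_ex, V, zeta, N = 1.0, -2.0, 0.5, 1000.0, 8.0, 25
n_exp = math.exp(beta * mu_id) * V / zeta
assert abs(cb_mode1(beta, mu_id, mu_ex, V, N, zeta)
           - cb_mode2(beta, mu_ex, n_exp, N)) < 1e-9
```

In the dilute reference limit of mode 2 (μ_{excess} = 0), c_{b} reduces to <N>/(N+1), i.e., insertions are driven purely by the deviation of the current particle number from the target bulk value.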
DYNAMICS
This is one of the core keywords and specifies how to sample the system:
1) Pure Monte Carlo sampling (see keyword MC_ACCEPT and the section on Monte Carlo move sets)
2) Molecular Dynamics:
Integration of Newton's equations of motion. In internal coordinate space (see CARTINT), this is fully supported, but it is based upon an unpublished algorithm. Some more details are found in the documentation to keyword TMD_INTEGRATOR. Specifically:
 Dynamics are performed on internal degrees of freedom, which are assumed to be independent (rigid-body translation, rotation around the cardinal x, y, and z axes of the static laboratory frame centered at the center of mass of each molecule, and torsional degrees of freedom).
 Dynamics for polymers vary along the chain (faster at the termini) as they should, but this does not happen in any fashion proven to comply rigorously with a specific dynamics. By altering the chain alignment mode, more exotic dynamics can be produced. This is because the building directions of any polymer chains represent an arbitrary choice in the method.
 By assuming a diagonal mass (inertia) matrix (viz., a block of the mass metric tensor), the applicability of simple integrators is a given. In the absence of interaction-based forces, the goal is to preserve rotational kinetic energy (but not angular momentum) by treating the effective masses associated with the various rotational degrees of freedom as time-dependent variables in a discrete integration scheme. This treatment is intrinsically consistent, and agreement with data obtained from Monte Carlo simulations has been shown (for select cases). However, no generalized proof exists for thermodynamic averages obtained with this method. CAMPARI provides a simple diagnostic of the impact of assuming a diagonal mass matrix by printing kinetic energies in both internal and Cartesian coordinates to log output.
 Because the algorithm does not produce dynamics that obey Gauss' principle of least constraint or conserve angular momentum, integrator stability can be inferior to that for a case of identical constraints realized as holonomic constraints in Cartesian molecular dynamics. This effect cannot always be quantified since the holonomic constraints implied by the internal coordinate space treatment often become too highly coupled (→ SHAKEMETHOD). Select cases with quickly varying masses highlight the effect, and the most significant example is probably rigid-body simulations of water (water has tiny rotational inertia and is a prototypical test case for rigid-body integrators). Quantification of relative integrator stabilities for such a case can be performed.
 Subtle equipartition artifacts (i.e., some individual or collective degrees of freedom heating up at the expense of others because they are either more susceptible to integration error or weakly coupled to the rest of the system) can always occur. Effects differ between internal coordinate and Cartesian treatments. This is because dihedral angles will generally have a rather different level of energetic coupling and integration stability than the positional coordinates of an atom embedded in a polyatomic molecule.
Sampling all Cartesian coordinates of all atoms represents the more canonical approach to molecular dynamics. These algorithms are conceptually much simpler and hence, from a theoretical point of view, more robust. In practice, however, an entire construct of additional procedures is needed in almost all cases, for example the enforcement of holonomic constraints through appropriate algorithms such as SHAKE or LINCS. Most three-point water models are explicitly calibrated as rigid models, and it is therefore necessary to maintain water geometry as a set of holonomic constraints throughout a Cartesian dynamics simulation. However, most of these procedures aim at improving simulation efficiency and overcoming inherent time step limitations in the Cartesian treatment.
3) Langevin Dynamics: Integration of the Langevin equation of motion is supported via the impulse integrator due to Izaguirre and Skeel (reference). With respect to the torsional dynamics implementation, the same caveats apply as for Newtonian dynamics. There is an additional limitation in that the only implementation currently supported is an approximate scheme (corresponding to keywords TMD_INTEGRATOR being 2 and TMD_INT2UP being 0). This is because the structure of the impulse integrator is more complex, thus allowing a straightforward extension to our torsional dynamics only for the simplest case (research in progress).
Note that all LD simulations work in the fluctuation-dissipation limit, which means that all degrees of freedom are automatically coupled to a heat bath, and which assumes an underlying continuum providing frequent collisions as the source of the stochastic term as well as the frictional damping. In addition, note that hydrodynamic interactions are neglected and that currently there is only a single, uniform frictional parameter for all degrees of freedom (see FRICTION). The latter is a major and non-obvious assumption in internal coordinate spaces featuring polymers with flexible dihedral angles. This is because it is not clear what the frictional drag incurred by rotations around molecular bonds is and what the results of ignoring communication between these drag effects are.
5) Mixed Monte Carlo and Newtonian (Molecular) Dynamics: This hybrid method mixes MC with MD sampling
and assumes consistency of ensembles at all times. Since MC sampling only
supports the canonical ensemble at the moment, this means that Newtonian MD has
to be performed with a thermostat preserving the correct ensemble, e.g.,
the Andersen or Bussi et al. schemes.
Then, the entire trajectory should be treatable as a Markov chain and
analysis is performed as if the sampling engine were one of the two.
A potential caveat lies in velocity autocorrelation. The method is implemented such that segments of MC sampling alternate with MD segments. Upon switching from MC to MD, new velocities are assigned from the proper Boltzmann distribution. This may introduce some amount of noise. Aside from this particular concern, all independent concerns about both Monte Carlo and dynamics-based methods apply. It is up to the user to ensure that either sampler yields the required ensemble rigorously.
A particular concern lies with the selection of degrees of freedom. In general, it will be highly desirable for the set of sampled degrees of freedom to be exactly identical between the two samplers. This is not always possible, however, e.g., when sampling sugar pucker angles in MC, but not in dynamics. In these scenarios it will be desirable to use short segment lengths in order to improve the chances of convergence (in the given example, convergence is unlikely if long dynamics segments only "see" few frozen conformations of the sugar pucker states in the system). This issue is particularly difficult in mixed Cartesian/internal coordinate space simulations attainable by selecting a hybrid scheme here and 2 for CARTINT. Some improvement can be made by including geometric constraints in Cartesian space, but a rigorous match will generally be out of reach.
Technically, the simulation simply alternates between MC-based and dynamics-based segments whose minimum and maximum lengths are controllable by the user (→ keywords CYCLE_MC_FIRST, CYCLE_MC_MIN, CYCLE_MC_MAX, CYCLE_DYN_MIN, and CYCLE_DYN_MAX).
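The alternation scheme can be sketched as follows; the uniform draw of each segment length between its minimum and maximum bounds is an illustrative assumption, not necessarily CAMPARI's actual rule:

```python
import random

def hybrid_schedule(total_steps, mc_first, mc_min, mc_max, dyn_min, dyn_max, seed=1):
    """Sketch of how a hybrid run might alternate MC and dynamics segments.
    The uniform draw between the min/max bounds is an assumption made for
    illustration only; CAMPARI's actual segment-length rule may differ."""
    rng = random.Random(seed)
    segments = [("MC", mc_first)]   # the first segment is always MC
    steps = mc_first
    use_mc = False                  # a dynamics segment comes next
    while steps < total_steps:
        lo, hi = (mc_min, mc_max) if use_mc else (dyn_min, dyn_max)
        length = min(rng.randint(lo, hi), total_steps - steps)
        segments.append(("MC" if use_mc else "DYN", length))
        steps += length
        use_mc = not use_mc
    return segments
```

For a 10000-step run, `hybrid_schedule(10000, 500, 100, 400, 800, 2000)` yields a list of alternating ("MC", n)/("DYN", n) segments that sum to the total step number.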
6) Minimization:
This uses the potential energy gradient to
steer the system to a near minimum through a
variety of techniques (see MINI_MODE).
Minimization is not a technique to sample phase space in terms of a well-defined ensemble, and the closest approximation of its results is probably that of a locally sampled constant-volume (NVT) condition at extremely low temperature. In general, minimizers are adept at finding local, but not global, minima. Note that these algorithms are still numerically discrete schemes, i.e., they employ finite step sizes. This means that irrespective of any theoretical guarantees or expectations an algorithm offers, results may not always be as straightforward. In addition, minimizers are poor tools if the basic step sizes should be heterogeneous for different degrees of freedom, e.g., for a dilute phase of Lennard-Jones atoms or clusters.
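The local (rather than global) character of minimization can be illustrated with a minimal fixed-step steepest-descent sketch (not CAMPARI's MINI_MODE machinery) on a double-well potential; which minimum is found depends entirely on the starting point:

```python
def steepest_descent(grad, x, step=0.01, gtol=1e-6, max_iter=100000):
    """Generic fixed-step steepest descent: not CAMPARI's actual minimizer,
    just an illustration of gradient-driven relaxation to a *local* minimum."""
    for _ in range(max_iter):
        g = grad(x)
        if max(abs(gi) for gi in g) < gtol:
            break
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

# double-well U(x) = (x^2 - 1)^2 has minima at x = -1 and x = +1;
# the outcome depends entirely on the starting point (local, not global search)
grad = lambda x: [4 * x[0] * (x[0]**2 - 1)]
```

Starting from x = 0.5 relaxes to the minimum at +1, starting from x = −0.5 to the one at −1; the global structure of the landscape is never explored.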
7) Mixed Monte Carlo and Langevin Dynamics: This is analogous to 5) except that Newtonian dynamics are replaced with Langevin dynamics (see 3) (example reference).
To be added in the future are:
4) Brownian Dynamics
Note that in all of the above methods relying on forces (options 2-7), it is very likely that optimized loops will be used (depending on settings for the Hamiltonian). These currently have the property of using stack-allocated array variables that may become large if cutoff settings are very generous or if no cutoffs are in use. This may lead to unannotated segmentation faults (depending on compiler, architecture, and local settings). There are several workarounds (on Unix systems, the shell command "ulimit" can for example be used to increase stack size for the local environment), some of which will be compiler-specific (for example to force the compiler to always allocate local arrays from the heap). Stack access is faster and therefore generally desirable in the speed-critical portions of the code.
MC_ACCEPT
If the simulation uses (at least partially) Monte Carlo sampling, this very important keyword allows the user to choose between (currently) three different types of acceptance rules for MC moves, which are as follows:
 The Metropolis criterion is used. A random number sampled uniformly over the unit interval is compared to the term c_{b}·e^{−β ΔU}. Here, ΔU is the difference in (effective) energy of the new vs. the original conformation (U_{new} − U_{old}), β is the inverse temperature, and c_{b} is a bias correction factor that is specific to the move type. If the random number is less than the term above, the move is accepted. Note that c_{b} can encompass different types of bias. It is also important to keep in mind that some advanced move types may imply incorporating biasing terms during the picking of a new conformation (see TORCRMODE), which then no longer show up in c_{b}. The Metropolis criterion has the advantage that it is rejection-free in the limit of no energetic or other biases. With a nonzero energy function in place, the distribution sampled from is the Boltzmann distribution.
 A Fermi criterion is used. A random number sampled uniformly over the unit interval is compared to the term (1 + c_{b}^{−1}·e^{β ΔU})^{−1}. If the random number is less than the term above, the move is accepted. The Fermi criterion's only advantage over the Metropolis criterion is that it defines an actual probability on the interval [0,1]. The downside is that the limiting acceptance rate is only 50%. However, the impact is much weaker if ΔU is relatively large on average (in absolute magnitude). The sampled distribution is again the Boltzmann distribution.
 A Wang-Landau / Metropolis criterion is used. A random number sampled uniformly over the unit interval is compared to the term c_{b}·e^{−Δln T} or to the term c_{b}·e^{−β ΔU − Δln T} (see keyword WL_MODE). Here, Δln T is the difference in the logarithms of the current and proposed estimates of the target distribution (e.g., the density of states), i.e., Δln T = ln T_{new} − ln T_{old}. The Wang-Landau algorithm is explained in detail elsewhere, but it should be pointed out that the sampled distribution is no longer the Boltzmann distribution (instead it is ill-defined, and the simulation results require snapshot-based reweighting), the simulation does not satisfy detailed balance (the estimate of the density of states changes continuously), and convergence/errors are much more difficult to assess (since the method is essentially an iteration and not an equilibrium sampling scheme). It is crucial to keep in mind that the standard Metropolis criterion is used while the simulation has not exceeded the number of equilibration steps. This is mostly to avoid range problems when starting from random initial configurations.
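The three acceptance rules can be sketched schematically as follows (a stand-in for illustration, not CAMPARI code; `c_b` and `d_lnT` correspond to the bias factor and Δln T described above):

```python
import math, random

def accept(rule, dU, beta, c_b=1.0, d_lnT=0.0, rng=random.random):
    """Sketch of the three MC acceptance rules. c_b is the move-type-specific
    bias factor; d_lnT = ln T_new - ln T_old for the Wang-Landau case."""
    if rule == "metropolis":       # compare random number to c_b * exp(-beta*dU)
        p = c_b * math.exp(-beta * dU)
    elif rule == "fermi":          # a true probability bounded on [0, 1]
        p = 1.0 / (1.0 + math.exp(beta * dU) / c_b)
    elif rule == "wang-landau":    # flat-histogram variant without the energy term
        p = c_b * math.exp(-d_lnT)
    else:
        raise ValueError(rule)
    return rng() < p
```

Note how the Fermi rule evaluates to exactly 0.5 at ΔU = 0 (the 50% limiting acceptance rate mentioned above), whereas the Metropolis term exceeds 1 for any downhill move.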
FRICTION
This keyword allows the user to specify the uniform damping coefficient acting on all degrees of freedom. The value is interpreted to be in ps^{−1}. Currently, this is only relevant if DYNAMICS is set to either 3 or 7. In Langevin dynamics, the velocity damping through friction is given by e^{−γ·δt}. Here, γ is the damping coefficient, and δt is the integration time step (see TIMESTEP). Note that in Cartesian dynamics (see CARTINT) each degree of freedom is an orthogonal direction of the Cartesian movement of each atom. Typically, Langevin dynamics integrators may make the friction on those degrees of freedom dependent on atom mass, but CAMPARI does not support this at the moment since the hydrodynamic properties of individual atoms are poorly described in any case. Conversely, in torsional dynamics, the rigid-body and torsional degrees of freedom of each molecule are integrated, and the friction is applied uniformly to all of those. This means that hydrodynamic properties are, again, ill-represented. Bias torques on account of variable effective masses for most dihedral angle degrees of freedom will continue to be in effect (see elsewhere). When applying Stokes' law (which should be inapplicable when the diffusing object is strongly aspherical and/or of similar size compared to the molecules comprising the surrounding fluid) to the self-diffusion of water, the measured diffusion constant of around 2.3·10^{−9} m^{2}·s^{−1} is roughly consistent through the Einstein-Stokes equation with the measured viscosity of about 8.9·10^{−4} kg·m^{−1}·s^{−1} (both at 25°C). By dividing by the mass, a damping constant of about 90 ps^{−1} can be obtained from the Stokes approximation. When performing stochastic dynamics simulations of large, spherical rigid bodies, such a value may be appropriate. For molecular simulations, however, it is not.
First, in conjunction with typical time steps, the value is so large that the impulse integrator in use (→ DYNAMICS) can no longer sample the correct ensemble (it becomes overdamped, implying temperature artifacts). Second, in a Cartesian treatment, unless one samples a monoatomic fluid of inert particles, the correlations between particles are so high that a treatment as independently diffusing spheres is not just inaccurate, but nonsensical in the absence of hydrodynamic interactions. Third, in internal coordinate spaces, the individual degrees of freedom hardly ever fit the Stokes approximation. Torsional and rigid-body rotational degrees of freedom would require a completely different model of friction. Furthermore, unlike in a Cartesian treatment, the degrees of freedom are not all similar to one another. The above means that the damping constant should be understood as an empirical parameter. Better control over values for individual degrees of freedom will be implemented in the future. It defaults to a value of 1.0 ps^{−1}, on par with the coupling times of thermostats in molecular dynamics (→ TSTAT_TAU).
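The order of magnitude of the quoted damping constant can be checked with a short calculation. The Einstein relation γ = kT/(mD) is used here as a stand-in for the Stokes-type estimate, so the precise prefactor (and hence roughly 60 vs. the quoted ~90 ps^{−1}) depends on the assumed hydrodynamic radius:

```python
import math

# Rough check of the damping-constant estimate for water quoted above.
# gamma = kT/(m*D) (Einstein relation) stands in for the Stokes estimate;
# the exact value depends on the assumed hydrodynamic radius, so ~90 ps^-1
# in the text vs. ~60 ps^-1 here is the same order of magnitude.
kB = 1.380649e-23                    # J/K
T  = 298.15                          # K
D  = 2.3e-9                          # m^2/s, water self-diffusion at 25 C
m  = 18.015e-3 / 6.02214076e23       # kg, mass of one water molecule

gamma = kB * T / (m * D)             # 1/s
gamma_ps = gamma * 1e-12             # in ps^-1, roughly 6e1

# velocity damping per step, e^{-gamma*dt}, for FRICTION = 1.0 ps^-1
dt = 0.002                           # ps, a typical integration time step
damping = math.exp(-1.0 * dt)        # very close to 1, i.e., weak damping
```

With the default FRICTION of 1.0 ps^{−1} and a 2 fs time step, the per-step velocity damping factor is ~0.998, i.e., far from the overdamped regime that a Stokes-derived value would produce.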
CYCLE_MC_FIRST
If a hybrid MC/M(B,L)D method is used (see DYNAMICS), this keyword controls the length of the first segment (in number of steps), which is always an MC segment. This is to ensure that hybrid runs can safely be started from poorly equilibrated (random) structures where forces are large and integrators quickly become unstable.
CYCLE_MC_MIN
If a hybrid MC/M(B,L)D method is used (see DYNAMICS), this keyword controls the minimum length of MC segments (in number of steps) with the exception of the first segment.
CYCLE_MC_MAX
If a hybrid MC/M(B,L)D method is used (see DYNAMICS), this keyword controls the maximum length of MC segments (in number of steps) with the exception of the first segment.
CYCLE_DYN_MIN
If a hybrid MC/M(B,L)D method is used (see DYNAMICS), this keyword controls the minimum length of dynamics-based segments (in number of steps). This should probably be significantly larger than the velocity autocorrelation time of the system.
CYCLE_DYN_MAX
If a hybrid MC/M(B,L)D method is used (see DYNAMICS), this keyword controls the maximum length of dynamics-based segments (in number of steps).
PH
This keyword sets the assumed simulation pH which currently possesses significance for titration moves only (→ PHFREQ). This keyword may later be extended to represent the assumed (bath) pH in constant-pH simulations.
IONICSTR
This keyword sets the assumed simulation ionic strength for simplified pK_{a} computations. The units are molar (M). The ionic strength is used in a grossly simplified Debye-Hückel approach to estimate cross-influences between multiple ionizable side chains on a polypeptide (see PHFREQ). Note that this keyword cannot be used to set an assumed ionic strength for the generalized reaction-field method (see RFMODE).
RESTART
A simple logical whether to restart a previously discontinued run. This keyword tells the program to attempt to restart a simulation which was accidentally or intentionally terminated. The program writes out ASCII files containing all relevant information in high precision (see RSTOUT). This file (one for each node in MPI calculations) is called {basename}.rst (see elsewhere). If it is successfully read, the simulation is extended from the simulation step the file was last written for. Non-synchronous MPI runs are synchronized to the step number of the slowest node. Note that instantaneous output of the crashed run should be saved separately (i.e., moved to another directory) since, with the exception of running trajectory pdb/xtc/dcd output, new files will replace the old ones. All non-instantaneous analysis of the crashed run is unfortunately lost. The simulation will then proceed starting effectively at that step, so the same key file (with the exception of the RESTART keyword itself, of course) can be used. If it is past the equilibration step, on-the-fly analysis will begin immediately. Final output will reflect only the restarted portion of the run. The program will acknowledge in the log file that it is restarting, and will post a warning message if the energies of the structures reported in the restart file and recomputed by the program are inconsistent. Note that it is, rigorously speaking, only safe to restart the exact same calculation, since the information contained in the restart file will depend on the type of calculation performed. It will often be possible to start MC runs (see DYNAMICS) from a non-MC restart file, however. For the opposite and all other cases, consider using the auxiliary keyword RST_MC2MD. Finally, it should be noted that in dynamics calculations the restart is not fully deterministic, i.e., it deviates from the original run (which is typically unknown) after a few thousand steps depending on the system. The reasons for this behavior mostly lie in the limited precision of the data in the restart file.
RST_MC2MD
This is a rather specialized keyword meant for the specific case of (re)starting a dynamics run from a restart file generated by an MC run. In this case, the restart file is shorter and only contains atomic positions, the Z-matrix, and whatever else is necessary. When set to 1, this keyword instructs the restart-file reader to assume the MC format even though the run is set to be a dynamics run (see DYNAMICS). Initial velocities are then generated from a Boltzmann distribution using the bath temperature (see TEMP). If this keyword is not set, an attempt to read mismatched restart files will crash the program (most likely with a segmentation fault). This is due to the rigid assumed formatting. The inverse procedure (reading a restart file generated by a dynamics run as the starting point for an MC run) is currently not supported. Note that the typical application for this is to use MC for equilibration of a system and to continue the run using a dynamics sampler. In single-CPU calculations, this simplifies the overall procedure and avoids using the low-precision pdb format as an intermediate step. For replica-exchange runs (see REMC), restart files are actually the only option which allows starting the individual nodes from individual, non-random conformations stored in an input file. The primary application for this keyword therefore probably lies in replica-exchange molecular dynamics runs which use replica-exchange Monte Carlo runs for equilibration purposes.
DYNREPORT
This minor keyword is a simple logical which ensures that, in calculations with different temperature-coupling groups, a summary is provided of the partitioning in that regard.
CHECKGRAD
This keyword is a simple logical which instructs CAMPARI to test the gradients for the current calculation given the Hamiltonian, system, and starting structure. It tests Cartesian gradients first, followed by the transformed gradients acting on the internal degrees of freedom (if settings allow that: see CARTINT). It is mostly for developers' use and creates at most two undocumented output files (NUM_GRAD_TEST_XYZ.dat and NUM_GRAD_TEST_INT.dat). The procedure works by numerically computing gradients using pure energy routines (finite differencing) and juxtaposing them with the analytical solution. It is slow and can sometimes be misleading or uninformative for the following reasons:
 For just a single molecule, rigid-body gradients are always net zero (outside of boundary contributions).
 The dynamics Hamiltonian must be identical to the MC Hamiltonian (in particular see LREL_MC and LREL_MD).
 For Cartesian gradients to be accurate, no strictly torsional space Hamiltonian terms should be used (see for example SC_ZSEC and SC_TOR). For those, Cartesian gradients are circumvented unless CARTINT is 2.
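The finite-differencing idea behind CHECKGRAD can be sketched on a toy potential (a single harmonic bond, not CAMPARI's Hamiltonian): central differences of the pure energy function are compared against the analytical gradient:

```python
# Sketch of the CHECKGRAD idea: compare analytical gradients against
# central finite differences of the pure energy function. The harmonic
# pair potential is a toy stand-in for the actual Hamiltonian.
def energy(x):
    # two particles on a line, harmonic bond with k = 2.0, r0 = 1.5
    r = abs(x[1] - x[0])
    return 0.5 * 2.0 * (r - 1.5) ** 2

def grad_analytical(x):
    r = abs(x[1] - x[0])
    dUdr = 2.0 * (r - 1.5)
    s = 1.0 if x[1] > x[0] else -1.0
    return [-s * dUdr, s * dUdr]

def grad_numerical(x, h=1e-6):
    # central finite differences: (U(x+h) - U(x-h)) / 2h per coordinate
    g = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        g.append((energy(xp) - energy(xm)) / (2 * h))
    return g

x = [0.0, 2.2]
err = max(abs(a - n) for a, n in zip(grad_analytical(x), grad_numerical(x)))
```

For a smooth potential the two should agree to within the finite-difference truncation error; a large discrepancy flags a bug in the analytical force routine, which is exactly what CHECKGRAD is meant to expose.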
UNSAFE
This keyword is a simple logical (default off) which allows selected fatal errors to be transformed into warnings (for example the simulation of systems which are not net-neutral). It should be used with caution (obviously), and the log output should always be studied meticulously. In addition, enabling unsafe execution may skip some costly sanity checks, e.g., when reading in trajectories in pdb format.
CRLK_MODE
CAMPARI currently provides limited support in dealing with chemical crosslinks which either create one (or multiple) intramolecular loops, or link multiple molecules together. For force-based sampling in Cartesian space only (see CARTINT and DYNAMICS), this functionality matters exclusively for the following reasons:
 A chemical crosslink can be thought of as a branch in the main chain. Such nonlinear polymers violate CAMPARI's model of identifying topologically connected sequence neighbors purely based upon primary sequence. Therefore, nonbonded interactions have to be corrected if the two residues in question are crosslinked to each other (to comply with the settings provided via INTERMODEL and ELECMODEL). This is supported by CAMPARI independent of crosslink type (even though currently only disulfide linkages are supported → sequence input).
 A single intermolecular crosslink essentially merges two molecules into a single one. However, CAMPARI continues to treat both chains as if they were independent molecules. This has a variety of reasons, most of which pertain to the consistency of internal data representation and to the support of internal analysis routines. One area where this is tricky is for simulations in periodic boundary conditions (→ BOUNDARY), as shift vectors are generally applied only to intermolecular contacts. For two crosslinked molecules, this continues to be the case, thereby allowing (given a poor simulation system setup) the theoretical possibility of one of the two crosslinked molecules interacting with parts of different images of the other molecule. Trajectory output may also appear confusing for the same reason.
 New bonded interactions are created which have to be correctly accounted for. In accordance with the previous point, this implies that distance vectors have to be image-corrected in periodic boundary conditions even for those. For the crosslink to be actually established, it is necessary that the parameter file offer support for the required bond length, angle, and dihedral terms. This is of course true for any topological interaction in a Cartesian treatment. Request a report to obtain more information at the beginning of the simulation.
 For random initial structures it will be necessary for the crosslink to be satisfied to allow stable integration of the equations of motion. This is elaborated upon elsewhere.
 If the ABSINTH implicit solvation model is used (→ SC_IMPSOLV), the crosslink usually modifies two solvation groups (one on each "side") to yield a single new unit. CAMPARI will typically split this group such that the solvation groups may remain associated with their "host residue".
 The crosslink is treated as restraints and the sampler is unaware of its explicit existence.
 The crosslink is treated as a set of (hard) constraints and the sampler is adjusted to preserve these constraints. This mode is currently under development and not yet supported.
The latter is the primary reason for supporting mode 2 in the future. Here, the move set will be explicitly adjusted to only allow moves which automatically satisfy the crosslink exactly. For torsional dynamics this option will be less useful as CAMPARI does not possess the capability to enforce highlevel loop closure constraints in torsional space and consequently all residues within the loop region would have to be completely constrained for the crosslink to remain intact exactly.
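The image correction mentioned above for the bonded terms of intermolecular crosslinks follows the standard minimum-image convention; a minimal sketch for a cubic box (an assumed geometry, chosen for simplicity):

```python
# Minimal sketch of minimum-image correction in a cubic periodic box of
# side length L: the shift that the bonded terms of an intermolecular
# crosslink need is the same one applied to nonbonded pairs.
def min_image(dvec, L):
    return [d - L * round(d / L) for d in dvec]

# a bond "stretched" across the boundary of a 10 A box is mapped back
# to the nearest periodic image of the partner atom:
d = min_image([9.5, 0.0, -0.2], 10.0)   # -> [-0.5, 0.0, -0.2]
```

Without this correction, a crosslink bond spanning the box boundary would be evaluated at a nearly box-length distance, producing enormous restoring forces.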
BIOTYPEPATCHFILE
This simple keyword lets the user provide the location and name of an optional input file that can be used to (re)set the assigned biotypes for specific atoms or groups of related atoms in the system. The corresponding biotype number has to be available (listed) within the parameter file in use. Biotypes are the most fundamental assignment for atoms within CAMPARI and can indirectly set many other properties such as charge, mass, etc. This is explained in detail elsewhere. However, there are parameters not affected by biotype assignment, specifically the default geometries and parameters derived from them. This means that it is generally impossible to, for example, mutate a molecule into a different molecule using such patches. Applications of this type may be more feasible for simulations in Cartesian space. The main domains of application for biotype patches are twofold. First, they allow the fastest and most convenient route to include parameter support for atoms in residues not supported natively by CAMPARI (→ sequence input). Second, they make it possible to diversify a parameter file regarding natively supported residues, e.g., by maintaining multiple parameterizations for a small molecule or by including extra distinctions for atoms in terminal polymer residues. Biotype patches are applied first and may be largely overridden by successive application of other patches, e.g., atom type patches, charge patches, etc.
MPATCHFILE
This simple keyword lets the user provide the location and name of an optional input file that can be used to alter the masses of specific atoms in the system (in g/mol). Normally, masses are chosen for atoms based on the assigned atom types in the parameter file, and this behavior can be overridden by this keyword specifically for atomic mass. Note that this is different from changing the atom type of the atom itself, for which a dedicated patch facility is in place. Some more details are given elsewhere.
RPATCHFILE
Similar to keyword MPATCHFILE, this simple keyword lets the user provide the location and name of an optional input file that can be used to alter specifically the radii of individual atoms in the system (in Å). By default, these radii are inferred either from the assigned atom types, i.e., computed from the Lennard-Jones size parameters, or they are overridden at the level of the parameter file by the "radius" specifications. Because the latter still operate at the resolution of assigned atom types, this keyword offers an atom-specific override facility. Note that there is a distinct hierarchy to this. Specifically, changing the radius via a patch does not change the atom type for that atom. It does, however, alter the default values of parameters that depend on radius, such as maximum SAV fractions or atomic volume reduction factors, which are then again patchable themselves. Furthermore, a radius patch overrides a radius inferred by applying a patch to the Lennard-Jones parameters of a specific atom. Details on the input are given elsewhere.
WL_MODE
By specifying the Wang-Landau acceptance criterion for a (partial) Monte Carlo run, the WL method is enabled. This keyword defines the reaction coordinate of choice and the coupled pair to be iterated (see below). Suppose we have an augmented Hamiltonian as follows:
H = K + λE + X(Y)/β
Here, K and E are kinetic and potential energies, β is the inverse temperature, and X(Y) is an unknown function of a selected reaction coordinate. The factor λ can be either 0 or 1. Assuming that the Hamiltonian is separable, expected sampling weights from the Boltzmann distribution for the augmented Hamiltonian are:
w(Y_{1})/w(Y_{2}) = (p_{λ}(Y_{1})/ p_{λ}(Y_{2})) exp[X(Y_{2})−X(Y_{1})]
Here, p_{λ}(Y) is the expected probability (usually treated numerically as the integral over a finite interval, i.e., by binning). If λ is 1, it corresponds to the equilibrium (Boltzmann) probability for the original Hamiltonian. Conversely, if it is 0, p_{λ}(Y) corresponds to the density of states (the distribution as T→∞). If Y=E, p(E) can be written simply as p(E) ∝ g(E)·e^{−λβE}, with g(E) being the density of (energy) states. This simple form is not available for other reaction coordinates. The Wang-Landau method's key ingredient is choosing X(Y) such that w(Y_{i})/w(Y_{j}) = 1 ∀ i,j over an interval of interest. This statement is equivalent to the definition of a flat walk in the space of Y. A flat walk eliminates all barriers in the projected space of Y and should therefore be efficient at exploring phase space (see associated keywords for details on this). The main use of the flatness is as a diagnostic, however, and the Wang-Landau algorithm uses X(Y) and the apparent distribution in Y as a coupled pair to iteratively build up X(Y). If the apparent distribution becomes flat, confidence rises that X(Y) corresponds to the target distribution of interest. The target distribution is set by this keyword:
 The target distribution is ln g(E) (arbitrary offset). This is achieved by letting λ be zero and Y=E. This is also the implementation chosen in the original publication. Interest in the density of states comes from the fact that it (theoretically) enables reweighting of the flat-walk ensemble to any condition of interest. This is the default.
 The target distribution is ln p(Z) or ln p(Z_{a},Z_{b}) (arbitrary offset), where the Z are geometric reaction coordinates (→ WL_RC) restricted to specific molecules (→ WL_MOL). By letting λ be unity, the target distribution is actually the potential of mean force (PMF) for that (pair of) reaction coordinate(s). Unlike for umbrella sampling (see, e.g., Tutorial 9), it is obtained without further postprocessing. This variant was introduced here. As stated, it is possible to estimate a two-dimensional target distribution.
 The target distribution is ln p(E) or ln p(E,Z) (arbitrary offset). This is achieved by letting λ be unity and Y=E. In comparison to the first option, this will oversample low-likelihood states rather than low-degeneracy states. It can be combined with a geometric reaction coordinate (Z) in a two-dimensional approach.
A few technical comments are necessary. First, the Wang-Landau acceptance criterion can be combined with a hybrid sampling technique. In such a case, the dynamics segments will propagate the system as usual, but will contribute in no way to the Wang-Landau histograms. They merely serve to evolve the system to find new states that may be hard to access given the Monte Carlo sampler. The MC segments will utilize the Wang-Landau criterion and increment the histograms. As a result, it may be possible that a dynamics segment starts in a high-energy state. This may make the integrator unstable initially and cause unforeseen crashes. Second, Wang-Landau sampling is also supported in parallel runs. For pure Monte Carlo simulations, the MPI averaging technique implies a parallel Wang-Landau implementation, i.e., an implementation in which the histograms are updated globally. Wang-Landau sampling is also supported in conjunction with the replica-exchange method, but here each replica is confined to its own iterative Wang-Landau procedure (since the Hamiltonians are most likely different).
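The iterative coupling of histogram and modification factor described above can be illustrated on a toy problem (estimating the density of states of the sum of two dice; this is unrelated to CAMPARI's internal implementation and only demonstrates the generic flat-histogram loop):

```python
import math, random

def wang_landau_dice(seed=7):
    """Toy Wang-Landau iteration: estimate ln g(E) for E = sum of two dice.
    Illustrates the histogram/modification-factor loop; the flatness
    threshold (80% of the mean) and schedule are conventional choices."""
    rng = random.Random(seed)
    lnG = {E: 0.0 for E in range(2, 13)}    # running estimate of ln g(E)
    hist = {E: 0 for E in range(2, 13)}     # visit histogram
    state = [1, 1]
    f = 1.0                                 # modification factor
    while f > 1e-5:
        for _ in range(20000):
            new = list(state)
            new[rng.randrange(2)] = rng.randint(1, 6)   # re-roll one die
            Eo, En = sum(state), sum(new)
            # accept with min(1, g_est(E_old)/g_est(E_new)) -> flat walk in E
            if rng.random() < math.exp(lnG[Eo] - lnG[En]):
                state = new
            E = sum(state)
            lnG[E] += f
            hist[E] += 1
        if min(hist.values()) > 0.8 * (sum(hist.values()) / 11):
            f *= 0.5                        # flat enough: halve f, reset histogram
            hist = {E: 0 for E in hist}
    return lnG

lnG = wang_landau_dice()
# g(7)/g(2) should approach 6/1 for two fair dice
ratio = math.exp(lnG[7] - lnG[2])
```

The estimate carries an arbitrary offset (only differences in ln g are meaningful), mirroring the "arbitrary offset" caveat in the mode descriptions above; note also that the continuously updated lnG is exactly why detailed balance is violated during the iteration.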
WL_MOL
If a Wang-Landau acceptance criterion is used for a (partial) Monte Carlo run, and if a molecular reaction coordinate was chosen as the histogram to consider (→ WL_MODE), this keyword allows the user to select the molecule that the reaction coordinate is computed on. The numbering of molecules follows the user-selected sequence in sequence input. Note that it is up to the user to ensure that the chosen reaction coordinate is defined and has a meaningful range for the chosen molecule (see WL_MAX, WL_EXTEND, and WL_BINSZ). If a two-dimensional variant with two geometric reaction coordinates is chosen, it is theoretically possible to supply two different molecules here. Note that the effective coupling is likely to be low in this scenario, which may lead to poor convergence properties in the 2D space. In conjunction with WL_MODE being 3, specification of a legal entry for WL_MOL will extend the WL estimation of ln p(E) to a two-dimensional case with an additional, geometric reaction coordinate (ln p(E,Z)). Note that this keyword is the only way to control the dimensionality for WL_MODE being either 2 or 3.
WL_RC
If a Wang-Landau acceptance criterion is used for a (partial) Monte Carlo run, and if a molecular reaction coordinate was chosen as the histogram (or as one or both axes of the 2D histogram) to consider (→ WL_MODE), this keyword allows the user to select among a few geometric reaction coordinates as follows:
 The molecule's radius of gyration is used (default). The range of this quantity is difficult to predict and depends on the constraints in the system. For example, in Cartesian space, it will be advisable to restrict the range of the histograms (→ WL_MAX and WL_EXTEND) to those values that do not coincide with steric overlap (low end) or stretching of bonds (high end).
 The molecule's mean α-content is used as defined for the global secondary structure biasing potential. The quantity always has finite range, but for small systems and typical settings, it exhibits sharp spikes connected by low-likelihood regions that may challenge the discretization of the WL scheme.
 The molecule's mean β-content is used. See previous option for details and caveats.
WL_HUFREQ
This is one of the keywords that controls the convergence properties of a Wang-Landau run. The target distribution in question is accumulated as a histogram (always logarithmic), and this keyword sets the frequency (step interval) for updating it with the current value of the f parameter, i.e., the current increment size (equivalent to multiplication by f in the linear space). The accumulation of the target distribution begins only after the equilibration phase has passed. Naturally, a small setting here will quickly increment the histogram, which may accelerate convergence (in case the effective "mobility" of the system defined by system properties and sampling engine is good enough). However, a small setting may also interfere with convergence because it emphasizes the noise in initial estimates of the target distribution (in absolute magnitude), and this may make it harder to refine the guess upon reductions of the f parameter (see WL_HVMODE and WL_FREEZE). The default choice is 10 elementary steps. Note that if the parallel Wang-Landau implementation is used, the step number provided refers to the sampling amount for each individual node.
WL_HVMODE
This is one of the keywords that controls the convergence properties of a Wang-Landau run. It has been argued that the flatness of the accumulated histogram for the target distribution in question (usually tested via some maximum relative deviation criterion) is not generally useful as a criterion for considering a switch to the next stage of refinement (by lowering the f parameter), and can be replaced with a recurrence (minimum visitation) criterion (discussed for example in Zhou and Bhatt). This keyword selects between two options for such a recurrence criterion. Option 2 requires each (relevant) bin to be visited exactly once in every stage, whereas option 1 mandates that each bin be visited the nearest integer of 1/sqrt(f) times (at least once, though). In the parallel Wang-Landau implementation, the condition will always be checked against the combined data. If the condition is fulfilled, and if the number of post-equilibration Wang-Landau steps exceeds the buffer setting, ln f will be reduced (initial value set by keyword WL_F0) by a factor of 2. Note that the f parameter is implied to operate on a logarithmic scale (same as the target distribution) of counts to avoid numerical issues with large numbers. The rule used here is equivalent to the square root update rule suggested in the original publication. Belardinelli and Pereyra suggest that the exponential update becomes inappropriate for small f, and CAMPARI implements their suggestion to switch over to f ∝ 1/N_{steps}, where N_{steps} is the current number of WL steps having been executed. In the parallel Wang-Landau implementation, this implies the combined total of WL steps from all replicas.
This modified update rule is implemented irrespective of the fulfillment of the criterion defined by WL_HVMODE. It is useful to keep in mind that option 1 will initially lead to fewer reductions of the f parameter, which may be beneficial for establishing correctness, and at the same time may be harmful for the rate of convergence. An issue that often adversely affects convergence is very-low-likelihood bins. In this context, it should be emphasized that the relevance of a bin toward defining flatness is partially controlled by keyword WL_FREEZE, which consequently serves two purposes, and partially controlled by the general range settings (WL_MAX, WL_EXTEND, and WL_BINSZ).
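A rough sketch of the resulting reduction schedule, assuming (as a simplification for illustration, not a transcription of CAMPARI's logic) that the 1/t rule of Belardinelli and Pereyra simply acts as a floor under the halving rule:

```python
def next_ln_f(ln_f, n_steps):
    """Reduce ln f once the visitation criterion is met: halve it, but do not
    let it drop below 1/N_steps (the Belardinelli-Pereyra 1/t rule).
    n_steps is the number of post-equilibration WL steps executed so far."""
    return max(ln_f / 2.0, 1.0 / n_steps)
```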
WL_FLATCHECK
If a Wang-Landau acceptance criterion is used for a (partial) Monte Carlo run, this keyword can be used to control the step interval at which the evaluation of the visitation criterion for the temporary histogram is performed. If the parallel Wang-Landau implementation is used, this coincides with the requirement to (at least temporarily) combine the data from all replicas and therefore imposes a communication requirement. Should a check return a positive result, the temporary histogram is added to the overall estimate, the temporary histogram is reset to zero, and the f parameter is altered as described elsewhere. In the parallel version, additional operations are performed to broadcast the new total (combined) histogram identically to all replicas. In case the criterion is not fulfilled, the temporary histogram(s) is (are) left unchanged. The technical use of this keyword is twofold: first, to reduce communication requirements for the parallel implementation; second, to artificially delay the progression of the iteration. The latter can sometimes be useful for complex systems with strong degeneracy in the chosen reaction coordinate (also see WL_RC). Note that for the parallel code the step number provided refers to the sampling amount for each individual node.
WL_F0
If a Wang-Landau acceptance criterion is used for a (partial) Monte Carlo run, this keyword defines the starting value for the f parameter (logarithmic). The f parameter is meant to decay from some positive number to 0, which corresponds to multiplicative factors larger than 1 reducing to 1 in the linear space. The default is 1.0. The number of reductions of the f parameter by the exponential rule (see elsewhere) is printed to log output. Depending on the properties of the system and the resultant convergence rate, the rule may change as described for WL_HVMODE.
WL_MAX
If a Wang-Landau acceptance criterion is used for a (partial) Monte Carlo run, this keyword sets the (initial) upper bound (given as the bin center of the last bin) of the energy or reaction coordinate histogram (→ WL_MODE and WL_RC). At the beginning, 100 bins of equal size are created. Depending on the choice for WL_EXTEND, the histogram and its upper limit may be extended throughout the simulation. It is safe to extend the histogram to values that are impossible to realize for the system in question, since bins that are strictly empty do not meaningfully contribute to the algorithm (see WL_FREEZE). CAMPARI accepts two separate entries for any 2D histogram. Note that the choice for this keyword may be overwritten if a dedicated input file is used to set an initial guess for the target histogram (→ WL_GINITFILE). The maximum value that will not trigger a range exception or an automatic histogram extension is of course the value given here plus half the relevant bin size.
WL_BINSZ
If a Wang-Landau acceptance criterion is used for a (partial) Monte Carlo run, this keyword sets the fixed bin size for the energy or reaction coordinate histogram (→ WL_MODE and WL_RC). At the beginning, 100 bins are created. Depending on the choice for WL_EXTEND, the histogram and its lower and upper limits may be extended throughout the simulation. However, the bin size will remain fixed. CAMPARI accepts two separate entries for any 2D histogram. Note that the histogram bin size and the initial number of bins may be overwritten if a dedicated input file is used to set an initial guess for the target histogram (→ WL_GINITFILE).
WL_EXTEND
If a Wang-Landau acceptance criterion is used for a (partial) Monte Carlo run, this keyword controls whether the energy or geometric reaction coordinate histogram (→ WL_MODE) is allowed to grow in range during the simulation. Choices are as follows:
 The histogram is fixed. Note that any Wang-Landau simulation performed over a restricted interval bears the danger of generating incorrect results even after reweighting. For common interaction potentials and standard energy-based Wang-Landau sampling, this is particularly true for truncation of the energy histogram on the lower end.
 The histogram is allowed to grow only towards lower (more negative) values. This can be useful for energy histograms, where the initial energy range is not known.
 The histogram is allowed to grow in both directions. It is strongly recommended not to use this feature for energy histograms with a realistic interaction potential (since the energy is unbounded on the positive side, and memory exceptions / segmentation faults are likely). This option is meant primarily for histograms defined purely on geometric reaction coordinates (→ WL_MODE).
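The range bookkeeping implied by WL_MAX, WL_BINSZ, and WL_EXTEND can be sketched as follows (the function name and the None convention for range exceptions are assumptions for illustration):

```python
def bin_index(val, wl_max, binsz, nbins):
    """Map a value to its histogram bin. The last bin is centered at wl_max
    (keyword WL_MAX), so the hard upper edge is wl_max + binsz/2; bins have
    fixed width binsz (WL_BINSZ) and there are nbins of them (initially 100)."""
    upper = wl_max + 0.5 * binsz
    lower = upper - nbins * binsz
    if val >= upper or val < lower:
        return None  # range exception, or histogram extension (→ WL_EXTEND)
    return int((val - lower) / binsz)
```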
WL_GINITFILE
If a Wang-Landau acceptance criterion is used for a (partial) Monte Carlo run, this keyword allows the user to replace the default initial guess for the (logarithmic) target distribution with a user-supplied one. The default guess is flat. Supplying a non-flat guess can be useful in several scenarios: i) ongoing refinement of a WL run; ii) cases where a more useful "zero order guess" is available, e.g., an exponentially growing function for a condensed phase system with inverse power potentials; iii) convergence tests. The details regarding the format of this input file are provided elsewhere.
WL_FREEZE
If a Wang-Landau acceptance criterion is used for a (partial) Monte Carlo run, this keyword controls whether the range of bins in the energy or reaction coordinate histogram (→ WL_MODE) that is considered for proceeding to the next iteration stage (updating the value of the f parameter) is fixed after the first such update or not. The update procedure is described for keywords WL_HUFREQ, WL_HVMODE, and WL_FLATCHECK. Any positive integer specified here will prescribe a minimum number of preliminary simulation steps beyond equilibration that must be exceeded before an update of the f parameter is considered. After such an update, the range of bins considered for the histograms is the continuous one (and it must be continuous on account of the update rule) currently populated. If during further simulation steps additional bins were to be visited, those moves are instead considered as range exceptions and are rejected (the summary statistics provided in log output for range exceptions can therefore contain results from two different contributions → WL_EXTEND). Any negative number provided will specify by its absolute value the aforementioned minimum number of preliminary steps in identical fashion. However, in this case, CAMPARI is instructed to allow further bins to be added for consideration during later stages of the algorithm. Note that this violates the refinement idea behind the Wang-Landau scheme, and can lead to severe convergence problems due to the numerical mismatch created by the extra bin "missing out" on f increments during early stages of the algorithm. It is therefore strongly recommended to choose a relatively large and positive number for this keyword (to ensure that appropriate coverage of the accessible range has been reached).
Note that if the parallel Wang-Landau implementation is used, the step number provided refers to the sampling amount for each individual node.
WL_DEBUG
If a Wang-Landau acceptance criterion is used for a (partial) Monte Carlo run, this simple logical allows the user to request debugging information regarding the Wang-Landau iterative algorithm. If turned on, CAMPARI will report in log output the progression through the various updating stages and may, depending on settings, also write temporary output files for the relevant histograms.
Box Settings:
(back to top)
BOUNDARY
Every simulation has to occur within an explicitly or implicitly defined volume. For constant volume ensembles (→ ENSEMBLE), the volume naturally remains exactly constant throughout the simulation. This keyword determines the type of boundary condition to use; there are currently three available choices and one obsolete one:
 Periodic boundary conditions (PBC):
This is the most commonly used boundary condition in molecular simulations. Here, the polyhedral simulation cell is assumed to be replicated as a (theoretically infinite) periodic system around the central one (which constitutes the actual, physical simulation container). The implementation is such that all distance calculations are amended by determining the smallest distance amongst those between a particle and any of the replicated images of another particle. This so-called minimum image convention implies that for normal pairwise interaction potentials (for example SC_IPP) a particle only interacts with at most one "version" of another particle, never two or more. The idea of PBC is borrowed from crystals, in which the assumption of periodicity is justified given that the simulation volume can be chosen such that it coincides with the crystal's unit cell (or exact multiples thereof). Conversely, in liquids there is no persistent long-range order (homogeneous density, no pair correlations), and the approximation of a system of thermodynamic size by infinite replication of a nanoscopic system is at least questionable. Given typical cutoff schemes, however, the contribution of longer-range interactions is often exactly zero unless explicit techniques are used enumerating the periodic sum (→ Ewald summation). This means that the actual impact of PBC is often just to mimic a continuous environment for particles close to the edge of the physical simulation volume. Note that no real-space interaction cutoff should exceed half the shortest linear dimension realizable in the simulation volume, since otherwise it becomes possible for multiple images of the same particle to be within interaction distance. In conjunction with the minimum image convention cited above, this invariably leads to artifactual results (reference).
Note that in CAMPARI the convention of using the nearest image operates at the molecule level, i.e., the general rule is that intramolecular distances always refer to atoms in the same image of a molecule. CAMPARI will occasionally warn users about cases where an image interaction would be within the cutoff distance, but these warnings are not part of all routines (for efficiency reasons). Enabling box-consistent trajectory output may help in diagnosing such issues independently.
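The minimum image convention for a rectangular cell can be sketched in a few lines of Python (a generic textbook formulation, not CAMPARI's molecule-level implementation):

```python
def min_image_dist2(p1, p2, box):
    """Squared minimum-image distance between two points in a rectangular
    periodic cell with side lengths box = (Lx, Ly, Lz)."""
    d2 = 0.0
    for x1, x2, side in zip(p1, p2, box):
        d = x1 - x2
        d -= side * round(d / side)  # wrap to the nearest periodic image
        d2 += d * d
    return d2
```

This also makes the half-box rule above concrete: once a cutoff exceeds half the shortest side, the nearest image of a particle is no longer unique, which is exactly the failure mode this convention cannot handle.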
 Hard-wall boundary condition (HWBC):
This option is obsolete and cannot be selected. It may be reactivated in the future to enable simulations in containers with hard, particle momentum-conserving (i.e., reflective) walls.
 Residue-based soft-wall boundary condition (RSWBC):
In simulations employing a continuum description of solvent, the resultant density is almost always low, in particular in the limit of simulating just a single macromolecule. In those cases, it may neither be meaningful nor beneficial to introduce additional replicas of the simulation cell. CAMPARI offers to define a system volume via a soft wall for such a scenario. Here, the simulated particles are prevented from leaving a spherical simulation volume (droplet) by an applied boundary potential modeled as:
E_{BND} = Σ_{i} k_{BND}·H(r_{i} − r_{D})·(r_{i} − r_{D})^{2}
Here, r_{i} is the distance from a suitable reference point on residue i to the simulation sphere's origin, r_{D} is the sphere's radius, k_{BND} is the force constant, and H(x) is the Heaviside step function. A hard-wall boundary may be approximated by letting k_{BND} → ∞. Note that this means that the boundary penalty is imposed on the reference atom of each residue (for peptide residues this is always Cα). This may lead to boundary artifacts with parts of large residues sticking out of the sphere and hence being deprived of interactions with smaller residues. Additionally, it must be pointed out that soft-wall boundary conditions lead to somewhat ill-defined system volumes since the code assumes the fixed volume inside the boundary to be the system volume, whereas realistically it should be slightly extended depending on temperature and stiffness. The latter is not easily computed, however, since 1) the purely kinetic (entropic) pressure may be altered by the presence of non-rigid molecules, and 2) the virial pressure is generally unaccounted for. Hence, an exact volume is only recovered in the limit of an infinitely stiff boundary (HWBC).
 Atom-based soft-wall boundary condition (ASWBC):
This option is analogous to the previous (RSWBC) option, except that the boundary term is computed for each atom in the system:
E_{BND} = Σ_{j} k_{BND}·H(r_{j} − r_{D})·(r_{j} − r_{D})^{2}
Here, r_{j} is the distance of atom j from the simulation sphere's origin, and all other quantities are the same as defined above. This will minimize artifacts of the aforementioned type but is also the most expensive droplet BC to compute. Because multiple atoms will contribute to the boundary penalty for each residue, it is generally recommended to use smaller force constants than for the RSWBC.
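A minimal sketch of the atom-based soft-wall penalty defined by the equation above (function and argument names are illustrative, not CAMPARI identifiers):

```python
import math

def soft_wall_energy(positions, center, r_d, k_bnd):
    """Atom-based soft-wall (ASWBC) energy: k_BND*(r - r_D)^2 for every atom
    outside the droplet radius r_D; the Heaviside factor zeroes the term
    for atoms inside the sphere."""
    e = 0.0
    for p in positions:
        r = math.dist(p, center)
        if r > r_d:  # H(r_j - r_D)
            e += k_bnd * (r - r_d) ** 2
    return e
```

The residue-based (RSWBC) variant is the same expression evaluated only over one reference point per residue rather than over all atoms.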
SHAPE
This keyword lets the user specify the shape of the simulation container the system is enclosed in. It is mostly useful for future extensions since as of now the type of boundary condition determines the shape of the container unequivocally. The "choices" are:
 Rectangular cuboid (= rectangular parallelepiped)
 Sphere
ORIGIN
This keyword lets the user set the origin of the simulation system as a vector of three elements (xyz). The reference point depends on the container's shape and is its center for a sphere and its lower left corner for a cuboid. Note that for simulations started from "scratch" (no structural input), this keyword is mostly irrelevant. There are two things to consider, though:
 Structural output may be compromised if values are used that are far away from zero. This is because binary trajectory files and in particular the strictly formatted PDB files have finite representation widths and fixed units (Å or nm). It is therefore recommended to adjust this keyword such that the minimum and maximum values for Cartesian coordinates (largest dimension) are symmetric around the origin of the coordinate system.
 If structural input is used, it is strongly recommended to match the settings for ORIGIN to those implied in whatever structural input is provided. In droplet BCs, it may otherwise occur that parts of the system overlap with the ill-placed droplet boundary and that their internal arrangement is destroyed or that the simulation explodes during the first few steps of simulation.
SIZE
This keyword allows the user to define the size of the simulation container. Depending on its shape, SIZE takes on alternative meanings. If the system volume is spherical, just one real number is needed that specifies the sphere's radius. Conversely, if the container is a rectangular cuboid, a vector of three floating-point numbers is read in that specifies the three side lengths of the cuboid in the x, y, and z-directions, respectively. Note that highly asymmetric boxes often impose very stringent limits on cutoffs since it is generally the shortest dimension that matters.
SOFTWALL
This keyword sets the harmonic force constant both for the residue-based and the atom-based SWBCs (see BOUNDARY). It is to be provided in units of kcal·mol^{−1}·Å^{−2} and corresponds to parameter k_{BND} in the equations above.
Integrator Controls (MD/BD/LD/Minimization):
(back to top)
TIMESTEP
If any dynamics-based method (including hybrid methods, of course) is used, this keyword lets the user set the integration time step for the integrator in units of ps.
CARTINT
This keyword determines, at a very fundamental level, the choice of degrees of freedom that CAMPARI shall sample. The "native" CAMPARI degrees of freedom are the rigid-body coordinates of all molecules and a subset of internal coordinates (almost exclusively freely rotatable dihedral angles). This option is the default and specified by choosing 1 for this keyword. Alternatively, the Cartesian positions of all atoms in the system may serve as the underlying degrees of freedom, as is commonly the case in molecular dynamics calculations (option 2). There are several very important limitations and considerations that are mentioned throughout the documentation and reiterated here:
 CAMPARI does not support the direct sampling of Cartesian degrees of freedom in Monte Carlo simulations. This applies to the MC portion of hybrid simulations as well. While it is trivial to design and implement simple move sets doing precisely that, their efficiency is negligible due to the large amount of motional correlation present between an atom and its immediate molecular environment.
 Internal space simulations do not require the full set of bonded interaction parameters that are typically part of molecular mechanics force fields, specifically no bond length terms, and typically no or very few improper dihedral and bond angle terms (→ PARAMETERS).
 For freely rotatable dihedral angles, there is a distinction between those deemed important vs. those deemed unimportant. Details are listed in the documentation for providing sequence input. These choices generally pertain to methyl groups and/or to bonds describing electronically hindered rotations with identical groups. The resultant sets of degrees of freedom are not always entirely consistent (e.g., between polypeptide side chains and their respective small-molecule model compounds). Related keywords are OTHERFREQ (MC) and TMD_UNKMODE (dynamics).
 While unsupported residues pose no problems in the setup of Cartesian coordinates, internal coordinate space simulations need to infer which dihedral angles are rotatable from the input topology. This happens automatically and is described elsewhere. For eligible dihedral angles not identified with standard polypeptide or polynucleotide backbone angles, relevant keywords are again OTHERFREQ (MC) and TMD_UNKMODE (dynamics).
 The choice of degrees of freedom in internal coordinate space simulations can be customized rather flexibly by introducing additional constraints (see corresponding input file). For MC simulations, the preferential sampling utility offers an additional level of control.
 Conversely, algorithms to enforce holonomic constraints in Cartesian space simulations are often limited to weakly coupled constraints (see SHAKEMETHOD for details). This means that it is not (yet) possible to mimic torsional space constraints in a Cartesian space run but that it is possible to follow a typical MD protocol by simulating a flexible macromolecule with some bond length constraints in a bath of rigid water molecules.
 The existence of virtual sites (effectively atoms with no mass) poses stringent requirements on Cartesian dynamics, in that those sites have to be constrained exactly relative to real atoms. At each integration time step, the forces acting on these sites are transferred to the surrounding atoms, and their positions are rebuilt post facto (see elsewhere for more details). Virtual sites in internal coordinate space simulations can only cause issues if a degree of freedom's effective mass depends solely on such sites. Then, CAMPARI will automatically freeze the corresponding degree of freedom.
TSTAT
This keyword lets the user choose the thermostat to be used to generate an NVT (or NVT-like) ensemble in dynamics simulations using a Newtonian formalism (option 2 or 5 in DYNAMICS). Currently, three options are fully supported:
 Berendsen weak-coupling scheme (reference):
This is a deterministic and global velocity rescaling scheme which creates an exponential relaxation toward the target temperature. The velocity rescaling factor is computed for each coupling group (see TSTAT_FILE) according to:
f_{v,i}^{2} = 1.0 + (δ_{t}/τ_{T})·[ (T_{target}/T_{i}) − 1.0 ]
As is apparent, whenever the instantaneous group temperature (T_{i}) matches the ensemble target (T_{target}), velocities are not rescaled (f_{v,i} is unity). Any deviations from T_{target} will lead to a systematic rescaling of all velocities that are part of the coupling group toward the target with a relative decay rate of τ_{T} (→ TSTAT_TAU). If τ_{T} approaches the discrete time step (δ_{t}), the relaxation becomes instantaneous. Note that the coupling of subparts of the system to essentially different thermostats is an obsolete method used in the early days of simulations to prevent obscure freezing events sometimes encountered when the system is effectively partitioned into subsystems with very different levels of integrator stability, noise, and inherent relaxation. Then such an approach may circumvent the most dramatic pitfalls resulting from the inherent incorrectness of the weak-coupling scheme (and mask said incorrectness in the process). It is crucially important to realize that the Berendsen thermostat does not generate a well-defined ensemble and that the method only relaxes "safely" to the microcanonical one for τ_{T} approaching infinity. The quenched fluctuations observed in the Berendsen scheme may severely distort results of fluctuation-sensitive computations such as free energy growth calculations (see GHOST).
 Andersen scheme (reference):
The Andersen thermostat is a stochastic thermostat which introduces "collisions" re-randomizing the velocity associated with a given degree of freedom to one drawn from the ensemble at the given temperature. This method has been shown to sample the canonical ensemble and is one of the recommended options for any calculation sensitive to the details of ensemble fluctuations. Implementation-wise, it works by reassigning the velocity for each degree of freedom at each time step with a probability equivalent to δ_{t}/τ_{T}. This effectively gives rise to a "bath"-induced relaxation over a timescale τ_{T}. Note that some prior implementations in other software packages may have synchronized the application of these velocity resets. This is not the case in CAMPARI, where each degree of freedom is treated independently (whichever those may be → CARTINT). Much like in Langevin dynamics, a concern here can be the artificial loss of velocity correlations between multiple particles, which may slow down large-scale dynamics.
 Extended ensemble methods:
Methods such as those by Nosé-Hoover, Martyna-Tobias-Klein, or Stern are currently not supported, but may be in the future. They often show poor relaxation behavior due to coupled oscillations, in particular in the NPT ensemble, which they are most useful for.
 Bussi et al. scheme (reference):
This thermostat can be thought of as a hybrid of the Nosé-Hoover and Berendsen thermostats. It preserves the exponential relaxation kinetics of the weak-coupling scheme if the ensemble target is far away but introduces fluctuations to the kinetic energy such that at equilibrium the global rescaling does not quench fluctuations. The implementation evolves the kinetic energy via an auxiliary stochastic dynamics, much like the Langevin piston for pressure coupling does. Here:
f_{v,i}^{2} = e^{−δ_{t}/τ_{T}} + f_{T,i}(1 − e^{−δ_{t}/τ_{T}})·(R_{1}^{2} + R_{Γ,N_{f,i}−1}) + 2e^{−0.5δ_{t}/τ_{T}}·R_{1}·[ f_{T,i}(1 − e^{−δ_{t}/τ_{T}}) ]^{0.5}
With:
f_{T,i} = N_{f,i}^{−1}·(T_{target}/T_{i})
Here, N_{f,i} is the number of degrees of freedom in the respective coupling group, R_{1} is a normal random number with zero mean and unit variance, and R_{Γ,N_{f,i}−1} is a random number drawn from the gamma distribution with a scale factor of 2.0 and a shape parameter of (N_{f,i}−1)/2.
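For concreteness, the Berendsen rescaling factor from the first option above can be evaluated as follows (a direct transcription of the equation for f_{v,i}; the function name is illustrative):

```python
import math

def berendsen_factor(t_inst, t_target, dt, tau):
    """Berendsen weak-coupling velocity rescaling factor f_v for one coupling
    group, from f_v^2 = 1 + (dt/tau)*(T_target/T_inst - 1)."""
    return math.sqrt(1.0 + (dt / tau) * (t_target / t_inst - 1.0))
```

At t_inst == t_target the factor is exactly 1 (no rescaling); with dt == tau the group temperature is driven to the target in a single step.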
TSTAT_TAU
If the simulation is performed in the NVT ensemble and if Newtonian dynamics are used, this keyword allows the user to set the key parameter of the employed thermostat, i.e., its coupling (decay) time, τ_{T}, in units of ps (the default is 1.0 ps). Note that it is really the ratio of the time step δ_{t} (see TIMESTEP) and this number that matters; hence, TSTAT_TAU cannot be less than the integration time step.
TSTAT_FILE
If the simulation is performed in the NVT ensemble and if Newtonian dynamics are used, this keyword sets the name and location of an optional input file for defining thermostat coupling groups. These are meaningful only if the Berendsen weak-coupling or the Bussi et al. scheme is used (options 1 or 4 for TSTAT). For details, the user is referred to the description of the input file itself.
SYSFRZ
This keyword controls the removal of net drift artifacts in dynamics runs (which are primarily relevant for fully ballistic MD). Predominantly in periodic boundary conditions (see BOUNDARY), it can happen that all kinetic energy is transferred into global translations or rotations of the system. This collective "degree of freedom" is typically friction-free and therefore represents a stable trap for the system's kinetic energy to accumulate in. Such behavior will give rise to grossly misleading results (the effective ensemble sampled has a much lower temperature). This can be avoided by periodically removing such global motions. For translational displacements, this is easy, but for rotational motion problems arise if subensembles have access to modes that are quasi friction-free themselves. This is often the case in mixed rigid-body/torsional dynamics and is at the moment not dealt with properly. Choices are as follows:
 No removal of global motions is performed (the safest setting for most applications).
 CAMPARI will attempt to only remove translational motion of the system.
 CAMPARI will attempt to remove both global translation and global rotation (this option should be used with caution).
TMD_INTEGRATOR
If a simulation is performed in mixed torsional/rigid-body space that contains a Newtonian dynamics portion, this keyword allows the user to choose between (currently) two basic integrator variants. All integrators are derived from the following discrete scheme that relies on the aforementioned assumptions, i.e., a diagonal mass matrix (equations of motion formally decoupled) and the accuracy/correctness of the total kinetic energy expressed in terms of this diagonal mass matrix. Then, we can define pseudo-symplectic conditions as shown below for a (rotational) degree of freedom with index k:
I_{k}(t_{2})ω_{k}(t_{2})^{2} − I_{k}(t_{1})ω_{k}(t_{1})^{2} − δt [ω_{k}(t_{1}) + ω_{k}(t_{2})] F_{k}(t_{1.5}) = 0
Here, δt is the integration time step, I_{k} denotes the diagonal element of the mass matrix for the kth degree of freedom (function of time), ω_{k} is the associated angular velocity, and F_{k} denotes the deterministic force projected onto this degree of freedom (torque). The projection yielding the torques and the mass matrix elements are computed with recursive schemes, i.e., they operate in linear time with the number of atoms in the molecule (more or less irrespective of how many rotatable bonds there are). The above scheme defines a quadratic equation that has a maximum of two solutions for ω_{k}(t_{2}) (formula omitted). The correct one must be picked (which may be difficult), and an alternative must be defined if no solutions are possible. For both purposes, we use a welldefined approximation to the full solution that yields:
ω_{k}(t_{2}) ≈ [I_{k}(t_{1})/I_{k}(t_{2})]^{1/2} ω_{k}(t_{1}) + δt F_{k}(t_{1.5})/I_{k}(t_{2})
This solution is always available and can be used to pick the correct solution among the two alternatives for the full quadratic equation (simply as the closer one). The setting for TMD_INTEGRATOR determines whether the correct solution to the quadratic equation should be used whenever possible (option 1), or whether the approximation is used exclusively (option 2, which is the default for historical reasons). As written, the equations still contain the problem that they require knowledge of I_{k}(t_{2}), whereas only the half-step mass matrix elements (which are structural quantities) are available in a typical leapfrog scheme. If the I_{k} are slowly varying functions in time, a simple approximation solving this problem is to allow a lag of half a time step:
ω_{k}(t_{2}) ≈ [I_{k}(t_{0.5})/I_{k}(t_{1.5})]^{1/2} ω_{k}(t_{1}) + δt F_{k}(t_{1.5})/I_{k}(t_{1.5})
This is again written for the approximate version (setting 2). The resultant leapfrog integrator is extremely simple and efficient, and it is obtained by setting the related keyword TMD_INT2UP to 0. However, at each integration time step, we can also take a half-step guess using a similar approximation to obtain a value for all the I_{k}(t_{2}). This is done by explicitly perturbing the coordinates and recomputing just the mass matrix elements (little additional cost for all but tiny or trivial systems). With the values obtained, we can integrate the second equation above as written (this is obtained by setting TMD_INT2UP to 1). While theoretically more accurate, this variant can be noisy due to the extrapolation of the masses. In practice, for systems with very small and quickly varying I_{k} (such as rigid water molecules), performance is similar for all four pairings (TMD_INTEGRATOR 1 or 2, TMD_INT2UP 0 or 1), and such tests reveal that additional corrections are recommended if the rate of change of the I_{k} is high (see below). Conversely, if the rate of change is negligible, all possible settings obtainable by combinations of the two keywords mentioned here relax to the exact same integrator (standard leapfrog in rotational space). This covers the special case of linear (translational) degrees of freedom, which have constant mass.
Note that this keyword is currently irrelevant for stochastic dynamics (always uses a derivation analogous to the last equation above), but that it is relevant for the stochastic minimizer.
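The two velocity-update variants can be sketched as follows: `omega_exact` solves the quadratic pseudo-symplectic condition and uses the approximate solution to pick the correct root (TMD_INTEGRATOR 1), while `omega_approx` uses the approximation exclusively (option 2). Function names are illustrative, not CAMPARI internals:

```python
import math

def omega_approx(I1, I2, omega1, F, dt):
    """Approximate update: w2 = sqrt(I1/I2)*w1 + dt*F/I2 (option 2)."""
    return math.sqrt(I1 / I2) * omega1 + dt * F / I2

def omega_exact(I1, I2, omega1, F, dt):
    """Solve I2*w2^2 - I1*w1^2 - dt*(w1 + w2)*F = 0 for w2 (option 1),
    picking the root closest to the approximate solution; falls back to
    the approximation if no real root exists."""
    guess = omega_approx(I1, I2, omega1, F, dt)
    disc = (dt * F) ** 2 + 4.0 * I2 * (I1 * omega1 ** 2 + dt * F * omega1)
    if disc < 0.0:
        return guess
    root = math.sqrt(disc)
    cands = ((dt * F + root) / (2.0 * I2), (dt * F - root) / (2.0 * I2))
    return min(cands, key=lambda w: abs(w - guess))
```

For constant mass (I1 equal to I2), both variants reduce to the standard leapfrog update w2 = w1 + δt·F/I.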
TMD_INT2UP
If a simulation is performed in mixed torsional/rigid-body space that contains a Newtonian dynamics portion, then this keyword allows the user to control the number of incremental velocity update steps used to improve integrator stability for cases with quickly varying elements of the mass matrix (see above). The cases of 0 and 1 have already been covered in the documentation on TMD_INTEGRATOR. The remaining options assume that values for the diagonal elements of the mass matrix at times t_{1}, t_{1.5}, and t_{2} are available explicitly (as in: computed directly from coordinates) when trying to compute the updated angular velocity for a degree of freedom at time t_{2}. Rather than solving the velocity update in one step, the interval from t_{1} to t_{2} is instead divided into TMD_INT2UP subintervals, and the velocity is updated incrementally for each subinterval. If TMD_INT2UP is larger than 2, additional values are obtained by linearly interpolating between the explicit values at the three times. This is why it is recommended to set this keyword to multiples of 2, and this is also why the added benefit becomes successively smaller. A recommended value is 4. Note that this only matters for velocity updates, and that the torque is assumed constant over the entire interval (F_{k}(t_{1.5}) above). As a result, this option does not notably alter speed for a system of appreciable size and is not at all equivalent to a change in integration time step.
TMD_UNKMODE
If a simulation is performed in mixed torsional/rigid-body space with a gradient-based sampler (including minimization), then this keyword controls default constraints operating on certain rotatable dihedral angles. As described for sequence input, there is a selection of "native" CAMPARI torsional degrees of freedom that does not include every rotatable dihedral angle in natively supported residues, and for obvious reasons does not include any degrees of freedom within unsupported residues. This keyword therefore controls how to deal with these two categories of additional degrees of freedom. Options are as follows:
 Only native CAMPARI degrees of freedom are sampled. This will leave any unsupported residues and molecules completely rigid.
 In addition to native CAMPARI degrees of freedom, all identified degrees of freedom in unsupported residues and molecules will be sampled.
 In addition to native CAMPARI degrees of freedom, all torsional degrees of freedom in natively supported residues, which are frozen by default, are sampled. This will leave any unsupported residues and molecules completely rigid.
 All aforementioned classes of degrees of freedom are sampled.
SHAKESET
It is standard practice in molecular dynamics simulations in Cartesian space to employ holonomic constraints such that the system evolves according to Gauss's principle of least constraint. The reader is referred to the literature as to what exactly constitutes a time-reversible, symplectic integrator if holonomic constraints are enforced. In general, it will be possible to formulate an algorithm which at least is drift-free, has some target precision for the constraints, and is approximately symplectic when the microcanonical ensemble is in use. The idea behind holonomic constraints in molecular dynamics is to eliminate fast vibrational modes in the system to allow a larger integration time step to be used. This keyword offers the user different choices for which holonomic constraints to employ, as follows:
 1) No holonomic constraints are used.
 2) All "native" bonds to terminal atoms with a mass of less than 3.5 a.m.u. are constrained in length. A terminal atom is defined as any atom bound to exactly one other atom. "Native" means that only bonds consistent with the assumed molecular topology (code-internal) are considered. This selection will usually constrain all bonds to hydrogen atoms.
 3) All "native" bonds of any type are constrained in length. This does include bonds formed by virtue of chemical crosslinks.
 4) All "native" bonds of any type are constrained in length as in mode 3. In addition, several bond angles are constrained explicitly. For a molecule free of rings of size 6 or less, all bond angles are constrained (this also constrains improper dihedral angles at trigonal centers). For molecules with rings of size 6 or less, ring-internal bond angles are generally omitted. Note that more bond angle constraints can be formulated at a tetrahedral site than are needed, and that, depending on the system, redundant constraints may be created (which may be harmful). This option is only supported for the standard SHAKE constraint algorithm at the moment.
 5) This is nearly identical to option 4. However, bond angles are constrained by additional distance constraints rather than explicitly. This means that this option is theoretically available for constraint algorithms other than SHAKE.
 6) An input file is read and used to derive the list of constraints. Note that it is possible to derive intra- and intermolecular long-distance constraints that way (geometric information will be taken from the starting structure), but that those will very easily cause CAMPARI to crash.
The cost, accuracy, and applicability of constraint algorithms all scale poorly with the level of coupling. Options 4 and 5 from the list above will therefore be usable only in special cases (→ SHAKEMETHOD) such as systems without any rings or planar, trigonal centers. For specific applications using angle constraints, we strongly recommend defining a minimum set of distancebased constraints via option 6 above. This has the best chance to succeed.
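For the distance-based representation of angle constraints (option 5 and, similarly, user input via option 6), the target length of the auxiliary 1-3 constraint that pins a bond angle follows from the law of cosines. A small sketch (hypothetical helper, not CAMPARI code):

```python
import math

def angle_as_distance(d12, d23, theta_deg):
    """Target length of an auxiliary 1-3 distance constraint that pins the
    bond angle theta (degrees) between bonds 1-2 and 2-3 (law of cosines).
    Illustrative helper, not CAMPARI code."""
    th = math.radians(theta_deg)
    return math.sqrt(d12 ** 2 + d23 ** 2 - 2.0 * d12 * d23 * math.cos(th))
```

For example, two 1.0 Å bonds at the tetrahedral angle (≈109.47°) yield a 1-3 target distance of about 1.633 Å.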
SHAKEFILE
If SHAKESET is set to 6, this keyword specifies the name and location of the file defining user-selected holonomic constraints to be enforced during the simulation. Its format and requirements are documented elsewhere.
SETTLEH2O
This keyword allows the user to append/modify the constraint set selected via SHAKESET to replace all preexisting constraints acting on three-, four-, or five-site water molecules (SPC, TIP3P, TIP4P, TIP4P-Ew, or TIP5P) with constraints that completely rigidify each water molecule. It acts as a simple logical and is turned on by default since CAMPARI does not, as of now, explicitly support any inherently flexible water models. This means that a setting of 2 or 3 for SHAKESET for a calculation in explicit water will still constrain waters to be rigid, and therefore correspond to a standard (and, for the supported water models, correct) simulation setup. Specifying this keyword and setting it to anything but 1 will disable this override. Note that for water models possessing virtual sites (all four- and five-site models), it is assumed that the extra sites have no mass (see below). If this is not the case, the use of the analytical SETTLE algorithm for water is no longer possible, and the more complex set of constraints may no longer be solved efficiently (or may no longer be solved at all).
SHAKEMETHOD
This keyword allows the user to choose which of the currently implemented algorithms CAMPARI should use to enforce the chosen set of holonomic constraints during a molecular dynamics simulation in Cartesian space. Options are as follows:
 1) The standard, iterative SHAKE procedure is used. Coupled constraints are solved iteratively by assuming independence and linearity (Newton's method). SHAKE may converge in very few steps to good accuracy if the coupling is weak (coupling matrix is sparse). This is the only method that currently supports explicit constraints on bond angles (see SHAKESET). Due to the use of Newton's method, SHAKE is not guaranteed to converge if the underlying "landscape" is nonlinear due to the coupling between constraints. Convergence is then only guaranteed within a small enough environment around the actual solution. Therefore, SHAKE places an upper limit on the time step that can be used even though it is meant to allow increases of precisely that time step. Nonetheless, in canonical applications (bond length constraints only), SHAKE will be a reasonably efficient solution. The main weakness of SHAKE and related algorithms is their inherent inability to enforce planarity at a given site. This is because at a planar site all bond vectors, which form the basis set for the application of iterative corrections, are part of the same plane, i.e., it is impossible to correct an out-of-plane motion using those vectors. Depending on the exact set of constraints used, SHAKE may require many steps, fail to converge or converge with limited accuracy, and occasionally crash if bond length and angle constraints imply that a site is perfectly planar.
 2) A mix of the SHAKE and P-SHAKE (see below) algorithms is used, in which P-SHAKE is applied only to those constraint groups that are internally entirely rigid.
 3) The so-called P-SHAKE (preconditioned SHAKE) procedure is used. In P-SHAKE, SHAKE is augmented by introducing a preconditioning step which changes the convergence rate from linear to quadratic. The preconditioning step is a matrix multiplication essentially forming linear combinations from the bond vectors in the constraint group. Corrections employed along those new directions minimize the linear error by decoupling the constraints (within the bounds of a linear theory → hence the quadratic and not instantaneous convergence). Unfortunately, this method is currently implemented either inefficiently or incorrectly and does not usually offer a discernible improvement. It is also fundamentally limited in that large constraint groups are handled inefficiently due to the full matrix multiplication that is needed to increment the coordinates at each iteration step. This operation in P-SHAKE has a cost of 3·n_{p}·n_{c} versus only 6·n_{c} in standard SHAKE. In addition, the matrix used to precondition the procedure has to be recalculated frequently if a molecule undergoes significant conformational changes (currently hardcoded to every 100 integration steps). P-SHAKE is therefore suitable only for enforcing holonomic constraints in small rigid or quasi-rigid molecules that can be solved by SHAKE as well. Just like SHAKE, it fails badly for planar sites (see above). In such a case, CAMPARI may crash without any indicative messages due to failures in the LAPACK routines used by the P-SHAKE algorithm (see installation).
 4) The LINCS method is used. LINCS is a linear constraint solver that uses a projection approach. In the end, a matrix equation needs to be solved, which requires the inversion of a matrix related to the coupling matrix of the constraints in the group. This is the critical step and grossly inefficient as a general procedure. For sparse matrices, however, the inversion can be performed approximately by a series expansion. It is the order of this expansion and its applicability that will determine the success and accuracy of LINCS. LINCS is generally inapplicable to anything involving bond angle constraints, in particular in all-atom representation. It will work well for loosely coupled groups of constraints. Since the accuracy depends on the unknown convergence properties of an infinite sum, the accuracy of LINCS cannot be tuned to yield a specific tolerance for satisfying the constraints.
There is an additional issue that arises when virtual sites (technically atoms with no mass) are used, for example in rigid water models like TIP4P. Such sites have to be circumvented by the integration scheme (displacement is dependent on inverse mass), and therefore they have to be exactly constrained with respect to the positions of atoms with finite mass. These constraints cannot be solved within the standard framework (also dependent on inverse mass). Instead, the least-constraint solution is obtained by simply rebuilding the positions of these sites with fixed internal geometry. For this to yield a correct integrator, however, the forces acting on the sites need to be remapped to the atoms they are connected to. This is done by decomposing the Cartesian force acting on the site into internal forces, for which compensating terms are added to all the atoms comprising the respective internal degree of freedom. This exactly cancels the net force on the site and makes integration symplectic. Virtual sites cannot occur in constraint groups that are handled by a method other than standard SHAKE or SETTLE.
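The iterative correction at the heart of standard SHAKE can be sketched for pure distance constraints as follows. This is a schematic Python illustration with invented names, not the actual Fortran implementation:

```python
import numpy as np

def shake(pos, pos_ref, invmass, bonds, lengths, tol=1e-8, maxiter=500):
    """One SHAKE solve for a group of distance constraints (schematic).

    pos: unconstrained updated coordinates, shape (N, 3); pos_ref: coordinates
    from the previous time step (their bond vectors are the correction
    directions); invmass: inverse masses, shape (N,); bonds: list of (i, j)
    pairs; lengths: target distances. tol is relative, mirroring SHAKETOL.
    """
    pos = np.array(pos, dtype=float)
    ref = np.asarray(pos_ref, dtype=float)
    for _ in range(maxiter):
        worst = 0.0
        for (i, j), d0 in zip(bonds, lengths):
            rij = pos[i] - pos[j]
            diff = rij @ rij - d0 * d0            # constraint violation
            worst = max(worst, abs(diff) / (d0 * d0))
            sij = ref[i] - ref[j]                 # reference bond vector
            g = diff / (2.0 * (sij @ rij) * (invmass[i] + invmass[j]))
            pos[i] -= g * invmass[i] * sij        # mass-weighted Newton-like
            pos[j] += g * invmass[j] * sij        # correction along sij
        if worst < tol:                           # all constraints satisfied
            break
    return pos
```

Each sweep applies a Newton-like correction along the reference bond vectors; for weakly coupled constraints, very few sweeps reach the tolerance, mirroring the behavior described for the standard SHAKE option above.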
SHAKETOL
If SHAKE or P-SHAKE are in use (→ SHAKEMETHOD), this keyword allows the user to set the target tolerance for satisfying distance constraints. The tolerance is relative to the target value of the constraint. As soon as the maximum deviation is less than this value, the iteration stops unless it is terminated earlier for other reasons (→ SHAKEMAXITER). If LINCS is in use, this keyword still has meaning even though the tolerance cannot be set explicitly. Should CAMPARI find that LINCS with the given settings satisfies the constraints significantly worse than defined by this keyword, it will adjust the open parameter of the method (→ LINCSORDER) in an attempt to remedy this situation. Similarly, should the opposite occur (LINCS satisfies constraints significantly more accurately than the desired tolerance), the parameter will be adjusted in the opposite direction. All this happens within sane bounds.
SHAKEATOL
If SHAKE (→ SHAKEMETHOD) is in use with explicit bond angle constraints (→ SHAKESET), this keyword allows the user to set the target tolerance for satisfying angular constraints. The tolerance is absolute and applies to the unitless cosine of the respective angle. As soon as both maximum deviations drop below the threshold tolerances (see also SHAKETOL), the iteration stops unless it is terminated earlier for other reasons (→ SHAKEMAXITER).
SHAKEMAXITER
If SHAKE or P-SHAKE are in use (→ SHAKEMETHOD), this keyword allows the user to alter the maximum number of iterations permissible to the algorithm. Since poor convergence properties are generally indicative of a more fundamental problem, increasing the value for SHAKEMAXITER will rarely be useful. After exceeding this many steps, the algorithm will simply continue with its current solution, meaning that, in a benign case, constraints will be violated slightly more than specified by SHAKETOL and possibly SHAKEATOL. Note that CAMPARI will then adjust the constraint targets in an attempt to rescue a simulation otherwise doomed. This may not always work and may also lead to unwanted drift. Appropriate warnings are provided.
LINCSORDER
If LINCS is in use (→ SHAKEMETHOD), this keyword allows the user to define the initial expansion order for the approximate matrix inversion technique. As mentioned above, the convergence properties of this approximation are not really known, which prevents LINCS from satisfying an exact tolerance explicitly. Should CAMPARI find that constraints are satisfied significantly better or worse than what is provided through SHAKETOL, the expansion order will be adjusted automatically. This is to prevent unnecessarily inefficient or unnecessarily inaccurate maintenance of constraints.
MINI_MODE
If a minimization run is performed, this keyword lets the user select the method of choice. CAMPARI currently supports three canonical and one nonstandard minimizer. All minimizers can operate either in mixed rigid-body/torsional space, i.e., the "native" CAMPARI degrees of freedom, or in Cartesian space (→ CARTINT). However, there is the algorithmic restriction that the canonical minimizers (options 1-3 below) only support trivial constraints (see FMCSC_FRZFILE), which is an issue in Cartesian space (rigid water models, etc.). Let us define γ as a vector of base increment sizes suitable for each of the degrees of freedom (partitioned into three classes: rigid-body translation, rigid-body rotation, and dihedral angles; keywords MINI_XYZ_STEPSIZE, MINI_ROT_STEPSIZE, and MINI_INT_STEPSIZE are used to specify each element γ_{i}). Also, let f_{m} be an outside scaling factor in units of mol/kcal set by keyword MINI_STEPSIZE. Lastly, we introduce a unitless dynamic step length factor λ. If we now denote the heterogeneous vector of phase space coordinates as x, and the Hamiltonian is written as U(x), then the system is evolved through one of four different protocols as follows:
 1) Steepest-descent:
x_{i+1} = x_{i} − λ·f_{m} γ•∇U(x_{i})
Here, "•" denotes the Hadamard (Schur) product, i.e., simply the element-by-element multiplication. Should the new conformation have overstepped in the direction of steepest descent, λ is iteratively reduced by a constant factor until a valid step is found (lower energy). In case of successful steps, λ is iteratively increased to improve the efficiency of the procedure if the underlying landscape is relatively smooth and flat. Successful steps are used as well to construct an appropriate guess for the initial step size should a complete reset be necessary. This mimics a line search.
 2) Conjugate-gradient:
x_{i+1} = x_{i} − λ·f_{m} [ γ•∇U(x_{i}) + f_{CG,i}d_{i−1} ]
f_{CG,i} = [ ∇U(x_{i})·∇U(x_{i}) ] / [ ∇U(x_{i−1})·∇U(x_{i−1}) ]
d_{i−1} = γ•∇U(x_{i−1}) + f_{CG,i−1}d_{i−2}
This conjugate-gradient method follows the Polak-Ribiere scheme and augments the steepest-descent prediction by an additional term that is estimated according to the suggestion by Fletcher and Reeves. Much like in steepest-descent, should the new conformation have overstepped, λ is iteratively reduced by a constant factor until a valid step is found (lower energy). In case of successful steps, λ is iteratively increased analogously to what is described above.
 3) Memory-efficient Broyden-Fletcher-Goldfarb-Shanno method (L-BFGS) according to Nocedal (reference):
x_{i+1} = x_{i} − λ· [ H^{−1}·(γ•∇U(x_{i})) ]
This quasi-Newton approach technically employs the inverse of the Hessian, which is typically unknown. However, the L-BFGS method constructs a numerical estimate directly for the matrix product H^{−1}·(γ•∇U(x_{i})) from the recent history of the minimization process. This widely used recursive two-loop scheme has the advantage of i) only requiring very few floating point operations, and ii) not requiring a running guess for the complete Hessian (inverse or not) due to the recursive formulation. Note that the inverse Hessian in our implementation is constructed from γ•∇U(x_{i}), i.e., it has units of mol/kcal throughout, irrespective of which degree of freedom is considered. This means that the factor f_{m} does not show up in the L-BFGS equation except for the first step (initially or after a reset) when the steepest-descent approximation is used (see mode 1). The usage of (estimated) second derivative information should generally help inform the minimizer of more useful directions to pursue, but step size limitations and inadequate guesses of the Hessian may render this potential benefit ineffectual. The reader is referred to the literature for further details.
 4) Thermal noise, quasi-stochastic (akin to simulated thermal annealing):
This minimizer couples the system to a variable temperature bath. By changing the coupling parameters, the degrees of freedom are successively brought to a state consistent with a very low temperature ensemble. A similar quench in conditions is used in simulated annealing, a general solution strategy for optimization problems.
Initially, the system uses a heat bath as defined by the settings for TSTAT and TEMP. The system is then evolved using NVT molecular dynamics in either mixed rigid-body/torsional space or Cartesian space. Depending on initial conditions, this may heat up the system to a variable extent, and the maximum temperature is recorded. After a prescribed fraction of the total simulation steps, the target temperature is successively lowered to the value specified by keyword MINI_SC_TBATH. This interpolation uses a Gaussian function on the normalized time axis such that all interpolation curves can be rescaled in temperature to exactly coincide. Simultaneously, the algorithm measures the rate of change of the temperature from the recorded maximum toward MINI_SC_TBATH. If the actual rate appears too slow or too fast, the time constant, τ_{T}, of the thermostat in use (→ TSTAT_TAU) is successively altered so as to achieve a cooling of the system to a negligible temperature within the remaining number of available iterations. These alterations happen within bounds of 10 times the integration time step on the low end and the original setting for TSTAT_TAU on the high end.
This minimization approach employs two convergence criteria as soon as the number of steps specified via MINI_SC_HEAT has passed. During the cooling schedule, the procedure will stop either because the RMS gradient fell below the threshold (→ MINI_GRMS) or because the target temperature (MINI_SC_TBATH) was reached, which per se does not provide information on the local gradient. Of course, it may be possible to minimize such a structure further using a canonical approach. Both temperature and RMS gradient are written to log output to allow easy inspection of whether the parameters are set reasonably well. As an additional note, it must be pointed out that, much like in standard molecular dynamics, runs starting from very unfavorable structures will cause large accelerations which may lead to a catastrophic blow-up of the system. This behavior can be avoided by performing a number of steepest-descent minimization moves upfront. This number is set by keyword MINI_SC_SDSTEPS.
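The cooling schedule can be illustrated schematically. The exact functional form and the `width` parameter below are assumptions for illustration, not CAMPARI's actual implementation:

```python
import math

def target_temperature(step, nsteps, heat_frac, T_max, T_bath, width=0.25):
    """Schematic cooling schedule for the stochastic minimizer.

    During the first heat_frac*nsteps steps the bath target stays at T_max;
    afterwards it is interpolated toward T_bath with a Gaussian on the
    normalized time axis. Functional form and width are illustrative
    assumptions, not CAMPARI internals.
    """
    n_heat = int(heat_frac * nsteps)
    if step <= n_heat:
        return T_max
    s = (step - n_heat) / float(nsteps - n_heat)   # normalized progress in [0, 1]
    return T_bath + (T_max - T_bath) * math.exp(-(s / width) ** 2)
```

Because the schedule depends only on the normalized time axis, curves for different run lengths can be rescaled in temperature to coincide, as described above.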
In general, a minimization run will terminate after either the maximum number of iterations has passed (see NRSTEPS) or after convergence is achieved (see MINI_GRMS). Note that bad combinations of the various step sizes and the convergence criterion can easily lead to nonterminating runs even if convergence is achieved de facto.
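The interplay of the base step sizes γ, the global factor f_{m}, the dynamic factor λ, and the RMS-gradient convergence criterion can be sketched for the steepest-descent mode. All names and the λ adjustment factors below are illustrative assumptions, not CAMPARI internals:

```python
import numpy as np

def steepest_descent(x0, grad, gamma, f_m=0.1, grms_tol=1e-2, nsteps=10000,
                     energy=None):
    """Schematic steepest-descent minimizer with dynamic step scaling.

    x0: initial degrees of freedom; grad(x): gradient of U; gamma: per-degree
    base step sizes (Hadamard-multiplied with the gradient); f_m mirrors
    MINI_STEPSIZE and grms_tol mirrors MINI_GRMS; energy(x) detects
    overstepping.
    """
    x = np.array(x0, dtype=float)
    lam = 1.0
    for _ in range(nsteps):
        g = gamma * grad(x)                       # normalized gradient
        if np.sqrt(np.mean(g * g)) < grms_tol:    # RMS convergence criterion
            break
        step = lam * f_m * g
        if energy is not None and energy(x - step) >= energy(x):
            lam *= 0.5                            # overstepped: shrink lambda
            continue
        x -= step
        lam *= 1.1                                # successful step: grow lambda
    return x
```

On a simple quadratic landscape this converges to the minimum once the normalized RMS gradient drops below the threshold, mimicking the line-search-like behavior described for mode 1.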
MINI_STEPSIZE
If a canonical minimization run is performed, this keyword acts as a scale factor applied to all conformational increments applied during minimization. It therefore sets the global step size and corresponds to factor f_{m} in the equations above. For technical reasons, it has units of mol/kcal to eliminate the energy units of the normalized gradients. There are no canonical rules one can formulate, but values significantly less than unity will typically be most appropriate to avoid the algorithm frequently overstepping in a subset of the degrees of freedom and then having to iteratively reduce the step size. However, step size management is dynamic (consult factor λ introduced in the equations for minimization modes 1-2 (and 3) above). This means that the impact this keyword has may be less than what one would generally expect.
MINI_GRMS
If a minimization run is performed, this keyword allows the user to set the convergence criterion in units of kcal/mol. Since minimization runs can occur in torsional and rigid-body space, the "raw" gradient over all degrees of freedom is unsuitable. CAMPARI utilizes a simple workaround by normalizing all gradients by a basic step size for the respective types of degrees of freedom (see keywords MINI_XYZ_STEPSIZE, MINI_ROT_STEPSIZE, and MINI_INT_STEPSIZE). The resultant, normalized gradient is used to obtain its root mean square (→ GRMS), which is compared to the convergence criterion provided here. Since the normalized gradients assume a default step size, this parameter becomes dependent on them. For unit values for all three base step sizes, values around 10^{−2} are recommended. Conversely, in Cartesian space, only MINI_XYZ_STEPSIZE is relevant for the gradient criterion.
MINI_XYZ_STEPSIZE
If a minimization run is performed, this keyword determines a basic step size to be considered for all rigid-body translations of molecules and for all Cartesian displacements of atoms. This value is to be provided in units of Å. Note that this keyword determines the effective initial translation step size in conjunction with MINI_STEPSIZE and that it is mostly needed to be able to handle the different units occurring when minimizing in mixed rigid-body and torsional space. All translational gradients are normalized by this number such that numerical estimates of the Hessian (→ BFGS) or even a meaningful root mean square can be written (→ MINI_GRMS). Note that for simulations in (effective) Cartesian space, it would be possible to combine this parameter with MINI_STEPSIZE into a single step size parameter.
MINI_ROT_STEPSIZE
If a minimization run in mixed rigid-body and torsional space is performed, this keyword determines a basic step size to be considered for all rigid-body rotations. This value is to be provided in units of degrees (compare MINI_XYZ_STEPSIZE).
MINI_INT_STEPSIZE
If a minimization run in mixed rigid-body and torsional space is performed, this keyword determines a basic step size to be considered for all dihedral angles. This value is to be provided in units of degrees (compare MINI_XYZ_STEPSIZE).
MINI_UPTOL
If a minimization run is performed, and if the BFGS method is used, this keyword lets the user choose a tolerance criterion in kcal/mol for accepting uphill steps. At most ten or MINI_MEMORY (whichever one is smaller) such steps will be tolerated until a reset of the estimate of the Hessian occurs. This reset will reorient the (multidimensional) direction back onto a steepest-descent path, and the procedure can start anew. This feature is included since the curvature-based estimate of the direction in the BFGS method does not always guarantee a downhill direction, i.e., the energy resulting from a perturbation in such a direction may be larger than the current one for all steps within a finite interval, including arbitrarily small ones (this is a different problem from "overstepping", for which step size reductions are employed).
MINI_MEMORY
If a minimization run is performed, and if the BFGS method is used, this keyword lets the user choose the memory length for the running estimate of the Hessian. Since the system will evolve throughout the minimization, the estimate of the Hessian is of course a moving target, and it will only be useful to include points from the immediate vicinity in its numerical, gradient-based estimate. This keyword simply gives the (integer) number of immediately preceding steps to consider. Note that very large values will typically be irrelevant since the BFGS procedure will, in rough landscapes, frequently propose an ill-fated (uphill) direction (see MINI_UPTOL for comparison). Such moves will eventually lead to a reset of the estimate of the Hessian, which includes "forgetting" all the memory. Hence, the effective usable memory length will be limited by the system as well. Note that the resets are necessary for the BFGS method to find any minima.
MINI_SC_SDSTEPS
If a stochastic minimization run is performed, this keyword allows the user to request the program to first run the specified number of steps as canonical steepest-descent (SD) minimization. These SD moves will follow the same parameter settings as described above and are completely independent of the stochastic steps. Note that these steps are always skipped if the settings request the use of holonomic constraints when minimizing in Cartesian space.
MINI_SC_HEAT
If a stochastic minimization run is performed, this keyword specifies the fraction of the total number of steps (NRSTEPS) that are going to be used to perform NVT dynamics at the user-supplied initial temperature and thermostat settings. Generally, for an efficient annealing protocol, it is probably advisable to combine a large value for this keyword with a high enough temperature and/or a comparatively large value for the thermostat's time constant, τ_{T}, such that NVE dynamics are mimicked over short periods of time (this will lead to heating in itself). Conversely, for straight minimization, it will be more appropriate to supply small values in conjunction with tight thermostat settings and a low initial temperature.
MINI_SC_TBATH
If a stochastic minimization run is performed, this keyword lets the user specify the target temperature of the bath the system will be coupled to at the very end of the run. From the simulation step defined by MINI_SC_HEAT onward, the target temperature is interpolated between TEMP and MINI_SC_TBATH using a Gaussian function operating on a normalized time axis. For the protocol to work as intended, it will not be useful to specify anything but values close to (but not exactly) zero here.
Move Set Controls (MC):
(back to top)
Preamble (this is not a keyword)
A Monte Carlo simulation is a series of biased or unbiased random perturbation attempts to the system, in which some moves will be accepted (the Markov chain transitions to a new microstate) and the others rejected (the Markov chain remains in place) depending on some criterion. This acceptance criterion is designed to sample a specific distribution, and the most common example is the Metropolis criterion designed to produce Boltzmann-distributed ensembles. The types of random perturbation attempts possible constitute the move set, and the resultant microstate transitions are usually very different from those observed in molecular dynamics (MD). In dynamics, all unconstrained degrees of freedom evolve simultaneously (high correlation), but in small increments (low effective step size). In Monte Carlo, one or a few degrees of freedom evolve at a given time, but in step sizes of varying amplitudes. It is not required that individual degrees of freedom all be sampled with equal weight (nor would it be clear how to establish this). The effective sampling weight is determined by three components:
 The overall picking frequencies for move types (e.g., OTHERFREQ) are implemented by CAMPARI through a binary decision tree invoked at each step of the MC simulation. This means that the decisions taken at the root will influence the actual number of attempted moves of types chosen further up the tree, and that it may be complicated to calculate the expected numbers of attempts for those moves. This is why formulas are provided. Some totals (attempted and accepted moves) are reported in the log output at the end.
 The organizational unit for a move is often a residue, but not all residues possess equal numbers of degrees of freedom. For instance, sidechain moves sample a variable number of degrees of freedom (→ NRCHI), but the actual numbers per degree of freedom will not be uniformly distributed since different residues may have different numbers of χ-angles.
 Sampling weights can be adjusted explicitly with the help of the preferential sampling utility.
PARTICLEFLUCFREQ
This keyword is relevant only when ENSEMBLE is set to either 5 or 6, i.e., those ensembles which allow numbers of particles to fluctuate. In this case, the keyword defines the fraction of all moves that attempt to sample the particle number dimension of the thermodynamic state of the system. For the semi-grand ensemble, this corresponds to attempting to transmute one particle type into another while preserving the position of the target particle. For the grand ensemble, it will with 50% probability try to insert a particle of permissible type in a random location in the simulation container and with 50% probability attempt to delete a permissible particle. These moves are applied at the molecule level and are most closely related to rigid-body moves in terms of complexity (→ RIGIDFREQ). Technically, the GC ensemble is supported in CAMPARI by maintaining a set of ghost particles for each fluctuating type which work as "stand-ins". This framework entails certain limitations which are detailed elsewhere.
Expected numbers of such moves overall are calculated trivially as:
NRSTEPS · PARTICLEFLUCFREQ
Note that the default picking probabilities are such that every molecule type allowed to fluctuate in numbers receives equal weight. In case of particle permutation moves, which are implemented as joint insertion/deletion, there is no way to adjust these. This is because the implementation mandates the molecule types to be different, and adjusted picking weights would require additional corrections in the acceptance probability that cancel them out. For independent insertion and deletion available in the grand ensemble, the preferential sampling utility allows the user to at least adjust the picking probabilities on a per-type basis. This can be relevant for example in electrolyte mixtures with disparate target concentrations (and correspondingly disparate bath particle numbers), for which it would make sense to preferentially insert and delete those particle types with overall larger numbers. Such an adjustment would also bring the sampling weights in line with the default picking probabilities for rigid-body moves, which are flat on a per-molecule basis.
RIGIDFREQ
This keyword specifies what fraction of all remaining moves (i.e., 1.0 - PARTICLEFLUCFREQ) is to perturb rigid-body degrees of freedom. This encompasses translations and rotations of individual molecules as well as of groups of molecules (the latter are only available in case rotation and translation are coupled → COUPLERIGID). The default picking probabilities are even for all molecules regardless of type, size, or other properties. They can be adjusted via the preferential sampling utility, and this may be relevant in dense or semidilute systems with different molecule types of vastly different size (e.g., proteins and inorganic ions). In such a case, the acceptance rates for the macromolecules will be noticeably smaller, and this could be compensated for by sampling them preferentially.
COUPLERIGID
This keyword is a simple logical deciding whether or not to couple translational and rotational rigid-body moves for single molecules. Like any type of move coupling, this means that up to six independent perturbations of individual degrees of freedom are employed (translation in x,y,z, rotation around three axes) before energies and the acceptance criterion are evaluated. Note that molecules with no rotational degrees of freedom will have their moves counted as pure translation moves in the log output.
ROTFREQ
This keyword can be used to set the subfrequency for purely rotational moves if uncoupled moves are used (→ COUPLERIGID is false). It will then determine the fraction of these rigid-body moves that are purely rotational. Total number:
NRSTEPS · (1.0 - PARTICLEFLUCFREQ) · RIGIDFREQ · ROTFREQ
And the total number of purely translational moves will be:
NRSTEPS · (1.0 - PARTICLEFLUCFREQ) · RIGIDFREQ · (1.0 - ROTFREQ)
Note that the above formulas do not account for the choice between randomizing and stepwise perturbations (→ RIGIDRDFREQ), which would introduce an additional factor into the above product.
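The branch products above, including the additional RIGIDRDFREQ factor just mentioned, can be checked with a few lines of Python. A minimal sketch; all frequency values below are arbitrary example inputs, not CAMPARI defaults:

```python
# Expected numbers of uncoupled rigid-body moves, following the
# decision-tree products quoted above. All frequency values are
# arbitrary example inputs, not CAMPARI defaults.

NRSTEPS = 100000
PARTICLEFLUCFREQ = 0.0   # no particle-number moves in this example
RIGIDFREQ = 0.3
ROTFREQ = 0.4            # only consulted when COUPLERIGID is false
RIGIDRDFREQ = 0.1

rigid = NRSTEPS * (1.0 - PARTICLEFLUCFREQ) * RIGIDFREQ
rotation = rigid * ROTFREQ
translation = rigid * (1.0 - ROTFREQ)

# RIGIDRDFREQ further splits each branch into randomizing vs. stepwise:
translation_randomizing = translation * RIGIDRDFREQ
translation_stepwise = translation * (1.0 - RIGIDRDFREQ)

# Sibling branches always partition their parent branch exactly:
assert abs(rotation + translation - rigid) < 1e-9
assert abs(translation_randomizing + translation_stepwise - translation) < 1e-9
```

The same pattern extends to every other branch of the decision tree: each frequency keyword multiplies its parent branch, and the complements cover the remaining siblings.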
RIGIDRDFREQ
This keyword sets a terminal choice in the selection tree that is common to many of the moves in CAMPARI (see similar keywords PIVOTRDFREQ, NUCRDFREQ, and so on). Amongst the available rigid-body moves (it applies to three separate branches: coupled single-molecule moves, coupled multiple-molecule moves, and decoupled single-molecule moves), the keyword chooses the fraction to completely randomize the underlying degrees of freedom. For example, the complete randomization of translational degrees of freedom would displace the molecule's reference center to an arbitrary point in the simulation container. The remaining fraction will correspond to stepwise perturbations in which a usually small random increment is added to the degrees of freedom in question. For example, such a move would displace a molecule's reference center by a random vector small in absolute magnitude. As an example, consider single-molecule translation moves. The total number of expected randomizing translation moves would be (assuming COUPLERIGID is false):
NRSTEPS · (1.0 - PARTICLEFLUCFREQ) · RIGIDFREQ · (1.0 - ROTFREQ) · RIGIDRDFREQ
And the number of stepwise translation moves would be:
NRSTEPS · (1.0 - PARTICLEFLUCFREQ) · RIGIDFREQ · (1.0 - ROTFREQ) · (1.0 - RIGIDRDFREQ)
The same modifications apply to any other branch of rigid-body moves as explained above. As an additional complication, in coupled rigid-body moves the decision about randomization vs. stepwise perturbations is itself decoupled. Also note that the log output does not distinguish between the stepwise and randomizing varieties for any move type.
ROTSTEPSZ
For any stepwise perturbation of rotational rigid-body degrees of freedom, this keyword sets the maximum step size in degrees. It is implemented such that the actual step size is drawn with uniform probability from an interval from 0.0° to ROTSTEPSZ°.
TRANSSTEPSZ
For any stepwise perturbation of translational rigid-body degrees of freedom, this keyword sets the maximum step size in Å. Analogous to ROTSTEPSZ, it is implemented such that the actual step size is drawn with uniform probability from an interval from 0.0 to TRANSSTEPSZ Å.
CLURBFREQ
This keyword sets the fraction of all available coupled rigid-body moves to simultaneously perturb the rigid-body degrees of freedom of more than one molecule in concerted fashion. In other words, these moves allow the concerted translation (by the same vector) and rotation (around the "cluster" center-of-mass) of several molecules in one shot. The expected number of multi-molecule moves would be (assuming COUPLERIGID is true):
NRSTEPS · (1.0 - PARTICLEFLUCFREQ) · RIGIDFREQ · CLURBFREQ
And that of coupled singlemolecule moves would be:
NRSTEPS · (1.0 - PARTICLEFLUCFREQ) · RIGIDFREQ · (1.0 - CLURBFREQ)
Currently, the picking of the molecules in a "cluster" is completely random. Note that cluster moves can easily become tricky: in periodic boundary conditions, the nearest image and hence the internal structure of the cluster may actually change upon rotation of a cluster, whereas in droplet boundary conditions rotations and translations of clusters formed by distal molecules may incur significant boundary penalties and hence be inefficient overall. Like all other rigid-body moves, cluster moves can be stepwise or completely randomizing (still in concerted fashion). This is all regulated by the previously introduced keywords RIGIDRDFREQ, ROTSTEPSZ, and TRANSSTEPSZ. The picking frequencies are regulated at the molecule level. With the preferential sampling utility, it is possible to alter the picking weights on a per-molecule basis. Note that this should yield either zero or reasonably large weights for all molecules, because the weights combine in a product sense during the picking process. This also means that it is tedious to compute the expected sampling probabilities for all possible "clusters" of molecules of sizes 2 to the maximum value.
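To make the geometry of these concerted moves concrete, the following sketch (not CAMPARI code) translates a set of molecule reference centers by one shared vector and rotates them about their joint center; 2D is used for brevity, and the nearest-image caveat discussed above is ignored:

```python
# Sketch of a concerted "cluster" rigid-body update: translate several
# molecule reference centers by one shared vector and rotate them about
# their joint center (2D for brevity; the real problem is 3D).
import math

def cluster_move(centers, shift, angle_deg):
    cx = sum(x for x, y in centers) / len(centers)
    cy = sum(y for x, y in centers) / len(centers)
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    moved = []
    for x, y in centers:
        dx, dy = x - cx, y - cy
        # rotate about the joint center, then apply the shared translation
        moved.append((cx + c * dx - s * dy + shift[0],
                      cy + s * dx + c * dy + shift[1]))
    return moved

before = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
after = cluster_move(before, (1.5, -2.0), 30.0)
```

Because every member is transformed by the same rigid map, internal distances within the cluster are preserved (up to floating point), which is what makes this a rigid-body move of the whole group.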
CLURBMAX
This keyword sets the maximum "cluster" size for concerted multi-molecule rigid-body moves (see CLURBFREQ). The assignment is completely random at any given step such that detailed balance is maintained. Note that the number of possible "clusters" grows as binomial coefficients with increasing size of the cluster until CLURBMAX reaches half the number of molecules in the system. It is important to point out that picking values close to the number of molecules can cause search problems that CAMPARI actively avoids. Specifically, if the total sampling weight of available molecules remaining is less than 10%, a new molecule has not been found to add to the "cluster" in 100 tries, and the current size is at least 2, then the value picked initially for CLURBMAX is decreased to the current size. This is to avoid the code spending an excessive amount of time in an inefficient search procedure. The control on total sampling weight is particularly relevant for cases where the picking weights have been altered on account of the preferential sampling utility.
ALIGN
This keyword is an integer indicating how to handle the fact that lever arm effects can be asymmetric in multimolecular simulations. A brief explanation is in order. Consider a macromolecule with multiple dihedral angles along the backbone. Then, a perturbation of an individual one of those dihedral angles may be implemented in two basic ways corresponding to the two building directions of the (unbranched) main chain. Either one of the ends will swivel around (lever arm) while the other remains fixed in place. In a simulation with just a single molecule, the new conformations for either type will be identical except for an implied rotation of the reference frame. In a simulation with multiple molecules, however, the two conformations will be explicitly different since the other molecules define the now static reference frame. In general, moves with longer lever arms will have lower acceptance rates and are slower to evaluate; they should generally be avoided. For MC, this affects polypeptide pivot moves (coupled and uncoupled (see COUPLE)), ω-moves (see OMEGAFREQ), Favrin et al. inexact CR moves (see CRMODE), pivot-type nucleic acid moves (see NRNUC), sugar pucker moves (see SUGARFREQ), and polypeptide cyclic residue pucker moves (see PKRFREQ). It affects single torsion pivot moves (see OTHERFREQ) in a slightly different manner, and this is described there. It is also relevant for torsional dynamics, for which it in similar vein determines the assumed building direction for the chains. Options are as follows:
 Always leave the N-terminus unperturbed (the C-terminus swings around).
 Always leave the C-terminus unperturbed (the N-terminus swings around). This is only recommended in special applications since the C-terminal alignment requires the whole molecule to be rotated around, which makes this mode more expensive but analogously asymmetric when compared to mode 1.
 Always leave the longer end unperturbed (shorter leverarm is chosen). This is the default (and a good) choice as it should be the most efficient one for simulations with multiple chains of significant length. It is also the recommended setting for torsional dynamics in which the kinetics at one of the termini will otherwise be artificially slowed (note that the criterion determining lever arm length uses number of atoms rotated rather than number of residues in dynamics).
 A stochastic modification of mode 3 only available in MC: the probability with which the longer end swivels around is equal to:
p_{lt} = (L_{st} + 1) / (L_{st} + L_{lt} + 2)
And conversely:
p_{st} = (L_{lt} + 1) / (L_{st} + L_{lt} + 2)
Here, L_{st} is the smaller number of residues beyond the pivot point towards the nearer terminus and L_{lt} is the larger number of residues beyond the pivot point towards the more distant terminus such that L_{st} + L_{lt} + 1 yields the total number of residues in the molecule. For example, a molecule with six residues would yield probabilities for doing C-terminal alignment (the N-terminus swings around) of 6/7 for residue 1, 5/7 for residue 2, and so on down to 1/7 for residue 6.
This choice represents the most flexible move set and should normally be preferred in MC when sampling problems are encountered.
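The two formulas for mode 4 condense into a single expression for the probability that the N-terminus swings around (C-terminal alignment). A small sketch reproducing the six-residue example above:

```python
# ALIGN mode 4: probability that the N-terminus swings around when a
# pivot at 1-based residue 'pivot' is sampled in a chain of n_res residues.
# This is p_st or p_lt from the text, depending on which end is shorter.

def p_nterm_swings(n_res, pivot):
    l_n = pivot - 1       # residues beyond the pivot toward the N-terminus
    l_c = n_res - pivot   # residues beyond the pivot toward the C-terminus
    # the probability that an end swivels uses the OTHER end's length
    return (l_c + 1) / (l_n + l_c + 2)

# Six-residue example from the text: 6/7 for residue 1 down to 1/7 for residue 6
probs = [p_nterm_swings(6, i) for i in range(1, 7)]
```

The complementary probability, (l_n + 1) / (l_n + l_c + 2), is that of the C-terminus swinging around, so the two options always sum to one for a given pivot residue.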
COUPLE
If this keyword is set to 1 (logical true), all polypeptide pivot moves are coupled to sidechain moves on the same residue (→ PIVOTMODE). This means that new conformations for the φ- and ψ-angles as well as for all of the sidechain χ-angles are proposed before the energy and acceptance criterion are evaluated. Like any other unbiased move perturbing multiple degrees of freedom, this procedure drastically increases the chance of generating an unacceptable conformation (assuming a typical excluded-volume interaction potential is used). Consequently, acceptance rates will be very low and it is generally not recommended to use this option. Note that it is still possible to use independent sidechain moves but that it is impossible to do independent pivot moves for residues with sidechains. In other words, all frequency settings are used as normal but all standard polypeptide pivot moves (the default move type of the decision tree) are coupled to a mandatory sidechain move (of all sidechain angles in that residue). Keywords PIVOTRDFREQ, PIVOTSTEPSZ, CHIRDFREQ, and CHISTEPSZ are observed in the respective parts of the coupled moves while NRCHI and CHICYCLES are not. The expected number of those coupled moves would be:
NRSTEPS · (1.0 - PARTICLEFLUCFREQ) · (1.0 - RIGIDFREQ) · (1.0 - CHIFREQ) · (1.0 - CRFREQ) · (1.0 - OMEGAFREQ) · (1.0 - NUCFREQ) · (1.0 - PKRFREQ) · (1.0 - OTHERFREQ)
Note that the same formula applies to uncoupled polypeptide pivot moves.
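The long fall-through product above, and the analogous products for the ω-, pucker-, and nucleotide branches below, all follow one pattern: multiply NRSTEPS by the complement of every frequency passed over on the way down the tree, and by the frequency of the branch finally taken. A hedged sketch with arbitrary example frequencies (not CAMPARI defaults):

```python
# Expected number of moves reaching a given branch of the decision tree.
# 'rejected' lists the picking frequencies of all branches passed over;
# 'taken' is the frequency of the branch chosen (1.0 for the terminal
# fall-through branch, e.g. uncoupled polypeptide pivot moves).
# All numeric values are arbitrary illustrations, not CAMPARI defaults.

def expected_moves(nrsteps, rejected, taken=1.0):
    n = nrsteps * taken
    for f in rejected:
        n *= (1.0 - f)
    return n

freqs = {"PARTICLEFLUCFREQ": 0.0, "RIGIDFREQ": 0.3, "CHIFREQ": 0.2,
         "CRFREQ": 0.0, "OMEGAFREQ": 0.1, "NUCFREQ": 0.0,
         "PKRFREQ": 0.05, "OTHERFREQ": 0.0}

# Polypeptide pivot moves are the terminal fall-through: everything rejected
pivot = expected_moves(100000, freqs.values())
```

For any intermediate branch (say ω-moves), pass only the frequencies rejected before it and supply its own frequency as `taken`.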
PIVOTMODE
Polypeptide pivot moves are historically the oldest move type in CAMPARI. Therefore, they are placed at the outermost branch of the move selection tree and possess no frequency selection keyword. In general, pivot moves simultaneously sample the φ- and ψ-angles of a single polypeptide residue unless the residue is ring-constrained (such as proline or hydroxyproline), in which case only the unconstrained degree of freedom (ψ for proline) is sampled. See PKRFREQ for "pivot" moves which sample the φ-angles of proline and analogous residues. The default picking probabilities for polypeptide pivot moves are even for all residues with peptide φ/ψ-angles. They can be adjusted with the help of the preferential sampling utility. An example where this can be useful is in reducing the picking weight of proline and similar residues, for which the number of degrees of freedom is smaller. Mostly for historical reasons, this keyword allows the selection of different modes for pivot moves as follows:
 Blind backbone sampling, i.e., all angles have equal likelihood (unbiased and the default)
 Using grids (requires GRIDDIR), i.e., angle pairs are sampled from within an approximate envelope derived from the space available to the corresponding dipeptide if one assumes typical excluded volume interactions (biased).
PIVOTRDFREQ
Much like for other move types, CAMPARI allows the user to mix two types of polypeptide pivot moves: the first randomizing the φ- and ψ-angles of the residue in question (for proline only the ψ-angle, for coupled moves also the sidechain χ-angles → COUPLE), the second perturbing them by a small increment whose size is set by the auxiliary keyword PIVOTSTEPSZ. Note that randomizing moves may be extremely ineffective for the sampling of dense phases (collapsed states of macromolecules) and that the only accepted moves will be those realizing small displacements by chance. To calculate the expected number of randomizing and stepwise polypeptide pivot moves, the user may employ the formula listed under COUPLE and multiply it with PIVOTRDFREQ and 1.0 - PIVOTRDFREQ, respectively.
PIVOTSTEPSZ
This keyword sets the step size in degrees for local perturbation attempts to the φ- and ψ-angles of polypeptide residues (see PIVOTRDFREQ). Note that this step size encompasses the entire symmetric interval around the original position, i.e., a value of 10° will attempt uniformly distributed random displacements within the interval of -5° to 5°.
GRDWINDOW
This keyword sets a parameter tied to external input files which are used to assist conformation space sampling in biased fashion when PIVOTMODE is set to 2. Then, GRDWINDOW needs to specify half the bin size for the steric grids (see GRIDDIR). The files are supplied in the data directory and the default value to be used here would be 5.0°. Note that grid-assisted sampling is not a fully supported option in CAMPARI and may be removed entirely in the future.
OMEGAFREQ
In polypeptides, the dihedral angle along the actual peptide bond (ω) is different from the φ- and ψ-bonds since the carbon and nitrogen atoms have partial sp^{2} character. This inhibits free rotation around the bond due to electronic effects and means that only a very narrow range of conformations is typically available to the ω-angle. The two dominant states are the planar cis- and trans-conformations, with the latter being almost exclusively seen for non-proline residues and both contributing for proline. In molecular mechanics force fields, these effects are typically represented via strong torsional potentials (see SC_BONDED_T and SC_EXTRA). From a sampling point of view, this means that it would be unwise to couple the sampling of such a stiff degree of freedom to any other degree of freedom. ω-moves therefore perturb nothing but the ω-angle of an individual polypeptide residue. They technically are equivalent to pivot moves in that the "free" end will swivel around, lowering the acceptance rates additionally if the perturbations are large (→ ALIGN). To calculate the number of expected ω-moves use:
NRSTEPS · (1.0 - PARTICLEFLUCFREQ) · (1.0 - RIGIDFREQ) · (1.0 - CHIFREQ) · (1.0 - CRFREQ) · OMEGAFREQ
Note that the moves are additionally split up into those attempting to completely randomize the ω-angle and those that attempt stepwise perturbations (→ OMEGARDFREQ). It should be emphasized that the randomizing move will typically be the only way of converting between cis- and trans-conformations due to the height of the barrier separating the two. The default picking probabilities are identical for all residues with ω-type bonds. They can be adjusted with the help of the preferential sampling utility, and such adjustment could be useful in mixed systems with small molecule amides and polypeptides, where it may be beneficial to preferentially sample the polypeptide ω-bonds.
OMEGARDFREQ
This keyword is completely analogous to PIVOTRDFREQ but applies to ω-moves instead of φ/ψ-moves.
OMEGASTEPSZ
This keyword is completely analogous to PIVOTSTEPSZ but applies to ω-moves instead of φ/ψ-moves.
PKRFREQ
Of the fraction of all pivot-type polypeptide backbone moves, what is the fraction of backbone moves to selectively alter the dihedral angles around the N-C_{α} bond in proline or similar residues? These rotations are hindered by the presence of the ring and hence they cannot be sampled independently. Moves of this type therefore alter the pucker state of the amino acid sidechain belonging to the chosen residue and the backbone conformation of the polypeptide (pivot-type move) simultaneously. These moves are analogous to sugar pucker moves for polynucleotides (see SUGARFREQ). The expected number of polypeptide pucker moves would be:
NRSTEPS · (1.0 - PARTICLEFLUCFREQ) · (1.0 - RIGIDFREQ) · (1.0 - CHIFREQ) · (1.0 - CRFREQ) · (1.0 - OMEGAFREQ) · (1.0 - NUCFREQ) · PKRFREQ
Note that these moves are split up into two variants: a non-ergodic one which inverts the pucker state, and one which introduces new degrees of freedom (bond angles) but allows sampling of most of the relevant phase space (bond length changes remain quenched). This is determined by PKRRDFREQ. When analyzing high-resolution structural databases, it can be seen that proline residues occupy two dominant pucker states separated by a barrier. The non-ergodic move can jump across this barrier but is unable to explore the basin around its current position. The latter requires bond angle changes as otherwise the problem is overconstrained. This introduction of new degrees of freedom is generally undesirable (see discussion under ANGCRFREQ) but in this particular case of small impact since none of the bond angles along the main chain are allowed to change. This keeps the effects of bond angle changes local while allowing exploration of the continuous manifold of conformations of the five-membered ring.
The exact set of degrees of freedom used to sample the ergodic move type is explained in detail elsewhere, and an implementation reference is given in the literature. The default picking probabilities for this move type are flat for all polypeptide residues possessing ring pucker degrees of freedom. The probabilities can be adjusted by the preferential sampling utility, and this could be used to fine-tune sampling weights in polymers. For example, puckering equilibria for central residues in polyproline are expected to be both more relevant and more difficult to sample than those for terminal residues and may benefit from being sampled preferentially.
PKRRDFREQ
As pointed out above, finding arbitrary conformations of a five-membered ring while keeping all bond lengths and angles constant is an overconstrained problem (→ PKRFREQ). Therefore, CAMPARI releases the constraint on bond angle rigidity for those systems which include proline and similar polypeptide residues. This necessitates the use of bond angle potentials (see SC_BONDED_A) to keep local geometries reasonable. To sample different ring conformers effectively, CAMPARI uses a strategy of combining a non-ergodic reflection of the pucker state (non-local) with stepwise but unbiased excursions away from the current state. This keyword regulates the fraction of pucker moves to be of the former type (reflection). The formulas listed under PKRFREQ multiplied with PKRRDFREQ and (1.0 - PKRRDFREQ), respectively, would give the expected numbers for either type. Note that it typically is not a good idea to set this to either zero or unity. A value of unity would create an effective two-state model (with fixed bond angles), while a value of zero would make it very difficult for the gross pucker state to switch due to the barrier separating the two (this last statement assumes typical interaction potentials).
PUCKERSTEP_DI
This keyword applies to the second type of pucker sampling (see PKRRDFREQ) and controls the maximum step size for dihedral angles in degrees for the random stepwise excursions from the current state. It simultaneously applies to the problem of sugar pucker sampling (→ SUGARFREQ). In both cases, four of the seven freely sampled degrees of freedom are dihedral angles.
PUCKERSTEP_AN
This keyword applies to the second (stepwise) type of pucker sampling (see PKRRDFREQ) and controls the maximum step size for bond angles in degrees for the random stepwise excursions from the current state. Much like PUCKERSTEP_DI, this keyword simultaneously applies to the problem of sugar pucker sampling (→ SUGARFREQ). In both cases, two of the seven freely sampled degrees of freedom are bond angles and one bond angle is derived to correctly close the loop.
NUCFREQ
This keyword controls the frequency of all types of polynucleotide moves excepting those sampling just sidechain degrees of freedom. This set includes algorithms to sample stretches of polynucleotides with end-constraints (concerted rotation → NUCCRFREQ), dedicated algorithms to sample the constrained dihedral angles around the sugar bond (→ SUGARFREQ), and simple polynucleotide backbone pivot moves. The description below applies only to the latter type, which does not possess a dedicated keyword but is the default fall-through choice for this branch of the decision tree. Non-terminal nucleotides have six backbone degrees of freedom, one of which is not sampled by this type of move. Much like for proline, the rotation around the sugar bond is hindered and a dedicated algorithm is needed to sample this dihedral angle (→ SUGARFREQ). An overview of the backbone degrees of freedom for terminal and non-terminal nucleotides can be gleaned from the description of sequence input. Nucleotide pivot moves are physically analogous to polypeptide φ/ψ-moves in that they sample the backbone of a single nucleotide residue. The new conformation will imply the rotation of a lever arm, which will render large-scale perturbations very unlikely to be accepted (→ ALIGN). Technically, these moves are implemented slightly differently in that the number of sampled degrees of freedom may vary (→ NRNUC). This is to make it possible to fine-tune sampling efficiency. As with any move coupling the sampling of independent degrees of freedom blindly, efficiency will typically be unacceptably low for more than two backbone dihedral angles given a realistic interaction potential and the complicated topology of polynucleotides. In the future, these moves are intended to cover any type of non-polypeptide polymer, and the flexible setup was implemented partially with that in mind.
Expected numbers for all polynucleotide pivot moves may be calculated as follows:
NRSTEPS · (1.0 - PARTICLEFLUCFREQ) · (1.0 - RIGIDFREQ) · (1.0 - CHIFREQ) · (1.0 - CRFREQ) · (1.0 - OMEGAFREQ) · NUCFREQ · (1.0 - NUCCRFREQ) · (1.0 - SUGARFREQ)
Remember that NUCFREQ does not control the fraction of polynucleotide pivot moves directly but only sets the expected number for all polynucleotide moves. Note that the moves are additionally split up into those attempting to completely randomize the nucleotide backbone angles and those that attempt stepwise perturbations (→ NUCRDFREQ). The default picking probabilities for these pivot moves are flat on a per-residue basis. They can be adjusted by the preferential sampling utility, and this could become routinely relevant in future applications, for which other polymer types are subjected to pivot moves through this facility. In such a case, it would almost certainly be desirable to make the picking frequencies (at the very least) proportional to the number of backbone degrees of freedom in each residue, which may not necessarily be homogeneous.
NRNUC
This keyword allows the user to set the maximum number of nucleic acid backbone angles to be sampled within a pivot polynucleotide move. The dihedral angles will always come from the same residue. The implementation has the following features:
 Whenever NRNUC is equal to or larger than the number of backbone angles on a certain residue, all backbone angles on that residue will be sampled simultaneously.
 Whenever NRNUC is smaller than the number of backbone angles on a certain residue, on average NRNUC of the available angles should be sampled simultaneously. However, the actual average will be larger since always at least one angle has to be sampled (in other words, there is a stochasticity to the number of angles chosen, and the asymmetry is introduced by the constraint to always have at least one angle in the set).
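The exact scheme by which the subset of angles is picked is not spelled out here, but the at-least-one bias is easy to see in a simple model: suppose each of the n angles were included independently with probability NRNUC/n and empty selections were redrawn. This is an illustration only, not CAMPARI's actual algorithm:

```python
# Mean number of angles sampled under a hypothetical Bernoulli-inclusion
# scheme (each of n angles included with probability NRNUC/n), conditioned
# on at least one angle being included. Illustration only; not taken from
# the CAMPARI source.

def conditional_mean(n, nrnuc):
    p = nrnuc / n
    # E[X | X >= 1] = E[X] / P(X >= 1) for a Binomial(n, p) count X,
    # since the X = 0 outcome contributes nothing to E[X]
    return n * p / (1.0 - (1.0 - p) ** n)

# 6 backbone angles, NRNUC = 2: the unconditional mean is exactly 2, but
# the at-least-one constraint pushes the conditional mean above 2
m = conditional_mean(6, 2)
assert m > 2.0
```

Whatever the actual picking scheme, the same asymmetry applies: forbidding the empty set raises the average above the nominal NRNUC whenever NRNUC is smaller than the number of available angles.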
NUCRDFREQ
This keyword is completely analogous to PIVOTRDFREQ but applies to polynucleotide backbone pivot moves instead of φ/ψ-moves.
NUCSTEPSZ
This keyword is completely analogous to PIVOTSTEPSZ but applies to polynucleotide backbone pivot moves instead of φ/ψ-moves.
NUCCRFREQ
This keyword sets the fraction of exact nucleic acid concerted rotation (CR) moves amongst all nucleotide moves. Concerted rotation algorithms are provided both for polypeptides and polynucleotides and function generally analogously, although there are important implementation differences. Important general information for this type of move is provided elsewhere, along with parameters that apply to all variants of exact CR moves (such as UJCRBIAS, UJCRSTEPSZ, and UJCRWIDTH). The reader is referred to both the literature and the documentation on CR moves for polypeptides (→ CRFREQ and TORCRFREQ), in particular with regards to the interpretation of auxiliary keywords (NUCCRMIN and NUCCRMAX) and the handling of picking probabilities and their alteration by user-level constraints and preferential sampling weights. The general idea of a concerted rotation move is to sample a stretch of polymer without changing the absolute positions and relative orientation of the termini. Six degrees of freedom are required to solve this constrained problem. Note that for nucleic acid CR moves the rotation around the sugar bond (C4*-C3*) is always excluded from the algorithm (treated as a rigid segment). The order of angles is as follows:
 Any number of consecutive and permissible backbone dihedral angles immediately preceding nuc_bb_4 on residue i
 O5P-C5*-C4*-C3* (nuc_bb_4 on residue i)
 C4*-C3*-O3P-P (nuc_bb_5 on residue i)
 C3*-O3P-P-O5P (nuc_bb_1 on residue i+1)
 O3P-P-O5P-C5* (nuc_bb_2 on residue i+1)
 P-O5P-C5*-C4* (nuc_bb_3 on residue i+1)
 O5P-C5*-C4*-C3* (nuc_bb_4 on residue i+1)
Expected numbers for polynucleotide CR moves may be calculated as follows:
NRSTEPS · (1.0 - PARTICLEFLUCFREQ) · (1.0 - RIGIDFREQ) · (1.0 - CHIFREQ) · (1.0 - CRFREQ) · (1.0 - OMEGAFREQ) · NUCFREQ · NUCCRFREQ
The user is reminded again that some of the parameters required for this move type apply universally to all exact CR methods while some apply specifically to the nucleic acid variant.
SUGARFREQ
This keyword sets the fraction of polynucleotide backbone moves to selectively alter the dihedral angles around the sugar bond (C4*-C3*) amongst all polynucleotide moves not of the CR variety. Exactly analogous to the case for proline and similar cyclic residues in polypeptides (→ PKRFREQ), these rotations are hindered by the presence of the ring and cannot be sampled blindly. Moves of this type will therefore alter the pucker state of the sugar belonging to the chosen nucleotide and the backbone conformation of the polynucleotide (including lever arm) simultaneously. The expected number may be calculated as follows:
NRSTEPS · (1.0 - PARTICLEFLUCFREQ) · (1.0 - RIGIDFREQ) · (1.0 - CHIFREQ) · (1.0 - CRFREQ) · (1.0 - OMEGAFREQ) · NUCFREQ · (1.0 - NUCCRFREQ) · SUGARFREQ
The approach chosen to sample sugars is identical to the one for proline. There are two basic move types, one which inverts the pucker state by flipping the sign of two dihedral angles, and a second one which perturbs the bond angles and dihedral angles defining the five-membered ring by small random increments while maintaining bond lengths exactly (→ SUGARRDFREQ). The default picking probabilities for this move type are even for all eligible, sugar-containing residues. They can be adjusted by the preferential sampling utility. An example application could be to preferentially sample sugars close to the binding interface of a well-defined protein-DNA complex rather than those in the rigid portion of the DNA.
SUGARRDFREQ
This keyword is exactly analogous to PKRRDFREQ but applies to sugar pucker moves in polynucleotides instead of to polypeptide pucker moves.
CHIFREQ
Most biologically relevant polymers possess at least minor branches off the main chain. These sidechains are typically short and usually encode the alphabet underlying, for instance, polypeptides and polynucleotides. From a technical point of view, such short branches are much easier to sample than the backbone of a polymer, since a change in conformation of the branch only affects the branch itself (lever arm effects are minimal and the assumed direction is always from the main chain outward towards the end of the branch). Since the perturbation is local, energy evaluations are much less costly and acceptance rates are generally higher. There is no need for advanced algorithms, and simple pivot-style moves resetting or perturbing the dihedral angles in such a sidechain branch are sufficient to explore phase space. This keyword sets the fraction of all sidechain moves, including a specialized move type used for analysis only (→ PHFREQ). Expected numbers for actual sampling moves (denoted as χ-moves) are:
NRSTEPS · (1.0 - PARTICLEFLUCFREQ) · (1.0 - RIGIDFREQ) · CHIFREQ · (1.0 - PHFREQ)
And for moves trying to determine the pK-values of ionizable polypeptide sidechains:
NRSTEPS · (1.0 - PARTICLEFLUCFREQ) · (1.0 - RIGIDFREQ) · CHIFREQ · PHFREQ
Note that the former are decomposed further into those randomizing the contributing degrees of freedom and those applying stepwise perturbations (→ CHIRDFREQ). The default picking probabilities for this move type give equal weight to all residues with at least one χ-angle, independent of the number of χ-angles. This can be adjusted by the preferential sampling utility, which as an example would allow making all residue picking probabilities directly proportional to the number of χ-angles for each residue.
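The difference between the default (uniform over eligible residues) picking probabilities and the χ-angle-proportional alternative mentioned above can be illustrated with a small sketch; the residue names and χ-angle counts are made up for the example:

```python
# Illustrative residue χ-angle counts (made up for this example; not read
# from any CAMPARI data structure).
nchi = {"SER": 1, "LEU": 2, "LYS": 4, "ALA": 0}

# Default behavior: equal weight for every residue with at least one χ-angle;
# ALA (no χ-angle) is ineligible.
eligible = [r for r, n in nchi.items() if n > 0]
default_p = {r: 1.0 / len(eligible) for r in eligible}

# Preferential alternative: weight proportional to the number of χ-angles.
total = sum(nchi[r] for r in eligible)
pref_p = {r: nchi[r] / total for r in eligible}
```

Under the proportional scheme, LYS (four χ-angles) is picked four times as often as SER (one χ-angle), whereas the default picks each eligible residue with probability 1/3.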
CHIRDFREQ
This keyword is completely analogous to PIVOTRDFREQ but applies to χ-moves instead of φ/ψ-moves.
CHISTEPSZ
This keyword is completely analogous to PIVOTSTEPSZ but applies to χ-moves instead of φ/ψ-moves.
NRCHI
Many sidechains have different numbers of χ-angles and the complexity of a move would depend on the number of such angles sampled concurrently. Therefore, this keyword allows the user to set the maximum number of χ-angles to be sampled within a sidechain move. The dihedral angles will always come from the same sidechain on the same residue. Analogously to NRNUC, the implementation has the following features:
 Whenever NRCHI is equal to or larger than the number of χ-angles on a certain residue, all χ-angles on that residue will be sampled simultaneously.
 Whenever NRCHI is smaller than the number of sidechain angles on a certain residue, on average NRCHI of the available angles should be sampled simultaneously. However, the actual average will be larger since always at least one angle has to be sampled (in other words, there is a stochasticity to the number of angles chosen, and the asymmetry is introduced by the constraint to always have at least one angle in the set).
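The at-least-one asymmetry described above can be demonstrated with a toy simulation. Note that the selection scheme sketched here (each angle picked independently with probability NRCHI/n, with a forced minimum of one) is only a plausible reading of the documentation, not CAMPARI's actual implementation:

```python
import random

def pick_angles(navail, nrchi, rng):
    # Each angle is chosen with probability NRCHI/navail; if none is chosen,
    # one is forced (the at-least-one constraint). This selection scheme is
    # an assumption for illustration, not CAMPARI's actual implementation.
    p = min(1.0, nrchi / navail)
    chosen = [i for i in range(navail) if rng.random() < p]
    if not chosen:
        chosen = [rng.randrange(navail)]
    return chosen

rng = random.Random(1)
navail, nrchi, trials = 4, 1, 200_000
avg = sum(len(pick_angles(navail, nrchi, rng)) for _ in range(trials)) / trials
# The realized average exceeds the nominal NRCHI because of the constraint
# (analytically, for this scheme: 1 + (3/4)^4 ≈ 1.316).
```

The simulation illustrates why the actual average is larger than NRCHI whenever NRCHI is smaller than the number of available angles.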
OTHERFREQ
MC move sets are highly specialized tools that have to reflect the choice of the system's degrees of freedom, its density, etc. Some of the choices enforced by the "standard" CAMPARI move sets and mandated by the default parameterization of the ABSINTH implicit solvent model are somewhat arbitrary. This is primarily an issue for degrees of freedom describing rotations around electronically hindered bonds and for rotations around terminal bonds between heavy atoms (methyl and ammonium spins). For example, the amide bond in secondary amides is allowed to vary with dedicated moves, but these are not available for primary amides (the reasoning behind this is connected to the vanishing relevance of cis/trans isomerization in the latter case). However, these choices may not always be desirable. Second, when attempting to simulate entities that CAMPARI does not support natively, the majority of "standard" move types may not be available (exceptions apply if the entities are recognized as conforming to a supported biopolymer type). This would limit simulations containing such entities to pure rigid-body sampling. To address both issues, CAMPARI offers a separate class of dihedral angle pivot moves that can be applied to any freely rotatable torsion angle in any of the system's components. There is a requirement that the Z-matrix be constructed such that only a single Z-matrix angle needs to be edited to describe the perturbation, and this is true for all candidate dihedral angles in residues supported natively by CAMPARI that are frozen by default (e.g., the C-N bond in the lysine sidechain, all C-N bonds in primary amides, the CA-CB bond in alanine, and so on). For unsupported residues, the Z-matrix is inferred from the input structure, and it may require some reordering of atoms to achieve the desired results (see a tutorial relevant in this context).
In addition, these moves can also sample torsional degrees of freedom supported by other move sets as long as they fulfill the Zmatrix criterion (this currently excludes the polypeptide φ/ψangles, which are supported by the widest range of specialized move sets).
In terms of parameters, some care has to be taken that torsional potentials describing electronic effects (e.g., in primary amides) are included. Technically, moves of this type are unique in that they always sample only a single degree of freedom. Chain alignment works slightly differently for these moves. Specifically, for options 3 and 4, the number of atoms (rather than the number of residues) moving is critical in determining alignment. Also, all degrees of freedom are eligible for an inverted alignment, including sidechain degrees of freedom. Even for option 3, this may consequently lead to the absence of a "base of motion" that would stay rigorously in place in the absence of rigid body moves. For option 2, CAMPARI attempts to preserve a well-defined base of motion at the C-terminus, but this may not work as expected, in particular for polynucleotides and/or very short chains.
To calculate the number of all expected moves of type OTHER, use:
NRSTEPS · (1.0 - PARTICLEFLUCFREQ) · (1.0 - RIGIDFREQ) · (1.0 - CHIFREQ) · (1.0 - CRFREQ) · (1.0 - OMEGAFREQ) · (1.0 - NUCFREQ) · (1.0 - PKRFREQ) · OTHERFREQ
Note that these moves are additionally split up into three basic types (see OTHERUNKFREQ and OTHERNATFREQ for choosing different subsets of degrees of freedom), each of which is again split into two variants, i.e., those completely randomizing the dihedral angle and those that attempt stepwise perturbations (→ OTHERRDFREQ). The default picking probabilities for OTHER moves are different from other move types in CAMPARI, since they are identical for all eligible degrees of freedom (and not identical for all residues containing at least one eligible degree of freedom). For each subcategory of degrees of freedom, sampling weights can be adjusted individually with the preferential sampling utility. Details and examples are given for the individual subcategories.
OTHERUNKFREQ
If single dihedral angle pivot (OTHER) moves are in use, and if the simulation utilizes entities (residues, molecules) that are not natively supported by CAMPARI, this keyword allows the user to choose the bulk sampling weight for degrees of freedom in those unsupported residues. The use of unsupported residues in simulations is explained in a dedicated tutorial. To calculate the number of expected moves acting on single dihedral angles in unsupported residues, use:
NRSTEPS · (1.0 - PARTICLEFLUCFREQ) · (1.0 - RIGIDFREQ) · (1.0 - CHIFREQ) · (1.0 - CRFREQ) · (1.0 - OMEGAFREQ) · (1.0 - NUCFREQ) · (1.0 - PKRFREQ) · OTHERFREQ · OTHERUNKFREQ
As mentioned above, these moves are additionally split up into two subtypes, i.e., those completely randomizing the dihedral angle and those that attempt stepwise perturbations (→ OTHERRDFREQ). The default picking probabilities for OTHER moves are different from other move types in CAMPARI, since they are identical for all eligible degrees of freedom (and not identical for all residues containing at least one eligible degree of freedom). They can be adjusted at the level of individual degrees of freedom by the preferential sampling utility. As an example, this can be useful when sampling an unsupported polymer (e.g., a polyester) and greater sampling emphasis should be placed on backbone degrees of freedom.
OTHERNATFREQ
If single dihedral angle pivot (OTHER) moves are in use, and if not all OTHER moves are consumed on unsupported residues (→ OTHERUNKFREQ), this keyword allows the user to choose the bulk sampling weight amongst remaining OTHER moves for degrees of freedom that are supported natively by CAMPARI. To calculate the number of expected moves acting on single dihedral angles supported natively, use:
NRSTEPS · (1.0 - PARTICLEFLUCFREQ) · (1.0 - RIGIDFREQ) · (1.0 - CHIFREQ) · (1.0 - CRFREQ) · (1.0 - OMEGAFREQ) · (1.0 - NUCFREQ) · (1.0 - PKRFREQ) · OTHERFREQ · (1.0 - OTHERUNKFREQ) · OTHERNATFREQ
This keyword also controls the fraction of moves acting on dihedral angles frozen by default, but located in residues supported natively by CAMPARI. Compute expected number as:
NRSTEPS · (1.0 - PARTICLEFLUCFREQ) · (1.0 - RIGIDFREQ) · (1.0 - CHIFREQ) · (1.0 - CRFREQ) · (1.0 - OMEGAFREQ) · (1.0 - NUCFREQ) · (1.0 - PKRFREQ) · OTHERFREQ · (1.0 - OTHERUNKFREQ) · (1.0 - OTHERNATFREQ)
Both subclasses are additionally split up into two subtypes, i.e., those completely randomizing the dihedral angle and those that attempt stepwise perturbations (→ OTHERRDFREQ). The default picking probabilities for OTHER moves are different from other move types in CAMPARI, since they are identical for all eligible degrees of freedom (and not identical for all residues containing at least one eligible degree of freedom). They can be adjusted at the level of individual degrees of freedom by the preferential sampling utility. For the natively supported degrees of freedom, this could be useful in order to aid sampling of backbone degrees of freedom, whereas for the natively frozen degrees of freedom it could be used to selectively enable a few of those degrees of freedom (e.g., enable flexibility of arginine sidechains, but keep suppressing the methyl spins in hydrophobic residues).
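The three-way split controlled by OTHERUNKFREQ and OTHERNATFREQ can be checked numerically. The helper below is illustrative only (not CAMPARI code); its argument names mirror the keywords:

```python
# Illustrative helper splitting the expected number of OTHER moves into the
# three subcategories described in the documentation; the function itself is
# an assumption for demonstration, not part of CAMPARI.
def other_subcategory_counts(n_other, otherunkfreq, othernatfreq):
    unsupported = n_other * otherunkfreq
    native = n_other * (1.0 - otherunkfreq) * othernatfreq
    frozen = n_other * (1.0 - otherunkfreq) * (1.0 - othernatfreq)
    return unsupported, native, frozen

# Example: 50000 expected OTHER moves, 40% to unsupported residues, 75% of
# the remainder to natively supported (unfrozen) degrees of freedom.
counts = other_subcategory_counts(50_000, 0.4, 0.75)
```

By construction, the three subcategories exhaust the OTHER budget (their sum equals the 50000 moves in this example).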
OTHERRDFREQ
This keyword is completely analogous to PIVOTRDFREQ but applies to all moves of type OTHER instead of polypeptide backbone pivot moves.
OTHERSTEPSZ
This keyword is completely analogous to PIVOTSTEPSZ but applies to all moves of type OTHER instead of polypeptide backbone pivot moves.
CRFREQ
This keyword is a global frequency setting which controls an entire branch of Monte Carlo moves, all sharing the feature that they are of the concerted rotation (CR) type and apply to polypeptides. The general idea of a CR move is to sample a stretch of polymer without changing the absolute positions and relative orientation of the termini. Six degrees of freedom are required to solve this constrained problem exactly, but simpler methods exist that use more degrees of freedom to solve it approximately (→ CRMODE). The reader is referred to NUCCRFREQ for CR moves on polynucleotides. There are four different types of CR moves for polypeptides provided in CAMPARI:
 Exact CR moves utilizing both bond angles and dihedral angles along the polypeptide backbone to solve the closure problem exactly given fixed end constraints: these moves are based on the work of Ulmschneider and Jorgensen (→ ANGCRFREQ). (reference)
 Exact CR moves utilizing φ-, ψ-, and ω-angles along the polypeptide backbone to solve the closure problem exactly given fixed end constraints: these moves are primarily based on the work of Dinner (→ TORCRFREQ and TORCROFREQ). (reference)
 Exact CR moves utilizing just φ- and ψ-angles along the polypeptide backbone to solve the closure problem exactly given fixed end constraints: these moves are also based on the work of Dinner (→ TORCRFREQ and TORCROFREQ).
 Inexact CR moves utilizing just φ- and ψ-angles along the polypeptide backbone to approximate a solution to the closure problem by linear response: these moves are based on the work of Favrin, Irbäck, and Sjunnesson (default fallthrough for this branch). (references)
The general appeal of exact CR methods partially lies in the reduced complexity of energy evaluations, since the move only perturbs the conformation locally and large parts of the polymer (assuming sufficient length) will remain static with respect to each other. This is never true for pivot-type moves applied to residues at the center of the chain. The other aspect which makes CR moves appealing is that they introduce correlation into the MC move set (the reader is referred to Vitalis and Pappu for further reading).
To compute expected numbers, use (same numbering as above):
 NRSTEPS · (1.0 - PARTICLEFLUCFREQ) · (1.0 - RIGIDFREQ) · (1.0 - CHIFREQ) · CRFREQ · ANGCRFREQ
 NRSTEPS · (1.0 - PARTICLEFLUCFREQ) · (1.0 - RIGIDFREQ) · (1.0 - CHIFREQ) · CRFREQ · (1.0 - ANGCRFREQ) · TORCRFREQ · TORCROFREQ
 NRSTEPS · (1.0 - PARTICLEFLUCFREQ) · (1.0 - RIGIDFREQ) · (1.0 - CHIFREQ) · CRFREQ · (1.0 - ANGCRFREQ) · TORCRFREQ · (1.0 - TORCROFREQ)
 NRSTEPS · (1.0 - PARTICLEFLUCFREQ) · (1.0 - RIGIDFREQ) · (1.0 - CHIFREQ) · CRFREQ · (1.0 - ANGCRFREQ) · (1.0 - TORCRFREQ)
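The four branch formulas can be evaluated together to verify that they exhaust the total CR budget. Again, the helper below is illustrative (not CAMPARI code); the argument names mirror the keywords:

```python
# Illustrative helper computing expected counts for the four polypeptide CR
# branches; keyword names come from the documentation, the function itself is
# an assumption for demonstration.
def cr_branch_counts(nrsteps, particleflucfreq, rigidfreq, chifreq, crfreq,
                     angcrfreq, torcrfreq, torcrofreq):
    base = (nrsteps * (1.0 - particleflucfreq) * (1.0 - rigidfreq)
            * (1.0 - chifreq) * crfreq)
    ang = base * angcrfreq                                   # bond angle (UJ) CR
    tor_omega = base * (1.0 - angcrfreq) * torcrfreq * torcrofreq
    tor = base * (1.0 - angcrfreq) * torcrfreq * (1.0 - torcrofreq)
    inexact = base * (1.0 - angcrfreq) * (1.0 - torcrfreq)  # Favrin fallthrough
    return ang, tor_omega, tor, inexact

# Example: 10^6 steps, 10% rigid-body, 20% sidechain, 10% CR moves, split
# 50/50 between angular and torsional, 60% of the latter being exact.
counts = cr_branch_counts(1_000_000, 0.0, 0.1, 0.2, 0.1, 0.5, 0.6, 0.5)
```

Because the sub-frequencies partition the CR branch, the four counts sum to the base CR budget (72000 moves in this example).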
ANGCRFREQ
This keyword selects the (sub)fraction of Ulmschneider-Jorgensen (UJ) CR moves (see J. Chem. Phys. 118 (9), pp. 4261-4271 (2003)) according to the formulas shown above. Like any other exact CR move implemented in CAMPARI, UJ-CR moves combine two strategies for efficient conformational sampling: the approach of Favrin et al. (→ CRMODE) is used to obtain a variable-length prerotation which biases the end of the prerotation segment to a position with a high chance of having at least one real solution when attempting to close it. The closure problem is solved exactly using a numerical root search for an algebraically transformed equation for the following six degrees of freedom:
 Dihedral angle C_{i-2}, N_{i-1}, C_{α,i-1}, C_{i-1} (φ_{i-1})
 Bond angle N_{i-1}, C_{α,i-1}, C_{i-1}
 Dihedral angle N_{i-1}, C_{α,i-1}, C_{i-1}, N_{i} (ψ_{i-1})
 Bond angle C_{α,i-1}, C_{i-1}, N_{i}
 Bond angle C_{i-1}, N_{i}, C_{α,i}
 Dihedral angle C_{i-1}, N_{i}, C_{α,i}, C_{i} (φ_{i})
 The chain closure algorithm relies on a search process to locate roots of a complicated equation, which necessitates repeated matrix operations that generate considerable computational overhead for a single UJ-CR move. This is true for all exact CR methods, and much more so for the exact torsional variants than for UJ-CR moves (→ TORCRFREQ).
 The inclusion of bond angles in the prerotation stretch is not a particularly useful extension but is required for reasons of ergodicity. Additional parameters are needed to manage this aspect properly (→ UJCRSCANG). The inclusion of bond angles in the closure segments simplifies the root search procedure by eliminating branches of solution space and generally reducing the number of possible solutions. This makes the algorithm faster than comparable methods using dihedral angles only. However, varying bond angles causes two crucial issues:
 Allowing bond angles to change violates CAMPARI's typical paradigm of fixed geometry in MC calculations and therefore might invalidate some of the force field calibration done under this assumption. In general, it is very important to match the degrees of freedom chosen for the calibration phase of a force field with those chosen for the application phase. The commonly held belief that the introduction of constraints does not alter the positions and relative weights of basins but merely influences barriers in the free energy landscape is not correct.
 CAMPARI currently has no way of independently sampling bond angles in Monte Carlo simulations. This means that effectively a subset of all bond angles is introduced as new degrees of freedom, for which there is no a priori justification whatsoever (in other words: selectively sampling a few bond angles makes unjustified assumptions about the remaining bond angles). It is therefore recommended to use this feature with the utmost caution until a more sound implementation surrounding it is added. Presently, it may be most suitable as part of the MC move set in hybrid runs (see DYNAMICS) employing Cartesian sampling in the dynamics portions (see CARTINT), although this approach has its own caveats.
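The one-dimensional scan-plus-bisection root search underlying the closure step (see also UJCRSTEPSZ and UJCRINTERVAL) can be sketched generically. The test function, interval, and tolerances below are arbitrary illustrations, not CAMPARI's actual target function:

```python
def scan_and_bisect(f, lo, hi, step, tol=1e-10):
    # Step through [lo, hi] with a fixed step size; whenever f changes sign
    # across a step, bisect that bracket down to a root. This mirrors the
    # generic strategy described in the text, not CAMPARI's implementation.
    roots = []
    x = lo
    while x < hi:
        a, b = x, min(x + step, hi)
        fa, fb = f(a), f(b)
        if fa * fb < 0.0:
            while b - a > tol:
                m = 0.5 * (a + b)
                if fa * f(m) <= 0.0:
                    b = m
                else:
                    a, fa = m, f(m)
            roots.append(0.5 * (a + b))
        x += step
    return roots

# With a small step, the single root of x^2 - 2 on [0, 3] is located; too
# coarse a step can straddle a lobe of the function and miss sign changes
# entirely, which is the failure mode discussed for large UJCRSTEPSZ values.
roots = scan_and_bisect(lambda x: x * x - 2.0, 0.0, 3.0, 0.05)
```

The trade-off is exactly the one the documentation describes: larger steps make the scan cheaper but increase the fraction of attempts in which no solution is found.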
TORCRFREQ
Aside from the UJ-CR moves which employ bond angles (see ANGCRFREQ), analogous methods have been formulated to instead employ exclusively dihedral angles in both the closure and prerotation stretches. This keyword sets the frequency with which both subtypes of those moves occur during the simulation according to the formulas listed above. The preceding discussion has outlined the appeal of exact CR methods and it is not repeated here. Much like Ulmschneider and Jorgensen, CAMPARI employs a hybrid scheme of biased prerotations according to Favrin et al. (see CRMODE) and of exact closures according to Dinner. The latter half of the algorithm is the cost-intensive one. The algebraically transformed equation requires a numerical root search, for which we use a modified Newton scheme outlined below. Typically, multiple solutions need to be found, and a careful weighting and bias-removal strategy has to be employed to choose solutions with the proper probabilities (→ TORCRMODE). Those comments apply equally to exact polynucleotide CR moves (see NUCCRFREQ). For polypeptides, there are two variants available which differ in which peptide torsions are used to close the chain (described below). Note that proline (or any other cyclic residue with constrained flexibility around any of the backbone dihedral angles) causes additional problems. In theory, one could formulate algebraic solutions which skip the proline φ-torsion. Since the number and positions of proline residues in the closure stretch are not known a priori, this appears impractical. We therefore provide a coupling to (weakly biased and simplified) pucker moves (see PKRFREQ) which will simultaneously determine and propose a new pucker state while solving the chain closure problem. This means that:
 Sampling of the φangle becomes coupled to the proline sidechain conformation (as it should be).
 The acceptance rate for CR moves will be significantly lower due to the extra degrees of freedom included.
 The sampling of the sidechain conformation will be weakly biased towards proper pucker states. In detail, some of the proposed closures will yield φ-angle values incompatible with sidechain closure, and those will be discarded. For those which yield a sane φ-angle, a corresponding χ_{1}-value is proposed with a bias toward closable states. One of two free bond angles is perturbed slightly in random fashion and the last one is given by the closure as usual.
 Due to the above, it will be advantageous to not rely overly on CR sampling for proline-rich systems, both for reasons of efficiency and accuracy. Conversely, it should be difficult to find a statistically significant impact of the sampler on global chain properties for polypeptides with low proline content.
TORCROFREQ
This keyword lets the user set the fraction amongst exact, torsional polypeptide CR moves that include ω-angles in the formulation of the closure problem. Conversely, the remaining moves will use only φ/ψ-angles to close the chain. Expected numbers for either type are listed above. In detail, the ω-variant uses the following six degrees of freedom:
 Dihedral angle C_{α,i-2}, C_{i-2}, N_{i-1}, C_{α,i-1} (ω_{i-1})
 Dihedral angle C_{i-2}, N_{i-1}, C_{α,i-1}, C_{i-1} (φ_{i-1})
 Dihedral angle N_{i-1}, C_{α,i-1}, C_{i-1}, N_{i} (ψ_{i-1})
 Dihedral angle C_{α,i-1}, C_{i-1}, N_{i}, C_{α,i} (ω_{i})
 Dihedral angle C_{i-1}, N_{i}, C_{α,i}, C_{i} (φ_{i})
 Dihedral angle N_{i}, C_{α,i}, C_{i}, N_{i+1} (ψ_{i})
Conversely, for the non-ω-variant we have:
 Dihedral angle C_{i-3}, N_{i-2}, C_{α,i-2}, C_{i-2} (φ_{i-2})
 Dihedral angle N_{i-2}, C_{α,i-2}, C_{i-2}, N_{i-1} (ψ_{i-2})
 Dihedral angle C_{i-2}, N_{i-1}, C_{α,i-1}, C_{i-1} (φ_{i-1})
 Dihedral angle N_{i-1}, C_{α,i-1}, C_{i-1}, N_{i} (ψ_{i-1})
 Dihedral angle C_{i-1}, N_{i}, C_{α,i}, C_{i} (φ_{i})
 Dihedral angle N_{i}, C_{α,i}, C_{i}, N_{i+1} (ψ_{i})
The need for different implementations arises because the problems differ algebraically, and because the stiffness of the ω-bond may make those moves using the ω-bonds in the closure particularly ineffective. This is not the only reason, however, to favor the non-ω-variant, which is also better-behaved in terms of reliably finding solutions to the closure. Note that several diagnostics of the performance of exact CR methods are reported during the simulation and after its completion in the logfile.
CRMODE
This defines the mode to use for concerted rotation moves roughly according to the Favrin et al. reference: J. Chem. Phys. 114 (18), 8154-8158 (2001). In general, this type of move attempts to introduce correlation into an MC move by coupling several consecutive backbone angles (only φ/ψ are considered) together to minimize a cost function, which in this case is the difference between the position of the last atom in the stretch and its original position. Larger biases lead to smaller moves and higher acceptance. More often than not, this algorithm suffers from its computational inefficiency. Because the loop is only approximately closed, energy evaluations of high complexity (even more expensive than for a pivot move) are necessary. It is not recommended to use moves of this type extensively. There are two modes available:
 A matrix relating changes in the degrees of freedom to changes in the cost function (dr/dφ) is computed by considering effective lever arms. In this implementation, six effective restraints are imposed through the three reference atoms (N, C_{α}, C) on the residue following the last one of those whose torsions are sampled (note, though, that algorithmically all nine Cartesian positions are used). Note that this mode therefore requires an additional buffer residue at the C-terminus. Specifically, sampling is possible only within an interval from the third residue (in addition to the ineligible terminal residues, there is a symmetry-creating N-terminal buffer residue as well) to the third-to-last residue in each polypeptide chain. In that sense, these moves are trivially nonergodic since they fail to sample a subset of the chosen degrees of freedom (i.e., those within terminal residues).
 The dr/dφ matrix is computed by nested rotation matrices (propagating changes via matrix multiplication). This directly accounts for peptide geometry within the reference atoms and yields six actual restraints. Here, the reference atoms are C_{α}, C, and O on the last residue of which torsions are to be sampled. The implementation with nested rotation matrices is costlier and this mode is only marginally supported, i.e., offers very limited adaptability through the keywords below.
CRDOF
If inexact concerted rotation moves for polypeptides are in use (→ CRMODE), this keyword allows the user to provide the exact number of torsions to use each time such a move is performed. The default value is eight, but a different number may be chosen as long as the chain is long enough to accommodate these moves. A minimum of seven degrees of freedom applies since the linear equations are otherwise overdetermined and only trivial solutions are (asymptotically) found. Note that this keyword is only supported if CRMODE is set to 1. Extensions of this to support mode 2 or to allow random, variable lengths during the simulations are currently not anticipated. This is due to the overall inefficiency of the Favrin et al. approach (see discussion here).
CRWIDTH
This keyword gives the standard deviation in radians of the random normal distribution underlying inexact concerted rotation moves for polypeptides (→ CRMODE), from which the (unbiased) displacement vectors are implicitly drawn. This corresponds to parameter "a" in the reference but is specified here as its inverse (a = 1/CRWIDTH). Note that the actual resultant distribution width is only set by this keyword if the bias toward minimizing the cost function is zero. If the latter is nonzero, the resultant distribution width will be co-controlled by the setting for CRBIAS. Note that only values up to π/2 may be specified to avoid wraparound artifacts which may upset the procedure of removing the bias from these moves.
CRBIAS
This keyword specifies the strength of the bias for inexact concerted rotation moves for polypeptides (→ CRMODE) and corresponds to parameter "b" in the reference. It essentially controls how close the end of the rotated segment will end up to its original position (satisfying the restraints). Unfortunately, this also co-regulates the step size, hence there is a need for parameter optimization (i.e., the variance of the resultant biased distribution cannot be controlled easily). Intuitively, the reason is that, in a linear response-type theory, tiny step sizes always represent one way of satisfying the restraint. Note that with a choice of zero for this keyword, these inexact CR moves relax to random pivot moves of multiple residues in a row (→ CRDOF) with a sampling width controlled by CRWIDTH. Conversely, when choosing very large numbers for this keyword, it should be kept in mind that the evaluation of the acceptance criterion requires inclusion of an exponential factor, exp[ -(Δφ^{T} A Δφ) + (Δφ'^{T} A' Δφ') ]. Here, the primed quantities are for the reverse move. Matrix A is diagonal if this keyword is set to zero, which implies A = A', and the bias correction is unity. For large values of CRBIAS, the two elements within the exponential become disparate in magnitude very quickly and the exponential may exceed numerical limits even for double-precision variables. This may cause some compilers to throw exceptions. Note that the complete bias correction formula includes the determinant of matrix A as well.
UJCRBIAS
Despite its name, this keyword regulates the biasing strength for the prerotation steps in all exact CR methods, i.e., nucleic acid CR moves, UJ-CR moves, and both types of exact polypeptide CR moves (→ ANGCRFREQ, TORCRFREQ, and NUCCRFREQ). The strength of the bias controls how close the end of the prerotation segment remains to its original position, hence improving the chances for successful closure. This parameter is strongly co-dependent with the default distribution width in the absence of any bias (→ UJCRWIDTH). This keyword is analogous to CRBIAS in the Favrin et al. scheme and is called "c2" in the UJ reference. It should be stressed that all caveats outlined above apply here as well.
UJCRWIDTH
Despite its name, this keyword regulates the general (in the absence of bias) width of the distribution (in degrees) sampled in the prerotation segment for all exact CR methods (→ ANGCRFREQ, TORCRFREQ, and NUCCRFREQ). As in the Favrin et al. scheme (which is practically embedded in all exact CR methods implemented in CAMPARI), the resultant width is co-dependent on the bias factor (see UJCRBIAS and, for comparison, CRBIAS and CRWIDTH). It corresponds to "1/c1" in the UJ reference, and therefore larger values give wider distributions.
UJCRSTEPSZ
The chain closure algorithm works in most exact CR implementations by reducing a multidimensional variable search to a 1D root search, which is then solved by some form of step-through protocol and subsequent bisection. This keyword allows the user to choose the step size for that root search in degrees for all exact CR methods. Currently, the UJ-CR method (→ ANGCRFREQ) uses a simple, nonadaptive stepping protocol (see also UJCRINTERVAL). Larger step sizes there increase the speed of the algorithm significantly, but also increase the fraction of attempts in which no solution is found at all (a quantity reported at the end of the logfile). The value recommended by the authors is 0.05°. Conversely, the exact torsional CR methods for both polypeptides and polynucleotides (→ TORCRFREQ and NUCCRFREQ) employ a modified Newton scheme to map out the complete solution space in three hierarchical steps. In those cases, this keyword merely defines the largest step size to ever be used (i.e., if target function and derivative indicate that no root is near, the step size is not adjusted to very large values but instead to the value given by this keyword). For these methods, a setting of around 1.0 appears much more appropriate. In the future, the implementation of the UJ-CR method may be adjusted to use the same protocol as the torsional methods. For clarity, it shall be repeated that this keyword applies to all exact CR methods (but is inapplicable to inexact CR moves: → CRMODE). It is very important to understand that the numerical root search will invariably be unreliable, i.e., that there are conformations for which the function may be approaching zero asymptotically while also approaching imaginary solution space. This implies that with such a technique, it will be nearly impossible to eliminate all biases rigorously, although it will be possible to reduce their amplitude below that of statistical noise, even when the settings are such that satisfactory computational efficiency is provided (which of course is a crucial element to consider for expensive algorithms such as exact CR methods).
UJCRMIN
Specifically for the bond angle-based Ulmschneider-Jorgensen algorithm (→ ANGCRFREQ), this specifies the minimum requested length (in terms of number of residues) for the prerotation segment in the implementation. Note that if no molecule in the system is at least UJCRMIN+4 residues long (two for closure, two terminal buffer residues that can be caps), CR moves will be disabled entirely. Due to the problems outlined above, this suboptimal implementation has not yet been improved. Note that UJCRMIN and UJCRMAX are analogous to keywords TORCRMIN_DO and TORCRMAX_DO, but use residue numbers instead of numbers of degrees of freedom. Another restriction is that, unlike for TORCRMIN_DO and analogous keywords, UJCRMIN is enforced strictly, i.e., candidate residues are only those that provide the correct padding on either side (for the exact, torsional variants, the specified minimum padding is generally adjusted to the absolute minimum for stretches that would otherwise be too short). Therefore, the implementation of the angular UJ-CR moves generally offers less flexibility.
UJCRMAX
Specifically for the bond angle-based Ulmschneider-Jorgensen algorithm (→ ANGCRFREQ), this keyword specifies the maximum requested length (in numbers of residues) for the prerotation segment in those moves. Note that this parameter is automatically reduced if a move is attempted for a molecule which is too short to allow the full range of segment lengths (but long enough to satisfy UJCRMIN, of course). This will make it difficult to predict the resultant distribution of prerotation segment lengths (compare TORCRMIN_DO).
UJCRINTERVAL
Specifically for the bond angle-based Ulmschneider-Jorgensen algorithm (→ ANGCRFREQ), this keyword lets the user choose the size of the search interval for the one-dimensional root search (see UJCRSTEPSZ). The algebraically isolated degree of freedom is scanned over the interval [φ-d; φ+d], where φ is the original value and d is the (half-)interval size specified by this keyword. The recommended value is 20.0°. Note that this implementation is unique to the bond angle UJ-CR method and offers much reduced overhead cost per CR move compared to the exhaustive search performed by exact torsional methods. The efficiency and justifiability of the method both rely on the crucial assumption that, given a typical prerotation, approximately one solution will be found in the scanned interval. If the number of solutions is often zero or larger than one, the algorithm violates detailed balance and the resultant distributions will be strongly biased. It is generally recommended to analyze the performance of the algorithm beforehand by checking for proper Boltzmann weights in the distributions of both torsional and angular degrees of freedom. This is most easily and meaningfully done employing only bond angle potentials (→ SC_BONDED_A) but no other terms in the Hamiltonian. Then, the distributions of the dihedral angles must be flat, and those for the angular degrees of freedom must be such that -k_{B}T·ln(p(α)) equals the acting bond angle potential on α.
UJCRSCANG
This keyword applies exclusively to the bond anglebased UlmschneiderJorgensen CR algorithm for polypeptides (→ ANGCRFREQ). It lets the user set a scaling factor to reduce the magnitude of prerotation perturbations of bond angle degrees of freedom (in the absence of prerotation bias, resultant width will be proportional to UJCRWIDTH·UJSCRANG → values less than unity are desirable). Large perturbations on those bond angles would reduce the efficacy of the method considerably due to the stiff potentials typically used to keep bond angles in the valid regimes. Note that the UJCR method never considers ωangles for conformational sampling and that they are consequently excluded from prerotation sampling in their entirety. This is a bit of an arbitrary choice  in particular when considering the problems introduced by the bond angle sampling in the first place (discussion here)  and remedied in exact but purely torsional CR methods (→ TORCRFREQ). The parameter specified here corresponds to "1/c3" in the UJ reference.TORCRMODE
Unlike standard MC moves (such as φ/ψpivot moves), exact CR methods do not constitute an ergodic move set beyond the subspace satisfying the constraint (which is of course invariant toward sampling on that manifold). This necessitates mixing exact CR moves with other types of moves to achieve sampling of the entire phase space. Moreover, they solve an analytical problem numerically with finite error rate, i.e., not all solutions are always found. If these errors are dependent on the "position" of the constraint, i.e., on polymer conformation, the resultant sampling is biased even though Jacobian corrections are applied. This small bias is nearly impossible to remove entirely. CAMPARI supports two implementations for exact, torsional CR methods: When set to 1, at each step, a superset of solutions is created containing the original solution, a set of alternative closures given the original prerotation state, and a set of new conformations with a given, altered prerotation state and a set of closures for that altered state. For each solution, the Jacobian determinants with respect to the closure constraint and the prerotation constraint are evaluated, multiplied, and a solution is picked using the net Jacobian as a weight factor. The chosen move is then evaluated via the acceptance criterion given the additional bias correction of evaluating the randomness of the prerotation move forward and backward as in the Favrin et al. scheme. In the absence of any prerotation bias, this algorithm is conceptually rejectionfree. It also (in theory) satisfies detailed balance on account of the construction of the solution superset.
 When set to 2, at each step, a finite number of trials (see UJMAXTRIES) of prerotations according to the Favrin et al. scheme is performed. Closure is attempted and in case solutions are found, the possible closures along with the sampled prerotation constitute the set of possible moves. A random one is chosen (uniform probability) and the new conformation is evaluated via Metropolis with the Jacobian corrections for the proposed vs. the current state (with respect to both types of constraints) and the randomness correction for the prerotation step. Because solutions only need to be found given the prerotation, this algorithm is usually twice as fast as the one above given sane prerotation settings. This implementation does not satisfy detailed balance even in theory but attempts to remain globally balanced.
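As an illustration, the exact torsional CR settings discussed above might appear in a key-file fragment such as the following (a sketch only: the FMCSC_ prefix reflects the usual spelling of keywords in CAMPARI key-files, and all values are arbitrary examples, not recommendations):

```
FMCSC_TORCRFREQ   0.1   # frequency of exact torsional CR moves (see TORCRFREQ)
FMCSC_TORCRMODE   1     # 1: solution-superset picking (detailed balance in theory)
FMCSC_TORCRMIN_DO 4     # min. number of prerotation degrees of freedom (with omega)
FMCSC_TORCRMAX_DO 12    # max. number of prerotation degrees of freedom (with omega)
FMCSC_TORCRSCOME  0.1   # scale down prerotation step size for omega bonds
```

With the mode set to 2, UJMAXTRIES (see below) would additionally control the number of prerotation trials per move.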
TORCRMIN_DO
This specifies the minimum requested number of degrees of freedom for the prerotation segment for exact CR moves for polypeptides utilizing ω-angles during closure (→ TORCRFREQ). Note that this minimum number is not rigorously enforced but will be ignored if closure residues too close to the N-terminus are used. This is done in the interest of generality and to prevent the code from disabling these types of moves frequently. It is therefore not as straightforward as one may think to compute the expected distribution of prerotation segment lengths (and which residues are part of them with what probability) for each polypeptide. Note that numbers of degrees of freedom are specified here, whereas for the bond angle UJ method, numbers of residues are specified (→ UJCRMIN).
TORCRMAX_DO
This specifies the maximum requested number of degrees of freedom for the prerotation segment for exact CR moves for polypeptides utilizing ω-angles during closure (→ TORCRFREQ). Note that this maximum number is in fact a rigorous upper limit and is never exceeded, but that the length of some polypeptides in the system may be such that it is never realizable. In the latter case, there will be an additional complication in predicting the resultant distribution of prerotation segment lengths (see TORCRMIN_DO as well).
TORCRMIN_DJ
This keyword is exactly analogous to TORCRMIN_DO but applies to exact CR moves for polypeptides without using ω-angles in the closure.
TORCRMAX_DJ
This keyword is exactly analogous to TORCRMAX_DO but applies to exact CR moves for polypeptides without using ω-angles in the closure.
TORCRSCOME
This parameter is analogous to UJCRSCANG and scales down the magnitude of the step size for ω-bonds in the prerotation segment of exact torsional CR methods for polypeptides. Since stiff torsional potentials usually act on ω-bonds (→ OMEGAFREQ), the likelihood of obtaining rejected moves mostly on account of excursions of the ω-angle is high. This unwanted behavior may be alleviated by employing small values for TORCRSCOME. Remember, however, that the prerotation step size will often be relatively small in general.
UJMAXTRIES
Despite its name, this keyword regulates the maximum number of prerotation sampling events to consider in exact, torsional CR methods with TORCRMODE set to 2. If no solution is found within UJMAXTRIES attempts, the move is counted as rejected. Naturally, detailed balance is maintained only if there is always at least one solution found given the new prerotation (i.e., this keyword is rendered obsolete). As alluded to above, this is never the case for the entirety of a simulation. It is difficult to predict what setting in those cases would best preserve global balance. The main utility of this keyword, however, lies in different sampling applications, e.g., in the efficient and exhaustive sampling of different loop conformations given a fixed constraint.
NUCCRMIN
This keyword is analogous to TORCRMIN_DO but applies to exact CR moves for polynucleotides. Note that the sugar bond (C3*-C4*) is always excluded from prerotation sampling.
NUCCRMAX
This keyword is analogous to TORCRMAX_DO but applies to exact CR moves for polynucleotides. Note that the sugar bond (C3*-C4*) is always excluded from prerotation sampling.
PHFREQ
This is the frequency, out of all sidechain moves (see CHIFREQ), with which to perform a (de)ionization MC move. These moves are turned off automatically in case there are no titratable residues in the system (currently, only the polypeptide residues D, E, R, K, and H (use the neutral form) are supported). Note that these are pseudo-MC moves, i.e., they do not interface intuitively with the rest of the MC code. This means that the guidance criterion for accepting / rejecting titration moves is based on a distinct and simplified energy evaluation which has no impact on the actual Markov chain. These moves therefore analyze (on-the-fly) an independently generated Markov chain (using whatever Hamiltonian was specified) but do not perturb the conformational ensemble generated by said Markov chain in any way. This essentially corresponds to the assumption that the generated ensemble is independent of titration states, an assumption which is always wrong but may, in certain circumstances such as extreme denaturing conditions, nonetheless be justified. These moves rely on environmental settings (PH and IONICSTR) and are required for obtaining output in PHTIT.dat. The default picking probabilities for ionizable residues are flat and cannot be altered.
FRZFILE
This keyword specifies the name and location (full or relative path) of the input file for the selection of molecules or residues for which selected degrees of freedom are to be excluded from sampling, by explicit removal from Monte Carlo sampling lists and/or by not integrating the equations of motion for them. This means that only degrees of freedom that are in fact explicit degrees of freedom of the sampling scheme in use can be constrained (see DYNAMICS and CARTINT). If this keyword is not present, no constraints are going to be used beyond the system-imposed ones, which may be sampler-dependent. Note that restricting the Monte Carlo move set defines effective constraints not covered here. In Cartesian space, explicit constraints on the x, y, and z coordinates of selected atoms are possible. However, indirect geometric constraints are also supported (differently and independently via SHAKESET). The input for explicit constraints is described in detail elsewhere. Hard constraints may be necessary for specialized applications, for example when one attempts to just re-equilibrate the sidechains in a folded protein while leaving the fold intact. In general, it will be possible to use restraints (see for example SC_TOR or SC_DREST) as alternatives. Those allow the selected degrees of freedom to respond and fluctuate around a stable equilibrium position.
Note that constraint requests are not entirely arbitrary, and that the level of control being offered depends on the sampling engine. It is not possible, for instance, to constrain just one out of several χ-angles in a protein sidechain in Monte Carlo simulations. In general, custom constraints in combination with a hybrid sampling approach may prove challenging when trying to match the sampled sets of degrees of freedom between Monte Carlo and dynamics segments. Furthermore, introducing constraints may prohibit certain MC samplers from being applied not just to the residues carrying the constraints but to surrounding ones as well (such as concerted rotation methods → CRFREQ) due to underlying and conflicting assumptions. Lastly, CAMPARI will exit with an error if user-selected constraints deplete the sampling list for a given move type entirely. Here, the user is requested to explicitly adjust the move set, since otherwise these moves would have to be converted to another type that is not necessarily desirable (note that this still happens if moves are requested that the system simply does not support).
FRZREPORT
If constraints are used (→ FRZFILE) in torsional space simulations, this keyword acts as a simple logical controlling whether or not to write a summary of the constraints in the system to the log file.
SKIPFRZ
If constraints are used (→ FRZFILE) in torsional space simulations, this keyword gives the user control over the calculation of effectively frozen interactions due to constraints. In Monte Carlo simulations (see DYNAMICS), incremental energies are computed by considering only the parts of the system that move relative to one another. This automatically addresses constraints. Conversely, in dynamics, the total system energy and forces are calculated at each step. If this keyword is set, interactions between parts which have no chance of moving relative to one another (relative orientation completely constrained) will no longer be considered. Note that the potentials rigorously have to be at most pairwise decomposable for this option to be available (e.g., the polar term in the ABSINTH implicit solvation model is not strictly pairwise decomposable; → SC_IMPSOLV and SCRMODEL). Usage of this keyword can significantly accelerate dynamics or minimization runs in heavily constrained systems (such as ligand optimizations within a rigid protein binding site). Note that any reported energies do not contain the frozen contributions either if this option is chosen.
PSWFILE
This keyword specifies the name and location (full or relative path) of an optional input file parsed to alter the default picking probabilities for all types of moves in CAMPARI, at most down to the residue level (but not further). In general, the idea of preferential sampling rests on the realization that any ergodic and unbiased move set is theoretically capable of producing a Markov chain yielding the correct phase space distribution. This means that the sampling weights given to degrees of freedom of the system need not be equivalent, but rather can be chosen arbitrarily (as long as a choice of zero somewhere does not eliminate ergodicity). Of course, the convergence properties of a Monte Carlo simulation are an exceptionally complicated function of the move set, and therefore deviations from the default choices should be properly justified. Examples have been listed above, e.g., in the discussion of sidechain sampling.
CAMPARI generally allows the preferential sampling facility to overlap with user-level constraints. Constraints are applied first, and then picking probabilities are altered. In the process, it is possible to effectively introduce additional constraints on account of setting selected sampling weights to zero. This is tolerated as long as it does not deplete the pool for a class of moves entirely. In such a case, the program terminates with an error. There is a notable difference between zero sampling weights and constraint requests for concerted rotation moves of polymers (described elsewhere). Note that it is not possible to control frequencies in ways that would lead to incorrect sampling. In particular, it is impossible to control picking probabilities for particle permutation moves, and particle insertion and deletion moves can only be controlled down to the molecule type level. Rigid-body moves are generally limited to the scope of molecules, not residues. The format of the input file is described elsewhere.
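As a sketch of how constraints and preferential sampling combine, a key-file fragment might look like this (the FMCSC_ keyword prefix is assumed, and both file names are hypothetical):

```
FMCSC_FRZFILE   /home/user/frozen.in    # hypothetical selection of frozen degrees of freedom
FMCSC_FRZREPORT 1                       # log a summary of the constraints
FMCSC_SKIPFRZ   1                       # skip interactions frozen by the constraints
FMCSC_PSWFILE   /home/user/weights.in   # hypothetical picking-probability input
FMCSC_PSWREPORT 1                       # log the resultant picking frequencies
```

Constraints are applied first and picking probabilities second, as described above.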
PSWREPORT
If the default picking probabilities are altered (→ PSWFILE) in torsional space Monte Carlo simulations, this keyword acts as a simple logical controlling whether or not to write out a summary of the resultant picking frequencies for every move type that is active and has been modified (to the log file).
Files and Directories:
(back to top)
Preamble (this is not a keyword)
In general, files and directories should be provided using absolute paths. This is often advantageous in deployment-based computing, where relative directory structures and/or shortcuts may change or not exist. However, CAMPARI may fail to read strings longer than 200 characters, leading to truncation and subsequent failure. This should be kept in mind. Also, this section is merely a list of the auxiliary files potentially required by CAMPARI. The functionalities themselves (including the files) are usually explained elsewhere.
BASENAME
This keyword allows the user to pick a name for the simulation/system that is going to be used in the names of all structural output files. Note, however, that all other output files produced by CAMPARI use generic names and will be overwritten if simulations are repeatedly run in the same directory.
SEQFILE
This is the most important input file, as it instructs CAMPARI which system to simulate. Its format and possible entries are explained in detail elsewhere.
SEQREPORT
This keyword is a simple logical (specifying 1 means "true", everything else means "false") that controls whether CAMPARI initially writes out a summary of some of the system's features. In detail, it will provide an overview of the identified molecule types, the numbers of each molecule type present, the first instance, and their high-level suitability for performing CAMPARI-internal analyses. The latter would, for example, report that urea molecules are not suitable for peptide-centric analyses such as secondary structure analysis.
ANGRPFILE
See below.
BBSEGFILE
This keyword lets the user choose an input file containing a map annotating φ/ψ-space for polypeptides with canonical secondary structure regions. This mapping is used to perform segment-based analyses of polypeptide secondary structure. CAMPARI already provides two such files (in the data/ subdirectory). These and the files' format are explained in detail elsewhere.
GRIDDIR
This keyword sets the directory CAMPARI browses to find input files for grid-assisted sampling (see above). CAMPARI provides sample input files in $CAMPARI_HOME/data/grids/ by default. The code assumes filenames to follow a systematic naming convention "xyz_grid.dat", where xyz is the lowercase, three-letter code of one of the standard 20 amino acids. This functionality is de facto obsolete and should not be used. It may be removed entirely in the future.
TORFILE
See below.
POLYFILE
See below.
TABCODEFILE
See below.
TABPOTFILE
See below.
TABTANGFILE
See below.
REFILE
See below.
PCCODEFILE
See below.
SAVATOMFILE
See below.
ALIGNFILE
See below.
TRAJIDXFILE
See below.
FRAMESFILE
See below.
CFILE
See below.
TRAJBREAKSFILE
See below.
DRESTFILE
See below.
FEGFILE
This keyword lets the user specify the name and location of the input file from which CAMPARI extracts which residues and/or molecules to subject to scaled interaction potentials with the rest of the system in free energy growth (ghosting) calculations.
BIOTYPEPATCHFILE
See above.
MPATCHFILE
See above.
RPATCHFILE
See above.
BPATCHFILE
See below.
LJPATCHFILE
See below.
CPATCHFILE
See below.
FOSPATCHFILE
See below.
SAVPATCHFILE
See below.
ASRPATCHFILE
See below.
NCPATCHFILE
See below.
FRZFILE
See above (note that this is not just relevant in Monte Carlo simulations).
PSWFILE
See above.
SHAKEFILE
See above.
TRACEFILE
See below.
PARTICLEFLUCFILE
This keyword is relevant only when ENSEMBLE is set to either 5 or 6 (ensembles with fluctuating particle numbers). It provides the location of the file that specifies the particle types that are allowed to fluctuate, the numbers of particles of those types to initially include in the system, and the chemical potentials of each fluctuating particle type (see here).
WL_GINITFILE
See above.
Structure Input and Manipulation:
(back to top)
RANDOMIZE
This keyword determines the randomization aspects of initial structure generation. The possible degrees of freedom being randomized are the backbone dihedral angles of flexible chains and the rigid-body coordinates of the various molecules. If the excluded volume potential is in use, this weakly biases the system toward conformations with little or no steric overlap; otherwise, the configurations are completely random. The only exception to this are possible boundary potentials that restrict the randomization of rigid-body coordinates. The excluded volume bias does not operate in any meaningful quantitative fashion, however. It utilizes a finite number of attempts for each degree of freedom (see RANDOMATTS) and applies a universal energy threshold criterion (see RANDOMTHRESH). The resultant system conformations can serve to avoid initial structure biases when running multiple copies of the same simulation. For dense systems (including longer, randomized polymers) in the presence of excluded volume interactions, the initial energy will generally be so high that a short Monte Carlo-based simulation is indispensable in order to relax the system to a configuration that can then be used to start gradient-based simulations. These problems can be particularly severe if the molecule(s) contain(s) chemical crosslinks. In detail, the options are:
 0: No randomization is performed. This option is the default and only available if the complete system configuration is provided by file, generally via PDBFILE. If no file is given, the choice will be changed automatically to option 2 below.
 1: Full randomization is performed. Technically, any polymers' internal conformations are randomized, possibly with an excluded volume bias. This happens independently for all molecules and only for those polymer stretches that are not constrained by a crosslink. For every residue in a stretch, the randomization occurs in three phases (1/3 each of the total attempts per residue). In the first, only freely rotatable backbone angles (excluding all pucker and ω-angles) are considered, e.g., the φ/ψ-angles of polypeptides or any backbone-like angles in unsupported residues. Energies are evaluated for residue pairs involving the current residue vs. all residues further toward the N-terminus of the stretch (already processed) and the single residue immediately following in the stretch (not yet processed). If the sum of these energies is less than the threshold, the algorithm proceeds to the next residue. The second phase only comes into play if 1/3 of the attempts are exceeded, and now involves the rotatable sidechain angles (excluding those in native CAMPARI residues that are frozen by default) of the current residue as well. The last phase is triggered analogously, and additionally includes all aforementioned degrees of freedom for the residue immediately prior in the sequence. If no satisfactory solution is found, the energetically most favorable one is picked. Similarly, rigid-body coordinates (position of centroid, and rotational orientation) are randomized for every molecule. Here, there is only a single phase with the same number of total attempts (now per molecule). Energies are evaluated in pairwise fashion for all molecules occurring prior in sequence input vs. the current molecule. Intramolecular crosslinks are attempted to be closed with a potentially larger number of trials than set by RANDOMATTS. This is because simultaneous stretch randomization and successful closure are required (also see below).
Intermolecular crosslinks are satisfied by moving the second molecule after rigid-body randomization and randomizing the crosslink itself. Both types of crosslinks can easily lead to repeated failures during randomization and/or very high initial energies. If structural input is provided, this choice will be changed automatically to option 2 below with the exception that any missing parts (structural input is truncated) are reconstructed randomly with respect to internal degrees of freedom. Thus, there can be a difference between specifying option 2 explicitly in conjunction with structural input (missing parts are built in default conformation internally) vs. letting CAMPARI downgrade option 1 to option 2 (missing parts are internally random).
 2: Partial randomization of rigid-body coordinates is performed as described for option 1. If a structural input file is provided, CAMPARI will extract the internal arrangements of all molecules from file and only randomize rigid-body coordinates. This can be useful for generating random starting structures for studies of the assembly of a protein complex from rigid components. If no file is given, polymers will be built in their default configurations (mostly extended), which will rarely be useful.
 3: Partial randomization of internal arrangements of flexible molecules is performed. This uses the same degrees of freedom and protocols as option 1 above, and works only in conjunction with a structural input file. Here, the positions of the N-termini of all flexible polymers are taken as in the input file, but the chains are then randomized. This is an option useful only for highly specialized applications due to the poor relationship between the original rigid-body arrangement and the resultant one for polymers of appreciable length. If no structural input file is provided, this choice will be changed automatically to option 1 above.
It is a very important restriction that initial structure randomization does not observe user-level constraints. In order to have a degree of freedom that is accessible to randomization start out in a well-defined state, randomization of the corresponding class of degrees of freedom must be disabled entirely (in which case the initial state comes either from the CAMPARI default or, more likely, from structural input).
For systems containing intramolecular crosslinks, randomization of an individual chain follows a hierarchical procedure ensuring that all crosslinks can (theoretically) be satisfied. First, a randomization of the loop stretch occurs (with an additional bias using half-harmonic restraints on every bracketing crosslink pair), and then an attempt is made to satisfy the crosslink constraint by finding suitable values for a minimal set of degrees of freedom encompassing the entire linkage and as few mainchain dihedral angles as possible. This successive randomization of "free" degrees of freedom and subsequent closure of the crosslink are repeated until a solution is found. This solution does not observe the excluded volume bias in the same way that unconstrained randomization does. The entire procedure is repeated for all crosslinks as long as they can be arranged such that they remain weakly coupled. Note that intramolecular randomization is necessary for systems containing crosslinks for the crosslinks to be satisfied initially (i.e., if RANDOMIZE is 0 or 2, and if no structural input is provided, the default geometries are used irrespective of the presence of any crosslinks as requested in the sequence file).
Lastly, depending on whether the user provided a request for structural input (→ PDBFILE or FYCFILE), the setting for RANDOMIZE may be altered as described above. The importance of initial structure randomization lies in avoiding initial structure biases that may be difficult to detect (for example, when starting identical replicas of a simulation all from fully extended polymers). Note that in replica exchange or MPI averaging runs, all replicas will start from different conditions unless RANDOMSEED is given explicitly by the user.
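The randomization-related keywords discussed here and below could be combined in a key-file fragment such as the following (the FMCSC_ prefix is assumed, and the numeric values are illustrative only, not recommendations):

```
FMCSC_RANDOMIZE    1       # full randomization of internal and rigid-body coordinates
FMCSC_RANDOMATTS   1000    # max. attempts per residue or molecule (see below)
FMCSC_RANDOMTHRESH 5000.0  # energy threshold in kcal/mol (see below)
```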
RANDOMATTS
If any type of initial structure randomization is requested, this keyword sets the general maximum number of attempts in randomizing the permissible degrees of freedom for a single residue or molecule. Large numbers (> 10000) may produce unacceptably slow performance when trying to randomize a long, complex polymer and/or a dense fluid. Large numbers can also be counterproductive in the presence of intramolecular constraints since they limit the search space.
RANDOMTHRESH
If any type of initial structure randomization is requested, this keyword sets the universal energy threshold to be applied with respect to energetic penalties from excluded volume, boundary potentials (rigid-body only), and intramolecular crosslink bias terms. For every residue or molecule being processed, the threshold is applied to the sum of the (at most) three terms above over all pairwise terms containing the residue or molecule in question, as described above (see the information on option 1). All these terms are pure penalty terms and cannot yield negative energies. The strength of the half-harmonic crosslink restraints is also set as 0.1·RANDOMTHRESH (in kcal·mol^-1·Å^-2), whereas their maximum distance is a function of the spacing between the residue in question and the position of the crosslink constraint. Small values (less than 1000 kcal/mol) may be harmful in the presence of intramolecular constraints since they may overemphasize the excluded volume bias.
FYCFILE
This near-obsolete keyword allows the user to provide an input file to rebuild the coordinates of a single macromolecule based on a list of its "native" CAMPARI degrees of freedom. This serves to start a simulation from a non-random, file-encoded structure. CAMPARI provides an output file suitable for this task (→ TOROUT), and the format has to match exactly. Note that FYC.dat does not encode rigid-body coordinates, which is the reason why only single-molecule structural input is supported via this method. Only if this keyword is present will CAMPARI attempt to read any such torsional input. This functionality is overridden should the user provide a valid specification for PDBFILE as well. Note that this form of input is not supported for analysis runs (→ PDBANALYZE) and that the systems of course have to be identical.
This keyword provides the (base)name and location of a structure or trajectory input file for reading in Cartesian coordinates from a file in pdb convention. There are two possible interpretations:
PDBANALYZE is false:
In this case, PDBFILE operates analogously to FYCFILE in that it attempts to read an external file to construct an initial, non-random conformation for the system. Depending on the setting for RANDOMIZE, only some of the information may be used. Naturally, the system (sequence) in the pdb file has to be consistent with the choices made via SEQFILE. Note that parallel runs can use multiple input structures (→ PDB_MPIMANY). Depending on the choice for PDB_READMODE, the program then follows either of the following approaches:
 1: It extracts dihedral angles, pucker states, and rigid-body coordinates (i.e., the "native" CAMPARI degrees of freedom) using the coordinates of the appropriate heavy atoms in the file. This preserves the CAMPARI-default geometry (based on high-resolution crystallographic databases) for the remaining degrees of freedom (most bond lengths, angles, improper and some proper dihedral angles). This option is often unsuitable for reading in larger polymers due to mismatch propagation along the backbone. It is generally unavailable for simulations featuring residues not natively supported by CAMPARI.
 2: It reads all Cartesian coordinates from the file and reconstructs the internal coordinate values from the Cartesian ones, i.e., CAMPARI uses the implicitly pdb-encoded covalent geometry instead of its inherent default one for the constant elements of the Z-matrix. This option is recommended for systems containing longer polymers. Note that CAMPARI is able to correct for a fair number of conflicting atom naming conventions in pdb files (see PDB_R_CONV as well) but that it may sometimes be necessary to change atom names in a given trajectory file. Missing coordinates will generally lead to multiple warnings but may not stop a simulation from running regardless.
PDBANALYZE is true:
Since the point here is to analyze a given trajectory (whatever structural data it may encode), and not to start a simulation from a suitable input file, the setting for PDB_READMODE is ignored if PDBANALYZE is true, and all structural input is processed by trying to extract all Cartesian coordinates (option 2 for PDB_READMODE). A pdb trajectory file has to fulfill the requirement that it conforms to the MODEL / ENDMDL syntax (the actual numbering is ignored, however). Naturally, for CAMPARI to run and terminate properly, the number of snapshots in the file has to equal or exceed the request for NRSTEPS. Alternatively, CAMPARI offers the option to read in individual pdb files (one snapshot each) that employ a systematic numbering scheme (plain numbers, or numbers with leading zeros). In this scenario, the first of such files should be provided; CAMPARI will then try to extract the numbering scheme and open NRSTEPS-1 consecutive snapshots. Note that in this mode the filename must not contain any additional numeric characters (i.e., foo_001.pdb is permissible while ala7_001.pdb is not). To choose between single-file and multiple-file formats, keyword PDB_FORMAT is used.
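The single-numeric-block naming rule can be made concrete with a short Python sketch (snapshot_series is a hypothetical helper for illustration, not part of CAMPARI; it mimics the convention of exactly one numeric block, possibly with leading zeros, defining the series):

```python
import re

def snapshot_series(first_file, nrsteps):
    """Generate the expected snapshot file names from the first one.

    The base name must contain exactly one numeric block (e.g. foo_001.pdb);
    its value and width (leading zeros) define the numbering scheme for the
    following snapshots. Names with additional numeric characters (e.g.
    ala7_001.pdb) are rejected, mirroring the restriction described above.
    """
    stem, ext = first_file.rsplit(".", 1)
    blocks = re.findall(r"\d+", stem)
    if len(blocks) != 1:
        raise ValueError("file name must not contain additional numeric characters")
    num = blocks[0]
    prefix, suffix = stem.split(num, 1)  # split around the single numeric block
    return ["%s%0*d%s.%s" % (prefix, len(num), int(num) + i, suffix, ext)
            for i in range(nrsteps)]
```

For example, snapshot_series("foo_001.pdb", 3) yields foo_001.pdb, foo_002.pdb, and foo_003.pdb, while "ala7_001.pdb" raises an error because of the extra digit in the base name.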
Lastly, it is important to mention that PDBFILE provides some functionality overlapping with that provided by PDB_TEMPLATE. Specifically, runs containing residues not natively supported by CAMPARI require the topology of those moieties to be inferred from file. If an analysis run operates on a single pdb file, a trajectory file in pdb format, or a series of pdb files, or if a simulation run is supposed to start from a specific structure supplied via PDBFILE, then PDBFILE can (but need not) serve the function of topology inference as described for PDB_TEMPLATE.
XTCFILE
This is only relevant if PDBANALYZE is true: It then specifies name and location of the trajectory (xtc format) to analyze. Like all other xtcrelated options, this is only available if the code was in fact compiled and linked with XDR support (→ installation instructions). See PDB_TEMPLATE for instructions how to convert binary trajectory files with nonCAMPARI atom order. If the analysis run is parallel (→ REMC), an example is given elsewhere.DCDFILE
Analogous to XTCFILE, this keyword is only relevant if PDBANALYZE is true: it then specifies name and location of the trajectory (dcd format) to analyze. See PDB_TEMPLATE for instructions on how to convert binary trajectory files with non-CAMPARI atom order.
NETCDFFILE
Analogous to XTCFILE, this keyword is only relevant if PDBANALYZE is true: it then specifies name and location of the trajectory (NetCDF format) to analyze. Like all other NetCDF-related options, this is only available if the code was in fact compiled and linked with NetCDF support (→ installation instructions). See PDB_TEMPLATE for instructions on how to convert binary trajectory files with non-CAMPARI atom order.
FRAMESFILE
If PDBANALYZE is true, it is possible for CAMPARI to analyze a specific set of frames from the trajectory file (see PDB_FORMAT) rather than the entire trajectory. This simplifies subset analyses that may occur, for example, after clustering data. It is important to note that the settings for NRSTEPS and EQUIL and all related frequency settings for analysis routines (see corresponding section) still refer to the original, full trajectory file. CAMPARI will simply skip all the frames that are not part of the list. This implies that it is possible for a set of frames with 20 members to fail to produce any output for polymeric quantities even if POLCALC is set to 10, 5, or even 2 (simply on account of chance). It will therefore generally be desirable to set such frequency flags to 1 if frame lists are used (this is the only setting that guarantees that the number of analyzed snapshots will be exactly proportional to the size of the list).
In addition, this functionality makes it possible to alter the type of averaging that is normally assumed for CAMPARI analysis functions. By default, each data point (trajectory snapshot) contributes the same weight to computed averages or histograms (distribution functions). This implies that the input trajectory conforms to (was sampled from) the distribution and ensemble of interest. If, however, the input trajectory does not correspond to the ensemble of interest (or corresponds to a different, well-defined one), it is common and possible to apply snapshot-reweighting techniques based on analyses of system energies or coupled parameters using weighted histogram methods. The result is a set of weights, one per snapshot, which allows simulation averages and distribution functions to conform to the target distribution and ensemble.
As an example, one may combine all data from a replica-exchange run (which no longer conform to a canonical ensemble at a given temperature), use a technique such as T-WHAM to derive a set of snapshot weights for a target temperature that was not part of the replica-exchange set, and then use this input file containing the weights to compute proper simulation averages at the target temperature.
The input file for this functionality is very simple and explained elsewhere. There are three important points of caution. First, floating-point weights imply that floating-point precision errors may occur. The implied summation of weights of very different magnitudes may then become inaccurate. CAMPARI provides a warning if it expects such errors to be large (based purely on the weights themselves). Second, snapshot weights do not influence the values reported for instantaneous output such as POLYMER.dat or for analyses that do not imply averaging (such as structural clustering). Third, reweighting techniques have associated errors that are difficult to predict. Simultaneous assessment of statistical errors via block averaging or similar techniques is therefore strongly recommended.
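In essence, the snapshot weights turn plain averages into weighted ones. A minimal sketch (hypothetical helper, not CAMPARI code) that also mimics the kind of dynamic-range warning mentioned above:

```python
def weighted_average(values, weights):
    """Weighted trajectory average over snapshots. Emits a crude warning when
    the spread of the weights approaches the limits of double precision, in
    the spirit of the warning CAMPARI prints (the threshold here is made up)."""
    wmin, wmax = min(weights), max(weights)
    if wmin > 0.0 and wmax / wmin > 1.0e15:
        print("warning: weight dynamic range may cause summation errors")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```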
PDB_FORMAT
This simple keyword lets the user select the file format for a trajectory analysis run:
1) CAMPARI expects a single trajectory file in pdb format using the MODEL / ENDMDL syntax to denote the individual snapshots.
2) CAMPARI expects to find multiple pdb files with one snapshot each that are systematically numbered starting from the file provided via PDBFILE.
3) CAMPARI expects to find a single trajectory in binary xtc format (GROMACS style).
4) CAMPARI expects to find a single trajectory in binary dcd format (CHARMM/NAMD style).
5) CAMPARI expects to find a single trajectory in binary NetCDF format (AMBER style) (reference).
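As an illustration, a minimal key-file fragment for analyzing a single xtc trajectory (format option 3) might look as follows. Keyword spellings are abbreviated as on this page (actual key-files may require the FMCSC_ prefix), and the file name and step count are placeholders:

```
PDBANALYZE 1
PDB_FORMAT 3
XTCFILE mytraj.xtc
NRSTEPS 10000
EQUIL 0
```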
PDB_READMODE
This integer is only used if a pdb file is sought to provide the starting structure for an actual simulation (→ PDBFILE). In this scenario, two options are available:
1) CAMPARI attempts to read in the Cartesian coordinates of heavy atoms from the pdb file, proceeds to extract the values for CAMPARI's "native" degrees of freedom (i.e., those corresponding to the unconstrained ones in Monte Carlo or torsional molecular dynamics runs → CARTINT), and lastly rebuilds the entire structure using the determined values as well as internal geometry parameters for the constrained internal degrees of freedom (extracted from high-resolution crystallographic databases). This hybrid approach will often lead to a propagation of error along the backbone of longer polymers and is therefore unsuitable for reading larger proteins or, in particular, macromolecular complexes.
2) CAMPARI attempts to read in the Cartesian coordinates of all atoms from the pdb file and uses those explicitly (i.e., it implicitly adopts the encoded geometry even for degrees of freedom that are normally constrained within CAMPARI). This will produce warnings if very unusual bond lengths or angles are encountered (see PDB_TOLERANCE_A and PDB_TOLERANCE_B), which are most often indicative of missing atoms in the pdb file (in particular termini and hydrogens). Some of these problems will be dealt with automatically, but it is highly recommended to check the file {basename}_START.pdb and to make sure that no drastic deviations occur. If deviations due to CAMPARI rebuilding atoms along the backbone do occur, it is recommended to increase the thresholds for PDB_TOLERANCE_B and PDB_TOLERANCE_A.
PDB_HMODE
If structural input from a pdb file is requested in mode 2 (see PDB_READMODE and PDBFILE) or if a trajectory analysis run is being performed, this keyword offers two choices for dealing with hydrogen atoms (which may be missing from the input file and/or may be ill-defined):
1) CAMPARI will attempt to read in the Cartesian coordinates of all hydrogen atoms directly and only rebuild those hydrogen (and other) atoms which cause a geometry violation defined by keywords PDB_TOLERANCE_B and PDB_TOLERANCE_A.
2) CAMPARI will rebuild all hydrogen atoms according to its underlying default models for local geometry in chemical building blocks. This is most useful if hydrogen atoms are missing entirely from the input file.
PDB_NUCMODE
For processing structural input, keyword PDB_NUCMODE (explained below) is ignored. It is listed here nonetheless to explain what CAMPARI actually does when reading in a pdb file supplied via PDBFILE or via PDB_TEMPLATE: If the input file is in CAMPARI convention, i.e., the O3* oxygen atom is part of the same residue as the phosphate it belongs to, read-in is consistent with the internal convention. If, however, the input file is in pdb convention (also used by almost all other simulation software), i.e., the O3* oxygen atom is always part of the same residue as the sugar it belongs to, a heuristic is used to avoid an incorrect assignment. This heuristic relies on the geometry of the input structure being sane, as it checks the bond distance to the appropriate phosphorus atom.
As long as atom names can be parsed (see below), the user should therefore not have to worry about the convention used in pdb input files. This implies that it is possible to supply a binary trajectory file (for example via DCDFILE) written in the non-CAMPARI convention of assigning the O3* atom to the residue carrying the sugar it is attached to, by the use of an appropriate template.
PDB_R_CONV
CAMPARI can in general process different conventions for the formatting of pdb files. A large fraction of simple variations in atom naming conventions is handled automatically without the use of any keywords. PDB_R_CONV allows the user to select the format a read-in pdb file is assumed to be in, so that more severe discrepancies can be dealt with. Possible choices currently consist of:
1) CAMPARI format (of course suitable for reading back in any CAMPARI-generated output even if PDB_NUCMODE was used → see above).
2) GROMOS format (nucleotide naming). This option offers very little unique functionality since most of the supported conversions are handled automatically regardless of the setting for this keyword. It is primarily used to handle the GROMOS residue names for nucleotides (ADE, DADE, and so on).
3) CHARMM format (in particular atom naming, cap and nucleotide residue names and numbering (patches), ...). Note that there are two exceptions pertaining to C-terminal cap residues (NME and NH2) which arise due to non-unique naming in CHARMM: 1) NH2 atoms need to be called NT2 (instead of NT) and HT21, HT22 (instead of HT1, HT2), and 2) NME methyl hydrogens need to be called HAT1, HAT2, HAT3 (instead of HT1, HT2, HT3). For nucleotides, there is an additional exception for 5'-residues carrying a 5'-terminal phosphate (the hydrogen in the terminal P-OH unit needs to be called "5HO*" instead of " H5T"). This is again due to non-unique naming conventions within CHARMM.
4) AMBER format (atom and residue naming, in particular for nucleotides). Note that this option is the least tested one. Please let the developers know of any additional issues you may encounter.
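The CHARMM cap-residue exceptions listed under option 3 amount to a simple rename of a handful of atoms; the sketch below (hypothetical helper, not part of CAMPARI) encodes exactly the mapping stated above:

```python
# Exceptions stated above for CHARMM-formatted input:
# NH2 cap: NT -> NT2, HT1/HT2 -> HT21/HT22; NME cap: HT1..HT3 -> HAT1..HAT3.
CHARMM_CAP_FIXES = {
    "NH2": {"NT": "NT2", "HT1": "HT21", "HT2": "HT22"},
    "NME": {"HT1": "HAT1", "HT2": "HAT2", "HT3": "HAT3"},
}

def fix_cap_atom(resname, atomname):
    """Return the atom name CAMPARI expects for the two C-terminal cap
    residues; all other residue/atom combinations pass through unchanged."""
    return CHARMM_CAP_FIXES.get(resname, {}).get(atomname, atomname)
```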
PDB_TOLERANCE_A
This setting allows the user to override CAMPARI's built-in defaults for the tolerances it applies to a read-in structure (usually xyz from pdb). Since it is not always easy to distinguish distorted structures from missed input, the code applies a tolerance when comparing read-in bond angles to the internal reference value (which is derived from crystallographic databases). The default is an interval of 20.0° to either side, and this setting can be expanded or contracted using this keyword. If a violation is found, the code usually overrides the faulty value with the default, since it assumes that atomic positions were missing. This can sometimes lead to unwanted effects, which can be avoided by setting this to a large number.
PDB_TOLERANCE_B
This is analogous to PDB_TOLERANCE_A, but allows the user to change the interval for considering bond length exceptions. The difference here is that two numbers are required: a lower fractional number (down to 0.0) and an upper fractional number (preferably larger than 1.0, of course). This is because bond length ranges are inherently not normalized and, in addition, nonlinear (exceptions with too-long bond lengths are much more frequent). The default is an interval between 80% and 125% of the crystallographic reference value (settings 0.8 and 1.25).
PDB_TEMPLATE
This keyword allows the user to provide name and location of a pdb file that can serve several auxiliary functions. A template pdb file is relevant in the following circumstances:
 In a trajectory analysis run, it can serve as a map to correct a mismatch in atom ordering between a binary trajectory file (dcd, xtc, NetCDF) and CAMPARI's intrinsic convention. Typically, a pdb file provided by the program having generated the binary file will serve this purpose. In order for the map to work, it is crucial to ensure that every single atom to be read in has a proper match (by atom name) in the pdb file, i.e., it is not tolerable to provide a pdb template with missing atoms or with atom names that CAMPARI cannot parse. In general, CAMPARI's pdb parser is relatively flexible and allows additional control via PDB_R_CONV. It is typically not possible, however, to correct mismatches in the grouping of atoms into residues.
 The template pdb file can simultaneously serve as a reference structure if alignment is requested in trajectory analysis runs (→ ALIGNCALC).
 In all types of runs, the template pdb file can be used to infer the topology of residues not natively supported by CAMPARI. This is crucial for handling such systems. Importantly, using the template for this purpose decouples the topology determination from structural input for simulation runs, which allows initial randomization of systems containing such unsupported residues.
The content of the template must match the sequence file, and there are some precise requirements for both input files. They are listed in the corresponding documentation for both the sequence file and the pdb file.
Assuming both files to be properly formatted, CAMPARI then does the following:
 From the sequence file, the number of unknown residues and their linkages are extracted.
 The template is read and the atomic indices delimiting all unknown residues are extracted. Basic parameters such as the effective residue radius and the reference atom are inferred. It is therefore important that the conformation of the residue in the pdb file is somewhat representative.
 The remainder of the system topology is constructed. All atomic positions are set to the corresponding values in the template. Internal order of atoms for unsupported residues always reflects the order in the input pdb file exactly.
 From the PDB atom names, the chemical element is guessed (C, O, N, H, P, S, halogens, various metals and metalloids) and the mass is set to that of an appropriate atom type in the parameter file (identification by attempts to match mass and valence). The assignment will be poor if the parameter file does not support the chemical element in question. Further details are found elsewhere. This can later be overridden by a biotype patch and/or a combination of other patches.
 A new biotype is created for every new atom type encountered. This biotype is initialized to be empty with the exception of keeping the atom name and the (already) assigned atom type. The numbering of these new biotypes continues from the highest number in the parameter file. It is therefore not possible to use the parameter file for these assigned biotypes directly. Instead, it is recommended to use a biotype patch or specialized patches. The assignment of an atom type is sufficient to provide basic support, so for certain applications no patches may be required.
 The covalent bond information is used to infer the molecular topology (including a detection of rings). This defines the Z-matrix entries (internal coordinate representation) for unsupported residues. Similarly, the linkage to covalently bound residues that are either supported or also unsupported is inferred. In the process, rotatable dihedral angles are detected automatically. This procedure, which explicitly tests for bond angle or length variations upon rotation, is critical to most subsequent assignments.
 Given a set of PDB names, atom types, valences, and a topology, CAMPARI attempts to conclude by analogy whether the residue conforms to the backbone of one of the supported polymer types (currently, polypeptides and polynucleotides). If it does, as many internal pointers as possible are set to identify the residue accordingly (this does not work for single-residue molecules).
 If a residue is recognized as being part of a supported polymer type, the topology itself is corrected (the goal is that it should make no difference to CAMPARI whether a residue is supported or whether it is masked as an unsupported one by changing its name, such that all the information has to be inferred from the input structure). Further corrections pertain to the setup of interactions, etc. Note that the match cannot always be perfect, e.g., fudge factors that are not zero or unity in conjunction with MODE_14 being 2 and INTERMODEL being 1 may lead to energetic inconsistencies. The interaction setup relies on determining local rigidity via its knowledge of which dihedral angles are rotatable. Due to code-specific reasons (scanning for short-range exceptions, exclusions, etc.), it is highly recommended to parse the chain into residues such that any pair of atoms in residues i and i+2 is separated by at least four rotatable bonds.
 All flexible dihedral angles may be made part of basic sampling routines if the simulation is in internal coordinate space. These are the torsional dynamics sampler (→ TMD_UNKMODE for details) and the basic Monte Carlo moves for degrees of freedom of this type (→ OTHERUNKFREQ). Furthermore, access will be granted to the specialized samplers if the residue is detected as eligible. This, however, may sometimes lead to an altered interpretation of the absolute values of certain dihedral angles or even alter details of the sampler slightly, e.g., the pucker sampling of proline-like, unsupported residues may end up perturbing different sets of auxiliary bond angles.
 If analyses are requested, these routines will respond to the unsupported residue according to the values set in the previous steps. Basically, the better the match to natively supported entities is, the more analysis functionalities will be available. Straightforward cases depend only on Cartesian coordinates (e.g., RHCALC or CONTACTCALC), whereas polymer-type-specific analyses (e.g., DSSPCALC) require an unsupported residue to be recognized as the corresponding polymer type. Care must be taken in mixed polymers or other exotic cases, and it may occasionally be necessary to disable certain analysis routines.
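One of the steps above, guessing the chemical element from a PDB atom name, can be sketched as follows. This is a deliberately simplified heuristic, not CAMPARI's actual rules; names such as CA (Cα vs. calcium) are inherently ambiguous and are treated as carbon here:

```python
def guess_element(atom_name):
    """Guess a chemical element from a PDB atom name field (simplified).
    Two-letter symbols are tried first so that e.g. CL is not mistaken
    for carbon; leading digits (as in 1HB) are stripped beforehand."""
    name = atom_name.strip().upper().lstrip("0123456789")
    for two in ("CL", "BR", "NA", "MG", "ZN", "FE"):
        if name.startswith(two):
            return two.capitalize()
    for one in ("C", "N", "O", "H", "P", "S", "F", "I", "K"):
        if name.startswith(one):
            return one
    return "?"
```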
PDB_MPIMANY
For certain types of parallel runs, this logical keyword (1 means "on") allows the user to provide different starting structures via pdb files for different replicas. The keyword is irrelevant in trajectory analysis mode. The four eligible classes of calculations are as follows:
 Any type of MPI averaging run
 Any type of simulation using input pdb files merely to extract values for torsional and rigid-body degrees of freedom (→ PDB_READMODE)
 Replica exchange calculations in Cartesian coordinate space
 Any type of (formal) replica exchange calculation without any swap attempts
If this option is active, CAMPARI expects to find systematically named pdb files with the base name given via keyword PDBFILE. The naming is analogous to the convention CAMPARI uses for outputs of parallel runs and also identical to what parallel trajectory analysis runs require. It is explained elsewhere. A list of keywords specific to running CAMPARI in parallel is found below.
Energy Terms:
(back to top)
HSSCALE
This keyword controls a generic scaling factor for size parameters (Lennard-Jones σ_{ii} and σ_{ij}) that were read in from the parameter file. This fundamentally alters the excluded-volume properties of the system. Motivation for using this keyword (which naturally defaults to 1.0) may arise during parameter development or in specialized calculations.
SC_IPP
This keyword allows the user to specify the linear scaling factor controlling the strength of the inverse power potential (IPP), defined as:
E_{IPP} = c_{IPP}·4.0·ΣΣ_{i,j} ε_{ij}·f_{14,ij}·(σ_{ij}/r_{ij})^{t}
Here, the σ_{ij} and ε_{ij} are the size and interaction parameters for atom pair i,j, the f_{14,ij} are potential 1-4 fudge factors (see FUDGE_ST_14) that generally will be unity, r_{ij} is the interatomic distance, t is the exponent, and the (double) sum runs over all interacting pairs of atoms. Lastly, c_{IPP} is the linear scaling factor controlled by this keyword, which, unlike most other scaling factors for energy terms, defaults to 1.0. In most applications, the inverse power potential will be the repulsive arm of the Lennard-Jones potential (t = 12 → 12^{th} power, see IPPEXP). The interpretation and application of the provided parameters (see documentation and keyword PARAMETERS) can be controlled through keywords SIGRULE, EPSRULE, INTERMODEL, FUDGE_ST_14, and MODE_14. Note that the use of the Weeks-Chandler-Andersen (WCA) potential (→ SC_WCA) is mutually exclusive with inverse power potentials.
IPPEXP
This keyword allows the user to adjust the exponent (an even integer that defaults to 12) for the inverse power potential. An important restriction is that many of the optimized loops in dynamics calculations do not support any choice other than 12. Note that very large numbers will of course (possibly in compiler-dependent fashion) slow down code execution due to the increasing complexity of expensive operations in innermost loops. By (formally) setting this to a value greater than 100, CAMPARI is instructed to replace the IPP potential with a hard-sphere (HS) potential, which is only available in pure Monte Carlo runs. In this case the scaling factor is ignored, the "infinity" value (penalty for nuclear fusion) is determined by the setting for BARRIER, and the use of a size reduction factor (HSSCALE) is strongly recommended. In hard-sphere potentials, any energy readout for the IPP term should then be in multiples of BARRIER, and all persisting nonzero values would indicate a frustrated (non-relaxable) system. The actual value specified for IPPEXP is then irrelevant.
SIGRULE
This keyword defines the combination rule for the size parameters of Lennard-Jones (and WCA) potentials, i.e., how to construct σ_{ij} from σ_{ii} and σ_{jj} if σ_{ij} is not provided as a specific override in the parameter file (for details see PARAMETERS). The choices are:
1) σ_{ij} = 0.5·(σ_{ii} + σ_{jj}) (arithmetic mean)
2) σ_{ij} = (σ_{ii} · σ_{jj})^{0.5} (geometric mean)
3) σ_{ij} = 2.0·(σ_{ii}^{−1} + σ_{jj}^{−1})^{−1} (harmonic mean)
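The three combination rules can be written compactly as follows (illustrative sketch, not CAMPARI code):

```python
def combine_sigma(sig_ii, sig_jj, rule):
    """The three SIGRULE combination rules listed above."""
    if rule == 1:
        return 0.5 * (sig_ii + sig_jj)              # arithmetic mean
    if rule == 2:
        return (sig_ii * sig_jj) ** 0.5             # geometric mean
    if rule == 3:
        return 2.0 / (1.0 / sig_ii + 1.0 / sig_jj)  # harmonic mean
    raise ValueError("unknown combination rule")
```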
EPSRULE
Analogous to SIGRULE, this keyword defines the combination rule for interaction parameters of Lennard-Jones potentials. The same options are available and the same caveats apply with respect to overrides in the parameter file.
SC_ATTLJ
This keyword allows the user to specify the linear scaling factor controlling the strength of the dispersive (van der Waals) interactions, defined as:
E_{ATTLJ} = −c_{ATTLJ}·4.0·ΣΣ_{i,j} ε_{ij}·f_{14,ij}·(σ_{ij}/r_{ij})^{6}
Here, the σ_{ij} and ε_{ij} are the size and interaction parameters for atom pair i,j, the f_{14,ij} are potential 1-4 fudge factors (see FUDGE_ST_14) that generally will be unity, r_{ij} is the interatomic distance, and the (double) sum runs over all interacting pairs of atoms. Together with an inverse power potential with scaling factor 1.0 and exponent 12 (see SC_IPP), the canonical Lennard-Jones potential is constructed if the scaling factor, c_{ATTLJ}, is set to unity.
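Taken together with the inverse power term under SC_IPP (with t = 12), the pair energy can be sketched as follows (illustrative only; σ_{ij} and ε_{ij} are assumed to be combined already, and the pair list is assumed given):

```python
def lj_energy(pairs, c_ipp=1.0, c_attlj=1.0):
    """Sum the IPP (exponent 12) and dispersive terms defined above over a
    list of (sig_ij, eps_ij, f14_ij, r_ij) tuples. With both scaling
    factors at unity this is the canonical Lennard-Jones potential."""
    e = 0.0
    for sig, eps, f14, r in pairs:
        sr6 = (sig / r) ** 6
        e += 4.0 * eps * f14 * (c_ipp * sr6 * sr6 - c_attlj * sr6)
    return e
```

At r = σ·2^{1/6} a single pair with ε = 1 yields the well depth −ε, as expected for the canonical form.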
INTERMODEL
This very important keyword controls the exclusion rules for short-range interactions of the excluded-volume and dispersion types (see SC_IPP, SC_ATTLJ, and SC_WCA). For Monte Carlo or torsional dynamics simulations assuming rigid geometries, the computation of spurious (constant) LJ interactions is inefficient. Conversely, in Cartesian sampling, bonded interactions are almost always parametrized with all 1-4, and certainly with all 1-5 interactions in place. The latter refer to intramolecular atom pairs separated by either exactly three (1-4) or four (1-5) bonds. The ABSINTH implicit solvation model, which is one of the core features of CAMPARI, was parametrized with a reduced interaction model. Hence, this keyword allows the following choices:
1) Consider only interactions which are not rigorously or effectively frozen when using internal coordinate space sampling. This setting, for example, excludes all interactions within aromatic rings. As for determining 1-4 interactions, the rules outlined under MODE_14 apply.
2) Consider all interactions separated by at least three bonds to be valid. This is the default setting for molecular mechanics force fields. Note, however, that many of these interactions are quasi-rigid and that their computation is somewhat nonsensical even in a full Cartesian description. Also note that, due to the inherent assumption that every bond is rotatable, the setting for MODE_14 does not matter if INTERMODEL is set to 2. All atoms separated by exactly three bonds will be considered 1-4. It is important to point out that the setting chosen for INTERMODEL affects the setting for ELECMODEL as well (see ELECMODEL).
3) The GROMOS force field uses a very specific set of nonbonded exclusions, which is supported by choosing this option for INTERMODEL. It is essentially a weakened version of the first (sane) option. Note that to reproduce the GROMOS force field exactly, ELECMODEL (which remains an independent setting) has to be set to 2 and INTERMODEL to 3.
LJPATCHFILE
This keyword can be used to provide the location and name of an input file that allows reassigning the size exclusion and dispersion parameters used in describing generic short-range potentials of the Lennard-Jones (see SC_ATTLJ and SC_IPP) or WCA types. The parameter file that CAMPARI parses will contain atom entries that specify general atom types. These types have associated with them entries of the contact and epsilon types specifying the Lennard-Jones σ_{ij} and ε_{ij} parameters (see equations provided with scale factor keywords). Within the list of biotypes, each biotype is assigned an atom type, and the patch functionality described here allows the user to change this to a different atom type for a specific instance of a biotype. Note that the reassignment is restricted to the Lennard-Jones parameters, but excludes other atomic parameters specified by atom types such as mass, proton number, description, or valence. Conversely, parameters derived from Lennard-Jones parameters are altered. This is particularly important for the derived atomic radii and volumes used in the continuum solvation model and in analysis. If those parameters are meant to be left unchanged or set to yet another set of values, either the radius facility of the parameter file must be employed (if it is not already in use for the original atom type in question), or a patch of atomic radii must be applied in addition. Because size exclusion and dispersion parameters rely on combination rules and/or many overrides for special cases, it can be tedious to patch them. This is because a patch will often require the user to define a new atom type, which, for example for the GROMOS force fields, can be a lot of work. Some more details are given elsewhere.
SC_EXTRA
This (somewhat obsolete) keyword specifies a linear scaling factor for certain structural correction potentials. Assuming the typical set of torsional space constraints (see CARTINT), these are applied to rotatable bonds with electronic effects which cannot be captured by atomic pairwise contributions. They consist of:
 Secondary amides: The rotation around the C-N bond is hindered due to the partial double-bond character present in amides. Corrections are therefore applied to residues which have an ω-angle (all non-N-terminal peptide residues excluding NH2, as well as the secondary amides NMF and NMA → sequence input). These keep the peptide bond roughly planar while allowing for cis/trans isomerization and increased overall flexibility. The potentials are directly ported from OPLS-AA.
 Phenolic polar hydrogens: The rotation around the C-O bond in phenols is hindered due to its partial double-bond character, and in-plane arrangements of the attached hydrogen are favored. Corrections are applied to tyrosine (TYR) and p-cresol (PCR). These keep the polar hydrogen in its favored position. The potential is not overly stiff, so that out-of-plane arrangements will be populated as well. The parameters are again ported directly from OPLS-AA.
SC_BONDED_B
This keyword gives the linear scaling factor for all bond length potentials. Their usage is permissible in all simulations but not meaningful unless bond lengths are actually allowed to vary, i.e., typically unless sampling happens fully in Cartesian degrees of freedom (see CARTINT). It is important to remember, however, that even in rigid-body / torsional space simulations, specific move types and systems will require setting this to unity (so we recommend it throughout). For bond length potentials, the only such exceptions are crosslinked molecules (see CRLK_MODE). Note that the parameter file has to provide support to be able to use this energy term (see PARAMETERS for details), and that simulations relying on those terms will otherwise fail, crash, or produce nonsensical results. Use GUESS_BONDED to circumvent those issues for incomplete parameter files.
SC_BONDED_A
Similar to SC_BONDED_B, but for all bond angle potentials. Bond angle potentials (see PARAMETERS for details) matter for sampling in Cartesian space (see CARTINT), for crosslinked molecules (see CRLK_MODE), and for the sampling of five-membered, flexible rings (see PKRFREQ and SUGARFREQ). The coordinate derivatives for bond angles diverge at the extreme values of both 0° and 180°. This means that care must be taken in setting up the Z-matrix such that no terms are included that would explicitly demand these values. In other software, this is sometimes overcome by the use of dummy atoms. In CAMPARI, this is unlikely to be problematic in Monte Carlo simulations. In dynamics, forces are buffered to avoid program crashes due to floating-point errors, but the actual values are no longer meaningful. This issue is primarily relevant when modifying the code or when simulating unsupported residues, for which the Z-matrix is inferred from input (see elsewhere for details).
SC_BONDED_I
Similar to SC_BONDED_A, but for all improper dihedral angle potentials.
SC_BONDED_T
Similar to SC_BONDED_B, but for all torsional potentials. Note that these do in fact encompass degrees of freedom sampled in all types of simulations supported within CAMPARI and hence are always relevant. As alluded to above, torsional potentials can easily be set up to cover the same correction terms as the ones applied within SC_EXTRA. If that is the case, we therefore recommend not using SC_EXTRA (otherwise energy terms will in fact be applied twice, which effectively scales up those torsions; in such a case, CAMPARI produces an appropriate warning as well).
SC_BONDED_M
Similar to SC_BONDED_B, but for all CMAP potentials. These grid-based correction potentials are part of the CHARMM force field and are explained in PARAMETERS. This keyword specifies the "outside" scaling factor. Note that CMAP corrections can theoretically be relevant for all possible simulations of biopolymers within CAMPARI since they act on consecutive dihedral angles. The default CMAP corrections from CHARMM only apply to polypeptides, however, and are only contained within the reference CHARMM parameter file.
IMPROPER_CONV
If improper dihedral potentials are in use (→ SC_BONDED_I), this very specialized keyword can be used to force a reinterpretation of the input sequence for the assignment of improper dihedral angle potentials to bonded types (see elsewhere). When set to 2, this keyword forces CAMPARI to switch the meaning of the first and third specified bonded type when it comes to energy and force evaluations. This allows a more or less exact match to the convention used in the AMBER set of force fields (and, by extension, in OPLS-AA). For any other value specified, CAMPARI will use the CAMPARI-native convention (which is the same as in the CHARMM and GROMOS force fields).
CMAPORDER
If CMAP corrections are used (→ SC_BONDED_M), this keyword sets the interpolation order for cardinal splines (assuming those are chosen through parameter input → PARAMETERS). A higher spline order will yield a smoother surface. Since the splines are non-interpolating, however, rapidly varying or coarsely tabulated functions may not be well approximated in such cases. The only interpolating cardinal B-spline is the linear one, which requires a choice of 2 for this keyword. This keyword is irrelevant should bicubic splines be chosen.
CMAPDIR
If CMAP corrections are used (→ SC_BONDED_M), this keyword lets the user specify the absolute path of the directory in which the CMAP files are to be found (by default they are in the data/ subdirectory of the main distribution tree).
BPATCHFILE
This keyword can be used to provide the location and name of an input file that allows reassigning or adding bonded potential terms (see bond length potentials, bond angle potentials, improper dihedral angle potentials, torsional potentials, and CMAP potentials). At the level of the parameter file that CAMPARI parses to generate default assignments based on biotypes (see elsewhere), there are limitations to how finely the system can be parsed. For instance, it is technically not possible to have different bond length potentials acting on the N→C_{α} bonds of two non-terminal glycine residues (because the biotypes are identical). Of course, even providing bonded parameter assignments exactly at biotype resolution would generally be inordinately complicated, which is the reason for grouping biotypes into so-called bonded types in the parameter file. In cases where specific alterations to a given system are desired, the patch functionality provided by this input file will generally be the most convenient (and often the only) route to take. For bond length and angle terms, CAMPARI can also guess values based on initial geometries. Applied patches to bonded interactions are always printed to log output. In order to diagnose their correctness more easily, it is recommended to use the report functionality for bonded potential terms. Note that the most critical limitation is that extra or alternative bonded potentials can only be applied to such internal coordinates as are eligible for default assignments themselves, e.g., it is not possible to apply a bond angle potential to atoms a-b-c if a is not covalently bound to b or if b is not covalently bound to c.
GUESS_BONDED
This keyword is a simple logical determining whether to construct a set of bonded parameters from the default molecular geometries (high-resolution crystal structures). Force constants are flat and made up, and the potentials are always harmonic. This is useful if a parameter file is meant to work with Cartesian dynamics but lacks the necessary support to carry out initial tests.
BONDREPORT
This report flag allows the user to request a summary of the bonded potentials found and not found during processing of the parameter file. This is primarily useful as a sanity and debugging tool for creating parameter files. Note that missing but necessary parameters (necessary ones are all bond length and angle potentials if and only if CARTINT is 2) as well as guessed parameters (see GUESS_BONDED) are always reported upon.
SC_WCA
Mutually exclusive to the use of the Lennard-Jones potential, CAMPARI allows using the extended Weeks-Chandler-Andersen (WCA) potential, which is defined as:
E_{WCA} = 4.0·c_{WCA}·ΣΣ_{i,j} ε_{ij}·f_{14,ij}·[(σ_{ij}/r_{ij})^{12} − (σ_{ij}/r_{ij})^{6} + 0.25·(1.0 − c_{2})] if r_{ij} < σ_{ij}·2^{1/6}
E_{WCA} = c_{2}·c_{WCA}·ΣΣ_{i,j} ε_{ij}·f_{14,ij}·[0.5·cos(c_{3}·(r_{ij}/σ_{ij})^{2} + c_{4}) − 0.5] else if r_{ij} < σ_{ij}·c_{1}
E_{WCA} = 0.0 else
with:
c_{3} = π·(c_{1}^{2} − 2^{1/3})^{−1}
c_{4} = π − c_{3}·2^{1/3}
(reference)
Here, the size, interaction, and fudge parameters are used as defined before. c_{1} is the interaction cutoff (in units of σ_{ij}), while c_{2} is the depth of the attractive well to be spliced in (in units of ε_{ij}). c_{1} and c_{2} can be set by keywords CUT_WCA and ATT_WCA, respectively. The potential provides a continuous function mimicking a LJ potential in which the dispersive term can be spliced in without shifting the position of the minimum. c_{WCA} denotes the linear scaling factor specified by this keyword.
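For illustration, the piecewise form above can be sketched as follows (a minimal Python sketch; the function name, the handling of the 1-4 factor, and the default values for c_{1} and c_{2} are illustrative assumptions, not CAMPARI internals or defaults):

```python
import math

def e_wca(r, sigma, eps, c1=1.9, c2=1.0, c_wca=1.0, f14=1.0):
    # c1 (cutoff in units of sigma, -> CUT_WCA) and c2 (well depth in units
    # of eps, -> ATT_WCA) are illustrative values only
    c3 = math.pi / (c1**2 - 2.0**(1.0 / 3.0))
    c4 = math.pi - c3 * 2.0**(1.0 / 3.0)
    if r < sigma * 2.0**(1.0 / 6.0):
        sr6 = (sigma / r)**6  # repulsive WCA branch with shifted baseline
        return 4.0 * c_wca * eps * f14 * (sr6**2 - sr6 + 0.25 * (1.0 - c2))
    elif r < sigma * c1:
        # spliced-in attractive cosine well of depth c2*eps
        return c2 * c_wca * eps * f14 * (0.5 * math.cos(c3 * (r / sigma)**2 + c4) - 0.5)
    return 0.0
```

At r_{ij} = σ_{ij}·2^{1/6} both branches evaluate to −c_{2}·c_{WCA}·ε_{ij}, and the potential vanishes smoothly at r_{ij} = σ_{ij}·c_{1}, illustrating the continuity of the splice.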
ATT_WCA
This allows the user to specify the well depth (positive number) for the attractive part of the WCA potential in units of ε_{ij} (parameter c_{2} under SC_WCA).
CUT_WCA
This allows the user to specify the cutoff value for the extended WCA potential in units of σ_{ij} (parameter c_{1} under SC_WCA). Note that the minimum allowed choice here is 1.5.
VDWREPORT
This keyword is a simple logical deciding whether or not to print out a summary of the Lennard-Jones (size exclusion and dispersion) parameters, i.e., to report the base values (σ_{ii} and ε_{ii}), the combination rules, and in particular all "special" values which overwrite the default combination-rule-derived result.
INTERREPORT
Mostly for debugging purposes, this simple logical allows the user to demand a summary of short-range interactions. Naturally, this output can be very large, and the keyword should only be used when absolutely needed, for example to understand the settings for MODE_14 and FUDGE_ST_14.
SC_POLAR
CAMPARI only supports fixed-charge atom-based electrostatic interactions which work by defining partial charges for each atom and then writing the potential as:
E_{POLAR} = c_{POLAR}·ΣΣ_{i,j} [ (f_{14,C,ij}·q_{i}q_{j}) / (4.0πε_{0}·r_{ij}) ]
Here, the q_{i} are the atomic partial charges, f_{14,C,ij} are potential 1-4 fudge factors (see FUDGE_EL_14) that generally will be unity, ε_{0} is the vacuum permittivity, r_{ij} is the interatomic distance, and the (double) sum runs over all eligible atom pairs (see ELECMODEL). c_{POLAR} is the linear scaling factor for all polar interactions specified by this keyword. Since electrostatic interactions are characterized by the potential to yield long-range effects (distance scaling ranges from r^{−1} for monopole-monopole terms to r^{−6} for dipole-dipole terms between molecules tumbling freely), the Coulombic term can employ a different cutoff in MC calculations (see below) than the short-range potentials. The correct long-range treatment of electrostatic interactions is one of the most investigated areas in molecular simulations, and the user is referred to current literature and keywords LREL_MC and LREL_MD for details. All required partial charges are read either through the parameter file or can be set by a dedicated patch.
Note that the functional form given above is only correct if no implicit solvation model is in use. In such a scenario, Coulomb interactions are usually modified by an extra term s_{ij}, which can be a complicated function of interatomic distance and/or the positions of all nearby atoms. The reader is referred to Vitalis and Pappu for the exact functional forms used in the ABSINTH implicit solvation model.
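A bare-bones sketch of the sum above (assuming no implicit solvent, i.e., s_{ij} = 1; the function name, the data layout, and the unit convention are illustrative assumptions):

```python
# Approximate conversion factor 1/(4*pi*eps0) in kcal/mol*Angstrom/e^2 units
COULOMB_KCAL = 332.06

def e_polar(pairs, c_polar=1.0):
    # pairs: iterable of (q_i, q_j, r_ij, f14) tuples over eligible atom
    # pairs, with charges in units of e and distances in Angstrom
    return c_polar * sum(f14 * COULOMB_KCAL * qi * qj / r
                         for qi, qj, r, f14 in pairs)
```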
ELECMODEL
This important keyword is somewhat analogous to INTERMODEL and allows the user to set the interaction model for electrostatic interactions:
1. Depending on the setting for INTERMODEL, interactions are either screened for connectivity and frozen interactions are excluded (INTERMODEL is 1), or are purely considered based on the number of bonds separating two atoms (INTERMODEL is 2). In any case, partial charges interact without considerations of net neutrality (see below), which is problematic for short-range interactions. Consider for example the ω-bond in polypeptides and assume that CO and NH both form neutral groups supposed to indicate dipole moments. If INTERMODEL is 2 and ELECMODEL is 1, the interaction between O and H is considered (1-4), but none of the others, as they are topologically too close. This leads to spurious (and very strong) Coulomb interactions between what essentially are fractional, net charges. This is an inherent weakness of the point charge model which is typically addressed by extensive co-parameterization of bonded parameters, 1-4 fudge factors, etc. (see FUDGE_EL_14).
2. The partial charge set in the parameter file is read and the assigned charges are screened for (generally) net neutral charge groups. These charge groups are determined largely automatically and are currently not patchable per se. The automatic charge group generation operates by trying to group partial charges into groups of minimum size and spanning a minimum number of covalent bonds satisfying a target net charge. The default target net charges are derived from knowledge of every CAMPARI-supported residue and assumptions about their titration states (if any). This means, for example, that a non-terminal lysine residue will be processed by first looking for a charge group with a net charge of +1 before trying to identify as many net neutral groups as possible. While CAMPARI does not allow grouping charges arbitrarily, there is a dedicated patch which allows defining a series of (arbitrary) target values for the net charges of charge groups in a given residue. This is required to deal with charge sets that do not group at all, or to deal with residues that contain multiple ionic moieties. For example, depending on the charge set in use, one may want to partition free, zwitterionic alanine either as multiple groups with +1, −1, and 0 charges, or simply as one or more net neutral groups. For multiple targets, failure of the grouping algorithm at a given stage will lead to this stage being skipped. Conversely, failure at the last stage will result in all remaining atoms in the residue being members of a single group. Groups that are not well-defined charge groups according to CAMPARI's standards will be reported on in log output. With the groups in place, only interactions between those groups for which all possible atom-atom pairs are separated by at least one significant degree of freedom are computed. Interactions within a group are always excluded.
What constitutes a significant degree of freedom is predetermined by the choice for INTERMODEL, and the reader is encouraged to read up on this if necessary. Essentially, INTERMODEL will define the maximum set of short-range interaction pairs that can also be considered for polar interactions. As an example, for the 6 net neutral CH units in benzene, if INTERMODEL is 1, no intramolecular polar interactions can be considered (the maximum set is empty). Conversely, if INTERMODEL is 2, several group-group interactions are now permissible (C1H-C4H, C2H-C5H, C3H-C6H). Depending on the charge set and on the choice for INTERMODEL, setting ELECMODEL to 2 can lead to a massive depletion of short-range electrostatics. This paradigm clashes heavily with traditional force field development, but, in the authors' opinion, is the only sane treatment of dipole-dipole interactions if the latter are represented by point charges.
Note that the automatically determined charge groups are also relevant elsewhere:
- The charge groups are important for deciding how long-range electrostatic interactions between ionic groups are computed exactly (see options 1, 2, and 3 for LREL_MC and options 4 and 5 for LREL_MD).
- The charge groups are used as the basis for computing group-averaged screening factors for certain screening models in the ABSINTH framework (see options 1, 3, 5, and 7 for SCRMODEL).
AMIDEPOL
One "flaw" in the biotype setup in CAMPARI (see PARAMETERS) is the fact that the two polar hydrogens on primary amides are treated as chemically equivalent, which, on a typical simulation timescale, they are not. Instead of creating yet more biotypes, this keyword simply allows adding a small polarization term to the partial charges on those hydrogens. The value specified will be added to the hydrogen cis to the oxygen (the electronegative atom nearby increases the partial positive charge) and subtracted from the trans-H to keep the summed charge the same. For example, if both hydrogens have a charge of +0.36, a specification of 0.05 here will yield charges of 0.41 (cis-O) and 0.31 (trans-O). It will be useful to track these changes using ELECREPORT. It is very important to note, however, that fundamentally a sampling algorithm may isomerize the amide bond and hence render the correction incorrect and, moreover, that reading in a structure may flip the two hydrogens to start out with (because of inconsistent numbering between two software packages). Hence, this keyword should be used only when absolutely necessary (and its sign may have to be flipped to achieve the desired effect). This correction to primary amides is a specific example of the occasional need to overwrite partial charge parameters for atoms due to "biotype splitting". The more general approach provided by CAMPARI for this explicit purpose is to "patch" the partial charge set by a dedicated input file.
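The arithmetic of this correction can be stated compactly (a trivial sketch; the function name is made up for illustration):

```python
def amide_polarize(q_cis, q_trans, d):
    # add the polarization increment d to the hydrogen cis to the oxygen and
    # subtract it from the trans one; the summed charge is preserved
    return q_cis + d, q_trans - d
```

For instance, amide_polarize(0.36, 0.36, 0.05) reproduces the (0.41, 0.31) pair from the example above.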
CPATCHFILE
If the polar potential is in use, this keyword can be used to provide the location and name of an input file that allows overriding some or all of the partial charge parameters CAMPARI obtains from the parameter file (see elsewhere). This can be required to match the exact standard given by a force field with a finer biotype parsing. Note that, by default, such corrections are error-prone and should only be used when absolutely needed. In any case, the user is recommended to use ELECREPORT for a detailed summary of final partial charges in the system.
DIPREPORT
This simple logical will, when turned on by the user, produce two summary files (see DIPOLE_GROUPS.vmd and MONOPOLES.vmd), which allow the user to graphically assess the automatically determined charge groups. The former will visualize all charge groups in the system (not just the net neutral ones) by highlighting all atoms belonging to each group. The second will visualize the "center" atom of all groups carrying a net charge (the meaning of this is defined by the value for POLTOL). Note that, naturally, this option is not available if SC_POLAR is zero.
NCPATCHFILE
If the polar potential is in use, CAMPARI automatically determines charge groups, i.e., groups of atoms within a residue that are topologically close and whose partial charges sum up to zero or to an integer net charge. If LREL_MD is 4 or 5 and/or LREL_MC is 1, 2, or 3, this information is used to flag residues as carrying ionic groups, which leads to the computation of additional interactions even if residues are not in each others' neighbor lists. A residue is flagged if it contains at least one charge group with a total, absolute charge greater than a tolerance that is zero by default (and there should be very good reasons for increasing this tolerance). This keyword allows the user to specify location and name of an optional input file that can perform two important tasks:
- It allows removal of the net charge flag for specific residues, thereby altering the overall interaction model (if the corresponding options for LREL_MD and/or LREL_MC are selected).
- It allows the manual specification of sequential target values for the total charges of charge groups to be identified. This is currently the only way to manually alter the charge group partitioning, and can be crucial when simulating unsupported residues and/or when dealing with charge sets that do not group naturally (such as those within the AMBER family of force fields).
POLTOL
If the polar potential is in use, CAMPARI automatically determines charge groups, i.e., groups of atoms within a residue that are topologically close and whose partial charges sum up to zero or to an integer net charge. As described above, these net charge values can be patched. This may, for example, be used to obtain a grouping into approximately neutral groups for partial charge sets that include complex polarization patterns. In order to avoid having the resultant groups cause CAMPARI to flag the corresponding residue as carrying a net charge (i.e., they are treated like ions), this keyword allows the user to define an increased tolerance for what is considered "approximately neutral". This is relevant because treatment of residues as ions can have substantial implications for the interaction model, in particular in terms of computational efficiency (see LREL_MC and LREL_MD). Note that this keyword operates at the charge group level, whereas patches via NCPATCHFILE can (also) disable the ionic flag status of residues. Therefore, both offer different levels of control. The numerical value specified here (in units of e) is compared to the total charge of a given charge group. As an example, consider a terminal nucleotide residue carrying a 5'-phosphate with an integer negative charge. Suppose that the partial charges on the phosphate linker to the next residue are such that, in addition to the terminal phosphate, this leaves a charge group with a small, fractional charge. In this case, the residue-level patch could only remove the net charge flag for the entire residue (probably undesirable), whereas the tolerance setting described here could specifically eliminate the group with the fractional charge from the list of ionic groups. The default tolerance is set to be zero within reasonable numerical precision. Note that this keyword has no impact on the charge group partitioning itself and is relevant only if LREL_MD is 4 or 5 and/or LREL_MC is 1, 2, or 3.
FUDGE_ST_14
This keyword provides a flat 1-4 scaling factor for interatomic, nonbonded interactions of specific types. 1-4 interactions are defined according to the choice for MODE_14 and depend on the setting for INTERMODEL as well. The value for FUDGE_ST_14 is applied to all steric and dispersion potentials, i.e., the potentials turned on by SC_IPP, SC_ATTLJ, and SC_WCA. The only other 1-4-scaled interaction potential is the electrostatic one, for which a separate 1-4 scaling factor is in use (see FUDGE_EL_14). All other pairwise, nonbonded potentials are never subjected to 1-4 corrections (see for example SC_TABUL or SC_DREST). Note that the value for FUDGE_ST_14 is applied in addition to corrections applied at the parameter level by providing 1-4-specific σ- and ε-parameters in the parameter file (see PARAMETERS).
FUDGE_EL_14
Similar to FUDGE_ST_14, this keyword specifies a scale factor for 1-4 interactions. Here, the provided value will be applied specifically to electrostatic interactions (see SC_POLAR) only. If ELECMODEL is set to 2, any charge group interaction will be scaled as a whole by this factor as soon as any of the possible atom pairs fulfills the 1-4 criterion (see MODE_14).
MODE_14
This keyword's relevance is limited to the case in which INTERMODEL is 1. Then, this essentially defines what a 1-4 interaction is, specifically whether anything separated by exactly three bonds or by exactly one relevant rotatable bond should be considered 1-4:
1. Only two interacting atoms separated by exactly three bonds are treated as 1-4.
2. Two interacting atoms separated by exactly one relevant, freely rotatable bond are always treated as 1-4.
Take a phenylalanine residue and consider the CA-CB-CG-CD1 stretch (from C_{α} to one of the C_{δ}). This is exactly three bonds, and the bond CB-CG is the only relevant rotatable one (CA-CB is also rotatable but irrelevant, since CA lies on the axis, while CG-CD1 is not rotatable). CA and CD1 are treated as 1-4 in both modes. Now consider the CA-CB-CG-CD1-CE1 stretch. These are four bonds, and CA and CE1 are not considered 1-4 in mode 1. However, there is still only one relevant rotatable bond in between (CB-CG, since CG-CD1 is rigid), so CA and CE1 are in fact treated as 1-4 in mode 2.
Note that CAMPARI allows specific modifications of 1-4 interactions, either through the use of fudge factors (see FUDGE_ST_14 and FUDGE_EL_14) or through specific parameters provided in the parameter file. If neither of those indicates a deviation from normal interaction rules, then this keyword becomes irrelevant as well.
ELECREPORT
If the polar potential is in use, this simple logical allows the user to request a summary of the close-range electrostatic interactions in the system. Similarly to INTERREPORT, this keyword mostly serves debugging purposes and should only be needed to understand the details of the short-range interaction setup.
SC_IMPSOLV
This keyword serves two functions. First, as a logical, it enables the ABSINTH implicit solvent model, i.e., it will compute the direct mean-field interaction (DMFI) of each solute with the continuum and enable screening of polar interactions (if turned on → SC_POLAR). For the former (the DMFI), it simultaneously serves as the linear scaling factor. Note that the amount of screening of polar interactions is not dependent on this keyword and solely determined by other parameters (in particular IMPDIEL). The DMFI is defined as:
E_{DMFI} = c_{DMFI}·Σ_{k} ΔG_{FES,k}·[Σ_{i} ζ_{i}^{k}·υ_{i}^{k}]
Here, υ_{i}^{k} is the solvation state of the i^{th} atom in the k^{th} solvation group and ζ_{i}^{k} is its weight factor. The solvation states are computed by CAMPARI and vary throughout the simulation, whereas the weight factors are constant. The reference free energies of solvation for each solvation group (ΔG_{FES,k}) are provided through the parameter file and are constant as well (for the latter see PARAMETERS). Note that the computation of the DMFI given the υ_{i}^{k} is a computation of negligible cost and that CAMPARI obtains the υ_{i}^{k} while computing short-range nonbonded interactions at a moderate additional cost. This implies that the ABSINTH implicit solvation model is speed-limited almost exclusively by the complications incurred by the screening of polar interactions. The user is referred to Vitalis and Pappu for further details (reference).
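The sum above is indeed cheap to evaluate once the υ_{i}^{k} are known, as a minimal sketch illustrates (names and data layout are illustrative assumptions, not CAMPARI's internal representation):

```python
def e_dmfi(groups, c_dmfi=1.0):
    # groups: iterable of (dG_FES_k, [(zeta_i, v_i), ...]) tuples, where the
    # zeta_i are the fixed weight factors and the v_i the solvation states
    return c_dmfi * sum(dg * sum(zeta * v for zeta, v in atoms)
                        for dg, atoms in groups)
```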
To employ the ABSINTH implicit solvent model as published use:
FMCSC_FOSMID 0.1
FMCSC_FOSTAU 0.25
FMCSC_SCRMID 0.9
FMCSC_SCRTAU 0.5
FMCSC_SAVPROBE 2.5
FMCSC_IMPDIEL 78.2
FMCSC_SCRMODEL 2 # (or 1)
Note that, as of 2013, the more rigorous screening model (option 1) appears in published literature only for the work on arginine-rich peptides (Mao et al.). Finally, note that the DMFI can be made temperature-dependent by additions to the parameter file and use of keyword FOSMODE.
SAVPATCHFILE
This keyword can be used to provide the location and name of an input file that allows overriding the default, topology-derived values for the maximum fractions of the solvent-accessible volume, η_{i,max}. Because values depend on hardcoded parameters (geometry) and user-level settings (choice of parameters and keyword FMCSC_SAVPROBE), CAMPARI (re)computes these values at the beginning of each run. This utilizes the default local geometries (not input structures) and works by decomposing the molecule into suitably small model compound units. The patch prints a summary of all successful changes, and results can also be assessed via column 4 in output file SAV_BY_ATOM.dat. Note that these values rely on other patchable quantities, most notably atomic radii. Patches follow a hierarchy, and a patched value for the η_{i,max} overrides values derived from radii that could be patched themselves (here, RPATCHFILE overrides indirect reassignment via LJPATCHFILE) without touching the atomic radii. This means that it is possible for the patched values of η_{i,max} to be grossly inconsistent with the underlying set of radii.
ASRPATCHFILE
This keyword can be used to provide the location and name of an input file that allows overriding the default, topology-derived values for the pairwise reduction factor for atomic volumes used in most computations using the atomic volume, most prominently the ABSINTH implicit solvation model. Reduction factors are needed because the exclusion volumes of covalently bound atoms overlap. The reduction factors are computed in linear approximation, and, by default, the overlap volume is subtracted evenly from the remaining atomic volume of each partner. These values depend on various parameters (parameters and hardcoded geometry), and CAMPARI (re)computes them at the beginning of each run. The patch prints a summary of all successful changes, and results can also be assessed via column 7 in output file SAV_BY_ATOM.dat. See SAVPATCHFILE for remarks on the hierarchy of patches of atomic parameters.
FOSPATCHFILE
Since there is no external way to control details of the solvation group assignments relevant to the computation of the DMFI (→ SC_IMPSOLV) through the parameter file, CAMPARI allows users to alter the default group partitioning and to control reference free energies of solvation on a per-moiety basis through a dedicated input file. This also supports alterations to transfer enthalpies and heat capacities at the patch level if a temperature-dependent DMFI is in use. This keyword is used to provide the location and name of this input file. There are some underlying restrictions to the freedom of choices, but in principle it is possible to completely redesign the underlying DMFI model using this facility. Restrictions and formatting are explained elsewhere. The applied patch implies that CAMPARI will keep the built-in default partitioning along with the default reference values from the parameter file (see elsewhere) for unpatched residues and molecules. As with other force field patches, these corrections are error-prone, and CAMPARI output should always be double-checked against the intended input. For this purpose, keyword FOSREPORT and associated output file FOS_GROUPS.vmd will be of particular use.
FOSMODE
Simulation temperature is used frequently in biomolecular sampling, both to explicitly probe temperature-dependent behavior and to enhance sampling. For the former, the correctness of fixed force field parameters becomes questionable. If the DMFI of the ABSINTH implicit solvation model is in use, this keyword allows the user to make some of the parameters of the model temperature-dependent themselves. There are currently two options:
1. All values for ΔG_{FES} in the equation above are fixed to the reference values specified in the parameter file, independent of temperature or any other environmental parameters. This is the default.
2. CAMPARI tries to extract values for temperature-independent enthalpies and heat capacities of the transfer process of a given model compound from a fixed conformation in the gas phase into water from the parameter file. By default, all CAMPARI parameter files do not contain these parameters. The temperature-dependent values are computed as:
ΔG_{FES}(T) = (ΔG_{FES}(T_{0}) − ΔH_{FES})·T/T_{0} + ΔH_{FES} + ΔC_{p,FES}·[T·[1 − ln(T/T_{0})] − T_{0}]
Here, ΔH_{FES} and ΔC_{p,FES} are the aforementioned enthalpies and heat capacities of transfer, whereas T denotes the simulation temperature and T_{0} denotes the reference temperature for the listed free energy value. T_{0} is set by keyword FOSREFT.
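The relation can be sketched directly (a minimal sketch; the function name and the numerical values used for checking are arbitrary, and energies/temperatures are assumed in kcal/mol and K):

```python
import math

def dg_fes(T, dg_ref, dH, dCp, T0=298.0):
    # temperature-dependent transfer free energy, assuming temperature-
    # independent transfer enthalpy dH and heat capacity dCp
    return (dg_ref - dH) * T / T0 + dH + dCp * (T * (1.0 - math.log(T / T0)) - T0)
```

At T = T_{0} the expression reduces to ΔG_{FES}(T_{0}), as it must.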
FOSREFT
If the DMFI of the ABSINTH implicit solvation model is in use, and if a temperature-dependent model has been requested, this keyword sets the assumed reference temperature for transfer free energies of solvation listed in the corresponding section of the parameter file. It defaults to 298 K.
FOSREPORT
This simple logical allows the user to request CAMPARI to print a summary of the group-based reference free energies, enthalpies, and heat capacities of solvation read from the parameter file. The latter two terms are only relevant if a temperature-dependent model has been selected. In general, the reference free energies will correspond exactly to the terms ΔG_{FES,k} above. Note, however, that this initial output is not a summary of the system but rather of the parameters, i.e., it is more like VDWREPORT and unlike ELECREPORT or INTERREPORT. If some solvation group assignments and parameters are changed via a corresponding patch file, this keyword will also ensure that the applied patch is documented in detail in CAMPARI's log output. The actual group partitioning for the system at hand (but not the associated numerical parameters) is available from output file FOS_GROUPS.vmd.
SAVPROBE
This keyword is crucial for the ABSINTH implicit solvent model and specifies the size of the solvation shell around individual atoms. The input value is interpreted to be the radius in Å of a solvent sphere rolled around each atom, and consequently twice the value of SAVPROBE will yield the thickness of the assumed first solvation layer. The resultant first solvation shell volume is the starting point for determining solvent-accessible volume fractions (η_{i}), which are then mapped to yield atomic solvation states (υ_{i}) that are relevant for the DMFI and screened electrostatic interactions (→ SCRMODEL). It is important to note that SAVPROBE is the only keyword directly controlling the η_{i}, which are otherwise purely functions of atomic parameters (see PARAMETERS). Lastly, note that this keyword is still relevant for SAV analysis even though the implicit solvent model might not be used (→ SAVCALC).
FOSTAU
The atomic solvent-accessible volume fractions, η_{i}, are mapped to solvation states by two different sets of parameters, the first being responsible for obtaining the υ_{i,f}, which are the solvation states describing the change in DMFI with changes in conformation (the second set is responsible for obtaining the υ_{i,s}, which describe the change in dielectric response with changes in conformation). The details of the mapping function are complicated by the requirement to normalize the υ_{i,f} to the well-defined interval [0:1], but in essence it holds:
υ_{i,f} ~ [ 1.0 + exp[ −(η_{i} − h(χ_{f}))/τ_{f} ] ]^{−1}
Here, τ_{f} is the parameter determining the steepness of the sigmoidal interpolation, and this is the parameter determined by this keyword. Large values will yield an approximately linear remapping between the natural limits of η_{i}, which are derived from closest packing of spheres (lower limit) and model compound topology (upper limit). This case is not obvious from the above equation but is obtained via τ_{f}-dependent rescaling to match the target interval. Conversely, very small values yield a step-function-like interpolation. h(x) is a linear function shifting the midpoint parameter χ_{f} (set by FOSMID) such that symmetry between the two natural limits of η_{i} is obtained.
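Ignoring CAMPARI's additional rescaling to the interval [0:1], the sigmoidal mapping can be sketched as follows (h_chi stands for the shifted midpoint h(χ_{f}); the function name and default values are illustrative assumptions):

```python
import math

def solvation_state(eta, h_chi=0.5, tau_f=0.25):
    # sigmoidal mapping from SAV fraction eta to a solvation state; small
    # tau_f approaches a step function, large tau_f a near-linear ramp
    return 1.0 / (1.0 + math.exp(-(eta - h_chi) / tau_f))
```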
FOSMID
As explained for FOSTAU, the mapping from solvent-accessible volumes η_{i} to solvation states υ_{i,f} relies on a midpoint parameter, χ_{f}. In the functional form given above, the midpoint of the sigmoidal function (i.e., the point of maximal slope) can be shifted toward either one of the natural limits of η_{i} by varying this keyword between zero and unity. Since the sigmoidal nature of the interpolation disappears in the limit of large values chosen for FOSTAU, FOSMID is only relevant for sufficiently small values of FOSTAU, and its impact deteriorates progressively with growing FOSTAU. Note that the default is 0.5 but that it is easily possible to generate fairly asymmetric interpolation functions in the process (i.e., at values close to zero atoms are considered solvated at almost all times, while at values close to unity the opposite is true). There is a Matlab script in the tools directory (sigmainterpol.m) that helps assess the effect FOSTAU and FOSMID have given values for the natural limits of η_{i}.
IMPDIEL
This keyword lets the user set the assumed continuum dielectric. Primarily, this is used in the ABSINTH solvation model to treat the screening of electrostatic interactions. The dielectric constant enters the equation for the modified Coulomb sum in different ways depending on the choice for SCRMODEL. In general, the solvent-accessible volumes will be mapped to yield solvation states υ_{i,s} for dielectric screening. The mapping process is identical to the one described for FOSTAU but relies on parameters τ_{s} (→ SCRTAU) and χ_{s} (→ SCRMID) instead. In the published ABSINTH model, the screening factor for the polar interaction is given as:
s_{ij} = [ 1 − a·υ_{i,s} ]·[ 1 − a·υ_{j,s} ]
a = (1 − ε_{r}^{−1/2})
Here, ε_{r} is the relative dielectric constant set by this keyword. The above equation corresponds rigorously only to using screening model 2. Note how the functional form ensures an interpolation between the vacuum (υ_{i,j} = 0.0 → ε_{eff} = 1.0) and the fully screened case (υ_{i,j} = 1.0 → ε_{eff} = ε_{r}).
This keyword also sets the assumed continuum dielectric outside the cutoff sphere when treating electrostatic interactions with reaction-field methods (→ LREL_MD). For this latter purpose, it may be advantageous to set a very large value.
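The published atom-based screening factor (screening model 2) is straightforward to sketch (the function name is illustrative):

```python
import math

def screening_factor(v_i, v_j, eps_r=78.2):
    # ABSINTH screening of a charge-charge interaction by the two atomic
    # solvation states; v = 0 recovers vacuum, v = 1 full screening by eps_r
    a = 1.0 - 1.0 / math.sqrt(eps_r)
    return (1.0 - a * v_i) * (1.0 - a * v_j)
```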
SCRMODEL
This keyword has several options which allow the user to control how dielectric screening of charges is done, specifically what functional form is used for the pairwise screening factor s_{ij} for a pair of interacting atoms i and j. The electrostatic framework within ABSINTH aims specifically at ensuring that only moieties with well-defined net charges interact (this is discussed in a different context for ELECMODEL). This means that for every base functional form of s_{ij} there will be two variants, one in which the υ_{i,s} are used directly (atom-based) and one in which a charge group-based υ_{k,s} is precomputed for each group k out of its constituent atoms' solvation states υ_{i,s}^{k}. Only the latter ensures rigorously that two formally neutral charge groups interacting will not create effective charge imbalances by atom-specific screening. The downside of those models (and the reason we generally do not recommend using them) is the higher computational cost associated and the dependence on the local neutrality in the partial charge set (i.e., should the base parameters not yield any locally neutral subgroups within a residue, the relevant charge group may be as large as an entire polynucleotide residue, and dielectric responses of fairly distant moieties may become coupled, which suggests a length scale of the solvent response vastly inconsistent with the setting for SAVPROBE). In the latter case, it may be necessary to attempt to patch the charge groups so that an approximate grouping is obtained.
1. For every charge group, the solvation states for the individual sites are averaged in charge-weighted fashion (group-based → see above). The resultant group solvation state υ_{k,s} is used to screen all the charges belonging to this group:
s_{ij} = [ 1 − a·υ_{k,s} ]·[ 1 − a·υ_{l,s} ]
a = (1 − ε_{r}^{−1/2})
Here, we assume atom i is part of the k^{th} charge group and atom j is part of the l^{th} charge group. ε_{r} is provided by IMPDIEL.
2. This is the published atom-based model and explained above (→ IMPDIEL). The atom-specific screening via atomic solvation states υ_{i,s} will break the neutral paradigm somewhat but localizes and strengthens specific interactions.
 3: Since electrostatic interactions tend to be somewhat weak with the aforementioned options, this model extends the default model (1) by an important change. If the distance of atoms i and j, r_{ij}, approaches the length scale of the first solvation shell, the dielectric is augmented by a distance-dependent contribution intended to strengthen specific interactions. This yields a rather complicated (although computationally not much more expensive) model:
s_{ij} = s_{env,ij} if r_{ij} ≥ (r_{0,ij}+d_{W}) or s_{env,ij} > [ ε_{c}·r_{0,ij} ]^{−1}
s_{ij} = [ 1 − f_{MIX}·[1 − d_{W}^{−1}·(r_{ij}−r_{0,ij})] ]·s_{env,ij} + f_{MIX}·[1 − d_{W}^{−1}·(r_{ij}−r_{0,ij})]·[ ε_{c}·r_{0,ij} ]^{−1} if r_{ij} < (r_{0,ij}+d_{W}) and r_{ij} > r_{0,ij}
s_{ij} = (1 − f_{MIX})·s_{env,ij} + f_{MIX}·[ ε_{c}·r_{0,ij} ]^{−1} if r_{ij} < r_{0,ij}
s_{env,ij} = [ 1 − a·υ_{k,s} ]·[ 1 − a·υ_{l,s} ]
a = (1 − ε_{r}^{−1/2})
Here, d_{W} is the thickness of the solvation shell (2·SAVPROBE) and r_{0,ij} is given by the sum of the atomic radii of atoms i and j. f_{MIX} sets the impact of the distance-dependent contribution and is controlled by keyword SCRMIX. ε_{c} is set by CONTACTDIEL (compare model 4). Note that the distance dependence is achieved by the interpolation performed in the distance regime r_{0,ij} < r_{ij} < (r_{0,ij}+d_{W}), but that no explicit distance dependence is introduced otherwise. Furthermore, the contact dielectric ε_{c}·r_{0,ij} is generally overridden if the environmental dielectric s_{env,ij} would lead to a stronger interaction (less screening). Importantly, model 3 operates on the group-consistent solvation states (as model 1 does). The atom-specific modification corresponds to model 9. It should be noted that these models are largely untested and were part of initial calibration studies with the ABSINTH implicit solvent model. They are fully supported by CAMPARI, however.
 4: This model implements a (more or less) pure distance-dependent dielectric:
s_{ij} = [ ε_{c}·r_{ij} ]^{−1} if r_{ij} > r_{0,ij}
s_{ij} = [ ε_{c}·r_{0,ij} ]^{−1} otherwise
Here, ε_{c} is the strength of the distance increase of the dielectric constant and r_{0,ij} is the contact distance below which no further distance dependence of s_{ij} is applied. The resultant effective dielectric constant at contact is ε_{c}·r_{0,ij}, which should never be less than unity. ε_{c} is set by CONTACTDIEL and r_{0,ij} is defined by the sum of the atomic radii of atoms i and j. This means that the derivative of the potential is discontinuous at the contact point. Note that distance-dependent dielectric models break down for a variety of limiting cases, in particular for anything involving net-charged species. They also rely on a cutoff criterion since they otherwise do not converge upon a meaningful limiting dielectric. In this way, distance-dependent dielectrics may be seen as somewhat analogous to reaction-field treatments (see LREL_MD).
 5: This model is a group-based variant and therefore similar to option 1. It attempts to take a different route toward computing an effective dielectric. Whereas models 1, 2, 3, and 9 use an effective charge approach, this model (just like models 6, 7, and 8) employs an effective dielectric approach. The former implies that the solvation state enters the potential energy for Coulombic interactions as υ_{i,s}·υ_{j,s}, i.e., E_{POLAR} will scale with changes in the υ_{i,s} differently than the DMFI. Consequently, screening model 5 implies:
s_{ij} = M( [1 − a·υ_{k,s}], [1 − a·υ_{l,s}] )
a = (1 − ε_{r}^{−1})
Here, we assume atom i is part of the k^{th} charge group and atom j is part of the l^{th} charge group, and M is a function corresponding to a generalized mean whose exact form is determined by the choice for ISQM. The latter is able to give rise to fundamentally different scaling behavior of E_{POLAR} with the υ_{i,s}, illustrated for example by taking the arithmetic mean. This can more closely approximate the behavior seen for the DMFI and may allow using much more similar parameter sets τ_{s} and χ_{s} compared to τ_{f} and χ_{f} than is the case with models 1 or 2.
 6: This model is the atom-based variant of model 5:
s_{ij} = M( [ 1 − a·υ_{i,s} ], [ 1 − a·υ_{j,s} ] )
a = (1 − ε_{r}^{−1})
 7: This model modifies model 5 in the same way that model 3 modifies model 1.
 8: This model modifies model 6 in the same way that model 3 modifies model 1.
 9: This model modifies model 2 in the same way that model 3 modifies model 1.
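To make the functional forms above concrete, the following is a minimal Python sketch (illustrative only, not CAMPARI source; function names and argument conventions are assumptions) of the screening factors for models 1, 3, and 4:

```python
def screening_model1(nu_k, nu_l, eps_r):
    """Group-based screening (model 1): s_ij = (1 - a*nu_k)*(1 - a*nu_l),
    with a = 1 - eps_r**-0.5 and nu_k, nu_l the group solvation states."""
    a = 1.0 - eps_r ** -0.5
    return (1.0 - a * nu_k) * (1.0 - a * nu_l)

def screening_model4(r_ij, r0_ij, eps_c):
    """Pure distance-dependent dielectric (model 4): 1/(eps_c*r_ij) beyond
    the contact distance, pinned at 1/(eps_c*r0_ij) below it."""
    return 1.0 / (eps_c * max(r_ij, r0_ij))

def screening_model3(r_ij, r0_ij, d_w, f_mix, eps_c, s_env):
    """Distance-dependent splice (model 3): interpolates between the
    environmental screening s_env and the contact value over one shell width."""
    s_contact = 1.0 / (eps_c * r0_ij)
    if r_ij >= r0_ij + d_w or s_env > s_contact:
        return s_env  # far apart, or environment already screens less
    g = f_mix * min(1.0, 1.0 - (r_ij - r0_ij) / d_w)
    return (1.0 - g) * s_env + g * s_contact
```

Note the sanity check built into model 1: for two fully solvated groups (υ = 1), s_{ij} reduces to 1/ε_{r}, i.e., full bulk screening.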
CONTACTDIEL
For certain screening models (SCRMODEL = 3, 4, 7, 8, or 9), the effective dielectric at an interatomic distance exactly matching the sum of the two atomic radii is postulated to have the limiting value ε_{c}·r_{0,ij} (see equations above). This keyword provides the value for the parameter ε_{c}.
SCRTAU
As explained before (see IMPDIEL and FOSTAU), the ABSINTH implicit solvent model employs two sets of solvation states, υ_{i,f} and υ_{i,s}. This keyword determines the steepness, τ_{s}, of the sigmoidal interpolation that yields the υ_{i,s} from the η_{i}. The functional form is identical to the one described for FOSTAU. The υ_{i,s} determine the effective dielectric acting between polar atoms (see equations above).
SCRMID
This is the specification analogous to FOSMID but provides χ_{s} rather than χ_{f}.
SCRMIX
Several of the screening models (choices 3, 7, 8, or 9 for SCRMODEL) splice a distance-dependent term into the environmental charge screening over a well-defined length scale. The impact of this contribution is set by this keyword, which corresponds to the parameter f_{MIX} in the equations above. If set to values close to zero, each such model approaches its unmodified base model, e.g., model 3 essentially converges to model 1. Conversely, a value close to 1.0 yields maximum impact and lets model 3, for example, approximate model 4 for distances close to the contact distance r_{0,ij}. The choice here is naturally tightly coupled to that for CONTACTDIEL.
ISQM
In those screening models postulating an effective dielectric rather than effective charges, the generalized mean function M(x,y) was introduced (see equations above). The specification here defines the order m for the generalized mean. It can be an integer from −10 to 10, but large absolute values slow down the computation drastically and are not recommended:
M(x,y) = [ 0.5·( x^{m} + y^{m} ) ]^{1/m} if m ≠ 0
With the limiting case of:
M(x,y) = (x·y)^{1/2} if m = 0
Common cases aside from the geometric mean (m=0) are the arithmetic (m=1) or the harmonic (m=−1) mean. Any m>1 will favor large values in an asymmetric pair, i.e., let both participating atoms appear desolvated, leading to stronger interactions, while any m<1 will favor small values in an asymmetric pair, i.e., let both participating atoms appear solvated, weakening such interactions (it is the derived screening factors and not the solvation states that enter the mean). The former scenario (m>1) would rarely seem desirable, as it means that, for instance in solutions of small, polar molecules, the cooperativity for converting between fully dissociated and fully associated states becomes overly pronounced on account of the positive coupling between adding more and more species to a growing cluster and the enthalpic benefit offered by that process.
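The generalized mean itself is a one-liner; a minimal sketch (illustrative Python, not CAMPARI source):

```python
def generalized_mean(x, y, m):
    """Generalized (power) mean M(x, y) of order m as used via ISQM;
    m = 0 gives the geometric mean as the limiting case."""
    if m == 0:
        return (x * y) ** 0.5
    return (0.5 * (x ** m + y ** m)) ** (1.0 / m)
```

For an asymmetric pair such as (2, 8), the mean increases monotonically with m: harmonic (m=−1) gives 3.2, geometric (m=0) gives 4.0, and arithmetic (m=1) gives 5.0.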
SC_TOR
This keyword specifies the linear scaling factor controlling the "outside" scaling of torsional bias terms, V_{TOR}. Such a potential allows the user to either harmonically restrain virtually all freely rotatable dihedral angles to specific target values or to softly bias them toward such target values. The setup for these is handled through an input file (details of the format are described elsewhere). Note that a particularly useful application of V_{TOR} is to apply torsional restraints according to structural input, which is useful for equilibrating molecules meant to remain in a specific internal arrangement.
TORFILE
This keyword specifies the location and name (absolute paths preferable) of the input file for individual backbone torsional bias potentials, V_{TOR} (see elsewhere for a description).
TORREPORT
This is a simple logical allowing the user to instruct CAMPARI to write out a complete summary of the torsional bias terms contributing to V_{TOR} (naturally parsed by residue) in the system.
SC_ZSEC
This keyword gives the linear scaling factor for a global secondary structure bias term. For values larger than zero, a harmonic bias is applied to two order parameters, f_{α} and f_{β}, which measure the secondary structure content of the chain. f_{α} and f_{β} are calculated as the sequence-averaged (excluding termini) values of a mapping function defined for each residue:
z_{α} = e^{−τ_{α}·(d_{α}−r_{α})^{2}} if d_{α} > r_{α}
z_{α} = 1.0 otherwise
The radius of the (spherical) α-region, r_{α}, is provided by ZS_RAD_A and its center φ/ψ-position by keyword ZS_POS_A. The distance d_{α} is taken from the center of the circle and corrected for periodic wraparounds in φ/ψ-space. z_{β} is defined analogously. This function represents a smooth "top hat" function which is continuous and differentiable. By tuning the parameters τ_{α} and τ_{β} through keywords ZS_STP_A and ZS_STP_B, the Gaussian decay beyond the limits of the spherical plateau region can be turned from very shallow to step function-like. The default definitions (all of which can be overridden) are:
α
Center: φ/ψ = (−60.0,−50.0)°; r_{α} = 35.0°; 1.0/τ_{α}^{1/2} ≅ 22.36°
β
Center: φ/ψ = (−155.0,160.0)°; r_{β} = 35.0°; 1.0/τ_{β}^{1/2} ≅ 22.36°
The global values (if there are multiple polypeptide chains in the system, the average is over all of them) are then restrained:
V_{ZSEC} = c_{ZSEC}·(k_{α}·(f_{α} − f_{α}^{0})^{2} + k_{β}·(f_{β} − f_{β}^{0})^{2})
Here, c_{ZSEC} is the linear scaling factor specified by this keyword. The other parameters are explained below. Note that it may not be a good idea to use such a residue-based restraint potential for very short sequences. Here, the net content idea breaks down, and (for typical choices of τ_{α/β}) the chain will have access only to values in the vicinity of those given by a discrete residue content. This may lead to a specific sampling of the ring regions around the plateaus to satisfy intermediate target values, which runs counter to the intent of the potential.
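The per-residue mapping and the restraint can be sketched as follows (illustrative Python, not CAMPARI source; the defaults encode the values stated above, with τ = 0.002 deg^{−2} corresponding to τ^{−1/2} ≅ 22.36°):

```python
import math

def wrapped_delta(a, b):
    """Angular difference in degrees, corrected for periodic wraparound."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def z_alpha(phi, psi, center=(-60.0, -50.0), radius=35.0, tau=0.002):
    """Per-residue mapping: 1.0 on the circular plateau around the basin
    center, Gaussian decay of steepness tau (inverse degrees squared) outside."""
    d = math.hypot(wrapped_delta(phi, center[0]), wrapped_delta(psi, center[1]))
    if d <= radius:
        return 1.0
    return math.exp(-tau * (d - radius) ** 2)

def v_zsec(f_a, f_b, c_zsec, k_a, k_b, f_a0, f_b0):
    """Harmonic restraint on the sequence-averaged alpha- and beta-contents."""
    return c_zsec * (k_a * (f_a - f_a0) ** 2 + k_b * (f_b - f_b0) ** 2)
```

The β-basin mapping z_{β} would use center=(−155.0, 160.0) with otherwise analogous parameters.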
ZS_FR_A
This keyword specifies the target α-content, f_{α}^{0}, for the global secondary structure bias potential (values in [0.0:1.0]).
ZS_FR_B
This keyword specifies the target β-content, f_{β}^{0}, for the global secondary structure bias potential (values in [0.0:1.0]). Note that the sum of f_{β}^{0} and f_{α}^{0} (see ZS_FR_A) should usually not exceed unity, especially in conjunction with stiff spring constants. Doing so would generate a frustrated system for which results will often be irrelevant.
ZS_FR_KA
Through this keyword, (twice) the spring constant (in kcal/mol) operating on f_{α} is provided (k_{α}) if the global secondary structure bias potential is in use.
ZS_FR_KB
Analogous to ZS_FR_KA, this keyword lets the user specify the spring constant (in kcal/mol) operating on f_{β} (k_{β}) if the global secondary structure bias potential is in use. If both parameters are meant to be restrained, it would usually not seem meaningful to choose very different values for the two spring constants. In doing so, one would essentially create a primary bias (stiffer term) and a secondary bias (softer term) operating "within" the primary restraint.
ZS_POS_A
This is one of the few keywords that requires two floating point numbers as input. It allows the user to override the default location of the α-basin (see SC_ZSEC). The two numbers are interpreted as the φ- and ψ-values (in degrees) for the center of the (spherical) basin. The setting is relevant for the corresponding restraint potential and the output in ZSEC_HIST.dat, ZAB_2DHIST.dat, and ZBETA_RG.dat.
ZS_POS_B
See ZS_POS_A, only for the β-basin.
ZS_RAD_A
This keyword requires one floating point number to be specified. It allows overriding the default radius of the α-basin (see SC_ZSEC) and is assumed to be given in degrees. The setting is relevant for the corresponding restraint potential and the output in ZSEC_HIST.dat, ZAB_2DHIST.dat, and ZBETA_RG.dat.
ZS_RAD_B
See ZS_RAD_A, only for the β-basin.
ZS_STP_A
This keyword requires one floating point number. It allows overriding the default steepness of the decay (τ_{α}) of the order parameter value beyond the spherical plateau region defining the α-basin (see SC_ZSEC). It is assumed to be provided in inverse degrees squared. The setting is relevant for the corresponding restraint potential and the output in ZSEC_HIST.dat, ZAB_2DHIST.dat, and ZBETA_RG.dat.
ZS_STP_B
See ZS_STP_A, only for the β-basin.
SC_DSSP
This keyword provides the outside scaling factor, c_{DSSP}, for a biasing potential acting on order parameters derived from the secondary structure annotation of polypeptides in the simulation system using the DSSP algorithm. In essence, this allows the user to bias the system to populate more and stronger hydrogen bonds characteristic of either α-helices (H) or β-sheets, whether parallel, antiparallel, or mixed, or hairpins (E). Since secondary structure annotation is essentially a discretized, on/off variable, it may seem surprising that a restraint potential can be applied in meaningful fashion:
V_{DSSP} = c_{DSSP}·(k_{H}·(f_{H} − f_{H}^{0})^{2} + k_{E}·(f_{E} − f_{E}^{0})^{2})
Here, the k_{H} and k_{E} are (twice) the spring constants for the harmonic restraints applied to the secondary structure scores, f_{H} and f_{E}. The spring constants are set by keywords DSSP_HSC_K and DSSP_ESC_K for the H-score and E-score, respectively. f_{H} and f_{E} are exactly identical to the H-score and E-score defined below and rely on the same base parameters (→ DSSP_MODE). Essentially, they correspond to a multiplicative function of the assignment and the quality of the hydrogen bonds giving rise to the assignment. They can, depending on system and DSSP settings, be continuous and approximately smooth order parameters over a large part of the accessible regime. The target values f_{H}^{0} and f_{E}^{0} are set via keywords DSSP_HSC and DSSP_ESC. There are a few noteworthy peculiarities which the user should keep in mind:
 DSSP E-assignments can rely on both intra- and intermolecular hydrogen bonds, rendering the DSSP term a true system-wide potential. Currently, CAMPARI only allows restraining global E- and H-scores, which may make calculations with multiple polypeptides more difficult to interpret.
 In the limit of no hydrogen bonds, the order parameters will always be discontinuous since the discrete assignment score has to be nonzero for the quality score to matter.
 Due to the potential discontinuities, dynamics calculations utilizing the DSSP biasing potential may suffer from substantial noise, in particular for stiff restraints and small systems.
 Again, due to the functional form, there is no direct driving force to form new hydrogen bonds of the right type. The potential relies on random encounters and the cooperativity of secondary structure elements.
 Lastly, in case some proper hydrogen bonds are formed, the resultant energy landscape is often very rugged, and sampling may be severely hampered by the presence of the restraints. It is therefore advisable, at the very least, to perform multiple independent simulations when using DSSP restraints.
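The restraint term itself is a simple harmonic penalty; a minimal sketch (illustrative Python, not CAMPARI source):

```python
def v_dssp(f_h, f_e, c_dssp, k_h, k_e, f_h0, f_e0):
    """Harmonic DSSP restraint on the global H- and E-scores; k_h and k_e
    are (twice) the spring constants (DSSP_HSC_K, DSSP_ESC_K), and f_h0
    and f_e0 the targets (DSSP_HSC, DSSP_ESC)."""
    return c_dssp * (k_h * (f_h - f_h0) ** 2 + k_e * (f_e - f_e0) ** 2)
```

The penalty vanishes exactly at the targets and grows quadratically with the deviation of either score.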
DSSP_HSC
In case DSSP restraints are used (→ SC_DSSP), this keyword allows the user to set the target H-score (α-content, f_{H}^{0} above). Its value is limited to the interval from zero to unity. A large value will steer the system toward forming many i→i+4 hydrogen bonds.
DSSP_ESC
In case DSSP restraints are used (→ SC_DSSP), this keyword lets the user set the target E-score (β-content, f_{E}^{0} above). Just like for DSSP_HSC, values are restricted to the interval [0.0:1.0]. A large value will bias the system toward forming characteristic β-hydrogen bonds but does not distinguish between parallel and antiparallel arrangements. Note that the sum of DSSP_HSC and DSSP_ESC should probably never approach unity. Also note that the E-score can never be exactly unity for a monomeric polypeptide of finite length, even when discarding termini (turn requirement).
DSSP_HSC_K
If DSSP restraints are in use (→ SC_DSSP), this keyword sets (twice) the spring constant (in kcal/mol) operating on the DSSP H-score, i.e., it sets the value of k_{H} above.
DSSP_ESC_K
If DSSP restraints are in use (→ SC_DSSP), this keyword sets (twice) the spring constant (in kcal/mol) operating on the DSSP E-score, i.e., it sets the value of k_{E} above.
SC_POLY
In studies of generic polymers, coarse descriptors like the size and shape of the macromolecule may be more relevant than structural characteristics tailored specifically to polypeptides. CAMPARI supports restraint potentials on such coarse descriptors, specifically the parameters t and δ (see the description of output file POLYAVG.dat), which measure size and shape asymmetry, respectively. Two-dimensional histograms of these quantities can be computed and written by CAMPARI (see output file RDHIST.dat). These molecule-based restraint potentials yield a bias term to the total potential energy, V_{POLY}, and this keyword provides its "outside" scaling factor c_{POLY}. Note that, with the exception of the scaling factor, requests are generally handled through a dedicated input file (see elsewhere for details).
POLYFILE
This keyword should point to the location of the input file for individual molecular polymeric biasing potentials (→ elsewhere for a description).
POLYREPORT
Like other report flags, this keyword is a simple logical which allows the user to obtain a complete summary of the polymeric bias terms (by molecule) in the system. It is only meaningful if polymeric biasing terms are in use (→ SC_POLY).
SC_TABUL
CAMPARI has an extensive facility to supply tabulated nonbonded potentials which are then applied to the system. This keyword specifies the "outside" linear scaling factor c_{TABUL} according to:
E_{TABUL} = c_{TABUL}·Σ_{i,j} I(V_{ij}^{k},V_{ij}^{k+1},m_{ij}^{k},m_{ij}^{k+1},d_{ij})
Here, the sum runs over all atom pairs i,j which have a tabulated potential specified for them, V_{ij}^{k} is the k^{th} tabulated value of the acting potential, and d_{ij} is the interatomic distance. d_{ij} is located uniquely within the interval given by the k^{th} and (k+1)^{th} tabulated values. I(...) is the interpolation function, and CAMPARI currently performs only cubic interpolation with cubic Hermite splines:
I(V_{ij}^{k},V_{ij}^{k+1},m_{ij}^{k},m_{ij}^{k+1},d_{ij}) = (2t^{3} − 3t^{2} + 1)·V_{ij}^{k} + (3t^{2} − 2t^{3})·V_{ij}^{k+1} + (d_{k+1}−d_{k})·[ (t^{3} − 2t^{2} + t)·m_{ij}^{k} + (t^{3} − t^{2})·m_{ij}^{k+1} ]
t = (d_{ij} − d_{k})/(d_{k+1} − d_{k})
Here, t is the relative position in the interval from k to k+1 normalized to unit length. The m_{ij}^{k} are the tangents to (slopes at) the control points (tabulated values) of the potentials. The spline is set up to recover both values and tangents at the control points. This means that the resultant function is continuously differentiable regardless of the values used for the tangents. Tangents are either read from file (without error checks → description of dedicated input file) or estimated numerically via finite differences from the potential input (see description of dedicated input file). In the latter case, some options are available to tune the spline (see TABIBIAS and TABITIGHT).
There are a few additional characteristics of the implementation of tabulated potentials in CAMPARI:
 Aside from Coulombic terms, these potentials are the only ones captured by the longer of the nonbonded cutoffs in MC runs (→ ELCUTOFF).
 When used concurrently with other nonbonded potentials, many wasteful distance calculations may be performed. This is because tabulated potentials have to use their own data structure to function efficiently both for cases of universal use and of very sparse use.
 Atom pairs that are in close proximity and are excluded from all other nonbonded potentials are not excluded from tabulated potentials.
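The interpolation function above can be sketched as follows (illustrative Python, not the Fortran implementation):

```python
def hermite(d, d_k, d_k1, v_k, v_k1, m_k, m_k1):
    """Cubic Hermite spline on one interval [d_k, d_k1]: reproduces the
    values v and tangents m at both control points, so the piecewise
    curve is continuously differentiable."""
    h = d_k1 - d_k
    t = (d - d_k) / h  # relative position, normalized to [0, 1]
    return ((2*t**3 - 3*t**2 + 1) * v_k + (3*t**2 - 2*t**3) * v_k1
            + h * ((t**3 - 2*t**2 + t) * m_k + (t**3 - t**2) * m_k1))
```

As a quick check, the spline recovers the tabulated values exactly at both endpoints, and reproduces a straight line exactly when the tangents match the data's slope.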
TABCODEFILE
This keyword provides the index input file which determines which tabulated potential to use for which atom pair (see elsewhere for a format description). Naturally, this is only relevant if the tabulated potential is in use.
TABPOTFILE
This keyword should give the name and location of the actual input file for the tabulated potentials (see elsewhere for a format description). Naturally, this is only relevant if the tabulated potential is in use.
TABTANGFILE
This keyword should give the name and location of the optional input file providing derivatives of the tabulated potentials specified via another keyword. If this file is not provided, the derivatives are estimated numerically to generate the necessary tangents for the cubic interpolation scheme. If the file is provided, however, no checks are performed on the supplied values (see elsewhere for a format description). Naturally, this is only relevant if the tabulated potential is in use.
TABITIGHT
If tabulated potentials are in use, and if the input file providing derivatives of the potentials is either missing or incomplete, the cubic interpolation scheme applied to the discrete input data (using cubic Hermite splines) utilizes numerical estimates of the tangents (slopes) at the nodes (control points). The shape and nature of the resulting spline can be varied somewhat with two control parameters, the first controlling the "tightness", and the second (see below) controlling a left/right-sided bias with respect to the control points. The control parameters are used in the construction of the tangents as follows:
m_{ij}^{k} = [ (1−t_{t})·(1+t_{b})·(V_{ij}^{k} − V_{ij}^{k−1}) + (1−t_{t})·(1−t_{b})·(V_{ij}^{k+1} − V_{ij}^{k}) ] / (d_{k+1} − d_{k−1})
This is essentially a simplified Kochanek-Bartels spline scheme skipping the discontinuity parameter and assuming identical distance spacings. The V_{ij}^{k} are the potential values at the specified distances, d_{k}, supplied via the required input file. t_{t} is the tightness parameter controlled by this keyword, and t_{b} is the bias parameter controlled by TABIBIAS. If both parameters are zero, the well-known Catmull-Rom spline is obtained. Regardless of the choices for t_{t} and t_{b} (allowed values span the interval from −1 to 1), the resultant interpolation scheme will yield a function that is continuous and smooth (i.e., continuously differentiable). However, unless the control points are very sparse with respect to the features of the potentials, any nonzero settings for t_{t} and/or t_{b} will most likely lead to undesirable effects, in particular at the level of derivatives.
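The tangent construction can be sketched as follows (illustrative Python, not CAMPARI source); note that t_t = t_b = 0 recovers the Catmull-Rom tangent:

```python
def tangent(v_prev, v_k, v_next, d_prev, d_next, t_t=0.0, t_b=0.0):
    """Simplified Kochanek-Bartels tangent at an interior control point;
    t_t is the tightness (TABITIGHT) and t_b the bias (TABIBIAS)."""
    return ((1.0 - t_t) * (1.0 + t_b) * (v_k - v_prev)
            + (1.0 - t_t) * (1.0 - t_b) * (v_next - v_k)) / (d_next - d_prev)
```

Setting t_t = 1 zeroes all tangents (maximally "tight" spline), while t_b = ±1 uses only the one-sided difference to the left or right neighbor, respectively.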
TABIBIAS
If tabulated potentials are in use, and if the input file providing derivatives of the potentials is either missing or incomplete, the cubic interpolation scheme applied to the discrete input data (using cubic Hermite splines) utilizes numerical estimates of the tangents (slopes) at the nodes (control points). The shape of the resulting spline utilizes a bias parameter, t_{b}, that is specified by this keyword. Its exact interpretation is explained above. Simply speaking, positive values lead to a lag (along the distance axis) in the interpolated, piecewise polynomial compared to the control points, whereas negative values do the opposite.
TABREPORT
If tabulated potentials are in use (see SC_TABUL), this keyword lets the user instruct CAMPARI to print out a report of all the tabulated interactions in the system. This output can be quite large and is written to a separate output file (see TABULATED_POT.idx).
SC_DREST
Many experimental techniques (in particular NMR or FRET) can provide distance restraints on the relative positions of two sites in a biomolecule. Hence, several computational techniques are able to utilize such restraints (prominent, for example, in the computational determination of protein structures via NMR). CAMPARI offers a simple facility to harmonically restrain atoms which otherwise need not have any particular relationship. These restraints can be made one-sided, i.e., they can also restrain a distance to simply be within or beyond a certain threshold, which is usually a more appropriate treatment for incorporating experimental results. Such requests are handled and processed through a dedicated input file (see FMCSC_DRESTFILE), and details are provided there. The keyword discussed here simply provides the "outside" scaling factor c_{DREST} for the V_{DREST} term.
DRESTFILE
This keyword should give the location and name of the input file containing specific atom-atom distance restraint requests (see elsewhere for a format description). Naturally, this is only relevant if custom distance restraints are in use.
DRESTREPORT
If distance restraint potentials are in use (see SC_DREST), this keyword allows the user to request a summary of the active distance restraint terms in the system.
SC_EMICRO
This keyword sets the global scaling factor for a spatial density restraint potential. The method was introduced recently (Vitalis and Caflisch), and the user is referred there for additional details. The potential relies on reading and quantitatively interpreting an input density map. The interpreted density for a given lattice cell with indices l, m, and n is denoted Ξ_{lmn} and is meant to correspond to some atomic property such as mass (→ EMPROPERTY). The potential itself is as follows:
E_{EMICRO} = f_{EMICRO}·Σ_{ijk} ( ρ_{ijk} − Ξ_{ijk} )^{2}
The value of f_{EMICRO} is set by this keyword. The potential is extensive with the number of grid cells. If it is the dominant contribution to the CPU time of energy evaluations, the use of Monte Carlo sampling is currently quite wasteful since the values for ΔE_{EMICRO} are not actually computed incrementally. The sum implied in the above equation is over all lattice cells of an evaluation grid reduced in resolution to exactly that of the input density map. Note that the dimensions of the evaluation grid are controlled by system size and shape, and that its formal resolution is either assumed to be that of the input map or set explicitly by keyword EMDELTAS (although the resultant lattice is required to have cell boundaries that align exactly with those of the input map). If the resolution of the evaluation grid is finer, the values for its cells are summed up to give the coarser resolution. Furthermore, the evaluation grid may extend beyond the input map, and in such a case the summation also includes (coarse) cells where the input is assumed to be exactly the background density. Taken together, these caveats mean that it is rarely useful not to match the input lattice exactly. Importantly, the spatial density restraint provides an absolute reference in space, which means that it is most likely incorrect to use drift removal techniques. Another unusual aspect of this potential is that, in simulations in ensembles with fluctuating particle numbers, it only applies to physically present molecules. This is despite it not being a pairwise interaction term, and distinguishes it from potentials affecting the bath particles as well (such as bonded potentials). Because the potential is strictly a penalty term, this creates an effective mismatch that must be lumped manually into the excess chemical potential. This is neither pretty nor clean, meaning that concurrent use of this technique should be accompanied by appropriate skepticism.
Depending on the choice for EMMODE, E_{EMICRO} can also be written using an average of the simulation density that is typically not equivalent to the canonical ensemble average:
E_{EMICRO} = f_{EMICRO}·Σ_{ijk} ( ⟨ ρ_{ijk} ⟩ − Ξ_{ijk} )^{2}
Here, the angular brackets indicate an average that depends on keyword EMIWEIGHT and is explained there. Further details as to why the canonical average is not used are given below. Note that the potential utilizing this average no longer corresponds to a unique Hamiltonian, i.e., every time the average is updated, the energy landscape changes. This means that the ensembles generated are no longer straightforward to interpret. The obvious benefits of using an ensemble-averaged restraint are twofold. First, explicit heterogeneity can explain data that would be inconsistent with a unique structure. Second, sampling is aided by the fact that "stuck" conformations will tend to become unstable in terms of E_{EMICRO} over time. As a final remark, users should keep in mind that the actual ensemble average generated may not agree with the input, given that this quantity was never actually restrained during the simulation.
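The instantaneous form of the penalty is a straightforward sum over cells; a minimal sketch (illustrative Python, with the lattice flattened to a list for simplicity):

```python
def e_emicro(rho, xi, f_emicro):
    """Density restraint energy: f * sum of squared deviations between the
    (possibly averaged) simulated density rho and the interpreted input
    density xi, over all cells of the evaluation grid."""
    return f_emicro * sum((r - x) ** 2 for r, x in zip(rho, xi))
```

Because every cell contributes, the cost (and the magnitude of the term) grows linearly with the number of grid cells, which is the extensivity noted above.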
EMMODE
If the density restraint potential is in use, this keyword allows the user to choose between two options. Setting this keyword to 1 computes the restraint term by comparing the instantaneous simulation density to the input density map, whereas a choice of 2 computes the restraint term by comparing an ensemble-averaged simulation density to the input density map.
While the first option is straightforward, the second one requires some additional considerations, as follows. Irrespective of whether a run is in parallel or not, the ensemble average is currently obtained over the previous sampling history (beyond equilibration) of the exact trajectory in question. Note that any average is created in terms of numbers of steps, which may cause inconsistencies in hybrid sampling runs due to the different average phase space increments. Choosing an appropriate type of average is not trivial (see, e.g., this reference), because the naive approach of including the entire sampling history leads to a continuously decreasing impact of the restraint term. There are currently two ways to address this. First, the accumulation frequency for the ensemble average can be reduced by keyword EMCALC. This slows down the reduction in impact and effectively gives the system more time to explore, because it results in concatenated runs of length EMCALC during which the potential is in fact constant. Second, CAMPARI uses a fixed weight for the instantaneous component of the average while evaluating the potential. This fixed weight is set by keyword EMIWEIGHT and provides a way to utilize the entire history without degrading the impact of the restraint potential. A third route would be to use an appropriate kernel function in the time averaging, but this is inconvenient and potentially inefficient for spatial density analysis due to the large number of terms that would have to be stored and processed to recompute the kernel-based average.
A third option for this keyword may be added in the future that allows a lateral ensemble average to be restrained in MPI averaging calculations.
EMIWEIGHT
If the density restraint potential is in use, and if the potential acts on an ensemble-averaged simulation density, this keyword allows the user to set a fixed weight for the constructed average:
⟨ ρ_{ijk} ⟩ = (1 − w_{inst})·N_{steps}^{−1}·Σ_{i} ρ_{ijk}(i) + w_{inst}·ρ_{ijk}(current)
Here, the factor w_{inst} is set by this keyword and bound to the interval from 0 to 1. The ρ_{ijk}(i) are the N_{steps} values contributing to the running, canonical average of the density, and ρ_{ijk}(current) is the density produced by the current conformation at the given lattice cell. The limiting case of w_{inst} being 1.0 recovers the instantaneous treatment (→ EMMODE). The limiting case of w_{inst} being 0.0 does not, however, produce a meaningful restraint (since it is independent of the current conformation). Both limiting cases are therefore forbidden. Note that it is currently not possible to recover the naive approach of a restraint that continuously decreases in relevance.
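The mixed average for a single lattice cell can be sketched as follows (illustrative Python, not CAMPARI source):

```python
def restrained_density(history, current, w_inst):
    """Cell density entering the ensemble-averaged restraint (EMMODE 2):
    a fixed-weight mix of the running average over the sampling history
    and the instantaneous value of the current conformation."""
    if not 0.0 < w_inst < 1.0:
        raise ValueError("both limiting cases (0.0 and 1.0) are forbidden")
    return (1.0 - w_inst) * (sum(history) / len(history)) + w_inst * current
```

Because w_inst is fixed, the instantaneous conformation retains a constant influence on the restraint no matter how long the accumulated history becomes.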
EMMAPFILE
This keyword provides the location and name of the mandatory density input file when using the density restraint potential. The file format is described in detail elsewhere, and here it suffices to say that the external NetCDF library is needed, and that currently no other common density file formats (.ccp4, .mrc, ...) are read directly by CAMPARI. UCSF Chimera is able to convert between various densitybased file formats, and does read and write NetCDF files.The most common application is likely that of a simulation with 3D periodic boundary conditions and a rectangular cuboid simulation volume. Here, the cells of the input lattice should align exactly with those of the analysis and evaluation lattice CAMPARI uses, and generally it will be easiest to match both origin and dimensions exactly. By default, CAMPARI will obtain the lattice cell dimensions from the input map. For nonperiodic boundaries (including simulation systems with curved boundaries), it will be required, however, to deviate from such an exact match. Here, keyword EMBUFFER can be used to define the buffer in size for the evaluation grid at any nonperiodic boundaries. Furthermore, keyword EMDELTAS can always be used to request the analysis and evaluation lattice to have cells of a smaller size, which, with the restraint potential in place, has to yield the exact input cell size by integer multiplication for all three dimensions. Lastly, keyword EMREDUCE can be used to average the input map to a lower resolution by rebinning.
Assuming no further transformations are applied (→ keywords EMREDUCE, EMTRUNCATE, EMFLATTEN), the interpreted density based on the input file is as follows:
Ξ_{ijk} = ρ_{sol} + c·(ω_{ijk} - ω_{bg})
Here, the final density for a given lattice cell, Ξ_{ijk}, has units of physical density, c is a scale factor explained below, ω_{ijk} is the original input density for the same lattice cell, and ρ_{sol} and ω_{bg} are the assumed physical and input background signals, respectively. ρ_{sol} is set by keyword EMBGDENSITY, and ω_{bg} can be set by keyword EMBACKGROUND if the value determined automatically from the histogram of input densities is not appropriate. Factor c is given as follows:
c = [ M_{M} - ρ_{sol} Σ_{ijk} V_{ijk} H(ω_{ijk} - ω_{t}) ] · [ Σ_{ijk} (ω_{ijk} - ω_{bg}) V_{ijk} H(ω_{ijk} - ω_{t}) ]^{-1}
Here, the first term in square brackets is a hypothetical excess signal (mass) using the apparent macromolecular volume (the sum of the volume of all lattice cells with signals exceeding the threshold, ω_{t}) and the assumed total mass. The V_{ijk} are the volumes of individual lattice cells and currently have to be all equal, and H(x) denotes the Heaviside step function. The second term in square brackets is the actual excess signal (mass) derived from the input map obtained by analogous summation. Factor c has units that convert optical density (input) to physical density. It is important to note the crucial impact of keywords EMTHRESHOLD and EMTOTMASS on the quantitative interpretation of the map. In particular, many combinations of values will be rejected by CAMPARI, because they cannot produce an excess signal larger than the background. The resultant interpreted map is written to a dedicated output file at the beginning of each run. Note that this includes all optional transformations controlled by keywords EMREDUCE, EMTRUNCATE, and EMFLATTEN.
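The interpretation formula can be condensed into a short Python sketch (hypothetical variable names; a NumPy array stands in for the lattice, and all cells are assumed to share one volume as stated above):

```python
import numpy as np

def interpret_density_map(omega, v_cell, omega_t, omega_bg, total_mass, rho_sol):
    """Linear transform of an input map into physical densities (sketch).

    omega:      input (optical) densities per lattice cell
    v_cell:     lattice cell volume (currently all cells must be equal)
    omega_t:    signal threshold            (keyword EMTHRESHOLD)
    omega_bg:   input background level      (keyword EMBACKGROUND)
    total_mass: assumed mass M_M            (keyword EMTOTMASS)
    rho_sol:    physical background density (keyword EMBGDENSITY)
    """
    above = omega > omega_t                        # Heaviside H(omega - omega_t)
    excess_mass = total_mass - rho_sol * v_cell * above.sum()
    excess_signal = ((omega - omega_bg) * v_cell)[above].sum()
    if excess_mass <= 0.0 or excess_signal <= 0.0:
        raise ValueError("threshold/mass combination rejected")
    c = excess_mass / excess_signal                # optical -> physical conversion
    return rho_sol + c * (omega - omega_bg)        # Xi_ijk per cell
```

The explicit rejection branch mirrors the constraint described above: the thresholded signal must correspond to a density exceeding the assumed background.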
EMREDUCE
If the density restraint potential is in use, this keyword can be used to change the formal resolution of the input density map. This is accomplished by simple rebinning, i.e., the target and original lattices are aligned at the origin, and the original signal for each cell is distributed to the target cells by simple overlap. Because the input is assumed to be a density, volume renormalization is performed. Note that it is generally meaningless to create a finer grid this way, because no new information is available, and CAMPARI distributes signal assuming a flat distribution inside each original input cell. Similar to keyword EMDELTAS, this keyword requires the specification of three floating point numbers that set the target lattice cell sizes of the rebinned input map in Å for the x, y, and z dimensions, respectively. Note that the exact values will generally end up slightly different because of the requirement that the outer dimensions of both grids align exactly. Finally, users should keep in mind that the physical resolution and the formal resolution of the lattice used to represent the data are two distinct quantities.
EMBACKGROUND
If the density restraint potential is in use, this optional keyword can be used to override the value determined to correspond to background in the input density map (ω_{bg} in the equation above). This value is commonly set by binning the densities in all cells, and identifying a well-resolved peak in the histogram. If the map does not encode much background signal, the histogram-based determination may be inappropriate, and this is when this keyword is useful. Note that values refer to the original input density map.
EMTHRESHOLD
If the density restraint potential is in use, this important keyword controls the linear transform used to interpret the input density map in terms of a physical mass density. Specifically, it sets a threshold level, in the units and numbers of the (potentially rebinned) input, that distinguishes signal from background. Since measurements often have low contrast, the threshold is not an obvious property of the input map. The threshold set here corresponds to parameter ω_{t} in the equation above. It is primarily responsible for the overall scaling factor, i.e., larger threshold values will generally produce interpreted maps with a wider spectrum of physical density values. Using the apparent molecular volume and the total mass, the chosen threshold directly determines the apparent physical density (reported in log output). This quantity poses constraints on the chosen value, because the integrated signal must yield a density larger than the assumed physical background density.
EMTOTMASS
If the density restraint potential is in use, this keyword sets the mass in g/mol assumed to correspond to the signal in the input density map exceeding the threshold. In general, this can be set to correspond exactly to the explicitly represented matter in the simulation (this is the default), but some cases may call for an override, e.g., when simulating only a part of the system without wanting to distort the interpretation of the map. The parameter corresponds to M_{M} in the equation above.
EMTRUNCATE
If the density restraint potential is in use, this keyword enables truncation of the input map below the chosen value, as long as that value is higher than the minimum and lower than the assumed threshold level (ω_{t} in the equation above). Truncation implies that the spectrum of values for the interpreted density is completely depleted below the specified level, because all values are simply assigned the background level, ω_{bg}. This technique can be used to eliminate noise from the input that may hamper sampling. Note that values refer to the original input density map. This keyword is the exact complement to EMFLATTEN.
EMFLATTEN
Depending on how a density map is generated, the signal may cover a wide spectrum of values. This is particularly true if the contrast to the background is generally low, and the lack of contrast is compensated for by averaging over similar, but heterogeneous conformations. In such cases, the ratio of peak to barely detectable signals may be impossible to describe by physical densities of instantaneous conformations. If the density restraint potential is in use, this keyword therefore allows the user to flatten an input density map at the level it specifies. The requirement is that the value be larger than the assumed threshold level. This keyword is the exact complement to EMTRUNCATE, and using both concurrently can produce an interpreted map that is purely an envelope of homogeneous density.
EMHEURISTIC
The evaluation of the density restraint potential involves the summation of contributions from all the grid cells. Each cell contributes a squared difference of the input density and the actual density for the current conformation of explicit matter in the system. If the formal resolution is high, the evaluation of the potential can be costly. Occasionally, it may be possible to save some CPU time by applying dedicated heuristics, and this is what is controlled by this keyword. Choices are as follows:
- No heuristic is used. At each global evaluation of the density restraint potential, all grid cells are recomputed and summed up.
- When spreading the atomic masses in the system onto the analysis and evaluation grid, CAMPARI keeps track of whether any given xz-slice of the input map actually received a contribution from any atom. If not, the cells constituting this xz-slice are not recomputed; instead, a precomputed value for the entire slice is used. This is possible because the simulation densities in all the cells of the slice will be equivalent to the assumed background density. The efficacy of this heuristic obviously depends on the details of the system.
- This works identically to the previous option, except that x-lines are considered rather than xz-slices.
- This works identically to the previous options, except that local rectangular supercells are used rather than xz-slices or x-lines. Here, the algorithm will try to combine existing grid cells to yield approximately 1000 supercells. This option is probably the most successful in general, because it can match arbitrary arrangements of explicit matter best.
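The slice-based skipping can be summarized as follows. This is an illustrative Python sketch of the idea, not CAMPARI's implementation; the array layout (slice index first) and names are assumptions:

```python
import numpy as np

def restraint_energy(rho_sim, rho_target, rho_bg, touched):
    """Sum of squared deviations over slices, reusing a precomputed value
    for slices untouched by any atom.

    rho_sim, rho_target: 3D arrays of simulated and interpreted densities,
                         with the slice index as the leading dimension
    rho_bg:              assumed background density
    touched:             boolean per slice; False means no atom contributed,
                         so the simulated density there equals rho_bg everywhere
    """
    # contribution of an all-background slice (computable once per run)
    bg_slice = ((rho_bg - rho_target) ** 2).sum(axis=(1, 2))
    total = 0.0
    for k in range(rho_sim.shape[0]):
        if touched[k]:
            total += ((rho_sim[k] - rho_target[k]) ** 2).sum()
        else:
            total += bg_slice[k]   # reuse precomputed value, skip the cells
    return total
```

The savings scale with the fraction of slices (or lines, or supercells) that receive no contribution from explicit matter, which is why the efficacy depends on the system.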
GHOST
This keyword is a simple logical that determines whether or not to (partially) "ghost" the interactions of selected particles (see FEGFILE) with the rest of the system (and eventually amongst themselves → FEG_MODE). Such scaling of interactions creates artificial systems that can be used to interpolate between two well-defined end states. The most common need for such an application arises in cases where the two end states are significantly different and one is interested in the free energy difference. For example, to calculate the aqueous free energy of solvation of a small molecule in water, one could scale the interactions of the small molecule with water from zero to their full value. Such growth-based calculations are usually complicated to set up and perform since i) trajectories evolved at a given Hamiltonian have to be evaluated (usually on-the-fly) assuming different Hamiltonians, and ii) it is difficult to maintain an internally consistent system of interactions such that all changes induced by the ghosting can be mapped to atomic parameters of the ghosted species. In CAMPARI, FEG (free energy growth/ghosting) calculations are therefore supported in conjunction with limited Hamiltonians only: the only potentials allowed are IPP, ATTLJ, POLAR, and the bonded interactions. In other cases, it may be possible to extract the same or related quantities through other techniques realizable in CAMPARI. As an example, the free energy of solvation for a flexible (single) solute immersed in the ABSINTH continuum solvation model can be obtained by simultaneously scaling the dielectric from 1.0 to 78.0 and the DMFI from 0.0 to 1.0. The default settings for the auxiliary keywords to GHOST are such that the molecules or residues listed in FEGFILE will be completely ghosted (i.e., invisible to the system).
FEG_MODE
In FEG calculations (see GHOST), interactions are always scaled between the ghosted species and the rest of the system. A natural question is what happens to interactions between or within ghosted species (if any are present). If they are not scaled but instead use the background Hamiltonian, it will be impossible to map the effect of the scaling to a change in atomic parameters, which is desirable from the viewpoint of rigor. As an example, consider polar interactions between a single ghosted butane molecule and a bath of non-ghosted water. A scaling of the atomic charges on the ghost butane by a factor f would give rise to interactions with the bath scaled by f and self-interactions scaled by f^{2}. This type of scaling is enforced in CAMPARI if a method requires it, such as treating electrostatics with the reaction-field method (see LREL_MD). In general, however, it is impossible to find a unique mapping while leaving the background Hamiltonian intact. It is therefore left to the user to determine which of two options to choose:
1) Interactions between/within ghosted species use the full background Hamiltonian.
2) Interactions between/within ghosted species use the scaled Hamiltonian.
The choice made here is important only if such interactions are present in the system. If so, however, the raw results will usually depend strongly on it, and corrections may have to be applied. As an example, consider the butane-water example from above. The fact that intramolecular interactions are scaled will contribute toward the apparent free energy obtained when interpolating between the fully ghosted and the fully present states. Hence, gas-phase corrections have to be applied. They are obtained by repeating the calculation in the absence of water to complete the thermodynamic cycle, which then allows isolating the free energy of solvation. Additional complications may arise if molecules are constrained (see FRZFILE).
FEG_IPP
This keyword specifies the "outside" scaling factor for the ghosted inverse power potential. Note that, depending on the choice for FEG_LJMODE, this is not as simple as SC_IPP, and that additional parameters may determine the impact this keyword has. The setting here corresponds to the parameter s_{gIPP} below. Note as well that the inverse power potential supported in calculations with ghosted interactions always uses an exponent of 12 (i.e., setting IPPEXP to anything but the default of 12 will cause CAMPARI to abort). This keyword is only relevant if GHOST is true.
FEG_ATTLJ
This keyword is analogous to FEG_IPP but controls the "outside" scaling of the attractive r^{-6} dispersive term. The setting here corresponds to the parameter s_{gattLJ} below. Note that scaling this up while FEG_IPP is set to zero (or, depending on the mode, even set to something smaller) will potentially lead to numerical instabilities.
FEG_LJMODE
The exact functional form of the scaled (ghosted) Lennard-Jones potential is as follows:

E_{gLJ} = 4.0·ΣΣ_{i,j} ε_{ij}·f_{14,ij}·[ g(s_{gIPP})·[α·h(s_{gIPP}) + (r_{ij}/σ_{ij})^{6}]^{-2} - g(s_{gattLJ})·[α·h(s_{gattLJ}) + (r_{ij}/σ_{ij})^{6}]^{-1} ]
Here, the ε_{ij} and σ_{ij} are the standard pairwise Lennard-Jones parameters (see PARAMETERS), the f_{14,ij} are potential 1-4 fudge factors (see FUDGE_ST_14) that generally will be unity, g(s) and h(s) are auxiliary functions whose functional form depends on the choice for this keyword, and α is the so-called soft-core radius (unitless). The two scaling factors s_{gIPP} and s_{gattLJ} are provided by keywords FEG_IPP and FEG_ATTLJ. There are three possible choices determining g(s) and h(s):
1. g(s) = s and h(s) = 0
2. g(s) = s^{f_{1}} and h(s) = 1.0 - s^{f_{2}}
3. g(s) = (1.0 - e^{-s·f_{1}})/(1.0 - e^{-f_{1}}) and h(s) = (1.0 - s)^{f_{2}}
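Assuming the three modes map onto g and h as listed, a single pairwise term of the ghosted Lennard-Jones potential can be sketched as follows (illustrative Python, not CAMPARI's Fortran implementation; default parameter values are arbitrary):

```python
import math

def g_h(s, mode, f1, f2):
    """Auxiliary scaling functions for the three FEG_LJMODE choices."""
    if mode == 1:
        return s, 0.0
    if mode == 2:
        return s ** f1, 1.0 - s ** f2
    if mode == 3:
        return (1.0 - math.exp(-s * f1)) / (1.0 - math.exp(-f1)), (1.0 - s) ** f2
    raise ValueError("unknown mode")

def ghosted_lj_pair(r, sigma, eps, s_ipp, s_att, alpha, mode, f1=2.0, f2=2.0,
                    f14=1.0):
    """One i,j term of E_gLJ; f14 is the 1-4 fudge factor (usually unity)."""
    g_rep, h_rep = g_h(s_ipp, mode, f1, f2)
    g_att, h_att = g_h(s_att, mode, f1, f2)
    x = (r / sigma) ** 6
    return 4.0 * eps * f14 * (g_rep * (alpha * h_rep + x) ** -2.0
                              - g_att * (alpha * h_att + x) ** -1.0)
```

Note that at s = 1 all three modes reduce to g(s) = 1 and h(s) = 0, recovering the plain Lennard-Jones potential regardless of α, which is why the soft-core radius only reshapes intermediate windows of an interpolation schedule.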
FEG_LJRAD
This keyword allows the user to specify the parameter α in the above equations (see FEG_LJMODE), i.e., the soft-core "radius" for the modified Lennard-Jones potential. It is generally of limited utility to set this to zero, since in that case the same scaled potential could be obtained by setting FEG_LJMODE to 1, in which case this parameter becomes meaningless. Conversely, for large soft-core radii, the potential is modified even at large distances, which generally represents an unnecessary modification that may slow down convergence in free energy calculations relying on interpolation via ghosting. Generally speaking, values around 0.5 are recommended for either mode 2 or 3. This keyword is only relevant if GHOST is true.
FEG_LJEXP
This keyword sets the parameter f_{1} in the above equations (see FEG_LJMODE). It provides a simple way to alter the weight of change experienced by the system depending on the choices for FEG_IPP and FEG_ATTLJ. In that sense, it is very closely tied to the design of the interpolation schedule (i.e., both address the exact same issue). There are no gold-standard rules for picking this value, and the user is referred to the literature for further details. In the case of free energy calculations, it will be best to inspect the schedule empirically by metrics such as the statistical precision of the pairwise estimates or overlap metrics such as (theoretical) swap probabilities, and to then refine either the schedule itself or the global settings accordingly. This keyword is only relevant if GHOST is true.
FEG_LJSCEXP
This keyword sets the parameter f_{2} in the above equations (see FEG_LJMODE). Much of the same discussion applies here as already mentioned for keywords FEG_LJRAD and FEG_LJEXP. This keyword is only relevant if GHOST is true.
FEG_POLAR
The only other nonbonded potential besides Lennard-Jones supported in FEG calculations is the polar potential (see SC_POLAR). This keyword provides a scaling factor (s_{gPOLAR}) for the soft-core Coulomb potential. Similar to the case of scaled LJ interactions (see above), this may involve three additional parameters (see FEG_CBMODE). Note that it would be most common to only scale this up while FEG_IPP is set to unity, so as to avoid potential numerical instabilities.
FEG_CBMODE
In analogy to FEG_LJMODE, this keyword determines the exact functional form CAMPARI uses for the scaled (ghosted) Coulomb potential, with the "outside" scaling factor s_{gPOLAR} set by FEG_POLAR:

E_{gPOLAR} = (4.0·π·ε_{0})^{-1}·ΣΣ_{i,j} g(s_{gPOLAR})·q_{i}·q_{j}·f_{14,C,ij}·[α_{C}·h(s_{gPOLAR}) + r_{ij}]^{-1}
Here, the atomic partial charges are represented as q_{i} and q_{j}, ε_{0} is the vacuum permittivity, and r_{ij} is the interatomic distance. f_{14,C,ij} denotes potential fudge factors acting on 1-4-separated atom pairs (see FUDGE_EL_14) but will generally assume a value of unity. g(s) and h(s) are the same auxiliary functions defined above for the Lennard-Jones potential (→ FEG_LJMODE), and α_{C} is the soft-core radius (unitless) specific to the Coulomb potential (controlled by keyword FEG_CBRAD). For completeness, the options are listed again in detail:
1. g(s) = s and h(s) = 0
2. g(s) = s^{f_{C,1}} and h(s) = 1.0 - s^{f_{C,2}}
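A single pairwise term of the ghosted Coulomb potential can then be sketched as follows (illustrative Python; the value of the Coulomb constant ke is an assumption for the sketch and may differ slightly from the constant any given code uses internally):

```python
def ghosted_coulomb_pair(r, qi, qj, s_pol, alpha_c, mode, fc1=2.0, fc2=2.0,
                         f14=1.0, ke=332.06371):
    """One i,j term of E_gPOLAR (sketch).

    ke approximates 1/(4*pi*eps_0) in kcal*Angstrom/(mol*e^2).
    mode 1 is plain linear scaling; mode 2 adds the soft-core shift alpha_c.
    """
    if mode == 1:
        g, h = s_pol, 0.0
    else:  # soft-core variant
        g, h = s_pol ** fc1, 1.0 - s_pol ** fc2
    return ke * g * qi * qj * f14 / (alpha_c * h + r)
```

As with the Lennard-Jones case, s_pol = 1 makes the soft-core shift vanish and recovers the unmodified Coulomb term.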
FEG_CBRAD
This keyword is analogous to FEG_LJRAD and allows the user to choose the value for the soft-core radius specific to the Coulomb potential (α_{C} in the equations under FEG_CBMODE). The specification is meaningless if FEG_CBMODE is set to 1.
FEG_CBEXP
This keyword is analogous to FEG_LJEXP and allows the user to choose the value for the polynomial scaling exponent of the Coulomb potential (f_{C,1} in the equations under FEG_CBMODE). The specification is meaningless if FEG_CBMODE is set to 1.
FEG_CBSCEXP
This keyword is analogous to FEG_LJSCEXP and allows the user to choose the value for the soft-core scaling exponent of the Coulomb potential (f_{C,2} in the equations under FEG_CBMODE). The specification is meaningless if FEG_CBMODE is set to 1.
FEG_BONDED_B
Nonbonded interactions provide a straightforward interpretation for parsing the energetics of the system into solvent-solvent, solute-solute, and solute-solvent contributions. This is used in a thermodynamic cycle argument when computing, for instance, the free energy of solvation of a solute in solvent via FEG methods. Sometimes (as alluded to under FEG_MODE), it may be desirable to scale intramolecular nonbonded interactions as well. But what about intramolecular bonded interactions? This keyword allows the FEG-like scaling of bonded terms associated with a ghosted species, but not of those associated with non-ghosted particles. Beyond that, this keyword operates just like SC_BONDED_B. Note that this almost certainly creates a pathological situation if bond length potentials are allowed to approach zero, and naturally relies on bond lengths being allowed to vary (see CARTINT) to be meaningful. Note that for all bonded parameters the assignment of terms to individual residues in a multi-residue molecule is somewhat arbitrary if atoms from two different residues participate.
FEG_BONDED_A
This is analogous to FEG_BONDED_B, only for bond angle potentials. Note that this may lead to a pathological simulation if bond angle potentials are allowed to approach 0° or 180° and, again, relies on bond angles actually being varied throughout the simulation to be meaningful.
FEG_BONDED_I
This is analogous to FEG_BONDED_B, only for improper dihedral angle potentials. Note that this may lead to a pathological simulation if improper dihedral angle potentials are allowed to approach zero and, again, relies on these degrees of freedom actually being varied throughout the simulation to be meaningful.
FEG_BONDED_T
This is analogous to FEG_BONDED_B, only for proper dihedral angle potentials. Note that this relies on torsional angles actually being varied throughout the simulation to be meaningful (there may be subsets).
FEGREPORT
This simple logical keyword lets the user instruct CAMPARI to write out a summary of the ghosted particles (residues or molecules) in free energy growth/ghosting calculations.
EWALD
CAMPARI supports using the Ewald decomposition technique to compute long-range electrostatic interactions in periodic systems (see LREL_MD). There are two supported approaches to computing the reciprocal space sums in the Ewald formalism:
- Particle-Mesh Ewald (PME): This elegant and vastly popular method introduced by Darden et al. uses discrete Fourier transforms (DFFTs) and cardinal B-splines to simplify the computation of the reciprocal space sum. Due to the DFFTs, CAMPARI needs to be linked against the free open source library FFTW for this option to be available. Briefly, PME reciprocal space sums have different scaling components: i) the number of charges; ii) the number of grid points; iii) the interpolation order for the cardinal B-splines. Which of these components is the speed-limiting factor depends strongly on the system, in particular since the accuracy of the reciprocal sum depends on the simultaneous optimization of the spline order (see BSPLINE) and the grid size (EWFSPAC), given that the real-space part co-determines the Ewald parameter (EWPRM). Note, however, that the fundamental scaling with the number of charges is O(N). PME is almost always the recommended (since fastest) implementation of Ewald sums.
- Standard Ewald: A straightforward computation of the reciprocal part of the original decomposition introduced by Ewald is supported by CAMPARI as well. This method is slow and scales poorly (K^{3}) with the (linear) cutoff size in the reciprocal dimension. Much like PME, however, the reciprocal sum fundamentally scales as O(N) with the number of charges, such that it might be a reasonably efficient alternative should tight cutoffs in reciprocal space be permissible (or should PME be slowed down due to a dominant cost imposed by DFFTs, such as in dilute systems).
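In both variants, the split between real and reciprocal space is governed by the Ewald parameter (→ EWPRM). A widely used heuristic for picking it, distinct from the Petersen-style error estimates CAMPARI applies internally, simply requires the screened real-space interaction to decay to a chosen tolerance at the cutoff:

```python
import math

def ewald_parameter(r_cut, tol=1e-5):
    """Bisect for alpha (in 1/Angstrom) such that erfc(alpha * r_cut) ~ tol.

    Larger alpha shifts more weight into the reciprocal sum; this simple
    criterion only controls the real-space truncation error and says nothing
    about the reciprocal space (grid/spline) error.
    """
    lo, hi = 0.0, 10.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if math.erfc(mid * r_cut) > tol:
            lo = mid   # screening too weak at the cutoff: increase alpha
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

A longer real-space cutoff permits a smaller alpha and hence a coarser reciprocal space treatment, which is the trade-off described under EWPRM and EWFSPAC.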
BSPLINE
When using the PME method (see LREL_MD and EWALD), this keyword determines the order of the cardinal B-splines to be used. The order can be increased at a moderate cost, such that it is often advantageous to choose a high interpolation order coupled to a relatively coarse mesh (see EWFSPAC) instead of a lower interpolation order coupled to a finer mesh. The default order is 8, and currently only even numbers are permitted.
EWFSPAC
When using the PME method (see LREL_MD and EWALD), this keyword determines the grid spacing for the mesh in Å. A smaller value yields a finer mesh, which in turn yields more accuracy. The cost associated with finer grids easily becomes substantial (K^{3} scaling), though, even when using the DFFTs provided by FFTW. The code will occasionally adjust too coarse a value, since the interpolation order (BSPLINE) requires a certain minimum number of available mesh points in each dimension. When using the standard Ewald method, this keyword determines the reciprocal space cutoff directly as the ratio of the longest box side length and EWFSPAC.
EWPRM
When using the Ewald method (see LREL_MD and EWALD), this keyword can be used to overwrite the automatically determined value for the Ewald parameter. The Ewald parameter is given in units of Å^{-1} (but can just as well be defined as a dimensionless parameter). It determines the relative weight of the real-space and the reciprocal sum in determining the total electrostatic energy of the system. The larger EWPRM is, the more weight shifts to the reciprocal sum. Note that the accuracy of the Ewald method is highly sensitive to this parameter in conjunction with the real-space and reciprocal space cutoffs, and that a catastrophic lack of accuracy can easily be realized. Therefore, the code tries to determine a reasonable value for the Ewald parameter based on the (hard) settings for the real-space cutoff (NBCUTOFF) as well as EWFSPAC and, in the case of the PME method, BSPLINE. Unfortunately, the accuracy predictor formulas in use are currently somewhat flawed (they are based on the mean force error estimates presented by Petersen). They should be more accurate for the standard Ewald method than for PME, since in the latter certain error contributions from the spline-based interpolation are missing. Hence, the automatically chosen parameter should by no means be considered an optimal one, merely one which, given the cutoff settings, provides comparatively small errors in forces and energies. Should the procedure be deemed inadequate, or should there be an independent estimate of the error, this keyword comes into play.
RFMODE
When using the reaction-field method (see LREL_MD), this keyword determines whether the corrections include a continuum electrolyte assumption (generalized reaction field) or not:
1. The generalized reaction-field correction is used. The code determines the concentration of net charges (including those which are part of macromolecules) and derives an effective ionic strength. This bulk electrolyte concentration is used to model the dielectric response outside of the cutoff sphere for an individual charge in a Poisson-Boltzmann sense.
2. The standard reaction-field correction is used. Irrespective of the existence of free net charges in the system, the dielectric response is simply an approximate solution to the Poisson equation.
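The standard (non-electrolyte) correction amounts to a modified Coulomb term of the generic textbook form sketched below; the Coulomb constant and the absence of the ionic-strength-dependent generalization are assumptions of this sketch, not CAMPARI's exact expressions:

```python
def rf_pair_energy(r, qi, qj, r_cut, eps_rf, ke=332.06371):
    """Reaction-field-corrected Coulomb pair energy (standard form, sketch).

    k_rf makes the force nearly vanish at r_cut when eps_rf is large;
    c_rf shifts the energy so it is exactly zero at the cutoff.
    ke approximates 1/(4*pi*eps_0) in kcal*Angstrom/(mol*e^2).
    """
    k_rf = (eps_rf - 1.0) / ((2.0 * eps_rf + 1.0) * r_cut ** 3)
    c_rf = 1.0 / r_cut + k_rf * r_cut ** 2
    return ke * qi * qj * (1.0 / r + k_rf * r ** 2 - c_rf)
```

The quadratic k_rf term is the response of the high-dielectric continuum outside the cutoff sphere; as noted above, the force at r_cut vanishes only in the limit of an infinite outer dielectric.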
Cutoff Settings:
(back to top)
NBCUTOFF
This keyword is interpreted differently depending on the type of calculation:
- For MC calculations (see DYNAMICS), it simply sets the nonbonded (IPP, ATTLJ, WCA, and IMPSOLV) cutoff in Å. All the potentials governed by NBCUTOFF are (and have to be) short-range in nature.
- For MD/LD/BD calculations, it defines the short-range regime, within which all interactions and forces are computed at every time step. It does not truncate said interactions at a distance of NBCUTOFF Å.
ELCUTOFF
Similarly to NBCUTOFF, this keyword is interpreted differently depending on the type of calculation:
- For MC calculations (see DYNAMICS), it simply sets the second nonbonded (TABUL and POLAR) cutoff in Å. All the potentials governed by ELCUTOFF are potentially long-range in nature. Note that interactions beyond this second cutoff, which are Coulomb terms involving moieties flagged as carrying a net charge, are potentially still calculated (see LREL_MC).
- For MD/LD/BD calculations, it defines the mid-range regime, within which all interactions and forces are computed accurately, but only every n^{th} time step, i.e., at a lower frequency, which is set by the neighbor list update frequency (see NBL_UP). It truncates said interactions at a distance of ELCUTOFF unless they involve long-range electrostatic corrections (in particular, Coulomb terms involving moieties flagged as carrying a net charge). The twin-range terms (forces and energies stemming from particle pairs with distances between NBCUTOFF and ELCUTOFF Å) are assumed to be approximately constant for the number of steps between neighbor list updates. Twin-range cutoffs are explicitly disallowed for the Ewald and reaction-field methods. If CAMPARI computes additional interactions, i.e., if LREL_MD is either 4 or 5, these interactions (particle pairs with distances beyond ELCUTOFF Å) are subjected to the same assumption for forces and energies.
NBL_UP
This keyword provides the update frequency for neighbor lists in MD/LD/BD calculations. Every NBL_UP^{th} step, it is recalculated which residues are within a distance of NBCUTOFF Å (short-range) and which ones are within a distance of ELCUTOFF Å (mid-range). Interactions with the former are computed explicitly at every time step, and those with the latter only every NBL_UP^{th} step. For interactions outside of either cutoff, truncation occurs unless the electrostatic model chosen provides a long-range term (see LREL_MD). These latter interactions will then be recomputed at the same frequency as the mid-range ones (with the exception of the reciprocal space sum in Ewald methods, which is always computed at every step). Note that this keyword is irrelevant if CUTOFFMODE is set to 1, a setting useful only for debugging purposes.

The assumptions made by this keyword are rather aggressive, and it is therefore recommended to use it with caution. Specifically, the neighbor lists here should not be thought of as "buffered" in any way. The integrator noise accumulated by setting this to something large can be quite substantial, and should probably be offset by a large choice for the outer cutoff distance (→ ELCUTOFF). Conversely, the use of residue-level neighbor lists with large effective radii tends to bloat the effective cutoff radius, which creates something akin to an effective buffer zone. This implementation may be changed in the future.
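The twin-range classification performed at every NBL_UP^{th} step can be sketched at the residue level as follows (an illustrative, unbuffered Python sketch without periodic wrapping; names are hypothetical):

```python
import numpy as np

def twin_range_lists(ref_pos, nb_cutoff, el_cutoff):
    """Split residue pairs into short-range (recomputed every step) and
    mid-range (recomputed every NBL_UP-th step) lists, based on the
    distances between residue reference positions."""
    short_range, mid_range = [], []
    n = len(ref_pos)
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(ref_pos[i] - ref_pos[j])
            if d <= nb_cutoff:
                short_range.append((i, j))   # evaluated at every time step
            elif d <= el_cutoff:
                mid_range.append((i, j))     # forces held constant in between
    return short_range, mid_range
```

Because nothing is buffered, a pair just outside ELCUTOFF at list-build time contributes nothing until the next rebuild, which is the source of the integrator noise discussed above.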
LREL_MC
This keyword determines CAMPARI's method of handling long-range electrostatic interactions in MC calculations. There are currently several options, with more to be added in the future. A general problem is hidden in the fact that MC calculations have to be able to compute relative energies of drastically different configurations at every step, such that similarity assumptions cannot be used to speed up the calculations as is the case in MD/LD/BD.
1. All monopole-dipole and monopole-monopole interactions are computed explicitly (at full atomic resolution). By default, the governing factor is the parser for the partial charge sets, which determines the individual charge groups (see option 2 for ELECMODEL and output files DIPOLE_GROUPS.vmd and MONOPOLES.vmd). Those with a total charge exceeding a threshold (usually zero) are considered "net charges", and those without are considered "dipoles". The flagging is at the residue level, and can be overwritten by a dedicated patch facility. Interactions between dipole groups are skipped even if one or both of the participating residues are flagged.
2. All monopole-monopole interactions are computed explicitly (at full atomic resolution). As in the option above, the flagging is at the residue level, and here both residues are required to be flagged. Dipole-dipole and dipole-monopole interactions are skipped even if both of the participating residues are flagged.
3. This is identical to the previous option except that monopole-monopole terms are computed at a reduced resolution, viz., polyatomic monopole groups are represented by collapsing the total charge onto the single atom nearest to the true monopole center. This choice is currently the default.
4. No additional interactions are computed (rigorous truncation).
LREL_MD
Much like LREL_MC, this keyword controls how CAMPARI handles longrange electrostatic interactions in MD/LD/BD calculations. There are currently several options for this which are generally different from those available for Monte Carlo runs since two core assumptions are true for dynamics calculations; i) only global energy/force evaluations are needed; and ii) the system remains selfsimilar through several integration steps. The options are as follows: No additional interactions are computed, i.e., everything beyond the midrange cutoff is discarded. This setting can be used along with LREL_MC set to 4 and ELCUTOFF being equal to NBCUTOFF to create an exact match between dynamics and MC Hamiltonians which may be relevant for hybrid calculations (→ DYNAMICS).
 Ewald summation is used, which relies on periodic boundary conditions,and (currently) cubic boxes → BOUNDARY and SHAPE). This technique relies on the decomposition of an infinite sum over all periodic images into two quickly convergent contributions, a realspace and a reciprocal space part. The realspace part involves a modified Coulomb interaction, which therefore requires separate loops. Hence, support for Ewald sums is currently limited to "gasphase"type calculations with LennardJones and polar interactions only. The reciprocal space part can be solved in a number of different ways (see EWALD and associated keywords). Note that the two cutoffs are collapsed into the shorter one (there is no midrange regime) when using Ewald techniques. Both the realspace and the reciprocal sums are recomputed at every step. Ewald summation replaces the standard Coulomb term and is relevant for all polar interactions even in the absence of full charges.
 The (generalized) reactionfield correction is used. The mode is picked with keyword RFMODE. This involves a modified Coulomb sum and relies on the assumption that truncation can be dealt with by assuming that a low dielectric cutoff sphere is embedded in a high dielectric medium, which gives rise to a reactionfield correction, which lets the force on a charge vanish at the cutoff distance if the difference in dielectric constants is large. The high dielectric is set with keyword IMPDIEL, and the size of the cutoff sphere is given by ELCUTOFF. This method requires modified Coulomb interactions and support is limited similar to Ewald sums. Note that reactionfield corrections assume dielectric homogeneity, i.e., the underlying theory breaks down if the effective dielectric inside or outside the cutoff sphere might become inhomogeneous. The latter is always the case, if, for example, a large enough macromolecule is present or if the system is nonperiodic. Note that algorithmically this is not a longrange correction and that (G)RFcorrected terms are computed with the same frequency as short and midrange terms are (see NBCUTOFF and ELCUTOFF). Due to stability issues, twinrange cutoffs are not allowed for reactionfield methods. Even then, the force discontinuity at the cutoff distance (vanishes only if the dielectric is assumed to be infinite) may cause more noise than a simple truncation scheme (option 1). The reactionfield solution replaces the standard Coulomb term, i.e., it is relevant for all polar interactions even in the absence of full charges.
 The same option as 3) in LREL_MC. The same rules and caveats apply. Matching the methods this way and setting the two cutoff criteria equal to one another allows a consistent choice of Hamiltonian in hybrid runs (→ DYNAMICS). This option is currently the default choice.
 The same option as 1) in LREL_MC. The same rules and caveats apply. Matching the methods this way and setting the two cutoff criteria equal to one another allows a consistent choice of Hamiltonian in hybrid runs (→ DYNAMICS).
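For orientation, the reaction-field option above can be sketched with the standard textbook form of the (generalized) reaction-field pair potential. This is an illustrative sketch, not a transcription of CAMPARI's source; the function names and the Coulomb conversion constant (kcal·mol⁻¹·Å·e⁻²) are assumptions for the example:

```python
import math

COULOMB_K = 332.0636  # approximate Coulomb constant in kcal*mol^-1*Angstrom*e^-2


def rf_pair_energy(qi, qj, r, r_cut, eps_rf):
    """Textbook generalized reaction-field pair energy (illustrative).

    U(r) = K*qi*qj * (1/r + k_rf*r^2 - c_rf)  for r <= r_cut, else 0,
    with k_rf = (eps_rf - 1) / ((2*eps_rf + 1) * r_cut^3)
    and  c_rf = 1/r_cut + k_rf*r_cut^2 (shifts the energy to zero at the cutoff).
    """
    if r > r_cut:
        return 0.0
    k_rf = (eps_rf - 1.0) / ((2.0 * eps_rf + 1.0) * r_cut ** 3)
    c_rf = 1.0 / r_cut + k_rf * r_cut ** 2
    return COULOMB_K * qi * qj * (1.0 / r + k_rf * r ** 2 - c_rf)


def rf_pair_force_mag(qi, qj, r, r_cut, eps_rf):
    """Radial component of -dU/dr for the same pair term.

    As stated above, this approaches zero at r_cut when the outer
    dielectric eps_rf is taken to be very large.
    """
    if r > r_cut:
        return 0.0
    k_rf = (eps_rf - 1.0) / ((2.0 * eps_rf + 1.0) * r_cut ** 3)
    return COULOMB_K * qi * qj * (1.0 / r ** 2 - 2.0 * k_rf * r)
```

Evaluating `rf_pair_force_mag` at the cutoff with a very large `eps_rf` illustrates why the force discontinuity vanishes only in the infinite-dielectric limit.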
CUTOFFMODE
For determining spatial neighbors, different modes are available:
 1. If, for whatever reason, cutoffs are undesirable, the code will assume that all residues are spatial neighbors and compute all interactions at every step. Note that not all MD/LD/BD calculations may support this option since optimized loops relying on neighbor lists are often employed (and/or the method may rely on a cutoff in its formulation).
 2. This option is obsolete.
 3. This option instructs CAMPARI to employ grid-based cutoffs. The grid association is governed at the residue level by the position of the residues' reference atoms. All grid-based methods (with a uniform mesh) are difficult/inefficient for systems with very asymmetric density (such as a single very long extended chain in a large periodic box) since those systems would either require grids that are too large (inefficient and memory-consuming) or so coarse that no efficient prescreening can occur. Grid-based cutoffs are a good choice for systems with homogeneous density and many small (few atoms) residues. They are absolutely indispensable for simulations of large explicit water systems, as any other cutoff mode supported by CAMPARI will critically slow down simulations in such scenarios.
 4. The last available option instructs CAMPARI to employ topology-assisted cutoffs. Here, interatomic distances are simply prescreened by a master value for the two reference atoms of residue pairs. This takes advantage of molecular topology to simplify the generation of spatial neighbor lists since only residues which pass the prescreen are assumed to be spatial neighbors. Note that the program will compare the distance between the two reference atoms to the sum of the cutoff and the effective radii of the two residues in question. These radii are currently hardcoded. This mode is the method of choice for systems with heterogeneous density and/or large (many atoms) but relatively few (<1000) residues. Note that in the presence of nonbonded interactions, methods 3 and 4 reduce the scaling of CPU time with system size from N^{2} to something considerably faster.
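The residue-level prescreen used by the topology-assisted mode can be sketched as follows. This is an illustrative reimplementation of the stated criterion (reference-atom distance versus cutoff plus both effective residue radii), not CAMPARI's actual code, and the function name is hypothetical:

```python
import math


def residues_are_neighbors(ref_i, ref_j, radius_i, radius_j, cutoff):
    """Topology-assisted prescreen (illustrative sketch).

    A residue pair is treated as a pair of spatial neighbors if the
    distance between the two reference atoms does not exceed the cutoff
    plus the two effective residue radii; only pairs that pass this
    prescreen require atom-level distance evaluations.
    """
    d = math.dist(ref_i, ref_j)  # Euclidean distance (Python 3.8+)
    return d <= cutoff + radius_i + radius_j
```

For example, two residues whose reference atoms sit 15 Å apart still pass a 10 Å cutoff if their effective radii sum to at least 5 Å, which is why large residues with conservative radii inflate the neighbor lists.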
GRIDDIM
If grid-based cutoffs are in use (→ CUTOFFMODE), this keyword allows the user to specify the three integers determining the x,y,z dimensions of the rectangular cutoff grid. The origin and the size of the grid are determined by the box parameters (see BOUNDARY and SHAPE). The total number of grid points should not be so large that operations scaling linearly with this number become a contribution of significant computational cost. Setting the size of the grid cells equal to the cutoff is typically not an effective strategy due to the requirement of having large margins. The latter are a result of the residue-based grid association CAMPARI uses, which requires accounting for the effective residue radii in determining spatial neighbor relationships via the grid.
GRIDMAXRSNB
If grid-based cutoffs are in use (→ CUTOFFMODE), this keyword allows the user to specify an initial limit for the maximum number of residues associated with a single grid point. Arrays are dynamically resized during the simulation, but if the initial setup already fails, an error is returned (also see GRIDMAXGPNB). This keyword is required mostly so that CAMPARI has a realistic estimate of the required memory at the beginning.
GRIDMAXGPNB
If grid-based cutoffs are in use (→ CUTOFFMODE), static grid-point neighbor lists are set up initially and used to simplify the generation of neighbor lists using the grid. This keyword specifies the maximum number of grid-point neighbors each grid point may possess. If the number is too small, the program will fail during the initial setup. This is again to avoid inadvertent memory emergencies (as for GRIDMAXRSNB).
GRIDREPORT
If grid-based cutoffs are in use (→ CUTOFFMODE), this simple logical instructs CAMPARI to write out a summary of the initial grid occupation statistics.
CHECKFREQ
This keyword is interpreted differently depending on the type of calculation. Firstly, for an MC calculation, it specifies the interval at which to recompute the total energy assuming a lack of cutoffs (N^{2} sum). A simple reason for this is that incremental energies may accumulate drift errors even in the absence of any algorithmic simplifications. Depending on boundary conditions, the N^{2} sum may not be a particularly meaningful reference state. Note that the reset to the N^{2} energy has no implications for the Markov chain, but that it can affect absolute energy values, which may be relevant for certain free energy calculations, for comparisons of simulation results obtained with different cutoff lengths, etc. In addition, if cutoffs are turned on, a sanity check is performed as well, i.e., given the current structure, are the derived interactions in fact complete given the chosen maximum cutoff distance set by ELCUTOFF? If not, this would most likely mean that the parameters used for deriving the list of relevant interactions (specifically, the maximum residue radii) are inappropriate (this can happen for simulations of unsupported residues). Other than that, the usefulness of this check lies mostly in debugging the code itself. Because both the N^{2} energy evaluation and the cutoff check can be extremely slow for large systems, low frequencies are highly recommended for these cases. In order to track the progress of a longer simulation, it is recommended to rely on the instantaneous output files (e.g., ENERGY.dat) rather than on what is printed to log output. Secondly, for MD/LD/BD runs, CHECKFREQ simply sets the interval for how often to report global ensemble variables to log output. In hybrid runs, the functionality varies depending on what type of segment the simulation is currently in.
N2LOOP
This keyword is a simple logical which gives the user control over whether or not to initially compute the full N^{2} loop of nonbonded interactions (on by default). This keyword is extremely useful for simulations of very large systems, for which this number may take a considerable amount of time to compute and be largely uninformative (in particular in periodic systems). As an auxiliary function in Monte Carlo calculations, it determines whether the sanity check procedure for cutoffs is performed every CHECKFREQ steps. Since N2LOOP is turned on by default, it needs to be explicitly disabled for the cutoff checks to be skipped.
USESCREEN
This logical keyword applies to all Monte Carlo elementary moves (except particle deletion moves). The normal sequence of events in CAMPARI is:
 1. Perturb the configuration.
 2. Compute short-range terms for the moving parts for the new conformation.
 3. Compute the corresponding long-range terms.
 4. Restore the original conformation.
 5. Compute short-range terms for the moving parts for the original conformation.
 6. Compute the corresponding long-range terms.
 7. Evaluate the Metropolis criterion.
 8. Process acceptance or rejection.
From the above, it is clear that at step 2 we do not yet have access to a difference in energies (which is only available after step 5). Consequently, the screen threshold (→ BARRIER) is simply compared to the net value of the short-range energy terms (→ SC_IPP, SC_ATTLJ, SC_WCA, boundary interactions, SC_BONDED_B, SC_BONDED_A, SC_BONDED_I, SC_BONDED_T, SC_EXTRA) and certain bias terms (→ SC_ZSEC, SC_POLY, SC_DSSP, SC_EMICRO, SC_DREST). With the exception of SC_ATTLJ, SC_WCA, SC_BONDED_T, and SC_BONDED_I, these are all strictly penalty terms that can only yield positive contributions to the total energy. Because of the above, the screen is most useful if SC_IPP is used. Inverse power potentials diverge at small distances and can yield arbitrarily large values, which allow meaningful choices for the associated keyword BARRIER. If all aforementioned terms are either zero or negative, the screen will not have any effect. Harmonic potentials (as used in most of the bias terms) can also yield very large values, but the likelihood of this happening during simple MC moves is very small except for SC_DREST, SC_BONDED_B, and SC_BONDED_A (for the latter two terms, this only holds in the presence of soft crosslinks). Therefore, the difficult cases are those for which the penalty terms are generally high, but do not necessarily vary quickly or strongly upon MC moves. It may then become impossible to use a simplification of this type, i.e., if the chosen screen height is too small, the Markov chain will be corrupted, and if it is made larger, the screen no longer has any effect. To buffer against incorrect use of the method, there is an additional criterion that the incremental energy must exceed twice the total system energy (for typical interaction potentials and an equilibrated system, the latter is often a negative number, and this condition becomes trivially fulfilled).
Note that this technique assumes that the Markov chain remains unperturbed even though the actual acceptance criterion is circumvented. Depending on the setting for BARRIER, this will often be rigorously true for a finite-length simulation. Because the same threshold is used for all types of moves, the efficacy of the screen is likely move-type-dependent. Finally, simulations using the Wang-Landau acceptance criterion may not be able to use this technique (a warning is printed in any case).
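The early-rejection logic described above can be sketched in a few lines. This is a schematic of the documented behavior (screen plus the buffer criterion against the total system energy), not CAMPARI's actual control flow; in a real implementation, the full energy difference would never be computed when the screen fires:

```python
import math
import random


def mc_step_with_screen(e_short_new, e_total_old, delta_e, barrier, beta):
    """Illustrative MC acceptance with an energy screen (sketch only).

    e_short_new : net short-range/penalty energy of the proposed conformation
    e_total_old : current total system energy (buffer criterion reference)
    delta_e     : full energy difference (new - old), only needed if the
                  screen does not fire
    barrier     : screen height in the same energy units (-> BARRIER)
    Returns True if the move is accepted.
    """
    # Screen: reject immediately if the new short-range terms exceed the
    # barrier AND exceed twice the current total energy (the documented
    # buffer against incorrect use of the screen).
    if e_short_new > barrier and e_short_new > 2.0 * e_total_old:
        return False
    # Otherwise fall through to the standard Metropolis criterion.
    return delta_e <= 0.0 or random.random() < math.exp(-beta * delta_e)
```

For an equilibrated system with negative total energy, the second condition is trivially fulfilled, so the screen height alone decides whether the early rejection fires.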
BARRIER
This keyword is used in two different contexts. First, Monte Carlo moves can take advantage of a cutoff-like screen eliminating proposed conformations after only a partial evaluation of the relevant energy terms (this is enabled with USESCREEN). Then, BARRIER sets the energy threshold (screen height, cutoff value, barrier) in kcal/mol. Second, the value of BARRIER in kcal/mol is used as the hard-sphere penetration penalty in the hard-sphere excluded-volume implementation (enabled by setting IPPEXP to a sufficiently large value).
Parallel Settings (Replica exchange (RE) and MPI Averaging):
(back to top)
Preamble (this is not a keyword)
Most biomolecular simulation software packages allow a form of parallelization which one may refer to as domain decomposition. Here, the system is partitioned into a number of subsystems corresponding to the number of processor cores available to the parallel computation. Each core then, more or less, computes only interactions of its own subsystem. The main requirements for an efficient implementation are to keep the communication load as small as possible and the workload even (refer, for example, to publications on the parallel model within NAMD). This option is not yet supported within CAMPARI. The only parallel algorithms supported by CAMPARI are sparse communication algorithms such as replica exchange. Like most simulation software, CAMPARI uses the MPI standard for handling interprocessor communication. Here, each slave process only has direct access to its own memory image. With modern multi-core processors and multi-CPU machines, dedicated parallel architectures and corresponding software development (for example, GPU computing) have become ubiquitous in many fields. However, scalability and efficiency of parallelization (whether shared memory, e.g., thread-based, or MPI) remain challenging to achieve without sacrificing generality, e.g., many GPU adaptations of scientific code offer restricted functionality and limited control compared to their CPU counterparts.
REMC
This logical keyword, when set to 1, instructs CAMPARI to perform a replica exchange (RE) calculation. If and only if the code was compiled with MPI (and the right executable is used), this keyword activates the RE method (see FMCSC_REFILE to learn how to set up an RE run) employing REPLICAS separate conditions (processes). Irrespective of whether the base sampler is pure Monte Carlo (see DYNAMICS), a dynamics-based method, or any hybrid method, restrictions apply in that the sampled ensemble must be the canonical (NVT) one (see ENSEMBLE). This can either be achieved by running constant particle number MC, Newtonian dynamics with a proper thermostat (see TSTAT), or stochastic (Langevin) dynamics (which inherently tempers the ensemble). Performing RE swaps is optional (simply set REFREQ to something larger than the simulation length to disable them). While counterintuitive at first, this allows setting up parallel runs in which the system is evaluated for different Hamiltonians (so-called "foreign" energies: see REOLCALC), thereby allowing simple free energy calculations which use a well-established ("safe") sampling method as their engine without worrying about the intricacies of the RE method itself (reference). In CAMPARI, each replica and its output will correspond to instantaneous and averaged information from the associated condition, i.e., the underlying trajectory is no longer continuous. The typical assumption is that, depending on the settings for REFREQ, RESWAPS, RENBMODE, and RE_VELMODE, and given a suitable arrangement of replicas in the RE input file, it can be achieved that the resultant ensemble averages and distributions are, for finite samples, indistinguishable within error from a correct reference simulation for the same condition that does not utilize exchange moves.
This issue is not at all trivial, however, and the more general and precise approach to the analysis of replica exchange data is to reweight all samples to a given target condition that should either have been part of the original replica space or that can be obtained by interpolation (rather than extrapolation). This reweighting is technically possible in CAMPARI (→ FRAMESFILE) for almost all analysis features in trajectory analysis mode, but the weights have to be determined externally (e.g., by the weighted histogram analysis method, WHAM).
As an entirely separate issue, it may sometimes be desirable to perform trajectory analysis in parallel. One motivation can be to simply speed up analysis of large data sets and/or obtain data suitable for error estimates via block averaging. Parallel trajectory analyses are possible and require the RE setup, specifically keywords REFILE, REPLICAS, and REDIM. All other simulation-related keywords are ignored. Conversely, analysis keywords REOLCALC, REOLINST, and REOLALL are respected. This can be useful in postprocessing simulation data for free energy growth or related calculations requiring "foreign" energies. There is another complication with RE data, namely the question of how to evaluate a possible sampling benefit. Users should always keep in mind that an RE trajectory with swaps inherently averages over data from several coupled trajectories. A simple consequence of this is that data tend to look smoother and better converged if the number of replicas is increased. An assessment of the actual purpose of the method, i.e., increased barrier crossing rates by excursions into conditions amenable to barrier crossing, is more feasibly obtained by unscrambling trajectories, i.e., by looking at trajectories continuous in conformation (and not in condition). This is why CAMPARI allows the user to supply an input file with the swap history of a set of trajectories with the goal of transcribing the set of trajectories to a new set that are all continuous in conformation. The input file needs to be similar in format to the analogous output file created by CAMPARI during RE simulations. If this option is enabled, auxiliary keywords RE_TRAJSKIP and RE_TRAJOUT may become relevant.
Technically, parallel trajectory analysis requires that the REPLICAS individual trajectories are systematically named and numbered in a fashion similar to how CAMPARI writes trajectories in RE simulations. This means that every file is prefixed with "N_XXX_", where XXX gives the replica number (started from "000"). Since there is only a single key-file, the input trajectory name specified should not include this prefix (it will be added automatically). An example is given elsewhere. Frame-specific analyses (and thereby frame weights) are not yet supported in parallel trajectory analysis runs.
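The naming scheme just described can be reproduced with a one-line helper. This is an illustrative sketch (the function name is hypothetical); it simply applies the documented "N_XXX_" prefix with the replica number zero-padded to three digits, starting from "000":

```python
def replica_trajectory_names(basename, n_replicas):
    """Per-replica file names expected in parallel trajectory analysis:
    'N_XXX_' + basename, with XXX counting from 000."""
    return ["N_{:03d}_{}".format(i, basename) for i in range(n_replicas)]
```

For example, with a basename of "traj.xtc" and two replicas, the expected files are "N_000_traj.xtc" and "N_001_traj.xtc"; the key-file itself would name only "traj.xtc".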
Note that a replica exchange run that contains Monte Carlo moves and that uses the Wang-Landau acceptance criterion with WL_MODE set to 1 may result in identical copies of Wang-Landau runs if the exchanged parameters do not alter the Hamiltonian (since environmental conditions are irrelevant to the Wang-Landau sampler in such a case). In any case, the Wang-Landau iterations will proceed independently for each replica. This implies that it may yield results that are difficult to interpret if replica exchange swap moves are allowed (because those currently always follow a Boltzmann criterion).
REFREQ
One of the free parameters of the replica exchange method is the interval with which structure swaps are attempted between the different conditions (replicas). If replica exchange is in use (→ REMC), this keyword allows the user to specify said interval. Unlike frequencies supplied to define Monte Carlo move sets described above, this parameter is a deterministic interval, i.e., a setting of 10^{4} will imply that possible exchanges are attempted exactly every 10^{4} elementary steps. This is because, in general, the communication requirement will mandate that all replicas remain synchronized regardless. A swap cycle counts as a single (Monte Carlo) step in the trajectory. Viewed as a Monte Carlo move, such a swap attempt is defined in the context of a multicanonical ensemble. This means that any analysis should consider the entire set of simulation data and employ appropriate reweighting protocols to obtain canonical averages corresponding to the individual or even interpolated conditions. It is not immediately clear how justified it is to assume that the individual replicas in a replica exchange run can be analyzed as if they satisfied the canonical distribution for each condition individually. For a large fraction of published replica exchange simulations, swap attempts are restricted to the immediate neighbors along a one-dimensional temperature coordinate, and the data coming from replicas are treated independently. Keyword RENBMODE allows the user to choose between neighbor-only and global swap protocols. We emphasize again that CAMPARI does support the computation of reweighted averages and distributions by adding floating-point weights to a frames file.
It is difficult to provide guidelines for useful settings for this keyword. In replica exchange, very small values for this exchange attempt interval can lead to relaxation problems. With dynamics samplers, the treatment of velocities becomes an important consideration (see RE_VELMODE).
RESWAPS
If the replica exchange method is in use, this keyword specifies the number of swaps within a swap cycle. Each time a step is encountered that is a multiple of REFREQ, CAMPARI will collect the data from all replicas, construct the required energy matrix, and randomly pick pairs of eligible replicas (see RENBMODE) for which the swap move Boltzmann acceptance criterion is evaluated. This process is repeated RESWAPS times, and the map matrix (structure to condition) is updated after every successful swap. This means that it is possible for no pairs of replicas to effectively swap structures despite the presence of accepted moves. This stochastic implementation differs from that seen in other software and requires a careful choice for this keyword. For exchanges between all replicas (see RENBMODE), this should probably be at least N_{rep}·(N_{rep}-1)/2, where N_{rep} is the number of replicas in the simulation. For neighbor swaps only, it should be N_{rep}-1. The reason for choosing a number proportional to or larger than the number of unique possible exchanges is that the computational cost of computing the necessary cross-energies (in Hamiltonian replica exchange) and of communicating the information required for the aforementioned matrix is, in our implementation, independent of the final number of accepted swaps. This means that the cost of a swap cycle would be largely wasted by exchanging just a single pair chosen from a much larger number of replicas. For neighbor swaps, the set of possible swaps is limited because the required energy matrix is only a tridiagonal band matrix. This means that "secondary" swaps may be rejected due to lack of information rather than the Boltzmann criterion, which can introduce biases. Note that the acceptance rates become very small once there is hardly any overlap between different replicas (in turn, the acceptance is always strictly unity if the conditions are the same, regardless of the two structures).
A large number of attempted swaps in conjunction with all-against-all exchange corresponds to an equilibration of current structures across conditions. In the limit of tiny acceptance rates, the impact of the replica exchange method is no longer felt, and it reduces to a set of independent canonical simulations at different conditions (the same limit is achieved explicitly by setting REFREQ to be very large). Because of this, a reasonable swap acceptance rate is often taken as the primary diagnostic tool for the choice of conditions (see output file for swap probabilities).
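The Boltzmann criterion for a single swap attempt between two temperature replicas can be written out explicitly. This is the standard textbook form of the exchange probability, shown here as a sketch rather than a transcription of CAMPARI's source:

```python
import math
import random


def swap_accepted(beta_i, beta_j, e_i, e_j):
    """Metropolis criterion for exchanging the configurations of two
    temperature replicas: accept with probability
    min(1, exp[(beta_i - beta_j) * (E_i - E_j)]).

    If the two conditions are identical (beta_i == beta_j), the exponent
    is zero and the swap is always accepted, consistent with the remark
    above that acceptance is strictly unity for identical conditions.
    """
    delta = (beta_i - beta_j) * (e_i - e_j)
    return delta >= 0.0 or random.random() < math.exp(delta)
```

The acceptance probability decays with the energetic gap between the replicas, which is why poor overlap between distant conditions leads to vanishing swap rates.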
REFILE
This keyword defines location and name of the file containing the specifications for the RE method (see elsewhere for details).
RENBMODE
As alluded to above, the replica exchange method represents a rigorous sampling technique if one considers the multicanonical ensemble it defines. This can cause problems in the interpretation of data obtained for an individual condition. Moreover, the energetic overlap between distant conditions is often small, leading to negligible swap likelihood for all but the replicas most similar in condition. This is the typical scenario for temperature replica exchange calculations in explicit solvent. Here, it is very common to restrict swap attempts to the (at most) two neighboring replicas for a series of conditions. In Hamiltonian replica exchange, the same idea might actually be more useful as it also restricts the computation of the energy matrix to neighboring conditions. Recomputing energy values for many different conditions can be costly. Therefore, the available options are:
 1. Swaps are attempted with all available replicas.
 2. Only the (at most two) neighboring replicas are eligible for swap moves, and neighbor relationships are determined by the sequence of conditions as they appear in the input file (this is the default).
Note that almost all exchange-related problems naturally disappear in the limit of few attempted swaps (→ REFREQ) or in the limit of poor overlap and consequently few accepted swaps. This limit is very easily reached for large, condensed-phase systems with typical interaction potentials (fluctuations decrease with increasing size).
REPLICAS
This keyword sets the number of subprocesses intended to be created by a multi-copy simulation. For replica exchange calculations, this has to rigorously correspond to the number of processes granted by the system. A large enough number of different conditions has to be present in the corresponding input file (→ FMCSC_REFILE). For MPI averaging calculations, this will be altered to match the actual processor number granted by the system.
REDIM
If the replica exchange method is in use (→ REMC), this keyword sets the number of dimensions specifying the conditions to be expected in the dedicated input file (→ FMCSC_REFILE). Note that replica exchange calculations may rely on neighbor relations (see RENBMODE), and that those may be difficult to define if multiple dimensions are used to specify each condition.
REMC_DOXYZ
If the replica exchange method is in use (→ REMC), this simple logical keyword lets the user choose to use Cartesian rather than torsional/rigid-body coordinates to be exchanged in replica exchange Monte Carlo calculations. It is ignored for REMD. This can be necessary if internal degrees of freedom not sampled by MC diverge in any node-specific input files (for example, through certain rare restart scenarios).
RE_VELMODE
This keyword selects how to deal with velocities in replica exchange molecular dynamics (REMD) runs in the NVT ensemble (see ENSEMBLE and DYNAMICS) given that the replica exchange method is in use (→ REMC). One of the complications of REMD calculations arises from the necessity to pass on or reassign velocities upon any successful structure swap. The options for handling this difficulty are as follows:
 1. All velocities are always randomly reassigned upon receiving a new structure. This is equivalent to an instantaneous, global action of an Andersen-type thermostat (see TSTAT). It might be the safest option to use for pure Hamiltonian replica exchange, especially if the Andersen thermostat is used in conjunction with Newtonian dynamics.
 2. Velocities are rescaled by a factor equivalent to (T_{i}/T_{j})^{1/2}, where T_{i} is the temperature of the current node, and T_{j} the temperature of the node the swapped-in structure originated from. Note that this does not scale the instantaneous temperature to a specific value, but rather by a specific factor. Unlike the first option, it preserves directions and relative magnitudes of all velocities. This mode relaxes to the third option if temperature is not one of the replica exchange dimensions.
 3. Velocities are taken directly from the node the incoming structure originated from, i.e., they always remain associated with "their" structure. This will almost certainly lead to small artifacts when temperature is part of the replica exchange dimensions.
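The rescaling option can be sketched in a few lines. This is an illustrative implementation of the (T_{i}/T_{j})^{1/2} factor described above (the function name is hypothetical), not CAMPARI's source:

```python
import math


def rescale_velocities(velocities, t_current, t_origin):
    """Scale all incoming velocities by sqrt(T_i/T_j), where T_i is the
    temperature of the current node and T_j that of the origin node.
    Directions and relative magnitudes are preserved; with equal
    temperatures the factor is 1 and velocities pass through unchanged."""
    f = math.sqrt(t_current / t_origin)
    return [(vx * f, vy * f, vz * f) for (vx, vy, vz) in velocities]
```

Because every component is multiplied by the same factor, the kinetic energy scales exactly by T_{i}/T_{j}, which is the sense in which the instantaneous temperature is scaled by a specific factor rather than to a specific value.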
RETRACE
This keyword is only relevant in MPI replica exchange calculations with swaps performed. It requests that an instantaneous integer trace is written indicating which condition is (after each swap cycle) associated with which initial starting conformation (see N_000_REXTRACE.dat elsewhere). These data can be used to reconstruct a trajectory continuous in geometric variables rather than continuous in RE condition (the latter being the CAMPARI default). This is useful for estimating the sampling enhancement provided by replica exchange in terms of conformational decorrelation or similar metrics.
MPIAVG
This logical keyword, when set to 1, instructs CAMPARI to perform a multi-replica calculation. If and only if the code was compiled with MPI (and the right executable is used), this keyword activates the MPI averaging method. This means that the chosen system is simply replicated REPLICAS times onto as many processing units (typically processor cores). For most simulations, the individual copies are strictly independent (no communication requirement) until the very end, when on-the-fly analysis data are automatically collected and processed by the server node (see OUTPUTFILES for details). Some analysis functions or simulation algorithms may not be supported. This is primarily a mode to save time for the user since it essentially is a multiple non-communicating replica technique (MRMC/D).
If, however, the simulation is a pure Monte Carlo simulation, and if the Wang-Landau acceptance criterion is used, the behavior changes. Wang-Landau runs are essentially iterative, and in such a case the MPI averaging functionality will create a parallel version of the Wang-Landau scheme. At an interval set by WL_FLATCHECK, the histograms are recombined over the individual nodes. The combined histogram is then what determines the move acceptance, and what is used to evaluate whether to update the convergence parameter or not. The value of the convergence parameter and all other relevant settings remain synchronized throughout. In between update steps, the individual replicas evolve according to the last global histogram, which has since been incremented locally. This means that the value chosen for WL_FLATCHECK is a delicate quantity since both too small and too large values may impede convergence. While the former may remove the bias for an individual replica to traverse phase space faster than a canonical simulation, the latter may result in several replicas exploring the same area of phase space, thereby amplifying a lack of global convergence. Note that the communication routines used in the parallel Wang-Landau implementation can be fine-tuned using keywords MPICOLLS and MPIGRANULESZ.
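The recombination step described above can be sketched as follows. This is a schematic of the general idea (summing each replica's local increments into the shared global histogram and testing flatness), not CAMPARI's implementation; the 0.8 flatness tolerance is a common textbook choice and purely an assumption here:

```python
def merge_wl_histograms(global_hist, local_increments):
    """Parallel Wang-Landau recombination sketch: sum the histogram
    increments each replica accumulated since the last check into the
    shared global histogram, which then drives move acceptance and the
    flatness test on every replica."""
    merged = list(global_hist)
    for inc in local_increments:
        for b, count in enumerate(inc):
            merged[b] += count
    return merged


def is_flat(hist, tol=0.8):
    """Common flatness criterion (assumed here): every bin count lies
    within a tolerance fraction of the mean count."""
    mean = sum(hist) / len(hist)
    return mean > 0 and min(hist) >= tol * mean
```

A flat combined histogram would then trigger the update of the convergence parameter, after which all replicas continue from the same synchronized state.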
MPIAVG_XYZ
If the MPI averaging technique is in use (→ MPIAVG), this simple logical keyword lets the user choose to obtain trajectory data for each of the independent, identical replicas separately (which is also the default). If this keyword is explicitly set to zero (logical false), only a single trajectory file will be written, with entries cycling not only through the time or equivalent axis but also through replica space (see elsewhere for details). The choice here is mostly a matter of convenience for postprocessing, but note that with individual trajectories, REPLICAS times as much structural data are written as with a single file. Lastly, note that very frequent write operations by different processes to a shared output file may occasionally cause race conditions and/or be inefficient due to long waiting times.
MPICOLLS
This keyword acts as a simple logical (turned off by default) that allows the user to enable the usage of collective communication routines defined by the MPI standard for selected communication operations in CAMPARI (routines such as MPI_ALLREDUCE, MPI_BCAST, etc.). These routines should at all times be functionally equivalent to what CAMPARI would use otherwise, i.e., collective primitives constructed exclusively from blocking send and receive operations (MPI_SEND and MPI_RECV). The reason for having such a keyword is twofold. First, buggy code in conjunction with these MPI-defined collective communication routines can be difficult to diagnose and debug, because the MPI standard requires an outcome, but not a specific implementation. Essentially, developers and users cannot make any assumptions about the underlying communication flow. In general, this is of course desired (especially from a performance point of view), since it leaves the optimization of said communication to the MPI library rather than forcing the calling program to address these issues. Second, there are enough reports on the web of potentially faulty implementations of these routines in common MPI libraries. In conjunction with additional concerns regarding thread safety, etc., it could prove advantageous to developers to have modifiable implementations in place.
MPIGRANULESZ
If custom CAMPARI routines for collective communications are in use (→ MPICOLLS), and if a calculation is performed that relies on such collective communication operations, this keyword lets the user alter the communication flow structure CAMPARI sets up to handle these cases. The keyword specifies a number of processes amongst which communication is presumed fast (most often the number of CPU cores on a single board). The communication flow is then set up in a way that minimizes the required communication between such blocks of processes (they are generally assumed to be in sequence and to all be of identical size). This keyword is therefore unlikely to be useful for heterogeneous allocations (different numbers of cores granted on different machines or processes distributed nonsequentially). Between blocks, communication attempts to minimize latency (tree topology), whereas within blocks communication is (currently) strictly hierarchical and sequential with a single head process for each block. This means that (currently) setting MPIGRANULESZ to the number of processes granted by MPI will generate a global hierarchical flow with a single master, whereas setting it to 1 will generate a global tree-like flow.
NRTHREADS
CAMPARI can be linked against a common high-level thread library, OpenMP. This is currently a stalled development and not documented in INSTALL. After successful linking, this keyword sets the maximum number of concurrent threads to be utilized. At present, only MC energy calculations can make use of a threaded environment somewhat effectively, and only for certain systems. Users should not pursue this option for anything but testing and development for the time being.
Output and Analysis:
(back to top)
Preamble (this is not a keyword)
Unlike most other simulation software, CAMPARI offers to analyze certain quantities while the simulation is being performed ("on-the-fly"). This has the advantage that the frequency of dumping raw trajectory data to disk does not have to control the frequency of analyses. This can save time and money by circumventing expensive write operations to disk. Of course, in a typical simulation setting, the user will still want to obtain trajectory data: for visualization, for not-yet-defined analyses, and so on. However, the built-in analyses can still prove beneficial by utilizing as much data as possible. This is generally controlled by several interval settings: analysis X should be performed or instantaneous data Y should be reported every N steps. Such keywords (see for example ANGCALC) are interpreted the same way unless otherwise noted. For example, if ANGCALC is 250 and NRSTEPS is 1000, the analysis would be performed at steps #250, 500, 750, and 1000. There is only one other keyword affecting this: the number of equilibration steps. If in the above example EQUIL is 400, the analysis would only be performed at steps #500, 750, and 1000 (i.e., the count is always relative to the 0^{th} step). Note that some analyses can be costly. Their scaling with system size will usually be stated. At the very end, the log output will typically report the fraction of CPU time spent performing analysis routines. This may help assess whether some of the frequency settings should be reduced.
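The interval arithmetic described above can be summarized in a short sketch (`analysis_steps` is a hypothetical helper for illustration, not part of CAMPARI):

```python
def analysis_steps(interval, nrsteps, equil):
    """Steps at which an interval-based analysis (e.g., ANGCALC) fires.

    Eligible steps are the multiples of the interval up to NRSTEPS;
    the count is always relative to step 0, and steps falling within
    the EQUIL equilibration phase are skipped.
    """
    return [step for step in range(interval, nrsteps + 1, interval)
            if step > equil]

# Example from the text: ANGCALC=250, NRSTEPS=1000 -> steps 250..1000;
# with EQUIL=400 the first eligible step (250) is dropped.
```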
Lastly, CAMPARI often groups statistics together. For example, for a melt of identical polymers, CAMPARI would by default compute only a single histogram of end-to-end distances. This grouping is at times undesired and can be overcome via the concept of analysis groups.
RSTOUT
This keyword sets the interval specifying how often to write out a restart file. Such a file allows continuing both crashed and normally terminated runs without losing significant accuracy due to truncation of significant digits (as happens in pdb-files). Note that restarts are not bitwise perfect, however. The concept is described elsewhere. Restart output alternates between two files that continuously replace themselves, such that even if a crash occurs during a write operation, at least one sane restart file should exist. These files are generally named {basename}_1(2).rst. Settings for EQUIL are (of course) irrelevant for this output.
ANGRPFILE
This keyword sets path and name of the input file for determining analysis groups by custom request rather than by molecule type. By default, CAMPARI will often combine collected analysis data for molecules of identical type. This is not always the desired behavior. For example, CAMPARI fails to recognize differences introduced to molecules of the same type by virtue of molecule-specific constraints or biasing potentials. Analysis groups alleviate this and similar problems by allowing the user to group molecules of identical type into arbitrary analysis groups. Note that it is never possible to combine data for molecules of chemically different type or to split a single molecule into multiple groups (although the latter may be implemented in the future). Systems employing chemical crosslinks (please refer to sequence input for details) pose a special case: here, intermolecular crosslinks do not conjoin two molecules in terms of data structures and analysis, i.e., it will for example (currently) not be possible to obtain the net radius of gyration of two crosslinked polypeptide chains. Instead, both chains will be analyzed and treated as if they were separate molecules.
ENOUT
This keyword defines the interval specifying how often current potential energy data are written to a file called ENERGY.dat. Note that the total energy is decomposed into the individual terms controllable by keywords of the type SC_XYZ (for example SC_IPP). It is presently not possible to obtain energy decompositions based on subcomponents of the system. Settings for EQUIL are ignored for this output.
ENSOUT
With this keyword, the user sets the interval specifying how often to write current ensemble data to a file called ENSEMBLE.dat. This is only relevant if DYNAMICS is not set to 1 or 6 (pure Monte Carlo sampling or minimization). The reported quantities are informative ensemble variables (limited output presently) including, most prominently, potential and kinetic energies. Settings for EQUIL are ignored for this output.
ACCOUT
If pure Monte Carlo or hybrid sampling is used (→ DYNAMICS), this keyword sets the interval specifying how often to report cumulative acceptance data to a file called ACCEPTANCE.dat. Note that these data are only mildly informative in that they do not directly allow the computation of acceptance rates. They are mostly useful in analyzing a running simulation and assessing the performance of the move set. CAMPARI will report acceptance statistics as well as residue- and molecule-resolved acceptance counts at the very end of the simulation to log output. The data in ACCEPTANCE.dat are only resolved by move type. Settings for EQUIL are ignored for this output.
TOROUT
This keyword lets the user decide how often to write sets of internal coordinate space degrees of freedom to a file FYC.dat in a one-structure-per-line format. These files can easily become large because the number of degrees of freedom generally scales linearly with system size. There are two options, selected by using a positive (mode 1) or negative integer (mode 2) for TOROUT: Native CAMPARI degrees of freedom are written with a header providing residue-level information. These generally correspond to the unconstrained degrees of freedom in Monte Carlo or torsional dynamics calculations (see sequence input for details). All but rigid-body coordinates are written to FYC.dat, and much more information is provided there. Because rigid-body coordinates are missing, the information in the file is never enough to completely reconstruct the system, even when assuming the default covalent geometries.
 Sampled dihedral angle degrees of freedom are written with a header that provides atomic indices corresponding to the various Z-matrix lines describing these dihedral angles. This mode excludes degrees of freedom that are actually frozen, and can include degrees of freedom that are not native to CAMPARI. All values are again written to FYC.dat, and more details are provided there. This mode never includes bond angles and/or dihedral angles that have no explicit Z-matrix entry.
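For cross-checking dihedral angle values written to FYC.dat against Cartesian trajectory data, the angle defined by a quadruplet of atoms (such as a Z-matrix line) can be recomputed with the standard atan2 formulation. This is a generic sketch, not CAMPARI code:

```python
import math

def dihedral(p0, p1, p2, p3):
    """Dihedral angle (degrees) defined by four points, using the
    textbook atan2 formulation over the two bond-plane normals."""
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])
    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)      # normals of the two planes
    nb2 = math.sqrt(dot(b2, b2))
    m1 = cross(n1, tuple(c / nb2 for c in b2))
    return math.degrees(math.atan2(dot(m1, n2), dot(n1, n2)))
```

A trans (anti) arrangement of four atoms yields ±180°, a cis (eclipsed) one 0°.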
XYZOUT
This very important keyword sets the frequency with which snapshots containing (at least) the Cartesian coordinates of the system (or a selected subsystem) are written to a new file or appended to an existing trajectory file (→ OUTPUTFILES and XYZPDB). Part of the filename(s) will be determined by keyword BASENAME. This is the fundamental saving frequency for obtaining trajectory data and should be chosen carefully whenever the proposed simulation is resource-intensive.
XYZPDB
If structural output is requested (→ XYZOUT), this keyword chooses the output file format (see OUTPUTFILES). It is an integer [1-3(4,5)] interpreted as: Tinker-style .arc-files (ASCII)
 ASCII .pdb-files (default option) in various output conventions (see PDB_W_CONV)
 CHARMM-style binary .dcd-files (these include the box information for each snapshot and have a CHARMM-style header; note that the header is written only once by CAMPARI and contains the number of snapshots in the file, which may not always be correct if simulations are prematurely terminated or trajectory files appended)
 Compressed binary .xtc-files as used in GROMACS: note that this option is only available if the program is linked against a proper version of XDR (see INSTALL)
 Compressed binary .nc-files as defined by the NetCDF format in AMBER convention: note that this option is only available if the program is linked against a proper NetCDF library (see INSTALL).
XYZMODE
If structural output is requested (→ XYZOUT), this integer [1-2] keyword determines whether to write to a series of numbered files (1) or a single file (2, the default). This, however, currently works for pdb only (specifically: .arc-files are always written as multiple files, and the binary formats always write to (append) a single file).
XTCPREC
If structural output is requested (→ XYZOUT) and the chosen output format is the binary .xtc-format (option 4 for XYZPDB), this keyword can be used to specify the multiplicative factor determining the accuracy of compressed xtc-trajectories (the minimum is 100.0). It is also required for proper reading of xtc-trajectories in xtc-analysis mode (see PDBANALYZE and XTCFILE).
PDB_NUCMODE
CAMPARI's internal representation of polynucleotides has one peculiarity. It assigns the entire PO_{4}^{−} functional group to the same nucleotide residue, whereas most other programs seem to assign the 3'-oxygen atom to the residue carrying the sugar. This causes a nontrivial inconsistency when trying to use CAMPARI-generated pdb-files as input for other software. Therefore, this keyword defines how to assign the O3*-atom of nucleic acids in pdb-output only. There are two options: The O3*-atom is assigned to the residue carrying the 5'-phosphate it is part of, i.e., it is the very first atom in that residue. This is the CAMPARI-inherent convention and reflects the authentic structure of arrays in CAMPARI (which is relevant for any analysis requiring atom numbers, see for example PCCODEFILE in INPUTFILES).
 The O3*-atom is assigned to the residue carrying the sugar it is part of; this is the PDB-typical convention. Note that this inherently disrupts the 1:1 correspondence between the numbering in the pdb-file and how nucleic acids are represented internally. It is recommended if CAMPARI output is meant to be compatible with other software working in this latter convention.
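The renumbering between the two conventions amounts to shifting every O3* atom back by one residue; a toy sketch (the pair-list representation and the helper name are illustrative only, and a chain's 5'-terminal residue would need special-casing):

```python
def to_pdb_convention(atoms):
    """Map (atom_name, residue_number) pairs from the CAMPARI-inherent
    convention (O3* opens the residue carrying the 5'-phosphate) to the
    PDB-typical one (O3* belongs to the preceding sugar residue).
    Atom order is left untouched; only residue numbers change."""
    return [(name, res - 1) if name == "O3*" else (name, res)
            for name, res in atoms]
```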
PDB_W_CONV
CAMPARI can in general process different atom and residue naming conventions when formatting PDB files. This keyword selects the convention for written files. Choices are: CAMPARI format
 GROMACS format (atom naming, nucleotide and cap residue names, ...)
 CHARMM format (atom naming, cap residue names and numbering (patches), ...): Note that there are two exceptions pertaining to C-terminal cap residues (NME and NH2), which arise due to non-unique naming in CHARMM: 1) NH2 atoms are called NT2 (instead of NT) and HT21, HT22 (instead of HT1, HT2), and 2) NME methyl hydrogens are called HAT1, HAT2, HAT3 (instead of HT1, HT2, HT3).
 AMBER format (atom naming, nucleotide residue names, ...)
XYZ_SOLVENT
If structural output is requested (→ XYZOUT), this logical keyword allows the user to suppress trajectory output for molecules labeled as solvent. This can be useful to down-convert trajectory files from explicit solvent runs or, more generally, to isolate certain parts of the system from existing trajectory data (employing PDBANALYZE and ANGRPFILE). It may also be used to save space during actual simulations, but it should be kept in mind that information about the solvent may be lost irrevocably and that the resultant trajectories may no longer be straightforward to analyze.
TRAJIDXFILE
Usage of keyword XYZ_SOLVENT in conjunction with the concept of analysis groups allows the user some amount of fine control over what is written to the trajectory file. In some scenarios this may not be enough (for example, if external scripts or software or even CAMPARI itself are meant to analyze nontrivial subsets of the system). Then, the user has the option to supply a simple index file providing per-atom control over what coordinate information is written to the trajectory file. Note that this will be useful for subsequent trajectory analysis runs only if the selected subset preserves the integrity of all molecules that remain in the output, or if the output format is pdb such that missing atoms can be rebuilt. For example, consider a block copolymer consisting of two blocks. The full trajectory could be reanalyzed using an index file to yield a reduced trajectory in pdb-format (keywords XYZOUT, XYZPDB, and XYZMODE) that contains only one of the two blocks. With a properly adjusted sequence input file, it may then be possible to perform intrinsic CAMPARI analyses over the isolated block, which really was part of a larger molecule. In this process, almost certainly some terminal atoms would have to be rebuilt at the break point (but those may not influence the analyses). For a description of the input file format, see here. Note that all other output selection settings are ignored if an index set is used via this keyword.
XYZ_FORCEBOX
If a system is simulated or analyzed that utilizes periodic boundary conditions, this keyword can be used to alter the standard CAMPARI way of placing atoms with respect to the unit cell. By default, CAMPARI will never break up molecules in trajectory output, which implies that the absolute coordinates in the trajectory file(s) can extend significantly beyond the formal boundary of the unit cell. Sometimes (for example, for visualization or for certain analyses), it may be desired to instead have all atoms inside the unit cell, and this is what this keyword accomplishes. It currently works as a simple logical, and setting it to 1 will make sure that in all trajectory and other structural output all atoms selected for output are indeed inside the formal unit cell. Naturally, this will generally split molecules into two or more parts, which may interfere with molecular representations relying on bonds, etc. Note that trajectory files created in such a manner are currently not understood by CAMPARI when trying to read them back in. It is therefore recommended to utilize this feature only to transform preexisting trajectories (via trajectory analysis mode) rather than during the actual simulations.
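For a rectangular box, the wrapping this keyword performs amounts to reducing each coordinate modulo the box length. A minimal sketch, assuming a rectangular cell anchored at the origin (CAMPARI also supports other boundary shapes):

```python
import math

def wrap_into_cell(coords, box):
    """Wrap absolute coordinates into the unit cell [0, L) along each
    axis of a rectangular periodic box. Applied per atom, so molecules
    spanning a boundary get split, as noted for XYZ_FORCEBOX."""
    return [tuple(x - math.floor(x / length) * length
                  for x, length in zip(atom, box))
            for atom in coords]
```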
ALIGNCALC
In trajectory analysis runs, CAMPARI offers the option to structurally superpose the current Cartesian coordinates onto a suitable reference. Note that this functionality is conveniently available through almost all molecular visualization software packages. CAMPARI provides automatically generated visualization scripts designed to work with VMD. If these options are unavailable or inconvenient, this keyword lets the user set the interval specifying how often CAMPARI should perform structural alignment. For example, to create, from an original trajectory, a superposed trajectory of every 10th frame, XYZOUT would have to be 10 and ALIGNCALC would have to be 10 or a factor of 10 (5, 2, 1). Alignment happens before any of the analysis routines are called and works by first defining a reference set of atom indices (→ ALIGNFILE). Using a quaternion-based algorithm, an optimal translation and rotation are determined that minimize, when applied to the current coordinates, the deviation between the transformed current coordinates and the reference coordinates (i.e., a set of coordinates for all atoms in the reference set). Note that this procedure will always preserve the internal state of molecules and, except for certain cases in periodic boundary conditions, the relative arrangement of molecules. It will not, however, preserve the relative position of the system boundary. This may lead to artifacts in energetic analyses of aligned trajectories or any analyses that rely upon relative, intermolecular coordinates.
There are two ways of defining the reference coordinate set. The first is via an external file. Here, CAMPARI reuses the pdb-template functionality. If keyword PDB_TEMPLATE is specified and successfully read, the reference coordinate set is extracted from this file using the reference set defined via ALIGNFILE. Note that the template may serve a double purpose in this scenario, as it may still provide the atom numbering map needed to read binary trajectory formats with non-CAMPARI atom order. If no template is specified, the reference set will be given by the previously aligned structure. This successive alignment therefore uses a different reference coordinate set each time and will consequently lead to drift.
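CAMPARI determines the optimal rotation with a quaternion-based algorithm in three dimensions. For intuition, the two-dimensional analogue of this least-squares superposition has a simple closed form and is sketched below (illustrative helpers, not CAMPARI code):

```python
import math

def centroid(pts):
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

def optimal_rotation_2d(ref, mob):
    """Rotation angle that, applied to the centered mobile set,
    minimizes the RMSD to the centered reference set (2D Procrustes)."""
    rcx, rcy = centroid(ref)
    mcx, mcy = centroid(mob)
    num = den = 0.0
    for (rx, ry), (mx, my) in zip(ref, mob):
        rx, ry, mx, my = rx - rcx, ry - rcy, mx - mcx, my - mcy
        num += ry * mx - rx * my
        den += rx * mx + ry * my
    return math.atan2(num, den)

def align_and_rmsd(ref, mob):
    """Superpose mob onto ref (optimal translation plus rotation)
    and return the residual RMSD."""
    theta = optimal_rotation_2d(ref, mob)
    rcx, rcy = centroid(ref)
    mcx, mcy = centroid(mob)
    c, s = math.cos(theta), math.sin(theta)
    sq = 0.0
    for (rx, ry), (mx, my) in zip(ref, mob):
        mx, my = mx - mcx, my - mcy
        ax, ay = c * mx - s * my + rcx, s * mx + c * my + rcy
        sq += (ax - rx) ** 2 + (ay - ry) ** 2
    return math.sqrt(sq / len(ref))
```

A mobile set that is a rotated copy of the reference aligns back to (numerically) zero RMSD, mirroring what the 3D quaternion algorithm achieves per reference set.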
ALIGNFILE
If system alignment is possible and requested (→ ALIGNCALC), this keyword allows the user to supply the path and name of a mandatory input file containing an atomic index list defining the reference set to be used for the alignment algorithm. For example, in the simulation of a macromolecule with cosolutes, it will not be meaningful to use the entire set of atoms in the system as the alignment set, since the randomly dispersed cosolutes will dominate the alignment. Instead, one will typically want to supply only non-symmetric protein atoms here. This keyword serves a second purpose, viz., if structural clustering is requested, and if an RMSD distance criterion with differing alignment and distance atom index sets is desired, this keyword lets the user specify the input file with the alignment set. Simultaneous use of both functionalities is permitted. Lastly, note that any set used for alignment must consist of at least three atoms.
POLOUT
This keyword sets the interval specifying how often to compute and write current system-wide polymeric variables (→ POLYMER.dat). This instantaneous output can be useful to easily monitor structural changes (such as dimerization events) in dilute systems with heterogeneous density. It is completely uninformative for systems with homogeneous density. For simulations of a single polymer chain, distributions of polymeric order parameters as well as correlation functions can be computed from the output in POLYMER.dat.
POLCALC
This keyword lets the user specify the frequency with which values for polymeric properties incurring low computational cost are computed. These data are collected and reported resolved by analysis group and include characteristic values for shape and size, histograms of end-to-end distances, etc. If this keyword is set such that polymeric analyses are performed, several output files are generated (→ POLYAVG.dat, RGHIST.dat, RETEHIST.dat, and RDHIST.dat). Furthermore, POLCALC controls the interval for data collection to obtain averages of the suitably defined angular correlation function along the polymer backbone, which may be related to the intrinsic stiffness or persistence length of the polymer (→ PERSISTENCE.dat and TURNS_RES.dat). Lastly, this keyword controls the frequency for the computation and averaging of molecular, radial density profiles, i.e., the mass distribution function along the radial coordinate originating from each molecule's center of mass, considering only atoms belonging to that molecule (→ DENSPROF.dat). This quantity is used in Lifshitz-type polymer theories.
RHCALC
Since the computation of comprehensive polymer-internal distances is more expensive, this dedicated keyword controls the data collection interval for analyses relying on such data. A comprehensive set of internal distances in CAMPARI is used to compute three quantities: An alternative estimate of the polymer's spatial size, which is sometimes related to the hydrodynamic radius (→ corresponding entry in POLYAVG.dat; note that should RHCALC be set such that no analysis is performed, but POLCALC be chosen such that the other quantities in POLYAVG.dat are computed and provided, the corresponding column must be ignored).
 A scaling profile of the internal distances with distance of separation in primary sequence (→ INTSCAL.dat).
 The scattering (Kratky) profile of the polymer (→ KRATKY.dat; this relies on the additional frequency setting SCATTERCALC).
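The internal-distance scaling reported in INTSCAL.dat is, in essence, the mean spatial distance as a function of sequence separation |i−j|. A generic sketch over a list of per-residue reference coordinates (actual CAMPARI binning and averaging may differ):

```python
import math

def scaling_profile(coords):
    """Mean spatial distance as a function of sequence separation
    |i-j| for a single chain, given one 3D point per residue."""
    n = len(coords)
    prof = {}
    for i in range(n):
        for j in range(i + 1, n):
            prof.setdefault(j - i, []).append(math.dist(coords[i], coords[j]))
    return {sep: sum(d) / len(d) for sep, d in prof.items()}
```

For an ideal rigid rod the profile grows linearly with |i−j|; for real chains the apparent scaling exponent distinguishes compact from expanded conformations.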
SCATTERCALC
As alluded to above, this keyword sets an auxiliary frequency for the calculation of scattering properties resolved by analysis group (→ KRATKY.dat). This requires computing Fourier transforms of internal distances for a series of wave vectors and is consequently a very expensive calculation. Due to the coupling to the computation of internal distances (see RHCALC), this keyword is not interpreted like the other interval keywords (???CALC). Instead, SCATTERCALC sets the calculation interval amongst only those steps chosen already via RHCALC. For example, if RHCALC is 10 and SCATTERCALC is 20, then scattering data will be accumulated every 200 steps. The data in KRATKY.dat can be used to compare simulation data directly to experiment. In a double-logarithmic plot, it may also be possible to identify linear regimes ("power law regime", in contrast to the "Guinier regime" for smaller wave vectors), which can be fit to yield the scaling exponent for fractal objects. Conversely, for globular polymers, Porod's law may hold.
SCATTERRES
Since the required number of points and range of wave vectors for the prediction of scattering profiles may be system-dependent, this keyword allows the user to adjust the spacing of wave vectors, assuming scattering data are being calculated at all (→ RHCALC and SCATTERCALC). The first wave vector's absolute magnitude q=|q| will always be 0.5·SCATTERRES with units of Å^{−1}. In general, the larger the chain, the smaller the absolute magnitudes of the wave vectors needed.
SCATTERVECS
Since the required number of points and range of wave vectors for the prediction of scattering profiles may be system-dependent, this keyword allows the user to adjust the total number of employed wave vectors, assuming scattering data are being calculated at all (→ RHCALC and SCATTERCALC). Together with SCATTERRES, this determines the range of the wave vectors. Note that generally a coarse resolution (and hence a small number of vectors) is sufficient, as scattering profiles tend to be very smooth functions.
HOLESCALC
For polymers it may be interesting to analyze the distribution of "internal" void spaces. In CAMPARI, a rudimentary analysis routine exists that attempts to place spheres of varying size at different distances from the molecule's center of mass and to record whether any overlap with part of the polymer is encountered. This analysis is recorded in instantaneous output (HOLES.dat), and the latter needs to be post-processed. Note that this analysis is restricted to simulations of monomeric polymers.
RGBINSIZE
If standard polymeric analyses are performed (→ POLCALC), this keyword sets the size of the bins in Å for the three output files RGHIST.dat, RETEHIST.dat, and DENSPROF.dat. It therefore determines the resolution along the radius of gyration or related axes.
POLRGBINS
If standard polymeric analyses are performed (→ POLCALC), this keyword can be used to set the number of bins of size RGBINSIZE for the three output files RGHIST.dat, RETEHIST.dat, and DENSPROF.dat. Since quantities like the radius of gyration or end-to-end distances are strongly system-dependent, it is up to the user to ensure an appropriate number of bins. Note that, just like for all other histograms in CAMPARI, terminal bins will be overstocked should range exceptions occur.
PHOUT
This keyword controls the frequency specifying how often to output ionization states of certain ionizable residues. Currently, this analysis relies on pseudo-Monte Carlo moves (see PHFREQ) to work and is therefore only available in straight MC runs. Further limitations are listed in the descriptions of sampler and output file.
ANGCALC
This keyword lets the user define the interval specifying how often to extract polypeptide backbone torsion angle statistics, i.e., how often to go through all nonterminal polypeptide residues and bin values for the φ/ψ-angles into a two-dimensional histogram. This keyword also controls the data collection frequency for the estimation of vicinal NMR J-coupling constants (H_{N} to H_{α} → JCOUPLING.dat). The Ramachandran analysis itself is reported globally in a file called RAMACHANDRAN.dat. Due to the system-wide averaging (including over molecules of different type), this is probably most meaningful for simulations of single homopolymers. For more detailed control, further output files may be obtained: residue-specific as well as analysis group-specific maps, should requests have been provided via keywords RAMARES and RAMAMOL, respectively.
ANGRES
This keyword matters only if ANGCALC is chosen such that polypeptide backbone φ/ψ-statistics are accumulated. If so, it sets the resolution in degrees for such angular distribution functions. The smallest permissible value at the moment is 1.0°.
RAMARES
This keyword matters only if polypeptide φ/ψ-analysis is requested (→ ANGCALC). If so, it allows the user to monitor the distributions specifically for selected polypeptide residues in the system. The first entry, which defaults to zero, specifies the number of such specific requests. The user then has to provide the appropriate number of integer values (residue numbers as defined per sequence input) on that same line in the key-file. The maximum number of individually monitored residues is limited to 1000. Successful requests (those pointing to non-polypeptide, nonexistent, or terminal residues will be ignored) will create output files like "RESRAMA_00024.dat".
RAMAMOL
This keyword is exactly analogous to RAMARES, only that it operates not on residues but on analysis groups (all residues of all molecules in that analysis group are pooled; numbering as reported initially in the log output). It will create files like "MOLRAMA_00002.dat".
INTCALC
This keyword sets the interval specifying how often to compute comprehensive statistics for typical internal coordinates of the system, i.e., all bond lengths, angles, torsional angles, as well as improper torsional angles (trigonal-planar centers; consult PARAMETERS for further details). Note that molecular topology defines which atom pairs, for example, share a bond. With this analysis, it is therefore not possible to analyze arbitrarily defined distances, angles, and torsion angles in the system. If turned on, up to five different output files are provided, namely INTERNAL_COORDS.idx, INTHISTS_BL.dat, INTHISTS_BA.dat, INTHISTS_DI.dat, and INTHISTS_IM.dat.
WHICHINT
This is one of the few keywords expecting multiple inputs and matters only if internal coordinate analysis is requested (→ INTCALC). Four integers should be provided, and each one is interpreted as a logical to turn on an individual group of internal coordinate analyses. The first turns on the calculation of bond length histograms, the second that of bond angle histograms, the third that of improper dihedral angle histograms, and the fourth that of proper torsional angle histograms. Note that the number of possible internal coordinates quickly exceeds the number of atoms for any complex molecule. These analyses can therefore easily become fairly time-consuming as well as data-rich (in terms of the sizes of the output files). This is one of the reasons for introducing this selection mechanism. The other lies simply in the fact that in any simulation using CAMPARI-typical torsional space constraints (see CARTINT), analyses of bond length, bond angle, and improper dihedral distributions are meaningless.
SEGCALC
This keyword lets the user specify the interval determining how often to scan the polypeptide backbone for stretches of similar secondary structure (as defined in the file specified through FMCSC_BBSEGFILE). The annotation, in contrast to DSSP, is based purely on torsional criteria and relies on defining consensus regions within φ/ψ-space. These consensus definitions are found in a supplied data file (→ BBSEGFILE). At the end of the simulation, results are written to files named BB_SEGMENTS_NORM.dat, BB_SEGMENTS_NORM_RES.dat, BB_SEGMENTS.dat, and BB_SEGMENTS_RES.dat. This analysis is resolved by analysis group and useful to identify coarse secondary structure propensities in polypeptides. As an example, the data in BB_SEGMENTS_NORM_RES.dat can be used to compute parameters of the helix-coil transition according to the Lifson-Roig formalism (see for example Tutorial 3 or this reference). SEGCALC also controls the computation of global (at a molecular level) secondary structure order parameters f_{α} and f_{β} (which are also used for the corresponding bias potentials → SC_ZSEC, used in Tutorial 9 or this reference). Various distribution histograms are written to files ZSEC_HIST.dat, ZAB_2DHIST.dat, and ZBETA_RG.dat. Analysis of these order parameters is similarly performed in analysis group-resolved fashion.
DSSPCALC
This keyword specifies how frequently to perform DSSP analysis. DSSP is a secondary structure assignment procedure for proteins (reference). All eligible (i.e., full peptide) residues are scanned for backbone-backbone hydrogen bond patterns, and various statistics and running output are provided if so desired (see DSSP_NORM_RES.dat, DSSP_NORM.dat, DSSP.dat, DSSP_RES.dat, DSSP_HIST.dat, DSSP_EH_HIST.dat, and DSSP_RUNNING.dat). The DSSP results typically complement the results from backbone segment statistics (see for example BB_SEGMENTS_NORM_RES.dat) well, as the former are based exclusively on hydrogen bond patterns while the latter are based exclusively on dihedral angles.
INSTDSSP
If DSSP analysis is requested (→ DSSPCALC), this keyword is interpreted as a simple logical determining whether to write out running traces of the full DSSP assignment for the current snapshot (see DSSP_RUNNING.dat). This can be useful when analyzing pdb-trajectories or even individual pdb-structures with CAMPARI. Instantaneous DSSP output is currently not supported for MPI-averaging calculations (see MPIAVG).
DSSP_MODE
Based on DSSP analysis (→ DSSPCALC), the code computes two order parameters to measure canonical secondary structure content. The E-score corresponds to the β-content and the H-score to the α-content. They are system-wide quantities and are computed as follows:
E-score = E-fraction · ( HbondScore_E )^{1/n}
H-score = H-fraction · ( HbondScore_H )^{1/n}
Here, E-fraction and H-fraction are simply the fractions of residues assigned E or H according to DSSP. n is an arbitrary scaling exponent (see DSSP_EXP). HbondScore_E is a continuous variable that measures the mean quality of the hydrogen bonds forming the β-sheets in the system, and HbondScore_H is the analog for α-helices. In principle, all the hydrogen bond energies are collected and divided by the value for the same number of good hydrogen bonds (see DSSP_GOODHB). The quantity can be capped, however, based on the choice for DSSP_MODE:
 Every hydrogen bond can maximally contribute the value of DSSP_GOODHB. Therefore, HbondScore_X is always less than unity and only approaches unity if each and every relevant H-bond is at least as favorable as the cutoff given by DSSP_GOODHB. This is the most stringent score. The resultant X-scores will always be less than or equal to the corresponding X-fractions.
 Every hydrogen bond can maximally contribute DSSP_MINHB, which is always more negative than DSSP_GOODHB. The value of HbondScore_X, however, is capped to be at most unity. In this score, very strong H-bonds can compensate the effects of a few weak ones, but the value of the X-score is still capped by the corresponding X-fraction.
 Every hydrogen bond can maximally contribute DSSP_MINHB. The value of HbondScore_X is not capped and can adopt values larger than unity. The X-score is capped, however, to never exceed unity. This is the most lenient score and the only one in which the X-score can exceed the value of the X-fraction.
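The three capping variants can be summarized in a sketch. Here, `x_score` is a hypothetical helper mirroring the formulas above (hydrogen bond energies given as negative numbers); it is not CAMPARI source code:

```python
def x_score(fraction, hb_energies, good_hb, min_hb, mode, n):
    """E- or H-score under the three DSSP_MODE variants.
    fraction: E- or H-fraction of residues; hb_energies: negative
    H-bond energies; good_hb / min_hb: DSSP_GOODHB / DSSP_MINHB."""
    if not hb_energies:
        return 0.0
    # per-bond cap: GOODHB in mode 1, the more negative MINHB otherwise
    per_bond_cap = good_hb if mode == 1 else min_hb
    total = sum(max(e, per_bond_cap) for e in hb_energies)
    # divide by the value for the same number of "good" H-bonds
    hb_score = total / (good_hb * len(hb_energies))
    if mode == 2:
        hb_score = min(hb_score, 1.0)   # cap the H-bond score itself
    score = fraction * hb_score ** (1.0 / n)
    return min(score, 1.0) if mode == 3 else score  # mode 3 caps the X-score
```

In mode 1 the score never exceeds the fraction; in mode 3 strong H-bonds can push it above the fraction, up to the overall cap of unity.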
DSSP_EXP
For the DSSP analysis in CAMPARI (→ DSSPCALC), this keyword chooses the integer scaling exponent for the H-bond term in computing E- and H-scores (see DSSP_MODE).
DSSP_GOODHB
For the DSSP analysis in CAMPARI (→ DSSPCALC), this keyword defines the standard energy for a "good" hydrogen bond. This is used to evaluate the smoothed E- and H-scores (see DSSP_MODE) and is not part of the original DSSP standard. Permissible values lie between −1.0 and −4.0 kcal/mol.
DSSP_MINHB
For DSSP analysis (→ DSSPCALC), this keyword specifies the minimal (= lowest possible = most favorable) energy for any hydrogen bond. Since the DSSP formula is based on inverse distances, it is useful to introduce this lower cap such that conformations with steric overlap do not overly bias the analysis (for example in pdb analyses → PDBANALYZE). Permissible values lie between −10.0 and −4.0 kcal/mol.
DSSP_MAXHB
For DSSP analysis (→ DSSPCALC), this keyword allows the user to define the maximal (= highest possible = least favorable) energy for any hydrogen bond. This is the fundamental cutoff for DSSP to consider H-bonds and therefore a very important quantity for the analysis to be meaningful. The recommended value is −0.5 kcal/mol but values between −1.0 and 0.0 kcal/mol are allowed.
DSSP_CUT
For DSSP analysis (→ DSSPCALC), this keyword defines the distance cutoff applied to the C_{α} atoms of two peptide residues to consider them for hydrogen bonds. This can be relatively short (defaults to 10 Å), but the accuracy hinges on the choice for DSSP_MAXHB. Consistency has to be ensured by the user. Using a C_{α} cutoff for prescreening of residue pairs significantly reduces the computation time needed by the DSSP analysis.
CONTACTCALC
This keyword specifies the interval at which to perform contact analysis, i.e., how often to determine which and how many solute residues are close to each other. See CONTACTMAP.dat and CONTACT_HISTS.dat for more details. Note that this analysis is restricted to residues of molecules tagged as solutes (→ FMCSC_ANGRPFILE) in order to facilitate frequent contact analysis even if solvent molecules are explicitly represented (which may be prohibitively expensive otherwise).
CLUSTERCALC
This keyword (along with CONTACTCALC) controls the computation frequency for solute cluster statistics (i.e., cluster sizes, cluster contact orders, and molecule-resolved cluster statistics), where a cluster is defined through the minimum atom-atom distance contact definition (between any pair of residues). Note that this is the interval at which to perform cluster analysis from within the calculation of contacts (i.e., CLUSTERCALC is relative to CONTACTCALC, as SCATTERCALC is to RHCALC). The reason is that the cluster detection algorithm relies on the determination of contacts but that it may not always be a meaningful analysis to perform (see CLUSTERS.dat, MOLCLUSTERS.dat, and COOCLUSTERS.dat for further details on the output).
CONTACTOFF
If contact analysis is requested (→ CONTACTCALC), this keyword defines a sequence-space offset to exclude neighboring residues from the analysis. For topologically connected systems (i.e., polymer chains), data for near-neighbor contacts such as i↔i+1 may be uninformative as they will always be in contact on account of the underlying topology. Note that the omission only applies to intramolecular contacts. Setting this to zero includes everything (even i↔i), and any larger integer starts the analysis at that sequence separation. The default here is zero and there is rarely a reason to change it.
CONTACTMIN
For contact and cluster analysis (→ CONTACTCALC), this keyword provides the threshold value for a residue-residue contact in Å. Here, the threshold is applied to the minimum distance between any arbitrary pair of atoms formed by the two residues in question. This defaults to 5.0 Å. Note that this computationally more expensive definition has the advantage of rendering the contact probabilities more or less size-independent for polyatomic residues. In the presence of excluded volume interactions, monoatomic residues (ions) of different size will still yield contact statistics that include physically meaningless biases, however.
CONTACTCOM
For contact and cluster analysis (→ CONTACTCALC), this keyword gives the alternative threshold value for a residue-residue contact in Å. Here, the threshold applies to the distance between the centers of mass of the two residues in question. It also defaults to 5.0 Å. Note that (in the presence of excluded volume interactions) contact probabilities obtained this way are by design dependent on the size of the interacting residues, and results may be misleading if contact statistics between pairs of residues with highly variable size are compared.
PCCALC
This keyword allows the user to specify how often to perform pair correlation analysis, i.e., how often to collect distance counts for a variety of intra- and intermolecular distances and, in the case of intermolecular distances, to apply proper normalization by the current volume element. It controls the computation frequency for three different classes of distance distributions:
 Generic intramolecular amide-amide distributions (various acceptor-donor pairs, as well as centroid-centroid) (→ AMIDES_PC.dat in OUTPUTFILES), only relevant for polypeptide systems.
 Generic intermolecular pair correlation functions for solutes (→ RBC_PC.dat in OUTPUTFILES), only relevant for systems with more than one solute. Note that this option can consume inordinate amounts of memory should a lot of different solute types be present. Workarounds consist of disabling this analysis or of using the analysis group feature to redeclare most of those as solvent molecule types and to use specific atom-atom distributions instead.
 Specific atomatom distributions and/or pair correlation functions as defined through an index file (see FMCSC_PCCODEFILE) (→ GENERAL_PC.dat in OUTPUTFILES).
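The normalization by the current volume element mentioned above can be illustrated with a generic g(r) estimator (a textbook sketch in Python; the function name and arguments are illustrative, not CAMPARI's internal normalization code):

```python
import math

def pair_correlation(distances, npairs_per_frame, nframes, box_volume, binsize, nbins):
    """Generic g(r) estimator: histogram pair distances, then normalize each
    bin by the ideal-gas expectation, i.e., the spherical shell volume
    4*pi*r^2*dr times the pair density."""
    hist = [0] * nbins
    for d in distances:
        b = int(d / binsize)
        if b < nbins:
            hist[b] += 1
    pair_density = npairs_per_frame / box_volume
    g = []
    for b, count in enumerate(hist):
        r = (b + 0.5) * binsize          # bin-center distance
        shell = 4.0 * math.pi * r * r * binsize
        g.append(count / (nframes * pair_density * shell))
    return g
```

For an ideal (uncorrelated) system, this normalization drives g(r) toward unity at all r, which is what makes pair correlation functions comparable across densities.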
DO_AMIDEPC
If pair correlation analysis is requested (→ PCCALC), this keyword allows the user to disable the computation of intramolecular amide-amide distance distribution functions (→ AMIDES_PC.dat) by setting it to zero.
PCBINSIZE
This keyword specifies the distance bin size in Å for pair correlation analysis (→ PCCALC).
PCCODEFILE
This keyword specifies the path and filename of the input file for requesting specific pair correlation or distance distribution analyses (see FMCSC_PCCODEFILE). In general, the input is rather flexible, and it is possible to pool many analogous or even unrelated atom-atom distances under a certain code or to use unique codes for very specific requests. Upon successful parsing of the input, and given that pair correlation analysis is globally requested (→ PCCALC), the output file GENERAL_PC.dat is created.
GPCREPORT
This logical keyword instructs CAMPARI whether or not to write out a summary of the terms requested through FMCSC_PCCODEFILE (→ GENERAL_PC.idx). It is only available if distance distribution / pair correlation analysis is in use (→ PCCALC).
SAVCALC
This keyword specifies how often to compute solvent-accessible volume (SAV) fractions and solvation states for the system. If the ABSINTH implicit solvent model is in use (→ SC_IMPSOLV), this analysis can rely on the current values for those quantities (no additional computational cost); otherwise, computing atomic SAV fractions incurs a moderate computational cost. The solvent-accessible volume will globally depend on the choice for the thickness of the assumed solvation shell (→ SAVPROBE). The mapped solvation states as reported for individual atoms (please refer to the ABSINTH publication for details) will depend on further ABSINTH parameters. Some of these can be adjusted through patches, e.g., user-supplied values for overlap reduction factors. SAV analysis creates at most three output files: an instantaneous one (SAV.dat) that depends on auxiliary keyword INSTSAV, an atom-resolved output file that reports simulation averages (→ SAV_BY_ATOM.dat), and finally a file containing distribution functions (histograms) of those quantities for selected atoms (→ SAV_HISTS.dat). The latter file depends on another auxiliary keyword, i.e., SAVATOMFILE. The instantaneous output is primarily useful as a diagnostic tool for the system while the simulation is running, and to be able to compute correlation functions, multidimensional histograms, etc. for quantities related to the solvation of specific sites on macromolecules. Please refer to the descriptions of the output files for further details.
INSTSAV
If analysis of solvent-accessible volume fractions is requested (→ SAVCALC), this keyword allows the user to have a quantity related to the total SAV, along with a running average, printed to a dedicated output file (→ SAV.dat). In addition, the values of the SAV fractions for selected atoms (via SAVATOMFILE) are written out. The latter allows the construction of correlation functions, multidimensional histograms, etc. The keyword (positive integer) is interpreted as a printout frequency relative to the frequency with which SAV analysis is performed per se. This means that the effective printout interval will be SAVCALC·INSTSAV steps.
SAVATOMFILE
If analysis of solvent-accessible volume fractions is requested (→ SAVCALC), this keyword specifies the location and name of a simple input file (list of atomic indices, format is described elsewhere) that allows the user to select a subset of the system's atoms for creating histograms of both SAV fraction and resultant solvation state (see above). These histograms are written to a dedicated output file (→ SAV_HISTS.dat). In addition, if instantaneous output of SAV-related quantities is requested (→ INSTSAV), the values of the SAV fractions for the selected atoms are written to the corresponding output file (SAV.dat). Note that instantaneous values of the SAV fractions allow manual computation (during post-processing) of solvation states (using parameters set in the key-file and/or reported in SAV_BY_ATOM.dat, and using the reference publication to retrieve the necessary expressions). It should be kept in mind that, with normal settings for SAVPROBE, SAV fractions of nearby atoms are tightly coupled. This means, for example, that requesting information for atoms that are covalently bound will rarely yield additional information. Lastly, the binning for the histograms is fixed and uses 100 bins across the interval from zero to unity (both quantities are restricted to this interval).
NUMCALC
This keyword is relevant only when the chosen thermodynamic ensemble allows for particle number fluctuations (i.e., the simulation is performed in the (semi-)grand canonical ensemble). It then specifies the number of Monte Carlo steps between successive accumulations of number-present histograms for each fluctuating particle type. For a description of the corresponding output file, please refer to PARTICLENUMHIST.dat.
COVCALC
This simple keyword instructs CAMPARI to collect raw data (signal trains) for select degrees of freedom in the system (currently this is restricted to all flexible dihedral angles → TRCV_xxx.tmp) every COVCALC steps. This is a near-obsolete functionality that has large overlaps with the output written to FYC.dat via TOROUT. It was meant to provide intrinsic support for variance/covariance analyses, e.g., with the ultimate goal of performing dimensionality reduction. Given that merely raw data are provided and that dihedral angle data are generally circular (periodic) variables requiring the use of circular statistics (not as trivial as it may sound), usage of this facility is generally not recommended. This option is available in different modes (see COVMODE) and may eventually be revived or extended later. Note that CAMPARI can perform intrinsic principal component analysis (PCA) as part of the structural clustering facility (→ CCOLLECT and PCAMODE).
COVMODE
This keyword chooses between (currently) two types of raw data to be provided by CAMPARI in output files TRCV_xxx.tmp. It can be set to:
 Internal degrees of freedom (i.e., torsions) directly in torsional space (radian)
 Internal degrees of freedom (i.e., torsions) expressed as their cosine and sine components
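The cosine/sine representation (option 2) is what makes standard statistics applicable to circular data. A minimal illustration of why naive averaging fails for dihedral angles (generic Python, not CAMPARI code):

```python
import math

def circular_mean(angles_rad):
    """Circular mean of periodic data: average the sine and cosine components
    rather than the raw angles. A naive arithmetic mean of, e.g., +179 deg and
    -179 deg would wrongly give 0 deg instead of 180 deg."""
    s = sum(math.sin(a) for a in angles_rad) / len(angles_rad)
    c = sum(math.cos(a) for a in angles_rad) / len(angles_rad)
    return math.atan2(s, c)
```

The same decomposition underlies circular variance and is the reason raw torsional data (option 1) require special treatment before any variance/covariance analysis.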
DIPCALC
This keyword specifies how often to compute molecular and residue-wise dipole moments for net-neutral molecules (or residues). Because the analysis relies on atomic partial charges, dipole analysis requires SC_POLAR to be set to a value larger than zero as charges are otherwise not assigned. The (somewhat preliminary) analysis produces output files MOLDIPOLES.dat and RESDIPOLES.dat.
EMCALC
This keyword specifies how often to compute spatial density distributions for the simulated system. If the density restraint potential is in use, this analysis is automatically performed at every step given that it is computed regardless. The result is an averaged density on a three-dimensional grid whose dimensions are controlled generally by keywords EMDELTAS and SIZE. For nonperiodic boundaries, the evaluation grid cannot always be mapped to the system dimensions exactly, and keyword EMBUFFER becomes relevant. When using the density restraint potential, the grid serves both the purpose of analysis as described here and the purpose of evaluating the potential itself, which implies that it is an option to adopt the grid dimensions from the input density map. This is the default behavior for a cuboid system with 3D periodic boundary conditions when EMDELTAS is not provided. The resultant spatial density is that of a given atomic property selected by keyword EMPROPERTY. It is written to an output file in NetCDF format; this external library is required to use the feature. The details of the file format CAMPARI uses are described elsewhere. The spatial density is computed as follows:
ρ_{ijk} = ρ_{sol} + V_{ijk}^{−1} Σ_{n=1}^{N} [ X_{n} − γ_{n}V_{n}ρ_{sol} ] Π_{d=1}^{3} B_{A}( r_{n}^{d} − P_{ijk}^{d} )
Here, V_{ijk} is the volume of the grid cell with indices i, j, and k, N is the number of atoms in the system, X_{n} is the target property of the atom with index "n", V_{n} is that atom's volume, and r_{n}^{d} are the three components of its position vector. The parameter γ_{n} is a pairwise volume overlap reduction factor that corrects atomic volume for overlap with covalently bound atoms. It is explained in some detail elsewhere. The parameter ρ_{sol} sets a physical background density for the property in question, which is relevant when not all matter contributing to the property density in the system is represented explicitly. In such a case, an assumed vacuum would lead to severe errors. Note that atomic volumes and volume reduction factors are no longer relevant if ρ_{sol} is zero in the above equation. Finally, the product in the above equation utilizes cardinal B-spline functions of order "A", B_{A}, which are assumed centered at the center of each grid cell (vector P_{ijk} with components P_{ijk}^{d} for each dimension). This technique of distributing a property on a lattice is shared with the particle-mesh Ewald method.
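A one-dimensional sketch of the B-spline spreading in the above equation may help. This is illustrative Python, not CAMPARI code: it omits the γ_{n} and ρ_{sol} terms, and the centering/normalization conventions (spline support centered on the atom) are assumptions for the demonstration:

```python
def bspline(order, x):
    """Cardinal B-spline B_A(x), nonzero on (0, order), standard recursion."""
    if order == 1:
        return 1.0 if 0.0 <= x < 1.0 else 0.0
    return (x * bspline(order - 1, x)
            + (order - x) * bspline(order - 1, x - 1.0)) / (order - 1)

def spread_1d(positions, weights, ncells, delta, order):
    """Distribute per-atom property values X_n onto a 1-D grid of cell size
    delta (PME-style spreading); rho has units of property per length."""
    rho = [0.0] * ncells
    for x, w in zip(positions, weights):
        u = x / delta + order / 2.0      # position in grid units, shifted so
        for i in range(ncells):          # the spline support covers the atom
            rho[i] += w * bspline(order, u - (i + 0.5)) / delta
    return rho
```

Because B-splines form a partition of unity, the total amount of the property is conserved on the grid (as long as the support lies fully inside it), while the finite spline width sets the effective smoothing scale discussed under EMBSPLINE.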
EMDELTAS
If the density restraint potential is not in use, but spatial density analysis is requested, this keyword is mandatory and sets the lattice cell size of the analysis grid by providing three floating point numbers corresponding to the lattice cell sizes in Å for the x, y, and z dimensions, respectively. Conversely, if the density restraint potential is in use, this keyword is optional and allows the user to set a lattice cell size different from the one used by the input density map. The keyword again requires the specification of three floating point numbers that set the lattice cell sizes in Å for the x, y, and z dimensions of the analysis and evaluation grid, respectively. Note that acceptable choices require that it be possible to superpose the cells of the input density map exactly with the analysis grid after reducing its resolution to that of the input map. Minor adjustments may be made automatically to system size and/or the origin of the input map. If, for example, in the x-dimension the input map has 10 cells of width 2 Å, and the evaluation grid has 26 cells of width 1 Å, then the system origin has to be chosen such that the left boundary of the first cell of the input density aligns with the left boundary of the first, third, or fifth cell of the evaluation grid (but not any others). In the same example, CAMPARI would reject a system size of 25 Å, because the resultant number of cells in the x-dimension would not be divisible by the integer factor corresponding to the difference in resolution (here 2). It would also reject an origin aligning the first input cell with the seventh evaluation grid cell, because this would mean that the input map extends beyond the system boundaries. Finally, implied boundary conditions of the input map are not made to correspond to system boundary conditions automatically. For periodic boundary conditions of the system, the evaluation grid is and must be fit exactly to the system dimensions.
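The divisibility rules in the example above can be captured in a small one-dimensional check. This is a hypothetical helper sketched in Python under the stated rules, not CAMPARI source code:

```python
def check_em_grid(map_cells, map_delta, grid_delta, system_size):
    """One-dimensional compatibility check for EMDELTAS-style settings:
    the evaluation grid must tile the system exactly, its cell count must be
    divisible by the resolution ratio to the input map, and the input map
    must fit inside the system."""
    ratio = map_delta / grid_delta
    if abs(ratio - round(ratio)) > 1e-9:
        return False, "input-map cell size must be an integer multiple of the grid cell size"
    ratio = round(ratio)
    ncells = system_size / grid_delta
    if abs(ncells - round(ncells)) > 1e-9:
        return False, "system size must be an integer number of grid cells"
    ncells = round(ncells)
    if ncells % ratio != 0:
        return False, "grid cell count not divisible by the resolution ratio"
    if map_cells * ratio > ncells:
        return False, "input map would extend beyond the system boundaries"
    return True, "compatible"
```

With the example from the text (10 input cells of 2 Å, a 1 Å evaluation grid), a 25 Å system is rejected while a 26 Å system passes.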
EMPROPERTY
If spatial density analysis is requested, or if the density restraint potential is in use, this keyword lets the user pick an atomic property to be distributed on a lattice. If this is supposed to work as a density restraint, there are only two options available at the moment:
 Use atomic mass (resultant units are g/cm^{3})
 Use atomic number, i.e., proton mass (resultant units are also g/cm^{3} for convenience)
 Use atomic charge (resultant units are e/Å^{3})
Note that additional options may be made available in the future.
EMBGDENSITY
If spatial density analysis is requested, or if the density restraint potential is in use, this keyword sets an assumed background level for the atomic property in question. In general, the value should be zero if all relevant matter in the system is represented explicitly, i.e., if empty space is indeed meant to correspond to a vacuum. If not, the value should be given in appropriate units depending on the property the density is derived from. These are g/cm^{3} for mass and proton densities (atomic number), and e/Å^{3} for charge.
EMBUFFER
If spatial density analysis is requested, or if the density restraint potential is in use, this keyword sets a ratio for how much to extend the evaluation grid for spatial densities beyond any nonperiodic boundaries of the system. In the direction of a nonperiodic boundary, CAMPARI takes the maximum dimension (e.g., the diameter of a sphere) and multiplies it by this factor to obtain the (approximate) size of the rectangular cuboid grid. Alignment with a potential input grid is achieved by shifting the origin of the evaluation grid slightly. Note that the behavior will generally be undefined for cases where solute material samples positions off the evaluation grid. It is up to the user to ensure that the buffer spacing is big enough, given the stiffness of the boundaries, to prevent this from happening.
EMBSPLINE
If spatial density analysis is requested, or if the density restraint potential is in use, this keyword sets the order of B-splines used to distribute the atomic property of interest on the lattice. This setting corresponds to parameter "A" in the equation above. B-splines of order 3 or higher lead to functions with smooth derivatives, and are appropriate for gradient-based methods. B-splines have finite support, and the cost per atom will increase with A^{3} for a three-dimensional lattice. The limiting case of A being unity corresponds to a simple binning function, whereas for large A, a Gaussian function is recovered. The effective width does not grow linearly with A; it is rather the tails of the functions that grow. This implies that very large values for A are probably not a useful investment of CPU time. Note that the effective width of the B-spline can be thought of as setting an inherent resolution or averaging scale for a given atom in question, since it replaces a point function with a distribution. The choice for this keyword should therefore be made in concert with the choice of formal grid resolution.
DIFFRCALC
This keyword specifies how often to compute approximate fiber diffraction patterns for the whole system (excluding ghost particles in GC simulations → ENSEMBLE). The system is aligned according to an assumed fiber axis (see DIFFRAXIS), and amorphous diffraction patterns using cylindrical coordinates (through Fourier-Bessel transform) are computed. The code currently assumes atomic scattering cross sections that are proportional to atomic mass, with the additional modification that all hydrogen atoms are excluded from the diffraction calculation. Specifically, the atomic scattering function for heavy atom i is proportional to m_{i}/m_{C} with a proportionality constant yielding units of the square root of scattering intensity. It is zero for hydrogen atoms. See DIFFRACTION.dat for more details. As a cautionary comment, it should be noted that these calculations are somewhat untested and that output should be carefully examined.
DIFFRRMAX
For diffraction calculations (→ DIFFRCALC), this specifies the maximum number of bins in the reciprocal radial dimension (r in cylindrical coordinates). The resultant bins will be centered around zero.
DIFFRZMAX
For diffraction calculations (→ DIFFRCALC), this specifies the maximum number of bins in the reciprocal axial dimension (z in cylindrical coordinates). The resultant bins will be centered around zero.
DIFFRRRES
For diffraction calculations (→ DIFFRCALC), this gives the resolution in the reciprocal radial dimension (r in cylindrical coordinates) in Å^{−1}.
DIFFRZRES
For diffraction calculations (→ DIFFRCALC), this gives the resolution in the reciprocal axial dimension (z in cylindrical coordinates) in Å^{−1}.
DIFFRJMAX
This defines the maximum order of Bessel functions to use in the Fourier-Bessel (Hankel) transform to generate the (fiber) diffraction pattern (→ DIFFRCALC). Note that the transform takes the product of actual and reciprocal radial coordinates as its argument. Hence, the maximum order will determine how meaningful the generated information for large values of the inverse radial dimension is. This soft cutoff will scale reciprocally with the size of the system in the radial dimension. These features arise due to the fact that Bessel functions of order n only contribute nonzero values beyond a (unitless) argument value of ca. n. Also note that the input file for the Bessel functions (see FMCSC_BESSELFILE) needs to provide the tabulated functions up to the necessary order.
DIFFRAXIS
For diffraction calculations (→ DIFFRCALC), it is possible (and usually meaningful and necessary) to use a fixed system axis as the assumed fiber axis. This is (naturally) particularly appropriate for single-point calculations on specific structures. The axis' x, y, and z components have to be provided as three floating point numbers. The length of the vector is not important. The axis will pass through the point defined elsewhere (see DIFFRAXON). If this keyword is not specified, the program will identify the longest possible atom-atom distance in the system and use the resultant axis. Note that this axis will not be constant with respect to the absolute (lab) coordinates, but that it is supposed to cover cases where changes in configuration are allowed (especially if rigid-body movement is permitted).
DIFFRAXON
This keyword specifies the point that the (constant) axis (see DIFFRAXIS) for diffraction analysis (→ DIFFRCALC) will pass through. This defines the zero point of the z-coordinate, and hence the origin of the cylindrical coordinate system. If this keyword is not provided, CAMPARI will assume the point {0.0 0.0 0.0} (independent of specifications for the system origin).
REOLCALC
This keyword is only relevant in MPI replica exchange calculations (or parallel trajectory analysis runs using the same setup). It instructs CAMPARI to compute various overlap measures between the different Hamiltonians employed in the REMC/D run (see N_XXX_OVERLAP.dat). Note that this relies on the evaluation of the system energy at different conditions, i.e., Hamiltonians. Currently, the software makes the assumption that the energy has to be fully reevaluated for each condition, which means that there is a significant cost associated with the overlap calculation. Moreover, in the presence of cutoffs/long-range corrections, the foreign energy evaluation always uses those cutoffs. Also note that the user controls whether to calculate foreign energies across all replicas (see REOLALL). If only neighboring conditions are requested, output in N_XXX_OVERLAP.dat may be truncated or uninformative. Lastly, it is important to mention that the MC branch of the energy functions is used only in plain REMC calculations, and that in all other cases (including hybrid methods → DYNAMICS) the dynamics branch is used. This is important since cutoff and long-range treatments can easily be inconsistent between the two (see LREL_MC and LREL_MD).
REOLINST
This keyword is only relevant in MPI replica exchange calculations (or parallel trajectory analysis runs using the same setup). It requests instantaneous "foreign" energies to be written (see N_XXX_EVEC.dat). "Foreign" or "cross" energies are simply the energies of the current structure evaluated at Hamiltonians different from the one generating the ensemble. Note that the user controls whether to calculate foreign energies across all replicas (see REOLALL). If only neighboring conditions are requested, a truncated vector (length 2 or 3) is provided in N_XXX_EVEC.dat. To facilitate frequent overlap analysis with sparser instantaneous output, this keyword is interpreted as a frequency subordinate to REOLCALC (as SCATTERCALC is relative to RHCALC).
REOLALL
This keyword is only relevant in MPI replica exchange calculations (or parallel trajectory analysis runs using the same setup). It is interpreted as a simple logical which determines whether "foreign" energies are computed over all other replicas or just the neighboring ones (see N_XXX_EVEC.dat and N_XXX_OVERLAP.dat).
TRACEFILE
If a parallel trajectory analysis run is performed (→ details elsewhere), this optional keyword allows the user to supply a file with a running map of replicas to starting conditions. Details of format and interpretation are given elsewhere. The default map assumed by CAMPARI is the identity mapping 1..REPLICAS. If a trace file is provided, sets of step number and an updated map for that specific step are read. This is primarily meant to turn replica exchange trajectories that are continuous in condition (i.e., contain conformational jumps) into trajectories that are continuous in conformation (i.e., afterwards they contain jumps in condition). In such a case, the trace file is the history of replica exchange moves such as that output by CAMPARI itself. CAMPARI will then recombine information from the input trajectories according to the trace. This means that all analyses are performed on the unscrambled trajectory, which can of course also be written out (→ XYZOUT). Naturally, this keyword can also be used to specify any other map for other applications, e.g., to create trajectories for obtaining bootstrap-type error estimates. The relation of step numbers in the trace file to frames in the trajectories is handled by keywords RE_TRAJOUT and RE_TRAJSKIP.
RE_TRAJOUT
If a parallel trajectory analysis run is performed (→ details elsewhere), and if a file with the replica exchange history (trace) has been provided, this keyword lets the user set the trajectory output frequency CAMPARI is supposed to assume for the supplied input trajectories. This is important because the exchange trace is meant to use simulation step numbers that are not preserved in trajectory analysis mode (no step number or time information from input trajectories is read and used). A successful unscrambling of the trajectories requires that the exchange trace be exhaustive at the level of the output frequency set by this keyword. This means that it is sufficient to provide the current map of condition to starting structure for every snapshot in the input trajectories (more information can be supplied without harm; less information will lead to errors).
RE_TRAJSKIP
If a parallel trajectory analysis run is performed (→ details elsewhere), and if a file with the replica exchange history (trace) has been provided, this keyword lets the user set the equilibration period for trajectory output that CAMPARI is supposed to assume for the supplied input trajectories. This is important because the exchange trace is meant to use simulation step numbers that are not preserved in trajectory analysis mode (no step number or time information from input trajectories is read and used). Both RE_TRAJOUT and this keyword are required for CAMPARI to correctly relate the frames in the trajectories to the step numbers in the exchange trace. Of course, it is also possible to edit the file with the exchange trace and set RE_TRAJOUT and RE_TRAJSKIP to 1 and 0, respectively.
CCOLLECT
This keyword controls the frequency with which a selected subset (see CDISTANCE and CFILE) of the trajectory data (typically in a trajectory analysis run → PDBANALYZE) is stored in a large array in memory for post-processing. Such analysis currently consists of different algorithms (→ CMODE) to identify structural clusters in the data and is performed after the last step of the run has completed. If set to something larger than the number of simulation steps (NRSTEPS), the clustering analysis is disabled (also the default). Various output will be produced aside from information written directly to standard output or the log file. The first and foremost is a list of cluster annotations per analyzed snapshot (→ STRUCT_CLUSTERING.clu) along with a helper script for the visualization software VMD (→ STRUCT_CLUSTERING.vmd). Furthermore, CAMPARI will produce a file representing the clustering as a graph in an XML-based (so-called "graphml") format (→ STRUCT_CLUSTERING.graphml). Taken together, these files allow further analyses of the clustering, primarily those that take advantage of the fact that the clustering yields a complex network/graph. All clustering algorithms (→ CMODE) will write a summary of the determined clusters (usually involving at least the number of contained snapshots and a measure of size) to log output. Also, they will, normally at the very end, give an empirical assessment of clustering quality. Currently, these numbers are meant primarily for developer consumption, and the reader is referred to the source code to understand how they are computed. The exact progress index method is an exception as it does not explicitly record a clustering (the aforementioned output files are missing).
Note that structural clustering breaks the typical CAMPARI paradigm of "on-the-fly" analysis since the bulk of the CPU time for analysis will be invested only at the very end. Therefore, structural clustering will most often be used in trajectory analysis runs as it will be highly undesirable to risk an unclean termination of an actual simulation (certain algorithms for structural clustering require large amounts of memory and/or CPU time). Note as well that structural clustering should not be confused with the analysis of molecular clusters (see CLUSTERCALC and its corresponding output files).
A special remark is required for simulation runs using the MPI averaging technique. Similar to any use of the clustering functionality "on-the-fly", trajectory output should be generated in accordance with the setting for CCOLLECT (most easily by using MPIAVG_XYZ and a matching value of XYZOUT). This is so the clustering results can be annotated and understood at all. In an MPI averaging run, CAMPARI will then at each collection step gather data from all replicas and store them in an array allocated exclusively by the master process. The data arrangement is such that trajectories will be continuous and ordered by increasing replica number. The concatenation introduces spurious transitions that may affect subsequent computations. Data collection causes a synchronization and communication requirement absent in other types of MPI averaging calculations. At the end of the simulation, the resultant concatenated trajectory is analyzed exclusively by the master process, which, depending on settings and algorithms in use, may lead to severe imbalances in terms of both memory consumption and CPU time requirements. This should be kept in mind when using this approach across machines not sharing any memory. To enforce the complementary behavior of every identical replica analyzing its own trajectory, it may be possible to use a fake replica exchange run by using a single dummy (or nearly invariant) parameter for exchange.
Because the chosen subset of degrees of freedom often represents a limited space of particular interest to the user, CAMPARI offers to compute all principal components for the collected data if i) the chosen proximity metric is not circular (this excludes options 1-2 for CDISTANCE); ii) the code was linked to a linear algebra library (LAPACK-compliant, see installation instructions for general information on linking libraries); and iii) there are more samples than variables (degrees of freedom). Principal component analysis (PCA) is essentially a dimensionality reduction technique that works by identifying linear transforms of the mean-subtracted data that collect maximal sample variance in as few components as possible. The principal components can be computed via singular value decomposition (SVD). They are normalized and orthogonal, i.e., have unit length and zero (linear) covariance. The latter should not be equated with a lack of correlation; many nonlinear correlations between variables yield zero covariance. The reason that circular (periodic) data are currently not supported is that variance and in particular covariance become somewhat empirical and laborious to compute. If PCA is performed (→ PCAMODE), CAMPARI produces up to two output files, one containing the eigenvectors themselves (PRINCIPAL_COMPONENTS.evs) and another, optional one containing the data matrix in PC space (PRINCIPAL_COMPONENTS.dat). The latter can be used to derive probability or free energy surfaces in reduced-dimensional spaces.
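The PCA step described above can be illustrated in a few lines. The following Python sketch (not CAMPARI's Fortran implementation; the helper name is hypothetical) mean-subtracts an N × K data matrix, obtains the principal axes via SVD, and projects the samples into PC space, mirroring the contents of the two output files mentioned above:

```python
import numpy as np

def pca_svd(data):
    """PCA of an (N samples x K variables) matrix via SVD (sketch).

    Returns the principal axes (unit-length, mutually orthogonal rows),
    the samples projected into PC space, and the variance captured by
    each component (sorted from largest to smallest).
    """
    centered = data - data.mean(axis=0)        # mean-subtract each variable
    # right singular vectors of the centered data are the principal axes
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt.T                # data matrix in PC space
    variances = s ** 2 / (len(data) - 1)       # per-component variance
    return vt, projected, variances
```

Note that the projected components have zero (linear) covariance by construction, which, as stated above, does not rule out nonlinear correlations between them.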
PCAMODE
If data for structural clustering are collected (→ CCOLLECT), this keyword allows the user to instruct CAMPARI to perform PCA on the collected samples (see above for details). Three options are currently available:
 No PCA is performed.
 PCA is performed via SVD, and the (left) eigenvectors are written to a dedicated output file.
 PCA is performed via SVD, and the (left) eigenvectors are written to a dedicated output file. In addition, the original sample data are transformed into PCA space and the resultant values are written to another output file.
CDISTANCE
If data for structural clustering are collected (→ CCOLLECT), this keyword defines what type of data to collect and how to define structural proximity. There are currently 8 supported options:
This option is tailored toward the intrinsic degrees of freedom of a typical CAMPARI simulation
that are also the essential internal degrees of freedom of most molecular systems, i.e. the molecules' dihedral angles.
The values {φ_{k}} for a set of K dihedral angles are collected throughout the run.
A list can be provided by using a dedicated input file (→ CFILE); otherwise, almost all CAMPARI internal degrees of freedom are used (excluding those pertaining to the conformation of five-membered rings). The distance between two states is then given as:
d_{l↔m} = [ (1.0/K) · Σ_{k}^{K} ( (φ_{k}^{l} − φ_{k}^{m}) mod 2π )^{2}]^{1/2}
Because dihedral angles are periodic (circular) quantities, a meaningful metric of proximity must account for boundary conditions, hence the "mod 2π" term. Aside from periodicity, dihedral angle-based clustering poses the challenge that all considered degrees of freedom are bounded and that the strongest contribution to the signal will come from those torsions with large variance, which unfortunately are often the ones of least interest (for example, sidechain torsions). Therefore, a careful selection of the subset to use is critical for an informative clustering. Like any other method, dihedral angle-based clustering is vulnerable to Euclidean distances in high-dimensional spaces becoming uninformative. Note that all dihedral angle-based proximity criteria are useful primarily for single molecules since relative intermolecular orientations are not representable whatsoever. 
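For illustration, the unweighted metric of option 1 can be sketched as follows (a Python sketch, not CAMPARI code; the angle wrapping implements the "mod 2π" term):

```python
import math

def dihedral_distance(phis_l, phis_m):
    """Unweighted dihedral-angle distance (sketch of option 1).

    Angles are in radians; each difference is wrapped into [-pi, pi)
    so that, e.g., 179 deg and -179 deg are 2 deg apart, not 358 deg.
    """
    k = len(phis_l)
    acc = 0.0
    for a, b in zip(phis_l, phis_m):
        d = (a - b + math.pi) % (2.0 * math.pi) - math.pi  # periodic image
        acc += d * d
    return math.sqrt(acc / k)
```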
This is identical to the previous option except that each dihedral angle is weighted by the combined effective masses (the associated diagonal elements in the mass matrix, i.e., the mass-metric tensor) of that very dihedral angle in the respective states l and m, i.e., {IM_{k}^{l}+IM_{k}^{m}}.
The distance between two states will then be given as:
d_{l↔m} = [ (Σ_{k}^{K} (IM_{k}^{l}+IM_{k}^{m}) )^{−1} · Σ_{k}^{K} (IM_{k}^{l}+IM_{k}^{m}) · ( (φ_{k}^{l} − φ_{k}^{m}) mod 2π )^{2}]^{1/2}
This option attempts to remedy the problem with the previous one regarding the impact of "uninteresting" degrees of freedom. The weighting with the effective masses ensures that slow degrees of freedom (e.g. central backbone torsions) will contribute much more to the overall signal than sidechain torsions. An additional complication incurred by this is that the weights are now variables themselves when considering sets of conformational distances. 
This option is largely identical to option 1. It carries all the same caveats with the exception of the periodicity
of dihedral angles. Here, we expand each dihedral angle into its sine and cosine terms to construct a distance metric as follows:
d_{l↔m} = [ (0.5/K) · Σ_{k}^{K} ( (sin(φ_{k}^{l}) − sin(φ_{k}^{m}))^{2} + (cos(φ_{k}^{l}) − cos(φ_{k}^{m}))^{2} ) ]^{1/2} 
This is the analogous modification of the previous option by introducing weights composed from the effective masses:
d_{l↔m} = [ 0.5 · (Σ_{k}^{K} (IM_{k}^{l}+IM_{k}^{m}) )^{−1} · Σ_{k}^{K} (IM_{k}^{l}+IM_{k}^{m}) · ( (sin(φ_{k}^{l}) − sin(φ_{k}^{m}))^{2} + (cos(φ_{k}^{l}) − cos(φ_{k}^{m}))^{2} ) ]^{1/2}
The same caveats apply. 
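The unweighted sine/cosine expansion (option 3 above) can be sketched as follows; periodicity is handled implicitly because each angle is mapped onto the unit circle before taking Euclidean differences (illustrative Python, not CAMPARI code):

```python
import math

def sincos_distance(phis_l, phis_m):
    """Sine/cosine-expanded dihedral distance (sketch of option 3)."""
    k = len(phis_l)
    acc = 0.0
    for a, b in zip(phis_l, phis_m):
        acc += (math.sin(a) - math.sin(b)) ** 2 \
             + (math.cos(a) - math.cos(b)) ** 2
    return math.sqrt(0.5 * acc / k)
```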
This option is probably the most commonly used variant, the positional RMSD. The
Cartesian position vectors {r_{k}} for a set of K atoms
are collected throughout the run. A list can be provided by using a dedicated input file
(→ CFILE), otherwise all atoms in the system are used.
The distance between two states is then given as:
d_{l↔m} = [ (1.0/K) · Σ_{k}^{K} ( r_{k}^{l} − RoTr(r_{k}^{m}) )^{2}]^{1/2}
Here, RoTr denotes the rotation and translation operators that optimally superpose the {r_{k}}^{m} with the frame provided by the {r_{k}}^{l}. This alignment uses the same quaternion-based algorithm mentioned elsewhere. Superposition (alignment) implies that the atomic RMSD is not necessarily a bona fide metric of distance as it is not guaranteed to satisfy d_{l↔m} ≤ d_{l↔p} + d_{p↔m}, i.e., the triangle inequality. This is because the operator RoTr is different for computing d_{p↔m} than it is for computing the other two distances. In practice, for similar structures, this is never really a problem in the context of clustering. RMSD-based clustering is, like any other method, vulnerable to Euclidean distances in high-dimensional spaces becoming uninformative and, in particular, to obscuring of the signal by uneven variances (a reason why very commonly terminal parts of polymers are excluded from such analyses). The alignment step for both this and the next option can be disabled with the help of keyword CALIGN (RoTr is then simply the identity operator). Without alignment, external degrees of freedom become part of the distance criterion. 
This is similar to the previous option, and is only relevant if alignment is performed.
Then, this option allows the user to split the atomic index sets used for alignment and distance computation, i.e., the
alignment operator, RoTr, minimizes pairwise distances computed over an independent set of atoms that can either be a superset,
subset or completely different set of atoms than the one specified via CFILE.
Then, if we term the distance set {D} and the alignment set {A}, with {A} to be provided via
ALIGNFILE, the distance between two states will be given as:
d_{l↔m} = [ (1.0/D) · Σ_{d}^{D} ( r_{d}^{l} − RoTr_{{A}}(r_{d}^{m}) )^{2}]^{1/2}
Note that choosing disparate sets can easily destroy the fundamental meaning of alignment, i.e., the removal of differences caused purely by external (rigid-body) degrees of freedom. This in turn would almost certainly lead to violations of the assumption that members of different clusters are dissimilar and can also eliminate the notion of similarity amongst members of the same cluster. Conversely, it can be useful in improving the signal-to-noise ratio for cases where one is interested in states populated by a specific part of a much larger system that moves as a single entity (specifically, states characterized by relative arrangements of parts of a system may emerge more clearly if alignment is performed on the whole entity, but distances are computed only over a small portion of interest). Note that errors in calculations relying on mean cluster properties, computed for example in the tree-based algorithm or hierarchical clustering (→ CMODE) using mean linkage, can easily become large if the two atom sets have little overlap. Specifically, a cluster of snapshots that are similar as determined by the distance set, but whose elements differ strongly in the alignment set, will produce deteriorating accuracy in, for example, computing a snapshot's mean distance to it. This is because the heterogeneity of the alignment operator is masked by the simplified algebra used to compute these properties in constant time. 
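To make the superposition step concrete, here is an illustrative sketch of the aligned RMSD of options 5-6. CAMPARI uses a quaternion-based algorithm; this sketch substitutes the equivalent SVD-based (Kabsch) solution for the optimal rotation and is not CAMPARI code:

```python
import numpy as np

def aligned_rmsd(ref, mob):
    """RMSD of mob vs. ref after optimal superposition (sketch).

    ref, mob: (K x 3) coordinate arrays. Translation is removed by
    centering; the optimal rotation is obtained via SVD of the
    covariance matrix (Kabsch), with a sign correction to exclude
    improper rotations (reflections).
    """
    a = ref - ref.mean(axis=0)               # remove translation
    b = mob - mob.mean(axis=0)
    h = b.T @ a                              # 3x3 covariance matrix
    u, s, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # avoid improper rotation
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = a - b @ rot.T                     # residual after superposition
    return np.sqrt((diff ** 2).sum() / len(ref))
```

For two conformations differing only by a rigid-body motion, the result is zero to machine precision, which is exactly the external-coordinate invariance that alignment is meant to provide.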
Let us define a set of K interatomic distances, {r_{ij}} over unique atom pairs i and j. These distances
are collected throughout the run. A list can be provided by using a dedicated input file
(→ CFILE), otherwise all possible, unique interatomic distances
are used (this latter scenario is rarely desirable due to redundancy, efficiency, and memory
requirements). The distance between two states will then be given as:
d_{l↔m} = [ (1.0/K) · Σ_{k}^{K} ( r_{ij(k)}^{l} − r_{ij(k)}^{m} )^{2}]^{1/2}
I.e., the chosen distance metric is simply the root mean square deviation across the set of interatomic distances. Distance-based clustering inherently removes external degrees of freedom from the proximity measure and is therefore suitable for most applications. As with any other measure, Euclidean distances in high-dimensional spaces may become uninformative, and results may be obscured by uneven variances. 
This is identical to the previous option except that each distance is weighted by the combined mass of the constituting atoms. The distance between two states will then be given as:
d_{l↔m} = [ (Σ_{k}^{K} (m_{i(k)}+m_{j(k)}) )^{−1} · Σ_{k}^{K} (m_{i(k)}+m_{j(k)}) · ( r_{ij(k)}^{l} − r_{ij(k)}^{m} )^{2}]^{1/2}
Here, m_{i} denotes the mass of atom i.
CREDUCEDIM
If data for structural clustering are to be collected (→ CCOLLECT), and principal components are to be computed (→ PCAMODE), this keyword allows the user to elect to run the clustering algorithm (→ CMODE) on a dataset of reduced dimensionality that corresponds to the first N_{V} data vectors in the transformed space, where N_{V} is set by the choice for this keyword. It is assumed that the principal components are sorted from largest to smallest variance such that the maximum amount of variance (which hopefully corresponds to important information) is retained. Note that the data obtained from PCA are always interpreted as simple, aperiodic signals, i.e., none of the peculiarities for different choices of CDISTANCE are considered any longer. For instance, when using positional RMSD with alignment, alignment is performed only during PCA. Note that PCs are the result of a linear transform, meaning that clustering results are invariant if this keyword is set to the actual dimensionality of the data collected originally (and no other approximations are introduced).
CMODE
If data for structural clustering are to be collected (→ CCOLLECT), this keyword allows the user to specify the algorithm by which the accumulated data are to be clustered. Before going into detailed options, a few general words are in order: CAMPARI strives to allow the geometric and other net quantities of a collection of snapshots to be computable irrespective of which metric of proximity is chosen (→ CDISTANCE). For options 3 and 7 this is trivial. For option 1, periodicity has to be accounted for. This is solved approximately by i) making sure the proper image of an added snapshot is considered, and ii) adding appropriate periodic shifts to the geometric center increments each time a boundary violation is found after updating. Other transforms are corrected accordingly. Options 2 and 4 incur the use of several additional cluster sums (means) due to the changing weights (for details, the reader is referred to the source code). For option 5 (atomic RMSD), the first member of a cluster defines a reference frame. This frame is used for alignment of all subsequently added frames (therefore, the definition and all derived quantities are approximate, although the error is usually negligible for small clusters). Options 6 and 8 are the corresponding mass-weighted equivalents of 5 and 7 and work by storing the mass-weighted geometric center, i.e., for option 6, M_{t}^{−1} m • x_{k} is aligned and accumulated. Here, "•" denotes the Hadamard product (element-by-element multiplication), m is the mass vector (three entries per atom with identical mass), and M_{t} is the sum of all masses of atoms in the coordinate set.
 With the geometric center being defined, certain properties of a cluster are computable at constant cost with respect
to cluster size. For example, the average distance from the center ("radius") is given as:
R^{2} = N^{−2} · [ N · Σ_{k}^{N} x_{k}^{2} − (Σ_{k}^{N} x_{k})^{2}]
x_{k} denotes the coordinate vector belonging to the k^{th} member of the cluster. Other properties such as the mean snapshottosnapshot distance ("diameter") are similarly available. All that is required is that (in the simplest case) each cluster accumulates its linear sum vector and squared sum during construction.
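A minimal sketch of these accumulators (illustrative names, not CAMPARI source): each cluster stores its size, linear sum vector, and squared sum, from which the squared radius follows at constant cost via the formula above:

```python
class ClusterSums:
    """Per-cluster accumulators: size, linear sum, squared sum (sketch)."""

    def __init__(self, dim):
        self.n = 0
        self.lin = [0.0] * dim     # running sum of the x_k
        self.sq = 0.0              # running sum of |x_k|^2

    def add(self, x):
        self.n += 1
        self.lin = [a + b for a, b in zip(self.lin, x)]
        self.sq += sum(v * v for v in x)

    def radius2(self):
        # R^2 = N^-2 * [ N * sum |x_k|^2 - |sum x_k|^2 ]
        lin2 = sum(v * v for v in self.lin)
        return (self.n * self.sq - lin2) / self.n ** 2
```

The result agrees with the mean squared distance of the members from the geometric center computed directly, but requires neither storing nor revisiting the individual members.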
 The data are clustered according to the leader algorithm. This is a very simple algorithm that sequentially scans the data. Each new snapshot is compared to the center snapshots of preexisting clusters and added to the first one for which a provided distance threshold is satisfied (→ CRADIUS). If no such cluster is found, a new cluster is spawned. Results will be input order-dependent, and clusters will have ill-defined "centers" since the central snapshot is set at the time the cluster is spawned and remains unchanged. Processing direction(s) can be chosen with the auxiliary keyword CLEADER.
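A minimal sketch of the leader pass (forward processing of both data and clusters is assumed here; the actual search directions are controlled by CLEADER, and all names are illustrative):

```python
def leader_cluster(data, radius, dist):
    """Leader algorithm sketch.

    data: sequence of snapshots; dist: distance function between two
    snapshots; radius: threshold (CRADIUS). Each snapshot joins the
    first existing cluster whose *center snapshot* lies within the
    threshold, otherwise it spawns a new cluster. Results depend on
    the processing order.
    """
    centers, clusters = [], []
    for i, x in enumerate(data):
        for c, members in zip(centers, clusters):
            if dist(x, c) <= radius:
                members.append(i)
                break
        else:
            centers.append(x)     # the spawning snapshot stays the center
            clusters.append([i])
    return clusters
```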
 The data are clustered according to a modified leader algorithm. This works very similarly to the standard leader algorithm with two important modifications. First, each new snapshot is compared to the current geometric center of preexisting clusters to evaluate the threshold criterion. Second, the result is (optionally → CREFINE) postprocessed and snapshots belonging to smaller clusters that would also satisfy the threshold criterion for a larger cluster are transferred to that larger cluster. There are exactly two passes over the data of this refinement step (iteration is difficult and timeconsuming due to continuously changing cluster centers). Processing direction(s) can be chosen with the auxiliary keyword CLEADER and the threshold criterion is set via CRADIUS. Modified leaderbased clustering tends to generate fewer clusters compared to the standard leader algorithm due to better cluster centers. Due to centers changing position, the maximum snapshottosnapshot distance is no longer guaranteed to be below twice the value for CRADIUS (although in typical scenarios violations are very rare).

The data are clustered according to a hierarchical algorithm. In theory, a hierarchical algorithm works by first creating
a sorted list of all N(N−1)/2 unique snapshot-to-snapshot distances. Starting with the shortest distance, the two constituting
snapshots do one of the following:
 They spawn a new cluster (if they are both unassigned and the threshold criterion is fulfilled).
 They merge the two clusters they belong to (if they are both assigned and the threshold criterion is fulfilled).
 The cluster the previously assigned snapshot is part of is appended with the unassigned snapshot (if one of them is unassigned and the threshold criterion is fulfilled).
 They terminate the algorithm (if the threshold criterion is not fulfilled).
Because the problem as stated is intractable for large datasets, CAMPARI uses a dedicated scheme to keep the computation as feasible as possible. In the first step, a snapshot neighbor list is generated that uses a truncation cutoff set by CCUTOFF. The neighbor list generation uses a preprocessing trick that aims to minimize the number of required distance calculations. This preprocessing step relies on a truncated leader algorithm whose target (threshold) cluster size is set by the (borrowed) keyword CMAXRAD. The resultant clusters are then used to screen groups of snapshot pairs and to exclude them from distance computations. Unfortunately, the problem of dimensionality often renders this procedure worthless. In high-dimensional spaces (→ CFILE), volume grows with distance so quickly that the distance spectrum becomes increasingly δ-function-like, and in turn becomes unsuitable for exploiting additive relationships. This stems from conformational distances having a rigorous upper bound for systems in finite volume and with fixed topology. The situation is complicated further if many of the dimensions are tightly correlated (such that the effective number of dimensions is indeed lower). Alternatively, this neighbor list can be read in from a previously obtained file (→ NBLFILE). The neighbor list is then further truncated to exactly match the size threshold specified via CRADIUS. For the algorithm to work properly, CCUTOFF has to be at least twice the value of CRADIUS. From this truncated list, a global list is created and sorted according to size. This can be quite memory-demanding. The global list is then fed into the algorithm as described. The results of hierarchical clustering depend very strongly on the linkage criterion (→ CLINKAGE). 
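The core of the sorted-list pass is easiest to see for minimum linkage, where it reduces to processing edges shortest-first with a union-find structure (an illustrative sketch, not CAMPARI code; the other linkage criteria require additional bookkeeping per merge decision):

```python
def single_linkage(n, sorted_edges, threshold):
    """Hierarchical clustering with minimum linkage (sketch).

    n: number of snapshots; sorted_edges: list of (d, i, j) tuples
    sorted by distance d (the truncated neighbor list); threshold:
    merge criterion. Edges are processed shortest first; the first
    edge exceeding the threshold terminates the algorithm.
    """
    parent = list(range(n))

    def find(i):                        # union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for d, i, j in sorted_edges:
        if d > threshold:
            break                       # sorted input: we can stop here
        parent[find(i)] = find(j)       # merge the two clusters

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```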
The data are arranged according to a method described in detail elsewhere (→ reference). Briefly,
by using a specified criterion of distance, either the exact or an approximate minimum spanning
tree (MST) of the graph constituted by all trajectory snapshots (vertices) and the N·(N−1)/2 unique, pairwise distances is constructed.
Provided a certain starting snapshot, the MST is used to generate a sequence of snapshots (progress index) in which a snapshot added has the minimum distance
to any other snapshot that has already been added, i.e., it is the object nearest to the current set of objects in the sense of minimum
linkage (as discussed in a different context elsewhere). The complete progress index is then simply a sequence of
snapshots that is likely to group similar objects together. This assumes that the phase space density is sufficiently inhomogeneous, i.e.,
there are enclosed regions (basins) that are sampled preferentially and that consequently have higher point density than the regions
connecting them. It is important to keep in mind that the chosen distance may transform the full, underlying phase space.
The idea of this method is to provide an annotation function for the progress index that contains kinetic (or effectively kinetic) information. This assumes that the evolution of the system is incremental and happens on a continuous manifold. Therefore, apparent jumps in phase space such as those introduced by the replicaexchange methodology may diminish the interpretability of the results obtained with this algorithm (unlike purely structural clustering algorithms). There are a few possible annotation functions, and they are discussed further in the documentation of the corresponding output file. For practical concerns, there is a methodological choice to pick either the exact or the approximate scheme (→ CPROGINDMODE) in addition to providing a starting snapshot (→ CPROGINDSTART). There are further keywords associated exclusively with this methodology (see CPROGINDRMAX, CPROGINDWIDTH, CBASINMIN, and CBASINMAX) as well as auxiliary keywords overlapping with other approaches (see CPROGINDMODE for details). 
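The progress index generation can be sketched as a Prim-style traversal: starting from a chosen snapshot, repeatedly append the snapshot with minimum distance to any snapshot already added (illustrative Python; the actual implementation operates on a precomputed exact or approximate MST):

```python
def progress_index(n, dist, start):
    """Progress index by minimum linkage (sketch).

    n: number of snapshots; dist(i, j): distance function; start:
    index of the starting snapshot (CPROGINDSTART). On a complete
    graph this traversal follows the minimum spanning tree (Prim's
    algorithm), so the ordering matches the MST-based scheme.
    """
    seq = [start]
    best = {j: dist(start, j) for j in range(n) if j != start}
    while best:
        nxt = min(best, key=best.get)    # nearest object to the current set
        seq.append(nxt)
        del best[nxt]
        for j in best:                   # update minimum-linkage distances
            best[j] = min(best[j], dist(nxt, j))
    return seq
```

For data with two well-separated basins, the sequence exhausts the basin containing the starting snapshot before crossing over, which is exactly the grouping behavior the annotation functions exploit.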
The data are clustered according to a treebased algorithm (→ reference) that
shares architectural similarities with the BIRCH clustering algorithm.
The BIRCH scheme is focused on achieving i) minimal I/O cost and processing of datasets that exceed the available
memory in size; ii) time that increases quasilinearly with the size of the dataset; iii) stable clusterings
(invariant to input size or repeated application) that require only local information.
To do so, a hierarchical tree is constructed consisting of several levels down to the outermost leaf nodes which represent
the final clustering. The tree is built incrementally while the data are scanned. In the first pass, each new data point will propagate up
the tree (from the root) and be associated with the closest member of the set considered or (at the leaf level) spawn a new cluster in
case the (single) threshold criterion cannot be satisfied.
Upon page size violations, a split is induced that may propagate all the way down to the root.
The criteria for the splits are governed by considerations of consumed memory and disk space ("page size").
The initial result is an approximate clustering that no longer represents individual data points but rather sets of points as so-called clustering feature (CF) vectors (size, linear sum, squared sum, see above). The clustering is made stable
by postprocessing the initial tree in two steps: first, leaf nodes are reclustered with a hierarchical scheme;
second, all data points are sequentially redistributed into the tree resulting from the previous step (this makes the clustering stable and eliminates
a specific type of error of an identical data point ending up in two different clusters).
The tree algorithm implemented in CAMPARI differs from the BIRCH scheme, amongst other changes, by dropping requirements i) and iii) mentioned above. The entire dataset is kept in memory. The tree is assumed to be of a set height (number of hierarchical levels → BIRCHHEIGHT) whose levels span a provided range of threshold criteria (upper bound set by CMAXRAD). The algorithm then proceeds in the first pass by choosing, at each level up to the penultimate one, the closest cluster and subsequently scanning only the child clusters of the chosen one. If the threshold at the given level is violated, a new cluster is created at that level. Path searching will still continue using the children of the nearest cluster even if the threshold is not satisfied. This defines a unique path through the tree. Because the children of a parent can probe a larger phase space volume than the parent itself, it can occur that failed assignments recover further up the tree. In such a case, the recovering child is linked with the newly created clusters and now possesses an extra parent. The first pass provides a fixed tree up to the penultimate level. In the second pass, the procedure is repeated with two modifications: i) the leaf level is now included (analogously); ii) non-leaf clusters are no longer appended. The resultant clustering is obtained as the collection of leaf nodes at the end of the second pass.
The employed definition of proximity is that of the distance of the snapshot to the geometric center of the cluster. This leads to centroid drift while clusters are created at all levels, and means that assignments can still fail even in the second pass (handled identically to the first pass in such a case). We do keep a list of indices into the dataset associated with each cluster to be able to later quickly access that information. As for refinement, the challenge is to find protocols that do not exceed the time/space complexity of the algorithm itself. Currently, there is only one type of optional refinement step that will locally merge leaf clusters that have different, but proximal parent clusters, if the diameter of the joint cluster decreases upon merging (relative to the individual values).
The tree-based algorithm is extremely fast (especially if refinement is skipped) and will generate more clusters than the leader algorithm with the same setting for CRADIUS. However, the cluster distribution is altered nonuniformly (the largest clusters in the tree-based algorithm will often be larger, but the number of very small clusters (1-5 snapshots) will increase substantially, especially for large heights). Overall, the clusters tend to be substantially tighter. In essence, the multiple hierarchical levels act as a layered array of filters that creates a resultant net pore size smaller than that of any one of the filters by itself.
CFILE
If data for structural clustering are to be collected (→ CCOLLECT), this keyword provides the path and location of an input file selecting a subset of the possible coordinates. For options 1-4 of the proximity measure, this file is a single-column list of indices specifying specific system torsions (see elsewhere). For options 5-6, it is a single-column list of atomic indices (see elsewhere). Lastly, for options 7-8, it is a list of pairs of atomic indices (two columns, see elsewhere).
CALIGN
If structural clustering is performed (→ CCOLLECT), and an atomic RMSD variant is chosen as the proximity measure (→ CDISTANCE), this keyword can be used to specifically disable the alignment step that occurs before the actual RMSD of the two coordinate sets is computed. To achieve this, provide any value other than 1 (the default) for this on/off-type keyword.
CCUTOFF
If data for structural clustering are to be collected (→ CCOLLECT), and an algorithm is used that requires a rigorous snapshot neighbor list (currently either hierarchical clustering or the exact variant of the progress index-based scheme → CMODE), this keyword defines the cutoff distance for said neighbor list. It is critical to choose an appropriate (as small as possible) value for this parameter as otherwise CAMPARI will both run out of (virtual) memory and create humongous files that are written to disk. Note that even with a minimal setting, the problem of computing and storing the neighbor list can very easily become intractable. Often, simulation data in high-dimensional spaces will be clustered very unevenly in space, meaning that multiple "length scales" in distance space matter. This is detrimental to a neighbor list relying on defining a single, specific length scale through CCUTOFF.
NBLFILE
If data for structural clustering are to be collected (→ CCOLLECT), and an algorithm is used that requires a rigorous snapshot neighbor list (currently either hierarchical clustering or the exact variant of the progress index-based scheme → CMODE), this keyword can be used to provide name and location of an input file in the appropriate format. CAMPARI uses the versatile binary NetCDF format for this purpose, and consequently the code needs to be linked to the NetCDF library for this option to be available (see installation instructions). Most commonly, this type of file will have been created by CAMPARI itself (it is automatically written if the code is linked against NetCDF and if an algorithm is used that requires a neighbor list → corresponding documentation). This keyword is primarily meant to circumvent the costly neighbor list generation in subsequent applications of the algorithm (for instance, with different settings for CRADIUS).
CRADIUS
If structural clustering is performed (→ CCOLLECT), and an algorithm is used that uses a distance (span) threshold criterion (→ CMODE), this keyword sets the value for said threshold criterion. For leader-based clustering, this is either the distance from the center snapshot (standard leader) or from the current geometric center (modified leader) and therefore constitutes a maximum cluster radius. For hierarchical clustering, twice this value is the maximum distance of any two snapshots to be part of the same cluster, so again CRADIUS will control the maximum cluster radius. For tree-based clustering, this keyword again sets the maximum distance from the current geometric center. Values are to be provided in Å for proximity measures 5-8, unitless for 3-4, and in degrees for 1-2 (→ CDISTANCE).
CREFINE
If structural clustering is performed (→ CCOLLECT), this simple logical keyword lets the user control whether to apply any possible refinement strategies to the initial clustering results. Currently, there are two such procedures: for the modified leader algorithm, a refinement procedure is available that redistributes polyvalent snapshots to larger clusters. For the tree-based algorithm (for descriptions of these methods see elsewhere), a possible refinement consists of a (non-iterative) merging of clusters with sufficient overlap.
CLEADER
If structural clustering is performed (→ CCOLLECT), and a leader-based algorithm is used (→ CMODE), this keyword allows the user to alter the processing directions of the leader algorithm by the following codes:
 The collected trajectory data are processed forward. Clusters are searched backward (starting with the most recently spawned one).
 The collected trajectory data are processed forward. Clusters are searched forward (starting with the one spawned first).
 The collected trajectory data are processed backward. Clusters are searched backward (starting with the most recently spawned one).
 The collected trajectory data are processed backward. Clusters are searched forward (starting with the one spawned first).
CLINKAGE
If structural clustering is performed (→ CCOLLECT), and the hierarchical algorithm is used (→ CMODE), this keyword allows the user to choose between different linkage criteria:
 Maximum linkage: Appending a cluster with a snapshot implies that the new snapshot is less than twice the value for CRADIUS away from all snapshots currently part of the cluster. For merging two clusters, maximum linkage implies that all possible intercluster distances satisfy the threshold condition. This creates clusters with an exact upper bound for their diameter (maximum intracluster distance) and therefore resembles leader clustering.
 Minimum linkage: Appending a cluster with a snapshot implies that the new snapshot is within a distance of twice the value for CRADIUS of at least one snapshot already contained in the cluster. Merging two clusters implies that at least one intercluster distance satisfies the threshold condition. With a minimum linkage criterion clusters no longer have a welldefined radius and tend to get very large unless tiny values are used for CRADIUS. This is rarely a useful option for molecular simulation data.
 Mean linkage: Appending a cluster with a snapshot implies that the snapshot is within a distance of CRADIUS of the current geometric center of the cluster. Merging two clusters implies that their respective geometric centers are within a distance of CRADIUS of one another. This will create clusters that no longer have a rigorous upper bound for the intracluster distance and therefore resembles the modified leader algorithm.
CMAXRAD
If structural clustering is performed (→ CCOLLECT), and the tree-based algorithm or the approximate progress index-based scheme is used (→ CMODE), this keyword sets the upper distance threshold value for the hierarchical tree, i.e., it corresponds to the coarsest threshold used outside of the (virtual) root (see BIRCHHEIGHT for additional details).
BIRCHHEIGHT
If structural clustering is performed (→ CCOLLECT), and the tree-based algorithm or the approximate progress index-based scheme is used (→ CMODE), this keyword sets the number of hierarchy levels in the algorithm. Briefly, the tree-based algorithm works by defining a series of threshold criteria (set by interpolating between CRADIUS and CMAXRAD) that define hierarchical levels. Each snapshot follows a specific path through the tree structure that is defined by identifying the closest existing cluster at each hierarchy level, where only those clusters are searched that belong to the parent cluster chosen at the next higher level. The base of the tree is never counted as it always encloses all snapshots, so by specifying 1 for BIRCHHEIGHT one can recover an algorithm that is, in its basic outline, very similar to the modified leader scheme (see CMODE). Larger numbers of levels generally lead to the formation of more clusters. This is because of a specific type of error that is linked to the children of a cluster (i.e., a set of clusters at the next finer level) overlapping with children from a nearby cluster. If a snapshot then proceeds through such a hierarchy on a path exploring only the children of a single cluster, chances inevitably increase that an actual, appropriate target cluster at the finest level is missed. Then, a new cluster at the finest level is likely to be spawned. In terms of the snapshots contained, this new cluster could theoretically be combined with other clusters without the maximum intracluster distance ever exceeding the distance threshold.
To combat these errors, CAMPARI refines the results obtained via tree-based clustering by applying a merging scheme to all pairs of clusters that belong to parent clusters at the next higher level that have sufficiently close geometric centers themselves. However, the merging requirement is extremely stringent: the average intracluster distance has to decrease upon merging. While it would be possible to apply alternative merging criteria, those would either be too expensive to compute (remember that the average intracluster distance is available in constant time with respect to the cluster sizes) or would run the risk of diluting the threshold criterion and creating clusters that contain severe outliers.
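The constant-time availability of the average intracluster distance can be illustrated with summary statistics. Assuming a squared Euclidean metric and clusters summarized as (size, vector sum, sum of squared norms), a hedged sketch of the merging test might look as follows (the exact criterion CAMPARI applies may differ in detail):

```python
def mean_sq_intra(n, s1, s2):
    """Mean pairwise squared distance of a cluster from its summary
    statistics: size n, coordinate vector sum s1, sum of squared norms s2.
    Uses sum_{i<j} |xi - xj|^2 = n * s2 - |s1|^2."""
    if n < 2:
        return 0.0
    return 2.0 * (n * s2 - sum(x * x for x in s1)) / (n * (n - 1))

def should_merge(a, b):
    """Accept the merge only if the merged mean intracluster (squared)
    distance does not exceed the size-weighted average of the two
    separate clusters. Clusters are (n, s1, s2) tuples."""
    na, s1a, s2a = a
    nb, s1b, s2b = b
    merged = (na + nb, [x + y for x, y in zip(s1a, s1b)], s2a + s2b)
    before = (na * mean_sq_intra(*a) + nb * mean_sq_intra(*b)) / (na + nb)
    return mean_sq_intra(*merged) <= before
```

Because the merged summary statistics are just the sums of the originals, the test costs O(1) regardless of how many snapshots either cluster holds.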
CPROGINDMODE
If structural clustering is performed (→ CCOLLECT), and the progress index-based algorithm is used (→ CMODE), this keyword allows the user to choose between the exact (1) and the approximate scheme (2 = default). The two cases differ as follows: In the exact scheme, CAMPARI attempts to construct the exact minimum spanning tree (MST) for the trajectory of interest. This is achieved by following the same setup procedure used in hierarchical clustering (described under option 3 to CMODE), i.e., a heuristics-based scheme is used to construct a neighbor list in snapshot space up to a certain hard cutoff. Alternatively, the neighbor list can be read from a dedicated input file. From this list, a globally sorted list of near distances is constructed. This setup work provides the foundation to construct the exact MST without additional parameters via Kruskal's algorithm. The high cost (both in terms of time and memory) makes the exact scheme impractical for large data sets. Note that the neighbor list must be sufficient for the algorithm to run. This means that all the edges of the exact MST have to occur in the neighbor list, which is unfortunately not guaranteed even if each snapshot has multiple neighbors listed. Potential failures are therefore difficult to predict.
 In the approximate scheme, CAMPARI utilizes a two-stage approach. The goal is to improve upon the large computational cost associated with the exact scheme without sacrificing too much of the information encoded in the progress index. First, the trajectory is structurally clustered using the highly efficient, tree-based algorithm (described under option 5 to CMODE). This hierarchical tree of groups of snapshots (clusters) is not to be confused with the approximate MST we wish to generate. Because the tree-based clustering is used, keywords CRADIUS, CMAXRAD, and BIRCHHEIGHT are all relevant. The hierarchical tree is then used to first grow a set of individual components of the approximate MST by considering only snapshots within a cluster at the finest level. As the components grow, they are eventually merged, and levels further toward the root of the hierarchical tree are utilized. This procedure emulates Borůvka's algorithm with a search space limited by the hierarchical tree. Because the spanning tree thus constructed is not strictly minimal, it is important to update component memberships after each merging operation. The algorithm depends on a parameter regulating the number of search attempts used for finding the next-nearest neighbor of a growing MST component (→ CPROGINDRMAX). Depending on the settings, the algorithm is expected to run in approximately N·logN time with the constant prefactor determined by the clustering and the choice for CPROGINDRMAX. Similarly, the quality of the generated MST depends nontrivially both on the aforementioned search parameter and on the properties of the tree-based clustering. Because of algorithmic limitations, it is extremely unlikely that the approximate MST is in fact the exact MST for trajectories of appreciable length. This implies that, speculatively, the only asymptotic limit for recovering the exact MST is that of having all snapshots be in a single cluster and choosing CPROGINDRMAX in excess of the trajectory size.
This limit is clearly impractical as it requires at least O(N^{2}) time.
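For reference, Kruskal's algorithm on a precomputed, globally sorted edge list can be sketched as follows (a minimal illustration with union-find; the data layout is hypothetical and not CAMPARI's internal one). If the neighbor list is missing MST edges, the routine silently returns a spanning forest instead of a tree, mirroring the failure mode described above:

```python
def kruskal_from_neighbor_list(n_snaps, edges):
    """Build an MST from edges given as (distance, i, j) tuples.
    Returns the accepted edge list; if the input edge set is
    incomplete, the result is a spanning forest, not a tree."""
    parent = list(range(n_snaps))

    def find(i):
        # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    mst = []
    for d, i, j in sorted(edges):       # ascending by distance
        ri, rj = find(i), find(j)
        if ri != rj:                    # accept only cycle-free edges
            parent[ri] = rj
            mst.append((d, i, j))
            if len(mst) == n_snaps - 1:
                break
    return mst
```

A caller can detect the insufficient-neighbor-list failure by checking whether `len(mst) == n_snaps - 1` after the run.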
CPROGINDSTART
If structural clustering is performed (→ CCOLLECT), and the progress index-based algorithm is used (→ CMODE), this keyword allows the user to pick a specific snapshot to serve as the starting point for the generation of the progress index (the default is the first snapshot). Note that the snapshot indexing refers to the sequence of analyzed snapshots, and not to general simulation settings or the input trajectory itself in trajectory analysis mode, i.e., it depends on the choice for CCOLLECT. As a special option, specifying zero instructs CAMPARI to find a set of suitable starting snapshots. These are generally found by generating a sample profile (discussed elsewhere) that is then scanned for extrema using an automated detection system that can be tuned with two additional keywords, CBASINMAX and CBASINMIN. The idea behind this is to generate profiles starting from a complete set of putative basins. If this automatic detection is unsuccessful, CAMPARI will revert to using the first snapshot as the starting point.
As a further option that is only available in the approximate scheme (→ CPROGINDMODE), a specified value of "-1" instructs CAMPARI to use as the starting snapshot the central snapshot of the largest cluster found during the preparatory tree-based clustering.
This keyword can serve an alternative function for specifying the target cluster (ordered by size) for the generation of a cut-based pseudo free energy profile (→ CMSMCFEP). For this second function, requests corresponding to the special choices above (-1 or 0) also work, albeit in different ways. If CPROGINDSTART is -1, CAMPARI will set the reference cluster as the one that contains the first snapshot in the accumulated data for clustering. The algorithm will proceed as if CPROGINDSTART had been given as the corresponding positive number. If CEQUILIBRATE is 1, and if CPROGINDSTART is 0, CAMPARI will find all strongly connected components of the underlying graph and use the largest cluster within each component (subgraph) as the reference for multiple, distinct cut profiles (separate output files). Lastly, if CEQUILIBRATE is 0, and if CPROGINDSTART is also 0 (implying that the entire graph is analyzed irrespective of connectedness), CAMPARI defaults to using the largest cluster as the (only) target cluster.
There are two compatibility options for the case where the approximate progress index method is used and the user is also interested in obtaining a similar cut profile from the auxiliary clustering itself. A setting of -2 homogenizes the reference as the largest cluster (clustering) or its representative snapshot (progress index), and a setting of -3 homogenizes the reference as the first snapshot (progress index) or the cluster containing it (clustering).
CBASINMAX
If structural clustering is performed (→ CCOLLECT), the progress index-based algorithm is used (→ CMODE), and an automatic determination of multiple starting snapshots for profiles is requested (→ CPROGINDSTART), this keyword controls how a test profile using the standard annotation function described elsewhere is parsed to automatically identify minima in this function. Specifically, around each eligible point in the profile, environments of varying sizes are considered, and the following criteria are used:
 The sum of values to the left over a stretch of n_{e} points must be greater than the sum of values over a stretch of n_{e} points centered at the point currently considered.
 The sum of values to the right over a stretch of n_{e} points must be greater than the sum of values over a stretch of n_{e} points centered at the point currently considered.
 The sum of values to the left and right over a stretch of n_{e} points each must be greater than a reference sum that is given as twice the sum of values over a stretch of n_{e} points centered at the point currently considered plus 4n_{e}.
 The left (far) half of the sum of values to the left over a stretch of n_{e} points must be greater than the right (near) one.
 The right (far) half of the sum of values to the right over a stretch of n_{e} points must be greater than the left (near) one.
 No point toward the left over a stretch of n_{e} points must be greater than or equal to the point currently considered.
 No point toward the right over a stretch of n_{e} points must be greater than the point currently considered.
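The criteria above might be transcribed as follows. This is an illustrative paraphrase only; in particular, the exact alignment of the left, right, and centered stretches relative to the candidate point in CAMPARI's detection code is an assumption here:

```python
def is_minimum(vals, k, ne):
    """Test candidate index k of profile vals against the listed criteria,
    using an environment size ne (stretch alignment is an assumption)."""
    if k - ne < 0 or k + ne >= len(vals):
        return False                      # insufficient environment
    left = vals[k - ne:k]                 # ne points to the left
    right = vals[k + 1:k + ne + 1]        # ne points to the right
    lo = k - ne // 2
    center = vals[lo:lo + ne]             # ne points centered at k
    h = ne // 2
    checks = [
        sum(left) > sum(center),                              # criterion 1
        sum(right) > sum(center),                             # criterion 2
        sum(left) + sum(right) > 2 * sum(center) + 4 * ne,    # criterion 3
        h == 0 or sum(left[:h]) > sum(left[-h:]),             # criterion 4
        h == 0 or sum(right[-h:]) > sum(right[:h]),           # criterion 5
        all(v < vals[k] for v in left),                       # criterion 6
        all(v <= vals[k] for v in right),                     # criterion 7
    ]
    return all(checks)
```

In the actual detection scheme, environments of varying size are scanned, i.e., such a test would be repeated for a range of n_{e} values between the bounds set by CBASINMIN and CBASINMAX.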
CBASINMIN
If structural clustering is performed (→ CCOLLECT), the progress index-based algorithm is used (→ CMODE), and an automatic determination of multiple starting snapshots for profiles is requested (→ CPROGINDSTART), this keyword controls the minimum value considered for n_{e} as explained in the documentation of keyword CBASINMAX.
CPROGINDRMAX
If structural clustering is performed (→ CCOLLECT), the progress index-based algorithm is used (→ CMODE), and the approximate version is chosen (→ CPROGINDMODE), this keyword controls the maximum number of attempts for a random search (with replacement) for the next correct MST neighbor of a growing MST component. Since such a search first exhausts the possibilities within a given cluster of the hierarchical tree underlying the approximate algorithm, the parameter is strictly a maximum value. If the number of eligible snapshots is less than this parameter, random searching is replaced with a systematic loop over all eligible snapshots. In both cases, the eligible snapshot with the minimum distance to the MST component under consideration is used to create the next link of the approximate MST.
CPROGINDWIDTH
If structural clustering is performed (→ CCOLLECT), and the progress index-based algorithm is used (→ CMODE), this keyword controls the auxiliary annotation function defined elsewhere. Specifically, it corresponds to the parameter l_{p} in the documentation found by following the link.
CMSMCFEP
If structural clustering is performed (→ CCOLLECT), which includes the case of the approximate progress index method, this keyword allows the user to select a type of cut-based pseudo free energy profile to be computed (reference). The target node for this profile can be chosen with keyword CPROGINDSTART. Currently, there is only one fully supported option (some hidden options exist, which will not be discussed): The transition matrix is inferred from the simulation trajectory and the associated coarse-graining (clustering). The mean first-passage times to the target node (largest cluster by default) in the Markov state model approximation are computed iteratively. After sorting all clusters according to these mean first-passage times, partitions can be defined as a function of a threshold time. The cut-based pseudo free energy profile associates each threshold time with the total weight of edges (number of transitions) crossing this threshold along the trajectory, and plots the normalized weight in logarithmic fashion (see elsewhere for details).
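The iterative mean first-passage time computation can be sketched for a small row-stochastic transition matrix T (a generic illustration of the fixed-point iteration, not CAMPARI's implementation):

```python
def mfpt_to_target(T, target, max_iter=100000, tol=1e-12):
    """Iterate m_i = 1 + sum_j T[i][j] * m_j with m[target] pinned to 0.
    T is a row-stochastic transition matrix given as a list of lists;
    returns the mean first-passage times to the target state."""
    n = len(T)
    m = [0.0] * n
    for _ in range(max_iter):
        new = [0.0] * n
        for i in range(n):
            if i != target:
                new[i] = 1.0 + sum(T[i][j] * m[j] for j in range(n))
        if max(abs(a - b) for a, b in zip(new, m)) < tol:
            return new
        m = new
    return m
```

Sorting clusters by the resulting values then allows each threshold time to define a partition of the network, and counting the trajectory transitions that cross the partition yields the cut weight entering the profile.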
TRAJBREAKSFILE
If any type of structural clustering is performed (→ CCOLLECT), or if the exact progress index-based algorithm is used (→ CMODE), the resultant trajectory is used to infer the properties of a network. This is relevant for output file STRUCT_CLUSTERING.graphml (the mesostate (cluster) network itself), for cut-based free energy profiles (kinetic information derived from network properties), and for the output of the progress index method. Essentially, the sequence of events in the trajectory defines a transition matrix. However, not all transitions in a trajectory may be equally valid, as they may be caused by trajectory concatenation (e.g., when using structural clustering with the MPI averaging technique), by replica exchange swaps, by nonlocal Monte Carlo moves, and so on. It may therefore be appropriate to remove such spurious transitions from the analysis in order to keep inferences regarding the underlying dynamics accurate. This is what this input file accomplishes, and the input and its interpretation are described in detail elsewhere. There are two additional notes. First, CAMPARI will not remove any transitions by default, and it may sometimes be difficult to obtain or preserve the required information (e.g., the replica exchange trace file must be used to extract the exact history of accepted swaps). Second, there is no guarantee that the graph remains intact (it may fracture into multiple, disconnected subgraphs), and this may impact the interpretability of the data in the aforementioned output files.
CEQUILIBRATE
If structural clustering is performed (→ CCOLLECT), which includes the case of the approximate progress index method, the resultant coarse-grained trajectory serves to define the graph printed out in STRUCT_CLUSTERING.graphml. The complete graph may not be strongly connected (or may even be fractured → TRAJBREAKSFILE). This keyword allows the user (setting of 1) to request that all individual components be equilibrated separately (preserving their relative weights set directly by the trajectory). In addition, the generation of cut-based free energy profiles will be reduced to the strongly connected component that the reference cluster resides in. Special rules for CPROGINDSTART can be used to request the computation of such profiles for all relevant components, always using the largest cluster within each component as the reference. By default (setting of 0), CAMPARI will include the entire graph with weights set directly from the trajectory for the data in the aforementioned output files.
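The strongly connected components referred to here can be found with standard graph algorithms. A compact sketch using Kosaraju's two-pass scheme on a hypothetical adjacency-list representation of the cluster network (not CAMPARI's internal code):

```python
def strongly_connected_components(adj):
    """Kosaraju's algorithm on a digraph given as {node: [successor, ...]}."""
    nodes = list(adj)
    visited, order = set(), []

    def dfs(v, graph, out):
        # iterative depth-first search; appends vertices in post-order
        stack = [(v, iter(graph.get(v, ())))]
        visited.add(v)
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    break
            else:
                stack.pop()
                out.append(node)

    # pass 1: record finishing order on the original graph
    for v in nodes:
        if v not in visited:
            dfs(v, adj, order)

    # reverse all edges
    rev = {v: [] for v in nodes}
    for v, succs in adj.items():
        for w in succs:
            rev.setdefault(w, []).append(v)

    # pass 2: DFS on the reversed graph in decreasing finish time;
    # each search tree is one strongly connected component
    visited.clear()
    comps = []
    for v in reversed(order):
        if v not in visited:
            comp = []
            dfs(v, rev, comp)
            comps.append(sorted(comp))
    return comps
```

Equilibrating each component separately would then amount to restricting the transition counts to the node set of each returned component before normalization.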