CAMPARI Keywords
Full Keywords Index:
 Parameter File:
 Random Number Generator:
 Simulation Setup:
 Box Settings:
 Integrator Controls (MD/BD/LD/Minimization):
 Move Set Controls (MC):
 Files and Directories:
 Structure Input and Manipulation:
 Energy Terms:
 Cutoff Settings:
 Parallel Settings (MPI and OpenMP):
 Output and Analysis:
 NetCDF Data Mining:
Preamble
The overall setup of simulations becomes increasingly involved as the number of options offered by simulation software grows, and CAMPARI is no exception here. Not all settings are relevant in all circumstances (in fact, often very few are), and a complete understanding of all keywords is clearly not required to use subsets of CAMPARI's functionality. Users should keep the following points in mind:
 Most keywords have default choices. In case of doubt, check parsekey.f90 to locate the variable associated with the selection, and then initial.f90, allocate.f90, and sometimes other files for default assignments.
 Not all keywords can be connected and arranged such that they group nicely. The documentation here groups keywords into a small number of sections, some of which end up being very large. This has both advantages and disadvantages.
 For navigation, it is highly recommended to a) search for terms within the page with the help of the browser (all keywords are described within a single html page), and b) follow the links that are provided everywhere.
 If an option is unclear, but easily testable, it is probably fastest to just try it out. If it is difficult to test, post a question on the SF forums.
 The understanding of many implemented, standard methodologies requires the corresponding literature. This is why a bibliography is provided.
 The fastest way to learn how to run basic simulations or perform trajectory analyses is to consult the various tutorials. Tutorials offer the chance to group information in a more natural workflow compared to the documentation here. They cannot explain all options in detail, though, and it is crucial to follow the links within the tutorial pages that point back to this and the other documentation pages.
Notes on Nomenclature and File Parsing:
All keywords used by CAMPARI are named FMCSC_* where the different possible strings for "*" are explained below. This means that in your key-file the correct keyword to use to specify the simulation temperature is FMCSC_TEMP and not just "TEMP". There are only two exceptions to this, viz. keywords PARAMETERS and RANDOMSEED. This has purely historical reasons (as does the ad libitum acronym "FMCSC").
The beginning of log output will print some information regarding the parsing of information in the key-file. Superfluous lines should be masked as comments using the hash character ("#"). Lines that are neither empty nor comments will be pointed out unless they correspond to the two exceptional keywords just mentioned or unless they begin with the canonical prefix "FMCSC_". The keyword parser operates hierarchically, meaning that some legitimate keywords will not be processed because the required base functionality has not been enabled (e.g., thermostat settings are not processed unless a gradient-based method is in use). This is done mostly to avoid needless warnings from popping up. All apparent keywords that have not been processed will be reported by the parser. However, the hierarchical dependence is not enforced stringently, which means that a keyword appearing in the key-file but not reported in this list does not automatically control a setting relevant to the attempted calculation. It is important to realize that the list of unprocessed keywords can also include misspelled ones. To make the detection of typos easier, it is recommended to comment out or remove unused keywords from the key-file.
Finally, most read operations of simulation settings are prone to data type mismatch errors. Supplying a character value to a numerical setting will trigger a Fortran I/O error. The error message is usually informative, but the relevant position in the key-file is not reported. I/O in general (also for input files) may be made less error-sensitive in the future, but for now we apologize for this limitation.
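As an illustration of these parsing rules, a minimal hypothetical key-file fragment might look as follows (the values are placeholders, not recommendations):

# comment lines are masked with the hash character
# the two exceptions lacking the canonical prefix:
PARAMETERS <full path to a parameter file>
RANDOMSEED 12345
# all remaining keywords carry the FMCSC_ prefix:
FMCSC_TEMP 298.0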
Parameter File Keywords:
PARAMETERS
This keyword allows the user to provide the location and name of the parameter file to be used for the simulation. The different files offered by default (shipped with CAMPARI) are listed below.
Custom Parameter Sets:
The parameter sets fmsmc*.prm are outdated and should be used with utmost caution. They contain no bonded parameters except dummy declarations and are therefore only suitable for torsional space calculations.
In general, the Lennard-Jones parameters for ions in these files require a cautionary note as they are simply those from Aqvist's work. They have not been specifically parameterized to work together with the ABSINTH continuum solvation model in case a full Hamiltonian is used (they have merely been shown to reside on the "safe" side). This is a matter of ongoing development. It may be more appropriate to use parameters for ions that feature harder cores and better congruence between σ_{ii} parameters and actual contact distances.
fmsmc.prm:
These are basic parameters fit for simulations in the excluded volume ensemble. As Lennard-Jones parameters, they employ Hopfinger radii with generic (and generally small) interaction parameters. They contain a reduced charge set derived from the OPLS brand of force fields but are thoroughly unsuitable for simulations with "complete" Hamiltonians, if only for the fact that they lack support in many places.
fmsmc_exp.prm:
This file is identical to fmsmc.prm except that pairwise LJ terms (σ_{ij}) for pairs involving a polar atom and a polar hydrogen are specifically reduced. It also lacks support for phosphorus.
fmsmc_exp3.prm:
This file is identical to fmsmc_exp.prm except that LJ interaction parameters (ε_{ii}) are raised for polar heavy atoms (nitrogen and oxygen).
fmsmc_exp2.prm:
This file is identical to fmsmc_exp3.prm except that LJ size parameters (σ_{ii}) for common atoms are bloated to approximately 107%, which makes the parameter set more OPLS-AA-like in terms of LJ parameters.
abs3.2_opls.prm:
This file combines ABSINTH LJ parameters with the full OPLS-AA/L charges including the Kaminski et al. revision. OPLS-AA/L's bonded parameters are only retained inasmuch as they are required to maintain quasi-rigid geometries (i.e., bond length and angle potentials, improper dihedral potentials, and torsional potentials around bonds with hindered rotation). Comparison to the reference parameter set may be useful. In addition, the free energies of solvation are reduced by ~30 kcal/mol for ionic groups on biomolecules. This is the file used for most published work employing the ABSINTH implicit solvation model thus far.
abs3.1_opls.prm:
This file is identical to abs3.2_opls.prm except that the free energies of solvation are not artificially lowered by ~30 kcal/mol for ionic groups on biomolecules.
abs3.2_charmm.prm:
This file combines ABSINTH LJ parameters with the full CHARMM charges from versions 22 (polypeptides) and 27 (polynucleotides), respectively. CHARMM's bonded parameters are only retained inasmuch as they are required to maintain quasi-rigid geometries (i.e., bond length and angle potentials, improper dihedral potentials, and torsional potentials around bonds with hindered rotation). Comparison to the reference parameter set may be useful. In addition, the free energies of solvation are reduced by ~30 kcal/mol for ionic groups on biomolecules. In conjunction with the ABSINTH implicit solvent model, CHARMM parameters probably offer the best combination of simplicity (small enough dipole groups) and completeness (support for both nucleotides and peptides as well as most terminal groups and some small molecules).
abs3.1_charmm.prm:
This file is identical to abs3.2_charmm.prm except that the free energies of solvation are not artificially lowered by ~30 kcal/mol for ionic groups on biomolecules.
abs3.2_charmm36.prm:
This file is identical to abs3.2_charmm.prm except that the parent force field is CHARMM36 and not CHARMM22/27 (see the documentation for CHARMM36 below).
abs3.1_charmm36.prm:
This file is identical to abs3.1_charmm.prm except that the parent force field is CHARMM36 and not CHARMM22/27 (see the documentation for CHARMM36 below).
abs3.3_charmm36.prm:
This file is identical to abs3.1_charmm36.prm except that some Lennard-Jones and free energy of solvation parameters have been adjusted. These changes have thus far (2017) not been used in published works employing the ABSINTH paradigm. As mentioned in the disclaimer above, the ion parameters in the "3.1" and "3.2" files were somewhat of a weakness, and this is at least partially addressed here, both at the level of reference free energies of solvation and at the level of Lennard-Jones parameters. It may be helpful to visualize the differences between abs3.3_charmm36.prm and abs3.1_charmm36.prm using a text editor. Nevertheless, similar to all "3.1" parameter files, buried ionic clusters will form readily when polymers carry charged moieties.
abs3.4_charmm36.prm:
This file is identical to abs3.3_charmm36.prm except that the free energies of solvation are artificially lowered by ~15 kcal/mol for ionic groups on biomolecules. This offset is only half as large as that in abs3.2_charmm.prm, which means that transient ionic interactions are more likely to occur. This lower offset is possible in part because of the changes to Lennard-Jones parameters relative to abs3.2_charmm36.prm.
abs3.2_a94.prm:
This file combines ABSINTH LJ parameters with the full AMBER charge set from the '94 revision (Cornell et al.). AMBER charges are generally not well-suited for use in conjunction with the ABSINTH paradigm since the latter is most meaningful for small dipole groups with local neutrality. AMBER charges are determined by a more or less unconstrained QM fit and spread polarization across the (arbitrary) unit of each residue (see FMCSC_ELECMODEL). AMBER's bonded parameters are only retained inasmuch as they are required to maintain quasi-rigid geometries (i.e., bond length and angle potentials, improper dihedral potentials, and torsional potentials around bonds with hindered rotation). Comparison to the reference parameter set may be useful. In addition, the free energies of solvation are reduced by ~30 kcal/mol for ionic groups on biomolecules. Please refer to the details provided for the AMBER reference force fields below for answers concerning AMBER-specific implementation details of force field parameters.
abs3.1_a94.prm:
This file is identical to abs3.2_a94.prm except that the free energies of solvation are not artificially lowered by ~30 kcal/mol for ionic groups on biomolecules.
abs3.2_a99.prm, abs3.1_a99.prm, abs3.2_a03.prm, abs3.1_a03.prm:
These files are analogous to abs3.2_a94.prm and abs3.1_a94.prm except that they incorporate AMBER parameters of revisions '99 (Wang et al.; abs3.2_a99.prm, abs3.1_a99.prm) and '03 (Duan et al.; abs3.2_a03.prm, abs3.1_a03.prm), respectively.
abs3.2_GR53a6.prm:
This file combines ABSINTH LJ parameters with full GROMOS53a6 charges. Note that GROMOS53 is a united-atom model and that aliphatic hydrogens (which do exist here) therefore carry no charge. This appears inconsistent (at least compared to other force fields, in which aliphatic hydrogens almost universally carry a small positive charge of less than 0.1e) but speeds up simulations with screened electrostatic interactions considerably. Bonded parameters are only retained inasmuch as they are required to maintain quasi-rigid geometries (i.e., bond length and angle potentials, improper dihedral potentials, and torsional potentials around bonds with hindered rotation). Comparison to the reference parameter set may be useful. In addition, the free energies of solvation are reduced by ~30 kcal/mol for ionic groups on biomolecules.
abs3.1_GR53a6.prm:
This file is identical to abs3.2_GR53a6.prm except that the free energies of solvation are not artificially lowered by ~30 kcal/mol for ionic groups on biomolecules.
abs3.2_GR53a5.prm and abs3.1_GR53a5.prm:
These files are analogous to abs3.2_GR53a6.prm and abs3.1_GR53a6.prm, only for the a5 revision of the GROMOS53 charge set.
Some recommended settings for using any of these custom parameter files are listed below. Note that these are also the settings required to achieve an exact match with the ABSINTH reference.
FMCSC_UAMODEL 0
FMCSC_INTERMODEL 1
FMCSC_ELECMODEL 2
FMCSC_MODE_14 1
FMCSC_FUDGE_EL_14 1.0
FMCSC_FUDGE_ST_14 1.0
FMCSC_EPSRULE 2
FMCSC_SIGRULE 1
FMCSC_SC_BONDED_B 0.0
FMCSC_SC_BONDED_A 0.0
FMCSC_SC_BONDED_T 0.0
FMCSC_SC_BONDED_I 0.0
FMCSC_SC_EXTRA 1.0
We do, however, recommend replacing the unity value for FMCSC_SC_EXTRA with FMCSC_SC_BONDED_T set to unity, since the above files will typically contain (unless otherwise noted) the required and "native" bonded potentials for each parent force field. This ensures better parameter coherence (the potentials used by SC_EXTRA are taken from OPLS-AA/L) and, more importantly, control over all torsional potentials (and bonded potentials in general) through the parameter file. If the system to be sampled contains proline residues, other flexible rings, or chemical crosslinks, it will also be necessary to set FMCSC_SC_BONDED_A, FMCSC_SC_BONDED_B, and FMCSC_SC_BONDED_I to 1.0 to avoid obtaining nonsensical results.
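Following this recommendation, the relevant scale factors from the list above would be modified as sketched below (only the lines that differ are shown; the ring/crosslink terms apply only when such chemistry is present):

FMCSC_SC_EXTRA 0.0
FMCSC_SC_BONDED_T 1.0
# additionally, for proline, other flexible rings, or chemical crosslinks:
FMCSC_SC_BONDED_A 1.0
FMCSC_SC_BONDED_B 1.0
FMCSC_SC_BONDED_I 1.0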
Reference Parameter Sets:
The parameter sets below attempt to be as complete as possible for the biopolymer types supported by CAMPARI. In general, support for small molecules (which often use derived parameters) will be limited (but can easily be added by the user). In addition, rare and generally poorly parameterized biopolymer constructs (such as zwitterionic amino acids or free nucleosides) may have incomplete parameter portings, in particular of bonded parameters. If a perfect match to a certain parameter set paradigm cannot be achieved (against the reference implementation), this is stated explicitly.
oplsaal.prm (reference implementation: GROMACS 4.5.2)
This file provides full OPLS-AA/L parameters, i.e., it includes the Kaminski et al. revision of peptide torsions and sulphur parameters. Note that GROMACS 4.5.2 was used as the reference implementation (and not BOSS or MCPRO).
Required settings for emulating reference standard:
FMCSC_UAMODEL 0
FMCSC_INTERMODEL 2
FMCSC_ELECMODEL 1
FMCSC_MODE_14 2
FMCSC_FUDGE_EL_14 0.5
FMCSC_FUDGE_ST_14 0.5
FMCSC_SC_IPP 1.0
FMCSC_SC_ATTLJ 1.0
FMCSC_EPSRULE 2
FMCSC_SIGRULE 2
FMCSC_SC_POLAR 1.0
FMCSC_SC_BONDED_B 1.0
FMCSC_SC_BONDED_A 1.0
FMCSC_SC_BONDED_T 1.0
FMCSC_SC_BONDED_I 1.0
FMCSC_SC_EXTRA 0.0
FMCSC_IMPROPER_CONV 2
GROM53a6.prm, GROM53a5.prm (reference implementation: GROMACS 4.0.5)
These files provide full GROMOS53 parameters. Torsional potentials for which the same biotype is attached multiple times to an axis atom are only approximately supported: the potential acting on just a single, arbitrary one of those atoms in the GROMACS reference implementation is replaced with proportionally reduced potentials acting on all of those atoms. This should be chemically more correct but prevents exact matches of torsional terms. The choice within GROMOS is motivated by computational efficiency, but the evaluation of torsional terms is not a time-critical execution component in almost all present-day simulations (and trivially parallelizable). Moreover, cap and terminal residues may have been adjusted to use more consistent parameters (terminal and cap residues are generally not specifically parameterized in GROMOS from what we can tell, in particular for polynucleotides). GROMOS uses a rather specific interaction model and represents aliphatic CH_{n} moieties in united-atom representation. Note that revisions a5 and a6 only differ in a few partial charge parameters.
Required settings for emulating reference standard:
FMCSC_UAMODEL 1
FMCSC_INTERMODEL 3
FMCSC_ELECMODEL 1
FMCSC_MODE_14 2
FMCSC_FUDGE_EL_14 1.0
FMCSC_FUDGE_ST_14 1.0
FMCSC_SC_IPP 1.0
FMCSC_SC_ATTLJ 1.0
FMCSC_EPSRULE 2
FMCSC_SIGRULE 2
FMCSC_SC_POLAR 1.0
FMCSC_SC_BONDED_B 1.0
FMCSC_SC_BONDED_A 1.0
FMCSC_SC_BONDED_T 1.0
FMCSC_SC_BONDED_I 1.0
FMCSC_SC_EXTRA 0.0
amber94.prm, amber99.prm, amber03.prm (reference implementation: AMBER port in GROMACS 4.5.2)
These files provide full AMBER parameters in three different revisions, which differ mostly in their parameterization of torsional potentials for polypeptides. Note that support for terminal amino acid residues through the parameter file is marginal since AMBER's charge set is so detailed that each atom in each terminal residue would have to be an independent biotype. Normal polypeptide caps are fully supported, however. To allow a more accurate emulation of the AMBER standard for terminal polypeptide residues, the charge patch functionality within CAMPARI can be used. We have tested this for a few examples and recovered 100% accurate matches to the AMBER standard that way. Keep in mind as well that the parameterization of terminal polymer residues is often the "sloppiest" component of a biomolecular force field since their impact on overall conformational equilibria is deemed small. Note that we did not use the actual AMBER software in the porting.
Required settings for emulating reference standard (skipping eventual charge patches):
FMCSC_UAMODEL 0
FMCSC_INTERMODEL 2
FMCSC_ELECMODEL 1
FMCSC_MODE_14 2
FMCSC_FUDGE_EL_14 0.833
FMCSC_FUDGE_ST_14 0.5
FMCSC_SC_IPP 1.0
FMCSC_SC_ATTLJ 1.0
FMCSC_EPSRULE 2
FMCSC_SIGRULE 1
FMCSC_SC_POLAR 1.0
FMCSC_SC_BONDED_B 1.0
FMCSC_SC_BONDED_A 1.0
FMCSC_SC_BONDED_T 1.0
FMCSC_SC_BONDED_I 1.0
FMCSC_SC_EXTRA 0.0
FMCSC_IMPROPER_CONV 2
charmm.prm (reference implementations: CHARMM35b2 and CHARMM38b1)
This file provides access to simulations employing the full CHARMM parameters as provided in parameter set 27 for polypeptides and polynucleotides. CMAP corrections for polypeptides are supported and included. Note that <ABSINTH_HOME> should be the exact same directory specified in the localization of the Makefile (see installation instructions). To simulate polynucleotides with 5'-phosphate groups using 100% authentic CHARMM parameters for the terminal phosphate, the charge patch functionality within CAMPARI has to be used. The same applies to the polarization of the hydrogen atoms on the NH_{2} groups in guanine and cytosine (this is a much smaller effect, though; also compare FMCSC_AMIDEPOL). Similarly, the use of the amidated (NH2) C-terminus in polypeptides requires use of the biotype patch and other patch functionalities. CAMPARI's port of CHARMM parameters generally offers the most complete support for the systems supported natively by CAMPARI, e.g., for phosphorylated amino acid side chains.
Required settings for emulating reference standard:
FMCSC_UAMODEL 0
FMCSC_INTERMODEL 2
FMCSC_ELECMODEL 1
FMCSC_MODE_14 2
FMCSC_FUDGE_EL_14 1.0
FMCSC_FUDGE_ST_14 1.0
FMCSC_SC_IPP 1.0
FMCSC_SC_ATTLJ 1.0
FMCSC_EPSRULE 2
FMCSC_AMIDEPOL -0.01 # or 0.01
FMCSC_SIGRULE 1
FMCSC_SC_POLAR 1.0
FMCSC_SC_BONDED_B 1.0
FMCSC_SC_BONDED_A 1.0
FMCSC_SC_BONDED_T 1.0
FMCSC_SC_BONDED_I 1.0
FMCSC_SC_BONDED_M 1.0
FMCSC_CMAPDIR <ABSINTH_HOME>/data
FMCSC_SC_EXTRA 0.0
charmm36.prm (reference implementations: CHARMM38b1 and CHARMM39b1)
This file incorporates the various revisions of the CHARMM force field contained in parameter set 36.
All other comments made for parameter set 27 apply here as well.
Required settings for emulating reference standard:
FMCSC_UAMODEL 0
FMCSC_INTERMODEL 2
FMCSC_ELECMODEL 1
FMCSC_MODE_14 2
FMCSC_FUDGE_EL_14 1.0
FMCSC_FUDGE_ST_14 1.0
FMCSC_SC_IPP 1.0
FMCSC_SC_ATTLJ 1.0
FMCSC_EPSRULE 2
FMCSC_AMIDEPOL -0.01 # or 0.01
FMCSC_SIGRULE 1
FMCSC_SC_POLAR 1.0
FMCSC_SC_BONDED_B 1.0
FMCSC_SC_BONDED_A 1.0
FMCSC_SC_BONDED_T 1.0
FMCSC_SC_BONDED_I 1.0
FMCSC_SC_BONDED_M 1.0
FMCSC_CMAPDIR <ABSINTH_HOME>/data
FMCSC_SC_EXTRA 0.0
In order to create a new parameter file, it is advisable to start with "template.prm". For details on the paradigms underlying the construction of a parameter file, consult the detailed documentation on this topic.
Random Number Generator Keywords:
(back to top)
RANDOMSEED
This keyword allows the user to provide a specific seed for the PRNG. This is usually relevant in two contexts:
Reproducibility:
Eliminate mismatches between different versions of the program (for example) by performing the stringent test that the results must be exactly the same if the PRNG is seeded with the same seed. Such tests may occasionally be hampered by a lack of precision in any input files and in particular by different compiler/architecture optimization levels.
Timing:
Eliminate identical calculations if jobs are submitted simultaneously. Normally the PRNG uses a seed derived from system time, which can be identical if jobs are submitted exactly in parallel. Avoiding this behavior by specifying different values for RANDOMSEED is only adequate if the jobs are indeed submitted as individual, serial jobs. Conversely, in intrinsically parallel applications (MPI), CAMPARI uses the node number to vary the seed across different nodes unless RANDOMSEED is specified. This means that a provided value for RANDOMSEED will homogenize the PRNG across all replicas, which is almost always undesirable.
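Both contexts can be illustrated with a generic sketch in plain Python (this is not CAMPARI code; `make_seed` is a hypothetical helper mimicking the described per-node seed variation): identical seeds reproduce identical random streams, while per-job offsets decorrelate them.

```python
import random

def make_seed(base, job_index):
    # hypothetical helper: offset a base seed per job,
    # analogous to varying the seed by MPI node number
    return base + job_index

# identical seeds -> identical streams (the reproducibility use case)
a = random.Random(make_seed(1337, 0))
b = random.Random(make_seed(1337, 0))
assert [a.random() for _ in range(5)] == [b.random() for _ in range(5)]

# per-job seed offsets -> distinct streams (the timing use case)
streams = [random.Random(make_seed(1337, i)) for i in range(4)]
first_draws = [s.random() for s in streams]
assert len(set(first_draws)) == 4
```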
Simulation Setup:
(back to top)
UAMODEL
This keyword is a simple but very important switch. It allows the user to control whether nonpolar hydrogens are going to be part of the system's topology or not. In particular in earlier simulation work, it was a common and convenient trick to improve simulation efficiency by uniting all atoms of a methyl or methylene group into a single, coarse-grained "united atom". Different force fields used or use different varieties of this approach. In the GROMOS line of force fields, for instance, all aliphatic hydrogen atoms are merged into the carbon atoms they are bonded to. The CHARMM19 protein force field in addition eliminates nonpolar hydrogens bound to sp^{2}-hybridized carbon atoms in aromatic rings.
Unlike other simulation software, CAMPARI maintains a complete internal "knowledge" of the biomolecular topology of those systems it allows the user to build from scratch. Therefore, choosing between all- or united-atom models is not simply a matter of parameter files (although it is possible to create inefficient united-atom variants of force fields by disabling all interaction parameters pertaining to the required hydrogens). Instead, the software itself requires knowledge of this choice.
Choices are:
 0: Use an all-atom model for those molecules represented explicitly.
 1: Use a united-atom model according to GROMOS convention, i.e., all aliphatic hydrogen atoms are merged into the carbon atoms they are linked to (this does include terminal aldehyde hydrogen atoms).
 2: Use a united-atom model according to CHARMM19 convention, i.e., all aliphatic and all aromatic hydrogens bound to carbon atoms are merged into the latter.
Outside of simulations using the GROMOS force field, this keyword is most useful when using CAMPARI to analyze trajectory data generated by other software with such a united-atom force field. Such a run will not tolerate atom number mismatches between the internal representation of the system and what is found in the binary trajectory files (mismatches are acceptable only if the input format is pdb → see below). Note that this keyword has no impact on systems involving residues not supported natively by CAMPARI (→ sequence input and PDB input).
PDBANALYZE
This keyword is a simple but very important logical. It specifies whether the proposed simulation is a trajectory analysis run: in these, a pdb (or xtc, dcd, NetCDF) trajectory is read from file and analyzed with CAMPARI's internal analysis routines. The desired format is chosen with keyword PDB_FORMAT. All outputs and parameters are completely analogous to normal calculations. Essentially, the snapshot read-in replaces the sampling step. This means that low analysis frequencies will be desirable, since usually the number of snapshots will be relatively small compared to the number of simulation steps in a typical simulation. Note that, in particular for large systems (> 10^{4} atoms), the analysis run may be slowed down by:
 Certain time-consuming analyses that scale poorly with the number of atoms (solution structure analyses; see for example PCCALC or CLUSTERCALC).
 At each step, the global system energy is calculated using, depending on the setting for DYNAMICS, either CAMPARI's energy (MC) or force (MD/LD) routines, making little to no simplifying assumptions. To ensure decent speed, this may require setting the system Hamiltonian to zero (see below) and/or using an efficient cutoff / neighbor-list routine (see CUTOFFMODE).
 Very large files, in particular in pdb format, may cause memory shortages which slow down the machine entirely. In general, binary trajectory files in conjunction with an optional template file are the preferred and much faster way of performing analysis runs.
When using an MPI executable of CAMPARI in parallel, it is also possible to perform trajectory analysis across many processors. This uses the replica exchange setup and is described in detail elsewhere. The four primary applications are simultaneous analyses of several trajectories, the unscrambling of replica exchange trajectories that are normally output continuously for a given condition, the post facto computation of energetic overlap distributions, and the evaluation of the PIGS heuristic for analysis purposes. Specific analysis routines (such as DSSP analysis) may be restricted to specific types of residues, and this may limit the utility of these routines for entities that are not natively supported by CAMPARI (see sequence input). In general, analysis runs on systems featuring unsupported residues should be relatively straightforward. This is true at least as long as no energetic analyses are required (which naturally entails the complex issue of parameterization).
Analysis runs can also utilize the shared memory (OpenMP) parallelization of CAMPARI. As is described elsewhere, this decomposes the workload for many time-intensive tasks that CAMPARI can perform. For analysis functionalities, the load per step for an individual analysis is often so low that it is not effective to let multiple threads operate on it. This is why most simple analysis functions are not parallelized per se but simply performed simultaneously. This is obviously ineffective if only a single such analysis is needed. Currently, the only exceptions are certain analyses related to structural clustering and the calculation of spatial density maps. Important analyses that can be time-consuming but are not parallelized per se are those controlled by CONTACTCALC, DSSPCALC, RHCALC, SAVCALC, and DIFFRCALC. As a general comment, it should be noted that CAMPARI will always spend some of its execution time dealing with coordinate operations. Depending on the chosen settings, there may also be a large contribution from evaluating energies at every step. While the former is never parallel in analysis runs and constitutes a hidden performance bottleneck (just as file I/O does), the latter takes full advantage of the parallelization offered in regular simulations. These considerations should be kept in mind when deciding whether to use the OpenMP code in an analysis setting. Generally, it will of course be more efficient to parallelize in snapshot space by letting the MPI version operate on separate pieces of a longer trajectory. As is generally the case, the MPI and OpenMP frameworks can also be used simultaneously in analysis runs with the standard hierarchy (each MPI process maintains a separate copy of the system, and the processing of each copy can be sped up by using more than one OpenMP thread per MPI process).
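A minimal analysis-run key-file fragment might look as follows (a sketch with illustrative placeholder values, not recommendations; the format code is chosen per the PDB_FORMAT documentation):

FMCSC_PDBANALYZE 1
# select the trajectory input format (see PDB_FORMAT for the codes)
FMCSC_PDB_FORMAT <format code>
# set the step count to the number of snapshots to process
FMCSC_NRSTEPS 10000
FMCSC_EQUIL 0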
NRSTEPS
This keyword sets the total number of simulation steps including equilibration.
EQUIL
This keyword specifies the total number of equilibration steps. This implies that no analysis is performed as long as the current step number does not exceed this value. Note that this also means that no structural output (trajectory) is produced. Conversely, certain necessary diagnostics are provided irrespective of equilibration (see for example ENOUT or ACCOUT).
TEMP
This keyword sets the absolute (target) temperature in K.
PRESSURE
This keyword allows the user to specify the absolute (target) pressure in bar (not yet in use).
ENSEMBLE
This crucial keyword determines which ensemble to simulate the system in. The options available are limited in that they depend strongly on the type of sampler (i.e., there is no NVE (microcanonical) ensemble if sampling is done via Monte Carlo → DYNAMICS). The options are as follows:
1) NVT (Constant Particle Number, Constant Volume, Constant Temperature):
Always available, this is the canonical ensemble and currently the only option available for pure Monte Carlo runs.
2) NVE (Constant Particle Number, Constant Volume, Constant Energy):
The microcanonical ensemble (adiabatic conditions) is only supported (and possible) for nondissipative, i.e., Newtonian dynamics (see option 2 in DYNAMICS).
5) μ_{i}VT (Constant Chemical Potential(s), Constant Volume, Constant Temperature):
This requests the grand canonical ensemble, where the number of particles in the system is allowed to fluctuate. Subscript i indicates that not all particle types may be subject to number fluctuation (typical, for example, in the simulation of macromolecules and a (co)solvent atmosphere for which only the small molecule would be treated in "grand" fashion). This implies that technically incorrect hybrid ensembles are populated (sometimes referred to as "partially grand" ensembles). The rigorous grand canonical ensemble would require all particle types to be permitted to fluctuate in number. Such partially grand ensembles are not to be confused with the "semi-grand" ensemble (see below). Technically, the GC ensemble is realized in CAMPARI by allowing molecules to transfer between a real and a shadow existence, the latter also serving as the reference state. The discreteness of transitions between shadow and real existence implies that currently the grand ensemble is only available in pure Monte Carlo simulations. Note that currently the reference state is modeled in the infinite dilution limit (there are no intermolecular interactions). This is consistent with the default implementation choice (→ GRANDMODE), in which the bath communicates with the system via an expected bulk concentration and an excess chemical potential correcting for the interactions arising from that finite bulk concentration.
6) Δμ_{i}N_{t}VT (Constant Chemical Potential Difference(s), Constant Total Particle Number, Constant Volume, Constant Temperature):
This requests the semigrand ensemble as originally formulated by Kofke and Glandt (1988), in which particle types are allowed to fluctuate in number under the constraint that the total particle number (N_{t}) remains constant. Just like for the μ_{i}VT ensemble, CAMPARI allows the definition of partial semigrand ensembles in which, for example, a bath of water and methanol solvating a macromolecule is subjected to moves attempting to transmute methanol into water or vice versa. Note that the number of real-world applications for which such an ensemble is appropriate is very small. Technically, the constraint of keeping N_{t} fixed may improve acceptance rates in dense fluid mixtures. For both options (5 and 6), please refer to the documentation of the particle fluctuation file, specified using PARTICLEFLUCFILE, for details. Note that the sanity of results obtained with any partial grand or semigrand ensemble must be investigated with utmost care.
To be added in the future:
3) NPT (Constant Particle Number, Constant Pressure, Constant Temperature):
May eventually be made available for MC and Newtonian MD runs.
4) NPE (Constant Particle Number, Constant Pressure, Constant Enthalpy):
May eventually be made available for Newtonian MD runs.
Note to developers: there is rudimentary support for NPT and NPE ensembles in CAMPARI right now, but those branches are completely disabled.
GRANDREPORT
If an ensemble is chosen that allows particle number fluctuations, this keyword acts as a simple logical controlling whether to write out a summary of the grand canonical setup, i.e., which particle types are allowed to fluctuate in number, what the initial numbers (bulk concentrations) are, and what (excess) chemical potentials are associated with them.
GRANDMODE
If an ensemble is chosen that allows particle number fluctuations, this keyword acts to choose between two different implementation modes. In the first (choice 1), file input is used to provide CAMPARI with the initial numbers and absolute chemical potentials of fluctuating particle types. This is generally inconvenient for cases with realistic interaction potentials and/or multiple fluctuating particle types that require coupled chemical potentials (such as individual ionic species). The bulk concentrations are set implicitly by the chemical potentials. This formulation involves the "thermal volume" of particles, meaning that a monoatomic ideal gas will require a mass-dependent chemical potential. In the second option (choice 2, which is the default), the same file input is used to set the bulk concentration explicitly (based on the initial particle number provided), and the chemical potentials listed are merely the excess terms. This formulation involves no mass-dependent terms, is numerically more stable (accuracy of exponentials), and provides an easy reference limit for dilute solutions (zero excess chemical potential). To illustrate the difference in implementation, consider the additional contribution to the acceptance probability (term c_{b} in the description of keyword MC_ACCEPT) of a particle insertion attempt:
Mode 1:
c_{b} = e^{βμ_{ideal}} · e^{βμ_{excess}} · V · (N+1)^{−1} · ζ^{−1}
Here, V is the system volume, N is the current number of particles of the type to be inserted, μ_{ideal} and μ_{excess} are the components of the chemical potential, and ζ is the aforementioned thermal volume.
Mode 2:
c_{b} = e^{βμ_{excess}} · <N> · (N+1)^{−1}
This equation contains the expected bulk concentration as <N>.
While numerically the two cases can be made equivalent, the latter contains a self-consistency check in that the measured <N> can be compared to the assumed <N> given the chosen μ_{excess}. In the former, the assumed <N> is unknown, because the partitioning between μ_{ideal} and μ_{excess} is not explicit. For a single-component system (or a system with multiple independent components), the measured <N> can be used to derive the μ_{excess} that the simulation essentially corresponded to. With dependent components, however, this becomes very difficult to adjust. For general calibration strategies of excess chemical potentials and background, see references.
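The equivalence of the two modes can be sketched numerically. The following is an illustrative Python check (function names and units are hypothetical, not CAMPARI API): the two bias factors coincide exactly when e^{βμ_{ideal}}·V/ζ equals the expected particle number <N>, i.e., when μ_{ideal} = k_{B}T·ln(<N>·ζ/V).

```python
import math

def cb_mode1(beta, mu_ideal, mu_excess, V, N, zeta):
    """Insertion bias factor, mode 1 (absolute chemical potential).
    All quantities in consistent units; zeta is the thermal volume."""
    return math.exp(beta * mu_ideal) * math.exp(beta * mu_excess) * V / ((N + 1) * zeta)

def cb_mode2(beta, mu_excess, N_expect, N):
    """Insertion bias factor, mode 2 (expected bulk particle number plus excess term)."""
    return math.exp(beta * mu_excess) * N_expect / (N + 1)

# The two modes coincide when exp(beta*mu_ideal)*V/zeta equals <N>,
# i.e., mu_ideal = kT * ln(<N> * zeta / V); all numbers below are arbitrary.
beta, V, zeta, N, N_expect, mu_ex = 1.0 / 0.593, 1.0e6, 30.0, 50, 55.0, -2.0
mu_id = (1.0 / beta) * math.log(N_expect * zeta / V)
assert abs(cb_mode1(beta, mu_id, mu_ex, V, N, zeta)
           - cb_mode2(beta, mu_ex, N_expect, N)) < 1e-12
```

This also illustrates why mode 2 is numerically gentler: the mass- and temperature-dependent thermal volume never enters an exponential.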
DYNAMICS
This is one of the core keywords and specifies how to propagate (sample) the system, i.e., how to obtain a new conformation of the system given the current one. The system configuration usually involves both momenta and coordinates unless the sampler is momentum-free (e.g., Monte Carlo). Most propagation schemes are able to take advantage of the shared memory parallelization, but it is important to benchmark this routinely as the scalabilities differ. For example, the work load available for parallelization in an incremental energy evaluation in Monte Carlo is usually much smaller than that in a full force evaluation in dynamics. Options are as follows:
1) Pure Monte Carlo sampling (see keyword MC_ACCEPT and the section on Monte Carlo move sets).
2) Molecular Dynamics:
Integration of Newton's equations of motion either in internal or Cartesian coordinate space (see CARTINT). This is fully supported. The internal coordinate space formulation is based upon a published algorithm. More details are found in the documentation to keywords TMD_INTEGRATOR and TMD_UNKMODE.
A simplified summary of the internal coordinate space variant is as follows:
- Dynamics are performed on internal degrees of freedom which are assumed to be independent (rigid-body translation, rotation around the Cardinal x, y, and z axes of the static laboratory frame centered at the center of mass of each molecule, and torsional degrees of freedom).
- Dynamics for polymers vary along the chain (faster at the termini), as they should, but this does not happen in any fashion proven to comply rigorously with a specific type of dynamics. By altering the chain alignment mode, more exotic dynamics can be produced. This is because the building directions of any polymer chain represent an arbitrary choice in the method.
- By assuming a diagonal mass (inertia) matrix (viz., a block of the mass metric tensor), applicability of simple integrators is a given. In the absence of interaction-based forces, the goal is to preserve rotational kinetic energy (but not angular momentum) by considering the effective masses associated with various rotational degrees of freedom as time-dependent variables in a discrete integration scheme. This treatment is intrinsically consistent, and agreement with data obtained from Monte Carlo simulations has been shown (for select cases). CAMPARI provides a simple diagnostic of the impact of assuming a diagonal mass matrix by printing kinetic energies in both internal and Cartesian coordinates to log output.
- Because the algorithm does not produce dynamics that obey Gauss' principle of least constraint or conserve angular momentum, integrator stability can be inferior to that of a case with identical constraints realized as holonomic constraints in Cartesian molecular dynamics. This effect cannot always be quantified, since the holonomic constraints implied by the internal coordinate space treatment often become too highly coupled for linear solvers to converge (→ SHAKEMETHOD). Select cases with quickly varying masses highlight the effect; the most significant example is probably rigid-body simulations of water (water has tiny rotational inertia and is a prototypical test case for rigid-body integrators). Quantification of relative integrator stabilities for such a case can be performed.
- Subtle equipartition artifacts (i.e., some individual or collective degrees of freedom heating up at the expense of others because they are either more susceptible to integration error or weakly coupled to the rest of the system) can always occur. Effects differ between internal coordinate and Cartesian treatments, because dihedral angles will generally have a rather different level of energetic coupling and integration stability than the positional coordinates of an atom embedded in a polyatomic molecule.
Conversely, the integration of Newton's equations of motion for the Cartesian coordinates of all atoms represents the more canonical approach to molecular dynamics. These algorithms are conceptually much simpler, primarily because the mass matrix is diagonal, leading to independent equations; users are referred to the standard literature on the topic. The simplicity holds primarily for unconstrained simulations in the microcanonical ensemble. In practice, additional procedures are needed in almost all cases, for example the enforcement of holonomic constraints through appropriate algorithms such as SHAKE or LINCS. Most three-point water models are explicitly calibrated as rigid models, and it is therefore necessary to maintain water geometry as a set of holonomic constraints throughout a Cartesian dynamics simulation. Similarly, the desired switch to the canonical ensemble requires the action of a thermostat. CAMPARI always uses the simple leapfrog integrator in Cartesian molecular dynamics, which has excellent energy conservation properties due to error cancellation. This does not mean that it is free of discretization errors, which grow with increasing time step; the latter statement is of course true for any numerical integration of equations of motion.
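The favorable energy behavior of the leapfrog scheme is easy to demonstrate on a toy problem. The following is a generic illustration (not CAMPARI code): a kick-drift leapfrog integration of a 1D harmonic oscillator, whose amplitude stays bounded over many periods instead of drifting, as it would with, e.g., the explicit Euler method.

```python
def leapfrog_sho(x0, nsteps, dt, k=1.0, m=1.0):
    """Kick-drift leapfrog integration of a 1D harmonic oscillator, F = -k*x.
    Velocities live at half steps, as is characteristic of the leapfrog scheme."""
    x = x0
    v = 0.5 * dt * k * x0 / m  # bootstrap v(-dt/2) from v(0) = 0: v -= a(0)*dt/2
    xs = []
    for _ in range(nsteps):
        v += (-k * x / m) * dt  # kick: advance half-step velocity by a full step
        x += v * dt             # drift: advance position with the new velocity
        xs.append(x)
    return xs

xs = leapfrog_sho(1.0, 5000, 0.05)
# No long-term energy drift: the amplitude stays bounded near its initial value
assert max(abs(x) for x in xs) < 1.01
assert max(abs(x) for x in xs[-200:]) > 0.95
```

The discretization error mentioned above is visible as a small, time-step-dependent shift of the oscillation frequency, not as a secular gain or loss of energy.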
3) Langevin Dynamics:
Integration of the Langevin equation of motion. This is supported via the impulse integrator due to Izaguirre and Skeel (reference). With respect to the torsional dynamics implementation, the same caveats apply as for Newtonian dynamics. There is an additional limitation in that the only implementation supported currently is an approximate scheme (corresponding to keyword TMD_INTEGRATOR being 2 and TMD_INT2UP being 0). This is because the structure of the impulse integrator is more complex, thus allowing a straightforward extension to our torsional dynamics only for the simplest case (research in progress). It also means that the shared memory parallelization will not (yet) work with this choice.
Note that all LD simulations work in the fluctuation-dissipation limit, which means that all degrees of freedom are automatically coupled to a heat bath, and which assumes an underlying continuum providing frequent collisions as the source of the stochastic term as well as the frictional damping. In addition, note that hydrodynamic interactions are neglected and that currently there is only a single, uniform frictional parameter for all degrees of freedom (see FRICTION). The latter is a major and non-obvious assumption in internal coordinate spaces featuring polymers with flexible dihedral angles, because it is not clear what frictional drag is incurred by rotations around molecular bonds, nor what the consequences of ignoring communication between these drag effects are.
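The fluctuation-dissipation coupling can be illustrated with a minimal sketch (this is the exact Ornstein-Uhlenbeck velocity update for a free particle, not CAMPARI's impulse integrator): the deterministic damping factor e^{−γ·δt} is paired with a random kick whose variance is fixed by the fluctuation-dissipation theorem, so the long-run kinetic energy satisfies equipartition regardless of γ.

```python
import math
import random

def ld_velocity_step(v, gamma, dt, kT, m, rng):
    """One exact stochastic velocity update for a free particle in a Langevin bath:
    damping by exp(-gamma*dt) plus a Gaussian kick sized by fluctuation-dissipation."""
    c = math.exp(-gamma * dt)
    return c * v + math.sqrt((1.0 - c * c) * kT / m) * rng.gauss(0.0, 1.0)

rng = random.Random(7)
kT, m, gamma, dt = 1.0, 1.0, 1.0, 0.05
v, acc, nsteps = 0.0, 0.0, 50000
for _ in range(nsteps):
    v = ld_velocity_step(v, gamma, dt, kT, m, rng)
    acc += v * v
# Long-run average of m*<v^2> is close to kT (equipartition), independent of gamma*dt
assert 0.7 < acc / nsteps < 1.3
```

If the noise amplitude is detached from γ (breaking fluctuation-dissipation), the sampled temperature no longer matches the bath temperature; this is the kind of artifact alluded to for overdamped settings under FRICTION below.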
5) Mixed Monte Carlo and Newtonian (Molecular) Dynamics:
This hybrid method mixes MC with MD sampling and assumes consistency of ensembles at all times. Since MC sampling only supports the canonical ensemble at the moment, this means that Newtonian MD has to be performed with a thermostat preserving the correct ensemble, e.g., the Andersen or Bussi et al. schemes. Then, the entire trajectory should be treatable as a Markov chain, and analysis is performed as if the sampling engine were one of the two.
A potential caveat lies in velocity autocorrelation. The method is implemented such that segments of MC sampling alternate with MD segments. Upon switching from MC to MD, new velocities are assigned from the proper Boltzmann distribution. This may introduce some amount of noise. Aside from this particular concern, all independent concerns about both Monte Carlo and dynamicsbased methods apply. It is up to the user to ensure that either sampler yields the required ensemble rigorously.
A particular concern lies with the selection of degrees of freedom. In general, it will be highly desirable for the set of sampled degrees of freedom to be exactly identical between the two samplers. This is not always possible, however, e.g., when sampling sugar pucker angles in MC, but not in dynamics. In these scenarios it will be desirable to use short segment lengths in order to improve the chances of convergence (in the given example, convergence is unlikely if long dynamics segments only "see" a few frozen conformations of the sugar pucker states in the system). This issue is particularly difficult in mixed Cartesian/internal coordinate space simulations, attainable by selecting a hybrid scheme here and 2 for CARTINT. Some improvement can be made by including geometric constraints in Cartesian space, but a rigorous match will generally be out of reach.
Technically, the simulation simply alternates between MC-based and dynamics-based segments whose minimum and maximum lengths are controllable by the user (→ keywords CYCLE_MC_FIRST, CYCLE_MC_MIN, CYCLE_MC_MAX, CYCLE_DYN_MIN, and CYCLE_DYN_MAX).
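To make the cycle keywords concrete, a hypothetical key-file fragment for a hybrid run might look as follows. All values are illustrative only, and the annotations after the arrows are explanatory rather than part of the file; the exact key-file syntax (including any keyword prefix) should be checked against the keyword descriptions below.

```
DYNAMICS 5             <-- hybrid MC/MD sampling
CYCLE_MC_FIRST 5000    <-- long initial MC segment to relax a random starting structure
CYCLE_MC_MIN 100       <-- MC segment lengths are drawn between these bounds
CYCLE_MC_MAX 500
CYCLE_DYN_MIN 2000     <-- dynamics segments longer than the velocity autocorrelation time
CYCLE_DYN_MAX 5000
```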
6) Minimization:
This uses the potential energy gradient to steer the system to a nearby minimum through a variety of techniques (see MINI_MODE). Minimization is not a technique to sample phase space in terms of a well-defined ensemble; the closest approximation of its results is probably that of a locally sampled constant-volume (NVT) condition at extremely low temperature. In general, minimizers are adept at finding local, but not global, minima. Note that these algorithms are still numerically discrete schemes, i.e., they employ finite step sizes. This means that irrespective of any theoretical guarantees or expectations an algorithm offers, results may not always be as straightforward. In addition, minimizers are poor tools if the basic step sizes should be heterogeneous for different degrees of freedom, e.g., for a dilute phase of Lennard-Jones atoms or clusters.
7) Mixed Monte Carlo and Langevin Dynamics:
This is analogous to 5) except that Newtonian dynamics are replaced with Langevin dynamics (see 3). (example reference)
To be added in the future:
4) Brownian Dynamics
Note that in all of the above methods relying on forces (options 2-7), it is very likely that optimized loops will be used (depending on settings for the Hamiltonian). These currently have the property of using a few stack-allocated array variables that may become large if cutoff settings are very generous or if no cutoffs are in use. This may lead to unannotated segmentation faults (depending on compiler, architecture, and local settings). There are several workarounds, some of which will be compiler-specific: on Unix systems, the shell command "ulimit" can for example be used to increase the stack size for the local environment, or the compiler can be forced to always allocate local arrays from the heap. Stack access is faster and therefore generally desirable in the speed-critical portions of the code.
MC_ACCEPT
If the simulation uses (at least partially) Monte Carlo sampling, this very important keyword allows the user to choose between (currently) three different types of acceptance rules for MC moves:
- The Metropolis criterion is used. A random number sampled uniformly over the interval [0,1] is compared to the term c_{b}·e^{−β ΔU}. Here, ΔU is the difference in (effective) energy of the new vs. the original conformation (U_{new} − U_{old}), β is the inverse temperature, and c_{b} is a bias correction factor that is specific to the move type. If the random number is less than the term above, the move is accepted. Note that c_{b} can encompass different types of bias. It is also important to keep in mind that some advanced move types may incorporate biasing terms during the picking of a new conformation (see TORCRMODE), which then no longer show up in c_{b}. The Metropolis criterion has the advantage that it is rejection-free in the limit of no energetic or other biases. With a nonzero energy function in place, the distribution sampled from is the Boltzmann distribution.
- A Fermi criterion is used. A random number sampled uniformly over the interval [0,1] is compared to the term (1 + c_{b}^{−1}·e^{β ΔU})^{−1}. If the random number is less than the term above, the move is accepted. The Fermi criterion's only advantage over the Metropolis criterion is that it defines an actual probability on the interval [0,1]. The downside is that the limiting acceptance rate is only 50%; however, the impact is much weaker if ΔU is relatively large on average (in absolute magnitude). The sampled distribution is again the Boltzmann distribution.
- A Wang-Landau / Metropolis criterion is used. A random number sampled uniformly over the interval [0,1] is compared to the term c_{b}·e^{−Δln T} or to the term c_{b}·e^{−β ΔU − Δln T} (see keyword WL_MODE). Here, Δln T is the difference in the logarithms of the current and proposed estimates of the target distribution (e.g., the density of states), i.e., Δln T = ln T_{new} − ln T_{old}. The Wang-Landau algorithm is explained in detail elsewhere, but it should be pointed out that the sampled distribution is no longer the Boltzmann distribution (instead, it is ill-defined, and the simulation results require snapshot-based reweighting), that the simulation does not satisfy detailed balance (the estimate of the density of states changes continuously), and that convergence/errors are much more difficult to assess (since the method is essentially an iteration and not an equilibrium sampling scheme). It is crucial to keep in mind that the standard Metropolis criterion is used while the simulation has not exceeded the number of equilibration steps. This is mostly to avoid range problems when starting from random initial configurations.
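The contrast between the first two criteria can be sketched in a few lines of Python (a generic illustration, not CAMPARI code): with ΔU = 0 and no bias (c_{b} = 1), Metropolis accepts every move, while the Fermi rule accepts only about half of them.

```python
import math
import random

def metropolis_accept(delta_U, beta, c_b, rng):
    """Metropolis: accept if a uniform random number is below c_b * exp(-beta*dU)."""
    return rng.random() < c_b * math.exp(-beta * delta_U)

def fermi_accept(delta_U, beta, c_b, rng):
    """Fermi: accept if a uniform random number is below (1 + exp(beta*dU)/c_b)^-1."""
    return rng.random() < 1.0 / (1.0 + math.exp(beta * delta_U) / c_b)

rng = random.Random(42)
n = 100000
# With dU = 0 and c_b = 1, Metropolis is rejection-free while Fermi accepts ~50%
met = sum(metropolis_accept(0.0, 1.0, 1.0, rng) for _ in range(1000))
fer = sum(fermi_accept(0.0, 1.0, 1.0, rng) for _ in range(n))
assert met == 1000
assert abs(fer / n - 0.5) < 0.01
```

For strongly uphill or downhill moves (|β ΔU| large), the two rules converge, which is why the 50% ceiling of the Fermi rule matters mostly for near-degenerate proposals.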
FRICTION
This keyword allows the user to specify the uniform damping coefficient acting on all degrees of freedom. The value is interpreted to be in ps^{−1}. Currently, this is only relevant if DYNAMICS is set to either 3 or 7. In Langevin dynamics, the velocity damping through friction is given by e^{−γ·δt}. Here, γ is the damping coefficient, and δt is the integration time step (see TIMESTEP). Note that in Cartesian dynamics (see CARTINT), each degree of freedom is an orthogonal direction of the Cartesian movement of each atom. Typically, Langevin dynamics integrators may make the friction on those degrees of freedom dependent on atom mass, but CAMPARI does not support this at the moment since the hydrodynamic properties of individual atoms are poorly described in any case. Conversely, in torsional dynamics, the rigid-body and torsional degrees of freedom of each molecule are integrated, and the friction is applied uniformly to all of those. This means that hydrodynamic properties are, again, ill-represented. Bias torques on account of variable effective masses for most dihedral angle degrees of freedom will continue to be in effect (see elsewhere).
When applying Stokes' law (which should be inapplicable when the diffusing object is strongly aspherical and/or of similar size compared to the molecules comprising the surrounding fluid) to the self-diffusion of water, the measured diffusion constant of around 2.3·10^{−9} m^{2}s^{−1} is roughly consistent through the Einstein-Stokes equation with the measured viscosity of about 8.9·10^{−4} kg·m^{−1}s^{−1} (both at 25°C). By dividing by the mass, a damping constant of about 90 ps^{−1} can be obtained from the Stokes approximation. When performing stochastic dynamics simulations of large, spherical rigid bodies, such a value may be appropriate. For molecular simulations, however, it is not.
First, in conjunction with typical time steps, the value is so large that the impulse integrator in use (→ DYNAMICS) can no longer sample the correct ensemble (it becomes overdamped, implying temperature artifacts). Second, in a Cartesian treatment, unless one samples a monoatomic fluid of inert particles, the correlations between particles are so high that a treatment as independently diffusing spheres is not just inaccurate, but nonsensical in the absence of hydrodynamic interactions. Third, in internal coordinate spaces, the individual degrees of freedom hardly ever fit the Stokes approximation. Torsional and rigid-body rotational degrees of freedom would require a completely different model of friction. Furthermore, unlike in a Cartesian treatment, the degrees of freedom are not all similar to one another. The above means that the damping constant should be understood as an empirical parameter. Better control over values for individual degrees of freedom will be implemented in the future. It defaults to a value of 1.0 ps^{−1}, on par with the coupling times of thermostats in molecular dynamics (→ TSTAT_TAU).
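The Stokes-law estimate for water can be reproduced with back-of-the-envelope arithmetic. The sketch below (generic Python, not CAMPARI code) assigns water the Stokes radius implied by its own diffusion constant via the Einstein-Stokes relation; the result lands at several tens of ps^{−1}, the same order of magnitude as the ~90 ps^{−1} quoted above, with the exact number depending strongly on the radius one assigns to a water molecule.

```python
import math

# Rough Stokes-law estimate of the damping constant gamma = 6*pi*eta*R/m for a
# water molecule, using the Stokes radius implied by the measured self-diffusion
# constant via the Einstein-Stokes relation R = kT/(6*pi*eta*D).
kT = 1.380649e-23 * 298.15        # J, thermal energy at 25 degrees C
eta = 8.9e-4                      # kg m^-1 s^-1, viscosity of water
D = 2.3e-9                        # m^2 s^-1, self-diffusion constant of water
m = 18.015 * 1.66054e-27          # kg, mass of one water molecule

R = kT / (6.0 * math.pi * eta * D)   # implied Stokes radius (~1 Angstrom)
gamma = 6.0 * math.pi * eta * R / m  # s^-1; algebraically equal to kT/(m*D)
gamma_ps = gamma * 1e-12

# Several tens of ps^-1, i.e., the same order as the value discussed above and
# far above the 1.0 ps^-1 default
assert 30.0 < gamma_ps < 120.0
```

Comparing this number to the 1.0 ps^{−1} default makes explicit how far the default is from a literal hydrodynamic interpretation, consistent with treating γ as an empirical parameter.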
CYCLE_MC_FIRST
If a hybrid MC/M(B,L)D method is used (see DYNAMICS), this keyword controls the length of the first segment (in number of steps), which is always an MC segment. This is to ensure that hybrid runs can safely be started from poorly equilibrated (random) structures where forces are large and integrators quickly become unstable.
CYCLE_MC_MIN
If a hybrid MC/M(B,L)D method is used (see DYNAMICS), this keyword controls the minimum length of MC segments (in number of steps), with the exception of the first segment.
CYCLE_MC_MAX
If a hybrid MC/M(B,L)D method is used (see DYNAMICS), this keyword controls the maximum length of MC segments (in number of steps), with the exception of the first segment.
CYCLE_DYN_MIN
If a hybrid MC/M(B,L)D method is used (see DYNAMICS), this keyword controls the minimum length of dynamics-based segments (in number of steps). This should probably be significantly larger than the velocity autocorrelation time of the system.
CYCLE_DYN_MAX
If a hybrid MC/M(B,L)D method is used (see DYNAMICS), this keyword controls the maximum length of dynamics-based segments (in number of steps).
PH
This keyword sets the assumed simulation pH, which currently possesses significance for titration moves only (→ PHFREQ). This keyword may later be extended to represent the assumed (bath) pH in constant-pH simulations.
IONICSTR
This keyword sets the assumed simulation ionic strength for simplified pK_{a} computations. The units are molar (M). The ionic strength is used in a grossly simplified Debye-Hückel approach to estimate cross-influences between multiple ionizable sidechains on a polypeptide (see PHFREQ). Note that this keyword cannot be used to set an assumed ionic strength for the generalized reaction-field method (see RFMODE).
RESTART
This keyword is a simple logical indicating whether to restart a previously discontinued run. It tells the program to attempt to restart a simulation which was accidentally or intentionally terminated. The program writes out ASCII files containing the relevant information in comparatively high precision (see RSTOUT). This file (one for each node in MPI calculations) is called {basename}.rst (see elsewhere). If it is successfully read, the simulation is extended from the simulation step the file was last written for. Non-synchronous MPI runs are synchronized to the step number of the slowest node. Note that instantaneous output of the crashed run should be saved separately (i.e., moved to another directory), since with the exception of running trajectory pdb/xtc/dcd output new files will replace the old ones. All non-instantaneous analysis of the crashed run is unfortunately lost. The simulation will then proceed starting effectively at that step, so the same key file (with the exception of the RESTART keyword itself, of course) can be used. If it is past the equilibration step, on-the-fly analysis will begin immediately. Final output will reflect only the restarted portion of the run. The program will acknowledge in the log file that it is restarting, and will post a warning message if the energies of the structures reported in the restart file and recomputed by the program are inconsistent. Note that it is, rigorously speaking, only safe to restart the exact same calculation, since the information contained in the restart file will depend on the type of calculation performed. It will often be possible to start MC runs (see DYNAMICS) from a non-MC restart file, however. For the opposite and all other cases, consider using the auxiliary keyword RST_MC2MD.
It should be noted that these restarts are not fully deterministic, meaning that they deviate from what the original run would have produced had it continued for more steps (which is typically unknown, of course). The reasons for this are several. First, no information about the state of the random number generator is preserved. This affects Monte Carlo and Langevin dynamics sampling, stochastic thermostats, and so on. Second, the information in the restart files is not printed to full double precision (this has historical reasons). This means that even a conceptually deterministic simulation will start to deviate after some number of steps (depending on the system). Third, if the shared memory parallelization is used, the balancing of load is initialized and reoptimized as it would be at the beginning of the simulation. This leads to a different sequence and blocking of computations, which subtly affects sums, for example. Fourth, as a related point, the order and grouping of compute tasks are architecture- and compiler-dependent. This means that code using different optimization levels or simply a different compiler is not the same at the machine level, and consequently the results are not the same either. Much more dramatic deviations are obtained by enabling aggressive optimization settings, for example those that reduce the precision of built-in mathematical functions. While the first two points could be avoided easily, the latter two are essentially insurmountable with present-day computers. One way to state this deficiency is to redefine numerical reproducibility as the matching of a reference result with finite accuracy, i.e., for suitably rounded results to be the same. The required level of rounding depends on the "depth" of the calculations, i.e., on how often inaccurate results are reused. This technique was used extensively in debugging the OpenMP parallel code across different architectures and compilers.
RST_MC2MD
This is a rather specialized keyword meant for the specific case of (re)starting a dynamics run from a restart file generated by an MC run. In this case, the restart file is shorter and only contains atomic positions, the Z matrix, and whatever else is necessary. When set to 1, this keyword instructs the restart-file reader to assume the MC format even though the run is set to be a dynamics run (see DYNAMICS). Initial velocities are then generated from a Boltzmann distribution using the bath temperature (see TEMP). If this keyword is not set, an attempt to read mismatched restart files will crash the program (most likely in a segmentation fault). This is due to the assumed rigid formatting. The inverse procedure (reading a restart file generated by a dynamics run as the starting point for an MC run) is currently not supported. Note that the typical application for this is to use MC for equilibration of a system and to continue the run using a dynamics sampler. In single-CPU calculations, this simplifies the overall procedure and avoids using the generally low-precision pdb format as an intermediate step (although this can be adjusted with keyword PDB_OUTPUTSTRING). For some replica-exchange runs (see REMC), restart files are actually the only option which allows starting the individual nodes from individual, non-random conformations stored in an input file. The primary application for this keyword therefore probably lies in replica-exchange molecular dynamics runs which use replica-exchange Monte Carlo runs for equilibration purposes.
DYNREPORT
This minor keyword is a simple logical which ensures that, in calculations with different temperature coupling groups, a summary of the partitioning in that regard is provided.
CHECKGRAD
This keyword is a simple logical which instructs CAMPARI to test the gradients for the current calculation given the Hamiltonian, system, and starting structure. It tests Cartesian gradients first, followed by the transformed gradients acting on the internal degrees of freedom (if settings allow that: see CARTINT). It is mostly for developers' usage and creates at most two undocumented output files (NUM_GRAD_TEST_XYZ.dat and NUM_GRAD_TEST_INT.dat). The procedure works by numerically computing gradients using pure energy routines (finite differencing) and juxtaposing the analytical solution. It is slow and can sometimes be misleading or uninformative for the following reasons:
- For just a single molecule, rigid-body gradients are always net zero (outside of boundary contributions).
- The dynamics Hamiltonian must be identical to the MC Hamiltonian (in particular, see LREL_MC and LREL_MD).
- For Cartesian gradients to be accurate, no strictly torsional-space Hamiltonian terms should be used (see for example SC_ZSEC and SC_TOR). For those, Cartesian gradients are circumvented unless CARTINT is 2.
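The finite-differencing idea behind CHECKGRAD can be sketched generically (this is an illustration of the technique, not CAMPARI's implementation; the toy energy function is hypothetical): a central difference of the energy along each coordinate is juxtaposed with the analytical gradient.

```python
def num_grad(f, x, h=1e-6):
    """Central finite-difference gradient of a scalar function f at point x."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2.0 * h))
    return g

# Toy example: a harmonic "bond" energy between two particles on a line,
# with equilibrium separation 1.0
def energy(x):
    return 0.5 * (x[1] - x[0] - 1.0) ** 2

def analytic_grad(x):
    d = x[1] - x[0] - 1.0
    return [-d, d]  # dE/dx0, dE/dx1

x = [0.2, 1.7]
ng = num_grad(energy, x)
ag = analytic_grad(x)
assert all(abs(a - b) < 1e-6 for a, b in zip(ng, ag))
```

The same juxtaposition explains the caveats listed above: if the analytical force routine and the energy routine correspond to different Hamiltonians, the comparison reports a discrepancy even though neither routine is buggy.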
UNSAFE
This keyword is a simple logical (default off) which allows selected fatal errors to be transformed into warnings (for example, the simulation of systems which are not net-neutral). It should be used with caution (obviously), and the log output should always be studied meticulously. In addition, enabling unsafe execution may skip some costly sanity checks, e.g., when reading in trajectories in pdb format.
CRLK_MODE
CAMPARI currently provides limited support for dealing with chemical crosslinks, which either create one (or multiple) intramolecular loops or link multiple molecules together. For force-based sampling in Cartesian space only (see CARTINT and DYNAMICS), this functionality matters exclusively for the following reasons:
 A chemical crosslink can be thought of as a branch in the main chain. Such nonlinear polymers violate CAMPARI's model of identifying topologically connected sequence neighbors purely based upon primary sequence. Therefore, nonbonded interactions have to be corrected if the two residues in question are crosslinked to each other (to comply with the settings provided via INTERMODEL and ELECMODEL). This is supported by CAMPARI independent of crosslink type (even though currently only disulfide linkages are supported → sequence input).
 A single intermolecular crosslink essentially merges two molecules into a single one. However, CAMPARI continues to treat both chains as if they were independent molecules. This has a variety of reasons, most of which pertain to the consistency of internal data representation and to the support of internal analysis routines. One area where this is tricky is simulations in periodic boundary conditions (→ BOUNDARY), as shift vectors are generally applied only to intermolecular contacts. For two crosslinked molecules, this continues to be the case, thereby allowing, given a poor simulation system setup, the theoretical possibility of one of the two crosslinked molecules interacting with parts of different images of the other molecule. Trajectory output may also appear confusing for the same reason.
 New bonded interactions are created which have to be correctly accounted for. In accordance with the previous point this implies that distance vectors have to be imagecorrected in periodic boundary conditions even for those. For the crosslink to be actually established it is necessary that the parameter file offer support for the required bond length, angle, and dihedral terms. This is of course true for any topological interaction in a Cartesian treatment. Request a report to obtain more information at the beginning of the simulation.
 For random initial structures it will be necessary for the crosslink to be satisfied to allow stable integration of the equations of motion. This is elaborated upon elsewhere.
 If the ABSINTH implicit solvation model is used (→ SC_IMPSOLV), the crosslink usually modifies two solvation groups (one on each "side") to yield a single new unit. CAMPARI will typically split this group such that the solvation groups may remain associated with their "host residue".
 The crosslink is treated as a set of restraints, and the sampler is unaware of its explicit existence.
 The crosslink is treated as a set of (hard) constraints and the sampler is adjusted to preserve these constraints. This mode is currently under development and not yet supported.
The latter is the primary reason for supporting mode 2 in the future. Here, the move set will be explicitly adjusted to only allow moves which automatically satisfy the crosslink exactly. For torsional dynamics, this option will be less useful, as CAMPARI does not possess the capability to enforce high-level loop closure constraints in torsional space, and consequently all residues within the loop region would have to be completely constrained for the crosslink to remain intact exactly.
BIOTYPEPATCHFILE
This simple keyword lets the user provide the location and name of an optional input file that can be used to (re)set the assigned biotypes for specific atoms or groups of related atoms in the system. The corresponding biotype number has to be available (listed) within the parameter file in use. Biotypes are the most fundamental assignment for atoms within CAMPARI and can indirectly set many other properties such as charge, mass, etc. This is explained in detail elsewhere. However, there are parameters not affected by biotype assignment, specifically the default geometries and parameters derived from them. This means that it is generally impossible to, for example, mutate a molecule into a different molecule using such patches. Applications of this type may be more feasible for simulations in Cartesian space. The main domains of application for biotype patches are twofold. First, they allow the fastest and most convenient route to include parameter support for atoms in residues not supported natively by CAMPARI (→ sequence input). Second, they allow the user to diversify a parameter file regarding natively supported residues, e.g., by maintaining multiple parameterizations for a small molecule or by including extra distinctions for atoms in terminal polymer residues. Biotype patches are applied first and may be largely overridden by successive application of other patches, e.g., atom type patches, charge patches, etc.
MPATCHFILE
This simple keyword lets the user provide the location and name of an optional input file that can be used to alter the masses of specific atoms in the system (in g/mol). Normally, masses are chosen for atoms based on the assigned atom types in the parameter file, and this behavior can be overridden by this keyword specifically for atomic mass. Note that this is different from changing the atom type of the atom itself, for which a dedicated patch facility is in place. Some more details are given elsewhere.
RPATCHFILE
Similar to keyword MPATCHFILE, this simple keyword lets the user provide the location and name of an optional input file that can be used to alter specifically the radii of individual atoms in the system (in Å). By default, these radii are inferred either from the assigned atom types, i.e., computed from the Lennard-Jones size parameters, or they are overridden at the level of the parameter file by the "radius" specifications. Because the latter still operate at the resolution of assigned atom types, this keyword offers an atom-specific override facility. Note that there is a distinct hierarchy to this. Specifically, changing the radius via a patch does not change the atom type for that atom. It does, however, alter the default values of parameters that depend on radius, such as maximum SAV fractions or atomic volume reduction factors, which are then again patchable themselves. Furthermore, a radius patch overrides a radius inferred by applying a patch to the Lennard-Jones parameters of a specific atom. Details on the input are given elsewhere.
WL_MODE
By specifying the Wang-Landau acceptance criterion for a (partial) Monte Carlo run, the WL method is enabled. This keyword defines the reaction coordinate of choice and the coupled pair to be iterated (see below). Suppose we have an augmented Hamiltonian as follows:
H = K + λE + X(Y)/β
Here, K and E are kinetic and potential energies, β is the inverse temperature, and X(Y) is an unknown function of a selected reaction coordinate. The factor λ can be either 0 or 1. Assuming that the Hamiltonian is separable, expected sampling weights from the Boltzmann distribution for the augmented Hamiltonian are:
w(Y_{1})/w(Y_{2}) = (p_{λ}(Y_{1})/ p_{λ}(Y_{2})) exp[X(Y_{2})−X(Y_{1})]
Here, p_{λ}(Y) is the expected probability (usually treated numerically as the integral over a finite interval, i.e., by binning). If λ is 1, it corresponds to the equilibrium (Boltzmann) probability for the original Hamiltonian. Conversely, if it is 0, p_{λ}(Y) corresponds to the density of states (the distribution as T→∞). If Y=E, p(E) can be written simply as p(E) = g(E) exp(−λβE), with g(E) being the density of (energy) states. This simple form is not available for other reaction coordinates. The Wang-Landau method's key ingredient is choosing X(Y) such that w(Y_{i})/w(Y_{j}) = 1 ∀ i,j over an interval of interest. This statement is equivalent to the definition of a flat walk in the space of Y. A flat walk eliminates all barriers in the projected space of Y and should therefore be efficient at exploring phase space (see associated keywords for details on this). The main use of the flatness is as a diagnostic, however, and the Wang-Landau algorithm uses X(Y) and the apparent distribution in Y as a coupled pair to iteratively build up X(Y). If the apparent distribution becomes flat, confidence rises that X(Y) corresponds to the target distribution of interest. The target distribution is set by this keyword:
 The target distribution is ln g(E) (arbitrary offset). This is achieved by letting λ be zero and Y=E. This is also the implementation chosen in the original publication. Interest in the density of states comes from the fact that it (theoretically) enables reweighting of the flatwalk ensemble to any condition of interest. This is the default.
 The target distribution is ln p(Z) or ln p(Z_{a},Z_{b}) (arbitrary offset), where the Z are geometric reaction coordinates (→ WL_RC) restricted to specific molecules (→ WL_MOL). By letting λ be unity, the target distribution is actually the potential of mean force (PMF) for that (pair of) reaction coordinate(s). Unlike for umbrella sampling (see, e.g., Tutorial 9), it is obtained without further postprocessing. This variant was introduced here. As stated, it is possible to estimate a twodimensional target distribution.
 The target distribution is ln p(E) or ln p(E,Z) (arbitrary offset). This is achieved by letting λ be unity and Y=E. In comparison to the first option, this will oversample low likelihood states rather than low degeneracy states. It can be combined with a geometric reaction coordinate (Z) in a twodimensional approach.
A few technical comments are necessary. First, the Wang-Landau acceptance criterion can be combined with a hybrid sampling technique. In such a case, the dynamics segments will propagate the system as usual, but will contribute in no way to the Wang-Landau histograms. They merely serve to evolve the system to find new states that may be hard to access given the Monte Carlo sampler. The MC segments will utilize the Wang-Landau criterion and increment the histograms. As a result, it may be possible that a dynamics segment starts in a high-energy state. This may make the integrator unstable initially and cause unforeseen crashes. Second, Wang-Landau sampling is also supported in parallel runs. For pure Monte Carlo simulations, the MPI averaging technique implies a parallel Wang-Landau implementation, i.e., an implementation in which the histograms are updated globally. Wang-Landau sampling is also supported in conjunction with the replica-exchange method, but here each replica is confined to its own iterative Wang-Landau procedure (since the Hamiltonians are most likely different).
WL_MOL
If a Wang-Landau acceptance criterion is used for a (partial) Monte Carlo run, and if a molecular reaction coordinate was chosen as the histogram to consider (→ WL_MODE), this keyword allows the user to select the molecule that the reaction coordinate is computed on. The numbering of molecules follows the user-selected sequence in sequence input. Note that it is up to the user to ensure that the chosen reaction coordinate is defined and has a meaningful range for the chosen molecule (see WL_MAX, WL_EXTEND, and WL_BINSZ). If a two-dimensional variant with two geometric reaction coordinates is chosen, it is theoretically possible to supply two different molecules here. Note that the effective coupling is likely to be low in this scenario, which may lead to poor convergence properties in the 2D space. In conjunction with WL_MODE being 3, specification of a legal entry for WL_MOL will extend the WL estimation of ln p(E) to a two-dimensional case with an additional, geometric reaction coordinate (ln p(E,Z)). Note that this keyword is the only way to control the dimensionality for WL_MODE being either 2 or 3.
WL_RC
If a Wang-Landau acceptance criterion is used for a (partial) Monte Carlo run, and if a molecular reaction coordinate was chosen as the histogram (or as one or both axes of the 2D histogram) to consider (→ WL_MODE), this keyword allows the user to select amongst a few geometric reaction coordinates as follows:
 The molecule's radius of gyration is used (default). The range of this quantity is difficult to predict and depends on the constraints in the system. For example, in Cartesian space, it will be advisable to restrict the range of the histograms (→ WL_MAX and WL_EXTEND) to those values that do not coincide with steric overlap (low end) or stretching of bonds (high end).
 The molecule's mean α-content is used as defined for the global secondary structure biasing potential. The quantity always has finite range, but for small systems and typical settings, it exhibits sharp spikes connected by low-likelihood regions that may challenge the discretization of the WL scheme.
 The molecule's mean β-content is used. See the previous option for details and caveats.
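As a reference for the default reaction coordinate, a common mass-weighted definition of the radius of gyration can be sketched as follows (CAMPARI's exact weighting convention may differ):

```python
import math
import numpy as np

def radius_of_gyration(coords, masses):
    """Mass-weighted Rg: Rg^2 = sum_i m_i |r_i - r_com|^2 / sum_i m_i."""
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    com = (masses[:, None] * coords).sum(axis=0) / masses.sum()
    d2 = ((coords - com) ** 2).sum(axis=1)
    return math.sqrt((masses * d2).sum() / masses.sum())

# Two equal masses 2 A apart: each sits 1 A from the center of mass, so Rg = 1
rg = radius_of_gyration([[0.0, 0.0, 0.0], [0.0, 0.0, 2.0]], [12.0, 12.0])
```

For a fully extended chain, Rg grows with contour length, whereas steric overlap bounds it from below, which is the rationale for the range advice given above.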
WL_HUFREQ
This is one of the keywords that controls the convergence properties of a Wang-Landau run. The target distribution in question is accumulated as a histogram (always logarithmic), and this keyword sets the frequency (step interval) for updating it with the current value of the f parameter, i.e., the current increment size (equivalent to multiplication by f in the linear space). The accumulation of the target distribution begins only after the equilibration phase has passed. Naturally, a small setting here will quickly increment the histogram, which may accelerate convergence (in case the effective "mobility" of the system, defined by system properties and sampling engine, is good enough). However, a small setting may also interfere with convergence because it emphasizes the noise in initial estimates of the target distribution (in absolute magnitude), and this may make it harder to refine the guess upon reductions of the f parameter (see WL_HVMODE and WL_FREEZE). The default choice is 10 elementary steps. Note that if the parallel Wang-Landau implementation is used, the step number provided refers to the sampling amount for each individual node.
WL_HVMODE
This is one of the keywords that controls the convergence properties of a Wang-Landau run. It has been argued that the flatness of the accumulated histogram for the target distribution in question (usually tested via some maximum relative deviation criterion) is not generally useful as a criterion for considering a switch to the next stage of refinement (by lowering the f parameter), and can be replaced with a recurrence (minimum visitation) criterion (discussed for example in Zhou and Bhatt). This keyword selects between two different options for such a recurrence criterion. Option 2 requires each (relevant) bin to be visited exactly once in every stage, whereas option 1 mandates that each bin be visited the nearest integer of 1/sqrt(f) times (at least once, though). In the parallel Wang-Landau implementation, the condition will always be checked against the combined data. If the condition is fulfilled, and if the number of post-equilibration Wang-Landau steps exceeds the buffer setting, ln f will be reduced (initial value set by keyword WL_F0) by a factor of 2. Note that the f parameter is implied to operate on a logarithmic scale (same as the target distribution) of counts to avoid numerical issues with large numbers. The rule used here is equivalent to the square root update rule suggested in the original publication. Belardinelli and Pereyra suggest that the exponential update becomes inappropriate for small f, and CAMPARI implements their suggestion to switch over to f ∝ 1/N_{steps}, where N_{steps} is the current number of WL steps having been executed. In the parallel Wang-Landau implementation, this implies the combined total of WL steps from all replicas.
This modified update rule is implemented irrespective of the fulfillment of the criterion defined by WL_HVMODE. It is useful to keep in mind that option 1 will initially lead to fewer reductions of the f parameter, which may be beneficial for establishing correctness, and at the same time may be harmful for the rate of convergence. An issue often affecting convergence adversely is very-low-likelihood bins. In this context, it should be emphasized that the relevance of a bin toward defining flatness is partially controlled by keyword WL_FREEZE, which consequently serves two purposes, and partially controlled by the general range settings (WL_MAX, WL_EXTEND, and WL_BINSZ).
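The two visitation criteria and the f-parameter update described above can be summarized in a small sketch. Here f denotes the logarithmic-scale increment (ln f), and the exact switching point between the exponential and the 1/N rule is an assumption for illustration, not CAMPARI's verbatim bookkeeping:

```python
import math

def visit_threshold(hvmode, f):
    # f is the current logarithmic-scale increment. Option 2 requires a single
    # visit per relevant bin; option 1 requires the nearest integer of
    # 1/sqrt(f) visits, but at least one.
    if hvmode == 2:
        return 1
    return max(1, round(1.0 / math.sqrt(f)))

def update_f(f, n_steps):
    # Halve ln f (the square-root rule in linear space); once that would fall
    # below 1/N_steps, follow the Belardinelli-Pereyra f ~ 1/N rule instead.
    # (The switching form shown here is an assumption, not the exact code.)
    return max(0.5 * f, 1.0 / n_steps)
```

For example, as f decays from 1.0 to 0.0625, the option-1 threshold grows from 1 to 4 required visits per bin, which is why option 1 initially delays reductions of f.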
WL_FLATCHECK
If a Wang-Landau acceptance criterion is used for a (partial) Monte Carlo run, this keyword can be used to control the step interval at which the evaluation of the visitation criterion for the temporary histogram is performed. If the parallel Wang-Landau implementation is used, this coincides with the requirement to (at least temporarily) combine the data from all replicas and therefore imposes a communication requirement. Should a check return a positive result, the temporary histogram is added to the overall estimate, the temporary histogram is reset to zero, and the f parameter is altered as described elsewhere. In the parallel version, additional operations are performed to broadcast the new total (combined) histogram identically to all replicas. In case the criterion is not fulfilled, the temporary histogram(s) is (are) left unchanged. The technical use of this keyword is twofold: first, to reduce communication requirements for the parallel implementation; second, to artificially delay the progression of the iteration. The latter can sometimes be useful for complex systems with strong degeneracy in the chosen reaction coordinate (also see WL_RC). Note that for the parallel code the step number provided refers to the sampling amount for each individual node.
WL_F0
If a Wang-Landau acceptance criterion is used for a (partial) Monte Carlo run, this keyword defines the starting value for the f parameter (logarithmic). The f parameter is meant to decay from some positive number to 0, which corresponds to multiplicative factors larger than 1 reducing to 1 in the linear space. The default is 1.0. The number of reductions of the f parameter by the exponential rule (see elsewhere) is printed to log output. Depending on the properties of the system and the resultant convergence rate, the rule may change as described for WL_HVMODE.
WL_MAX
If a Wang-Landau acceptance criterion is used for a (partial) Monte Carlo run, this keyword sets the (initial) upper bound (given as the bin center of the last bin) of the energy or reaction coordinate histogram (→ WL_MODE and WL_RC). At the beginning, 100 bins of equivalent size are created. Depending on the choice for WL_EXTEND, the histogram and its upper limit may be extended throughout the simulation. It is safe to extend the histogram to values that are impossible to realize for the system in question, since bins that are strictly empty do not meaningfully contribute to the algorithm (see WL_FREEZE). CAMPARI accepts two separate entries for any 2D histogram. Note that the choice for this keyword may be overwritten if a dedicated input file is used to set an initial guess for the target histogram (→ WL_GINITFILE). The maximum value that will not trigger a range exception or an automatic histogram extension is of course the value given here plus half the relevant bin size.
WL_BINSZ
If a Wang-Landau acceptance criterion is used for a (partial) Monte Carlo run, this keyword sets the fixed bin size for the energy or reaction coordinate histogram (→ WL_MODE and WL_RC). At the beginning, 100 bins are created. Depending on the choice for WL_EXTEND, the histogram and its lower and upper limits may be extended throughout the simulation. However, the bin size will remain fixed. CAMPARI accepts two separate entries for any 2D histogram. Note that the histogram bin size and the initial number of bins may be overwritten if a dedicated input file is used to set an initial guess for the target histogram (→ WL_GINITFILE).
WL_EXTEND
If a Wang-Landau acceptance criterion is used for a (partial) Monte Carlo run, this keyword controls whether the energy or geometric reaction coordinate histogram (→ WL_MODE) is allowed to grow in range during the simulation. Choices are as follows:
 The histogram is fixed. Note that any Wang-Landau simulation performed over a restricted interval bears the danger of generating incorrect results even after reweighting. For common interaction potentials and standard energy-based Wang-Landau sampling, this is particularly true for truncation of the energy histogram on the lower end.
 The histogram is allowed to grow only towards lower (more negative) values. This can be useful for energy histograms, where the initial energy range is not known.
 The histogram is allowed to grow in both directions. It is strongly recommended not to use this feature for energy histograms with a realistic interaction potential (since the energy is unbounded on the positive side, and memory exceptions / segmentation faults are likely). This option is meant primarily for histograms defined purely on geometric reaction coordinates (→ WL_MODE).
WL_GINITFILE
If a Wang-Landau acceptance criterion is used for a (partial) Monte Carlo run, this keyword allows the user to replace the default initial guess for the (logarithmic) target distribution with a user-supplied one. The default guess is flat. Supplying a nonflat guess can be useful in several scenarios: i) ongoing refinement of a WL run; ii) cases where a more useful "zero order guess" is available, e.g., an exponentially growing function for a condensed phase system with inverse power potentials; iii) convergence tests. The details regarding the format of this input file are provided elsewhere.
WL_FREEZE
If a Wang-Landau acceptance criterion is used for a (partial) Monte Carlo run, this keyword controls whether the range of bins in the energy or reaction coordinate histogram (→ WL_MODE) that is considered for proceeding to the next iteration stage (updating the value of the f parameter) is fixed after the first such update or not. The update procedure is described for keywords WL_HUFREQ, WL_HVMODE, and WL_FLATCHECK. Any positive integer specified here will prescribe a minimum number of preliminary simulation steps beyond equilibration that must be exceeded before an update of the f parameter is considered. After such an update, the range of bins considered for the histograms is the continuous one (and it must be continuous on account of the update rule) currently populated. If during further simulation steps additional bins were to be visited, those moves are instead considered as range exceptions and are rejected (the summary statistics provided in log output for range exceptions can therefore contain results from two different contributions → WL_EXTEND). Any negative number provided will specify by its absolute value the aforementioned minimum number of preliminary steps in identical fashion. However, in this case, CAMPARI is instructed to allow further bins to be added for consideration during later stages of the algorithm. Note that this violates the refinement idea behind the Wang-Landau scheme, and can lead to severe convergence problems due to the numerical mismatch created by the extra bins "missing out" on f increments during early stages of the algorithm. It is therefore strongly recommended to choose a relatively large and positive number for this keyword (to ensure that appropriate coverage of the accessible range has been reached).
Note that if the parallel Wang-Landau implementation is used, the step number provided refers to the sampling amount for each individual node.
WL_DEBUG
If a Wang-Landau acceptance criterion is used for a (partial) Monte Carlo run, this simple logical allows the user to request debugging information regarding the Wang-Landau iterative algorithm. If turned on, CAMPARI will report in log output the progression through the various updating stages and may, depending on settings, also write temporary output files for the relevant histograms.
Box Settings:
(back to top)
BOUNDARY
Every simulation has to occur within an explicitly or implicitly defined, finite volume. CAMPARI presently supports the different ways of achieving such a finite volume listed below. For constant volume ensembles (→ ENSEMBLE), the (formal) volume remains exactly constant throughout the simulation. This does not imply that volume remains a meaningful parameter under all circumstances, e.g., if phase separation occurs. For the type of boundary condition, there are currently three supported options and one quasi-obsolete mode:
 Periodic boundary conditions (PBC):
This is the most commonly used boundary condition in molecular simulations. Here, the generally polyhedral simulation cell is assumed to be replicated as a (theoretically infinite) periodic system around the central one (which constitutes the actual, physical simulation container). Partial periodicity is also possible, with other walls implemented as restraints. This is theoretically applicable to many different containers, including polyhedra, but only supported for periodic cylinders at the moment (SHAPE is 3 and BOUNDARY is 1). The implementation is such that all distance calculations along periodic dimensions are amended by determining the smallest distance amongst those between a particle and any of the replicated images of another particle. This so-called minimum image convention implies that for normal pairwise interaction potentials (for example SC_IPP) a particle only interacts with at most one "version" of another particle, never two or more. The idea of PBC is borrowed from crystals, in which the assumption of periodicity is justified given that the simulation volume can be chosen such that it coincides with the crystal's unit cell (or exact multiples thereof).
Conversely, in liquids there is no persistent long-range order (homogeneous density, no pair correlations), and the approximation of a system of thermodynamic size by infinite replication of a nanoscopic system is at least questionable. Given typical cutoff schemes, however, the contribution of longer-range interactions is often exactly zero unless explicit techniques are used to enumerate the periodic sum (→ Ewald summation, which is the only feature for which CAMPARI currently calculates interactions beyond the minimum image convention). This means that the actual impact of PBC is often just to mimic a continuous environment for particles close to the edge of the physical simulation volume. Note that no real-space interaction cutoff should exceed half the shortest linear dimension realizable in the simulation volume, since otherwise it becomes possible for multiple images of the same particle to be within interaction distance. In conjunction with the minimum image convention cited above, this invariably leads to artefactual results (reference). Note that in CAMPARI the convention of using the nearest image operates at the molecule level, i.e., the general rule is that intramolecular distances always refer to atoms in the same image of a molecule. CAMPARI will occasionally warn users about cases where an image interaction would be within the cutoff distance, but these warnings are not part of all routines (for efficiency reasons). Enabling box-consistent trajectory output may help in diagnosing such issues independently.
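For an orthorhombic periodic cell, the minimum image convention amounts to wrapping each component of a raw distance vector into [−L/2, L/2), which also shows why no cutoff may exceed half the shortest box dimension. A generic sketch (not CAMPARI code):

```python
import numpy as np

def minimum_image(d, box):
    """Wrap a raw distance vector into [-L/2, L/2) along each periodic axis."""
    d = np.asarray(d, dtype=float)
    box = np.asarray(box, dtype=float)
    return d - box * np.round(d / box)

# In a 10x10x10 box, a raw separation of 9.0 along x really corresponds to a
# nearest-image separation of -1.0, and -6.0 along y to +4.0.
box = [10.0, 10.0, 10.0]
d_min = minimum_image([9.0, -6.0, 4.0], box)   # -> [-1.,  4.,  4.]
```

Any nearest-image distance component is at most L/2 in magnitude, so a cutoff larger than that would let a particle see two images of the same partner.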
 Hard-wall boundary condition (HWBC):
This option is obsolete and cannot be selected. It may be reactivated in the future to enable simulations in containers with hard, particle momentum-conserving (i.e., reflective) walls.
 Residue-based soft-wall boundary condition (RSWBC):
In simulations employing a continuum description of solvent, the resultant density is almost always low, in particular in the limit of simulating just a single macromolecule. In those cases, it may neither be meaningful nor beneficial to introduce additional replicas of the simulation cell. CAMPARI offers to define a system volume via a soft wall for such a scenario. Here, the simulated particles are prevented from leaving a simulation container (most often a spherical droplet) by an applied boundary potential modeled as follows.
Spherical case:
E_{BND}^{Sphere} = Σ_{i} k_{BND}·H(r_{i}−r_{D})·(r_{i}−r_{D})^{2}
Here, r_{i} is the distance from a suitable reference point on residue i to the simulation sphere's origin, r_{D} is the sphere's radius, k_{BND} is the force constant and H(x) is the Heaviside step function.
Rectangular box case:
E_{BND}^{Box} = Σ_{i} Σ_{j=1..3} k_{BND}·H(d_{i,j}−L_{j}/2)·(d_{i,j}−L_{j}/2)^{2}
Here, d_{i,j} is the j^{th} element of the distance vector of the reference point on residue i to the center point of the box (note that by convention the lower left corner serves as origin of the box), and the L_{j} are the side lengths.
Nonperiodic cylinder case:
E_{BND}^{Cylinder} = Σ_{i} k_{BND}·[H(d_{i,z}−h)·(d_{i,z}−h)^{2} + H(r_{i,xy}−r_{C})·(r_{i,xy}−r_{C})^{2}]
Here, d_{i,z} is the zelement of the distance vector of the reference point on residue i to the middle of the cylinder (cylinder axis always aligns with zaxis), r_{i,xy} is the distance of the same point from the cylinder axis in the xyplane, and h and r_{C} are height and radius of the cylinder, respectively.
Partially periodic cylinder case:
E_{BND}^{Periodic Cylinder} = Σ_{i} k_{BND}·[H(r_{i,xy}−r_{C})·(r_{i,xy}−r_{C})^{2}]
The nomenclature is the same as for the non-periodic cylinder. The partially periodic cylinder has a periodic boundary in the z-direction, and the corresponding term is thus missing from the boundary potential.
In general, hard-wall boundaries may be approximated by letting k_{BND} → ∞. This will deteriorate integrator stability in gradient-based simulations, however. Choosing a RSWBC means that the boundary penalty is imposed on the reference atom of each residue (for peptide residues, this is always Cα). This can lead to potential boundary artifacts, with parts of large residues sticking out of the sphere and hence being deprived of interactions with smaller residues. Additionally, it must be pointed out that soft-wall boundary conditions lead to somewhat ill-defined system volumes, since the code assumes the fixed volume inside the boundary to be the system volume, whereas realistically it should be slightly extended depending on temperature and stiffness. The latter is not easily computed, however, since 1) the purely kinetic (entropic) pressure may be altered by the presence of non-rigid molecules, and 2) the virial pressure is generally unaccounted for. Hence, an exact volume is only recovered in the limit of an infinitely stiff boundary (HWBC).
 Atom-based soft-wall boundary condition (ASWBC):
This option is analogous to the previous (RSWBC) option only that the boundary term is computed for each atom in the system rather than for the reference point on each residue in the system (formulas are not repeated). This will minimize artifacts of the aforementioned type, but it is also the most expensive droplet BC to compute. Because multiple atoms will contribute to the boundary penalty for each residue, it is generally recommended to use smaller force constants than for the RSWBC.
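As an illustration of the spherical soft-wall formula above, a direct transcription into a short sketch (generic Python, not CAMPARI code; the Heaviside factor becomes a clamp at zero):

```python
import numpy as np

def sphere_swbc_energy(ref_points, origin, r_d, k_bnd):
    """E = sum_i k_BND * H(r_i - r_D) * (r_i - r_D)^2 for the spherical case."""
    refs = np.asarray(ref_points, dtype=float)
    r = np.linalg.norm(refs - np.asarray(origin, dtype=float), axis=1)
    excess = np.maximum(r - r_d, 0.0)   # H(r_i - r_D) absorbed into the clamp
    return k_bnd * np.sum(excess ** 2)

# One reference point well inside a 10 A sphere (no penalty), one 2 A outside:
# E = 1.0 * (12 - 10)^2 = 4.0
e = sphere_swbc_energy([[0.0, 0.0, 5.0], [0.0, 0.0, 12.0]],
                       [0.0, 0.0, 0.0], 10.0, 1.0)
```

For the RSWBC the `ref_points` would be one reference atom per residue, while for the ASWBC every atom would contribute, which is why smaller force constants are recommended in the latter case.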
SHAPE
This keyword lets the user specify the shape of the simulation container the system is enclosed in. The available choices for SHAPE depend on the boundary condition selected (→ BOUNDARY). At the moment, choices are as follows:
 Rectangular cuboid (= rectangular parallelepiped): This container is supported with both periodic boundary conditions (PBC) and soft walls.
 Sphere: This container is only available with soft walls.
 Cylinder: This container is available with soft walls or with partial PBC along the cylinder axis, which always aligns with the z-dimension, and atom-based soft walls elsewhere (BOUNDARY is 1).
ORIGIN
This keyword lets the user set the origin of the simulation system as a vector of three elements (x, y, and z). The reference point depends on the container's shape and is its center for a sphere, its lower left corner for a cuboid, and the center of its central circular cross section for a cylinder. Note that for simulations started from "scratch" (no structural input), this keyword is mostly irrelevant. There are two things to consider, though: Structural output may be compromised if values are used that are far away from zero. This is because binary trajectory files and in particular the strictly formatted PDB files have finite representation widths and fixed units (Å or nm), such that output may be severely compromised (for PDB files, format adjustments to non-standard formats are available, see PDB_OUTPUTSTRING and PDB_INPUTSTRING). It is therefore recommended to adjust this keyword such that the minimum and maximum values for Cartesian coordinates (largest dimension) are symmetric around the origin of the coordinate system.
 If structural input is used, it is strongly recommended to match the setting for ORIGIN to what is implied by whatever structural input is provided. In droplet BC, it may otherwise occur that parts of the system overlap with the ill-placed droplet boundary and that their internal arrangement is destroyed or that the simulation explodes during the first few steps of simulation.
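The centering recommendation above can be illustrated with a short sketch (illustrative Python, not part of CAMPARI; `centering_shift` and the coordinate list are hypothetical names):

```python
# Sketch: choose a per-dimension shift so that the minimum and maximum
# Cartesian coordinates become symmetric around zero, as recommended above.
# 'coords' is a hypothetical list of (x, y, z) tuples in Angstrom.
def centering_shift(coords):
    shifts = []
    for dim in range(3):
        lo = min(c[dim] for c in coords)
        hi = max(c[dim] for c in coords)
        shifts.append(-(lo + hi) / 2.0)  # move midpoint of extent to zero
    return tuple(shifts)

coords = [(10.0, 20.0, 30.0), (50.0, 40.0, 34.0), (30.0, 30.0, 32.0)]
dx, dy, dz = centering_shift(coords)
recentred = [(x + dx, y + dy, z + dz) for (x, y, z) in coords]
```

The same midpoint can then guide the choice of ORIGIN relative to the container's reference point.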
SIZE
This keyword allows the user to define the size of the simulation container. Depending on its shape, SIZE takes on alternative meanings. If the system volume is a rectangular cuboid (SHAPE is 1), a vector of three floating-point numbers is read in that specifies the three side lengths of the cuboid in the x-, y-, and z-directions, respectively. If the system volume is spherical (SHAPE is 2), just one real number is needed that specifies the sphere's radius. Finally, if the system volume is cylindrical (SHAPE is 3), two floating-point values are read and assumed to be the radius and height of the cylinder, respectively. Note that highly asymmetric boxes and very short, partially periodic cylinders can place very stringent restrictions on cutoffs since it is generally the shortest dimension that matters.
SOFTWALL
This keyword sets the harmonic force constant both for the residue-based and the atom-based SWBCs (see BOUNDARY). It is to be provided in units of kcal·mol^{-1}·Å^{-2} and corresponds to parameter k_{BND} in the equations above. It is currently not possible to disable the evaluation of this potential by setting SOFTWALL to zero. Both very small and very big values can be detrimental to a simulation by producing an ill-defined volume (small values) and creating large forces (big values), respectively.
Integrator Controls (MD/BD/LD/Minimization):
(back to top)
TIMESTEP
If any dynamics-based method (including hybrid methods, of course) is used, this keyword lets the user set the integration time step for the integrator in units of ps.
CARTINT
This keyword determines, at a very fundamental level, the choice of degrees of freedom that CAMPARI shall sample. The "native" CAMPARI degrees of freedom are the rigid-body coordinates of all molecules and a subset of internal coordinates (almost exclusively freely rotatable dihedral angles). This option is the default and specified by choosing 1 for this keyword. Alternatively, the Cartesian positions of all atoms in the system may serve as the underlying degrees of freedom, as is commonly the case in molecular dynamics calculations (option 2). There are several very important limitations and considerations that are mentioned throughout the documentation and reiterated here. CAMPARI does not support the direct sampling of Cartesian degrees of freedom in Monte Carlo simulations. This applies to the MC portion of hybrid simulations as well. While it is trivial to design and implement simple move sets doing precisely that, their efficiency is negligible due to the large amount of motional correlation present between an atom and its immediate molecular environment.
 Internal space simulations do not require the full amount of bonded interaction parameters that are typically part of molecular mechanics force fields, specifically no bond length terms, and typically no or very few improper dihedral and bond angle terms (→ PARAMETERS).
 For freely rotatable dihedral angles, there is a distinction between those deemed important vs. those deemed unimportant. Details are listed in the documentation for providing sequence input. These choices generally pertain to methyl groups and/or to bonds describing electronically hindered rotations with identical groups. The resultant sets of degrees of freedom are not always entirely consistent (e.g., between polypeptide sidechains and their respective small molecule model compounds). Related keywords are OTHERFREQ (MC) and TMD_UNKMODE (dynamics).
 While unsupported residues pose no problems in the setup of Cartesian coordinates, internal coordinate space simulations need to infer which dihedral angles are rotatable from the input topology. This happens automatically and is described elsewhere. For eligible dihedral angles not identified with standard polypeptide or polynucleotide backbone angles, relevant keywords are again OTHERFREQ (MC) and TMD_UNKMODE (dynamics).
 The choice of degrees of freedom in internal coordinate space simulations can be customized rather flexibly by introducing additional constraints (see corresponding input file). For MC simulations, the preferential sampling utility offers an additional level of control.
 Conversely, algorithms to enforce holonomic constraints in Cartesian space simulations are often limited to weakly coupled constraints (see SHAKEMETHOD for details). This means that it is not (yet) possible to mimic torsional space constraints in a Cartesian space run but that it is possible to follow a typical MD protocol by simulating a flexible macromolecule with some bond length constraints in a bath of rigid water molecules.
 The existence of virtual sites (effectively atoms with no mass) poses stringent requirements to Cartesian dynamics, in that those sites have to be constrained exactly relative to real atoms. At each integration time step, the forces acting on these sites are transferred to the surrounding atoms, and their positions are rebuilt post facto (see elsewhere for more details). Virtual sites in internal coordinate space simulations can only cause issues if a degree of freedom's effective mass depends solely on such sites. Then, CAMPARI will automatically freeze the corresponding degree of freedom.
TSTAT
This keyword lets the user choose the thermostat to be used to generate an NVT (or NVT-like) ensemble in dynamics simulations using a Newtonian formalism (option 2 or 5 in DYNAMICS). Currently, three options are fully supported: Berendsen weak-coupling scheme (reference):
This is a deterministic and global velocity rescaling scheme which creates an exponential relaxation toward the target temperature. The velocity rescaling factor is computed for each coupling group (see TSTAT_FILE) according to:
f_{v,i}^{2} = 1.0 + (δ_{t}/τ_{T})·[ (T_{target}/T_{i}) − 1.0 ]
As is apparent, whenever the instantaneous group temperature (T_{i}) matches the ensemble target (T_{target}), velocities are not rescaled (f_{v,i} is unity). Any deviations from T_{target} will lead to a systematic rescaling of all velocities that are part of the coupling group toward the target with a relative decay rate of τ_{T} (→ TSTAT_TAU). If τ_{T} approaches the discrete time step (δ_{t}), the relaxation becomes instantaneous. Note that the coupling of subparts of the system to essentially different thermostats is a largely obsolete method used in the early days of simulations to prevent obscure freezing events sometimes encountered when the system is effectively partitioned into subsystems with very different levels of integrator stability, noise, and inherent relaxation. Then such an approach may circumvent the most dramatic pitfalls resulting from the inherent incorrectness of the weak-coupling scheme (and masking said incorrectness in the process). It is crucially important to realize that the Berendsen thermostat does not generate a well-defined ensemble and that the method only relaxes "safely" to the microcanonical one for τ_{T} approaching infinity. The quenched fluctuations observed in the Berendsen scheme may severely distort results of fluctuation-sensitive computations such as free energy growth calculations (see GHOST). Since the Berendsen scheme is a global coupling scheme, it is compatible with holonomic constraints but prone to equipartition artifacts (see option 4 below) like the freezing of subparts of the system mentioned above. Global thermostats are generally good at preserving velocity cross-correlations, which is an important property for tightly coupled systems, e.g., dense, polar liquids. They are also good at absorbing integrator (discretization) error.  Andersen scheme (reference):
The Andersen thermostat is a stochastic thermostat which introduces "collisions" re-randomizing the velocity associated with a given degree of freedom to one drawn from the ensemble at the given temperature. This method has been shown to sample the canonical ensemble and is one of the recommended options for any calculation sensitive to the details of ensemble fluctuations. Implementation-wise, it works by reassigning the velocity for each degree of freedom at each time step with a probability equivalent to δ_{t}/τ_{T}. This effectively gives rise to a "bath"-induced relaxation over a timescale τ_{T}. Note that implementations in other software packages may synchronize the application of these velocity resets. This is not the case in CAMPARI, where each degree of freedom is treated independently. In Cartesian space, this means that each dimension for every atom is coupled individually, i.e., the velocity resets are uncoupled. As a consequence, the Andersen thermostat as implemented currently is incompatible with holonomic constraints (while the constraints are maintained, their imposition systematically bleeds kinetic energy from the resets, leading to an artificial cooling). Without constraints and much like in Langevin dynamics, a remaining concern here can be the artificial loss of velocity correlations between multiple particles. Unlike in Langevin dynamics, however, the stochastic process is uncoupled and instantaneous, which has an additional downside. If there are noticeable discretization errors (there usually are), one cannot rely on the Andersen thermostat to absorb them in the same way that a globally coupled thermostat does. This is (most likely) because errors accumulating locally must also be dissipated locally, whereas in a global scheme they are dissipated globally.  Extended ensemble methods:
Methods such as those by Nosé-Hoover, Martyna-Tobias-Klein, or Stern are currently not supported, but may be in the future. They often show poor relaxation behavior due to coupled oscillations, in particular in the NPT ensemble, which they are most useful for.  Bussi et al. scheme (reference):
This thermostat can be thought of as a hybrid of the Nosé-Hoover and Berendsen thermostats. It preserves the exponential relaxation kinetics of the weak-coupling scheme if the ensemble target is far away but introduces fluctuations to the kinetic energy such that at equilibrium the global rescaling does not quench fluctuations. This thermostat currently offers the most general support for different classes of systems and different types of degrees of freedom and is thus the recommended option in most applications. The implementation evolves the kinetic energy via auxiliary stochastic dynamics, much like the Langevin piston does for pressure coupling. Here:
f_{v,i}^{2} = e^{−δ_{t}/τ_{T}} + f_{T,i}·(1 − e^{−δ_{t}/τ_{T}})·(R_{1}^{2} + R_{Γ,N_{f,i}−1}) + 2e^{−0.5δ_{t}/τ_{T}}·R_{1}·[ f_{T,i}·(1 − e^{−δ_{t}/τ_{T}}) ]^{0.5}
With:
f_{T,i} = N_{f,i}^{−1}·(T_{target}/T_{i})
Here, N_{f,i} is the number of degrees of freedom in the respective coupling group, R_{1} is a normal random number with mean of zero and unit variance, and R_{Γ,N_{f,i}−1} is a random number drawn from the gamma distribution with outside scale factor of 2.0 and shape of (N_{f,i}−1)/2. Like any thermostat acting globally, there is a higher risk of equipartition artifacts than for thermostats/methods that couple degrees of freedom individually. Equipartition artifacts are generally more likely to occur in weakly coupled and inhomogeneous systems, e.g., a peptide in a bath of ions in an implicit solvent model.
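The rescaling factors of the Berendsen and Bussi et al. schemes can be sketched directly from the equations above (illustrative Python, not CAMPARI code; function names are hypothetical, and the standard-library `gammavariate` supplies R_{Γ}):

```python
import math
import random

def berendsen_factor(t_inst, t_target, dt, tau):
    # Berendsen: f_v^2 = 1 + (dt/tau) * (T_target/T_inst - 1)
    return math.sqrt(1.0 + (dt / tau) * (t_target / t_inst - 1.0))

def bussi_factor(t_inst, t_target, dt, tau, n_f, rng=random):
    # Bussi et al.: stochastic rescaling; r1 is standard normal, r_gamma is
    # gamma-distributed with shape (n_f - 1)/2 and scale 2.0, f_t as above.
    c = math.exp(-dt / tau)
    f_t = t_target / (n_f * t_inst)
    r1 = rng.gauss(0.0, 1.0)
    r_gamma = rng.gammavariate((n_f - 1) / 2.0, 2.0)
    f2 = (c + f_t * (1.0 - c) * (r1 * r1 + r_gamma)
          + 2.0 * math.exp(-0.5 * dt / tau) * r1 * math.sqrt(f_t * (1.0 - c)))
    return math.sqrt(f2)
```

Note how the Berendsen factor is exactly unity when the instantaneous temperature matches the target, whereas the Bussi factor still fluctuates around unity at equilibrium.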
TSTAT_TAU
If the simulation is performed in the NVT ensemble and if Newtonian dynamics are used, this keyword allows the user to set the key parameter of the employed thermostat, i.e., its coupling (decay) time, τ_{T}, in units of ps (the default is 1.0 ps). Note that it is really the ratio of the time step δ_{t} (see TIMESTEP) and this number that matters; hence, TSTAT_TAU cannot be less than the integration time step.
TSTAT_FILE
If the simulation is performed in the NVT ensemble and if Newtonian dynamics are used, this keyword sets name and location of an optional input file for defining thermostat coupling groups. These are meaningful only if the Berendsen weak-coupling or the Bussi et al. scheme is used (options 1 or 4 for TSTAT). For details, the user is referred to the description of the input file itself.
SYSFRZ
This keyword controls the removal of net drift artifacts in dynamics runs (which are primarily relevant for fully ballistic MD). Predominantly in periodic boundary conditions (see BOUNDARY), it can happen that all kinetic energy is transferred into global translations or rotations of the system. This collective "degree of freedom" is typically friction-free and therefore represents a stable trap for the system's kinetic energy to accumulate in. Such behavior will give rise to grossly misleading results (the effective ensemble sampled has a much lower temperature). This can be avoided by periodically removing such global motions. For translational displacements, this is easy, but for rotational motion problems arise if subensembles have access to modes that are quasi friction-free themselves. This is often the case in mixed rigid-body/torsional dynamics and at the moment not dealt with properly. Choices are as follows:
 No removal of global motions is performed (the safest setting for most applications).
 CAMPARI will attempt to only remove translational motion of the system.
 CAMPARI will attempt to remove both global translation and global rotation (this option should be used with caution and is also automatically disabled if certain types of constraints are in use).
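For the second option, removing the net translation amounts to subtracting the mass-weighted mean (center-of-mass) velocity from every atom. A minimal sketch (illustrative Python, not CAMPARI code; names are hypothetical):

```python
# Sketch: zero the total linear momentum of a set of atoms by subtracting
# the center-of-mass velocity component-wise. 'velocities' is a list of
# mutable [vx, vy, vz] vectors, 'masses' the matching atomic masses.
def remove_translation(masses, velocities):
    total_mass = sum(masses)
    for dim in range(3):
        v_com = sum(m * v[dim] for m, v in zip(masses, velocities)) / total_mass
        for v in velocities:
            v[dim] -= v_com
    return velocities
```

After this operation the total linear momentum is zero while relative velocities (and hence internal kinetics) are unchanged.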
TMD_INTEGRATOR
If a simulation is performed in mixed torsional/rigid-body space that contains a Newtonian dynamics portion, then this keyword allows the user to choose between (currently) two basic integrator variants. All integrators are derived from the following discrete scheme that relies on the aforementioned assumptions, i.e., a diagonal mass matrix (equations of motion formally decoupled) and the accuracy/correctness of the total kinetic energy expressed in terms of this diagonal mass matrix. Then, we can define pseudo-symplectic conditions as shown below for a (rotational) degree of freedom with index k:
I_{k}(t_{2})ω_{k}(t_{2})^{2} − I_{k}(t_{1})ω_{k}(t_{1})^{2} − δt·[ ω_{k}(t_{1}) + ω_{k}(t_{2}) ]·F_{k}(t_{1.5}) = 0
Here, δt is the integration time step, I_{k} denotes the diagonal element of the mass matrix for the kth degree of freedom (function of time), ω_{k} is the associated angular velocity, and F_{k} denotes the deterministic force projected onto this degree of freedom (torque). The projection yielding the torques and the mass matrix elements are computed with recursive schemes, i.e., they operate in linear time with the number of atoms in the molecule (more or less irrespective of how many rotatable bonds there are). More information on this recursive scheme can be obtained indirectly with the help of keyword TMDREPORT. The above scheme defines a quadratic equation that has a maximum of two solutions for ω_{k}(t_{2}) (formula omitted). The correct one must be picked (which may be difficult), and an alternative must be defined if no solutions are possible. For both purposes, we use a welldefined approximation to the full solution that yields:
ω_{k}(t_{2}) ≈ [I_{k}(t_{1})/I_{k}(t_{2})]^{1/2} ω_{k}(t_{1}) + δt F_{k}(t_{1.5})/I_{k}(t_{2})
This solution is always available and can be used to pick the correct solution amongst two alternatives for the full quadratic equation (simply as the closer one). The setting for TMD_INTEGRATOR determines whether the correct solution to the quadratic equation should be used whenever possible (option 1), or whether the approximation is used exclusively (option 2, which is the default for historical reasons). As written, the equations still contain the problem that they require knowledge of I_{k}(t_{2}), whereas only the half-step mass matrix elements (which are structural quantities) are available in a typical leapfrog scheme. If the I_{k} are slowly varying functions in time, a simple approximation solving this problem is to allow a lag of half a time step:
ω_{k}(t_{2}) ≈ [I_{k}(t_{0.5})/I_{k}(t_{1.5})]^{1/2} ω_{k}(t_{1}) + δt F_{k}(t_{1.5})/I_{k}(t_{1.5})
This is again written for the approximative version (setting 2). The resultant leapfrog integrator is extremely simple and efficient, and it is obtained by setting the related keyword TMD_INT2UP to 0. However, at each integration time step, we can also take a half-step guess using a similar approximation to obtain a value for all the I_{k}(t_{2}). This is done by explicitly perturbing the coordinates and recomputing just the mass matrix elements (little additional cost for all but tiny or trivial systems). With the values obtained, we can integrate the second equation above as written (this is obtained by setting TMD_INT2UP to 1). While theoretically more accurate, this variant can be noisy due to the extrapolation of the masses. In practice, for systems with very small and quickly varying I_{k} (such as rigid water molecules), performance is similar for all four pairings (TMD_INTEGRATOR 1 or 2, TMD_INT2UP 0 or 1), which reveals that additional corrections are recommended if the rate of change of the I_{k} is high (see below). Conversely, if the rate of change is negligible, all possible settings obtainable by combinations of the two keywords mentioned here relax to the exact same integrator (standard leapfrog in rotational space). This covers the special case of linear (translational) degrees of freedom, which have constant mass.
Note that this keyword is currently irrelevant for stochastic dynamics (always uses a derivation analogous to the last equation above), but that it is relevant for the stochastic minimizer. Another crucial keyword relevant to TMD integrators in general is ALIGN. Specifically for the Newtonian case, coupling parameters become relevant as well (TSTAT and TSTAT_TAU in particular).
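The two integrator variants can be sketched directly from the equations above (illustrative Python, not CAMPARI code; names are hypothetical). `omega_approx` corresponds to option 2, while `omega_exact` solves the full quadratic implied by the pseudo-symplectic condition and falls back to the approximation when no real solution exists (option 1):

```python
import math

def omega_approx(i1, i2, omega1, force, dt):
    # Approximate update (TMD_INTEGRATOR option 2):
    # omega2 ~ [I(t1)/I(t2)]^(1/2) * omega1 + dt * F / I(t2)
    return math.sqrt(i1 / i2) * omega1 + dt * force / i2

def omega_exact(i1, i2, omega1, force, dt):
    # TMD_INTEGRATOR option 1: solve the quadratic implied by
    # I2*w2^2 - I1*w1^2 - dt*(w1 + w2)*F = 0 for w2, pick the root closer
    # to the approximate solution, and fall back to the approximation if
    # no real solution exists.
    guess = omega_approx(i1, i2, omega1, force, dt)
    disc = (dt * force) ** 2 + 4.0 * i2 * (i1 * omega1 ** 2 + dt * force * omega1)
    if disc < 0.0:
        return guess
    roots = [(dt * force + s * math.sqrt(disc)) / (2.0 * i2) for s in (1.0, -1.0)]
    return min(roots, key=lambda w: abs(w - guess))
```

For constant mass-matrix elements and zero torque, both variants leave the angular velocity unchanged, consistent with the standard leapfrog limit mentioned above.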
TMD_INT2UP
If a simulation is performed in mixed torsional/rigid-body space that contains a Newtonian dynamics portion, then this keyword allows the user to control the number of incremental velocity update steps used to improve integrator stability for cases with quickly varying elements of the mass matrix (see above). The cases of 0 and 1 have already been covered in the documentation on TMD_INTEGRATOR. The remaining options assume that values for the diagonal elements of the mass matrix at times t_{1}, t_{1.5}, and t_{2} are available explicitly (as in: computed directly from coordinates) when trying to compute the updated angular velocity for a degree of freedom at time t_{2}. Rather than solving the velocity update in one step, the interval from t_{1} to t_{2} is instead divided into TMD_INT2UP subintervals, and the velocity is updated incrementally for each subinterval. If TMD_INT2UP is larger than 2, additional values are obtained by linearly interpolating between explicit values at the three times. This is why it is recommended to set this keyword to multiples of 2, and this is also why the added benefit becomes successively smaller. A recommended value is 4. Note that this only matters for velocity updates, and that the torque is assumed constant over the entire interval (F_{k}(t_{1.5}) above). As a result, this option does not notably alter speed for a system of appreciable size and is not at all equivalent to a change in integration time step.
TMD_UNKMODE
If a simulation is performed in mixed torsional/rigid-body space with a gradient-based sampler (including minimization), then this keyword controls default constraints operating on certain rotatable dihedral angles. A second function of this keyword occurs in structural clustering using a distance function based on dihedral angles (see end of description). As described for sequence input, there is a selection of "native" CAMPARI torsional degrees of freedom that does not include every rotatable dihedral angle in natively supported residues, and for obvious reasons does not include any degrees of freedom within unsupported residues. This keyword therefore controls how to deal with these two categories of additional degrees of freedom. Options are as follows: Only native CAMPARI degrees of freedom are sampled. This will leave any unsupported residues and molecules completely rigid.
 In addition to native CAMPARI degrees of freedom, all identified degrees of freedom in unsupported residues and molecules will be sampled.
 In addition to native CAMPARI degrees of freedom, all torsional degrees of freedom in natively supported residues, which are frozen by default, are sampled. This will leave any unsupported residues and molecules completely rigid.
 All aforementioned classes of degrees of freedom are sampled.
In the context of structural clustering, this keyword co-controls which dihedral angles are eligible as dimensions for a distance function. This applies to cases where a custom request is made through the appropriate input file and to cases where the full dimensionality is meant to be used. Further information is provided elsewhere.
TMDREPORT
This simple logical keyword enables the printing of information regarding internal degrees of freedom (rigid-body, torsional). This file is particularly useful for constructing input for a specific input mode for custom constraints. For every molecule it lists the index of the first atom in that molecule ("Ref."), the total number of atoms ("Atoms"), the total mass ("Mass") after applying all patches, and, if a gradient-based sampler in torsional space is in use, information on whether the (up to) 6 rigid-body degrees of freedom are frozen or not ("Frozen"). The order is translation in x, y, and z followed by rotation around the x, y, and z axes. The output on the dihedral angles in the molecule provides the following information. For each atom that, in the Z matrix, corresponds to the definition of a relevant (rotatable) dihedral angle, the structure of the rotation list setup is provided. Specifically, the number of rotating (swiveling) atoms ("Rotat.") is printed out along with their total mass ("Mass"). It is specified how many of the rotating atoms are unique ("Unique") with respect to that atom's rotation list one level above in the hierarchy that contains the rotation list for the present atom entirely ("Parent"). The hierarchy is understood by considering the polymer as a branched chain with a number of tips and a base of motion. This base of motion is defined by keyword ALIGN. Degrees of freedom at the tips have minimal rank (starting at 0) whereas those near the base have maximal rank in the hierarchy ("Rank"). The hierarchy necessitates a particular sequence of processing the individual degrees of freedom ("Order"). The report also provides information on the chemical elements ("Ele.") of the 4 constituent atoms for the dihedral angle (the atom defining the dihedral angle comes last) and on whether the degree of freedom is frozen in torsional space molecular dynamics ("Frozen").
The last bit of information is available only if a gradientbased sampler in torsional space is in use. The report is available irrespective of the type of calculation being performed. Note that keyword ALIGN, while conceptually controlling the same thing, is implemented differently in Monte Carlo moves. This means that most of the columns are representative for the Monte Carlo part only if ALIGN is set to 1. In hybrid samplers, the dynamics portion takes precedence.
FRZFILE
This keyword specifies name and location (full or relative path) of the input file for the selection of molecules or residues for which selected degrees of freedom are to be excluded from sampling by explicit removal from Monte Carlo sampling lists and/or by not integrating equations of motion for them. This means that only such degrees of freedom can be constrained that are in fact explicit degrees of freedom of the sampling scheme in use (see DYNAMICS and CARTINT). If this keyword is not present, no constraints are going to be used beyond the system-imposed ones, which may be sampler-dependent. Note that restricting the Monte Carlo move set defines effective constraints not covered here. In Cartesian space, explicit constraints on the x, y, and z coordinates of selected atoms are possible. However, indirect geometric constraints are also supported (differently and independently via SHAKESET). The input for explicit constraints is described in detail elsewhere. Hard constraints may be necessary for specialized applications, for example when one attempts to just re-equilibrate the sidechains in a folded protein while leaving the fold intact. In general, it will be possible to use restraints (see for example SC_TOR or SC_DREST) as alternatives. Those allow the selected degrees of freedom to respond and fluctuate around a stable equilibrium position.
Note that constraint requests are not entirely arbitrary, and that the level of control being offered depends on the sampling engine. It is not possible, for instance, to constrain just one out of several χ-angles in a protein sidechain in Monte Carlo simulations. In general, custom constraints in combination with a hybrid sampling approach may prove challenging when trying to match the sampled sets of degrees of freedom between Monte Carlo and dynamics segments. Furthermore, introducing constraints may prohibit certain MC samplers from being applied not just to the residues carrying the constraints but to surrounding ones as well (such as concerted rotation methods → CRFREQ) due to underlying and conflicting assumptions. Lastly, CAMPARI will exit with an error if user-selected constraints deplete the sampling list for a given MC move type entirely. Here, the user is asked to explicitly adjust the move set, since otherwise these moves would have to be converted to another type, which is not necessarily desirable (note that this still happens if moves are requested that the system simply does not support).
FRZREPORT
If explicit coordinate constraints are used (→ FRZFILE), this keyword acts as a simple logical whether or not to write out a summary of the constraints in the system to log output.
SKIPFRZ
If constraints are used (→ FRZFILE) in torsional space simulations, this keyword gives the user some control over the calculation of effectively frozen interactions due to constraints. In Monte Carlo simulations (see DYNAMICS), incremental energies are computed by only considering the parts of the system that move relative to one another. This automatically addresses constraints. Conversely, in dynamics the total system energy and forces are calculated at each step. If this keyword is set, interactions between parts that have no chance of moving relative to one another (relative orientation completely constrained) will no longer be considered. Note that the potentials rigorously have to be (at most) pairwise decomposable for this option to be available (e.g., the polar term in the ABSINTH implicit solvation model is not strictly pairwise decomposable; → SC_IMPSOLV and SCRMODEL). Usage of this keyword can significantly accelerate dynamics runs or minimization runs in heavily constrained systems (such as ligand optimizations within a rigid protein binding site). Note that any reported energies do not contain the frozen contributions either if this option is chosen.
SHAKESET
It is standard practice in molecular dynamics simulations in Cartesian space to employ holonomic constraints such that the system evolves according to Gauss's principle of least constraint. The reader is referred to the literature as to what exactly constitutes a time-reversible, symplectic integrator if holonomic constraints are enforced. In general, it will be possible to formulate an algorithm which at least is drift-free, has some target precision for the constraints, and is approximately symplectic when the microcanonical ensemble is in use. The idea behind holonomic constraints in molecular dynamics is to eliminate fast vibrational modes in the system to allow for a larger integration time step to be used. This keyword allows the user different choices for which holonomic constraints to employ as follows:
 No holonomic constraints are used.
 All "native" bonds to terminal atoms with a mass of less than 3.5 a.m.u. are constrained in length. A terminal atom is defined as any atom bound to exactly one other atom. "Native" means that only bonds consistent with the assumed molecular topology (codeinternal) are considered. This selection will usually constrain all bonds to hydrogen atoms.
 All "native" bonds of any type are constrained in length. This does include bonds formed by virtue of chemical cross-links.
 All "native" bonds of any type are constrained in length as in mode 3. In addition, several bond angles are constrained explicitly. For a molecule free of rings of size 6 or less, all bond angles are constrained (this also constrains improper dihedral angles at trigonal centers). For molecules with rings of size 6 or less, ring-internal bond angles are generally omitted. Note that more bond angles can be formulated at a tetrahedral site than are needed as constraints, and that, depending on the system, redundant constraints may be created (which may be harmful). This option is only supported for the standard SHAKE constraint algorithm at the moment.
 This is nearly identical to option 4. However, bond angles are constrained by additional distance constraints rather than explicitly. This means this option is theoretically available for constraint algorithms other than SHAKE.
 An input file is read and used to derive the list of constraints. Note that it is possible to derive intra- and intermolecular long-distance constraints that way (geometric information will be taken from the starting structure), but that those will very easily cause CAMPARI to crash.
The cost, accuracy, and applicability of constraint algorithms all scale poorly with the level of coupling. Options 4 and 5 from the list above will therefore be usable only in special cases (→ SHAKEMETHOD) such as systems without any rings or planar, trigonal centers. For specific applications using angle constraints, we strongly recommend defining a minimum set of distance-based constraints via option 6 above. This has the best chance to succeed.
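For option 5 (and when defining such constraints manually via option 6), the target length of a distance constraint that mimics a bond angle constraint follows from the law of cosines. A minimal sketch (illustrative Python, not CAMPARI code):

```python
import math

# Sketch: the 1-3 distance that fixes the angle between two bonds of
# lengths d12 and d23 meeting at angle theta (in degrees), via the law
# of cosines: d13^2 = d12^2 + d23^2 - 2*d12*d23*cos(theta).
def one_three_distance(d12, d23, theta_deg):
    theta = math.radians(theta_deg)
    return math.sqrt(d12 * d12 + d23 * d23 - 2.0 * d12 * d23 * math.cos(theta))

# e.g., two 1.0 A bonds at the tetrahedral-like SPC water angle of
# 109.47 deg yield an H-H distance of roughly 1.633 A
```

Constraining this single 1-3 distance together with the two bond lengths rigidifies the angle without an explicit angle constraint.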
SHAKEFILE
If SHAKESET is set to 6, this keyword specifies the name and location of the file defining user-selected holonomic constraints to be enforced during the simulation. Its format and requirements are documented elsewhere.
SETTLEH2O
This keyword allows the user to append/modify the constraint set selected via SHAKESET such that all pre-existing constraints acting on three-, four-, or five-site water molecules (SPC, TIP3P, TIP4P, TIP4P-Ew, or TIP5P) are replaced with constraints that completely rigidify each water molecule. It acts as a simple logical and is turned on by default, since CAMPARI as of now does not explicitly support any inherently flexible water models. This means that a setting of 2 or 3 for SHAKESET in a calculation in explicit water will still constrain waters to be rigid, and will therefore correspond to a standard (and, for the supported water models, correct) simulation setup. Specifying this keyword and setting it to anything but 1 will disable this override. Note that for water models possessing virtual sites (all four- and five-site models), it is assumed that the extra sites have no mass (see below). If this is not the case, the use of the analytical SETTLE algorithm for water is no longer possible, and the more complex set of constraints may no longer be solved efficiently (or may no longer be solved at all).
SHAKEFROM
This keyword allows the user to control how the actual values for the set of holonomic constraints are determined at the beginning of simulations in Cartesian space. There are currently the following options: Irrespective of structural input, all distance constraints of atoms bound covalently to one another are taken directly from the (hard-coded) CAMPARI default geometries that use, wherever possible, databases of high-resolution crystallographic structures of biomolecules (see for example the reference by Engh and Huber). For indirect angle constraints, i.e., constrained distances of atoms separated by two covalent bonds, the bond angle and two bond lengths in question are used to compute the effective length in similar fashion. For explicit angle constraints, the reference value can be used directly. Lastly, simulations of unsupported residues require the structural input to be used directly (see comments on option 3 below for implied caveats). This option is currently the default.
 Irrespective of structural input, CAMPARI will try to reconstruct the required constraint lengths from the minimum positions of bonded potentials (see SC_BONDED_B and SC_BONDED_A) that are provided by the force field in use. As for option 1, this extends to indirect and explicit angle constraints. If terms are missing in the force field, covalent distances are taken from the default CAMPARI geometry as for option 1. This option is the recommended one for "standard" molecular dynamics calculations. Note that patches to bonded parameters are recognized and respected in this context.
 CAMPARI takes all reference values for constrained degrees of freedom directly from the structure in which the simulation is started. This requires no adjustments but comes with caveats. Since input in pdb format is of limited precision, the various bond lengths and angles can only be extracted to the same precision. This means that constraints that are chemically identical will be set to slightly different values (e.g., the C-H bonds in a methyl group), which can cause small artifacts. For bond lengths involving hydrogen, rebuilding is an alternative to circumvent this problem (see PDB_HMODE). A second problem arises due to the lack of reproducibility caused by the limited precision. Specifically, simulations started from two different and minimized conformations of the system will end up using different values for the constraints. A more extreme version of this problem is encountered when starting simulations from snapshots of other simulations in which the constrained degrees of freedom had been left free to move (this exceeds the mere precision effect). Due to these caveats, it is not recommended to use this option. Note that CAMPARI will have to use values defined by structural input for cases where no other information is available.
When restarting simulations, this keyword should generally be left unchanged. If option 3 is in use, it is recommended to either never supply an input pdb file or to always supply the same one as a template. For a simulation started normally, options 1 and 2 above entail the possibility that constrained degrees of freedom are adjusted before the simulation begins. This adjustment is reflected in the reference structure files written at the beginning of each run ({basename}_START.pdb and {basename}_START.int).
SHAKEMETHOD
This keyword allows the user to choose which of the currently implemented algorithms CAMPARI should use to enforce the chosen set of holonomic constraints during a molecular dynamics simulation in Cartesian space. Options are as follows: The standard, iterative SHAKE procedure is used. Coupled constraints are solved iteratively by assuming independence and linearity (Newton's method). SHAKE may converge in very few steps to good accuracy if the coupling is weak (coupling matrix is sparse). This is the only method that currently supports explicit constraints on bond angles (see SHAKESET and SHAKEATOL). It is also the only method that allows parallelization of an individual constraint group across multiple threads if the shared memory (OpenMP) parallelization of CAMPARI is in use, and if the constraint group is otherwise expected to become a bottleneck. Due to the use of Newton's method, SHAKE is not guaranteed to converge if the underlying "landscape" is nonlinear due to the introduction of coupling between constraints. Then convergence is only guaranteed within a small enough environment around the actual solution. Therefore, SHAKE places an upper limit on the time step that can be used even though it is meant to allow increases of precisely that time step. Nonetheless, in canonical applications (bond length constraints only), SHAKE will be a reasonably efficient solution, i.e., the desired tolerance can usually be reached within few steps. The main weakness of SHAKE and related algorithms is their inherent inability to enforce planarity at a given site. This is because at a planar site all bond vectors which form the basis set for the application of iterative corrections are part of the same plane, i.e., it is impossible to correct an outofplane motion using those vectors. 
Depending on the exact set of constraints used, SHAKE may require many steps, fail to converge at all, converge only with limited accuracy, or occasionally crash if bond length and angle constraints at a site imply that it is perfectly planar.
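To illustrate the iterative logic described above (this is a minimal Python sketch, not CAMPARI's actual Fortran implementation; all function and variable names are illustrative), standard SHAKE for pairwise distance constraints can be written as follows:

```python
import numpy as np

def shake(pos, ref_pos, pairs, d0, invmass, tol=1e-8, max_iter=500):
    """Iteratively enforce distance constraints (standard SHAKE sketch).

    pos      : (N,3) unconstrained positions after the integration step
    ref_pos  : (N,3) positions before the step (constraints satisfied)
    pairs    : list of (i, j) constrained atom pairs
    d0       : target distances, one per pair
    invmass  : (N,) inverse masses
    """
    pos = pos.copy()
    for _ in range(max_iter):
        max_dev = 0.0
        for (i, j), d in zip(pairs, d0):
            r = pos[i] - pos[j]                  # current constraint vector
            diff = r @ r - d * d                 # deviation from target length
            max_dev = max(max_dev, abs(diff) / (d * d))
            if abs(diff) / (d * d) < tol:
                continue
            # Correction direction is the old bond vector; the multiplier g
            # follows from the linearized (Newton) update of the constraint.
            r_ref = ref_pos[i] - ref_pos[j]
            g = diff / (2.0 * (invmass[i] + invmass[j]) * (r @ r_ref))
            pos[i] -= g * invmass[i] * r_ref
            pos[j] += g * invmass[j] * r_ref
        if max_dev < tol:                        # all constraints within tolerance
            break
    return pos
```

Coupled constraints are handled by sweeping over the pairs repeatedly, which is exactly why convergence degrades when the coupling matrix is dense or the site is (nearly) planar.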
 A mix of the SHAKE and PSHAKE (see below) algorithms is used in which PSHAKE is applied only to those constraint groups which are internally entirely rigid. Because PSHAKE, at least as implemented, fails as a general purpose constraint algorithm, this option is practically obsolete. It can be enabled for testing purposes with the help of keyword UNSAFE.
 The so-called PSHAKE (preconditioned SHAKE) procedure is used. In PSHAKE, SHAKE is augmented by a preconditioning step which changes the convergence rate from linear to quadratic. The preconditioning step is a matrix multiplication essentially forming linear combinations of the bond vectors in the constraint vectors. Corrections employed along those new directions minimize the linear error by decoupling the constraints (within the bounds of a linear theory → hence the quadratic and not instantaneous convergence). Unfortunately, this method currently is implemented either inefficiently or incorrectly and does not usually offer a discernible improvement. Moreover, it is fundamentally limited in that large constraint groups are handled inefficiently due to the full matrix multiplication needed to increment the coordinates at each iteration step. This operation has a cost of 3·n_{p}·n_{c} in PSHAKE and of only 6·n_{c} in standard SHAKE. In addition, the matrix used to precondition the procedure has to be recalculated frequently if a molecule undergoes significant conformational changes (currently hard-coded to every 100 integration steps). PSHAKE is therefore suitable only for enforcing holonomic constraints in small rigid or quasi-rigid molecules that can be solved by SHAKE as well. Just like SHAKE, it fails badly for planar sites (see above). As a consequence of the above, PSHAKE can be enabled for testing purposes only with the help of keyword UNSAFE (otherwise CAMPARI terminates). When using PSHAKE, CAMPARI may crash without any indicative messages due to failures in the LAPACK routines used by the algorithm (see installation).
 The LINCS method is used. LINCS is a linear constraint solver that uses a projection approach. In the end, a matrix equation needs to be solved, which requires the inversion of a matrix related to the coupling matrix of the constraints in the group. This is the critical step and grossly inefficient as a general procedure. For sparse matrices, however, the inversion can be performed approximately by a series expansion. It is the order of this expansion and its applicability that determine the success and accuracy of LINCS. LINCS is generally inapplicable to anything involving bond angle constraints, in particular in all-atom representation. It will work well for loosely coupled groups of constraints. Since the accuracy depends on the unknown convergence properties of an infinite sum, the accuracy of LINCS cannot be tuned directly to yield a specific tolerance for satisfying the constraints. Instead, a combination of the expansion order and a number of corrective iterations controls the achieved discrepancy. To make LINCS more comparable to SHAKE in results, CAMPARI will dynamically adjust the former to achieve the target tolerance. Note that LINCS currently cannot split the workload for an individual constraint group across multiple threads if the shared memory (OpenMP) parallelization of CAMPARI is in use. This can be a performance limitation.
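The series expansion at the heart of the LINCS approximation can be illustrated with a short sketch (illustrative names only, not CAMPARI's actual implementation; it assumes the matrix to invert has been scaled so it is close to the identity, which is what "loosely coupled constraints" amounts to):

```python
import numpy as np

def approx_inverse(M, order):
    """Approximate M^{-1} via a truncated Neumann series:
    M^{-1} ≈ I + A + A² + … + A^order, with A = I - M.
    This converges only if the spectral radius of A is below 1,
    i.e., if the constraints are weakly coupled after scaling -
    which is exactly the applicability condition noted above."""
    n = M.shape[0]
    A = np.eye(n) - M
    inv = np.eye(n)
    term = np.eye(n)
    for _ in range(order):
        term = term @ A        # next power of A
        inv += term            # accumulate the series
    return inv
```

The truncation order plays the role of LINCSORDER: a higher order shrinks the residual error geometrically, but only within the radius of convergence, which is why no explicit tolerance can be guaranteed.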
There is an additional issue that arises when virtual sites (technically, atoms with no mass) are used, for example in rigid water models like TIP4P. Such sites have to be circumvented by the integration scheme (displacement is dependent on inverse mass), and therefore they have to be exactly constrained with respect to the positions of atoms with finite mass. These constraints cannot be solved within the standard framework (also dependent on inverse mass). Instead, the least-constraint solution is obtained by simply rebuilding the positions of these sites with fixed internal geometry. For this to yield a correct integrator, however, the forces acting on the sites need to be remapped to the atoms they are connected to. This is done by decomposing the Cartesian force acting on the site into internal forces, for which compensating terms are added to all the atoms comprising the respective internal degree of freedom. This cancels exactly the net force on the site and makes integration symplectic. Virtual sites cannot occur in constraint groups that are handled by a method other than standard SHAKE or SETTLE.
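For the special case of a virtual site that is a fixed linear combination of real atom positions (which holds, e.g., for a TIP4P-style M-site once the water geometry is rigid), the chain rule reduces the force remapping to a redistribution with the construction coefficients. A hedged sketch (the coefficient a is a hypothetical construction parameter, not a CAMPARI input):

```python
import numpy as np

def build_msite(r_O, r_H1, r_H2, a):
    """Place a massless site as a fixed linear combination of the
    three real water atoms (coefficients sum to 1, so the site
    moves rigidly with the molecule)."""
    return (1.0 - 2.0 * a) * r_O + a * r_H1 + a * r_H2

def remap_msite_force(f_M, a):
    """Chain rule for a linear-combination site: the force on the
    massless site is redistributed onto the constructing atoms
    with the same coefficients, leaving zero net force on the
    site itself (total force is conserved)."""
    return (1.0 - 2.0 * a) * f_M, a * f_M, a * f_M
```

The general decomposition into internal forces described above covers site constructions that are not plain linear combinations; this sketch only shows the simplest case.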
Note that if the shared memory (OpenMP) parallelization of CAMPARI is in use, there are at most two levels of parallelization: across constraint groups and within a given constraint group. The second level is only available with standard SHAKE at the moment. The first level is balanced dynamically (see THREADS_DLB_FREQ and related keywords for general information on load balancing in CAMPARI).
SHAKETOL
If SHAKE or PSHAKE are in use (→ SHAKEMETHOD), this keyword allows the user to set the target tolerance for satisfying distance constraints. The tolerance is relative to the target value of the constraint. As soon as the maximum deviation is less than this value, the iteration stops unless it is terminated earlier for other reasons (→ SHAKEMAXITER). If LINCS is in use, this keyword still has meaning even though the tolerance cannot be set explicitly. Should CAMPARI find that LINCS with the given settings satisfies the constraints significantly worse than defined by this keyword, it will adjust one of the open parameters of the method (→ LINCSORDER) in an attempt to remedy this situation. Similarly, should the opposite occur (LINCS satisfies constraints significantly more accurately than the desired tolerance), the parameter will be adjusted in the opposite direction. All this happens within sane bounds (2-16).
SHAKEATOL
If SHAKE (→ SHAKEMETHOD) is in use with explicit bond angle constraints (→ SHAKESET), this keyword allows the user to set the target tolerance for satisfying angular constraints. The tolerance is absolute and applies to the unitless cosine of the respective angle. As soon as both maximum deviations drop below the threshold tolerances (see also SHAKETOL), the iteration stops unless it is terminated earlier for other reasons (→ SHAKEMAXITER).
SHAKEMAXITER
If SHAKE or PSHAKE are in use (→ SHAKEMETHOD), this keyword allows the user to alter the maximum number of iterations permissible to the algorithm. Since poor convergence properties are generally indicative of a more fundamental problem, increasing the value for SHAKEMAXITER will rarely be useful. After exceeding this many steps, the algorithm will simply continue with its current solution, meaning that (in a benign case) constraints will be violated slightly more than specified by SHAKETOL and, where applicable, SHAKEATOL. Note that CAMPARI will then adjust the constraint targets in an attempt to rescue a simulation otherwise doomed. This may not always work and may also lead to unwanted drift. Appropriate warnings are provided.
LINCSORDER
If LINCS is in use (→ SHAKEMETHOD), this keyword allows the user to define the initial expansion order for the approximate matrix inversion technique. As mentioned above, the convergence properties of this approximation are not really known and prevent LINCS from satisfying an exact tolerance explicitly. In particular, for a fixed number of corrective iterations, convergence does not improve strongly even for comparatively large changes in expansion order. Thus, CAMPARI adjusts the expansion order dynamically if it finds that constraints are satisfied significantly better or worse than the desired tolerance provided through SHAKETOL. The allowed range is from 2 to 16, and relative tolerances below 10^{-4} will generally require a setting of 2 or larger for keyword LINCSITER. Very small tolerances are feasible and meaningful in CAMPARI since the representation is entirely in 64-bit floating point precision. Warnings are produced if the tolerance is missed, and the average expansion order across all LINCS constraint groups is reported at the end. If this value is large (in particular, if it is close to 16), it is strongly recommended to increase LINCSITER.
LINCSITER
If LINCS is in use (→ SHAKEMETHOD), this keyword allows the user to define the number of iterations for correcting rotational lengthening. Typically, LINCS assumes only one such correction, but the matrix expansion cannot become arbitrarily precise with a single iteration. Thus, this keyword, which defaults to 2 in CAMPARI, can be used to alter the minimum tolerance achievable by LINCS. Since CAMPARI operates entirely in 64-bit floating point precision, meaningful tolerances can be chosen that are comfortably solvable by SHAKE in few iterations yet for which a LINCSITER setting of 1 is insufficient. CAMPARI will not vary the number of iterations throughout the run, but it will vary the expansion order per constraint group to achieve the desired tolerance. It thus may be computationally less efficient to set LINCSITER to 1 (depending on tolerance, system, integration time step, etc.) as it requires a larger expansion order.
MINI_MODE
If a minimization run is performed, this keyword lets the user select the method of choice. CAMPARI currently supports three canonical and one nonstandard minimizer. All minimizers can operate either in mixed rigid-body/torsional space, i.e., the "native" CAMPARI degrees of freedom, or in Cartesian space (→ CARTINT). However, there are algorithmic restrictions in that the canonical minimizers (options 1-3 below) only support trivial constraints (see FMCSC_FRZFILE), which is an issue in Cartesian space (rigid water models, etc.). Let us define γ as a vector of base increment sizes suitable for each of the degrees of freedom (partitioned into three classes: rigid-body translation, rigid-body rotation, and dihedral angles; keywords MINI_XYZ_STEPSIZE, MINI_ROT_STEPSIZE, and MINI_INT_STEPSIZE are used to specify each element γ_{i}). Also, let f_{m} be an outside scaling factor in units of mol/kcal set by keyword MINI_STEPSIZE. Lastly, we introduce a unitless dynamic step length factor λ. If we now denote the heterogeneous vector of phase space coordinates as x, and the Hamiltonian is written as U(x), then the system is evolved through one of four different protocols as follows:
 Steepest-descent:
x_{i+1} = x_{i} - λ·f_{m}·γ•∇U(x_{i})
Here, "•" denotes the Hadamard (Schur) product, i.e., simply the element-by-element multiplication. Should the new conformation have overstepped in the direction of steepest descent, λ is iteratively reduced by a constant factor until a valid step is found (lower energy). In case of successful steps, λ is iteratively increased to improve the efficiency of the procedure if the underlying landscape is relatively smooth and flat. Successful steps are also used to construct an appropriate guess for the initial step size should a complete reset be necessary. This mimics a line search.
 Conjugate-gradient:
x_{i+1} = x_{i} - λ·f_{m} [ γ•∇U(x_{i}) + f_{CG,i}·d_{i-1} ]
f_{CG,i} = [ ∇U(x_{i})·∇U(x_{i}) ] / [ ∇U(x_{i-1})·∇U(x_{i-1}) ]
d_{i-1} = γ•∇U(x_{i-1}) + f_{CG,i-1}·d_{i-2}
This conjugate-gradient method follows the Polak-Ribiere scheme and augments the steepest-descent prediction by an additional term that is estimated according to the suggestion by Fletcher and Reeves. Much like in steepest descent, should the new conformation have overstepped, λ is iteratively reduced by a constant factor until a valid step is found (lower energy). In case of successful steps, λ is iteratively increased analogously to what is described above.
 Memory-efficient Broyden-Fletcher-Goldfarb-Shanno method (L-BFGS)
according to Nocedal (reference):
x_{i+1} = x_{i} - λ·[ H^{-1}·(γ•∇U(x_{i})) ]
This quasi-Newton approach technically employs the inverse of the Hessian, which is typically unknown. However, the L-BFGS method constructs a numerical estimate directly for the matrix product H^{-1}·(γ•∇U(x_{i})) from the recent history of the minimization process. This widely used recursive two-loop scheme has the advantage of i) only requiring very few floating point operations, and ii) not requiring a running guess for the complete Hessian (inverse or not) due to the recursive formulation. Note that the inverse Hessian in our implementation is constructed from γ•∇U(x_{i}), i.e., it has units of mol/kcal throughout, irrespective of which degree of freedom is considered. This means that the factor f_{m} does not show up in the L-BFGS equation except for the first step (initially or after a reset) when the steepest-descent approximation is used (see mode 1). The usage of (estimated) second derivative information should generally help inform the minimizer of more useful directions to pursue, but step size limitations and inadequate guesses of the Hessian may render this potential benefit ineffectual. The reader is referred to the literature for further details.
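The two-loop recursion named above can be sketched compactly (a generic textbook version of Nocedal's scheme, not CAMPARI's actual code; s_k are recent position differences and y_k the corresponding gradient differences):

```python
import numpy as np

def lbfgs_direction(grad, s_hist, y_hist):
    """Two-loop recursion: estimate the product H^{-1}·grad from
    the recent history, without ever forming the Hessian."""
    q = grad.copy()
    alphas = []
    # First loop: newest history pair to oldest.
    for s, y in zip(reversed(s_hist), reversed(y_hist)):
        rho = 1.0 / (y @ s)
        alpha = rho * (s @ q)
        q -= alpha * y
        alphas.append((rho, alpha, s, y))
    # Initial Hessian guess: scaled identity (gamma = s·y / y·y).
    if s_hist:
        s, y = s_hist[-1], y_hist[-1]
        q *= (s @ y) / (y @ y)
    # Second loop: oldest pair back to newest.
    for rho, alpha, s, y in reversed(alphas):
        beta = rho * (y @ q)
        q += (alpha - beta) * s
    return q  # approximates H^{-1}·grad
```

With an empty history the recursion degenerates to the raw gradient, which matches the steepest-descent behavior of the first step (or any step after a reset) described above.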
 Thermal noise, quasi-stochastic (akin to simulated thermal annealing):
This minimizer couples the system to a variable temperature bath. By changing the coupling parameters, the degrees of freedom are successively brought to a state consistent with a very low temperature ensemble. A similar quench in conditions is used in simulated annealing, a general solution strategy for optimization problems.
Initially, the system uses a heat bath as defined by the settings for TSTAT and TEMP. The system is then evolved using NVT molecular dynamics in either mixed rigid-body/torsional space or Cartesian space. Depending on initial conditions, this may heat up the system to a variable extent, and the maximum temperature is recorded. After a prescribed fraction of the total simulation steps, the target temperature is successively lowered to the value specified by keyword MINI_SC_TBATH. This interpolation uses a Gaussian function on the normalized time axis such that all interpolation curves can be rescaled in temperature to exactly coincide. Simultaneously, the algorithm measures the rate of change of temperature from the recorded maximum toward MINI_SC_TBATH. If the actual rate appears too slow or too fast, the time constant, τ_{T}, of the thermostat in use (→ TSTAT_TAU) is successively altered so as to achieve a cooling down of the system to a negligible temperature within the remaining number of available iterations. These alterations happen within bounds of 10 times the integration time step on the low end and the original setting for TSTAT_TAU on the high end.
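A hypothetical sketch of such a Gaussian cooling schedule on a normalized time axis is shown below; the exact functional form and shape parameter used by CAMPARI may differ, so treat this purely as an illustration of the idea:

```python
import math

def target_temp(step, n_heat, n_total, T_init, T_final, width=0.3):
    """Illustrative Gaussian cooling schedule: hold the bath at
    T_init until step n_heat, then interpolate toward T_final
    along the normalized remaining time. 'width' is an assumed
    shape parameter, not a CAMPARI keyword."""
    if step <= n_heat:
        return T_init
    t = (step - n_heat) / float(n_total - n_heat)  # normalized time, 0 -> 1
    f = math.exp(-0.5 * (t / width) ** 2)          # Gaussian decay, 1 -> ~0
    return T_final + (T_init - T_final) * f
```

Because the schedule operates on the normalized time axis, all such curves can be rescaled in temperature to coincide, as noted above.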
This minimization approach employs two convergence criteria as soon as the number of steps specified via MINI_SC_HEAT has passed. During the cooling schedule, the procedure will stop either because the RMS gradient fell below the threshold (→ MINI_GRMS) or because the target temperature (MINI_SC_TBATH) was reached, which per se does not provide information on the local gradient. Of course, it may be possible to minimize such a structure further using a canonical approach. Both temperature and RMS gradient are written to log output to allow for easy inspection of whether the parameters are set reasonably well. As an additional note, it must be pointed out that (much like in standard molecular dynamics) runs starting from very unfavorable structures will cause large accelerations, which may lead to a catastrophic blow-up of the system. This behavior can be avoided by performing a number of steepest-descent minimization moves upfront. This number is set by keyword MINI_SC_SDSTEPS.
In general, a minimization run will terminate after either the maximum number of iterations has passed (see NRSTEPS) or after convergence is achieved (see MINI_GRMS). Note that bad combinations of the various step sizes and the convergence criterion can easily lead to non-terminating runs even if convergence is achieved de facto.
In general, minimizations are unlikely to be interesting for on-the-fly analysis. This is because the conformations encountered do not correspond to a meaningful ensemble, neither in terms of coverage nor in terms of relative weights. Nevertheless, all analysis routines are supported and will work assuming that a single step corresponds to a single successful perturbation during minimization (due to overstepping, the number of energy/gradient evaluations in minimization is usually larger than the actual number of steps: keyword NRSTEPS sets the maximum for the former).
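The adaptive step-length logic of mode 1 (shrink λ when overstepping, grow it on success) can be sketched generically as follows; the shrink/grow factors and defaults are illustrative, not CAMPARI's actual values:

```python
import numpy as np

def steepest_descent(x0, energy, grad, gamma, fm=0.1, n_steps=200,
                     lam0=1.0, shrink=0.5, grow=1.1):
    """Mode-1 sketch: x <- x - λ·f_m·(γ•∇U). If the energy rises
    (overstepping), λ is reduced by a constant factor and the step
    is retried; on success, λ is increased to mimic a line search."""
    x, lam = np.asarray(x0, float), lam0
    e = energy(x)
    for _ in range(n_steps):
        step = lam * fm * gamma * grad(x)   # Hadamard product γ•∇U
        x_new = x - step
        e_new = energy(x_new)
        if e_new < e:                        # valid step: accept, grow λ
            x, e, lam = x_new, e_new, lam * grow
        else:                                # overstepped: shrink λ, retry
            lam *= shrink
    return x, e
```

Note how the factor f_m (keyword MINI_STEPSIZE) and the base increments γ enter exactly as in the mode-1 equation above, while λ is the only dynamically managed quantity.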
MINI_STEPSIZE
If a canonical minimization run is performed, this keyword acts as a scale factor applied to all conformational increments applied during minimization. It therefore sets the global step size and corresponds to factor f_{m} in the equations above. For technical reasons, it has units of mol/kcal to eliminate the energy units of the normalized gradients. There are no canonical rules one can formulate, but values significantly less than unity will typically be most appropriate to avoid that the algorithm frequently oversteps in a subset of the degrees of freedom and then has to iteratively reduce the step size. However, step size management is dynamic (consult factor λ introduced in the equations for minimization modes 1-3 above). This means that the impact this keyword has may be less than what one would generally expect.
MINI_GRMS
If a minimization run is performed, this keyword allows the user to set the convergence criterion in units of kcal/mol. Since minimization runs can occur in torsional and rigid-body space, the "raw" gradient over all degrees of freedom is unsuitable. CAMPARI utilizes a simple workaround by normalizing all gradients by a basic step size for the respective types of degrees of freedom (see keywords MINI_XYZ_STEPSIZE, MINI_ROT_STEPSIZE, and MINI_INT_STEPSIZE). The resultant, normalized gradient is used to obtain its root mean square (→ GRMS), which is compared to the convergence criterion provided here. Since the normalized gradients assume a default step size, this parameter becomes dependent on them. For unit values for all three base step sizes, values around 10^{-2} are recommended. Conversely, in Cartesian space, only MINI_XYZ_STEPSIZE is relevant for the gradient criterion.
MINI_XYZ_STEPSIZE
If a minimization run is performed, this keyword determines a basic step size to be considered for all rigid-body translations of molecules and for all Cartesian displacements of atoms. This value is to be provided in units of Å. Note that this keyword determines the effective initial translation step size in conjunction with MINI_STEPSIZE and that it is mostly needed to be able to handle the different units occurring when minimizing in mixed rigid-body and torsional space. All translational gradients are normalized by this number such that numerical estimates of the Hessian (→ BFGS) or even a meaningful root mean square can be written (→ MINI_GRMS). Note that for simulations in (effective) Cartesian space, it would be possible to combine this parameter with MINI_STEPSIZE into a single step size parameter.
MINI_ROT_STEPSIZE
If a minimization run in mixed rigid-body and torsional space is performed, this keyword determines a basic step size to be considered for all rigid-body rotations. This value is to be provided in units of degrees (compare MINI_XYZ_STEPSIZE).
MINI_INT_STEPSIZE
If a minimization run in mixed rigid-body and torsional space is performed, this keyword determines a basic step size to be considered for all dihedral angles. This value is to be provided in units of degrees (compare MINI_XYZ_STEPSIZE).
MINI_UPTOL
If a minimization run is performed, and if the BFGS method is used, this keyword lets the user choose a tolerance criterion in kcal/mol for accepting uphill steps. At most ten or MINI_MEMORY (whichever is smaller) such steps will be tolerated until a reset of the estimate of the Hessian occurs. This reset will reorient the (multidimensional) direction back onto a steepest-descent path, and the procedure can start anew. This feature is included since the curvature-based estimate of the direction in the BFGS method does not always guarantee a downhill direction (i.e., the energy resultant upon a perturbation in such a direction is larger than the current one for all steps within a finite interval, including arbitrarily small ones); this is a different problem from "overstepping", for which step size reductions are employed.
MINI_MEMORY
If a minimization run is performed, and if the BFGS method is used, this keyword lets the user choose the memory length for the running estimate of the Hessian. Since the system will evolve throughout the minimization, the estimate of the Hessian is of course a moving target, and it will only be useful to include points from the immediate vicinity in its numerical, gradient-based estimate. This keyword simply gives the (integer) number of immediately preceding steps to consider. Note that very large values will typically be irrelevant since the BFGS procedure will, in rough landscapes, frequently propose an ill-fated (uphill) direction (see MINI_UPTOL for comparison). Such moves will eventually lead to a reset of the estimate of the Hessian, which includes "forgetting" all the memory. Hence, the effective usable memory length will be limited by the system as well. Note that the resets are necessary for the BFGS method to find any minima.
MINI_SC_SDSTEPS
If a stochastic minimization run is performed, this keyword allows the user to request the program to first run the specified number of steps as canonical steepest-descent (SD) minimization. These SD moves will follow the same parameter settings as described above and are completely independent of the stochastic steps. Note that these steps are always skipped if the settings request the use of holonomic constraints when minimizing in Cartesian space.
MINI_SC_HEAT
If a stochastic minimization run is performed, this keyword specifies the fraction of the total number of steps (NRSTEPS) that are going to be used to perform NVT dynamics at the user-supplied initial temperature and thermostat settings. Generally, for an efficient annealing protocol, it is probably advisable to combine a large value for this keyword with a high enough temperature and/or a comparatively large value for the thermostat's time constant, τ_{T}, such that NVE dynamics are mimicked over short periods of time (this will lead to heating in itself). Conversely, for straight minimization, it will be more appropriate to supply small values in conjunction with tight thermostat settings and low initial temperature.
MINI_SC_TBATH
If a stochastic minimization run is performed, this keyword lets the user specify the target temperature of the bath the system will be coupled to at the very end of the run. From the simulation step defined by MINI_SC_HEAT onward, the target temperature is interpolated between TEMP and MINI_SC_TBATH using a Gaussian function operating on a normalized time axis. For the protocol to work as intended, it will not be useful to specify anything but values close to (but not exactly) zero here.
Move Set Controls (MC):
(back to top)
Preamble (this is not a keyword)
A Monte Carlo simulation is a series of biased or unbiased random perturbation attempts to the system, in which some moves will be accepted (the Markov chain transitions to a new microstate) and the others rejected (the Markov chain remains in place) depending on some criterion. This acceptance criterion is designed to sample a specific distribution, and the most common example is the Metropolis criterion designed to produce Boltzmann-distributed ensembles. The types of random perturbation attempts possible constitute the move set, and the resultant microstate transitions are usually very different from those observed in molecular dynamics (MD). In dynamics, all unconstrained degrees of freedom evolve simultaneously (high correlation), but in small increments (low effective step size). In Monte Carlo, one or few degrees of freedom evolve at a given time, but in step sizes of varying amplitudes. It is not required that individual degrees of freedom are all sampled with equal weight (nor would it be clear how to establish this). The effective sampling weight is determined by three components:
 The overall picking frequencies for move types (e.g., OTHERFREQ) are implemented by CAMPARI through a binary decision tree invoked at each step of the MC simulation. This means that the decisions taken at the root will influence the actual number of attempted moves of types chosen further up the tree, and that it may be complicated to calculate the expected numbers of attempts for those moves. This is why formulas are provided. Some totals (attempted and accepted moves) are reported in the log output at the end.
 The organizational unit for a move is often a residue, but not all residues possess equal numbers of degrees of freedom. For instance, sidechain moves have a variable number of degrees of freedom they sample (→ NRCHI), but the actual numbers per degree of freedom will not be uniformly distributed since different residues may have different numbers of χ-angles.
 Sampling weights can be adjusted explicitly with the help of the preferential sampling utility.
Because elementary Monte Carlo moves change only a few degrees of freedom at a time, the algorithms should be (and usually are) smart enough to only consider the incremental energy change associated with a move. The energy complexity of moves differs by type (see reference for details). The technical complexity with regards to applying the random perturbation also differs by type and sometimes behaves antagonistically to energy complexity. Taken together, these characteristics mean that it is challenging to parallelize a Monte Carlo sampler efficiently. CAMPARI generally tries to parallelize both coordinate operations and incremental energy calculations and strives to achieve load balance by explicitly estimating and then splitting the overall load for each significant task. This involves specialized routines for special move types. Because of the heterogeneity of MC move sets (and the controls offered over them), it is recommended to always perform a quick scaling check when using CAMPARI's shared memory (OpenMP) parallelization.
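For reference, the Metropolis criterion mentioned in the preamble, applied to the incremental energy change ΔE of a move, amounts to the following small sketch (illustrative names; β is the inverse temperature 1/kT):

```python
import math
import random

def metropolis_accept(delta_E, beta, rng=random.random):
    """Metropolis criterion on the incremental energy ΔE of a move:
    accept downhill moves always, uphill moves with probability
    exp(-β·ΔE). A chain built from such decisions samples the
    Boltzmann distribution."""
    return delta_E <= 0.0 or rng() < math.exp(-beta * delta_E)
```

Because only ΔE enters, the criterion pairs naturally with the incremental energy evaluations discussed above: the full system energy never needs to be recomputed for a single-move decision.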
PARTICLEFLUCFREQ
This keyword is relevant only when ENSEMBLE is set to either 5 or 6, i.e., those ensembles which allow the numbers of particles to fluctuate. In this case, the keyword defines the fraction of all moves that attempt to sample the particle number dimension of the thermodynamic state of the system. For the semigrand ensemble, this corresponds to attempting to transmute one particle type into another while preserving the position of the target particle. For the grand ensemble, it will with 50% probability try to insert a particle of permissible type in a random location in the simulation container and with 50% probability attempt to delete a permissible particle. These moves are applied at the molecule level and are most closely related to rigid-body moves in terms of complexity (→ RIGIDFREQ). Technically, the GC ensemble is supported in CAMPARI by maintaining a set of ghost particles for each fluctuating type, which work as "stand-ins". This framework entails certain limitations, which are detailed elsewhere.
Expected numbers of such moves overall are calculated trivially as:
NRSTEPS · PARTICLEFLUCFREQ
Note that the default picking probabilities are such that every molecule type allowed to fluctuate in numbers receives equal weight. In the case of particle permutation moves, which are implemented as joint insertion/deletion, there is no way to adjust these. This is because the implementation mandates the molecule types to be different; adjusted weights would require additional corrections in the acceptance probability that would cancel out the preferential sampling weights. For the independent insertion and deletion available in the grand ensemble, the preferential sampling utility allows the user to at least adjust the picking probabilities on a per-type basis. This can be relevant, for example, in electrolyte mixtures with disparate target concentrations (and correspondingly disparate bath particle numbers), for which it would make sense to preferentially insert and delete those particle types with overall larger numbers. Such an adjustment would also bring the sampling weights in line with the default picking probabilities for rigid-body moves, which are flat on a per-molecule basis.
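The type-weighted picking described above can be sketched as follows; this is a minimal illustration of the idea, not CAMPARI code, and the function name and example weights are assumptions:

```python
import random

def pick_gc_move(type_weights, rng):
    """Pick a fluctuating molecule type with probability proportional to
    its weight, then choose insertion or deletion with equal probability.
    (Hypothetical helper; names and weights are illustrative only.)"""
    types, weights = zip(*sorted(type_weights.items()))
    mol_type = rng.choices(types, weights=weights, k=1)[0]
    move = "insert" if rng.random() < 0.5 else "delete"
    return mol_type, move

# Disparate target concentrations: weight each type by its bath particle number
bath_numbers = {"Na+": 150, "Cl-": 150, "Mg2+": 10}
```

With such weights, the abundant species are inserted and deleted far more often than the dilute one, mirroring the adjustment suggested above.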
RIGIDFREQ
This keyword specifies what fraction of all remaining moves (i.e., 1.0 − PARTICLEFLUCFREQ) is to perturb rigid-body degrees of freedom. This encompasses translations and rotations of individual molecules as well as of groups of molecules (the latter are only available in case rotation and translation are coupled → COUPLERIGID). The default picking probabilities are even for all molecules regardless of type, size, or other properties. They can be adjusted via the preferential sampling utility, and this may be relevant in dense or semidilute systems with different molecule types of vastly different size (e.g., proteins and inorganic ions). In such a case, the acceptance rates for the macromolecules will be noticeably smaller, and this could be compensated for by sampling them preferentially.
COUPLERIGID
This keyword is a simple logical deciding whether or not to couple translational and rotational rigid-body moves for single molecules. Like any type of move coupling, this means that up to six independent perturbations of individual degrees of freedom are employed (translation in x, y, z; rotation around three axes) before energies and the acceptance criterion are evaluated. Note that molecules with no rotational degrees of freedom will have their moves counted as pure translation moves in the log output.
ROTFREQ
This keyword can be used to set the subfrequency for purely rotational moves if uncoupled moves are used (→ COUPLERIGID is false). It will then determine the fraction of those rigid-body moves that are purely rotational. The total number of purely rotational moves will be:
NRSTEPS · (1.0 − PARTICLEFLUCFREQ) · RIGIDFREQ · ROTFREQ
And the total number of purely translational moves will be:
NRSTEPS · (1.0 − PARTICLEFLUCFREQ) · RIGIDFREQ · (1.0 − ROTFREQ)
Note that the above formulas do not account for the choice between randomizing and stepwise perturbations (→ RIGIDRDFREQ), which would introduce an additional factor into the above product.
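As a sanity check, the products above can be evaluated directly. A minimal sketch (the function name is illustrative, not part of CAMPARI):

```python
def expected_rigid_moves(nrsteps, particleflucfreq, rigidfreq, rotfreq):
    """Expected numbers of purely rotational and purely translational
    rigid-body moves for uncoupled moves (COUPLERIGID false), following
    the products given in the documentation above."""
    rigid = nrsteps * (1.0 - particleflucfreq) * rigidfreq
    return {"rotation": rigid * rotfreq,
            "translation": rigid * (1.0 - rotfreq)}
```

For example, with NRSTEPS = 100000, PARTICLEFLUCFREQ = 0.1, RIGIDFREQ = 0.3, and ROTFREQ = 0.4, one expects 10800 rotation and 16200 translation moves.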
RIGIDRDFREQ
This keyword sets a terminal choice in the selection tree that is common to many of the moves in CAMPARI (see similar keywords PIVOTRDFREQ, NUCRDFREQ, and so on). Amongst the available rigid-body moves (it applies to three separate branches: coupled single-molecule moves, coupled multiple-molecule moves, and decoupled single-molecule moves), the keyword chooses the fraction to completely randomize the underlying degrees of freedom. For example, the complete randomization of translational degrees of freedom would displace the molecule's reference center to an arbitrary point in the simulation container. The remaining fraction will correspond to stepwise perturbations in which a usually small random increment is added to the degrees of freedom in question. For example, such a move would displace a molecule's reference center by a random vector small in absolute magnitude. As an example, consider single-molecule translation moves. The total number of expected randomizing translation moves would be (assuming COUPLERIGID is false):
NRSTEPS · (1.0 − PARTICLEFLUCFREQ) · RIGIDFREQ · (1.0 − ROTFREQ) · RIGIDRDFREQ
And the number of stepwise translation moves would be:
NRSTEPS · (1.0 − PARTICLEFLUCFREQ) · RIGIDFREQ · (1.0 − ROTFREQ) · (1.0 − RIGIDRDFREQ)
The same modifications apply to any other branch of rigid-body moves as explained above. As an additional complication, the decision about randomization vs. stepwise perturbations is itself decoupled in coupled rigid-body moves. Also note that the log output does not distinguish between the stepwise and randomizing varieties for any move type.
Randomizing rigid-body translations have one peculiarity. Unless a spatial dimension is periodic (see BOUNDARY), the absolute coordinates in this dimension have no strict bounds, which means that a fully random prior would have to extend to infinity. If the restraining force of the present interacting boundaries is not infinitely strong, a consistent bias therefore arises if particles are regularly placed randomly in a volume confined exactly to the formal definition of system size (the regions extending beyond this formal size become undersampled). This bias decreases with increasing restraining force. It is also less noticeable whenever the boundary potential does not act on just a single atom in the molecule. It acts on just a single atom for monoatomic "molecules" with an atom-based boundary condition and for single-residue molecules with a residue-based boundary condition (see BOUNDARY). In other cases, the bias is less apparent because the translation is applied to the molecule's geometric center, which in general is more likely to reside in the formal volume than the outermost atoms/residues. The same reason causes cluster rigid-body moves to be affected less as well. To avoid this type of bias, keyword RIGIDRDBUF can be used.
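A buffered randomization of the kind RIGIDRDBUF enables can be sketched as follows for a rectangular box; this is a simplified illustration under assumed conventions (centered box, names invented here), not CAMPARI's implementation:

```python
import random

def randomize_center(box_lengths, is_periodic, rigidrdbuf, rng):
    """Draw a uniformly random reference center in a rectangular box.
    Non-periodic ("closed") dimensions sample a container whose side
    length is scaled by RIGIDRDBUF (>= 1.0); periodic dimensions keep
    the formal size. Illustrative sketch only."""
    assert rigidrdbuf >= 1.0
    center = []
    for length, periodic in zip(box_lengths, is_periodic):
        scale = 1.0 if periodic else rigidrdbuf
        half = 0.5 * scale * length
        center.append(rng.uniform(-half, half))
    return center
```

Points drawn outside the formal volume are then handled by the boundary potential in the acceptance step, which is what masks the undersampling bias described above.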
ROTSTEPSZ
For any stepwise perturbation of rotational rigid-body degrees of freedom, this keyword sets the maximum step size in degrees. It is implemented such that the actual step size is drawn with uniform probability from an interval from 0.0° to ROTSTEPSZ°.
TRANSSTEPSZ
For any stepwise perturbation of translational rigid-body degrees of freedom, this keyword sets the maximum step size in Å. Analogous to ROTSTEPSZ, it is implemented such that the actual step size is drawn with uniform probability from an interval from 0.0 to TRANSSTEPSZ Å.
RIGIDRDBUF
For any full randomization attempt in a rigid translation move in the presence of an explicit boundary potential acting on at least one of the coordinates, this keyword sets the ratio between the sampled dimensions and the formal size of the simulation container. Specifically, this means that the point to be placed, which is the geometric center of a molecule or a group of molecules, is sampled from a uniform distribution inside a container with dimensions that are RIGIDRDBUF times larger than the formal size specified. For a rectangular box these are the side lengths, for a sphere it is the radius, and for a cylinder it is the height and the radius. In all cases, the scale factor applies only to those dimensions that are not periodic, i.e., that are "closed" by an explicit boundary potential of controllable strength. The factor is applied uniformly to all of these eligible dimensions. The default value is 1.0, and only values greater than or equal to 1.0 are allowed. The computational efficiency of these move types decreases with increasing RIGIDRDBUF because an increasing number of moves will be rejected on account of the boundary potential. However, a value of 1.0 introduces systematic biases. Essentially, the full randomization move is then non-ergodic and will lead to a systematic underestimation of occupancy probabilities for positions with finite values of the boundary potential. This effect is strongest for the displacement of single-atom molecules by single-molecule rigid translation moves. Appropriate values for RIGIDRDBUF are those that effectively mask the bias. These appropriate values depend on boundary force, temperature, formal size, and system composition (see above for details regarding the last point).
CLURBFREQ
This keyword sets the fraction of all available coupled rigid-body moves that simultaneously perturb the rigid-body degrees of freedom of more than one molecule in concerted fashion. In other words, these moves allow the concerted translation (by the same vector) and rotation (around the "cluster" center of mass) of several molecules in one shot. The expected number of multi-molecule moves would be (assuming COUPLERIGID is true):
NRSTEPS · (1.0 − PARTICLEFLUCFREQ) · RIGIDFREQ · CLURBFREQ
And that of coupled singlemolecule moves would be:
NRSTEPS · (1.0 − PARTICLEFLUCFREQ) · RIGIDFREQ · (1.0 − CLURBFREQ)
Currently, the picking of the molecules in a "cluster" is completely random. Note that cluster moves can easily become tricky: in periodic boundary conditions, the nearest image and hence the internal structure of the cluster may actually change upon rotation of a cluster, whereas in droplet boundary conditions rotations and translations of clusters formed by distal molecules may incur significant boundary penalties and hence be inefficient overall. Like all other rigid-body moves, cluster moves can be stepwise or completely randomizing (still in concerted fashion). This is all regulated by the previously introduced keywords RIGIDRDFREQ, ROTSTEPSZ, and TRANSSTEPSZ. The picking frequencies are regulated at the molecule level. With the preferential sampling utility, it is possible to alter the picking weights on a per-molecule basis. Note that this should yield either zero or reasonably large weights for all molecules, because the weights combine in a product sense during the picking process. This also means that it is tedious to compute the expected sampling probabilities for all possible "clusters" of molecules of sizes 2 to the maximum value. These moves are likely to be replaced by a more efficient variant in the future. In part because of this, they are (currently) unavailable when using CAMPARI's shared memory (OpenMP) parallelization.
CLURBMAX
This keyword sets the maximum "cluster" size for concerted multi-molecule rigid-body moves (see CLURBFREQ). The assignment is completely random at any given step such that detailed balance is maintained. Note that the number of possible "clusters" grows as binomial coefficients with increasing size of the cluster until CLURBMAX reaches half the number of molecules in the system. It is important to point out that picking values close to the number of molecules can cause search problems that CAMPARI actively avoids. Specifically, if the total sampling weight of available molecules remaining is less than 10%, a new molecule has not been found to add to the "cluster" in 100 tries, and the current size is at least 2, then the value picked initially for CLURBMAX is decreased to the current size. This avoids the code spending an excessive amount of time in an inefficient search procedure. The control on total sampling weight is particularly relevant for cases where the picking weights have been altered on account of the preferential sampling utility.
ALIGN
This keyword is an integer indicating how to handle the fact that lever arm effects can be asymmetric in multi-molecular simulations. A brief explanation is in order. Consider a macromolecule with multiple dihedral angles along the backbone. A perturbation of an individual one of those dihedral angles may then be realized in two basic ways, corresponding to the two building directions of the (unbranched) main chain. Either one of the ends will swivel around (lever arm) while the other remains fixed in place. In a simulation with just a single molecule, the new conformations for either type will be identical except for an implied rotation of the reference frame. In a simulation with multiple molecules, however, the two conformations will be explicitly different since the other molecules define the now static reference frame. In general, moves with longer lever arms have lower acceptance rates, are slower to evaluate, and should generally be avoided. For MC, this affects polypeptide pivot moves (coupled and uncoupled (see COUPLE)), ω-moves (see OMEGAFREQ), Favrin et al. inexact CR moves (see CRMODE), pivot-type nucleic acid moves (see NRNUC), sugar pucker moves (see SUGARFREQ), and polypeptide cyclic residue pucker moves (see PKRFREQ). It affects single torsion pivot moves (see OTHERFREQ) in a slightly different manner, and this is described there. It is also relevant for torsional dynamics, for which it in similar vein determines the assumed building direction for the chains. Options are as follows:
 Always leave the N-terminus unperturbed (C-terminus swings around).
 Always leave the C-terminus unperturbed (N-terminus swings around). This is only recommended in special applications since C-terminal alignment requires the whole molecule to be rotated around, which makes this mode more expensive but analogously asymmetric when compared to mode 1.
 Always leave the longer end unperturbed (the shorter lever arm is chosen). This is the default (and a good) choice, as it should be the most efficient one for simulations with multiple chains of significant length. It is also the recommended setting for torsional dynamics, in which the kinetics at one of the termini will otherwise be artificially slowed (note that in dynamics the criterion determining lever arm length uses the number of atoms rotated rather than the number of residues).
 A stochastic modification of mode 3 only available in MC: the probability with which the longer end swivels around is equal to:
p_{lt} = (L_{st} + 1) / (L_{st} + L_{lt} + 2)
And conversely:
p_{st} = (L_{lt} + 1) / (L_{st} + L_{lt} + 2)
Here, L_{st} is the smaller number of residues beyond the pivot point towards the nearer terminus and L_{lt} is the larger number of residues beyond the pivot point towards the more distant terminus such that L_{st}+L_{lt}+1 yields the total number of residues in the molecule. For example, a molecule with six residues would yield probabilities for doing C-terminal alignment (the N-terminus swings around) of 6/7 for residue 1, 5/7 for residue 2, and so on down to 1/7 for residue 6.
This choice represents the most flexible move set and should normally be preferred in MC when sampling problems are encountered.
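The formulas for the stochastic mode can be evaluated directly; the following sketch (function name and residue-numbering convention are assumptions made here) reproduces the six-residue example from the text:

```python
from fractions import Fraction

def align_mode4_probs(n_residues, pivot_residue):
    """Probabilities that the longer vs. the shorter end swivels around
    in the stochastic ALIGN mode, following p_lt = (L_st + 1)/(L_st + L_lt + 2)
    and p_st = (L_lt + 1)/(L_st + L_lt + 2). Residues are numbered
    1..n_residues (illustrative helper)."""
    left = pivot_residue - 1            # residues beyond the pivot, N side
    right = n_residues - pivot_residue  # residues beyond the pivot, C side
    l_st, l_lt = min(left, right), max(left, right)
    denom = l_st + l_lt + 2
    p_longer = Fraction(l_st + 1, denom)
    p_shorter = Fraction(l_lt + 1, denom)
    return p_longer, p_shorter
```

For residue 1 of a six-residue chain, the shorter (N-terminal) end swings around with probability 6/7, matching the example above; the two probabilities always sum to one.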
COUPLE
If this keyword is set to 1 (logical true), all polypeptide pivot moves are coupled to sidechain moves on the same residue (→ PIVOTMODE). This means that new conformations for the φ- and ψ-angles as well as for some of the sidechain χ-angles (if any) are proposed before the energy and acceptance criterion are evaluated. Like any other unbiased move perturbing multiple degrees of freedom, this procedure drastically increases the chance of generating an unacceptable conformation (assuming a typical excluded-volume interaction potential is used). Consequently, acceptance rates will be very low, and it is generally not recommended to use this option. Note that it is still possible to use independent sidechain moves but that it is impossible to do independent pivot moves for residues with sidechains. In other words, all frequency settings are used as normal, but all standard polypeptide pivot moves (the default move type of the decision tree) are coupled to a mandatory sidechain move (of a (sub)set of sidechain angles in that residue). Keywords PIVOTRDFREQ, PIVOTSTEPSZ, CHIRDFREQ, CHISTEPSZ, and NRCHI are all observed as described in the respective parts of the underlying base move types. The expected number of those coupled moves would be:
NRSTEPS · (1.0 − PARTICLEFLUCFREQ) · (1.0 − RIGIDFREQ) · (1.0 − CHIFREQ) · (1.0 − CRFREQ) · (1.0 − OMEGAFREQ) · (1.0 − NUCFREQ) · (1.0 − PKRFREQ) · (1.0 − OTHERFREQ)
Note that the same formula applies to uncoupled polypeptide pivot moves.
PIVOTMODE
Polypeptide pivot moves are historically the oldest move type in CAMPARI. Therefore, they are placed at the outermost branch of the move selection tree and possess no frequency selection keyword. In general, pivot moves simultaneously sample the φ- and ψ-angles of a single polypeptide residue unless the residue is ring-constrained (such as proline or hydroxyproline), in which case only the unconstrained degree of freedom (ψ for proline) is sampled. See PKRFREQ for "pivot" moves which sample the φ-angles of proline and analogous residues. The default picking probabilities for polypeptide pivot moves are even for all residues with peptide φ/ψ-angles. They can be adjusted with the help of the preferential sampling utility. An example where this can be useful is in reducing the picking weight of proline and similar residues, for which the number of degrees of freedom is smaller. Mostly for historical reasons, this keyword allows the selection of different modes for pivot moves as follows:
 Blind backbone sampling, i.e., all angles have equal likelihood (unbiased and the default)
 Using grids (requires GRIDDIR), i.e., angle pairs are sampled from within an approximate envelope derived from the space available to the corresponding dipeptide if one assumes typical excluded volume interactions (biased).
PIVOTRDFREQ
Much like for other move types, CAMPARI allows the user to mix two types of polypeptide pivot moves: the first randomizing the φ- and ψ-angles of the residue in question (for proline only the ψ-angle, for coupled moves also the sidechain χ-angles → COUPLE), the second perturbing them by a small increment whose size is set by the auxiliary keyword PIVOTSTEPSZ. Note that randomizing moves may be extremely ineffective for the sampling of dense phases (collapsed states of macromolecules) and that the only accepted moves will be those realizing small displacements by chance. To calculate the expected number of randomizing and stepwise polypeptide pivot moves, the user may employ the formula listed under COUPLE and multiply it with PIVOTRDFREQ and 1.0 − PIVOTRDFREQ, respectively.
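The two variants can be sketched as follows, assuming the symmetric-interval convention described for PIVOTSTEPSZ and a wrap into [−180°, 180°); all names here are invented for illustration:

```python
import random

def propose_pivot(phi, psi, pivotrdfreq, pivotstepsz, rng):
    """With probability PIVOTRDFREQ fully randomize both angles; otherwise
    add a uniform displacement drawn from the symmetric interval
    [-PIVOTSTEPSZ/2, +PIVOTSTEPSZ/2] degrees and wrap the result into
    [-180, 180). Illustrative sketch only."""
    def wrap(x):
        return (x + 180.0) % 360.0 - 180.0
    if rng.random() < pivotrdfreq:
        return rng.uniform(-180.0, 180.0), rng.uniform(-180.0, 180.0)
    half = 0.5 * pivotstepsz
    return (wrap(phi + rng.uniform(-half, half)),
            wrap(psi + rng.uniform(-half, half)))
```

The wrap keeps stepwise proposals near the current state, which is why only the randomizing branch can hop between distant basins in a single move.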
PIVOTSTEPSZ
This keyword sets the step size in degrees for local perturbation attempts to the φ- and ψ-angles of polypeptide residues (see PIVOTRDFREQ). Note that this step size encompasses the entire symmetric interval around the original position, i.e., a value of 10° will attempt uniformly distributed random displacements within the interval of −5° to 5°.
GRDWINDOW
This keyword sets a parameter for the external input files that are used to assist conformational space sampling in biased fashion when PIVOTMODE is set to 2. In that case, GRDWINDOW needs to specify half the bin size for the steric grids (see GRIDDIR). The files are supplied in the data directory, and the default value to be used here would be 5.0°. Note that grid-assisted sampling is not a fully supported option in CAMPARI and may be removed entirely in the future.
OMEGAFREQ
In polypeptides, the dihedral angle along the actual peptide bond (ω) is different from the φ- and ψ-angles since the carbon and nitrogen atoms have partial sp^{2} character. This inhibits free rotation around the bond due to electronic effects and means that only a very narrow range of conformations is typically available to the ω-angle. The two dominant states are the planar cis and trans conformations, with the latter being almost exclusively seen for non-proline residues and both contributing for proline. In molecular mechanics force fields, these effects are typically represented via strong torsional potentials (see SC_BONDED_T and SC_EXTRA). From a sampling point of view, this means that it would be unwise to couple the sampling of such a stiff degree of freedom to any other degree of freedom. ω-moves therefore perturb nothing but the ω-angle of an individual polypeptide residue. They are technically equivalent to pivot moves in that the "free" end will swivel around, additionally lowering the acceptance rates if the perturbations are large (→ ALIGN). To calculate the number of expected ω-moves use:
NRSTEPS · (1.0 − PARTICLEFLUCFREQ) · (1.0 − RIGIDFREQ) · (1.0 − CHIFREQ) · (1.0 − CRFREQ) · OMEGAFREQ
Note that the moves are additionally split up into those attempting to completely randomize the ω-angle and those that attempt stepwise perturbations (→ OMEGARDFREQ). It should be emphasized that the randomizing move will typically be the only way of converting between cis and trans conformations due to the height of the barrier separating the two. The default picking probabilities are identical for all residues with ω-type bonds. They can be adjusted with the help of the preferential sampling utility, and such an adjustment could be useful in mixed systems with small molecule amides and polypeptides, where it may be beneficial to preferentially sample the polypeptide ω-bonds.
OMEGARDFREQ
This keyword is completely analogous to PIVOTRDFREQ but applies to ω-moves instead of φ/ψ-moves.
OMEGASTEPSZ
This keyword is completely analogous to PIVOTSTEPSZ but applies to ω-moves instead of φ/ψ-moves.
PKRFREQ
Of the fraction of all pivot-type polypeptide backbone moves, what is the fraction of backbone moves that selectively alter the dihedral angles around the N-C_{α} bond in proline or similar residues? These rotations are hindered by the presence of the ring and hence cannot be sampled independently. Moves of this type therefore alter the pucker state of the amino acid sidechain belonging to the chosen residue and the backbone conformation of the polypeptide (pivot-type move) simultaneously. These moves are analogous to sugar pucker moves for polynucleotides (see SUGARFREQ). The expected number of polypeptide pucker moves would be:
NRSTEPS · (1.0 − PARTICLEFLUCFREQ) · (1.0 − RIGIDFREQ) · (1.0 − CHIFREQ) · (1.0 − CRFREQ) · (1.0 − OMEGAFREQ) · (1.0 − NUCFREQ) · PKRFREQ
Note that these moves are split up into two variants: a non-ergodic one, which inverts the pucker state, and one which introduces new degrees of freedom (bond angles) but allows sampling of most of the relevant phase space (bond length changes remain quenched). This is determined by PKRRDFREQ. When analyzing high-resolution structural databases, it can be seen that proline residues occupy two dominant pucker states separated by a barrier. The non-ergodic move can jump across this barrier but is unable to explore the basin around its current position. The latter requires bond angle changes as otherwise the problem is overconstrained. This introduction of new degrees of freedom is generally undesirable (see discussion under ANGCRFREQ) but in this particular case of small impact since none of the bond angles along the main chain are allowed to change. This keeps the effects of bond angle changes local while allowing exploration of the continuous manifold of conformations of the five-membered ring.
The exact set of degrees of freedom used to sample the ergodic move type is explained in detail elsewhere, and an implementation reference is given in the literature. The default picking probabilities for this move type are flat for all polypeptide residues possessing ring pucker degrees of freedom. The probabilities can be adjusted by the preferential sampling utility, and this could be used to fine-tune sampling weights in polymers. For example, puckering equilibria for central residues in polyproline are expected to be both more relevant and more difficult to sample than those for terminal residues and may benefit from being sampled preferentially. Finally, like sugar pucker moves, these moves apply no parallelization to the closure problem for the ring when CAMPARI's shared memory (OpenMP) parallelization is in use, which is a limitation.
PKRRDFREQ
As pointed out above, finding arbitrary conformations of a five-membered ring while keeping all bond lengths and angles constant is an overconstrained problem (→ PKRFREQ). Therefore, CAMPARI releases the constraint on bond angle rigidity for those systems which include proline and similar polypeptide residues. This necessitates the use of bond angle potentials (see SC_BONDED_A) to keep local geometries reasonable. To sample different ring conformers effectively, CAMPARI uses a strategy of combining a non-ergodic reflection of the pucker state (non-local) with stepwise but unbiased excursions away from the current state. This keyword regulates the fraction of pucker moves to be of the former type (reflection). The formulas listed under PKRFREQ multiplied with PKRRDFREQ and (1.0 − PKRRDFREQ), respectively, would give the expected numbers for either type. Note that it typically is not a good idea to set this to either zero or unity. A value of unity would create an effective two-state model (with fixed bond angles), while a value of zero would make it very difficult for the gross pucker state to switch due to the barrier separating the two (this last statement assumes typical interaction potentials).
PUCKERSTEP_DI
This keyword applies to the second type of pucker sampling (see PKRRDFREQ) and controls the maximum step size in degrees for dihedral angles during the random stepwise excursions from the current state. It simultaneously applies to the problem of sugar pucker sampling (→ SUGARFREQ). In both cases, four of the seven freely sampled degrees of freedom are dihedral angles.
PUCKERSTEP_AN
This keyword applies to the second (stepwise) type of pucker sampling (see PKRRDFREQ) and controls the maximum step size in degrees for bond angles during the random stepwise excursions from the current state. Much like PUCKERSTEP_DI, this keyword simultaneously applies to the problem of sugar pucker sampling (→ SUGARFREQ). In both cases, two of the seven freely sampled degrees of freedom are bond angles, and one bond angle is derived to correctly close the loop.
NUCFREQ
This keyword controls the frequency of all types of polynucleotide moves except those sampling just sidechain degrees of freedom. This set includes algorithms to sample stretches of polynucleotides with end-constraints (concerted rotation → NUCCRFREQ), dedicated algorithms to sample the constrained dihedral angles around the sugar bond (→ SUGARFREQ), and simple polynucleotide backbone pivot moves. The description below applies only to the latter type, which does not possess a dedicated keyword but is the default fall-through choice for this branch of the decision tree. Non-terminal polynucleotides have six backbone degrees of freedom, one of which is not sampled by this type of move. Much like for proline, the rotation around the sugar bond is hindered, and a dedicated algorithm is needed to sample this dihedral angle (→ SUGARFREQ). An overview of the backbone degrees of freedom for terminal and non-terminal nucleotides can be gleaned from the description of sequence input. Nucleotide pivot moves are physically analogous to polypeptide φ/ψ-moves in that they sample the backbone of a single nucleotide residue. The new conformation will imply the rotation of a lever arm, which will render large-scale perturbations very unlikely to be accepted (→ ALIGN). Technically, these moves are implemented slightly differently in that the number of sampled degrees of freedom may vary (→ NRNUC). This is to make it possible to fine-tune sampling efficiency. As with any move coupling the sampling of independent degrees of freedom blindly, efficiency will typically be unacceptably low for more than two backbone dihedral angles given a realistic interaction potential and the complicated topology of polynucleotides. In the future, these moves are intended to cover any type of non-polypeptide polymer, and the flexible setup was implemented partially with that in mind.
Expected numbers for all polynucleotide pivot moves may be calculated as follows:
NRSTEPS · (1.0 − PARTICLEFLUCFREQ) · (1.0 − RIGIDFREQ) · (1.0 − CHIFREQ) · (1.0 − CRFREQ) · (1.0 − OMEGAFREQ) · NUCFREQ · (1.0 − NUCCRFREQ) · (1.0 − SUGARFREQ)
Remember that NUCFREQ does not control the fraction of polynucleotide pivot moves directly but only sets the expected number for all polynucleotide moves. Note that the moves are additionally split up into those attempting to completely randomize the nucleotide backbone angles and those that attempt stepwise perturbations (→ NUCRDFREQ). The default picking probabilities for these pivot moves are flat on a per-residue basis. They can be adjusted by the preferential sampling utility, and this could become routinely relevant in future applications in which other polymer types are subjected to pivot moves through this facility. In such a case, it would almost certainly be desirable to make the picking frequencies (at the very least) proportional to the number of backbone degrees of freedom in each residue, which may not necessarily be homogeneous.
NRNUC
This keyword allows the user to set the maximum number of nucleic acid backbone angles to be sampled within a polynucleotide pivot move. The dihedral angles will always come from the same residue. The implementation has the following features:
 Whenever NRNUC is equal to or larger than the number of backbone angles on a certain residue, all backbone angles on that residue will be sampled simultaneously.
 Whenever NRNUC is smaller than the number of backbone angles on a certain residue, on average NRNUC of the available angles should be sampled simultaneously. However, the actual average will be larger since at least one angle always has to be sampled (in other words, there is stochasticity in the number of angles chosen, and the asymmetry is introduced by the constraint to always have at least one angle in the set).
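One plausible selection scheme matching this description is sketched below; CAMPARI's actual implementation may differ, and all names here are invented for illustration:

```python
import random

def pick_nuc_angles(n_angles, nrnuc, rng):
    """If NRNUC covers all backbone angles of the residue, take them all.
    Otherwise include each angle independently with probability
    NRNUC / n_angles, but guarantee at least one angle in the set, which
    biases the average count slightly above NRNUC (purely illustrative)."""
    if nrnuc >= n_angles:
        return list(range(n_angles))
    picked = [i for i in range(n_angles) if rng.random() < nrnuc / n_angles]
    if not picked:
        picked = [rng.randrange(n_angles)]
    return picked
```

Averaging the set size over many trials with NRNUC smaller than the number of angles shows the slight upward bias described above.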
NUCRDFREQ
This keyword is completely analogous to PIVOTRDFREQ but applies to polynucleotide backbone pivot moves instead of φ/ψ-moves.
NUCSTEPSZ
This keyword is completely analogous to PIVOTSTEPSZ but applies to polynucleotide backbone pivot moves instead of φ/ψ-moves.
NUCCRFREQ
This keyword sets the fraction of exact nucleic acid concerted rotation (CR) moves amongst all nucleotide moves. Concerted rotation algorithms are provided both for polypeptides and polynucleotides and function generally analogously, although there are important implementation differences. Important general information for this type of move is provided elsewhere, along with parameters that apply to all variants of exact CR moves (such as UJCRBIAS, UJCRSTEPSZ, and UJCRWIDTH). The reader is referred to both the literature and the documentation on CR moves for polypeptides (→ CRFREQ and TORCRFREQ), in particular with regards to the interpretation of auxiliary keywords (NUCCRMIN and NUCCRMAX) and the handling of picking probabilities and their alteration by user-level constraints and preferential sampling weights. The general idea of a concerted rotation move is to sample a stretch of polymer without changing the absolute positions and relative orientation of the termini. Six degrees of freedom are required to solve this constrained problem. Note that for nucleic acid CR moves the rotation around the sugar bond (C4*-C3*) is always excluded from the algorithm (treated as a rigid segment). The order of angles is as follows:
 Any number of consecutive and permissible backbone dihedral angles immediately preceding nuc_bb_4 on residue i
 O5*-C5*-C4*-C3* (nuc_bb_4 on residue i)
 C4*-C3*-O3*-P (nuc_bb_5 on residue i)
 C3*-O3*-P-O5* (nuc_bb_1 on residue i+1)
 O3*-P-O5*-C5* (nuc_bb_2 on residue i+1)
 P-O5*-C5*-C4* (nuc_bb_3 on residue i+1)
 O5*-C5*-C4*-C3* (nuc_bb_4 on residue i+1)
The expected number of nucleic acid concerted rotation moves may be calculated as follows:
NRSTEPS · (1.0 − PARTICLEFLUCFREQ) · (1.0 − RIGIDFREQ) · (1.0 − CHIFREQ) · (1.0 − CRFREQ) · (1.0 − OMEGAFREQ) · NUCFREQ · NUCCRFREQ
The user is reminded again that some of the parameters required for this move type apply universally to all exact CR methods while some apply specifically to the nucleic acid variant. Finally, these and all other exact torsional concerted rotation moves are (currently) unavailable when using CAMPARI's shared memory (OpenMP) parallelization.
SUGARFREQ
This keyword sets the fraction of polynucleotide backbone moves that selectively alter the dihedral angles around the sugar bond (C4*-C3*) amongst all polynucleotide moves not of the CR variety. Exactly analogous to the case for proline and similar cyclic residues in polypeptides (→ PKRFREQ), these rotations are hindered by the presence of the ring and cannot be sampled blindly. Moves of this type will therefore alter the pucker state of the sugar belonging to the chosen nucleotide and the backbone conformation of the polynucleotide (including the lever arm) simultaneously. The expected number may be calculated as follows:
NRSTEPS · (1.0-PARTICLEFLUCFREQ) · (1.0-RIGIDFREQ) · (1.0-CHIFREQ) · (1.0-CRFREQ) · (1.0-OMEGAFREQ) · NUCFREQ · (1.0-NUCCRFREQ) · SUGARFREQ
The approach chosen to sample sugars is identical to the one for proline. There are two basic move types, one which inverts the pucker state by flipping the sign of two dihedral angles, and a second one which perturbs the bond angles and dihedral angles defining the 5-membered ring by small random increments while maintaining bond lengths exactly (→ SUGARRDFREQ). The default picking probabilities for this move type are even for all eligible, sugar-containing residues. They can be adjusted by the preferential sampling utility. An example application could be to preferentially sample sugars close to the binding interface of a well-defined protein-DNA complex rather than those in the rigid portion of the DNA. Finally, like polypeptide pucker moves, these moves do not parallelize the closure problem for the ring when CAMPARI's shared memory (OpenMP) parallelization is in use, which is a limitation.
SUGARRDFREQ
This keyword is exactly analogous to PKRRDFREQ but applies to sugar pucker moves in polynucleotides instead of to polypeptide pucker moves.
CHIFREQ
Most biologically relevant polymers possess at least minor branches off the main chain. These sidechains are typically short and usually encode the alphabet underlying, for instance, polypeptides and polynucleotides. From a technical point of view, such short branches are much easier to sample than the backbone of a polymer, since a change in conformation of the branch affects only the branch itself (lever arm effects are minimal, and the assumed direction is always from the main chain outward towards the end of the branch). Since the perturbation is local, energy evaluations are much less costly and acceptance rates generally higher. There is no need for advanced algorithms, and simple pivot-style moves resetting or perturbing the dihedral angles in such a sidechain branch are sufficient to explore phase space. This keyword sets the fraction of all sidechain moves, which include a specialized move type used for analysis only (→ PHFREQ). Expected numbers for actual sampling moves (denoted as χ-moves) are:
NRSTEPS · (1.0-PARTICLEFLUCFREQ) · (1.0-RIGIDFREQ) · CHIFREQ · (1.0-PHFREQ)
And for moves trying to determine the pK-values of ionizable polypeptide sidechains:
NRSTEPS · (1.0-PARTICLEFLUCFREQ) · (1.0-RIGIDFREQ) · CHIFREQ · PHFREQ
Note that the former are decomposed further into those randomizing the contributing degrees of freedom and those applying stepwise perturbations (→ CHIRDFREQ). The default picking probabilities for this move type give equal weight to all residues with at least one χ-angle, independent of the number of χ-angles. This can be adjusted by the preferential sampling utility, which, as an example, would allow making all residue picking probabilities directly proportional to the number of χ-angles for each residue.
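The difference between the default (equal-weight) picking and a per-χ-count adjustment can be sketched as follows (residue names and χ-counts are made up, not CAMPARI code or output):

```python
import random

# Hedged sketch (residue names and chi-counts are made up, not CAMPARI
# code): default picking gives equal weight to every residue with at
# least one chi-angle; a preferential-sampling-style adjustment could
# instead make the weights proportional to the number of chi-angles.
nchi = {"ALA": 0, "SER": 1, "LEU": 2, "LYS": 4}

eligible = [res for res, n in nchi.items() if n > 0]       # ALA excluded
default_weights = [1.0 for _ in eligible]                  # equal weights
proportional_weights = [float(nchi[res]) for res in eligible]

rng = random.Random(7)
pick = rng.choices(eligible, weights=proportional_weights, k=1)[0]
```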
CHIRDFREQ
This keyword is completely analogous to PIVOTRDFREQ but applies to χ-moves instead of φ/ψ-moves.
CHISTEPSZ
This keyword is completely analogous to PIVOTSTEPSZ but applies to χ-moves instead of φ/ψ-moves.
NRCHI
Many sidechains have different numbers of χ-angles, and the complexity of a move would depend on the number of such angles sampled concurrently. Therefore, this keyword allows the user to set the maximum number of χ-angles to be sampled within a sidechain move. The dihedral angles will always come from the same sidechain on the same residue. Analogously to NRNUC, the implementation has the following features:
 Whenever NRCHI is equal to or larger than the number of χ-angles on a certain residue, all χ-angles on that residue will be sampled simultaneously.
 Whenever NRCHI is smaller than the number of sidechain angles on a certain residue, on average NRCHI of the available angles should be sampled simultaneously. However, the actual average will be larger since always at least one angle has to be sampled (in other words, there is a stochasticity to the number of angles chosen, and the asymmetry is introduced by the constraint to always have at least one angle in the set).
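The effect of the at-least-one constraint on the realized average can be illustrated numerically. This sketch assumes a simple per-angle inclusion probability of NRCHI/n (an assumption for illustration, not CAMPARI's actual selection code):

```python
import random

# Hedged sketch of the described behavior (an assumed selection
# mechanism, not CAMPARI code): each of n_chi angles is included with
# probability NRCHI/n_chi, but the set must contain at least one angle,
# so the realized average slightly exceeds the nominal NRCHI.
def pick_chi_subset(n_chi, nrchi, rng):
    if nrchi >= n_chi:
        return list(range(n_chi))      # sample all angles simultaneously
    while True:
        subset = [i for i in range(n_chi) if rng.random() < nrchi / n_chi]
        if subset:                     # enforce the at-least-one constraint
            return subset

rng = random.Random(42)
sizes = [len(pick_chi_subset(4, 2, rng)) for _ in range(20000)]
avg = sum(sizes) / len(sizes)          # slightly above the nominal 2
```

Under this assumption, conditioning a Binomial(4, 0.5) draw on being nonzero raises its mean from 2 to 2/(1-0.5^4) ≈ 2.13, matching the caveat in the text.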
OTHERFREQ
MC move sets are highly specialized tools that have to reflect the choice of the system's degrees of freedom, its density, etc. Some of the choices enforced by the "standard" CAMPARI move sets and mandated by the default parameterization of the ABSINTH implicit solvent model are somewhat arbitrary. This is primarily an issue for degrees of freedom describing rotations around electronically hindered bonds and for rotations around terminal bonds between heavy atoms (methyl and ammonium spins). For example, the amide bond in secondary amides is allowed to vary with dedicated moves, but these are not available for primary amides (the reasoning behind this is connected to the vanishing relevance of cis/trans isomerization in the latter case). However, these choices may not always be desirable. Second, when attempting to simulate entities that CAMPARI does not support natively, the majority of "standard" move types may not be available (exceptions apply if the entities are recognized as conforming to a supported biopolymer type). This would limit simulations containing such entities to pure rigid-body sampling. To address both issues, CAMPARI offers a separate class of dihedral angle pivot moves that can be applied to any freely rotatable torsion angle in any of the system's components. There is a requirement that the Z-matrix be constructed such that only a single Z-matrix angle needs to be edited to describe the perturbation, and this is true for all candidate dihedral angles in residues supported natively by CAMPARI that are frozen by default (e.g., the C-N bond in the lysine sidechain, all C-N bonds in primary amides, the CA-CB bond in alanine, and so on). For unsupported residues, the Z-matrix is inferred from the input structure, and it may require some reordering of atoms to achieve the desired results (see a tutorial relevant in this context).
In addition, these moves can also sample torsional degrees of freedom supported by other move sets as long as they fulfill the Z-matrix criterion (this currently excludes the polypeptide φ/ψ-angles, which are supported by the widest range of specialized move sets).
In terms of parameters, some care has to be taken that torsional potentials describing electronic effects (e.g., in primary amides) are included. Technically, moves of this type are unique in that they always sample only a single degree of freedom. Chain alignment works slightly differently for these moves. Specifically, for options 3 and 4, the number of atoms (rather than the number of residues) moving is critical in determining alignment. Also, all degrees of freedom are eligible for an inverted alignment, including sidechain degrees of freedom. Even for option 3, this may consequently lead to the absence of a "base of motion" that would stay rigorously in place in the absence of rigid body moves. For option 2, CAMPARI attempts to preserve a well-defined base of motion at the C-terminus, but this may not work as expected, in particular for polynucleotides and/or very short chains.
To calculate the number of all expected moves of type OTHER, use:
NRSTEPS · (1.0-PARTICLEFLUCFREQ) · (1.0-RIGIDFREQ) · (1.0-CHIFREQ) · (1.0-CRFREQ) · (1.0-OMEGAFREQ) · (1.0-NUCFREQ) · (1.0-PKRFREQ) · OTHERFREQ
Note that these moves are additionally split up into three basic types (see OTHERUNKFREQ and OTHERNATFREQ for choosing different subsets of degrees of freedom), each of which is again split into two variants, i.e., those completely randomizing the dihedral angle and those that attempt stepwise perturbations (→ OTHERRDFREQ). The default picking probabilities for OTHER moves are different from other move types in CAMPARI, since they are identical for all eligible degrees of freedom (and not identical for all residues containing at least one eligible degree of freedom). For each subcategory of degrees of freedom, sampling weights can be adjusted individually with the preferential sampling utility. Details and examples are given for the individual subcategories.
OTHERUNKFREQ
If single dihedral angle pivot (OTHER) moves are in use, and if the simulation utilizes entities (residues, molecules) that are not natively supported by CAMPARI, this keyword allows the user to choose the bulk sampling weight for degrees of freedom in those unsupported residues. The use of unsupported residues in simulations is explained in a dedicated tutorial. To calculate the number of expected moves acting on single dihedral angles in unsupported residues, use:
NRSTEPS · (1.0-PARTICLEFLUCFREQ) · (1.0-RIGIDFREQ) · (1.0-CHIFREQ) · (1.0-CRFREQ) · (1.0-OMEGAFREQ) · (1.0-NUCFREQ) · (1.0-PKRFREQ) · OTHERFREQ · OTHERUNKFREQ
As mentioned above, these moves are additionally split up into two subtypes, i.e., those completely randomizing the dihedral angle and those that attempt stepwise perturbations (→ OTHERRDFREQ). The default picking probabilities for OTHER moves are different from other move types in CAMPARI, since they are identical for all eligible degrees of freedom (and not identical for all residues containing at least one eligible degree of freedom). They can be adjusted at the level of individual degrees of freedom by the preferential sampling utility. As an example, this can be useful when sampling an unsupported polymer (e.g., a polyester) and greater sampling emphasis should be placed on backbone degrees of freedom.
OTHERNATFREQ
If single dihedral angle pivot (OTHER) moves are in use, and if not all OTHER moves are consumed on unsupported residues (→ OTHERUNKFREQ), this keyword allows the user to choose the bulk sampling weight amongst remaining OTHER moves for degrees of freedom that are supported natively by CAMPARI. To calculate the number of expected moves acting on single dihedral angles natively supported, use:
NRSTEPS · (1.0-PARTICLEFLUCFREQ) · (1.0-RIGIDFREQ) · (1.0-CHIFREQ) · (1.0-CRFREQ) · (1.0-OMEGAFREQ) · (1.0-NUCFREQ) · (1.0-PKRFREQ) · OTHERFREQ · (1.0-OTHERUNKFREQ) · OTHERNATFREQ
This keyword also controls the fraction of moves acting on dihedral angles that are frozen by default but located in residues supported natively by CAMPARI. The expected number may be computed as:
NRSTEPS · (1.0-PARTICLEFLUCFREQ) · (1.0-RIGIDFREQ) · (1.0-CHIFREQ) · (1.0-CRFREQ) · (1.0-OMEGAFREQ) · (1.0-NUCFREQ) · (1.0-PKRFREQ) · OTHERFREQ · (1.0-OTHERUNKFREQ) · (1.0-OTHERNATFREQ)
Both subclasses are additionally split up into two subtypes, i.e., those completely randomizing the dihedral angle and those that attempt stepwise perturbations (→ OTHERRDFREQ). The default picking probabilities for OTHER moves are different from other move types in CAMPARI, since they are identical for all eligible degrees of freedom (and not identical for all residues containing at least one eligible degree of freedom). They can be adjusted at the level of individual degrees of freedom by the preferential sampling utility. For the natively supported degrees of freedom, this could be useful in order to aid sampling of backbone degrees of freedom, whereas for the natively frozen degrees of freedom it could be used to selectively enable a few of those degrees of freedom (e.g., enable flexibility of arginine sidechains, but keep suppressing the methyl spins in hydrophobic residues).
OTHERRDFREQ
This keyword is completely analogous to PIVOTRDFREQ but applies to all moves of type OTHER instead of polypeptide backbone pivot moves.
OTHERSTEPSZ
This keyword is completely analogous to PIVOTSTEPSZ but applies to all moves of type OTHER instead of polypeptide backbone pivot moves.
CRFREQ
This keyword is a global frequency setting which controls an entire branch of Monte Carlo moves, all sharing the feature that they are of the concerted rotation (CR) type and apply to polypeptides. The general idea of a CR move is to sample a stretch of polymer without changing the absolute positions and relative orientation of the termini. Six degrees of freedom are required to solve this constrained problem exactly, but simpler methods exist that use more degrees of freedom to solve it approximately (→ CRMODE). The reader is referred to NUCCRFREQ for CR moves on polynucleotides. There are four different types of CR moves for polypeptides provided in CAMPARI:
 Exact CR moves utilizing both bond angles and dihedral angles along the polypeptide backbone to solve the closure problem exactly given fixed end constraints: these moves are based on the work of Ulmschneider and Jorgensen (→ ANGCRFREQ). (reference)
 Exact CR moves utilizing φ-, ψ-, and ω-angles along the polypeptide backbone to solve the closure problem exactly given fixed end constraints: these moves are primarily based on the work of Dinner (→ TORCRFREQ and TORCROFREQ). (reference)
 Exact CR moves utilizing just φ- and ψ-angles along the polypeptide backbone to solve the closure problem exactly given fixed end constraints: these moves are also based on the work of Dinner (→ TORCRFREQ and TORCROFREQ).
 Inexact CR moves utilizing just φ- and ψ-angles along the polypeptide backbone to approximate a solution to the closure problem by linear response: these moves are based on the work of Favrin, Irbäck, and Sjunnesson (default fallthrough for this branch). (references)
The general appeal of exact CR methods partially lies in the reduced complexity of energy evaluations since the move only perturbs conformation locally and large parts of the polymer (assuming sufficient length) will remain static with respect to each other. This is never true for pivot-type moves applied to residues at the center of the chain. The other aspect which makes CR moves appealing is that they introduce correlation into the MC move set (the reader is referred to Vitalis and Pappu for further reading).
To compute expected numbers, use (same numbering as above):
 NRSTEPS · (1.0-PARTICLEFLUCFREQ) · (1.0-RIGIDFREQ) · (1.0-CHIFREQ) · CRFREQ · ANGCRFREQ
 NRSTEPS · (1.0-PARTICLEFLUCFREQ) · (1.0-RIGIDFREQ) · (1.0-CHIFREQ) · CRFREQ · (1.0-ANGCRFREQ) · TORCRFREQ · TORCROFREQ
 NRSTEPS · (1.0-PARTICLEFLUCFREQ) · (1.0-RIGIDFREQ) · (1.0-CHIFREQ) · CRFREQ · (1.0-ANGCRFREQ) · TORCRFREQ · (1.0-TORCROFREQ)
 NRSTEPS · (1.0-PARTICLEFLUCFREQ) · (1.0-RIGIDFREQ) · (1.0-CHIFREQ) · CRFREQ · (1.0-ANGCRFREQ) · (1.0-TORCRFREQ)
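A quick consistency check on the four formulas above: the four branch factors partition the total CR budget, i.e., they sum to exactly one for any frequencies in [0, 1]. A sketch with made-up frequency values (not CAMPARI code):

```python
# Hedged consistency sketch (made-up frequency values): the four branch
# factors from the formulas above partition the total CR budget, so they
# must sum to exactly 1 for any frequencies in [0, 1].
def cr_branch_fractions(angcrfreq, torcrfreq, torcrofreq):
    return [
        angcrfreq,                                           # 1. UJ (bond angle) CR
        (1.0 - angcrfreq) * torcrfreq * torcrofreq,          # 2. exact torsional, with omega
        (1.0 - angcrfreq) * torcrfreq * (1.0 - torcrofreq),  # 3. exact torsional, phi/psi only
        (1.0 - angcrfreq) * (1.0 - torcrfreq),               # 4. inexact (Favrin et al.)
    ]

fracs = cr_branch_fractions(0.25, 0.6, 0.5)
total = sum(fracs)   # == 1.0 up to rounding
```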
ANGCRFREQ
This keyword selects the (sub)fraction of Ulmschneider-Jorgensen (UJ) CR moves (see J. Chem. Phys. 118 (9), pp. 4261-4271 (2003)) according to the formulas shown above. Like any other exact CR move implemented in CAMPARI, UJ-CR moves combine two strategies for efficient conformational sampling: the approach of Favrin et al. (→ CRMODE) is used to obtain a variable length pre-rotation which biases the end of the pre-rotation segment to a position with a high chance of having at least one real solution when attempting to close it. The closure problem is solved exactly using a numerical root search for an algebraically transformed equation for the following six degrees of freedom:
 Dihedral angle C_{i-2}, N_{i-1}, C_{α,i-1}, C_{i-1} (φ_{i-1})
 Bond angle N_{i-1}, C_{α,i-1}, C_{i-1}
 Dihedral angle N_{i-1}, C_{α,i-1}, C_{i-1}, N_{i} (ψ_{i-1})
 Bond angle C_{α,i-1}, C_{i-1}, N_{i}
 Bond angle C_{i-1}, N_{i}, C_{α,i}
 Dihedral angle C_{i-1}, N_{i}, C_{α,i}, C_{i} (φ_{i})
 The chain closure algorithm relies on a search process to locate roots of a complicated equation, which necessitates repeated matrix operations and generates considerable computational overhead for a single UJ-CR move. This is true for all exact CR methods, even much more so for exact torsional variants than for UJ-CR moves (→ TORCRFREQ).
 The inclusion of bond angles into the pre-rotation stretch is not a particularly useful extension but is required for reasons of ergodicity. Additional parameters are needed to manage this aspect properly (→ UJCRSCANG). The inclusion of bond angles in the closure segments simplifies the root search procedure by eliminating branches of the solution space and generally reducing the number of possible solutions. This makes the algorithm faster than comparable methods using dihedral angles only. However, varying bond angles causes two crucial issues:
 Allowing bond angles to change violates CAMPARI's typical paradigm of fixed geometry in MC calculations and therefore might invalidate some of the force field calibration done under this assumption. In general, it is very important to match the degrees of freedom chosen for the calibration phase of a force field with those for the application phase. The commonly held belief that the introduction of constraints does not alter the positions and relative weights of basins but merely influences barriers in the free energy landscape is not correct.
 CAMPARI currently has no way of independently sampling bond angles in Monte Carlo simulations. This means that effectively a subset of all bond angles are introduced as new degrees of freedom, for which there is no a priori justification whatsoever (in other words: selectively sampling a few bond angles makes unjustified assumptions about the remaining bond angles). It is therefore recommended to use this feature with the utmost caution until a more sound implementation surrounding it is added. Presently, it may be most suitable as part of the MC move set in hybrid runs (see DYNAMICS) employing Cartesian sampling in the dynamics portions (see CARTINT), although this approach has its own caveats.
TORCRFREQ
Aside from the UJ-CR moves which employ bond angles (see ANGCRFREQ), analogous methods have been formulated that instead employ exclusively dihedral angles in both the closure and pre-rotation stretches. This keyword sets the frequency with which both subtypes of those moves occur during the simulation according to the formulas listed above. The preceding discussion has outlined the appeal of exact CR methods, and it is not repeated here. Much like Ulmschneider and Jorgensen, CAMPARI employs a hybrid scheme of biased pre-rotations according to Favrin et al. (see CRMODE) and of exact closures according to Dinner. The latter half of the algorithm is the cost-intensive one. The algebraically transformed equation requires a numerical root search, for which we use a modified Newton scheme outlined below. Typically, multiple solutions need to be found, and a careful weighting and bias-removal strategy has to be employed to choose solutions with the proper probabilities (→ TORCRMODE). Those comments apply equally to exact polynucleotide CR moves (see NUCCRFREQ). For polypeptides, there are two variants available which differ in which peptide torsions are used to close the chain (described below). Note that proline (or any other cyclic residue with constrained flexibility around any of the backbone dihedral angles) causes additional problems. In theory, one could formulate algebraic solutions which skip the proline φ-torsion. Since the number and positions of proline residues in the closure stretch are not known a priori, this appears impractical. We therefore provide a coupling to (weakly biased and simplified) pucker moves (see PKRFREQ) which will simultaneously determine and propose a new pucker state while solving the chain closure problem. This means that:
 Sampling of the φ-angle becomes coupled to the proline sidechain conformation (as it should be).
 The acceptance rate for CR moves will be significantly lower due to the extra degrees of freedom included.
 The sampling of the sidechain conformation will be weakly biased towards proper pucker states. In detail, some of the proposed closures will yield φ-angle values incompatible with sidechain closure, and those will be discarded. For those which yield a sane φ-angle, a corresponding χ_{1}-value is proposed with bias toward closable states. One of two free bond angles is perturbed slightly in random fashion, and the last one is given by the closure as usual.
 Due to the above, it will be advantageous to not rely overly on CR-sampling for proline-rich systems, both for reasons of efficiency and accuracy. Conversely, it should be difficult to find a statistically significant impact of the sampler on global chain properties for polypeptides with low proline content.
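The 1D root search at the heart of the closure step (a step-through protocol plus bisection, as discussed in this section and under UJCRSTEPSZ) can be sketched generically. The target function and interval below are arbitrary stand-ins, not the actual closure equation:

```python
import math

# Hedged generic sketch of a step-through-plus-bisection 1D root search,
# as described for the closure step. The target function (cos) and the
# interval are arbitrary stand-ins, not the actual closure equation.
def scan_and_bisect(f, lo, hi, max_step, tol=1e-10):
    roots, x, fx = [], lo, f(lo)
    while x < hi:
        xn = min(x + max_step, hi)   # never step further than max_step
        fn = f(xn)
        if fx * fn < 0.0:            # sign change: bisect down to a root
            a, b, fa = x, xn, fx
            while b - a > tol:
                m = 0.5 * (a + b)
                fm = f(m)
                if fa * fm <= 0.0:
                    b = m
                else:
                    a, fa = m, fm
            roots.append(0.5 * (a + b))
        x, fx = xn, fn
    return roots

# cos has exactly two roots in [0, 7]: pi/2 and 3*pi/2
roots = scan_and_bisect(math.cos, 0.0, 7.0, 0.05)
```

Note how a step size that is too large can jump over a pair of closely spaced sign changes, which mirrors the trade-off described for UJCRSTEPSZ.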
TORCROFREQ
This keyword lets the user set the fraction amongst exact, torsional polypeptide CR moves that include ω-angles in the formulation of the closure problem. Conversely, the remaining moves will use only φ/ψ-angles to close the chain. Expected numbers for either type are listed above. In detail, the ω-variant uses the following six degrees of freedom:
 Dihedral angle C_{α,i-2}, C_{i-2}, N_{i-1}, C_{α,i-1} (ω_{i-1})
 Dihedral angle C_{i-2}, N_{i-1}, C_{α,i-1}, C_{i-1} (φ_{i-1})
 Dihedral angle N_{i-1}, C_{α,i-1}, C_{i-1}, N_{i} (ψ_{i-1})
 Dihedral angle C_{α,i-1}, C_{i-1}, N_{i}, C_{α,i} (ω_{i})
 Dihedral angle C_{i-1}, N_{i}, C_{α,i}, C_{i} (φ_{i})
 Dihedral angle N_{i}, C_{α,i}, C_{i}, N_{i+1} (ψ_{i})
Conversely, for the non-ω-variant we have:
 Dihedral angle C_{i-3}, N_{i-2}, C_{α,i-2}, C_{i-2} (φ_{i-2})
 Dihedral angle N_{i-2}, C_{α,i-2}, C_{i-2}, N_{i-1} (ψ_{i-2})
 Dihedral angle C_{i-2}, N_{i-1}, C_{α,i-1}, C_{i-1} (φ_{i-1})
 Dihedral angle N_{i-1}, C_{α,i-1}, C_{i-1}, N_{i} (ψ_{i-1})
 Dihedral angle C_{i-1}, N_{i}, C_{α,i}, C_{i} (φ_{i})
 Dihedral angle N_{i}, C_{α,i}, C_{i}, N_{i+1} (ψ_{i})
The need for different implementations arises because, for one, the problems differ algebraically, and because the stiffness of the ω-bond may make those moves using the ω-bonds in the closure particularly ineffective. This is not the only reason, however, to favor the non-ω-variant, which is also better-behaved in terms of finding solutions to the closure reliably. Note that several diagnostics of the performance of exact CR methods are reported during the simulation and after its completion in the log file.
CRMODE
This defines the mode to use for concerted rotation moves roughly according to the Favrin et al. reference: J. Chem. Phys. 114 (18), 8154-8158 (2001). In general, this type of move attempts to introduce correlation into an MC move by coupling several consecutive backbone angles (only φ/ψ are considered) together to minimize a cost function, which in this case is the difference between the position of the last atom in the stretch and its original position. Larger biases lead to smaller moves and higher acceptance. More often than not, this algorithm suffers from its computational inefficiency. Because the loop is only approximately closed, energy evaluations of high complexity (even more expensive than for a pivot move) are necessary. It is not recommended to use moves of this type extensively. There are two modes available:
 A matrix relating changes in the degrees of freedom to changes in the cost function (dr/dφ) is computed by considering effective lever arms. In this implementation, six effective restraints are imposed through the three reference atoms (N, C_{α}, C) on the residue following the last one of those whose torsions are sampled (note, though, that algorithmically all nine Cartesian positions are used). Note that this mode therefore requires an additional buffer residue at the C-terminus. Specifically, sampling is possible only within an interval from the third residue (in addition to the ineligible terminal residues, there is a symmetry-creating N-terminal buffer residue as well) to the third-to-last residue in each polypeptide chain. In that sense, these moves are trivially non-ergodic since they fail to sample a subset of the chosen degrees of freedom (i.e., those within terminal residues).
 The dr/dφ matrix is computed by nested rotation matrices (propagating changes via matrix multiplication). This directly accounts for peptide geometry within the reference atoms and yields six actual restraints. Here, the reference atoms are C_{α}, C, and O on the last residue of which torsions are to be sampled. The implementation with nested rotation matrices is costlier and this mode is only marginally supported, i.e., offers very limited adaptability through the keywords below.
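The lever-arm construction underlying mode 1 can be illustrated with a minimal numpy sketch (not CAMPARI code; the axis and atom position are made up): to first order, rotating a downstream reference atom r about an axis with unit vector u through point p changes its position at the rate dr/dφ = u × (r − p), which we verify against a finite difference of an explicit Rodrigues rotation.

```python
import numpy as np

# Hedged illustration of the "effective lever arm" idea in mode 1 (not
# CAMPARI code; axis and atom position are made up): for a rotation about
# an axis with unit vector u through point p, the first-order change of a
# downstream reference atom at r is dr/dphi = u x (r - p). We check this
# against a central finite difference of an explicit Rodrigues rotation.
def rotate(r, p, u, phi):
    v = r - p
    return p + (v * np.cos(phi) + np.cross(u, v) * np.sin(phi)
                + u * np.dot(u, v) * (1.0 - np.cos(phi)))

p = np.array([0.0, 0.0, 0.0])     # point on the rotation axis
u = np.array([0.0, 0.0, 1.0])     # unit axis vector
r = np.array([1.5, -0.3, 0.8])    # downstream reference atom

lever = np.cross(u, r - p)        # analytic dr/dphi (lever arm)
h = 1e-6
fd = (rotate(r, p, u, h) - rotate(r, p, u, -h)) / (2.0 * h)
err = float(np.linalg.norm(lever - fd))
```

Stacking one such cross product per sampled torsion, evaluated at each reference atom, yields a dr/dφ matrix of the kind described for mode 1.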
CRDOF
If inexact concerted rotation moves for polypeptides are in use (→ CRMODE), this keyword allows the user to provide the exact number of torsions to use each time such a move is performed. The default value is eight, but a different number may be chosen as long as the chain is long enough to accommodate these moves. A minimum of seven degrees of freedom applies since the linear equations are otherwise overdetermined and only trivial solutions are (asymptotically) found. Note that this keyword is only supported if CRMODE is set to 1. Extensions of this to support mode 2 or to allow random, variable lengths during the simulations are currently not anticipated. This is due to the overall inefficiency of the Favrin et al. approach (see discussion here).
CRWIDTH
This keyword gives the standard deviation in radians of the random normal distribution underlying inexact concerted rotation moves for polypeptides (→ CRMODE), from which the (unbiased) displacement vectors are implicitly drawn. This corresponds to parameter "a" in the reference but is specified here as its inverse (a = 1/CRWIDTH). Note that the actual resultant distribution width is only set by this keyword if the bias toward minimizing the cost function is zero. If the latter is non-zero, the resultant distribution width will be co-controlled by the setting for CRBIAS. Note that only values up to π/2 may be specified to avoid wraparound artifacts which may upset the procedure of removing the bias from these moves.
CRBIAS
This keyword specifies the strength of the bias for inexact concerted rotation moves for polypeptides (→ CRMODE) and corresponds to parameter "b" in the reference. It essentially controls how close the end of the rotated segment will end up to its original position (satisfying the restraints). Unfortunately, this also co-regulates the step size, hence there is a need for parameter optimization (i.e., the variance of the resultant biased distribution cannot be controlled easily). Intuitively, the reason is that, in a linear response-type theory, tiny step sizes always represent one way of satisfying the restraint. Note that with a choice of zero for this keyword, these inexact CR moves relax to random pivot moves of multiple residues in a row (→ CRDOF) with a sampling width controlled by CRWIDTH. Conversely, when choosing very large numbers for this keyword, it should be kept in mind that the evaluation of the acceptance criterion requires inclusion of an exponential factor, exp[-(Δφ^{T} A Δφ) + (Δφ'^{T} A' Δφ')]. Here, the primed quantities are for the reverse move. Matrix A is diagonal if this keyword is set to zero, which implies A = A', and the bias correction is unity. For large values of CRBIAS, the two elements within the exponential become disparate in magnitude very quickly, and the exponential may exceed numerical limits even for double precision variables. This may cause some compilers to throw exceptions. Note that the complete bias correction formula includes the determinant of matrix A as well.
UJCRBIAS
Despite its name, this keyword regulates the biasing strength for the pre-rotation steps in all exact CR methods, i.e., nucleic acid CR moves, UJ-CR moves, and both types of exact polypeptide CR moves (→ ANGCRFREQ, TORCRFREQ, and NUCCRFREQ). The strength of the bias controls how close the end of the pre-rotation segment remains to its original position, hence improving the chances for successful closure. This parameter is strongly co-dependent with the default distribution width in the absence of any bias (→ UJCRWIDTH). It is analogous to CRBIAS in the Favrin et al. scheme and is called "c2" in the UJ reference. It should be stressed that all caveats outlined above apply here as well.
UJCRWIDTH
Despite its name, this keyword regulates the general width of the distribution (in degrees, in the absence of bias) sampled in the pre-rotation segment for all exact CR methods (→ ANGCRFREQ, TORCRFREQ, and NUCCRFREQ). As in the Favrin et al. scheme (which is practically embedded in all exact CR methods implemented in CAMPARI), the resultant width is co-dependent on the bias factor (see UJCRBIAS and, for comparison, CRBIAS and CRWIDTH). It corresponds to "1/c1" in the UJ reference, and therefore larger values give wider distributions.
UJCRSTEPSZ
The chain closure algorithm works in most exact CR implementations by reducing a multidimensional variable search to a 1D root search, which is then solved by some form of step-through protocol and subsequent bisection. This keyword allows the user to choose the step size for that root search in degrees for all exact CR methods. Currently, the UJ-CR method (→ ANGCRFREQ) uses a simple, non-adaptive stepping protocol (see also UJCRINTERVAL). Larger step sizes there increase the speed of the algorithm significantly, but also increase the fraction of attempts in which no solution is found at all (a quantity reported at the end of the log file). The value recommended by the authors is 0.05°. Conversely, the exact torsional CR methods for both polypeptides and polynucleotides (→ TORCRFREQ and NUCCRFREQ) employ a modified Newton scheme to map out the complete solution space in three hierarchical steps. In those cases, this keyword merely defines the largest step size to ever be used (i.e., if target function and derivative indicate that no root is near, the step size is not adjusted to very large values but instead to the value given by this keyword). For these methods, a setting of around 1.0 appears much more appropriate. In the future, the implementation of the UJ-CR method may be adjusted to use the same protocol as the torsional methods. For clarity, it shall be repeated that this keyword applies to all exact CR methods (but is inapplicable to inexact CR moves: → CRMODE). It is very important to understand that the numerical root search will invariably be unreliable, i.e., there are conformations for which the function may approach zero asymptotically while also approaching imaginary solution space. This implies that with such a technique it will be nearly impossible to eliminate all biases rigorously, although it will be possible to reduce their amplitude below that of statistical noise, even when the settings are such that satisfactory computational efficiency is provided (which of course is a crucial element to consider for expensive algorithms such as exact CR methods).
UJCRMIN
Specifically for the bond angle-based Ulmschneider-Jorgensen algorithm (→ ANGCRFREQ), this keyword specifies the minimum requested length (in terms of number of residues) for the pre-rotation segment in the implementation. Note that if no molecule in the system is at least UJCRMIN+4 residues long (two for closure, two terminal buffer residues that can be caps), CR moves will be disabled entirely. Due to the problems outlined above, this suboptimal implementation has not yet been improved. Note that UJCRMIN and UJCRMAX are analogous to keywords TORCRMIN_DO and TORCRMAX_DO, but use residue numbers instead of numbers of degrees of freedom. Another restriction is that, unlike for TORCRMIN_DO and analogous keywords, UJCRMIN is enforced strictly, i.e., candidate residues are only those that provide the correct padding on either side (for the exact, torsional variants, the specified minimum padding is generally adjusted to the absolute minimum for stretches that would otherwise be too short). Therefore, the implementation of the angular UJ-CR moves generally offers less flexibility.
UJCRMAX
Specifically for the bond angle-based Ulmschneider-Jorgensen algorithm (→ ANGCRFREQ), this keyword specifies the maximum requested length (in numbers of residues) for the pre-rotation segment in those moves. Note that this parameter is automatically reduced if a move is attempted for a molecule which is too short to allow the full range of segment lengths (but long enough to satisfy UJCRMIN, of course). This will make it difficult to predict the resultant distribution of pre-rotation segment lengths (compare TORCRMIN_DO).
UJCRINTERVAL
Specifically for the bond angle-based Ulmschneider-Jorgensen algorithm (→ ANGCRFREQ), this keyword lets the user choose the size of the search interval for the one-dimensional root search (see UJCRSTEPSZ). The algebraically isolated degree of freedom is scanned over the interval [φ-d; φ+d], where φ is the original value and d is the (half-)interval size specified by this keyword. The recommended value is 20.0°. Note that this implementation is unique to the bond angle UJ-CR method and offers much reduced overhead cost per CR move compared to the exhaustive search performed by exact torsional methods. The efficiency and justifiability of the method both rely on the crucial assumption that, given a typical pre-rotation, approximately one solution will be found in the scanned interval. If the number of solutions is often zero or larger than one, the algorithm violates detailed balance and the resultant distributions will be strongly biased. It is generally recommended to analyze the performance of the algorithm beforehand by checking for proper Boltzmann weights in the distributions of both torsional and angular degrees of freedom. This is most easily and meaningfully done employing only bond angle potentials (→ SC_BONDED_A) but no other terms in the Hamiltonian. Then, the distributions of the dihedral angles must be flat, and those for the angular degrees of freedom must be such that -k_{B}T·ln(p(α)) equals the acting bond angle potential on α.
UJCRSCANG
This keyword applies exclusively to the bond angle-based Ulmschneider-Jorgensen CR algorithm for polypeptides (→ ANGCRFREQ). It lets the user set a scaling factor to reduce the magnitude of prerotation perturbations of bond angle degrees of freedom (in the absence of prerotation bias, the resultant width will be proportional to UJCRWIDTH·UJCRSCANG → values less than unity are desirable). Large perturbations of those bond angles would reduce the efficacy of the method considerably due to the stiff potentials typically used to keep bond angles in the valid regimes. Note that the UJ CR method never considers ω-angles for conformational sampling and that they are consequently excluded from prerotation sampling in their entirety. This is a somewhat arbitrary choice, in particular when considering the problems introduced by the bond angle sampling in the first place (discussion here), and it is remedied in the exact but purely torsional CR methods (→ TORCRFREQ). The parameter specified here corresponds to "1/c3" in the UJ reference.
TORCRMODE
Unlike standard MC moves (such as φ/ψ-pivot moves), exact CR methods do not constitute an ergodic move set beyond the subspace satisfying the constraint (which is of course invariant toward sampling on that manifold). This necessitates mixing exact CR moves with other types of moves to achieve sampling of the entire phase space. Moreover, they solve an analytical problem numerically with a finite error rate, i.e., not all solutions are always found. If these errors depend on the "position" of the constraint, i.e., on polymer conformation, the resultant sampling is biased even though Jacobian corrections are applied. This small bias is nearly impossible to remove entirely. CAMPARI supports two implementations for exact, torsional CR methods:
 When set to 1, at each step, a superset of solutions is created containing the original solution, a set of alternative closures given the original prerotation state, and a set of new conformations with a given, altered prerotation state and a set of closures for that altered state. For each solution, the Jacobian determinants with respect to the closure constraint and the prerotation constraint are evaluated, multiplied, and a solution is picked using the net Jacobian as a weight factor. The chosen move is then evaluated via the acceptance criterion given the additional bias correction of evaluating the randomness of the prerotation move forward and backward as in the Favrin et al. scheme. In the absence of any prerotation bias, this algorithm is conceptually rejection-free. It also (in theory) satisfies detailed balance on account of the construction of the solution superset.
 When set to 2, at each step, a finite number of trials (see UJMAXTRIES) of prerotations according to the Favrin et al. scheme is performed. Closure is attempted and, if solutions are found, the possible closures along with the sampled prerotation constitute the set of possible moves. One is chosen at random (with uniform probability), and the new conformation is evaluated via Metropolis with the Jacobian corrections for the proposed vs. the current state (with respect to both types of constraints) and the randomness correction for the prerotation step. Because solutions only need to be found given the prerotation, this algorithm is usually twice as fast as the one above given sane prerotation settings. This implementation does not satisfy detailed balance even in theory but attempts to remain globally balanced.
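The mode-1 selection step described above can be illustrated with a minimal Python sketch that picks one closure solution with probability proportional to the product of its two Jacobian determinants. All names (`pick_solution`, the tuple layout) are hypothetical and not CAMPARI internals.

```python
import random

def pick_solution(solutions):
    """Pick one closure solution using the product of the two Jacobian
    determinants (closure and prerotation) as an unnormalized weight.

    `solutions` is a list of (conformation, jac_closure, jac_prerot)
    tuples; this layout is illustrative only.
    """
    weights = [jc * jp for _, jc, jp in solutions]
    total = sum(weights)
    r = random.uniform(0.0, total)
    acc = 0.0
    for (conf, jc, jp), w in zip(solutions, weights):
        acc += w
        if r <= acc:
            return conf
    return solutions[-1][0]  # numerical safety net for rounding at the boundary
```

A subsequent Metropolis step with the prerotation-bias correction (not shown) then decides acceptance of the picked conformation.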
TORCRMIN_DO
This specifies the minimum requested number of degrees of freedom for the prerotation segment for exact CR moves for polypeptides utilizing ω-angles during closure (→ TORCRFREQ). Note that this minimum number is not rigorously enforced but will be ignored if closure residues too close to the N-terminus are used. This is done in the interest of generality and to prevent the code from disabling these types of moves frequently. It is therefore not as straightforward as one may think to compute the expected distribution of prerotation segment lengths (and which residues are part of them with what probability) for each polypeptide. Note that numbers of degrees of freedom are specified here, whereas for the bond angle UJ method, numbers of residues are specified (→ UJCRMIN).
TORCRMAX_DO
This specifies the maximum requested number of degrees of freedom for the prerotation segment for exact CR moves for polypeptides utilizing ω-angles during closure (→ TORCRFREQ). Note that this maximum number is in fact a rigorous upper limit and never exceeded, but the length of some polypeptides in the system may be such that it is never realizable. In the latter case, there will be an additional complication in predicting the resultant distribution of prerotation segment lengths (see TORCRMIN_DO as well).
TORCRMIN_DJ
This keyword is exactly analogous to TORCRMIN_DO but applies to exact CR moves for polypeptides without using ω-angles in the closure.
TORCRMAX_DJ
This keyword is exactly analogous to TORCRMAX_DO but applies to exact CR moves for polypeptides without using ω-angles in the closure.
TORCRSCOME
This parameter is analogous to UJCRSCANG and scales down the magnitude of the step size for ω-bonds in the prerotation segment of exact torsional CR methods for polypeptides. Since stiff torsional potentials usually act on ω-bonds (→ OMEGAFREQ), the likelihood of obtaining rejected moves mostly on account of excursions of the ω-angle is high. This unwanted behavior may be alleviated by employing small values for TORCRSCOME. Remember, however, that the prerotation step size will often be relatively small in general.
UJMAXTRIES
Despite its name, this keyword regulates the maximum number of prerotation sampling events to consider in exact, torsional CR methods with TORCRMODE set to 2. If no solution is found within UJMAXTRIES attempts, the move is counted as rejected. Naturally, detailed balance is maintained only if at least one solution is always found given the new prerotation (i.e., this keyword is rendered obsolete). As alluded to above, this is never the case for the entirety of a simulation. It is difficult to predict what setting would best preserve global balance in those cases. The main utility of this keyword, however, lies in different sampling applications, e.g., in the efficient and exhaustive sampling of different loop conformations given a fixed constraint.
NUCCRMIN
This keyword is analogous to TORCRMIN_DO but applies to exact CR moves for polynucleotides. Note that the sugar bond (C3*-C4*) is always excluded from prerotation sampling.
NUCCRMAX
This keyword is analogous to TORCRMAX_DO but applies to exact CR moves for polynucleotides. Note that the sugar bond (C3*-C4*) is always excluded from prerotation sampling.
PHFREQ
This is the frequency, out of all sidechain moves (see CHIFREQ), with which to perform a (de)ionization MC move. These moves will be turned off automatically in case there are no titratable residues in the system (currently only polypeptide residues D, E, R, K, and H (use neutral form) are supported). Note that these are pseudo-MC moves, i.e., they do not interface intuitively with the rest of the MC code. This means that the guidance criterion for accepting / rejecting titration moves is based on a distinct and simplified energy evaluation which has no impact on the actual Markov chain. These moves therefore analyze (on-the-fly) an independently generated Markov chain (using whatever Hamiltonian was specified) but do not perturb the conformational ensemble generated by said Markov chain in any way. This essentially corresponds to the assumption that the generated ensemble is independent of titration states, an assumption which is always wrong but may, in certain circumstances such as extreme denaturing conditions, nonetheless be justified. These moves rely on environmental settings (PH and IONICSTR) and are required for obtaining output in PHTIT.dat. The default picking probabilities for ionizable residues are flat and cannot be altered.
PSWFILE
This keyword specifies name and location (full or relative path) of an optional input file parsed to alter the default picking probabilities for all types of moves in CAMPARI, at most down to the residue level (but not further). In general, the idea of preferential sampling rests on the realization that any ergodic and unbiased move set is theoretically capable of producing a Markov chain yielding the correct phase space distribution. This means that the sampling weights given to degrees of freedom of the system need not be equivalent, but rather can be chosen arbitrarily (as long as a choice of zero somewhere does not eliminate ergodicity). Of course, the convergence properties of a Monte Carlo simulation are an exceptionally complicated function of the move set, and therefore deviation from default choices should be properly justified. Examples have been listed above, e.g., in the discussion of sidechain sampling.
CAMPARI generally allows the preferential sampling facility to overlap with user-level constraints. Constraints are applied first, and then picking probabilities are altered. In the process, it is possible to effectively introduce additional constraints on account of setting selected sampling weights to zero. This is tolerated as long as it does not deplete the pool for a class of moves entirely. In such a case, the program terminates with an error. There is a notable difference between zero sampling weights and constraint requests for concerted rotation moves of polymers (described elsewhere). Note that it is not possible to control frequencies that would lead to incorrect sampling. In particular, it is impossible to control picking probabilities for particle permutation moves, and particle insertion and deletion moves can only be controlled down to the molecule type level. Rigid-body moves are generally limited to the scope of molecules, not residues. The format of the input file is described elsewhere.
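The behavior described above can be illustrated with a small Python sketch that turns per-residue sampling weights into a picking function: zero weights are tolerated, but an all-zero pool is an error. The names and the weight representation are assumptions for illustration, not CAMPARI's internal data structures.

```python
import random

def build_picker(weights):
    """Turn per-residue sampling weights into a picking function.

    `weights` maps residue index -> nonnegative weight. Residues with
    zero weight are silently excluded; a fully depleted pool raises,
    mirroring CAMPARI's documented termination with an error.
    """
    pool = [(idx, w) for idx, w in weights.items() if w > 0.0]
    if not pool:
        raise ValueError("all sampling weights are zero: move pool depleted")
    total = sum(w for _, w in pool)

    def pick():
        # roulette-wheel selection proportional to the weights
        r = random.uniform(0.0, total)
        acc = 0.0
        for idx, w in pool:
            acc += w
            if r <= acc:
                return idx
        return pool[-1][0]

    return pick
```

A residue with twice the weight of another is then picked twice as often on average, without changing the stationary distribution of the Markov chain itself.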
PSWREPORT
If the default picking probabilities are altered (→ PSWFILE) in torsional space Monte Carlo simulations, this keyword acts as a simple logical determining whether or not to write out a summary (to the log file) of the resultant picking frequencies for every move type that is active and has been modified.
Files and Directories:
(back to top)
Preamble (this is not a keyword)
In general, files and directories should be provided using absolute paths. This is often advantageous in deployment-based computing where relative directory structures and/or shortcuts may change or not exist. However, CAMPARI may fail in reading strings longer than 200 characters, leading to truncation and subsequent failure. This should be kept in mind. Also, this section is merely a list of the auxiliary files potentially required by CAMPARI. The functionalities themselves (including the files) are usually explained elsewhere (links are provided).
BASENAME
This keyword allows the user to pick a name for the simulation/system that is going to be used in the names of all structural output files. However, all other output files produced by CAMPARI use generic names and will be overwritten if simulations are repeatedly run in the same directory.
SEQFILE
This is the most important input file as it instructs CAMPARI which system to simulate. Its format and possible entries are explained in detail elsewhere.
SEQREPORT
This keyword is a simple logical (specifying 1 means "true", everything else means "false") that controls whether CAMPARI initially writes out a summary of some of the system's features. In detail, it will provide an overview of the identified molecule types, viz., the numbers of each molecule type present, the first instance, their formal concentration, their molecular mass, and their high-level suitability for performing CAMPARI-internal analyses. The latter would, for example, report that urea molecules are not suitable for peptide-centric analyses such as secondary structure analysis. In addition, the parsing of these molecule types into analysis groups is written to log output.
PDBFILE
Among other functions, this is the main input file for providing a starting conformation for a simulation. See below for details.
DCDFILE
See below.
XTCFILE
See below.
NETCDFFILE
See below.
FRZFILE
See above.
PSWFILE
See above.
SHAKEFILE
See above.
PARTICLEFLUCFILE
This keyword is relevant only when ENSEMBLE is set to either 5 or 6 (ensembles with fluctuating particle numbers). It provides the location of the file that specifies the particle types that are allowed to fluctuate, the numbers of particles of those types to initially include in the system, and the chemical potentials of each fluctuating particle type (see here).
REFILE
See below.
FEGFILE
This keyword lets the user specify name and location of the input file from which CAMPARI extracts which residues and/or molecules to subject to scaled interaction potentials with the rest of the system in free energy growth (ghosting) calculations.
DRESTFILE
See below.
BIOTYPEPATCHFILE
See above.
MPATCHFILE
See above.
RPATCHFILE
See above.
BPATCHFILE
See below.
LJPATCHFILE
See below.
CPATCHFILE
See below.
FOSPATCHFILE
See below.
SAVPATCHFILE
See below.
ASRPATCHFILE
See below.
NCPATCHFILE
See below.
TORFILE
See below.
POLYFILE
See below.
TABCODEFILE
See below.
TABPOTFILE
See below.
TABTANGFILE
See below.
EMMAPFILE
See below.
WL_GINITFILE
See above.
EWWISDOMFILE
See below.
ANGRPFILE
See below.
BBSEGFILE
This keyword lets the user choose an input file containing a map annotating φ/ψ-space for polypeptides with canonical secondary structure regions. This mapping is used to perform segment-based analyses of polypeptide secondary structure. CAMPARI provides two such files already (in the data/ subdirectory). These files and their format are explained in detail elsewhere.
GRIDDIR
This keyword sets the directory CAMPARI browses to find input files for grid-assisted sampling (see above). By default, CAMPARI provides sample input files in $CAMPARI_HOME/data/grids/. The code assumes file names to follow a systematic naming convention, "xyz_grid.dat", where xyz is the lowercase, three-letter code of one of the standard 20 amino acids.
This functionality is de facto obsolete and should not be used. It may be removed entirely in the future.
BESSELFILE
This keyword sets the location and name of the file CAMPARI expects to read for the tabulation of Fourier-Bessel (Hankel) transforms. This is required for diffraction analysis and is normally contained in the CAMPARI data directory. Details on the input are found elsewhere.
PCCODEFILE
See below.
SAVATOMFILE
See below.
ALIGNFILE
See below.
TRAJIDXFILE
See below.
FRAMESFILE
See below.
TRACEFILE
See below.
CFILE
See below.
TRAJBREAKSFILE
See below.
TRAJLINKSFILE
See below.
CLUFILE
See below.
NBLFILE
See below.
CLUUNFOLDFILE
See below.
CLUFOLDFILE
See below.
NCDM_NCFILE
See below.
NCDM_ASFILE
See below.
NCDM_CFILE
See below.
NCDM_FRAMESFILE
See below.
Structure Input and Manipulation:
(back to top)
RANDOMIZE
This keyword determines the randomization aspects of initial structure generation. CAMPARI can generate default structures, completely random structures or, alternatively, use some or all available information from a structural input file. The outcome is determined both by the choices below and by the choice for keyword PDB_READMODE. In general, the program employs a hierarchical procedure whereby stretches of the input sequence are randomized residue-by-residue or molecule-by-molecule. If no excluded volume term has been enabled (SC_IPP or SC_WCA), the randomization will almost certainly produce structures with steric clashes (the majority of energy terms are ignored; for example, it is not possible to implement excluded volume only by means of tabulated potentials and to rely on randomization to produce a clash-free initial structure). The only other, possibly relevant terms during randomization are boundary potentials and bonded potentials. It is a general limitation that other stiff potentials (including bias potentials such as distance restraints) as well as suboptimal clash resolution can easily generate very large energies and forces for the initial structures produced by randomization. While not generally a problem in Monte Carlo runs, the large forces will make any gradient-based simulation immediately unstable. In these cases, the cleanest workaround is to set up a hybrid calculation (see DYNAMICS) that first runs a number of Monte Carlo steps large enough to resolve all large forces (see CYCLE_MC_FIRST) followed by a single, very long dynamics segment (keywords CYCLE_DYN_MIN and CYCLE_DYN_MAX should both be set to NRSTEPS-1). Alternatively, two separate calculations can be run with the Monte Carlo one being restarted as a gradient-based one with the help of keywords RESTART and RST_MC2MD.
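The MC-then-dynamics workaround can be sketched as a key-file fragment. This is a hypothetical sketch: the FMCSC_ prefix follows the usual CAMPARI key-file convention, but the specific values shown (in particular the DYNAMICS option) are placeholders that should be verified against the respective keyword entries.

```
# Hypothetical key-file fragment for a hybrid run (values are placeholders).
FMCSC_NRSTEPS        1000000
FMCSC_DYNAMICS       5         # hybrid MC/dynamics; verify this value
FMCSC_CYCLE_MC_FIRST 50000     # MC steps to relax clashes and large forces
FMCSC_CYCLE_DYN_MIN  999999    # NRSTEPS-1, so that a single, very long
FMCSC_CYCLE_DYN_MAX  999999    # dynamics segment follows the MC phase
```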
Because of the large forces, CAMPARI will exit if it encounters an unresolvable clash during randomization in a purely gradient-based calculation (including minimization), unless UNSAFE is set to 1. The definition of "clash" is provided primarily by keyword RANDOMTHRESH.
Any possible randomizations proceed according to the following three-step hierarchy, but not all steps are performed depending on the choice for RANDOMIZE.
 If a structural input file was read and parts but not all coordinates of one or
more complex molecules were read, CAMPARI may generate random conformations for sidechains (i.e. short branches
off a main chain confined entirely to a single residue, which excludes crosslinks). This happens if one or more relevant
heavy atoms in the side chain are missing from structural input. The step is desirable because sidechains constructed
using default dihedral angles are likely to create local but significant clashes.
The interactions the sidechain in question is subjected to are evaluated with respect to all residues read at least in
part from structural input and include an excluded volume bias (if either
SC_IPP or SC_WCA is turned on),
all enabled bonded potentials (SC_BONDED_B, etc.), and any
possible boundary potentials.
Sidechain resampling treats every sidechain independently in the order they appear in the sequence by a Monte Carlo
minimization procedure. Importantly, some natively frozen degrees of freedom in supported residues (such as the out-of-plane torsion χ_{5}
in arginine), can be moved during this procedure. CAMPARI will print, in the summary of the calculation, some information
as to how many residues the procedure was applied to (if nonzero).
At the end of the first stage, if performed (RANDOMIZE is 1 or 2), structural input has been augmented by missing side chains, and the resultant conformations, aside from missing parts, should be free of major clashes that are not already present in the input pdb file. If this stage was skipped (RANDOMIZE is 0 or 3), missing side chains are in default conformations, and clashes are extremely likely.
 If a structural input file was read and parts but not all coordinates of one or more complex molecules were read, CAMPARI may generate random conformations for any missing tails. This is fully supported only in conjunction with option 3 for PDB_READMODE because otherwise at most a single C-terminal tail in the last processed molecule is treated this way. Missing chain-internal residues are a separate problem and not dealt with. The tails are built in a systematic and hierarchical Monte Carlo minimization procedure starting from the residue closest to the part that was read in and proceeding toward the respective terminus. Different tails are processed in the order that they appear in the sequence. During randomization, every tail interacts at most with those atoms read in from file and those having already been placed as part of tails occurring earlier in the sequence. The interactions are an excluded volume bias (if either SC_IPP or SC_WCA is turned on), all enabled bonded potentials (SC_BONDED_B, etc.), and any possible boundary potentials. If the tails contain residues participating in intramolecular crosslinks, these crosslinks will at best be satisfied approximately. During the randomization procedure, they are, whenever possible, implemented in the same way as during the final simulation, i.e., by means of bonded potentials (which are thus required for obtaining a meaningful result). This is the case if a crosslink exists entirely within the same tail or if it links a tail to part of the coordinates having been read in.
The situation is more complicated if a crosslink links tails in different molecules or two different tails in the same molecule. In these cases, results may be entirely unsatisfactory because all the burden of achieving a "closable" conformation is deferred to the tail occurring later in the sequence, whereas the tail occurring earlier ignores crosslink constraints entirely.
At the end of the second stage, if performed (RANDOMIZE is 1 or 2), all molecules read in at least partially from structural input should be in a state that is defined by the input or built randomly in a way that is approximately free of intramolecular clashes. If RANDOMIZE is 1, and there is more than one such molecule, they should additionally be free of intermolecular clashes. If the second stage is not performed but eligible tails exist, they are simply constructed in default polymer conformations, disregarding any clashes or crosslinks. This is unlikely to be useful and will almost certainly cause an error unless the simulation starts with a sufficient number of Monte Carlo steps or uses only Monte Carlo steps.
 The third stage loops over all molecules and deals, for each molecule in this order, with internal
followed by external degrees of freedom. The internal conformation of molecules for which no structural input
was provided is constructed randomly unless RANDOMIZE is 0 or 3 or unless the molecule has no relevant
internal degrees of freedom. This random conformation is constructed as follows. For each residue, CAMPARI reserves
RANDOMATTS total attempts per residue and applies a threshold
penalty of RANDOMTHRESH kcal/mol. This penalty corresponds to
the required mean interaction energy per relevant (i.e., included by the shortrange cutoff)
residue pair. The relevant energy terms are a possible excluded volume bias and any
torsional potentials but do not include boundary potentials
or other bonded potentials. Energies are evaluated for residue pairs involving the current
residue and all residues further toward the N-terminus of the stretch (already processed) and the single
residue immediately following in the stretch (not yet processed). If the sum of these energies plus the difference
in any torsional potential energies from the initial state is less than
the threshold, the algorithm proceeds to the next residue.
Note that the excluded volume contribution is evaluated as its absolute value, which means that the threshold will have to depend
on the particular choice of Lennard-Jones parameters (→ PARAMETERS).
This randomization occurs in three hierarchical phases (1/3 each of the total attempts per residue). In the first, only freely rotatable backbone angles (excluding all pucker and ω-angles) are considered, e.g., the φ/ψ-angles of polypeptides, or any backbone-like angles in unsupported residues. In the second phase, rotatable sidechain angles (excluding those in native CAMPARI residues that are frozen by default) of the current residue are added to the set as well. In the third phase, all aforementioned degrees of freedom for the residue immediately prior in the sequence are added. It is obvious that even with all three phases triggered, a stretch may be "stuck", e.g., fold back onto itself, thus requiring a completely new solution. Resolving such situations is not supported as this would lead to an uncontrollable runtime. Instead, the energetically most favorable conformation of the sampled ones is picked and a warning or error (depending on keyword DYNAMICS) is produced.
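The per-residue threshold test used in these phases can be sketched in Python. All names are illustrative, and whether the torsional-energy difference enters before or after the averaging is an assumption made here for the sketch.

```python
def passes_threshold(pair_energies, torsional_delta, thresh):
    """Threshold test sketched from the text: pairwise penalty energies
    (pure penalty terms, taken as absolute values) plus the change in
    torsional bias energy, averaged per relevant residue pair, must fall
    below RANDOMTHRESH (`thresh`, in kcal/mol).
    """
    if not pair_energies:
        return True  # nothing in cutoff range: trivially acceptable
    total = sum(abs(e) for e in pair_energies) + torsional_delta
    mean_per_pair = total / len(pair_energies)
    return mean_per_pair < thresh
```

If the test fails, further attempts are made (up to RANDOMATTS), widening the set of perturbed degrees of freedom phase by phase as described above.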
For molecules free of internal crosslinks, the stretch considered is the entire molecule. If there are internal crosslinks, the molecule is divided hierarchically into stretches. Stretches under no constraint are processed sequentially (from N- to C-terminus). Crosslink-constrained stretches are parsed, and CAMPARI tries to find a hierarchical order starting with the "innermost" stretches in the hope of arriving at a solution that is both clash-free and satisfies all intramolecular crosslinks exactly. Priority is given to crosslinks over clashes because it is very easy to arrive at structures that are more or less clash-free but have a dramatically perturbed crosslink geometry that cannot relax properly without a complete reorganization of the molecule. This procedure can be very slow because RANDOMATTS attempts per residue are used to construct a potentially closable stretch conformation (using an empirical bias in addition to the aforementioned potentials), followed by an energetic evaluation of all identified loop closures. If the solutions have too many clashes or none are found, the entire cycle is repeated up to RANDOMATTS times. In the case of coupled crosslinks (for example, nested or staggered ones), it remains likely that the hierarchical procedure encounters a dead end and exits with a warning or error as before.
If successful, at the end of this step, the molecule processed should be in one of four possible states:
 In a newly generated, random, and clash-free conformation that satisfies any intramolecular crosslinks exactly (RANDOMIZE is 1 or 2 and no structural input was provided).
 In a conformation partially or fully supplied by structural input with any tails randomized (RANDOMIZE is 1 or 2 and structural input was provided, see above).
 In a conformation partially or fully supplied by structural input with any tails in default conformations (RANDOMIZE is 0 or 3 and structural input was provided, see above).
 In its default conformation, which is generally not clash-free, with all (if any) intramolecular crosslinks broken (RANDOMIZE is 0 or 3 and no structural input was provided).
If the molecule is not connected to another molecule occurring earlier in the sequence by an intermolecular crosslink, the procedure is simple. There is only a single phase with the same number of total attempts (now per molecule). Energies are evaluated in pairwise fashion for all molecules occurring prior in sequence input vs. the current molecule. As before, the computed energy is taken as the mean interaction energy per relevant (i.e., included by the short-range cutoff) residue pair. The relevant energy terms are a possible excluded volume bias, all enabled bonded potentials (although they matter only for intermolecular crosslinks), and any boundary potentials, all of which are taken as absolute values. The step ends as soon as the computed mean interaction energy is below the specified threshold.
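The placement loop for an unconstrained molecule can be sketched as a simple "accept below threshold, else keep the best" search. The callables and names are hypothetical stand-ins, not CAMPARI internals.

```python
def place_molecule(propose_rigid_body, mean_energy, thresh, max_attempts):
    """Rigid-body placement loop sketched from the text: draw random
    placements until the mean interaction energy per relevant residue
    pair drops below the threshold; if that never happens within
    `max_attempts` (cf. RANDOMATTS), fall back to the best attempt seen.
    """
    best, best_e = None, float("inf")
    for _ in range(max_attempts):
        pose = propose_rigid_body()   # random position + orientation
        e = mean_energy(pose)         # mean penalty per relevant residue pair
        if e < best_e:
            best, best_e = pose, e
        if e < thresh:
            return pose, e            # early exit below RANDOMTHRESH
    return best, best_e
```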
Conversely, a molecule bound by an intermolecular crosslink to a molecule earlier in the sequence may not be placed freely. CAMPARI analyzes intermolecular crosslinks to determine a hierarchy that prioritizes molecules with more crosslinks (these should ideally be placed as early as possible in the sequence). While the higher-priority molecule is placed freely, a lower-priority molecule instead satisfies the crosslink exactly, and the molecule is placed with the only randomization coming from the central 3 dihedral angles of the actual crosslink. This may require deferring this step until the higher-priority molecule has been placed. Depending on the conformations of the involved molecules determined previously, this can easily lead to an unresolvable clash, which, as before, will be reported as a warning or error. Notably, there is no mechanism in place to displace molecules that have already been placed in previous iterations of the loop. This means that it is advantageous to place molecules joined by an intermolecular crosslink directly adjacent in the sequence file. If a molecule is crosslinked to more than one molecule occurring earlier in the sequence, a warning is produced and at most one of these crosslinks is respected in the generated conformation. Intermolecular crosslinks will at best be satisfied approximately in such a case (the same is true if only the position of the lower-priority molecule in a crosslinked pair is determined by structural input).
At the end of the second step of the third stage, the molecule should be placed randomly in the simulation container without clashing with any molecules occurring earlier in the sequence. It should also satisfy intermolecular crosslinks to at most one molecule occurring earlier in the sequence. Note that even if PDB_READMODE is 3, it is not possible for a molecule requiring random placement to precede, in sequence input, a molecule placed based on structural input. The only molecules not placed randomly at the end of this stage are those read from structural input if RANDOMIZE is 0 or 3. To continue, the third stage proceeds to the next molecule until there are no unprocessed molecules left. The placement of molecules is very unlikely to be clash-free if the density is high (for example, liquid water).
 Minimal randomization is performed. It is the same as option 1 below if a structural input file is given that provides coordinates for all parts of the system. It is the same as option 3 if no structural input file whatsoever is provided. With this option, any missing tails in molecules with partial input are built in default conformations. No intramolecular crosslinks missing from structural input are satisfied initially. All molecules missing from structural input are built in default conformations and placed randomly in the box. Intermolecular crosslinks are satisfied as far as possible (see above).
 Supplementary randomization is performed, which is the default. This option is the same as option 0 if a structural input file is given that provides coordinates for all parts of the system. It is the same as option 2 if no structural input file whatsoever is provided. With this option, any missing tails in molecules with partial input are built in random, clash-free conformations that satisfy any crosslinks at best approximately. All molecules missing entirely from structural input are built in random conformations that satisfy intramolecular crosslinks exactly, and are placed randomly in the box. Intermolecular crosslinks are satisfied as far as possible (see above).
 This is the same as option 1 above, except that the rigid-body coordinates are randomized even for those molecules read fully or at least partially from the structural input file. This option is the same as option 3 if a structural input file is given that provides coordinates for all parts of the system. This option can be useful for generating random starting structures for studies of the assembly of a protein complex from rigid components.
 This is the same as option 0 above, except that the rigid-body coordinates are randomized even for those molecules read fully or at least partially from the structural input file. This option is the same as option 2 if a structural input file is given that provides coordinates for all parts of the system.
In general, the importance of initial structure randomization lies in avoiding initial structure biases that may be difficult to detect. Alternative procedures found in the literature often use simple reference states (fully extended chains) or results from high-temperature runs of an experimentally determined structure. With these approaches, it is quite difficult or even impossible to rigorously assert that the final results are not subtly influenced by the choice of starting conformation(s). Conversely, the random structures generated by CAMPARI are usually so independent of each other that the convergence of results is a good indicator of statistical error at the level of the chosen analysis. This is not to say that they are not biased by the simplified Hamiltonian used to construct them, which they of course are. Intramolecular crosslinks in particular generate constraints that can restrict the available space down to just a few "clusters" of solutions, and CAMPARI's hierarchical procedure may well pick from this set with a strong bias. Disregarding crosslinks, the excluded volume-centric Hamiltonian used in randomization will differ from the one used in the actual production simulations. This in itself is a bias, of course. In most cases, the production Hamiltonian differs dramatically, in particular since it will generally contain net attractive potentials. This means that the beginning of a simulation corresponds to a quench/relaxation scenario, similar to what is seen experimentally in temperature-jump experiments or computationally in methods like simulated annealing. This instantaneous quench, broadly speaking, leads to the sampler taking the system to configurations that are (more) compliant with the production Hamiltonian and most easily accessible from the starting configuration. Here, "most easily accessible" depends critically on the chosen sampler.
Thus, memory can be retained and errors can be masked if the starting structures are not completely independent. It should be kept in mind that it is, outside of trivial cases, never possible to know a priori the subspace of configurations relevant to the system under the chosen conditions, as this is usually the question one tries to answer by means of simulations. Errors pertaining to a lack of exhaustiveness can therefore not be diagnosed or understood based on randomized starting structures alone. For this, robustness across different samplers and simulation lengths must be established.
Note that in replica exchange or MPI averaging runs, all replicas will start from different conditions unless RANDOMSEED is given explicitly by the user.
RANDOMATTS
If any type of initial structure randomization is requested, this keyword sets the maximum number of attempts for randomizing the permissible degrees of freedom of a single residue or molecule. Large numbers (> 10000) may produce unacceptably slow performance when trying to randomize a long, complex polymer and/or a dense fluid. Large numbers in conjunction with too small a threshold can also be counterproductive. This is because this scenario corresponds to a hierarchical minimization and may thus limit the search space for dependent elements of the hierarchy. This problem can be exacerbated by intramolecular constraints, in particular coupled ones.
RANDOMTHRESH
If any type of initial structure randomization is requested, this keyword sets the universal energy threshold to be applied with respect to energetic penalties for excluded volume, boundary potentials (rigid-body only), and bonded terms (e.g., torsional potential energies). Roughly speaking, for every residue or molecule being processed, there will be a given number of interacting residues (depending on the short-range cutoff). While the total energy is used to pick the best current solution, the threshold is evaluated against a mean interaction energy per residue pair, and this is the value specified by RANDOMTHRESH (in kcal/mol). Different terms contribute at different stages of randomization as described above. All these terms are pure penalty terms and cannot yield negative energies. Specifying small values for the threshold will generally yield lower starting energies because they make the procedure more minimization-like. There is a caveat in that parts of the randomization procedure are hierarchical, i.e., the solvability of a subproblem may depend on the solution of a previous one. Since the algorithm has very limited capability to "go back," a well-minimized result for a particular task may actually prevent subsequent problems from being solved satisfactorily. It is thus recommended to keep the threshold large enough and the number of attempts small enough that solutions remain diverse.
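The interplay of RANDOMATTS and RANDOMTHRESH can be illustrated with a small sketch. This is a schematic reconstruction of the documented behavior, not CAMPARI's actual code; the function names and the encoding of configurations are hypothetical:

```python
def randomize_unit(propose, pair_energies, max_attempts=500, thresh=2.0):
    """Schematic of threshold-based randomization for one residue/molecule.

    propose() returns a trial configuration; pair_energies(conf) returns a
    list of per-residue-pair penalty energies (kcal/mol, all >= 0).
    A trial is acceptable when its MEAN pair energy is below the threshold
    (cf. RANDOMTHRESH); among acceptable trials, the lowest TOTAL energy
    seen so far is kept (cf. RANDOMATTS for the attempt budget).
    """
    best, best_total = None, float("inf")
    for _ in range(max_attempts):
        conf = propose()
        pairs = pair_energies(conf)
        if not pairs:
            return conf  # nothing to clash with
        total = sum(pairs)
        if total / len(pairs) < thresh and total < best_total:
            best, best_total = conf, total
    return best  # may be None if no trial passed the threshold
```

Note how a strict threshold combined with a large attempt budget turns this loop into a near-minimization over trial configurations, which is exactly the hierarchical caveat described above.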
PDBFILE
This keyword provides the (base)name and location of a structural input file in pdb convention. This can either be a pdb trajectory (for analysis) or, more commonly, the intended (partial) starting conformation of the system. The two interpretation modes are switched based on keyword PDBANALYZE and have different requirements. General and specific formatting information for pdb files (which also applies to keyword PDB_TEMPLATE) is given in the corresponding input file documentation. The parsing of pdb files depends on a number of auxiliary keywords, specifically PDB_READMODE, PDB_HMODE, PDB_INPUTSTRING, PDB_TOLERANCE_A, PDB_TOLERANCE_B, PDB_R_CONV, and PDB_MPIMANY. If trajectory analysis mode is enabled, CAMPARI will interpret the input to this keyword either as a pdb trajectory file (using the MODEL/ENDMDL records) or as the first in a series of systematically numbered files (→ PDB_FORMAT). For the former, the MODEL/ENDMDL syntax is checked and has to be interpretable (the actual numbering on the MODEL line is ignored, however). For the latter, a systematic numbering scheme is inferred from the provided file name (based on plain numbers, or numbers with leading zeros). In this scenario, the first of such files should be provided; CAMPARI will then try to extract the numbering scheme and open NRSTEPS-1 consecutive snapshots. Note that in this mode the filename must not contain any additional numeric characters (i.e., foo_001.pdb is permissible while ala7_001.pdb is not). To choose between single-file and multiple-file formats, keyword PDB_FORMAT is used. If the set of numbered files or the trajectory does not provide enough snapshots to satisfy the selected value for NRSTEPS, CAMPARI will either terminate with an error (if any MPI parallel execution mode is used) or dynamically adjust the run length (if serial or OpenMP-only code is used).
The latter can be confusing and may produce nonsensical output from built-in analysis routines (if the run is shortened enough to effectively disable an analysis that would have been enabled given NRSTEPS, it is not guaranteed that all output from this analysis is correctly suppressed). Since the point here is to analyze a given trajectory, CAMPARI expects the specified sequence input to match exactly what is present in the pdb input file. This does not necessarily require that all atoms in every residue are read successfully, but it does require that all residues are found. The use of pdb files with atoms that are missing or were not read successfully can of course be confusing depending on the types of analyses to be performed. This is because the positions of these atoms will be reconstructed, i.e., some of the coordinates entering analysis may be derived, arbitrary, or, in the worst case, numerically ill-defined. While possibly tolerated by CAMPARI, it can be extremely confusing to supply a pdb trajectory (or set of files) whose snapshots do not all contain the exact same set of atoms with the exact same names. The total number of rebuilt atoms is reported to log output at the end of the run. Mismatches between CAMPARI's representation of a residue and what is present in the pdb file may be circumvented with keyword UAMODEL and can always be masked by renaming the residues so that they are recognized as unsupported residues, as demonstrated in Tutorial 10.
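The restriction on additional numeric characters in the file name exists because the numbering scheme must be inferable without ambiguity. A minimal sketch of such an inference (a hypothetical helper, not CAMPARI's actual parser) could look like this:

```python
import re

def numbered_series(first_file, n):
    """Guess the numbering scheme from the first file of a series
    (e.g. 'foo_001.pdb') and generate the names of n consecutive files.
    Mirrors the documented restriction: the base name must contain
    exactly one numeric field (so 'ala7_001.pdb' is rejected)."""
    fields = re.findall(r"\d+", first_file)
    if len(fields) != 1:
        raise ValueError("file name must contain exactly one number")
    num = fields[0]
    # leading zeros imply a fixed-width, zero-padded counter
    width = len(num) if num.startswith("0") else 0
    prefix, suffix = first_file.split(num)  # number is unique, split is safe
    start = int(num)
    return [prefix + str(i).zfill(width) + suffix for i in range(start, start + n)]
```

For example, `numbered_series("foo_001.pdb", 3)` yields foo_001.pdb, foo_002.pdb, and foo_003.pdb, whereas an unpadded name like snap9.pdb continues with snap10.pdb.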
The more common use of this keyword is for CAMPARI to attempt to read an external file to construct an initial, nonrandom conformation for the system. Depending on the setting for RANDOMIZE, only some of the information may be used. Naturally, the system (sequence) in the pdb file has to be at least partially consistent with the choices made via SEQFILE. Note that parallel runs can use multiple input structures (→ PDB_MPIMANY). In particular, CAMPARI will not reorder atoms or residue blocks in the pdb file except for very specific exceptions. In a box with a protein, solvent, and ions, it is therefore necessary that the order of the components in sequence input is the same as in pdb input. If it is not, most of the system information will be discarded. The input file can be processed with varying degrees of leeway and under two different paradigms (both depend on the choice for PDB_READMODE).
Note that it is not possible to directly start a simulation from a structure provided in a binary trajectory file format. In this case, however, CAMPARI can be used to extract a suitable pdb file from the trajectory with the help of keywords PDBANALYZE, XYZPDB, XYZOUT, XYZMODE, and, for example, DCDFILE.
Lastly, it is important to mention that PDBFILE provides some functionality that overlaps with that provided by PDB_TEMPLATE. Specifically, runs containing residues not natively supported by CAMPARI require the topology of those moieties to be inferred from file. If an analysis run operates on a single pdb file, a trajectory file in pdb format, or a series of pdb files, or if a simulation run is supposed to start from a specific structure supplied via PDBFILE, then PDBFILE can (but need not) serve the function of topology inference as described for PDB_TEMPLATE. Conversely, PDBFILE never replaces the function of PDB_TEMPLATE in the other contexts the latter is relevant in, viz., to provide a map from a binary trajectory input file to CAMPARI and to serve as a reference structure for alignment.
XTCFILE
This is only relevant if PDBANALYZE is true: it then specifies the name and location of the trajectory (xtc format) to analyze. Like all other xtc-related options, this is only available if the code was in fact compiled and linked with XDR support (→ installation instructions). See PDB_TEMPLATE for instructions on how to convert binary trajectory files with non-CAMPARI atom order. If the analysis run is parallel (→ REMC), an example is given elsewhere. Because binary trajectory files are not annotated, many of the above formatting options apply, at most, to the template. Specifically, keywords PDB_READMODE, PDB_HMODE, PDB_TOLERANCE_A, PDB_TOLERANCE_B, PDB_R_CONV, and PDB_NUCMODE are all irrelevant for the processing of the actual information in the xtc file, whereas XYZ_FORCEBOX is respected. Should the data in the trajectory file be corrupted or exhausted before NRSTEPS snapshots have been read successfully, CAMPARI will either terminate with an error (if any MPI parallel execution mode is used) or dynamically adjust the run length (if serial or OpenMP-only code is used). Binary xtc files have a header section for each snapshot that specifies box coordinates, the number of atoms, and additional information. All of these except the number of atoms are ignored during read-in.
DCDFILE
Analogous to XTCFILE, this keyword is only relevant if PDBANALYZE is true: it then specifies the name and location of the trajectory (dcd format) to analyze. See PDB_TEMPLATE for instructions on how to convert binary trajectory files with non-CAMPARI atom order. Binary dcd files have a single header section at the beginning of the file that specifies several control parameters, including the number of atoms. All of these except the number of atoms are ignored during read-in (as are the box coordinates, if present).
NETCDFFILE
Analogous to XTCFILE, this keyword is only relevant if PDBANALYZE is true: it then specifies the name and location of the trajectory (NetCDF format) to analyze. Like all other NetCDF-related options, this is only available if the code was in fact compiled and linked with NetCDF support (→ installation instructions). See PDB_TEMPLATE for instructions on how to convert binary trajectory files with non-CAMPARI atom order. Unlike xtc or dcd files, NetCDF files do not need to be parsed sequentially and are in general fully annotated. CAMPARI thus determines immediately whether NRSTEPS snapshots are present in the file. If not, CAMPARI will adjust this number to the available one. If any MPI parallel execution mode is used, this is the minimum across all MPI processes (which read different files). Binary NetCDF files encode a well-defined standard. For trajectory data of particle systems, CAMPARI resorts to the standard developed for the AMBER program suite described elsewhere, both in writing and in reading. Because NetCDF files do not need to be processed sequentially, they offer the additional benefit of analyzing snapshots in a specific order that is not the same as the original trajectory (→ FRAMESFILE for details).
FRAMESFILE
If PDBANALYZE is true, it is possible for CAMPARI to analyze a specific set of frames from the trajectory file (see PDB_FORMAT) rather than the entire trajectory. It is also possible to give every analyzed snapshot a sampling weight, and both functionalities are implemented by this keyword. Example applications are the extraction of structural clusters from a trajectory or the reweighting of biased simulations. Most input trajectories currently need to be processed sequentially (this applies to xtc, dcd, and pdb trajectory files, i.e., PDB_FORMAT is 1, 3, or 4). For these, the list of requested frames is sorted first, and duplicates are removed. This means that any newly written trajectory files (→ XYZOUT) will have exactly the same order of snapshots as the input. Conversely, the snapshots encoded in individual pdb files and NetCDF trajectory files (PDB_FORMAT is 2 or 5) can be accessed in arbitrary order. For these two settings, the frames file is processed "as is" unless there are floating-point weights per snapshot or unless this is a parallel trajectory analysis run. Frames files processed "as is" have the advantage that they can arbitrarily reorder and duplicate individual simulation snapshots, which is relevant, for example, in the construction of synthetic trajectories.
It is important to note that the settings for NRSTEPS and EQUIL and all related frequency settings for analysis routines (see corresponding section) lose their straightforward interpretations if not all snapshots in the original trajectory are processed exactly once and in sequence. For the case of a processed frames file (sorted and free of duplicates), the analysis frequencies will still refer to the original, full trajectory file. This means that CAMPARI will read all frames sequentially and increment step counters accordingly. However, all the frames that are not part of the list are simply skipped. This implies that it is possible for a selection of 20 frames from a larger trajectory to fail to produce any output for polymeric quantities if POLCALC is set to 10, 5, or even 2 (simply on account of chance). It will therefore generally be easier to set such frequency flags to 1 if processed frame lists are used (this is the only setting that guarantees that the number of analyzed snapshots will be exactly proportional to the size of the list). Conversely, for a frames file used "as is," the unused frames are never read and no step counters are incremented. This means that the effective step becomes the processing of the frames file itself. Returning to the above example, a selection of 20 (possibly duplicated) frames from a larger trajectory will in this case always produce output for polymeric quantities if POLCALC is set to any value of 20 or less.
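The two counting conventions can be made concrete with a toy model. Here, frames are identified by their index in the original trajectory, and `freq` stands in for a frequency setting such as POLCALC; the function is illustrative only, not CAMPARI's bookkeeping:

```python
def analyzed_frames(frames, freq, as_is):
    """Which requested frames actually reach an analysis routine that runs
    every `freq` steps. In sequential ("processed") mode the step counter
    follows the ORIGINAL trajectory, so a frame is analyzed only if its
    trajectory index is a multiple of freq; in "as is" mode the effective
    step is the position in the frames file itself."""
    if as_is:
        return [f for pos, f in enumerate(frames, start=1) if pos % freq == 0]
    kept = sorted(set(frames))  # sequential mode: sorted, duplicates removed
    return [f for f in kept if f % freq == 0]

frames = [30, 10, 30, 25, 40]
# sequential: only original step numbers divisible by 10 survive
assert analyzed_frames(frames, 10, as_is=False) == [10, 30, 40]
# "as is": every second entry of the (possibly duplicated) list is analyzed
assert analyzed_frames(frames, 2, as_is=True) == [10, 25]
```

Note how frame 25 is analyzed only in the "as is" mode, while in sequential mode it is skipped because its original step number is not a multiple of the frequency.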
As mentioned above, the frames file allows the user to alter the type of averaging that is normally assumed for CAMPARI analysis functions. By default, each data point (trajectory snapshot) contributes the same weight to computed averages or histograms (distribution functions). This implies that the input trajectory conforms to (was sampled from) the distribution and ensemble of interest. If, however, the input trajectory does not correspond to a well-defined ensemble (or corresponds to a different one), it is common and possible to apply snapshot-reweighting techniques based on analyses of system energies or coupled parameters using weighted histogram methods. The result is a set of weights, one for each snapshot, which allows simulation averages and distribution functions to conform to the target distribution and ensemble. As an example, one may combine all data from a replica-exchange run (which no longer conform to a canonical ensemble at a given temperature), use a technique such as TWHAM to derive a set of snapshot weights for a target temperature that was not part of the replica-exchange set, and then use an input file containing these weights to compute proper simulation averages at the target temperature.
The input file for this functionality is very simple and explained elsewhere. There are three important points of caution. First, floating-point weights imply that floating-point precision errors may occur. The implied summation of weights of very different magnitudes may then become inaccurate. CAMPARI provides a warning if it expects such errors to be large (based purely on the weights themselves). Second, snapshot weights do not influence the values reported for instantaneous output such as POLYMER.dat or for analyses that do not imply averaging (such as structural clustering). Third, reweighting techniques have associated errors that are difficult to predict. Simultaneous assessment of statistical errors via block averaging or similar techniques is therefore strongly recommended.
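The effect of snapshot weights on averages, and the precision hazard behind the first caveat, can be sketched as follows (the warning criterion here is an arbitrary illustration, not CAMPARI's actual test; weights are assumed positive):

```python
def weighted_average(values, weights):
    """Weighted trajectory average, as implied by a frames file with
    floating-point snapshot weights. Also flags the precision hazard
    of summing weights of very different magnitude."""
    if len(values) != len(weights):
        raise ValueError("one weight per snapshot required")
    wsum = sum(weights)
    spread = max(weights) / min(weights)
    if spread > 1e12:  # illustrative cutoff, not CAMPARI's criterion
        print("warning: weight spread %.1e may cause precision loss" % spread)
    return sum(w * v for v, w in zip(values, weights)) / wsum
```

For instance, two snapshots with values 1.0 and 3.0 and weights 1.0 and 3.0 average to 2.5 rather than the unweighted 2.0.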
PDB_FORMAT
This simple keyword lets the user select the file format for a trajectory analysis run:
 1. CAMPARI expects a single trajectory file in pdb format using the MODEL/ENDMDL syntax to denote the individual snapshots.
 2. CAMPARI expects to find multiple pdb files with one snapshot each that are systematically numbered starting from the file provided via PDBFILE.
 3. CAMPARI expects to find a single trajectory in binary xtc format (GROMACS style).
 4. CAMPARI expects to find a single trajectory in binary dcd format (CHARMM/NAMD style).
 5. CAMPARI expects to find a single trajectory in binary NetCDF format (AMBER style) (reference).
PDB_READMODE
This keyword (integer) controls how the information in a supplied pdb file is meant to be used (see keyword PDBFILE and the input file documentation). A maximum of three options is available, with the first one offering only restricted support depending on the type of calculation:
 1. CAMPARI attempts to read in the Cartesian coordinates of heavy atoms from the pdb file, proceeds to extract the values for CAMPARI's "native" degrees of freedom (i.e., the unconstrained dihedral angles and the rigid-body degrees of freedom in Monte Carlo or torsional molecular dynamics runs → CARTINT), and lastly rebuilds the entire structure using the determined values as well as internal geometry parameters for the constrained internal degrees of freedom (extracted from high-resolution crystallographic databases). This hybrid approach will often lead to a propagation of error along the backbone of longer polymers and is therefore unsuitable for reading larger proteins and particularly macromolecular complexes. It is never a useful choice for structural input that contains complex molecules but does not exactly encode the same covalent geometry as what CAMPARI uses by default, and it is of limited usefulness even when these conditions are met. In the remaining cases (essentially, CAMPARI runs in rigid-body/dihedral angle space not relying on any structural input), it should be used in conjunction with high-precision pdb input (see PDB_INPUTSTRING and PDB_OUTPUTSTRING). This input mode does not support the processing of unsupported residues (see PDB_TEMPLATE) and, upon discovery of unsupported residues in sequence input, will be changed automatically to option 3 (the default) below. There are further limitations to this mode. For example, it requires strictly that the first 3 atoms (in CAMPARI convention) are present for each molecule (unless there are fewer atoms in the molecule or it is a water, ammonium, or methane molecule), it does not recover (read) values for degrees of freedom in supported residues that are considered nonnative (e.g., hydrogen positions in methyl groups irrespective of values for TMD_UNKMODE and OTHERFREQ), and the read-in is stopped as soon as there is any mismatch between sequence and structure inputs at the residue level (the remaining degrees of freedom missing from the pdb file are treated according to keyword RANDOMIZE). Overall, this option should be considered largely obsolete.
 2. CAMPARI attempts to read in the Cartesian coordinates of all atoms from the pdb file and uses those explicitly (i.e., it implicitly adopts the encoded geometry even for degrees of freedom that are normally constrained within CAMPARI). This will produce warnings if very unusual bond lengths or angles are encountered (see PDB_TOLERANCE_A and PDB_TOLERANCE_B), which are most often indicative of missing atoms in the pdb file (in particular termini and hydrogens). Some of these problems will be dealt with automatically, but it is always recommended to check the file {basename}_START.pdb to make sure that no drastic deviations occur. Such drastic deviations are almost inevitable if backbone atoms are missing from polymer chains, and in these cases preprocessing of the pdb file may be necessary. Conversely, if the input geometries are merely distorted (experimental structures do not have arbitrary resolution or correctness), the automatic rebuilding CAMPARI may perform should probably be circumvented by increasing the thresholds for PDB_TOLERANCE_B and PDB_TOLERANCE_A.
Note that simulations with constraints cannot preserve exact values for the constrained degrees of freedom upon restarts of simulations from standard pdb files. If the sampler is in Cartesian space and constraints are used, keyword SHAKEFROM is a potential remedy. Conversely, simulations in rigid-body or torsional space have no way of relaxing input geometries to the built-in (or any user-desired) values for bond lengths, angles, and rigid dihedral angles. In these cases, it may be useful (as for mode 1 above) to modify the assumed pdb format to improve precision (see PDB_INPUTSTRING and PDB_OUTPUTSTRING) and/or to rely on restart files whenever possible. The limitations of mode 2 are that atom names must be understood (automatic translation routines are in place but not exhaustive) and that the read-in stops as soon as there is any mismatch between sequence and structure inputs at the residue level (the remaining degrees of freedom missing from the pdb file are treated according to keyword RANDOMIZE).
 3. This mode is identical to mode 2 above with the exception of how mismatches between pdb and sequence input are addressed. Here, CAMPARI will assume that all of the structural input is potentially relevant but that some parts of polymer chains may be missing, which is a common issue with experimental structures. It will thus try to match maximally long sequence stretches from individual molecules (in the order of appearance in the sequence) with the sequence in the pdb file. Read-in stops as soon as structural input is exhausted or whenever an unresolvable mismatch occurs. An unresolvable mismatch occurs when, for a molecule present in sequence input, no information whatsoever can be found in the pdb file or when the pdb file contains any residue that cannot be mapped to the current or next molecule. This mode enables the generation of initial structures with multiple C- and N-terminal tails on polymers being rebuilt. The rebuilding is under the control of keyword RANDOMIZE. Note that the proper construction of N-terminal tails requires the first 3 main-chain atoms of the subsequent residue to be read in correctly from the input file. This is the default option.
PDB_INPUTSTRING
This keyword allows changing the assumed pdb formatting string (Fortran) for pdb files. This is required to enable CAMPARI to read in altered pdb files produced by the analogous keyword PDB_OUTPUTSTRING or by other software or scripts. The default is "a6,a5,1x,a4,a1,a3,1x,a1,i4,a1,3x,3f8.3" (with the quotes). Because Fortran in general deals poorly with string-based I/O, any improper use of this keyword can easily lead to abnormal program termination. In the format string, the letters (a, i, f) give the type of variable, which must not change. The numbers give the field widths, and these can be customized for variables of type integer ("i") or real ("f"). It is also possible to modify the field widths of string variables ("a"), but it is not possible for extra content to be read, i.e., the resultant behavior is undefined. The only exception is the second variable (atom number), which is of the "wrong" type here because these values are ignored on input. This particular field width can be increased without harm. It is of course intended and required that the corresponding output string format uses an integer field here, by default "i5" instead of "a5". Common problems with standard pdb files, which can be addressed at least in part by the format string, are that the integer number for the atom index overflows, that the chain indicator becomes fused to neighboring columns (because of overlong residue names or large residue numbers), that the residue number column overflows, that the coordinate entries become fused or overflow (if absolute coordinates are not centered at values that are small in absolute magnitude), or that the coordinate precision is insufficient for recovering exact covalent geometries based on this information alone.
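To make the default format string concrete, the following sketch slices a coordinate record exactly as "a6,a5,1x,a4,a1,a3,1x,a1,i4,a1,3x,3f8.3" would. The field names are illustrative, not CAMPARI's internal variable names:

```python
def parse_atom_record(line):
    """Slice a pdb ATOM/HETATM line according to the default input format
    string "a6,a5,1x,a4,a1,a3,1x,a1,i4,a1,3x,3f8.3". A key of None marks a
    skipped field (the Fortran "x" descriptors)."""
    fields = [("record", 6, str), ("serial", 5, str),  # a6, a5 (serial ignored)
              (None, 1, None),                         # 1x
              ("name", 4, str), ("altloc", 1, str), ("resname", 3, str),
              (None, 1, None),                         # 1x
              ("chain", 1, str), ("resnum", 4, int), ("icode", 1, str),
              (None, 3, None),                         # 3x
              ("x", 8, float), ("y", 8, float), ("z", 8, float)]  # 3f8.3
    out, pos = {}, 0
    for key, width, conv in fields:
        piece = line[pos:pos + width]
        pos += width
        if key is not None:
            out[key] = conv(piece) if conv in (int, float) else piece.strip()
    return out
```

Widening, say, the residue number field to "i5" simply means enlarging the corresponding width entry, which shifts every subsequent column by one: this is why input and output format strings must be kept consistent.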
PDB_HMODE
If structural input from a pdb file is requested in modes 2 or 3 (see PDB_READMODE and PDBFILE) or if a trajectory analysis run is being performed, this keyword offers two choices for dealing with hydrogen atoms (which will often be missing from the pdb input file):
 1. CAMPARI will attempt to read in the Cartesian coordinates of all hydrogen atoms directly and only rebuild those hydrogen (and other) atoms that cause a geometry violation defined by keywords PDB_TOLERANCE_B and PDB_TOLERANCE_A.
 2. CAMPARI will rebuild all hydrogen atoms according to its underlying default models for local geometry in chemical building blocks. This is most useful if hydrogen atoms are missing entirely from the input file.
PDB_NUCMODE
For processing structural input, keyword PDB_NUCMODE explained below is ignored. It is listed here nonetheless to explain what CAMPARI actually does when reading in a pdb file supplied via PDBFILE or via PDB_TEMPLATE: If the input file is in CAMPARI convention, i.e., the O3* oxygen atom is part of the same residue as the phosphate it belongs to, the read-in is consistent with internal convention. If, however, the input file is in pdb convention (also used by almost all other simulation software), i.e., the O3* oxygen atom is always part of the same residue as the sugar it belongs to, a heuristic is used to avoid an incorrect assignment. This heuristic relies on the geometry of the input structure being sane, as it checks the bond distance to the appropriate phosphorus atom. For the heuristic to be successful, it is essential that the 4-letter atom name for phosphorus atoms is always " P ". In terminal residues, it is possible that two such oxygen atoms appear, and in this case it is important that they have different names (" O3*" and "2O3*" in standard CAMPARI convention).
As long as atom names can be parsed (see also below), the user should therefore not have to worry about the placement convention used in pdb input files. This implies that it is possible to supply a binary trajectory file (for example via DCDFILE) written in the non-CAMPARI convention of assigning the O3* atom to the residue carrying the sugar it is attached to, by the use of an appropriate template.
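The distance heuristic can be sketched as follows; the reference bond length and tolerance used here are illustrative values, not CAMPARI's internal constants:

```python
import math

def o3_belongs_to_phosphate(o3_xyz, p_xyz, bond=1.60, tol=0.35):
    """Sketch of the documented heuristic for pdb-convention nucleotide
    input: an O3* atom is reassigned to the phosphate residue if it sits
    within bonding distance of the appropriate phosphorus atom (an O3*-P
    ester bond is roughly 1.6 Angstrom). Requires a sane input geometry,
    as noted in the documentation."""
    d = math.dist(o3_xyz, p_xyz)
    return abs(d - bond) <= tol
```

This also makes clear why the heuristic fails on badly distorted input: an O3* atom displaced far from its phosphorus simply never falls inside the bonding window.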
PDB_R_CONV
CAMPARI can in general process different conventions for the formatting of pdb files. A large fraction of simple atom-naming multiplicities is handled automatically without the use of any keywords. PDB_R_CONV allows the user to select the convention a read-in pdb file is assumed to follow so that more severe discrepancies can be dealt with. Possible choices currently consist of:
 1. CAMPARI format (of course suitable for reading back in any CAMPARI-generated output even if PDB_NUCMODE was used → see above).
 2. GROMOS format (nucleotide naming). This option offers very little unique functionality since most of the supported conversions are handled automatically regardless of the setting for this keyword. It is primarily used to handle the GROMOS residue names for nucleotides (ADE, DADE, and so on).
 3. CHARMM format (in particular atom naming, cap and nucleotide residue names and numbering (patches), ...). Note that there are two exceptions pertaining to C-terminal cap residues (NME and NH2), which arise due to nonunique naming in CHARMM: 1) NH2 atoms need to be called NT2 (instead of NT) and HT21, HT22 (instead of HT1, HT2), and 2) NME methyl hydrogens need to be called HAT1, HAT2, HAT3 (instead of HT1, HT2, HT3). For nucleotides, there is an additional exception for 5' residues carrying a 5'-terminal phosphate (the hydrogen in the terminal POH unit needs to be called "5HO*" instead of " H5T"). This is again due to nonunique naming conventions within CHARMM.
 4. AMBER format (atom and residue naming, in particular for nucleotides). Note that this option is the least tested one. Please let the developers know of any additional issues you may encounter.
XYZ_FORCEBOX
This keyword is a combined input/output keyword and is explained below. It can be used to process structural input with molecules that are broken up in periodic systems.
PDB_TOLERANCE_A
This setting allows the user to override CAMPARI's built-in defaults for the tolerances it applies to a read-in structure (usually xyz from pdb). Since it is not always easy to distinguish distorted structures from missing input, the code applies a tolerance when comparing read-in bond angles to the internal reference value (which is derived from crystallographic databases). The default is an interval of 20.0° to either side, and this setting can be expanded or contracted using this keyword. If a violation is found, the code usually overrides the faulty value with the default since it assumes that atomic positions were missing. This can sometimes lead to unwanted effects, which can be avoided by setting this keyword to a large number.
PDB_TOLERANCE_B
This is analogous to PDB_TOLERANCE_A but allows the user to change the interval for considering bond length exceptions. The difference here is that two numbers are required: a lower fractional number (down to 0.0) and an upper fractional number (preferably larger than 1.0, of course). This is because bond length ranges are inherently not normalized and in addition nonlinear (exceptions with too long bond lengths are much more frequent). The default is an interval between 80% and 125% of the crystallographic reference value (settings 0.8 and 1.25).
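Taken together, the two tolerance settings amount to the following tests (a sketch with the default values; the actual rebuild logic in CAMPARI is more involved):

```python
def geometry_flags(bond, angle, ref_bond, ref_angle,
                   tol_b=(0.8, 1.25), tol_a=20.0):
    """Sketch of the tolerance tests implied by PDB_TOLERANCE_B and
    PDB_TOLERANCE_A: bond lengths are compared as a fraction of the
    crystallographic reference, bond angles as an absolute deviation in
    degrees. Returns (bad_bond, bad_angle), i.e., which of the two values
    would trigger a rebuild with the internal default."""
    bad_bond = not (tol_b[0] * ref_bond <= bond <= tol_b[1] * ref_bond)
    bad_angle = abs(angle - ref_angle) > tol_a
    return bad_bond, bad_angle
```

For a C-C bond with a reference of 1.52 Angstrom, values between roughly 1.22 and 1.90 Angstrom pass by default, while anything outside is treated as evidence of missing input and rebuilt.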
PDB_TEMPLATE
This keyword allows the user to provide the name and location of a pdb file that serves possibly several auxiliary functions. A template pdb file is relevant in the following circumstances: In a trajectory analysis run, it can serve as a map to correct a mismatch in atom ordering between a binary trajectory file (dcd, xtc, NetCDF) and CAMPARI's intrinsic convention. Typically, a pdb file provided by the program that generated the binary file will serve this purpose. In order for the map to work, it is crucial to ensure that every single atom to be read in has a proper match (by atom name) in the pdb file, i.e., it is not tolerable to provide a pdb template with missing atoms or with atom names that CAMPARI cannot parse. In general, CAMPARI's pdb parser is relatively flexible and allows additional control via PDB_R_CONV. It is typically not possible, however, to correct mismatches in the grouping of atoms into residues (with the exception of processing coordinates for nucleotides, see PDB_NUCMODE). This is because CAMPARI treats a change in residue number on consecutive coordinate records as the signal for delineating entries by residue. Conversely, the absolute numbers are irrelevant.
 The template pdb file can simultaneously serve as a reference structure if alignment is requested in trajectory analysis runs (→ ALIGNCALC). This has the same requirements as the previous function meaning that it is not trivially possible to align trajectories using an incomplete or different reference structure. However, alignment to a reference structure is a functionality offered by almost every molecular visualization program.
 In all types of runs, the template pdb file can be used to infer the topology of residues not natively supported by CAMPARI. This is crucial for handling these systems. Importantly, using the template for this purpose decouples the topology determination from structural input for simulation runs, which allows initial randomization (possibly partial) of systems containing such unsupported residues. The template should contain each unique unsupported residue in the same order in which they appear in the sequence file. For unsupported residues that are part of a polymer chain, each occurrence in the sequence must have its own entry, which also contains the immediate N- and C-terminal sequence context in the polymer chain. This is unless consecutive unsupported residues are part of the same polymer chain (sequence context is mutually shared). If the unsupported residue is a small molecule (single residue), the template should contain a single instance. Beyond this, there are precise requirements for both input files. They are listed in the corresponding documentation for both the sequence file and the pdb file.
Assuming both files to be properly formatted, CAMPARI then does the following:
 From the sequence file, the number of unknown residues and their intended linkages are extracted.
 The template is read and the atomic indices delimiting all unknown residues are extracted. Single-residue molecules that are unsupported and occur repeatedly in the sequence can (but need not necessarily) reuse the same indices as the first instance of this type (matched by residue name in the pdb file). The procedure will try to match sequence input with the data in the template while allowing for gaps in both: i) supported residues present in sequence input that are not required as context for unsupported polymer residues are skipped; ii) supported residues present in the template are ignored; iii) unsupported single-residue molecules occurring a second or more times in sequence input are skipped if (and only if) a different unsupported residue is found next in the template. The last condition means that the second occurrence of an unsupported single-residue molecule may or may not be used (depending on sequence input). After this, basic parameters such as the effective residue radius and the reference atom are inferred. It is therefore important that the conformation of the residue in the pdb file is somewhat representative.
 The remainder of the system topology is constructed. The internal order of atoms for unsupported residues always reflects the order in the input pdb file exactly. The pdb template file is parsed again to ensure that the required sequence context is present. This applies only to those unsupported residues that are part of a chain (polymer).
 From the pdb atom names, the chemical element is guessed (C, O, N, H, P, S, halogens, various metals, and metalloids) and the mass is set to that of an appropriate atom type in the parameter file (identification by attempts to match mass and valence). The assignment will be poor if the parameter file does not support the chemical element in question. Further details are found elsewhere. This can later be overridden by a biotype patch and/or a combination of other patches.
 A new biotype is created for every new atom type encountered. This biotype is initialized to be empty with the exception of keeping the atom name and the (already) assigned atom type. The numbering of these new biotypes continues from the highest number in the parameter file. It is therefore not possible to use the parameter file for these assigned biotypes directly. Instead, it is recommended to use a biotype patch or specialized patches. The assignment of an atom type is sufficient to provide basic support, so for certain applications no patches may be required. For residues duplicated in sequence relative to the template, this and all subsequent information is simply copied from the first (and usually only) instance. For valid unsupported residues with identical names that occur multiple times in the sequence, only the type (but not the geometry) information is copied from the reference instance.
 The covalent bond information is used to infer the molecular topology (including a detection of rings). This defines the Z matrix entries (internal coordinate representation) for unsupported residues. Similarly, the linkage to covalently bound residues that are either supported or also unsupported is inferred. In the process, rotatable dihedral angles are detected automatically. This procedure, which explicitly tests for bond angle or length variations upon rotation, is critical to most subsequent assignments.
 Given a set of pdb names, atom types, valences, and a topology, CAMPARI attempts to conclude by analogy whether the residue conforms to the backbone of one of the supported polymer types (currently, polypeptides and polynucleotides). If it does, as many internal pointers as possible are set to identify the residue accordingly (this does not work for single-residue molecules).
 If a residue is recognized as being part of a supported polymer type, the topology itself is corrected (the goal is that it should make no difference to CAMPARI whether a residue is supported or whether it is masked (by changing the name) as an unsupported one and all the information has to be inferred from the input structure). Further corrections pertain to the setup of interactions, etc. Note that the match cannot always be perfect, e.g., fudge factors that are not zero or unity in conjunction with MODE_14 being 2 and INTERMODEL being 1 may lead to energetic inconsistencies. The interaction setup relies on determining local rigidity via its knowledge of which dihedral angles are rotatable. Due to code-specific reasons (scanning for short-range exceptions, exclusions, etc), it is highly recommended to parse the chain into residues such that any pair of atoms in residues i and i+2 is separated by at least four rotatable bonds.
 All flexible dihedral angles may be made part of basic sampling routines if the simulation is in internal coordinate space. These are the torsional dynamics sampler (→ TMD_UNKMODE for details) and the basic Monte Carlo moves for degrees of freedom of this type (→ OTHERUNKFREQ). Furthermore, access will be granted to the specialized samplers if the residue is detected as eligible. This, however, may sometimes lead to an altered interpretation of the absolute values of certain dihedral angles or even alter details of the sampler slightly, e.g., the pucker sampling of proline-like, unsupported residues may end up perturbing different sets of auxiliary bond angles.
 If analyses are requested, these routines will respond to the unsupported residue according to the values set in the previous steps. Basically, the better the match to natively supported entities is, the more analysis functionalities will be available. Straightforward cases depend only on Cartesian coordinates (e.g., RHCALC or CONTACTCALC), whereas polymer type-specific analyses (e.g., DSSPCALC) require an unsupported residue to be recognized as the corresponding polymer type. Care must be taken in mixed polymers or other exotic cases, and it may occasionally be necessary to disable certain analysis routines.
PDB_MPIMANY
For MPI parallel (multicopy) runs, this logical keyword (1 means "on") allows the user to provide different starting structures via pdb files for different replicas (copies). The keyword is irrelevant in parallel trajectory analysis mode, where this is the required and automatic behavior. CAMPARI used to restrict the use of this keyword to certain classes of calculations, but this is no longer the case. There are some risks associated with PDB_MPIMANY as follows: In internal coordinate space, the default accuracy of pdb files is too low to ensure that the covalent geometries across multiple input structures are sufficiently similar even when they were exactly the same in the underlying full-precision coordinates. The distortions in covalent geometries mean that the simulated systems are no longer exactly the same, which is undesirable in cases where this is implied because the copies are coupled, e.g., in replica exchange or PIGS calculations. The magnitude of this effect can be diagnosed, for instance, by analyzing these geometries (→ INTCALC) directly or by comparing short-range energy terms, in particular bonded ones (such as bond angle potentials). The effect can be circumvented with the help of keyword PDB_INPUTSTRING, which allows redefining the pdb format to be high-precision. This is the most obvious solution but only available if pdb files can be generated from higher-precision coordinates to begin with. In Cartesian space simulations, geometries usually relax to their force field minima unless holonomic constraints are in use (in which case keyword SHAKEFROM can be used to circumvent precision issues with PDB_MPIMANY).
If this option is active, CAMPARI expects to find systematically named pdb files with the base name given via keyword PDBFILE. The naming is analogous to the convention CAMPARI uses for outputs of parallel runs and also identical to what parallel trajectory analysis runs require. It is explained elsewhere. A list of keywords specific to running CAMPARI in parallel is found below.
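The precision issue described above can be made concrete with a small sketch (not CAMPARI code): the standard pdb format stores coordinates with only three decimal places in Å (an %8.3f field), so writing and re-reading full-precision coordinates distorts covalent geometries slightly. The coordinates below are arbitrary illustrative values.

```python
import math

def pdb_roundtrip(xyz):
    """Round coordinates to the 3 decimal places the standard pdb format retains."""
    return tuple(float(f"{v:8.3f}") for v in xyz)

def dist(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# two atoms separated by an assumed full-precision bond length of ~1.529 A
a = (0.00012345, 0.0, 0.0)
b = (1.52938765, 0.0, 0.0)

exact = dist(a, b)
lossy = dist(pdb_roundtrip(a), pdb_roundtrip(b))
print(f"exact: {exact:.6f} A, after pdb round-trip: {lossy:.6f} A, "
      f"error: {abs(exact - lossy):.6f} A")
```

Errors of a few 10^{−4} Å per coordinate are harmless individually, but stiff bonded potentials amplify them into measurably different energies across replicas, which is exactly what the comparison of bonded energy terms suggested above would reveal.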
Energy Terms:
(back to top)
Preamble (this is not a keyword)
There are various classes of energy terms. They include the core nonbonded energy terms (SC_IPP, SC_ATTLJ, SC_POLAR, SC_IMPSOLV, SC_TABUL, SC_WCA, and ghosted variants thereof), which typically use a truncation scheme involving neighbor lists. When CAMPARI's shared memory (OpenMP) parallelization is in use, these interactions are calculated in parallel using a detailed scheme operating at the neighbor-list level to achieve load balance. The second class of energy terms are those that are not pairwise, require no cutoff scheme, and are truly "local". Some of these, e.g., bonded terms, are generally split by residue across threads without difficulty, i.e., they are sums of independent terms. The final class of energy terms are various restraint (bias) terms requiring a synthesis of information across many residues, e.g., spatial density restraints. For these, the documentation below lists explicitly the parallelization support in each case.
HSSCALE
This keyword controls a generic scaling factor for size parameters (Lennard-Jones σ_{ii} and σ_{ij}) that were read in from the parameter file. This fundamentally alters the excluded volume properties of the system. Motivation for using this keyword (which naturally defaults to 1.0) may arise during parameter development or in specialized calculations.
SC_IPP
This keyword allows the user to specify the linear scaling factor controlling the strength of the inverse power potential (IPP) defined as:
E_{IPP} = c_{IPP}·4.0ΣΣ_{i,j}ε_{ij}f_{14,ij}·(σ_{ij}/r_{ij})^{t}
Here, the σ_{ij} and ε_{ij} are the size and interaction parameters for atom pair i,j, f_{14,ij} are potential 1-4 fudge factors (see FUDGE_ST_14) that generally will be unity, r_{ij} is the interatomic distance, t is the exponent, and the (double) sum runs over all interacting pairs of atoms. Lastly, c_{IPP} is the linear scaling factor controlled by this keyword, which, unlike most other scaling factors for energy terms, defaults to 1.0. In most applications, the inverse power potential will be the repulsive arm of the Lennard-Jones potential (t = 12 → 12^{th} power, see IPPEXP). The interpretation and application of the provided parameters (see documentation and keyword PARAMETERS) can be controlled through keywords SIGRULE, EPSRULE, INTERMODEL, FUDGE_ST_14, and MODE_14. Note that the use of the Weeks-Chandler-Andersen (WCA) potential (→ SC_WCA) is mutually exclusive with inverse power potentials.
IPPEXP
This keyword allows the user to adjust the exponent (an even integer that defaults to 12) for the inverse power potential. An important restriction is that many of the optimized loops in dynamics calculations do not support any other choice except 12. Note that very large numbers will, possibly in compiler-dependent fashion, slow down code execution due to the increasing complexity of expensive operations in innermost loops. By (formally) setting this to a value greater than 100, CAMPARI is instructed to replace the IPP potential with a hard-sphere (HS) potential, which is only available in pure Monte Carlo runs. In this case the scaling factor is ignored, the "infinity" value (penalty for nuclear fusion) is determined by the setting for BARRIER, and the use of a size reduction factor (HSSCALE) is strongly recommended. In hard-sphere potentials, any energy readout for the IPP term should now be in multiples of BARRIER, and all persisting nonzero values would indicate a frustrated (non-relaxable) system. The actual value specified for IPPEXP is then irrelevant.
SIGRULE
This keyword defines the combination rule for the size parameters of Lennard-Jones (and WCA) potentials, i.e., how to construct σ_{ij} from σ_{ii} and σ_{jj} if σ_{ij} is not provided as a specific override in the parameter file (for details see PARAMETERS). The choices are:
1) σ_{ij} = 0.5·(σ_{ii} + σ_{jj}) (arithmetic mean)
2) σ_{ij} = (σ_{ii}·σ_{jj})^{0.5} (geometric mean)
3) σ_{ij} = 2.0·(σ_{ii}^{−1} + σ_{jj}^{−1})^{−1} (harmonic mean)
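The three rules can be sketched as a small helper (hypothetical σ values; the rule numbering follows the list above):

```python
def sigma_combine(s_ii, s_jj, rule):
    """Combine per-atom sigma values into sigma_ij per the SIGRULE options."""
    if rule == 1:                         # arithmetic mean
        return 0.5 * (s_ii + s_jj)
    elif rule == 2:                       # geometric mean
        return (s_ii * s_jj) ** 0.5
    elif rule == 3:                       # harmonic mean
        return 2.0 / (1.0 / s_ii + 1.0 / s_jj)
    raise ValueError("unknown combination rule")

s1, s2 = 3.0, 4.0   # assumed sigma_ii, sigma_jj in Angstrom
print([round(sigma_combine(s1, s2, r), 4) for r in (1, 2, 3)])
```

For positive, unequal inputs the three rules order strictly as arithmetic > geometric > harmonic, so the choice systematically inflates or deflates mixed-pair sizes relative to one another.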
EPSRULE
Analogous to SIGRULE, this keyword defines the combination rule for the interaction parameters of Lennard-Jones potentials. The same options are available and the same caveats apply with respect to overrides in the parameter file.
SC_ATTLJ
This keyword allows the user to specify the linear scaling factor controlling the strength of the dispersive (van der Waals) interactions defined as:
E_{ATTLJ} = −c_{ATTLJ}·4.0ΣΣ_{i,j}ε_{ij}f_{14,ij}·(σ_{ij}/r_{ij})^{6}
Here, the σ_{ij} and ε_{ij} are the size and interaction parameters for atom pair i,j, f_{14,ij} are potential 1-4 fudge factors (see FUDGE_ST_14) that generally will be unity, r_{ij} is the interatomic distance, and the (double) sum runs over all interacting pairs of atoms. Together with an inverse power potential with scaling factor 1.0 and exponent 12 (see SC_IPP), the canonical Lennard-Jones potential is constructed if the scaling factor, c_{ATTLJ}, is set to unity.
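A per-pair numeric sketch (not CAMPARI source) of the statement above: with c_IPP = c_ATTLJ = 1, t = 12, unity fudge factors, and the dispersive term entering with a negative sign, the sum of the two terms is the canonical 12-6 Lennard-Jones energy, whose minimum of −ε lies at r = σ·2^{1/6}. The σ and ε values are arbitrary placeholders.

```python
def e_ipp(r, sig, eps, c=1.0, t=12, f14=1.0):
    """Repulsive inverse power term, SC_IPP style."""
    return c * 4.0 * eps * f14 * (sig / r) ** t

def e_attlj(r, sig, eps, c=1.0, f14=1.0):
    """Attractive dispersion term, SC_ATTLJ style (negative sign)."""
    return -c * 4.0 * eps * f14 * (sig / r) ** 6

def e_lj(r, sig, eps):
    """Canonical Lennard-Jones pair energy as the sum of both terms."""
    return e_ipp(r, sig, eps) + e_attlj(r, sig, eps)

sig, eps = 3.4, 0.24            # assumed parameters (Angstrom, kcal/mol)
rmin = sig * 2.0 ** (1.0 / 6.0)
print(f"E(rmin) = {e_lj(rmin, sig, eps):.6f} kcal/mol (expected {-eps})")
```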
INTERMODEL
This very important keyword controls the exclusion rules for short-range interactions of the excluded volume and dispersion types (see SC_IPP, SC_ATTLJ, and SC_WCA). For Monte Carlo or torsional dynamics simulations assuming rigid geometries, the computation of spurious (constant) LJ interactions is inefficient. Conversely, in Cartesian sampling, bonded interactions are almost always parametrized with all 1-4, and certainly with all 1-5 interactions in place. The latter refer to intramolecular atom pairs separated by either exactly three (1-4) or four (1-5) bonds. The ABSINTH implicit solvation model, which is one of the core features of CAMPARI, was parametrized with a reduced interaction model. Hence, this keyword offers three choices: Consider only interactions which are not rigorously or effectively frozen when using internal coordinate space sampling. This setting for example excludes all interactions within aromatic rings. As for determining 1-4 interactions, the rules outlined under MODE_14 apply.
 Consider all interactions separated by at least three bonds to be valid. This is the default setting for molecular mechanics force fields. Note, however, that many of these interactions are quasi-rigid and that their computation is somewhat nonsensical even in a full Cartesian description. Also note that, due to the inherent assumption that every bond is rotatable, the setting for MODE_14 does not matter if INTERMODEL is set to 2. All atoms separated by exactly three bonds will be considered 1-4. It is important to point out that the setting chosen for INTERMODEL affects the setting for ELECMODEL as well (see ELECMODEL).
 The GROMOS force field uses a very specific set of nonbonded exclusions which is supported by choosing this option for INTERMODEL. It is essentially a weakened version of the first (sane) option. Note that to reproduce the GROMOS force field exactly, ELECMODEL (which remains an independent setting) has to be set to 2 and INTERMODEL to 3.
LJPATCHFILE
This keyword can be used to provide the location and name of an input file that allows reassigning the size exclusion and dispersion parameters used in describing generic short-range potentials of the Lennard-Jones (see SC_ATTLJ and SC_IPP) or WCA types. The parameter file that CAMPARI parses will contain atom entries that specify general atom types. These types have associated with them entries of the contact and epsilon types specifying the Lennard-Jones σ_{ij} and ε_{ij} parameters (see equations provided with scale factor keywords). Within the list of biotypes, each biotype is assigned an atom type, and the patch functionality described here allows the user to change this to a different atom type for a specific instance of a biotype. Note that the reassignment is restricted to the Lennard-Jones parameters, but excludes other atomic parameters specified by atom types such as mass, proton number, description, or valence. Conversely, parameters derived from Lennard-Jones parameters are altered. This is particularly important for the derived atomic radii and volumes used in the continuum solvation model and analysis. If those parameters are meant to be left unchanged or set to yet another set of values, either the radius facility of the parameter file must be employed (if it is not already in use for the original atom type in question), or a patch of atomic radii must be applied in addition. Because size exclusion and dispersion parameters rely on combination rules and/or many overrides for special cases, it can be tedious to patch them. This is because a patch will often require the user to define a new atom type, which, for example, for the GROMOS force fields can be a lot of work. Some more details are given elsewhere.
SC_EXTRA
This (somewhat obsolete) keyword specifies a linear scaling factor for certain structural correction potentials. Assuming the typical set of torsional space constraints (see CARTINT), these are applied to rotatable bonds with electronic effects which cannot be captured by atomic pairwise contributions. These consist of: Secondary amides: The rotation around the C-N bond is hindered due to the partial double-bond character present in amides. Corrections are therefore applied to residues which have an ω-angle (all non-N-terminal peptide residues excluding NH2 as well as the secondary amides NMF and NMA → sequence input). These keep the peptide bond roughly planar while allowing for cis/trans-isomerization and increased overall flexibility. The potentials are directly ported from OPLS-AA.
 Phenolic polar hydrogens: The rotation around the C-O bond in phenols is hindered due to its partial double-bond character, and in-plane arrangements of the attached hydrogen are favored. Corrections are applied to tyrosine (TYR) and p-cresol (PCR). These keep the polar hydrogen in its favored position. The potential is not overly stiff so that out-of-plane arrangements will be populated as well. The parameters are again ported directly from OPLS-AA.
SC_BONDED_B
This keyword gives the linear scaling factor for all bond length potentials. Their usage is permissible in all simulations but not meaningful unless bond lengths are actually allowed to vary, i.e., typically unless sampling happens fully in Cartesian degrees of freedom (see CARTINT). It is important to remember, however, that even in rigid-body / torsional space simulations, specific move types and systems will require setting this to unity (so we recommend it throughout). For bond length potentials, the only such exceptions are crosslinked molecules (see CRLK_MODE). Note that the parameter file has to provide support to be able to use this energy term (see PARAMETERS for details), and that simulations relying on those terms will otherwise fail, crash, or produce nonsensical results. Use GUESS_BONDED to circumvent those issues for incomplete parameter files.
SC_BONDED_A
Similar to SC_BONDED_B for all bond angle potentials. Bond angle potentials (see PARAMETERS for details) matter for sampling in Cartesian space (see CARTINT), for crosslinked molecules (see CRLK_MODE), and for the sampling of five-membered, flexible rings (see PKRFREQ and SUGARFREQ). The coordinate derivatives for bond angles diverge at the extreme values of both 0° and 180°. This means that care must be taken in setting up the Z matrix such that no terms are included that would explicitly demand these values. In other software, this is sometimes overcome by the use of dummy atoms. In CAMPARI, this is unlikely to be problematic in Monte Carlo simulations. In dynamics, forces are buffered to avoid program crashes due to floating point errors, but the actual values are no longer meaningful. This issue is primarily relevant when modifying the code or when simulating unsupported residues, for which the Z matrix is inferred from input (see elsewhere for details).
SC_BONDED_I
Similar to SC_BONDED_A for all improper dihedral angle potentials.
SC_BONDED_T
Similar to SC_BONDED_B for all torsional potentials. Note that these do in fact encompass degrees of freedom sampled in all types of simulations supported within CAMPARI and hence are always relevant. As alluded to above, torsional potentials can easily be set up to cover the same correction terms as the ones applied within SC_EXTRA. If that is the case, we therefore recommend not using SC_EXTRA (otherwise energy terms will in fact be applied twice, which effectively scales up those torsions; in such a case, CAMPARI produces an appropriate warning as well).
SC_BONDED_M
Similar to SC_BONDED_B for all CMAP potentials. These grid-based correction potentials are part of the CHARMM force field and explained in PARAMETERS. This keyword specifies the "outside" scaling factor. Note that CMAP corrections can theoretically be relevant for all possible simulations of biopolymers within CAMPARI since they act on consecutive dihedral angles. The default CMAP corrections from CHARMM only apply to polypeptides, however, and are only contained within the reference CHARMM parameter file.
IMPROPER_CONV
If improper dihedral potentials are in use (→ SC_BONDED_I), this very specialized keyword can be used to force a reinterpretation of the input sequence for the assignment of improper dihedral angle potentials to bonded types (see elsewhere). When set to 2, this keyword forces CAMPARI to switch the meaning of the first and third specified bonded type when it comes to energy and force evaluations. This allows a more or less exact match to the convention used in the AMBER set of force fields (and by extension: in OPLS-AA). For any other value specified, CAMPARI will use the CAMPARI-native convention (that is, the same as in the CHARMM and GROMOS force fields).
CMAPORDER
If CMAP corrections are used (→ SC_BONDED_M), this keyword sets the interpolation order for cardinal splines (assuming those are chosen through parameter input → PARAMETERS). A higher spline order will yield a smoother surface. Since the splines are non-interpolating, however, rapidly varying or coarsely tabulated functions may not be well approximated in such cases. The only interpolating cardinal B-spline is the linear one, which requires a choice of 2 for this keyword. This keyword is irrelevant should bicubic splines be chosen.
CMAPDIR
If CMAP corrections are used (→ SC_BONDED_M), this keyword lets the user specify the absolute path of the directory in which the CMAP files are to be found (by default they are in the data/ subdirectory of the main distribution tree).
BPATCHFILE
This keyword can be used to provide the location and name of an input file that allows reassigning or adding bonded potential terms (see bond length potentials, bond angle potentials, improper dihedral angle potentials, torsional potentials, and CMAP potentials). At the level of the parameter file that CAMPARI parses to generate default assignments based on biotypes (see elsewhere), there are limitations to how finely the system can be parsed. For instance, it is technically not possible to have different bond length potentials acting on the N→C_{α} bonds of two non-terminal glycine residues (because the biotypes are identical). Of course, even providing bonded parameter assignments exactly at biotype resolution would generally be inordinately complicated, which is the reason for grouping biotypes into so-called bonded types in the parameter file. In cases where specific alterations to a given system are desired, the patch functionality provided by this input file will generally be the most convenient (and often the only) route to take. For stiff terms, CAMPARI can also guess values based on initial geometries. Applied patches to bonded interactions are always printed to log output. In order to diagnose their correctness more easily, it is recommended to use the report functionality for bonded potential terms. Note that the most critical limitation is that extra or alternative bonded potentials can only be applied to such internal coordinates that are eligible for default assignments themselves, e.g., it is not possible to apply a bond angle potential to atoms a-b-c if a is not covalently bound to b or if b is not covalently bound to c.
GUESS_BONDED
This keyword lets the user instruct CAMPARI to construct a set of bonded parameters from the most basic information available, which are the default molecular geometries (usually from high-resolution crystal structures) for residues natively supported by CAMPARI or structural input for unsupported residues (see documentation on sequence input for details). Options are as follows: No potentials are guessed. This means that the only bonded potentials available are those defined in the parameter file and mapped by matching the entries on bonded types to the available potentials, e.g., bond length potentials, as well as those potentials defined in a corresponding patch file, which also rely on the parameter file. Missing bonded interactions can make it impossible or meaningless to run simulations in Cartesian space.
 CAMPARI will guess missing harmonic potentials (type 1) for all bond lengths and angles defined by molecular topology. The equilibrium positions are defined as mentioned above. The force constants are flat, which is obviously a crude approximation, and evaluate to 300 kcal·mol^{−1}·Å^{−2} for bond lengths and 80 kcal·mol^{−1}·rad^{−2} for bond angles, respectively. It is not possible to add potentials to doublets or triplets of atoms that are not topologically derived. This is important for unsupported residues, where a suboptimal Z matrix may be constructed because of bad atom input order.
 This option implies the previous option. In addition, eligible improper dihedral angles (see documentation on bonded types) are given a harmonic potential (type 2), and the strength is set to 40 kcal·mol^{−1}·rad^{−2}. Note that the functional form here includes a factor of 1/2 not present for bond angles.
While these potentials are obviously too crude to study problems requiring very high resolution at the local geometry level, they can be very useful to quickly enable Cartesian space simulations of unsupported systems, where often calibration data are missing (or unreliable) to begin with. The guessed potentials are written to log output, and parsing this with a script can help in creating templates for exhaustive patch files, which are tedious to create from scratch. Note that the source data do not come from structural input for supported residues, which means that initial structures deviating dramatically from the assumed local geometries can be subject to large forces.
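The flat guessed terms described above can be sketched numerically as follows. This is a hypothetical illustration, not CAMPARI source; it assumes simple harmonic forms with no 1/2 prefactor for the type 1 bond and angle terms and an explicit 1/2 for the type 2 improper term, as stated in the options above (exact conventions are in the parameter file documentation).

```python
# flat force constants quoted above (kcal/mol/A^2 and kcal/mol/rad^2)
K_BOND, K_ANGLE, K_IMP = 300.0, 80.0, 40.0

def e_bond(r, r0):
    """Guessed harmonic bond length term (type 1)."""
    return K_BOND * (r - r0) ** 2

def e_angle(theta, theta0):
    """Guessed harmonic bond angle term (type 1), angles in radians."""
    return K_ANGLE * (theta - theta0) ** 2

def e_improper(phi, phi0):
    """Guessed harmonic improper term (type 2); note the factor of 1/2."""
    return 0.5 * K_IMP * (phi - phi0) ** 2

# a bond stretched by 0.1 A from an assumed equilibrium value of 1.53 A
print(e_bond(1.63, 1.53))
```

The last line also illustrates the warning above: a starting structure whose bond lengths deviate by only ~0.1 Å from the guessed equilibria already sits several kcal/mol up the harmonic wall, with correspondingly large initial forces.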
BONDREPORT
This report flag allows the user to request a summary of the bonded potentials found and not found during processing of the parameter file. This is primarily useful as a sanity and debugging tool for creating parameter files. Note that missing but necessary parameters (necessary ones are all bond length and angle potentials if and only if CARTINT is 2) as well as guessed parameters (see GUESS_BONDED) are always reported upon.
SC_WCA
Mutually exclusive to the use of the Lennard-Jones potential, CAMPARI allows using the extended Weeks-Chandler-Andersen (WCA) potential, which is defined as:
E_{WCA} = 4.0·c_{WCA}ΣΣ_{i,j}ε_{ij}f_{14,ij}·[(σ_{ij}/r_{ij})^{12} − (σ_{ij}/r_{ij})^{6} + 0.25·(1.0 − c_{2})] if r_{ij} < σ_{ij}·2^{1/6}
E_{WCA} = c_{2}·c_{WCA}ΣΣ_{i,j}ε_{ij}f_{14,ij}·[0.5·cos(c_{3}·(r_{ij}/σ_{ij})^{2} + c_{4}) − 0.5] else if r_{ij} < σ_{ij}·c_{1}
E_{WCA} = 0.0 else
with:
c_{3} = π·(c_{1}^{2} − 2^{1/3})^{−1}
c_{4} = π − c_{3}·2^{1/3}
(reference)
Here, the size, interaction, and fudge parameters are used as defined before. c_{1} is the interaction cutoff (in units of σ_{ij}) while c_{2} is the depth of the attractive well to be spliced in (in units of ε_{ij}). c_{1} and c_{2} can be set by keywords CUT_WCA and ATT_WCA, respectively. The potential provides a continuous function mimicking a LJ potential in which the dispersive term can be spliced in without shifting the position of the minimum. c_{WCA} denotes the linear scaling factor specified by this keyword.
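A single-pair numeric sketch of the piecewise form above (not CAMPARI source; the scaling factor and fudge factor are absorbed, and the parameter values are arbitrary). It verifies that the spliced cosine branch joins the repulsive branch continuously at r = σ·2^{1/6}, where the well depth is c_{2}·ε, and decays to zero at the cutoff r = σ·c_{1}.

```python
import math

def e_wca(r, sig, eps, c1, c2):
    """Extended WCA pair energy per the three-branch definition above."""
    c3 = math.pi / (c1 ** 2 - 2.0 ** (1.0 / 3.0))
    c4 = math.pi - c3 * 2.0 ** (1.0 / 3.0)
    if r < sig * 2.0 ** (1.0 / 6.0):
        sr6 = (sig / r) ** 6
        return 4.0 * eps * (sr6 ** 2 - sr6 + 0.25 * (1.0 - c2))
    elif r < sig * c1:
        return c2 * eps * (0.5 * math.cos(c3 * (r / sig) ** 2 + c4) - 0.5)
    return 0.0

sig, eps, c1, c2 = 3.2, 0.2, 1.8, 0.5   # assumed values; note c1 >= 1.5
r_splice = sig * 2.0 ** (1.0 / 6.0)
print(e_wca(r_splice - 1e-9, sig, eps, c1, c2))   # well bottom, ~ -c2*eps
print(e_wca(sig * c1 - 1e-9, sig, eps, c1, c2))   # approaches 0 at the cutoff
```

The constants c_{3} and c_{4} are exactly what enforces this: at r = σ·c_{1} the cosine argument equals 2π (energy 0), and at r = σ·2^{1/6} it equals π (energy −c_{2}·ε), matching the first branch.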
ATT_WCA
This allows the user to specify the well depth (positive number) for the attractive part of the WCA potential in units of ε_{ij} (parameter c_{2} under SC_WCA).
CUT_WCA
This allows the user to specify the cutoff value for the extended WCA potential in units of σ_{ij} (parameter c_{1} under SC_WCA). Note that the minimum allowed choice here is 1.5.
VDWREPORT
This keyword is a simple logical deciding whether or not to print out a summary of the Lennard-Jones (size exclusion and dispersion) parameters, i.e., to report the base values (σ_{ii} and ε_{ii}), the combination rules, and in particular all "special" values which overwrite the default combination-rule-derived result.
INTERREPORT
Mostly for debugging purposes, this simple logical allows the user to demand a summary of short-range interactions. Naturally, this output can be very large and the keyword should only be used when absolutely needed, for example to understand the settings for MODE_14 and FUDGE_ST_14.
SC_POLAR
CAMPARI only supports fixed-charge, atom-based electrostatic interactions, which work by defining partial charges for each atom and then writing the potential as:
E_{POLAR} = c_{POLAR}·ΣΣ_{i,j} [(f_{14,C,ij}·q_{i}q_{j}) / (4.0πε_{0}·r_{ij})]
Here, the q_{i} are the atomic partial charges, f_{14,C,ij} are potential 1-4 fudge factors (see FUDGE_EL_14) that generally will be unity, ε_{0} is the vacuum permittivity, r_{ij} is the interatomic distance, and the (double) sum runs over all eligible atom pairs (see ELECMODEL). c_{POLAR} is the linear scaling factor for all polar interactions specified by this keyword. Since electrostatic interactions are characterized by the potential to yield long-range effects (distance scaling ranges from r^{−1} for monopole-monopole terms to r^{−6} for dipole-dipole terms between molecules tumbling freely), the Coulombic term can employ a different cutoff in MC calculations (see below) than the short-range potentials. The correct long-range treatment of electrostatic interactions is one of the most investigated areas in molecular simulations, and the user is referred to current literature and keywords LREL_MC and LREL_MD for details. All required partial charges are either read from the parameter file or set by a dedicated patch.
Note that the functional form given above is only correct if no implicit solvation model is in use. If such a model is in use, Coulomb interactions are usually modified by an extra term s_{ij}, which can be a complicated function of interatomic distance and/or the positions of all nearby atoms. The reader is referred to Vitalis and Pappu for the exact functional forms used in the ABSINTH implicit solvation model.
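The vacuum form of the sum can be sketched per pair as follows (not CAMPARI source). The sketch assumes the common molecular-mechanics unit system (kcal/mol, Å, elementary charges), in which the prefactor 1/(4πε_{0}) becomes an electrostatic conversion constant of approximately 332.06 kcal·Å·mol^{−1}·e^{−2}.

```python
COUL = 332.06  # approximate electrostatic conversion factor, kcal*A/(mol*e^2)

def e_polar(pairs, c_polar=1.0):
    """Fixed-charge Coulomb sum over eligible pairs.

    pairs: iterable of (q_i, q_j, r_ij, f14) tuples, charges in e, r in A.
    """
    return c_polar * sum(f14 * COUL * qi * qj / r for qi, qj, r, f14 in pairs)

# two assumed unit charges of opposite sign 5 A apart, no fudge scaling
print(round(e_polar([(+1.0, -1.0, 5.0, 1.0)]), 3))
```

The slow r^{−1} decay visible here is why this term needs the separate long-range treatment (LREL_MC / LREL_MD) discussed above: at 5 Å a single monopole pair still contributes tens of kcal/mol.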
ELECMODEL
This important keyword is somewhat analogous to INTERMODEL and allows the user to set the interaction model for electrostatic interactions: Depending on the setting for INTERMODEL, interactions are either screened for connectivity and frozen interactions are excluded (INTERMODEL is 1), or are purely considered based on the number of bonds separating two atoms (INTERMODEL is 2). In either case, partial charges interact without considerations of net neutrality (see below), which is problematic for short-range interactions. Consider for example the ω-bond in polypeptides and assume that C-O and N-H both form neutral groups supposed to indicate dipole moments. If INTERMODEL is 2 and ELECMODEL is 1, the interaction between O and H is considered (1-4) but none of the others, as they are topologically too close. This leads to spurious (and very strong) Coulomb interactions between what essentially are fractional, net charges. This is an inherent weakness of the point charge model, which is typically addressed by extensive co-parameterization of bonded parameters, 1-4 fudge factors, etc. (see FUDGE_EL_14).
 The partial charge set in the parameter file is read, and the assigned charges are screened for (generally) net neutral charge groups. These charge groups are determined largely automatically and are currently not patchable per se. The automatic charge group generation operates by trying to group partial charges into groups of minimum size, spanning a minimum number of covalent bonds, that satisfy a target net charge. The default target net charges are derived from knowledge of every CAMPARI-supported residue and assumptions about their titration states (if any). This means, for example, that a nonterminal lysine residue will be processed by first looking for a charge group with a net charge of +1 before trying to identify as many net neutral groups as possible. While CAMPARI does not allow grouping charges arbitrarily, there is a dedicated patch which allows defining a series of (arbitrary) target values for the net charges of charge groups in a given residue. This is required to deal with charge sets that do not group at all, or with residues that contain multiple ionic moieties. For example, depending on the charge set in use, one may want to partition free, zwitterionic alanine either as multiple groups with +1, -1, and 0 charges, or simply as one or more net neutral groups. For multiple targets, failure of the grouping algorithm at a given stage will lead to this stage being skipped. Conversely, failure at the last stage will result in all remaining atoms in the residue becoming members of a single group. Groups that are not well-defined charge groups according to CAMPARI's standards will be reported in log output. With the groups in place, only interactions between those groups for which all possible atom-atom pairs are separated by at least one significant degree of freedom are computed. Interactions within a group are always excluded.
What constitutes a significant degree of freedom is predetermined by the choice for INTERMODEL, and the reader is encouraged to read up on this if necessary. Essentially, INTERMODEL defines the maximum set of short-range interaction pairs that can also be considered for polar interactions. As an example, for the 6 net neutral CH units in benzene, if INTERMODEL is 1, no intramolecular polar interactions can be considered (the maximum set is empty). Conversely, if INTERMODEL is 2, several group-group interactions become permissible (C1H-C4H, C2H-C5H, C3H-C6H). Depending on the charge set and on the choice for INTERMODEL, setting ELECMODEL to 2 can lead to a massive depletion of short-range electrostatics. This paradigm clashes heavily with traditional force field development but, in the authors' opinion, is the only sane treatment of dipole-dipole interactions if the latter are represented by point charges.
 The charge groups are important for deciding how long-range electrostatic interactions between ionic groups are computed exactly (see options 1, 2, and 3 for LREL_MC and options 4 and 5 for LREL_MD).
 The charge groups are used as the basis for computing group-averaged screening factors for certain screening models in the ABSINTH framework (see options 1, 3, 5, and 7 for SCRMODEL).
AMIDEPOL
One "flaw" in the biotype setup in CAMPARI (see PARAMETERS) is the fact that the two polar hydrogens on primary amides are treated as chemically equivalent, which, on a typical simulation timescale, they are not. Instead of creating yet more biotypes, this keyword simply allows the user to add a small polarization term to the partial charges on those hydrogens. The value specified will be added to the hydrogen cis to the oxygen (the nearby electronegative atom increases the partial positive charge) and subtracted from the trans hydrogen to keep the sum of the two charges unchanged. For example, if both hydrogens have a charge of +0.36, a specification of 0.05 here will yield charges of 0.41 (cis to O) and 0.31 (trans to O). It will be useful to track these changes using ELECREPORT. It is very important to note, however, that a sampling algorithm may fundamentally isomerize the amide bond and hence render the correction incorrect and, moreover, that reading in a structure may flip the two hydrogens to begin with (because of inconsistent numbering between two software packages). Hence, this keyword should be used only when absolutely necessary (and its sign may have to be flipped to achieve the desired effect). This correction to primary amides is a specific example of the occasional need to overwrite partial charge parameters for atoms due to "biotype splitting". The more general approach provided by CAMPARI for this explicit purpose is to "patch" the partial charge set by a dedicated input file.
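The arithmetic of this correction is simple enough to sketch (the helper below is hypothetical and only mirrors the worked example above):

```python
# Hypothetical sketch of the AMIDEPOL polarization correction: the
# specified value is added to the cis hydrogen and subtracted from the
# trans hydrogen, so the total charge is conserved.
def apply_amidepol(q_cis, q_trans, delta):
    """Return adjusted (cis, trans) partial charges."""
    return q_cis + delta, q_trans - delta

cis, trans = apply_amidepol(0.36, 0.36, 0.05)
assert abs(cis - 0.41) < 1e-12 and abs(trans - 0.31) < 1e-12
assert abs((cis + trans) - 0.72) < 1e-12  # total charge conserved
```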
CPATCHFILE
If the polar potential is in use, this keyword can be used to provide the location and name of an input file that allows overriding some or all of the partial charge parameters CAMPARI obtains from the parameter file (see elsewhere). This can be required to match the exact standard given by a force field with a finer biotype parsing. Note that such corrections are inherently error-prone and should only be used when absolutely needed. In any case, the user is recommended to use ELECREPORT for a detailed summary of the final partial charges in the system.
DIPREPORT
This simple logical will, when turned on by the user, produce two summary files (see DIPOLE_GROUPS.vmd and MONOPOLES.vmd), which allow the user to graphically assess the automatically determined charge groups. The former will visualize all charge groups in the system (not just the net neutral ones) by highlighting all atoms belonging to each group. The latter will visualize the "center" atom of all groups carrying a net charge (the meaning of this is defined by the value for POLTOL). Note that, naturally, this option is not available if SC_POLAR is zero.
NCPATCHFILE
If the polar potential is in use, CAMPARI automatically determines charge groups, i.e., groups of atoms within a residue that are topologically close and whose partial charges sum to zero or to an integer net charge. If LREL_MD is 4 or 5 and/or LREL_MC is 1, 2, or 3, this information is used to flag residues as carrying ionic groups, which leads to the computation of additional interactions even if residues are not in each other's neighbor lists. A residue is flagged if it contains at least one charge group with a total, absolute charge greater than a tolerance that is zero by default (and there should be very good reasons for increasing this tolerance). This keyword allows the user to specify the location and name of an optional input file that can perform two important tasks:
 It allows removal of the net charge flag for specific residues, thereby altering the overall interaction model (if the corresponding options for LREL_MD and/or LREL_MC are selected).
 It allows the manual specification of sequential target values for the total charges of charge groups to be identified. This is currently the only way to manually alter the charge group partitioning, and can be crucial when simulating unsupported residues and/or when dealing with charge sets that do not group naturally (such as those within the AMBER family of force fields).
POLTOL
If the polar potential is in use, CAMPARI automatically determines charge groups, i.e., groups of atoms within a residue that are topologically close and whose partial charges sum to zero or to an integer net charge. As described above, these net charge values can be patched. This may, for example, be used to obtain a grouping into approximately neutral groups for partial charge sets that include complex polarization patterns. In order to avoid having the resultant groups cause CAMPARI to flag the corresponding residue as carrying a net charge (i.e., being treated like an ion), this keyword allows the user to define an increased tolerance for what is considered "approximately neutral". This is relevant because the treatment of residues as ions can have substantial implications for the interaction model, in particular in terms of computational efficiency (see LREL_MC and LREL_MD). Note that this keyword operates at the charge group level, whereas patches via NCPATCHFILE can (also) disable the ionic flag status of entire residues. Therefore, the two offer different levels of control. The numerical value specified here (in units of e) is compared to the total charge of a given charge group. As an example, consider a terminal nucleotide residue carrying a 5'-phosphate with an integer negative charge. Suppose that the partial charges on the phosphate linker to the next residue are such that, in addition to the terminal phosphate, this leaves a charge group with a small, fractional charge. In this case, the residue-level patch could only remove the net charge flag for the entire residue (probably undesirable), whereas the tolerance setting described here could specifically eliminate the group with the fractional charge from the list of ionic groups. The default tolerance is zero within reasonable numerical precision. Note that this keyword has no impact on the charge group partitioning itself and is relevant only if LREL_MD is 4 or 5 and/or LREL_MC is 1, 2, or 3.
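The flagging decision described above can be rendered as a simplified, hypothetical sketch (not CAMPARI code): a residue counts as ionic if any of its charge groups exceeds the tolerance in absolute net charge.

```python
# Hypothetical sketch of the POLTOL flagging logic: a residue is
# flagged as carrying an ionic group if any charge group's absolute
# net charge exceeds the tolerance (zero by default).
def flag_ionic_residue(group_charges, poltol=0.0):
    return any(abs(q) > poltol for q in group_charges)

assert flag_ionic_residue([-1.0, 0.0]) is True          # true ion: flagged
assert flag_ionic_residue([-0.08, 0.0], poltol=0.1) is False  # tolerated
```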
FUDGE_ST_14
This keyword provides a flat 1-4 scaling factor for interatomic, nonbonded interactions of specific types. 1-4 interactions are defined according to the choice for MODE_14 and depend on the setting for INTERMODEL as well. The value for FUDGE_ST_14 is applied to all steric and dispersion potentials, i.e., the potentials turned on by SC_IPP, SC_ATTLJ, and SC_WCA. The only other 1-4-scaled interaction potential is the electrostatic one, for which a separate 1-4 scaling factor is in use (see FUDGE_EL_14). All other pairwise, nonbonded potentials are never subjected to 1-4 corrections (see for example SC_TABUL or SC_DREST). Note that the value for FUDGE_ST_14 is applied in addition to corrections applied at the parameter level by providing 1-4-specific σ- and ε-parameters in the parameter file (see PARAMETERS).
FUDGE_EL_14
Similar to FUDGE_ST_14, this keyword specifies a scale factor for 1-4 interactions. Here, the provided value will be applied specifically to electrostatic interactions (see SC_POLAR) only. If ELECMODEL is set to 2, any charge group interaction will be scaled as a whole by this factor as soon as any of the possible atom pairs fulfills the 1-4 criterion (see MODE_14).
MODE_14
This keyword's relevance is limited to the case in which INTERMODEL is 1. It essentially defines what a 1-4 interaction is, specifically whether anything separated by exactly three bonds or by exactly one relevant rotatable bond should be considered 1-4:
 Only two interacting atoms separated by exactly three bonds are treated as 1-4.
 Two interacting atoms separated by exactly one relevant, freely rotatable bond are always treated as 1-4.
Take a phenylalanine residue and consider the CA-CB-CG-CD1 stretch (from C_{α} to one of the C_{δ}). This is exactly three bonds, and the bond CB-CG is the only relevant rotatable one (CA-CB is also rotatable but irrelevant, since CA lies on the axis, while CG-CD1 is not rotatable). CA and CD1 are treated as 1-4 in both modes. Now consider the CA-CB-CG-CD1-CE1 stretch. These are four bonds, and CA and CE1 are not considered 1-4 in mode 1. However, there is still only one relevant rotatable bond in between (CB-CG, since CG-CD1 is rigid), so CA and CE1 are in fact treated as 1-4 in mode 2.
Note that CAMPARI allows specific modifications of 1-4 interactions, either through the use of fudge factors (see FUDGE_ST_14 and FUDGE_EL_14) or through specific parameters provided in the parameter file. If neither of those indicates a deviation from normal interaction rules, then this keyword becomes irrelevant as well.
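The bond-counting criterion of mode 1 can be illustrated with a minimal breadth-first search over the phenylalanine fragment from the example above (a sketch, not CAMPARI code):

```python
from collections import deque

# Minimal sketch (not CAMPARI code) of counting the number of bonds
# separating two atoms, as used by MODE_14 option 1 (exactly three
# bonds -> 1-4 pair). Bond graph for part of a phenylalanine side chain.
bonds = {
    "CA": ["CB"], "CB": ["CA", "CG"], "CG": ["CB", "CD1"],
    "CD1": ["CG", "CE1"], "CE1": ["CD1"],
}

def bond_separation(a, b):
    """Breadth-first search for the minimum bond count between a and b."""
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        atom, dist = queue.popleft()
        if atom == b:
            return dist
        for nb in bonds[atom]:
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return None

assert bond_separation("CA", "CD1") == 3  # 1-4 pair in mode 1
assert bond_separation("CA", "CE1") == 4  # not a 1-4 pair in mode 1
```

Mode 2 would instead count relevant rotatable bonds along the same paths, which is why CA-CE1 still qualifies as 1-4 there.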
ELECREPORT
If the polar potential is in use, this simple logical allows the user to request a summary of the close-range electrostatic interactions in the system. Similarly to INTERREPORT, this keyword mostly serves debugging purposes and should only be needed to understand the details of the short-range interaction setup.
SC_IMPSOLV
This keyword serves two functions. First, as a logical, it enables the ABSINTH implicit solvent model, i.e., it will compute the direct mean-field interaction (DMFI) of each solute with the continuum and enable screening of polar interactions (if turned on → SC_POLAR). For the former (the DMFI) it simultaneously serves as the linear scaling factor. Note that the amount of screening of polar interactions does not depend on this keyword and is solely determined by other parameters (in particular IMPDIEL). The DMFI is defined as:
E_{DMFI} = c_{DMFI}·Σ_{k} ΔG_{FES,k} [Σ_{i} ζ_{i}^{k}·υ_{i}^{k}]
Here, υ_{i}^{k} is the solvation state of the i^{th} atom in the k^{th} solvation group and ζ_{i}^{k} is its weight factor. The solvation states are computed by CAMPARI and vary throughout the simulation, whereas the weight factors are constant. The reference free energies of solvation for each solvation group (ΔG_{FES,k}) are provided through the parameter file and are constant as well (for the latter see PARAMETERS). Note that the computation of the DMFI given the υ_{i}^{k} comes at negligible cost and that CAMPARI obtains the υ_{i}^{k} while computing short-range nonbonded interactions at a moderate additional cost. This implies that the ABSINTH implicit solvation model is speed-limited almost exclusively by the complications incurred by the screening of polar interactions. The user is referred to Vitalis and Pappu for further details (reference).
To employ the ABSINTH implicit solvent model as published use:
FMCSC_FOSFUNC 1
FMCSC_FOSMID 0.1
FMCSC_FOSTAU 0.25
FMCSC_SCRFUNC 1
FMCSC_SCRMID 0.9
FMCSC_SCRTAU 0.5
FMCSC_SAVMODE 1
FMCSC_SAVPROBE 2.5
FMCSC_IMPDIEL 78.2
FMCSC_SCRMODEL 2 # (or 1)
Note that as of 2013 the more rigorous screening model (option 1) appears in published literature only for the work on arginine-rich peptides (Mao et al.). Many other screening models are fully implemented but without any published data available (as of 04/2016). Similarly, it is possible to switch the functional forms for mapping from solvent-accessible volume fractions to solvation states using keywords FOSFUNC and SCRFUNC and to change the way overlap volumes are computed (→ SAVMODE). Finally, note that the DMFI can be made temperature-dependent by additions to the parameter file and use of keyword FOSMODE.
SAVPATCHFILE
This keyword can be used to provide the location and name of an input file that allows overriding the default, topology-derived values for the maximum fractions of the solvent-accessible volume, η_{i,max}. Because these values depend on hardcoded parameters (geometry) and user-level settings (choice of parameters and keyword FMCSC_SAVPROBE), CAMPARI (re)computes them at the beginning of each run. This utilizes the default local geometries (not input structures) and works by decomposing the molecule into suitably small model compound units. The patch prints a summary of all successful changes, and results can also be assessed via column 4 in output file SAV_BY_ATOM.dat. Note that these values rely on other patchable quantities, most notably atomic radii. Patches follow a hierarchy: a patched value for η_{i,max} overrides values derived from radii that could themselves be patched (here, RPATCHFILE overrides indirect reassignment via LJPATCHFILE) without touching the atomic radii. This means that it is possible for the patched values of η_{i,max} to be grossly inconsistent with the underlying set of radii.
ASRPATCHFILE
This keyword can be used to provide the location and name of an input file that allows overriding the default, topology-derived values for the pairwise reduction factors for atomic volumes used in most computations involving the atomic volume, most prominently the ABSINTH implicit solvation model. Reduction factors are needed because the exclusion volumes of covalently bound atoms overlap. The reduction factors are computed in linear approximation, and, by default, the overlap volume is subtracted evenly from the remaining atomic volume of each partner. These values depend on various parameters (parameters and hardcoded geometry), and CAMPARI (re)computes them at the beginning of each run. The patch prints a summary of all successful changes, and results can also be assessed via column 7 in output file SAV_BY_ATOM.dat. See SAVPATCHFILE for remarks on the hierarchy of patches of atomic parameters.
FOSPATCHFILE
Since there is no external way to control the details of the solvation group assignments relevant to the computation of the DMFI (→ SC_IMPSOLV) through the parameter file, CAMPARI allows users to alter the default group partitioning and to control reference free energies of solvation on a per-moiety basis through a dedicated input file. This also supports alterations to transfer enthalpies and heat capacities at the patch level if a temperature-dependent DMFI is in use. This keyword is used to provide the location and name of this input file. There are some underlying restrictions to the freedom of choice, but in principle it is possible to completely redesign the underlying DMFI model using this facility. Restrictions and formatting are explained elsewhere. The applied patch implies that CAMPARI will keep the built-in default partitioning along with the default reference values from the parameter file (see elsewhere) for unpatched residues and molecules. As with other force field patches, these corrections are error-prone, and CAMPARI output should always be double-checked against the intended input. For this purpose, keyword FOSREPORT and the associated output file FOS_GROUPS.vmd will be of particular use.
FOSMODE
Simulation temperature is used frequently in biomolecular sampling both to explicitly probe temperature-dependent behavior and to enhance sampling. For the former, the correctness of fixed force field parameters becomes questionable. If the DMFI of the ABSINTH implicit solvation model is in use, this keyword allows the user to make some of the parameters of the model temperature-dependent themselves. There are currently two options:
 All values for ΔG_{FES} in the equation above are fixed to the reference values specified in the parameter file, independent of temperature or any other environmental parameters. This is the default.
 CAMPARI tries to extract values for temperature-independent enthalpies and heat capacities of the transfer process of a given model compound from a fixed conformation in the gas phase into water from the parameter file. By default, CAMPARI parameter files do not contain these parameters. The temperature-dependent values are computed as:
ΔG_{FES}(T) = (ΔG_{FES}(T_{0}) - ΔH_{FES})·T/T_{0} + ΔH_{FES} + ΔC_{p,FES}·[T·(1 - ln(T/T_{0})) - T_{0}]
Here, ΔH_{FES} and ΔC_{p,FES} are the aforementioned enthalpies and heat capacities of transfer, whereas T denotes the simulation temperature and T_{0} denotes the reference temperature for the listed free energy value. T_{0} is set by keyword FOSREFT.
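The temperature extrapolation above can be written out as a short sketch (names and sample values are illustrative; units follow the parameter file):

```python
import math

# Sketch of the temperature dependence above (FOSMODE option 2),
# assuming temperature-independent transfer enthalpy dh and heat
# capacity dcp. dg_ref is the reference free energy at T0 (FOSREFT).
def dg_fes(T, dg_ref, dh, dcp, T0=298.0):
    """Gibbs-Helmholtz-style extrapolation of the reference free energy."""
    return (dg_ref - dh) * T / T0 + dh + dcp * (T * (1.0 - math.log(T / T0)) - T0)

# At the reference temperature, the reference value is recovered exactly.
assert abs(dg_fes(298.0, -5.0, -10.0, 0.05) - (-5.0)) < 1e-9
```

Note that with dcp = 0 the expression reduces to a linear interpolation in T between the enthalpic and reference contributions, which is the standard constant-ΔH limit.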
FOSREFT
If the DMFI of the ABSINTH implicit solvation model is in use, and if a temperature-dependent model has been requested, this keyword sets the assumed reference temperature for the transfer free energies of solvation listed in the corresponding section of the parameter file. It defaults to 298 K.
FOSREPORT
This simple logical allows the user to request CAMPARI to print a summary of the group-based reference free energies, enthalpies, and heat capacities of solvation read from the parameter file. The latter two terms are only relevant if a temperature-dependent model has been selected. In general, the reference free energies will correspond exactly to the terms ΔG_{FES,k} above. Note, however, that this initial output is not a summary of the system but rather of the parameters, i.e., it is more like VDWREPORT and unlike ELECREPORT or INTERREPORT. If some solvation group assignments and parameters are changed via a corresponding patch file, this keyword will also ensure that the applied patch is documented in detail in CAMPARI's log output. The actual group partitioning for the system at hand (but not the associated numerical parameters) is available from output file FOS_GROUPS.vmd.
SAVPROBE
This keyword is crucial for the ABSINTH implicit solvent model and specifies the size of the solvation shell around individual atoms. The input value is interpreted as the radius in Å of a solvent sphere rolled around each atom, and consequently twice the value of SAVPROBE will yield the thickness of the assumed first solvation layer. The resultant solvation shell volume is the starting point for determining solvent-accessible volume fractions (η_{i}), which are then mapped to yield atomic solvation states (υ_{i}) relevant for the DMFI and screened electrostatic interactions (→ SCRMODEL). In order to compute solvent-accessible volumes, overlap volumes of spheres need to be calculated or estimated, and how to do that is controlled by keyword SAVMODE. It is important to note that SAVPROBE is the only keyword other than SAVMODE directly controlling the η_{i}, which are otherwise purely functions of atomic parameters (see PARAMETERS). Lastly, note that this keyword remains relevant for SAV analysis even if the implicit solvent model is not used (→ SAVCALC).
SAVMODE
This keyword controls how CAMPARI calculates solvent-accessible volumes. The size of the solvation shell is defined by the atomic radius and the setting for SAVPROBE. There are currently two options:
 Linear approximations are used to calculate pairwise overlap volumes. Individual atomic volumes are scaled by reduction factors given by molecular topology.
 Pairwise overlap volumes are calculated exactly (polynomial equation). Individual atomic volumes are scaled by reduction factors given by molecular topology.
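For orientation, the solvation shell volume of an isolated atom, before any overlap reduction of the kind controlled by SAVMODE, follows directly from the atomic radius and the probe radius set by SAVPROBE. The helper below is purely illustrative:

```python
import math

# Illustrative only: solvation shell volume of an isolated atom of
# radius r_atom, with shell thickness 2*r_probe (twice SAVPROBE).
def shell_volume(r_atom, r_probe=2.5):
    r_out = r_atom + 2.0 * r_probe
    return 4.0 / 3.0 * math.pi * (r_out**3 - r_atom**3)

# The solvent-accessible volume fraction eta_i is then the part of
# this shell volume not excluded by overlaps with neighboring atoms.
assert shell_volume(1.5) > 0.0
```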
FOSFUNC
If the DMFI of the ABSINTH implicit solvation model is in use, this keyword controls which functional form is used to map the solvent-accessible volume to the DMFI solvation state:
 The published smoothed and stretched sigmoidal function is used, which relies on 2 parameters, viz., χ_{f} and τ_{f}. The functional form is:
υ_{i,f} ~ [1 + exp(-(η_{i} - h(χ_{f}))/τ_{f})]^{-1}
Here, υ_{i,f} is the DMFI solvation state for atom i, η_{i} is the solvent-accessible volume fraction for atom i, and h(...) is a linear function shifting the midpoint parameter χ_{f} (set by FOSMID) such that symmetry between the two natural limits of η_{i} is obtained. The normalizer is not shown in the equation. The function is smooth over the interval over which it applies.
 A stair-stepped, stretched sigmoidal function is used, which relies on 5 parameters, viz., χ_{f}, τ_{f}, g_{f}, ζ_{f}, and FOSSHIFT. This is a piecewise-defined function. The width in solvent-accessible volume space is set directly by g_{f}, starting from the lower natural limit. Within each piece, the fractional interval g_{f}ζ_{f} is flat, with values set by functional form 1 above (thus relying on χ_{f} and τ_{f}). If we term two neighboring plateau values υ_{1} and υ_{2}, then the functional form for the interval of width g_{f}(1-ζ_{f}) is:
υ_{i,f} = 0.5·(υ_{2} - υ_{1})·(1 - cos((η_{i} - η_{1})·π·(1-ζ_{f})^{-1}/Δη))
Here, η_{1} corresponds to the left boundary of the interval in question, and Δη is the equivalent of g_{f} in solvent-accessible volume fraction space. The position of the interpolation interval within the total interval of width Δη is defined by keyword FOSSHIFT. Note that functional form 1 is theoretically recovered as g_{f} approaches 0.0. Note also that FOSSHIFT becomes irrelevant as ζ_{f} approaches 0.0 and that the limit of ζ_{f} reaching 1 (a true step function) is numerically forbidden.
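The cosine interpolation between two neighboring plateaus can be sketched as follows; note that this simplified rendering writes out an explicit υ_{1} offset so the endpoints match the plateau values, and parameterizes the position inside the transition window by a fraction between 0 and 1:

```python
import math

# Simplified sketch of the cosine-based step interpolation between two
# plateau values v1 and v2 (FOSFUNC option 2). `frac` is the position
# within the transition window, running from 0 (left edge) to 1 (right).
def step_interp(v1, v2, frac):
    return v1 + 0.5 * (v2 - v1) * (1.0 - math.cos(frac * math.pi))

assert abs(step_interp(0.2, 0.4, 0.0) - 0.2) < 1e-12  # joins left plateau
assert abs(step_interp(0.2, 0.4, 1.0) - 0.4) < 1e-12  # joins right plateau
```

Because the cosine has zero slope at both endpoints, the piecewise function remains smooth where the transition meets the plateaus, as stated above.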
FOSTAU
The atomic solvent-accessible volume fractions, η_{i}, are mapped to solvation states by two different sets of parameters, the first being responsible for obtaining the υ_{i,f}, which are the solvation states describing the change in DMFI with changes in conformation (the second set is responsible for obtaining the υ_{i,s}, which describe the change in dielectric response with changes in conformation). The details of the mapping function are complicated by the requirement to normalize the υ_{i,f} to the well-defined interval [0:1], but in essence it holds:
υ_{i,f} ~ [1 + exp(-(η_{i} - h(χ_{f}))/τ_{f})]^{-1}
Here, τ_{f} is the parameter determining the steepness of the sigmoidal interpolation, and this is the parameter set by this keyword. Large values will yield an approximately linear remapping between the natural limits of η_{i}, which are derived from closest packing of spheres (lower limit) and model compound topology (upper limit). This case is not obvious from the above equation but is obtained via τ_{f}-dependent rescaling to match the target interval. Conversely, very small values yield a step-function-like interpolation. h(x) is a linear function shifting the midpoint parameter χ_{f} (set by FOSMID) such that symmetry between the two natural limits of η_{i} is obtained.
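A simplified sketch of the sigmoidal mapping and its rescaling to the natural limits of η_{i} follows; the true h(...) and normalizer in CAMPARI are more involved, and the limits used here are placeholders only:

```python
import math

# Illustrative sketch only: the raw sigmoid is rescaled so that the
# (placeholder) natural limits eta_lo and eta_hi map exactly to 0 and 1.
# chi plays the role of the (unshifted) midpoint, tau the steepness.
def solvation_state(eta, chi, tau, eta_lo=0.26, eta_hi=1.0):
    def raw(x):
        return 1.0 / (1.0 + math.exp(-(x - chi) / tau))
    lo, hi = raw(eta_lo), raw(eta_hi)
    return (raw(eta) - lo) / (hi - lo)

# The natural limits map to the boundaries of the target interval [0:1].
assert abs(solvation_state(0.26, 0.5, 0.25)) < 1e-12
assert abs(solvation_state(1.0, 0.5, 0.25) - 1.0) < 1e-12
```

Increasing tau flattens the raw sigmoid, and after the rescaling the mapping approaches a linear interpolation between the limits, consistent with the description above.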
FOSMID
As explained for FOSTAU, the mapping from solvent-accessible volume fractions η_{i} to solvation states υ_{i,f} relies on a midpoint parameter, χ_{f}. In the functional form given above, the midpoint of the sigmoidal function (i.e., the point of maximal slope) can be shifted toward either one of the natural limits of η_{i} by varying this keyword between zero and unity. Since the sigmoidal nature of the interpolation disappears in the limit of large values chosen for FOSTAU, FOSMID is only relevant for sufficiently small values of FOSTAU, and its impact deteriorates progressively with growing FOSTAU. Note that the default is 0.5 but that it is easily possible to generate fairly asymmetric interpolation functions (i.e., at values close to zero atoms are considered solvated at almost all times, while at values close to unity the opposite is true). There is a Matlab script in the tools directory (sigmainterpol.m) that helps assess the effect FOSTAU and FOSMID have given values for the natural limits of η_{i}.
FOSGRANULE
As explained above, this keyword applies only to the case of a stair-stepped interpolation function from solvent-accessible volume fraction to DMFI solvation state. It sets the volume increment to assume for each step (in Å^{3}). The solvation shell volume of each atom is then discretized by this increment, and a step-like function is applied to each resulting interval in η-space. The default is set to the volume available to a single water molecule when assuming liquid water with a density of 1 g/cm^{3}.
FOSTIGHT
As explained above, this keyword applies only to the case of a stair-stepped interpolation function from solvent-accessible volume fraction to DMFI solvation state. It sets the narrowness of the cosine-based step interpolation within each interval defined by FOSGRANULE. A value of 0.0 gives a smooth piecewise function without any plateau regions, whereas larger values intersperse plateau regions by making the transition narrower (the function remains smooth, however). The theoretical limit of 1.0 gives a true step function, but this is numerically forbidden.
FOSSHIFT
As explained above, this keyword applies only to the case of a stair-stepped interpolation function from solvent-accessible volume fraction to DMFI solvation state. It sets the position of the step interpolation within an interval defined by FOSGRANULE and FOSTIGHT, and values from 0.0 (left) to 1.0 (right) are possible. The keyword is irrelevant if FOSTIGHT (ζ_{f} above) is zero.
IMPDIEL
This keyword lets the user set the assumed continuum dielectric. Primarily, this is used in the ABSINTH solvation model to treat the screening of electrostatic interactions. The dielectric constant enters the equation for the modified Coulomb sum in different ways depending on the choice for SCRMODEL. In general, the solvent-accessible volume fractions η_{i} will be mapped to yield solvation states υ_{i,s} for dielectric screening. The mapping process is equivalent to the one described for the DMFI but relies on a separate set of parameters (see SCRFUNC). In the published ABSINTH model, the screening factor for the polar interaction is given as:
s_{ij} = [1 - a·υ_{i,s}]·[1 - a·υ_{j,s}]
a = (1 - ε_{r}^{-1/2})
Here, ε_{r} is the relative dielectric constant set by this keyword. The above equation corresponds rigorously only to using screening model 2. Note how the functional form ensures an interpolation between the vacuum (υ_{i,s} = υ_{j,s} = 0.0 → ε_{eff} = 1.0) and the fully screened case (υ_{i,s} = υ_{j,s} = 1.0 → ε_{eff} = ε_{r}).
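The two limiting cases can be verified with a few lines (a direct sketch of the equations above):

```python
# Sketch of the published atom-based screening factor (model 2).
def screening_factor(v_i, v_j, eps_r):
    a = 1.0 - eps_r ** -0.5
    return (1.0 - a * v_i) * (1.0 - a * v_j)

eps_r = 78.2
assert abs(screening_factor(0.0, 0.0, eps_r) - 1.0) < 1e-12          # vacuum limit
assert abs(screening_factor(1.0, 1.0, eps_r) - 1.0 / eps_r) < 1e-12  # fully screened
```

The square-root form of a is what makes the product of the two per-atom factors recover exactly 1/ε_{r} when both atoms are fully solvated.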
In a completely different context, this keyword also sets the assumed continuum dielectric outside the cutoff sphere when treating electrostatic interactions with reaction-field methods (→ LREL_MD). For this latter purpose, it may be advantageous to set a very large value.
SCRMODEL
This keyword has several options which allow the user to control how dielectric screening of charges is done, specifically what functional form is used for the pairwise screening factor s_{ij} for a pair of interacting atoms i and j. The electrostatic framework within ABSINTH aims specifically at ensuring that only moieties with well-defined net charges interact (this is discussed in a different context for ELECMODEL). This means that for every base functional form of s_{ij} there will be two variants, one in which the υ_{i,s} are used directly (atom-based) and one in which a charge group-based υ_{k,s} is precomputed for each group k out of its constituent atoms' solvation states υ_{i,s}^{k}. Only the latter ensures rigorously that two formally neutral charge groups interacting will not create effective charge imbalances by atom-specific screening. The downside of those models (and the reason we generally do not recommend using them) is the higher computational cost associated and the dependence on local neutrality in the partial charge set (i.e., should the base parameters not yield any locally neutral subgroups within a residue, the relevant charge group may be as large as an entire polynucleotide residue, and dielectric responses of fairly distant moieties may become coupled, which suggests a length scale of the solvent response vastly inconsistent with the setting for SAVPROBE). In the latter case, it may be necessary to attempt to patch the charge groups so that an approximate grouping is obtained.
 For every charge group, the solvation states of the individual sites are averaged in charge-weighted fashion (group-based → see above). The resultant group solvation state υ_{k,s} is used to screen all the charges belonging to this group:
s_{ij} = [1 - a·υ_{k,s}]·[1 - a·υ_{l,s}]
a = (1 - ε_{r}^{-1/2})
Here, we assume atom i is part of the k^{th} charge group and atom j is part of the l^{th} charge group. ε_{r} is provided by IMPDIEL.
 This is the published atom-based model and explained above (→ IMPDIEL). The atom-specific screening via atomic solvation states υ_{i,s} will break the neutral paradigm somewhat but localizes and strengthens specific interactions.
 Since electrostatic interactions tend to be somewhat weak with the aforementioned options, this model extends the default model (1) by an important change. If the distance between atoms i and j, r_{ij}, approaches the length scale of the first solvation shell, the dielectric is augmented by a distance-dependent contribution intended to strengthen specific interactions. This yields a very complicated (although computationally not much more expensive) model:
s_{ij} = s_{env,ij} if r_{ij} ≥ (r_{0,ij}+d_{W}) or s_{env,ij} > [ε_{c}·r_{0,ij}]^{-1}
s_{ij} = [1 - f_{MIX}·[1 - d_{W}^{-1}·(r_{ij}-r_{0,ij})]]·s_{env,ij} + f_{MIX}·[1 - d_{W}^{-1}·(r_{ij}-r_{0,ij})]·[ε_{c}·r_{0,ij}]^{-1} if r_{ij} < (r_{0,ij}+d_{W}) and r_{ij} > r_{0,ij}
s_{ij} = (1 - f_{MIX})·s_{env,ij} + f_{MIX}·[ε_{c}·r_{0,ij}]^{-1} if r_{ij} ≤ r_{0,ij}
s_{env,ij} = [1 - a·υ_{k,s}]·[1 - a·υ_{l,s}]
a = (1 - ε_{r}^{-1/2})
Here, d_{W} is the thickness of the solvation shell (2·SAVPROBE) and r_{0,ij} is given by the sum of the atomic radii of atoms i and j. f_{MIX} is the weight of the distance-dependent contribution and is set by keyword SCRMIX. ε_{c} is set by CONTACTDIEL (compare model 4). Note that the distance dependence is achieved by the interpolation performed in the distance regime r_{0,ij} < r_{ij} < (r_{0,ij}+d_{W}), but that no explicit distance dependence is introduced otherwise. Furthermore, the contact dielectric ε_{c}·r_{0,ij} is generally overridden if the environmental dielectric s_{env,ij} would lead to a stronger interaction (less screening). Importantly, model 3 operates on the group-consistent solvation states (as model 1 does). The atom-specific modification corresponds to model 9. It should be noted that these models are largely untested and were part of initial calibration studies with the ABSINTH implicit solvent model. They are fully supported by CAMPARI, however.
 This model implements a (more or less) pure distance-dependent dielectric:
s_{ij} = [ ε_{c}·r_{ij} ]^{1} if r_{ij} > r_{0,ij}
s_{ij} = [ ε_{c}·r_{0,ij} ]^{1} else
Here, ε_{c} is the strength of the distance increase of the dielectric constant and r_{0,ij} is the contact distance below which no further distance dependence to s_{ij} is applied. The resultant effective dielectric constant is ε_{c}·r_{0,ij} which should never be less than unity. ε_{c} is set by CONTACTDIEL and r_{0,ij} is defined by the sum of the atomic radii of atoms i and j. This means that the derivative of the potential is discontinuous at the contact point. Note that distancedependent dielectric models break for a variety of limiting cases, in particular for anything involving net charged species. They also rely on a cutoff criterion since they otherwise do not converge upon a meaningful limiting dielectric. In this way, distancedependent dielectrics may be seen as somewhat analogous to reactionfield treatments (see LREL_MD).  This model is a groupbased variant and therefore similar to
option 1). It attempts to take a different route
toward computing an effective dielectric. Whereas models 1, 2, 3, and 9
use an effective charge approach, this model
(just like models 6, 7, and 8) employs an effective dielectric
approach. The former implies that the solvation
state enters the potential energy for Coulombic interactions as υ_{i,s}·υ_{j,s},
i.e.,
E_{POLAR} will scale with changes in the υ_{i,s}
differently than the DMFI.
Consequently, screening model 5 implies:
s_{ij} = M( [1 - a·υ_{k,s}], [1 - a·υ_{l,s}] )
a = (1 - ε_{r}^{-1})
Here, we assume atom i is part of the k^{th} charge group and atom j is part of the l^{th} charge group, and M is a function corresponding to a generalized mean whose exact form is determined by the choice for ISQM. The latter can give rise to fundamentally different scaling behavior of E_{POLAR} with the υ_{i,s}, illustrated for example by taking the arithmetic mean. This can more closely approximate the behavior seen for the DMFI and may allow using much more similar parameter sets τ_{s} and χ_{s} compared to τ_{f} and χ_{f} than is the case with models 1 or 2.
 This model is the atom-based variant of model 5:
s_{ij} = M( [ 1 - a·υ_{i,s}], [ 1 - a·υ_{j,s}] )
a = (1 - ε_{r}^{-1})
 This model modifies model 5 in the same way that model 3 modifies model 1.
 This model modifies model 6 in the same way that model 3 modifies model 1.
 This model modifies model 2 in the same way that model 3 modifies model 1.
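The distance-dependent splice shared by models 3, 7, 8, and 9 can be illustrated with a short sketch. The snippet below is not CAMPARI code but a minimal, hypothetical Python transcription of the equations above for a single atom pair; the environmental factor s_env is assumed to be precomputed:

```python
def screening_model3(r_ij, r0_ij, s_env, eps_c, f_mix, d_w):
    """Sketch of the spliced screening factor s_ij (models 3/7/8/9):
    blend the environmental factor s_env with the contact term
    1/(eps_c*r0_ij) over the solvation shell of thickness d_w."""
    s_contact = 1.0 / (eps_c * r0_ij)
    # override: keep the environmental factor if it already screens less
    if r_ij >= r0_ij + d_w or s_env > s_contact:
        return s_env
    # weight of the contact term: f_mix at contact, 0 at r0_ij + d_w
    w = f_mix * (1.0 - max(r_ij - r0_ij, 0.0) / d_w)
    return (1.0 - w) * s_env + w * s_contact
```

Setting f_mix close to 0 recovers the base model (the factor is always s_env), while f_mix close to 1 lets the contact dielectric dominate near r0_ij, consistent with the role of SCRMIX described below.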
CONTACTDIEL
For certain screening models (SCRMODEL = 3, 4, 7, 8, or 9), the effective dielectric at an interatomic distance exactly matching the sum of the two atomic radii is postulated to take the limiting value ε_{c}·r_{0,ij} (see equations above). This keyword provides the value for the parameter ε_{c}.
SCRFUNC
If the ABSINTH implicit solvation model is in use and Coulombic interactions are enabled, this keyword controls which functional form is used to map the solvent-accessible volume to the solvation state for charge screening (υ_{i,s} for atom i above). As explained before (see IMPDIEL), the ABSINTH implicit solvent model employs two sets of solvation states, υ_{i,f} and υ_{i,s}. The υ_{i,s} determine the effective dielectric acting between polar atoms (see equations above). The text below is basically the same as that for keyword FOSFUNC.
 The published smoothed and stretched sigmoidal function is used, which relies on 2 parameters,
viz., χ_{s} and τ_{s}.
The functional form is:
υ_{i,s} ~ [ 1.0 + exp( -(η_{i} - h(χ_{s}))/τ_{s} ) ]^{-1}
Here, υ_{i,s} is the charge screening solvation state for atom i, η_{i} is the solvent-accessible volume fraction for atom i, and h(...) is a linear function shifting the midpoint parameter χ_{s} (set by SCRMID) such that symmetry between the two natural limits of η_{i} is obtained. The normalizer is not shown in the equation. The function is smooth over the interval over which it applies.
 A stair-stepped, stretched sigmoidal function is used, which relies on 5 parameters, viz., χ_{s}, τ_{s}, g_{s}, ζ_{s}, and SCRSHIFT. This is a piecewise-defined function. The width of each piece in solvent-accessible volume space is set directly by g_{s}, starting from the lower natural limit. Within each piece, the fractional interval g_{s}·ζ_{s} is flat with values set by functional form 1 above (thus relying on χ_{s} and τ_{s}). If we term two neighboring plateau values υ_{1} and υ_{2}, then the functional form for the interval of width g_{s}·(1-ζ_{s}) is:
υ_{i,s} = υ_{1} + 0.5·(υ_{2} - υ_{1})·(1 - cos( (η_{i} - η_{1})·π·(1-ζ_{s})^{-1}/Δη ))
Here, η_{1} corresponds to the left boundary of the interval in question, and Δη is the equivalent of g_{s} in solventaccessible volume fraction space. The position of the interpolation interval within the total interval of width Δη is defined by keyword SCRSHIFT. Note that functional form 1 is theoretically recovered if g_{s} approaches 0.0. Note also that SCRSHIFT becomes irrelevant as ζ_{s} approaches 0.0 and that the limit of ζ_{s} reaching 1 (true step function) is numerically forbidden.
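For orientation, functional form 1 can be sketched in a few lines of Python. This is a hypothetical, simplified transcription, not CAMPARI code: the symmetry shift h(χ_{s}) and the normalizer mentioned above are omitted, so the numbers are only qualitatively representative:

```python
import math

def solvation_state(eta, chi_s, tau_s):
    """Simplified sketch of functional form 1: a stretched sigmoid
    mapping the solvent-accessible volume fraction eta to the charge
    screening solvation state (shift h() and normalizer omitted)."""
    return 1.0 / (1.0 + math.exp(-(eta - chi_s) / tau_s))
```

The state is 0.5 at the midpoint χ_{s} and approaches its two natural limits for fully buried (η → 0) and fully exposed (η → 1) atoms; τ_{s} controls how abrupt the transition is.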
SCRTAU
This is the specification analogous to FOSTAU for the charge screening solvation state and provides τ_{s} rather than τ_{f}.
SCRMID
This is the specification analogous to FOSMID for the charge screening solvation state and provides χ_{s} rather than χ_{f}.
SCRGRANULE
This is the specification analogous to FOSGRANULE for the charge screening solvation state and provides g_{s} rather than g_{f}.
SCRTIGHT
This is the specification analogous to FOSTIGHT for the charge screening solvation state and provides ζ_{s} rather than ζ_{f}.
SCRSHIFT
This is the specification analogous to FOSSHIFT for the charge screening solvation state.
SCRMIX
Several of the screening models (choice of 3, 7, 8, or 9 for SCRMODEL) splice a distance-dependent term into the environmental charge screening over a well-defined length scale. The impact of this contribution is set by this keyword, which corresponds to the parameter f_{MIX} in the equations above. If set to values close to zero, the model approaches its unmodified base model, e.g., model 3 essentially converges to model 1. Conversely, a value close to 1.0 would yield maximum impact and let, for example, model 3 approximate model 4 for distances close to the contact distance r_{0,ij}. The choice here is naturally tightly coupled to that for CONTACTDIEL.
ISQM
In those screening models postulating an effective dielectric rather than effective charges, the generalized mean function M(x,y) was introduced (see equations above). The specification here defines the order m for the generalized mean. It can be an integer from -10 to 10, but large absolute values slow down the computation drastically and are not recommended:
M(x,y) = [ 0.5·( x^{m} + y^{m} ) ]^{1/m} if m ≠ 0
With the limiting case of:
M(x,y) = (x·y)^{1/2} if m = 0
Common cases aside from the geometric mean (m=0) are the arithmetic (m=1) or the harmonic (m=-1) mean. Any m>1 will favor large values in an asymmetric pair, i.e., let both participating atoms appear desolvated, leading to stronger interactions, while any m<1 will favor small values in an asymmetric pair, i.e., let both participating atoms appear solvated, weakening such interactions (it is the derived screening factors and not the solvation states that enter the mean). The former scenario (m>1) would rarely seem desirable, as it means that, for instance in solutions of small, polar molecules, the cooperativity for converting between fully dissociated and fully associated states becomes overly pronounced on account of the positive coupling between adding more and more species to a growing cluster and the enthalpic benefit offered by that process.
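As a quick reference, the generalized mean above can be sketched as follows (a hypothetical helper, not CAMPARI code):

```python
def generalized_mean(x, y, m):
    """Power mean M(x, y) of integer order m, with the geometric
    mean recovered as the m = 0 limiting case."""
    if m == 0:
        return (x * y) ** 0.5
    return (0.5 * (x ** m + y ** m)) ** (1.0 / m)
```

For an asymmetric pair such as (0.1, 0.9), the arithmetic mean (m=1) yields 0.5, the geometric mean (m=0) 0.3, and the harmonic mean (m=-1) 0.18, illustrating how decreasing m increasingly favors the smaller (more screened) value.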
SC_TOR
This keyword specifies the linear scaling factor controlling the "outside" scaling of torsional bias terms, V_{TOR}. Such a potential allows the user to either harmonically restrain virtually all freely rotatable dihedral angles to specific target values or to softly bias them toward such target values. The setup for these is handled through an input file (details of the format are described elsewhere). Note that a particularly useful application of V_{TOR} is to apply torsional restraints according to structural input, which is useful for equilibrating molecules meant to remain in a specific, internal arrangement.
TORFILE
This keyword specifies the location and name (absolute paths preferable) of the input file for individual backbone torsional bias potentials, V_{TOR} (see elsewhere for description).
TORREPORT
This is a simple logical allowing the user to instruct CAMPARI to write out a complete summary of the torsional bias terms contributing to V_{TOR} (naturally parsed by residue) in the system. In addition to the annotated log output, this will also create the output file SAMPLE_TORFILE.dat, which is a rewriting of the current input specifications to a fully explicit and residue-based version. This is useful primarily in preserving input to the definition of torsional bias potentials that comes from structural input. It is recommended to utilize this option.
SC_ZSEC
This keyword gives the linear scaling factor for a global secondary structure bias term. For values larger than zero, a harmonic bias is applied on two order parameters, f_{α} and f_{β}, which measure the secondary structure content of the chain. f_{α} and f_{β} are calculated as the sequence-averaged (excluding termini) values of a mapping function defined for each residue:
z_{α} = exp( -τ_{α}·(d_{α} - r_{α})^{2} ) if d_{α} > r_{α}
z_{α} = 1.0 else
The radius of the (spherical) α-region, r_{α}, is provided by ZS_RAD_A and its center φ/ψ position by keyword ZS_POS_A. The distance d_{α} is taken from the center of the circle and corrected for periodic wraparounds in φ/ψ space. z_{β} is defined analogously. This function represents a smooth "top hat" function which is continuous and differentiable. By tuning the parameters τ_{α/β} through keywords ZS_STP_A and ZS_STP_B, the Gaussian decay beyond the limits of the spherical plateau region can be turned from very shallow to step function-like. The default definitions (all of which can be overridden) are:
α
Center: φ/ψ = (-60.0,-50.0)°; r_{α} = 35.0°; 1.0/τ_{α}^{1/2} ≅ 22.36°
β
Center: φ/ψ = (-155.0,160.0)°; r_{β} = 35.0°; 1.0/τ_{β}^{1/2} ≅ 22.36°
The global values (if there are multiple polypeptide chains in the system, the average is over all of them) are then restrained:
V_{ZSEC} = c_{ZSEC}·( k_{α}·(f_{α} - f_{α}^{0})^{2} + k_{β}·(f_{β} - f_{β}^{0})^{2} )
Here, c_{ZSEC} is the linear scaling factor specified by this keyword. The other parameters are explained below. Note that it may not be a good idea to use such a residue-based restraint potential for very short sequences. Here, the net content idea breaks down and (for typical choices of τ_{α/β}) the chain will have access only to values in the vicinity of those given by a discrete residue content. This may lead to specific sampling of the ring regions around the plateaus to satisfy intermediate target values, which runs counter to the intent of the potential.
When CAMPARI's shared memory (OpenMP) parallelization is in use, the calculation of V_{ZSEC} is currently executed by a single thread, possibly in concurrence with another thread addressing the complementary DSSP term but not with anything else. This is a scaling limitation, and a corresponding warning is produced.
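The per-residue mapping function can be sketched with the default α-basin parameters quoted above. This is a hypothetical helper, not CAMPARI code, and the periodic wraparound of the φ/ψ distance is left out for brevity:

```python
import math

def z_alpha(d_alpha, r_alpha=35.0, tau_alpha=1.0 / 22.36**2):
    """Smooth "top hat": flat (1.0) inside the basin radius, Gaussian
    decay beyond it (all angles and distances in degrees)."""
    if d_alpha < r_alpha:
        return 1.0
    return math.exp(-tau_alpha * (d_alpha - r_alpha) ** 2)
```

Averaging such values over all residues of the chain yields f_{α}, which is then restrained harmonically to f_{α}^{0} as shown in the V_{ZSEC} expression.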
ZS_FR_A
This keyword specifies the target α-content, f_{α}^{0}, for the global secondary structure bias potential (values [0.0:1.0]).
ZS_FR_B
This keyword specifies the target β-content, f_{β}^{0}, for the global secondary structure bias potential (values [0.0:1.0]). Note that the sum of f_{β}^{0} and f_{α}^{0} (see ZS_FR_A) should usually not exceed unity, especially in conjunction with stiff spring constants. Doing so would generate a frustrated system for which results will often be irrelevant.
ZS_FR_KA
Through this keyword, (twice) the spring constant (in kcal/mol) operating on f_{α} is provided (k_{α}) if the global secondary structure bias potential is in use.
ZS_FR_KB
Analogous to ZS_FR_KA, this keyword lets the user specify the spring constant (in kcal/mol) operating on f_{β} (k_{β}) if the global secondary structure bias potential is in use. If both parameters are meant to be restrained, it usually would not seem meaningful to choose very different values for the two spring constants. In doing so, one would essentially create a primary bias (stiffer term) and a secondary bias (softer term) operating "within" the primary restraint.
ZS_POS_A
This is one of the few keywords that requires two floating point numbers as input. It allows the user to override the default location of the α-basin (see SC_ZSEC). The two numbers are interpreted to be the φ- and ψ-values (in degrees) for the center of the (spherical) basin. The setting is relevant for the corresponding restraint potential and the output in ZSEC_HIST.dat, ZAB_2DHIST.dat, and ZBETA_RG.dat.
ZS_POS_B
See ZS_POS_A, only for the β-basin.
ZS_RAD_A
This keyword requires one floating point number to be specified. It allows overriding the default radius of the α-basin (see SC_ZSEC) and is assumed to be given in degrees. The setting is relevant for the corresponding restraint potential and the output in ZSEC_HIST.dat, ZAB_2DHIST.dat, and ZBETA_RG.dat.
ZS_RAD_B
See ZS_RAD_A, only for the β-basin.
ZS_STP_A
This keyword requires one floating point number. It allows overriding the default steepness of the decay (τ_{α}) of the order parameter value beyond the spherical plateau region defining the α-basin (see SC_ZSEC). It is assumed to be provided in inverse degrees squared. The setting is relevant for the corresponding restraint potential and the output in ZSEC_HIST.dat, ZAB_2DHIST.dat, and ZBETA_RG.dat.
ZS_STP_B
See ZS_STP_A, only for the β-basin.
SC_DSSP
This keyword provides the outside scaling factor, c_{DSSP}, on a biasing potential acting on order parameters derived from the secondary structure annotation of polypeptides in the simulation system using the DSSP algorithm. In essence, this allows the user to bias the system to populate more and stronger hydrogen bonds characteristic for either α-helices (H) or β-sheets, whether parallel, antiparallel, multi-pleated, or hairpins (E). Since secondary structure annotation is essentially a discretized, on/off variable, it may seem surprising that a restraint potential can be applied in meaningful fashion:
V_{DSSP} = c_{DSSP}·( k_{H}·(f_{H} - f_{H}^{0})^{2} + k_{E}·(f_{E} - f_{E}^{0})^{2} )
Here, k_{H} and k_{E} are (twice) the spring constants for the harmonic restraints applied to the secondary structure scores, f_{H} and f_{E}. The spring constants are set by keywords DSSP_HSC_K and DSSP_ESC_K for the H-score and E-score, respectively. f_{H} and f_{E} are identical to the H-score and E-score defined below and rely on the same base parameters (→ DSSP_MODE). Essentially, they correspond to a multiplicative function of the assignment and the quality of the hydrogen bonds giving rise to the assignment. They can, depending on system and DSSP settings, be continuous and approximately smooth order parameters over a large part of the accessible regime. The target values f_{H}^{0} and f_{E}^{0} are set via keywords DSSP_HSC and DSSP_ESC. There are a few noteworthy peculiarities which the user should keep in mind:
 DSSP E-assignments can rely on both intra- and intermolecular hydrogen bonds, rendering the DSSP term a true system-wide potential. Currently, CAMPARI only allows restraining global E- and H-scores, which may make calculations with multiple polypeptides more difficult to interpret.
 In the limit of no hydrogen bonds, the order parameters will always be discontinuous since the discrete assignment score has to be nonzero for the quality score to matter.
 Due to the potential discontinuities, dynamics calculations utilizing the DSSP biasing potential may suffer from substantial noise, in particular for stiff restraints and small systems.
 Again, due to the functional form, there is no direct driving force to form new hydrogen bonds of the right type. The potential relies on random encounters and the cooperativity of secondary structure elements.
 Lastly, in case some proper hydrogen bonds are formed, the resultant energy landscape is often very rugged, and sampling may be severely hampered by the presence of the restraints. It is therefore advisable, at the very least, to perform multiple independent simulations when using DSSP restraints.
When CAMPARI's shared memory (OpenMP) parallelization is in use, the DSSP restraint potential is currently calculated by a single thread, possibly in concurrence with another thread addressing the complementary ZSEC term but not with anything else. This is a scaling limitation, and a corresponding warning is produced.
DSSP_HSC
In case DSSP restraints are used (→ SC_DSSP), this keyword allows the user to set the target H-score (α-content, f_{H}^{0} above). Its value is limited to the interval from zero to unity. A large value will steer the system toward forming many i→i+4 hydrogen bonds.
DSSP_ESC
In case DSSP restraints are used (→ SC_DSSP), this keyword lets the user set the target E-score (β-content, f_{E}^{0} above). Just like for DSSP_HSC, values are restricted to the interval [0.0:1.0]. A large value will bias the system toward forming characteristic β-hydrogen bonds but does not distinguish between parallel or antiparallel arrangements. Note that the sum of DSSP_HSC and DSSP_ESC should probably never approach unity. Also note that the E-score can never be exactly unity for a monomeric polypeptide of finite length even when discarding termini (turn requirement).
DSSP_HSC_K
If DSSP restraints are in use (→ SC_DSSP), this keyword sets (twice) the spring constant (in kcal/mol) operating on the DSSP H-score, i.e., it sets the value of k_{H} above.
DSSP_ESC_K
If DSSP restraints are in use (→ SC_DSSP), this keyword sets (twice) the spring constant (in kcal/mol) operating on the DSSP E-score, i.e., it sets the value of k_{E} above.
SC_POLY
In studies of generic polymers, coarse descriptors like size and shape of the macromolecule may be more relevant than structural characteristics tailored specifically to polypeptides. CAMPARI supports restraint potentials on such coarse descriptors, specifically the parameters t and δ (see description of output file POLYAVG.dat), which measure size and shape asymmetry, respectively. Two-dimensional histograms of these quantities can be computed and written by CAMPARI (see output file RDHIST.dat). These molecule-based restraint potentials yield a bias term to the total potential energy, V_{POLY}, and this keyword provides its "outside" scaling factor c_{POLY}. Note that with the exception of the scaling factor, requests are generally handled through a dedicated input file (see elsewhere for details). As mentioned above, when CAMPARI's shared memory (OpenMP) parallelization is in use, all threads contribute to calculating V_{POLY} synchronously.
POLYFILE
This keyword should point to the location of the input file for individual molecular polymeric biasing potentials (→ elsewhere for description).
POLYREPORT
Like other report flags, this keyword is a simple logical which allows the user to obtain a complete summary of the polymeric bias terms (by molecule) in the system. It is only meaningful if polymeric biasing terms are in use (→ SC_POLY).
SC_TABUL
CAMPARI has an extensive facility to supply tabulated nonbonded potentials which are then applied to the system. This keyword specifies the "outside" linear scaling factor c_{TABUL} according to:
E_{TABUL} = c_{TABUL}·Σ_{i,j} I(V_{ij}^{k},V_{ij}^{k+1},m_{ij}^{k},m_{ij}^{k+1},d_{ij})
Here, the sum runs over all atom pairs i,j which have a tabulated potential specified for them, V_{ij}^{k} is the k^{th} tabulated value of the acting potential and d_{ij} is the interatomic distance. d_{ij} is located uniquely within the interval given by the k^{th} and k+1^{th} tabulated value. I(...) is the interpolation function, and CAMPARI currently performs only cubic interpolation with cubic Hermite splines:
I(V_{ij}^{k},V_{ij}^{k+1},m_{ij}^{k},m_{ij}^{k+1},d_{ij}) = (2t^{3} - 3t^{2} + 1)·V_{ij}^{k} + (3t^{2} - 2t^{3})·V_{ij}^{k+1} + (d_{k+1} - d_{k})·[ (t^{3} - 2t^{2} + t)·m_{ij}^{k} + (t^{3} - t^{2})·m_{ij}^{k+1} ]
t = (d_{ij} - d_{k})/(d_{k+1} - d_{k})
Here, t is the relative position in the interval from k to k+1 normalized to unit length. The m_{ij}^{k} are the tangents to (slopes at) the control points (tabulated values) of the potentials. The spline is set up to recover both values and tangents at the control points. This means that the resultant function is continuously differentiable regardless of the values used for the tangents. Tangents are either read from file (without error checks → description of dedicated input file) or estimated numerically via finite differences from the potential input (see description of dedicated input file). In the latter case, some options are available to tune the spline (see TABIBIAS and TABITIGHT).
There are a few additional characteristics of the implementation of tabulated potentials in CAMPARI:
 Aside from Coulombic terms, these potentials are the only ones captured by the longer of the nonbonded cutoffs in MC runs (→ ELCUTOFF).
 When used concurrently with other nonbonded potentials, many wasteful distance calculations may be performed. This is because tabulated potentials have to use their own data structure to be able to function efficiently both for cases of universal use and of very sparse use.
 Atom pairs that are in close proximity and are excluded from all other nonbonded potentials are not excluded from tabulated potentials.
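The interpolation function above can be transcribed directly. The following is a hypothetical Python sketch of I(...), not CAMPARI code; by construction it reproduces the tabulated values and tangents at the control points:

```python
def hermite_interp(v_k, v_k1, m_k, m_k1, d_k, d_k1, d_ij):
    """Cubic Hermite interpolation between control points at distances
    d_k and d_k1, with values v_k/v_k1 and tangents m_k/m_k1."""
    t = (d_ij - d_k) / (d_k1 - d_k)  # position in [0, 1] within the interval
    h00 = 2 * t**3 - 3 * t**2 + 1    # Hermite basis functions
    h01 = 3 * t**2 - 2 * t**3
    h10 = t**3 - 2 * t**2 + t
    h11 = t**3 - t**2
    return h00 * v_k + h01 * v_k1 + (d_k1 - d_k) * (h10 * m_k + h11 * m_k1)
```

At t = 0 the basis evaluates to (1, 0, 0, 0) and at t = 1 to (0, 1, 0, 0), so the values at the nodes are recovered exactly; the derivative similarly recovers the tangents, which is what guarantees a continuously differentiable potential.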
TABCODEFILE
This keyword provides the index input file which determines which tabulated potential to use for which atom pair (see elsewhere for format description). Naturally, this is only relevant if the tabulated potential is in use.
TABPOTFILE
This keyword should give the name and location of the actual input file for the tabulated potentials (see elsewhere for format description). Naturally, this is only relevant if the tabulated potential is in use.
TABTANGFILE
This keyword should give the name and location of the optional input file for providing derivatives of the tabulated potentials specified via another keyword. If this file is not provided, the derivatives are estimated numerically to generate the necessary tangents for the cubic interpolation scheme. If the file is provided, however, no checks are performed on the supplied values (see elsewhere for format description). Naturally, this is only relevant if the tabulated potential is in use.
TABITIGHT
If tabulated potentials are in use, and if the input file providing derivatives of the potentials is either missing or incomplete, the cubic interpolation scheme applied to the discrete input data (using cubic Hermite splines) utilizes numerical estimates of the tangents (slopes) at the nodes (control points). The shape and nature of the resulting spline can be varied somewhat with two control parameters, the first controlling the "tightness", and the second (see below) controlling a left/right-sided bias with respect to the control points. The control parameters are used in the construction of the tangents as follows:
m_{ij}^{k} = [ (1-t_{t})·(1+t_{b})·(V_{ij}^{k} - V_{ij}^{k-1}) + (1-t_{t})·(1-t_{b})·(V_{ij}^{k+1} - V_{ij}^{k}) ] / (d_{k+1} - d_{k-1})
This is essentially a simplified Kochanek-Bartels spline scheme skipping the discontinuity parameter and assuming identical distance spacings. The V_{ij}^{k} are the potential values at the specified distances, d_{k}, supplied via the required input file. t_{t} is the tightness parameter controlled by this keyword, and t_{b} is the bias parameter controlled by TABIBIAS. If both parameters are zero, the well-known Catmull-Rom spline is obtained. Regardless of the choices for t_{t} and t_{b} (allowed values span the interval from -1 to 1), the resultant interpolation scheme will yield a function that is continuous and smooth (i.e., continuously differentiable). However, unless the control points are very sparse with respect to the features of the potentials, any nonzero settings for t_{t} and/or t_{b} will most likely lead to undesirable effects, in particular at the level of derivatives.
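As a sketch (hypothetical helper, not CAMPARI code), the tangent estimate above reads:

```python
def kb_tangent(v_prev, v_k, v_next, d_prev, d_next, t_t=0.0, t_b=0.0):
    """Simplified Kochanek-Bartels tangent at an interior control point;
    t_t is the tightness, t_b the bias. t_t = t_b = 0 recovers the
    Catmull-Rom slope (v_next - v_prev) / (d_next - d_prev)."""
    left = (1.0 - t_t) * (1.0 + t_b) * (v_k - v_prev)
    right = (1.0 - t_t) * (1.0 - t_b) * (v_next - v_k)
    return (left + right) / (d_next - d_prev)
```

For t_t → 1 the tangents vanish and the spline becomes maximally "tight" at the nodes, while nonzero t_b weights the backward or forward difference more strongly, producing the lag/lead behavior described under TABIBIAS.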
TABIBIAS
If tabulated potentials are in use, and if the input file providing derivatives of the potentials is either missing or incomplete, the cubic interpolation scheme applied to the discrete input data (using cubic Hermite splines) utilizes numerical estimates of the tangents (slopes) at the nodes (control points). The shape of the resulting spline utilizes a bias parameter, t_{b}, that is specified by this keyword. Its exact interpretation is explained above. Simply speaking, positive values lead to a lag (along the distance axis) in the interpolated, piecewise polynomial compared to the control points, whereas negative values do the opposite.
TABREPORT
If tabulated potentials are in use (see SC_TABUL), this keyword lets the user instruct CAMPARI to print out a report of all the tabulated interactions in the system. This output can be quite large and is written to a separate output file (see TABULATED_POT.idx).
SC_DREST
Many experimental techniques (in particular NMR or FRET) can derive distance restraints on the relative position of two sites in a biomolecule. Hence, several computational techniques are able to utilize such restraints (prominent, for example, in the computational determination of protein structures via NMR). CAMPARI offers a simple facility to harmonically restrain atoms which otherwise need not have any particular relationship. These restraints can be made one-sided, i.e., they can also restrain a distance to simply be within or beyond a certain threshold, which is usually a more appropriate treatment for incorporating experimental results. Such requests are handled and processed through a dedicated input file (see FMCSC_DRESTFILE), and details are provided there. The keyword discussed here simply provides the "outside" scaling factor c_{DREST} for the V_{DREST} term. As mentioned above, when CAMPARI's shared memory (OpenMP) parallelization is in use, all threads contribute to calculating V_{DREST} synchronously. Note that there is no incremental treatment of this term in Monte Carlo calculations, which is a limitation.
DRESTFILE
This keyword should give the location and name of the input file containing specific atom-atom distance restraint requests (see elsewhere for format description). Naturally, this is only relevant if custom distance restraints are in use.
DRESTREPORT
If distance restraint potentials are in use (see SC_DREST), this keyword allows the user to request a summary of the active distance restraint terms in the system.
SC_EMICRO
This keyword sets the global scaling factor for a spatial density restraint potential. The method was introduced recently (Vitalis and Caflisch), and the user is referred there for additional details. The potential relies on reading and quantitatively interpreting an input density map. The interpreted density for a given lattice cell with indices i, j, and k is denoted Ξ_{ijk} and is meant to correspond to some atomic property such as mass (→ EMPROPERTY). The potential itself is as follows:
E_{EMICRO} = f_{EMICRO}·Σ_{ijk} ( ρ_{ijk} - Ξ_{ijk} )^{2}
The value of f_{EMICRO} is set by this keyword. The potential is extensive with the number of grid cells. If it is the dominant contribution in terms of CPU time to energy evaluations, the use of Monte Carlo sampling is currently quite wasteful since the values for ΔE_{EMICRO} are not actually incremental. The sum implied in the above equation is over all lattice cells of an evaluation grid reduced in resolution to exactly that of the input density map. Note that the dimensions of the evaluation grid are controlled by system size and shape, and that its formal resolution is either assumed to be that of the input map or set explicitly by keyword EMDELTAS (although the resultant lattice is required to have cell boundaries that align exactly with those of the input map). If the resolution of the evaluation grid is finer, the values for its cells are summed up to give the coarser resolution. Furthermore, the evaluation grid may extend beyond the input map, and in such a case the summation also includes (coarse) cells where the input is assumed to be exactly the background density. Taken together, these caveats mean that it is rarely useful not to match the input lattice exactly. Importantly, the spatial density restraint provides an absolute reference in space, which means that it is most likely incorrect to use drift removal techniques. Another unusual aspect of this potential is that it only applies to physically present molecules in simulations in ensembles with fluctuating particle numbers. This is despite it not being a pairwise interaction term, and distinguishes it from potentials affecting the bath particles as well (such as bonded potentials). Because the potential is strictly a penalty term, this creates an effective mismatch that must be lumped manually into the excess chemical potential. This is neither pretty nor clean, meaning that concurrent use of these techniques should be accompanied by the appropriate skepticism.
Depending on the choice for EMMODE, E_{EMICRO} can also be written using an average of the simulation density that is typically not equivalent to the canonical ensemble average:
E_{EMICRO} = f_{EMICRO}·Σ_{ijk} ( ⟨ ρ_{ijk} ⟩ - Ξ_{ijk} )^{2}
Here, the angular brackets indicate an average that depends on keyword EMIWEIGHT and is explained there. Further details as to why the canonical average is not used are below. Note that the potential utilizing this average no longer corresponds to a unique Hamiltonian, i.e., every time the average is updated the energy landscape changes. This means that the ensembles generated are no longer straightforward to interpret. The obvious benefits of using an ensembleaveraged restraint are twofold. First, explicit heterogeneity can explain data that would be inconsistent with a unique structure. Second, sampling is aided by the fact that "stuck" conformations will tend to become unstable in terms of E_{EMICRO} over time. As a final remark, users should keep in mind that the actual ensemble average generated may not agree with input given that this quantity was never actually restrained during the simulation.
As mentioned above, when CAMPARI's shared memory (OpenMP) parallelization is in use, all threads contribute to calculating E_{EMICRO} synchronously. However, the parallel efficiency is generally poor if either the lattices are large (in number of grid cells) relative to the number of atoms or if the solutes in a dilute system change absolute positions rapidly.
EMMODE
If the density restraint potential is in use, this keyword allows the user to choose between two options. Setting this keyword to 1 computes the restraint term by comparing the instantaneous simulation density to the input density map, whereas a choice of 2 computes the restraint term by comparing an ensemble-averaged simulation density to the input density map. While the first option is straightforward, the second one requires some additional considerations as follows. Irrespective of whether a run is in parallel or not, the ensemble average is currently obtained over the previous sampling history (beyond equilibration) of the exact trajectory in question. Note that any average is created in terms of numbers of steps, which may cause inconsistencies in hybrid sampling runs due to the different average phase space increments. Choosing an appropriate type of average is not trivial (see, e.g., this reference), because the naive approach of including the entire sampling history leads to a continuously decreasing impact of the restraint term. There are currently two ways to address this. First, the accumulation frequency for the ensemble average can be reduced by keyword EMCALC. This slows down the reduction in impact and effectively gives the system more time to explore, because it results in concatenated runs of length EMCALC, during which the potential is in fact constant. Second, CAMPARI uses a fixed weight for the instantaneous component of the average while evaluating the potential. This fixed weight is set by keyword EMIWEIGHT and provides a way to utilize the entire history without degrading the impact of the restraint potential. A third route would be to use an appropriate kernel function in the time averaging, but this is inconvenient and potentially inefficient for spatial density analysis due to the large number of terms that would have to be stored and processed to recompute the kernel-based average.
A third option for this keyword may be added in the future that allows a lateral ensemble average to be restrained in MPI averaging calculations.
EMIWEIGHT
If the density restraint potential is in use, and if the potential acts on some ensemble-averaged simulation density, this keyword allows the user to set a fixed weight for the constructed average:
⟨ρ_{ijk}⟩ = (1 − w_{inst}) N_{steps}^{−1} Σ_{i} ρ_{ijk}(i) + w_{inst} ρ_{ijk}(current)
Here, the factor w_{inst} is set by this keyword and bound to the interval from 0 to 1. The ρ_{ijk}(i) are the N_{steps} values contributing to the running, canonical average of the density, and ρ_{ijk}(current) is the density produced by the current conformation at that given lattice cell. The limiting case of w_{inst} being 1.0 recovers the instantaneous treatment (→ EMMODE). The limiting case of w_{inst} being 0.0 does not, however, produce a meaningful restraint (since it is independent of the current conformation). Both limiting cases are therefore forbidden. Note that it is currently not possible to recover the naive approach of a restraint that continuously decreases in relevance.
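As an illustrative sketch (not CAMPARI code), the weighted average above can be written as follows; the helper name and the use of NumPy are assumptions:

```python
import numpy as np

def restrained_density(history, current, w_inst):
    """Blend the running average of past densities rho_ijk(i) with the
    current density rho_ijk(current) using a fixed instantaneous weight
    (cf. EMIWEIGHT); hypothetical helper for illustration only."""
    if not (0.0 < w_inst < 1.0):
        # both limiting cases are forbidden, as explained above
        raise ValueError("w_inst must lie strictly between 0 and 1")
    running_mean = np.mean(history, axis=0)  # N_steps^-1 * sum_i rho_ijk(i)
    return (1.0 - w_inst) * running_mean + w_inst * current
```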
EMMAPFILE
This keyword provides the location and name of the mandatory density input file when using the density restraint potential. The file format is described in detail elsewhere, and here it suffices to say that the external NetCDF library is needed, and that currently no other common density file formats (.ccp4, .mrc, ...) are read directly by CAMPARI. UCSF Chimera is able to convert between various density-based file formats, and does read and write NetCDF files. The most common application is likely that of a simulation with 3D periodic boundary conditions and a rectangular cuboid simulation volume. Here, the cells of the input lattice should align exactly with those of the analysis and evaluation lattice CAMPARI uses, and generally it will be easiest to match both origin and dimensions exactly. By default, CAMPARI will obtain the lattice cell dimensions from the input map. For non-periodic boundaries (including simulation systems with curved boundaries), it will be required, however, to deviate from such an exact match. Here, keyword EMBUFFER can be used to define the buffer size for the evaluation grid at any non-periodic boundaries. Furthermore, keyword EMDELTAS can always be used to request the analysis and evaluation lattice to have cells of a smaller size, which, with the restraint potential in place, has to yield the exact input cell size by integer multiplication for all three dimensions. Lastly, keyword EMREDUCE can be used to average the input map to a lower resolution by rebinning.
Assuming no further transformations are applied (→ keywords EMREDUCE, EMTRUNCATE, EMFLATTEN), the interpreted density based on the input file is as follows:
Ξ_{ijk} = ρ_{sol} + c·(ω_{ijk} − ω_{bg})
Here, the final density for a given lattice cell, Ξ_{ijk}, has units of physical density, c is a scale factor explained below, ω_{ijk} is the original input density for the same lattice cell, and ρ_{sol} and ω_{bg} are the assumed physical and input background signals, respectively. ρ_{sol} is set by keyword EMBGDENSITY, and ω_{bg} can be set by keyword EMBACKGROUND if the value determined automatically from the histogram of input densities is not appropriate. Factor c is given as follows:
c = [ M_{M} − ρ_{sol} Σ_{ijk} V_{ijk} H(ω_{ijk} − ω_{t}) ] · [ Σ_{ijk} (ω_{ijk} − ω_{bg}) V_{ijk} H(ω_{ijk} − ω_{t}) ]^{−1}
Here, the first term in square brackets is a hypothetical excess signal (mass) using the apparent macromolecular volume (the sum of the volume of all lattice cells with signals exceeding the threshold, ω_{t}) and the assumed total mass. The V_{ijk} are the volumes of individual lattice cells and currently have to be all equal, and H(x) denotes the Heaviside step function. The second term in square brackets is the actual excess signal (mass) derived from the input map obtained by analogous summation. Factor c has units that convert optical density (input) to physical density. It is important to note the crucial impact of keywords EMTHRESHOLD and EMTOTMASS on the quantitative interpretation of the map. In particular, many combinations of values will be rejected by CAMPARI, because they cannot produce an excess signal larger than the background. The resultant interpreted map is written to a dedicated output file at the beginning of each run. Note that this includes all optional transformations controlled by keywords EMREDUCE, EMTRUNCATE, and EMFLATTEN.
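The interpretation above can be condensed into a short sketch; this is a hedged illustration (function name, NumPy usage, and the uniform cell volume argument are assumptions, not CAMPARI code):

```python
import numpy as np

def interpret_map(omega, omega_bg, omega_t, rho_sol, M_M, V_cell):
    """Compute Xi_ijk = rho_sol + c * (omega_ijk - omega_bg) with c fixed
    by the total mass M_M, the threshold omega_t, and a uniform cell
    volume V_cell; illustrative sketch only."""
    above = omega > omega_t                      # Heaviside H(omega - omega_t)
    V_macro = V_cell * np.count_nonzero(above)   # apparent macromolecular volume
    excess_phys = M_M - rho_sol * V_macro        # hypothetical excess mass
    excess_input = np.sum((omega[above] - omega_bg) * V_cell)  # input excess signal
    if excess_phys <= 0.0 or excess_input <= 0.0:
        # such combinations of EMTHRESHOLD/EMTOTMASS would be rejected
        raise ValueError("inconsistent threshold/mass combination")
    c = excess_phys / excess_input               # optical -> physical density
    return rho_sol + c * (omega - omega_bg)
```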
EMREDUCE
If the density restraint potential is in use, this keyword can be used to change the formal resolution of the input density map. This is accomplished by simple rebinning, i.e., the target and original lattices are aligned at the origin, and the original signal for each cell is distributed to the target cells by simple overlap. Because the input is assumed to be a density, volume renormalization is performed. Note that it is generally meaningless to create a finer grid this way, because no new information is available, and CAMPARI distributes signal assuming a flat distribution inside each original input cell. Similar to keyword EMDELTAS, this keyword requires the specification of three floating point numbers that set the target lattice cell sizes of the rebinned input map in Å for the x, y, and z dimensions, respectively. Note that the exact values will generally be slightly different because of the requirement to have the outer dimensions of both grids align exactly. Finally, users should keep in mind that the physical resolution and the formal resolution of the lattice used to represent the data are two distinct quantities.
EMBACKGROUND
If the density restraint potential is in use, this optional keyword can be used to override the value determined to correspond to background in the input density map (ω_{bg} in the equation above). This value is commonly set by binning the densities in all cells, and identifying a well-resolved peak in the histogram. If the map does not encode much background signal, the histogram-based determination may be inappropriate, and this is when this keyword is useful. Note that values refer to the original input density map.
EMTHRESHOLD
If the density restraint potential is in use, this important keyword controls the linear transform used to interpret the input density map in terms of a physical mass density. Specifically, it sets a threshold level, in the units of the (potentially rebinned) input map, that distinguishes signal from background. Since measurements often have low contrast, the threshold is not an obvious property of the input map. The threshold set here corresponds to parameter ω_{t} in the equation above. It is primarily responsible for the overall scaling factor, i.e., larger threshold values will generally produce interpreted maps with a wider spectrum of physical density values. Using the apparent molecular volume and the total mass, the chosen threshold directly determines the apparent physical density (reported in log-output). This quantity poses constraints on the chosen value, because the integrated signal must yield a density larger than the assumed physical background density.
EMTOTMASS
If the density restraint potential is in use, this keyword sets the mass in g/mol that is assumed to correspond to the signal in the input density map exceeding the threshold. In general, this can be set to correspond exactly to the explicitly represented matter in the simulation (this is the default), but there are exceptions where an override may be desired, e.g., when simulating only a part of the system without wanting to distort the interpretation of the map. The parameter corresponds to M_{M} in the equation above.
EMTRUNCATE
If the density restraint potential is in use, this keyword enables truncation of the input map below the chosen value as long as it is higher than the minimum and lower than the assumed threshold level (ω_{t} in the equation above). Truncation implies that the spectrum of values for the interpreted density is completely depleted below the specified level, because all values are simply assigned the background level, ω_{bg}. This technique can be used to eliminate noise from the input that may hamper sampling. Note that values refer to the original input density map. This keyword is the exact complement to EMFLATTEN.
EMFLATTEN
Depending on how a density map is generated, the signal may cover a wide spectrum of values. This is particularly true if the contrast to the background is generally low, and the lack of contrast is compensated for by averaging over similar, but heterogeneous conformations. In such cases, the ratio of peak to barely detectable signals may be impossible to describe by physical densities of instantaneous conformations. If the density restraint potential is in use, this keyword therefore allows the user to flatten an input density map at the level specified by this keyword. The requirement is that the value be larger than the assumed threshold level. This keyword is the exact complement to EMTRUNCATE, and using both concurrently can produce an interpreted map that is purely an envelope of homogeneous density.
EMHEURISTIC
The evaluation of the density restraint potential involves the summation of contributions from all the grid cells. Each cell contributes a squared difference of the input density and the actual density for the current conformation of explicit matter in the system. If the formal resolution is high, the evaluation of the potential can be costly. Occasionally, it may be possible to save some CPU time by applying dedicated heuristics, and this is what is controlled by this keyword. Choices are as follows:
- No heuristic is used. At each global evaluation of the density restraint potential, all grid cells are recomputed and summed up.
- When spreading the atomic masses in the system onto the analysis and evaluation grid, CAMPARI keeps track of whether any given xz-slice of the input map actually received a contribution from any atom. If not, the cells constituting this xz-slice are not recomputed, but instead a precomputed value for the entire slice is used. This is possible because the simulation densities in all the cells of the slice will be equivalent to the assumed background density. The efficacy of this heuristic obviously depends on the details of the system.
- This works identically to the previous option, except that x-lines are considered rather than xz-slices.
- This works identically to the previous options, except that local rectangular supercells are used rather than xz-slices or x-lines. Here, the algorithm will try to combine existing grid cells to yield approximately 1000 supercells. This option is probably the most successful in general, because it can match arbitrary arrangements of explicit matter best.
GHOST
This keyword is a simple logical that determines whether or not to (partially) "ghost" the interactions of selected particles (see FEGFILE) with the rest of the system (and eventually amongst themselves → FEG_MODE). Such scaling of interactions creates artificial systems which can be used to interpolate between two well-defined end states. The most common need for such an application arises in cases where the two end states are significantly different and one is interested in the free energy difference. For example, to calculate the aqueous free energy of solvation of a small molecule in water, one could scale the interactions of the small molecule with water from zero to their full value. Such growth-based calculations are usually complicated to set up and perform since i) trajectories evolved at a given Hamiltonian have to be evaluated (usually on-the-fly) assuming different Hamiltonians, and ii) it is difficult to maintain an internally consistent system of interactions such that all changes induced by the ghosting can be mapped to atomic parameters of the ghosted species. In CAMPARI, FEG (free energy growth/ghosting) calculations are therefore supported in conjunction with limited Hamiltonians only: the only potentials allowed are IPP, ATTLJ, POLAR, and the bonded interactions. In other cases, it may be possible to extract the same or related quantities through other techniques realizable in CAMPARI. As an example, the free energy of solvation for a flexible (single) solute immersed in the ABSINTH continuum solvation model can be obtained by simultaneously scaling the dielectric from 1.0 to 78.0 and the DMFI from 0.0 to 1.0. The default settings for the auxiliary keywords to GHOST are such that the molecules or residues listed in FEGFILE will be completely ghosted (i.e., invisible to the system).
FEG_MODE
In FEG calculations (see GHOST), interactions are always scaled between the ghosted species and the rest of the system. A natural question is what happens to interactions between or within ghosted species (if any are present). If they are not scaled but instead use the background Hamiltonian, it will be impossible to map the effect of the scaling to a change in atomic parameters, which is desirable from the viewpoint of rigor. As an example, consider polar interactions between a single ghosted butane molecule and a bath of non-ghosted water. A scaling of the atomic charges on the ghost butane by a factor f would give rise to interactions with the bath scaled by f and self-interactions scaled by f^{2}. This type of scaling is enforced in CAMPARI if a method requires it, such as treating electrostatics with the reaction-field method (see LREL_MD). In general, however, it is impossible to find a unique mapping while leaving the background Hamiltonian intact. It is therefore left to the user to determine which of two options to choose:
1) Interactions between/within ghosted species use the full background Hamiltonian.
2) Interactions between/within ghosted species use the scaled Hamiltonian.
The choice made here is important only if such interactions are present in the system. If so, however, the raw results will usually depend strongly on it, and corrections may have to be applied. As an example, consider the butane-water example from above. The fact that intramolecular interactions are scaled will contribute toward the apparent free energy obtained when interpolating between the fully ghosted and the fully present states. Hence, gas phase corrections have to be applied. They are obtained by repeating the calculation in the absence of water to complete the thermodynamic cycle, which then allows isolating the free energy of solvation. Additional complications may arise if molecules are constrained (see FRZFILE).
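The gas-phase correction described above amounts to closing a thermodynamic cycle; schematically (the notation is an assumption, not taken from the CAMPARI documentation):

```latex
% Free energy of solvation from two ghosting legs (butane-water example):
% the gas-phase leg cancels the contribution of the scaled
% intramolecular interactions.
\Delta G_{\mathrm{solv}} =
    \Delta G_{\mathrm{ghost}\to\mathrm{full}}^{\mathrm{solution}}
  - \Delta G_{\mathrm{ghost}\to\mathrm{full}}^{\mathrm{gas}}
```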
FEG_IPP
This keyword specifies the "outside" scaling factor for the ghosted inverse power potential. Note that, depending on the choice for FEG_LJMODE, this is not as simple as SC_IPP, and that additional parameters may determine the impact this keyword has. The setting here corresponds to the parameter s_{gIPP} below. Note as well that the inverse power potential supported in calculations with ghosted interactions always uses an exponent of 12 (i.e., setting IPPEXP to anything but the default of 12 will cause CAMPARI to abort). This keyword is only relevant if GHOST is true.
FEG_ATTLJ
This keyword is analogous to FEG_IPP but controls the "outside" scaling of the attractive r^{−6} dispersive term. The setting here corresponds to the parameter s_{gattLJ} below. Note that scaling this up while FEG_IPP is set to zero (or, depending on the mode, even set to something smaller) will potentially lead to numerical instabilities.
FEG_LJMODE
The exact functional form of the scaled (ghosted) Lennard-Jones potential is as follows:
E_{gLJ} = 4.0·ΣΣ_{i,j} ε_{ij}·f_{1-4,ij}·[ g(s_{gIPP})·[α·h(s_{gIPP}) + (r_{ij}/σ_{ij})^{6}]^{−2} − g(s_{gattLJ})·[α·h(s_{gattLJ}) + (r_{ij}/σ_{ij})^{6}]^{−1} ]
Here, the ε_{ij} and σ_{ij} are the standard pairwise Lennard-Jones parameters (see PARAMETERS), the f_{1-4,ij} are potential 1-4 fudge factors (see FUDGE_ST_14) that generally will be unity, g(s) and h(s) are auxiliary functions whose functional form depends on the choice for this keyword, and α is the so-called soft-core radius (unitless). The two scaling factors s_{gIPP} and s_{gattLJ} are provided by keywords FEG_IPP and FEG_ATTLJ. There are three possible choices determining g(s) and h(s):
1) g(s) = s, h(s) = 0
2) g(s) = s^{f1}, h(s) = 1.0 − s^{f2}
3) g(s) = (1.0 − e^{−s·f1})/(1.0 − e^{−f1}), h(s) = (1.0 − s)^{f2}
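A minimal sketch (not CAMPARI code) of the ghosted pair term with the three g(s)/h(s) modes; the function names and the default exponents f1 = f2 = 2.0 are assumptions:

```python
import math

def g(s, mode, f1):
    # auxiliary scaling function g(s) for the three FEG_LJMODE choices
    if mode == 1:
        return s
    if mode == 2:
        return s ** f1
    return (1.0 - math.exp(-s * f1)) / (1.0 - math.exp(-f1))  # mode 3

def h(s, mode, f2):
    # auxiliary soft-core function h(s) for the three FEG_LJMODE choices
    if mode == 1:
        return 0.0
    if mode == 2:
        return 1.0 - s ** f2
    return (1.0 - s) ** f2  # mode 3

def ghosted_lj(r, sigma, eps, s_ipp, s_att, alpha, mode=2, f1=2.0, f2=2.0):
    """Ghosted Lennard-Jones energy for a single pair (fudge factor of
    unity assumed); s = 1 recovers the unscaled potential."""
    r6 = (r / sigma) ** 6
    rep = g(s_ipp, mode, f1) * (alpha * h(s_ipp, mode, f2) + r6) ** -2
    att = g(s_att, mode, f1) * (alpha * h(s_att, mode, f2) + r6) ** -1
    return 4.0 * eps * (rep - att)
```

At full scaling (s = 1), h(s) vanishes in all three modes and the standard 12-6 potential is recovered; at s = 0, the pair contributes nothing.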
FEG_LJRAD
This keyword allows the user to specify the parameter α in the above equations (see FEG_LJMODE), i.e., the soft-core "radius" for the modified Lennard-Jones potential. It is generally of limited utility to set this to zero, since in that case the scaled potential could as well be created by setting FEG_LJMODE to 1, in which case this parameter becomes meaningless. Conversely, for large soft-core radii, the potential is modified even at large distances, which is generally an unnecessary modification that may slow down convergence in free energy calculations relying on interpolation via ghosting. Generally speaking, values around 0.5 are recommended for either mode 2 or 3. This keyword is only relevant if GHOST is true.
FEG_LJEXP
This keyword sets the parameter f_{1} in the above equations (see FEG_LJMODE). It represents a simple way to alter the weight of change experienced by the system depending on the choices of FEG_IPP and FEG_ATTLJ. In that sense, it is very closely tied to the design of the interpolation schedule (i.e., both address the exact same issue). There are no gold-standard rules for picking this value, and the user is referred to the literature for further details. In the case of free energy calculations, it will be best to inspect the schedule empirically by metrics such as the statistical precision of the pairwise estimates or overlap metrics such as (theoretical) swap probabilities, and to then refine either the schedule itself or the global settings accordingly. This keyword is only relevant if GHOST is true.
FEG_LJSCEXP
This keyword sets the parameter f_{2} in the above equations (see FEG_LJMODE). Much of the same discussion applies here as already mentioned for keywords FEG_LJRAD and FEG_LJEXP. This keyword is only relevant if GHOST is true.
FEG_POLAR
The only other nonbonded potential besides Lennard-Jones supported in FEG calculations is the polar potential (see SC_POLAR). This keyword provides a scaling factor (s_{gPOLAR}) for the soft-core Coulomb potential. Similar to the case of scaled LJ interactions (see above), this may involve three additional parameters (see FEG_CBMODE). Note that it would be most common to only scale this up while FEG_IPP is set to unity, so as to avoid potential numerical instabilities.
FEG_CBMODE
In analogy to FEG_LJMODE, this keyword determines the exact functional form CAMPARI uses for the scaled (ghosted) Coulomb potential with the "outside" scaling factor s_{gPOLAR} set by FEG_POLAR:
E_{gPOLAR} = (4.0πε_{0})^{−1}·ΣΣ_{i,j} g(s_{gPOLAR})·q_{i}·q_{j}·f_{1-4,C,ij}·[α_{C}·h(s_{gPOLAR}) + r_{ij}]^{−1}
Here, the atomic partial charges are represented as q_{i} and q_{j}, ε_{0} is the vacuum permittivity, and r_{ij} is the interatomic distance. f_{1-4,C,ij} denotes potential fudge factors acting on 1-4-separated atom pairs (see FUDGE_EL_14) but will generally assume a value of unity. g(s) and h(s) are the same auxiliary functions defined above for the Lennard-Jones potential (→ FEG_LJMODE), and α_{C} is the soft-core radius (unitless) specific to the Coulomb potential (controlled by keyword FEG_CBRAD). For completeness, the options are listed again in detail:
1) g(s) = s, h(s) = 0
2) g(s) = s^{fC,1}, h(s) = 1.0 − s^{fC,2}
3) g(s) = (1.0 − e^{−s·fC,1})/(1.0 − e^{−fC,1}), h(s) = (1.0 − s)^{fC,2}
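As a hedged illustration (not CAMPARI code), the ghosted Coulomb pair term can be sketched with the auxiliary-function values passed in directly; the conversion constant, names, and defaults are assumptions:

```python
# approx. (4*pi*eps0)^-1 in kcal*Å/(mol*e^2); assumed conversion constant
COULOMB_K = 332.06

def ghosted_coulomb(r, qi, qj, g_s, h_s=0.0, alpha_c=0.5, fudge=1.0):
    """Ghosted Coulomb pair energy; g_s and h_s are the values of the
    auxiliary functions g(s_gPOLAR) and h(s_gPOLAR) for the chosen mode
    (mode 1 corresponds to h_s = 0, so alpha_c drops out)."""
    return COULOMB_K * g_s * qi * qj * fudge / (alpha_c * h_s + r)
```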
FEG_CBRAD
This keyword is analogous to FEG_LJRAD and allows the user to choose the value for the soft-core radius specific to the Coulomb potential (α_{C} in the equations under FEG_CBMODE). The specification is meaningless if FEG_CBMODE is set to 1.
FEG_CBEXP
This keyword is analogous to FEG_LJEXP and allows the user to choose the value for the polynomial scaling exponent to the Coulomb potential (f_{C,1} in the equations under FEG_CBMODE). The specification is meaningless if FEG_CBMODE is set to 1.
FEG_CBSCEXP
This keyword is analogous to FEG_LJSCEXP and allows the user to choose the value for the soft-core scaling exponent to the Coulomb potential (f_{C,2} in the equations under FEG_CBMODE). The specification is meaningless if FEG_CBMODE is set to 1.
FEG_BONDED_B
Nonbonded interactions provide a straightforward interpretation for parsing the energetics of the system into solute-solvent, solute-solute, and solvent-solvent contributions. This is used in a thermodynamic cycle argument when computing, for instance, the free energy of solvation of a solute in solvent via FEG methods. Sometimes (as alluded to under FEG_MODE), it may be desirable to scale intramolecular nonbonded interactions as well. But what about intramolecular bonded interactions? This keyword allows the FEG-like scaling of bonded terms associated with a ghosted species, but not of those associated with non-ghosted particles. Beyond that, this keyword operates just like SC_BONDED_B. Note that this almost certainly creates a pathological situation if bond length potentials are allowed to approach zero, and it naturally relies on bond lengths being allowed to vary (see CARTINT) to be meaningful. Note that for all bonded parameters, the assignment of terms to individual residues in a multi-residue molecule is somewhat arbitrary if atoms from two different residues participate.
FEG_BONDED_A
This is analogous to FEG_BONDED_B, only for bond angle potentials. Note that this may lead to a pathological simulation if bond angle potentials are allowed to approach 0° or 180° and, again, relies on bond angles actually being varied throughout the simulation to be meaningful.
FEG_BONDED_I
This is analogous to FEG_BONDED_B, only for improper dihedral angle potentials. Note that this may lead to a pathological simulation if improper dihedral angle potentials are allowed to approach zero and, again, relies on these degrees of freedom actually being varied throughout the simulation to be meaningful.
FEG_BONDED_T
This is analogous to FEG_BONDED_B, only for proper dihedral angle potentials. Note that this relies on torsional angles actually being varied throughout the simulation to be meaningful (there may be subsets).
FEGREPORT
This simple logical keyword lets the user instruct CAMPARI to write out a summary of the ghosted particles (residues or molecules) in free energy growth/ghosting calculations.
SCULPT
The accelerated molecular dynamics method of Hamelberg et al. offers a general (parameter-dependent) way to modify the potential energy landscape or individual terms thereof (torsional potentials and 1-4 interactions have been used most often). The idea is that a controlled modification of the landscape that leads to reduced barrier heights is capable of massively accelerating the effective dynamics without reducing the ensemble overlap dramatically. CAMPARI offers a generalization of this approach as follows:
E_{ELS} = Σ_{i} ( E_{i} + ΔE_{i,ELS} )
ΔE_{i,ELS} = 0, if V_{i}^{f} ≤ E_{i} ≤ V_{i}^{s}
ΔE_{i,ELS} = (V_{i}^{f} − E_{i})^{2}/(V_{i}^{f} − E_{i} + α_{i}^{f}), if E_{i} < V_{i}^{f}
ΔE_{i,ELS} = (V_{i}^{s} − E_{i})^{2}/(V_{i}^{s} − E_{i} − α_{i}^{s}), if E_{i} > V_{i}^{s}
Here, the sum runs over all active terms of the Hamiltonian. These are generally the terms CAMPARI offers a global scaling factor for, e.g., the total DMFI of the ABSINTH model, E_{DMFI}, the total sum of improper torsional potentials, E_{BONDED_I}, etc. Limitations are discussed below. By default, the threshold energy parameters for every energy term, V_{i}^{f} and V_{i}^{s}, are initialized such that ΔE_{i,ELS} is always zero, i.e., no sculpting occurs. They can be modified with the auxiliary keywords ELS_FILLS and ELS_SHAVES. Naturally, V_{i}^{f} must always be less than or equal to V_{i}^{s}. The parameters α_{i}^{f} and α_{i}^{s} must always be greater than or equal to zero. They serve as buffer parameters. The modified energy landscape for a given term has two possible modifications. First, its low energy states (local minima) can be filled up. Setting α_{i}^{f} to zero flattens all low energy states to the specified threshold, V_{i}^{f}. Larger values for α_{i}^{f} preserve the unbiased shape of the landscape more and more, and the limit of α_{i}^{f} reaching infinity recovers the unbiased potential exactly. Second, its high energy states (barrier regions) can be shaved off, and setting α_{i}^{s} to zero flattens all barrier regions to the value of V_{i}^{s} exactly. The effect of larger values is exactly analogous. Note, however, that potentials allowing for large positive energy values must be treated with caution (notably inverse power potentials). The value of ΔE_{i,ELS} for large negative values of (V_{i}^{s} − E_{i}) obviously approaches (V_{i}^{s} − E_{i}) itself, which means that the barriers are more or less completely eliminated. This can be dangerous (numerically speaking) in conjunction with attractive nonbonded interactions and can also lead to poor behavior during reweighting (see below).
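The piecewise definition above can be sketched directly (illustrative Python, not CAMPARI code; all energies in kcal/mol):

```python
def els_delta(E, V_f, V_s, alpha_f, alpha_s):
    """Energy landscape sculpting modification DeltaE_i,ELS for one term."""
    assert V_f <= V_s and alpha_f >= 0.0 and alpha_s >= 0.0
    if E < V_f:    # fill low-energy basins (positive correction)
        return (V_f - E) ** 2 / (V_f - E + alpha_f)
    if E > V_s:    # shave barriers (negative correction)
        return (V_s - E) ** 2 / (V_s - E - alpha_s)
    return 0.0     # landscape left untouched in between
```

With α_{i}^{f} = 0, a basin at E = −10 below a threshold V_{i}^{f} = −5 is raised exactly to the threshold, while very large buffer values leave the landscape nearly unchanged.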
This keyword (SCULPT) allows the user to specify one or more terms to be sculpted (list of integers). The choices available correspond exactly to the columns of output file ENERGY.dat (click the link for a list). It includes the total energy (choice 2), which is mutually exclusive with any other term. There are further limitations as follows:
- In gradient-based simulations (including hybrid runs), nonbonded interactions can only be controlled as a single joint term (sum), viz., the sum of all active short-range steric interactions (see SC_IPP, SC_ATTLJ, and SC_WCA) as well as polar and tabulated interactions (see SC_POLAR and SC_TABUL). The correct code to use for this joint term is 15.
- The use of the (quasi-obsolete) correction potential is not supported when using any energy landscape sculpting.
ELS_FILLS
If the energy landscape sculpting method is in use, this keyword supplies the parameters V_{i}^{f} described above. Values are to be provided in kcal/mol. For example, if the choices for SCULPT are "20 22", then a choice for ELS_FILLS of "-5.0 -5.0" would provide lower threshold energies of −5.0 kcal/mol each to both proper dihedral angle potentials and to CMAP potentials. It is not possible to skip values, i.e., the length of the list supplied here should be identical to that for SCULPT. To disable the basin filling aspect of sculpting, it is generally safe to supply a very large negative energy here.
ELS_SHAVES
If the energy landscape sculpting method is in use, this keyword supplies the parameters V_{i}^{s} described above. Values are to be provided in kcal/mol. The interpretation is identical to keyword ELS_FILLS above. To disable the barrier shaving aspect of sculpting, it is generally safe to supply a very large positive energy here.
ELS_ALPHA_F
If the energy landscape sculpting method is in use, this keyword supplies the parameters α_{i}^{f} described above. Values are to be provided in kcal/mol and must be zero or positive. Note that a choice of zero inevitably leads to force discontinuities. In addition, the absence of any force (flat surface) will lead to the natural shape of the landscape being completely forgotten, which can deteriorate the statistical significance of the reweighted results.
ELS_ALPHA_S
If the energy landscape sculpting method is in use, this keyword supplies the parameters α_{i}^{s} described above. Values are to be provided in kcal/mol and must be zero or positive. The keyword is interpreted identically to ELS_ALPHA_F above and applies to the barrier shaving aspect.
ELS_PRINT_WEIGHTS
If the energy landscape sculpting method is in use, this keyword controls the output frequency for output file ELS_WFRAMES.dat, which contains the corresponding simulation step numbers (that will of course increase in steps of ELS_PRINT_WEIGHTS) and the associated weights. These weights are derived from knowledge of the applied net sculpting potential for each snapshot as w_{i} = exp(β ΔE_{ELS}). They can be used in a trajectory analysis run with user-supplied frame weights. Note that large positive values of the sculpting potential will make the reweighting susceptible to shot-like noise (due to few conformations receiving very large weights).
EWALD
CAMPARI supports using the Ewald decomposition technique to compute long-range electrostatic interactions in periodic systems (see LREL_MD). There are two supported approaches to computing the reciprocal space sums in the Ewald formalism:
- Particle-Mesh Ewald (PME): This elegant and vastly popular method introduced by Darden et al. uses discrete Fourier transforms (DFFTs) and cardinal B-splines to simplify the computation of the reciprocal space sum. Due to the DFFTs, CAMPARI needs to be linked against the free, open-source library FFTW for this option to be available. Briefly, PME reciprocal space sums have different scaling components: i) the number of charges; ii) the number of grid points; iii) the interpolation order for the cardinal B-splines. Which of these components is the speed-limiting factor depends strongly on the system, in particular since the accuracy of the reciprocal sum depends on the simultaneous optimization of the spline order (see BSPLINE) and the grid size (EWFSPAC), given that the real-space part codetermines the Ewald parameter (EWPRM). Note, however, that the fundamental scaling with the number of charges is O(N). The performance of PME is only partially controlled by CAMPARI, as the library calls can be (and often are) the bottleneck. Coarser grids, higher spline orders, and higher number densities of partial charges decrease the relative workload of the DFFTs. The general performance of the DFFTs can sometimes be improved by providing or computing a better "plan". This is supported by keywords EWFFTPLANNER and EWWISDOMFILE. If the shared memory (OpenMP) parallelization of CAMPARI is in use, this also triggers calling the threaded FFTW library. The performance of this call is tricky to predict, however, because it is spawned from a multithreaded execution region to begin with (inside an OpenMP MASTER construct). This causes additional thread generation and destruction operations that are a cost factor, both directly and indirectly (through the kernel having to manage a temporarily oversubscribed machine). It implies that the performance results become strongly dependent on the thread affinity model and respond to environment variables such as OMP_PROC_BIND. Keyword THREADS_TEST allows a quick way to test parallel FFTW performance for the system at hand. Irrespective of these complications, PME is the recommended (since fastest) implementation of Ewald sums.
- Standard Ewald: A straightforward computation of the reciprocal part of the original decomposition introduced by Ewald is supported by CAMPARI as well. This method is slow and scales poorly (K^{3}) with the (linear) cutoff size in the reciprocal dimension. Much like PME, the reciprocal sum fundamentally scales as O(N) with the number of charges, however. Standard Ewald might occasionally be a reasonably efficient alternative should tight cutoffs in reciprocal space be permissible (or should PME be slowed down due to a dominant cost imposed by DFFTs, such as in very dilute systems using big boxes). If the shared memory (OpenMP) parallelization of CAMPARI is in use, the standard Ewald sum is expected to scale well as long as the number of residues is reasonably large.
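As a rough, hedged sketch (this is not CAMPARI's actual accuracy predictor), the classic balancing argument picks the Ewald parameter so that the neglected real-space tail erfc(α·r_cut)/r_cut drops below a tolerance:

```python
import math

def ewald_parameter(r_cut, tol=1.0e-6):
    """Smallest alpha (in 1/Å, cf. EWPRM) on a crude geometric grid for
    which the truncated real-space tail falls below tol; illustrative
    only, not the error estimator used by CAMPARI."""
    alpha = 1.0 / r_cut
    while math.erfc(alpha * r_cut) / r_cut > tol:
        alpha *= 1.05  # arbitrary refinement factor for this sketch
    return alpha
```

Larger real-space cutoffs permit smaller Ewald parameters, shifting weight away from the reciprocal sum.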
EWERFTOL
If the Ewald method is used for treating long-range electrostatic interactions, this keyword can be used to set an accuracy tolerance for the tabulated computation of the (complementary) error function (and its derivative). This uses additional tricks to save operations (details omitted). Because the tabulated values can occupy a significant amount of memory, the performance of this implementation is usually cache- rather than FLOP-limited. The tabulation can be disabled at the compilation stage by passing the variable "DISABLE_ERFTAB" (see installation instructions), as many modern compilers offer support for fast math libraries, occasionally also with controllable precision.
BSPLINE
When using the PME method (see LREL_MD and EWALD), this keyword determines the order of the cardinal B-splines to be used. The order can be increased at a moderate cost, such that it is sometimes advantageous to choose a higher interpolation order coupled to a relatively coarse mesh (see EWFSPAC) instead of a lower interpolation order coupled to a finer mesh. The default order is 6, and currently only even numbers are permitted (odd numbers are adjusted to the default automatically). For various reasons, it is not recommended to use orders below 4. In any case, it can be useful to try different settings and study the predicted accuracy and initial energies that are reported at the beginning (summary of the calculation written to log output).
EWFSPAC
When using the PME method (see LREL_MD and EWALD), this keyword determines the grid spacing for the mesh in Å. A smaller value yields a finer mesh, which in turn yields more accuracy. The cost associated with finer grids easily becomes substantial (K^{3} scaling), though, even when using the DFFTs provided by FFTW. The code will occasionally adjust too coarse a value since the interpolation order (BSPLINE) requires a certain minimum number of available mesh points in each dimension. When using the standard Ewald method, keyword EWFSPAC determines the reciprocal-space cutoffs to either side directly as the ratio of half the box side length to itself. In any case, it can be useful to try different settings and study the predicted accuracy and initial energies that are reported at the beginning (summary of the calculation written to log output).
EWPRM
When using the Ewald method (see LREL_MD and EWALD), this can be used to overwrite the automatically determined value for the Ewald parameter. The Ewald parameter is given in units of Å^{-1} (but can just as well be defined as a dimensionless parameter). It determines the relative weight of the real-space and the reciprocal sum in determining the total electrostatic energy of the system. The larger EWPRM is, the more weight shifts to the reciprocal sum. Note that the accuracy of the Ewald method is highly sensitive to this parameter in conjunction with the real-space and reciprocal-space cutoffs and that a catastrophic lack of accuracy can easily be realized. Therefore, the code tries to determine a reasonable value for the Ewald parameter based on the (hard) settings for the real-space cutoff (NBCUTOFF) as well as EWFSPAC and, in the case of the PME method, BSPLINE. Unfortunately, the accuracy predictor formulas in use are currently somewhat flawed (they are based on the mean force error estimates presented by Petersen). They should be more accurate for the standard Ewald method than for PME since in the latter certain error contributions from the spline-based interpolation are missing. Hence, the automatically chosen parameter should by no means be considered an optimal one, merely one which, given the cutoff settings, provides comparatively small errors in forces and energies. Should the procedure be deemed inadequate, or should there be an independent estimate of the error, this keyword comes into play. In any case, it can be useful to try different settings and study the predicted accuracy and initial energies that are reported at the beginning (summary of the calculation written to log output).
EWFFTPLANNER
This keyword is only relevant if particle-mesh Ewald is used (see LREL_MD and EWALD). It can then be used to control how hard the linked FFTW library will try to compute an efficient plan for the involved DFFTs before the start of the simulation. The options are as follows:
 1. A heuristic and practically cost-free estimate is used. For simple cases and geometries, this is often sufficient (corresponds to FFTW_ESTIMATE).
 2. Explicit measurements are performed to pick a reasonable plan, but the algorithmic space explored is limited (this is the default and corresponds to FFTW_MEASURE).
 3. Explicit measurements are performed to pick a reasonable plan, and the algorithmic space is widened relative to the previous option (corresponds to FFTW_PATIENT).
 4. Many explicit measurements are performed across a wide algorithmic space to determine the best plan (this is very expensive and corresponds to FFTW_EXHAUSTIVE).
 5. A previously determined plan is read in, in the form of a "wisdom file" (see EWWISDOMFILE).
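As a compact reference, the mapping of the planner options onto FFTW's rigor flags can be summarized as follows (a plain Python dict for illustration only; CAMPARI itself is written in Fortran and passes the actual FFTW flag constants):

```python
# Illustrative mapping of EWFFTPLANNER values onto FFTW planner choices;
# option 5 reuses a previously stored plan instead of planning anew.
FFTW_PLANNER_FLAGS = {
    1: "FFTW_ESTIMATE",    # heuristic, essentially free
    2: "FFTW_MEASURE",     # default: limited timing experiments
    3: "FFTW_PATIENT",     # wider algorithmic search
    4: "FFTW_EXHAUSTIVE",  # widest search, very expensive setup
    5: "wisdom file",      # read a stored plan (see EWWISDOMFILE)
}
```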
EWSAVEWISDOM
This keyword is only relevant if particle-mesh Ewald is used (see LREL_MD and EWALD). It instructs CAMPARI to save a DFFT plan generated by the FFTW library in the form of a wisdom file. For this, the value for EWFFTPLANNER has to be 1-4. Note that wisdom files are not compatible between the threaded and serial FFTW libraries. The former is invoked automatically if the shared memory (OpenMP) parallelization of CAMPARI is in use. The name for the wisdom file is chosen by keyword EWWISDOMFILE.
EWWISDOMFILE
This keyword is only relevant if particle-mesh Ewald is used (see LREL_MD and EWALD). It provides the name for an FFTW wisdom file to be either read in (if EWFFTPLANNER is 5) or written (if EWFFTPLANNER is not 5). Note that the wisdom file is an autonomous output file of the FFTW library and is not documented further here. Users can find more information in the FFTW documentation.
RFMODE
When using the reaction-field method (see LREL_MD), this keyword determines whether the corrections include a continuum electrolyte assumption (generalized reaction field) or not:
 1. The generalized reaction-field correction is used. The code determines the concentration of net charges (including those that are part of macromolecules) and derives an effective ionic strength. This bulk electrolyte concentration is used to model the dielectric response outside of the cutoff sphere for an individual charge in a Poisson-Boltzmann sense.
 2. The standard reaction-field correction is used. Irrespective of the existence of free net charges in the system, the dielectric response is simply an approximate solution to the Poisson equation.
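A minimal sketch of the standard reaction-field form (textbook Barker-Watts-style expressions; CAMPARI's exact constants, units, and the generalized-electrolyte variant may differ) shows how the correction makes both the pair energy and the force vanish at the cutoff for a large external dielectric:

```python
def rf_energy_force(q1q2, r, r_cut, eps_rf):
    """Reaction-field-corrected Coulomb pair term in reduced units
    (standard textbook form; a sketch, not CAMPARI's implementation).
    The quadratic correction term lets the force on a charge vanish
    at the cutoff as eps_rf grows large."""
    k_rf = (eps_rf - 1.0) / ((2.0 * eps_rf + 1.0) * r_cut ** 3)
    c_rf = 3.0 * eps_rf / ((2.0 * eps_rf + 1.0) * r_cut)
    energy = q1q2 * (1.0 / r + k_rf * r * r - c_rf)
    force = q1q2 * (1.0 / r ** 2 - 2.0 * k_rf * r)  # magnitude along r
    return energy, force
```

At r = r_cut the energy is exactly zero and the force is 3·q1q2/((2·eps_rf+1)·r_cut²), which tends to zero for an infinite external dielectric; for finite eps_rf a residual force discontinuity remains, as noted under LREL_MD.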
Cutoff Settings:
(back to top)
CUTOFFMODE
If nonbonded interactions dependent on interatomic distances are in use (IPP, ATTLJ, WCA, IMPSOLV, TABUL, and POLAR), it is often necessary to truncate these interactions. Historically, there have been a large number of implementations to achieve this, both in how to effectively determine the interactions to compute and how to deal with the force discontinuity upon truncation. CAMPARI does not implement empirical switching functions. The WCA and IMPSOLV potentials have exact cutoffs by virtue of their functional forms. IPP, ATTLJ, and TABUL can only be truncated. Long-range electrostatics options are supported (LREL_MC and LREL_MD). Keyword CUTOFFMODE controls whether to apply truncation at all and how to search for nearby interaction partners. This is (currently) always done using residue-based, buffered neighbor lists. The neighbor lists are in general not postprocessed to achieve exact truncation at the chosen distance, which means that the effective range of interactions is larger (dependent on the buffer sizes, which are computed from residue radii). All cutoff implementations are responsive to the shared memory (OpenMP) parallelization, although some particular combinations of samplers, cutoffs, and Hamiltonians may not be supported. In general, the parallel efficiency of the neighbor-list calculations is acceptable. The major performance-limiting factor is that relatively complex and large data structures are used in relation to a relatively low number of floating point operations, which can expose weaknesses in cache management.
The following modes are available:
 1. If, for whatever reason, cutoffs are undesirable, the code will assume that all residues are spatial neighbors and compute all interactions at every step. Note that not all combinations of samplers and Hamiltonians may support this option since optimized loops relying on neighbor lists are often employed (and/or the method may rely in its formulation on a cutoff). Limited support (or performance) may also exist if the shared memory (OpenMP) parallelization is in use. This choice implies that most other keywords in this section become meaningless (e.g., NBCUTOFF, LREL_MD, etc.).
 2. This option is obsolete.
 3. This option instructs CAMPARI to employ grid-based cutoffs. The grid association is governed at the residue level by the position of the residues' reference atoms. All grid-based methods (with a uniform mesh) are difficult/inefficient for systems with very asymmetric density (such as a single very long extended chain in a large periodic box) since those systems would either require grids that are too large (inefficient and memory-consuming) or grids so coarse that no efficient prescreening can occur. Grid-based cutoffs are a good choice for systems with homogeneous density and many small (few atoms) residues. They are absolutely indispensable for simulations of large explicit water systems, as any other cutoff mode supported by CAMPARI will critically slow down simulations in such scenarios. Like all cutoffs in CAMPARI, the grid is used with a buffer size dependent on residue radii (in addition to the actual interaction cutoffs, viz., NBCUTOFF and ELCUTOFF). The parameters of the grid are controlled using a number of keywords: GRIDDIM, GRIDMAXGPNB, and GRIDMAXRSNB. If the shared memory (OpenMP) parallelization is used, the performance of grid-based cutoffs is hampered if residues are reassigned frequently, which happens in many Monte Carlo calculations (all trial moves matter, not just the accepted ones). This is because the global copy of the grid association must be kept in sync.
 4. The last available option instructs CAMPARI to employ topology-assisted cutoffs. Here, interatomic distances are simply prescreened by a master value for the two reference atoms of residue pairs. This takes advantage of molecular topology to simplify the generation of spatial neighbor lists since only residues that pass the prescreen are assumed to be spatial neighbors. Note that the program will compare the distance between the two reference atoms to the sum of the cutoff and the effective radii of the two residues in question. These radii are currently hard-coded. This mode is the method of choice for systems with heterogeneous density and/or large (many atoms) but relatively few (<1000) residues. Note that in the presence of nonbonded interactions, mode 3 reduces the scaling of CPU time with system size from N^{2} to something considerably faster (where N is the number of atoms). Mode 4 does not change the scaling behavior but reduces the constant factor for this cost dramatically (by experience, for a water box of ~1000 molecules, mode 4 is still slightly faster). This option should generally perform well in conjunction with the shared memory (OpenMP) parallelization of CAMPARI.
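The residue-level prescreen used by the topology-assisted mode can be sketched as follows; the function name and the explicit radius arguments are hypothetical, since CAMPARI hard-codes the effective residue radii:

```python
def residues_are_neighbors(ref_xyz_i, ref_xyz_j, radius_i, radius_j, cutoff):
    """Topology-assisted prescreen sketch: two residues count as spatial
    neighbors if their reference atoms are closer than the cutoff plus
    both effective residue radii (the buffer that makes the true
    interaction range larger than the nominal cutoff)."""
    d2 = sum((a - b) ** 2 for a, b in zip(ref_xyz_i, ref_xyz_j))
    thresh = cutoff + radius_i + radius_j
    return d2 <= thresh * thresh   # squared comparison avoids a sqrt
```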
NBCUTOFF
If cutoffs of nonbonded interactions have been requested, this keyword sets the interaction range for a part of the nonbonded interactions. It is interpreted differently dependent on the type of calculation:
 For Monte Carlo calculations (see DYNAMICS), it simply sets the nonbonded (IPP, ATTLJ, WCA, and IMPSOLV) cutoff in Å. Neighbor lists are populated based on this value, and exact truncation is performed unless keyword MCCUTMODE is set differently from the default. All the potentials governed by NBCUTOFF should conceptually be short-range in nature.
 For gradient-based calculations, it defines the short-range regime, within which all interactions and forces are computed at every time step. It never truncates said interactions at a distance of NBCUTOFF Å (i.e., there is no equivalent of keyword MCCUTMODE for gradient-based calculations). These types of cutoffs are residue-based and use buffered neighbor lists, which means that the true interaction range is larger than the value specified (unless the simulation involves only monoatomic molecules).
ELCUTOFF
If cutoffs of nonbonded interactions have been requested, this keyword sets the interaction range for the remainder of the nonbonded interactions (not covered by keyword NBCUTOFF). It is interpreted differently dependent on the type of calculation:
 For MC calculations (see DYNAMICS), it simply sets the second nonbonded (TABUL and POLAR) cutoff in Å. All the potentials governed by ELCUTOFF are potentially long-range in nature. Note that interactions beyond this second cutoff that are Coulomb terms involving moieties flagged as carrying a net charge are potentially still calculated (see LREL_MC).
 For gradient-based calculations, it defines the mid-range regime, within which all interactions and forces are computed accurately, but only every n^{th} time step, i.e., at a lower frequency which is set by the neighbor list update frequency (see NBL_UP). It truncates said interactions at a distance of ELCUTOFF unless they involve long-range electrostatic corrections (in particular, Coulomb terms involving moieties flagged as carrying a net charge). The twin-range terms (forces and energies stemming from particle pairs with distances between NBCUTOFF and ELCUTOFF Å) are assumed to be approximately constant for the number of steps between neighbor list updates. Twin-range cutoffs are explicitly disallowed for the Ewald and reaction-field methods. If CAMPARI computes additional interactions, i.e., if LREL_MD is either 4 or 5, these interactions are subjected to the same assumption for forces and energies (particle pairs with distances beyond ELCUTOFF Å).
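The twin-range classification described above can be sketched as a simple decision (ignoring the residue-radius buffers that CAMPARI adds on top of the nominal cutoffs):

```python
def classify_pair(distance, nb_cutoff, el_cutoff):
    """Assign a residue pair to the twin-range regimes used in
    gradient-based runs (a sketch; assumes nb_cutoff <= el_cutoff)."""
    if distance <= nb_cutoff:
        return "short"   # recomputed at every time step
    if distance <= el_cutoff:
        return "mid"     # recomputed only every NBL_UP-th step
    return "long"        # truncated, or handled by LREL_MD options 2-5
```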
MCCUTMODE
When using nonbonded interaction potentials in conjunction with cutoffs, Monte Carlo calculations typically truncate short-range interactions (IPP, ATTLJ, WCA, and (usually not an issue) DMFI) exactly at the cutoff distance. Conversely, in gradient-based calculations, the cutoff on short-range terms is always used exclusively to populate the corresponding neighbor lists. These two settings are not identical except for pairs of single-atom residues. Keyword MCCUTMODE can be used to switch from the default (mode 1) to the same residue-level exclusion approach as in dynamics calculations (mode 2). This is essential for achieving exactly the same Hamiltonian in hybrid MC/MD calculations or for gradient testing.
NBL_UP
This keyword provides the update frequency for neighbor lists in gradient-based calculations. Every NBL_UP^{th} step, it is recalculated which residues are within a distance of NBCUTOFF Å (short-range) and which ones are within a distance of ELCUTOFF Å (mid-range). Interactions with the former are computed explicitly at every time step, and those with the latter are computed explicitly only every NBL_UP^{th} step. For interactions outside of either cutoff, truncation occurs unless the electrostatic model chosen provides a long-range term (see LREL_MD). These latter interactions will then be recomputed at the same frequency as the mid-range ones (with the exception of the reciprocal-space sum in Ewald methods, which is always computed at every step). Note that this keyword is irrelevant if CUTOFFMODE is set to 1, a setting useful only for debugging purposes. The assumptions made by this keyword are rather aggressive, and it is therefore recommended to use it with caution. Specifically, the neighbor lists here should not be thought of as "buffered" in any way. The integrator noise accumulated by setting this to something large can be quite substantial and should probably be offset by a large choice for the outer cutoff distance (→ ELCUTOFF). Conversely, the use of residue-level neighbor lists with large effective radii tends to bloat the effective cutoff radius, which creates something akin to an effective buffer zone. This implementation may be changed in the future.
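The bookkeeping implied by NBL_UP can be sketched as follows (a schematic loop with hypothetical energy callbacks, not CAMPARI's actual integrator):

```python
def twin_range_loop(n_steps, nbl_up, short_term, mid_term):
    """Sketch of twin-range bookkeeping: the short-range contribution is
    fresh at every step, while the mid-range contribution is recomputed
    only every nbl_up-th step and held constant in between."""
    mid_cache = 0.0
    totals = []
    n_mid_evals = 0
    for step in range(n_steps):
        if step % nbl_up == 0:        # neighbor-list update step
            mid_cache = mid_term(step)
            n_mid_evals += 1
        totals.append(short_term(step) + mid_cache)
    return totals, n_mid_evals
```

Holding `mid_cache` constant between updates is exactly the similarity assumption criticized above: the larger NBL_UP is, the staler the cached contribution becomes.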
LREL_MC
This keyword determines CAMPARI's method of handling long-range electrostatic interactions in MC calculations. There are currently several options for this, with more being added in the future. A general problem is hidden in the fact that MC calculations have to be able to compute relative energies of drastically different configurations at every step, such that similarity assumptions cannot be used to speed up the calculations as is the case in MD/LD/BD.
 1. All monopole-dipole and monopole-monopole interactions are computed explicitly (at full atomic resolution). By default, the governing factor is the parser for the partial charge sets, which determines the individual charge groups (see option 2 for ELECMODEL and output files DIPOLE_GROUPS.vmd and MONOPOLES.vmd). Those with a total charge exceeding a threshold (usually zero) are considered "net charges", and those without are considered "dipoles". The flagging is at the residue level and can be overwritten by a dedicated patch facility. Interactions between dipole groups are skipped even if one or both of the participating residues are flagged. For large systems, the number of interactions can of course grow dramatically. Using this option also requires allocation of a potentially large matrix if grid-based cutoffs are in use, which can hamper parallel performance.
 2. All monopole-monopole interactions are computed explicitly (at full atomic resolution). As in the option above, the flagging is at the residue level, and here both residues are required to be flagged. Dipole-dipole and dipole-monopole interactions are skipped even if both of the participating residues are flagged. For plasmas, ionic liquids, or concentrated ionic solutions, the number of interactions can of course become prohibitively large. This option also requires allocation of a potentially large matrix if grid-based cutoffs are in use, which can hamper parallel performance.
 3. This is identical to the previous option except that monopole-monopole terms are computed at a reduced resolution, viz., polyatomic monopole groups are represented by collapsing the total charge onto the single atom nearest to the true monopole center. This choice is currently the default. The same caveats as for option 2 apply.
 4. No additional interactions are computed (rigorous truncation).
Note that periodic boundary conditions are inconsistent with any of the above treatments except truncation. This is because in PBC the largest effective cutoff value for nonbonded interactions must not exceed half of the smallest linear dimension of the box. In case of a hybrid sampler, the values for LREL_MC and LREL_MD should be matched to achieve a consistent Hamiltonian. Compatible values are 1/5, 3/4, and 4/1 (LREL_MC/LREL_MD).
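The documented pairings for hybrid samplers can be captured in a small lookup (illustrative only; the names are hypothetical):

```python
# Consistent LREL_MC -> LREL_MD pairings for hybrid MC/MD Hamiltonians,
# as listed in the documentation above.
COMPATIBLE_LREL = {1: 5, 3: 4, 4: 1}

def hybrid_settings_consistent(lrel_mc, lrel_md):
    """True if the pairing is one of the documented consistent combinations."""
    return COMPATIBLE_LREL.get(lrel_mc) == lrel_md
```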
LREL_MD
Much like LREL_MC, this keyword controls how CAMPARI handles long-range electrostatic interactions in gradient-based calculations. There are currently several options for this, which are generally different from those available for Monte Carlo runs since two core assumptions hold for dynamics calculations: i) only global energy/force evaluations are needed; and ii) the system remains self-similar through several integration steps. The options are as follows:
 1. No additional interactions are computed, i.e., everything beyond the mid-range cutoff is discarded. This setting can be used along with LREL_MC set to 4 and ELCUTOFF being equal to NBCUTOFF to create an exact match between dynamics and MC Hamiltonians, which may be relevant for hybrid calculations (→ DYNAMICS).
 2. Ewald summation is used, which relies on periodic boundary conditions and (currently) cubic boxes (→ BOUNDARY and SHAPE). This technique relies on the decomposition of an infinite sum over all periodic images into two quickly convergent contributions, a real-space and a reciprocal-space part. The real-space part involves a modified Coulomb interaction, which therefore requires separate loops. Hence, support for Ewald sums is currently limited to "gas-phase"-type calculations with nonbonded interactions corresponding to Lennard-Jones and polar interactions only. Even though possible in theory, there is currently no support for the ghosting of interactions, which is used in the context of free energy calculations. The reciprocal-space part can be solved in a number of different ways (see EWALD and associated keywords). Note that the two cutoffs are collapsed into the shorter one (there is no mid-range regime) when using Ewald techniques. Both the real-space and the reciprocal sums are recomputed at every step. Ewald summation replaces the standard Coulomb term and is relevant for all polar interactions even in the absence of full charges. It always requires the error function, and a tabulated approximation exists in case the built-in variant is too slow (see installation instructions → DISABLE_ERFTAB and keyword EWERFTOL).
 3. The (generalized) reaction-field correction is used. The mode is picked with keyword RFMODE. This involves a modified Coulomb sum and relies on the assumption that truncation can be dealt with by assuming that a low-dielectric cutoff sphere is embedded in a high-dielectric medium, which gives rise to a reaction-field correction that lets the force on a charge vanish at the cutoff distance if the difference in dielectric constants is large. The high dielectric is set with keyword IMPDIEL, and the size of the cutoff sphere is given by ELCUTOFF. This method requires modified Coulomb interactions, and support for the type of nonbonded interactions is limited similarly to Ewald sums, except that the ghosting of interactions is supported for net-neutral solutes. Note that reaction-field corrections assume dielectric homogeneity, i.e., the underlying theory breaks down if the effective dielectric inside or outside the cutoff sphere becomes inhomogeneous. The latter is always the case if, for example, a large enough macromolecule is present or if the system is nonperiodic. Note that algorithmically this is not a long-range correction and that (G)RF-corrected terms are computed with the same frequency as short- and mid-range terms (see NBCUTOFF and ELCUTOFF). Due to stability issues, twin-range cutoffs are not allowed for reaction-field methods. Even then, the force discontinuity at the cutoff distance (it vanishes only if the dielectric is assumed to be infinite) may cause more noise than a simple truncation scheme (option 1). The reaction-field solution replaces the standard Coulomb term, i.e., it is relevant for all polar interactions even in the absence of full charges.
 4. The same option as 3) in LREL_MC. The same rules and caveats apply. By matching the methods this way and setting the two cutoff criteria equal to one another, this allows a consistent choice of Hamiltonian in hybrid runs (→ DYNAMICS). This option is currently the default choice.
 5. The same option as 1) in LREL_MC. The same rules and caveats apply. By matching the methods this way and setting the two cutoff criteria equal to one another, this allows a consistent choice of Hamiltonian in hybrid runs (→ DYNAMICS).
GRIDDIM
If grid-based cutoffs are in use (→ CUTOFFMODE), this keyword allows the user to specify the three integers determining the x,y,z dimensions of the rectangular cutoff grid. The origin and the size of the grid are determined by the box parameters (see BOUNDARY and SHAPE). In a droplet boundary condition, the grid cannot be aligned with the simulation container exactly, and parts of it are wasteful. The extra buffer space is computed automatically, and this may lead to crashes in which CAMPARI complains that a part of the system is "off the grid". This most often occurs with an unstable (exploding) simulation but can also happen if a residue-based boundary condition is used in conjunction with bulky residues or if the restraining force is very small. The total number of grid points should not be so large that operations scaling linearly with this number become a significant computational cost. Setting the size of the grid cells equal to the cutoff is typically not an effective strategy due to the requirement of having large margins. The latter are a result of the residue-based grid association CAMPARI uses, which requires accounting for the effective residue radii in determining spatial neighbor relationships via the grid.
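Two hypothetical helpers illustrate the trade-off: the cell edge lengths follow from the box and GRIDDIM, and the number of cells that must be scanned around a given cell grows with the cutoff plus the residue-radius margins (names and the exact margin convention are assumptions for illustration):

```python
import math

def grid_cell_sizes(box_lengths, grid_dims):
    """Cell edge lengths (in Angstrom) of the rectangular cutoff grid
    for given box side lengths and GRIDDIM integers."""
    return [box / n for box, n in zip(box_lengths, grid_dims)]

def search_radius_in_cells(cutoff, max_res_radius, cell_size):
    """How many cells to either side must be scanned so that the cutoff
    plus a two-residue-radius buffer is fully covered."""
    return math.ceil((cutoff + 2.0 * max_res_radius) / cell_size)
```

Finer grids (larger GRIDDIM) shrink the cells but raise the per-step cost of operations that scale with the total number of grid points, which is the balance the paragraph above describes.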
GRIDMAXRSNB
If grid-based cutoffs are in use (→ CUTOFFMODE), this keyword allows the user to specify an initial limit for the maximum number of residues associated with a single grid point. Arrays are dynamically resized during the simulation, but if the initial setup fails already, an error is returned (see also GRIDMAXGPNB). This keyword is required mostly so that CAMPARI has a realistic estimate of the required memory at the beginning.
GRIDMAXGPNB
If grid-based cutoffs are in use (→ CUTOFFMODE), static grid-point neighbor lists are set up initially and used to simplify the generation of neighbor lists using the grid. This keyword specifies the maximum number of grid-point neighbors each grid point may possess. If the number is too small, the program will fail during the initial setup. This is again to avoid inadvertent memory emergencies (as for GRIDMAXRSNB). It can be annoying to find an acceptable value for this keyword, as the distance range depends on the system and the grid. For a big system, it may be advisable to use a temporary sequence file with just the largest residue present to speed up the remainder of the initial setup. Once a proper value has been found for GRIDMAXGPNB, the real sequence can be restored, and GRIDMAXRSNB can be calculated relatively easily.
GRIDREPORT
If grid-based cutoffs are in use (→ CUTOFFMODE), this simple logical instructs CAMPARI to write out a summary of the initial grid occupation statistics.
CHECKFREQ
This keyword is interpreted differently dependent on the type of calculation. In pure gradient-based simulations, CHECKFREQ simply sets the interval for how often to report global ensemble variables to log output. This can be useful to track simulation progress and to make sure that no unexpected behavior (instability) occurs. This output overlaps with output file ENSEMBLE.dat. There is no significant cost incurred by this reporting, as the relevant numbers have been computed anyway. In trajectory analysis runs, CHECKFREQ is ignored. For monitoring the progress of processing large input data sets, keyword FLUSHTIME can be used instead. CHECKFREQ takes on a more important role in Monte Carlo calculations or the MC stretches of hybrid sampling runs. Here, it specifies the interval (in elementary steps) at which to recompute the total energy globally. This number is compared to the incremental energy obtained from the energy updates for individual MC moves (which do not compute the global nonbonded energy). The global value supersedes the incremental one (i.e., it is a reset). The numerical drift error from the incremental calculations is usually very small. The reference energy can be chosen to be either the same as what propagates the Markov chain (affected by keyword CUTOFFMODE and all associated choices) or the N^{2} sum assuming a lack of cutoffs. This is controlled by keyword N2LOOP. The choice of reference energy has no implications for the Markov chain but can (and usually does) affect absolute energy values. This may be relevant for certain free energy calculations, for comparisons of simulation results obtained with different cutoff lengths, etc. Whenever absolute energies need to be comparable, it is best that N2LOOP is set to zero. If it is not zero and the cutoff-assisted and N^{2} energies differ, the cutoff-sensitive values reported to output file ENERGY.dat will begin to deviate within each interval of CHECKFREQ steps.
In this case, consistent output to ENERGY.dat is achievable only if ENOUT is a multiple of CHECKFREQ. The drifting inconsistency in each interval is precisely the original motivation for this output, i.e., to understand the magnitude of cutoff effects and to be able to diagnose the correctness of incremental energy calculations. As an additional function, if cutoffs are turned on and N2LOOP has not been set to zero, a sanity check is performed as well, i.e., given the current structure, are the derived interactions in fact complete given the chosen maximum cutoff distance set by ELCUTOFF? If not, this would most likely mean that the parameters used for deriving the list of relevant interactions (specifically, the maximum residue radii) are inappropriate (this can happen for simulations of unsupported residues).
Because both the N^{2} energy evaluation and the cutoff check can be extremely slow for large systems, low frequencies are highly recommended for these cases especially if N2LOOP is not zero.
N2LOOP
This keyword is a simple logical which allows the user control over whether or not to compute the full N^{2} loop of nonbonded interactions (on by default) as a reference. In pure gradient-based simulations, this number is reported initially only for information purposes but serves no other function. Setting N2LOOP to zero disables this initial calculation, which can be very slow for large systems, in particular as it does not benefit from the OpenMP parallelization. In restarted calculations of this type, N2LOOP never comes into play. In trajectory analysis runs, N2LOOP has no effect even if only energies are calculated (DYNAMICS is 1). The primary use of N2LOOP is to choose the reference energy in Monte Carlo calculations (see CHECKFREQ). When turned on (default), MC simulations will continuously reset the total energy to the cutoff-free value. When it is turned off (zero), they will reset the total energy to the user-selected cutoff scheme (which can of course be the same → CUTOFFMODE). This happens at regular intervals of CHECKFREQ steps. If N2LOOP is set to zero, it will additionally suppress the sanity check procedure for cutoffs. Note that the Markov chain of MC calculations is unaffected by this keyword (it corresponds to a shift of the arbitrary zero point). In particular in hybrid samplers, N2LOOP should probably be 0 to avoid confusion. It is important to keep in mind that, whatever the context, the N^{2} sum of nonbonded energies may not be a useful reference state, especially in periodic boundary conditions.
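The interplay of incremental updates and periodic resets can be sketched as follows (hypothetical helper; `full_energy` stands for either the N^{2} sum or the cutoff-based sum, depending on N2LOOP):

```python
def mc_energy_with_resets(increments, full_energy, checkfreq):
    """Sketch of the CHECKFREQ bookkeeping: the total energy is propagated
    incrementally from per-move energy differences and periodically reset
    to a freshly computed global reference, which supersedes it."""
    running = full_energy(0)
    trace = []
    for step, delta_e in enumerate(increments, start=1):
        running += delta_e                 # incremental per-move update
        if step % checkfreq == 0:
            running = full_energy(step)    # global recomputation (reset)
        trace.append(running)
    return trace
```

If `full_energy` is the cutoff-free sum while the increments obey the cutoff scheme, the trace drifts away from the reference within each interval and snaps back at every reset, which is exactly the behavior described for ENERGY.dat above.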
USESCREEN
This logical keyword applies to all Monte Carlo elementary moves (except particle deletion moves). The normal sequence of events in CAMPARI is:
 1. Perturb configuration.
 2. Compute short-range terms for moving parts for the new conformation.
 3. Compute corresponding long-range terms.
 4. Restore original conformation.
 5. Compute short-range terms for moving parts for the original conformation.
 6. Compute corresponding long-range terms.
 7. Evaluate Metropolis criterion.
 8. Process acceptance or rejection.
From the above, it is clear that at step 2 we do not yet have access to a difference in energies (which is only available after step 5). Consequently, the screen simply compares the net value of the short-range energy terms (→ SC_IPP, SC_ATTLJ, SC_WCA, boundary interactions, SC_BONDED_B, SC_BONDED_A, SC_BONDED_I, SC_BONDED_T, SC_EXTRA) and certain bias terms (→ SC_ZSEC, SC_POLY, SC_DSSP, SC_EMICRO, SC_DREST) against the threshold. With the exception of SC_ATTLJ, SC_WCA, SC_BONDED_T, and SC_BONDED_I, these are all strictly penalty terms that can only yield positive contributions to the total energy. Because of this, the screen is most useful if SC_IPP is used. Inverse power potentials diverge at small distances and can yield arbitrarily large values, which allow meaningful choices for the associated keyword BARRIER. If all aforementioned terms are either zero or negative, the screen will not have any effect. Harmonic potentials (as used in most of the bias terms) can also yield very large values, but the likelihood of this happening during simple MC moves is very small except for SC_DREST, SC_BONDED_B, and SC_BONDED_A (for the latter two terms, this only holds in the presence of soft crosslinks). Therefore, the difficult cases are those for which the penalty terms are generally high but do not necessarily vary quickly or strongly upon MC moves. It may then become impossible to use a simplification of this type, i.e., if the chosen screen height is too small, the Markov chain will be corrupted, and if it is made larger, the screen no longer has any effect. To buffer against incorrect use of the method, there is an additional criterion that the incremental energy must exceed twice the total system energy (for typical interaction potentials and an equilibrated system, the latter is often a negative number, and this condition becomes trivially fulfilled).
Note that this technique assumes that the Markov chain remains unperturbed even though the actual acceptance criterion is circumvented. Depending on the setting for BARRIER, this will often be rigorously true for a finite-length simulation. Because the same threshold is used for all types of moves, the efficacy of the screen is likely move type-dependent. Finally, simulations using the Wang-Landau acceptance criterion may not be able to use this technique (a warning is printed in any case).
BARRIER
This keyword is used in two different contexts. First, Monte Carlo moves can take advantage of a cutoff-like screen eliminating proposed conformations after only a partial evaluation of the relevant energy terms (this is enabled with USESCREEN). Then, BARRIER sets the energy threshold (screen height, cutoff value, barrier) in kcal/mol. Second, the value of BARRIER in kcal/mol is used as the hard-sphere penetration penalty in the hard-sphere excluded-volume implementation (enabled by setting IPPEXP to a sufficiently large value).
Parallel Settings: MPI (Multi-Copy) Parallelism (Replica Exchange (REX), PIGS, and MPI Averaging) and OpenMP Parallelism (Task Decomposition):
(back to top)
Preamble (this is not a keyword)
Most biomolecular simulation software packages allow a form of parallelization which one may refer to as domain decomposition. Here, the system is partitioned into a number of subsystems corresponding to the number of processor cores available to the parallel computation. Each core then (more or less) computes only interactions of its own subsystem. The main requirements for an efficient implementation are to keep the communication load as small as possible and the workload even. While for specific classes of systems (dense, truncated interactions, etc.) this method is undoubtedly superior, CAMPARI does not currently implement it. Instead, it offers a general-purpose shared memory parallelization relying on OpenMP. A shared memory parallelization has the advantage of replacing communication calls with conceptually simpler synchronization calls. The shared memory parallelization in CAMPARI is primarily a way to speed up simulation and analysis tasks within the confines of a single machine. Current (2016) compute nodes in supercomputers offer tens of CPU cores, and significant gains can be made for many practically relevant applications. While the OpenMP parallelization is the inner layer of parallelism of CAMPARI, there is also an outer layer that implements sparse communication algorithms such as replica exchange. Like most simulation software, CAMPARI uses the MPI standard for handling the communication requirements of this outer parallelization layer. The resultant hybrid OpenMP/MPI code is particularly well-suited to multi-copy (replica exchange, PIGS) simulations of medium-sized systems.
NRTHREADS
This keyword controls the number of threads that the workload of the calculation is distributed across. The actual value is respected only if the pure OpenMP-enabled version of CAMPARI is used (campari_threads). For the hybrid MPI/OpenMP executable (campari_mpi_threads), the number of threads cannot be set this way in CAMPARI (because it must be known during creation of the MPI universe). Instead, the environment variable OMP_NUM_THREADS should be defined accordingly. The threads parallelization as described next is the same in both cases, so the documentation here is relevant in both cases.
The OpenMP parallelization of CAMPARI is not a domain decomposition in the vein of almost all MPI-parallel decompositions found in molecular dynamics codes. CAMPARI is meant to cover a diverse set of simulations (including those using different samplers or those in implicit solvent). Spatial domain decompositions work well for systems with homogeneous density undergoing global and continuous evolution (which creates workloads that are all large enough and remain comparable across many simulation steps). Conversely, a single Monte Carlo move of a small molecule in a dilute solution of biopolymers and small molecules is not meaningfully addressable by spatial decomposition techniques. Task-based parallelizations are conceptually simpler but offer less scalability (communication/cache/memory issues). Here, the necessary calculations are simply divided at all cost-intensive stages across processes, which usually requires that all parallel processes “see” the entire system. This is suitable for a shared memory (OpenMP) decomposition (although cache and memory issues remain). As in any other parallel implementation, Amdahl's law holds. This is critical if the decomposable workload is very low to begin with, as in those cases any nonparallel task will become a bottleneck (performance tapers off). Additionally, synchronization costs increase with increasing number of threads.
Consequently, performance reversal will occur for systems with unfavorable size or properties. This is particularly tricky on modern many-core architectures where memory management (cache), task pinning, etc. are all nontrivial or sometimes impossible to analyze and control. The systems that parallelize best in general are those that have high floating-point operation (FLOP) counts and can take advantage of blocky memory layouts.
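The interplay of Amdahl's law with growing synchronization costs can be illustrated numerically. The per-thread overhead model below is a simplifying assumption for illustration, not a quantity measured in CAMPARI:

```python
def amdahl_speedup(parallel_fraction, n_threads, sync_overhead=0.0):
    """Amdahl's law with a crude linear synchronization penalty.

    parallel_fraction: fraction of the serial runtime that is decomposable.
    sync_overhead: assumed extra cost per additional thread, expressed as a
    fraction of the serial runtime (illustrative only).
    """
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial
                  + parallel_fraction / n_threads
                  + sync_overhead * (n_threads - 1))
```

With a 95% decomposable workload and a small per-thread overhead, the predicted speedup peaks at a moderate thread count and then falls again, which is the performance reversal described above.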
The main “modes” or tasks covered by the shared memory parallelization in CAMPARI are as follows:
 Global force and energy evaluations. This is the most cost-intensive step in any gradient-based technique implemented in CAMPARI. In general, forces and energies are grouped into terms even though this leads to partial redundancy. For the nonbonded terms, which in general have costs that are conformation-dependent (cutoffs), and for a specific group of bonded and other terms, loads are balanced dynamically. The limits are usually subsets of atoms, residues, molecules, or residue-residue interactions. Some bias terms have an inner parallelization that is independent of these types of limits, specifically the spatial density potential, the polymeric biasing potential, and custom distance restraints. This is in contrast to other bias and bonded terms, which are occasionally evaluated asynchronously by individual threads.
 Incremental energy evaluations. Incremental energy calculations account for the bulk of the cost of most Monte Carlo simulations and can differ dramatically in extent from step to step. This is why CAMPARI will compute an explicit distribution of the workload onto threads every time. No dynamic balancing occurs. The workload can easily be so low that the scaling limit is reached. To avoid adverse effects for larger numbers of threads, the number of synchronization operations per step is kept small. The fact that no forces need to be computed simplifies both the required synchronization operations and the required data structures considerably. The second point can lessen the negative impact of cache management on performance.
 Determination of neighbor/interaction lists. Irrespective of the sampler, any scalable simulation on a system of appreciable size featuring nonbonded interactions will require truncation or transformation of these interactions (→ CUTOFFMODE), which in turn necessitates algorithms to identify nearby species efficiently. These algorithms are all parallelized with generally good efficiency.
 Managing complex constraints. If holonomic constraints are in use, CAMPARI identifies all groups that can be solved independently. This may offer a trivial parallelization across threads. If one or few groups are expected to dominate the cost, CAMPARI evaluates whether these groups are large enough. If so, each of these groups is solved in parallel by all threads, which requires a number of synchronizations proportional to the number of actual SHAKE iterations (only SHAKE is supported for this). Note that with this and similar decompositions, it is currently not supported to use just a subset of the requested threads to define an optimal setting. Instead, either 1 or all threads have to work on such a constraint group. This means that this type of parallelization is sometimes inactivated upon increasing the number of threads further as it is no longer deemed efficient, which can be a limitation. In a similar vein, if Cartesian or internal coordinate space is used in conjunction with a gradient-based sampler, atom-based forces must be redistributed to these degrees of freedom, and their effective masses need to be computed. Both of these happen in recursive loops for complex molecules. CAMPARI will again detect whether the sizes of molecules are large enough to warrant "internal" parallelization. If so, each of the eligible molecules is solved in parallel by all threads, which requires synchronization operations proportional to the longest continuous branch in the molecule in question.
 Coordinate operations for large molecules. When sampling in Cartesian or internal coordinate space with methods that propagate many or all degrees of freedom simultaneously, it will be necessary to globally reconstruct the Cartesian coordinates of the system based on the altered values. This conversion (Z matrix to Cartesian coordinates) scales linearly with the number of atoms but requires a large number of trigonometric functions and has very high data dependency because it is strictly hierarchical. If required at every step, it can thus become a significant cost factor when the remainder of the rate-limiting computations are parallelized efficiently. Thus, for each molecule deemed large enough, CAMPARI automatically analyzes the hierarchy and creates a parallel procedure for solving this problem with a fixed number of threads that can be less than the total available number. As such, this is one of only two fully supported features that can require nested OpenMP parallelism. As a result, performance is harder to control (see comments below for FFTW). Conversely, other coordinate operations, such as the simple propagation in Cartesian dynamics or simple translations by a single vector, are straightforwardly parallelized.
 Threaded computation of fast Fourier transforms with FFTW. Unlike all other libraries linked by CAMPARI (e.g., NetCDF or HSL), the code looks for and explicitly uses the threaded implementation of the FFTW library. Because of the way the interface works, this involves creating a new team of threads from the parallel CAMPARI execution, i.e., it entails nested parallelism. Because the other threads of the original team are idle during FFTW calls (but not destroyed), performance is harder to control and predict, i.e., it depends on the way the kernel, compiler, and custom runtime environment end up distributing threads onto the available resources. This is why keyword THREADS_TEST also enables a performance test for threaded FFTW execution (if available and in use). This can be used to better understand the influence of environment variables like "OMP_WAIT_POLICY" or "OMP_PROC_BIND" on FFTW performance inside CAMPARI.
 A number of utility operations required at every step of a calculation. Especially in gradient-based calculations, there are a number of simple tasks to complete that depend on the atomic coordinates of the entire system. These include the actual coordinate propagation (whether Cartesian or internal), the correction of drift velocities, the calculation of total kinetic energies, or the management of polymeric descriptors and reference frames (center of mass, etc.). The overall cost of all of these is linear with the total number of atoms. Synchronization requirements occur if a molecular property like the center of mass needs to be computed by multiple threads (the necessity to do so is detected automatically), but they are generally low. In Monte Carlo calculations, quaternion-based parallelizations of pivot-type moves occur in many move types along with repeated copy operations on coordinate arrays. Most of these operations for any sampler have a low cost to begin with and are thus limited in scalability unless the systems get larger. They differ from the holonomic constraint handler or the coordinate operations in that they should simply plateau in parallel performance with increasing numbers of threads (rather than become slower again). This is because of the low (but nonzero) synchronization requirements.
 Simple analysis tasks. At every step of any run (exceptions caused by the use of FRAMESFILE aside), CAMPARI invokes a high-level routine that goes over all possible analysis tasks, evaluates whether they need to be performed at that step (depending on the output and calculation frequencies listed elsewhere), and executes the identified ones. The majority of tasks are not worthwhile to be handled by multiple threads at once, so CAMPARI uses a task parallelization based on a large "SECTIONS" construct. This means that it is beneficial for the parallel efficiency of these analyses if the calculation/output frequencies are matched with each other. Conversely, it is inefficient to enter the function with multiple threads with only a single simple task to be performed.
 Specific analysis tasks. Some analysis tasks, which are inherently expensive, have been parallelized specifically to be handled by all threads at once. This includes the calculation of spatial densities, overlap metrics in hybrid MPI/OpenMP multi-copy simulations (as they rely on global energy evaluations), and a number of tasks that are part of the structural clustering utility invoked during postprocessing. Specifically, the tree-based clustering, the approximate progress index, and iterative algorithms in graph processing (see CADDLINKMODE, CREWEIGHT, and CMSMCFEP) have been OpenMP-parallelized. Other tasks could, due to their unfavorable scaling with system size, benefit from this type of parallelization, and they may be implemented in the future, e.g., parallel contact, DSSP, or scattering analyses.
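The Z-matrix-to-Cartesian conversion mentioned under coordinate operations is strictly hierarchical: each atom is placed from three previously placed atoms plus its bond length, bond angle, and dihedral, which is the data dependency that makes the conversion hard to parallelize. A minimal single-atom placement step (a standard NeRF-style reconstruction sketch, not CAMPARI's actual routine) looks like:

```python
import numpy as np

def place_atom(a, b, c, bond, angle, dihedral):
    """Place atom d from reference atoms a, b, c and internal coordinates.

    bond: length c-d; angle: bond angle b-c-d (radians);
    dihedral: torsion a-b-c-d (radians). Illustrative sketch only.
    """
    bc = c - b
    bc /= np.linalg.norm(bc)
    n = np.cross(b - a, bc)          # normal of the a-b-c plane
    n /= np.linalg.norm(n)
    m = np.cross(n, bc)              # completes a right-handed frame
    d_local = np.array([-bond * np.cos(angle),
                        bond * np.sin(angle) * np.cos(dihedral),
                        bond * np.sin(angle) * np.sin(dihedral)])
    # express the local displacement in the lab frame and attach it to c
    return c + d_local[0] * bc + d_local[1] * m + d_local[2] * n
```

Each call consumes the positions produced by earlier calls, so a serial sweep over the molecule is the natural formulation; parallel variants must carve the hierarchy into branches that can be reconstructed independently.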
A few more comments are needed. First, the OpenMP parallelization does not extend to all parts of CAMPARI. In particular, the initial setup and final cleanup are completely serial with no thread awareness whatsoever. Some of the procedures performed there, for example, initial structure randomization, can be time-consuming, and it is a limitation that they cannot be accelerated. Second, it is always recommended to get a quick estimate of parallel efficiency by timing the code (keyword FLUSHTIME can be used to force frequent production rate estimates). Note that some calculations have inherently heterogeneous production rates, e.g., hybrid dynamics/Monte Carlo calculations. Third, task decomposition by threads generally changes the order in which individual summands are combined to yield net properties like the total force acting on an atom. Because floating point math is not associative, this leads to a generally lower level of exact reproducibility compared to multiple executions of 100% serial code. For large sums, a multithreaded calculation will generally be more precise. The loss in reproducibility is generally smaller than the loss of reproducibility across architectures and, especially, compilers (which often has fundamentally the same reasons).
THREADS_DLB_FREQ
If the shared memory (OpenMP) parallelization of CAMPARI is in use, this keyword determines after how many steps dynamic load balancing is periodically enabled. Dynamic load balancing conservatively shifts bounds on internal entities of representation (such as atoms, molecules, degrees of freedom) between adjacent threads to improve the balance of times spent per thread. It is important to realize that embedded synchronization requirements destroy the ability to meaningfully measure load balance, which means that dynamic balancing is only performed for synchronization-free subtasks (of which there are several, e.g., evaluation of nonbonded forces or neighbor list generation). It does so for at most THREADS_DLB_STOP elementary simulation steps after the start of each interval.
In detail, this means that every THREADS_DLB_FREQ steps a new data collection and balancing interval is started and continued for up to THREADS_DLB_STOP elementary steps. The information used for balancing can be preaveraged across multiple steps using keyword THREADS_DLB_EXT. For each block, if satisfactory balance is achieved, the balancing will stop until the next interval is encountered. For small systems in particular, some subtasks will not be able to achieve a satisfactory balance (insufficient granularity). If in these cases THREADS_DLB_STOP is equal to or larger than THREADS_DLB_FREQ, continuous load balancing is obtained. This is not recommended because the measurements themselves are not completely cost-free, and because a continuous adjustment of bounds is likely to yield inferior cache performance.
There are two notes of caution. First, the idea of dynamic load balancing implies that the load does not change dramatically from step to step. This may be violated in trajectory analysis runs where OpenMP is used to compute energies. Splitting the trajectories and running an MPI-parallel analysis is likely to be a superior strategy here. Second, dynamic load balancing expects threads to have the same "computing power" available to them at every step. This is not necessarily the case in oversubscribed systems (more threads than CPU cores) where system-induced waiting times occur. It can also happen in undersubscribed cases (fewer threads than CPU cores) on multi-CPU systems where threads are not pinned to specific cores. This is because the available cache differs depending on how many threads reside on a CPU (socket). These issues are (at least theoretically) controllable at the level of the operating system, for example using environment variables such as OMP_PROC_BIND. In practice, it is very hard to predict performance accurately, and some amount of trial and error (benchmarking) is usually needed. In particular, native hyperthreading for Intel chips should be tested as, possibly contrary to expectation, it has proven beneficial in many applications (most likely due to better cache use).
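The conservative bound-shifting idea behind dynamic load balancing can be sketched with a toy model. The data layout is a hypothetical simplification: bounds[i] is the exclusive end of thread i's chunk (of, say, atoms), and times[i] is that thread's measured execution time for the balanced subtask:

```python
def rebalance_bounds(bounds, times, shift=1):
    """Conservatively shift one work unit between adjacent threads.

    Only the internal boundaries move; the final bound (the total problem
    size) stays fixed. Illustrative sketch, not CAMPARI's implementation.
    """
    new = list(bounds)
    for i in range(len(times) - 1):
        if times[i] > times[i + 1]:
            new[i] -= shift      # thread i was slower: shrink its chunk
        elif times[i] < times[i + 1]:
            new[i] += shift      # neighbor was slower: grow thread i's chunk
    return new
```

Repeating small shifts like this across the steps of a balancing interval converges toward even per-thread times without ever making a drastic redistribution, which is why the text calls the approach conservative.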
THREADS_DLB_STOP
If the shared memory (OpenMP) parallelization of CAMPARI is in use, this keyword determines the maximum length of each periodic data collection interval for dynamic load balancing. The interval frequency is set by keyword THREADS_DLB_FREQ. Data collection and bounds adjustment are stopped as soon as the load imbalance is satisfactorily small or as soon as the number of elementary steps passed since the beginning of each interval is equivalent to THREADS_DLB_STOP. Note that it is generally recommended that the chosen value be small in relation to that chosen for THREADS_DLB_FREQ as repeated measurement cycles and bounds adjustments can themselves adversely affect performance.
THREADS_DLB_EXT
If the shared memory (OpenMP) parallelization of CAMPARI is in use, this keyword determines the number of elementary steps over which the execution times per thread are averaged in dynamic load balancing. Choosing a value different from 1, which is the default, can violate the requirement that the load has to be balanced in every step for performance to be optimal. However, for small systems and certain execution blocks, the measured times can be so small (and noisy) that averaging may be required to balance them effectively. This is the role of this keyword. Note that choosing larger values also limits the rate of convergence for load balance.
THREADS_VERBOSE
If the shared memory (OpenMP) parallelization of CAMPARI is in use, this keyword controls the level of diagnostic output written to a dedicated output file, viz., THREADS.log. Options are as follows:
 No output is provided.
 Only timing information (performance and expected time to finish) is written at intervals controlled by keyword FLUSHTIME. This is the default and recommended option for normal usage.
 In addition to all output produced by the previous options, CAMPARI periodically reports updated bounds resulting from dynamic load balancing in case a reasonable balance is achieved. This is available for different categories. No output is written if the balancing approach fails to find a satisfactory solution after the requested number of steps for the current interval. The bounds specify the chunk processed by each thread and can refer to different representation constructs (atoms, residues, dihedral angles, and so on).
 In addition to all output produced by the previous options, CAMPARI initially reports fixed bounds for operations with predictable cost. These bounds are used in various places, and the information will not be of much use for regular users.
 In addition to all output produced by the previous options, CAMPARI frequently reports load imbalance measures for any tasks that undergo dynamic load balancing. It is not recommended to use this option outside of specific debugging or optimization tasks as the amount of data written gets very large very quickly. Note also that the significant file I/O can interfere with external performance measurements. The output can highlight aspects of the calculation that fail to become balanced.
THREADS_TEST
This keyword is primarily for developer use. It instructs CAMPARI not to perform the actual simulation or analysis but to instead test a subset of relevant threaded execution routines relative to their serial counterparts. These tests use the actual system specified by the keyfile. The output of the tests is mostly self-explanatory, but understanding all of the reported deviations may require some insight into algorithm structure. If the particle-mesh Ewald method is in use, the correctness tests are followed by a scaling test for the linked FFTW library in threaded execution mode. This is a point of concern because the library does not allow an existing parallel region to access it. More details are given elsewhere.
REMC
This logical keyword, when set to 1, instructs CAMPARI to perform a calculation employing and evolving a number of copies of the system. Unlike for the mutually exclusive keyword MPIAVG, here it is allowed that each replica evolves under a different condition (e.g., temperature). This covers the replica exchange (RE) method for simulations and parallel analysis runs yielding as many results as there are trajectories. Like all multi-copy (or multi-replica) methods in CAMPARI, the communication between copies is handled by MPI, and it is therefore necessary to use an MPI-enabled executable. The shared memory (OpenMP) parallelization of CAMPARI can be used simultaneously as this inner parallelization layer does not deal with the exchange of information between copies. In hybrid MPI/OpenMP mode, the number of threads is no longer settable by NRTHREADS but has to be set via an environment variable (OMP_NUM_THREADS) at the system level instead.
For a simulation task, REMC activates the replica exchange method employing REPLICAS separate conditions (processes). The conditions differ in one or more parameters (→ REDIM), and there is a dedicated input file FMCSC_REFILE to specify them. Note that the order of conditions may matter (→ RENBMODE). Irrespective of whether the underlying propagator is pure Monte Carlo (see DYNAMICS), a dynamics-based method, or any hybrid method, restrictions apply in that the sampled ensemble must be the canonical (NVT) one (see ENSEMBLE). This can either be achieved by running constant particle number MC, Newtonian dynamics with a proper thermostat (see TSTAT), or stochastic (Langevin) dynamics (which inherently thermostats the system). In the RE method, structures (or conditions) are exchanged periodically between replicas using a well-defined acceptance criterion. This is controlled by keyword REFREQ and includes the case of disabling these exchanges altogether.
The exchanges are generally meant to improve sampling by allowing excursions into conditions or Hamiltonians in which (enthalpic) barriers are reduced. The evaluation of the acceptance probability implies that energies of current structures must be computed for different conditions. Independently of any exchanges, this functionality is useful in free energy calculations (perturbations) as the exponential average of the work required to change condition (energy difference) is directly related to the free energy difference between those conditions (→ REOLCALC). Parameters of the method are the exchange frequency (REFREQ), the scope for possible exchange partners (RENBMODE), the number of exchange attempts in a single exchange cycle (RESWAPS), and, for dynamics propagators, the way of dealing with velocities upon a successful exchange (RE_VELMODE).
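The acceptance criterion for swapping the structures held at two conditions can be sketched in its textbook form. The names are hypothetical; u_a_xb denotes the potential energy of structure x_b evaluated under condition a (for pure temperature RE, the cross terms reduce to the same potential at different inverse temperatures):

```python
import math
import random

def swap_accepted(beta_i, beta_j, u_i_xi, u_i_xj, u_j_xi, u_j_xj):
    """Generic Hamiltonian/temperature replica exchange swap criterion.

    Accept the swap of structures x_i and x_j between conditions i and j
    with probability min(1, exp(-delta)).
    """
    delta = (beta_i * (u_i_xj - u_i_xi)
             + beta_j * (u_j_xi - u_j_xj))
    return delta <= 0.0 or random.random() < math.exp(-delta)
```

Note that if the two conditions are identical, delta is exactly zero regardless of the structures, so the swap is always accepted; this matches the limiting behavior mentioned in the RESWAPS description below.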
In CAMPARI, each replica and its output will correspond to both instantaneous and averaged information from the associated condition, i.e., the underlying trajectory is no longer continuous in conformational terms. The typical assumption is that, depending on the settings for the parameters of the method and given a suitable arrangement of replicas in the RE input file, it can be achieved that the resultant ensemble averages and distributions are, for finite samples, indistinguishable within error from a correct reference simulation for the same condition that does not utilize exchange moves. This issue is not trivial, however, and the more general and precise approach to the analysis of replica exchange data is to reweight all samples to a given target condition that should either have been part of the original replica space or that can be obtained by interpolation. This reweighting is technically possible in CAMPARI (→ FRAMESFILE) for almost all analysis features in trajectory analysis mode, but the weights have to be determined externally (e.g., by the weighted histogram analysis method, WHAM).
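A minimal single-condition illustration of such a reweighting is simple Boltzmann reweighting of one replica's frames to a nearby target inverse temperature (the full multi-condition combination of all replicas is what WHAM handles; this sketch only shows the kind of per-frame floating-point weights a frames file would carry):

```python
import math

def reweighted_average(energies, observables, beta_sim, beta_target):
    """Reweight frames sampled at beta_sim to an estimate at beta_target.

    Frame weights are w_k proportional to exp(-(beta_target - beta_sim) * E_k),
    normalized over all frames. Illustrative sketch only.
    """
    db = beta_target - beta_sim
    logw = [-db * e for e in energies]
    shift = max(logw)                      # guard against overflow
    w = [math.exp(x - shift) for x in logw]
    z = sum(w)
    return sum(wi * oi for wi, oi in zip(w, observables)) / z
```

If the target condition equals the simulated one, all weights are equal and the plain average is recovered; the farther the target is from the simulated condition, the poorer the overlap and the noisier the estimate.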
If a RE run contains Monte Carlo moves and is combined with the Wang-Landau acceptance criterion, there is a necessary note of caution. Specifically, if WL_MODE is set to 1, this may result in identical copies of Wang-Landau runs if the exchanged parameters do not alter the Hamiltonian (since environmental conditions are irrelevant to the Wang-Landau sampler in such a case). In any case, the Wang-Landau iterations will proceed independently for each replica. This implies that it may yield results that are difficult to interpret if replica-exchange swap moves are allowed (because those currently always follow a Boltzmann criterion).
If an analysis run is performed, the meaning of keyword REMC changes compared to the case of simulation tasks described above. For analysis, REMC instructs CAMPARI to perform trajectory analysis in parallel while keeping the data from all replicas separate (for parallel analysis runs combining data, see keyword MPIAVG). Example applications are to speed up expensive analyses of large data sets, to compute free energy differences between ensembles, or to obtain data suitable for error estimates via block averaging. Keywords REFILE, REPLICAS, and REDIM are required. All other RE simulation-related keywords are ignored. For the RE setup, analysis keywords REOLCALC, REOLINST, and REOLALL are respected. As alluded to, this can be useful in postprocessing simulation data for free energy growth or related calculations requiring "foreign" energies. There is another complication with RE data, namely the question of how to evaluate a possible sampling benefit. Users should always keep in mind that a RE trajectory with swaps inherently averages over data from several coupled trajectories. A simple consequence of this is that data tend to look smoother and better converged if the number of replicas is increased. An assessment of the actual purpose of the method, i.e., increased barrier crossing rates by excursions into conditions amenable to barrier crossing, is more feasibly obtained by unscrambling trajectories, i.e., by looking at trajectories continuous in conformation (and not in condition). This is why CAMPARI allows the user to supply an input file with the swap history of a set of trajectories with the goal of transcribing the set of trajectories to a new set that are all continuous in conformation. The input file needs to be similar in format to the analogous output file created by CAMPARI during RE simulations. If this option is enabled, auxiliary keywords RE_TRAJSKIP and RE_TRAJOUT may become relevant.
Technically, parallel trajectory analysis requires that the REPLICAS individual trajectories are systematically named and numbered in a fashion similar to how CAMPARI writes trajectories in RE simulations. This means that every file is prefixed with "N_XXX_", where XXX gives the replica number (starting from "000"). Since there is only a single keyfile, the input trajectory name specified should not include this prefix (it will be added automatically). An example is given elsewhere. Frame-specific analyses (and thereby frame weights) are not yet supported in parallel trajectory analysis runs.
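The expected naming scheme can be generated as follows (an illustrative helper, not part of CAMPARI):

```python
def replica_filename(base, replica):
    """Prefix a trajectory name the way CAMPARI numbers RE output,
    i.e., N_XXX_<base> with a zero-padded replica index from 000."""
    return "N_%03d_%s" % (replica, base)
```

So for a keyfile that specifies the trajectory name "traj.xtc", replica 0 must be stored as "N_000_traj.xtc", replica 1 as "N_001_traj.xtc", and so on.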
REFREQ
For any multi-replica simulation that supports structure transfer between replicas, this keyword sets a fixed interval for attempting these structure transfers. It is an important parameter of both the replica exchange method and the PIGS method. Unlike the frequencies supplied to define Monte Carlo move sets described above, this parameter is a deterministic interval, i.e., a setting of 10^{4} will imply that possible exchanges are attempted exactly every 10^{4} elementary steps. This is because, in general, the communication requirement will mandate that all replicas remain synchronized regardless. For replica exchange, a swap cycle counts as a single (Monte Carlo) step in the trajectory. For PIGS, the reseeding does not count as a step. Instead, it is performed exactly between the steps corresponding to multiples of REFREQ and the respective next ones (see elsewhere for details). All structure exchange is implemented in peer-to-peer mode. For reasons of generality, however, the head node is always involved in decision making for structure exchange, which imposes an unfavorable (centralized) communication structure for some data (e.g., reassignment maps).
For replica exchange runs, structure transfer is realized as swaps between conditions. Viewed as a Monte Carlo move, such a swap attempt is defined in the context of a multicanonical ensemble. This means that any analysis should consider the entire set of simulation data and employ appropriate reweighting protocols to obtain canonical averages corresponding to the individual or even interpolated conditions. It is not immediately clear how justified it is to assume that the individual replicas in a replica exchange run can be analyzed as if they satisfied the canonical distribution for each condition individually. For a large fraction of published replica exchange simulations, swap attempts are restricted to the immediate neighbors along a one-dimensional temperature coordinate, and the data coming from replicas are treated independently. Keyword RENBMODE allows the user to choose between neighbor-only and global swap protocols. We emphasize again that CAMPARI does support the computation of reweighted averages and distributions by adding floating-point weights to a frames file.
It is difficult to provide guidelines for useful settings for this keyword. In replica exchange, very small values for this exchange attempt interval can lead to relaxation problems. With dynamics samplers, the treatment of velocities becomes an important consideration (see RE_VELMODE). There is a considerable body of literature on this subject (some of it is cited in this reference).
RESWAPS
If the replica exchange method is in use, this keyword specifies the number of swaps within a swap cycle. Each time a step is encountered that is a multiple of REFREQ, CAMPARI will collect the data from all replicas, construct the required energy matrix, and randomly pick pairs of eligible replicas (see RENBMODE) for which the swap move Boltzmann acceptance criterion is evaluated. This process is repeated RESWAPS times, and the map matrix (structure to condition) is updated after every successful swap. This means that it is possible for no pairs of replicas to effectively swap structures despite the presence of accepted moves. This stochastic implementation differs from that seen in other software and requires a careful choice for this keyword. For exchanges between all replicas (see RENBMODE), this should probably be at least N_{rep}·(N_{rep}-1)/2, where N_{rep} is the number of replicas in the simulation. For neighbor swaps only, it should be N_{rep}-1. The reason for choosing a number proportional to or larger than the number of unique possible exchanges is that the computational cost of computing the necessary cross-energies (in Hamiltonian replica exchange) and of communicating the information required for the aforementioned matrix is, in our implementation, independent of the final number of accepted swaps. This means that the cost of a swap cycle would be largely wasted by exchanging just a single pair chosen from a much larger number of replicas. For neighbor swaps, the set of possible swaps is limited because the required energy matrix is only a tridiagonal band matrix. This means that "secondary" swaps may be rejected due to lack of information rather than the Boltzmann criterion, which can introduce biases. Note that the acceptance rates become very small once there is hardly any overlap between different replicas (in turn, the acceptance is always strictly unity if the conditions are the same, regardless of the two structures).
A large number of attempted swaps in conjunction with all-against-all exchange corresponds to an equilibration of current structures across conditions. In the limit of tiny acceptance rates, the impact of the replica exchange method is no longer felt, and it reduces to a set of independent canonical simulations at different conditions (the same limit is achieved explicitly by setting REFREQ to be very large). Because of this, a reasonable swap acceptance rate is often taken as the primary diagnostic tool for the choice of conditions (see output file for swap probabilities).
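The stochastic swap cycle described above can be sketched in a few lines. This is a hypothetical illustration for one-dimensional temperature replica exchange (the function name and data layout are not taken from CAMPARI's source); it also shows why acceptance is always unity when the conditions are identical:

```python
import math
import random

def attempt_swaps(betas, energies, n_swaps, rng=random.Random(1)):
    """Sketch of one stochastic swap cycle (illustrative only):
    'betas[i]' and 'energies[i]' belong to condition i; a random pair is
    drawn and the standard Metropolis criterion for exchanging structures
    is evaluated n_swaps times. Returns the number of accepted swaps."""
    n = len(betas)
    accepted = 0
    for _ in range(n_swaps):
        i, j = rng.sample(range(n), 2)  # pick an eligible pair at random
        # acceptance: min(1, exp[(beta_i - beta_j)(E_i - E_j)])
        delta = (betas[i] - betas[j]) * (energies[i] - energies[j])
        if delta >= 0 or rng.random() < math.exp(delta):
            energies[i], energies[j] = energies[j], energies[i]
            accepted += 1
    return accepted
```

With identical conditions (equal betas), delta is always zero and every attempted swap is accepted, matching the statement above.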
REFILE
This keyword defines location and name of the file containing the specifications for the replica exchange method (see elsewhere for details).
RENBMODE
As alluded to above, the replica exchange method represents a rigorous sampling technique if one considers the multicanonical ensemble it defines. This can cause problems in the interpretation of data obtained for an individual condition. Moreover, the energetic overlap between distant conditions is often small, leading to negligible swap likelihood for all but the replicas most similar in condition. This is the typical scenario for temperature replica exchange calculations in explicit solvent. Here, it is very common to restrict swap attempts to the (at most) two neighboring replicas for a series of conditions. In Hamiltonian replica exchange, the same idea might actually be more useful as it also restricts the computation of the energy matrix to neighboring conditions. Recomputing energy values for many different conditions can be costly. Therefore, the available options are:
 Swaps are attempted with all available replicas.
 Only the (at most two) neighboring replicas are eligible for swap moves, and neighbor relationships are determined by the sequence of conditions as they appear in the input file (this is the default).
Note that almost all exchange-related problems naturally disappear in the limit of few attempted swaps (→ REFREQ) or in the limit of poor overlap and consequently few accepted swaps. This limit is very easily reached for large, condensed-phase systems with typical interaction potentials (fluctuations decrease with increasing size).
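As a sketch, the two eligibility rules can be expressed as follows (a hypothetical helper, not CAMPARI code; replicas are indexed from 0 in the order of the conditions in the input file):

```python
def eligible_partners(i, n_rep, neighbors_only):
    """Return the replicas that replica i may attempt to swap with
    under the two RENBMODE choices: all other replicas, or only the
    (at most two) neighbors in the sequence of conditions."""
    if neighbors_only:
        return [j for j in (i - 1, i + 1) if 0 <= j < n_rep]
    return [j for j in range(n_rep) if j != i]
```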
REPLICAS
This keyword sets the number of subprocesses intended to be created by a multi-copy simulation. For replica exchange calculations, this has to rigorously correspond to the number of processes granted by the system. A large enough number of different conditions in the corresponding input file (→ FMCSC_REFILE) has to be present. For MPI averaging calculations, which include PIGS runs and parallel Wang-Landau runs, this keyword will actually be overridden by the actual processor number granted by the system. Note that if the shared memory (OpenMP) parallelization of CAMPARI is also used (hybrid OpenMP/MPI), the management of access to hardware resources (compute cores) by both MPI processes and their spawned OpenMP threads is not trivial on modern many-core CPUs. For example, a machine with two sockets with a 12-core CPU each can host a calculation with 4 MPI processes using 6 OpenMP threads each in many different ways, including nonsensical mappings like running everything on a single core. While this mapping can often be controlled by the user at both levels, it should really be managed automatically by the operating environment whenever possible.
REDIM
If the replica exchange method is in use (→ REMC), this keyword sets the number of dimensions specifying the conditions to be expected in the dedicated input file (→ FMCSC_REFILE). Note that replica exchange calculations may rely on neighbor relations (see RENBMODE), and that those may be difficult to define if multiple dimensions are used to specify each condition.
REMC_DOXYZ
For any multi-replica simulation that uses pure MC sampling and supports structure transfer between replicas, this simple logical keyword lets the user choose to use Cartesian rather than torsional/rigid-body coordinates for the transfer. The keyword is ignored if the propagator is fully or partially reliant on a dynamics method. This keyword can be useful if internal degrees of freedom not sampled by MC diverge in any node-specific input files (for example, through rare scenarios when trying to restart an MC run from (modified) restart files produced by MD).
RE_VELMODE
This keyword selects how to deal with velocities for any multi-replica calculation allowing structural transfer between replicas. As such, it is relevant for replica-exchange molecular dynamics runs and for PIGS runs using a molecular dynamics propagator (see DYNAMICS). One of the complications of these types of calculations arises from the necessity to pass on or reassign velocities upon any successful structure change. The options for handling this difficulty are as follows:
 All velocities are always randomly reassigned upon receiving a new structure. This is equivalent to an instantaneous, global action of an Andersen-type thermostat (see TSTAT). It might be the safest option to use for pure Hamiltonian replica exchange, especially if the Andersen thermostat is used in conjunction with Newtonian dynamics. It is also the required option for PIGS runs with propagators lacking a stochastic component.
 Velocities are rescaled by a factor equivalent to (T_{i}/T_{j})^{1/2} where T_{i} is the temperature of the current node, and T_{j} the temperature of the node the received structure originated from. Note that this does not scale the instantaneous temperature to a specific value, but rather by a specific factor. Unlike the first option, it preserves directions and relative magnitudes of all velocities. This mode relaxes to the third option if temperature is not one of the replica exchange dimensions, or if the run is using the PIGS protocol instead of replica exchange.
 Velocities are taken directly from the node the incoming structure originated from, i.e., always remain associated with "their" structure. This will almost certainly lead to small artifacts for replica exchange calculations with temperature as one of its dimensions. It is the preferred choice for PIGS runs with stochastic propagators.
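The rescaling in the second mode can be illustrated as follows (a minimal sketch under the stated formula; the function name and the flat velocity list are illustrative only):

```python
def rescale_velocities(velocities, t_current, t_origin):
    """Scale all velocities by (T_i/T_j)^(1/2), where T_i is the
    temperature of the receiving node and T_j that of the node the
    structure originated from. Directions and relative magnitudes of
    all velocities are preserved; only the overall scale changes."""
    factor = (t_current / t_origin) ** 0.5
    return [v * factor for v in velocities]
```

If temperature is not an exchange dimension, T_i = T_j and the factor is 1, i.e., the mode reduces to taking velocities unchanged (the third option).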
RETRACE
This keyword is only relevant in MPI replica exchange or in MPI PIGS (or PIGS analysis) calculations with swaps or reseedings performed. It requests that an instantaneous integer trace is written that allows the user to recapitulate the complete history of structure transfer between replicas. For replica exchange, the trace indicates which condition is (after each swap or reseeding cycle) associated with which initial starting conformation (see N_000_REXTRACE.dat). For PIGS, the trace indicates every reseeding event in an even simpler form (see N_000_PIGSTRACE.dat). For replica exchange, these data can be used to reconstruct trajectories continuous in geometric variables rather than continuous in exchange condition (the latter being the CAMPARI default). This is useful to be able to estimate the sampling enhancement provided by replica exchange in terms of conformational decorrelation or similar metrics. For MPI PIGS calculations, the trace can be read in by CAMPARI to recover the structural connectivity of configurations in graph-related analyses.
MPIAVG
This logical keyword, when set to 1, instructs CAMPARI to perform a calculation employing and evolving a number of copies of the system. Unlike for the mutually exclusive keyword REMC, here it is assumed that each replica is evolving under exactly the same condition. Note that keyword REPLICAS is no longer used to set the number of copies (the value is taken directly from the MPI environment instead). Like all multi-copy (or multi-replica) methods in CAMPARI, the communication between copies is handled by MPI, and it is therefore necessary to use an MPI-enabled executable. The shared memory (OpenMP) parallelization of CAMPARI can be used simultaneously as this inner parallelization layer does not deal with the exchange of information between copies. In hybrid MPI/OpenMP mode, the number of threads is no longer settable by NRTHREADS but has to be set via an environment variable (OMP_NUM_THREADS) at the system level instead. Additional keywords can activate specific multi-copy methods that utilize a similar framework, viz., parallel Wang-Landau runs (via MC_ACCEPT) and the PIGS protocol (via MPI_PIGS). In detail, the possible tasks are:
MPI averaging
This is the mode achieved without any additional keywords. Here, the individual copies (replicas) are strictly independent (no communication requirement) until the very end when on-the-fly analysis data are automatically collected and processed by the head node (see documentation of output files for details). Some analysis functions or simulation algorithms may not be supported. This is primarily a mode to save time for the user since it essentially reproduces a common mode of running molecular simulations, i.e., running multiple trajectories in parallel and analyzing the results together. Starting conditions (see RANDOMIZE and PDBFILE) and stochasticity of the propagator are important here to avoid multiple replicas diverging only on account of numerical drift.
Parallel WangLandau runs
If the simulation is a pure Monte Carlo simulation, and if the Wang-Landau acceptance criterion is used, the behavior changes. Wang-Landau runs are essentially iterative, and in such a case keyword MPIAVG will create a parallel version of the Wang-Landau scheme. At an interval set by WL_FLATCHECK, the histograms are recombined over the individual nodes. The combined histogram is then what determines the move acceptance and what is used to evaluate whether to update the convergence parameter or not. The value of the convergence parameter and all other relevant settings remain synchronized throughout. In between update steps, the individual replicas evolve according to the last global histogram, incremented locally since. This means that the value chosen for WL_FLATCHECK is a delicate quantity since both too small and too large values may impede convergence. While the former may remove the bias for an individual replica to traverse phase space faster than a canonical simulation, the latter may result in several replicas exploring the same area of phase space, thereby amplifying a lack of global convergence. Note that the communication routines used in the parallel Wang-Landau implementation can be fine-tuned using keywords MPICOLLS and MPIGRANULESZ.
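The recombination step can be sketched as a simple sum of per-replica histograms (an illustration only; the actual implementation uses MPI communication and CAMPARI's internal histogram layout):

```python
def merge_histograms(local_hists):
    """Combine per-replica Wang-Landau histograms into one global
    histogram, as done at every WL_FLATCHECK interval. Each input is a
    mapping from bin index to visit count; the merged histogram then
    drives move acceptance on all nodes until the next recombination."""
    merged = {}
    for hist in local_hists:
        for bin_idx, count in hist.items():
            merged[bin_idx] = merged.get(bin_idx, 0) + count
    return merged
```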
PIGS runs
PIGS runs are explained in detail below. Here, CAMPARI collects information from all replicas over a specified interval. Rather than biasing the potential energy surface, this information is used to make decisions on whether to truncate some of the trajectories and to restart them from more interesting points corresponding to the current states of other replicas. PIGS stands for Progress Index-Guided Sampling and utilizes information from a method described elsewhere (please refer also to the published articles: progress index; PIGS).
Parallel analysis runs
It is possible to let CAMPARI analyze several trajectories in parallel using either the MPI replica exchange setup or the MPI averaging setup. This is enabled by setting MPI_PIGS to 0 (the default) and PDBANALYZE to 1. For an analysis run in the MPI averaging setup, the behavior is exactly as in an MPI averaging calculation, i.e., data are collected and analyzed by each MPI process and pooled at the end. The results from such a calculation should be the same as the result from a serial analysis of a single long trajectory obtained by concatenating the individual trajectories. Technically, parallel trajectory analysis requires that the REPLICAS individual trajectories are systematically named and numbered in a fashion similar to how CAMPARI writes trajectories in RE simulations. This means that every file is prefixed with "N_XXX_", where XXX gives the replica number (starting from "000"). Since there is only a single keyfile, the input trajectory name specified should not include this prefix (it will be added automatically). An example is given elsewhere. Frame-specific analyses (and thereby frame weights) are not yet supported in any parallel trajectory analysis runs of this type. For parallel analysis runs using the replica exchange setup, please see above.
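The naming scheme can be reproduced as follows (a hypothetical helper; only the "N_XXX_" prefix convention with zero-padded, zero-based replica numbers is taken from the text above):

```python
def replica_filename(basename, replica):
    """Build the expected per-replica file name: every file is prefixed
    with 'N_XXX_', where XXX is the replica number padded to three
    digits, starting from '000'."""
    return "N_%03d_%s" % (replica, basename)
```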
Parallel analysis runs emulating a PIGS stretch
This option is identical to the previous one with the single exception that structural clustering and related analyses are not available in their standard form. Instead, CAMPARI will emulate the behavior of the PIGS reseeding heuristic and print out a line of the PIGS trace file. This respects input for keywords MPI_GOODPIGS, RE_TRAJTOTAL, RE_TRAJOUT, and all the required keywords for computing the progress index (starting with CCOLLECT). For details, please consult the documentation on the PIGS simulation method. This mode is enabled by setting MPI_PIGS to 1 and PDBANALYZE to 1.
MPI_PIGS
If a multi-replica simulation is requested via keyword MPIAVG, this keyword allows the user to enable the PIGS enhanced sampling scheme. We refer the user to the literature for a detailed overview. Technically, PIGS utilizes parts of the infrastructure of both replica-based parallel simulation protocols (keywords MPIAVG and REMC as described above). Briefly, PIGS works as follows: Each of the REPLICAS processes running in parallel propagates a copy of the same system. After an interval of REFREQ steps has elapsed, the algorithm evaluates a heuristic that is used to selectively terminate some of the trajectories and to reseed them from the current states of other replicas. To avoid bit-identical trajectories, the propagator must have a stochastic component to it, e.g., Langevin dynamics, Monte Carlo, or Newtonian molecular dynamics with suitable thermostats. Unlike in replica exchange, the conditions associated with each replica are identical, and swaps would be redundant. The termination and reseeding of trajectories implies a loss of information and is justified only if the reseeding point ultimately leads to better sampling. The notion of "better" is not general. For PIGS, it consists of the desire to diversify individual replicas, e.g., to prevent sampling of overlapping regions of phase space. The truncation and selective reseeding of simulations is used in many methods such as distributed computing or transition path sampling.
To evaluate the heuristic, PIGS collects data from every trajectory over every interval of size REFREQ. To remain scalable, it is a memory-free algorithm, i.e., the slice of data determining the reseeding is always of the same size. From the composite data slice, the so-called progress index is constructed (see option 4 to CMODE). The size of the data slice is therefore set by the combination of keywords REFREQ, CCOLLECT, and REPLICAS (or the actual number of replicas available). Construction of the progress index requires as essential input only the definition of a representation and distance between snapshots, which is provided by CDISTANCE and possibly CFILE. Again, for scalability reasons, the approximate progress index is constructed, and this entails additional parameters of minor importance (see keyword CPROGINDMODE for details).
With the complete progress index in hand, it is possible to locate the current snapshots for all replicas in the index. This requires that REFREQ be a multiple of CCOLLECT. The progress index is an ordered sequence of snapshots that arranges geometrically similar snapshots close to one another without using a reference state. Every snapshot is associated with a specific distance that corresponds to the length of an edge in an underlying spanning tree. From this information, a composite rank of three individual ranks is constructed. The latter are:
 Position in the progress index (larger is better as snapshots from low likelihood regions are more likely to appear there).
 Length of the associated edge (larger is better as distances tend to be larger in low likelihood regions).
 Distance from any other current snapshot in terms of progress index position (larger distances are better as they indicate more unique sampling domains).
p(X → Y) = [ζ(X) − ζ(Y)] / (Δζ_{max} + 1)
Here, ζ(X) and ζ(Y) are the composite (summed) ranks for replicas X and Y, respectively. Δζ_{max} is the maximum realizable difference in composite rank. The result is compared to a uniformly distributed random number between 0 and 1. A reseeding is accepted putatively if this random number is smaller than the evaluated expression, which biases acceptance toward cases with large rank difference. It is only putatively accepted because every replica can be protected on account of a uniqueness criterion. This is evaluated by finding the first and third quartiles of the snapshots coming from the replica in question in terms of progress index position. If they are tightly clustered, the number is small, and it is inferred that this replica samples a relatively unique area of phase space. Conversely, if they are distributed, this indicates sampling overlap. The difference in the positions of the first and third quartiles is compared to the number REFREQ/CCOLLECT, which is twice the minimum value. If it is less than this number, any putative reseeding is rejected for the replica in question.
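Putting the acceptance expression and the uniqueness protection together, a sketch might look like this (the function names and the simple index-based quartile estimate are assumptions for illustration; only the formula and the REFREQ/CCOLLECT threshold follow the text above):

```python
def reseed_probability(zeta_x, zeta_y, dzeta_max):
    """p(X -> Y) = [zeta(X) - zeta(Y)] / (dzeta_max + 1); this value is
    compared to a uniform random number in [0, 1), so larger differences
    in composite rank make a putative reseeding more likely."""
    return (zeta_x - zeta_y) / (dzeta_max + 1.0)

def is_protected(index_positions, refreq, ccollect):
    """Uniqueness criterion: a replica whose snapshots cluster tightly in
    progress-index position (first-to-third-quartile spread below
    REFREQ/CCOLLECT) is inferred to sample a relatively unique region
    and is protected from any putative reseeding."""
    s = sorted(index_positions)
    q1 = s[len(s) // 4]            # crude first-quartile pick (assumed)
    q3 = s[(3 * len(s)) // 4]      # crude third-quartile pick (assumed)
    return (q3 - q1) < refreq // ccollect
```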
A positive reseeding decision triggers the same mechanism as accepting a new structure in replica exchange. Principally, all required settings and variables for the propagator are transferred. This excludes the seed of the random number generator, i.e., otherwise identical trajectories can diverge quickly on account of the different sequences of random numbers. This is how the stochastic component of the propagator, as mentioned above, is relevant. For molecular dynamics propagators, velocities can be kept or reassigned, i.e., both meaningful controls supplied by RE_VELMODE are supported. For pure MC propagators, keyword REMC_DOXYZ is also supported. The history of reseeding decisions can be recorded in a trace file. This file is similar to the same output file for the replica exchange method and can be obtained with keyword RETRACE. It is strongly recommended to always write this file for subsequent diagnosis and analysis.
With the exception of structural clustering, on-the-fly data analysis is supported by PIGS in the same way as it is by the default multi-replica (MPI averaging) setup. Data are gathered across replicas, combined, and total averages and distributions are provided. In general, however, PIGS leads to biased distributions, and it may therefore be more useful to adopt a standard protocol that stores trajectories individually for each replica (with MPIAVG_XYZ), disables most on-the-fly analyses, and performs all further analyses strictly in postprocessing (with PDBANALYZE).
Structural clustering uses the same infrastructure that PIGS requires to evaluate the heuristic, but the data are deleted after each interval of length REFREQ. Note that this implies that the memory requirement of the head node can be large if REFREQ/CCOLLECT and the number of replicas are both large. Scalability of the protocol with respect to the number of replicas requires parallelization of the progress index computation itself, and this is only implemented at the level of the master MPI process thus far. Conversely, the subordinate ranks do nothing but communicate their snapshot to the master instantly after each collection event. Once the data for the entire stretch have been received from all replicas, the shared memory (OpenMP) parallelization of the data mining algorithms comes into play (in a hybrid MPI/OpenMP run). The available OpenMP threads are only those of the master MPI process even if multiple MPI processes reside on the same shared-memory node. Because of these limitations, it is recommended to ensure through appropriate parameter settings that the cost added by the heuristic is kept manageable. Keep in mind that some aspects of the structural clustering facility are not available in the context of the PIGS heuristic. Obviously, CMODE is not selectable, nor are CPROGINDSTART or CPROGINDMODE controllable (they default to 4, 1, and 2, respectively). Keyword CPROGMSTFOLD has no effect (its use would be undesirable in the context of the first ranking criterion mentioned above). All keywords related to editing or utilizing the link structure of the network are irrelevant. Data preprocessing and the utilization of weights are both supported, but the application of linear transforms is not (yet). The use of weights can entail a number of associated parameters that reflect or describe a (time) locality in the sequence of snapshots, e.g., a lag time.
It is important to keep in mind that the PIGS algorithm simply concatenates the data from all replicas, which can lead to artificial periodicities or spikes in locally estimated fluctuations, which may or may not be desired. Lastly, the technical parameters controlling the tree-based clustering and the short spanning tree construction for the progress index are of course relevant in PIGS (→ CRADIUS, CMAXRAD, BIRCHHEIGHT, BIRCHMULTI, CREFINE, CLEADER, BIRCHCHUNKSZ, CMERGEDIAM, CPROGINDRMAX, CPROGRDEPTH, CPROGRDBTSZ).
MPI_GOODPIGS
This, along with REFREQ, CCOLLECT, and CDISTANCE, is one of the main parameters of the PIGS protocol (see link for details). It determines how many of the parallel replicas are protected from being reseeded and serve as the database for the remaining replicas. If MPI_GOODPIGS equals the actual number of replicas (normally set by REPLICAS), the PIGS algorithm relaxes to the propagation of independent, identical copies of the system (basic functionality of MPIAVG). There is no consensus rule for good choices for this parameter, but a reasonable starting point is usually given by setting it to REPLICAS/2.
MPIAVG_XYZ
If the MPI averaging technique is in use (→ MPIAVG), this simple logical keyword lets the user choose to obtain trajectory data for each of the independent, identical replicas separately (which is also the default). If this keyword is explicitly set to zero (logical false), only a single trajectory file will be written with entries cycling not only through the time or equivalent axis but also through replica space (see elsewhere for details). The choice here is mostly a matter of convenience for postprocessing, but note that with individual trajectories REPLICAS times as much structural data are written as with a single file. Lastly, note that very frequent write operations by different processes to a shared output file may occasionally cause race conditions and/or be inefficient due to long waiting times.
MPICOLLS
This keyword acts as a simple logical (turned off by default) that allows the user to enable the usage of collective communication routines defined by the MPI standard for selected communication operations in CAMPARI (routines such as MPI_ALLREDUCE, MPI_BCAST, etc.). These routines should at all times be functionally equivalent to what CAMPARI would use otherwise, i.e., collective primitives constructed exclusively from blocking send and receive operations (MPI_SEND and MPI_RECV). The reason for having such a keyword is twofold. First, buggy code in conjunction with these MPI-defined collective communication routines can be difficult to diagnose and debug, because the MPI standard requires an outcome, but not a specific implementation. Essentially, developers and users cannot make any assumptions about the underlying communication flow. In general, this is of course desired (especially from a performance point of view), since it leaves the optimization of said communication to the MPI library rather than forcing the calling program to address these issues. Second, there are enough reports on the web of potentially faulty implementations of these routines in common MPI libraries. In conjunction with additional concerns regarding thread safety, etc., it could prove advantageous to developers to have modifiable implementations in place.
MPIGRANULESZ
If custom CAMPARI routines for collective communications are in use (→ MPICOLLS), and if a calculation is performed that relies on such collective communication operations, this keyword lets the user alter the communication flow structure CAMPARI sets up to handle these cases. The keyword specifies a number of processes, amongst which communication is presumed fast (most often the number of CPU cores on a single board). The communication flow is then set up in a way that minimizes the required communication between such blocks of processes (they are generally assumed to be in sequence and to all be of identical size). This keyword is therefore unlikely to be useful for heterogeneous allocations (different numbers of cores granted on different machines or processes distributed nonsequentially). Between blocks, communication attempts to minimize latency (tree topology), whereas within blocks communication is (currently) strictly hierarchical and sequential with a single head process for each block. This means that (currently) setting MPIGRANULESZ to the number of processes granted by MPI will generate a global hierarchical flow with a single master, whereas setting it to 1 will generate a global tree-like flow.
Output and Analysis:
(back to top)
Preamble (this is not a keyword)
Unlike most other simulation software, CAMPARI offers the option to analyze certain quantities while the simulation is being performed ("on-the-fly"). This has the advantage that the frequency of dumping raw trajectory data to the disk does not have to control the frequency of analyses. This can save time and money by circumventing expensive write operations to disk. Of course, in a typical simulation setting, the user will still want to obtain trajectory data: for visualization, for not-yet-defined analyses, and so on. However, the built-in analyses can still prove beneficial by utilizing as much data as possible. This is generally controlled by several interval settings: analysis X should be performed or instantaneous data Y should be reported every N steps. Such keywords (see for example ANGCALC) are interpreted the same way unless otherwise noted. For example, if ANGCALC is 250 and NRSTEPS is 1000, the analysis would be performed at steps #250, 500, 750, and 1000. There is only one other keyword affecting this: the number of equilibration steps. If in the above example EQUIL is 400, the analysis would only be performed at steps #500, 750, and 1000 (i.e., the count is always relative to the 0^{th} step). Note that some analyses can be costly. Their scaling with system size will usually be stated. At the very end, the log output will typically report the fraction of CPU time spent performing analysis routines. This may help assess whether some of the frequency settings should be reduced. Simple ways to disable built-in analyses are provided.
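The interval semantics just described can be mimicked as follows (a hypothetical helper for illustration, reproducing the ANGCALC/NRSTEPS/EQUIL example from the text):

```python
def analysis_steps(interval, nrsteps, equil=0):
    """Steps at which an on-the-fly analysis fires: every multiple of
    the interval up to NRSTEPS, skipping the equilibration phase; the
    count stays relative to step 0 regardless of EQUIL."""
    return [s for s in range(interval, nrsteps + 1, interval) if s > equil]
```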
CAMPARI often groups statistics together. For example, for a melt of identical polymers, CAMPARI would by default compute only a single histogram of end-to-end distances. This grouping is at times undesired and is overcome by the concept of analysis groups. Unfortunately, the opposite task of grouping information from different species together is currently not supported for such analyses.
For long CAMPARI runs, the instantaneous analysis has the downside that (currently) no intermediate results are produced (everything remains in memory until the very end). In this case, it can be useful to utilize the restart functionality (→ RSTOUT) to produce simulation blocks of equivalent sizes, each with complete analysis output files. This strategy also serves to preserve more information in case of unexpected crashes or job terminations. The alternative is to follow the classical route of shifting the entire analysis burden to postprocessing by only saving instantaneous trajectory output. As mentioned above, this has the downside of dealing with larger amounts of data and, more importantly, with a loss of coordinate precision (for example, when using the xtc compression library). Another issue that can prove problematic with long runs is that some instantaneous output files (such as the file with running energy terms) are subject to buffered I/O. This depends on the compiler, operating system, and file system and means that information can be lost in case of unclean terminations, which makes it harder to diagnose the error. Keyword FLUSHTIME helps with this problem.
When the shared memory (OpenMP) parallelization is in use, there are a few additional considerations to take into account. First, few analyses are per se parallelized (mentioned below for these cases). For the remainder, which are largely inexpensive in terms of CPU time, CAMPARI uses an inhomogeneous and dynamic task parallelization approach at every step. This means that it is beneficial to make sure that analyses are triggered at the same simulation/trajectory steps. Conversely, if their executions are staggered, i.e., only one or few tasks are executed at a given step, the load distribution across threads by task cannot provide a benefit. Notably, the final processing/printing of results, with the exception of structural clustering, does not employ the shared memory parallelization at all. In contrast, it is possible to chop a large trajectory into multiple pieces and to use MPI-parallel trajectory analysis to speed up analyses on large data sets (whether this involves combining the results in the end or not).
RSTOUT
This keyword sets the interval specifying how often to write out a restart file. Such a file will allow continuing both crashed and normally terminated runs without losing significant accuracy due to truncation of significant digits (such as in pdb files). Note that they are not bitwise perfect, however. The concept is described elsewhere. Restart files are written to two files continuously replacing themselves on an alternating schedule such that, even if a crash occurs during a write operation, at least one sane restart file should exist. These files are generally named {basename}_1(2).rst. Settings for EQUIL are (of course) irrelevant for this output. Whenever a restart file is written, the system's potential energy is recalculated, which is the only part aided by CAMPARI's shared memory (OpenMP) parallelization.
ANGRPFILE
This keyword sets the path and name of the input file for determining analysis groups by custom request rather than by molecule type. By default, CAMPARI will often combine collected analysis data for molecules of identical type. This is not always the desired behavior. For example, CAMPARI fails to recognize differences introduced to molecules of the same type by virtue of molecule-specific constraints or biasing potentials. Analysis groups alleviate this and similar problems by allowing the user to group molecules of identical type into arbitrary analysis groups. Note that it is never possible to combine data for molecules of chemically different type or to split a single molecule into multiple groups (although the latter may be implemented in the future). Systems employing chemical crosslinks (please refer to sequence input for details) pose a special case: here, intermolecular crosslinks do not conjoin two molecules in terms of data structures and analysis, i.e., it will for example (currently) not be possible to obtain the net radius of gyration of two crosslinked polypeptide chains. Instead, both chains will be analyzed and treated as if they were separate molecules.
FLUSHTIME
This keyword determines the desired time interval (in minutes, with the caveat that there is only a single measurement per elementary step) for two things. First, every FLUSHTIME minutes (fractional values are allowed), CAMPARI will report the current production rate (in elementary steps/day) and the time to finish. This happens to either log output (for serial or pure MPI executables) or to a dedicated output file in case the shared memory (OpenMP) parallelization is in use. This performance estimation is bound to be misleading for inhomogeneous calculations where the average cost of a step per time interval can change dramatically (for example, in simulations with hybrid propagators or analysis runs where not all analysis frequencies are 1). Second, CAMPARI will flush the buffers of all instantaneous output files every FLUSHTIME minutes. This can be useful because I/O buffers on a given system may be so large that the information in these files (which are often used to monitor a running simulation) is rarely up-to-date, and significant data loss can occur upon unexpected terminations.
DISABLE_ANALYSIS
In CAMPARI, many analysis features are turned on by default. This keyword exists to simplify turning them off, for example, when trajectory post-processing is desired, for trial runs, etc. DISABLE_ANALYSIS is processed at the lowest hierarchy level of the key-file parser and, like all other keywords, sequentially for a given hierarchy level. It is not a keyword in the traditional sense since it does not set any CAMPARI-internal variable associated with it to a given value. Instead, it is a shortcut for explicitly listing analysis features with calculation intervals larger than the simulation length. The options are:
 All analysis and instantaneous output options are disabled. The only features not affected are the writing of initial and final structure files, the printing to log output (warnings, summary, and some progress information), the log file for the OpenMP parallelization, the printing of the trace file in certain MPI multi-copy runs, and the writing of restart files. Using this option without other keywords overriding DISABLE_ANALYSIS, a run will not provide much useful information.
 All analysis options are disabled. This is the same as the previous option except that simple instantaneous output features like energy or trajectory output are not turned off. Instantaneous output relying on a built-in analysis feature is disabled implicitly, e.g., running DSSP output.
 All instantaneous output options are disabled. This is the same as the first option except that it leaves all built-in analysis features at their defaults (some are disabled by default in any case since they rely on additional information) but disables both simple and dependent instantaneous output features.
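The three modes above amount to switching off different feature groups. A minimal sketch, assuming the modes are numbered in the order listed and using purely illustrative feature labels (this is not CAMPARI's internal representation):

```python
def disabled_features(mode):
    """Illustrative grouping of what each DISABLE_ANALYSIS mode turns off.

    The group names below are hypothetical labels chosen for this sketch."""
    analysis = {"builtin_analysis"}
    simple_output = {"energy_output", "trajectory_output"}
    dependent_output = {"analysis_dependent_output"}  # e.g., running DSSP
    if mode == 1:   # everything off; only structure/restart/log output remain
        return analysis | simple_output | dependent_output
    if mode == 2:   # analysis off; simple instantaneous output is kept
        return analysis | dependent_output
    if mode == 3:   # instantaneous output off; analysis defaults are kept
        return simple_output | dependent_output
    return set()

print(sorted(disabled_features(2)))
```

Keywords processed later at the same hierarchy level can still re-enable individual features, since DISABLE_ANALYSIS is parsed sequentially like any other keyword.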
ENOUT
This keyword defines the interval at which current potential energy data are written to a file called ENERGY.dat. Note that the total energy is decomposed into the individual terms controllable by keywords of the type SC_XYZ (for example SC_IPP). It is presently not possible to obtain energy decompositions based on subcomponents of the system. The data in ENERGY.dat are the only direct printout of unperturbed energy values if energy landscape sculpting is in use. Settings for EQUIL are ignored for this output. Note that energies are calculated at every step for every term that is turned on. Energies are never calculated specifically for reporting purposes. This means that ENOUT has no significant associated cost per se and, more importantly, cannot be used to speed up post-processing runs where energy values are required only for selected frames of an input trajectory (the correct way to deal with this situation is to first extract these frames and then run the energy analysis in a second step).
ENSOUT
This keyword sets the interval at which current ensemble data are written to a file called ENSEMBLE.dat. This is only relevant if DYNAMICS is not set to 1 or 6 (pure Monte Carlo sampling or minimization). The reported quantities are informative ensemble variables (limited output presently) including, most prominently, potential and kinetic energies. Settings for EQUIL are ignored for this output. As is the case for keyword ENOUT, these values are calculated at every step regardless, and no computation occurs specifically for reporting purposes.
ACCOUT
If pure Monte Carlo or hybrid sampling is used (→ DYNAMICS), this keyword sets the interval at which cumulative acceptance data are reported to a file called ACCEPTANCE.dat. Note that these data are only mildly informative in that they do not directly allow computing acceptance rates. They are mostly useful for analyzing a running simulation and assessing the performance of the move set. CAMPARI will report acceptance statistics as well as residue- and molecule-resolved acceptance counts at the very end of the simulation to log output. The data in ACCEPTANCE.dat are only resolved by move type. Settings for EQUIL are ignored for this output.
TOROUT
This keyword lets the user decide how often to write sets of internal coordinate space degrees of freedom to a file FYC.dat in a one-structure-per-line format. These files can easily become large because the number of degrees of freedom generally scales linearly with system size. There are two modes selected by using a positive (mode 1) or negative integer (mode 2) for TOROUT. Native CAMPARI degrees of freedom are written with a header providing residue-level information. These generally correspond to the unconstrained degrees of freedom in Monte Carlo or torsional dynamics calculations (see sequence input for details). All but rigid-body coordinates are written to FYC.dat, and much more information is provided there. Because rigid-body coordinates are missing, the information in the file is never enough to completely reconstruct the system even when assuming the default covalent geometries.
 Sampled dihedral angle degrees of freedom are written with a header that provides atomic indices corresponding to the various Z-matrix lines describing these dihedral angles. This mode excludes degrees of freedom that are actually frozen, and can include degrees of freedom that are not native to CAMPARI. All values are again written to FYC.dat, and more details are provided there. This mode never includes bond angles and/or dihedral angles that have no explicit Z-matrix entry.
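When post-processing the torsion angles written to FYC.dat, averages must respect periodicity; a naive arithmetic mean of 170° and -170° gives 0° instead of 180°. A minimal sketch of a circular mean, assuming angles in degrees:

```python
import math

def circular_mean_deg(angles):
    """Circular (directional) mean of angles given in degrees."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c))

print(circular_mean_deg([170.0, -170.0]))  # ±180.0, not 0.0
```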
XYZOUT
This very important keyword sets the frequency with which snapshots containing (at least) the Cartesian coordinates of the system (or selected subsystem) are written to a new file or appended to an existing trajectory file (→ documentation on output files and keyword XYZPDB). Part of the file name(s) will be determined by keyword BASENAME. This is the fundamental saving frequency for obtaining trajectory data and should be chosen carefully whenever the proposed simulation is resource-intensive. These files can easily be very large, and it is possible for significant I/O lag to arise. The printing of trajectory data is done by a single thread if CAMPARI's shared memory (OpenMP) parallelization is in use. This happens with other threads performing other tasks concurrently (if there are any).
XYZPDB
If structural output is requested (→ XYZOUT), this keyword chooses the output file format (see documentation on output files). It is an integer [1-3 (4,5)] interpreted as:
 Tinker-style .arc files (ASCII)
 ASCII .pdb files (default option) in various output conventions (see PDB_W_CONV and PDB_OUTPUTSTRING)
 CHARMM-style binary .dcd files (these include the box information for each snapshot and have a CHARMM-style header; note that the header is written only once by CAMPARI and contains the number of snapshots in the file, which may not always be correct if simulations are prematurely terminated or trajectory files are appended)
 Compressed binary .xtc files as used in GROMACS: note that this option is only available if the program is linked against a proper version of XDR (see installation instructions)
 Compressed binary .nc files as defined by the NetCDF format in AMBER convention: note that this option is only available if the program is linked against a proper NetCDF library (see installation instructions).
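For the .dcd option, the snapshot count stored in the CHARMM-style header can disagree with the actual file contents after premature termination or appending. A hedged sketch of reading that count, assuming the common CHARMM layout (a little-endian 84-byte Fortran record starting with 'CORD', with NSET as the first control integer):

```python
import struct

def dcd_nset(header_bytes):
    """Extract the stored snapshot count (NSET) from the start of a
    CHARMM-style .dcd header; raises if the layout is not recognized."""
    marker, magic, nset = struct.unpack_from("<i4si", header_bytes)
    if marker != 84 or magic != b"CORD":
        raise ValueError("not a recognizable little-endian DCD header")
    return nset

# A minimal fake header claiming 500 snapshots:
header = struct.pack("<i4si", 84, b"CORD", 500) + b"\x00" * 76
print(dcd_nset(header))  # → 500
```

Comparing NSET against the frame count implied by the file size is one way to detect a stale header before feeding such files to other software.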
XYZMODE
If structural output is requested (→ XYZOUT), this integer [1-2] keyword determines whether to write to a series of numbered files (1) or a single file (2, the default). This, however, currently works for pdb only (specifically: .arc output always uses multiple files, and the binary formats always write to (append) a single file).
XTCPREC
If structural output is requested (→ XYZOUT) and the chosen output format is the binary .xtc format (option 4 for XYZPDB), this keyword can be used to specify the multiplicative factor determining the accuracy of compressed xtc trajectories (the minimum is 100.0). It is also required for proper reading of xtc trajectories in xtc analysis mode (see PDBANALYZE and XTCFILE).
PDB_NUCMODE
CAMPARI's internal representation of polynucleotides has one peculiarity. It assigns the entire PO_{4}^{-} functional group to the same nucleotide residue whereas most other programs seem to assign the 3'-oxygen atom to the residue carrying the sugar. This causes a non-trivial inconsistency when trying to use CAMPARI-generated pdb files as input for other software. Therefore, this keyword defines how to assign the O3* atom of nucleic acids in pdb output only. There are two options:
 The O3* atom is assigned to the residue carrying the 5'-phosphate it is part of, i.e., it is the very first atom in that residue. This is the CAMPARI-inherent convention and reflects the authentic structure of arrays in CAMPARI (which is relevant for any analysis requiring atom numbers, see for example the input on selecting specific distance distributions to be collected).
 The O3* atom is assigned to the residue carrying the sugar it is part of; this is the PDB-typical convention. Note that this inherently disrupts the 1:1 correspondence between numbering in the pdb file and how nucleic acids are represented internally. It is recommended if and only if CAMPARI output is sought to be compatible with other software working in this latter convention. Note that for this option to work correctly with unsupported polynucleotide residues (recognized as such), atom names have to be canonical.
PDB_W_CONV
CAMPARI can in general process different atom and residue naming conventions for the formatting of PDB files. This keyword selects the format for written files. Choices are:
 CAMPARI format
 GROMACS format (atom naming, nucleotide and cap residue names, ...)
 CHARMM format (atom naming, cap residue names and numbering (patches), ...): Note that there are two exceptions pertaining to C-terminal cap residues (NME and NH2), which arise due to non-unique naming in CHARMM: 1) NH2 atoms are called NT2 (instead of NT) and HT21, HT22 (instead of HT1, HT2), and 2) NME methyl hydrogens are called HAT1, HAT2, HAT3 (instead of HT1, HT2, HT3).
 AMBER format (atom naming, nucleotide residue names, ...)
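The two CHARMM-naming exceptions for C-terminal caps listed above fit in a simple lookup; the helper below is a hypothetical illustration, not CAMPARI code:

```python
# Exceptions for C-terminal caps in the CHARMM output convention:
CHARMM_CAP_EXCEPTIONS = {
    ("NH2", "NT"):  "NT2",
    ("NH2", "HT1"): "HT21",
    ("NH2", "HT2"): "HT22",
    ("NME", "HT1"): "HAT1",
    ("NME", "HT2"): "HAT2",
    ("NME", "HT3"): "HAT3",
}

def charmm_cap_atom(resname, atom):
    """Return the CHARMM-style atom name written for cap residues;
    all other atoms keep their names under this exception table."""
    return CHARMM_CAP_EXCEPTIONS.get((resname, atom), atom)

print(charmm_cap_atom("NME", "HT2"))  # → HAT2
```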
PDB_OUTPUTSTRING
This keyword allows changing the formatting string (Fortran) used for PDB files written by CAMPARI. This can be useful for PDB files of very large systems and in particular for changing the precision of PDB files. In order for CAMPARI to read these files back in, the analogous keyword PDB_INPUTSTRING has to be used. Because Fortran in general deals poorly with string-based I/O, any improper use of this keyword can easily lead to abnormal program termination. The default format is "a6,i5,1x,a4,a1,a3,1x,a1,i4,a1,3x,3f8.3" (with the quotes). The letters (a, i, f) give the type of variable, which must not change. The numbers give the field widths, and these can be customized for variables of type integer ("i") or real ("f"). It is also possible to modify the field widths of string variables ("a"), but that is likely harmful because the variables in use are tied to the default format. Note that the insertion code (the last "a1" element) is always written as a blank by CAMPARI since all residues are renumbered consecutively. The same is true for the alternate location indicator (the first "a1" element).
The vast majority of modifications to the output format will create files that are no longer readable (at least correctly) by other software, e.g., visualization software or other simulation codes, and it may be useful to use CAMPARI itself (or to write a script) to convert these non-standard files back whenever needed. Common problems with standard PDB files, which can be addressed at least in part by the format string, are that the integer number for the atom index overflows, that the chain indicator becomes fused to neighboring columns (because of overlong residue names or large residue numbers), that the residue number column overflows, that the coordinate entries get fused or overflow (if absolute coordinates are not centered at small (in absolute magnitude) values), or that the coordinate precision is insufficient for recovering exact covalent geometries based on this information alone.
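To illustrate the default format "a6,i5,1x,a4,a1,a3,1x,a1,i4,a1,3x,3f8.3" and the effect of widening a field, here is a sketch in Python that mimics the Fortran edition descriptors field by field (the helper and its parameters are hypothetical; only the default descriptor string itself is CAMPARI's):

```python
def pdb_atom_line(serial, name, resname, chain, resseq, x, y, z,
                  serial_w=5, coord_fmt="8.3f"):
    """Render one ATOM record following the default Fortran format
    'a6,i5,1x,a4,a1,a3,1x,a1,i4,a1,3x,3f8.3'. Widening serial_w (the
    'i5' field) or coord_fmt mimics the customization the keyword
    allows; readers must then use a matching input format."""
    return ("{:<6s}{:>{sw}d} {:<4s}{:1s}{:>3s} {:1s}{:>4d}{:1s}   "
            "{x:{cf}}{y:{cf}}{z:{cf}}").format(
        "ATOM", serial, name, " ", resname, chain, resseq, " ",
        x=x, y=y, z=z, sw=serial_w, cf=coord_fmt)

print(pdb_atom_line(1, "CA", "ALA", "A", 1, 1.234, -5.678, 9.012))
```

With the defaults, the record is a standard 54-column ATOM line; widening the serial field (e.g., serial_w=7) avoids atom-index overflow for very large systems but yields files that many readers will reject, as cautioned above.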
XYZ_SOLVENT
If structural output is requested (→ XYZOUT), this logical keyword allows the user to suppress trajectory output for molecules labeled as solvent. This can be useful to down-convert trajectory files from explicit solvent runs or, more generally, to isolate certain parts of the system from existing trajectory data (employing PDBANALYZE and ANGRPFILE). It may also be used to save space during actual simulations, but it should be kept in mind that information about the solvent may be lost irrevocably and that the resultant trajectories may no longer be straightforward to analyze. A more general option for printing only parts of the system is provided by supplying an index file via keyword TRAJIDXFILE.
TRAJIDXFILE
Usage of keyword XYZ_SOLVENT in conjunction with the concept of analysis groups allows the user some amount of fine control over what is written to the trajectory file. In some scenarios this may not be enough (for example, if external scripts or software or even CAMPARI itself are meant to analyze non-trivial subsets of the system). Then, the user has the option to supply a simple index file providing per-atom control over what coordinate information is written to the trajectory file. Note that this will be useful for subsequent trajectory analysis runs only if the selected subset preserves the integrity of all molecules to remain in the output, or if the output format is pdb such that missing atoms can be rebuilt. For example, consider a block copolymer consisting of two blocks. The full trajectory could be reanalyzed using an index file to yield a reduced trajectory in pdb format (keywords XYZOUT, XYZPDB, and XYZMODE) that contains only one of the two blocks. With a properly adjusted sequence input file, it may then be possible to perform intrinsic CAMPARI analyses on the isolated block, which really was part of a larger molecule. In this process, almost certainly some terminal atoms would have to be rebuilt at the break point (but those may not influence the analyses). For a description of the input file format, see here. Note that all other output selection settings are ignored if an index set is used via this keyword.
XYZ_FORCEBOX
If a system is simulated or analyzed that utilizes periodic boundary conditions, this keyword can be used to alter the standard CAMPARI way of placing atoms with respect to the unit cell. By default, CAMPARI will never break up molecules in trajectory output, which implies that the absolute coordinates in the trajectory file(s) can extend significantly beyond the formal boundary of the unit cell. Similarly, by default CAMPARI will assume that structural input preserves the integrity of molecules, i.e., that it conforms to exactly this standard. Sometimes (for example, for visualization or for certain analyses), it may be desired to instead have all atoms inside the unit cell, and this is what this keyword accomplishes. There are currently 4 different options related to both input and output.
 For both input and output, CAMPARI assumes that molecules are intact and intended to be left intact.
 For output, CAMPARI will leave molecules intact. For input, CAMPARI will assume that the coordinates are such that molecules have been forced to reside inside the central unit cell. Using box information, CAMPARI will calculate image shift vectors and apply them to the processed input. This option is incompatible with PDB_READMODE being 1. Image shift vectors are evaluated before possible tolerance violations are considered (see keywords PDB_TOLERANCE_B and PDB_TOLERANCE_A).
 For input, CAMPARI assumes that molecules are intact (no corrections applied). For output, it will force coordinates to reside inside the central unit cell by breaking up molecules as needed.
 This option combines the properties of options 1 and 2 for input and output, respectively.
Note that in some cases trajectory files with broken-up molecules may be ambiguous unless information about the expected topology is present or provided. The input strategy currently implemented works as long as the length of Z-matrix bonds remains small in comparison to the box dimensions. Note that the start of the simulation from a pdb file is affected by this keyword, as is the analysis of (binary) trajectory files.
It is not recommended to use options 1 or 3 above for writing trajectories from a production simulation; rather, the output feature is intended to transform pre-existing trajectories (via trajectory analysis mode). Lastly, in the case of an analysis run, any structural input with entire molecules given as the "wrong" images will also be adjusted by options 1 and 3 above. This scenario should be avoided as it leads to inconsistencies in any operations relying on absolute coordinates. For options 0 and 2, output coordinates will not be altered by XYZ_FORCEBOX, but they may still be altered by keywords XYZ_REFMOL and ALIGNCALC.
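The default placement rule (whole molecules shifted by lattice vectors so that their centers stay inside the cell, never broken apart) can be sketched for an orthorhombic box; this is an illustration, not CAMPARI's routine:

```python
import math

def wrap_molecule(coords, box, origin=(0.0, 0.0, 0.0)):
    """Shift all atoms of one molecule by the same lattice vector so that
    the geometric center lands inside the central unit cell (orthorhombic
    box given as (Lx, Ly, Lz)); the molecule stays intact."""
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    shift = [-math.floor((center[k] - origin[k]) / box[k]) * box[k]
             for k in range(3)]
    return [tuple(c[k] + shift[k] for k in range(3)) for c in coords]

# A diatomic whose center (10.5, 0.5, 0.5) lies one box length outside in x:
print(wrap_molecule([(10.0, 0.4, 0.5), (11.0, 0.6, 0.5)], (10.0, 10.0, 10.0)))
```

Options 1 and 3 effectively need the inverse operation during input, which is why box information and topology (Z-matrix connectivity) are required to reassemble molecules that were broken across the boundary.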
XYZ_REFMOL
If a system is simulated or analyzed that utilizes periodic boundary conditions, this keyword can be used to alter the standard CAMPARI way of placing molecules in three contexts, viz., trajectory output, structural clustering using absolute Cartesian coordinates (RMSD), and, similarly, the derivation of the alignment operator for output trajectory alignment. The use of this keyword is explained primarily in the context of the first role. By default, CAMPARI will never allow the geometric center (or center of mass in gradient-based simulation runs) of a molecule to "leave" the central unit cell. When looking at intermolecular interfaces, this can lead to the undesirable effect of the interface being broken across the periodic boundary. These images often flicker back and forth, which makes visual inspection difficult unless periodic images are explicitly replicated. XYZ_REFMOL allows the user to specify a reference molecule whose center serves as reference point for all images, i.e., the coordinates of all other molecules printed to trajectory output are those of the nearest image of these molecules with respect to the chosen reference. This operation does not destroy information (i.e., it does not center or align anything) but leads to molecules being displayed that are outside of the central unit cell. In fact, the reference molecule is the only one that is guaranteed to reside in the central cell at all times.
Note that this keyword does not actually alter coordinates used internally, and therefore has no impact on the majority of analysis functions, etc. The only exceptions are structural clustering relying on absolute Cartesian coordinates (options 5, 6, and 10 for CDISTANCE) and the trajectory alignment facility. For the latter, XYZ_REFMOL simply picks the reference molecule to override the internal heuristic. In both scenarios, the alignment operator is derived using image-corrected coordinates for all conformations in question. This role of XYZ_REFMOL is distinct but not separable from the role for trajectory output (i.e., it is not possible to use XYZ_REFMOL to pick the reference molecule for image selection for clustering or trajectory alignment while retaining the default trajectory output). XYZ_REFMOL is also ignored for the pdb files written at the beginning and end of a simulation. Along similar lines, trajectory files created in such a manner can be read back by CAMPARI without problems (internally, every molecule is translated to the central unit cell upon read-in as long as the box information (BOUNDARY, SIZE, and ORIGIN) is preserved).
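The nearest-image selection relative to the reference molecule's center can be sketched per coordinate for an orthorhombic box (an illustration only; the shift is in reality applied to whole molecules via their centers, not to individual atoms):

```python
def nearest_image(point, ref_center, box):
    """Return the periodic image of `point` closest to `ref_center`,
    choosing the lattice shift independently per dimension
    (orthorhombic box given as (Lx, Ly, Lz))."""
    return tuple(p - round((p - r) / L) * L
                 for p, r, L in zip(point, ref_center, box))

# A molecule center at x = 9.5 in a 10 A box is displayed at x = -0.5
# when the reference center sits near the origin:
print(nearest_image((9.5, 1.0, 1.0), (0.0, 0.0, 0.0), (10.0, 10.0, 10.0)))
```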
ALIGNCALC
In trajectory analysis runs, CAMPARI offers the option to structurally superpose the current Cartesian coordinates onto a suitable reference. Note that this functionality is conveniently available through almost all molecular visualization software packages. CAMPARI provides automatically generated visualization scripts designed to work with VMD. If these options are unavailable or inconvenient, for example, because the visualization program tries to read an entire very large data set into memory, ALIGNCALC lets the user set the interval at which CAMPARI should perform structural alignment. For example, to create, from an original trajectory, a superposed trajectory of every 10th frame, XYZOUT would have to be 10 and ALIGNCALC would have to be 10 or a factor of 10 (5, 2, 1). For convenience, the root mean square deviation over the alignment set after alignment can be written to an instantaneous output file. This can be enabled by specifying a negative number for ALIGNCALC, which is, except for the sign, interpreted in the same way. Sometimes it may also be desirable to align on one set of atoms and compute RMSD values for another set. CAMPARI supports this if, in addition to an appropriate choice of alignment set, the user provides another index set via keyword CFILE. This second input then becomes an additional set to compute RMSD values for. This is the same logic as found in RMSD-based structural clustering with split sets (see option 6 for keyword CDISTANCE).
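The interval relationship in the example above (every written frame is superposed only if the alignment interval divides the output interval) can be checked with a short sketch:

```python
def every_written_frame_aligned(xyzout, aligncalc, nsteps):
    """True if each step that writes a trajectory frame (every `xyzout`
    steps) is also an alignment step (every `aligncalc` steps)."""
    written = range(xyzout, nsteps + 1, xyzout)
    return all(step % aligncalc == 0 for step in written)

print(every_written_frame_aligned(10, 5, 100))  # → True
print(every_written_frame_aligned(10, 3, 100))  # → False
```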
Alignment happens before any of the analysis routines are called and works by first defining a reference set of atom indices (→ ALIGNFILE). It can be somewhat time-consuming and is currently not aided in any way by CAMPARI's shared memory (OpenMP) parallelization. Using a quaternion-based algorithm, an optimal translation and rotation are determined that minimize, when applied to the current coordinates, the deviation between the transformed current coordinates and the reference coordinates (i.e., a set of coordinates for all atoms in the alignment set). Note that this procedure will always preserve the internal state of molecules and, except for certain cases in periodic boundary conditions, the relative arrangement of molecules. It will not, however, preserve the relative position of the system boundary. This may lead to artifacts in energetic analyses of aligned trajectories or any analyses that rely upon relative, intermolecular coordinates.
There are two ways of defining/providing the coordinates for the alignment set. The first is via an external file. Here, CAMPARI reuses the pdb template functionality. If keyword PDB_TEMPLATE is specified and successfully read, then the reference coordinate set is extracted from this file for the set of atoms defined via ALIGNFILE. Note that the template may serve a double purpose in this scenario as it may still provide the atom numbering map needed to read binary trajectory formats with non-CAMPARI atom order. If no template is specified, the reference coordinates will be defined by the previously aligned structure. This successive alignment therefore uses a different reference coordinate set each time and will consequently lead to drift.
As described for keyword XYZ_REFMOL, the combination of periodic boundary conditions and multi-molecular assemblies can become ambiguous in terms of absolute coordinates. By default, CAMPARI will scan the alignment set and use the molecule with the largest number of contributing atoms as the reference one. This choice can be overridden by keyword XYZ_REFMOL. It can be confusing that the alignment operator is derived for the image-corrected coordinate sets yet the transformation is applied to the coordinates in their default state. This is relevant if molecules are not found in the central unit cell in the trajectory files. In this scenario, the output gets particularly difficult to interpret if XYZ_REFMOL and/or XYZ_FORCEBOX modify the output coordinates as well (not recommended).
ALIGNFILE
If system alignment is possible and requested (→ ALIGNCALC), this keyword allows the user to supply the path and name of a mandatory input file containing an atomic index list defining the set of atoms to align on. For example, in the simulation of a macromolecule with cosolutes it will not be meaningful to use the entire set of atoms in the system as the alignment set since the randomly dispersed cosolutes will dominate the alignment. Instead, one will typically want to supply only non-symmetric protein atoms here. This keyword serves a second purpose, viz., if structural clustering is requested, and if an RMSD distance criterion with differing alignment and distance atom index sets is desired, this keyword lets the user specify the input file with the alignment set. Simultaneous use of both functionalities is permitted. The converse is also possible, i.e., to specify an additional distance set for RMSD evaluation and instantaneous output in the same logic. Then, keyword CFILE can be used to specify this additional distance set. Lastly, note that any set used for alignment must consist of at least three atoms.
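A minimal sketch of validating such an index list before use, enforcing only the documented minimum of three atoms (the exact file format is described in CAMPARI's input documentation; this parser is an assumption-laden illustration):

```python
def parse_alignment_set(text):
    """Parse whitespace-separated atomic indices and enforce the
    documented minimum set size of three atoms."""
    indices = [int(tok) for tok in text.split()]
    if len(indices) < 3:
        raise ValueError("alignment set must contain at least three atoms")
    return indices

print(parse_alignment_set("5 17 42 43"))  # → [5, 17, 42, 43]
```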
POLOUT
This keyword sets the interval at which current system-wide polymeric variables are computed and written (→ POLYMER.dat). This instantaneous output can be useful to easily monitor structural changes (such as dimerization events) in dilute systems with heterogeneous density. It is completely uninformative for systems with homogeneous density. For simulations of a single polymer chain, distributions of polymeric order parameters as well as correlation functions can be computed from the output in POLYMER.dat. When CAMPARI's shared memory (OpenMP) parallelization is in use, the computation of the system-wide gyration tensor and related properties as well as the writing to POLYMER.dat are done by a single thread. This happens with other threads performing other tasks concurrently (if there are any).
POLCALC
This keyword lets the user specify the frequency with which values for polymeric properties incurring low computational cost are computed. These data are collected and reported resolved by analysis group and include characteristic values for shape and size, histograms of end-to-end distances, etc. If this keyword is set such that polymeric analyses are performed, several output files are generated (→ POLYAVG.dat, RGHIST.dat, RETEHIST.dat, and RDHIST.dat). Furthermore, POLCALC controls the interval for data collection to obtain averages of the suitably defined angular correlation function along the polymer backbone, which may be related to the intrinsic stiffness or persistence length of the polymer (→ PERSISTENCE.dat and TURNS_RES.dat). Lastly, this keyword controls the frequency for the computation and averaging of molecular, radial density profiles, i.e., the mass distribution function along the radial coordinate originating from each molecule's center of mass, considering only atoms belonging to that molecule (→ DENSPROF.dat). This quantity is used in Lifshitz-type polymer theories. When CAMPARI's shared memory (OpenMP) parallelization is in use, these calculations are performed by a single thread while other threads perform other tasks concurrently (if there are any).
RHCALC
Since the computation of comprehensive polymer-internal distances is more expensive, this dedicated keyword controls the data collection interval for analyses relying on such data. A comprehensive set of internal distances in CAMPARI is used to compute three quantities:
 An alternative estimate of the polymer's spatial size which is sometimes related to the hydrodynamic radius (→ corresponding entry in POLYAVG.dat; note that should RHCALC be set such that no analysis is performed but POLCALC be chosen such that the other quantities in POLYAVG.dat are computed and provided, the corresponding column must be ignored).
 A scaling profile of the internal distances with distance of separation in primary sequence (→ INTSCAL.dat).
 The scattering (Kratky) profile of the polymer (→ KRATKY.dat; this relies on the additional frequency setting SCATTERCALC).
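The auxiliary setting SCATTERCALC counts only among the steps already selected via RHCALC, so the effective interval is the product of the two. A sketch, assuming 1-based step numbering:

```python
def scattering_steps(nsteps, rhcalc, scattercalc):
    """Steps at which scattering data are accumulated: SCATTERCALC counts
    only among the steps selected via RHCALC, giving an effective
    interval of rhcalc * scattercalc."""
    rh_steps = [s for s in range(1, nsteps + 1) if s % rhcalc == 0]
    return [s for i, s in enumerate(rh_steps, start=1) if i % scattercalc == 0]

# RHCALC = 10 and SCATTERCALC = 20 accumulate data every 200 steps:
print(scattering_steps(1000, 10, 20))  # → [200, 400, 600, 800, 1000]
```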
SCATTERCALC
As alluded to above, this keyword sets an auxiliary frequency for the calculation of scattering properties resolved by analysis group (→ KRATKY.dat). This requires computing Fourier transforms of internal distances for a series of wave vectors and is consequently a very expensive calculation. Due to the coupling to the computation of internal distances (see RHCALC), this keyword is not interpreted like the other interval keywords (???CALC). Instead, SCATTERCALC sets the calculation interval amongst only those steps chosen already via RHCALC. For example, if RHCALC is 10 and SCATTERCALC is 20, then scattering data will be accumulated every 200 steps. The data in KRATKY.dat can be used to compare simulation data directly to experiment. In a double-logarithmic plot, it may also be possible to identify linear regimes ("power law regime" in contrast to the "Guinier regime" for smaller wave vectors) which can be fit to yield the scaling exponent for fractal objects. Conversely, for globular polymers, Porod's law may hold.
SCATTERRES
Since the required number of points and range of wave vectors for the prediction of scattering profiles may be system-dependent, this keyword allows the user to adjust the spacing of wave vectors assuming scattering data are being calculated at all (→ RHCALC and SCATTERCALC). The first wave vector's absolute magnitude q = |q| will always be 0.5·SCATTERRES with units of Å^{-1}. In general, the larger the chain, the smaller the absolute magnitudes of wave vectors needed.
SCATTERVECS
Since the required number of points and range of wave vectors for the prediction of scattering profiles may be system-dependent, this keyword allows the user to adjust the total number of employed wave vectors assuming scattering data are being calculated at all (→ RHCALC and SCATTERCALC). Together with SCATTERRES, this determines the range of the wave vectors. Note that generally a coarse resolution (and hence a small number of vectors) is sufficient as scattering profiles tend to be very smooth functions.
HOLESCALC
For polymers it may be interesting to analyze the distribution of "internal" void spaces. In CAMPARI, a rudimentary analysis routine exists which attempts to place spheres of varying size at different distances from the molecule's center of mass and to record whether any overlap with part of the polymer is encountered. This analysis is recorded in instantaneous output (HOLES.dat), and the latter needs to be post-processed. Note that this analysis is restricted to simulations of monomeric polymers. When CAMPARI's shared memory (OpenMP) parallelization is in use, it is performed by a single thread while other threads perform other tasks concurrently (if there are any).
RGBINSIZE
If standard polymeric analyses are performed (→ POLCALC), this keyword sets the size of the bins in Å for the three output files RGHIST.dat, RETEHIST.dat, and DENSPROF.dat. It therefore determines the resolution along the radius of gyration or related axes.
POLRGBINS
If standard polymeric analyses are performed (→ POLCALC), this keyword can be used to set the number of bins of size RGBINSIZE for the three output files RGHIST.dat, RETEHIST.dat, and DENSPROF.dat. Since quantities like the radius of gyration or end-to-end distances are strongly system-dependent, it is up to the user to ensure an appropriate number of bins. Note that, just like for all other histograms in CAMPARI, terminal bins will be overstocked should range exceptions occur.
PHOUT
This keyword controls the frequency for outputting ionization states of certain ionizable residues. Currently, this analysis relies on pseudo-Monte Carlo moves (see PHFREQ) to work and is therefore only available in straight MC runs. Further limitations are listed in the descriptions of sampler and output file.
ANGCALC
This keyword lets the user define the interval at which polypeptide backbone torsion angle statistics are extracted, i.e., how often to go through all non-terminal polypeptide residues and bin values for the φ/ψ angles into a two-dimensional histogram. This keyword also controls the data collection frequency for the estimation of vicinal NMR J-coupling constants (H_{N} to H_{α} → JCOUPLING.dat). The Ramachandran analysis itself is reported globally in a file called RAMACHANDRAN.dat. Due to the system-wide averaging (including over molecules of different type), this is probably most meaningful for simulations of single homopolymers. For more detailed control, further output files may be obtained: residue-specific as well as analysis-group-specific maps should requests have been provided via keywords RAMARES and RAMAMOL, respectively. When CAMPARI's shared memory (OpenMP) parallelization is in use, these analyses are performed by a single thread while other threads perform other tasks concurrently (if there are any).
ANGRES
This keyword matters only if ANGCALC is chosen such that polypeptide backbone φ/ψ-statistics are accumulated. If so, it sets the resolution in degrees for such angular distribution functions. The smallest permissible value at the moment is 1.0°.
RAMARES
This keyword matters only if polypeptide φ/ψ-analysis is requested (→ ANGCALC). If so, it allows the user to monitor the distributions specifically for selected polypeptide residues in the system. The first entry, which defaults to zero, specifies the number of such specific requests. The user then has to provide the appropriate number of integer values (residue numbers as defined per sequence input) on that same line in the key-file. The maximum number of individually monitored residues is limited to 1000, but even this requires increasing the default string length CAMPARI assumes (in a file called macros.i) during compilation. Successful requests (those pointing to non-polypeptide, nonexistent, or terminal residues will be ignored) will create output files like "RESRAMA_00024.dat".
RAMAMOL
This keyword is exactly analogous to RAMARES except that it operates not on residues but on analysis groups (all residues of all molecules in that analysis group are pooled; numbering as reported initially in the log output). It will create files like "MOLRAMA_00002.dat".
INTCALC
This keyword sets the interval for computing comprehensive statistics for typical internal coordinates of the system, i.e., all bond lengths, angles, torsional angles, as well as improper torsional angles (trigonal-planar centers; consult PARAMETERS for further details). Note that molecular topology defines which atom pairs, for example, share a bond. With this analysis, it is therefore not possible to analyze arbitrarily defined distances, angles, and torsion angles in the system. If turned on, up to five different output files are provided, namely INTERNAL_COORDS.idx, INTHISTS_BL.dat, INTHISTS_BA.dat, INTHISTS_DI.dat, and INTHISTS_IM.dat. When CAMPARI's shared memory (OpenMP) parallelization is in use, all of these analyses are performed (in sequence) by a single thread while other threads perform other tasks concurrently (if there are any).
WHICHINT
This is one of the few keywords expecting multiple inputs and matters only if internal coordinate analysis is requested (→ INTCALC). Four integers should be provided, and each one is interpreted as a logical to turn on an individual group of internal coordinate analyses. The first turns on the calculation of bond length histograms, the second that of bond angle histograms, the third that of improper dihedral angle histograms, and the fourth that of proper torsional angle histograms. Note that the number of possible internal coordinates quickly exceeds the number of atoms for any complex molecule. These analyses can therefore easily become fairly time-consuming as well as data-rich (in terms of the sizes of the output files). This is one of the reasons for introducing this selection mechanism. The other lies simply in the fact that in any simulation using CAMPARI-typical torsional space constraints (see CARTINT), analyses of bond length, bond angle, and improper dihedral distributions are meaningless.
SEGCALC
This keyword lets the user specify the interval for scanning the polypeptide backbone for stretches of similar secondary structure (as defined in the file specified through FMCSC_BBSEGFILE). The annotation, in contrast to DSSP, is based purely on torsional criteria and relies on defining consensus regions within φ/ψ-space. These consensus definitions are found in a supplied data file (→ BBSEGFILE). At the end of the simulation, results are written to files named BB_SEGMENTS_NORM.dat, BB_SEGMENTS_NORM_RES.dat, BB_SEGMENTS.dat, and BB_SEGMENTS_RES.dat. This analysis is resolved by analysis group and useful to identify coarse secondary structure propensities in polypeptides. As an example, the data in BB_SEGMENTS_NORM_RES.dat can be used to compute parameters of the helix-coil transition according to the Lifson-Roig formalism (see for example Tutorial 3 or this reference). SEGCALC also controls the computation of global (at a molecular level) secondary structure order parameters f_{α} and f_{β} (which are also used for the corresponding bias potentials → SC_ZSEC used in Tutorial 9 or this reference). Various distribution histograms are written to files ZSEC_HIST.dat, ZAB_2DHIST.dat, and ZBETA_RG.dat. Analysis of these order parameters is similarly performed in analysis group-resolved fashion. When CAMPARI's shared memory (OpenMP) parallelization is in use, both of these analyses are performed sequentially by a single thread while other threads perform other tasks concurrently (if there are any).
DSSPCALC
This keyword specifies how frequently to perform DSSP analysis. DSSP is a secondary structure assignment procedure for proteins (reference). All eligible (i.e., full peptide) residues are scanned for backbone-backbone hydrogen bond patterns, and various statistics and running output are provided if so desired (see DSSP_NORM_RES.dat, DSSP_NORM.dat, DSSP.dat, DSSP_RES.dat, DSSP_HIST.dat, DSSP_EH_HIST.dat, and DSSP_RUNNING.dat). The DSSP results typically complement the results from backbone segment statistics (see for example BB_SEGMENTS_NORM_RES.dat) well, as the former are based exclusively on hydrogen bond patterns while the latter are based exclusively on dihedral angles. When CAMPARI's shared memory (OpenMP) parallelization is in use, the DSSP analysis is performed by a single thread while other threads perform other tasks concurrently (if there are any). Similar to contact analyses, the determination of the hydrogen bond patterns scales, at some level, poorly with system size. It thus can become performance-limiting, which will similarly limit parallel efficiency.
INSTDSSP
If DSSP analysis is requested (→ DSSPCALC), this keyword is interpreted as a simple logical whether to write out running traces of the full DSSP assignment for the current snapshot (see DSSP_RUNNING.dat). This can be useful when analyzing input trajectories or even individual PDB structures with CAMPARI. Instantaneous DSSP output is currently not supported for MPI-averaging calculations (see MPIAVG). This output file can easily become very large, and it is possible for significant I/O lag to occur because of this.
DSSP_MODE
Based on DSSP analysis (→ DSSPCALC), the code computes two order parameters to measure canonical secondary structure content. The E-score corresponds to the β-content and the H-score to the α-content. They are system-wide quantities and are computed as follows:
E-score = E-fraction · ( HbondScore_E )^{1/n}
H-score = H-fraction · ( HbondScore_H )^{1/n}
Here, E-fraction and H-fraction are simply the fractions of residues assigned E or H according to DSSP, and n is an arbitrary scaling exponent (see DSSP_EXP). HbondScore_E is a continuous variable measuring the mean quality of the hydrogen bonds forming the β-sheets in the system, and HbondScore_H is the analog for α-helices. In principle, all the hydrogen bond energies are collected and divided by the value for the same number of good hydrogen bonds (see DSSP_GOODHB). The quantity can be capped, however, based on the choice for DSSP_MODE:
1. Every hydrogen bond can maximally contribute the value of DSSP_GOODHB. Therefore, HbondScore_X is always less than or equal to unity and approaches unity only if each and every relevant H-bond is at least as favorable as the cutoff given by DSSP_GOODHB. This is the most stringent score. The resultant X-scores will always be less than or equal to the corresponding X-fractions.
2. Every hydrogen bond can maximally contribute DSSP_MINHB, which is always more negative than DSSP_GOODHB. The value of HbondScore_X, however, is capped to be at most unity. In this score, very strong H-bonds can compensate the effects of a few weak ones, but the value of X-score is still capped by the corresponding X-fraction.
3. Every hydrogen bond can maximally contribute DSSP_MINHB. The value of HbondScore_X is not capped and can adopt values larger than unity. The X-score is capped, however, to never exceed unity. This is the most lenient score and the only one in which X-score can exceed the value of X-fraction.
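The three capping schemes can be sketched as follows. This is an illustrative reconstruction from the description above; the function name, default energies, and bookkeeping are hypothetical, not CAMPARI's actual implementation:

```python
def x_score(fraction, energies, n=2, mode=1, good_hb=-2.5, min_hb=-4.0):
    """Combine a DSSP E/H fraction with per-H-bond energies (kcal/mol,
    negative = favorable) according to the three DSSP_MODE variants."""
    if not energies:
        return 0.0
    if mode == 1:
        # each bond contributes at most DSSP_GOODHB -> HbondScore_X <= 1
        hb = sum(max(e, good_hb) for e in energies) / (len(energies) * good_hb)
    else:
        # each bond contributes at most DSSP_MINHB
        hb = sum(max(e, min_hb) for e in energies) / (len(energies) * good_hb)
        if mode == 2:
            hb = min(hb, 1.0)  # cap the H-bond term itself
    score = fraction * hb ** (1.0 / n)
    # in mode 3 only the final X-score is capped at unity
    return min(score, 1.0) if mode == 3 else score

# two bonds: one very strong, one weak; X-fraction of 0.5
e = [-4.0, -1.0]
s1 = x_score(0.5, e, mode=1)  # strict: the weak bond penalizes the score
s2 = x_score(0.5, e, mode=2)  # the strong bond compensates; capped at the fraction
```

With these two bonds, the strict mode penalizes the weak bond, whereas mode 2 lets the strong bond compensate so the X-score reaches the X-fraction.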
DSSP_EXP
For the DSSP analysis in CAMPARI (→ DSSPCALC), this keyword chooses the integer scaling exponent for the H-bond term in computing E- and H-scores (see DSSP_MODE).
DSSP_GOODHB
For the DSSP analysis in CAMPARI (→ DSSPCALC), this keyword defines the standard energy for a "good" hydrogen bond. This is used to evaluate the smoothed E- and H-scores (see DSSP_MODE) and is not part of the original DSSP standard. Permissible values lie between −1.0 and −4.0 kcal/mol.
DSSP_MINHB
For DSSP analysis (→ DSSPCALC), this keyword specifies the minimal (= lowest possible = most favorable) energy for any hydrogen bond. Since the DSSP formula is based on inverse distances, it is useful to introduce this lower cap such that conformations with steric overlap do not overly bias the analysis (for example in PDB analyses → PDBANALYZE). Permissible values lie between −10.0 and −4.0 kcal/mol.
DSSP_MAXHB
For DSSP analysis (→ DSSPCALC), this keyword allows the user to define the maximal (= highest possible = least favorable) energy for any hydrogen bond. This is the fundamental cutoff for DSSP to consider H-bonds and therefore a very important quantity for the analysis to be meaningful. The recommended value is −0.5 kcal/mol, but values between −1.0 and 0.0 kcal/mol are allowed.
DSSP_CUT
For DSSP analysis (→ DSSPCALC), this keyword defines the distance cutoff applied to the C_{α} atoms of two peptide residues to consider them for hydrogen bonds. This can be relatively short (defaults to 10 Å), but the accuracy hinges on the choice for DSSP_MAXHB. Consistency has to be ensured by the user. Using a C_{α} cutoff for prescreening residue pairs significantly reduces the computation time needed by the DSSP analysis.
CONTACTCALC
This keyword specifies the interval for performing contact analysis, i.e., how often to gather information about which and how many solute residues are close to each other. Such contacts are calculated according to two definitions in CAMPARI: by considering center-of-mass distances and by considering minimum atom-atom distances (both applied to pairs of residues). The output includes a map of average contact frequencies (CONTACTMAP.dat), histograms of contact numbers (CONTACT_HISTOGRAMS.dat), and a dependent analysis of solution structure by molecule (CLUSTERS.dat, MOLCLUSTERS.dat, and COOCLUSTERS.dat). The last analysis relies on an additional keyword: CLUSTERCALC. Note that these analyses are always restricted to residues of molecules tagged as solutes (→ FMCSC_ANGRPFILE) in order to facilitate frequent contact analysis even if solvent molecules are explicitly represented (which may be prohibitively expensive otherwise). When CAMPARI's shared memory (OpenMP) parallelization is in use, contact analysis is performed by a single thread while other threads perform other tasks concurrently (if there are any). Similar to DSSP analyses, the determination of spatial proximity patterns scales, at some level, poorly with system size and can thus become performance-limiting, which will similarly limit parallel efficiency.
CLUSTERCALC
This keyword (along with CONTACTCALC) controls the computation frequency for solute cluster statistics (i.e., cluster sizes, cluster contact orders, and molecule-resolved cluster statistics), where a cluster is defined through the minimum atom-atom distance contact definition (between any pair of residues). Note that this is the interval at which to perform cluster analysis from within the calculation of contacts (i.e., CLUSTERCALC is relative to CONTACTCALC, as SCATTERCALC is to RHCALC). The reason is that the cluster detection algorithm relies on the determination of contacts, but that it may not always be a meaningful analysis to perform (see CLUSTERS.dat, MOLCLUSTERS.dat, and COOCLUSTERS.dat for further details on the output).
CONTACTOFF
If contact analysis is requested (→ CONTACTCALC), this keyword defines a sequence-space offset to exclude neighboring residues from the analysis. For topologically connected systems (i.e., polymer chains), data for near-neighbor contacts such as i↔i+1 may be uninformative, as such residues will always be in contact on account of the underlying topology. Note that the omission only applies to intramolecular contacts. Setting this to zero includes everything (even i↔i), and any larger integer lets the analysis start from this sequence separation. The default is zero, and there is rarely a reason to change it.
CONTACTMIN
For contact and cluster analysis (→ CONTACTCALC), this keyword provides the threshold value for a residue-residue contact in Å. Here, the threshold is applied to the minimum distance between any arbitrary pair of atoms formed by the two residues in question. This defaults to 5.0 Å. Note that this computationally more expensive definition has the advantage of rendering the contact probabilities more or less size-independent for polyatomic residues. In the presence of excluded volume interactions, monoatomic residues (ions) of different size will still yield contact statistics that include physically meaningless biases, however.
CONTACTCOM
For contact and cluster analysis (→ CONTACTCALC), this keyword gives the alternative threshold value for a residue-residue contact in Å. Here, the threshold applies to the distance between the centers of mass of the two residues in question. It also defaults to 5.0 Å. Note that (in the presence of excluded volume interactions) contact probabilities obtained this way are by design dependent on the size of the interacting residues, and results may be misleading if contact statistics between pairs of residues with highly variable size are compared.
PCCALC
This keyword allows the user to specify how often to perform pair correlation analysis, i.e., to obtain distance counts for a variety of intra- and intermolecular distances and, in the case of intermolecular distances, proper normalization by the current volume element. It controls the computation frequency for three different classes of distance distributions:
1. Generic intramolecular amide-amide distributions covering various acceptor-donor pairs, as well as a centroid-centroid distribution (→ AMIDES_PC.dat); only relevant for polypeptide systems.
2. Generic intermolecular pair correlation functions for solutes (→ RBC_PC.dat); only relevant for systems with more than one solute. Note that this option can consume inordinate amounts of memory should a lot of different solute types be present. Workarounds consist of disabling this analysis or of using the analysis group feature to redeclare most of those as solvent molecule types and to use specific atom-atom distributions instead.
3. Specific atom-atom distributions and/or pair correlation functions as defined through an index file provided by keyword PCCODEFILE (→ GENERAL_PC.dat).
When CAMPARI's shared memory (OpenMP) parallelization is in use, each of these three analyses is treated independently. This means that each one is executed by a single thread while other threads perform other tasks concurrently (if there are any).
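As an illustration of what "proper normalization by the current volume element" means for the intermolecular case, the following generic radial distribution sketch divides raw counts per distance bin by the spherical shell volume times the ideal (uniform) pair density. All names and defaults are illustrative, not CAMPARI's internal code:

```python
import math

def radial_gr(distances, n_pairs, volume, binsize=0.2, nbins=50):
    """Turn raw pair distances into g(r): counts per shell divided by the
    shell volume times the ideal (uniform) pair density n_pairs/volume."""
    counts = [0] * nbins
    for d in distances:
        b = int(d / binsize)
        if b < nbins:
            counts[b] += 1
    rho_pair = n_pairs / volume  # density of pairs for an ideal (uniform) system
    gr = []
    for b, c in enumerate(counts):
        r_lo, r_hi = b * binsize, (b + 1) * binsize
        shell = 4.0 / 3.0 * math.pi * (r_hi ** 3 - r_lo ** 3)  # volume element
        gr.append(c / (shell * rho_pair))
    return gr

# one pair at distance 0.1 in a unit volume, purely for illustration
gr = radial_gr([0.1], n_pairs=1, volume=1.0)
```

Without the shell-volume division, raw counts grow as r² even for a structureless system; the normalization makes g(r) → 1 at large separations.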
DO_AMIDEPC
If pair correlation analysis is requested (→ PCCALC), this keyword enables the user to disable the computation of intramolecular amide-amide distance distribution functions (→ AMIDES_PC.dat) by setting it to zero.
PCBINSIZE
This keyword specifies the distance bin size in Å for pair correlation analysis (→ PCCALC).
PCCODEFILE
This keyword specifies the path and filename of the input file for requesting specific pair correlation or distance distribution analyses (see FMCSC_PCCODEFILE). It is also possible to generate instantaneous traces for the selected distances with keyword INSTGPC. In general, the input is rather flexible, and it is possible to pool many analogous or even unrelated atom-atom distances under a certain code or to use unique codes for very specific requests. Upon successful parsing of the input, and given that pair correlation analysis is globally requested (→ PCCALC), the output file GENERAL_PC.dat is created.
GPCREPORT
This logical keyword instructs CAMPARI whether or not to write out a summary of the terms requested through FMCSC_PCCODEFILE (→ GENERAL_PC.idx). It is only available if distance distribution / pair correlation analysis is in use (→ PCCALC).
INSTGPC
This keyword lets the user instruct CAMPARI how often to print out instantaneous values for all the specific distances selected via FMCSC_PCCODEFILE. Note that this does not include the generic distances CAMPARI analyzes; consequently, the keyword has no effect if no usable input has been provided via FMCSC_PCCODEFILE or, of course, if pair correlation analysis is not in use. This keyword is understood as a dependent frequency, i.e., a setting of 1 will print instantaneous values for every PCCALC^{th} step. Note that this feature is disabled by default and that the output in GENERAL_DIS.dat can easily become large.
SAVCALC
This keyword specifies how often to compute (or record) solvent-accessible volume (SAV) fractions and solvation states for the system. If the ABSINTH implicit solvent model is in use (→ SC_IMPSOLV), this analysis can rely on the current values for those quantities (no additional computational cost); otherwise, computing atomic SAV fractions incurs a moderate computational cost. The solvent-accessible volume will globally depend on the choice for the thickness of the assumed solvation shell (→ SAVPROBE). The mapped solvation states as reported for individual atoms (please refer to the ABSINTH publication for details) will depend on further ABSINTH parameters. Some of these can be adjusted through patches, e.g., user-supplied values for overlap reduction factors. SAV analysis creates at most three output files: an instantaneous one (SAV.dat) that depends on auxiliary keyword INSTSAV, an atom-resolved output file that reports simulation averages (→ SAV_BY_ATOM.dat), and finally a file containing distribution functions (histograms) of those quantities for selected atoms (→ SAV_HISTS.dat). The latter file depends on another auxiliary keyword, i.e., SAVATOMFILE. The instantaneous output is primarily useful as a diagnostic tool for the system while the simulation is running, and to be able to compute correlation functions, multidimensional histograms, etc. for quantities related to the solvation of specific sites on macromolecules. Please refer to the descriptions of the output files for further details.
When CAMPARI's shared memory (OpenMP) parallelization is in use, the SAV analysis is handled by a single thread while other threads perform other tasks concurrently (if there are any). However, if the ABSINTH DMFI is turned on, the analysis task simply consists of recording the values. Conversely, the evaluation of the DMFI is thread-assisted as usual.
INSTSAV
If analysis of solvent-accessible volume fractions is requested (→ SAVCALC), this keyword allows the user to have a quantity related to the total SAV, along with a running average, printed to a dedicated output file (→ SAV.dat). In addition, the values of the SAV fractions for selected atoms (via SAVATOMFILE) are written out. The latter allows the construction of correlation functions, multidimensional histograms, etc. The keyword (a positive integer) is interpreted as a printout frequency relative to the frequency with which SAV analysis is performed per se. This means that the effective printout interval will be SAVCALC·INSTSAV. Depending on the choices, the resultant output file can easily become very large, and it is possible for significant I/O lag to occur because of this.
SAVATOMFILE
If analysis of solvent-accessible volume fractions is requested (→ SAVCALC), this keyword specifies the location and name of a simple input file (a list of atomic indices; the format is described elsewhere) that allows the user to select a subset of the system's atoms for creating histograms of both SAV fraction and resultant solvation state (see above). These histograms are written to a dedicated output file (→ SAV_HISTS.dat). In addition, if instantaneous output of SAV-related quantities is requested (→ INSTSAV), the values of the SAV fractions for the selected atoms are written to the corresponding output file (SAV.dat). Note that instantaneous values for the SAV fractions allow manual computation (during post-processing) of solvation states (using parameters set in the key-file and/or reported in SAV_BY_ATOM.dat, and using the reference publication to retrieve the necessary expressions). It should be kept in mind that, with normal settings for SAVPROBE, SAV fractions of nearby atoms are tightly coupled. This means, for example, that requesting information for atoms that are covalently bound will rarely yield additional information. Lastly, the binning for the histograms is fixed and uses 100 bins across the interval from zero to unity (both quantities are restricted to this interval).
NUMCALC
This keyword is relevant only when the chosen thermodynamic ensemble allows for particle number fluctuations (i.e., the simulation is performed in the (semi-)grand canonical ensemble). It then specifies the number of Monte Carlo steps between successive accumulations of particle number histograms for each fluctuating particle type. For a description of the corresponding output file, please refer to PARTICLENUMHIST.dat.
COVCALC
This simple keyword instructs CAMPARI to collect raw data (signal trains) for select degrees of freedom in the system (currently restricted to all flexible dihedral angles → TRCV_xxx.tmp) every COVCALC steps. This is a near-obsolete functionality that has large practical and technical overlaps with the output written to FYC.dat via TOROUT. It was meant to provide intrinsic support for variance/covariance analyses, e.g., with the ultimate goal of performing dimensionality reduction. Given that merely raw data are provided and that dihedral angle data are generally circular (periodic) variables requiring the use of circular statistics (not as trivial as it may sound), usage of this facility is generally not recommended. This option is available in different modes (see COVMODE) and may eventually be revived or extended later. Note that CAMPARI can perform intrinsic principal component analysis (PCA) and time-lagged independent component analysis (tICA) as part of the structural clustering facility (→ CCOLLECT and PCAMODE).
COVMODE
This keyword chooses between (currently) two types of raw data to be provided by CAMPARI in output files TRCV_xxx.tmp. It can be set to:
1. Internal degrees of freedom (i.e., torsions) directly in torsional space (radians)
2. Internal degrees of freedom (i.e., torsions) expressed as their cosine and sine components
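The second representation exists because raw dihedral angles are circular variables: naive arithmetic averaging of the raw values fails at the periodic boundary, which the cosine/sine components avoid. A minimal illustration in generic Python (not tied to CAMPARI's output format):

```python
import math

angles = [175.0, -175.0]  # degrees; both lie near the +/-180 deg wrap

# naive arithmetic mean lands on the wrong side of the circle
naive_mean = sum(angles) / len(angles)  # -> 0.0

# circular mean via summed cos/sin components (what mode 2 provides per angle)
s = sum(math.sin(math.radians(a)) for a in angles)
c = sum(math.cos(math.radians(a)) for a in angles)
circ_mean = math.degrees(math.atan2(s, c))  # -> +/-180 deg, as expected
```

The same pitfall applies to variances and covariances, which is why the raw-angle mode requires proper circular statistics during post-processing.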
DIPCALC
This keyword specifies how often to compute molecular and residue-wise dipole moments for net-neutral molecules (or residues). Because the analysis relies on atomic partial charges, dipole analysis requires SC_POLAR to be set to a value larger than zero, as charges are otherwise not assigned. The (somewhat preliminary) analysis produces output files MOLDIPOLES.dat and RESDIPOLES.dat. When CAMPARI's shared memory (OpenMP) parallelization is in use, these analyses are executed by a single thread while other threads perform other tasks concurrently (if there are any).
EMCALC
This keyword specifies how often to compute spatial density distributions for the simulated system. If the density restraint potential is in use, this analysis is automatically performed at every step, given that it is computed regardless. The result is an averaged density on a three-dimensional grid of dimensions controlled generally by keywords EMDELTAS and SIZE. For nonperiodic boundaries, the evaluation grid cannot always be mapped to the system dimensions exactly, and keyword EMBUFFER becomes relevant. When using the density restraint potential, the grid serves both the purpose of analysis as described here and the purpose of evaluating the potential itself, which implies that it is an option to adopt the grid dimensions from the input density map. This is the default behavior for a cuboid system with 3D periodic boundary conditions when EMDELTAS is not provided. The resultant spatial density is that of a given atomic property selected by keyword EMPROPERTY. It is written to an output file in NetCDF format, an external library being required to use this feature. The details of the file format CAMPARI uses are described elsewhere. The spatial density is computed as follows:
ρ_{ijk} = ρ_{sol} + V_{ijk}^{−1} Σ_{n=1}^{N} [ X_{n} − γ_{n}V_{n}ρ_{sol} ] Π_{d=1}^{3} B_{A}( r_{n}^{d} − P_{ijk}^{d} )
Here, V_{ijk} is the volume of the grid cell with indices i, j, and k, N is the number of atoms in the system, X_{n} is the target property of the atom with index "n", V_{n} is that atom's volume, and r_{n}^{d} are the three components of its position vector. The parameter γ_{n} is a pairwise volume overlap reduction factor that corrects the atomic volume for overlap with covalently bound atoms. It is explained in some detail elsewhere. The parameter ρ_{sol} sets a physical background density for the property in question; this is relevant when not all matter contributing to the property density in the system is represented explicitly. In such a case, an assumed vacuum would lead to severe errors. Note that atomic volumes and volume reduction factors are no longer relevant if ρ_{sol} is zero in the above equation. Finally, the product in the above equation utilizes cardinal B-spline functions of order "A", B_{A}, which are assumed centered at the center of each grid cell (vector P_{ijk} with components P_{ijk}^{d} for each dimension). This technique of distributing a property on a lattice is shared with the particle-mesh Ewald method.
Like the corresponding density restraint potential, the accumulation of these data has been parallelized, i.e., when CAMPARI's shared memory (OpenMP) parallelization is in use, all threads work on this task synchronously. If the potential is turned on, there is no significant additional cost; otherwise, the grid has to be incremented with the current configuration. Parallel efficiency can suffer in this mode of operation if subsequent snapshots have very different configurations (due to poor load balance). Parallel efficiency is generally poor if the lattices are large (in number of grid cells) relative to the number of atoms.
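The B-spline spreading in the equation above can be sketched in one dimension. The recursion is the standard cardinal B-spline (as used in particle-mesh Ewald); the grid setup and function names are illustrative, not CAMPARI code:

```python
def bspline(a, x):
    """Cardinal B-spline of order a with support [0, a] (Cox-de Boor recursion)."""
    if a == 1:
        return 1.0 if 0.0 <= x < 1.0 else 0.0
    return (x * bspline(a - 1, x) + (a - x) * bspline(a - 1, x - 1.0)) / (a - 1)

def spread_1d(positions, weights, ncells, delta, order=3):
    """Distribute per-atom property values X_n onto a 1D lattice of ncells
    cells of width delta, with the spline centered on each cell center."""
    grid = [0.0] * ncells
    for x, w in zip(positions, weights):
        for i in range(ncells):
            center = (i + 0.5) * delta
            # shift so the spline support is centered on the cell center
            grid[i] += w * bspline(order, (x - center) / delta + order / 2.0)
    return grid

# a single "atom" with property value 2.0 at x = 5.0 on a 10-cell, 1 Å grid
g = spread_1d([5.0], [2.0], ncells=10, delta=1.0, order=3)
```

Order A = 1 reduces to simple binning; because the shifted B-splines form a partition of unity, the total deposited property is conserved regardless of the order, which only controls how smoothly it is smeared across neighboring cells.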
EMDELTAS
If the density restraint potential is not in use, but spatial density analysis is requested, this keyword is mandatory and sets the lattice cell size of the analysis grid by providing three floating point numbers corresponding to the lattice cell sizes in Å for the x, y, and z dimensions, respectively.
Conversely, if the density restraint potential is in use, this keyword is optional and allows the user to set a lattice cell size different from the one used by the input density map. The keyword again requires the specification of three floating point numbers that set the lattice cell sizes in Å for the x, y, and z dimensions of the analysis and evaluation grid, respectively. Note that acceptable choices require that it be possible to superpose the cells of the input density map exactly with the analysis grid after reducing its resolution to that of the input map. Minor adjustments may be made automatically to the system size and/or the origin of the input map. If, for example, in the x-dimension the input map has 10 cells of width 2 Å, and the evaluation grid has 26 cells of width 1 Å, then the system origin has to be chosen such that the left boundary of the first cell of the input density aligns with the left boundary of the first, third, or fifth cell of the evaluation grid (but not any others). In the same example, CAMPARI would reject a system size of 25 Å, because the resultant number of cells in the x-dimension would not be divisible by the integer factor corresponding to the difference in resolution (here 2). It would also reject an origin aligning the first input cell to the seventh evaluation grid cell, because this would mean that the input map extends beyond the system boundaries. Finally, implied boundary conditions of the input map are not made to correspond to system boundary conditions automatically. For any periodic boundaries of the system, the evaluation grid is and must be fit exactly to the system dimensions.
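The divisibility and alignment rules in the example above can be sketched with simple integer arithmetic. This helper is purely illustrative (not a CAMPARI routine), and its handling of the far boundary is simplified relative to CAMPARI's actual checks:

```python
def grids_compatible(n_eval, d_eval, n_in, d_in, offset_cells):
    """Return True if an input map (n_in cells of width d_in) can be
    superposed on an evaluation grid (n_eval cells of width d_eval),
    with the input map starting at evaluation cell index offset_cells."""
    ratio = d_in / d_eval
    if abs(ratio - round(ratio)) > 1e-9:  # resolutions must differ by an integer factor
        return False
    ratio = round(ratio)
    if n_eval % ratio != 0:       # grid must reduce cleanly to the input resolution
        return False
    if offset_cells % ratio != 0:  # input cell boundaries must align with coarse boundaries
        return False
    return offset_cells + n_in * ratio <= n_eval  # input map must fit inside the system

# the example from the text: 10 input cells of 2 A vs. 26 evaluation cells of 1 A
ok_first = grids_compatible(26, 1.0, 10, 2.0, 0)   # first cell: accepted
ok_third = grids_compatible(26, 1.0, 10, 2.0, 2)   # third cell: accepted
bad_size = grids_compatible(25, 1.0, 10, 2.0, 0)   # 25 cells not divisible by 2: rejected
bad_align = grids_compatible(26, 1.0, 10, 2.0, 1)  # misaligned boundaries: rejected
```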
EMPROPERTY
If spatial density analysis is requested, or if the density restraint potential is in use, this keyword lets the user pick an atomic property to be distributed on a lattice. If this is supposed to work as a density restraint, only the first two options are available at the moment:
1. Use atomic mass (resultant units are g/cm^{3})
2. Use atomic number, i.e., proton mass (resultant units are also g/cm^{3} for convenience)
3. Use atomic charge (resultant units are e/Å^{3})
Note that additional options may be made available in the future.
EMBGDENSITY
If spatial density analysis is requested, or if the density restraint potential is in use, this keyword sets an assumed background level for the atomic property in question. In general, the value should be zero if all relevant matter in the system is represented explicitly, i.e., if empty space is indeed meant to correspond to a vacuum. If not, the value should be given in appropriate units depending on the property the density is derived from: g/cm^{3} for mass and proton densities (atomic number), and e/Å^{3} for charge.
EMBUFFER
If spatial density analysis is requested, or if the density restraint potential is in use, this keyword sets a ratio for how much to extend the evaluation grid for spatial densities beyond any nonperiodic boundaries of the system. In the direction of a nonperiodic boundary, CAMPARI takes the maximum dimension (e.g., the diameter of a sphere) and multiplies it with this factor to obtain the (approximate) size of the rectangular cuboid grid. Alignment with a potential input grid is achieved by shifting the origin of the evaluation grid slightly. Note that the behavior will generally be undefined for cases where solute material samples positions off the evaluation grid. It is up to the user to ensure that the buffer spacing is big enough, given the stiffness of the boundaries, to prevent this from happening.
EMBSPLINE
If spatial density analysis is requested, or if the density restraint potential is in use, this keyword sets the order of the B-splines used to distribute the atomic property of interest on the lattice. This setting corresponds to parameter "A" in the equation above. B-splines of order 3 or higher lead to functions with smooth derivatives and are appropriate for gradient-based methods. B-splines have finite support, and the cost per atom will increase with A^{3} for a three-dimensional lattice. The limiting case of A being unity corresponds to a simple binning function, whereas for large A, a Gaussian function is recovered. The effective width does not grow linearly with A; rather, it is the tails of the functions that grow. This implies that very large values for A are probably not a useful investment of CPU time. Note that the effective width of the B-spline can be thought of as setting an inherent resolution or averaging scale for the atom in question, since it replaces a point function with a distribution. The choice for this keyword should therefore be made in concert with the choice of formal grid resolution.
DIFFRCALC
This keyword specifies how often to compute approximate fiber diffraction patterns for the whole system (excluding ghost particles in GC simulations → ENSEMBLE). The system is aligned according to an assumed fiber axis (see DIFFRAXIS), and amorphous diffraction patterns using cylindrical coordinates (through a Fourier-Bessel transform) are computed. The code currently assumes atomic scattering cross sections that are proportional to atomic mass, with the additional modification that all hydrogen atoms are excluded from the diffraction calculation. Specifically, the atomic scattering function for heavy atom i is proportional to m_{i}/m_{C}, with a proportionality constant yielding units of the square root of scattering intensity. It is zero for hydrogen atoms. See DIFFRACTION.dat for more details. As a cautionary comment, it should be noted that these calculations are somewhat untested and that output should be carefully examined. When CAMPARI's shared memory (OpenMP) parallelization is in use, the diffraction pattern is calculated by a single thread while other threads perform other tasks concurrently (if there are any). This is a limitation because of the high inherent cost of this analysis.
DIFFRRMAX
For diffraction calculations (→ DIFFRCALC), this specifies the maximum number of bins in the reciprocal radial dimension (r in cylindrical coordinates). The resultant bins will be centered around zero.
DIFFRZMAX
For diffraction calculations (→ DIFFRCALC), this specifies the maximum number of bins in the reciprocal axial dimension (z in cylindrical coordinates). The resultant bins will be centered around zero.
DIFFRRRES
For diffraction calculations (→ DIFFRCALC), this gives the resolution in the reciprocal radial dimension (r in cylindrical coordinates) in Å^{-1}.
DIFFRZRES
For diffraction calculations (→ DIFFRCALC), this gives the resolution in the reciprocal axial dimension (z in cylindrical coordinates) in Å^{-1}.
DIFFRJMAX
This defines the maximum order of Bessel functions to use in the Fourier-Bessel (Hankel) transform to generate the (fiber) diffraction pattern (→ DIFFRCALC). Note that the transform takes the product of actual and reciprocal radial coordinate as its argument. Hence, the maximum order will determine how meaningful the generated information for large values of inverse radial dimensions is. This soft cutoff will scale reciprocally with the size of the system in the radial dimension. These features arise from the fact that Bessel functions of order n only contribute nonzero values beyond a (unitless) argument value of ca. n. Also note that the input file for the Bessel functions (see FMCSC_BESSELFILE) needs to provide the tabulated functions up to the necessary order.
DIFFRAXIS
For diffraction calculations (→ DIFFRCALC), it is possible (and usually meaningful and necessary) to use a fixed system axis as the assumed fiber axis. This is (naturally) particularly appropriate for single-point calculations on specific structures. The axis' x, y, and z components have to be provided as three floating point numbers. The length of the vector is not important. The axis will pass through the point defined (see DIFFRAXON). If this keyword is not specified, the program will identify the longest possible atom-atom distance in the system and use the resultant axis. Note that this axis will then not be constant with respect to the absolute (lab) coordinates; this mode is meant to cover cases where changes in configuration are allowed (especially if rigid-body movement is permitted).
DIFFRAXON
This keyword specifies the point the (constant) axis (see DIFFRAXIS) for diffraction analysis (→ DIFFRCALC) will pass through. This will define the zero-point in the z-coordinate, and hence the origin of the cylindrical coordinate system. If this keyword is not provided, CAMPARI will assume the {0.0 0.0 0.0} point for this (independent of specifications for the system origin).
REOLCALC
This keyword is only relevant in MPI replica exchange calculations (or parallel trajectory analysis runs using the same setup). It instructs CAMPARI to compute various overlap measures between the different Hamiltonians employed in the REMC/D run (see N_XXX_OVERLAP.dat). Note that this relies on the evaluation of the system energy at different conditions, i.e., Hamiltonians. Unless the only exchange dimension is temperature, CAMPARI makes the assumption that the energy has to be fully reevaluated for each condition, which means that there is a significant cost associated with the overlap calculation. Cutoffs and long-range corrections (see keywords CUTOFFMODE, LREL_MC, and LREL_MD) are always respected by these additional evaluations of cross (or foreign) energies. In dynamics runs, an additional complication arises if neighbor list updates are performed infrequently (see NBL_UP). Here, CAMPARI enforces an extra update of neighbor lists that is always out-of-sync with the schedule of the simulation propagation (this is for technical reasons). The unfortunate consequence is that, for identical random seed, trajectories are not going to be identical if NBL_UP is greater than 1 and overlap calculations are performed with different frequencies. The user controls whether to calculate foreign energies across all replicas (see REOLALL). If only neighboring conditions are requested, output in N_XXX_OVERLAP.dat may be truncated or uninformative. It is important to mention that the MC branch of the energy functions is used only in plain REMC calculations, and that in all other cases (including hybrid methods → DYNAMICS) the dynamics branch is used. This is important since cutoff and long-range treatments can easily be inconsistent between the two (see LREL_MC and LREL_MD). Because the main cost of the overlap calculation is the evaluation of "foreign" energies, CAMPARI's shared memory (OpenMP) parallelization can employ its full parallelization scope for this task.
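As an illustration of the general idea (the specific measures written to N_XXX_OVERLAP.dat are documented with that file), a simple overlap measure between the energy distributions sampled at two conditions is the shared histogram mass. The sketch below is generic and not CAMPARI code:

```python
def histogram_overlap(e1, e2, nbins=20):
    """Fraction of shared probability mass between two energy samples,
    computed as the sum over bins of min(p1, p2)."""
    lo = min(min(e1), min(e2))
    hi = max(max(e1), max(e2))
    width = (hi - lo) / nbins or 1.0  # guard against constant samples
    def hist(e):
        h = [0] * nbins
        for x in e:
            h[min(int((x - lo) / width), nbins - 1)] += 1
        return [c / len(e) for c in h]
    return sum(min(a, b) for a, b in zip(hist(e1), hist(e2)))

# identical samples overlap almost fully; well-separated samples not at all
print(histogram_overlap([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # ≈ 1.0
print(histogram_overlap([0.0, 0.1], [10.0, 10.1]))           # 0.0
```

Good overlap between neighboring conditions is the usual prerequisite for reasonable exchange acceptance rates in REMC/D.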
REOLINST
This keyword is only relevant in MPI replica exchange calculations (or parallel trajectory analysis runs using the same setup). It requests instantaneous "foreign" energies to be written (see N_XXX_EVEC.dat). "Foreign" or "cross" energies are simply the energies of the current structure evaluated at Hamiltonians different from the one generating the ensemble. Note that the user controls whether to calculate foreign energies across all replicas (see REOLALL). If only neighboring conditions are requested, a truncated vector (length 2 or 3) is provided in N_XXX_EVEC.dat. To facilitate frequent overlap analysis with sparser instantaneous output, this keyword is interpreted as a subordinated frequency for REOLCALC (as SCATTERCALC is relative to RHCALC).
REOLALL
This keyword is only relevant in MPI replica exchange calculations (or parallel trajectory analysis runs using the same setup). It is interpreted as a simple logical which determines whether "foreign" energies are computed over all other or just the neighboring replicas (see N_XXX_EVEC.dat and N_XXX_OVERLAP.dat).
TRACEFILE
This optional keyword is relevant for the postprocessing of two types of parallel simulation runs. First, if a parallel trajectory analysis run in the RE setup is performed (→ details elsewhere), it allows the user to supply a file with a running map of replicas to starting conditions. Details of format and interpretation are given elsewhere. The default map assumed by CAMPARI is the identity mapping 1..REPLICAS. If a trace file is provided, sets of step number and an updated map for that specific step are read. This is primarily meant to make replica exchange trajectories that are continuous in condition (i.e., contain conformational jumps) continuous in conformation instead (i.e., afterwards they contain jumps in condition). In such a case, the trace file is the history of replica exchange moves such as output by CAMPARI itself. CAMPARI will then recombine information from the input trajectories according to the trace. This means that all analyses performed are on the unscrambled trajectory, which can of course also be written (→ XYZOUT). Naturally, this keyword can also be used to specify any other map for other applications, e.g., to create trajectories for obtaining bootstrap-type error estimates. The relation of step numbers in the trace file to frames in the trajectories is handled by keywords RE_TRAJOUT and RE_TRAJSKIP. Second, if a serial trajectory analysis run or a parallel trajectory analysis run using the MPI averaging framework is performed, it can be used to postprocess data from a parallel PIGS run. PIGS runs provide their own trace file. For the serial analysis, the trajectories from individual replicas must be concatenated in numerical order; otherwise, they should be left as is. Unless the trace file itself is edited (its first column has the step number), keywords RE_TRAJTOTAL, RE_TRAJOUT, and RE_TRAJSKIP define the output settings for the original simulation run, and the settings must be matched exactly.
For example, with 4 replicas, XYZOUT 50, NRSTEPS 1000, and EQUIL 500, each trajectory will have 10 snapshots. In trajectory analysis mode, the concatenated trajectory (40 snapshots) can then be supplied with settings of RE_TRAJTOTAL 10, RE_TRAJOUT 50, RE_TRAJSKIP 500, and NRSTEPS 40. Alternatively, the set of individual trajectories can be supplied to a parallel analysis run using NRSTEPS as 10 instead. The trace file is processed exclusively in the context of network-based analyses (see CCOLLECT, CMODE, output file STRUCT_CLUSTERING.graphml, and so on). Reading in the PIGS trace accomplishes the automatic removal and addition of (conformational) network links incurred by the PIGS protocol. Overlapping functionality is provided by keywords TRAJBREAKSFILE and TRAJLINKSFILE, but these are only available in serial analysis mode. Note that a PIGS analysis run (see elsewhere for details) does, of course, not process the PIGS trace, as it emulates the behavior of only a single PIGS stretch (reseeding interval).
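The unscrambling performed with a trace can be sketched generically: for each condition, the output trajectory is reassembled by picking, at every frame, the snapshot from whichever replica currently holds that condition. The data layout and the trace representation below are simplified stand-ins for the actual file formats (which are documented elsewhere):

```python
def unscramble(trajs, trace):
    """trajs: dict replica -> list of snapshots (continuous in replica).
    trace: list of (frame_index, mapping), where mapping[condition] = replica
    holds from that frame onward.  Returns dict condition -> snapshot list
    that is continuous in condition instead."""
    conditions = list(trace[0][1].keys())
    nframes = len(next(iter(trajs.values())))
    updates = dict(trace)            # frame -> updated mapping
    out = {c: [] for c in conditions}
    cur = dict(trace[0][1])
    for f in range(nframes):
        if f in updates:             # apply the map valid from this frame on
            cur = dict(updates[f])
        for c in conditions:
            out[c].append(trajs[cur[c]][f])
    return out

# two replicas swap conditions at frame 2
trajs = {0: ["a0", "a1", "a2", "a3"], 1: ["b0", "b1", "b2", "b3"]}
trace = [(0, {0: 0, 1: 1}), (2, {0: 1, 1: 0})]
print(unscramble(trajs, trace)[0])  # ['a0', 'a1', 'b2', 'b3']
```

The real trace additionally requires translating simulation step numbers to frame indices, which is what keywords RE_TRAJOUT and RE_TRAJSKIP control.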
RE_TRAJOUT
This keyword is relevant for some trajectory analysis runs. In particular, those runs relying on an input file with the reseeding/exchange history of a parallel simulation run need to translate the information about step numbers in this file to the analyzed data. This keyword therefore lets the user set the trajectory output frequency CAMPARI is supposed to assume for the supplied input trajectories (separate or concatenated) being analyzed. This is important because the trace uses simulation step numbers that are not preserved in trajectory analysis mode (no step number or time information from input trajectories is read and used). If a parallel analysis run in the replica exchange setup is performed, a successful unscrambling of the trajectories requires that the exchange trace is exhaustive at the level of the output frequency of this keyword. This means that it is sufficient to provide the current map of condition to starting structure for every snapshot in the input trajectories (more information can be supplied without harm; less information will lead to errors). In the replica exchange case, keyword RE_TRAJSKIP is also essential. If an analysis run on a PIGS data set is performed (possibly in parallel), the trace must contain information about all reseedings. Here, both keywords RE_TRAJSKIP and RE_TRAJTOTAL are processed as well.
Unlike in the cases outlined above, which are both related to processing a trace file, this keyword attains a different function in a parallel PIGS analysis run. Since such a run is supposed to emulate the PIGS heuristic, it serves as an output control setting to compute the step number as RE_TRAJTOTAL times RE_TRAJOUT, which is then printed to the output trace file.
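Under these conventions, the simulation step associated with the f-th stored frame (1-based) is RE_TRAJSKIP + f·RE_TRAJOUT, consistent with the worked example given under TRACEFILE. A trivial sketch of this bookkeeping (variable names are illustrative, not CAMPARI internals):

```python
def frame_to_step(frame, re_trajout, re_trajskip):
    """Map a 1-based trajectory frame index back to the simulation step
    at which it was written (equilibration steps produced no frames)."""
    return re_trajskip + frame * re_trajout

# with an output frequency of 50 and an equilibration period of 500,
# frame 1 was written at step 550 and frame 10 at step 1000
print(frame_to_step(1, 50, 500))   # 550
print(frame_to_step(10, 50, 500))  # 1000
```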
RE_TRAJSKIP
This keyword is relevant for some trajectory analysis runs. In particular, those runs relying on an input file with the reseeding/exchange history of a parallel simulation run need to translate the information about step numbers in this file to the analyzed data. This keyword therefore lets the user set the equilibration period for trajectory output that CAMPARI is supposed to assume for the supplied input trajectories (separate or concatenated) being analyzed. This is important because the trace uses simulation step numbers that are not preserved in trajectory analysis mode (no step number or time information from input trajectories is read and used). Both RE_TRAJOUT and this keyword are required for CAMPARI to correctly relate the frames in the trajectories to the step numbers in the trace file. Of course, it is also possible to edit the file with the trace to match the saved trajectory data exactly, and to then set RE_TRAJOUT and RE_TRAJSKIP to 1 and 0, respectively. In the case of a PIGS run being analyzed, keyword RE_TRAJTOTAL is also essential.
RE_TRAJTOTAL
If a serial trajectory analysis run or a parallel trajectory analysis run is performed and a file with the PIGS reseeding history (trace) has been provided, this keyword lets the user set the length in numbers of snapshots per replica that CAMPARI is supposed to assume for the trajectory input (→ elsewhere). In the serial case, this is usually NRSTEPS/REPLICAS, whereas in the parallel case it is just NRSTEPS. This is important because the trace uses simulation step numbers that are not preserved in trajectory analysis mode (no step number or time information from the input trajectory is read and used). RE_TRAJOUT, RE_TRAJSKIP, and this keyword are required for CAMPARI to correctly relate the frames in the trajectories to the step numbers in the trace file. Note that when using an input file with subsets of frames in random-access mode, this keyword has to be adjusted to the actual number of selected frames per replica, which still has to be constant. Unlike in the case of processing a trace file, this keyword attains a different function in a parallel PIGS analysis run. Since such a run is supposed to emulate the PIGS heuristic, it serves as an output control setting to compute the step number as RE_TRAJOUT times RE_TRAJTOTAL, which is then printed to the output trace file.
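For a concatenated input trajectory, recovering the replica and the local frame for a global snapshot index is simple integer arithmetic; a sketch assuming 0-based indexing and ordering by increasing replica number, as described above:

```python
def locate(global_index, re_trajtotal):
    """Split a 0-based index into a concatenated multi-replica trajectory
    into (replica, local_frame), given RE_TRAJTOTAL frames per replica."""
    return divmod(global_index, re_trajtotal)

# 4 replicas with 10 snapshots each: global index 23 is frame 3 of replica 2
print(locate(23, 10))  # (2, 3)
```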
CCOLLECT
This keyword controls the frequency with which a selected set of features (see CDISTANCE and CFILE) extracted from the trajectory data (typically in a trajectory analysis run → PDBANALYZE) is stored in a large array in memory for postprocessing. Such postprocessing currently consists of different algorithms (→ CMODE), for example to identify structural clusters in the data, and is performed after the last step of the run has completed. If CCOLLECT is set to something larger than the number of simulation steps (NRSTEPS), the clustering analysis is disabled (this is also the default). Various output will be produced aside from information written directly to standard out or the log file. At the most basic level, the extracted features themselves, after the various preprocessing steps outlined below, can be written to disk in an optional output file (see CLUSTERING_FEATURES.nc and keyword CDUMP). The most common output file is a list of cluster annotations per analyzed snapshot (→ STRUCT_CLUSTERING.clu) that is produced along with a helper script for the visualization software VMD (→ STRUCT_CLUSTERING.vmd). Furthermore, CAMPARI will print a file representing the clustering as a graph in an XML-based (so-called "graphml") format (→ STRUCT_CLUSTERING.graphml). Taken together, these files allow further analyses of the clustering, primarily those that take advantage of the fact that the clustering yields a complex network/graph (e.g., cut-based free energy profiles using committor probabilities).
All clustering algorithms and also the progress index algorithm (→ CMODE) will write a number of diagnostic and reporting summaries to log output. For clustering algorithms, this includes a summary of the determined clusters (usually involving at least the number of contained snapshots and a measure of size). The exact progress index method is an exception, as it does not explicitly record a clustering (the three aforementioned output files are missing). With any progress index method in use, at least one additional output file is obtained. This file is the essential requirement to create plots as in the progress index reference.
Note that structural clustering breaks the typical CAMPARI paradigm of "on-the-fly" analysis, since the bulk of the CPU time for analysis will be invested only at the very end. Therefore, structural clustering will most often be used in trajectory analysis runs, as it would be highly undesirable to risk an unclean termination of an actual simulation (certain algorithms for structural clustering require large amounts of memory and/or CPU time). Note as well that structural clustering should not be confused with the (much simpler) analysis of molecular clusters (see CLUSTERCALC and its corresponding output files). Because structural clustering and related analyses can be CPU time-intensive tasks, they are handled by CAMPARI's shared memory (OpenMP) parallelization, i.e., many algorithms are tackled by all threads at once. Most importantly, the tree-based clustering and the approximate progress index method (options 4 and 5 to keyword CMODE) as well as iterative algorithms operating on derived graphs (see MAXTIME_ITERS for details) have been parallelized this way. Details are provided below, in particular for keyword CMODE.
A special remark is required for simulation runs using the MPI averaging technique. Similar to any use of the clustering functionality "on-the-fly", trajectory output should be generated in accordance with the setting for CCOLLECT (most easily by using MPIAVG_XYZ and a matching value of XYZOUT). This is so the clustering results can be annotated and understood at all. In an MPI averaging run, CAMPARI will then at each collection step gather data from all replicas and store them in an array allocated exclusively by the master process. The data arrangement is such that trajectories will be continuous and ordered by increasing replica number. The concatenation introduces spurious transitions that may affect subsequent computations. Data collection causes a synchronization and communication requirement absent in other types of MPI averaging calculations. At the end of the simulation, the resultant concatenated trajectory is analyzed exclusively by the master process, which, depending on settings and algorithms in use, may lead to severe imbalances in terms of both memory consumption and CPU time requirements. This should be kept in mind when using this approach across machines not sharing any memory. To enforce the complementary behavior of every identical replica analyzing its own trajectory, it is possible to use a fake replica exchange run by using a single dummy (or irrelevant) parameter for exchange. In a hybrid MPI/OpenMP calculation, the OpenMP layer on the master process performing the analysis will be limited to the number of threads granted initially, even though other MPI processes residing in the same shared-memory environment will be idle during this time. Note that the feature extraction itself, which is the only task performed during the run, does not benefit from thread parallelization except for the calculation of dynamic weights for options 2 or 4 for keyword CDISTANCE.
Conversely, in an MPI averaging run, feature extraction does occur in parallel (across replicas), as outlined above.
Because the chosen set of degrees of freedom often is a superset of an unknown subspace of particular interest to the user, CAMPARI offers two common routes for a dimensionality reduction. These rely on standard linear algebra techniques and are available if i) the chosen proximity metric is not circular (this excludes options 1-2 for CDISTANCE); ii) the code was linked to a linear algebra library (LAPACK-compliant, see installation instructions for general information on linking libraries); and iii) there are more samples than variables (degrees of freedom). The reason that circular (periodic) data are currently not supported is that the required measures of variance and in particular covariance become somewhat empirical and laborious to compute. If this type of transformation is performed (→ PCAMODE), CAMPARI produces up to two output files, one containing the eigenvectors themselves (PRINCIPAL_COMPONENTS.evs) and another, optional one containing the data matrix in the transformed space (PRINCIPAL_COMPONENTS.dat). The latter can be used to derive probability or free energy surfaces in reduced-dimensional spaces.
PCAMODE
If data for structural clustering are collected (→ CCOLLECT), this keyword instructs CAMPARI to calculate and perform a linear transformation on the collected data. As mentioned above, this option is not available for all measures of conformational distance. The linear algebra works straightforwardly for options 3 and 7 for keyword CDISTANCE and always involves centering the data first (subtraction of dimension-wise means). For options 4, 9, and 10 (local weights), the locally adaptive weights are averaged, and the input data are scaled by dimension-wise average weights. The same scaling idea is used for option 8 (global weights). Lastly, the possibility of alignment of 3D coordinates (options 5, 6, and 10 → CALIGN) causes additional complications. The general strategy here is to first align all snapshots to the last one (static alignment), which may or may not provide a meaningful description. Five options are currently available:
1. No transformation is performed.
2. Principal component analysis (PCA) is performed via singular value decomposition (SVD), and the eigenvectors of the covariance matrix are written to a dedicated output file. PCA works by identifying linear transforms of the centered data that collect maximal sample variance in as few components as possible. The principal components are normalized and orthogonal, i.e., they have unit length and zero (linear) covariance. The latter should not be equated with a lack of correlation: many nonlinear correlations between variables yield zero covariance. The amount of variance contained in the first few components can differ dramatically between data sets. The printed eigenvectors and eigenvalues are the only result of this analysis, i.e., the transform is not actually used.
3. This is the same as the previous option with an important difference: here, the original sample data are transformed to centered data in PCA space. The transformed data set is written to an additional output file. If keyword CREDUCEDIM is not zero, the original data are overwritten and lost, and any algorithm relying on conformational distance evaluations thereafter will treat these as the simplest case (CDISTANCE becomes 7). This is because any weighting or alignment requests were taken care of beforehand. The benefit of using CREDUCEDIM is to be able to obtain a more informative representation in a space of reduced dimensionality in an unsupervised fashion.
4. Time structure-based independent component analysis (tICA) is performed, which is based on original work from the 1990s. tICA solves the matrix equation ΤF=ΣFΛ, where Σ is the covariance matrix, Τ is a time-lagged and symmetrized covariance matrix (the lag time is set by keyword CLAGTIME), F is the matrix of eigenvectors, and Λ is a diagonal matrix of eigenvalues, which correspond to the values of the autocorrelation function at the specified lag time for the transformed variables. Unlike in PCA, the eigenvectors do not form an orthonormal basis (rather, they satisfy F^{T}ΣF=I_{D}). This means that, unlike in PCA, the transformed data do not preserve Euclidean distances between points even if the full dimensionality is used. As in option 2, the printed eigenvectors and eigenvalues are the only result of this analysis, i.e., the transform is not actually used.
5. This option is to option 4 what option 3 is to option 2, i.e., the original sample data are transformed to tICA space and centered, with the aforementioned options, implications, and consequences.
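To illustrate the essence of the PCA options, the leading principal component can be extracted even without a linear algebra library by running power iteration on the covariance matrix. This toy sketch is not how CAMPARI does it (CAMPARI relies on SVD via a LAPACK-compliant library), and all names are illustrative:

```python
import math, random

def leading_pc(data, iters=200):
    """Leading principal component of a list of equal-length samples,
    via power iteration on the (small) covariance matrix."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    centered = [[row[j] - means[j] for j in range(k)] for row in data]
    # covariance matrix C (k x k)
    C = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(k)]
         for i in range(k)]
    v = [random.random() for _ in range(k)]
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

random.seed(1)
# synthetic data spread mostly along the (1, 1) diagonal
data = [[t + random.gauss(0, 0.01), t + random.gauss(0, 0.01)]
        for t in [-2, -1, 0, 1, 2]]
pc = leading_pc(data)
print(pc)  # close to (±1/sqrt(2), ±1/sqrt(2))
```

The sign of the eigenvector is arbitrary, which is also true for the eigenvectors written to PRINCIPAL_COMPONENTS.evs by any SVD-based implementation.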
CDISTANCE
If data for structural clustering are collected (→ CCOLLECT), this keyword defines what type of data to collect and how to define structural proximity. There are currently 10 supported options:
1. This option is tailored toward the intrinsic degrees of freedom of a typical CAMPARI simulation that are also the essential internal degrees of freedom of most molecular systems, i.e., the molecules' dihedral angles.
The values {φ_{k}} for a set of K dihedral angles are collected throughout the run.
A list can be provided by using a dedicated input file (→ CFILE), otherwise most of
CAMPARI's internal degrees of freedom are used (excluding those pertaining to the conformation of five-membered rings). The details of the set of eligible dihedral angles are controllable by keyword
TMD_UNKMODE. More information can be found in the
description of the input file. The distance between two states is given as:
d_{l↔m} = [ (1.0/K) · Σ_{k}^{K} ( (φ_{k}^{l} − φ_{k}^{m}) mod 2π )^{2} ]^{1/2}
Because dihedral angles are periodic (circular) quantities, a meaningful metric of proximity must account for the boundary conditions, hence the "mod 2π" term (the difference is mapped to its minimum image). Dihedral angle-based clustering poses, aside from periodicity, the challenge that all considered degrees of freedom are bounded and that the strongest contribution to the signal will come from those torsions with large variance, which unfortunately are often the ones of least interest (for example, sidechain torsions). Therefore, a careful selection of the subset to use is critical for an informative clustering. Like any other method, dihedral angle-based clustering is vulnerable to Euclidean distances in high-dimensional spaces becoming uninformative. Note that all dihedral angle-based proximity criteria are useful primarily for single molecules, since relative intermolecular orientations are not representable whatsoever.
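The minimum-image treatment implied by the "mod 2π" term can be sketched as follows (angles in radians; a generic illustration, not CAMPARI code):

```python
import math

def torsional_rmsd(phis_l, phis_m):
    """Distance between two snapshots given as lists of dihedral angles,
    with each difference wrapped into the minimum image."""
    k = len(phis_l)
    s = 0.0
    for a, b in zip(phis_l, phis_m):
        d = (a - b + math.pi) % (2 * math.pi) - math.pi  # wrap into [-pi, pi)
        s += d * d
    return math.sqrt(s / k)

# 179 deg and -179 deg differ by 2 deg across the boundary, not by 358 deg
a = [math.radians(179.0)]
b = [math.radians(-179.0)]
print(math.degrees(torsional_rmsd(a, b)))  # ~2.0
```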
2. This is identical to the previous option, only that each dihedral angle is also associated with a locally adaptive weight. Adaptive weights
are those that change from snapshot to snapshot. Initially, the weights for this representation are set to the effective masses (the
associated diagonal element in the mass matrix, i.e., mass-metric tensor) of a given dihedral angle. Evaluating a distance
requires combining these adaptive weights for 2 respective snapshots l and m, e.g. w_{k}^{lm} = f(IM_{k}^{l},IM_{k}^{m}).
The distance between two states will then be given as:
d_{l↔m} = [ (Σ_{k}^{K} w_{k}^{lm})^{-1} · Σ_{k}^{K} w_{k}^{lm} · ( (φ_{k}^{l} − φ_{k}^{m}) mod 2π )^{2} ]^{1/2}
The actual values for the weights for individual snapshots, e.g., IM_{k}^{l}, can be altered using keyword CMODWEIGHTS. The function for combining the weights to yield w_{k}^{lm} is selected with the help of keyword CWCOMBINATION. Adaptive weights are generally normalized per snapshot (such that Σ_{k}^{K} IM_{k}^{l} evaluates to 1.0 for all l). This is different from what is described in 2 reference publications (see here and here). Any type of weighting scheme (static or adaptive) can be used to remedy the problem noted for the previous option regarding the impact of "uninteresting" degrees of freedom. The weighting with the effective masses ensures that slow degrees of freedom (e.g., central backbone torsions) will contribute much more to the overall signal than sidechain torsions. This effect becomes exacerbated for long chains. There are two additional caveats. First, the initial mass matrix-based weights are affected by the choice for ALIGN. Second, dihedral angles describing disulfide bonds are supported, but the presence of disulfide bonds destroys the notion of the effective masses (see CRLK_MODE for some background). The default weights for the C_{α}-C_{β}-S-S and C_{β}-S-S-C_{β} torsions are simply set to 1.0. This means that a meaningful use of this option while selecting disulfide bonds as part of the representation requires setting CMODWEIGHTS to something other than 0.
3. This option is largely identical to option 1. It carries all the same caveats with the exception of the periodicity of dihedral angles. Here, we expand each dihedral angle into its sine and cosine terms to construct a distance metric as follows:
d_{l↔m} = [ (0.5/K) · Σ_{k}^{K} ( (sin(φ_{k}^{l}) − sin(φ_{k}^{m}))^{2} + (cos(φ_{k}^{l}) − cos(φ_{k}^{m}))^{2} ) ]^{1/2}
Note that the sine and cosine terms of the same angle are nonlinearly but strictly correlated. This has consequences for the interpretation of dimensionality in this representation. 
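A brief sketch of this expanded metric, showing that angles straddling the ±180° boundary are recognized as close without any explicit wrapping (illustrative only, not CAMPARI code):

```python
import math

def sincos_dist(phis_l, phis_m):
    """Distance over sine/cosine expanded dihedrals (option 3 style)."""
    k = len(phis_l)
    s = 0.0
    for a, b in zip(phis_l, phis_m):
        s += (math.sin(a) - math.sin(b)) ** 2 + (math.cos(a) - math.cos(b)) ** 2
    return math.sqrt(0.5 * s / k)

# angles across the periodic boundary map to nearby points on the unit circle
print(sincos_dist([math.radians(179.0)], [math.radians(-179.0)]))  # small
# antipodal angles give the maximal value, sqrt(2)
print(sincos_dist([0.0], [math.pi]))
```

Note that the chord length on the unit circle, not the arc length, enters this metric, which slightly compresses large angular differences relative to option 1.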
4. This is the analogous modification of the previous option, introducing locally adaptive weights that are initially composed from the effective masses and can be altered by keyword CMODWEIGHTS:
d_{l↔m} = [ 0.5 · (Σ_{k}^{K} w_{k}^{lm})^{-1} · Σ_{k}^{K} w_{k}^{lm} · ( (sin(φ_{k}^{l}) − sin(φ_{k}^{m}))^{2} + (cos(φ_{k}^{l}) − cos(φ_{k}^{m}))^{2} ) ]^{1/2}
Note that this implies the presence of only a single weight per pair of Fourier terms. 
5. This option is probably the most commonly used variant, the positional RMSD. The
Cartesian position vectors {r_{k}} for a set of K atoms
are collected throughout the run. A list can be provided by using a dedicated input file
(→ CFILE), otherwise all atoms in the system are used.
The distance between two states is then given as:
d_{l↔m} = [ (1.0/K) · Σ_{k}^{K} ( r_{k}^{l} − RoTr(r_{k}^{m}) )^{2} ]^{1/2}
Here, RoTr is meant to indicate rotation and translation operators that superpose the {r_{k}}^{m} optimally with the frame provided by the {r_{k}}^{l}. This alignment uses the same quaternion-based algorithm mentioned elsewhere. Superposition (alignment) implies that the atomic RMSD is not necessarily a bona fide metric of distance, as it is not guaranteed to satisfy d_{l↔m} ≤ d_{l↔p} + d_{p↔m}, i.e., the triangle inequality. This is because the operator RoTr is different for computing d_{p↔m} than it is for computing the other two distances. In practice, for similar structures, this is rarely a problem in the context of clustering. RMSD-based clustering is, like any other method, vulnerable to Euclidean distances in high-dimensional spaces becoming uninformative and, in particular, to obscuring of the signal by uneven variances (a reason why terminal parts of polymers are very commonly excluded from such analyses). The alignment step for both this and the next option can be disabled with the help of keyword CALIGN (RoTr is then simply the identity operator). Without alignment, external degrees of freedom become part of the distance criterion. The coordinate-based RMSD is generally difficult to use for sets of atoms spanning multiple molecules, since intermolecular motion can easily provide most of the variance in the signal. In periodic boundary conditions, there is the particular difficulty of which image of a molecule to use. Keyword XYZ_REFMOL is supported in this context and can be used to circumvent this problem (although it should be kept in mind that there is no unique solution for assemblies of more than 2 molecules).
6. This is similar to the previous option and is only relevant if alignment is performed.
Then, this option allows the user to split the atomic index sets used for alignment and distance computation, i.e., the
alignment operator, RoTr, minimizes pairwise distances computed over an independent set of atoms that can either be a superset,
subset or completely different set of atoms than the one specified via CFILE.
Then, if we term the distance set {D} and the alignment set {A}, with {A} to be provided via
ALIGNFILE, the distance between two states will be given as:
d_{l↔m} = [ (1.0/D) · Σ_{d}^{D} ( r_{d}^{l} − RoTr_{{A}}(r_{d}^{m}) )^{2} ]^{1/2}
Note that choosing disparate sets can easily destroy the fundamental meaning of alignment, i.e., the removal of differences caused purely by external (rigid-body) degrees of freedom. This in turn would almost certainly lead to violations of the assumption that members of different clusters are dissimilar, and can also eliminate the notion of similarity amongst members of the same cluster. Conversely, it can be useful in improving the signal-to-noise ratio for cases where one is interested in states populated by a specific part of a much larger system that moves as a single entity (specifically, states characterized by relative arrangements of parts of a system may emerge more clearly if alignment is performed on the whole entity, but distances are computed only over a small portion of interest). Note that errors in calculations relying on mean cluster properties, computed for example in the tree-based algorithm or hierarchical clustering (→ CMODE) using mean linkage, can easily become large if the two atom sets have little overlap. Specifically, a cluster of similar snapshots as determined by the distance set, but constituted by elements with large differences in the alignment set, will deteriorate the accuracy of, for example, computing a snapshot's mean distance to it. This is because the heterogeneity of the alignment operator is masked by the simplified algebra used to compute these properties in constant time. The general caveats for RMSD-based clustering mentioned for option 5 above remain relevant as well.
Let us define a set of K interatomic distances, {r_{ij}} over unique atom pairs i and j. These distances
are collected throughout the run. A list can be provided by using a dedicated input file
(→ CFILE), otherwise a subset of randomly selected but unique interatomic distances
is used. The number of randomly selected degrees of freedom is usually set to 3N where N is the number
of atoms (it can be smaller for small N). Because the {r_{ij}} are geometric distances, they are also positive
and potentially large. CAMPARI allows a functional transform to be applied during data collection, which, generally speaking,
allows focusing the sensitivity on particular distance regimes. If we consider this transformed
set of distances, {f(r_{ij})} (f(x) can of course be the identity function, which is also the default),
the distance between two states will then be given as:
d_{l↔m} = [ (1.0/K) · Σ_{k}^{K} [ f(r_{ij(k)}^{l}) − f(r_{ij(k)}^{m}) ]^{2} ]^{1/2}
I.e., the chosen distance metric is simply the root mean square deviation across the set of transformed interatomic distances. Distance-based clustering inherently removes external degrees of freedom from the proximity measure, and it is therefore suitable for most applications. As with any other measure, Euclidean distances in high-dimensional spaces may become uninformative, and results may be obscured by uneven variances. 
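A minimal Python/NumPy sketch of this metric (hypothetical names, not CAMPARI code) for two snapshots and a list of atom pairs:

```python
import numpy as np

def pairdist_metric(xyz_l, xyz_m, pairs, f=lambda x: x):
    """d_{l<->m} over K transformed interatomic distances.

    xyz_l, xyz_m: (n_atoms, 3) coordinates of two snapshots.
    pairs: (K, 2) integer array of unique atom-index pairs (i, j).
    f: optional transform applied to each raw distance (identity by default).
    """
    i, j = pairs[:, 0], pairs[:, 1]
    r_l = np.linalg.norm(xyz_l[i] - xyz_l[j], axis=1)
    r_m = np.linalg.norm(xyz_m[i] - xyz_m[j], axis=1)
    return np.sqrt(np.mean((f(r_l) - f(r_m)) ** 2))
```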
This is identical to the previous option only that each distance, which is potentially transformed,
is subjected to a static weight. This weight is computed initially from the combined mass of
the constituting atoms. The distance between two states would then be given as:
d_{l↔m} = [ (Σ_{k}^{K} (m_{i(k)}+m_{j(k)}))^{−1} · Σ_{k}^{K} (m_{i(k)}+m_{j(k)}) · [ f(r_{ij(k)}^{l}) − f(r_{ij(k)}^{m}) ]^{2} ]^{1/2}
Here, m_{i} denotes the mass of atom i, and f(x) is the function used for transformation (defaults to the identity function). These (static) weights are not particularly useful in the default form but can be altered by changing masses, e.g., by a suitable patch, or by means of the dedicated facility (keyword CMODWEIGHTS). They are normalized such that similar distance thresholds can be used as in the unweighted case. 
This is identical to option 7 above only that each interatomic distance, which is potentially transformed,
is subjected to a locally adaptive weight (as in options 2 and 4 above).
These weights increase the corresponding memory demands by a factor of 2 and are all initialized to be unity.
It is necessary to use the dedicated facility (keyword CMODWEIGHTS) to make
them meaningful. All localized weights available for interatomic distances require at least a window size parameter
and a rule for how to combine weights from different snapshots. The latter is expressed as function g(x1,x2) specified by keyword CWCOMBINATION.
The resultant functional form for pairwise distance between snapshots is:
d_{l↔m} = [ (Σ_{k}^{K} g(Ω_{k}^{l},Ω_{k}^{m}))^{−1} · Σ_{k}^{K} g(Ω_{k}^{l},Ω_{k}^{m}) · [ f(r_{ij(k)}^{l}) − f(r_{ij(k)}^{m}) ]^{2} ]^{1/2}
Here, Ω_{k}^{l} is the locally adaptive weight for the k^{th} feature and the l^{th} snapshot, and f(x) is the function used for transformation (defaults to the identity function). The same general caveats apply as for options 2 and 4 above. In particular, it is important to reemphasize that all locally adaptive weights are now normalized per snapshot in contrast to the descriptions found in the literature (see here and here). 
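The combination rule g(x1,x2) selected via CWCOMBINATION is a two-argument power mean. A hedged Python sketch under that assumption (the special limit values follow the description of CWCOMBINATION):

```python
import math

def g_combine(w1, w2, p):
    """Power mean of two (positive) weights.

    p = -1, 0, 1 give the harmonic, geometric, and arithmetic means,
    respectively; -999 and 999 select the smaller or larger value
    (the limits of negative and positive infinity).
    """
    if p <= -999:
        return min(w1, w2)
    if p >= 999:
        return max(w1, w2)
    if p == 0:                       # geometric mean is the p -> 0 limit
        return math.sqrt(w1 * w2)
    return (0.5 * (w1 ** p + w2 ** p)) ** (1.0 / p)
```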
This is similar to options 5 and 9 above. Here, each of the 3K Cartesian coordinates, X, of a system of K selected atoms is subjected to a separate, locally adaptive weight.
Due to the presence of these weights, pairwise alignment is currently not
supported for this option. CAMPARI computes the Euclidean distance between snapshots, which means that any type of input data
can be analyzed straightforwardly by transcribing the data set into a fake trajectory of atoms with each Cartesian coordinate
corresponding to an input data dimension.
The locally adaptive weights increase the corresponding memory demands by a factor of 2 and are all initialized to be unity.
It is necessary to use the dedicated facility (keyword CMODWEIGHTS) to make
them meaningful. As for option 9, weights require at least a window size parameter
and a rule for how to combine them for different snapshots. The latter is expressed as function g(x1,x2) specified by keyword CWCOMBINATION.
The resultant functional form for pairwise distance between snapshots is:
d_{l↔m} = [ 3·(Σ_{k}^{3K} g(Ω_{k}^{l},Ω_{k}^{m}))^{−1} · Σ_{k}^{3K} g(Ω_{k}^{l},Ω_{k}^{m}) · ( X_{(k)}^{l} − X_{(k)}^{m} )^{2} ]^{1/2}
For the weighting aspect, the same caveats apply as for options 2, 4, and 9 above. Due to the distance definition relying on absolute coordinates, the caveats mentioned for option 5, which relate to atom sets encompassing multiple molecules, remain relevant as well.
CFILE
If data for structural clustering or related analyses are to be collected (→ CCOLLECT), this keyword provides the path and location to an input file selecting a subset of the possible coordinates. For options 1-4 of the proximity measure, this file is a single-column list of indices specifying specific system torsions (see elsewhere). For options 5, 6, and 10, it is a single-column list of atomic indices (see elsewhere). Lastly, for options 7-9, it is a list of pairs of atomic indices (two columns, see elsewhere). The keyword can take on an additional meaning if instantaneous output of RMSD values is requested through ALIGNCALC. In this context, CFILE specifies an atomic index set as for option 6 of CDISTANCE.
CALIGN
If structural clustering is performed (→ CCOLLECT), and an atomic RMSD variant is chosen as the proximity measure (→ CDISTANCE), this keyword can be used to specifically disable the alignment step that occurs before the actual RMSD of the two coordinate sets is computed. To achieve this, provide any value other than 1 (the default) for this on/off-type keyword. Note that alignment must be disabled for option 10 to be available for CDISTANCE.
CDISTRANSFORM
If structural clustering is performed (→ CCOLLECT), and the raw features are interatomic distances (CDISTANCE is 7, 8, or 9), this keyword can be used to specify a function that stores a transformation of the interatomic distance as the feature (this is the function f(x) in the description above). Options are as follows:
 f(x) = x: This is the identity function and the default.
 f(x) = 1.0 − [1.0 + exp(−(x−χ)/τ)]^{−1}: This is a sigmoidal function decaying from a maximum value of 1.0 to 0.0 with increasing distance. The step is centered at χ and the sharpness is given by τ. Vanishing values for τ give a transform that is equivalent to a contact map transform. For larger values of τ, the values of f(x) when x is small will increasingly deviate from 1.0 (be smaller).
 f(x) = [x + r_{buf}]^{−1.0/h_{exp}}: This is a hyperbolic function. At an interatomic distance of 1.0 − r_{buf}, the value is always 1.0. Smaller distances will cause values to diverge, which should be avoided by choosing at least 1.0 for this parameter. At larger distances the values are very small and approach 0.0 asymptotically. The rate of approach is controlled by h_{exp}, and smaller values give a faster approach. Note that the so-called DRID metric (reference) fundamentally relies on hyperbolic transforms where r_{buf} is 0.0 and h_{exp} is 1, 1/2, or 1/3.
 f(x) = 1.0 if x > r_{cut} and f(x) = sin(x·0.5π/r_{cut}) otherwise: This is a piecewise function that for small distances resembles the identity function (with an effective scale) before tapering off to a constant value. The point where the function becomes 1.0 exactly is r_{cut}.
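The three non-identity transforms can be sketched in Python as follows; these are reconstructions from the equations above (with the signs as interpreted here), not CAMPARI code:

```python
import numpy as np

def f_sigmoidal(x, chi, tau):
    """Decays from 1.0 (small x) to 0.0 (large x); step at chi, width tau."""
    return 1.0 - 1.0 / (1.0 + np.exp(-(x - chi) / tau))

def f_hyperbolic(x, r_buf, h_exp):
    """Equals 1.0 exactly at x = 1.0 - r_buf; approaches 0.0 for large x."""
    return (x + r_buf) ** (-1.0 / h_exp)

def f_sine(x, r_cut):
    """Near-linear for small x, constant 1.0 at and beyond r_cut."""
    return np.where(x > r_cut, 1.0, np.sin(x * 0.5 * np.pi / r_cut))
```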
CDISTRANS_P1
If structural clustering is performed (→ CCOLLECT), the raw features are interatomic distances (CDISTANCE is 7, 8, or 9), and a transform other than the identity function is used, this keyword sets a shift parameter. For the sigmoidal function, this is the parameter χ, for the hyperbolic function, it is the parameter r_{buf}, and for the sine transform, it is the parameter r_{cut}. The equations are given above. The value is to be provided in Å and can be zero or positive.
CDISTRANS_P2
If structural clustering is performed (→ CCOLLECT), the raw features are interatomic distances (CDISTANCE is 7, 8, or 9), and either a sigmoidal or a hyperbolic transform is used, this keyword sets a scale (or width) parameter. For the sigmoidal function, this is the parameter τ, and for the hyperbolic function it is the parameter h_{exp}. The equations are given above. The value is either to be given in Å (sigmoidal) or unitless (hyperbolic) and must be positive.
CWCOMBINATION
If data for structural clustering or related analyses are collected (→ CCOLLECT) and locally adaptive weights are in use, this keyword sets the function to be used for combining locally adaptive weights from different snapshots. This is relevant for options 2, 4, 9, and 10 for CDISTANCE. The input is interpreted identically to that for keyword ISQM, i.e., values of −1, 0, and 1 give harmonic, geometric, and arithmetic means, respectively. Values outside of this range can be expected to degrade performance due to expensive powers being evaluated. Special options avoiding most arithmetic altogether simply use the smaller or larger of the two values. In reality, these correspond to the limits of negative and positive infinity, and they are available by selecting −999 and 999, respectively.
CPREPMODE
If data for structural clustering or related analyses are collected (→ CCOLLECT), this keyword offers the user a choice to perform simple data preprocessing operations. Specific options are as follows and are all applied independently for all data dimensions:
 The data are untouched.
 The data are centered (subtraction of the means).
 The data are centered and scaled by the inverse standard deviation. The resultant data are often referred to as standard or Z-scores.
 The data are smoothed by cardinal B-splines of specified order. This operation scales linearly with this order, and it is therefore computationally wasteful to specify very large values (the long tails of the polynomial functions contribute little to the smoothing). Note that virtually no result obtainable from these data is preserved upon smoothing (except the mean), which means that results may become difficult to interpret.
 The data are centered and smoothed.
 The data are converted to Z-scores and then smoothed.
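The preprocessing modes above can be sketched as follows. This is a hedged illustration: the smoothing here uses an iterated binomial (two-point averaging) kernel as a simple stand-in for the cardinal B-splines that CAMPARI actually employs (→ CSMOOTHORDER):

```python
import numpy as np

def preprocess(data, mode, order=4):
    """data: (n_snapshots, n_dims). Modes mirror the list above:
    0 untouched, 1 centered, 2 Z-scores, 3 smoothed, 4 centered+smoothed,
    5 Z-scores+smoothed."""
    out = np.asarray(data, dtype=float).copy()
    if mode in (1, 2, 4, 5):
        out -= out.mean(axis=0)                  # centering
    if mode in (2, 5):
        out /= out.std(axis=0)                   # scaling to unit variance
    if mode in (3, 4, 5):
        kern = np.array([1.0])                   # build binomial kernel
        for _ in range(order - 1):
            kern = np.convolve(kern, [0.5, 0.5])
        out = np.apply_along_axis(
            lambda col: np.convolve(col, kern, mode="same"), 0, out)
    return out
```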
CSMOOTHORDER
If data for structural clustering or related analyses are collected (→ CCOLLECT), data smoothing may be in use. It is enabled by certain choices for keywords CMODWEIGHTS and CPREPMODE. Smoothing currently relies on cardinal B-splines, and this keyword lets the user specify the order of these functions. Cardinal B-splines are also used elsewhere (keywords BSPLINE for the PME method and EMBSPLINE for structural density restraints), but the keywords are completely independent.
CMODWEIGHTS
If data for structural clustering or related analyses are collected (→ CCOLLECT), and either static or locally adaptive weights are in use (options 2, 4, 8-10 for CDISTANCE), it is possible to override the default weights with data-derived information obtained in postprocessing. This is required for options 9 and 10 to be meaningful (the locally adaptive weights for these cases are all initialized to be 1.0). Depending on the chosen option, additional parameters may be required. A detailed list is as follows:
 This leaves all weights unchanged.
 This option computes local estimates of the root mean square fluctuation (RMSF) and takes the inverse as the resultant, locally adaptive weight. The window size is chosen by the user. The definition of "local" by proximity in the trajectory itself implies that the data are ordered, usually along a time or similar progress axis. Note that this option is invariant only to data translation (centering). The windowed MSF are computed using an incremental algorithm that has constant cost with window size.
 This replaces weights with weights derived from the autocorrelation function (ACF) evaluated at fixed lag time. The weights are static, i.e., they can be understood as a prescaling of the data. For dimensions with a negative ACF at the chosen lag time, the weight is explicitly adjusted to zero, which means that the effective dimensionality can be reduced considerably. As second moments, ACF values are noisy and generally more reliable at short lag time. For options 2 and 4 for CDISTANCE, the resultant weight is always the larger of the two obtained for sine and cosine terms. The ACF is invariant under data translation and global scaling operations.
 This option computes a composite weight by taking the square root of the product of the ACF at fixed lag time (as for option 2) and the inverse RMSF over a window of specified size (as for option 1).
 This option defines locally adaptive weights based on crossings of the global mean. Specifically, for each dimension, the global data mean is computed. Over a window of a user-defined size, it is then counted how many times the value of that dimension crosses the mean. Each data point receives a weight based on a window centered at this point in terms of the trajectory. The definition of "local" by proximity in the trajectory itself implies that the data are ordered, which is most often but not necessarily by time. Because it is possible that the count is zero, the resultant, locally adaptive weights are computed as (n_{cross}+a)^{−1}, where "n_{cross}" is the aforementioned number of crossings of the global mean and "a" is a user-defined buffer parameter (see keyword CTRANSBUF). For options 2 and 4 for CDISTANCE, the resultant weight is always the larger of the two obtained for sine and cosine terms. The idea behind this type of weight is to de-emphasize data dimensions sampling roughly symmetric distributions with a single peak and to emphasize data dimensions sampling multimodal distributions with locally small variance. False negatives can be produced if the global mean happens to coincide with one of the peaks of a multimodal distribution. These weights are exceptionally simple, can be computed efficiently and with high accuracy for large data sets, and require no additional parameters beyond the window size. They are also invariant for data translation and global scaling.
 This option is the same as the previous one (#4) except that the data are smoothed for the purpose of generating weights. This leaves the original data untouched, i.e., it does not imply data smoothing in general (see CPREPMODE for the latter). The smoothing entails an additional parameter, viz., CSMOOTHORDER.
 This option is a combination of options #2 and #4. The final, locally adaptive weights correspond to the square root of the product of the static weights derived from the ACF at fixed lag time and the weights derived from crossings of the global means within windows of user-defined size.
 This option is the same as the previous one (#6) except that the data are smoothed for the purpose of generating the local component of the weights (based on crossings of the mean). This does not imply data smoothing in general. The smoothing entails an additional parameter, viz., CSMOOTHORDER.
 Similar to option #4, this defines locally adaptive weights based on counting crossings. Here, a histogram is created for each data dimension (fixed number of 100 bins). From the histogram, CAMPARI automatically locates minima in the histogram (at least 3 bins to either side have to have larger counts). Over a window of user-defined size, crossings of any of these minima are counted, and the weight is constructed as w_{max}/(n_{cross}+1). Here, w_{max} is an adjusting weight. Each minimum splits the data into two fractions of unequal size, and w_{max} is the maximum of the smaller fractional populations across all minima. If no minima are found, this option reverts to option #4 for the dimension in question. The histogram construction and minima parsing mask many parameters that cannot be controlled by the user at the moment. Histograms are constructed in a way that makes these weights invariant for shifted and scaled data. This option is marred primarily by the lack of both robustness and significance of the minima detection procedure. The meaning of keyword CTRANSBUF is preserved in exactly the same way as for option #4.
 This option is the same as the previous one (#8) except that the data are smoothed for the purpose of generating weights. This leaves the original data untouched, i.e., it does not imply data smoothing in general. The smoothing entails an additional parameter, viz., CSMOOTHORDER.
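The mean-crossing weights of option #4 can be sketched for a single data dimension as follows (a hypothetical illustration, not CAMPARI code; the edge handling reuses the first/last complete window as described under CWINDOWSIZE):

```python
import numpy as np

def mean_crossing_weights(x, window, a=1.0):
    """Locally adaptive weights w_t = (n_cross + a)^-1, where n_cross
    counts crossings of the global mean inside a window of 'window'
    snapshots centered on t; 'a' is the CTRANSBUF-like buffer."""
    x = np.asarray(x, dtype=float)
    above = x > x.mean()
    cross = above[1:] != above[:-1]              # crossing between t and t+1
    half = window // 2
    n = len(x)
    w = np.empty(n)
    for t in range(n):
        lo = min(max(t - half, 0), n - window)   # clamp to a full window
        n_cross = cross[lo:lo + window - 1].sum()
        w[t] = 1.0 / (n_cross + a)
    return w
```

A rapidly oscillating dimension obtains uniformly small weights, whereas a dimension making a single, sharp transition keeps full weight away from the transition, matching the stated goal of emphasizing multimodal dimensions with locally small variance.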
CWINDOWSIZE
If data for structural clustering or related analyses are collected (→ CCOLLECT), and certain types of locally adaptive weights are in use, this keyword sets the window size (in numbers of snapshots) from which to obtain the weight. Each snapshot is given a weight derived from data in a window centered around that point. This makes sense primarily if the data are in a specific order; most often they are assumed to be sorted by time. Points toward the beginning (or end) of the data set all obtain the same weight as the first (or last) snapshot to have access to a complete window. This implies that windows should generally be much smaller than the data set length (they can at most extend to half the data set length). This keyword is relevant for locally adaptive weights based on variances and transition counts that can be selected via CMODWEIGHTS.
CLAGTIME
If data for structural clustering or related analyses are collected (→ CCOLLECT), the autocorrelation function (ACF) at a fixed lag time can play a role, and this lag time is set by CLAGTIME. This is relevant if either static or locally adaptive weights are in use (options 2, 4, 8-10 for CDISTANCE), or if time structure-based independent component analysis (tICA) is performed (see PCAMODE). This keyword sets the time (in numbers of snapshots) to be used for this purpose. In the case of weighted distance functions, the ACF is evaluated for each dimension independently and assumes a single, generating process:
ACF(τ) = [ Σ_{n=1}^{N−τ} (X_{(k)}(n) − μ_{(k)}) · (X_{(k)}(n+τ) − μ_{(k)}) ] / [ σ_{(k)}^{2} · (N−τ) ]
Here, the global data mean and variance, μ_{(k)} and σ_{(k)}^{2}, are estimated directly from the data for each dimension. Note that fewer data are available for large τ. Importantly, negative values for the ACF are all set exactly to zero, meaning that these data dimensions are eliminated from distance evaluations. When applied to dihedral angles (options 2 or 4), the ACF is always evaluated separately for sine and cosine terms to avoid ambiguous definitions of variance for circular variables. The weight is then set to the larger of the two values. In the case of tICA, the ACF features as a time-lagged covariance matrix that is computed for simple, centered data (no circular variables, no pairwise alignment, no locally adaptive weights). No corrections or truncations are applied to this matrix.
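The per-dimension ACF weight, including the clamping of negative values to zero, can be sketched as (hypothetical illustration, not CAMPARI code):

```python
import numpy as np

def acf_weight(x, lag):
    """ACF(tau) for one data dimension per the equation above; negative
    values are clamped to zero, removing the dimension from distances."""
    x = np.asarray(x, dtype=float)
    mu, var = x.mean(), x.var()
    n = len(x)
    acf = ((x[:n - lag] - mu) * (x[lag:] - mu)).sum() / (var * (n - lag))
    return max(acf, 0.0)
```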
CREDUCEDIM
If data for structural clustering are collected (→ CCOLLECT), and a linear transformation is computed and applied (→ PCAMODE), this keyword allows the user to elect to run all further postprocessing (→ CMODE) on a data set of reduced dimensionality that corresponds to the first N_{V} data vectors in the transformed space, where N_{V} is set by the choice for this keyword. The components are sorted from largest to smallest eigenvalues such that the maximum amount of variance (PCAMODE is 3) or autocorrelation (PCAMODE is 5) is included. Note that the transformed data are interpreted as simple, aperiodic signals, i.e., none of the peculiarities for different choices of CDISTANCE are considered any longer (CAMPARI internally converts everything to CDISTANCE being 7, which may lead to confusing output regarding units, etc.). Specifically, for options 4, 9, and 10 for CDISTANCE, the underlying locally adaptive weights are averaged, and the data are prescaled by these averages. This means that use of this keyword for those cases changes more than just the dimensionality. Similarly, for options 5 and 6, if alignment is requested, this alignment is performed as a preprocessing step, and the last snapshot of the data is used as reference. Furthermore, for option 6, only the atom set chosen for distance evaluations is retained, and this is the set to eliminate further dimensions from with the help of this keyword. Note that Euclidean distances are invariant for the full-dimensional transformed data set relative to the original data set in PCA (PCAMODE is 3) but not in tICA (PCAMODE is 5). This of course applies only to the linear transformation and not to any possible preprocessing operations.
If no linear transform is computed, or if the choice for PCAMODE implies that the data transform is not actually computed, this keyword can be used to simply discard dimensions at the end of the internal list of dimensions. This is supported for specialized applications and should not be used unless absolutely needed (use CFILE to control dimensionality precisely). This option does not work with any distance measure requiring alignment. In all cases, if CREDUCEDIM is not specified or set to too large a value, data processing will proceed with the original data and the original size. If linear transforms have been computed, the transformed data are simply written to output file PRINCIPAL_COMPONENTS.dat but not used otherwise.
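A minimal PCA-style reduction analogous to PCAMODE being 3 can be sketched as follows (hypothetical names, not CAMPARI code); it also illustrates the invariance of Euclidean distances under the full-dimensional orthogonal transform mentioned above:

```python
import numpy as np

def pca_reduce(data, n_keep):
    """Center the data, diagonalize the covariance matrix, and keep the
    n_keep components with the largest eigenvalues."""
    X = data - data.mean(axis=0)
    cov = X.T @ X / (len(X) - 1)
    evals, evecs = np.linalg.eigh(cov)           # eigh: ascending eigenvalues
    order = np.argsort(evals)[::-1][:n_keep]     # largest variance first
    return X @ evecs[:, order]
```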
CTRANSBUF
If data for structural clustering or related analyses are collected (→ CCOLLECT), and weights are in use (options 2, 4, 8-10 for CDISTANCE), it is possible to alter the definition of weights relying on counts of crossings (see option #4 and following for CMODWEIGHTS). In the general functional form w = (n+a)^{−1}, the offset or buffer parameter a is set by this keyword. The default value is 1. Large values will lead to weights with low sensitivity. In the limit of CTRANSBUF approaching zero, cases with n=0 receive all the weight, which is not generally useful.
CDUMP
If data for structural clustering or related analyses are collected (→ CCOLLECT), this simple logical keyword instructs CAMPARI to write the preprocessed set of features, along with possible weights and after eventual dimensionality reduction, to a binary output file. This file can later be reused in CAMPARI's data mining mode.
CMODE
If data for structural clustering are to be collected (→ CCOLLECT), this keyword allows the user to specify the algorithm by which the accumulated data are to be clustered. Before going into detailed options, a few general words are in order: CAMPARI strives to allow the geometric and other net quantities of a collection of snapshots to be computable irrespective of which metric of proximity is chosen (→ CDISTANCE). For options 3 and 7 this is trivial. For option 1, periodicity has to be accounted for. This is solved approximately by i) making sure the proper image of an added snapshot is considered, ii) adding appropriate periodic shifts to the geometric center increments each time a boundary violation is found after updating. Other transforms are corrected accordingly. Options 2 and 4 incur the use of several additional cluster sums (means) due to the dynamic weights (for details, the reader is referred to the source code in clustering_utils.f90: key subroutines are "clustering_distance" or "cluster_addsnap"). For option 5 (atomic coordinate RMSD), the first member of a cluster defines a reference frame. This frame is used for alignment of all subsequently added frames (therefore, the definition and all derived quantities are approximate, although the error is usually small for small clusters). Option 6 is a generalization of this allowing for a split of the set of coordinates into an alignment and a distance subset. Option 8 is a mass-weighted equivalent of option 7. Options 9 and 10 are extensions of 7 and 5 to use dynamic weights as for options 2 and 4, which again requires maintaining additional cluster sums.
 With the geometric center being defined, certain properties of a cluster are computable at constant cost with respect
to cluster size. For example, the square of the average distance from the center ("radius") is, in the simplest case, given as:
R^{2} = D^{−1} · N^{−2} · [ N · Σ_{k}^{N} x_{k}^{2} − Σ_{k}^{N} x_{k} · Σ_{k}^{N} x_{k} ]
x_{k} denotes the coordinate vector belonging to the k^{th} member of the cluster, D is the number of coordinates, and N is the number of members of the cluster. Other properties such as the mean snapshot-to-snapshot distance ("diameter") are similarly available. All that is required is that each cluster accumulates the necessary cluster sums.
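The constant-cost bookkeeping can be sketched as follows (a hypothetical illustration, not CAMPARI code): each cluster only maintains N, Σx_{k}, and Σ|x_{k}|², from which the radius follows in O(1) regardless of cluster size:

```python
import numpy as np

class ClusterSums:
    """Per-cluster running sums for constant-time radius evaluation."""
    def __init__(self, n_dims):
        self.n = 0
        self.sum_x = np.zeros(n_dims)    # sum of coordinate vectors
        self.sum_x2 = 0.0                # sum of squared vector norms

    def add(self, x):
        self.n += 1
        self.sum_x += x
        self.sum_x2 += float(x @ x)

    def radius2(self):
        """R^2 per the formula above: D^-1 N^-2 (N sum|x|^2 - |sum x|^2)."""
        d = len(self.sum_x)
        return (self.n * self.sum_x2 - float(self.sum_x @ self.sum_x)) \
               / (d * self.n ** 2)
```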
The currently implemented options for CMODE are as follows (the short discussion above applies to all of them except option 4 when CPROGINDMODE is 1):
 The data are clustered according to the leader algorithm. This is a very simple algorithm that sequentially scans the data. Each new snapshot is compared to the center snapshots of preexisting clusters and added to the first one for which a provided distance threshold is satisfied (→ CRADIUS). If no such cluster is found, a new cluster is spawned. Results will be input-order dependent, and clusters will have ill-defined "centers" since the central snapshot is set at the time the cluster is spawned and remains unchanged. Processing direction(s) can be chosen with the auxiliary keyword CLEADER. The leader algorithm has not been parallelized and is executed by a single thread when CAMPARI's shared memory (OpenMP) parallelization is in use. It generally offers no obvious benefit over the tree-based algorithm below (option 5). The performance of the leader algorithm deteriorates with decreasing threshold because the number of spawned clusters will eventually become significant relative to the number of snapshots.
 The data are clustered according to a modified leader algorithm. This works very similarly to the standard leader algorithm with two important modifications. First, each new snapshot is compared to the current geometric center of preexisting clusters to evaluate the threshold criterion. Second, the result is (optionally → CREFINE) postprocessed, and snapshots belonging to smaller clusters that would also satisfy the threshold criterion for a larger cluster are transferred to that larger cluster. There are exactly two passes over the data in this refinement step (iteration is difficult and time-consuming due to continuously changing cluster centers). Processing direction(s) can be chosen with the auxiliary keyword CLEADER, and the threshold criterion is set via CRADIUS. Modified leader-based clustering tends to generate fewer clusters compared to the standard leader algorithm due to better cluster centers. Due to centers changing position, the maximum snapshot-to-snapshot distance is no longer guaranteed to be below twice the value for CRADIUS (although in typical scenarios violations are very rare). The modified leader algorithm has not been parallelized and is executed by a single thread when CAMPARI's shared memory (OpenMP) parallelization is in use. It generally offers no obvious benefit over the tree-based algorithm below (option 5).
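The core scan of the modified leader algorithm can be sketched as follows (a single forward pass; CAMPARI's optional two-pass refinement and configurable scan directions are omitted, and names are hypothetical):

```python
import numpy as np

def modified_leader(data, radius):
    """Compare each snapshot to current cluster centers (running means),
    join the first cluster within 'radius', or spawn a new one."""
    centers, sums, members = [], [], []
    for idx, x in enumerate(np.asarray(data, dtype=float)):
        for c, ctr in enumerate(centers):
            if np.linalg.norm(x - ctr) <= radius:
                members[c].append(idx)
                sums[c] += x
                centers[c] = sums[c] / len(members[c])   # center drifts
                break
        else:                                            # no cluster matched
            centers.append(x.copy())
            sums.append(x.copy())
            members.append([idx])
    return members, centers
```

The drifting centers are the reason the maximum snapshot-to-snapshot distance within a cluster is no longer strictly bounded by twice the threshold.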

The data are clustered according to a hierarchical algorithm. In theory, a hierarchical algorithm works by first creating
a sorted list of all N(N−1)/2 unique snapshot-to-snapshot distances. Starting with the shortest distance, the two constituting
snapshots do one of the following:
 They spawn a new cluster (if they are both unassigned and the threshold criterion is fulfilled).
 They merge the two clusters they belong to (if they are both assigned and the threshold criterion is fulfilled).
 The cluster the previously assigned snapshot is part of is appended with the unassigned snapshot (if one of them is unassigned and the threshold criterion is fulfilled).
 They terminate the algorithm (if the threshold criterion is not fulfilled).
Because the problem as stated is intractable for large data sets, CAMPARI uses a dedicated scheme to help keep the computation as feasible as possible. In the first step, a snapshot neighbor list is generated that uses a truncation cutoff set by CCUTOFF. The neighbor list generation uses a preprocessing trick that aims to reduce the number of required distance calculations. This preprocessing step relies on a truncated leader algorithm whose target (threshold) cluster size is set by the (borrowed) keyword CMAXRAD. The resultant clusters are then used to screen groups of snapshot pairs and to exclude them from distance computations. Unfortunately, the problem of dimensionality often renders this procedure worthless. In high-dimensional spaces (→ CFILE), volume grows with distance so quickly that the distance spectrum becomes increasingly δ function-like and, in turn, becomes unsuitable for exploiting additive relationships. This stems from conformational distances having a rigorous upper bound for systems in finite volume and with fixed topology. The situation is obfuscated further if many of the dimensions are tightly correlated (such that the effective number of dimensions is indeed lower). Alternatively, this neighbor list can be read in from a previously obtained file (→ NBLFILE). The neighbor list is then further truncated to exactly match the size threshold specified via CRADIUS. For the algorithm to work properly, CCUTOFF has to be at least twice the value of CRADIUS. From this truncated list, a global list is created and sorted according to size. This can be quite memory-demanding. The global list is then fed into the algorithm as described. The results of hierarchical clustering depend very strongly on the linkage criterion (→ CLINKAGE). For many real and high-dimensional data sets, the limitations in both processing time and memory footprint mean that analyses are restricted to thousands of snapshots. 
The hierarchical algorithm has not been parallelized and is executed by a single thread when CAMPARI's shared memory (OpenMP) parallelization is in use. This is in part because such a parallelization would not solve the memory problem. 
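The spawn/append/merge/terminate rules above can be sketched in their exact, O(N²)-memory form as follows. This is a hypothetical illustration assuming single linkage (the threshold is tested on the raw pair distance); CAMPARI's CLINKAGE offers further criteria, and the neighbor-list machinery described above is omitted entirely:

```python
import numpy as np
from itertools import combinations

def hierarchical_from_pairs(data, threshold):
    """Sort all unique pair distances, then spawn/append/merge clusters
    until the threshold is exceeded; leftovers become singletons."""
    n = len(data)
    dists = sorted((np.linalg.norm(data[i] - data[j]), i, j)
                   for i, j in combinations(range(n), 2))
    label = [None] * n                    # cluster id per snapshot
    clusters = {}
    nxt = 0
    for d, i, j in dists:
        if d > threshold:                 # sorted list: nothing shorter left
            break
        if label[i] is None and label[j] is None:        # spawn
            clusters[nxt] = {i, j}
            label[i] = label[j] = nxt
            nxt += 1
        elif label[i] is not None and label[j] is not None:
            if label[i] != label[j]:                     # merge
                a, b = label[i], label[j]
                for k in clusters[b]:
                    label[k] = a
                clusters[a] |= clusters.pop(b)
        else:                                            # append
            a = label[i] if label[i] is not None else label[j]
            other = j if label[i] is not None else i
            clusters[a].add(other)
            label[other] = a
    for k in range(n):                    # unassigned points -> singletons
        if label[k] is None:
            clusters[nxt] = {k}
            label[k] = nxt
            nxt += 1
    return clusters
```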
The data are arranged according to the socalled progress index method described in detail elsewhere (→ reference).
The progress index is a rearrangement of the snapshots such that a given snapshot at position i is added on account of it having
the shortest available distance to any snapshot j&lt;i. In its exact form, this resembles the hierarchical clustering
described directly above. In technical terms, the main (and only real) (hyper)parameter is a specified
criterion of distance. Using this criterion, either the exact minimum spanning tree (MST) or an
approximation to it is constructed for the underlying complete graph constituted by all trajectory snapshots (vertices) and
the N·(N−1)/2 unique, pairwise distances (weighted edges) between them. The spanning tree is a convenient data structure
for deriving the progress index but it is not conceptually fundamental to the algorithm.
Provided a certain starting snapshot, the spanning tree is mined to generate a sequence of snapshots (progress index)
with the above property, i.e., the snapshot added next is the one that has the minimum distance
to any other snapshot already added. The complete progress index has the desirable property that it is likely to group similar objects together
without overlap. In order to work, it requires the sampling density to be sufficiently inhomogeneous, i.e.,
there are enclosed regions (basins) that are sampled preferentially and that consequently have higher point density than the regions
connecting them. It is important to keep in mind that the chosen features along with any
preprocessing steps (e.g., weights) may project/distort the full, underlying phase space.
The method provides an annotation function for the progress index that contains kinetic (or effectively kinetic) information. This function assumes that the evolution of the system is incremental and happens on a continuous manifold. Therefore, apparent jumps in phase space such as those introduced by the replica-exchange methodology may diminish the value of this annotation. There are alternative annotation functions, and some are discussed further in the documentation of the corresponding output file. For practical concerns, there is a methodological choice to pick either the exact or the approximate scheme (→ CPROGINDMODE) in addition to providing a starting snapshot (→ CPROGINDSTART). There are further keywords associated exclusively with this methodology, the most important one being CPROGINDRMAX, which sets the number of search attempts per snapshot per iteration for the approximate scheme (this is the primary controllable determinant of computational cost). Keyword CPROGINDWIDTH is a parameter related to annotations while CBASINMIN and CBASINMAX are related to automatic ways of finding starting snapshots. The approximate scheme runs almost entirely in parallel with excellent efficiency when CAMPARI's shared memory (OpenMP) parallelization is in use. Compared to the published algorithm, there are a few technical tweaks to allow this parallelization: these pertain to the auxiliary clustering (see option 5 below as well as keyword BIRCHMULTI), to search exhaustiveness (→ CPROGRDEPTH), to the handling of points in low-density regions that are not between basins (→ CPROGMSTFOLD), and to the parallel random search procedure itself (→ CPROGRDBTSZ). 
The data are clustered according to a tree-based algorithm (→ reference) that
shares architectural similarities with the BIRCH clustering algorithm.
The tree algorithm implemented in CAMPARI is not focused on memory efficiency but instead keeps the entire data set stored in memory.
The tree is assumed to be of a set height (number of hierarchical levels → BIRCHHEIGHT) whose levels span a provided
range of threshold criteria (upper and lower bounds set by CMAXRAD and
CRADIUS, respectively) for cluster sizes. In the process of
providing a parallel version for CAMPARI's
shared memory (OpenMP) parallelization, a few minor modifications relative to the
published algorithm were also introduced into the serial version. These modifications
are included in the description below.
Briefly, the algorithm consists of three phases. In the first phase, the levels are looped over starting at the root of the tree (the coarsest level)
and going up to the penultimate level.
At each level, every snapshot is added to an existing cluster (if the nearest distance is below the threshold for that level) or it spawns a new one
(if it is not). The distance is evaluated
between the snapshot and the geometric center of the cluster.
The key trick is that the search space (in terms of clusters) for a given snapshot is only the (growing) set of children of the cluster
this snapshot belongs to on the previous level. By spacing the thresholds accordingly, the
number of clusters searched per level and snapshot can thus be kept constant irrespective of data set size. During the first phase, cluster centroids
move as snapshots are added (compare the modified leader algorithm above). The first phase can be understood
as "learning the tree" from the data. In the second phase, cluster centroids are frozen and all snapshots are reassigned
to the nearest existing cluster. This again loops over the same levels as phase 1. The second phase can be understood as "binning"
all snapshots into the learned tree. In the third and last phase, the results from phase 2 are used as follows. For a given
cluster on a given level, all the snapshots it contains after phase 2 are subjected to a "local" reclustering at the next finer level, i.e.,
the search space is restricted to the binning results at the next coarser level. Phase 3 is executed for the penultimate level,
which gives rise to the finest (leaf-level) clustering, and potentially for additional levels (see BIRCHMULTI).
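The three phases can be sketched in Python (an illustrative reimplementation, not CAMPARI's actual Fortran code; the Euclidean distance, the dictionary-based cluster representation, and the function name tree_cluster are assumptions, and phases 2 and 3 are abbreviated to a simple leaf-level readout):

```python
import numpy as np

def tree_cluster(data, thresholds):
    """Illustrative sketch of the three-phase tree-based clustering.
    data: (N, d) array; thresholds: coarse-to-fine cluster radii, one per level."""
    levels = [[] for _ in thresholds]
    assign = np.zeros((len(thresholds), len(data)), dtype=int)
    # Phase 1 ("learning the tree"): at level l, only the children of the
    # cluster a snapshot joined at level l-1 are searched.
    for l, thr in enumerate(thresholds):
        for i, x in enumerate(data):
            cand = (range(len(levels[0])) if l == 0
                    else levels[l - 1][assign[l - 1, i]]['children'])
            best, dbest = None, np.inf
            for c in cand:
                d = np.linalg.norm(x - levels[l][c]['center'])
                if d < dbest:
                    best, dbest = c, d
            if best is None or dbest > thr:  # spawn a new cluster
                best = len(levels[l])
                levels[l].append({'center': x.copy(), 'n': 0, 'children': []})
                if l > 0:
                    levels[l - 1][assign[l - 1, i]]['children'].append(best)
            cl = levels[l][best]
            # centroid drifts as snapshots are added (modified-leader style)
            cl['center'] = (cl['center'] * cl['n'] + x) / (cl['n'] + 1)
            cl['n'] += 1
            assign[l, i] = best
    # Phases 2 and 3 (binning with frozen centroids, local re-clustering)
    # are abbreviated here to reading out the leaf-level assignment.
    leaves = [[] for _ in levels[-1]]
    for i in range(len(data)):
        leaves[assign[-1, i]].append(i)
    return leaves

# Two well-separated pairs of snapshots; one coarse and one fine threshold
data = np.array([[0.0, 0.0], [0.1, 0.0], [50.0, 0.0], [50.1, 0.0]])
leaves = tree_cluster(data, thresholds=[100.0, 1.0])
```

With these settings, all four snapshots share one cluster at the coarse level, and the fine level separates the two pairs.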
As for refinement, the challenge is to find protocols that do not exceed the time/space complexity of the algorithm itself.
Currently, there is only one type of optional refinement step that will locally merge leaf clusters
that have different but proximal parent clusters if the diameter of the joint cluster decreases upon merging (relative to
the individual values).
Except for refinement, the tree-based algorithm has been fully OpenMP-parallelized. While phases 2 and 3 are relatively straightforward in this regard, phase 1 is trickier. To achieve stable results that are identical to those of the (slightly modified) serial version, the coarsest level can only be addressed by a single thread at a time, which obviously impairs load balance. This requires keyword BIRCHCHUNKSZ to be 1, which is the default. In parallel, the clustering can alternatively try to achieve better load balance by a divide-and-merge scheme described in the context of keywords BIRCHCHUNKSZ and CMERGEDIAM. In general, the tree-based algorithm is extremely fast and will generate more clusters than, for example, the leader algorithm with the same setting for CRADIUS. However, the cluster distribution is altered nonuniformly: the largest clusters in the tree-based algorithm will often be larger, but the number of very small clusters (1-5 snapshots) will increase substantially, especially for large heights. Overall, the clusters tend to be substantially tighter. In essence, the multiple hierarchical levels act, metaphorically speaking, as a layered array of filters whose resultant net pore size is smaller than that of any one filter by itself.
 The data are assumed to be clustered already, and the clustering is read from a dedicated input file. This takes a list of snapshot-to-cluster mappings that is identical to the simple output produced in STRUCT_CLUSTERING.clu. This mode is obviously redundant unless further operations dependent on the clustering are performed, e.g., obtaining a graph output file, network-based reweighting, calculating a cut-based free energy profile, and so on. The actual data are read and fully processed. This has two important consequences. First, the clustering is completely reconstructed, including information about geometric sizes and distances.
Second, the time savings relative to redoing the clustering may not be significant (e.g., for typical applications with large data sets, the tree-based clustering itself takes little time compared to reading and processing the input trajectory).
 This is the same as the previous option, i.e., the data are assumed to be clustered already, and the clustering is read from a dedicated input file. The important difference is that the actual data are not read. This means that information about geometric sizes and distances of clusters and snapshots is not present, and any options relying on this information are either disabled or redundant. This option is useful if repeated graph-based operations are to be performed on a fixed clustering, e.g., changing the lag time of a Markov state model and recalculating the steady state. It can reduce both execution time and memory usage dramatically. In this mode, CAMPARI's normal functionality is entirely skipped (although it is still required to define a(n arbitrary) system, which can lead to (irrelevant) warnings and messages being printed).
Note that the connectivity map for snapshots always refers to the actual data that end up in memory. This is controlled by NRSTEPS, CCOLLECT, EQUIL, and, if present, an input file with subsets of frames. It is consequently up to the user to ensure that the constructed network model remains meaningful.
CLUFILE
If data for structural clustering are to be collected (→ CCOLLECT), this keyword lets the user provide the name and location of a file containing a series of integers to be interpreted as the main input for file-based clustering. This file should be formatted identically to the output file STRUCT_CLUSTERING.clu written by CAMPARI itself. The associated keyword CLUFILECOL can be used to pick from the possibly multiple columns in the input file. Some more details are provided elsewhere. The limitations are as follows. First, trajectory analysis mode has to be enabled. Second, no MPI support exists if the data are not also read. Third, the file should be analyzed with settings that pertain to the underlying data and are not remapped to the file itself (i.e., if the file is produced by CAMPARI itself, input settings should be the same as the settings for the original clustering). For example: using NEQUIL 1000 and CCOLLECT 10, a trajectory of 10000 snapshots would give rise to 900 entries for the output in STRUCT_CLUSTERING.clu (corresponding to snapshots 1010, 1020, 1030, ..., 10000). While it would be possible to reset NEQUIL to 0, NRSTEPS to 900, and CCOLLECT to 1 to read these data back in, doing so would destroy the validity of any auxiliary input file that refers to snapshot numbers in the original trajectory. Files that do so irrespective of the setting for CCOLLECT and other modifiers are files with additional breaks, trace files from PIGS runs, or files with additional links. Instead, keywords should be left at their original settings. This is because the snapshot connectivity structure is inferred independently and before the clustering (irrespective of algorithm) is performed or (if CMODE is not 7) the data are even read.
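The bookkeeping in the worked example above can be sketched as follows (a hypothetical helper, assuming the first NEQUIL steps are discarded and every CCOLLECT-th remaining snapshot is kept):

```python
def collected_snapshots(nsteps, nequil, ccollect):
    """1-based indices of the snapshots that end up in memory when the first
    nequil steps are discarded and every ccollect-th remaining one is kept.
    (Hypothetical helper mirroring the worked example in the text.)"""
    return [i for i in range(nequil + 1, nsteps + 1)
            if (i - nequil) % ccollect == 0]

frames = collected_snapshots(10000, 1000, 10)  # 900 entries: 1010, ..., 10000
```

Auxiliary input files must keep referring to these original indices (1010, 1020, ...), not to positions 1-900 in the reduced list.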
CLUFILECOL
If data for structural clustering are to be collected (→ CCOLLECT), and file-based clustering has been selected (CMODE is either 6 or 7), this keyword allows the user to pick a particular column (default is 1) from the input file. The input file format is the same as that of output file STRUCT_CLUSTERING.clu, and some more details are provided elsewhere. If the clustering was generated by the tree-based clustering with a number of informative resolutions (see keywords BIRCHHEIGHT and BIRCHMULTI), CLUFILECOL can be used to quickly calculate network-derived properties for a number of resolutions, which can be used to assess the robustness of a result or to find the best resolution given a target property.
CCUTOFF
If data for structural clustering are to be collected (→ CCOLLECT), and an algorithm is used that requires a rigorous snapshot neighbor list (currently either hierarchical clustering or the exact variant of the progress index-based scheme → CMODE), this keyword defines the cutoff distance for said neighbor list. It is critical to choose an appropriate (as small as possible) value for this parameter as otherwise CAMPARI will both run out of (virtual) memory and create humongous files that are written to disk. Note that even with a minimal setting, the problem of computing and storing the neighbor list can very easily become intractable. Often, simulation data in high-dimensional spaces will be clustered very unevenly in space, meaning that multiple "length scales" in distance space matter. This is detrimental to a neighbor list relying on defining a single, specific length scale through CCUTOFF.
NBLFILE
If data for structural clustering are to be collected (→ CCOLLECT), and an algorithm is used that requires a rigorous snapshot neighbor list (currently either hierarchical clustering or the exact variant of the progress index-based scheme → CMODE), this keyword can be used to provide the name and location of an input file in the appropriate format. CAMPARI uses the versatile binary NetCDF format for this purpose, and consequently the code needs to be linked to the NetCDF library for this option to be available (see installation instructions). Most commonly, this type of file will have been created by CAMPARI itself (it is automatically written if the code is linked against NetCDF and if an algorithm is used that requires a neighbor list → corresponding documentation). This keyword is primarily meant to circumvent the costly neighbor list generation in subsequent applications of the algorithm (for instance, with different settings for CRADIUS).
CRADIUS
If structural clustering is performed (→ CCOLLECT), and an algorithm is used that relies on a distance (span) threshold criterion (→ CMODE), this keyword sets the value for said threshold criterion. For leader-based clustering, this is either the distance from the center snapshot (standard leader) or from the current geometric center (modified leader) and therefore constitutes a maximum cluster radius. For hierarchical clustering, twice this value is the maximum distance of any two snapshots to be part of the same cluster, so again CRADIUS will control the maximum cluster radius. For tree-based clustering, this keyword again sets the maximum distance from the current geometric center. Values are to be provided in Å for proximity measures 5-10, unitless for 3-4, and in degrees for 1-2 (→ CDISTANCE).
CREFINE
If structural clustering is performed (→ CCOLLECT), this simple logical keyword lets the user control whether to apply any possible refinement strategies to the initial clustering results. Currently, there are two such procedures: for the modified leader algorithms, a refinement procedure is available that redistributes polyvalent snapshots to larger clusters. For the tree-based algorithm (for descriptions of these methods see elsewhere), a possible refinement consists of a (noniterative) merging of clusters with sufficient overlap. These are largely experimental procedures and can have a strong negative impact on performance. In particular, in the OpenMP parallel execution of the tree-based algorithm, only a single thread performs the refinement.
CRESORT
If structural clustering is performed (→ CCOLLECT), this simple logical keyword lets the user control whether to break ties in the sorting of clusters by size in a systematic way. If set to 1, CAMPARI will resort clusters with identical sizes by the indices of their centroid representatives or origin snapshots in increasing order. This is useful primarily in OpenMP parallel executions of the tree-based clustering (CMODE is 4 or 5) when clustering consistency is achieved (the batch size parameter is 1). Under these circumstances, results are technically identical across multiple runs with more than one thread but can differ in the order of clusters of identical sizes. This inconvenience can be avoided using keyword CRESORT.
CLEADER
If structural clustering is performed (→ CCOLLECT), and a leader-based algorithm is used (→ CMODE), this keyword allows the user to alter the processing directions of the leader algorithm by the following codes:
 The collected trajectory data are processed forward. Clusters are searched backward (starting with the most recently spawned one).
 The collected trajectory data are processed forward. Clusters are searched forward (starting with the one spawned first).
 The collected trajectory data are processed backward. Clusters are searched backward (starting with the most recently spawned one).
 The collected trajectory data are processed backward. Clusters are searched forward (starting with the one spawned first).
CLINKAGE
If structural clustering is performed (→ CCOLLECT), and the hierarchical algorithm is used (→ CMODE), this keyword allows the user to choose between different linkage criteria:
 Maximum linkage: Appending a snapshot to a cluster implies that the new snapshot is less than twice the value for CRADIUS away from all snapshots currently part of the cluster. For merging two clusters, maximum linkage implies that all possible inter-cluster distances satisfy the threshold condition. This creates clusters with an exact upper bound for their diameter (maximum intra-cluster distance) and therefore resembles leader clustering.
 Minimum linkage: Appending a snapshot to a cluster implies that the new snapshot is within a distance of twice the value for CRADIUS of at least one snapshot already contained in the cluster. Merging two clusters implies that at least one inter-cluster distance satisfies the threshold condition. With a minimum linkage criterion, clusters no longer have a well-defined radius and tend to get very large unless tiny values are used for CRADIUS. This is rarely a useful option for molecular simulation data.
 Mean linkage: Appending a snapshot to a cluster implies that the snapshot is within a distance of CRADIUS of the current geometric center of the cluster. Merging two clusters implies that their respective geometric centers are within a distance of CRADIUS of one another. This will create clusters that no longer have a rigorous upper bound for the intra-cluster distance and therefore resembles the modified leader algorithm.
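The three criteria for appending a snapshot can be summarized in a short sketch (illustrative only; may_join is a hypothetical helper using the Euclidean distance, which stands in for whichever proximity measure CDISTANCE selects):

```python
import numpy as np

def may_join(snapshot, cluster, cradius, linkage):
    """Test whether `snapshot` may be appended to `cluster` (an (n, d) array
    of member snapshots) under the given linkage criterion."""
    d = np.linalg.norm(cluster - snapshot, axis=1)
    if linkage == 'maximum':   # within 2*CRADIUS of ALL current members
        return bool(np.all(d <= 2 * cradius))
    if linkage == 'minimum':   # within 2*CRADIUS of at least one member
        return bool(np.any(d <= 2 * cradius))
    if linkage == 'mean':      # within CRADIUS of the geometric center
        return bool(np.linalg.norm(cluster.mean(axis=0) - snapshot) <= cradius)
    raise ValueError(linkage)

# A two-member cluster and a candidate snapshot that only minimum linkage accepts
cluster = np.array([[0.0, 0.0], [2.0, 0.0]])
snap = np.array([4.0, 0.0])
checks = {lk: may_join(snap, cluster, 1.0, lk)
          for lk in ('maximum', 'minimum', 'mean')}
```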
CMAXRAD
If structural clustering is performed (→ CCOLLECT), and the tree-based algorithm or the approximate progress index-based scheme is used (→ CMODE), this keyword sets the upper distance threshold value for the hierarchical tree, i.e., it corresponds to the coarsest threshold used outside of the (virtual) root (see BIRCHHEIGHT for additional details).
BIRCHHEIGHT
If structural clustering is performed (→ CCOLLECT), and the tree-based algorithm or the approximate progress index-based scheme is used (→ CMODE), this keyword sets the number of hierarchy levels in the clustering algorithm. Briefly, the tree-based algorithm works by defining a series of threshold criteria (set by interpolating between CRADIUS and CMAXRAD) that define hierarchical levels. An initial clustering tree is learned from the data (phase 1), the snapshots are reassigned to the learned clusters (phase 2), and the reassignment is used to recluster at the next finer level (phase 3). In all phases, the fundamental trick is to restrict the search space for adding snapshots to clusters with the help of the tree structure (parent-child relations). The base of the tree is never counted toward BIRCHHEIGHT as it always encloses all snapshots. By specifying 1 for BIRCHHEIGHT, one can thus recover an algorithm that is, in its basic outline, very similar to the modified leader scheme (see CMODE). Larger numbers of levels generally lead to the formation of more clusters. This is because of a specific characteristic that is linked to the fact that the children of a cluster (i.e., a set of clusters at the next finer level) can occupy similar space as the children of a nearby parent. If a snapshot explores only the children of a single cluster, the chances inevitably increase that an appropriate target cluster at finer levels is missed. In phase 3, this leads to the creation of tight clusters that are "too small" with respect to the desired threshold, i.e., such a new cluster could theoretically be combined with one or more nearby clusters without the maximum intra-cluster distance ever exceeding the distance threshold. The offered refinement option is meant precisely to combat these errors. However, this merging scheme cannot extend arbitrarily far toward the root without destroying the computational efficiency of the algorithm. It also needs to apply stringent criteria.
In practical terms, the value of BIRCHHEIGHT is a relatively free parameter if the smaller clusters can be tolerated (their quality is not compromised). In particular, it can be used, in conjunction with BIRCHMULTI, to create a high-quality multi-resolution clustering, which is a desirable starting point for network model optimization. At coarser resolutions, if a multithreaded executable is in use, the divide-and-merge strategy (see BIRCHCHUNKSZ) actually offers a separate handle on this characteristic.
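For illustration, a ladder of per-level thresholds spanning the two bounds might look as follows (linear spacing is an assumption made purely for this sketch; the actual interpolation used by CAMPARI may differ):

```python
def threshold_ladder(cradius, cmaxrad, height):
    """One threshold per tree level, ordered coarse (CMAXRAD) to fine
    (CRADIUS). Linear spacing is an assumption of this sketch only."""
    if height == 1:
        return [float(cradius)]
    step = (cmaxrad - cradius) / (height - 1)
    return [cmaxrad - l * step for l in range(height)]

# CRADIUS 1.0, CMAXRAD 5.0, BIRCHHEIGHT 5 -> five levels from coarse to fine
ladder = threshold_ladder(1.0, 5.0, 5)
```

With BIRCHHEIGHT 1, only the CRADIUS threshold remains, recovering the modified-leader-like limit described above.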
BIRCHMULTI
If structural clustering is performed (→ CCOLLECT), and the tree-based algorithm or the approximate progress index-based scheme is used (→ CMODE), this keyword sets the number of hierarchy levels to refine during the second stage of the algorithm. Normally, only the most fine-grained (so-called leaf) level is populated in phase 3 of the algorithm. This leaves all levels closer toward the root in a less refined state. By specifying a value for BIRCHMULTI that is larger than the default of zero, the user requests CAMPARI to extend phase 3 to additional levels toward the root. The (virtual) root (single cluster) and the level with the coarsest actual threshold are both excluded from this refinement. The output in output file STRUCT_CLUSTERING.clu is adjusted to provide the correct number of coarse-grained trajectory annotations. Other analyses, unless specified otherwise, are only performed for the leaf-level clustering (network). With appropriate settings for CRADIUS, CMAXRAD, and BIRCHHEIGHT, BIRCHMULTI can be used to create a high-quality multi-resolution clustering. The output in STRUCT_CLUSTERING.clu can then be used to perform graph-based analyses for all resolutions by choosing CMODE to be 6 or 7 and relying on keywords CLUFILE and CLUFILECOL, which is an efficient way of overcoming the aforementioned limitation.
BIRCHCHUNKSZ
If structural clustering is performed (→ CCOLLECT), the tree-based algorithm or the approximate progress index-based scheme is used (→ CMODE), and the multithreaded executable is in use, this keyword controls an important aspect of the parallel clustering algorithm. To avoid load imbalance and scaling limitations, the first phase of the algorithm can, optionally, employ a divide-and-merge strategy for the determination of clusters at tree levels with few clusters overall. This divide-and-merge strategy can be enabled by choosing an integer value larger than 1 for this keyword (the default is 1). As a result of the divide-and-merge approach, clustering results become dependent on the number of threads in use. Only a setting of 1 guarantees that the clustering result will be the same as in single-threaded execution. The precise role of this keyword is to set a threshold of N_{s}/BIRCHCHUNKSZ for coarse clusters (including the tree's root, where all snapshots are in a single cluster) to enable the divide-and-merge approach. Here, N_{s} is the total number of snapshots. Larger values will favor further divides and can improve load balance and scalability. The merging procedure is of course associated with a cost itself. The amount of merging is tunable by the associated keyword CMERGEDIAM. This in itself is a viable approach to controlling the properties of the clustering at coarser levels (see BIRCHHEIGHT for additional information).
CMERGEDIAM
If structural clustering is performed (→ CCOLLECT), the tree-based algorithm or the approximate progress index-based scheme is used (→ CMODE), and the multithreaded executable is in use, this keyword controls an important aspect of the parallel clustering algorithm. To avoid load imbalance and scaling limitations, the first phase of the algorithm can, optionally, employ a divide-and-merge strategy for the determination of clusters at tree levels with few clusters overall. This divide-and-merge strategy is enabled by the associated keyword BIRCHCHUNKSZ. Applying the divide-and-merge approach implies that clustering results become dependent on the number of threads in use. The precise role of CMERGEDIAM is to define a leniency on a criterion for merging clusters in the divide-and-merge approach. This is necessary because individual threads will likely have produced clusters with very substantial overlap. A merging of a smaller into a larger cluster is accepted whenever two conditions hold. First, the centroid-to-centroid distance is required to be less than CMERGEDIAM times a data-derived mean cluster radius at the same tree level. Second, the normalized difference of joint radius and mean snapshot-to-snapshot distance must be less than CMERGEDIAM - 1.0. The default value of 1.0 therefore provides a stringent merging criterion. Smaller values make this even more stringent whereas larger values increase leniency. Note that only clusters that are part of a divide-and-merge block and were created by different threads are considered for merging. This makes this functionality complementary to refinement, which operates at the leaf level.
CPROGINDMODE
If structural clustering is performed (→ CCOLLECT), and the progress index-based algorithm is used (→ CMODE), this keyword allows the user to choose between the exact (1) and the approximate scheme (2 = default). The two cases differ as follows:
 In the exact scheme, CAMPARI attempts to construct the true minimum spanning tree (MST) for the trajectory of interest. This is achieved by following the same setup procedure used in hierarchical clustering (described under option 3 to CMODE), i.e., a heuristics-based scheme is used to construct a neighbor list in snapshot space up to a certain hard cutoff. Alternatively, the neighbor list can be read from a dedicated input file. From this list, a globally sorted list of near distances is constructed. This setup work provides the foundation to construct the MST without additional parameters via Kruskal's algorithm. The high cost (both in terms of time and memory) makes the exact scheme impractical for large data sets. Note that the neighbor list must be sufficient for the algorithm to run. This means that all the edges for the MST have to occur in the neighbor list, which is unfortunately not guaranteed even if each snapshot has multiple neighbors listed. Potential failures are therefore difficult to predict. When using CAMPARI's threads parallelization, the exact scheme is carried out by a single thread alone. This limitation is in part due to the same reasons as the analogous limitation for hierarchical clustering: threads parallelization does not solve the memory problem.
 In the approximate scheme, CAMPARI utilizes a two-stage approach. The goal is to improve upon the large computational
cost associated with the exact scheme without sacrificing too much of the information encoded in the progress index. This is done by replacing
the calculation of the minimum spanning tree with an approximation, called a short spanning tree (SST). First, the trajectory data
are clustered using the highly efficient tree-based algorithm (described under option 5 to CMODE).
For clarity, the hierarchical tree of groups of snapshots (clusters) is not to be confused with the spanning tree we wish to generate. Because
the tree-based clustering is used, keywords CRADIUS, CMAXRAD, and
BIRCHHEIGHT are all relevant. The hierarchical tree is then used as follows. For every snapshot,
a fixed number (sometimes an upper limit) of guesses is made to find the shortest available distance to any other eligible snapshot. Rather than
searching exhaustively among all snapshots (as would be required for the MST), we restrict the search to pairs of eligible snapshots belonging to
the same cluster at the finest possible level of the clustering tree. The shortest guess for every snapshot becomes a candidate
edge for the SST. At every one of the ~log N iterations, a number of guesses are discarded because they would introduce cycles. The algorithm is continued
until the SST is complete. At any given stage, the eligible snapshots are those belonging to different subtrees of the SST.
The SST will be formed primarily by connections between snapshots in the same clusters at the finest level.
This procedure emulates Borůvka's algorithm with a search space limited by the hierarchical
tree. Because the spanning tree thus constructed is not strictly minimal, it is important to update component memberships after
each merging operation.
The algorithm depends on two parameters. The first regulates the maximum number of search attempts used for finding the next-nearest and eligible neighbor for any snapshot (the minimum across a spanning tree component then becomes the candidate edge for that component). It is set by keyword CPROGINDRMAX. The respective clusters at the finest level of the hierarchical tree offering any eligible candidate edges may not offer CPROGINDRMAX guesses. In this case, the second parameter becomes relevant. It controls a depth as to how many additional levels of the hierarchical tree to descend into in order to satisfy the maximum number of guesses. This second parameter is set by keyword CPROGRDEPTH. There is a third parameter, CPROGRDBTSZ, which is a technical setting controlling how a cluster is searched randomly. This is only necessary if the number of eligible candidates in a cluster exceeds the number of missing guesses requested by CPROGINDRMAX. Then, CPROGRDBTSZ can be used to reduce the number of required random numbers. Depending on the settings, the algorithm is expected to run in approximately N log N time with the constant prefactor determined by the clustering and the choice for CPROGINDRMAX. Similarly, the quality of the generated spanning tree depends nontrivially on both aforementioned search parameters as well as on the properties of the tree-based clustering. It is of course unlikely that the SST is in fact the true MST for trajectories of appreciable length. By using appropriately large values for both CPROGINDRMAX and CPROGRDEPTH, one can create an asymptotic limit for recovering the true MST. This limit can be of practical use even though a guaranteed MST computed this way requires at least O(N^{2}) time, which is worse than the time complexity of the exact form (aided by safe heuristics). However, the space (memory) complexity of this approach is much superior (linear rather than O(N^{1.5}) to O(N^{2})).
The above description reveals some minor differences relative to the original published algorithm. The change associated with keyword CPROGRDBTSZ is directly related to the parallelization of the SST construction. When CAMPARI's thread-parallel version is in use, this operation offers excellent scaling properties, and limiting the number of required random numbers is beneficial for large numbers of threads. Because threads access the random number generator in nondeterministic order, keyword RANDOMSEED cannot be used to ensure that the SST in two successive executions is exactly identical for more than one thread. The second modification relative to the original publication is the straightforward generalization offered by keyword CPROGRDEPTH (in essence, the previous algorithm implied this keyword to always be 0).
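The Borůvka-style construction can be sketched as follows (an illustrative toy version, not CAMPARI's implementation: the search space is a flat leaf clustering rather than the full hierarchy, a global random fallback stands in for descending the tree by CPROGRDEPTH levels, and all names are hypothetical):

```python
import random
import numpy as np

def approximate_sst(data, leaf_clusters, rmax, seed=0):
    """Borůvka-style short spanning tree: per round, each component proposes
    the shortest edge it found, where the search per snapshot is limited to
    at most rmax random partners from the same leaf cluster."""
    rng = random.Random(seed)
    n = len(data)
    comp = list(range(n))            # union-find with path halving
    def find(i):
        while comp[i] != i:
            comp[i] = comp[comp[i]]
            i = comp[i]
        return i
    cluster_of = {i: cl for cl in leaf_clusters for i in cl}
    edges = []
    while len(edges) < n - 1:
        best = {}                    # component root -> (dist, i, j)
        for i in range(n):
            ci = find(i)
            pool = [j for j in cluster_of[i] if find(j) != ci]
            if not pool:             # no eligible partner in own cluster
                pool = [j for j in range(n) if find(j) != ci]
            for j in rng.sample(pool, min(rmax, len(pool))):
                d = float(np.linalg.norm(data[i] - data[j]))
                if ci not in best or d < best[ci][0]:
                    best[ci] = (d, i, j)
        for d, i, j in best.values():
            ri, rj = find(i), find(j)
            if ri != rj:             # guesses forming cycles are discarded
                comp[ri] = rj        # memberships updated after each merge
                edges.append((i, j, d))
    return edges

# Two leaf clusters of two snapshots each on a line
data = np.array([[0.0], [1.0], [10.0], [11.0]])
edges = approximate_sst(data, leaf_clusters=[[0, 1], [2, 3]], rmax=2)
```

As the text notes, most SST edges connect snapshots within the same leaf cluster; only the bridging edge requires a wider search.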
CPROGINDSTART
If structural clustering is performed (→ CCOLLECT), and the progress index-based algorithm is used (→ CMODE), this keyword allows the user to pick a specific snapshot to serve as starting point for the generation of the progress index. Because, like INISYNSNAP, the keyword uses a snapshot index, it is important to point out that the value of this keyword must always be specified in absolute terms of the input data, i.e., generally speaking, no corrections should be applied in case CCOLLECT is greater than 1, a sequential access file with user-selected input frames is specified, or frames are discarded at the beginning (this was different in previous versions). CAMPARI takes care of this automatically. The use of a frames file requires particular care. If the file accesses the trajectory in random access ("as is") mode, the snapshot index is assumed to refer to the line number in the frames file rather than the index of the frame on that line. This is a general change of interpretation inherent to FMCSC_FRAMESFILE with certain input file formats. Conversely, if the file accesses the trajectory in strictly sequential mode, step numbers continue to refer to the original trajectory. As a special option, specifying zero instructs CAMPARI to find a set of suitable starting snapshots. These are generally found by generating a sample profile (discussed elsewhere) that is then scanned for extrema using an automated detection system that can be tuned with two additional keywords, CBASINMAX and CBASINMIN. The idea behind this is to generate profiles starting from a complete set of putative basins. If this automatic detection is unsuccessful, CAMPARI will revert to using the first snapshot as a starting point.
As a further option that is only available in the approximate scheme (→ CPROGINDMODE), a specified value of "-1" instructs CAMPARI to use as starting snapshot the central snapshot of the largest cluster found during the preparatory tree-based clustering. The default options are -1 in the approximate scheme and 0 in the exact scheme.
CPROGMSTFOLD
If structural clustering is performed (→ CCOLLECT), and the progress index-based algorithm is used (→ CMODE), this keyword allows the user to modify the spanning tree underlying the progress index before the index is computed. The modification consists of "folding" or collapsing the leaves into their parent vertex, which means that they are added first as soon as the index encounters the parent in question. By specifying a positive integer, the user requests CPROGMSTFOLD applications of this inward folding procedure (each of which scales linearly with the number of snapshots in terms of computational cost). After each iteration, the identity of vertices as leaf vertices is updated, which means that branches are continuously folded inward. Note that even a single iteration will fold a large number of edges (the actual number is reported to log output). For multiple folded vertices connected to the same parent, CAMPARI preserves the expected order (shortest distance first). The reasoning behind this modification is the following. When operating on the (minimum) spanning tree, Prim's algorithm proceeds by always finding the shortest distance available. As long as basins are sampled densely and transitions are rare, this has the desired effect of arranging snapshots in a way that allows identification of basins by suitable annotation. However, it is common for basins to have "fringe" regions where sampling density becomes low (and distances become large). Points in these regions will often be missed by the progress index and placed at the end (far away from "their" parent basin). Points in these regions are also likely to correspond to leaf vertices in the spanning tree. Therefore, it can be assumed that collapsing them into their parent will partially ameliorate this issue (they will occur in the correct basin). Users should keep in mind that this alters the rule that the progress index is built to track local density as much as possible.
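The folding procedure can be sketched on a spanning tree stored as an adjacency map (illustrative only; fold_leaves is a hypothetical helper, not CAMPARI's routine):

```python
import copy

def fold_leaves(adj, iterations):
    """Collapse leaf vertices of a spanning tree into their parents for the
    given number of iterations. adj: vertex -> {neighbor: distance} (a tree).
    Returns (parent -> folded leaves, shortest distance first) and the
    pruned tree."""
    adj = copy.deepcopy(adj)
    folded = {}
    for _ in range(iterations):
        leaves = [v for v, nb in adj.items() if len(nb) == 1]
        for v in leaves:
            if len(adj[v]) != 1:     # its partner may already be folded away
                continue
            (parent, dist), = adj[v].items()
            folded.setdefault(parent, []).append((dist, v))
            del adj[parent][v]
            del adj[v]
        # leaf status is re-evaluated next round, folding branches inward
    for parent in folded:
        folded[parent].sort()        # shortest-distance-first order
    return folded, adj

# A star: vertex 0 with three leaves at distances 3.0, 1.0, 2.0
adj = {0: {1: 3.0, 2: 1.0, 3: 2.0}, 1: {0: 3.0}, 2: {0: 1.0}, 3: {0: 2.0}}
folded, pruned = fold_leaves(adj, 1)
```

In the resulting progress index, the folded leaves would be inserted immediately after their parent, nearest leaf first.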
CPROGRDEPTH
If structural clustering is performed (→ CCOLLECT), and the approximate progress index-based algorithm is used (→ CMODE and CPROGINDMODE), this keyword allows the user to control the maximum search depth for random guesses. In this method, a hierarchical tree is used in conjunction with a parameter, CPROGINDRMAX, to restrict the search space for finding edges of a short spanning tree. The hierarchical tree is based on the tree-based clustering algorithm, and its height is set by keyword BIRCHHEIGHT. For each snapshot, the algorithm will start searching for putative edges within the cluster the snapshot is part of at the finest level offering any eligible candidates. Often, the number of candidates is smaller than the setting for CPROGINDRMAX. Then, CAMPARI will descend the hierarchical tree toward the root by at most CPROGRDEPTH levels to fulfill the requested number of guesses per snapshot. The reason for offering this restriction is that the search at additional levels is often inefficient. This is because it introduces additional redundancy (the same candidates are evaluated more than once), and the candidates at a coarser-than-necessary level are unlikely to be better guesses than the ones at the finest available level. The default for CPROGRDEPTH is zero. Note that, with a meaningful clustering in place, the default setting will prevent the spanning tree from approaching the correct minimum spanning tree in almost all cases. This is because of the hard search space restrictions. At considerable cost, this keyword can overcome the impact of these restrictions.
CBASINMAX
If structural clustering is performed (→ CCOLLECT), the progress index-based algorithm is used (→ CMODE), and an automatic determination of multiple starting snapshots for profiles is requested (→ CPROGINDSTART), this keyword controls how a test profile using the standard annotation function described elsewhere is parsed to automatically identify minima in this function. Specifically, around each eligible point in the profile, environments of varying sizes are considered, and the following criteria are used:
 The sum of values to the left over a stretch of n_{e} points must be greater than the sum of values over a stretch of n_{e} points centered at the point currently considered.
 The sum of values to the right over a stretch of n_{e} points must be greater than the sum of values over a stretch of n_{e} points centered at the point currently considered.
 The sum of values to the left and right over a stretch of n_{e} points each must be greater than a reference sum that is given as twice the sum of values over a stretch of n_{e} points centered at the point currently considered plus 4n_{e}.
 The left (far) half of the sum of values to the left over a stretch of n_{e} points must be greater than the right (near) one.
 The right (far) half of the sum of values to the right over a stretch of n_{e} points must be greater than the left (near) one.
 No point toward the left over a stretch of n_{e} points must be greater than or equal to the point currently considered.
 No point toward the right over a stretch of n_{e} points must be greater than the point currently considered.
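As an illustration, the seven criteria above could be transcribed literally as follows for a single candidate point i and an (assumed even) environment size n_e; this is only a sketch, and CAMPARI's boundary handling and exact stretch definitions may differ:

```python
def is_minimum(prof, i, ne):
    """Literal transcription of the seven criteria for point i of the
    profile prof with environment size ne (ne assumed even)."""
    left = prof[i - ne:i] if i - ne >= 0 else []       # ne points to the left
    right = prof[i + 1:i + 1 + ne]                     # ne points to the right
    lo = i - ne // 2
    center = prof[lo:lo + ne] if lo >= 0 else []       # ne points centered at i
    if len(left) < ne or len(right) < ne or len(center) < ne:
        return False
    sl, sr, sc = sum(left), sum(right), sum(center)
    half = ne // 2
    return (sl > sc and sr > sc                        # criteria 1 and 2
            and sl + sr > 2 * sc + 4 * ne              # criterion 3
            and sum(left[:half]) > sum(left[-half:])   # criterion 4 (far > near)
            and sum(right[-half:]) > sum(right[:half]) # criterion 5 (far > near)
            and all(v < prof[i] for v in left)         # criterion 6 (no v >= prof[i])
            and all(v <= prof[i] for v in right))      # criterion 7 (no v > prof[i])
```

A point qualifies as a minimum if the test succeeds for some eligible n_e between the values set by CBASINMIN and CBASINMAX.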
CBASINMIN
If structural clustering is performed (→ CCOLLECT), the progress index-based algorithm is used (→ CMODE), and an automatic determination of multiple starting snapshots for profiles is requested (→ CPROGINDSTART), this keyword controls the minimum value considered for n_{e} as explained in the documentation of keyword CBASINMAX.
CPROGINDRMAX
If structural clustering is performed (→ CCOLLECT), the progress index-based algorithm is used (→ CMODE), and the approximate version is chosen (→ CPROGINDMODE), this keyword controls the maximum number of attempts for a search of the next correct spanning tree neighbor of a growing spanning tree component. Depending on the choice for keyword CPROGRDEPTH, such a search first exhausts the possibilities within a given cluster of the hierarchical tree underlying the approximate algorithm and will only consider a limited number of clusters at coarser-than-necessary levels. Therefore, the parameter is interpreted as a maximum and not generally an actual value. Whenever the number of eligible candidate snapshots in a cluster is less than the missing number of guesses for the snapshot in question, the search becomes deterministic. Otherwise, it is random (with replacement). In both cases, the eligible snapshot with the minimum distance to the spanning tree component under consideration becomes a candidate for the next link of the approximate MST. If CAMPARI's shared memory (OpenMP) parallelization is in use, the choice for this keyword affects parallel performance at most weakly. This is because the parallelization is at the level of snapshots and not at the level of guesses.
CPROGRDBTSZ
If structural clustering is performed (→ CCOLLECT), the progress index-based algorithm is used (→ CMODE), and the approximate version is chosen (→ CPROGINDMODE), this keyword controls the structure of the random search in a cluster with a number of eligible candidates that exceeds the remaining number of required guesses for all but the first stage of Borůvka's algorithm. The default is 1, meaning that every single random guess requires a random number. Values larger than 1 imply that the random search proceeds in systematic stretches of length CPROGRDBTSZ in the contiguous stretch of eligible candidates starting from a member selected with uniform probability. The specified value is an upper limit, i.e., the number of guesses is never exceeded. Use in the first stage is forbidden so as to avoid bias from the input order, which can still be present in the list of snapshots constituting a cluster. In later stages, cluster snapshot lists have been reordered by subtree memberships, and systematic biases become increasingly unlikely. The keyword is relevant primarily if CAMPARI's shared memory (OpenMP) parallelization is in use. In this scenario, the cost of random number production can become significant relative to distance evaluations because the individual threads share the same random number generator (which leads to minor waiting times). Consequently, if the cost of the distance evaluations is high to begin with (dependent on features and metric), the default should not be changed.
CPROGINDWIDTH
If structural clustering is performed (→ CCOLLECT), and the progress index-based algorithm is used (→ CMODE), this keyword controls the auxiliary annotation function defined elsewhere. Specifically, it corresponds to the parameter l_{p} in the documentation found by following the link.
TMAT_MD
This important keyword lets the user set the time direction(s) for the inferred transition matrix to be used in all related analyses, which may be iterative solutions of the steady state (see CREWEIGHT), mean first passage times (see CMSMCFEP), the generation of synthetic trajectories (or random walks, see SYNTRAJ_MD) and/or, if the code was compiled and linked with HSL support (→ installation instructions), the computation of committor probabilities (DOPFOLD and CMSMCFEP) and/or the computation of the spectral properties of the transition matrix itself (EIGVAL_MD). For the present keyword and related analyses to have an effect, a clustering analysis must be performed (see CCOLLECT and CMODE). If CMODE is set to 4, the approximate version of the progress index algorithm must be used (CPROGINDMODE set to 2). In this case, or if CMODE is set to 5, the underlying state space used for the evaluation of the matrix is always the one at the leaf level (→ BIRCHHEIGHT). In general, the analysis of transition matrix-derived properties will be done exclusively for those states that form the reference strongly connected component, which is identified by the cluster selected with keyword INISYNSNAP. Some algorithms may provide solutions for all eligible components individually (see INISYNSNAP). Results of all analyses that explicitly reference user-selected snapshots (e.g., committor probabilities for folded and unfolded sets) are obtained over this reference strongly connected component only. The reference component is isolated from all the other ones (if any) by removing all the one-way transitions. The transition matrices can be requested to be written to file with the specialized keyword TMATREPORT. For the present keyword, the relevant options are:
 Only data derived from one transition matrix are calculated, i.e., the one that uses the forward-time transitions between the clusters of the reference component (default). In the simplest scenario (viz., CLAGT_MSM set to 1, no breaks, no links, and no trace files specified), these transitions are simply reflected in the output file STRUCT_CLUSTERING.clu when processed line by line from top to bottom.
 Only data derived from one transition matrix are calculated, i.e., the one that uses the backward-time transitions between the clusters. Breaks and links are interpreted by reversing the time information in the relevant input files. Similar to option 1, in the simplest situation, the transitions for this case are reflected in the STRUCT_CLUSTERING.clu output file when processed line by line from bottom to top.
 Both types of transition matrices (i.e., forward- and backward-time) are constructed, and most subsequent analyses are performed twice, once per type of transition matrix.
Since the basic inference of the transition matrix is based on the count matrix, the initial estimate depends on the assumptions of snapshot-to-snapshot connectivity in the input trajectory. The default assumption (subsequent snapshots are connected in time) can be altered by several keywords. The general time spacing (lag time) can be changed with keyword CLAGT_MSM. Custom links and breaks can be added with the specific input files TRAJBREAKSFILE and TRAJLINKSFILE. In addition, a special automatic handling of rerouted transitions is offered in case the input trajectory has been generated using the PIGS protocol and the associated trace file is provided as input. Importantly, these snapshot-based modifications can be handled and analyzed by CAMPARI at the beginning of the run. Obviously, all changes to the transition matrix impact all the routines that use it subsequently. Keyword BRKLNKREPORT can be used to instruct CAMPARI to print a summary of rerouted (or all) snapshot transitions.
TMATREPORT
This keyword is interpreted as a simple logical. When set to 1, it instructs CAMPARI to write the nonzero entries of the processed transition matrix(ces) (see TMAT_MD) to one or more files (see TMAT_xxxxxx_yyy.dat for details on formatting). For this option to be available, a structural clustering analysis must be performed (see CCOLLECT and CMODE). If any method is used that relies on keyword INISYNSNAP, which includes synthetic trajectories, spectral decomposition, and committor probabilities, this snapshot will be used to identify the strongly connected component for which the file(s) are written.
CLAGT_MSM
This integer value specifies the lag time τ to be used to compute the transition matrix for any relevant analysis based on a network (graph, Markov state model) derived from a structural clustering, e.g., SYNTRAJ_MD, EIGVAL_MD, and DOPFOLD. Setting its value to any number greater than 1 (default) entails superimposing all the transition counts between clusters as derived at fixed time distance τ along the coarse-grained trajectory (STRUCT_CLUSTERING.clu), an approach that is often called "sliding window" in the relevant literature. This way, there are as many superposition steps as the integer value of the lag time. CLAGT_MSM strictly refers to the spacing (in number of frames) of the data actually stored for clustering, which depend on EQUIL, CCOLLECT, and, possibly, an input file with user-selected frames. The actual distance in units of time that CLAGT_MSM corresponds to has to be computed by considering the actual spacing of the underlying data set (e.g., for a CAMPARI molecular dynamics trajectory, this would have been controlled by TIMESTEP and XYZOUT). Because CLAGT_MSM ultimately edits the way snapshots (frames) of the input trajectory are linked together, it is also relevant for the output of the progress index method (see output file PROGIDX_000000000001.dat). The sliding window mode of operation will automatically propagate modifications to the connectivity introduced by input files TRAJBREAKSFILE and TRACEFILE. Conversely, any manually added links are always kept "as is." If the user is interested in processing all the trajectories that are superimposed this way as separate entities, it is necessary to prepare as many dedicated input frames files as the integer value of the lag time and to perform a separate analysis for each of them. It is worth pointing out that the independent clustering on each input frames file may introduce some inconsistencies in this workaround.
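The sliding-window counting at lag τ can be sketched as follows for an uninterrupted trajectory of cluster labels (illustrative only; CAMPARI additionally honors breaks, links, and trace files):

```python
import numpy as np

def count_matrix(labels, tau, nstates):
    """Sliding-window transition counts at lag tau from a coarse-grained
    trajectory of cluster labels: every pair (t, t + tau) contributes."""
    C = np.zeros((nstates, nstates), dtype=int)
    for t in range(len(labels) - tau):   # window advances one frame at a time
        C[labels[t], labels[t + tau]] += 1
    return C
```

Because the window advances by a single frame, consecutive count pairs share τ - 1 frames, which is exactly the superposition of τ offset series described above.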
CADDLINKMODE
If structural clustering is performed (→ CCOLLECT), this keyword allows the user to request different modifications of the link (edge) structure of the derived network (graph). This is unavailable if the exact progress index method has been selected. Modifying the link structure can be useful because the transition counts usually suffer from poor statistics for many if not most links. This can cause problems, e.g., by splitting the graph into several strongly connected components or by creating dramatic sensitivities of network-derived properties (such as the steady state) to very few elements of the transition matrix. For small values of the chosen lag time, networks are expected to be only locally connected (sparse). While this potentially reduces the impact of statistical errors, a large number of subsequent analyses (whether in CAMPARI or elsewhere) unfortunately assume a memoryless evolution, which is difficult to fulfill. Conversely, for large values of the chosen lag time, the memory-free nature of the dynamics may become appropriate, but the number of relevant transition matrix elements grows dramatically while the counts available for inference decrease considerably. These joint concerns mean that it is unfortunately not at all simple to identify the optimal transition matrix based on counts. The available options are meant to deal with this problem as follows:
 The network is left as is, i.e., all transition matrices will be inferred directly from the observed transition counts. This is the statistically optimal estimator if the system is truly Markovian. It can lead to fractured graphs, which introduce arbitrary probability relationships between subgraphs.
 Strongly connected components are identified using Tarjan's algorithm. They can result from supplying a file with trajectory breaks or a trace file for an MPI PIGS calculation. Any one-way links between different components are augmented with the reverse transition. The floating point weight for this reverse link is set by keyword CLINKWEIGHT. If there is no link in either direction, multiple components will remain as in option 0.
 Any clusters (vertices) without any observed self-transitions (self-loops) are augmented with a self-transition with a floating point weight of CLINKWEIGHT. In a Markov model sense, this will increase residence times and populations for the augmented nodes. It also removes deterministic chains of singleton clusters, which often occur in high-resolution networks in fringe regions (regions of low sampling density).
 This is a combination of options 1 and 2.
 The count matrix is symmetrized. If one of the two corresponding elements is zero, this creates a new reverse link with the same properties as the existing forward one. If both directions are already populated, this means that the transition with a lower number of observed counts is augmented to match the exact count number of the more populated one. This option ignores keyword CLINKWEIGHT. This is different from symmetrization achieved by adding the entire transition count matrix obtained from the same trajectory reversed in time. Both variants imply detailed balance. Again, if no link exists in either direction, nothing is done, and multiple strongly connected components may persist as in option 1.
 This is a combination of options 2 and 4.
 Symmetrization of the count matrix is a crude way to impose detailed balance. In particular, it is almost certainly suboptimal in a statistical likelihood sense. If we write the likelihood of the inferred transition matrix as Π_{i,j} T_{ij}^{c_{ij}}, where c_{ij} is the number of observed counts for the transition from i to j, and T_{ij} is the inferred transition matrix element, then it is possible to solve a constrained problem that maximizes this likelihood while maintaining row normalization and detailed balance as constraints on the T_{ij}. It is important to note that the particular form of the likelihood function is a strong imposition, i.e., the transition counts are assumed independent, which is equivalent to asserting Markovianity. Markovianity is a very challenging property to achieve with sufficient accuracy (in the sense of a true statistical test). This means that in many applications the resultant transition matrix does not actually maximize a meaningful quantity. This holds as much for the non-augmented inference (option 0) as for this option, which includes the added constraint of maintaining detailed balance. Bowman et al. derived an iterative estimator solving this constrained problem. This estimator was subsequently simplified by Prinz et al., and their version, which works on the log-likelihood as usual, is implemented in CAMPARI. Also with this method, links with weight zero in both directions will remain empty, which again means that multiple strongly connected components may persist. This iterative algorithm benefits from CAMPARI's shared memory (OpenMP) parallelization and is under time control (the procedure can be slow). Parallelization is such that the results should always be identical to serial execution.
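The constrained maximum-likelihood estimate can be illustrated with one common form of the fixed-point update for the unnormalized reversible weights x_{ij} (a sketch with a fixed iteration count; the version in CAMPARI works on the log-likelihood and adds convergence and time control):

```python
import numpy as np

def reversible_mle(C, n_iter=500):
    """Detailed-balance constrained ML estimate from a count matrix C.
    Returns the transition matrix T and the stationary weights pi."""
    C = np.asarray(C, dtype=float)
    c_i = C.sum(axis=1)               # total outgoing counts per state
    X = C + C.T                       # initial guess: symmetrized counts
    for _ in range(n_iter):
        x_i = X.sum(axis=1)
        denom = c_i[:, None] / x_i[:, None] + c_i[None, :] / x_i[None, :]
        X = (C + C.T) / denom         # fixed-point update for x_ij
    x_i = X.sum(axis=1)
    return X / x_i[:, None], x_i / x_i.sum()
```

Since X stays symmetric throughout, every iterate satisfies detailed balance (π_i T_ij = x_ij / Σx = π_j T_ji) by construction; the iteration only improves the likelihood. Pairs with zero counts in both directions stay empty, matching the caveat above.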
CLINKWEIGHT
If structural clustering is performed (→ CCOLLECT), and the addition of links (edges) to the derived network (graph) is requested (→ CADDLINKMODE), this keyword sets the floating-point weight for some of the added links (see above for details). Note that the basic unit is an (integer) count of observed transitions in the input trajectory. The default is therefore 1.0.
TRAJBREAKSFILE
If any type of structural clustering is performed (→ CCOLLECT), or if the exact progress index-based algorithm is used (→ CMODE), the resultant trajectory is used to infer the properties of a network. Essentially, the sequence of events in the trajectory defines a transition matrix. However, not all transitions in a trajectory may be equally valid, as they may be caused by trajectory concatenation (e.g., when using structural clustering with the MPI averaging technique), by replica exchange swaps, by non-local Monte Carlo moves, and so on. It may therefore be appropriate to remove such spurious transitions from the analysis in order to keep inferences regarding the underlying dynamics accurate. This is what this file accomplishes, and the input and its interpretation are described in detail elsewhere. The removal of links is relevant for a number of output files, most obviously in STRUCT_CLUSTERING.graphml and TMAT_xxxxxx_yyy.dat (the mesostate (cluster) network and implied transition matrix). All output files that depend on the transition matrix (→ TMAT_MD) and the output of the progress index method are affected as well. There are two additional notes. First, CAMPARI will not remove any transitions by default, and it may sometimes be difficult to obtain or preserve the required information (e.g., the replica exchange trace file must be used to extract the exact history of accepted swaps). Second, there is no guarantee that the graph remains intact (it may fracture into multiple, disconnected subgraphs), and this may impact the interpretability of the data in the aforementioned output files. Conversely, the native processing of a PIGS trace via keyword TRACEFILE is both more convenient and more universally supported.
TRAJLINKSFILE
If any type of structural clustering is performed (→ CCOLLECT), or if the exact progress index-based algorithm is used (→ CMODE), the resultant trajectory is used to infer the properties of a network. Essentially, the sequence of events in the trajectory defines a transition matrix. However, trajectory concatenation may give rise to scenarios where some links are spurious (→ TRAJBREAKSFILE) and others are missing, e.g., if multiple trajectories are branched off from a common starting point and simply appended for analysis purposes. This keyword can be used to add such missing links at the snapshot (frame) level. This function can overlap with keyword CADDLINKMODE, which operates at the cluster level. It also overlaps with the use of keyword TRACEFILE for managing the reseeding operations of a PIGS calculation, which is a type of simulation yielding such a set of branched trajectories. The input format is described in detail elsewhere. The addition of links is relevant for a number of output files, most obviously in STRUCT_CLUSTERING.graphml and TMAT_xxxxxx_yyy.dat (the mesostate (cluster) network and implied transition matrix). All output files that depend on the transition matrix (→ TMAT_MD) and the output of the progress index method are affected as well. We emphasize that considerable care is required to manage the links in a conformational space network through the relevant keywords (TRAJLINKSFILE, TRAJBREAKSFILE, CADDLINKMODE, TRACEFILE, and CLAGT_MSM). This is mostly because data generation and postprocessing are (necessarily) separate operations, which makes it difficult to achieve a compromise between controllability and ease of use.
BRKLNKREPORT
This is a simple keyword that allows the user to request information on the snapshot-to-snapshot connectivity map CAMPARI assumes for all network-based analyses (→ TMAT_MD and the output of the progress index method). Options are as follows:
 No report is printed.
 Rerouted snapshot-to-snapshot links with a step spacing that is different from the requested lag time are printed to log output at the beginning of the run (before any data are read). Indexing is provided both relative to the stored data and relative to the original input (the latter depends on CCOLLECT and EQUIL and, possibly, the presence of a file with user-selected frames). During postprocessing, for the same links, the geometric distance between the two snapshots in question is printed as well.
 This is the same as the previous option, except that all links are printed.
CREWEIGHT
If structural clustering is performed (→ CCOLLECT), which includes the case of the approximate progress index method, the resultant coarse-grained trajectory serves to define a network (graph) of clusters (vertices). If the original trajectory carries strong initial state (but no energetic) bias (for example, if it is a concatenation of many short trajectories), it may be of interest to attempt to quantify the bias in the data. This is what this keyword is meant for, and it currently supports the following options:
 No network-based reweighting is undertaken.
 The steady state (equilibrium probability distribution) of the underlying (and assumed!) Markov state model is computed using an iterative algorithm. As alluded to, the resultant graph may not be strongly connected or may even be fractured. Any modification to the link (edge) structure of the network (→ TRAJBREAKSFILE, CADDLINKMODE, TRACEFILE, TRAJLINKSFILE, CLAGT_MSM) can influence the steady state and any other network-derived properties profoundly. Even for a single continuous trajectory, the observed probability distribution in cluster space does not exactly agree with the network-derived prediction due to the imbalance caused by having a beginning and an end. Consequently, a simultaneous use of network-dependent properties such as mean first passage times or synthetic (state-based) trajectories and the raw sampling weight per state will be inconsistent. This is why the steady state (if computed) will be used in the subsequent computation of cut-based free energy profiles. Note that the steady state can also be computed using linear algebra (→ EIGVAL_MD) and is required in the computation of the committor.
 This option is the same as the previous one, except that all edges are first scaled by their geometric lengths.
The computation of the steady state uses an iterative algorithm that can become quite time-consuming due to its slow convergence behavior. There is a time control for all iterative schemes of this type. The algorithm is also numerically weak in that the convergence measure cannot accurately estimate the deviation of the current solution from the exact one and in that the convergence properties can differ across the network. The algorithm does detect periodicity, which generally prevents convergence (the simplest example is a system of two mutually connected states with no self-transitions), and will eventually report this and terminate. Unfortunately, the difficult cases for the iterative scheme are the same as those for the linear algebra solution. It can be illustrative to compute both solutions if possible and compare them (→ STRUCT_CLUSTERING.graphml). The steady state also provides a route toward reweighting a set of simulation data biased by initial conditions, i.e., an ensemble of short trajectories. The resultant weights are written by default to dedicated output file(s) unless CREWEIGHT is 0, and these files can usually be used as an input to FRAMESFILE for subsequent weighted analysis. The iterative algorithm has the advantage over the linear algebra solution that it benefits from CAMPARI's shared memory (OpenMP) parallelization. Parallelization is such that the results should always be identical to serial execution.
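As a point of reference, a bare-bones power iteration for the steady state might look like this (a sketch only; CAMPARI's iterative scheme, its convergence measure, and its periodicity detection are more elaborate):

```python
import numpy as np

def steady_state(T, tol=1e-12, max_iter=100000):
    """Left fixed point of a row-stochastic matrix T by power iteration.
    A periodic chain (e.g., two states with no self-transitions) never
    converges here, mirroring the caveat discussed above."""
    pi = np.full(T.shape[0], 1.0 / T.shape[0])   # uniform starting guess
    for _ in range(max_iter):
        new = pi @ T
        if np.abs(new - pi).max() < tol:
            return new / new.sum()
        pi = new
    raise RuntimeError("power iteration did not converge")
```

Note that the stopping criterion only measures the change between iterates, not the distance to the exact solution, which is precisely the numerical weakness mentioned above.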
MAXTIME_ITERS
If structural clustering is performed (→ CCOLLECT), which includes the approximate progress index method, this keyword can be set to define a maximum execution time (in seconds) for any iterative scheme computing convergent properties of/from the transition matrix. Such schemes are currently used for the steady state (→ CREWEIGHT), the mean first passage times (→ CMSMCFEP) to a reference cluster, and the iterative maximum likelihood inference (→ CADDLINKMODE is 6). The normal convergence threshold for these algorithms must, for accuracy reasons, be set to such a small number that the execution can easily time out (without a printed solution) on time-limited resources if the network is large and not well-connected. This is why this keyword, which defaults to an unlimited execution time, can be set to force a solution after a given time irrespective of convergence. Note that the execution time specified here refers to a single execution of an individual algorithm, and that multiple invocations as well as the remainder of CAMPARI's execution time must be estimated independently and corrected for. Note that all three of the aforementioned iterative algorithms can take advantage of CAMPARI's shared memory (OpenMP) parallelization. The efficiency of this depends on the size and connectedness of the network (larger is better for both).
INISYNSNAP
If any type of structural clustering is performed (→ CCOLLECT), the underlying trajectory is used to infer a transition network. With this keyword, the user indicates the snapshot that is used for the selection of the reference cluster (and reference strongly connected component) in a number of related analyses (SYNTRAJ_MD, EIGVAL_MD, DOPFOLD, and CMSMCFEP). The reference cluster is simply the cluster that contains the snapshot indicated by the value of this keyword, and the reference component is the one the reference cluster belongs to. In the case of the generation of random walks on the network, the reference cluster will be the starting node. This keyword also becomes relevant when committor probabilities are requested (DOPFOLD) but no input file for the reference set B (see DOPFOLD for definitions) is specified or found (CLUFOLDFILE). In this case, CAMPARI reverts to using the cluster selected here as the only cluster of set B. If no value is specified, the default is to take the cluster with the largest number of frames and its component as reference. Because the keyword uses a snapshot index, it is important to point out that the value of this keyword must always be specified in absolute terms of the input data, i.e., generally speaking, no corrections need to be applied in case CCOLLECT is greater than 1, a sequential access file with user-selected input frames is specified, or frames are discarded at the beginning. CAMPARI takes care of this automatically. The use of a frames file requires particular care. If the file accesses the trajectory in random access ("as is") mode, the snapshot index is assumed to refer to the line number in the frames file rather than the index of the frame on that line. This is a general change of interpretation inherent to FMCSC_FRAMESFILE with certain input file formats. Conversely, if the file accesses the trajectory in strictly sequential mode, step numbers continue to refer to the original trajectory.
If the selected reference snapshot is not present in the data to be finally extracted, the program will terminate at the very beginning.
For the generation of cut-based pseudo free energy profiles (→ CMSMCFEP), a choice of 0 will produce (if possible) the requested profiles, which also depend on TMAT_MD, for the reference component containing the largest cluster (by sampling weight). As an additional option, this keyword can be set to 1, in which case CAMPARI will (if possible) utilize all strongly connected components of the underlying graph and use the largest cluster within each component (subgraph) as reference for multiple, distinct cut profiles (separate output files).
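The identification of strongly connected components referenced throughout this section can be illustrated with a plain, recursive rendition of Tarjan's algorithm (a sketch; a production implementation must avoid recursion limits on large networks):

```python
def tarjan_scc(succ):
    """Strongly connected components of a directed graph given as a map
    node -> iterable of successors; returns a list of components."""
    index, low, on_stack, stack = {}, {}, set(), []
    comps, counter = [], [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in succ.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:           # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            comps.append(comp)

    for v in succ:
        if v not in index:
            visit(v)
    return comps
```

The reference component would then simply be the returned component that contains the reference cluster, with all others discarded for the analyses listed above.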
EIGVAL_MD
If any type of structural clustering is performed (→ CCOLLECT), and CAMPARI was compiled and linked with HSL support (→ installation instructions), it is possible to perform a spectral analysis of the transition matrix(ces) derived from clustering (→ TMAT_MD), provided that the rank N of the transition matrix is > 3. The HSL package used for this task is the FORTRAN double precision version of EB13, viz., calls to EB13ID, EB13AD, and possibly EB13BD are made in CAMPARI whenever required. These routines implement the Arnoldi method for large sparse matrices, and the user is invited to read up on the relevant documentation (EB13). CAMPARI hides, however, some of the functionality offered by the HSL library itself. For example, referring to the relevant documentation (EB13), the Arnoldi method used by CAMPARI is always the one with Chebychev acceleration of the starting vectors, i.e., ICNTL(9) is hardcoded to 2, which is the only option we have tested. Currently, this choice can be altered only by modifying the source code where the relevant initialization happens (subroutine calc_eigs_msm(...) in source file graph_algorithms.f90). With the present keyword, the user can decide whether or not to perform the spectral decomposition of the transition matrix and, in case it is performed, how to sort the eigenvalues of the transition matrix (with this keyword, the "IND" variable of EB13AD is set to the same value with an offset of 1). The following options are available:
 No spectral decomposition is performed (default).
 A selected number (→ NEIGV) of eigenvalues with largest absolute values are computed.
 A selected number (→ NEIGV) of rightmost eigenvalues are computed. These are the eigenvalues with the largest real parts. This is probably the only useful option for the spectral analysis of a transition matrix, as complex eigenvalues are not generally interpretable to begin with.
 A selected number (→ NEIGV) of eigenvalues with largest imaginary parts are computed.
If the spectral analysis of the transition matrix is requested, several dependent keywords controlling the task to be solved as well as parameters of the Arnoldi method become relevant. The number of eigenvalues to be computed is set with NEIGV, while keyword DOEIGVECT lets the user request the computation of the eigenvectors associated with the NEIGV eigenvalues as well. The Arnoldi method is controlled by keywords NEIGBLOCKS, NEIGSTEPS, NEIGRST, and EIGTOL. The output produced by the use of this keyword is always written to a dedicated output file (EIGEN_XXX.dat). In addition, if the chosen option is 2 and the eigenvectors are available, output file STRUCT_CLUSTERING.graphml will contain the first eigenvector as well. Lastly, note that the same routines may be called in case the computation of the (−) committor was requested. In this particular case, the options are not directly controllable by the user, however.
Because the HSL routines are not (currently) thread-parallel, this functionality does not benefit from CAMPARI's shared memory (OpenMP) parallelization, which is a limitation.
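As a toy illustration of the eigenvalue problem being solved here (not of the blocked Arnoldi method that EB13 actually implements), the sketch below recovers the dominant eigenvalue of a small, entirely hypothetical row-stochastic transition matrix, together with the associated left eigenvector (the steady state), by plain power iteration:

```python
# Power iteration on the left eigenproblem v·T = lambda·v of a row-stochastic
# matrix T. For such a matrix, the dominant eigenvalue is 1 and the associated
# left eigenvector is the stationary distribution. This is a didactic sketch,
# not the blocked Arnoldi method used by the HSL library.

def power_iteration(T, iters=2000):
    n = len(T)
    v = [1.0 / n] * n                      # uniform starting vector
    for _ in range(iters):
        # one application of the matrix: w[j] = sum_i v[i]*T[i][j]
        w = [sum(v[i] * T[i][j] for i in range(n)) for j in range(n)]
        s = sum(w)
        v = [x / s for x in w]             # renormalize to sum 1
    # Rayleigh-quotient estimate of the eigenvalue
    w = [sum(v[i] * T[i][j] for i in range(n)) for j in range(n)]
    lam = sum(w[i] * v[i] for i in range(n)) / sum(x * x for x in v)
    return lam, v

# hypothetical 3-state transition matrix (rows sum to 1)
T = [[0.90, 0.10, 0.00],
     [0.05, 0.90, 0.05],
     [0.00, 0.10, 0.90]]

lam, pi = power_iteration(T)
print(round(lam, 6))                        # dominant eigenvalue, approx. 1.0
print([round(p, 3) for p in pi])            # stationary distribution
```

The non-dominant eigenvalues and eigenvectors, which power iteration discards but EB13 also delivers, are the ones carrying the kinetic information discussed under DOEIGVECT below.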
NEIGV
If any type of structural clustering is performed (→ CCOLLECT), CAMPARI was compiled and linked with HSL support (→ installation instructions), and EIGVAL_MD is not zero, this integer value defines how many eigenvalues should be returned by the spectral decomposition (EIGVAL_MD) of the transition matrix(ces) (TMAT_MD). The returned eigenvalues are maximal in some sense, and this is defined by the choice for EIGVAL_MD. This keyword effectively sets the value of variable "NUMEIG" in the underlying HSL routine EB13. The value for this keyword influences the choice for NEIGSTEPS and NEIGBLOCKS, since it is required that min(N, NEIGV) ≤ NEIGSTEPS·NEIGBLOCKS ≤ N, where N is the rank of the transition matrix (N > 3). It is worth noting that the cost for the Arnoldi steps at each iteration scales as (NEIGBLOCKS·NEIGSTEPS)^{2}·N, while the cost of computing the Hessenberg matrix is proportional to (NEIGBLOCKS·NEIGSTEPS)^{3} and the memory requirements are proportional to (NEIGBLOCKS·NEIGSTEPS)^{2}, as outlined in the documentation for EB13. Therefore, increasing the number of eigenvalues to be computed can impact the achievement of the desired convergence criterion (EIGTOL), which may be addressed by keyword NEIGRST, and can have a dramatic effect on execution time and memory footprint.
DOEIGVECT
If any type of structural clustering is performed (→ CCOLLECT), CAMPARI was compiled and linked with HSL support (→ installation instructions), and EIGVAL_MD is not zero, this simple logical (1 is true) allows the user to request CAMPARI to compute eigenvectors along with eigenvalues, which are added to the same output file, viz., EIGEN_XXX.dat. If EIGVAL_MD is set to 2, the first eigenvector contains the steady state of the transition network (TMAT_MD), which is also reported in the output file STRUCT_CLUSTERING.graphml. In case the network is fractured into multiple, strongly connected components, the computation and output are limited to the strongly connected component the reference cluster (INISYNSNAP) resides in. Additional eigenvectors (→ NEIGV), which refer to the eigenvalues smaller than 1, are often interpreted to report on the involvement of each cluster in the transition associated with a characteristic time scale, which is given by the corresponding eigenvalue λ as t = −τ/ln(λ), where τ is the lag time (→ CLAGT_MSM).
NEIGBLOCKS
If any type of structural clustering is performed (→ CCOLLECT), CAMPARI was compiled and linked with HSL support (→ installation instructions), and EIGVAL_MD is not zero, this keyword lets the user set the number of blocks for the Arnoldi method. It corresponds to the variable "NBLOCKS" in the reference library documentation (EB13). NEIGBLOCKS must be ≥ 1 and the conditions min(N, NEIGV) ≤ NEIGSTEPS·NEIGBLOCKS ≤ N must always hold true, with N being the rank of the transition matrix (TMAT_MD), N > 3. If NEIGBLOCKS is set to 1, the unblocked Arnoldi method is used (EB13). If this keyword is not found, the current default choice is to set NEIGBLOCKS to NEIGV + 2. However, the best choice for this value together with the value for NEIGSTEPS depends on the problem. In the reference documentation (EB13), the suggestion is to set NEIGBLOCKS to at least the value of NEIGV and to set NEIGSTEPS such that NEIGSTEPS·NEIGBLOCKS lies in the range between 3·NEIGV and 10·NEIGV.
NEIGSTEPS
If any type of structural clustering is performed (→ CCOLLECT), CAMPARI was compiled and linked with HSL support (→ installation instructions), and EIGVAL_MD is not zero, this integer variable sets the number of steps for the Arnoldi method and corresponds to the variable "NSTEPS" in the reference library documentation (EB13). The minimum allowed value is 2 and the requirements min(N, NEIGV) ≤ NEIGSTEPS·NEIGBLOCKS ≤ N must always be respected, with N being the rank of the transition matrix (TMAT_MD), N > 3. The current default choice is to set this variable to ceiling((8.·NEIGV)/(NEIGV + 2)) if no specification is given by the user. However, the best choice for this value together with the value for NEIGBLOCKS depends on the problem. In the reference documentation (EB13), the suggestion is to set NEIGBLOCKS to at least the value of NEIGV and to set NEIGSTEPS such that NEIGSTEPS·NEIGBLOCKS lies in the range between 3·NEIGV and 10·NEIGV.
NEIGRST
If any type of structural clustering is performed (→ CCOLLECT), CAMPARI was compiled and linked with HSL support (→ installation instructions), and EIGVAL_MD is not zero, this keyword lets the user select the number of restarts of the Arnoldi method before the execution is terminated in case the desired convergence (EIGTOL) has not been achieved in the previous set of steps. The default value is 10, which means that the execution is aborted after 10 restarts from the possibly intermediate and not yet converged solution. For the sake of completeness and clarity, the hardcoded value for the number of iterations within EB13, viz., ICNTL(11), is 999, and that is not the value set by this keyword.
EIGTOL
If any type of structural clustering is performed (→ CCOLLECT), CAMPARI was compiled and linked with HSL support (→ installation instructions), and EIGVAL_MD is not zero, this keyword sets the tolerance on the residuals that needs to be achieved before the computed solution of the eigenvalue problem on the transition matrix(ces) (→ TMAT_MD) is deemed appropriate. It defaults to 10^{3}·"machine precision" and corresponds to CNTL(1) in the library documentation (EB13). For completeness, we mention here that ICNTL(7) is hardcoded to 1, which means that convergence is checked against the Frobenius norm of the matrix, which is computed by default.
CMSMCFEP
If structural clustering is performed (→ CCOLLECT), which includes the case of the approximate progress index method, this keyword allows the user to select a type of cut-based pseudo free energy profile to be computed (reference). The target node for this profile can be chosen with keyword INISYNSNAP, which is snapshot-based and includes the selection of the largest cluster(s) (by sampling weight). Depending on the assumed direction of time (→ TMAT_MD) and the choice below, more than 1 profile may be generated (separate output files). Currently, there are 4 fully supported options producing output (some hidden options exist, which will not be discussed):
 0: No cut-based free energy profiles are computed.
 1: The mean first-passage times to the reference node in the Markov state model approximation are computed iteratively. After sorting all clusters according to these mean first-passage times, partitions can be defined as a function of a threshold time. The cut-based pseudo free energy profile associates each threshold time with the total weight of edges (number of transitions) crossing this threshold along the trajectory, and plots the normalized weight in logarithmic fashion (see elsewhere for details). Because the iterative algorithm may be slow to converge, its maximum execution time can be controlled by keyword MAXTIME_ITERS. In this mode, a separate profile for each strongly connected component of a non-ergodic graph can be produced if INISYNSNAP is 1. These profiles are referenced to the respective largest clusters (by sampling weight) in each component. The iterative algorithm used here benefits from CAMPARI's shared memory (OpenMP) parallelization (similar to network equilibration). Parallelization is such that the results should always be identical to serial execution.
 2: The (+) committor probabilities for a set of clusters defining a target set and an unfolded set are used to sort all clusters (folded and unfolded set members have values of 1.0 and 0.0 by definition, and clusters are sorted in decreasing order). This requires having computed those committor probabilities separately (CMSMCFEP cannot be used to enable this calculation) with the help of keyword DOPFOLD. Because the committor probabilities are only available for the reference component the sets reside in, INISYNSNAP has no direct influence (in particular, option 1 is not available). See elsewhere for details on the corresponding output file(s).
 3: This is the same as the previous option except that the (−) committor probabilities are used instead, which additionally relies on keyword DOPFOLD_MINUS. See elsewhere for details on the corresponding output file(s).
 4: This is the combination of the previous 2 options, i.e., separate profiles based on both (+) and (−) committor probabilities are produced. The same dependencies and restrictions apply.
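To make the idea behind the mean first-passage-time-based profile above concrete, the following sketch orders a hypothetical 4-cluster network by assumed mean first-passage times to the reference cluster and, for every cut through the sorted order, reports the negative logarithm of the normalized transition weight crossing that cut. All numbers are invented for illustration, and CAMPARI's actual output differs in detail:

```python
# Sketch of a cut-based pseudo free energy profile: clusters are sorted by a
# progress variable (here, assumed mean first-passage times to cluster 0), and
# each cut between consecutive clusters in the sorted order is annotated with
# -ln of the normalized weight of observed transitions crossing it.
import math

# hypothetical symmetric transition counts C[i][j] between 4 clusters
C = [[0, 30, 2, 0],
     [30, 0, 5, 1],
     [2, 5, 0, 20],
     [0, 1, 20, 0]]
mfpt = [0.0, 3.0, 9.0, 12.0]   # assumed MFPTs to the reference cluster 0

order = sorted(range(len(mfpt)), key=lambda i: mfpt[i])
total = sum(sum(row) for row in C) / 2.0   # total undirected edge weight

profile = []
for cut in range(1, len(order)):
    left = set(order[:cut])
    # weight of all edges crossing the cut (each counted once)
    weight = sum(C[i][j] for i in left for j in range(len(C)) if j not in left)
    profile.append(-math.log(weight / total))

print([round(x, 3) for x in profile])
```

High barriers in the profile correspond to cuts crossed by few transitions; in this toy network, the deepest bottleneck separates clusters {0, 1} from {2, 3}.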
SYNTRAJ_MD
If any type of structural clustering is performed (→ CCOLLECT), this keyword specifies whether random walks on a transition network should be performed (and recorded), and how the initial and termination conditions are chosen. Synthetic trajectories are always confined within the strongly connected component that contains the cluster that hosts the initial reference snapshot (INISYNSNAP). The values allowed for the present keyword and their associated outcomes are:
 0: No synthetic trajectories are generated (default).
 1: Random walks are initiated in the cluster that contains the reference initial snapshot (INISYNSNAP) and are terminated either when the walker hits the cluster that contains the target snapshot (ENDSYNSNAP) or when the target number of steps per trajectory is exceeded (NSYNSNAPS). In the latter scenario, the unsuccessful trajectory is not written to file (MSM_SYN_TRAJ_xxxxx_yyy.frames). The target number of trajectories to be generated is set by keyword NSYNTRAJS. Since trajectories may fail to hit the target end node, the number of successful trajectories may be less than NSYNTRAJS. If trajectories fail repeatedly, it is advisable to increase the number of steps per trajectory (NSYNSNAPS). If the fraction of productive trajectories is small, their lengths will obviously be biased systematically toward shorter values.
 2: Synthetic trajectories are started at the cluster that hosts the reference snapshot (INISYNSNAP) and propagated for NSYNSNAPS steps, regardless of where they end. Therefore, the generation of a trajectory is always successful, viz., NSYNTRAJS trajectories are always written to file (MSM_SYN_TRAJ_xxxxx_yyy.frames).
 3: Each trajectory starts in a random cluster. The probability that a cluster is chosen as the starting one reflects its statistical weight, which is proportional to the raw population of the cluster if no equilibration of the transition network is performed, or to the steady state of the Markov state model otherwise (see CREWEIGHT and EIGVAL_MD). Each trajectory is propagated for NSYNSNAPS steps and written to file (MSM_SYN_TRAJ_xxxxx_yyy.frames). Keyword INISYNSNAP is used solely to identify the strongly connected component where the random walk takes place.
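The modes above can be illustrated by a minimal random walk on a row-stochastic matrix. The sketch below mimics mode 1 (fixed start, terminate on hitting a target cluster or after a step limit) for a hypothetical 3-state matrix; it is of course not CAMPARI's implementation:

```python
# Generate a "synthetic trajectory": a random walk on a row-stochastic
# transition matrix, started in a given cluster and terminated either on
# hitting a target cluster or after a maximum number of steps.
import random

def synthetic_trajectory(T, start, target, max_steps, rng):
    traj = [start]
    state = start
    for _ in range(max_steps):
        r = rng.random()
        acc = 0.0
        for j, p in enumerate(T[state]):   # draw next state from row of T
            acc += p
            if r < acc:
                state = j
                break
        traj.append(state)
        if state == target:
            return traj, True              # productive trajectory
    return traj, False                     # failed to hit target in time

# hypothetical transition matrix and endpoints
T = [[0.8, 0.2, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.2, 0.8]]
rng = random.Random(7)
traj, ok = synthetic_trajectory(T, start=0, target=2, max_steps=500, rng=rng)
print("productive:", ok, "length:", len(traj))
```

Repeating this NSYNTRAJS times and discarding the unproductive walks reproduces, in miniature, the bookkeeping described for mode 1.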
ENDSYNSNAP
If any type of structural clustering is performed (→ CCOLLECT), and the generation of synthetic trajectories with a target end point has been requested, this keyword lets the user select the target snapshot for these random walks. It works analogously to keyword INISYNSNAP, and it is up to the user to ensure that the target end node selected this way differs from the starting one and belongs to the same strongly connected component. If these conditions are not met, the relevant analyses will be skipped. In case committor probabilities are requested but no input clusters are provided (CLUUNFOLDFILE) for set A (see DOPFOLD for definitions), CAMPARI reverts to using the cluster selected here as the only one forming set A. If this keyword is not specified but needed, CAMPARI will use the last stored snapshot from the trajectory as the reference end snapshot (default).
Because, like INISYNSNAP, the keyword uses a snapshot index, it is important to point out that the value of this keyword must always be specified in absolute terms of the input data, i.e., generally speaking, no corrections need to be applied in case CCOLLECT is greater than 1, a sequential access file with user-selected input frames is specified, or frames are discarded at the beginning. CAMPARI takes care of this automatically. The use of a frames file requires particular care. If the file accesses the trajectory in random access ("as is") mode, the snapshot index is assumed to refer to the line number in the frames file rather than the index of the frame on that line. This is a general change of interpretation inherent to FMCSC_FRAMESFILE with certain input file formats. Conversely, if the file accesses the trajectory in strictly sequential mode, step numbers continue to refer to the original trajectory.
NSYNTRAJS
If any type of structural clustering is performed (→ CCOLLECT), and the generation of synthetic trajectories has been requested, this keyword specifies the target number of synthetic trajectories (random walks) to be generated. The default value is 10 and the upper limit is 10^{4}. This number is guaranteed to be the actual number of generated trajectories only if SYNTRAJ_MD is not set to 1. For trajectories with requested start and end points (mode 1), this keyword specifies the total number of attempts instead. The fraction and average length of productive trajectories will be reported to log output, but only successful trajectories are written to file.
NSYNSNAPS
If any type of structural clustering is performed (→ CCOLLECT), and the generation of synthetic trajectories has been requested, this keyword sets the (maximum) number of steps per synthetic trajectory (random walk). All trajectories will have this length unless keyword SYNTRAJ_MD is set to 1. Note that in mode 1, where both a starting and an end point are used, too small a value for NSYNSNAPS will obviously bias the distribution of productive (reactive) trajectories to short ones.
SYNTRAJOUT
If any type of structural clustering is performed (→ CCOLLECT), and the generation of synthetic trajectories has been requested, this keyword sets the output frequency for the synthetic trajectories (random walks) themselves. These files are documented elsewhere, but they ultimately contain lists of integers, which can get large (in total file size) very quickly. This is why this keyword allows the user to print only every SYNTRAJOUT^{th} step of each random walk to the corresponding output file. If SYNTRAJ_MD is 1, the keyword can also be set to 0, in which case all output is suppressed (using a very large value instead causes the individual files to be written with just their respective header lines).
DOPFOLD
If any type of structural clustering is performed (→ CCOLLECT), and CAMPARI was compiled and linked with HSL support (→ installation instructions), this simple logical (1 is true) selects whether or not to compute (+) committor probabilities (or p_{folds}^{+} values) for the clusters that belong to the reference component of the graph inferred from the clustering. The underlying transition matrix can be modified in various ways (see TMAT_MD for details), which may weaken or fracture the graph into multiple strongly connected components.
A set of clusters forming the target set B (CLUFOLDFILE) and a set of clusters forming an alternative set A (CLUUNFOLDFILE) are required and must belong to the same component. If these input files are missing, the choices are deferred to keywords INISYNSNAP and ENDSYNSNAP. The remaining clusters in the same component constitute the intermediate set, and for them the probability that a random walker started in that intermediate state reaches any cluster in the target set B before it reaches any cluster in the other set A can be calculated. By definition (as boundary condition), all nodes that belong to the target set B have a p_{fold}^{+} value equal to 1, while all clusters that belong to set A have a p_{fold}^{+} value equal to 0.
If DOPFOLD is set to 1, the committor probabilities are obtained, with the aid of HSL-provided (double precision) external routines (HSL_MA48), as the solution of a linear system of equations for the clusters i in the intermediate set I (Noé et al.):
−p_{fold,i}^{+} + Σ_{j∈I} T_{ij}·p_{fold,j}^{+} = −Σ_{j∈B} T_{ij}
Here, T is the underlying transition matrix (→ TMAT_MD). Once solved, the committor probabilities are written to a specific output file (PFOLD_PLUS_xxx.dat). The computed p_{fold}^{+} values are obviously sensitive to any modifications to the transition matrix. The time direction matters unless detailed balance holds (→ CADDLINKMODE), in which case the p_{fold}^{+} values become equivalent to 1.0 − p_{fold}^{−} computed via keyword DOPFOLD_MINUS. In case detailed balance does not hold, the values for the (−) committors (p_{folds}^{−}) must be computed separately.
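To make the linear system above concrete, the following sketch sets it up and solves it for a hypothetical 4-state chain with A = {0} and B = {3}, using a naive dense Gauss-Jordan elimination in place of the sparse HSL_MA48 solver that CAMPARI employs:

```python
# Solve the (+) committor system  -p_i + sum_{j in I} T_ij p_j = -sum_{j in B} T_ij
# for the intermediate states of a small, hypothetical transition matrix.

def gauss_solve(A, b):
    # naive Gauss-Jordan elimination with partial pivoting (dense, didactic)
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

# hypothetical row-stochastic transition matrix for states 0..3
T = [[0.9, 0.1, 0.0, 0.0],
     [0.2, 0.6, 0.2, 0.0],
     [0.0, 0.2, 0.6, 0.2],
     [0.0, 0.0, 0.1, 0.9]]
inter = [1, 2]    # intermediate set I (set A = {0}, target set B = {3})
targB = [3]

# coefficient matrix (T_II - identity) and right-hand side -sum_{j in B} T_ij
A_mat = [[T[i][j] - (1.0 if i == j else 0.0) for j in inter] for i in inter]
rhs = [-sum(T[i][j] for j in targB) for i in inter]
p = gauss_solve(A_mat, rhs)
print([round(x, 4) for x in p])   # committors of states 1 and 2
```

For this symmetric chain, the committors of the two intermediate states come out as 1/3 and 2/3, increasing monotonically toward the target set, as expected.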
One of the reasons to compute committor probabilities may be probability flux analyses and decompositions of pathways according to transition path theory (Noé et al., Berezhkovskii et al.), and committors are indeed fundamental to these analyses.
Because the HSL routines are not (currently) thread-parallel, this functionality does not benefit from CAMPARI's shared memory (OpenMP) parallelization, which is a limitation.
DOPFOLD_MINUS
If any type of structural clustering is performed (→ CCOLLECT), CAMPARI was compiled and linked with HSL support (→ installation instructions), and (+) committor probabilities have been computed, this simple logical keyword (1 is true) lets the user request the computation of (−) committor probabilities or p_{folds}^{−} values as well. These committors are defined as the probability that a random walk that reaches an intermediate state i was last seen in the alternative set (A) rather than in the target set (B). Details on nomenclature and background are found in the description of keyword DOPFOLD. Their computation requires the solution of a linear system similar to the one specified for the (+) committors (p_{fold}^{+}):
−p_{fold,i}^{−} + Σ_{j∈I} T̄_{ij}·p_{fold,j}^{−} = −Σ_{j∈A} T̄_{ij}
Here, T̄ is defined as T̄_{ij} = (π_{j}/π_{i})·T_{ji}, where π_{i} is the steady state probability of node i of the original transition matrix T. The underlying transition matrix is affected by a number of keywords (see TMAT_MD for details). The resultant (−) committor probabilities are written to a specific output file (PFOLD_MINUS_xxx.dat). If microscopic reversibility (detailed balance) holds (→ CADDLINKMODE), the (−) committor probabilities can simply be computed as 1.0 − p_{fold}^{+}, and there is no need to use DOPFOLD_MINUS. If it does not hold, T̄ is not simply the backward time transition matrix (→ TMAT_MD). Therefore, if DOPFOLD_MINUS is used, CAMPARI always computes T̄ from the definition above, which may be numerically problematic in case state probabilities differ by several orders of magnitude. Since the steady state, or first eigenvector, is required, CAMPARI will check whether an acceptable solution is already available from the use of keywords EIGVAL_MD or CREWEIGHT (in this order). If not, CAMPARI attempts to compute the steady state using the HSL library EB13, similar to what would be done if EIGVAL_MD is 2, DOEIGVECT is 1, and NEIGV is 1. For this solution, keywords NEIGBLOCKS and NEIGSTEPS are not respected (values of 3 and 3 are used instead), but the settings for EIGTOL and NEIGRST remain relevant. If the solution is successful and acceptable, it will also be reported in output file STRUCT_CLUSTERING.graphml.
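The time-reversal construction can be sketched directly from its definition. For a hypothetical matrix that obeys detailed balance, T̄ reproduces T exactly, which is why DOPFOLD_MINUS is superfluous in that case:

```python
# Time-reversed transition matrix: Tbar[i][j] = (pi[j] / pi[i]) * T[j][i].
# If T satisfies detailed balance (pi[i]*T[i][j] == pi[j]*T[j][i]),
# then Tbar equals T. All numbers below are hypothetical.

T = [[0.90, 0.10, 0.00],
     [0.05, 0.90, 0.05],
     [0.00, 0.10, 0.90]]
pi = [0.25, 0.50, 0.25]   # steady state of T (satisfies pi @ T == pi)

n = len(T)
Tbar = [[pi[j] / pi[i] * T[j][i] for j in range(n)] for i in range(n)]

# for this detailed-balance example, the reversal leaves T unchanged
print(all(abs(Tbar[i][j] - T[i][j]) < 1e-12 for i in range(n) for j in range(n)))
```

For a non-reversible matrix the same two lines still yield the correct T̄, but, as noted above, large ratios π_{j}/π_{i} can make the construction numerically delicate.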
PFOLDREPORT
If this simple logical is enabled (1 is true), several files that report on the linear system(s) relevant to the computation of committor probabilities (DOPFOLD, DOPFOLD_MINUS) may be written. Those files are: fold_clus.out, unfold_clus.out, mat_pfold_XXX_YYY.dat, rhs_pfold_XXX_YYY.dat, and possibly tmat_pfold_XXX_minus.dat and ss_pfold_XXX.dat. Files fold_clus.out and unfold_clus.out simply replicate the clusters that make up set B and set A (see also DOPFOLD for definitions), respectively, while files mat_pfold_XXX_YYY.dat contain the coefficients of the linear system(s) solved to obtain the committors, viz., T − I for DOPFOLD and/or T̄ − I for DOPFOLD_MINUS. These last two outputs obviously consider only the entries relevant to the edges between the intermediate states and the self-transitions, similarly to the rhs_pfold_XXX_YYY.dat files, where the right-hand sides of the linear systems (DOPFOLD and/or DOPFOLD_MINUS) are written. Files tmat_pfold_XXX_minus.dat contain the entries of T̄ and are written only if DOPFOLD_MINUS is enabled, while files ss_pfold_XXX.dat contain the steady state that was used to construct T̄. Those are written only in case the steady state was not available from other options (for details see DOPFOLD_MINUS).
CLUFOLDFILE
If any type of structural clustering is performed (→ CCOLLECT), CAMPARI was compiled and linked with HSL support (→ installation instructions), and the computation of at least (+) committor probabilities has been requested, this keyword lets the user specify the path and name of the file that stores the reference snapshots for the selection of the clusters forming the target set B for all committor probabilities. Details on the format and interpretation of the input are provided elsewhere.
In case this file is not provided or not found, CAMPARI will revert to the cluster defined by keyword INISYNSNAP as the only representative of set B. In case keyword INISYNSNAP is not specified either, CAMPARI will use the cluster that contains the largest number of snapshots. Because the sets A and B have to reside in the same strongly connected component of the clustering graph, it may not be possible to know reasonable values a priori. It can therefore be helpful to use this analysis in conjunction with a previously generated clustering-derived graph. The graph can be analyzed to identify suitable values, and the committor probability analysis can be performed in a second step by reading the coarse-grained trajectory, in conjunction with any link structure modifications, back in by means of file-based clustering (modes 6-7 for CMODE).
CLUUNFOLDFILE
If any type of structural clustering is performed (→ CCOLLECT), CAMPARI was compiled and linked with HSL support (→ installation instructions), and the computation of at least (+) committor probabilities has been requested, this keyword lets the user specify the path and name of the file that stores the reference snapshots for the selection of the clusters forming the alternative set A for all committor probabilities. Details on the format and interpretation of the input are provided elsewhere.
In case this file is not provided or not found, CAMPARI will revert to the cluster defined by keyword ENDSYNSNAP as the only representative of set A. In case keyword ENDSYNSNAP is not specified either, CAMPARI will use the cluster that contains the last snapshot of the trajectory, which is not generally meaningful. Because the sets A and B have to reside in the same strongly connected component of the clustering graph, it may not be possible to know reasonable values a priori. It can therefore be helpful to use this analysis in conjunction with a previously generated clustering-derived graph. The graph can be analyzed to identify suitable values, and the committor probability analysis can be performed in a second step by reading the coarse-grained trajectory, in conjunction with any link structure modifications, back in by means of file-based clustering (modes 6-7 for CMODE).
NetCDF Data Mining:
(back to top)