CAMPARI Parameter Files

Overview

Here we outline the strategy used in creating, reading and using parameter files for CAMPARI:

1) Biotypes
2) Lennard-Jones (atom) types
3) Charge types
4) Free energies of solvation
5) Bond length potential types
6) Angle bending potential types
7) Dihedral angle (torsional) potential types
8) CMAP (grid correction map) potential types
9) Bonded types

(back to top)

Everything starts with the definition of biotypes. A biotype may be thought of as the parsing of atoms into categories which preserve the maximum amount of chemical information. This implies that the only atoms which should ever be pooled into the same biotype are those which are chemically indistinguishable, i.e., are at most prochiral. Naturally this would lead to an inordinate number of biotypes (and consequently parameter assignments) since - for example - even an alanine C_α in the sequence CMATS is different from one in the sequence YMATS. This leads to a compromise in which the grouping mostly follows a residue-level paradigm and in which biotypes are re-used for enantiomers and diastereomers for instance. Since there are hard-coded references to biotypes, these are cast in stone. It is easy to add biotypes but impossible to renumber existing biotypes (without having to apply major changes to the code). This may lead to biotypes not being properly grouped at times, but this should represent only a minor inconvenience. Changes to the code involving a re-assignment of biotypes have to be realized in all parameter files as well.

Each biotype should have an associated LJ-type, an associated charge type, and an associated bonded type. Since there are comparably few LJ-types, general atom information (mass, atomic number, valence) is included in these. Charge types, conversely, contain information only about atomic point charges. Bonded types are assignments which do not contain any information themselves but represent an additional level of grouping of biotypes into chemically similar ones.

The proper specification of a biotype is henceforth:
-----------------------------------------------------------------------
biotype xxx abc "arbitrary string" i j k
-----------------------------------------------------------------------
xxx is the number of the biotype, and i, j, and k are the LJ-, the charge-, and the bonded types, respectively. The string should always be descriptive (indicate the residue and atom type within) and is read and stored (notice, however, that the storage string is of fixed length, and that an oversize string might be truncated). Finally, abc is the biotype indicator, a 3-element string of alphanumeric characters. This is the unique identifier when using (i.e., writing and reading) pdb-formats (although additional numbering is introduced there for chemically identical atoms).

The right start when building a parameter file from scratch is to take all the biotype definitions and initialize all the LJ-, charge, and bonded types to 0, as this will indicate to the program that the parameter file is basically fine, but that all these residues will not be supported. This will allow you to start supporting the sequence database by adding the appropriate parameters when they are actually needed. Because there is a hard-coded match between the residue database CAMPARI supports natively and the biotypes, it is not permissible to edit the structure of the existing biotype section, i.e., to renumber (even swap), or delete biotype entries.

In simulations or analysis runs with unsupported entities, it is often convenient to add parameters through the parameter file itself. For this purpose, CAMPARI offers a patch facility for biotypes that can also be used to change biotypes for any other atom. Note that this will (re)assign all atom-specific information encoded in biotypes. To allow parameter support for unsupported entities, biotypes may be added to the parameter file. Continuous numbering is enforced.

2) Lennard-Jones (atom) types:

(back to top)

As mentioned above, LJ-types encapsulate basic atomic parameters like weight or valence which are used in various places in the code. Their other function is to define the excluded volume and dispersive interactions between two non-bonded atoms i, and j. The inverse power potential is a suitable example but the dispersive term and the WCA potential employ the same set of parameters:

E_IPP = c_IPP·4.0ΣΣ_i,jε_ijf_1-4,ij·(σ_ij/r_ij)^t

To define basic parameters, the following line is used:
---------------------------------------------------------------------------------
atom xxx   abc   "arbitrary string"   z   mass v
---------------------------------------------------------------------------------
Here, xxx is the number of the atom type (continuous numbering is enforced), abc is a generic descriptor (usually the atomic symbol), the string should be used to describe the atom type (like "sp3 carbon"), z is the atomic number, mass is the atomic mass (in units of a.m.u or g/mol), and v is the valence (the number of unique atoms directly bonded to this atom type). Note that just like for biotypes, the string has to be within double quotes, and that it is again stored in a fixed-size array (truncation might occur). xxx is the number to be used when specifying LJ types in the biotype section. It is recommended to use the atomic symbol for the correct element as abc in case that residues/molecules not natively supported by CAMPARI are part of the system. It is possible to reassign atom types for individual atoms through the Lennard-Jones patch facility. Note that this does not reassign any of the other specifications (mass, valency, etc). In the special case of a small molecule screen, atom type assignment for the (unsupported) small molecules is handled by a specific map that is either determined automatically from this section of the parameter file (using valence and nucleus number information) or overridden by the user (see FMCSC_SYBYLLJMAP).

After defining an atom type in this way, there are two necessary lines to define the symmetric LJ interaction parameters for this atom type (i.e., the σ_ii and the ε_ii).
---------------------------------------
contact   i i σ_ii
interact   i   i   ε_ii
---------------------------------------
Here, i is the atom type, and σ_ii and ε_ii are the aforementioned LJ interaction parameters given as non-negative floating point numbers. Note that these lines are mandatory, i.e., it is impossible to define an atom type without defining its self-terms for the LJ potential.

By default, the program will construct pairwise LJ parameters for different types i and j from the symmetric ones through the specified combination rule (see FMCSC_EPSRULE and FMCSC_SIGRULE). However, in order to allow special cases to be handled, the user can overwrite this default behavior by adding lines to the parameter file:
---------------------------------------
contact   i j σ_ij
interact   i   j ε_ij
---------------------------------------
Here, the pairwise term for atom types i and j is specifically set to the given floating point numbers. This is trivially possible since the code pre-computes and stores all pairwise LJ parameters (whether constructed by the default rule, or by exception). Note that the code does no checks whatsoever on how meaningful these values are, so it is imperative to be careful when constructing these exceptions in the parameter file. It is not possible to alter the numerical values of the interaction parameters directly through a patch. A particular level of optimization is reserved for water-like molecules with more than atom, but only a single Lennard-Jones site. Here, CAMPARI may enable the use of specialized loops in dynamics calculations if it detects that the corresponding molecules are all at the end of the sequence input and conform to the correct Lennard-Jones interaction patterns. This information is provided in the calculation summary.

Force fields would not be force fields were it not for further exceptions. A particularly messy area is the treatment of non-bonded interactions for atoms separated by only three bonds (1-4) and the user may read up on this elsewhere. The parameter allows one to specify specific Lennard-Jones parameters used only in such 1-4-interactions:
---------------------------------------
contact_14   i j σ_ij,14
interact_14   i   j ε_ij,14
---------------------------------------
σ_ij,14 and ε_ij,14 will be used only if CAMPARI determines the two atoms i and j to be 1-4 relative to one another (see MODE_14). Note that additional but flat (universal) fudging of interactions may be applied. The user is advised to check the GROMOS parameter implementation to see an example of complicated LJ parameters.

As an additional note, the solvent-accessible volume (SAV), which is a central portion of CAMPARI's implicit solvation model (ABSINTH), relies on the σ_ii (see FMCSC_SC_IMPSOLV, FMCSC_SAVPATCHFILE, FMCSC_ASRPATCHFILE, etc. for details). Computing SAVs mandates the σ_ii to be representative of the actual geometric size of that solvation site since the extracted parameters are the atomic radii for the LJ types. CAMPARI rigorously enforces that all atomic radii be finite, which may necessitate the use of radius overrides (see below). Problems for the ABSINTH model may arise in two scenarios:

Sometimes sizeless particles are used (e.g., the σ_ii, and ε_ii for hydrogens in common water models are both zero). This implies a strange and most likely unwanted perturbation to the SAV calculation (since these atoms are still existing sites). In this case, one could define a small enough value for σ_ii but explicitly set the σ_ij and ε_ij for hydrogen bonding partners (H-O/N/S) to zero since these interactions are what the hydrogens are made sizeless for. The use of the radius keyword explained below is of course much cleaner.
Sometimes heavily bloated particles (σ_ii large) with minuscule ε_ii are used which give rise to an effective size which is much less than the apparent one (for example Åqvist's ion parameters). This is a difficult problem as it breaks the strict coupling between excluded volume size (what SAV needs to know) and LJ parameter size (what σ_ii specifies).

In order to fix these issues CAMPARI allows the user to specify rules to overwrite the default behavior, by which radii are extracted as 0.5·σ_ii:
---------------------------------------------------
radius i r_i
---------------------------------------------------
Obviously, r_i denotes the replacement radius (in Å) to be used in all radius-dependent analyses (related to the ABSINTH model, to solvation analysis, or spatial density techniques) for LJ atom type i. If the LJ report flag is turned on (see FMCSC_VDWREPORT), such overwritten values will be reported in the log-file. Note that only positive values are allowed, but that there are no constraints otherwise. It is generally not recommended to use these as free parameters, i.e., they should either agree with the LJ parameters, or should be set to a hard value from reference data (i.e., atomic (covalent) or ionic radii). Like most other parameters, these radii can also be altered through atom-specific patches by a dedicated patch facility.

3) Charge types:

(back to top)

Charge types do nothing but define an integer (the type number) and an associated charge (in units of e):
----------------------------------------------------------
charge xxx "arbitrary string" a
----------------------------------------------------------
Here, xxx is the number of the charge type, the string is analogous to previous cases, and a is the actual charge value (an arbitrary floating point number). Most of CAMPARI's current parameter files use a generic charge type for non-charged atoms. xxx is the number to be used when specifying charge types in the biotype section. Partial charges are the key parameters for the evaluation of Coulomb's law for non-bonded atoms:

E_POLAR = c_POLAR·ΣΣ_i,j [ (f_1-4,C,ij·q_iq_j) / (4.0πε₀·r_ij)]

Here, the q_i and q_j are the charges of atoms i and j, respectively, and those parameters are what is supplied through the parameter file. Note that no information except q_i is stored and used, which means that it has no negative side effects to reuse partial charge values for different biotypes.

Charges can be messy due to the strength and long range of the Coulomb interaction. The code ensures sanity of the parameters by i) checking for the system's net charge and ii) warning when a net-neutral but ampholytic macromolecule is simulated in the absence of any mobile counterions. Partial charges can generally be expected to group into net neutral groups and groups carrying an integer net charge. This can happen at different hierarchy levels and is relevant for two purposes, viz., for cutoffs in particular in periodic boundary conditions (PBC), and for short-range electrostatics:

When using cutoffs, it is a common trap to break up dipole groups in the cutoff treatment. In PBC this can lead to the generation of an artificial electric field, which ends up aligning the system. In CAMPARI, the sanity of cutoffs in PBC is ensured by treating cutoffs at the residue level, i.e., any residue in the sequence is always completely in- or excluded in/from an interaction to another residue. Hence, at the cutoff level the effective charge groups are given by whole residues, which means that there could be arbitrary charge distributions within a residue as long as the target net charge (usually zero) is met. There are special rules available for dealing with interactions beyond the cutoff distance, and these are documented for keywords FMCSC_LREL_MC and FMCSC_LREL_MD.
The point charge assumption becomes weak when describing short-range interactions between atoms that are topologically connected. It is common practice in biomolecular force fields to address this point by a combination of exclusion rules (e.g., 1-2 and 1-3 nonbonded interactions are not computed) and fudge factors to correct for the resultant errors. Errors arise because the parameterization becomes hypersensitive to these short-range terms due to the inverse dependence on distance. Because of the rigid geometry paradigm employed in torsional space simulations, CAMPARI may not compute non-bonded interactions over rigid or semi-rigid units (see FMCSC_INTERMODEL). Specifically for electrostatics, this may then further alter the details of the electrostatic interaction model (see FMCSC_ELECMODEL for details).

The documentation on FMCSC_ELECMODEL outlines the relevance of charge groups constructed within residues (most often, small groups that are net neutral). While there is some control over the partitioning, CAMPARI generally employs an automatic detection scheme for grouping partial charges, which works as follows:

In the first step, a list of targets is defined. These targets always include net neutral groups. If the residue carries a net charge or is otherwise known to carry charged functionalities, charge group targets are defined matching the one or more integer charges. These preliminary choices of targets are often hard-coded (based on knowledge of the supported residues and their default titration states) to prevent counterintuitive assignments by atoms being absorbed into net neutral groups. Residues with fractional net charges are generally assumed to come from terminal nucleic acid residues, and some effort is made to provide meaningful targets, but this is inherently problematic. This step can be overridden by the user with the help of a dedicated patch supplied via keyword FMCSC_NCPATCHFILE.
In the next step, starting generally with arbitrary sites (which are systematically scanned for each level), a hierarchical search is performed slowly extending the number of bonds by which atoms have to be connected to the "origin" in order to be part of the same dipole/charge group. Terminal atoms cannot be centers of charge groups and are always lumped into their parent atom (this is a mandatory constraint since V4). This attempts to ensure locality of the obtained parsing. The core groups/topologies considered range from individual (non-terminal) atoms via bonded pairs, triplets, etc., up to sextuples in different base topologies. Any group that matches one of the targets exactly (within numerical precision) is assigned, and the participating atoms are removed as candidates. These steps are repeated if keyword FMCSC_POLTOL is not zero for 10 or 11 increments of this tolerance between zero and the chosen value. In these steps, the requirement for an exact match is relaxed toward a match within the chosen tolerance. In simple cases (like most polypeptide residues), this should conclude the algorithm, and the next two steps are skipped.
If not all targets are reached or not all atoms can be assigned to well-formed groups and if keyword FMCSC_POLTOL is not zero, the algorithm enters a second phase prioritizing finding small groups matching the target charges with increasing leniency instead of prioritizing the exact match of the charge of the groups. First everything is reset. Then, the same algorithm is run as in the previous step, only that this time, the charge group tolerance is increased before the search space (in terms of group size) is expanded. In other words, the only change is that the order of candidate groups and tolerances evaluated is altered.
Any unassigned atoms are lumped into a single group. This single group will usually be ill-defined in terms of topology, i.e., it will be interrupted by atoms already assigned to other charge groups. This is checked explicitly, and an attempt is made to merge this group with existing candidate groups to recover a larger group that is topologically meaningful and has a well-defined net charge matching one of the targets. If this merge is successful, no warning is printed, otherwise a warning message is produced stating the properties of any "bad" charge groups.
The resultant parsing is translated into the corresponding data structures and output to DIPOLE_GROUPS.vmd if so requested.
The list of all charge groups is parsed. All groups with a nonzero net charge that exceed (in absolute value) the setting for keyword FMCSC_POLTOL are assigned as monopole groups, and the atoms closest to the center of charge are found and annotated. These are the sites used in the "reduced resolution" cutoff treatments (see FMCSC_LREL_MD and FMCSC_LREL_MC). This relies on the initial geometry, which is a caveat if charge groups are large and flexible. The tagging as a monopole can also be patched at the residue level.

Partial charges play a hugely important role to the parameterization of most force fields. Note that the grouping scheme above is not necessarily relevant to a simulation itself. They become relevant in certain cutoff treatments (see FMCSC_LREL_MD and FMCSC_LREL_MC), in group-consistent screening models for ABSINTH (see FMCSC_SCRMODEL), and for short-range exclusions when FMCSC_ELECMODEL is not 1. Like most other parameters, partial charges can be altered in atom-specific fashion by the charge patch facility. As mentioned before, charge groups and the status of groups as ionic can be modified by a separate patch facility and keyword FMCSC_POLTOL.

4) Free energies of solvation:

(back to top)

Currently, this section of the parameter file is relevant only if the ABSINTH implicit solvation model is in use, and it is recommended to read up on that first. Because the parsing of arbitrary molecules into the appropriate model compounds (for which reference free energies of solvation are available) is not straightforward and does not always match the other hierarchical levels the program employs (i.e., residues or molecules), the read-in and use of the free energies of solvation for the direct mean field interaction (DMFI) of the continuum model is not fully available or controlled through the parameter file. Instead, there are several related keywords (e.g., FMCSC_FOSMID or FMCSC_IMPDIEL). The parameter file merely contains the reference free energy of solvation values for various model compounds:
----------------------------
fos AAA a b c
----------------------------
Here, AAA is the unique identifier, and a is the free energy of solvation for the target model compound which AAA indicates. This is the only required value. Values b and c correspond to temperature-independent enthalpies and heat capacities of the same transfer process that the free energy (a) refers to. By adding values of this type to the parameter file, it is possible to use a temperature-dependent ABSINTH implicit solvation model. This choice is controlled by keyword FMCSC_FOSMODE. If there are no values supplied for b and c, they are automatically adjusted to yield a temperature-independent model (see below). This is the case for all parameter files shipped with CAMPARI by default. Note that the interpretation of the parameters as enthalpies and heat capacities matters only semantically. Technically, they can just as well be free fitting parameters to reference data on temperature-dependent free energies of solvation. Using the identifiers defined above, the formula is:

ΔG(T) = (a - b)T/T₀ + b + c[T[1 - ln(T/T₀)] - T₀]

The values a and b are assumed to be in kcal/mol, whereas those for a are assumed to be in cal mol^-1 K^-1. T₀ is the reference temperature that a is associated with (set by keyword FMCSC_FOSREFT). If b and c are not provided, they are taken as a and zero, respectively, which makes the free energies of solvation independent of temperature (as mentioned). If just c is not provided, it is taken to be zero. It is not possible to supply c, but not b. Note that the expression for ΔG(T) is additive in all three input values, which is relevant for model compound decompositions explained below.

Next, the various categories that AAA can belong to are discussed. For small molecule residues there are entries for which AAA is exactly the same as one would use in the sequence file. This is the simplest case and does not require any assumptions. For polypeptides, the model compound needs to be obtained through formal decomposition of the polymer, which entails two challenges, i) the identification of the proper compound, and ii) the proper determination of topology-derived maximum solvent-accessible volumes (see further below).

--- Proper model compounds and parameter dependencies:
First of all, the list of model compounds comprises terms like ALA for all peptide residues with a sidechain. The model compound is just the sidechain capped by an extra hydrogen atom (or two). This assumes that the linkage between sidechain CB and backbone CA is not eliminating important sites for solvation. Since the linkage is always aliphatic, this assumption is generally valid. Note that the code makes implicit assumptions about the relative magnitude of sidechain model compounds to one another. Several sidechains (like THR or TYR) are really composites, for which advantage is taken of the fact that a smaller model compound is the dominant solvation species. As an example, the threonine sidechain (ethanol) is really modelled as a composite of two groups: methanol and an extra methyl unit. It assigns the value for methanol (as supplied through the entry "fos SER x") to the hydroxyl unit, and the difference between the THR and SER entries to the methyl group. This means that the sum for the two sub-blocks will always yield the correct limiting value supplied for THR, but(!) also that the fine-parsing can be severely corrupted by supplying an inappropriate value for SER.

A list of coupled parameters is given:
    THR: relies on SER
    TYR and TYO (as well as small molecule PCR): rely on PHE
    MET: relies on NLE
    TRP: relies on the value for naphthalene (occurs for small molecule NAP)
    GLN: relies on ASN
    SEP: relies on NUC_PO4
    TPO: relies on NUC_PO4 and SEP
    PTR: relies on NUC_PO4 and PHE
    GLH: relies on ASH
    LYD, KAC, KM1, and KM2: rely on PRP

Crosslinked residues (see here and here) constitute a special case since a native amino acid sidechain is covalently linked to another one to constitute a new solvation group. For code-internal convenience, this new group will be split into two halves. The only currently supported crosslink is the disulfide linkage between two cysteine sidechains. The new model compound is dimethyl disulfide and the FOS for each -S-CH₃ moiety is expected to be supplied by fos LINK_CC x. In general, an asymmetric crosslink would require two keywords LINK_XY and LINK_YX, where X and Y are the one-letter codes of the corresponding amino acid sidechains.
Furthermore, there is a set of entries for the backbone of polypeptides. Within the chain, the peptide unit is taken as the model compound (i.e., CONH spanning residues rs and rs+1). This parameter is given by PEP_BB. Note that the same assumption stated for the model compounds holds for this particular parsing. By choosing the CA as the linkage point it is only aliphatic hydrogens which need to be reconstructed to obtain back the full model compound, which is generally NMA. Exceptions exist for proline (rs+1), NH2 (rs+1), and FOR (rs): PEP_PRO_BB, PEP_BB_NH2, PEP_BB_FOR, as well as for formylproline: PEP_PRO_BB_FOR. Termini are supported similarly and here the model compound is generally methylamine or acetate: PEP_CNT and PEP_CCT for charged termini, and PEP_UNT and PEP_UCT for uncharged termini (currently not supported). Again, exceptions occur with proline (PEP_PRO_CNT and PEP_PRO_UNT).

Polynucleotides, which can be thought of as being composed of a sugar, a phosphate, and a base, pose a much greater challenge. One reason is that the natural break points for a decomposition procedure are all bonds (ester, glycosyl) obtained under loss of important solvation units (OH). This means that reconstructing model compounds corresponding to the free H₂PO₄^- or the free ribose would significantly overestimate the magnitude of the (favorable) solvation free energy for the corresponding unit in the polymer given that free hydroxyls contribute very substantially to free energies of solvation. Hence, our approach needs to be modified to reconstruct compounds such as dimethylphosphate or N(1)-methyluracil, for which no measurements or sometimes even calculations exist. This is particularly cumbersome for the sugar, for which 12 distinct compounds can be formulated. The polynucleotide implementation is therefore INCOMPLETE(!) pending the availability of the proper parameters.

For the five currently supported bases (A, C, T, U, G), which are considered to be the nucleic acid sidechains for the purpose of model compounds, we choose the N1- or N9-methylated forms. These heteroaromatic compounds are more complicated than peptide sidechains, and from comparative analysis an implementation was chosen which tries to isolate subgroups which have a large impact on the net favorable free energy of solvation (as before, the sum is always assured to be correct, but dependencies matter):
    XXU (and URA): relies on the value for benzene (used for small molecule BEN)
    XXT (and THY): relies on the value for benzene (see above) and on XXU
    XXC (and CYT): relies on the value for benzene (see above) and on XXU
    XXA (and ADE): relies on the value for purine (used for small molecule PUR)
    XXG (and GUA): relies on the value for purine (see above) and on XXA

For the phosphate unit there are only two possibilities: the monoester (terminal) and the diester (non-terminal): NUC_HPO4 and NUC_PO4. Note that NUC_HPO4 relies on NUC_PO4. For the sugars, the implementation is oversimplified at the current stage due to the otherwise overwhelming complexity outlined above. Basically, we assume 2-methyltetrahydrofuran as the base compound, which it more or less is for deoxyribonucleotides in the middle of a chain. For terminal residues, for free sugars, and for ribonucleotides there is always at least one extra free hydroxyl-unit. The maximum number of hydroxyls in any of the currently supported residues is three. As a simple solution, model compounds with the same number of free hydroxyls but different topology are treated identically. The difference in the free energy of solvation to the next lower compound is always assigned to the extra hydroxyl. This means that these parameters all depend back on the previous ones:
    NUC_SUG_0: no hydroxyls
    NUC_SUG_1: one free hydroxyl; relies on NUC_SUG_0
    NUC_SUG_2: two free hydroxyls; relies on NUC_SUG_0 and NUC_SUG_1
    NUC_SUG_3: three free hydroxyls; relies on NUC_SUG_0, NUC_SUG_1, and NUC_SUG_2

Note that the small molecules adenine (ADE), cytosine (CYT), etc. correspond to the free, heteroaromatic bases, which implies that there is a polar hydrogen at that nitrogen atom, which is normally involved in the glycosylic linkage. If supported in ABSINTH, this means that the values for the polynucleotide sidechain term (e.g., XPA) should not be the same as those for the free base (ADE). As mentioned before, this is different from polypeptides, which involve a simple aliphatic linkage. This difference creates a peculiarity for the small molecule purine. The value for PUR is important for setting terms XXA and XXG, and should consequently refer to the methylated derivative. To recover simulations for the small molecule purine, CAMPARI will automatically add the difference between XPA and ADE, and assign it to the polar NH unit. This means that the value for PUR in the parameter file is generally not the resultant total free energy of solvation for free (9H)-purine. Instead, this number is computed as the values PUR+ADE-XPA.

--- Topology-derived maximum solvent-accessible volume fractions:
The function setup_savol() in the source-file sav_auxil.f90 is responsible for (almost) nothing but determining what the maximum solvent-accessible volume would be in the context of the model compound for a unit which is really part of a polymer. This is necessary in order to make sure that there is a clean limiting case for fully solvated units, which in the context of the polymer should never be achieved, but in the context of the model compound should be achieved rigorously. The procedure re-builds the missing parts of the model compound (usually hydrogens) and ignores the rest of the context. Note that this is relevant even though the actual solvation group might not contain all atoms which comprise the model compound. The effect is through adjustment of the relevant atoms' maximum solvation state. The fact that CAMPARI recomputes these values for every run relying on initial (built-in) geometries may not always be satisfactory. Therefore, the maximum solvent-accessible volume fractions can be patched in atom-specific manner. The same holds for pairwise overlap reduction factors.

The code may check (and eventually warn) that certain free energies of solvation, which can be specified separately, but effectively correspond to the same model compound, are in fact identical (examples are: NMA/PEP_BB, DMA/PEP_PROBB, NVA/PRO, SER/MOH, ...). Keep in mind that these free energies of solvation are all assumed to be transfer free energies from the gas phase into dilute aqueous solution (and not transfer free energies of the native state under standard conditions into dilute aqueous solution, which are equivalent to solubilities in water).

The mapping of solvation groups to atoms is not part of the parameter file. This implies that the parameter file can be used to modify a simulation of, e.g., polyalanine by adjusting the reference free energy of solvation for all alanine sidechains, but that it cannot be used to alter the fact that the solvation group encompasses the CH₃ group with identical weights for all atoms, or to modify the reference free energy of solvation for an individual alanine sidechain. For the latter, an extensive patch facility is available. The default mappings can be visualized with output file FOS_groups.vmd.

5) Bond length potential types:

(back to top)

If the intended simulation is to use Cartesian coordinates exclusively, molecules with fixed topology require the definition of various restraints to keep the bond lengths, angles, and certain dihedral angles in the chemically proper range. Bond length potential types are nothing but a list of integers with an associated unique potential specification, but without any declaration of the (bonded) atom types the particular potential applies to.
-----------------------------------------
bond xxx i a_kl b_kl c_kl
-----------------------------------------
The integer identifier for this particular bond length potential is given by xxx. The parameter file reader requires these to be continuously numbered from unity upwards. The floating point numbers a, b, and eventually c are the parameters for this potential and their interpretation depends on the choice of type of potential as given by the integer i. Writing d for a particular bond distance, right now CAMPARI supports as choices for i:

i = 1: The potential is harmonic and for atoms k and l separated by distance r_kl given by:

E_{BONDED_B} = c_{BONDED_B}·a_kl·(r_kl - b_kl)²

As for units, b_kl is in Å and a_kl must be given in kcal·mol^-1A^-2. c_{BONDED_B} is the outside scaling factor specified by keyword FMCSC_SC_BONDED_B.

i = 2: The potential is a Morse potential and for atoms k and l separated by distance r_kl given by:

E_{BONDED_B} = c_{BONDED_B}·c_kl·(1.0 - e^{-a_kl·(r_kl - b_kl)})²

As for units, b_kl is in Å, a_kl is in Å^-1, and c is in kcal/mol. Note that the treatment of non-bonded interactions does not respond to the anharmonicity at all, i.e., dissociated particles will not "feel each other" beyond the effect of E_{BONDED_B}.

i = 3: The potential is quartic and for atoms k and l separated by distance r_kl given by:

E_{BONDED_B} = 0.25·c_{BONDED_B}·a_kl·(r_kl² - b_kl²)²

As for units, b_kl is in Å and a_kl must be given in kcal·mol^-1A^-2. This potential is supported to conform with the GROMACS/GROMOS standard. Note that it is the only one scaling the force constant by a constant pre-factor.

For a typical force field, this list would be relatively short as bond distances are often derived from chemical consideration and are hence considered to be identical for various linkages. As an example, there is typically only one potential form to describe aliphatic carbon-carbon linkages.

6) Bond angle potential types:

(back to top)

This is analogous to bond length potential types. It involves the specification of unique potentials to describe angle bending.
-----------------------------------------------------------
angle xxx i a_jkl b_jkl c_jkl d_jkl
-----------------------------------------------------------
The integer identifier for this particular bond angle potential is given by xxx. The parameter file reader requires these to be continuously numbered from unity upwards. The floating point numbers a_jkl, b_jkl, c_jkl and eventually d_jkl are the parameters for this potential and their interpretation depends on the choice of type of potential as given by the integer i. Writing α for the bond angle created by atoms j, k, and l, right now CAMPARI supports as choices for i:

i = 1: The potential is harmonic and given by:

E_{BONDED_A} = c_{BONDED_A}·a_jkl·(α - b_jkl)²

As for units, b_jkl is in degrees and a_jkl must be given in kcal·mol^-1rad^-2. c_{BONDED_A} is the outside scaling factor specified by keyword FMCSC_SC_BONDED_A. This is merely a matter of convenience, as the force constants would be very small in units of kcal·mol^-1deg^-2.

i = 2: The potential is of the so-called "Urey-Bradley" type in which the distance between the two outer atoms making up the bond angle is restrained additionally. If r_jl denotes their distance, then the potential is given by:

E_{BONDED_A} = c_{BONDED_A}·[ a_jkl·(α - b_jkl)² + c_jkl·(r_jl - d_jkl)² ]

As for units, b_jkl is in degrees, a_jkl must be given in kcal·mol^-1rad^-2, d is in Å, and c is in kcal·mol^-1Å^-2. Note that this potential is a bit redundant (a combination of two harmonic terms), but since it requires and implements 1-3-distance restraints, it is supported explicitly. Urey-Bradley terms are common in CHARMM.

i = 3: The potential is a cosine-based harmonic. It is supported to conform with the GROMACS/GROMOS standard but does not take advantage of the minuscule performance benefit it could offer. With definitions as before, the potential is given by:

E_{BONDED_A} = 0.5·c_{BONDED_A}·a_jkl·(cos(α) - cos(b_jkl))²

As for units, b_jkl is in degrees and a_jkl must be given in kcal·mol^-1. The GROMACS developers suggest the following relationship between an angular harmonic potential and the cosine-based (subscripted c) variant: 2·a_jkl,a = sin²(b_jkl,c)·a_jkl,c. Note that this potential is the only one employing a 1/2 pre-factor.

Note that bond angles are technically confined to the interval from 0 to 180°. Potentials written such that the value of the angle is required run the risk of creating numerical instabilities if the fast and simple inverse cosine of the dot product of the normed vectors is used. CAMPARI therefore employs a half-angle formula using the inverse sine instead which circumvents this problem.

7) Dihedral angle (torsional) potential types:

(back to top)

This is analogous to bond length or angle potential types. It involves the specification of unique potentials to describe rotational barriers around axes, i.e. dihedral angles. Note, however, that these potentials can also be accessed by pseudo-dihedral angles ("impropers"), which describe the stereochemistry around a trigonal or tetrahedral center.
--------------------------------------------------------------------------------------------------------
torsion xxx i a_jklm b_jklm c_jklm d_jklm e_jklm f_jklm g_jklm
--------------------------------------------------------------------------------------------------------
The integer identifier for this particular torsional potential is given by xxx. The parameter file reader requires these to be continuously numbered from unity upwards. The floating point numbers a-f_jklm are not always all needed, but specify the parameters for this potential. As before, their interpretation depends on the choice of type of potential as given by the integer i. Writing φ for the dihedral formed by atoms j, k, l, and m (in this order), right now CAMPARI supports as choices for i:

i = 1: The potential is of the so-called "Ryckaert-Bellemans" type, i.e., it is a power expansion of cosines. It is given by:

E_{BONDED_T/I} = c_{BONDED_T/I}· [ a_jklm + b_jklmcosφ + c_jklmcos²φ + d_jklmcos³φ + e_jklmcos⁴φ + f_jklmcos⁵φ + g_jklmcos⁶φ ]

As for units, all parameters are given in kcal/mol. c_{BONDED_T/I} is the outside scaling factor for either the proper dihedral or the improper dihedral term depending on what angle atoms j-m form (see FMCSC_SC_BONDED_T and FMCSC_SC_BONDED_I). Note that the parameters of almost all torsional potentials involving Fourier terms can be converted to the Ryckaert-Bellemans form using standard relationships for trigonometric functions. This assumes φ to be defined over the interval from -180° to 180°, i.e., 0° is cis, and -180°/180° is trans.

i = 2: The potential is harmonic and the potential is given by:

E_{BONDED_T/I} = 0.5·c_{BONDED_T/I}·a_jklm·(φ - b_jklm)²

As for units, b_jklm is in degrees, and a_jklm must be given in kcal/mol^-1·rad^-2.

i = 3: The potential is of the "Ryckaert-Bellemans" type. This choice assumes the potential to be written over dihedrals defined over the interval from 0° to 360°. Conversion to the internal convention is achieved by multiplying b, d, and f with -1.

Note that dihedral angles in CAMPARI are technically confined to the interval from -180° to 180°. When using potentials written directly over the dihedral angle such as harmonic restraints, the necessary wrap-around creates a discontinuity in both gradients and angular distances. While the latter can be corrected for easily, the former is a more serious problem. Blondel and Karplus designed a strategy around all singularities, and their method is employed in CAMPARI. If only the value of the dihedral angle is needed, a strategy identical to that for bond angles is used to avoid numerical instabilities.

8) CMAP (grid correction map) potential types:

(back to top)

In an endless effort to fit molecular mechanics force field parameters to new or additional reference data sets, the developers of the CHARMM force field introduced a new type of correction by allowing a potential grid to be read in and - using bicubic splines - to be interpolated to yield a correction potential E_CMAP as a function of two torsional angles, specifically the φ- and ψ-angles in polypeptides.
It should be noted upfront that such a correction essentially represents an arbitrary number of free parameters that will in general be difficult to determine. On the plus side, CMAPs allow arbitrarily shaped corrections to the local energy landscape of biomolecules that are not tied to the stipulated form of - for instance - Ryckaert-Bellemans type dihedral potentials (see above). CAMPARI supports CMAP corrections as follows:
---------------------------------------------------
cmap xxx i nb <input.dat>
---------------------------------------------------
Here, xxx is the (continuously) numbered instance of unique CMAP potential surfaces, i is the type (see below) and <input.dat> is the filename for the surface data itself relative to a directory specified by keyword FMCSC_CMAPDIR. The use of a directory keyword allows the parameter file itself to remain unmodified.
The surface data are simply a matrix of nb x nb floating point numbers specifying the energies in kcal/mol at the grid points. The maps have to be symmetric in size, and are always assumed to cover the entire available space (-180° to 180° on both axes) and to possess even bin-spacing. This is a technical requirement which simplifies the application of any interpolation scheme, in particular since the periodic boundary conditions set additional constraints for the splines. The first torsional angle (see below) is assumed to vary with row number, the second with column number. An outside scaling factor, c_{BONDED_M}, is applied to all CMAP corrections.

The integer i determines the type of splines to be used and can assume values from 1-4:

Non-interpolating cardinal B-splines are used which assume that the tabulated values are values at the centers of each square bin. This type of spline is only appropriate for finely tabulated data. An interpolation order of 3 is enough to obtain a continuously differentiable surface (set by FMCSC_CMAPORDER).
This option is identical to the previous one only that the tabulated values are assumed to be the values at the lower left corners of each square bin.
Bicubic interpolating splines are used which assume that the tabulated values are values at the centers of each square bin. Bicubic splines are constructed by generating estimates of the two first and the cross-derivative at each grid-point. Those estimates are themselves spline-based. This ensures that the first derivatives of the resultant interpolated surface are smooth and second derivatives hence continuous (but not smooth).
This option is identical to the previous one only that the tabulated values are assumed to be the values at the lower left corners of each square bin. This is the reference implementation as found in CHARMM starting with version 31.

In terms of assigning CMAP potentials to pairs of torsional angles, CAMPARI currently uses a somewhat rigid scheme. The basic rule is that all atoms have to be part of the backbone, possess at least two neighbor atoms along the main chain, and that three of the four atoms have to be shared by the two dihedral angles. This includes combinations such as φ/ψ-angles, nuc_bb₂/nuc_bb₃ or ψ/ω-angles but excludes combinations such as nuc_bb₁/nuc_bb₄-angles or χ₂/χ₃-angles (see sequence input).

9) Bonded types:

(back to top)

Bonded types are another subtype each biotype is assigned, similar to charge- and LJ-types. They describe what a biotype is chemically more so than any other subtype, as chemical considerations are often the primary determinant in designed parameters for bonded interactions. Bonded types are not listed explicitly, and have no unique type-wise parameters associated with them. They merely represent a parsing of the supported biotypes into classes of chemical equivalence. The file "bondedlst.prm" in the params/-directory contains an example list at the very end for illustrative purposes alone. Bonded types are used exclusively to simplify assigning bonded potential terms (bond length, angle, (improper) torsional and CMAP restraints) to biotypes. If such an assignment were done with biotypes directly, the number of entries in the lists explained below would become unfeasibly large very quickly. On the other hand, they have to be more discriminating than LJ-types and generally follow different considerations than charge types, which makes the definition of a separate type necessary. Note that the parameter reader still checks whether all listed bonded types in the biotype section form a continuously numbered list. Zero can be used to indicate that a specific biotype can never be part of any bonded interactions (such as a monoatomic ion).

First of all, the parameter file may contain entries to assign a specific bond length potential to a pair of bonded types:
------------------------------------------------
bonded_type_bond   j   k   i
------------------------------------------------
Here, j and k are integers identifying two bonded types, and i is an integer identifying a unique member of the list of bond length potentials (see above). If bonded interactions are used, the code will then scan the ACTUAL atoms in the system it built, go through the list of all ACTUALLY bonded atoms (as defined by molecular topology, which is hard-coded for supported residues and inferred from geometry for unsupported residues), determine the bonded types for the particular pair at hand, and then search the list of terms "bonded_type_bond" for a match. If one is found, the parameters as specified in the corresponding "bond" specification are assigned to this particular bonded interaction. It is important to understand the sequence of events here, since it means that only atoms which are actually bonded can be subjected to a potential of this type. It is IMPOSSIBLE to create extra restraints for distal atoms in the system using this formalism (at least as of now). Note that if a simulation is performed entirely in Cartesian coordinates the program will exit if for a bonded pair of atoms in the system no match in the list of "bonded_type_bond" entries is found. Note that all pairs j k need to be unique, i.e., it is not permissible to specify more than one bond length potential for a pair of bonded types. Also, k j is duly recognized as a redundant permutation.

Similarly, bond angle terms are assigned by:
---------------------------------------------------
bonded_type_angle j   k l i
---------------------------------------------------
This works analogously to the "bonded_type_bond" entries. The integers j, k, and l are the three bonded types forming the angle, and i is the choice of potential. Note that the sequence of bonded types matters, i.e., j k l implies that k is the middle atom, and hence is different from j l k. The only redundant permutation of j k l is l k j and is recognized as such by the code.

Next in line, dihedral angle terms are assigned using:
--------------------------------------------------------------
bonded_type_torsion j   k   l   m i
--------------------------------------------------------------
Again this work mostly analogously. The integer j, k, l, and m are the four bonded types forming the dihedral angle, and i is the choice of potential from the "torsion" entries (see above). The order matters as k and l are assumed to be two bonded types making up the axis, with j being connected to k, and m being connected to l. Therefore the only redundant permutation is in fact m l k j. Note, however, that permutations disappear if the four bonded types are not unique. Another important difference is that dihedral terms are not mandatory, i.e., once the program goes through the list of ACTUAL torsions for the system it created, it will not exit fatally, should it fail to find a corresponding entry. There is a report flag which can be used to diagnose such (potentially missing) terms (see FMCSC_BONDREPORT).

Improper dihedrals are assigned by:
--------------------------------------------------------------
bonded_type_imptors   j k   l m i
--------------------------------------------------------------
This works similarly to "bonded_type_torsion" entries. The main difference is the interpretation of the integers j, k, l, and m. For impropers, j is the bonded type of the central atom, and k, l, and m are bonded types of atoms directly attached to it. In atom number space, let us denote the four atoms as w x y z. Then, the only redundant permutation is "w y x z". Right now, CAMPARI only supports trigonal centers (typically sp2 carbons or nitrogens), although impropers can just as well be used to maintain chirality at tetrahedral centers. Given atom w, an improper dihedral is defined as the angle between the x y z reference plane, and anyone of the three (six) possible planes involving atom w and two of out of x y z. This means that impropers are non-unique and generally messy to set up and/or port from force field to force field. In CAMPARI, if w-z all have different bonded types, three permutations are considered and dealt with independently: j k l m (equ. j l k m), j m l k (equ. j l m k), and j k m l (equ. j m k l). It gets more complicated if the bonded types of two of the atoms bonded to the trigonal center are identical (such as at a carboxylate carbon). Then, the atom corresponding to the unique bonded type (identified so far by mass only) is to be placed at each of the three possible positions exactly once (i.e., j k l l, j l k l, and j l l k). While the first and second are identical in terms of bonded types, they can be used to target one of the specific instances of that bonded type (i.e., specific atoms) over the other (figuring out which can most likely be done only by trial-and-error, however). If all three atoms have the same bonded type, three separate entries j k k k are accepted (again, targeting specific ones will only work via trial-and-error).

Lastly, CMAP corrections are assigned through:
-----------------------------------------------------------------
bonded_type_cmap   j k   l m o i
-----------------------------------------------------------------
This works somewhat similar to proper dihedral assignments but covers two of those angles instead which are directly in sequence. This means that atoms j, k, l, and m may form the first dihedral but that k, l, and m also have to be the first three constituent atoms of the second dihedral completed by atom o. i is an integer identifying a unique member of the list of CMAP potentials (see above). Unlike for the other bonded type assignments, there are assignment restrictions beyond the mere topological ones: All five atoms have to be part of the biomolecule's backbone and be strictly located on the main chain, i.e., each of the five atoms has to have at least two bonded partners. With arbitrary assignments, the number of CMAP potentials which could be considered would become unfeasibly large. There are no redundant permutations since the inverted case is a unique assignment in itself.

All bonded parameter assignments can be patched through the dedicated patch faciliy. For example, this would allow to change the bond angle potential acting on three specific atoms that are connected covalently. This offers very good control, but becomes tedious if many patches are required. In the latter case, CAMPARI also offers to indirectly alter bonded parameter assignments by patching biotypes, which (re)assigns bonded types. It is not possible to directly patch bonded types for existing biotypes.

As a final note, it is worth pointing out that there is no requirement to specify anything more in these lists than the terms relevant for simulating the system at hand. No checks are performed outside of i) the lists only having valid entries, i.e., existing bonded types and existing potential types and ii) there being support for all bond lengths and angles actually present in the system. The file "bondedlst.prm" in the params/ directory contains a template for setting up exhaustive support using a parsing of bonded types as provided in "template.prm".