Input Files for CAMPARI Runs


Classes of Input Files:

  1. Files defining the system:
  2. Files altering the system:
  3. Files redefining standard terms of the energy function:
  4. Files defining auxiliary terms to the energy function:
  5. Files relevant for global behavior of analysis functionality:
  6. Files relevant to specific analysis routines:

Notes on Building CAMPARI Input Files


IMPORTANT COMMENT FOR MPI RUNS:

While the key-file only needs to be visible to the master node, most data-files (like the steric grids, the backbone segment file or of course the sequence information) need to be read by all nodes. Since the execution is forced to be strictly sequential, it is absolutely valid to point them all to a single copy of those files, i.e., jobs can very well be run in a pseudo-single-CPU setting on a cluster-wide NFS-mount. Otherwise, consistency between nodes with regard to the contents of those files has to be ensured by the user (or unpredictable crashes loom).




Section 1: Files defining the system



Sequence input file:

Keyword: FMCSC_SEQFILE

Aside from the parameter file, this file is the only one that is always required.

This input file specifies the sequence of the system to be simulated or analyzed. Every residue (the base organizational unit) occupies one line and uses a base three-letter representation. There is a set of polymer residues and small molecules that is supported in nearly every type of run CAMPARI can perform since CAMPARI possesses an internal representation of those, i.e., it is aware of molecular topology and intrinsic flexibility. These are described next. However, it is also possible to obtain limited support for other residues and molecules including complex polymers. Here, the topology and internal flexibility of each unsupported entity are inferred from a structural input file and by analogy. The details of the inference from structural input are provided elsewhere, but an overview of what is needed in the sequence file is given below the sections on natively supported residues.

Polypeptide residues:

Three-letter code, all twenty common amino acids are supported:

GLY  ALA  VAL  LEU  ILE  PRO  MET  SER  THR  ASN  GLN  CYS  PHE  TYR  TRP  HIE  HID  HIP  ASP  GLU  ARG  LYS
 

Note that neutral forms of GLU, ASP, LYS, ARG and charged forms of CYS and TYR are currently not supported. By adding _N, _C, or _D, any residue can be made N-terminal or C-terminal (each charged), or its chirality can be changed to D(R) (except GLY). To achieve both, or to make a free amino acid, simply combine modifiers (e.g., _C_D or _D_C would give a C-terminal D-amino acid).
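The modifier syntax can be illustrated with a minimal Python sketch that splits a residue token into its base name and its set of modifiers. The function name split_residue is hypothetical and not part of CAMPARI; rejecting _D on GLY is an assumption based on the "(except GLY)" remark above, and CAMPARI's own behavior for such input may differ.

```python
def split_residue(token):
    """Split a sequence-file token such as 'ALA_C_D' into (base, modifiers).

    Illustrative only: the accepted modifiers are the _N, _C, and _D
    suffixes described in the text; their order does not matter.
    """
    base, *mods = token.strip().split("_")
    mods = set(mods)
    unknown = mods - {"N", "C", "D"}
    if unknown:
        raise ValueError(f"unknown modifier(s) in {token!r}: {sorted(unknown)}")
    # Assumption: treat _D on glycine as an error, since GLY is excluded above.
    if base == "GLY" and "D" in mods:
        raise ValueError("the _D modifier is not available for GLY")
    return base, mods


print(split_residue("ALA_C_D"))  # base 'ALA' with modifiers C and D
print(split_residue("ALA_N_C"))  # a free amino acid: both termini
```

Because the modifiers are collected in a set, _C_D and _D_C are treated as equivalent, mirroring the statement in the text.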

Non-standard amino acids:
AIB (alpha-amino isobutyric acid), ABA (alpha-amino butyric acid), NVA (norvaline = alpha-amino valeric acid), NLE (norleucine = alpha-amino capronic acid), ORN (ornithine = alpha,delta-diamino valeric acid), DAB (alpha,gamma-diamino butyric acid). The same modifiers are available.


Internal degrees of freedom:

Backbone:
All non N-terminal residues: ω (CA-1,C-1,N,CA); in case of residue -1 being ACE CA-1 is CH3; in case of FOR CA-1 is H

All except PRO: φ (C-1,N,CA,C); in case of N-terminal residues C-1 is 1HN

All: ψ (N,CA,C,N+1); in case of C-terminal residues N+1 is 2OXT

PRO: Puckering (seven non-redundant degrees of freedom): torsions C-1-N-CA-C, C-1-N-CA-CB, N-CA-CB-CG, CB-CA-N-CD and angles N-CA-CB, CA-CB-CG, CA-N-CD

Sidechain:
AIB, ALA, GLY, PRO: none
ABA, ARG, ASN, ASP, DAB, GLN, GLU, HID, HIE, HIP, LEU, LYS, MET, NVA, NLE, ORN, PHE, TRP, TYR: χ1 (N,CA,CB,CG)
SER: χ1 (N,CA,CB,OG)
THR: χ1 (N,CA,CB,OG1)
CYS: χ1 (N,CA,CB,SG)
VAL, ILE: χ1 (N,CA,CB,CG1)
HID, HIE, HIP, PHE, TYR, TRP: χ2 (CA,CB,CG,CD1)
ARG, GLN, GLU, LYS, NVA, NLE, ORN: χ2 (CA,CB,CG,CD)
ASN, ASP: χ2 (CA,CB,CG,OD1)
LEU: χ2 (CA,CB,CG,CD1)
ILE: χ2 (CA,CB,CG1,CD1)
MET: χ2 (CA,CB,CG,SD)
SER: χ2 (CA,CB,OG,HG)
THR: χ2 (CA,CB,OG1,HG1)
CYS: χ2 (CA,CB,SG,HG)
DAB: χ2 (CA,CB,CG,ND)
ARG, ORN: χ3 (CB,CG,CD,NE)
LYS, NLE: χ3 (CB,CG,CD,CE)
GLN, GLU: χ3 (CB,CG,CD,OE1)
MET: χ3 (CB,CG,SD,CE)
ARG: χ4 (CG,CD,NE,CZ)
LYS: χ4 (CG,CD,CE,NZ)

Note that the "missing" torsions are those assumed to be quasi-redundant, e.g. the dihedral angles formed across the three CZ to nitrogen bonds in the guanidino group of arginine. They can be sampled, however, both in torsional dynamics (→ TMD_UNKMODE) and in Monte Carlo runs (→ OTHERFREQ).

Polypeptide cap residues:

FOR (N-terminal COH-unit)  ACE (N-terminal acetyl-unit)  NH2 (C-terminal amide)  NME (C-terminal N-methyl amide)

No modifiers are available, i.e., FOR and ACE are always N-terminal, NH2 and NME are always C-terminal. It is not recommended to directly concatenate two caps to build small molecule amides.

Internal degrees of freedom:

Backbone:
FOR, ACE, NH2: none
NME: ω (CA-1,C-1,N,CH3)

Note that the "missing" torsions are those assumed to be quasi-redundant, e.g., the dihedral angles formed across the C-CH3 bond in ACE. They can be sampled, however, both in torsional dynamics (→ TMD_UNKMODE) and in Monte Carlo runs (→ OTHERFREQ).

Nucleic acid residues:

In terms of nucleic acids twelve 5'-nucleotides are supported (i.e., the base element is XM-5'P)

D5P  DPC  DPU  DPT  DPA  DPG  R5P  RPC  RPU  RPT  RPA  RPG

The DPX are the common deoxyribonucleotides, while the RPX are the ribonucleotides. D5P and R5P are empty sugar-5'-phosphates (i.e., instead of the glycosidic bond, the C1' has its native hydroxyl group). By adding _N or _C any residue can be made into a 5'- or 3'-capping residue, respectively. 5'-terminal residues generated in this way always carry a free 5'-phosphate group (singly charged). 3'-terminal residues generated in this way carry only a free 3'-hydroxyl.

Internal degrees of freedom:

Backbone:
non-5'-terminal: nuc_bb_1(C3*-O3P-P-O5P) - note that in CAMPARI convention only C3* is part of the preceding residue

5'-terminal: nuc_bb_1(O5P-P-O3P-HOP)

All: nuc_bb_2(O3P-P-O5P-C5*), nuc_bb_3(P-O5P-C5*-C4*), nuc_bb_4(O5P-C5*-C4*-C3*)

Non-3'-terminal: nuc_bb_5(C4*-C3*-O3P-P) - note that in CAMPARI convention O3P and P are part of the following residue

3'-terminal: nuc_bb_5(C4*-C3*-O3*-HO3*)

All: Sugar puckering (seven degrees of freedom): torsions C5*-C4*-C3*-O3P/O3*, C5*-C4*-C3*-C2*, C4*-C3*-C2*-C1*, C2*-C3*-C4*-O4* and angles C4*-C3*-C2*, C3*-C2*-C1*, C3*-C4*-O4*

Sidechain:
 RPC, RPU, RPT: χ1(C3*,C2*,O2*,HO2*), χ2(C2*,C1*,N1,C2)
 RPT only: χ3(C4,C5,C5M,1H5M)
 RPG, RPA: χ1(C3*,C2*,O2*,HO2*), χ2(C2*,C1*,N9,C4)
 R5P: χ1(C3*,C2*,O2*,HO2*), χ2(C2*,C1*,O1*,HO1*)
 DPC, DPU, DPT: χ1(C2*,C1*,N1,C2)
 DPT only: χ2(C4,C5,C5M,1H5M)
 DPG, DPA: χ1(C2*,C1*,N9,C4)
 D5P: χ1(C2*,C1*,O1*,HO1*)

Note that the "missing" torsions are those assumed to be quasi-redundant, e.g. the dihedral angles defined by the NH2 groups in DPA, DPG, etc. They can be sampled, however, both in torsional dynamics (→ TMD_UNKMODE) and in Monte Carlo runs (→ OTHERFREQ).

Nucleic acid cap residues:

As 5'-terminal capping groups, 12 nucleosides are also supported:

DIB  DIC  DIU  DIT  DIA  DIG  RIB  RIC  RIU  RIT  RIA  RIG

DIB and RIB are deoxyribose and ribose, respectively, i.e., empty nucleosides (plain sugars). The remaining DIX and RIX are the standard nucleosides with the 5 canonical bases. Note that nucleosides cannot be made C-terminal, i.e., it is not possible (yet) to build free nucleosides using these capping residues.

Internal degrees of freedom:

For backbone:
    nuc_bb_1(HO5*-O5*-C5*-C4*)
    nuc_bb_2(O5*-C5*-C4*-C3*)
    nuc_bb_3(C4*-C3*-O3P-P)
Sugar puckering and sidechain degrees of freedom are identical to full nucleotides (see above).

Small molecules:

NA+ (sodium ion), CL- (chloride ion), SPC (SPC or SPC/E water), T3P (TIP3P water), T4P (TIP4P water), T4E (TIP4P-Ewald water), T5P (TIP5P water), URE (urea), NMF (Z-N-methylformamide (i.e., polar H and O are trans)), NMA (Z-N-methylacetamide), ACA (acetamide), PPA (propionamide), FOA (formamide), DMA (N,N'-dimethylacetamide), CH4 (methane), MOH (methanol), PCR (p-cresol), K+ (potassium ion), BR- (bromide ion), CS+ (caesium ion), I- (iodide ion), O2 (molecular oxygen: note that diatomic and linear molecules in general are not yet supported in dynamics), NH4 (ammonium ion), AC- (acetate ion), GDN (guanidinium cation), NO3 (nitrate ion), LCP (perchlorate ion), 1MN (methylammonium ion), 2MN (dimethylammonium ion), CYT (cytosine), URA (uracil), THY (thymine), PUR (purine), ADE (adenine), GUA (guanine), PRP (propane), NBU (n-butane), IBU (2-methylpropane), EMT (ethylmethylthioether), EOH (ethanol), MSH (methanethiol), BEN (benzene), TOL (toluene), NAP (naphthalene), IMD (δ-protonated 2-methylimidazole), IME (ε-protonated 2-methylimidazole), MIN (3-methylindole)

Degrees of freedom:
    O2, SPC, T3P, T4P, T5P, T4E, URE, CH4, FOA, BEN, NAP, PUR, ADE, GUA, CYT, URA: RBC
    NA+, CL-, K+, BR-, CS+, I-, NH4, GDN, NO3, LCP: RBC
    ACA: RBC, χ1 (N,C,CCT,1HCT)
    AC-: RBC, χ1 (1OXT,C,CH3,1H)
    MOH: RBC, χ1 (1H,C,O,HO)
    THY: RBC, χ1 (1HT,CT,C5,C6)
    1MN: RBC, χ1 (1HN,N,CT,1HT)
    NMF: RBC, ω (O,C,N,CCT), χ1 (C,N,CNT,1HNT)
    NMA: RBC, ω (O,C,N,CCT), χ1 (C,N,CNT,1HNT), χ2 (N,C,CCT,1HCT)
    PPA: RBC, χ1 (N,C,CCT,CBT), χ2 (C,CCT,CBT,1HBT)
    PCR: RBC, χ1 (C21,C1,CT,1HCT), χ2 (C31,C4,O,HO)
    DMA: RBC, χ1 (C,N,CZT,1HZT), χ2 (C,N,CET,1HET), χ3 (N,C,CCT,1HCT)
    PRP: RBC, χ1 (2CT,CB,1CT,1HT1), χ2 (1CT,CB,2CT,1HT2)
    2MN: RBC, χ1 (2CT,N,1CT,1HT1), χ2 (1CT,N,2CT,1HT2)
    IBU: RBC, χ1 (2CT,CB,1CT,1HT1), χ2 (3CT,CB,2CT,1HT2), χ3 (2CT,CB,3CT,1HT3)
    NBU: RBC, χ1 (1CT,1CB,2CB,2CT), χ2 (2CB,1CB,1CT,1HT1), χ3 (1CB,2CB,2CT,1HT2)
    EMT: RBC, χ1 (CCT,CB,S,CNT), χ2 (CB,S,CNT,1HNT), χ3 (S,CB,CCT,1HCT)
    EOH: RBC, χ1 (O,CB,CT,1HT), χ2 (CT,CB,O,HO)
    MSH: RBC, χ1 (1H,C,S,1HS)
    IMD, IME: RBC, χ1 (N2,C1,CT,1HT)
    TOL: RBC, χ1 (C2,C1,CT,1HT)
    MIN: RBC, χ1 (C2,C3,CT,1HT)
Note that the methyl spins fall away if united-atom representation is used (see FMCSC_UAMODEL) and that the remaining χs may be shifted down (relevant for EOH, PCR). Note as well that the different water models listed have different numbers of sites (3-5), different covalent geometry, and are all assumed to be completely rigid internally. Of course, a match of the published models still requires using the correct energy functions and parameters. As with all other systems, the "missing" torsions are those assumed to be quasi-redundant, e.g. the dihedral angles defined by the NH2 groups in primary amides. They can be sampled, however, both in torsional dynamics (→ TMD_UNKMODE) and in Monte Carlo runs (→ OTHERFREQ).

Chemical crosslinks:

Disulfide linkages:

The only currently supported type of crosslink is the disulfide linkage between two cysteine residues on one or two polypeptide chains. Such a link is specified on the line of the linked cysteine residue with the lower residue index number by appending, separated by a blank, the residue index number of the target linkage residue. The target can be either in the same or in a different molecule.
Note that several limitations exist:
  1. Intramolecular crosslinks cannot be so complicated as to create a topology in which multiple ring constraints depend on each other completely. This is analyzed by CAMPARI automatically and the program will terminate abnormally if it identifies such a situation.
  2. Intermolecular crosslinks cannot create ring-like topologies but are otherwise unrestricted. This limitation is more likely to be overcome in the near future than the prior one.
  3. Two residues which are direct sequence neighbors can never be crosslinked even if they are part of separate molecules. This has code-architectural reasons.
  4. Depending on sampling algorithm and choice of methodology (→ keyword FMCSC_CRLK_MODE), crosslinks may cause undesirable side effects.

Other (unsupported) residues:

As was alluded to above, CAMPARI offers the option to infer the topology and further characteristics of residues and molecules that it has no internal reference representation of. The ultimate requirement for using this feature is the presence of a sane structural input file from which the topology can be inferred (→ FMCSC_PDBFILE or FMCSC_PDB_TEMPLATE). Then, the residue parsing from said structural input file simply has to be transferred to the sequence file. If the unknown unit is meant to be the start or end of a polymer or a single-residue small molecule, then the corresponding flags have to be appended as usual ("_N", "_C", or "_N_C"). Note that the parsing can be changed as long as concerted changes are made to both the structural input and sequence files. Note also that i) the terminal flags are always mandatory, ii) the chiral flag ("_D") is not supported since the complete topology is inferred from structural input anyway, and iii) terminal residues still have to differ in their base name if they occur in terminal positions (this is in addition to having to use the "_N" and "_C" indicators).


Examples:

The sequence input file then simply has to list valid molecules, i.e., each molecule needs at least one N-terminal and one C-terminal residue (which can be the same residue), possibly with intra-chain residues in between. Every sequence input has to be terminated by an END.

Alanine dipeptide:

ACE
ALA
NME
END


An ion pair:

NA+
CL-
END


A bad input file (because the alanine-based peptide has no termini):

ALA
ALA
ALA
URE
URE
URE
END


A correct way to create an intramolecular disulfide linkage:

GLY_N
GLY
CYS 8
GLY
PRO
GLY
ALA
CYS
NME
CL-
END


A correct way to ask for analysis for a trajectory with the unsupported single-residue small molecule GMP:

CL-
CL-
CL-
CL-
NA+
NA+
NA+
NA+
NA+
GMP_N_C
END


A bad input file (impossible crosslink):

ACE
CYS 6
NME
ACE
CYS
NME
END
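The crosslink rules illustrated by the examples above can be checked with a short Python sketch. It is not CAMPARI's parser: the function names parse_sequence and crosslink_problems are hypothetical, and only the rules stated in the text are tested (END termination, both partners being cysteines, the link being listed on the lower-index residue, and no links between direct sequence neighbors).

```python
def parse_sequence(text):
    """Return [(residue_token, crosslink_target_or_None), ...] from file text."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[-1] != "END":
        raise ValueError("sequence input must be terminated by END")
    residues = []
    for ln in lines[:-1]:
        fields = ln.split()
        # An optional second field is the residue index of the crosslink target.
        target = int(fields[1]) if len(fields) > 1 else None
        residues.append((fields[0], target))
    return residues


def crosslink_problems(residues):
    """Collect human-readable problems for all requested disulfide links."""
    problems = []
    for index, (token, target) in enumerate(residues, start=1):
        if target is None:
            continue
        if token.split("_")[0] != "CYS":
            problems.append(f"residue {index} ({token}) is not a cysteine")
        if not 1 <= target <= len(residues):
            problems.append(f"residue {index}: target {target} does not exist")
            continue
        partner = residues[target - 1][0]
        if partner.split("_")[0] != "CYS":
            problems.append(
                f"residue {index}: target {target} ({partner}) is not a cysteine")
        if target <= index:
            problems.append(
                f"residue {index}: link must be listed on the lower-index residue")
        elif target == index + 1:
            problems.append(
                f"residue {index}: direct sequence neighbors cannot be crosslinked")
    return problems


GOOD = "GLY_N\nGLY\nCYS 8\nGLY\nPRO\nGLY\nALA\nCYS\nNME\nCL-\nEND\n"
BAD = "ACE\nCYS 6\nNME\nACE\nCYS\nNME\nEND\n"
print(crosslink_problems(parse_sequence(GOOD)))
print(crosslink_problems(parse_sequence(BAD)))
```

For the second input, the sketch flags that residue 6 is NME rather than a cysteine, which is why that crosslink is impossible. The topological limitations (interdependent rings, ring-like intermolecular topologies) are not checked here.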



Structural input file:

Keyword: FMCSC_PDBFILE

Starting structures for simulations can only be supplied in pdb format. These are largely standard pdb files. CAMPARI expects to find a continuous section of entries of type "ATOM", "HETATM", and "TER", and any other records terminate the processing of input for a given snapshot. It is therefore necessary to remove "ANISOU" or similar records appearing embedded within the coordinates section. The use of "MODEL" and "ENDMDL" indicators has to be consistent. The length of the coordinates section must not exceed the actual system size too much (errors are produced). In these cases, the input file may also have to be pruned (e.g. remove waters, etc).
In trajectory analysis runs, structural input will typically correspond to a trajectory file in a variety of binary and human-readable formats. These two cases are handled separately below:

  1. Analysis runs (see FMCSC_PDBANALYZE)

    To provide maximally accurate analysis, any trajectory is processed purely by reading in Cartesian coordinates for all atoms. If the trajectory is in pdb-format (using the MODEL/ENDMDL syntax), this requires that atom names be understood by CAMPARI. While the software does a fair amount of translation (→ FMCSC_PDB_R_CONV), manual adjustment may be required. Note that the pdb file is assumed to be fixed-format, i.e., column shifts are never tolerable. The target atom names can be constructed from the biotype section in the parameter files, or - simpler - by consulting a CAMPARI-generated pdb file from a dummy run of the system of interest. If certain atoms are not found (predominantly applies to hydrogen → FMCSC_PDB_HMODE), the molecular geometry is built according to default assumptions. Premature termination and/or corrupt structures will occur if important information is missing. While translation support for the formatting of multiple hydrogens on the same site (carbon, nitrogen) exists with regards to the position of the index label, it is important that such hydrogens be in fact systematically numbered. A four-character string like ' HB ' for a sidechain hydrogen atom of an alanine residue would not be read at all, i.e., the reading utility has no capability of processing identical atom names for the same residue. As mentioned before, aside from residue name conversions, the reading utility has some flexibility in interpreting atom names in general as long as they are unique on a per-residue basis. This happens: i) through simple automatic conversions for typical cases for which dual standards exist; and ii) through the keyword FMCSC_PDB_R_CONV.
    For simulation trajectory analyses it is generally recommended to utilize one of the binary formats (FMCSC_XTCFILE, FMCSC_DCDFILE, and FMCSC_NETCDFFILE) and possibly a suitable pdb-template (→ FMCSC_PDB_TEMPLATE) that is processed in exactly the same way as described above for pdb trajectories, but serves to provide a map between CAMPARI intrinsic order of atoms and the aforementioned binary trajectory file in use. Both binary trajectory file and the template (if required) need to be complete (in this context, the chosen level of representation may become important → FMCSC_UAMODEL).
    If the analysis run occurs in parallel (see elsewhere for details), CAMPARI expects not a single trajectory, but a set of systematically named trajectories. This applies to all supported formats in trajectory analysis mode.

    Portion of an example key-file:

    FMCSC_DCDFILE /home/dummy/simulation.dcd
    FMCSC_REPLICAS 4
    FMCSC_PDB_FORMAT 4
    FMCSC_REMC 1
    FMCSC_PDBANALYZE 1

    In this example, the MPI-version of CAMPARI would look for 4 individual trajectory files named "N_000_simulation.dcd", "N_001_simulation.dcd", "N_002_simulation.dcd", and "N_003_simulation.dcd" all located in the directory "/home/dummy/". This means that the specified name does not actually correspond to an existing file. The choice of prefix is consistent with the output prefix CAMPARI uses for example in replica-exchange simulations. Note that, if a template is required or desired for other reasons, the template file is not treated in the same way, i.e., all replicas will read and process the same file in the same location.
  2. Runs with a starting structure read from a pdb file

    The choice here depends on the sampled degrees of freedom. For runs in Cartesian degrees of freedom (→ FMCSC_CARTINT), small distortions are allowed to relax, and it is recommended to deal with the peculiarities of pdb files in the same way as described above. This is selected by choosing keyword FMCSC_PDB_READMODE to be 2.
    Because of the limited precision of pdb files, CAMPARI also offers an alternative that is designed primarily for runs in torsional/rigid-body space starting from CAMPARI-generated pdb files. This mode is meant to prevent the distortion of covalent, molecular geometries of frozen degrees of freedom by limited precision. Instead, by choosing keyword FMCSC_PDB_READMODE to be 1, simulation runs can rebuild the molecules using CAMPARI's default geometry. This means that the Cartesian coordinates of the atoms defining the sampled degrees of freedom are simply used to determine the value for a given degree of freedom. These values then seed the molecular building routines. Depending on where the input structure is from, this can lead to a different structure than the one found in the pdb file (with an error increasing along any polymer chains present), meaning that it is unlikely to be useful for starting a run for a dense, polymeric system with a structure obtained from the PDB. Sometimes, it may be possible to relax such conformations using the torsional restraint options (see options 3,4 under TORFILE). All comments about naming conventions apply as before. While unlikely to be a major issue, it is important to keep in mind that switching between the options for FMCSC_PDB_READMODE implies that sampling in rigid-body / torsional space occurs with a different, underlying (rigid) geometry in place.
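The file naming used in parallel analysis runs (item 1 above) can be made concrete with a small Python sketch. It only reproduces the pattern shown in the key-file example ("N_000_simulation.dcd" and so on); the function name replica_trajectory_names is hypothetical, and the three-digit zero padding is inferred from that single example rather than from CAMPARI's source.

```python
import posixpath


def replica_trajectory_names(path, n_replicas):
    """List the per-replica files implied by a nominal trajectory path.

    The nominal path itself (as given to, e.g., FMCSC_DCDFILE) need not
    exist; each replica index i is assumed to map to 'N_iii_<name>' in
    the same directory.
    """
    directory, name = posixpath.split(path)
    return [posixpath.join(directory, f"N_{i:03d}_{name}")
            for i in range(n_replicas)]


for fn in replica_trajectory_names("/home/dummy/simulation.dcd", 4):
    print(fn)
```

A check of this kind can be run before a parallel analysis job to confirm that all expected trajectory files are in place.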
Sometimes, either trajectory analysis runs or actual simulations contain entities (residues, molecules) that are not part of CAMPARI's built-in database. In such a case, if keyword FMCSC_PDB_READMODE is set to 2 (use Cartesian coordinates), CAMPARI possesses the facility to infer the topology and other characteristics of such unsupported residues (for a list of built-in residues, see above). The requirements that this pdb file has to fulfill are as follows:
  1. It must contain a representation of the unknown part of the system that is physically feasible and chemically canonical, i.e., is free of steric clashes and complies with bond length intervals that are typical for the atoms in their most common hybridization states. This requires guessing the identity of the chemical element encoded by a given atom name. For the common elements (C, N, O, H, S, P), it is generally the 14th column in the input file that matters (this is the second column of the four-character atom string). Recognition of rare elements can be triggered by starting the atom name in the 13th column, e.g., to trigger identification of a copper atom, one should use "CU  " for the atom name. Conversely, the name " CU " would be identified as a carbon atom. The correct mass will only be retained if the parameter file in use offers an appropriate atom (LJ) type for this chemical element.
  2. Atoms within the unsupported residue(s) must be ordered such that a meaningful topology can be built, i.e., it must not be the case that an atom comes before all other atoms it is covalently bound to (naturally, with the exception of the very first one). Otherwise, resultant topologies will be garbled, and garbled topologies can severely limit the access of the unsupported entities to CAMPARI's sampling engines. In particular, inter-residue linkages should always make sure that the linkage atom is the very first one in a residue.
  3. All atom names in an unsupported residue should be unique. If there are two residues of exactly the same type (i.e., they have the same name, the same number, order and names of atoms, and they occur in the same molecular context (for a polymer, terminal residues are distinct from non-terminal ones)), then the two residues will be treated as if of identical type (which simplifies several parameter patches).
  4. Chemically equivalent atoms (like the various hydrogen atoms in a methyl group) should be named such that the last three characters of the corresponding field are identical, and the first character uses a numbering starting at 1. This helps with consistency in interpretation and can improve naming in output pdb files.
  5. For unknown residues whose basic architecture resembles a supported polymer type, it is recommended to follow the atom order and nomenclature used by CAMPARI (or in terms of nomenclature by the pdb file if it conforms to a convention CAMPARI understands). This enables CAMPARI to infer for example which dihedral angles in an unsupported polypeptide residue such as oxidized methionine would correspond to the φ/ψ/ω-angles. This is important in ensuring that the unsupported residue has maximal access to sampling and analysis routines that are specialized for this polymer type (e.g., DSSP analysis for polypeptides, or sugar pucker sampling for polynucleotides).
Note that not all of the above guidelines are stringent requirements. For certain types of runs or analyses, it may suffice to pay less attention to the formatting. Further details on how CAMPARI utilizes the information in the input pdb file are provided elsewhere. Whatever the application, it is of course required to use option 2 for FMCSC_PDB_READMODE if unsupported residues are present.
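The column rule for element recognition in requirement 1 above can be sketched in Python as follows. This is a deliberately simplified reading of the text and not CAMPARI's actual routine: the names atom_name_field and guess_element are hypothetical, and the sketch makes no attempt to handle other conventions (for instance, a four-letter hydrogen name that starts in column 13 would be misread as a two-letter element here).

```python
def atom_name_field(pdb_line):
    """Return the four-character atom name field (columns 13-16, 1-based)."""
    return pdb_line[12:16]


def guess_element(field):
    """Guess the element symbol from a four-character atom name field.

    Assumed rule: a letter in column 13 triggers a two-letter element
    (columns 13-14); otherwise the letter in column 14 is the element.
    """
    if len(field) != 4:
        raise ValueError("expected the 4-character atom name field")
    first, second = field[0], field[1]
    if first.isalpha():
        return (first + second).strip().upper()
    return second.upper()


print(guess_element("CU  "))  # name starts in column 13: copper
print(guess_element(" CU "))  # name starts in column 14: carbon
print(guess_element("1HB "))  # leading index digit: hydrogen
```

The contrast between the first two calls reproduces the copper versus carbon example given in the text.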


Restart file:

Keyword: FMCSC_RESTART

Required for: Restarting a previously terminated or aborted simulation → FMCSC_RESTART

Rather than specifying the starting conformation by a limited-precision input file such as pdb files explained above, it is more appropriate to extend or restart simulations with the help of restart files. These are human-readable and can therefore be edited if needed. The name of the restart file is assumed to be systematically constructed from the default name of the run and should correspond to {basename}.rst. Details on restarting runs are provided elsewhere.


Torsional input file:

Keyword: FMCSC_FYCFILE

Build a structure containing a single molecule from a set of internal degrees of freedom (and a sequence file of course). This option is near-deprecated and will exit fatally if multiple molecules are present. The reader expects a header line first, otherwise the file format is the same as that of output file FYC.dat. In general, usage is not recommended at this point in time.



Section 2: Files altering the system



Particle number fluctuation file:

Keyword: FMCSC_PARTICLEFLUCFILE

Required for: Use of a simulation ensemble with fluctuating particle numbers, i.e., cases where FMCSC_ENSEMBLE is set to 5 or 6 (grand canonical or (pseudo-)semi-grand ensemble).

First line: One integer, X, that specifies the number of particle types whose numbers can fluctuate
Each of the next X lines contains three numbers separated by whitespace:
  • An integer specifying the ID of a particle type whose numbers will be allowed to fluctuate. The type ID starts from 1 and follows the order of the sequence file. See below for an example.
  • An integer specifying the initial number of particles of this type to place in the system. If the implementation mode is chosen as the default, this initial number is also taken to be the expected (bulk) number of particles of this type.
  • A real number specifying either the excess (FMCSC_GRANDMODE is 2) or the absolute chemical potential (FMCSC_GRANDMODE is 1) for this particle type in kcal/mol.

Note on specifying ghost particles in the sequence file:

CAMPARI, like most molecular simulation packages, was initially designed and implemented with the assumption that particle numbers in a simulated system always remain constant. To enable the concept of fluctuating particle numbers, a requirement for grand canonical simulations, we employ a mechanism of "ghost particles": the lengths of internal arrays that track particles are still constant, but now, each particle has a flag indicating whether it is "present" or "ghosted". "Ghosted" particles do not interact with each other or with "present" particles, but retain their intra-particle interactions and their interactions with the implicit solvent. The grand canonical Monte Carlo moves that insert or delete particles from the system are implemented as operations that change the present/ghosted status of a particle. The trajectories are output in such a way that ghosted particles are translated in the z-direction away from the "real" box allowing both clean visualization and subsequent parsing for external analysis.

The consequence is that the end user must include sufficient ghost particles in the sequence input file. For every particle type that can fluctuate, the end user must specify enough copies of that particle such that the number present in the system never exceeds the number in the sequence file. Unfortunately, one does not necessarily know the maximum number that will be present in advance, and including a huge number of ghost particles to be on the safe side may considerably reduce computational efficiency because 1) ghosted particles still require evaluation of purely intramolecular energies, and 2) Monte Carlo moves will be expended on ghosted particles (this is how inserted particles have their degrees of freedom randomized). Therefore, some trial and error will likely be necessary to determine the appropriate number of ghost particles to specify.

Example: Suppose the sequence input file is

ACE
ALA
ALA
ALA
ALA
NME
URE
URE
URE
URE
URE
URE
URE
URE
URE
URE
URE
URE
END

and the particle number fluctuation file is (with FMCSC_GRANDMODE being 2)

1
2 5 -0.19

The particle number fluctuation file specifies that there is 1 (the integer on the first line) type of particle whose numbers shall be allowed to fluctuate. The type ID is 2, which corresponds to urea (the Ac-(Ala)4-Nme polypeptide has type ID = 1). Initially, five urea molecules will be present in the simulation box, and the remaining seven will be ghosted. This sets a bulk concentration that can be computed from the box parameters. The excess chemical potential of urea is -0.19 kcal/mol. In a binary system such as the one specified above (urea+peptide), a preliminary simulation of the single-component system (urea) should have yielded the value provided for the excess chemical potential by showing it to be consistent with the expected bulk concentration. Note that if the number of urea molecules present reaches twelve, and another grand canonical insertion move is accepted, the simulation becomes invalid. Such an event would be reported to log-output, but will not cause the simulation to terminate.
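The bookkeeping in this example can be reproduced with a minimal Python sketch that reads the particle number fluctuation file and derives the number of ghosted copies per type. The function names are hypothetical, and the number of copies of each molecule type in the sequence file is passed in by hand (here {1: 1, 2: 12}) rather than derived from the sequence file itself.

```python
def parse_particle_fluc(text):
    """Return [(type_id, initial_count, chemical_potential), ...]."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    n_types = int(lines[0][0])
    entries = [(int(f[0]), int(f[1]), float(f[2]))
               for f in lines[1:1 + n_types]]
    if len(entries) != n_types:
        raise ValueError(f"expected {n_types} type lines, found {len(entries)}")
    return entries


def ghost_counts(entries, copies_in_sequence):
    """Map each fluctuating type ID to its number of initially ghosted copies."""
    ghosts = {}
    for type_id, n_initial, _potential in entries:
        available = copies_in_sequence[type_id]
        if n_initial > available:
            raise ValueError(
                f"type {type_id}: {n_initial} requested, only {available} listed")
        ghosts[type_id] = available - n_initial
    return ghosts


entries = parse_particle_fluc("1\n2 5 -0.19\n")
print(entries)
print(ghost_counts(entries, {1: 1, 2: 12}))
```

With twelve URE entries in the sequence file and five present initially, the sketch reports seven ghosted urea molecules, in line with the description above.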


Constraint input file:

Keyword: FMCSC_FRZFILE

This input file allows selected degrees of freedom to be constrained. Here, constraints can be applied only to variables that are explicit degrees of freedom of the system. These are then explicitly excluded from the lists constituting eligible entities for Monte Carlo moves, and are not integrated in gradient-based methods in internal coordinate space. This is notably distinct and conceptually much simpler than the inclusion of indirect geometric constraints that depend on more than one native degree of freedom (via Lagrange multipliers). The latter, theoretically more general approach to constraints is currently available only in Cartesian dynamics runs (→ SHAKESET, e.g., also with file-based option).
For simple, explicit constraints supplied via FMCSC_FRZFILE, the first letter in the file indicates the mode:

'T' or 't':
(molecule type-based constraints):
Generally the input is of the type:
-----------------
i   XXX
-----------------
Here, i denotes the number of the molecule type (as defined by sequence input), and XXX is one of ALL, RBC, INT, FYO, CHI, PUC, OTH.
 ALL: Constrain all degrees of freedom for all molecules of that type
 RBC: Constrain only rigid-body degrees of freedom for all molecules of that type
 INT: Constrain only internal (torsional) degrees of freedom for all molecules of that type
 FYO: Constrain only backbone degrees of freedom for all molecules of that type
 CHI: Constrain only sidechain degrees of freedom for all molecules of that type
 PUC: Constrain pucker state for all relevant residues in all molecules of that type (only relevant in MC)
 OTH: Constrain only dihedral angle degrees of freedom that are not native to CAMPARI, e.g., those in unsupported residues

'M' or 'm':
Identical, except that a molecule number is given instead of a molecule type number.

'R' or 'r':
Identical, except that a residue number is given instead of a molecule type number. Note that when ALL or RBC is selected in this mode, the rigid-body degrees of freedom for the whole molecule to which the selected residue belongs will be constrained.

Depending on whether the run is a pure Monte Carlo simulation, uses a gradient-based technique, or a hybrid scheme, the interpretation of these requests may differ slightly. In Monte Carlo, the constraint implies that all relevant degrees of freedom within the specified molecules or residues are removed from all sampling lists associated with active move types that could potentially perturb the degrees of freedom in question. This means that, for example, the "FYO" request on a polypeptide residue could remove the residue from several sampling lists including omega moves, standard phi/psi moves and all types of polypeptide concerted rotation moves. For the latter in particular, this may render many other residues ineligible for these samplers. Conversely, requesting "CHI" for a lysine residue while sidechain moves and OTHER moves on native CAMPARI degrees of freedom are inactive would be redundant since the chosen move set automatically constrains these degrees of freedom.
In internal coordinate space dynamics runs, the interpretation is simpler and a constraint request will lead to the selected degrees of freedom no longer being included in the numerical integration scheme. This simplicity is also the reason why there is an additional mode exclusively available to this type of runs:

'A' or 'a':
(Z matrix line-based constraints):
Input is universally of the type:
-----------------
i   INT
-----------------
Here, i denotes an atomic index (Z matrix line) that should correspond to an atom defining a rotatable dihedral angle. If so, just the selected degree of freedom is constrained in internal coordinate space dynamics. If the run is a hybrid run also featuring MC segments, the latter will not honor these constraints unless the move set is explicitly designed to do so. The correct atomic indices can be inferred by consulting the documentation on sequence input above and the reference Z matrix and pdb file(s) written by CAMPARI.

Finally, for runs in Cartesian space (dynamics only), there is an analogous set of constraints available, i.e., constraints explicitly preventing the integration of one or more positional coordinates of one or several atoms. The mode is again atom-based (note that this mode is mutually exclusive with the Z matrix line-based constraints above):

'A' or 'a':
(explicit Cartesian constraints):
Input is of the type:
-----------------
i   *
-----------------
Here, i denotes an atomic index, and "*" is either XYZ, XY, XZ, YZ, X, Y, or Z. In any of the options, all listed elements of the position vectors are constrained. As is true in general, constraints can be cumulative. Full positional constraints of this type can be used to construct crude boundary conditions for condensed phase simulations. More exotic applications would be the inclusion of fixed surfaces in the system or carrying out a simulation in 2D.
It is very important to note that these constraints are mutually incompatible with standard geometric constraints. This means that no atom constrained via FMCSC_FRZFILE may occur in any holonomic constraint group. Note as well that this option is restricted to Cartesian dynamics, which implies that hybrid runs in mixed Cartesian/internal coordinate space are most likely even less of a good idea than in general if this type of constraint is present.

Examples:
----------
M
1 CHI
3 RBC

----------
If the run has an internal coordinate space component, this input file would constrain sidechain angles in molecule #1 and the rigid-body movement of molecule #3. Note that the program will ignore most meaningless requests but will complain if all degrees of freedom of a certain type are eliminated through constraints. These move set sanity checks naturally have no effect in pure dynamics calculations.
----------
A
144 XY
145 XY

----------
If this run has a Cartesian space dynamics component, this input file would constrain the x- and y-coordinates of atoms 144 and 145. For this request to be legal, these two atoms must not participate in any holonomic constraint group (CAMPARI will exit otherwise).
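
The explicit Cartesian constraint block above can also be generated programmatically. The following is a minimal Python sketch, not part of CAMPARI; the function names are hypothetical, and it is assumed that atom indices start at 1. It checks the axis specifier against the seven options listed above and shows how cumulative requests for the same atom combine.

```python
# Axis specifiers accepted by the explicit Cartesian constraint mode (see above).
VALID_AXES = {"XYZ", "XY", "XZ", "YZ", "X", "Y", "Z"}


def cartesian_constraint_block(requests):
    """Return the text of an 'A' block from (atom_index, axes) pairs."""
    lines = ["A"]
    for atom, axes in requests:
        axes = axes.upper()
        if atom < 1:  # assumption: atom indices are 1-based
            raise ValueError(f"invalid atom index {atom}")
        if axes not in VALID_AXES:
            raise ValueError(f"invalid axis specifier {axes!r}")
        lines.append(f"{atom} {axes}")
    return "\n".join(lines) + "\n"


def constrained_components(requests):
    """Merge cumulative requests into one set of components per atom."""
    merged = {}
    for atom, axes in requests:
        merged.setdefault(atom, set()).update(axes.upper())
    return {atom: "".join(sorted(c)) for atom, c in merged.items()}


if __name__ == "__main__":
    print(cartesian_constraint_block([(144, "XY"), (145, "XY")]), end="")
```

The atoms listed this way would still have to be checked by the user against any holonomic constraint groups, since the sketch has no knowledge of them.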


Holonomic constraints input file for Cartesian dynamics:

Keyword: FMCSC_SHAKEFILE

Required for: Custom constraints on internal distances in Cartesian dynamics → FMCSC_SHAKESET

This is a relatively simple input file used to manually determine which interatomic distances to constrain during a Cartesian dynamics simulation. Canonical selections are available via keyword input (→ FMCSC_SHAKESET), but those may sometimes not fit the user's needs. The list available through this input file is composed of distances only (no explicit angular constraints are supported via file input currently). CAMPARI will parse the list to identify groups of coupled constraints to be enforced by an appropriate FMCSC_SHAKEMETHOD.
There are two modes as follows:

Mode 'A':   Two numbers per line after the first line ('A'), where the numbers specify the indices of two atoms whose distance is to be constrained. Note that it is generally permissible for these atoms to be part of different molecules or to be separated by many bonds within a single molecule. Note that user-selected, geometric constraints can become a problem if a) the coupling between constraints becomes intractable (a general problem also observed for bond angle constraints → FMCSC_SHAKESET is 4 or 5), or b) the simulation is a hybrid run in mixed Cartesian/internal coordinate space and the Monte Carlo move set is able to alter the constrained coordinate. For the latter case, CAMPARI will print a preemptive warning (note that the vast majority of default constraints selectable via FMCSC_SHAKESET are not accessible to the various MC move types). An important restriction is that the atoms defining a constraint group must not be constrained beyond complete internal rigidity; any further constraints are redundant. Such explicitly or implicitly redundant entries are to be avoided at all cost (whether they arise from providing the same line twice or from overconstraining a subset of atoms locally). CAMPARI will not detect them and will assume - from an ensemble point of view - that there are more constraints than are actually present, potentially leading to temperature artifacts.

Mode 'T':   This works analogously to the previous mode only that intramolecular entries have to be provided for the atom indices for the first molecule of a given type only. CAMPARI will then replicate such an entry for all molecules of the same type. Use FMCSC_SEQREPORT to get an overview of the molecule types in the system. This is a helpful simplification of the purely index-based mode if simulations with custom constraints are attempted in the presence of a bath of solute molecules which all need to be identically constrained. There is no support for intermolecular constraints in this mode unless there is exactly one molecule of each participating type.

Note that entries belonging to ineligible indices will be ignored or may cause a fatal termination of CAMPARI depending on what exactly the issue is. Whether holonomic constraints are maintained correctly can be checked by studying trajectory output with suitable software or - in most cases at least - by consulting output file INTHISTS_BL.dat.
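
Because CAMPARI does not detect redundant entries, it can be useful to screen a constraint list before a run. The following is a minimal Python sketch under stated assumptions; it is not CAMPARI's own parser, and all function names are hypothetical. It groups distance constraints that are coupled through shared atoms (union-find), reports pairs that are listed twice, and flags groups holding more distances than the 3n-6 internal degrees of freedom of a rigid body of n atoms (1 for n = 2). The last check is only a necessary condition: local overconstraining of a subset of atoms within a larger group is not caught.

```python
from collections import defaultdict


def constraint_groups(pairs):
    """Return (groups, duplicates) for a list of (i, j) atom index pairs."""
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    seen, duplicates = set(), []
    for i, j in pairs:
        key = (min(i, j), max(i, j))
        if key in seen:
            duplicates.append(key)  # same distance listed twice
            continue
        seen.add(key)
        parent[find(i)] = find(j)  # merge the two coupled sets

    groups = defaultdict(list)
    for key in sorted(seen):
        groups[find(key[0])].append(key)
    return list(groups.values()), duplicates


def max_independent_distances(n_atoms):
    """Internal degrees of freedom of a rigid, nonlinear body of n_atoms."""
    if n_atoms < 2:
        return 0
    return 1 if n_atoms == 2 else 3 * n_atoms - 6


def overconstrained(group):
    """True if a group lists more distances than rigidity allows."""
    atoms = {a for pair in group for a in pair}
    return len(group) > max_independent_distances(len(atoms))


if __name__ == "__main__":
    groups, dups = constraint_groups([(1, 2), (2, 3), (2, 1), (7, 8)])
    print(groups, dups)
```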


Preferential sampling weights input file:

Keyword: FMCSC_PSWFILE

If the simulation is a Monte Carlo simulation in torsional space (see FMCSC_DYNAMICS and FMCSC_CARTINT), this keyword allows the user to alter the default picking probabilities for entities eligible for a given move type. This function can overlap with the constraint requests described elsewhere. The file is a sequence of blocks with each block indicated by a starting letter (separate line):

'T' or 't':
(molecule type-based adjustments for particle insertion/deletion moves):
In the simplest case, the input is of the type:
-----------------
i   w1
-----------------
Here, i denotes the number of the molecule type (as defined by sequence input), and w1 is the target sampling weight (floating point number) for this molecule type in particle insertion and deletion moves. Weights are set such that any entries not listed explicitly in this file have a weight of 1.0 (normalization happens after processing all entries). Therefore, it is redundant to choose w1 as unity. A value of zero is possible, but hard to justify in this case given that the molecule types that are allowed to fluctuate are set via user input as well. Note that the weights for particle permutation moves in the semi-grand ensemble can not be altered.

'M' or 'm':
(molecule-based adjustments for rigid-body moves):
For rigid-body moves it is permissible to control picking probabilities on a per-molecule level. Therefore, input is of the type:
-----------------
i   w1 w2
-----------------
Here, i denotes the number of the molecule (as defined by sequence input), w1 is the target sampling weight for single molecule rigid-body moves, and w2 is the target sampling weight for cluster rigid-body moves. Note that it can indeed be useful to control these separately, for example when sampling macromolecular assemblies in the presence of small molecules. In such a case, preferential picking of the macromolecules for cluster moves may prove beneficial. Some restrictions on cluster rigid-body moves are outlined elsewhere.

'A' or 'a':
(degree of freedom-based adjustments for certain dihedral angle pivot moves):
Specifically, for the various branches of single dihedral angle pivot moves ( OTHER moves) that are meant to primarily operate on those dihedral angles that cannot be sampled by the specialized move sets within CAMPARI (→ FMCSC_OTHERUNKFREQ and FMCSC_OTHERNATFREQ), CAMPARI allows adjustment of the picking frequencies at the level of individual degrees of freedom. Input is as follows:
-----------------
i   w1 
-----------------
Here, i denotes the atom number corresponding to the Z-matrix line associated with the dihedral angle for which the sampling weight should be changed. Since every dihedral angle can only occur in at most one of the three subcategories sampled by OTHER moves (viz., degrees of freedom in residues that are not natively supported; degrees of freedom in native CAMPARI residues that cannot be sampled by any of the other move types; degrees of freedom that can be sampled by other CAMPARI move types), only a single weight (w1) is sufficient. Note that this means that the sets of weights supplied in a section of this type may not all act on the same list of degrees of freedom.

'R' or 'r':
(residue-based adjustments for all other moves):
For all other moves, CAMPARI allows adjustment of the picking frequencies at the residue level. Input is as follows:
-----------------
i   w1 w2 w3 w4 w5 w6 w7 w8 w9 w10 w11
-----------------
Here, i denotes the number of the residue (as defined by sequence input). The wi weights refer to polypeptide φ/ψ-pivot moves (w1), polypeptide ω-pivot moves (w2), sidechain moves (w3), polynucleotide (and other polymer) backbone pivot moves (w4), polypeptide pucker backbone moves (w5), sugar pucker backbone moves (w6), polypeptide Dinner-Ulmschneider exact concerted rotation moves with ω-sampling (w7), polypeptide Dinner-Ulmschneider exact concerted rotation moves without ω-sampling (w8), polynucleotide Dinner-Ulmschneider exact concerted rotation moves (w9), polypeptide Ulmschneider-Jorgensen exact bond angle-based concerted rotation moves (w10), and polypeptide Favrin et al. inexact concerted rotation moves (w11). This is tricky for moves sampling stretches of residues, and the complications are discussed elsewhere.
Before presenting an example, a few remarks are in order. First, it is permissible to truncate each line after w1, if there is no need to alter the picking probabilities for move types controlled by any of the omitted parameters. It is, however, not possible to alter - for example - the picking weight for a specific residue for sugar pucker moves without also specifying w1-w5.
In general, entries that request weight alterations for ineligible entities produce a warning. As mentioned above, missing entries are all assumed to be 1.0, which allows the file to remain small in most cases. Weights must be zero or positive numbers, otherwise the behavior may be undefined. Note that it is not possible to utilize the molecule type- or molecule-based modes to control picking frequencies for move types listed in the residue-based section (this also holds for any other combination). Furthermore, CAMPARI does not tolerate sampling weights that deplete all entities for a given move type and will exit with an error in such a case.

Example:
----------
R
12 1.0 1.0 10.0

M
1 1.0 10.0
31 0.1

R
14 1.0 1.0 1.0 5.0 1.0 5.0 1.0 1.0 5.0

A
71 3.0

----------

This input file would lead to the following changes to the default picking probabilities (in each case, the alteration is contingent upon the presence of the corresponding move type in the move set and the eligibility/presence of the chosen entity):
  • Residue #12 would be preferentially picked for sidechain moves (factor 10 to all other eligible residues).
  • Molecule #1 would be preferentially picked for cluster rigid body moves (factor 10 to all other eligible molecules).
  • Molecule #31 would be less frequently picked for single molecule rigid body moves (factor 0.1 to all other eligible molecules).
  • Residue #14 would be preferentially sampled in all nucleotide-specific moves (pivot, puckering, and concerted rotation). For pivot and puckering, this is a factor of 5.0 to all other residues. For concerted rotation, this is a factor of 5.0 for residue #14 being the 3'-terminal residue in a polynucleotide concerted rotation stretch compared to all other eligible 3'-terminal residues in a stretch.
  • The dihedral angle in Z-matrix entry #71 (corresponds with atom number) would be preferentially sampled in whatever subcategory of OTHER moves this degree of freedom belongs to. The factor of 3.0 is with respect to all other degrees of freedom in the same subcategory.
Note that multiple blocks of a type are possible. Also note the skipping of unwanted entries on 3 out of 5 relevant lines.
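
The effect of a weight on the actual picking probability depends on how many entities are eligible. The following minimal Python sketch illustrates this under the assumption that picking probabilities are simply proportional to the weights after normalization (consistent with the description above, but not taken from the CAMPARI source). The function name is hypothetical.

```python
def picking_probabilities(eligible, weights):
    """Normalize sampling weights; entities not listed default to 1.0."""
    raw = {e: float(weights.get(e, 1.0)) for e in eligible}
    if any(w < 0.0 for w in raw.values()):
        raise ValueError("weights must be zero or positive")
    total = sum(raw.values())
    if total == 0.0:
        # mirrors the rule that weights must not deplete all entities
        raise ValueError("weights deplete all eligible entities")
    return {e: w / total for e, w in raw.items()}


if __name__ == "__main__":
    # four eligible residues, residue 12 with a sidechain weight of 10.0
    probs = picking_probabilities(range(11, 15), {12: 10.0})
    for residue, p in probs.items():
        print(residue, round(p, 4))
```

With four eligible residues, a weight of 10.0 gives residue #12 a probability of 10/13, i.e., ten times that of each of the other three.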


Biotype patch input file:

Keyword: FMCSC_BIOTYPEPATCHFILE

The highest level patch that can be applied in CAMPARI is to change the biotypes specific atoms are assigned to. In the parameter file, biotypes are the key assignment for every possible atom in residues natively supported by CAMPARI, because they indirectly set all other, atom-specific parameter assignments handled by the parameter file, i.e., (LJ) atom types, charge types, and bonded types. Atom types in turn determine other parameters such as mass or valency. This means that (re)assigning the biotype of a given atom changes all those properties at once.
There are atom-specific parameters that cannot be changed by a biotype patch, viz., certain parameters of the solvation model (see below). Similarly, parameters that are based on groups of atoms are unaffected by biotype patches (see free energy of solvation parameters in the parameter file and corresponding patches). Biotype patches are processed before any other parameter patch is considered. This means that it is possible to override aspects of a biotype patch by subsequent application of atom type patches, mass patches, etc.
Biotype patches can be applied in two variants; the first is atom-specific, and the second applies patches consistently across all residues of identical type.

Atom-specific example:
---------------------------------------
A
113   118
114   119
115   120

---------------------------------------

This input file selects the atom-specific mode ('A'), and would (re)assign biotypes for atoms 113-115 (and only those atoms) in the system. The chosen biotypes correspond to the N, C, and CA atoms of tyrosine. Such a patch (incomplete obviously) would be appropriate for example when simulating or analyzing a system containing phosphorylated tyrosine. Then, it is often convenient to set a number of basic parameters by taking advantage of the similarity of the two entities and the corresponding overlap in parameter space. Support for residues not natively supported by CAMPARI is the main use of biotype patches. Another example is as follows:

Residue-level example:
---------------------------------------
R
1   1100
2   1101
3   1102
4   1103
5   1104
6   1105

---------------------------------------

Suppose the system is a dense phase of identical small molecules with 6 atoms each. Then, this patch could be used to very efficiently provide complete parameter support for this system. For this, we assume that biotypes 1100-1105 have been created by the user, and that the relevant information for these biotypes has been provided. By virtue of selecting option 'R', this patch would be automatically propagated to all residues of identical type and identical terminus state (the latter is relevant for polymers only, i.e., the patch would (and most likely could) not be propagated from an alanine residue in the middle of a polypeptide to a C- or N-terminal alanine residue in the same or a different polypeptide). If the residue in question is not natively supported by CAMPARI, only those unsupported residues of identical name are grouped that have exactly the same number and names of atoms and that are in the same chain position (C-terminal vs. N-terminal vs. nonterminal). Two possible applications for a patch like the second example above could be simulations of an unsupported small molecule liquid or gas or the desire to keep multiple parameterizations for the same small molecule in the same parameter file and to switch between them via patch (this in general will be less error-prone and easier to maintain than creating divergent parameter files). When using the 'R' option for a biotype patch, the atom indices in use should correspond to the first residue of that type (otherwise the patch request is ignored). Keep in mind that atom names in pdb output files will also be altered by biotype patches.


Atomic mass patch input file:

Keyword: FMCSC_MPATCHFILE

Atomic masses are always read and assigned to each atom in the system, and this input file allows changing the value assumed based on the assigned atom type. In general, masses are not explicit components of Monte Carlo sampling, and are not expected to change thermodynamic properties of the system provided that all other parameters remain fixed. Instead, masses influence system dynamics. Exceptions are potentials that depend explicitly on mass, an example being spatial mass density restraints.

Example:
---------------------------------------
666   14.
1444   40.
2211   2.

---------------------------------------

This input file would change the masses for atoms 666, 1444, and 2211 to those (approximately) of nitrogen, argon, and deuterium, respectively. Chosen values are required to be at least 1.0, since massless sites require constraints for stable integration, but cannot use standard constraint solvers. The most common application for mass patches would be in improving integrator stability by artificially increasing the mass of those atoms responsible for the fastest degrees of freedom in the system, i.e., hydrogen. Another application is support for otherwise unsupported residues, for which the guessed atom types are inappropriate, or which contain elements that the parameter file does not even have theoretical support for.
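
For larger systems, a mass patch is easier to generate than to type. The following is a minimal Python sketch with a hypothetical function name; the two-column layout follows the example above, and the fixed three-decimal output format is an arbitrary choice. It enforces the lower bound of 1.0 stated above.

```python
def mass_patch_text(masses):
    """Return mass patch lines ('atom   mass') from a {atom_index: mass} dict."""
    lines = []
    for atom in sorted(masses):
        mass = float(masses[atom])
        if mass < 1.0:
            raise ValueError(f"mass for atom {atom} must be at least 1.0")
        lines.append(f"{atom}   {mass:.3f}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # same assignments as in the example above
    print(mass_patch_text({666: 14.0, 1444: 40.0, 2211: 2.0}), end="")
```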


Atomic radius patch input file:

Keyword: FMCSC_RPATCHFILE

Similar to atomic masses, every atom is always assigned an atomic radius, and this input file allows changing the value derived from parameters, i.e., either based on the assigned atom type or on the radius override in the parameter file. Atomic radii are generally supposed to correspond to the exclusion radius of an atom, and are elements of energy terms such as the ABSINTH implicit solvation model and all spatial density restraints, but also of solvation or void space analysis.

Example:
---------------------------------------
6   1.
17   0.1
22   2.5

---------------------------------------

This input file would change the radii for atoms 6, 17, and 22 to 1.0, 0.1, and 2.5 Å, respectively. Chosen values are required to be at least 0.0, i.e. sizeless atoms are explicitly allowed. Note that atomic radii and topology in polyatomic molecules have to be matched to one another. This is because parameters derived from atomic radii, such as maximum SAV fractions or volume reduction factors, which are also patchable, neglect overlap terms corresponding to three or more covalently bound atoms. Also note that there should be no performance gain associated with reducing the radii of atoms to zero for the ABSINTH implicit solvation model.


Group-based thermostat coupling input file:

Keyword: FMCSC_TSTAT_FILE

This is a very simple input file allowing the (obsolete) adjustment of the Berendsen weak-coupling and the Bussi et al. velocity rescaling thermostats to couple parts of the system to independent thermostats, i.e., to apply velocity adjustments based on instantaneous temperatures for the subsystem and to instantaneous velocities for just that subsystem. There is no physical justification for doing this in general as it effectively restricts energy transfer between the subsystems and can easily lead to artifactual results. It was used predominantly as a trick to circumvent subsystem freezing which can sometimes occur if one subsystem has much faster kinetics and different levels of integrator noise than others (see FMCSC_TSTAT). The input file is very simple. There are two modes, a molecule-based one (first line 'M') and a molecule type-based one (first line 'T'):

Mode 'M':   One number per line after the first line ('M'), where the number specifies the temperature coupling group the molecule corresponding to the line is supposed to be part of. For a system with 100 molecules, there are 100 lines after the first one, each one containing an integer with a coupling group.

Mode 'T':   One number per line after the first line ('T'), where the number specifies the temperature coupling group the molecule type corresponding to the line is supposed to be part of. As usual, molecule types are numbered intrinsically by occurrence in the sequence input file. For a system with two molecule types (say a protein in water), there would only be two lines after the first one, both containing an integer with a coupling group.

Note that coupling groups have to be positive integers, and that non-consecutive integers are re-numbered accordingly to provide a consecutive set.
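
The renumbering of non-consecutive coupling groups can be pictured as follows. This is a minimal Python sketch with a hypothetical function name; it assumes that the relative order of the group labels is preserved, which is an assumption about CAMPARI's behavior and not something stated here.

```python
def renumber_groups(groups):
    """Map positive integer group labels onto the consecutive set 1..k."""
    if any(g <= 0 for g in groups):
        raise ValueError("coupling groups have to be positive integers")
    # assumption: labels are renumbered in increasing order of their value
    mapping = {g: k for k, g in enumerate(sorted(set(groups)), start=1)}
    return [mapping[g] for g in groups]


if __name__ == "__main__":
    # four molecules (mode 'M') assigned to groups 1, 5, 5, and 9
    print(renumber_groups([1, 5, 5, 9]))
```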


Wang-Landau sampling initial guess input file:

Keyword: FMCSC_WL_GINITFILE

Required for: Overriding of the flat initial guess for the (logarithmic) target distribution in a Wang-Landau run. Naturally, this is irrelevant for any simulation not (at least partially) using the WL method.

The (logarithmic) target distribution of a Wang-Landau run is not generally expected to be flat, neither globally nor locally. This means that a flat initial guess (the default) may slow down the construction of the distribution's coarse features (especially at boundaries). In these cases, it may be helpful to supply a nonflat initial guess.
The format of this input for one-dimensional cases is simply two columns, the first giving the center of the bin in question, and the second giving the initial guess for the target distribution. Because all distributions are handled in logarithmic space, the only quantities of relevance are the differences between bins and the relation of these differences to the starting value for the f parameter. The bin spacing has to be constant. Range settings provided via keywords FMCSC_WL_MAX and FMCSC_WL_BINSZ are overridden. Note that the assessment of convergence (f parameter reductions) does not rely on the estimate of the target distribution (e.g., ln g(E)), but rather on the simple visitation histogram. This means that it may be necessary to choose a relatively large number of buffer steps with a nonflat initial guess to ensure that all relevant bins are populated before the actual iteration begins.

Example:
---------------------------------------
0.1 10.2
0.3 4.1
0.5 3.4
0.7 3.2
0.9 6.8

---------------------------------------

This would initialize the histogram to 5 bins with a spacing of 0.2 and a maximum value of 1.0. This example could correspond to a normalized reaction coordinate (such as Zα) and would give an extremely coarse result (the bin spacing will never vary throughout a WL run).
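
The quantities quoted for this example (number of bins, spacing, maximum value) follow directly from the bin centers. The following is a minimal Python sketch that recovers them and checks the constant-spacing requirement; the function name and the tolerance are hypothetical choices, and the upper range limit is taken to be the last bin center plus half a bin width.

```python
EXAMPLE_1D = """\
0.1 10.2
0.3 4.1
0.5 3.4
0.7 3.2
0.9 6.8
"""


def summarize_1d_guess(text, tol=1e-8):
    """Return (number of bins, bin spacing, upper range limit)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len(rows) < 2 or any(len(r) != 2 for r in rows):
        raise ValueError("expected at least two rows with two columns each")
    centers = [float(r[0]) for r in rows]
    spacing = centers[1] - centers[0]
    for a, b in zip(centers, centers[1:]):
        if abs((b - a) - spacing) > tol:
            raise ValueError("bin spacing has to be constant")
    upper = centers[-1] + 0.5 * spacing
    return len(centers), spacing, upper


if __name__ == "__main__":
    nbins, spacing, upper = summarize_1d_guess(EXAMPLE_1D)
    print(nbins, round(spacing, 6), round(upper, 6))
```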
For the two-dimensional case, CAMPARI expects Nd2+1 columns per row and an additional header line. Nd2 is the number of bins in the initial guess for the second dimension (always a geometric reaction coordinate). In each non-header row, the values listed should be the bin center coordinate in the first dimension followed by the Nd2 initial estimates for the target distribution. Note that all Nd2 entries must be listed on every data row. The header line itself must contain an integer identical to Nd2 followed by the Nd2 increasing bin center coordinates in the second dimension.

Example:
-----------------------------------------------------
6 3.4 3.8 4.2 4.6 5.0 5.4
0.1 1.2 1.7 2.5 2.8 2.1 1.2
0.3 0.1 0.2 0.4 0.6 0.4 0.1
0.5 0.1 0.1 0.4 0.5 0.3 0.1
0.7 0.2 0.7 1.0 0.8 0.4 0.0
0.9 0.5 1.8 2.1 1.2 0.2 0.0

-----------------------------------------------------

This would initialize the histogram to 5x6 bins with spacings of 0.2 and 0.4 and maximum values of 1.0 and 5.6. This example could correspond to two geometric reaction coordinates, e.g., Zα and the radius of gyration for a very short peptide.
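
The layout rules for the two-dimensional case (header with Nd2 and the Nd2 bin centers, then Nd2+1 columns per data row) can be checked with a few lines of code. The following is a minimal Python sketch, not CAMPARI's reader; the function name is hypothetical.

```python
EXAMPLE_2D = """\
6 3.4 3.8 4.2 4.6 5.0 5.4
0.1 1.2 1.7 2.5 2.8 2.1 1.2
0.3 0.1 0.2 0.4 0.6 0.4 0.1
0.5 0.1 0.1 0.4 0.5 0.3 0.1
0.7 0.2 0.7 1.0 0.8 0.4 0.0
0.9 0.5 1.8 2.1 1.2 0.2 0.0
"""


def parse_2d_guess(text):
    """Return (centers_dim1, centers_dim2, table) from a 2D initial guess."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header = lines[0]
    nd2 = int(header[0])
    centers2 = [float(x) for x in header[1:]]
    if len(centers2) != nd2:
        raise ValueError("header must list exactly Nd2 bin centers")
    if any(b <= a for a, b in zip(centers2, centers2[1:])):
        raise ValueError("bin centers in the header must be increasing")
    centers1, table = [], []
    for fields in lines[1:]:
        if len(fields) != nd2 + 1:
            raise ValueError("every data row needs Nd2+1 columns")
        centers1.append(float(fields[0]))
        table.append([float(x) for x in fields[1:]])
    return centers1, centers2, table


if __name__ == "__main__":
    c1, c2, table = parse_2d_guess(EXAMPLE_2D)
    print(len(c1), "x", len(c2), "bins")
```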



Section 3: Files redefining standard terms of the energy function


(back to top)

MPI replica exchange input file:

Keyword: FMCSC_REFILE

Required for: Use of Hamiltonian (or temperature) replica-exchange methodology → FMCSC_REMC

Note that the code needs to be compiled with MPI support (see INSTALL) for this option to be available. On the first line, a set of integers indicates different(!) "dimensions" of the requested set of replicas. Such dimensions are environmental terms and/or parameters of the Hamiltonian which are supposed to differ between the individual replicas. By far the most common application is to specify different temperatures. However, in theory many more exchange parameters are possible and may be useful for certain applications (for example, FMCSC_FEG_IPP and similar parameters in the case of free energy calculations). In CAMPARI, the following are available in REMC/D runs:
  1. Temperature
  2. Scaling factor for repulsive inverse power potential (see FMCSC_SC_IPP)
  3. Scaling factor for dispersive 6th power potential (see FMCSC_SC_ATTLJ)
  4. Scaling factor for the Weeks-Chandler-Andersen (WCA) potential (see FMCSC_SC_WCA)
  5. Scaling factor for polar (Coulombic) potential (see FMCSC_SC_POLAR)
  6. Scaling factor for free energy of solvation term for ABSINTH model (see FMCSC_SC_IMPSOLV)
  7. Implicit solvent dielectric constant (see FMCSC_IMPDIEL, note that this is currently only supported as the implicit dielectric for the ABSINTH model, and does not work when interpreted as the (generalized) reaction-field dielectric)
  8. Scaling factor for torsional bias potential (see FMCSC_SC_TOR)
  9. Scaling factor for secondary structure bias potential (see FMCSC_SC_ZSEC)
  10. Global secondary structure bias potential: target α-content (see FMCSC_ZS_FR_A)
  11. Global secondary structure bias potential: target β-content (see FMCSC_ZS_FR_B)
  12. Screening model for implicit solvent (see FMCSC_SCRMODEL)
  13. Sigmoidal steepness for free energy of solvation term (τF) in ABSINTH model (see FMCSC_FOSTAU)
  14. Sigmoidal steepness for dielectric screening (τS) in ABSINTH model (see FMCSC_SCRTAU)
  15. Sigmoidal midpoint for free energy of solvation term (χF) in ABSINTH model (see FMCSC_FOSMID)
  16. Sigmoidal midpoint for dielectric screening (χS) in ABSINTH model (see FMCSC_SCRMID)
  17. Contact dielectric for screening models 3, 4, or 7-9 (see FMCSC_CONTACTDIEL and FMCSC_SCRMODEL)
  18. Generalized mean for screening models 5-8 (see FMCSC_ISQM and FMCSC_SCRMODEL)
  19. Mixing contribution from distance dependence for screening models 3, and 7-9 (see FMCSC_SCRMIX and FMCSC_SCRMODEL)
  20. Scaling factor for ghosted repulsive inverse power potential (see FMCSC_FEG_IPP)
  21. Scaling factor for ghosted dispersive 6th power potential (see FMCSC_FEG_ATTLJ)
  22. Scaling factor for ghosted Coulombic interactions (see FMCSC_FEG_POLAR)
  23. Scaling factor for tabulated potentials (see FMCSC_SC_TABUL)
  24. Scaling factor for polymeric biasing potential (see FMCSC_SC_POLY)
  25. Scaling factor for distance restraint potential (see FMCSC_SC_DREST)
  26. Scaling factor for ghosted bond length potentials (see FMCSC_FEG_BONDED_B)
  27. Scaling factor for ghosted bond angle potentials (see FMCSC_FEG_BONDED_A)
  28. Scaling factor for ghosted improper dihedral potentials (see FMCSC_FEG_BONDED_I)
  29. Scaling factor for ghosted torsional potentials (see FMCSC_FEG_BONDED_T)
  30. Global E-score for DSSP biasing potential (see FMCSC_DSSP_ESC)
  31. Global H-score for DSSP biasing potential (see FMCSC_DSSP_HSC)
  32. Time step for dynamics calculation → this acts as a dummy in pure Monte Carlo runs (see FMCSC_TIMESTEP)
  33. Scaling factor for density restraint potential (see FMCSC_SC_EMICRO)
  34. Threshold setting for density restraint potential (see FMCSC_EMTHRESHOLD)
This header line is followed by vectors of individual conditions (one condition per line) in the n-dimensional space defined before.

Example:
-------------------
1   4
298.0  0.9
350.0  0.8

-------------------
Such an input file would set up two replicas; the first one at a temperature (code #1) of 298K and with a global scaling factor for the WCA potential of 0.9 (code #4) and the second one at a temperature of 350K with a WCA scaling factor of 0.8.

Note that the normal requests in the key-file for any dimension which is part of the replica space are going to be ignored (in the above example, putting a value for FMCSC_SC_WCA would have no effect).

Also note that setting scaling factors for energy terms to zero in this input file is handled slightly differently than in any other calculation. As a result, there might be no speed-up for individual replicas (this is usually not desired anyway). Most other changes will be hidden from the user and relate to the necessity to be able to calculate cross-energies for which such scaling factors might not be zero. Lastly, performance may be degraded for certain choices requiring extensive recalculation of expensive terms, e.g., the choice of screening model for the ABSINTH model or the threshold setting for the density restraint potential. In such cases, very small values for FMCSC_REFREQ may be counterproductive.
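
The structure of the replica exchange file (one header line with dimension codes, then one condition per line) lends itself to a quick consistency check. The following is a minimal Python sketch, not CAMPARI's reader; the function name is hypothetical, the upper bound of 34 reflects the list of codes above, and all values are read as floating point numbers for simplicity even though some dimensions (e.g., the screening model) are integer-valued.

```python
EXAMPLE_RE = """\
1   4
298.0  0.9
350.0  0.8
"""


def parse_replica_file(text, n_codes=34):
    """Return (dimension codes, list of {code: value} dicts, one per replica)."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    dims = [int(x) for x in lines[0]]
    if len(set(dims)) != len(dims):
        raise ValueError("dimension codes on the first line must differ")
    if any(not 1 <= d <= n_codes for d in dims):
        raise ValueError("unknown dimension code")
    replicas = []
    for fields in lines[1:]:
        if len(fields) != len(dims):
            raise ValueError("each condition needs one value per dimension")
        replicas.append(dict(zip(dims, (float(x) for x in fields))))
    return dims, replicas


if __name__ == "__main__":
    dims, replicas = parse_replica_file(EXAMPLE_RE)
    print(dims, replicas)
```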


Free energy growth (ghosting) input file:

Keyword: FMCSC_FEGFILE

Required for: Use of specific, energetic decoupling of parts of the system from the rest via ghosting of interactions → FMCSC_GHOST

This is the input file for specifically scaling interactions of certain molecules or residues with the rest of the system. Ghosting is only supported for a limited range of Hamiltonians at the moment (see FMCSC_GHOST and related keywords in KEYWORDS):
(the first line gives the mode)

'T' or 't':
The input is then simply:
----------
i  0
----------
The integer i denotes the molecule type, for which the interactions with the rest of the system are to be ghosted. The optional 0 can be used to override the scaling settings in order to decouple an individual group from the system entirely. This is tailored to rather specific applications, for instance the stepwise calculation of free energies of solvation on a residue-by-residue basis. Note that full de-coupling requires all the bonded FEG-parameters to be set to unity (see FMCSC_FEG_BONDED_B, etc). This has primarily technical reasons.

'M' or 'm':
Analogous but with molecule number instead of molecule type number

'R' or 'r':
Analogous but with residue number instead of molecule type number

Example:
----------
r
1  0
2

----------
This will ghost the first residue in the sequence entirely (i.e., it contributes nothing to the simulation). The second residue in the sequence will have scaled interactions according to the setting of the FEG-scaling parameters (see KEYWORDS, e.g. FMCSC_FEG_IPP). Always keep in mind that interactions between ghosted entities as well as within ghosted entities are computed according to the selection in FMCSC_FEG_MODE and that this may sometimes lead to unexpected or inconsistent behavior. For example, a water box full of ghosted water molecules will behave like any other water box (with absolutely no ghosting) if FMCSC_FEG_MODE is set to 1.
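
The distinction between entries with and without the optional 0 can be made explicit with a small reader. The following is a minimal Python sketch, not CAMPARI's parser; the function name and the labels "decoupled" and "scaled" are hypothetical.

```python
EXAMPLE_FEG = """\
r
1  0
2
"""


def parse_feg_file(text):
    """Return (mode, {index: 'decoupled' | 'scaled'}) for a ghosting file."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    mode = lines[0][0].upper()
    if mode not in {"T", "M", "R"}:
        raise ValueError("first line must give mode T, M, or R")
    entries = {}
    for fields in lines[1:]:
        index = int(fields[0])
        # an optional trailing 0 requests full decoupling of this entity
        fully = len(fields) > 1 and fields[1] == "0"
        entries[index] = "decoupled" if fully else "scaled"
    return mode, entries


if __name__ == "__main__":
    print(parse_feg_file(EXAMPLE_FEG))
```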


Bonded potential terms patch input file:

Keyword: FMCSC_BPATCHFILE

Prerequisite: This file is only relevant if at least one of the bonded potential terms (see bond length potentials, bond angle potentials, improper dihedral angle potentials, torsional potentials, and CMAP potentials) is in use.

This file can be employed to override default assignments for bonded interactions or to add bonded interactions on internal coordinates that were previously not assigned a bonded interaction. A fundamental limitation is that such new bonded interactions can only be applied to internal coordinates that could have been assigned a term by default, e.g., it is not possible to apply a bond length potential to atoms a-b if a is not in fact covalently bound to b.
Default assignments are made based on the parameter file (see PARAMETERS and dedicated documentation), and the user can force CAMPARI to print those to log-output via FMCSC_BONDREPORT. In the patch file, each line has to contain as its first entry a keyword that is either "PATCH_BOND", "PATCH_ANGLE", "PATCH_IMPROPER", "PATCH_TORSION", or "PATCH_CMAP". Take glycine dipeptide (acetyl and N-methylamide caps) in united-atom representation as an example. This molecule has the CH3 and C atoms of the acetyl group, N, Cα, and C of glycine, and the N and CH3 atoms of the C-terminal capping group as indices 1, 2, 4, 5, 6, 9, and 10, respectively.

Example:
---------------------------------------
PATCH_BOND 1 2 23
PATCH_ANGLE 4 5 6 11
PATCH_IMPROPER 4 2 5 8 44
PATCH_TORSION 5 6 9 10 112
PATCH_CMAP 9 6 5 4 2 3

---------------------------------------

The first line of this input file would replace the existing bond length potential between the CH3 and C atoms of the acetyl group with the one in the parameter file in use that has index 23. If no potential was previously assigned (fatal in Cartesian dynamics), it would be added by this patch. If no potential with index 23 is found in the parameter file, this line of the patch file is ignored (and a warning is produced). It is important to note that the patch file exclusively handles the assignment, but not the parameters themselves. The reason is that appending the list of bonded potentials in the parameter file allows reusing an entry multiple times. If the parameters themselves were also part of the patch file, they would have to be repeated over and over again in such a case, which would increase the amount of work and the likelihood of errors creeping in.
Similarly, the second line would change (or add) the bond angle potential acting across the N→Cα→C angle in glycine to the one with index 11 in the parameter file in use. The same notes and caveats as before apply. The third line would alter the assignment for the improper dihedral angle formed by the (central) N of glycine along with the Cα and polar hydrogen (index 8) atoms of glycine, and the carbonyl carbon of the acetyl group. The potential type would be the torsional potential with index 44. The fourth line would assign the 112th torsional potential in the parameter file to the ω-torsion of the peptide bond connecting glycine and the C-terminal N-methylamide. Lastly, the fifth line would change (or add) a CMAP potential acting on the consecutive ψ- and φ-angles of glycine. Note that the atom order is reversed compared to a standard CMAP assignment as found in the CHARMM force field.
A few more comments are in order. First, redundant permutations in the lists of atoms are understood by CAMPARI (see here for definitions of what redundant permutations are for the individual cases). Second, CAMPARI will only process patches that relate to an energy term that is actually turned on. For instance, if in the above example FMCSC_SC_BONDED_M were zero, the fifth line would not be processed at all. Third, for processed patches that fail due to the atoms not forming an internal coordinate of the specified type and/or due to the potential choice being out of range, a warning is produced. Similarly, every successful patch will be summarized to log-output as well. Fourth, patch keywords that CAMPARI fails to understand will cause a fatal exit of the program. Fifth, multiple redefinitions of the same internal coordinate are considered according to input order, i.e., only the last definition will end up being relevant. This implies that it is not possible to "stack" multiple potentials on an individual internal coordinate. Sixth and last, atom numbering will always correspond to CAMPARI's internal order, which can usually be extracted from pdb files produced at the beginning or end of a run.
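
Before a run, it can help to check a bonded patch file for malformed lines. The sketch below is a hypothetical pre-check and not part of CAMPARI: the function name is invented, the number of atoms per keyword is inferred from the example above, and redundant atom permutations are not normalized (so two permutations of the same coordinate are treated as distinct here, unlike in CAMPARI).

```python
# Number of atom indices expected after each patch keyword (inferred from the example).
ATOMS_PER_KEYWORD = {
    "PATCH_BOND": 2,
    "PATCH_ANGLE": 3,
    "PATCH_IMPROPER": 4,
    "PATCH_TORSION": 4,
    "PATCH_CMAP": 5,
}


def parse_bonded_patches(text):
    """Return {(keyword, atoms): potential_index}; later lines override earlier ones."""
    patches = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue
        keyword = fields[0]
        if keyword not in ATOMS_PER_KEYWORD:
            # Unknown keywords are described as fatal in the text above.
            raise ValueError(f"line {lineno}: unknown patch keyword {keyword!r}")
        n_atoms = ATOMS_PER_KEYWORD[keyword]
        if len(fields) != n_atoms + 2:
            raise ValueError(f"line {lineno}: expected {n_atoms} atoms and one index")
        atoms = tuple(int(f) for f in fields[1:1 + n_atoms])
        patches[(keyword, atoms)] = int(fields[-1])  # last definition wins
    return patches


example = """PATCH_BOND 1 2 23
PATCH_ANGLE 4 5 6 11
PATCH_IMPROPER 4 2 5 8 44
PATCH_TORSION 5 6 9 10 112
PATCH_CMAP 9 6 5 4 2 3
"""
for key, index in parse_bonded_patches(example).items():
    print(key, "->", index)
```

Storing the patches in a dictionary keyed by keyword and atom tuple mirrors the rule that only the last definition for a given internal coordinate is relevant.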


Size exclusion and dispersion parameters patch input file:

Keyword: FMCSC_LJPATCHFILE

Prerequisite: This file is always read, but the assignments are only relevant if one of the short-range interaction potentials is active (IPP, ATTLJ, or WCA) or any functionality or potential depending on derived parameters (such as atomic radii or volumes used, for example in the ABSINTH implicit solvation model) is in use.

This file can be employed to override the Lennard-Jones σ and ε parameters by changing the default atom type assignment for a specific instance of a biotype. Each line has to contain two integers, the first giving the atom number (CAMPARI-internal numbering), and the second the reassigned atom type number in reference to the parameter file in use.

Example:
---------------------------------------
15   11
1443   1

---------------------------------------

This input file would attempt to reassign atom 15 to atom type 11 and atom 1443 to atom type 1. Note that the success depends on the existence of the appropriate number of atom types in the parameter file. As long as they are continuously numbered and all parameters are provided, it is feasible to create arbitrary amounts of atom types in the parameter file with the limiting case that - theoretically - every actual atom in the system could be assigned unique parameters with unique pairwise and 14-exceptions through the use of this patch facility.
An important caveat lies in the fact that masses, atomic numbers, valencies, and other quantities specified in the atom lines of the parameter file are not altered. This is particularly relevant when dealing with unsupported residues (see sequence input and trajectory analysis), since CAMPARI will make an automatic guess for a biotype assignment, from which the default assignment is gleaned. If this yields unsatisfactory results regarding mass or valency, this patch file is not helpful. Instead, the input pdb should be altered to aid CAMPARI in guessing an appropriate type. As usual, the index in the first column corresponds strictly to the internal numbering used by CAMPARI, which can, for instance, be read out from the pdb file produced at the beginning of the simulation (→ here) unless nucleotides are part of the system and are requested to conform to pdb convention (→ FMCSC_PDB_NUCMODE).
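
Since the success of a reassignment depends on the atom type existing in the parameter file, a simple range check can catch mistakes early. The following is a minimal sketch with hypothetical names; the number of atoms and the number of atom types must be supplied by the user, as this sketch does not read the parameter file.

```python
def parse_lj_patches(text, n_atoms, n_atom_types):
    """Return {atom_number: atom_type}, rejecting out-of-range entries."""
    patches = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue
        atom, atom_type = int(fields[0]), int(fields[1])
        if not 1 <= atom <= n_atoms:
            raise ValueError(f"line {lineno}: atom {atom} outside 1..{n_atoms}")
        if not 1 <= atom_type <= n_atom_types:
            raise ValueError(f"line {lineno}: atom type {atom_type} outside 1..{n_atom_types}")
        patches[atom] = atom_type
    return patches


example = """15   11
1443   1
"""
# The system size and the number of atom types are hypothetical values.
print(parse_lj_patches(example, n_atoms=2000, n_atom_types=30))
```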


Charge patch input file:

Keyword: FMCSC_CPATCHFILE

Prerequisite: This file is only relevant if the polar potential is used.

This file can be employed to override partial charge assignment extracted from the parameter file (see PARAMETERS and dedicated documentation). Each line has to contain two entries, an integer giving the atom number and a floating point number giving the new partial charge.

Example:
---------------------------------------
33   -1.76
34   0.88
35   0.88

---------------------------------------

This input file would change the atomic partial charges on atoms 34 and 35 to 0.88e each, and that on atom 33 to -1.76e. Note that it is up to the user to ensure that the resultant charges are sane and self-consistent, although CAMPARI will perform its usual tests and provide warnings/exits in case they are not (keywords FMCSC_ELECREPORT and FMCSC_UNSAFE may become important or useful here). Also note that the numbering, which the index in the first column refers to, is strictly the internal numbering used by CAMPARI. This can generally be read out from the pdb file produced at the beginning of the simulation (→ here) unless nucleotides are part of the system and are requested to conform to pdb convention (→ FMCSC_PDB_NUCMODE).
This functionality is intended primarily for corrections to partial charges that would break the identity of biotypes set up in the parameter files and maintained within CAMPARI itself. The most common example would be partial charges for terminal polypeptide residues in the AMBER class of force fields.
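
Because the user is responsible for keeping the patched charges self-consistent, it can be useful to verify that a patch leaves the net charge unchanged. The sketch below illustrates such a check under stated assumptions: the function names are hypothetical, and the original charges are invented values chosen only to make the example run.

```python
def parse_charge_patches(text):
    """Return {atom_number: new_partial_charge} from a charge patch file."""
    patches = {}
    for line in text.splitlines():
        fields = line.split()
        if fields:
            patches[int(fields[0])] = float(fields[1])
    return patches


def net_charge_shift(patches, original_charges):
    """Sum of (new - old) partial charges over all patched atoms."""
    return sum(q_new - original_charges[atom] for atom, q_new in patches.items())


example = """33   -1.76
34   0.88
35   0.88
"""
# Hypothetical original charges for atoms 33-35 (not taken from any parameter file).
original = {33: -0.834, 34: 0.417, 35: 0.417}
shift = net_charge_shift(parse_charge_patches(example), original)
print(f"net charge shift: {shift:+.6f} e")
```

A shift that differs noticeably from zero (or from the intended integer change) indicates that the patch alters the total charge of the system.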


Residue-level net charge flag patch input file:

Keyword: FMCSC_NCPATCHFILE

Prerequisite: This file is only relevant if the polar potential is used. Additional requirements are that the electrostatic interaction model is group-based and/or that the cutoff treatment for long-range Coulombic interactions is such that LREL_MC is 1, 2, or 3 and/or that LREL_MD is either 4 or 5.

This file can be employed for two purposes. First, it can disable the flag CAMPARI assigns to a residue that has at least one charge group in it with a total charge exceeding a tolerance setting in absolute magnitude. It is currently impossible to turn the residue-level flag on (since it would not in general be clear which charge group to do this for and why). Second, it can aid the automatic determination of charge groups by defining a series of target values (presumably different from 0.0). Charge groups are generally determined by an algorithm that finds sets of atoms of a desired (integer or zero) net charge, which are also topologically close, do not isolate other polar atoms (topologically speaking), and do not introduce distinctions between atoms of identical biotype. For built-in residues, CAMPARI has a specific understanding of what to expect (e.g., it would look first for two charge groups with net charge of 1.0 in an N-terminal lysine residue). However, this understanding may not always be appropriate, and no such understanding is available for unsupported entities. This is the primary purpose of patching the charge group targets.
Each line has to contain two or more entries. The first is an integer giving the residue number. The second is either 0 or 1; only a value of 0, specified for a residue that contains charge groups matching the criteria outlined above, has any effect on the flag. Any remaining entries are floating point values giving the charge group targets.

Example:
---------------------------------------
3   1 -1.0 -1.0 -1.0 0.0
7   0

---------------------------------------

This input file would flag the 7th residue in the system as no longer carrying a net charge irrespective of the partial charges on its atoms and irrespective of how they were grouped into charge groups. For the 3rd residue, it would instruct CAMPARI to look for 3 separate charge groups with net charges of -1.0 each, e.g., residue 3 could be an unsupported small molecule such as fully deprotonated citric acid (3-). The 0.0 specified at the end is optional, since the program will always look for as many net neutral groups as possible after processing of special groups. Note that each entry for a charge group target will be processed in exactly the order provided. Once a suitable group is found, the code proceeds to the next target. Also note that the sum of target values must match exactly the total charge of the residue in question. If any given target cannot be reached by any acceptable grouping (there are restrictions as mentioned above in addition to limitations of the search size in terms of numbers of bonds), the entry in question will simply be skipped. It is therefore possible to exercise a reasonable amount of control over the charge group parsing, but it is likely that some trial-and-error is involved.
As alluded to before, the main application domain for changing the charge flag via this patch (residue 7 in the example above) would be the suppression of the computation of long-range interactions between residues flagged as charged simply because their partial charges do not group cleanly into neutral groups. Specifically, polarization across neighboring residues in a polymer without any formal charges would generally flag both residues as charged. This is because in such a case a net neutral group is split arbitrarily because of the requirement for charge groups to be confined to residues (common with charges fit from QM data without constraints). The main application for the second functionality (residue 3) is to deal with unsupported residues, correctly reflect manual charge patches or altered parameters, to deal with unusual species such as zwitterions, or to achieve an approximate grouping for charges that do not group exactly (which can be highly relevant for certain screening models within the ABSINTH framework).
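
The requirement that the charge group targets sum to the total charge of the residue is easy to check before a run. The following minimal sketch uses hypothetical function names and a small numerical tolerance (an assumption; the text above asks for an exact match), and it takes the total residue charge as user input.

```python
def parse_nc_patches(text):
    """Return {residue: (flag, targets)} from a net charge flag patch file."""
    patches = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) < 2:
            raise ValueError(f"line {lineno}: need a residue number and a flag")
        flag = int(fields[1])
        if flag not in (0, 1):
            raise ValueError(f"line {lineno}: flag must be 0 or 1")
        patches[int(fields[0])] = (flag, [float(f) for f in fields[2:]])
    return patches


def targets_match_total(targets, total_charge, tol=1e-6):
    """True if the charge group targets add up to the residue's total charge."""
    return abs(sum(targets) - total_charge) <= tol


example = """3   1 -1.0 -1.0 -1.0 0.0
7   0
"""
patches = parse_nc_patches(example)
flag, targets = patches[3]
# A total charge of -3.0 corresponds to the fully deprotonated citrate example.
print(targets, targets_match_total(targets, total_charge=-3.0))
```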


Atomic solvation parameters patch input file:

Keywords: FMCSC_SAVPATCHFILE and FMCSC_ASRPATCHFILE

Prerequisite: There are two atomic parameters that can be patched from their default values, which are primarily relevant in calculations using the ABSINTH implicit solvation model. Both use an identical file format and identical restrictions on the possible values (interval from 0.0 to 1.0), and are therefore described together.

Specifically, the maximum value for the fraction of the solvent-accessible volume (ηi,max) and the atomic volume reduction factor used in most computations based on atomic volumes can be altered. Both quantities are derived from default, hard-coded molecular topologies and not from input structures (unless an unsupported residue is in use, where the former is of course missing). They are both reported in output file SAV_BY_ATOM.dat as columns 4 and 7, respectively. A patch file supplied via FMCSC_SAVPATCHFILE will alter the ηi,max values, and a patch file supplied via FMCSC_ASRPATCHFILE will alter the volume reduction factors. In either case, each line has to contain two entries, an integer giving the atom number and a floating point number giving the new parameter constrained to the interval [0:1].

Example for FMCSC_SAVPATCHFILE:
---------------------------------------
16   0.86
4   0.92
22   0.99

---------------------------------------

This input file would change the values ηi,max for atoms 4, 16, and 22 to 0.92, 0.86, and 0.99, respectively. CAMPARI will not perform any noteworthy tests whether the patched values are meaningful. In this context, it is important to note that very small values for the ηi,max may lead to large gradients due to the compression of the interpolation regime between solvated and desolvated states. Aside from printing a report for all successful changes, the results of the patch can also be assessed by inspecting the aforementioned SAV_BY_ATOM.dat (assuming SAVCALC is appropriate).
In general, neither class of parameters is considered free, so patches of this type would seem most likely to be useful when implementing Hamiltonian support for otherwise unsupported residues. They may also be useful in cases where the atomic size parameters and/or the molecular topology mean that the pairwise volume reduction factors are inappropriate (better approximation required and/or triple overlaps significant), although it should be mentioned that the radius functionality of the parameter file and radius patches for specific atoms are probably the more appropriate tools in such a case.
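
Since CAMPARI is stated not to test patched values for meaningfulness beyond the interval restriction, a user-side check of the interval [0:1] is cheap to add. This is a minimal sketch with a hypothetical function name; it applies equally to files meant for FMCSC_SAVPATCHFILE and FMCSC_ASRPATCHFILE because both share the same format.

```python
def parse_unit_interval_patches(text):
    """Return {atom_number: value}, requiring every value to lie in [0, 1]."""
    patches = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue
        atom, value = int(fields[0]), float(fields[1])
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"line {lineno}: value {value} outside [0, 1]")
        patches[atom] = value
    return patches


example = """16   0.86
4   0.92
22   0.99
"""
for atom, value in sorted(parse_unit_interval_patches(example).items()):
    print(atom, value)
```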


Free energy of solvation patch input file:

Keyword: FMCSC_FOSPATCHFILE

Prerequisite: This file is only relevant if the direct mean field interaction (DMFI) of the ABSINTH implicit solvation model is used (→ FMCSC_SC_IMPSOLV).

This file provides the input facility to specifically override the details of the DMFI model, i.e., the parsing of the molecule into solvation groups (by atoms), the weight factors for individual atoms constituting solvation groups, and the reference free energies of solvation themselves. Depending on the choice for FMCSC_FOSMODE, it may also be relevant to supply patched values for solvation enthalpies and heat capacities. For global changes of the latter type, it should be pointed out that changes directly in the parameter file are simpler (see PARAMETERS and dedicated documentation). Each line has to contain between three and six entries. The first is an integer giving the atom number using the default CAMPARI atom order (that can usually be extracted from basic output files → see above). The second is another integer identifying by a code the target solvation group. The chosen code is arbitrary as long as all separate solvation groups are referred to by a different integer. The third number is a floating point value giving the weight for the indicated atom. The weights for all atoms using the same integer code in column 2 have to add up to unity exactly (if fractions occur that cannot be specified with arbitrary precision in decimal floating point representation, it is advised to provide enough digits to emulate double precision). The remaining numbers are floating point numbers giving (in this order) the reference free energy of solvation for a group, the corresponding solvation enthalpy, and the heat capacity (in kcal/mol, kcal/mol, and cal mol^-1 K^-1, respectively). In general, only the first of these numbers is required for a successful patch, and it needs to be specified only with the first instance of the corresponding integer code. Numbers given for further atoms belonging to the same group are ignored.
Note that missing values for enthalpy and heat capacity would override values given in the parameter file (if any) by instead setting the heat capacity to zero and the enthalpy to be equal to the free energy.
A single character in the first line lets the user choose between three different modes. All three modes have identical restrictions beyond the ones already stated. First, where the resultant solvation groups overlap with existing solvation groups (which are most conveniently inspected in the optional output file FOS_GROUPS.vmd → FMCSC_FOSREPORT), none of the atoms in the affected original groups may remain unaccounted for by the patch. Second, newly defined groups must not cross residue boundaries unless they represent trivial changes to existing groups that do already cross residue boundaries (peptide unit). Here, "trivial" means that the constituting atoms are exactly the same, and that only the (nonzero) weights and/or the reference free energy of solvation are altered. Third, all weights have to be positive.

Mode 'A':   The specifications after the first line ('A') are assumed to apply only to those atoms exactly specified. This is the simplest input mode that allows maximal control (each atom can be controlled individually). Suppose the simulation encompasses two formamide molecules (6 atoms each). The standard group parsing is to have all atoms except the aldehyde hydrogen (indices 4 and 10) constitute a single group with equal weights (0.2).

Example:
---------------------------------------
A
7   33 0.5 -8.0
8   33 0.25
9   33 0.25
11  33 0.0
12  33 0.0

---------------------------------------

The above input would change the solvation group setup for the second formamide molecule such that the group now only consists of the heavy atoms (7-9) with asymmetric weights (0.5 for atom 7, 0.25 each for atoms 8 and 9), and an altered reference free energy of solvation. The code "33" is arbitrary as explained above. Omitting the entries for atoms 11 and 12 would cause an error because these atoms were formerly part of an existing solvation group, and would be left unaccounted for.

Another Example:
---------------------------------------
A
9   14 0.5 -5.0
8   -1 0.5 -5.0
7   -1 0.5
10  15 1.0 -0.1
11  14 0.25
12  14 0.25

---------------------------------------

This example would define three solvation groups for the second formamide molecule: the CO unit (code "-1"), the NH2 unit (code "14"), and the lone aldehyde hydrogen (code "15"). Because the aldehyde hydrogen was previously not part of a solvation group, it can be included freely into new solvation groups.

Mode 'T': This mode works similarly to mode 'A' with the major difference that all requests are generalized in molecule type-dependent fashion. The specified indices have to refer to the first instance of a given target molecule type. The two examples listed above would both lead to premature termination of CAMPARI, because both reference the second of two formamide molecules.

Example:
----------------------------------------------------------------------------
T
3   1 0.3333333333333 -5.5 -10.0 50.0
1   2 0.5 -5.5 -6.5
2   2 0.5
5   1 0.3333333333333
6   1 0.3333333333333

----------------------------------------------------------------------------

As a result of this input file, the solvation groups on both formamide molecules would be altered. They would be changed to separate groups covering the CO and NH2 units, respectively, with both groups having reference free energies of solvation of -5.5 kcal/mol and equal atomic weights. In addition, the CO unit would receive a solvation enthalpy of -6.5 kcal/mol, and the NH2 unit would receive an enthalpy of -10.0 kcal/mol and a heat capacity of 50.0 cal mol^-1 K^-1. These latter modifications would only be relevant if a temperature-dependent ABSINTH DMFI were in use.

Mode 'R': This mode allows selection and generalization by residue type, but works otherwise identically to mode 'T'. For polymeric systems, it will sometimes be of interest to selectively change details of the definition of solvation groups for a specific type of residue. For example, how does polyglutamine respond to global alterations in the sidechain solvation groups? The numbering should refer to the atomic indices of the first instance of the targeted residue type. However, care must be taken, since CAMPARI - by necessity - distinguishes terminal from nonterminal residues with regard to their type. Consider the sequence (Gln)5 with charged termini. To change the assignments for the internal glutamine residues, one would have to find the atomic indices of the second glutamine residue, and enter the appropriate modifications. Both the N-terminal and the C-terminal glutamine residue would require an additional specification each, since they are technically of different type. The same is true for crosslinked residues (if any).

As with all patch functionality, it is recommended that users check meticulously for possible errors in the parsing of patch input files. By enabling FMCSC_FOSREPORT in the key-file, CAMPARI will provide a report of all modified solvation groups (before vs. after), and users are encouraged to use this output in helping them diagnose potential errors beforehand.
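
In addition to FMCSC_FOSREPORT, some of the format rules stated above (three to six entries per line, a reference free energy on the first atom of each group, weights per group summing to unity, and the defaults for missing enthalpy and heat capacity) can be checked outside of CAMPARI. The sketch below is a hypothetical helper, not CAMPARI's parser: the function name and data layout are invented, the tolerance on the weight sum is an assumption, and rules that require knowledge of the existing groups or residue boundaries are not checked.

```python
def parse_fos_patch(text, tol=1e-9):
    """Parse a solvation group patch file into (mode, groups).

    groups maps each integer group code to a dict with the keys
    'weights' ({atom: weight}), 'fos', 'enthalpy', and 'heat_capacity'.
    """
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    mode = lines[0][0]
    if mode not in ("A", "T", "R"):
        raise ValueError(f"unknown mode character: {mode!r}")
    groups = {}
    for fields in lines[1:]:
        if not 3 <= len(fields) <= 6:
            raise ValueError(f"expected 3 to 6 entries, got {len(fields)}")
        atom, code, weight = int(fields[0]), int(fields[1]), float(fields[2])
        extras = [float(f) for f in fields[3:]]
        if code not in groups:
            if not extras:
                raise ValueError(f"group {code}: first atom needs a free energy")
            fos = extras[0]
            groups[code] = {
                "weights": {},
                "fos": fos,
                # Missing enthalpy defaults to the free energy, heat capacity to zero.
                "enthalpy": extras[1] if len(extras) > 1 else fos,
                "heat_capacity": extras[2] if len(extras) > 2 else 0.0,
            }
        # Values given on later atoms of the same group are ignored.
        groups[code]["weights"][atom] = weight
    for code, group in groups.items():
        total = sum(group["weights"].values())
        if abs(total - 1.0) > tol:
            raise ValueError(f"group {code}: weights sum to {total}, not 1")
    return mode, groups


example = """T
3   1 0.3333333333333 -5.5 -10.0 50.0
1   2 0.5 -5.5 -6.5
2   2 0.5
5   1 0.3333333333333
6   1 0.3333333333333
"""
mode, groups = parse_fos_patch(example)
for code, group in sorted(groups.items()):
    print(mode, code, group)
```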



Section 4: Files defining auxiliary terms to the energy function


(back to top)

CMAP input files:

Keyword: FMCSC_CMAPDIR

Required for: Use of CMAP or similar corrections → FMCSC_SC_BONDED_M

CMAP corrections are two-dimensional correction energy surfaces introduced in the CHARMM force field that operate on the φ/ψ-surfaces of polypeptide residues. From input files, a 2D map is constructed and serves as an additional energy term that is made smooth and continuously differentiable via suitable interpolation. The names of the files containing these maps have to be put directly into the parameter file (see charmm.prm for an example). This is detailed elsewhere. With appropriate choices for the filenames and the directory they reside in (→ FMCSC_CMAPDIR), it is possible to utilize this functionality as a general bias or correction potential even for force fields (parameter files) that do not natively contain such corrections.


Torsional bias potentials input file:

Keyword: FMCSC_TORFILE

Required for: Biasing potentials acting on individual rotatable dihedral angles of polypeptides or polynucleotides → FMCSC_SC_TOR

The specification of simple torsional bias potentials is possible through either global or residue-specific input. The point of these potentials is usually to restrain the polymer to some specific conformation with the most common application being the restraint to the conformation provided by a structural input file. The first line has to contain the character 'G' (Global) or 'R' (Residue-specific) to distinguish between the two.

The actual input starts in the second line and is interpreted as follows:

'G'-mode:
An integer code with a defined number of parameters. Modes 1 and 3 specify harmonic restraint potentials and modes 2 and 4 are used for Gaussian well potentials.
  1. V_TOR = Σ_i k_i·(ϑ_i - ϑ_i^0)^2

    Here, "i" runs over all the torsions which are rotatable in Monte Carlo or torsional dynamics calculations (this includes the sugar bond in nucleotides and the φ-angle in proline but not any other pucker degrees of freedom). To make the file format universal, ten equilibrium positions (ϑ_i^0) and ten force constants (k_i) have to be provided. Entries 7-10 will address χ1-4 irrespective of polymer type. Nucleotides will have the (up to) five freely rotatable backbone angles (see SEQFILE) in entries 1-5 and the dihedral around the sugar bond (C4*-C3*) in entry 6. Polypeptides will have ω-, φ-, and ψ-angles in entries 1-3 (entries 4-6 are unused). Small molecules will generally not use anything in entries 1-6 except secondary amides which have ω-angles in entry 1. First, the ten equilibrium positions are listed, then the ten force constants for a total of 21 columns (the first one being occupied by the mode identifier).
  2. V_TOR = -exp( -Σ_i (ϑ_i - ϑ_i^0)^2 / (2.0·σ_i^2) )

    This potential uses an identical file format except that the standard widths σ_i are read in the latter 10 entries rather than the harmonic force constants. V_TOR creates Gaussian wells (in multiple dimensions) for each residue. It can be used to subtly favor certain regions of phase space without introducing any stringent restraints. The formula implies a unit factor such that the conformation perfectly matching the ϑ_i^0 will give a favorable energy of FMCSC_SC_TOR kcal/mol.

  3. V_TOR = Σ_i k_i·(ϑ_i - ϑ_i,struc^0)^2

    The only difference to mode 1 is that the ϑ_i,struc^0 are obtained directly from the simulation's initial configuration (e.g. from structural input) and not from this input file. Hence, only 10 entries are needed with the necessary force constants k_i.

  4. V_TOR = -exp( -Σ_i (ϑ_i - ϑ_i,struc^0)^2 / (2.0·σ_i^2) )

    Mode 4 is to mode 2 what mode 3 is to mode 1: the ϑ_i,struc^0 are taken from the initial configuration, and only the 10 widths σ_i are read.

Example:
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 G
 1 0.0 -57.0 -47.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.02 0.02 0.0 0.0 0.0 0.0 0.0 0.0 0.0

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

This input file would turn on a biasing potential (stiffness: 0.02 kcal/(mol·deg.^2)) biasing the backbone torsional angles of all polypeptide residues toward the ideal αR-geometry. It would also bias all present polynucleotide residues to the same conformation in their respective backbone degrees of freedom #2 and #3. It would not affect any polypeptide termini or small molecules.

'R'-mode:
Largely, this format agrees with the global specifications. The only differences are that:
  1. An extra column is needed giving the residue number for which the torsional bias subsequently specified on the same line is to be turned on.
  2. Multiple entries can be provided, each for an individual residue. Note that it is also possible to override the settings for an individual residue within the same file.

Example (let us assume residue 3 is a lysine and residue 12 a uracil ribonucleotide (RPU)):
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
R
3  1  0.0 -57.0 -47.0 0.0 0.0 0.0 120.0 70.0 0.0 0.0 0.0 0.02 0.02 0.0 0.0 0.0 0.05 0.05 0.0 0.0
12  4  30.0 30.0 30.0 30.0 30.0 0.0 0.0 0.0 0.0 0.0

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

This input file would turn on biasing potentials for two residues - #3 and #12. There would be a harmonic restraint biasing the backbone conformation of LYS3 to that of an ideal αR-helix. A stronger bias (k_i = 0.05 kcal/(mol·deg.^2)) would bias the values of χ1 and χ2 to 120.0° and 70.0°, respectively. In contrast, RPU12 would experience a Gaussian bias favoring the backbone conformation it initially assumes (excluding the sugar bond) with a general width of 30.0°.
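
To make the harmonic (mode 1) and Gaussian (mode 2) forms more concrete, the sketch below evaluates both for a single residue from the 20 numbers that follow the mode identifier. It is an illustration under stated assumptions and not CAMPARI's implementation: all function names are hypothetical, angular differences are wrapped to the interval [-180°, 180°), entries with a width of zero are skipped in the Gaussian form, and the overall scaling by FMCSC_SC_TOR is not applied.

```python
import math

N_SLOTS = 10  # ten torsion entries per residue, as described above


def wrap_degrees(delta):
    """Map an angular difference to [-180, 180); periodic wrapping is an assumption."""
    return (delta + 180.0) % 360.0 - 180.0


def parse_mode12_fields(fields):
    """Split the 20 numbers following mode 1 or 2 into targets and parameters."""
    values = [float(f) for f in fields]
    if len(values) != 2 * N_SLOTS:
        raise ValueError("modes 1 and 2 need 10 targets plus 10 parameters")
    return values[:N_SLOTS], values[N_SLOTS:]


def harmonic_bias(angles, targets, force_constants):
    """Mode 1: sum over i of k_i * (theta_i - theta_i^0)^2."""
    return sum(
        k * wrap_degrees(a - t) ** 2
        for a, t, k in zip(angles, targets, force_constants)
    )


def gaussian_bias(angles, targets, widths):
    """Mode 2: -exp(-sum_i (theta_i - theta_i^0)^2 / (2 sigma_i^2)); zero widths skipped."""
    exponent = sum(
        wrap_degrees(a - t) ** 2 / (2.0 * s * s)
        for a, t, s in zip(angles, targets, widths)
        if s > 0.0
    )
    return -math.exp(-exponent)


# Numbers from the 'G'-mode example above (without the leading mode identifier).
target_part = "0.0 -57.0 -47.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0"
constant_part = "0.0 0.02 0.02 0.0 0.0 0.0 0.0 0.0 0.0 0.0"
targets, constants = parse_mode12_fields((target_part + " " + constant_part).split())

# Hypothetical polypeptide residue: omega, phi, psi in entries 1-3, rest unused.
angles = [180.0, -67.0, -47.0] + [0.0] * 7
print(harmonic_bias(angles, targets, constants))
```

In this hypothetical case, only φ deviates (by 10°) from its target with a nonzero force constant, so the harmonic term amounts to 0.02·10^2 kcal/mol before any scaling.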


Polymeric biasing potentials input file:

Keyword: FMCSC_POLYFILE

Required for: Biasing potentials on polymeric properties of molecules → FMCSC_SC_POLY

Polymeric biasing potentials (see FMCSC_SC_POLY) are a feature in CAMPARI that requires some expertise on the user's end, because the resultant energy landscape for a restrained molecule can be very rugged. The potentials apply a harmonic restraint to at most two polymeric descriptors: 1) a transform of the radius of gyration meant to map values to the unit interval over a wide range of chain lengths and conditions; 2) a measure of asphericity. This input file allows the user to request polymeric restraints on a per-molecule basis or on a per molecule-type basis. The first line has to contain the character 'M' or 'T' to distinguish between the two.

The actual input starts in the second line and is interpreted as follows:

'M'-mode:
An integer code corresponding to the target molecule (numbered as in the sequence file) with four mandatory parameters, specifying the harmonic restraints on the two parameters:

 t = <f_1·(f_2·(R_g/L_c))^(f_3/N^0.33)>
δ = 1.0 - 3.0·(λ_1·λ_2 + λ_2·λ_3 + λ_1·λ_3)/(λ_1 + λ_2 + λ_3)^2

Here, the f_i are arbitrary factors meant to achieve the desired mapping to the interval [0,1] (values are 2.5, 1.75, and 4.0). The radius of gyration is denoted by R_g, L_c is an estimate of the contour length of the chain, and N is its sequence length. The λ_i are the eigenvalues of the gyration tensor (→ elsewhere for additional information).

V_POLY = k_t·(t - t_0)^2 + k_δ·(δ - δ_0)^2

The four floating point numbers required are: t_0, δ_0, k_t, k_δ. Both t_0 and δ_0 have to lie within the interval [0:1] as they are (pseudo-)normalized quantities.

The stiffnesses are simply (twice) the spring constants of the two harmonic terms and should be given in kcal/mol as both order parameters are unitless.

Example:
---------------------------------------------------------------
M
1   0.8   0.5   10.0   100.0
13   0.2   0.1   100.0   0.0

---------------------------------------------------------------

This input file would turn on a biasing potential driving molecule #1 toward a value of t of 0.8 (stiffness 10 kcal/mol) and toward an asphericity of 0.5 (stiffness 100 kcal/mol). Conversely, molecule #13 would be driven toward a value of t of 0.2 with stiffness of 100 kcal/mol with no restraint on δ. It is very important to note that asphericity and t are not independent. Hence, restraining both of them to a pair of target values that are mutually exclusive will simply lead to a competition between the two force constants (aided by the underlying Hamiltonian and entropy); a scenario of little practical use. Furthermore, it should be emphasized that t is a pseudo-normalized quantity and that its accessible range will be somewhat ill-defined for systems of extreme size (very small, very large) and/or of extreme conformation (fully extended rods, collapsed points). To a lesser extent this is even true for δ, which by virtue of polymer geometry (finite width) will prohibit values rigorously approaching zero or unity for finite length chains.

'T'-mode:
An integer code corresponding to the target molecule type, as determined by its first occurrence in the sequence file. Otherwise, the input is treated the same.

Note that these potentials will never be disabled even if the molecule in question is practically rigid. In conjunction with the energy function and degrees of freedom this can easily lead to extreme strain (e.g. when using Cartesian dynamics), which may compromise simulation stability.
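
The asphericity and the harmonic restraint defined above are simple enough to evaluate by hand, which can help when choosing target values and stiffnesses. The following minimal sketch uses hypothetical function names, takes the eigenvalues of the gyration tensor and the value of t as given inputs (it does not compute t from coordinates), and ignores any global scaling through FMCSC_SC_POLY.

```python
def asphericity(l1, l2, l3):
    """delta = 1 - 3 (l1 l2 + l2 l3 + l1 l3) / (l1 + l2 + l3)^2."""
    total = l1 + l2 + l3
    return 1.0 - 3.0 * (l1 * l2 + l2 * l3 + l1 * l3) / total ** 2


def polymer_bias(t, delta, t0, delta0, k_t, k_delta):
    """V_POLY = k_t (t - t0)^2 + k_delta (delta - delta0)^2, in kcal/mol."""
    for name, value in (("t0", t0), ("delta0", delta0)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    return k_t * (t - t0) ** 2 + k_delta * (delta - delta0) ** 2


# Limiting cases: a perfect sphere and an ideal rod.
print(asphericity(1.0, 1.0, 1.0), asphericity(1.0, 0.0, 0.0))

# Molecule #1 of the example above, for a hypothetical instantaneous t of 0.7
# and an asphericity exactly at its target value.
print(polymer_bias(t=0.7, delta=0.5, t0=0.8, delta0=0.5, k_t=10.0, k_delta=100.0))
```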


Tabulated potential code requests:

Keyword: FMCSC_TABCODEFILE

Required for: Use of tabulated interactions on interatomic distances → FMCSC_SC_TABUL

This file allows the specification of specific interactions for tabulated potentials to apply to. It works in almost exactly the same way as the one requesting general distance distribution analysis (see FMCSC_PCCODEFILE below). They naturally use the same framework, as the requests are in fact identical: both efficiently allow the user to specify atom pairs or sets of pairs with an integer code corresponding to either a specific interaction potential (here) or to a separate distance distribution analysis request (PCCODEFILE). The only major difference is that inconsistent numbering of the tabulated potentials will not be tolerated, as it would require a nonsensical file for the actual potential input (see TABPOTFILE). It is important to remind the user that mistakes are very easily made in an input file this flexible. Hence, there is an output file summarizing all the terms requested (see TABULATED_POT.idx in OUTPUTFILES). Finally, from an implementation point of view, it must be mentioned that the type of input specification has no impact on the speed of the calculation. The requests are always transformed into identical data structures, which are queried efficiently during energy evaluation.


Tabulated potential input file:

Keyword: FMCSC_TABPOTFILE

Required for: Use of tabulated interactions on interatomic distances → FMCSC_SC_TABUL

A file with n+1 columns, where n is the highest integer code of a unique potential requested in FMCSC_TABCODEFILE. The number of rows is arbitrary and will be determined by the program. Note, however, that the potentials have to be stored in memory.

The first column contains the distance information (only regularly spaced bins are allowed) and should cover the distance range available to the specific atom pairs (only one distance spectrum possible for all potentials).

Columns 2 through n+1 then contain the n individual potentials according to their integer code in units of kcal/mol (i.e., if the index file indicates that atoms 3 and 25 interact through potential 8, then the 9th column of the potential file gives the tabulated potential used for this atom-atom pair).

Example:
----------------------------------------------
2.5   10.0  10.0
3.5   10.0  10.0
4.5   10.0   0.0
5.5   0.0   0.0
6.5   10.0  10.0
7.5   10.0  10.0

----------------------------------------------

Note the regular distance spacing.
The possible integer codes to be used in the index file would hence be 1 or 2, anything else will create an error. Also note that the code does cubic Hermite spline interpolation on the potentials, i.e. provides a continuous and smooth function. The use of a cubic spline is the reason that it is recommended to explicitly taper off the potential to a constant value at the limits of the distance range considered. The inclusion of the two bins at 2.5 and 7.5Å is made for this purpose, since otherwise the tangent at 3.5 and 6.5Å would have been set to 0.0. Correspondingly, distance values outside of the limits are treated just like the last available bin (i.e., in the example above distance values of 2.0 or 50.0 would both give a potential value of 10.0 kcal/mol). Tangent data can be supplied in a separate file (see below).
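
As a minimal sketch (not CAMPARI code), the following Python snippet checks a table like the one above for regular distance spacing, reports how many potentials it holds, and mimics the stated treatment of distances outside the tabulated range. The function names `read_table` and `clamped_edge_value` are hypothetical and chosen for illustration only.

```python
# Hypothetical helper for checking a tabulated potential file; not part of CAMPARI.

EXAMPLE = """\
2.5   10.0  10.0
3.5   10.0  10.0
4.5   10.0   0.0
5.5   0.0   0.0
6.5   10.0  10.0
7.5   10.0  10.0
"""


def read_table(text, tol=1e-6):
    """Return (distances, potentials); potentials[c - 1] is the column for code c."""
    rows = [[float(x) for x in line.split()]
            for line in text.splitlines() if line.strip()]
    if len(rows) < 2:
        raise ValueError("need at least two distance bins")
    width = len(rows[0])
    if width < 2 or any(len(r) != width for r in rows):
        raise ValueError("every row needs the same n+1 columns")
    dist = [r[0] for r in rows]
    step = dist[1] - dist[0]
    # Only regularly spaced (and increasing) distance bins are allowed.
    if step <= 0 or any(abs((b - a) - step) > tol for a, b in zip(dist, dist[1:])):
        raise ValueError("distance bins must be regularly spaced")
    pots = [[r[c] for r in rows] for c in range(1, width)]
    return dist, pots


def clamped_edge_value(dist, pot, r):
    """Value assumed outside the tabulated range (None if r lies inside it)."""
    if r <= dist[0]:
        return pot[0]
    if r >= dist[-1]:
        return pot[-1]
    return None


distances, potentials = read_table(EXAMPLE)
print("valid integer codes: 1 to", len(potentials))
print(clamped_edge_value(distances, potentials[0], 2.0))
print(clamped_edge_value(distances, potentials[0], 50.0))
```

Running such a check before a simulation can catch irregular spacing or ragged rows early; the tolerance `tol` is an arbitrary choice for this sketch.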


Tabulated potential derivatives (tangents) input file:

Keyword: FMCSC_TABTANGFILE

Required for: Use of tabulated interactions on interatomic distances → FMCSC_SC_TABUL if it is desired that the derivatives at the tabulated points are not estimated numerically

A file with n columns, where n is the highest integer code of a unique potential requested in FMCSC_TABCODEFILE. The number of rows is arbitrary and will be determined by the program. Note, however, that the potentials and tangents have to be stored in memory.

The columns contain the n individual potential derivatives according to their integer code in units of kcal·mol⁻¹·Å⁻¹ (i.e., if the index file indicates that atoms 3 and 25 interact through potential 8, then the 8th column of the potential derivative file gives the tangents to be assumed for this tabulated potential and atom-atom pair). Note that the required derivative is that with respect to the (increasing) distance coordinate. The potentials themselves are given in a different input file.

Example:
----------------------------------------------
0.0   0.0
0.0  -5.0
-10.0 -2.0
0.0    2.0
10.0   5.0
0.0   0.0
----------------------------------------------

Note that CAMPARI takes the information about how many tabulated potentials are present exclusively from the potential input. It is therefore up to the user to ensure that the appropriate number of columns is present in this input file. If this file is incomplete or absent, the missing tangent values are constructed numerically from the potentials themselves. It is not possible to provide values starting from a distance value other than the first one in the tabulated potential input file. The provided derivatives are not checked in any way and taken strictly as is.
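
To illustrate how tabulated values and tangents combine, here is a minimal sketch of textbook cubic Hermite interpolation on a regular grid, using the first potential and the first tangent column from the two examples. It is an illustration under stated assumptions, not CAMPARI's implementation, whose details may differ; the function name `hermite` is hypothetical.

```python
# Illustrative cubic Hermite interpolation on a regular grid; CAMPARI's own
# implementation may differ in detail.

DIST = [2.5, 3.5, 4.5, 5.5, 6.5, 7.5]        # first column of the potential file
POT1 = [10.0, 10.0, 10.0, 0.0, 10.0, 10.0]   # potential with integer code 1
TANG1 = [0.0, 0.0, -10.0, 0.0, 10.0, 0.0]    # first column of the tangent file


def hermite(r, dist, pot, tang):
    """Interpolate pot at distance r from tabulated values and tangents."""
    if r <= dist[0]:       # outside the range: treated like the first bin
        return pot[0]
    if r >= dist[-1]:      # outside the range: treated like the last bin
        return pot[-1]
    h = dist[1] - dist[0]  # regular bin spacing
    i = min(int((r - dist[0]) / h), len(dist) - 2)
    t = (r - dist[i]) / h  # fractional position within the bin
    # Standard cubic Hermite basis functions.
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return (h00 * pot[i] + h10 * h * tang[i]
            + h01 * pot[i + 1] + h11 * h * tang[i + 1])


print(hermite(5.0, DIST, POT1, TANG1))
```

Because the tangents are taken strictly as supplied, a sign error in the tangent file would show up directly as an overshoot or undershoot between grid points in a sketch like this.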


Distance restraint potential input file:

Keyword: FMCSC_DRESTFILE

Required for: Biasing potential to introduce distance restraints between pairs of atoms → FMCSC_SC_DREST

The first line has to contain only a single integer, viz, the number of restraints in the following list.

The actual input starts in the second line and is interpreted as follows:

Five mandatory parameters: two integers (i,j) specifying the atom pair for which the restraint potential is to be applied; a type indicator specifying the type of restraint potential to act on the distance of these two atoms; the parameters of the potential, specifically, the target distance R0 in Å, and the parameter kij in kcal·mol⁻¹·Å⁻² (half the force constant for the implied Hookean spring). The available types are all harmonic in nature and differ as follows:
  1. VDREST(i,j) = kij·(Rij - R0)²
  2. VDREST(i,j) = kij·H(Rij - R0)·(Rij - R0)²
  3. VDREST(i,j) = kij·H(R0 - Rij)·(Rij - R0)²
Here, H(x) is the Heaviside step function. The selected potential is added to the overall Hamiltonian.

Example:
---------------------------------------
2
11   45   1   10.0   5.0
22   56   2   15.0   10.0

---------------------------------------

This would add two distance restraints to the Hamiltonian. The first would act between atoms 11 and 45 and restrain their distance to a minimum position of 10.0 Å with a kij of 5.0 kcal·mol⁻¹·Å⁻². The second would act between atoms 22 and 56 and harmonically restrain their distance only if it exceeds 15.0 Å, with a kij of 10.0 kcal·mol⁻¹·Å⁻² (at shorter distances this one-sided term is zero). Negative force constants are converted to positive ones (the negative potential is not allowed). Note that negative distances are disallowed for obvious reasons. Furthermore, positive distances can result in a frustrated energy landscape if they are small enough to place the minimum restraint energy in a distance regime leading to steric clashes in the presence of excluded volume terms. This is particularly harmful in gradient-based sampling approaches. Users should keep in mind that it is possible to specify multiple restraint terms for the same atom pair and that distance restraints are never subjected to cutoffs (unlike the more general functionality of tabulated potentials).
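
The three restraint types can be sketched in a few lines of Python. This is an illustration of the formulas and file layout given above, not CAMPARI code; the names `read_restraints` and `restraint_energy` are hypothetical.

```python
# Illustrative reader and evaluator for the distance restraint format above.

EXAMPLE = """\
2
11   45   1   10.0   5.0
22   56   2   15.0   10.0
"""


def read_restraints(text):
    """Return a list of (i, j, type, R0, kij) tuples."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    count = int(lines[0][0])          # first line: number of restraints
    entries = lines[1:1 + count]
    if len(entries) != count:
        raise ValueError("fewer restraints than announced in the first line")
    result = []
    for f in entries:
        i, j, kind = int(f[0]), int(f[1]), int(f[2])
        r0, k = float(f[3]), float(f[4])
        if r0 < 0.0:
            raise ValueError("negative target distances are disallowed")
        # Negative force constants are converted to positive ones.
        result.append((i, j, kind, r0, abs(k)))
    return result


def restraint_energy(rij, kind, r0, k):
    """Energy in kcal/mol for one restraint at interatomic distance rij (in Å)."""
    d = rij - r0
    if kind == 1:                       # plain harmonic
        return k * d * d
    if kind == 2:                       # active only for rij > R0
        return k * d * d if d > 0.0 else 0.0
    if kind == 3:                       # active only for rij < R0
        return k * d * d if d < 0.0 else 0.0
    raise ValueError("unknown restraint type")


for i, j, kind, r0, k in read_restraints(EXAMPLE):
    print(i, j, restraint_energy(16.0, kind, r0, k))
```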


Density restraint input map:

Keyword: FMCSC_EMMAPFILE

Required for: Biasing potential to introduce a global, spatial density restraint on a simulation density derived from an atomic property such as mass → FMCSC_SC_EMICRO

This input file has to be provided in a specific file format in binary form. The NetCDF library allows a reasonably efficient storage of array data in binary form while maintaining a flexible and adaptable interface. For instance, trajectory output can also be written in NetCDF format. The file itself is binary, but the library comes with tools to convert appropriately formatted files from human-readable versions. For further details, users are referred to the NetCDF documentation.
The format itself is similar to that used by UCSF Chimera when writing density information in NetCDF format. Specifically, it should have the following fields defined:
  • Dimensions of spatial (the dimensionality itself) and X, Y, and Z (for the case of a three-dimensional input density, which is the only case currently supported), where the latter three dimensions give the numbers of grid cells in the respective dimensions
  • A variable deltas with an attached attribute of units; this is an array of size spatial and type float, and the default unit is Å
  • A variable densities with an attached attribute of units; this is a three-dimensional array of size Z,Y,X and type float, and the default unit is g/cm³
  • Optional, global attributes xyz_origin and xyz_step, both (currently) requiring the specification of three floating point numbers in double precision (xyz_origin is completely optional and xyz_step can substitute for variable deltas (and vice versa))
  • An optional variable origin with an attached attribute of units that is also an array of size spatial and type float with a default unit of Å
  • The data section has to contain the appropriate number of entries for densities and also for deltas in case xyz_step is not defined
Note that spatial density files written by CAMPARI define some additional, global attributes (title, program, and programVersion), none of which are read, however. Note as well that the format described here is also used for the spatial density files CAMPARI writes itself, specifically DENSITY.nc and DENSITY_INPUT_PHYS.nc.
Once the input density map is read, it is processed, i.e., interpreted quantitatively, to yield the physical input density Ξijk for a given lattice cell with indices i, j, and k. The details of this linear transformation are described elsewhere and rely on several keywords associated with the density restraint potential, most notably EMTHRESHOLD and EMTOTMASS. The physical input density (the interpreted map is written to output file DENSITY_INPUT_PHYS.nc) is compared to the simulation density to yield the density restraint potential, EEMICRO, as described elsewhere. The computation of the simulation density is described in detail for keyword EMCALC.



Section 5: Files relevant for global behavior of analysis functionality



Analysis group input file:

Keyword: FMCSC_ANGRPFILE

This input file allows the user to override the default parsing the software does for analysis purposes where data for molecules of identical type are often pooled (see documentation of output files). In addition, it allows tagging of analysis groups as solvent which has consequences for some of the analyses. The first letter in the file indicates the mode:

'T' or 't':
(molecule type-based selection):
Obviously, it is impossible to parse finer than by molecule type using this mode. The sole purpose here is solvent-tagging (by default, all single-residue molecules are tagged as solvent molecules):
----------------
XXX
----------------
XXX is a positive integer (solute) or negative integer (solvent) and the row number corresponds to the molecule type number by sequence of occurrence (see SEQFILE). Note that it is not possible to pool molecules of different type into the same analysis group, and that the actual group numbers provided in this file are irrelevant (beyond their sign).

'M' or 'm':
The format is identical, but row numbers now indicate the molecule number instead of the molecule type number (again, by sequence of occurrence). This format allows breaking of molecules of the same type into separate analysis groups, which is useful (and required from a statistical mechanics point of view) anytime one or more of the molecules of a specific type are made unique by the application of restraint or other bias potentials, by an asymmetric system setup, etc. Furthermore, this can be useful for error analyses even if the molecules are rigorously identical. The number provided by XXX will, as before, indicate solute/solvent identity via its sign, but each desired unique group now also requires a unique integer (at least as many as there are molecule types).


Input files to provide snapshot index sets for trajectory analysis mode:

Keyword: FMCSC_FRAMESFILE

Prerequisite: Trajectory analysis mode → FMCSC_PDBANALYZE

This input file allows analysis runs to operate only on an arbitrary subset of an input trajectory. In addition, it allows the user to provide floating-point weights for each snapshot in the trajectory. This can be useful if a trajectory is analyzed that was obtained using methodology that does not yield a well-defined statistical ensemble. In such a case, it is common to apply reweighting techniques of the weighted histogram variety that may yield per-snapshot weights. In order for the analysis features of CAMPARI to still be useful, it will then be necessary to deviate from the assumed unweighted (Boltzmann) averaging.
Therefore, this file accepts two alternative formats (per row). If a single integer is specified, the corresponding frame (numbered sequentially in the input trajectory) is added to the list of frames to analyze with a weight of unity. If an integer and a real number are provided, the frame specified by the integer is added to the list of frames to analyze with a weight given by the real number. The list thus obtained is ordered, and duplicate frames are removed (see elsewhere for details).
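
The two row formats can be read with a few lines of Python. The sketch below is illustrative only and the function name `read_frames` is hypothetical. In particular, it ASSUMES that the first occurrence of a duplicated frame determines its weight; the actual rule CAMPARI applies is documented elsewhere.

```python
# Illustrative reader for a frames file: "frame" or "frame weight" per row.


def read_frames(text):
    """Return a sorted list of (frame, weight) without duplicate frames."""
    weights = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        frame = int(fields[0])
        # A lone integer means a weight of unity.
        weight = float(fields[1]) if len(fields) > 1 else 1.0
        # ASSUMPTION: keep the first occurrence of a duplicated frame.
        weights.setdefault(frame, weight)
    return sorted(weights.items())


print(read_frames("5\n2 0.5\n5 3.0\n9 0.25\n"))
```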



Section 6: Files relevant to specific analysis routines



Backbone segment distribution datafile:

Keyword: FMCSC_BBSEGFILE

Required for: Analysis of φ/ψ-based backbone segment distributions → FMCSC_SEGCALC

A 36x36 (i.e., 10-degree spacing) Φ/Ψ-map indicating regions by integer codes that (can) form regular polypeptide secondary structure.

  1 = β
  2 = PII (Polyproline Type II Helix)
  3 = Unusual Region ("Pass")
  4 = αR (Right-Handed α-Helix)
  5 = Inverse C7 Equatorial (γ'-Turn)
  6 = Classic C7 Equatorial (γ-Turn)
  7 = Unusual Region (Helix with 7 Residues per Turn)
  8 = αL (Left-Handed α-Helix)

The file reproduces the visual picture a Ramachandran map gives for L-polypeptides and is read in accordingly. There are two default versions ("bbseg.dat" and "bbseg2.dat") in the data directory. The first one is a little tighter in its definitions (in particular PII and αR) while the second one is more lenient.


Input file with exchange trace for parallel analysis runs:

Keyword: FMCSC_TRACEFILE

Required for: (Un)scrambling of trajectories in parallel trajectory analysis runs on sets of trajectories generated for example by replica exchange simulations → FMCSC_REMC

Any number of lines, each giving a total of Nnodes+1 integers, where Nnodes is the number of different replicas. The numbers in the first column have to be monotonically increasing, and the numbers in the remaining columns have to be such that every integer from 1 to Nnodes occurs exactly once on every line.

The file is interpreted as a running map of some initial conformation with a set of conditions based on the assumption that swaps have occurred in the process (simulation) generating the input trajectories. Then, the desired outcome of using this functionality is to reorganize the set of trajectory data such that we obtain trajectories that are continuous in geometry. The most common application for this would be to "unscramble" replica exchange trajectories.
The provided step numbers (first column) are related to the frames in the input trajectories via keywords FMCSC_RE_TRAJSKIP and FMCSC_RE_TRAJOUT. The assumption is that for a given step, swaps have occurred first, then trajectory files were appended. In order for the procedure to work as intended, the current map for every snapshot in the input trajectories has to be present. It does not matter if there is more information, or whether the step numbers are exactly matched with FMCSC_RE_TRAJOUT.

Example:

30 4 3 2 1
60 1 3 2 4
90 1 2 3 4
120 1 4 2 3


Let us furthermore assume that FMCSC_RE_TRAJSKIP is 0, FMCSC_RE_TRAJOUT is 50, and there are 10 frames in the trajectories. The number of replicas obviously has to be 4. Then, the above file implies that for the first snapshot, CAMPARI assumes a mapping of "4 3 2 1", meaning that the first snapshot in the trajectory starting with "N_000_" (corresponding to the first replica) will end up being transferred to (and eventually analyzed and/or written by) the fourth replica. This is because the mapping current at the 30th step is, based on the file, the only one CAMPARI can assume to be current also for the 50th step, i.e., the first snapshot in the trajectories. By the same logic, the second snapshot would use the mapping "1 2 3 4" (from step 90), and all further snapshots would use the mapping "1 4 2 3" (from step 120). This implies that the second line in the input file is redundant (but does no harm). If a snapshot is computed to correspond to a step number smaller than any step number listed in this file, the default mapping is assumed (here, "1 2 3 4"). A suitable input file for this keyword is created by CAMPARI itself during replica exchange runs (keyword FMCSC_RETRACE enables the generation of output file N_000_REXTRACE.dat).
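
The lookup described in this example can be sketched as follows. This is an illustration, not CAMPARI code; the names `read_trace` and `mapping_for_step` are hypothetical, and the relation "step = TRAJSKIP + frame × TRAJOUT" is an assumption that merely reproduces the numbers of the example (first snapshot at step 50).

```python
# Illustrative lookup of the replica mapping that is current at a given step.
from bisect import bisect_right

TRACE = """\
30 4 3 2 1
60 1 3 2 4
90 1 2 3 4
120 1 4 2 3
"""


def read_trace(text):
    """Return (steps, maps) after checking the constraints on the file."""
    steps, maps = [], []
    for line in text.splitlines():
        f = [int(x) for x in line.split()]
        if not f:
            continue
        if steps and f[0] <= steps[-1]:
            raise ValueError("step numbers must be increasing")
        if sorted(f[1:]) != list(range(1, len(f))):
            raise ValueError("each line must be a permutation of 1..Nnodes")
        steps.append(f[0])
        maps.append(f[1:])
    return steps, maps


def mapping_for_step(steps, maps, step):
    """Most recent mapping with a step number <= step, else the default."""
    idx = bisect_right(steps, step)
    if idx == 0:
        return list(range(1, len(maps[0]) + 1))  # default mapping
    return maps[idx - 1]


steps, maps = read_trace(TRACE)
traj_skip, traj_out = 0, 50          # values assumed in the example
for frame in (1, 2, 3):
    step = traj_skip + frame * traj_out   # ASSUMPTION, see text above
    print(frame, step, mapping_for_step(steps, maps, step))
```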


Requests for general PC analysis input file:

Keyword: FMCSC_PCCODEFILE

Required for: Specific interatomic distance distribution (or pair correlation) functions → FMCSC_PCCALC

Generalized pair correlations / distance distributions (or arbitrary application of tabulated potentials → TABCODEFILE) in systems with many atoms can be quite complicated, and this input file tries to minimize the effort needed on the user end while accommodating requests that are as diverse as possible. The file works in one of currently three modes, listed below. Note that the input format also roughly corresponds to the memory structure used to store the request.

Mode 1:

The first line specifies the mode and the number of listed specific atom-atom entries. Each entry consists of the numbers of two atoms and the integer code for the GPC component to which this atom-atom distance distribution should contribute. This specific list format is trivial to master but very limited when lots of analogous requests are wanted. It is probably best used in highly specific systems (such as a single polypeptide).

Example:

Suppose you study a peptide with three lysine residues and want to know the distribution of the three neutralizing counterions (atoms 301, 302, 303 in the system). Suppose the lysine sidechain nitrogens are atoms 27, 48, and 155, then the appropriate input file is as follows:

------------------------
1  9
27  301  1
27  302  1
27  303  1
48  301  1
48  302  1
48  303  1
155  301  1
155  302  1
155  303  1

------------------------
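
For longer lists, a mode 1 file like the one above is easily generated by a script. The following Python sketch is illustrative only, and the function name `mode1_file` is hypothetical; it pairs every atom of one group with every atom of another and assigns all pairs to a single component.

```python
# Illustrative generator for a mode 1 request file (all pairs between two groups).
from itertools import product


def mode1_file(group_a, group_b, code):
    """Return the file content pooling all group_a/group_b pairs into one component."""
    pairs = list(product(group_a, group_b))
    lines = [f"1  {len(pairs)}"]                      # mode and number of entries
    lines += [f"{i}  {j}  {code}" for i, j in pairs]  # atom, atom, component code
    return "\n".join(lines) + "\n"


lysine_nitrogens = [27, 48, 155]
counterions = [301, 302, 303]
text = mode1_file(lysine_nitrogens, counterions, 1)
print(text, end="")
```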

Mode 2:

Similar to mode 1, only that the list-format is interpreted to be a request to treat all molecules of the same type analogously. Atom numbering must now refer to the first molecule of its type if the two molecules are different. If the atoms belong to the same molecule and the molecule is the first of its type, then the request is honored as an intramolecular distance distribution. To request intermolecular distance distributions for molecules of the same type, the numbers should correspond to the first and second instance of this molecule type. All other requests are ignored.

Example:

Take the same example, only that there is a high concentration of background salt. Using mode 1, the file with the same request would become fairly long. Also suppose that you want to know the chloride-chloride distribution function, and the sodium-chloride and sodium-sodium distribution functions. Let us assume the first two sodium atoms are 401 and 402, then this is the appropriate input file (note it is actually shorter and now requests four unique distance distributions):

-----------------------
2  6
27  301  1
48  301  1
155  301  1
301  302  2
301  401  3
401  402  4

-----------------------

Mode 3:

This is a slightly different mode which honors requests in the form of atom-atom matrices. The main disadvantage of this input format is that its sanity can easily be corrupted by the user and that large molecules are difficult to handle. The idea is for the user to supply a list of molecule-type to molecule-type requests using header lines of "T i j" (intermolecular between types i and j) or "t i" (intramolecular within type i). The main advantage is that no explicit referencing to atom numbers is needed and that it can accomplish fairly complicated requests in a very compact format. This is best explained in an example.

Example:

Suppose you have a mix of SPC water (type #1: 3 atoms) and urea (type #2: 8 atoms) and want to know a variety of possible site-site distribution functions to characterize solution structure. Then an appropriate input file would look as follows:

--------------------------------------------------
3
T  1  2
1  0  2  2  3  3  3  3
0  4  0  0  0  0  0  0
0  4  0  0  0  0  0  0
T  1  1
5  6  6
0  7  7
0  0  7
T  2  2
0  0  0  0  0  0  0  0
0  0  0  0  8  8  8  8
0  0  0  0  9  9  9  9
0  0  0  0  9  9  9  9
0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0

--------------------------------------------------

Note that the first matrix is asymmetric (first type always determines number of rows). Here we request to measure the water O to urea C (#1), N (#2), and H (#3), and the water H to urea O (#4) distribution functions. The other two matrices are intermolecular for molecules of the same type and are by definition symmetric. Note that only the upper right half is read; the numbers must all be there, however. We request all possible water-water correlation functions (O-O, O-H, and H-H in #5, #6, and #7) and the O-H (#8) and N-H (#9) distribution functions for urea-urea. Of course it is still necessary to know the order of atoms within a molecule type.
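
To double-check a mode 3 file, it can help to expand the matrices into an explicit list of site pairs per component. The Python sketch below does this for "T i j" blocks only ("t i" blocks are not handled). It is illustrative, not CAMPARI code: the function name `parse_mode3` is hypothetical, the number of atoms per molecule type must be supplied by the user, and treating entries of 0 as "no request" is an assumption read off the example.

```python
# Illustrative expansion of a mode 3 request file into per-component site pairs.

EXAMPLE = """\
3
T  1  2
1  0  2  2  3  3  3  3
0  4  0  0  0  0  0  0
0  4  0  0  0  0  0  0
T  1  1
5  6  6
0  7  7
0  0  7
T  2  2
0  0  0  0  0  0  0  0
0  0  0  0  8  8  8  8
0  0  0  0  9  9  9  9
0  0  0  0  9  9  9  9
0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0
"""


def parse_mode3(text, natoms):
    """Map component code -> list of (type_i, atom_a, type_j, atom_b)."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    if lines[0] != ["3"]:
        raise ValueError("not a mode 3 file")
    comps, pos = {}, 1
    while pos < len(lines):
        head = lines[pos]
        if head[0] != "T" or len(head) != 3:
            raise ValueError("this sketch only handles 'T i j' headers")
        ti, tj = int(head[1]), int(head[2])
        rows = lines[pos + 1:pos + 1 + natoms[ti]]  # first type sets the row count
        if len(rows) != natoms[ti] or any(len(r) != natoms[tj] for r in rows):
            raise ValueError("matrix size does not match the atom counts")
        for a, row in enumerate(rows, start=1):
            for b, code in enumerate((int(x) for x in row), start=1):
                if ti == tj and b < a:
                    continue                         # only the upper right half is read
                if code > 0:                         # ASSUMPTION: 0 means no request
                    comps.setdefault(code, []).append((ti, a, tj, b))
        pos += 1 + natoms[ti]
    return comps


components = parse_mode3(EXAMPLE, {1: 3, 2: 8})  # 3 atoms in water, 8 in urea
for code in sorted(components):
    print(code, components[code])
```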

Notes:

Inconsistent numbering in the components is removed and announced through log-output (i.e., components might be renumbered). Nothing stops a user from supplying illogical requests (i.e., for example to omit some of the 9's in the urea-urea matrix in the last example) since the code knows nothing about molecular symmetry. Furthermore, in particular for modes 1 and 2, nothing stops the user from supplying an atom pair more than once, which can give rise to arbitrary weights when several pairs are pooled together. This is potentially very misleading and should be kept in mind when encountering surprising results. Lastly, the code does prohibit pooling data from inter- and intramolecular distances into the same component. This is because intermolecular distances are directly transformed to pair correlation functions whereas intramolecular ones are not. In any event, it is always recommended to double-check this input file.


Input file for tabulated Bessel functions:

Keyword: FMCSC_BESSELFILE

Required for: Computation of predicted fiber diffraction patterns (preliminary implementation) → FMCSC_DIFFRCALC

This file should have a header line with two integers and a floating point number specifying the number of Bessel functions (order, "N"), the number of distance bins ("M"), and the distance resolution. The actual file then has N columns and M rows of floating point numbers to supply the tabulated Bessel functions. Note that the first distance bin is assumed to be exactly zero. Also note that this file is very easily corrupted and that its sanity has to be ensured by the user. The file delivered with CAMPARI is called "BesselFunctions.dat" in the data/ directory and tabulates Bessel functions up to an order of 500 for 2001 distance bins. The latter corresponds to a maximum argument value of 500.0, as the distance resolution is set to 0.25. Finally, note that the whole file is read into memory. The file size will therefore be a good indicator of the memory demand that diffraction calculations impose in addition to the underlying calculation.


Input files to provide atomic index sets:

Keywords: FMCSC_ALIGNFILE, FMCSC_CFILE (see FMCSC_CDISTANCE for details), FMCSC_SAVATOMFILE, and FMCSC_TRAJIDXFILE

This is the simplest input file format CAMPARI uses and consists of nothing but a number of lines specifying a single atomic index (assuming CAMPARI's intrinsic ordering → be careful with PDBNUCMODE) on each line. N lines will therefore constitute an index set of size N assuming all entries are valid (legal indices). In some cases, such an index file may be obtained by running an appropriate script over a CAMPARI-generated pdb file. Atom index sets are used in a variety of contexts.


Input files to provide atom pair index sets:

Keyword: FMCSC_CFILE

This input file type is very similar to atomic index input files. Here, instead of a single index entry on each line denoting an individual atom, a pair of indices needs to be provided to specify a pair of atoms. This is currently only used in certain modes of structural clustering (see FMCSC_CDISTANCE and FMCSC_CFILE for details).

Example:
----------------
2   4
3   12

----------------

The above input file would define two pairs of atoms with indices of 2/4 and 3/12, respectively. This could for example be a set of (two) interatomic distances to be used for structural clustering. Note that double entries (entry order is ignored) are removed.
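
The removal of double entries irrespective of entry order can be mimicked as follows. This Python sketch is illustrative and the function name `read_pairs` is hypothetical; storing each pair with the smaller index first is a choice made for the sketch, not a statement about CAMPARI's internal representation.

```python
# Illustrative reader for an atom pair index file with order-insensitive deduplication.


def read_pairs(text):
    """Return unique atom pairs in order of first appearance."""
    seen, pairs = set(), []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        i, j = int(fields[0]), int(fields[1])
        key = (min(i, j), max(i, j))   # entry order is ignored
        if key not in seen:
            seen.add(key)
            pairs.append(key)
    return pairs


print(read_pairs("2   4\n3   12\n4   2\n"))
```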


Input files to provide torsional index sets:

Keyword: FMCSC_CFILE

This input file type is technically identical to atomic index input files. The only difference is that here a list of system dihedral angle indices is provided. This is currently only used in structural clustering (see elsewhere for details). The intrinsic numbering of dihedral angles is most easily extracted from the header line to the output file FYC.dat controlled by keyword FMCSC_TOROUT. Briefly, given the order of residues in the sequence input file, the ω-angle in the first residue will be followed by the φ-angle, ψ-angle, the nucleic acid backbone angles, and lastly, the sidechain χ-angles. If there is a five-membered ring additional terms are printed. These are skipped in the numbering for this input file with the exception of the first dihedral angle (the φ-angle in proline and similar polypeptide residues and the dihedral angle along the C3*-C4* bond in polynucleotides). To aid with this admittedly unwieldy procedure, a summary of the selected angles will be written to log-output (standard out for single-core runs).


Example:
----------------
2
4

----------------

Suppose the sequence input file specifies proline dipeptide (Ace-Pro-Nme) as the first and only molecule in the system. Then, the above input file would select the φ-angle of the proline residue, and the ω-angle of the N-methyl cap as the torsional degrees of freedom to be included in the clustering. An entry of "1" would have selected proline's ω-angle, and "3" would have picked out proline's ψ-angle. Note that the various dihedral and bond angles of the five-membered ring are skipped.


Input file to provide trajectory breaks:

Keyword: FMCSC_TRAJBREAKSFILE

Required for: Avoiding that every microstate transition in the trajectory is counted toward output and analysis in STRUCT_CLUSTERING.graphml, cut-based pseudo free energy profiles, and the output of the progress index method. Naturally, this is irrelevant for any simulation not using clustering and related methods.

A trajectory usually is a series of continuous snapshots of a system. The implied continuity is used to infer the system's properties in terms of transitions, pathways, and dynamics. This is often done using network-based methods (most prominently in Markov state models). The connectivity of such a network or graph is defined by the edges and their probabilities, which are most often inferred from the trajectory itself. However, not all transitions may be qualitatively similar, e.g., the swaps in replica exchange calculations differ from the incremental evolution produced by the basic sampling engine. As another example, trajectories may be concatenated in certain parallel runs or in trajectory analysis mode. This input file can be used to eliminate such apparent transitions before the corresponding analyses are performed.


Example:
----------------
10000
20000
30000
40000
50000

----------------

This file would tell CAMPARI that the transitions from the 10000th to the 10001st step, from the 20000th to the 20001st step, and so on, are all to be excluded from relevant analyses. Note that this input always refers to the basic and complete simulation length defined via FMCSC_NRSTEPS. This means that the numbers must not be adjusted to reflect settings for FMCSC_EQUIL, FMCSC_CCOLLECT or similar keywords. The output in STRUCT_CLUSTERING.clu does of course reflect these choices, which means that any effective transition spanning one or more excluded step-to-step transitions is also excluded.
Lastly, note that the removal of transitions can of course fracture the network, which may render some analyses impossible to perform or difficult to interpret.
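
The exclusion rule, including its effect on transitions between stored snapshots that are several steps apart, can be sketched as follows. This is an illustration of the rule as stated above, not CAMPARI code; the function name `transition_allowed` is hypothetical.

```python
# Illustrative check of whether a transition between two steps survives the breaks.
from bisect import bisect_left

BREAKS = [10000, 20000, 30000, 40000, 50000]  # sorted step numbers from the file


def transition_allowed(step_a, step_b, breaks):
    """True if no excluded step-to-step transition lies between step_a and step_b.

    A break value s excludes the transition from step s to step s+1, so an
    effective transition step_a -> step_b (step_a < step_b) is excluded if
    step_a <= s < step_b for any break s.
    """
    idx = bisect_left(breaks, step_a)      # first break with s >= step_a
    return not (idx < len(breaks) and breaks[idx] < step_b)


print(transition_allowed(9990, 10000, BREAKS))
print(transition_allowed(10000, 10010, BREAKS))
print(transition_allowed(9995, 10005, BREAKS))
```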




