Input Files for CAMPARI Runs
Classes of Input Files:
- Files defining the system:
- Files altering the system:
- Files redefining standard terms of the energy function:
- Files defining auxiliary terms to the energy function:
- Files relevant for global behavior of analysis functionality:
- Files relevant to specific analysis routines:
- Files relevant to the NetCDF analysis mode:
Notes on Building CAMPARI Input Files
IMPORTANT COMMENT FOR MPI RUNS:
While the key-file only needs to be visible to the master node, most data-files (like the steric grids, the backbone segment file or of course the sequence information) need to be read by all nodes. Since the execution is forced to be strictly sequential, it is absolutely valid to point them all to a single copy of those files, i.e., jobs can very well be run in a pseudo-single-CPU setting on a cluster-wide NFS-mount. Otherwise, consistency between nodes with regard to the contents of those files has to be ensured by the user (or unpredictable crashes loom).
Section 1: Files defining the system
(back to top)
Sequence input file:
Keyword: FMCSC_SEQFILEAside from the parameter file, this file is the only one that is always required.
This input file specifies the sequence of the system to be simulated or analyzed. Every residue (the base organizational unit) occupies one line and uses a base three-letter representation. There is a set of polymer residues and small molecules that is supported in nearly every type of run CAMPARI can perform since CAMPARI possesses an internal representation of those, i.e., it is aware of molecular topology and intrinsic flexibility. These are described next. However, it is also possible to obtained limited support for other residues and molecules including complex polymers. Here, the topology and internal flexibility of each unsupported entity are inferred from a structural input file and by analogy. The details of the inference from structural input are provided elsewhere, but an overview of what is needed in the sequence file is given below the sections on natively supported residues.
Polypeptide residues:
Three-letter code, all twenty common amino acids are supported:
GLY ALA VAL LEU ILE
PRO MET SER THR ASN GLN CYS
PHE TYR TRP HIE/HID ASP GLU
ARG LYS
In addition to the histidine tautomers (HID is protonated on ND1 and HIE on NE2), alternative protonation states can be generated by:
ASH CYX GLH HIP LYD
TYO
These derive as follows from the standard amino acids:
ASP → ASH (aspartic acid by addition of a proton, pKa in free form ~3.9)
CYS → CYX (thiolate anion by removal of a proton, pKa of CYS in free form ~8.0)
GLU → GLH (glutamic acid by addition of a proton, pKa in free form ~4.1)
HIE/HID → HIP (protonated histidine (imidazolium cation) by addition of a proton, pKa in free form ~6.0)
LYS → LYD (neutral n-butylamine side chain by removal of a proton, pKa of LYS in free form ~10.5)
TYR → TYO (phenolate anion by removal of a proton, pKa of TYR in free form ~10.5)
Note that the neutral form of arginine is a species of almost zero practical relevance due to the pKa of ~12.5. It is not supported by CAMPARI at the moment. By adding _N, _C, or _D any residue can be made C-terminal, N-terminal (each charged) or the chirality can be changed to D(R) (except GLY). To achieve both or to make a free amino acid simply use both modifiers (i.e., _C_D or _D_C would give a C-terminal D-amino acid). For compatibility, CAMPARI also allows the use of most PDB codes for D-amino acids: these are DAL (D-alanine), DVA (D-valine), DLE (D-leucine), DIL (D-isoleucine), DPR (D-proline), MED (D-methionine), DCY (D-cysteine), DSN (D-serine), DTH (D-threonine), DSG (D-asparagine), DGN (D-glutamine), DAS (D-aspartate), DGL (D-glutamate), DLY (D-lysine), DAR (D-arginine), DTR (D-tryptophan), DPN (D-phenylalanine), and DTY (D-tyrosine). These names will just be translated, both for sequence input, and in the presence of (partial) structural input. For the latter, the chirality specification is of course implied by the structure (if resolved), so often redundant. Histidine, modified protonation states, and nonstandard amino acids (see below) have no such functionality.
Non-standard amino acids have the following names:
AIB ABA NVA NLE ORN DAB PTR TPO SEP KAC KM1 KM2 KM3
Since the 3-letter code is not always linked to any strict standard, the exact chemical definitions are given next:
AIB (alpha-amino isobutyric acid), ABA (alpha-amino butyric acid), NVA (norvaline = alpha-amino valeric acid), NLE (norleucine = alpha-amino capronic acid), ORN (ornithine = alpha,delta-diamino valeric acid), DAB (alpha,gamma-diamino butyric acid), PTR (hydrogen phosphate ester(-) of tyrosine), TPO (hydrogen phosphate ester(-) of threonine), SEP (hydrogen phosphate ester(-) of serine), KAC (NZ-acetylated lysine), KM1 (NZ-monomethylated lysine(+)), KM2 (NZ,NZ-dimethylated lysine(+)), KM3 (NZ,NZ,NZ-trimethylated lysine(+)). The same chirality and terminus modifiers are available.
It should be noted that quenching the tautomer isomerization for residues HID, HIE, ASH, GLH, SEP, PTR, and TPO can represent a severe approximation both thermodynamically and kinetically. Histidine is the only example for which two alternative tautomers can be built explicitly (HID/HIE). However, they do remain fixed species during simulations (quenching still occurs). Lastly, the deprotonated versions of the methylated lysines KM1 and KM2 are currently not supported despite having pKa values closer to the physiological range than lysine itself.
Internal degrees of freedom:
Backbone:
All non N-terminal residues: ω (CA-1,C-1,N,CA); in case of residue -1 being ACE CA-1 is CH3; in case of FOR CA-1 is H
All except PRO: φ (C-1,N,CA,C); in case of N-terminal residues C-1 is 1HN
All: ψ (N,CA,C,N+1); in case of C-terminal residues N+1 is 2OXT
PRO: Puckering (seven non-redundant degrees of freedom): torsions C-1-N-CA-C, C-1-N-CA-CB, N-CA-CB-CG, CB-CA-N-CD and angles N-CA-CB, CA-CB-CG, CA-N-CD
Sidechain:
AIB, ALA, GLY, PRO: none
ABA, ARG, ASH, ASN, ASP, DAB, GLH, GLN, GLU, HID, HIE, HIP, LEU, LYD, LYS, KAC, KM1, KM2, KM3, MET, NVA, NLE, ORN, PHE, PTR, TRP, TYO, TYR: χ1 (N,CA,CB,CG)
SER, SEP: χ1 (N,CA,CB,OG)
THR, TPO: χ1 (N,CA,CB,OG1)
CYS, CYX: χ1 (N,CA,CB,SG)
VAL, ILE: χ1 (N,CA,CB,CG1)
PHE, PTR, TRP, TYO, TYR: χ2 (CA,CB,CG,CD1)
HID, HIE, HIP: χ2 (CA,CB,CG,ND1)
ARG, GLH, GLN, GLU, LYD, LYS, KAC, KM1, KM2, KM3, NVA, NLE, ORN: χ2 (CA,CB,CG,CD)
ASH, ASN, ASP: χ2 (CA,CB,CG,OD1)
LEU: χ2 (CA,CB,CG,CD1)
ILE: χ2 (CA,CB,CG1,CD1)
MET: χ2 (CA,CB,CG,SD)
SER: χ2 (CA,CB,OG,HG)
THR: χ2 (CA,CB,OG1,HG1)
CYS: χ2 (CA,CB,SG,HG)
DAB: χ2 (CA,CB,CG,ND)
SEP: χ2 (CA,CB,OG,P)
TPO: χ2 (CA,CB,OG1,P)
ASH: χ3 (CB,CG,OD2,HD2)
ARG, ORN: χ3 (CB,CG,CD,NE)
LYD, LYS, KAC, KM1, KM2, KM3, NLE: χ3 (CB,CG,CD,CE)
ORN: χ3 (CB,CG,CD,NE)
GLH, GLN, GLU: χ3 (CB,CG,CD,OE1)
MET: χ3 (CB,CG,SD,CE)
TYR: χ3 (CE2,CZ,OH,HH)
PTR: χ3 (CE2,CZ,OZ,P)
SEP: χ3 (CB,OG,P,OH)
TPO: χ3 (CB,OG1,P,OH)
ARG: χ4 (CG,CD,NE,CZ)
LYD, LYS, KAC, KM1, KM2, KM3: χ4 (CG,CD,CE,NZ)
GLH: χ4 (CG,CD,OE2,HE2)
PTR: χ4 (CZ,OZ,P,OH)
SEP: χ4 (OG,P,OH,HOP)
TPO: χ4 (OG1,P,OH,HOP)
LYD: χ5 (CD,CE,NZ,1HZ)
KAC: χ5 (CD,CE,NZ,CH)
KM1, KM2, KM3: χ5 (CD,CE,NZ,CH1)
PTR: χ5 (OZ,P,OH,HOP)
KAC: χ6 (CE,NZ,CH,CI)
Note that the "missing" torsions are those assumed to be quasi-redundant, e.g. the dihedral angles formed across the three CZ to nitrogen bonds in the guanidino group of arginine. They can be sampled, however, both in torsional dynamics (→ TMD_UNKMODE) and in Monte Carlo runs (→ OTHERFREQ).
Polypeptide cap residues:
FOR (N-terminal COH-unit) ACE (N-terminal acetyl-unit) NH2 (C-terminal amide) NME (C-terminal N-methyl amide)
No modifiers are available, i.e., FOR and ACE are always N-terminal, NH2 and NME are always C-terminal. It is not recommended to directly concatenate two caps to build small molecule amides.
Internal degrees of freedom:
Backbone:
FOR, ACE, NH2: none
NME: ω (CA-1,C-1,N,CH3)
Note that the "missing" torsions are those assumed to be quasi-redundant, e.g., the dihedral angles formed across the C-CH3 bond in ACE. They can be sampled, however, both in torsional dynamics (→ TMD_UNKMODE) and in Monte Carlo runs (→ OTHERFREQ).
Nucleic acid residues:
In terms of nucleic acids twelve 5'-nucleotides are supported (i.e., the base element is XM-5'P)
D5P DPC DPU DPT DPA DPG R5P RPC RPU RPT RPA RPG
The DPX are the common deoxyribonucleotides, while the RPX are the ribonucleotides D5P and R5P are empty sugar-5'-phosphates (i.e., instead of the glcyosamine bond, the C1' has its native hydroxyl group). By adding _N or _C any residue can be made into a 5'- or 3'-capping residue, respectively. 5'-terminal residues generated in this way always carry a free 5'-phosphate group (singly charged). 3'-terminal residues generated in this way carry only a free 3'-hydroxyl.
Similar to a comment made above for polypeptide residues, it should be noted that quenching the tautomer isomerization for residues carrying a free 5'-phosphate(-) may represent a severe approximation in both a thermodynamic and kinetic sense.
Internal degrees of freedom:
Backbone:
non-5'-terminal: nuc_bb_1(C3*-O3P-P-O5P) - note that in CAMPARI convention only C3* is part of the preceding residue
5'-terminal: nuc_bb_1(O5P-P-O3P-HOP)
All: nuc_bb_2(O3P-P-O5P-C5*), nuc_bb_3(P-O5P-C5*-C4*), nuc_bb_4(O5P-C5*-C4*-C3*)
Non-3'-terminal: nuc_bb_5(C4*-C3*-O3P-P) - note that in CAMPARI convention O3P and P are part of the following residue
3'-terminal: nuc_bb_5(C4*-C3*-O3*-HO3*)
All: Sugar puckering (seven degrees of freedom): torsions C5*-C4*-C3*-O3P/O3*, C5*-C4*-C3*-C2*, C4*-C3*-C2*-C1*, C2*-C3*-C4*-O4* and angles C4*-C3*-C2*, C3*-C2*-C1*, C3*-C4*-O4*
Sidechain:
RPC, RPU, RPT: χ1(C3*,C2*,O2*,HO2*), χ2(C2*,C1*,N1,C2)
RPT only: χ3(C4,C5,C5M,1H5M)
RPG, RPA: χ1(C3*,C2*,O2*,HO2*), χ2(C2*,C1*,N9,C4)
R5P: χ1(C3*,C2*,O2*,HO2*), χ2(C2*,C1*,O1*,HO1*)
DPC, DPU, DPT: χ1(C2*,C1*,N1,C2)
DPT only: χ2(C4,C5,C5M,1H5M)
DPG, DPA: χ1(C2*,C1*,N9,C4)
D5P: χ1(C2*,C1*,O1*,HO1*)
Note that the "missing" torsions are those assumed to be quasi-redundant, e.g. the dihedral angles defined by the NH2 groups in DPA, DPG, etc. They can be sampled, however, both in torsional dynamics (→ TMD_UNKMODE) and in Monte Carlo runs (→ OTHERFREQ).
Nucleic acid cap residues:
As 5'-terminal capping groups, 12 nucleosides are also supported:
DIB DIC DIU DIT DIA
DIG RIB RIC RIU RIT RIA RIG
DIB and RIB are deoxyribose and ribose,
respectively, i.e., empty nucleosides, i.e., sugars. The remaining DIX
and RIX are the standard nucleosides with the 5 canonical bases. Note
that nucleosides cannot be made C-terminal, i.e., it is not possible
(yet) to build free nucleosides using this capping residues.Internal degrees of freedom:
For backbone:
nuc_bb_1(HO5*-O5*-C5*-C4*)
nuc_bb_2(O5*-C5*-C4*-C3*)
nuc_bb_3(C4*-C3*-O3P-P)
Sugar puckering and sidechain degrees of freedom are identical to full nucleotides (see above).
Small molecules:
NA+ (sodium ion), CL- (chloride ion), SPC (SPC or SPC/E water), T3P (TIP3P water), T4P (TIP4P water), T4E (TIP4P-Ewald water), T5P (TIP5P water), URE (urea), NMF (Z-N-methylformamide (i.e., polar H and O are trans)), NMA (Z-N-methylacetamide), ACA (acetamide), PPA (propionamide), FOA (formamide), DMA (N,N'-dimethylacetamide), CH4 (methane), MOH (methanol), PCR (p-cresol), K+ (potassium ion), BR- (bromide in), CS+ (caesium ion), I- (iodide ion), O2 (molecular oxygen: note that diatomic and linear molecules in general are not yet supported in dynamics), NH4 (ammonium ion), AC- (acetate ion), GDN (guanidinium cation), NO3 (nitrate ion), PCL (perchlorate ion), 1MN (methylammonium ion), 2MN (dimethylammonium ion), CYT (cytosine), URA (uracil), THY (thymine), PUR (purine), ADE (adenine), GUA (guanine), PRP (propane), NBU (n-butane), IBU (2-methylpropane), EMT (ethylmethylthioether), EOH (ethanol), MSH (methanethiol), BEN (benzene), TOL (toluene), NAP (naphthalene), IMD (δ-protonated 2-methylimidazole), IME (ε-protonated 2-methylimidazole), MIN (3-methylindol)
Degrees of freedom:
O2, SPC, T3P, T4P, T5P, T4E, URE, CH4, FOA, BEN, NAP, PUR, ADE, GUA, CYT, URA: RBC
NA+, CL-, K+, BR-, CS+, I-, NH4, GDN, NO3, LCP: RBC
ACA: RBC, χ1 (N,C,CCT,1HCT)
AC-: RBC, χ1 (1OXT,C,CH3,1H)
MOH: RBC, χ1 (1H,C,O,HO)
THY: RBC, χ1 (1HT,CT,C5,C6)
1MN: RBC, χ1 (1HN,N,CT,1HT)
NMF: RBC, ω (O,C,N,CCT), χ1 (C,N,CNT,1HNT)
NMA: RBC, ω (O,C,N,CCT), χ1 (C,N,CNT,1HNT), χ2 (N,C,CCT,1HCT)
PPA: RBC, χ1 (N,C,CCT,CBT), χ2 (C,CCT,CBT,1HBT)
PCR: RBC, χ1 (C21,C1,CT,1HCT), χ2 (C31,C4,O,HO)
DMA: RBC, χ1 (C,N,CZT,1HZT), χ2, (C,N,CET,1HET), χ3 (N,C,CCT,1HCT)
PRP: RBC, χ1 (2CT,CB,1CT,1HT1), χ2 (1CT,CB,2CT,1HT2)
2MN: RBC, χ1 (2CT,N,1CT,1HT1), χ2 (1CT,N,2CT,1HT2)
IBU: RBC, χ1 (2CT,CB,1CT,1HT1), χ2 (3CT,CB,2CT,1HT2), χ3 (2CT,CB,3CT,1HT3)
NBU: RBC, χ1 (1CT,1CB,2CB,2CT), χ2 (2CB,1CB,1CT,1HT1), χ3 (1CB,2CB,2CT,1HT2)
EMT: RBC, χ1 (CCT,CB,S,CNT), χ2 (CB,S,CNT,1HNT), χ3 (S,CB,CCT,1HCT)
EOH: RBC, χ1 (O,CB,CT,1HT), χ2 (CT,CB,O,HO)
MSH: RBC, χ1 (1H,C,S,1HS)
IMD, IME: RBC, χ1 (N2,C1,CT,1HT)
TOL: RBC, χ1 (C2,C1,CT,1HT)
MIN: RBC, χ1 (C2,C3,CT,1HT)
Note that the methyl spins fall away if united-atom representation is used (see FMCSC_UAMODEL) and that the remaining χs may be shifted down (relevant for EOH, PCR). Note as well that the different water models listed have different numbers of sites (3-5), different covalent geometry, and are all assumed to be completely rigid internally. Of course, a match of the published models still requires using the correct energy functions and parameters. As with all other systems, the "missing" torsions are those assumed to be quasi-redundant, e.g. the dihedral angles defined by the NH2 groups in primary amides. They can be sampled, however, both in torsional dynamics (→ TMD_UNKMODE) and in Monte Carlo runs (→ OTHERFREQ).
Chemical crosslinks:
Disulfide linkages:
The only currently supported type of crosslink are disulfide linkages between two cysteine residues on one or two polypeptide chains. They are specified by identifying the linked cysteine residue with the lower residue index number, and - separated by a blank - appending on the same line the residue index number for the target linkage residue. The target can either be on the same or in a different molecule.
Note that several limitations exist:
- Intramolecular crosslinks cannot be so complicated as to create a topology in which multiple ring constraints depend on each other completely. This is analyzed by CAMPARI automatically and the program will terminate abnormally if it identifies such a situation.
- Intermolecular crosslinks cannot create ring-like topologies but are otherwise unrestricted. This limitation is more likely to be overcome in the near future than the prior one.
- Two residues which are direct sequence neighbors can be crosslinked but their arrangement cannot be constructed randomly (see FMCSC_RANDOMIZE). Thus, for such a crosslink to be simulated, (partial) structural input is required. The reason is that the default bond lengths and angles do not allow exact closure of this cycle.
- Depending on sampling algorithm and choice of methodology (→ keyword FMCSC_CRLK_MODE), crosslinks may cause undesirable side effects.
Other (unsupported) residues:
As was alluded to above, CAMPARI offers the option to infer the topology and further characteristics of residues and molecules that it has no internal reference representation of. The ultimate requirement for using this feature is the presence of a sane structural input file from which the topology can be inferred (→ FMCSC_PDBFILE or FMCSC_PDB_TEMPLATE). Then, the residue parsing from said structural input file simply has to be transferred to the sequence file. If the unknown unit is meant to be the start or end of a polymer or a single-residue small molecule, then the corresponding flags have to be appended as usual ("_N", "_C", or "_N_C"). Note that the parsing can be changed as long as concerted changes are made to both the structural input and sequence files. Note that the terminal flags are, i) always mandatory, ii) the chiral flag ("_D") is not supported since the complete topology is inferred from structural input anyway, and iii) terminal residues still have to differ in their base name if they occur in terminal positions (this is in addition to having to use the "_N" and "_C" indicators).
Examples:
The sequence input file then simply has to list valid molecules, i.e., at least one N- and one C-terminal residue (which can be the same) eventually with intra-chain residues. Every sequence input has to be terminated by an END.
Alanine dipeptide:
ACE
ALA
NME
END
An ion pair:
NA+
CL-
END
A bad input file (because the alanine-based peptide has no termini):
ALA
ALA
ALA
URE
URE
URE
END
A correct way to create an intermolecular disulfide linkage:
GLY_N
GLY
CYS 8
GLY
PRO
GLY
ALA
CYS
NME
CL-
END
A correct way to ask for analysis for a trajectory with the unsupported single-residue small molecule GMP:
CL-
CL-
CL-
CL-
NA+
NA+
NA+
NA+
NA+
GMP_N_C
END
A bad input file (impossible crosslink):
ACE
CYS 6
NME
ACE
CYS
NME
END
Structural input file:
Keyword: FMCSC_PDBFILEStarting structures for simulations can only be supplied in pdb format. These are largely standard pdb files. CAMPARI expects to find a continuous section of entries of type "ATOM", "HETATM", "ANISOU", or "TER", and any other records terminate the processing of input for a given snapshot. It is therefore necessary to remove any additional records embedded within the coordinates section. The use of "MODEL" and "ENDMDL" indicators should be consistent. It is generally good practice to remove all superfluous coordinate information from the input file to assist the software. This is critical when trying to make sure that any missing tails of biopolymers can be added when FMCSC_PDB_READMODE is 3. Information is superfluous if it does not provide coordinates for entries in the sequence file.
The parsing of pdb files always requires that atom and residue names be understood by CAMPARI. While the software does a fair amount of translation (→ FMCSC_PDB_R_CONV), manual adjustment may sometimes be required. However, files in standard pdb convention that are not missing relevant coordinates should generally be processable "as is." In particular for files that have missing information (complete residues, see "REMARK 465" entries), the conversion script in the tools directory will likely prove useful. Note that the pdb file is assumed to be in fixed format, i.e. column shifts are never tolerable. It is, however, possible to redefine the expected fixed input format using keyword FMCSC_PDB_INPUTSTRING, which is useful for dealing with files generated by CAMPARI itself or custom scripts that are not in standard pdb format.
The target (built-in) names can be constructed from the biotype section in the parameter files, or - simpler - by consulting a CAMPARI-generated pdb file from a dummy run of the system of interest (see for example here) as long as keyword FMCSC_PDB_W_CONV is set to 1. A number of default translations of residue names occur, e.g., "HSE" to "HIE" or "HOH" to either "SPC" or "T3P" depending on keyword FMCSC_PDB_R_CONV. They can be looked up in subroutine "pdbcorrect" in source file "readpdb.f90". The most important feature here is the "soft" recognition of ambiguous PDB residue names, which occur prominently for histidine ("HIS") and nucleic acids ("G", etc). This is particularly helpful for the latter and benefits from FMCSC_PDB_READMODE being 3. CAMPARI requires the sequence file to be explicit about variants (e.g., "DPG" for deoxyguanosinephosphate) and will ignore or add atoms from/to the PDB file depending on this.
If certain atoms in a recognized (built-in) residue are not found (predominantly applies to hydrogen → FMCSC_PDB_HMODE), the molecular geometry is built according to default assumptions (e.g., reference for heavy atoms). Premature termination and/or corrupt structures will occur if important information is missing. While translation support for the formatting of multiple hydrogens on the same site (carbon, nitrogen) exists with regards to the position of the index label, it is important that such hydrogens be in fact systematically numbered. A four-character string like ' HB ' for a sidechain hydrogen atom of an alanine residue would not be read at all, i.e., the reading utility has no capability processing identical atom names for the same residue (and will usually exit with an error in such a case). There is some additional flexibility in interpreting atom names in general as long as they are unique on a per-residue basis. This happens: i) through simple automatic conversions for typical cases for which two or more standards exist (e.g., "OT1" and "OT2" are understood as carboxylate oxygen atoms of a C-terminal polypeptide residue); and ii) through the keyword FMCSC_PDB_R_CONV.
In addition to the use of pdb files to define starting conformations for simulations, CAMPARI can also use pdb trajectories in trajectory analysis runs. If structural input corresponds to a trajectory file in a format other than pdb, a pdb file may still play a role in serving as a template for remapping indeces between CAMPARI-internal numbering and the actual order found in that binary input. These two cases (trajectory analysis and starting a simulation from an input structure) are handled separately below:
- Analysis runs (see FMCSC_PDBANALYZE)
To provide maximally accurate analysis, any trajectory is processed purely by reading in Cartesian coordinates for all atoms. If the trajectory is in pdb format (using the MODEL/ENDMDL syntax) or requires a template, this necessitates that atom names be understood by CAMPARI as outlined above. For simulation trajectory analyses it is generally recommended to utilize one of the binary formats (FMCSC_XTCFILE, FMCSC_DCDFILE, FMCSC_NETCDFFILE, and FMCSC_PSQL_R_TRAJKEY) and possibly a suitable pdb-template (→ FMCSC_PDB_TEMPLATE) that is processed in exactly the same way as other pdb input files but serves to provide a map between CAMPARI intrinsic order of atoms and the aforementioned binary trajectory input in use. Both the binary trajectory input and the template (if required) need to be complete with respect to the sequence file (in this context, the chosen level of representation may become important → FMCSC_UAMODEL).
If the analysis run occurs in parallel (see elsewhere for details), CAMPARI expects not a single trajectory, but a set of systematically named trajectories. This applies to all supported formats in trajectory analysis mode.
Portion of an example key-file:
FMCSC_DCDFILE /home/dummy/simulation.dcd
FMCSC_REPLICAS 4
FMCSC_PDB_FORMAT 4
FMCSC_REMC 1
FMCSC_PDBANALYZE 1
In this example, the MPI-version of CAMPARI would look for 4 individual trajectory files named "N_000_simulation.dcd", "N_001_simulation.dcd", "N_002_simulation.dcd", and "N_003_simulation.dcd" all located in the directory "/home/dummy/". This means that the specified name does not actually correspond to an existing file. The choice of prefix is consistent with the output prefix CAMPARI uses for example in replica-exchange simulations. Note that, if a template is required or desired for other reasons, the template file is not treated in the same way, i.e., all replicas will read and process the same file in the same location.
A notable exception to this logic occurs when the data are present in a PostgreSQL database. In this case, there is no file but instead a set of entries identified by keywords FMCSC_PSQL_R_TRAJKEY (a key identifying a system), FMCSC_PSQL_R_SIMUOFF (an integer identifying the root of a (set of) simulation(s)), and FMCSC_PSQL_R_RANK (an integer subsetting a set of simulations to a single instance). Basically, this system allows larger flexibility because trajectories are not organized in separate files on a hard disk.
- Runs with a starting structure read from a pdb file
The choice here depends on the sampled degrees of freedom. For runs in Cartesian degrees of freedom (→ FMCSC_CARTINT), small distortions are allowed to relax, and it is recommended to deal with the peculiarities of pdb files in the same way as described above. This is selected by choosing keyword FMCSC_PDB_READMODE to be 2. There is an important caveat, however, which is the treatment of constraints. Keyword FMCSC_SHAKEFROM is critical in this regard. Furthermore, subsequent restarts of simulations from pdb files may still create unnecessary noise, which is avoidable with the help of keywords FMCSC_PDB_INPUTSTRING and FMCSC_PDB_OUTPUTSTRING.
Because of the limited precision of pdb files, CAMPARI also offers an alternative that is designed for runs in torsional/rigid-body space starting from CAMPARI-generated pdb files but is largely obsolete now (and continues to exist primarily for backwards compatibility reasons). This mode is meant to prevent the distortion of covalent, molecular geometries of frozen degrees of freedom by limited precision. By choosing keyword FMCSC_PDB_READMODE to be 1, simulation runs can rebuild the molecules using CAMPARI's default geometry. This means that the Cartesian coordinates of the atoms defining the sampled degrees of freedom are simply used to determine the value for a given degree of freedom. These values then seed the molecular building routines. Depending on where the input structure is from and the precision of the pdb input, this will almost always lead to a different structure than the one found in the pdb file (with an error increasing along any polymer chains present). Thus, this mode is largely obsolete and will produce satisfactory results only when a) the pdb file was generated by CAMPARI in a run not relying on external input itself and a custom precision format is used; or b) the pdb file contains only small molecules (3 atoms or less). Sometimes, it may be possible to relax slightly distorted conformations using the torsional restraint options (see options 3 and 4 for the corresponding input file). All comments about naming conventions apply as before. It is important to keep in mind that switching between the options for FMCSC_PDB_READMODE can imply that sampling in rigid-body / torsional space occurs with a different, underlying (rigid) geometry in place.
- All superfluous coordinate information must be removed for CAMPARI to read "over" a gap. For example, if the pdb file contains information for the stretch ...CGLLKM and the user is interested in creating a model with the C-terminal tail ...CGLLKSSSSSHHHHHHH, the information for the methionine residue ("M") must be removed for CAMPARI to keep reading any subsequent information in the pdb file (for example, if there is another protein chain).
- The matching procedure of sequence stretches between pdb and sequence input is greedy (as long as possible) and prefers to map to the first possible residue in the sequence (in case there are ties in length). This is a potential issue with low complexity sequences or very incomplete structural information (e.g., bound peptides in crystal structures, nucleic acids).
- Many pdb files contain partial structural information for a residue. If there are backbone atoms missing for a residue in the middle of a chain, it will normally be required to introduce an artificial gap or to do some remodeling by hand. Since version 5, terminal residues with partial information are dealt with on either end, which is particularly relevant for the 5'-end of polynucleotides where the first residue to be resolved does not always show the phosphate group for a chain continuation in that direction. In general, if any atoms in a residue CAMPARI expects to read from pdb are rebuilt (because they are missing or the geometry is off (→ FMCSC_PDB_TOLERANCE_A and FMCSC_PDB_TOLERANCE_B), this will most likely corrupt the pdb-provided conformation of atoms that depend on it (which is why the consequences for backbone atoms are much more dramatic). Because of CAMPARI's default N-to-C (or 5'-to-3') building direction, this is usually not a concern for C-terminal tails.
- As a related point, it is recommended to delete all atoms in the pdb that are branched off (side chain) but for which one or more of the atoms connecting it to the main chain are missing. For example, if in a polypeptide glutamine residue the CB atom is missing in the pdb, ATOM records for CG, CD, etc. atoms should also be deleted. CAMPARI has no facility to infer the position of atoms based on arbitrary reference atoms, and the aforementioned scenario is very likely to lead to severe violations of covalent geometries at the missing atom.
- When adding missing tails to nucleic acids, it is important to keep in mind that CAMPARI internally groups the 3' oxygen bound to the phosphate to the same residue as the phosphate, which differs from standard convention (see FMCSC_PDB_NUCMODE for details). While the read-in step is adjusted automatically, this does mean, for a complete read-in, that the oxygen atom should be there, either as the only atom in the previous residue named " O3*" or as the first atom in the first residue to be read with either name "2O3*" or " O3*" depending on whether in the input pdb file the 3'-oxygen is part of the same residue or not (duplicate atom names are not allowed and will cause an error). Note that since Version 5 CAMPARI is much more lenient in tolerating missing backbone atoms in terminal residues, which is frequently the case in polynucleotide structures in the PDB. In particular the first resolved residue at the 5'-end often misses the phosphate group even when the chain actually continues.
- It can lead to corrupt geometries if some but not all atoms of a flexible ring are read from pdb input, which is relevant for example for sugars in polynucleotides and proline or related residues, in particular N-terminal ones (where the amine hydrogens are usually missing).
- It must contain a representation of the unknown part of the system that is physically feasible and chemically canonical, i.e., is free of steric clashes and complies to bond length intervals that are typical for the atoms in their most common hybridization states. This requires guessing the identity of the chemical element encoded by a given atom name. For the common elements (C, N, O, H, S, P), it is generally the 14th column in the input file that matters (this is the second column of the four-character atom string in the default format → these numbers may change if a custom format is used). Recognition of rare elements can be triggered by starting the atom name in the 13th column, e.g., to trigger identification of a copper atom, one should use "CU " for the atom name. Conversely, the name " CU " would be identified as a carbon atom. The correct mass will only be retained if the parameter file in use offers an appropriate atom (LJ) type for this chemical element.
- The unsupported residues must be in a structure that provides, at least, for each one the immediate sequence context of residues in the same chain. For example, if simulations of a dye-labeled cysteine residue are desired, and the context of the cysteine residue in the protein sequence of interest is QCA, then the pdb input file (whether provided as a template or not), must contain this particular stretch in full. This implies that all atoms in the surrounding residues, here glutamine and alanine, must be present and read correctly. CAMPARI will expect atoms based on its intrinsic representation and the choice for keyword FMCSC_UAMODEL. This completeness requirement is fulfilled trivially if the surrounding residues are themselves unsupported or if no polymer context exists (i.e., the unsupported residue is a small molecule).
- Atoms within the unsupported residue(s) must be ordered such that a meaningful topology can be built, i.e., it must not be the case that an atom comes before all other atoms it is covalently bound to (naturally, with the exception of the very first one). Otherwise, resultant topologies will be garbled, and garbled topologies can severely limit the access of the unsupported entities to CAMPARI's sampling engines. In particular, inter-residue linkages should always make sure that the linkage atom is the very first one in a residue.
- All atom names in an unsupported residue should be unique. If there are two residues of exactly the same type (i.e., they have the same name, the same number, order and names of atoms, and they occur in the same molecular context (for a polymer, terminal residues are distinct from non-terminal ones)), then the two residues will be treated as if of identical type (which simplifies several parameter patches).
- Names of chemically equivalent atoms (like the various hydrogen atoms in a methyl group) should be named such that the last three characters of the corresponding field are identical, and the first character uses a numbering starting at 1. This helps with consistency in interpretation and can improve naming in output pdb files.
- For unknown residues whose basic architecture resembles a supported polymer type, it is recommended to follow the atom order and nomenclature used by CAMPARI (or in terms of nomenclature by the pdb file if it conforms to a convention CAMPARI understands). This enables CAMPARI to infer for example which dihedral angles in an unsupported polypeptide residue such as oxidized methionine would correspond to the φ/ψ/ω-angles. This is important in ensuring that the unsupported residue has maximal access to sampling and analysis routines that are specialized for this polymer type (e.g., DSSP analysis for polypeptides, or sugar pucker sampling for polynucleotides).
Restart file:
Keyword: FMCSC_RESTARTRequired for: Restarting a previously terminated or aborted simulation → FMCSC_RESTART
Rather than specifying the starting conformation by a limited-precision input file such as pdb files explained above, it is more appropriate to extend or restart simulations with the help of restart files. These are human-readable and can therefore be edited if needed. The name of the restart file is assumed to be systematically constructed from the default name of the run and should correspond to {basename}.rst. Details on restarting runs are provided elsewhere.
Input file for small molecule screen:
Keyword: FMCSC_MOL2FILERequired for: Instructing CAMPARI to perform repeated executions of one of its basic calculations while exchanging a part of the system for each repetition. This type of setup allows screening small molecules in the context of an otherwise fixed setup.
This input file contains the molecules to be screened. The file is expected in standard mol2 format with Sybyl atom names. The primary information used from the file are the number and list of atoms, their coordinates, and their Sybyl types. The latter are used to assign Lennard-Jones parameters (primarily for the corresponding energy terms: see FMCSC_SC_IPP, etc.) based on a mapping to available atom types in the parameter file that is either automatic or manual (for the latter, see below). The bond section of the file is used only in part: bond orders (types) are ignored, and bonds may only be added if they were not already detected by the automatic topology inference scheme. Keyword FMCSC_MOL2RESPECTBOND gives the user the option to restrict all bonds to the bond section. This is currently the only way to (try to) correct a faulty topology inference, which may arise due to, for example, intramolecular clashes for the given conformer. This does not mean that CAMPARI will tolerate or correct broken input geometries at the covalent level (e.g., 2D conformers). The only correction that FMCSC_MOL2RESPECTBOND allows in this regard is for errors in steric exclusion due to rotatable bonds. FMCSC_MOL2RESPECTBOND fulfills a second function in augmenting or replacing the inference of torsional potentials (FMCSC_MOL2BONDMODE and FMCSC_SC_BONDED_T). This is explained below in more detail.
CAMPARI can use further atom-based information from the mol2 file as follows. Partial charges (if the Coulomb potential is in use) are read from their standard field, and group assignments for the ABSINTH DMFI, which works in conjunction with a library input file, are read from the substructure field. These group assignments are integer assignments in the substructure column with the integer referring to a specific solvation group in the library file. To be able to distinguish two identical solvation groups within the same molecule, the column with the substructure name should have running integers per molecule for each individual group. Details on how small molecule screens work in general are provided elsewhere.
An example atom line of a mol2 file that CAMPARI extracts all possible information from may look like this:
9 C89 17.56776179 -16.68426708 25.12938200 C.ar 665 2 -0.1160
This example would be the 9th atom in the molecule and carry the (arbitrary) name "C89." Its Sybyl type would be "C.ar", and it would belong to a solvation group with index 665 in the library. It would be part of the 2nd instance of this solvation group type in the molecule, and its partial charge would be -0.1160e. CAMPARI expects the atom index (first column) to run up consecutively from 1 onwards. Note that mol2 files do not have fixed field widths per column and that the normal field separators are spaces.
CAMPARI allows the first molecule in the file to serve special purposes - these are explained in the documentation for the input file supplied for FMCSC_MOL2_REFMOL. If that secondary input file is missing, the first molecule in the file for FMCSC_MOL2FILE takes on the same role and special functions.
For the header sections, each molecule would typically look similar to this:
@<TRIPOS>MOLECULE
Molecule_0045871
39 42
SMALL
USER_CHARGES
This molecule would have the name "Molecule_0045871" (this name is used to reference the molecule in log-output and certain file names) with 39 atoms and 42 bonds. It is highly recommended to keep the molecule names unique as uniqueness can be tedious to accomplish in post-processing (and leads to data loss for possible pdb output → FMCSC_MOL2PDBMODE). Names obtained for different poses of the same molecule in the main output file are systematically modified (explained better here) to help distinguish them but uniquess is never guaranteed. The example molecule above would be of type "SMALL" and expected to have partial charges in the correct column ("USER_CHARGES"). Mismatches between the ATOM and BOND sections with the expected sizes will lead to fatal errors or undefined behavior.
The role of the BOND sections depends on other choices. For example, if keyword FMCSC_MOL2RESPECTBOND is 0, even a truncated BOND section may be tolerable since in this case the BOND section is only used to supplement the coordinate-based topology inference CAMPARI performs. In other cases, CAMPARI expects to find additional information. Two example lines might look like:
1 1 14 ar
2 2 8 am 305
These would be the first two bonds in the list, the first connecting atoms 1 and 14, the second connecting atoms 2 and 8 (numbers referring to the order they appear in in the ATOM section). The bits "ar" and "am" are Sybyl/Tripos-specific flags that are just passed from input to output. The last number in the second line (305) is going to be interpreted as a chemical bond type assignment by CAMPARI. This is not a field present in the official mol2-standard and the underlying convention is only found in the source code. The functionality is solely meant to be able to maintain an (at first) automatic parameterization consistently for different conformers of the same molecule, such as one would desire in a rescoring run: see description of keyword FMCSC_MOL2RESPECTBOND for more details.
The type of a molecule has to be either "SMALL" or "****" if the input file contains only small molecule atoms. CAMPARI itself can also write mol2 file containing other atoms, e.g., from mobile parts of the receptor, additional small molecules like water, etc (see keyword FMCSC_MOL2AUXINDEX). If this is the case, the type must be "BIOPOLYMER" and a mandatory input file must be supplied for keyword FMCSC_MOL2AUXINPUT. Without this input file and the fixed part of the system present, a mol2 file containing auxiliary information cannot be (re)processed. Importantly, the information extracted for these auxiliary atoms are simply the coordinates - no parameters, names, or anything else are used. All other molecule types are currently skipped. Lastly, specifying "NO_CHARGES" will skip the read-in of the corresponding column (i.e., the column can be missing without producing an error). Note that this can be confusing when mixing molecules with and without charge information in the same mol2 input file.
Auxiliary input file for small molecule screen:
Keyword: FMCSC_MOL2_REFMOLRequired for: Allowing CAMPARI to process a "special" molecule in a small molecule screen without having to modify the main input file.
This input file contains the information, in exactly the same general format as described for the main mol2 input file, for a single molecule to serve a special role in the small molecule screen. This is an overlapping function with using the first molecule in the main mol2 input file to accomplish the same thing. It is provided for convenience: library files are often large, and it can be annoying to prepend them repeatedly with custom information. Note that the file must be absolutely consistent with main input file regarding the presence of additional atoms.
The special roles are as follows.
First, the initial pose of the reference molecule can define a custom set of constraints for the remainder of the system if the run is in internal coordinate space. Keyword FMCSC_MOL2FOCUS allows two ways of defining a site of interest using the input coordinates for the reference molecule and produces an output file: MOL2FOCUS.frz. This file is formatted to serve as a possible input for user-defined constraints and can establish to define the exact same constraints independently of the coordinates of the reference molecule in subsequent runs. As a corollary to this, CAMPARI will also enable the moving receptor atoms (and some extra ones) to be written to the output mol2 file. The list of these additional atoms is supplied as another helper output file, viz., MOL2EXTRAATOMS.idx. This file is important when trying to reprocess an output mol2 file that has not only coordinates for the small molecules in it.
Second, CAMPARI supports using the name field (column 2) to highlight a target substructure (a set of covalently bonded atoms) in a molecule to derive specific information from. The reference molecule defines the substructure if there are atom names including an underscore ("_") character. The field width for the name is restricted internally to a total of 20 characters, and it is recommended to stay well clear of this limit. The part after the underscore must be an integer that is at least 1 and at most FMCSC_MOL2MAXSIZE. For example, the name of an atom could be "C89_1" instead of "C89". These integers must be unique and define the reference substructure. This substructure can then be matched by all of the input molecules in the actual input file. There are two ways of doing this: manual or automatic. In the manual way, the same set of integers is used to highlight the same or analogous substructure with the same naming convention in all subsequent molecules. This requires processing libraries, which can be inconvenient. In the automatic way, CAMPARI evokes a heuristic procedure to accomplish the same goal. This has three main advantages: first, it requires no modifications of the mol2 library; second, it allows multiple matched substructures in the same molecule to be processed; third, it is much less error-prone and also tunable. The automatic procedure is described briefly for keyword FMCSC_MOL2ASSIMILAR. Currently, a match of substructures is used twofold. First, a molecule for which at least 3 atoms are matched is rotated and translated such that the matched structures align. This happens directly after the parametrization and before any samplers are executed. Second, if keyword FMCSC_MOL2DRESTMODE is 1 and the distance restraint potential is in use, CAMPARI sets up position restraints for the substructure atoms to the reference position. With these two elements, it is possible to perform anchor-based virtual screening (procedures known as ALTA or tethered docking) based on a suitably modified library file (only the otherwise unused atom name field is relevant for this). Note that it is not generally possible for the samplers in CAMPARI to preserve the substructure position as absolute constraints unless the substructure happens to coincide with the base of motion. The soft restraints also have the advantage that they enable some leeway for reorientation of the anchor if needed. If excluded volume interactions are present, the superposition can initially give rise to large forces, which may have unwanted consequences: in Monte Carlo, rigid-body moves may cause an immediate and large rearrangement or even displacement of the anchored conformation. Gradient-based calculations may be unstable (keywords TMD_RELAX, THRESHOLD_TEMP, and THRESHOLD_INCR might all help and/or avoid uncontrolled crashes in favor of skipping the offending molecules). The position restraints are in any case helpful to find a meaningful compromise between the anchoring and steric exclusion.
Third, if FMCSC_MOL2DRESTMODE is 2, the initial pose of the reference molecule defines a rectangular cuboid to which the screened molecules are directed with suitable position restraints (see for additional information on half-sided harmonic restraints). This is primarily useful for molecules of smaller or comparable size as the reference molecule. Larger molecules run the risk of experiencing an unrealistic compaction drive to satisfy the restraints, and in this case using a positive value for FMCSC_MOL2DRESTBUF might become necessary to avoid such artifacts.
Section 2: Files altering the system
(back to top)
Particle number fluctuation file:
Keyword: FMCSC_PARTICLEFLUCFILERequired for: Use of a simulation ensemble with fluctuating particle numbers, i.e., cases where FMCSC_ENSEMBLE is set to 5 or 6 (grand canonical or (pseudo-)semi-grand ensemble).
First line: One integer, X, that specifies the number of particle types whose numbers can fluctuate
Each of the next X lines contains three numbers separated by whitespace:
- An integer specifying the ID of a particle type whose numbers will be allowed to fluctuate. The type ID starts from 1 and follows the order of the sequence file. See below for an example.
- An integer specifying the initial number of particles of this type to place in the system. If the implementation mode is chosen as the default, this initial number is also taken to be the expected (bulk) number of particles of this type.
- A real number specifying either the excess (FMCSC_GRANDMODE is 2) or the absolute chemical potential (FMCSC_GRANDMODE is 1) for this particle type in kcal/mol.
Note on specifying ghost particles in the sequence file:
CAMPARI, like most molecular simulation packages, was initially designed and implemented with the assumption that particle numbers in a simulated system always remain constant. To enable the concept of fluctuating particle numbers, a requirement for grand canonical simulations, we employ a mechanism of "ghost particles": the lengths of internal arrays that track particles are still constant, but now, each particle has a flag indicating whether it is "present" or "ghosted". "Ghosted" particles do not interact with each other or with "present" particles, but retain their intra-particle interactions and their interactions with the implicit solvent. The grand canonical Monte Carlo moves that insert or delete particles from the system are implemented as operations that change the present/ghosted status of a particle. The trajectories are output in such a way that ghosted particles are translated in the z-direction away from the "real" box allowing both clean visualization and subsequent parsing for external analysis.
The consequence is that the end user must include sufficient ghost particles in the sequence input file. For every particle type that can fluctuate, the end user must specify enough copies of that particle such that the number present in the system never exceeds the number in the sequence file. Unfortunately, one does not necessarily know the maximum number that will be present in advance, and including a huge number of ghost particles to be on the safe side may considerably reduce computational efficiency because 1) ghosted particles still require evaluation of purely intramolecular energies, and 2) Monte Carlo moves will be expended on ghosted particles (this is how inserted particles have their degrees of freedom randomized). Therefore, some trial and error will likely be necessary to determine the appropriate number of ghost particles to specify.
Example: Suppose the sequence input file is
ACE ALA ALA ALA ALA NME URE URE URE URE URE URE URE URE URE URE URE URE END
and the particle number fluctuation file is (with FMCSC_GRANDMODE being 2)
1The particle number fluctuation file specifies that there is 1 (the integer on the first line) type of particle whose numbers shall be allowed to fluctuate. The type ID is 2, which corresponds to urea (the Ac-(Ala)4-Nme polypeptide has type ID = 1). Initially, five urea molecules will be present in the simulation box, and the remaining seven will be ghosted. This sets a bulk concentration that can be computed from the box parameters. The excess chemical potential of urea is -0.19 kcal/mol. In a binary system such as the one specified above (urea+peptide), a preliminary simulation of the single-component system (urea) should have yielded the value provided for the excess chemical potential by showing it to be consistent with the expected bulk concentration. Note that if the number of urea molecules present reaches twelve, and another grand canonical insertion move is accepted, the simulation becomes invalid. Such an event would be reported to log-output, but will not cause the simulation to terminate.
2 5 -0.19
Constraint input file:
Keyword: FMCSC_FRZFILEThis input file allows selected degrees of freedom to be constrained. Here, constraints can be applied only to variables that are explicit degrees of freedom of the system. These are then explicitly excluded from the lists constituting eligible entities for Monte Caro moves, and are not integrated in gradient-based methods in internal coordinate space. This is notably distinct and conceptually much simpler than the inclusion of indirect geometric constraints that depend on more than one native degree of freedom (via Lagrange multipliers). The latter, theoretically more general approach to constraints is currently available only in Cartesian dynamics runs (→ SHAKESET, e.g., also with file-based option).
For simple, explicit constraints supplied via FMCSC_FRZFILE, the first letter in the file indicates the mode as explained below. Importantly, these modes are currently mutually exclusive, i.e., it is not possible to combine blocks using different modes. In detail, the possible options are:
'T' or 't':
(molecule type-based constraints):
Generally the input is of the type:
-----------------
i XXX
-----------------
Here, i denotes the number of the molecule type (as defined by sequence input), and XXX is one of ALL, RBC, INT, FYO, CHI, PUC, or OTH.
ALL: Constrain all degrees of freedom for all molecules of that type
RBC: Constrain only rigid-body degrees of freedom for all molecules of that type
INT: Constrain only internal (torsional) degrees of freedom for all molecules of that type
FYO: Constrain only backbone degrees of freedom for all molecules of that type
CHI: Constrain only sidechain degrees of freedom for all molecules of that type
PUC: Constrain pucker state for all relevant residues in all molecules of that type (only relevant in MC)
OTH: Constrain only dihedral angle degrees of freedom that are not native to CAMPARI, e.g., those in unsupported residues
Molecule type constraints are the most compact and least flexible. There is no way to apply fine-grained constraints, i.e., every request always affects all degrees of freedom of this type in the molecule type in question.
'M' or 'm':
This is identical to input mode 'T' only that a molecule number is given instead of a molecule type number. It thus allows fine-graining for different molecules of the same type.
'R' or 'r':
This is largely identical to input mode 'T' only that a residue number is given instead of a molecule type number. Note that when ALL or RBC is selected in this mode, the rigid-body degrees of freedom for the whole molecule, to which the selected residue belongs, will be constrained. The residue-based input mode allows fine-graining up to individual residues but not further, i.e., it is, for example, not possible to constrain selected (and not all) degrees of freedom in the same sidechain.
Depending on whether the run is a pure Monte Carlo simulation, uses a gradient-based technique, or a hybrid scheme, the interpretation of these requests may differ slightly. In Monte Carlo, the constraint implies that the all relevant degrees of freedom within the specified molecules or residues are removed from all sampling lists associated with active move types that could potentially perturb the degrees of freedom in question. This means that, for example, the "FYO" request on a polypeptide residue could remove the residue from several sampling lists including omega moves, standard phi/psi moves and all types of polypeptide concerted rotation moves. For the latter in particular, this may render many other residues ineligible for these samplers. Conversely, requesting "CHI" for a lysine residue while sidechain moves and OTHER moves on native CAMPARI degrees of freedom are inactive would be redundant since the chosen move set automatically constrains these degrees of freedom.
In internal coordinate space dynamics runs, the interpretation is simpler and a constraint request will lead to the selected degrees of freedom no longer being included in the numerical integration scheme.
The only way to control degrees of freedom individually even in the same residue is the final input mode. These differs slightly in functionality: for example, constraints are transfered to MC move sets after preferential sampling weights have been aplied.
'A or 'a':
(Z matrix line-based constraints):
Input is of the type:
-----------------
i *
-----------------
Here, i denotes an atomic index (Z matrix line) that should correspond either to an atom defining a rotatable dihedral angle or to the first atom in a molecule. For the former, just the selected degree of freedom is constrained in internal coordinate space dynamics if "*" is "INT". If the run is a hybrid run also featuring Monte Carlo segments or a pure MC run, this degree of freedom will be ignored by most simple MC moves. Consider for example a sidechain pivot move: if the residue has 3 χ-angles and one of them is constrained using the appropriate "INT" request, the sidechain pivot move will respect this and ignore the degree of freedom in question. If all 3 χ-angles are constrained using such requests, the residue will be eliminated from the candidate list completely. Note that the picking frequencies are not affected adjusted beyond that. For single dihedral angle pivot moves it is even simpler. Importantly, pucker moves (of any type) and concerted rotation moves (also of any type) are not affected by these constraints at all. For the former, full flexibility is available through the preferential sampling utility. For the latter, it is currently a limitation that these requests are ignored (a warning is produced). An error is produced if the application of such constraints depletes a move type entirely. The correct atomic indices can be obtained most easily by turning on the corresponding report: FMCSC_TMDREPORT. It is theoretically possible to also infer it by consulting the documentation on sequence input above and the reference Z matrix and pdb file(s) written by CAMPARI.
Conversely, if i is the first atom in a molecule, CAMPARI expects one of 14 alternative strings: XYZ, XY, XZ, YZ, X, Y, Z, ABC, AB, AC, BC, A, B, or C. The letters X, Y, and Z represent translational degrees of freedom of the center of mass of that molecule along the 3 cardinal axes. The letters A, B, and C represent the rotational degrees of freedom around these same axes. Because strings are checked generously, users are advised to keep the formatting of the file "clean," which means only using numbers and one of the 15 permitted strings listed above. Rotational degrees of freedom are only available for polyatomic molecules of course. These options provide maximally fine-grained control over the rigid-body degrees of freedom. Note that it is of course rarely useful to eliminate a subset of translational or rotational axes. This is mostly for debugging (possible real applications include the emulation of 2D or even 1D systems similar to the Cartesian case below). Any MC moves changing rigid-body coordinates will respect these requests as best as possible (see FMCSC_RIGIDFREQ and associated keywords). For rigid-body moves involving more than one molecule, it is possible that constraints create empty moves, i.e., moves that are treated like legitimate MC moves but do not perturb anything. This is grossly inefficient and should be avoided at all cost.
Finally, for runs in Cartesian space (dynamics only), there is an analogous set of constraints available, i.e., constraints preventing explicitly the integration of one ore more positional coordinates in one or several atoms. The mode is again atom-based (note that this mode is mutually exclusive with the Z matrix line-based constraints above):
'A or 'a':
(explicit Cartesian constraints):
Input is of the type:
-----------------
i *
-----------------
Here, i denotes an atomic index, and "*" is either XYZ, XY, XZ, YZ, X, Y, or Z. In any of the options, all listed elements of the position vectors are constrained. As is true in general, constraints can be cumulative. Full positional constraints of this type can be used to construct crude boundary conditions for condensed phase simulations. More exotic applications would be the inclusion of fixed surfaces in the system or carrying out a simulation in 2D.
It is very important to note that these constraints are mutually incompatible with standard geometric constraints. This means that no atom constrained via FMCSC_FRZFILE must occur in any holonomic constraint group. Note as well that this option is restricted to Cartesian dynamics, which implies that hybrid runs in mixed Cartesian/internal coordinate space are most likely even less of a good idea than in general if this type of constraint is present.
Examples:
----------
M
1 CHI
3 RBC
----------
If the run has an internal coordinate space component, this input file would constrain sidechain angles in molecule #1 and the rigid-body movement of molecule #3. Note that the program will ignore most meaningless requests but will complain if all degrees of freedom of a certain type are eliminated through constraints. These move set sanity checks naturally have no effect in pure dynamics calculations.
----------
A
144 XY
145 XY
----------
If this run has a Cartesian space dynamics component, this input file would constrain the x- and y-coordinates of atoms 144 and 145. For this request to be legal, these two atoms must not participate in any holonomic constraint group (CAMPARI will exit otherwise). Conversely, if this is an internal coordinate space dynamics runs, CAMPARI would eliminate translations along x- and y-axes of those two molecules whose first atoms have index 144 and 145, respectively. If those atoms do not correspond to the first atoms in molecules, these constraint requests would simply be ignored.
Holonomic constraints input file for Cartesian dynamics:
Keyword: FMCSC_SHAKEFILERequired for: Custom constraints on internal distances in Cartesian dynamics → FMCSC_SHAKESET
This is a relatively simple input file used to manually determine which interatomic distances to constrain during a Cartesian dynamics simulation. Canonical selections are available via keyword input (→ FMCSC_SHAKESET), but those may sometimes not fit the user's needs. The list available through this input file is composed of distances only (no explicit angular constraints are currently supported via the file-based input method). CAMPARI will parse the list to identify groups of coupled constraints to be enforced by an appropriate FMCSC_SHAKEMETHOD. The actual values the distances should be constrained is not provided through this input file (options are available via keyword FMCSC_SHAKEFROM).
There are two modes as follows:
Mode 'A': Two numbers per line after the first line ('T'), where the numbers specify the index of two atoms whose distance is to be constrained. Note that it is generally permissible for these atoms to be part of different molecules or to be separated by many bonds within a single molecule. Note that user-selected, geometric constraints can become a problem if a) the coupling between constraints becomes intractable (a general problem also observed for bond angle constraints → FMCSC_SHAKESET is 4 or 5), or ii) the simulation is a hybrid run in mixed Cartesian/internal coordinate space and the Monte Carlo move set is able to alter the constrained coordinate. For the latter case, CAMPARI will print a preemptive warning (note that the vast majority of default constraints selectable via FMCSC_SHAKESET are not accessible to the various MC move types). It is an important restriction that it should not be possible to constrain the atoms defining the constraint group more than toward complete internal rigidity. Everything else would be redundant constraints. These explicitly or implicitly redundant entries are to be avoided at all cost (they occur by accidentally and explicitly providing the same line twice or by overconstraining a subset of atoms locally). CAMPARI will not detect those and assume - from an ensemble point of view - there to be more constraints than actually present potentially leading to temperature artifacts.
Mode 'T': This works analogously to the previous mode only that intramolecular entries have to be provided for the atom indices for the first molecule of a given type only. CAMPARI will then replicate such an entry for all molecules of the same type. Use FMCSC_SEQREPORT to get an overview of the molecule types in the system. This is a helpful simplification of the purely index-based mode if simulations with custom constraints are attempted in the presence of a bath of solute molecules which all need to be identically constrained. There is no support for intermolecular constraints in this mode unless there is exactly one molecule of each participating type.
Note that entries belonging to ineligible indices will be ignored or may cause a fatal termination of CAMPARI depending on what exactly the issue is. The correct maintaining of holonomic constraints can be analyzed by studying trajectory output with suitable software or - in most cases at least - by consulting output file INTHISTS_BL.dat.
Preferential sampling weights input file:
Keyword: FMCSC_PSWFILEIf the simulation is a Monte Carlo simulation in torsional space (see FMCSC_DYNAMICS and FMCSC_CARTINT), this keyword allows the user to alter the default picking probabilities for entities eligible to a given move type. This function can be overlapping with the constraint requests described elsewhere. The file is a sequence of blocks with each block indicated by a starting letter (separate line):
'T' or 't':
(molecule type-based adjustments for particle insertion/deletion moves):
In the simplest case, the input is of the type:
-----------------
i w1
-----------------
Here, i denotes the number of the molecule type (as defined by sequence input), and w1 is the target sampling weight (floating point number) for this molecule type in particle insertion and deletion moves. Weights are set such that any entries not listed explicitly in this file have a weight of 1.0 (normalization happens after processing all entries). Therefore, it is redundant to choose w1 as unity. A value of zero is possible, but hard to justify in this case given that the molecule types that are allowed to fluctuate are set via user input as well. Note that the weights for particle permutation moves in the semi-grand ensemble can not be altered.
'M' or 'm':
(molecule-based adjustments for rigid-body moves):
For rigid-body moves it is permissible to control picking probabilities on a per-molecule level. Therefore, input is of the type:
-----------------
i w1 w2 w3 w4 w5
-----------------
Here, i denotes the number of the molecule (as defined by sequence input), w1 is the target sampling weight for single molecule rigid-body moves (coupled rotation and translation), w2 is the target sampling weight for rigid translation moves, w3 is the target sampling weight for rigid rotation moves (see elsewhere), w4 is the target sampling weight for cluster rigid body moves that change translational degrees of freedom, and w5 is the target sampling weight for cluster rigid-body moves that do not change translational degrees of freedom (see elsewhere for details). Note that either w1 or both w2 and w3 are relevant (but not all together). It may result in undefined or undesired behavior to introduce zero weights for unused move classes. The weights w4 and w5 should not be such that picking a sufficient set of unique molecules becomes unlikely with a simple random picking with replacement. This is related to the choice for parameter FMCSC_CLURBMAX. For example, if there are 4 eligible molecules, FMCSC_CLURBMAX is 3, but 2/4 molecules have tiny sampling weights, this leads to unnecessary inefficiency.
Note that the acceptance rate of rigid body moves is intrinsically heterogeneous whenever molecules of different size, association likelihood, etc., are considered. It can therefore make sense to control these preferential sampling weights separately, for example for macromolecular assemblies in the presence of small molecules.
'A' or 'a':
(degree of freedom-based adjustments for certain dihedral angle pivot moves):
Specifically, for the various branches of single dihedral angle pivot moves ( OTHER moves) that are meant to primarily operate on those dihedral angles that cannot be sampled by the specialized move sets within CAMPARI (→ FMCSC_OTHERUNKFREQ and FMCSC_OTHERNATFREQ), CAMPARI allows adjustment of the picking frequencies at the level of individual degrees of freedom. Input is as follows:
-----------------
i w1
-----------------
Here, i denotes the atom number corresponding to the Z-matrix line associated with the dihedral angle for which the sampling weight should be changed. Since every dihedral angle can only occur in at most one of the three subcategories sampled by OTHER moves (viz., degrees of freedom in residues that are not natively supported; degrees of freedom in native CAMPARI residues that cannot be sampled by any of the other move types; degrees of freedom that can be sampled by other CAMPARI move types), only a single weight (w1) is sufficient. Note that this means that the sets of weights supplied in a section of this type may not all act on the same list of degrees of freedom.
'R' or 'r':
(residue-based adjustments for all other moves):
For all other moves, CAMPARI allows adjustment of the picking frequencies at the residue level. Input is as follows:
-----------------
i w1 w2 w3 w4 w5 w6 w7 w8 w9 w10 w11
-----------------
Here, i denotes the number of the residue (as defined by sequence input). The wi weights refer to polypeptide φ/ψ-pivot moves (w1), polypeptide ω-pivot moves (w2), sidechain moves (w3), polynucleotide (and other polymer) backbone pivot moves (w4), polypeptide pucker backbone moves (w5), sugar pucker backbone moves (w6), polypeptide Dinner-Ulmschneider exact concerted rotation moves without ω-sampling (w7), polypeptide Dinner-Ulmschneider exact concerted rotation moves with ω-sampling (w8), polynucleotide Dinner-Ulmschneider exact concerted rotation moves (w9), polypeptide Ulmschneider-Jorgensen exact bond angle-based concerted rotation moves (w10), and polypeptide Favrin et al. inexact concerted rotation moves (w11). This is tricky for moves sampling stretches of residues, and the complications are discussed elsewhere.
Before presenting an example, a few remarks are in order. First, it is permissible to truncate each line after w1, if there is no need to alter the picking probabilities for move types controlled by any of the omitted parameters. It is, however, not possible to alter - for example - the picking weight for a specific residue for sugar pucker moves without also specifying w1-w5.
In general, entries that request weight alterations for ineligible entities produce a warning. As mentioned above, missing entries are all assumed to be 1.0, which allows the file to remain small in most cases. Weights must be zero or positive numbers, otherwise the behavior may be undefined. Note that it is not possible to utilize the molecule type- or molecule-based modes to control picking frequencies for move types listed in the residue-based section (also holds for any other combination). Furthermore, CAMPARI does not tolerate if sampling weights deplete all entities for a given move type and will exit with an error in such a case.
Example:
----------
R
12 1.0 1.0 10.0
M
1 1.0 10.0
31 0.1
R
14 1.0 1.0 1.0 5.0 1.0 5.0 1.0 1.0 5.0
A
71 3.0
----------This input file would lead to the following changes to the default picking probabilities (in each case, the alteration is contingent upon the presence of the corresponding move type in the move set and the eligibility/presence of the chosen entity):
- Residue #12 would be preferentially picked for sidechain moves (factor 10 to all other eligible residues).
- Molecule #1 would be preferentially picked for cluster rigid body moves (factor 10 to all other eligible molecules).
- Molecule #31 would be less frequently picked for single molecule rigid body moves (factor 0.1 to all other eligible molecules).
- Residue #14 would be preferentially sampled in all nucleotide-specific moves (pivot, puckering, and concerted rotation). For pivot and puckering, this is a factor of 5.0 to all other residues. For concerted rotation, this is a factor of 5.0 for residue #14 being the 3'-terminal residue in a polynucleotide concerted rotation stretch compared to all other eligible 3'-terminal residues in a stretch.
- The dihedral angle in Z-matrix entry #71 (corresponds with atom number) would be preferentially sampled in whatever subcategory of OTHER moves this degree of freedom belongs to. The factor of 3.0 is with respect to all other degrees of freedom in the same subcategory.
Biotype patch input file:
Keyword: FMCSC_BIOTYPEPATCHFILEThe highest level patch that can be applied in CAMPARI is to change the biotypes specific atoms are assigned to. In the parameter file, biotypes are the key assignment for every possible atom in residues natively supported by CAMPARI, because they indirectly set all other, atom-specific parameter assignments handled by the parameter file, i.e., (LJ) atom types, charge types, and bonded types. Atom types in turn determine other parameters such as mass or valency. This means that (re)assigning the biotype of a given atom changes all those properties at once.
There are atom-specific parameters that cannot be changed by a biotype patch, viz., certain parameters of the solvation model (see below). Similarly, parameters that are based on groups of atoms are unaffected by biotype patches (see free energy of solvation parameters in the parameter file and corresponding patches). Biotype patches are processed before any other parameter patch is considered. This means that it is possible to override aspects of a biotype patch by subsequent application of atom type patches, mass patches, etc.
Biotype patches can be applied in two variants; the first is atom-specific, and the second applies patches consistently across all residues of identical type.
Atom-specific example:
---------------------------------------
A
113 118
114 119
115 120
---------------------------------------
This input file selects the atom-specific mode ('A'), and would (re)assign biotypes for atoms 113-115 (and only those atoms) in the system. The chosen biotypes correspond to the N, C, and CA atoms of tyrosine. Such a patch (incomplete obviously) would be appropriate for example when simulating or analyzing a system containing phosphorylated tyrosine. Then, it is often convenient to set a number of basic parameters by taking advantage of the similarity of the two entities and the corresponding overlap in parameter space. Support for residues not natively supported by CAMPARI is the main use of biotype patches. Another example is as follows:
Residue-level example:
---------------------------------------
R
1 1100
2 1101
3 1102
4 1103
5 1104
6 1105
---------------------------------------
Suppose the system is a dense phase of identical small molecules with 6 atoms each. Then, this patch could be used to very efficiently provide complete parameter support for this system. For this, we assume that biotypes 1100-1105 have been created by the user, and that the relevant information for these biotypes has been provided. By virtue of selecting option 'R', this patch would be automatically propagated to all residues of identical type and identical terminus state (the latter is relevant for polymers only, i.e., the patch would (and most likely could) not be propagated from an alanine residue in the middle of a polypeptide to a C- or N-terminal alanine residue in the same or a different polypeptide). If the residue in question is not natively supported by CAMPARI, only those unsupported residues of identical name are grouped that have exactly the same number and names of atoms and that are in the same chain position (C-terminal vs. N-terminal vs. nonterminal). Two possible applications for a patch like the second example above could be simulations of an unsupported small molecule liquid or gas or the desire to keep multiple parameterizations for the same small molecule in the same parameter file and to switch between them via patch (this in general will be less error-prone and easier to maintain than creating divergent parameter files). When using the 'R' option for a biotype patch, the atom indices in use should correspond to the first residue of that type (otherwise the patch request is ignored). Keep in mind that atom names in pdb output files will also be altered by biotype patches.
Atomic mass patch input file:
Keyword: FMCSC_MPATCHFILEAtomic masses are always read and assigned to each atom in the system, and this input file allows changing the value assumed based on the assigned atom type. In general, masses are not explicit components of Monte Carlo sampling, and are not expected to change thermodynamic properties of the system provided that all other parameters remain fixed. Instead, masses influence system dynamics. An exception are potentials explicitly computing the mass, an example being spatial mass density restraints.
Example:
---------------------------------------
666 14.
1444 40.
2211 2.
---------------------------------------
This input file would change the masses for atoms 666, 1444, and 2211 to those (approximately) of nitrogen, argon, and deuterium, respectively. Chosen values are required to be at least 1.0, since massless sites require constraints for stable integration, but cannot use standard constraint solvers. The most common application for mass patches would be in improving integrator stability by artificially increasing the mass of those atoms responsible for the fastest degrees of freedom in the system, i.e., hydrogen. Another application are otherwise unsupported residues, for which the guessed atom types are inappropriate, or which contain elements that the parameter file does not even have theoretical support for.
Atomic radius patch input file:
Keyword: FMCSC_RPATCHFILESimilar to atomic masses, every atom is always assigned an atomic radius, and this input file allows changing the value assumed derived from parameters, i.e., either based on the assigned atom type or on the radius override in the parameter file. Atomic radii are generally supposed to correspond to the exclusion radius of an atom, and are elements of energy terms such as the ABSINTH implicit solvation model and all spatial density restraints, but also of solvation or void space analysis.
Example:
---------------------------------------
6 1.
17 0.1
22 2.5
---------------------------------------
This input file would change the radii for atoms 6, 17, and 22 to 1.0, 0.1, and 2.5 Å, respectively. Chosen values are required to be at least 0.0, i.e. sizeless atoms are explicitly allowed. Note that atomic radii and topology in polyatomic molecules have to be matched to one another. This is because parameters derived from atomic radii, such as maximum SAV fractions or volume reduction factors, which are also patchable, neglect overlap terms corresponding to three or more covalently bound atoms. Also note that there should be no performance gain associated with reducing the radii of atoms to zero for the ABSINTH implicit solvation model.
Group-based thermostat coupling input file:
Keyword: FMCSC_TSTAT_FILEThis is a very simple input file allowing the (obsolete) adjustment of the Berendsen weak-coupling and the Bussi et al. velocity rescaling thermostats to couple parts of the system to independent thermostats, i.e., to apply velocity adjustments based on instantaneous temperatures for the subsystem and to instantaneous velocities for just that subsystem. There is no physical justification for doing this in general as it effectively restricts energy transfer between the subsystems and can easily lead to artifactual results. It was used predominantly as a trick to circumvent subsystem freezing which can sometimes occur if one subsystem has much faster kinetics and different levels of integrator noise than others (see FMCSC_TSTAT). The input file is very simple. There are two modes, a molecule-based one (first line 'M') and a molecule type-based one (first line 'T'):
Mode 'M': One number per line after the first line ('M'), where the number specifies the temperature coupling group the molecule corresponding to the line is supposed to be part of. For a system with 100 molecules, there are 100 lines after the first one, each one containing an integer with a coupling group.
Mode 'T': One number per line after the first line ('T'), where the number specifies the temperature coupling group the molecule type corresponding to the line is supposed to be part of. As usual, molecule types are numbered intrinsically by occurrence in the sequence input file. For a system with two molecules types (say a protein in water), there would only be two lines after the first one, both containing an integer with a coupling group.
Note that coupling groups have to be positive integers, and that non-consecutive integers are re-numbered accordingly to provide a consecutive set.
Wang-Landau sampling initial guess input file:
Keyword: FMCSC_WL_GINITFILERequired for: Overriding of the flat initial guess for the (logarithmic) target distribution in a Wang-Landau run. Naturally, this is is irrelevant for any simulation not (at least partially) using the WL method.
The (logarithmic) target distribution of a Wang-Landau run is not generally expected to be flat, neither globally nor locally. This means that a flat initial guess (the default) may slow down the construction of the distribution's coarse features (especially at boundaries). In these cases, it may be helpful to supply a nonflat initial guess.
The format of this input for one-dimensional cases is simply two columns, the first giving the center of the bin in question, and the second giving the initial guess for the target distribution. Because all distributions are handled in logarithmic space, the only criterion of relevance are differences between bins and the relation of these differences to the starting value for the f parameter. The bin spacing has to be constant. Range settings provided via keywords FMCSC_WL_MAX and FMCSC_WL_BINSZ are overwritten. Note that the assessment of convergence (f parameter reductions) does not rely on the estimate of the target distribution (e.g., ln g(E)), but rather on the simple visitation histogram. This means that it may be necessary to choose a relatively large number of buffer steps with a nonflat initial guess to ensure that all relevant bins are populated before the actual iteration begins.
Example:
---------------------------------------
0.1 10.2
0.3 4.1
0.5 3.4
0.7 3.2
0.9 6.8
---------------------------------------
This would initialize the histogram to 5 bins with a spacing of 0.2 and a maximum value of 1.0. This example could correspond to a normalized reaction coordinate (such as Zα) and would give an extremely coarse result (the bin spacing will never vary throughout a WL run).
For the two-dimensional case, CAMPARI expects Nd2+1 columns per row and an additional header line. Nd2 is the number of bins in the initial guess for the second dimension (always a geometric reaction coordinate). In each non-header row, the values listed should be the bin center coordinate in the first dimension followed by the Nd2 initial estimates for the target distribution. Note that all Nd2 entries must be listed on every data row. The header line itself must contain an integer identical to Nd2 followed by the Nd2 increasing bin center coordinates in the second dimension.
Example:
-----------------------------------------------------
6 3.4 3.8 4.2 4.6 5.0 5.4
0.1 1.2 1.7 2.5 2.8 2.1 1.2
0.3 0.1 0.2 0.4 0.6 0.4 0.1
0.5 0.1 0.1 0.4 0.5 0.3 0.1
0.7 0.2 0.7 1.0 0.8 0.4 0.0
0.9 0.5 1.8 2.1 1.2 0.2 0.0
-----------------------------------------------------
This would initialize the histogram to 5x6 bins with spacings of 0.2 and 0.4 maximum values of 1.0 and 5.6. This example could correspond to a two geometric reaction coordinates, e.g., Zα and the radius of gyration for a very short peptide.
Section 3: Files redefining standard terms of the energy function
(back to top)
MPI replica exchange input file:
Keyword: FMCSC_REFILERequired for: Use of Hamiltonian (or temperature) replica-exchange methodology → FMCSC_REMC
Note that the code needs to be compiled with MPI support (see INSTALL) for this option be available. On the first line a set of integers indicates different(!) "dimensions" of the requested set of replicas. Such dimensions are environmental terms and/or parameters of the Hamiltonian which are supposed to differ between the individual replicas. The by far most common application is to specify different temperatures. However, in theory many more exchange parameters are possible and may be useful for certain applications (for examples FMCSC_FEG_IPP and similar parameters in the case of free energy calculations). In CAMPARI, the following are available in REMC/D runs:
- Temperature
- Scaling factor for repulsive inverse power potential (see FMCSC_SC_IPP)
- Scaling factor for dispersive 6th power potential (see FMCSC_SC_ATTLJ)
- Scaling factor for the Weeks-Chandler-Andersen (WCA) potential (see FMCSC_SC_WCA)
- Scaling factor for polar (Coulombic) potential (see FMCSC_SC_POLAR)
- Scaling factor for free energy of solvation term for ABSINTH model (see FMCSC_SC_IMPSOLV)
- Implicit solvent dielectric constant (see FMCSC_IMPDIEL, note that this is currently only supported as the implicit dielectric for the ABSINTH model, and does not work when interpreted as the (generalized) reaction-field dielectric)
- Scaling factor for torsional bias potential (see FMCSC_SC_TOR)
- Scaling factor for secondary structure bias potential (see FMCSC_SC_ZSEC)
- Global secondary structure bias potential: target α-content (see FMCSC_ZS_FR_A)
- Global secondary structure bias potential: target β-content (see FMCSC_ZS_FR_B)
- Screening model for implicit solvent (see FMCSC_SCRMODEL)
- Sigmoidal steepness for free energy of solvation term (τF) in ABSINTH model (see FMCSC_FOSTAU)
- Sigmoidal steepness for dielectric screening (τS) in ABSINTH model (see FMCSC_SCRTAU)
- Sigmoidal midpoint for free energy of solvation term (χF) in ABSINTH model (see FMCSC_FOSMID)
- Sigmoidal midpoint for dielectric screening (χS) in ABSINTH model (see FMCSC_SCRMID)
- Contact dielectric for screening models 3, 4, or 7-9 (see FMCSC_CONTACTDIEL and FMCSC_SCRMODEL)
- Generalized mean for screening models 5-8 (see FMCSC_ISQM and FMCSC_SCRMODEL)
- Mixing contribution from distance dependence for screening models 3, and 7-9 (see FMCSC_SCRMIX and FMCSC_SCRMODEL)
- Scaling factor for ghosted repulsive inverse power potential (see FMCSC_FEG_IPP)
- Scaling factor for ghosted dispersive 6th power potential (see FMCSC_FEG_ATTLJ)
- Scaling factor for ghosted Coulombic interactions (see FMCSC_FEG_POLAR)
- Scaling factor for tabulated potentials (see FMCSC_SC_TABUL)
- Scaling factor for polymeric biasing potential (see FMCSC_SC_POLY)
- Scaling factor for distance restraint potential (see FMCSC_SC_DREST)
- Scaling factor for ghosted bond length potentials (see FMCSC_FEG_BONDED_B)
- Scaling factor for ghosted bond angle potentials (see FMCSC_FEG_BONDED_A)
- Scaling factor for ghosted improper dihedral potentials (see FMCSC_FEG_BONDED_I)
- Scaling factor for ghosted torsional potentials (see FMCSC_FEG_BONDED_T)
- Global E-score for DSSP biasing potential (see FMCSC_DSSP_ESC)
- Global H-score for DSSP biasing potential (see FMCSC_DSSP_HSC)
- Time step for dynamics calculation → this acts as a dummy in pure Monte Carlo runs (see FMCSC_TIMESTEP)
- Scaling factor for density restraint potential (see FMCSC_SC_EMICRO)
- Threshold setting for density restraint potential (see FMCSC_EMTHRESHOLD)
Example:
-------------------
1 4
298.0 0.9
350.0 0.8
-------------------
Such an input file would set up two replicas; the first one at a temperature (code #1) of 298K and with a global scaling factor for the WCA potential of 0.9 (code #4) and the second one at a temperature of 350K with a WCA scaling factor of 0.8.
Note that the normal requests in the key-file for any dimension which is part of the replica space are going to be ignored (in the above example, putting a value for FMCSC_SC_WCA would have no effect).
Also note that setting scaling factors for energy terms to zero in this input file will be slightly different than when this is done in any other calculation. As a result, there might be no speed-up for individual replicas (this is usually not desired anyway). Most other changes will be hidden from the user and relate to the necessity to be able to calculate cross-energies for which such scaling factors might not be zero. Lastly, performance may be degraded for certain choices requiring extensive recalculation of expensive terms, e.g., the choice of screening model for the ABSINTH model or the threshold setting for the density restraint potential. In such cases, very small values for FMCSC_REFREQ may be counterproductive.
Free energy growth (ghosting) input file:
Keyword: FMCSC_FEGFILERequired for: Use of specific, energetic decoupling of parts of the system from the rest via ghosting of interactions → FMCSC_GHOST
The input file for specifically scaling interactions of certain molecules or residues with the rest of the system. This is only supported for a limited range of Hamiltonians at the moment (see FMCSC_GHOST and related keywords in KEYWORDS):
(the first line gives the mode)
'T' or 't':
The input is then simply:
----------
i 0
----------
The integer i denotes the molecule type, for which the interactions with the rest of the system are to be ghosted. The optional 0 can be used to override the scaling settings in order to decouple an individual group from the system entirely. This is tailored to rather specific applications, for instance the stepwise calculation of free energies of solvation on a residue-by-residue basis. Note that full de-coupling requires all the bonded FEG-parameters to be set to unity (see FMCSC_FEG_BONDED_B, etc). This has primarily technical reasons.
'M' or 'm':
Analogous but with molecule number instead of molecule type number
'R' or 'r':
Analogous but with residue number instead of molecule type number
Example:
----------
r
1 0
2
----------
This will ghost the first residue in the sequence entirely (i.e., it contributes nothing to the simulation). The second residue in the sequence will have scaled interactions according to the setting of the FEG-scaling parameters (see KEYWORDS, e.g. FMCSC_FEG_IPP). Always keep in mind that interactions between ghosted entities as well as within ghosted entities are computed according to the selection in FMCSC_FEG_MODE and that this may sometimes lead to unexpected or inconsistent behavior. For example, a water box full of ghosted water molecules will behave like any other water box (with absolutely no ghosting) if FMCSC_FEG_MODE is set to 1.
Bonded potential terms patch input file:
Keyword: FMCSC_BPATCHFILEPrerequisite: This file is only relevant if at least one of the bonded potential terms (see bond length potentials, bond angle potentials, improper dihedral angle potentials, torsional potentials, and CMAP potentials) is in use.
This file can be employed to override default assignments for bonded interactions or to add bonded interactions on internal coordinates that were previously not assigned a bonded interaction. A fundamental limitation is that such new bonded interactions can only be applied to internal coordinates that could have been assigned a term by default, e.g., it is not possible to apply a bond length potential to atoms a-b if a is not in fact covalently bound to b.
Default assignments are made based on the parameter file (see PARAMETERS and dedicated documentation), and the user can force CAMPARI to print those to log-output via FMCSC_BONDREPORT. In the patch file, each line has to contain as its first entry a keyword that is either "PATCH_BOND", "PATCH_ANGLE", "PATCH_IMPROPER", "PATCH_TORSION", or "PATCH_CMAP". Take glycine dipeptide (acetyl and N-methylamide caps) in united-atom representation as an example. This molecule has the CH3 and C atoms of the acetyl group, N, Cα, and C of glycine, and the N and CH3 atoms of the C-terminal capping group as indices 1, 2, 4, 5, 6, 9, and 10, respectively.
Example:
---------------------------------------
PATCH_BOND 1 2 23
PATCH_ANGLE 4 5 6 11
PATCH_IMPROPER 4 2 5 8 44
PATCH_TORSION 5 6 9 10 112
PATCH_CMAP 9 6 5 4 2 3
---------------------------------------
The first line of this input file would replace the existing bond length potential between the CH3 and C atoms of the acetyl group with the one in the parameter file in use that has index 23. If no potential was previously assigned (fatal in Cartesian dynamics), it would be added by this patch. If no potential with index 23 is found in the parameter file, this line of the patch file is ignored (and a warning is produced). It is important to note that the patch file exclusively handles the assignment, but not the parameters themselves. The reason is that appending the list of bonded potentials in the parameter file allows reusing an entry multiple times. If the parameters themselves were also part of the patch file, they would have to be repeated over and over again in such a case, which would increase the amount of work and the likelihood of errors creeping in.
Similarly, the second line would change (or add) the bond angle potential acting across the N→Cα→C angle in glycine to the one with index 11 in the parameter file in use. The same notes and caveats as before apply. The third line would alter the assignment for the improper dihedral angle formed by the (central) N of glycine along with the Cα and polar hydrogen (index 8) atoms of glycine, and the carbonyl carbon of the acetyl group. The potential type would be the torsional potential with index 44. The fourth line would assign the 112th torsional potential in the parameter file to the ω-torsion of the peptide bond connecting glycine and the C-terminal N-methylamide. Lastly, the fifth line would change (or add) a CMAP potential acting on the consecutive ψ- and φ-angles of glycine. Note that the atom order is reversed compared to a standard CMAP assignment as found in the CHARMM force field.
A few more comments are in order. First, redundant permutations in the lists of atoms are understood by CAMPARI (see here for definitions of what redundant permutations are for the individual cases). Second, CAMPARI will only process patches that relate to an energy term that is actually turned on. For instance, if in the above example FMCSC_SC_BONDED_M were zero, the fifth line would not be processed at all. Third, for processed patches that fail due to the atoms not forming an internal coordinate of the specified type and/or due to the potential choice being out of range, a warning is produced. Similarly, every successful patch will be summarized to log-output as well. Fourth, patch keywords that CAMPARI fails to understand will cause a fatal exit of the program. Fifth, multiple redefinitions of the same internal coordinate are considered according to input order, i.e., only the last definition will end up being relevant. This implies that is not possible to "stack" multiple potentials on an individual internal coordinate. Sixth and last, atom numbering will always correspond to CAMPARI's internal order, which can usually be extracted from pdb files produced at the beginning or end of a run.
Size exclusion and dispersion parameters patch input file:
Keyword: FMCSC_LJPATCHFILEPrerequisite: This file is always read, but the assignments are only relevant if one of the short-range interaction potentials is active (IPP, ATTLJ, or WCA) or any functionality or potential depending on derived parameters (such as atomic radii or volumes used, for example in the ABSINTH implicit solvation model) is in use.
This file can be employed to override the Lennard-Jones σ and ε parameters by changing the default atom type assignment for a specific instance of a biotype. Each line has to contain two integers, the first giving the atom number (CAMPARI-internal numbering), and the second the reassigned atom type number in reference to the parameter file in use.
Example:
---------------------------------------
15 11
1443 1
---------------------------------------
This input file would attempt to reassign atom 15 to atom type 11 and atom 1443 to atom type 1. Note that the success depends on the existence of the appropriate number of atom types in the parameter file. As long as they are continuously numbered and all parameters are provided, it is feasible to create arbitrary amounts of atom types in the parameter file with the limiting case that - theoretically - every actual atom in the system could be assigned unique parameters with unique pairwise and 14-exceptions through the use of this patch facility.
An important caveat lies in the fact that masses, atomic numbers, valencies, and other quantities specified in the atom lines of the parameter file are not altered. This is particularly relevant when dealing with unsupported residues (see sequence input and trajectory analysis), since CAMPARI will make an automatic guess for a biotype assignment, from which the default assignment is gleaned. If this yields unsatisfactory results regarding mass or valency, this patch file is not helpful. Instead, the input pdb should be altered to aid CAMPARI in guessing an appropriate type. As usual, the index in the first column corresponds strictly to the internal numbering used by CAMPARI, which can, for instance, be read out from the pdb file produced at the beginning of the simulation (→ here) unless nucleotides are part of the system and are requested to conform to pdb convention (→ FMCSC_PDB_NUCMODE).
Charge patch input file:
Keyword: FMCSC_CPATCHFILEPrerequisite: This file is only relevant if the polar potential is used.
This file can be employed to override partial charge assignment extracted from the parameter file (see PARAMETERS and dedicated documentation). Each line has to contain two entries, an integer giving the atom number and a floating point number giving the new partial charge.
Example:
---------------------------------------
33 -1.76
34 0.88
35 0.88
---------------------------------------
This input file would change the atomic partial charges on atoms 34 and 35 to 0.88e each, and that on atom 33 to -1.76e. Note that it is up to the user to ensure that the resultant charges are sane and self-consistent, although CAMPARI will perform its usual tests and provide warnings/exits in case they are not (keywords FMCSC_ELECREPORT and FMCSC_UNSAFE may become important or useful here). Also note that the numbering, which the index in the first column refers to, is strictly the internal numbering used by CAMPARI. This can generally be read out from the pdb file produced at the beginning of the simulation (→ here) unless nucleotides are part of the system and are requested to conform to pdb convention (→ FMCSC_PDB_NUCMODE).
This functionality is merely thought to provide any corrections to partial charges which would break the identity of biotypes set up in the parameter files and maintained within CAMPARI itself. The most common example of this would be partial charges for terminal polypeptide residues in the AMBER-class of force fields.
Residue-level net charge flag patch input file:
Keyword: FMCSC_NCPATCHFILEPrerequisite: This file is only relevant if the polar potential is used. Additional requirements are that the electrostatic interaction model is group-based and/or that the cutoff treatment for long-range Coulombic interactions is such that LREL_MC is 1, 2, or 3 and/or that LREL_MD is either 4 or 5.
This file can be employed for two purposes. First, it can disable the flag CAMPARI assigns to a residue that has at least one charge group in it with a total charge exceeding a tolerance setting in absolute magnitude. It is currently impossible to turn the residue-level flag on (since it would not in general be clear which charge group to do this for and why). Second, it can aid the automatic determination of charge groups by defining a series of target values (presumably different from 0.0). Charge groups are generally determined by an algorithm that finds sets of atoms of a desired (integer or zero) net charge, which are also topologically close, meaning that they do not isolate other polar atoms (topologically speaking). For built-in residues, CAMPARI has a specific understanding of what to expect (e.g., it would look for two charge groups with net charge of 1.0 along with any number of neutral groups in an N-terminal lysine residue). However, this understanding may not always be appropriate, and no such understanding is available for unsupported entities. This is the primary purpose of patching the charge group targets.
Each line has to contain two or more entries. The first is an integer giving the residue number, the second is either 0 or 1 (with only a value of 0 specified for a residue that contains charge groups matching the criteria outlines above being relevant), and a number of floating point values.
Example:
---------------------------------------
3 1 -1.0 -1.0 -1.0 0.0
7 0
---------------------------------------
This input file would flag the 7th residue in the system as no longer carrying a net charge irrespective of the partial charges on its atoms and irrespective of how they were grouped into charge groups. For the 3rd residue, it would instruct CAMPARI to look for 3 separate charge groups with net charges of 1.0 each, e.g., residue 3 could be an unsupported small molecule such as fully deprotonated citric acid (3-). The 0.0 specified at the end is optional, since the program will always look for as many net neutral groups as possible while also processing the nonneutral groups. Note that the sum of target values must match exactly the total charge of the residue in question. If any given target cannot be reached by any acceptable grouping (there are restrictions as mentioned above in addition to limitations of the search size in terms of numbers of bonds), the entry in question will be ignored or lead to an error. It is therefore possible to exercise a reasonable amount of control over the charge group parsing, but it is likely that some trial-and-error is involved.
As alluded to before, the main application domain for changing the charge flag via this patch (residue 7 in the example above) would be the suppression of the computation of long-range interactions between residues flagged as charged simply because their partial charges do not group cleanly into neutral groups. Specifically, polarization across neighboring residues in a polymer without any formal charges would generally flag both residues as charged. This is because in such a case a net neutral group is split arbitrarily because of the requirement for charge groups to be confined to residues (common with charges fit from QM data without constraints). The main application for the second functionality (residue 3) is to deal with unsupported residues, correctly reflect manual charge patches or altered parameters, to deal with unusual species such as zwitterions, or to achieve an approximate grouping for charges that do not group exactly (which can be highly relevant for certain screening models within the ABSINTH framework).
Input file for mapping Sybyl (mol2) atom types:
Keyword: FMCSC_SYBYLLJMAPRequired for: Overriding the automatically determined map of Sybyl (mol2) atom types to atom types in the atom type definitions in the parameter file. This is currently only relevant in a small molecule screen.
This input file contains an integer map of 40 Sybyl atom types to atom types in the parameter file. The file format is very simple - one integer per line. The types are as follows:
- standard sp2 or lower C (C.2 or C.1 or C.cat)
- standard sp3 C (C.3 or generic)
- standard sp2 N (N.2, N.am, N.pl3, or generic)
- standard sp3 N(+) (N.3 or N.4)
- ring-based sp2 N or sp1 N (N.ar or N.1)
- sp3 O (any oxygen that is not O.2, mostly O.3)
- sp2 O (O.2)
- apolar H (H)
- polar H (H, not Sybyl-derived but identified by partial charge)
- standard sp3 S (S.3 or any sulfur with two substituents)
- standard sp2 S (S.2 or sulfide anion)
- sulfur with 4 substituents (S.O2 or similar)
- any phosphorous with at least 4 substituents (P.3)
- covalently bound F (F)
- covalently bound Cl (Cl or Hal)
- covalently bound Br (Br)
- covalently bound I (I)
- aluminium (Al)
- copper (Cu)
- magnesium (Mg)
- calcium (Ca)
- iron (Fe)
- manganese (Mn)
- cobalt (Co)
- chromium (Cr)
- zinc (Zn)
- silicon (Si)
- arsenic (As)
- selenium (Se)
- lithium (Li)
- potassium (K)
- sodium (Na)
- sulfur with 3 substituents (S.O or similar)
- phosphorous with fewer than 4 substituents (not a Sybyl type)
- aromatic sp2 C (C.ar)
- any lone pair (massless and sizeless dummy particle) (LP, LPH)
- oxygen in TIP3P water (O.t3p)
- hydrogen in TIP3P water (H.t3p)
- oxygen in SPC water (O.spc)
- hydrogen in SPC water (H.spc)
Atomic solvation parameters patch input file:
Keyword: FMCSC_SAVPATCHFILE and FMCSC_ASRPATCHFILEPrerequisite: There are two atomic parameters that can be patched from their default values, which are primarily relevant in calculations using the ABSINTH implicit solvation model. Both use an identical file format and identical restrictions on the possible values (interval from 0.0 to 1.0), and are therefore described together.
Specifically, the maximum value for the fraction of the solvent-accessible volume (ηi,max) and the atomic volume reduction factor used in most computations based on atomic volumes can be altered. Both quantities are derived from default, hard-coded molecular topologies and not from input structures (unless an unsupported residue is in use, where the former is of course missing). They are both reported in output file SAV_BY_ATOM.dat as columns 4 and 7, respectively. A patch file supplied via FMCSC_SAVPATCHFILE will alter the ηi,max values, and a patch file supplied via FMCSC_ASRPATCHFILE will alter the volume reduction factors. In either case, each line has to contain two entries, an integer giving the atom number and a floating point number giving the new parameter constrained to the interval [0:1].
Example for FMCSC_SAVPATCHFILE:
---------------------------------------
16 0.86
4 0.92
22 0.99
---------------------------------------
This input file would change the values ηi,max for atoms 4, 16, and 22 to 0.92, 0.86, and 0.99, respectively. CAMPARI will not perform any noteworthy tests whether the patched values are meaningful. In this context, it is important to note that very small values for the ηi,max may lead to large gradients due to the compression of the interpolation regime between solvated and desolvated states. Aside from printing a report for all successful changes, the results of the patch can also be assessed by inspecting the aforementioned SAV_BY_ATOM.dat (assuming SAVCALC is appropriate).
In general, neither class of parameters is considered free, so patches of this type would seem most likely to be useful when implementing Hamiltonian support for otherwise unsupported residues. It may also come in useful for cases, where either the atomic size parameters and/or the molecular topology mean that the pairwise volume reduction factors are inappropriate (better approximation required and/or triple overlaps significant), although it should be mentioned that the radius functionality of the parameter file and radius patches for specific atoms are probably the more appropriate tools in such a case.
Free energy of solvation patch input file:
Keyword: FMCSC_FOSPATCHFILEPrerequisite: This file is only relevant if the direct mean file interaction (DMFI) of the ABSINTH implicit solvation model is used (→ FMCSC_SC_IMPSOLV).
This file provides the input facility to specifically override the details of the DMFI model, i.e., the parsing of the molecule into solvation groups (by atoms), the weight factors for individual atoms constituting solvation groups, and the reference free energies of solvation themselves. Depending on the choice for FMCSC_FOSMODE, it may also be relevant to supply patched values for solvation enthalpies and heat capacities. For global changes of the latter type, it should be pointed out that changes directly in the parameter file are simpler (see PARAMETERS and dedicated documentation). Each line has to contain between three and six entries. The first is an integer giving the atom number using the default CAMPARI atom order (that can usually be extracted from basic output files → see above). The second is another integer identifying by a code the target solvation group. The chosen code is arbitrary as long as all separate solvation groups are referred to by a different integer. The third number is a floating point value giving the weight for the indicated atom. The sum of weights for all atoms using the same integer code in column 2 have to add up to unity exactly (if fractions occur that cannot be specified with arbitrary precision in decimal floating point representation, it is advised to provide enough digits to emulate double precision). The remaining numbers are floating point numbers giving (in this order) the reference free energy of solvation for a group, the corresponding solvation enthalpy, and the heat capacity (in kcal/mol, kcal/mol, and cal mol-1 K-1, respectively). In general, only the first number is required for a successful patch, and it needs to be specified only with the first instance of the corresponding integer code. Numbers given for further atoms belonging to the same group are ignored. Note that missing values for enthalpy and heat capacity would override values given in the parameter file (if any) by instead setting the heat capacity to zero and the enthalpy to be equal to the free energy.
A single character in the first line lets the user choose between three different modes. All three modes have identical restrictions beyond the ones already stated. First, the resultant solvation groups overlap with existing solvation groups (that are most conveniently specified in optional output file FOS_GROUPS.vmd → FMCSC_FOSREPORT). None of the atoms in one of those original groups must remain unaccounted for by the patch. Second, newly defined groups must not cross residue boundaries unless they represent trivial changes to existing groups that do already cross residue boundaries (peptide unit). Here, "trivial" means that the constituting atoms are exactly the same, and that only the (nonzero) weights and/or the reference free energy of solvation are altered. Third, all weights have to be positive.
Mode 'A': The specifications after the first line ('A') are assumed to apply only to those atoms exactly specified. This is the simplest input mode that allows maximal control (each atom can be controlled individually). Suppose the simulation encompasses two formamide molecules (6 atoms each). The standard group parsing is to have all atoms except the aldeyhde hydrogen (indices 4 and 10) constitute a single group with equal weights (0.2).
Example:
---------------------------------------
A
7 33 0.5 -8.0
8 33 0.25
9 33 0.25
11 33 0.0
12 33 0.0
---------------------------------------
The above input would change the solvation group setup for the second formamide molecule such that the group now only consists of the heavy atoms (7-9) with asymmetric weights (0.5 for N, 0.25 for C and O), and an altered reference free energy of solvation. The code "33" is arbitrary as explained above. Omitting the entries for atoms 11 and 12 would cause an error because these atoms were formerly part of an existing solvation group, and would be left unaccounted for.
Another Example:
---------------------------------------
A
9 14 0.5 -5.0
8 -1 0.5 -5.0
7 -1 0.5
10 15 1.0 -0.1
11 14 0.25
12 14 0.25
---------------------------------------
This example would define three solvation groups for the second formamide molecule. The CO unit (code "-1"), the NH2 unit (code "14"), and the lone aldehyde hydrogen (code "15"). Because the aldehyde hydrogen was previously not part of a solvation group, it can be included freely into new solvation groups.
Mode 'T': This mode works similarly to mode 'A' with the major differences that all requests are generalized in molecule type-dependent fashion. The specified indices have to refer to the first instance of a given target molecule type. The two examples listed above would both lead to premature termination of CAMPARI, because both reference the second of two formamide molecules.
Example:
----------------------------------------------------------------------------
T
3 1 0.3333333333333 -5.5 -10.0 50.0
1 2 0.5 -5.5 -6.5
2 2 0.5
5 1 0.3333333333333
6 1 0.3333333333333
----------------------------------------------------------------------------
As a result of this input file, the solvation groups on both formamide molecules would be altered. They would be changed to separate groups covering the CO and NH2 units, respectively, with both groups having reference free energies of solvation of -5.5kcal/mol and equal atomic weights. In addition, the CO unit would receive a solvation enthalpy of -6.5kcal/mol, and the NH2 unit would receive an enthalpy of -10.0kcal/mol and a heat capacity of 50.0 cal mol-1 K-1. These latter modifications would only be relevant if a temperature-dependent ABSINTH DMFI were in use.
Mode 'R': This mode allows selection and generalization by residue type, but works otherwise identically to mode 'T'. For polymeric systems, it will sometimes be of interest to selectively change details of the definition of solvation groups for a specific type of residue. For example, how does polyglutamine respond to global alterations in the sidechain solvation groups. The numbering should refer to the atomic indices of the first instance of the targeted residue type. However, care must be taken, since CAMPARI - by necessity - distinguishes terminal from nonterminal residues with regards to its type. Consider the sequence (Gln)5 with charged termini. To change the assignments for the internal glutamine residues, one would have to find the atomic indices of the second glutamine residue, and enter the appropriate modifications. Both the N-terminal and the C-terminal glutamine residue would require an additional specification each, since they are technically of different type. The same is true for crosslinked residues (if any).
As with all patch functionality, it is recommended that users check meticulously for possible errors in the parsing of patch input files. By enabling FMCSC_FOSREPORT in the key-file, CAMPARI will provide a report of all modified solvation groups (before vs. after), and users are encouraged to use this output in helping them diagnose potential errors beforehand.
Input file for library of solvation groups:
Keyword: FMCSC_MOL2FOSLIBFILEPrerequisite: This file is only relevant if the direct mean file interaction (DMFI) of the ABSINTH implicit solvation model enable by keyword FMCSC_SC_IMPSOLV is in use while performing a small molecule screen. It then provides the library of solvation groups used to automatically set up the small molecules in the input mol2 file.
While using the ABSINTH DMFI, the question arises how to assign solvation groups in unknown molecules. While this problem needs to be solved fundamentally at a different level, the technical solution is provided by this input file. It specifies a library of solvation groups with integer codes, reference free energies of solvation and two sets of atom numbers. The first number of atoms is the minimum expected number of atoms in a solvation group of this type while the second number is the number of polar hydrogen atoms.
Example:
---------------------------------------
1 -14.0 6 0
2 8.0 3 0
3 -22.4 3 1
---------------------------------------
This input file would define a library of 3 solvation groups. The value for the reference free energy of solvation is assumed to be in kJ/mol. The groups can then be used in the mol2 input file. This works by utilizing the substructure index and substructure name columns of the mol2 file. The index should refer to the index in the library file and provides the main assignment. The name column should be filled with integers giving the index per molecule of a given group. This is required to be able to distinguish cases where the same solvation group occurs two or more times in the same molecule.
Note that the other parameters that are normally patchable (atomic volume reduction factors, maximum solvent accessible volume fractions, etc) are all determined automatically in a more elaborate manner than is normally done for simulations of unsupported residues.
Section 4: Files defining auxiliary terms to the energy function
(back to top)
CMAP input files:
Keyword: FMCSC_CMAPDIRRequired for: Use of CMAP or similar corrections → FMCSC_SC_BONDED_M
CMAP corrections are two-dimensional correction energy surfaces introduced in the CHARMM force field that operate on the φ/ψ-surfaces of polypeptide residues. From input files, a 2D map is constructed and serves as an additional energy term that is made smooth and continuously differentiable via suitable interpolation. The names of the files containing these maps have be put directly into the parameter file (see charmm.prm for an example). This is detailed elsewhere. With appropriate choices for the filenames and the directory they reside in (→ FMCSC_CMAPDIR), it is possible to utilize this functionality as a general bias or correction potential even for force fields (parameter files) that do not natively contain such corrections.
Torsional bias potentials input file:
Keyword: FMCSC_TORFILERequired for: Biasing potentials acting on individual polypeptide or polynucleotide, rotatable dihedral angles → FMCSC_SC_TOR
The specification of simple torsional bias potentials is possible through either global or residue-specific input. The point of these potentials is usually to restrain the polymer to some specific conformation with the most common application being the restraint to the conformation provided by a structural input file. The first line has to contain the character 'G' (Global) or 'R' (Residue-specific) to distinguish between the two.
The actual input starts in the second line and is interpreted as follows:
'G'-mode:
An integer code with a defined number of parameters. Modes 1 and 3 specify harmonic restraint potentials and modes 2 and 4 are used for Gaussian well potentials.
-
VTOR = Σi ki·(ϑi-ϑi0)2
Here, "i" runs over most of the torsions which are rotatable in Monte Carlo or torsional dynamics calculations. Dihedral angles in unsupported residues are supported (up to a maximum number of 12), while dihedral angles in supported residues, which are not sampled natively by CAMPARI, are excluded. The sugar bond in nucleotides and the φ-angle in proline are included, whereas any other pucker degrees of freedom are not. For details, see FMCSC CARTINT, FMCSC_SEQFILE, and FMCSC DYNAMICS.
To make the file format general, 12 equilibrium positions (ϑi0) and 12 force constants (ki) have to be provided. Entries 7-12 will address χ1-6 for all supported residues. For any unsupported residues of recognized polymer type, these entries will be reserved for the unclassified dihedral angles, which - by definition - are usually equivalent to the χ-angles. Nucleotides will have the (up to) five freely rotatable backbone angles (see FMCSC_SEQFILE) in entries 1-5 and the dihedral around the sugar bond (C4*-C3*) in entry 6. Polypeptides will have ω-, φ-, and ψ-angles in entries 1-3 (entries 4-6 are unused). Supported small molecules will generally not use anything in entries 1-6 except secondary amides which have ω-angles in entry 1. Unsupported residues of unrecognized polymer type (including small molecules) utilize all 12 entries for their respective dihedral angles. The counting is by Z-matrix line in ascending order. This means that degrees of freedom in unsupported residues of unsupported polymer type going beyond the 12th dihedral angle or in unsupported residues of supported polymer type going beyond the 6th dihedral angle cannot be biased this way. Using this feature for unsupported residues requires a good understanding of the content provided for FMCSC PDBFILE and FMCSC PDB_TEMPLATE and knowledge of the Z-matrix format.
Lastly, dihedral angles defined by sulfide bridges are supported even though they are not native CAMPARI dihedral angles. The first cysteine (in sequence) will use entries 7-9 for the χ1-angle, the Cα-Cβ-S-S angle, and the Cβ-S-S-Cβ angle, respectively. The second cysteine will use entries 7-8 for its χ1- and Cα-Cβ-S-S angles. The 12 equilibrium positions have to be listed first followed by the 12 force constants for a total of 25 columns (the first one being occupied by the mode identifier).
-
VTOR = -exp(-Σi (ϑi-ϑi0)2 / (2.0σi2) )
This potential uses an identical file format except that the standard widths σi are read in the latter 12 entries rather than the harmonic force constants. VTOR creates Gaussian wells (in multiple dimensions) for each residue. It can be used to subtly favor certain regions of phase space without introducing any stringent restraints. The formula implies a unit factor such that the conformation perfectly matching the ϑi0 will give a favorable energy of FMCSC_SC_TOR kcal/mol.
-
VTOR = Σi ki·(ϑi-ϑi,struc0)2
The only difference to mode 1 is that the ϑi,struc0 are obtained directly from the simulations initial configuration (e.g. from structural input) and not from this input file. Hence, only 12 entries are needed with the necessary force constants ki. Note that this does not work in trajectory analysis mode, i.e., the wrong minimum positions will be used.
-
VTOR = -exp( -Σi (ϑi-ϑi,struc0)2 / (2.0σi2) )
To mode 2 what mode 3 is to mode 1.
Example:
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
G
1 0.0 -57.0 -47.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.02 0.02 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
This input file would turn on a biasing potential (stiffness: 0.02 kcal/(mol· deg.2)) biasing the backbone torsional angles of all polypeptide residues toward the ideal αR-geometry. It would also bias all present polynucleotide residues to the same conformation in their respective backbone degrees of freedom #2 and #3. It would not affect any polypeptide termini or small molecules.
'R'-mode:
Largely, this format agrees with the global specifications. The only differences are that:
- An extra column is needed giving the residue number for which the torsional bias subsequently specified on the same line is to be turned on.
- Multiple entries can be provided, each for an individual residue. Note that it is also possible to override (not(!) append) the settings for an
individual residue within the same file.
Example (let us assume residue 3 is a lysine and residue 12 a uracil ribonucleotide (RPU):
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
R
3 1 0.0 -57.0 -47.0 0.0 0.0 0.0 120.0 70.0 0.0 0.0 0.0 0.0 0.0 0.02 0.02 0.0 0.0 0.05 0.05 0.0 0.0 0.0 0.0 0.0
12 4 30.0 30.0 30.0 30.0 30.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
This input file would turn on biasing potentials for two residues - #3 and #12. There would be a harmonic restraint biasing the backbone conformation of LYS3 to that of an ideal αR-helix. A stronger bias (ki = kcal/(mol· deg.2)) would bias the value of χ1 and χ2 to values of 120.0° and 70.0°, respectively. In contrast, RPU12 would experience a Gaussian bias favoring the backbone conformation it initially assumes (excluding the sugar bond) with a general width of 30.0°.
Polymeric biasing potentials input file:
Keyword: FMCSC_POLYFILERequired for: Biasing potentials on polymeric properties of molecules → FMCSC_SC_POLY
Polymeric biasing potentials (see FMCSC_SC_POLY) are a feature in CAMPARI that requires some expertise on the user's end, because the resultant energy landscape for a restrained molecule can be very rugged. The potentials apply a harmonic restraint to at most two polymeric descriptors: 1) a transform of the radius of gyration meant to map values to the unit interval over a wide range of chain lengths and conditions; 2) a measure of asphericity. This input file allows the user to request polymeric restraints on a per-molecule basis or on a per molecule-type basis. The first line has to contain the character 'M' or 'T' to distinguish between the two.
The actual input starts in the second line and is interpreted as follows:
'M'-mode:
An integer code corresponding to the target molecule (numbered as in the sequence file) with four mandatory parameters, specifying the harmonic restraints on the two parameters:
t = <f1·(f2·(Rg/Lc))f3/N0.33>
δ = 1.0 - 3.0·(λ1·λ2 + λ2·λ3 + λ1·λ3)/(λ1 + λ2 + λ3)2
Here, the fi are arbitrary factors meant to achieve the desired mapping to the interval [0,1] (values are 2.5, 1.75 and 4.0). The radius of gyration is denoted by Rg, Lc is an estimate of the contour length of the chain, and N is its sequence length. The λi are the eigenvalues of the gyration tensor (→ elsewhere for additional information).
VPOLY = kt·(t - t0)2 + kδ·(δ - δ0)2
The four floating point numbers required are: t0, δ0, kt, kδ. Both t0 and δ0 have to lie within the interval [0:1] as they are (pseudo-)normalized quantities.
The stiffnesses are simply (twice) the spring constants of the two harmonic terms and should be given in kcal/mol as both order parameters are unitless.
Example:
---------------------------------------------------------------
M
1 0.8 0.5 10.0 100.0
13 0.2 0.1 100.0 0.0
---------------------------------------------------------------
This input file would turn on a biasing potential driving molecule #1 toward a value of t of 0.8 (stiffness 10 kcal/mol and toward an asphericity of 0.5 (stiffness 100 kcal/mol). Conversely, molecule #13 would be driven toward a value of t of 0.2 with stiffness of 100 kcal/mol with no restraint on δ. It is very important to note that asphericity and t are not independent. Hence, restraining both of them to a pair of target values that are mutually exclusive will simply lead to a competition between the two force constants (aided by the underlying Hamiltonian and entropy); a scenario of little practical use. Furthermore, it should be emphasized that t is a pseudonormalized quantity and that its accessible range will be somewhat ill-defined for systems of extreme size (very small, very large) and/or of extreme conformation (fully extended rods, collapsed points). To a lesser extent this is even true for δ, which by virtue of polymer geometry (finite width) will prohibit values rigorously approaching zero or unity for finite length chains.
'T'-mode:
An integer indicating corresponding to the target molecule type as determined by the first occurrence in the sequence file. Otherwise the input is treated the same.
Note that these potentials will never be disabled even if the molecule in question is practically rigid. In conjunction with the energy function and degrees of freedom this can easily lead to extreme strain (e.g. when using Cartesian dynamics), which may compromise simulation stability.
Tabulated potential code requests:
Keyword: FMCSC_TABCODEFILERequired for: Use of tabulated interactions on interatomic distances → FMCSC_SC_TABUL
This file allows the specification of specific interactions for tabulated potentials to apply to. It works in almost exactly the same way as the one requesting general distance distribution analysis (see detailed format description). The major difference is that the tabulated potential mappings ignore analysis groups and only partition by molecule types (relevant for input modes 2/3).
The two functionalities naturally use the same framework. The requests are in fact identical because they efficiently allow the user to specify atom pairs or sets of pairs to become associated with an integer code corresponding to either a specific interaction potential (here) or to a separate distance distribution analysis request (when the request is made by FMCSC_PCCODEFILE). The only other major difference is that inconsistent numbering of the tabulated potentials will not be tolerated, as it would require a nonsensical file for the actual potential input (see FMCSC_TABPOTFILE). It is important to keep in mind that mistakes are very easily made in an input file this flexible. Hence, there is an output file summarizing all the terms requested with additional information (see documentation on TABULATED_POT.idx, which is enabled by keyword FMCSC_TABREPORT). Finally, from an implementation point of view, it must be mentioned that the type of input specification (modes 1-3 are available, see detailed format description) has no impact on the speed of the calculation. The requests are always transformed into identical data structures, which are queried efficiently during energy and force evaluations.
Tabulated potential input file:
Keyword: FMCSC_TABPOTFILERequired for: Use of tabulated interactions on interatomic distances → FMCSC_SC_TABUL
A file with n+1 columns, where n is the highest integer code of a unique potential requested in FMCSC_TABCODEFILE. The number of rows is arbitrary and will be determined by the program. Note, however, that the potentials have to be stored in memory.
The first column contains the distance information (only regularly spaced bins are allowed) and should cover the distance range available to the specific atom pairs (only one distance spectrum possible for all potentials).
The i+1th to i+(n+1)th columns then contain the n individual potentials according to their integer code in units of kcal/mol (i.e., if the index file indicates that atoms 3 and 25 interact through potential 8, then the 9th column of the potential file gives the tabulated potential used for this atom-atom pair).
Example:
----------------------------------------------
2.5 10.0 10.0
3.5 10.0 10.0
4.5 10.0 0.0
5.5 0.0 0.0
6.5 10.0 10.0
7.5 10.0 10.0
----------------------------------------------
Note the regular distance spacing.
The possible integer codes to be used in the index file would hence be 1 or 2, anything else will create an error. Also note that the code does cubic Hermite spline interpolation on the potentials, i.e. provides a continuous and smooth function. The use of a cubic spline is the reason that it is recommended to explicitly taper off the potential to a constant value at the limits of the distance range considered. The inclusion of the two bins at 2.5 and 7.5Å is made for this purpose, since otherwise the tangent at 3.5 and 6.5Å would have been set to 0.0. Correspondingly, distance values outside of the limits are treated just like the last available bin (i.e., in the example above distance values of 2.0 or 50.0 would both give a potential value of 10.0 kcal/mol). Tangent data can be supplied in a separate file (see below).
Tabulated potential derivatives (tangents) input file:
Keyword: FMCSC_TABTANGFILERequired for: Use of tabulated interactions on interatomic distances → FMCSC_SC_TABUL if it is desired that the derivatives at the tabulated points are not estimated numerically
A file with n columns, where n is the highest integer code of a unique potential requested in FMCSC_TABCODEFILE. The number of rows is arbitrary and will be determined by the program. Note, however, that the potentials and tangents have to be stored in memory.
The columns contain the n individual potential derivatives according to their integer code in units of kcal·mol-1·Å-1 (i.e., if the index file indicates that atoms 3 and 25 interact through potential 8, then the 8th column of the potential derivative file gives the tangents to be assumed for this tabulated potential and atom-atom pair). Note that the required derivative is that with respect to the (increasing) distance coordinate. The potentials themselves are given in a different input file.
Example:
----------------------------------------------
0.0 0.0
0.0 -5.0
-10.0 -2.0
0.0 2.0
10.0 5.0
0.0 0.0
----------------------------------------------Note that CAMPARI takes the information about how many tabulated potentials are present exclusively from the potential input. It is therefore up to the user to ensure that the appropriate number of columns is present in this input file. If this file is incomplete or absent, the missing tangent values are constructed numerically from the potentials themselves. It is not possible to provide values starting from a distance value other than the first one in the tabulated potential input file. The provided derivatives are not checked in any way and taken strictly as is.
Distance restraint potential input file:
Keyword: FMCSC_DRESTFILERequired for: Biasing potential to introduce distance restraints between pairs of atoms or position restraints for individual atoms → FMCSC_SC_DREST
The first line has to contain only a single integer, viz, the number of restraints in the following list.
The actual input starts in the second line, has five mandatory columns, and is interpreted as follows:
- An integer giving the index of the (first) reference atom.
- An integer that is either -1, -2, -3, or the index of the second reference atom. In the latter case, a distance restraint is requested while in the former case a position restraint on the x (-1), y (-2), or z (-3) axis is created.
- A type indicator specifying the type of restraint potential to act on the distance of these two atoms or on the position of the first atom.
- The target distance or position in Å (this is referred to as R0 below)
- The restraint strength (i.e., the the parameter km below) in kcal·mol-1·Å-2 (half the force constant for the implied Hookean spring).
- VDREST(m) = km·(Reff - R0)2
- VDREST(m) = km·H(Reff - R0) (Reff - R0)2
- VDREST(m) = km·H(R0 - Reff) (Reff - R0)2
Example:
---------------------------------------
3
11 45 1 10.0 5.0
22 56 2 15.0 10.0
44 -1 3 -2.0 10.0
---------------------------------------
This would add one position and two distance restraints to the Hamiltonian. The first would act between atoms 11 and 45 and restrain their distance to a target position of 10.0Å with a km of 5.0 kcal·mol-1·Å-2. The second would act between atoms 22 and 56 and harmonically restrain their distance if it exceeds the target maximum of 15.0Å with a km of 10.0 kcal·mol-1·Å-2. The third potential would be a position restraint acting on atom 44 in the x-dimension and come into play as soon as the absolute x-coordinate of atom 44 drops below the target minimum of -2.0Å. In the cylindrical system above, all three restraints above are probably safe to use (at ambient temperature), This is because the position restraint is not in a periodic dimension and the two distance restraints are stiff and bound to be within a maximum distance that should always be shorter than half of the shortest (here, only) periodic dimension.
Negative force constants are converted to positive ones (the negative potential is not allowed). Note that negative distances are disallowed for obvious reasons. Furthermore, harmonic restraints of this type can result in a highly frustrated energy landscape if the joint minimum restraint energy corresponds to a conformation with steric clashes in the presence of excluded volume terms. The large forces tend to degrade integrator stability, and this is therefore particularly harmful in gradient-based sampling approaches. Users should keep in mind that it is possible to specify multiple restraint terms for the same atom pair or multiple position restraints for the same atom, and that these restraints are never subjected to cutoffs (unlike tabulated potentials, which offer a more general functionality for custom potentials on interatomic distances). Finally, in a small molecule screen, it is possible to prescribe a list that references all putative atoms defined by the largest tolerated molecule. These lists are pruned and restored accordingly for every screened molecule. This can be useful to implement generic potentials directing a molecule toward specific target residues (most likely using half-harmonic functional forms), but it would generally be easier and more appropriate to rely on extracting such restraints automatically from the coordinates of a reference molecule instead. This function is performed by keywords FMCSC_MOL2DRESTMODE and FMCSC_MOL2DRESTBUF, and in this case this input file might be absent entirely.
Compartmentalization potential input file:
Keyword: FMCSC_OSMOFILERequired for: Assigning residues to individual system compartments defined by inner boundaries → FMCSC_SC_OSMO (see this paper for a reference application)
The first line has to contain only a single character indicating the mode (as usual, 'R' means that indices refer to residues, 'M' means that indices refer to molecules, and 'T' means that indices refer to molecule types; all in order of appearance in sequence input).
The actual input starts in the second line and is interpreted as follows:
Two mandatory integers, the first listing the species index (residue, molecule, or molecule type in the order defined by sequence input), the second listing the compartment index. The number of available compartments is defined by keyword FMCSC_OSMO_MODE. Currently, the compartments are systematically numbered as follows:
- Even-numbered compartments have z-coordinate values less than that of the geometric center of the simulation container, which coincides with the position of the inner compartment boundary. Odd-numbered compartments have z-coordinate values greater than the boundary position.
- If FMCSC_OSMO_MODE is at least 2, compartments 3, 4 (and possibly 7, 8) have y-coordinate values less than the xz-plane boundary position. All other compartments have y-coordinate values larger than this position.
- If FMCSC_OSMO_MODE is 3, compartments 1-4 and 5-8 have x-coordinate values greater than and less than the yz-plane boundary position, respectively.
Example:
---------------------------------------
T
1 1
2 2
---------------------------------------If FMCSC_OSMO_MODE is 1, this would restrain all ethanol molecules to the upper z-compartment and all methanol molecules to the lower z-compartment. If, in this example, the simulation container is a sphere, the shape of the two partitions would be two symmetric half-spheres. The water molecules would be free to move around the entire sphere (the default of "0" implies this). Importantly, the above input file would lead to a different result if FMCSC_OSMO_MODE is not 1 because this would change the size, number, and shape of partitions.
Density restraint input map:
Keyword: FMCSC_EMMAPFILERequired for: Biasing potential to introduce a global, spatial density restraint on a simulation density derived from an atomic property such as mass → FMCSC_SC_EMICRO
This input file has to be provided in a specific file format in binary form. The NetCDF library allows a reasonably efficient storage of array data in binary form while maintaining a flexible and adaptable interface. For instance, trajectory output can also be written in NetCDF format. The file itself is binary, but the library comes with tools to convert appropriately formatted files from human-readable versions. For further details, users are referred to the NetCDF documentation.
The format itself is similar to that used by UCSF Chimera when writing density information in NetCDF format. Specifically, it should have the following fields defined:
- Dimensions of spatial (the dimensionality itself) and X, Y, and Z (for the case of a three-dimensional input density, which is the only case currently supported), where the latter three dimensions give the numbers of grid cells in the respective dimensions
- A variable deltas with an attached attribute of units; this is an array of size spatial and type float, and the default unit is Å
- A variable densities with an attached attribute of units; this is a three-dimensional array of size Z,Y,X and type float, and the default unit is g/cm3
- Optional variables densities2 densities3, etc., that have an attached attribute of units; just like densities, these are three-dimensional arrays of size Z,Y,X and type float. They are only used if keyword FMCSC_EMPROPERTY is set to 3.
- Optional, global attributes xyz_origin and xyz_step, both (currently) requiring the specification of three floating point numbers in double precision (xyz_origin is completely optional and xyz_step can substitute for variable deltas (and vice versa))
- An optional variable origin with an attached attribute of units that is also an array of size spatial and type float with a default unit of Å
- The data section has to contain the appropriate number of entries for densities and also for deltas in case xyz_step is not defined
Once the input density map is read, it is processed, i.e., interpreted quantitatively, to yield the physical input density Ξijk for a given lattice cell with indices i, j, and k. This is done separately for each density variable. The details of this linear transformation are described elsewhere, and rely on several keywords associated with the density restraint potential, most notably EMTHRESHOLD and EMTOTMASS. The physical input density or densities (the interpreted map is written to output file DENSITY_INPUT_PHYS.nc) are compared to the simulation densities to yield the density restraint potential, EEMICRO as described elsewhere. The computation of the simulation densities is desribed in detail for keyword FMCSC_EMCALC.
Density restraint custom property:
Keyword: FMCSC_EMPROPERTYFILERequired for: Collecting spatial density distributions and/or using bias potential to introduce spatial density restraints on user-defined, atomic properties → FMCSC_EMCALC and FMCSC_SC_EMICRO
Using keyword FMCSC_EMPROPERTY, it is possible to choose simple atomic properties to collect density data for and/or restrain densities of. These simple properties are read from the parameter file (mass, proton mass, and partial charge, options 1-3). This lacks flexibility as these values cannot in general be modified without affecting other aspects of a simulation. This input file is meant to solve this issue by allowing the user to specify custom properties per atom, which are, per se, completely independent of any atomic properties used elsewhere by CAMPARI. When assuming finite background densities, there continues to be a reliance on atomic volume parameters, however. The input consists of a given number of lines (at most as many as there are atoms in the system), each specifying an integer (the index of the atom in CAMPARI's internal numbering), and one or more values for the atomic property or properties. Each column after the first one thus defines a specific property, and only atoms having nonzero values for at least one of these properties need to be listed.
Properties that take on only zero or positive values are interpreted in a slightly different manner from properties that take on both positive and negative values. For the former, the background level is a lower limit whereas for the latter it is usually closer to a median or mean. For the latter class, values from different atoms can offset each other and lead to an apparent property density of zero for a given grid cell even though atoms with finite values for this property populate the very same cell. It may thus sometimes be easier to split negative and positive values into two separate properties and thus densities to restrain to a target input map.
Example:
---------------------------------------
2 0.0 0.5 2.5
44 0.0 -1.7 3.5
121 1.5 1.2 1.1
---------------------------------------The above input would mean that only 3 atoms contribute to 3 separate spatial density restraints with the first and third (columns 2 and 4) being of the positive definite type and the second of the other type (column 3). This is clearly not a typical situation. The NetCDF binary input file would have to have 3 separate density fields called "densities", "densities2", and "densities3", and they would apply to columns 2, 3, and 4, respectively. CAMPARI will produce a report regarding the read-in values. The presence of more than one property also means that certain keywords should take multiple input values, notably FMCSC_EMTHRESHOLD, FMCSC_EMTOTMASS, FMCSC_EMTRUNCATE, FMCSC_EMBGDENSITY, FMCSC_EMBACKGROUND, and FMCSC_EMFLATTEN.
Section 5: Files relevant for global behavior of analysis functionality
(back to top)
Analysis group input file:
Keyword: FMCSC_ANGRPFILEThis input file allows the user to override the default parsing the software does for analysis purposes where data for molecules of identical type are often pooled (see documentation of output files). In addition, it allows tagging of analysis groups as solvent which has consequences for some of the analyses. The first letter in the file indicates the mode:
'T' or 't':
(molecule type-based selection):
Obviously, it is impossible to parse finer than by molecule type using this mode. The sole purpose here is solvent-tagging (by default, all single-residue molecules are tagged as solvent molecules):
----------------
XXX
----------------
XXX is a positive integer (solute) or negative integer (solvent) and the row number corresponds to the molecule type number by sequence of occurrence (see SEQFILE). Note that it is not possible to pool molecules of different type into the same analysis group, and that the actual group numbers provided in this file are irrelevant (beyond their sign).
'M' or 'm':
The format is identical but row numbers now indicate the molecule number instead of molecule type number (again, by sequence of occurrence). This format allows breaking of molecules of the same type into separate analysis groups which is useful (and required from a statistical mechanics point of view) anytime one or more of the molecules of a specific type are made unique by the application of restraint or other bias potentials, by an asymmetric system setup, etc ... Furthermore, this can be useful for error analyses even if the molecules are rigorously identical. The number provided by XXX will - as before - indicate solute/solvent-identity via its sign but each desired unique group now requires also a unique integer (at least as many as there are molecule types).
Input files to provide snapshot index sets for trajectory analysis mode:
Keyword: FMCSC_FRAMESFILEPrerequisite: Trajectory analysis mode → FMCSC_PDBANALYZE
This input file allows analysis runs to operate only on an arbitrary subset of an input trajectory. In addition, it allows the user to provide floating-point weights for each snapshot in the trajectory. This can be useful if a trajectory is analyzed, which was obtained using methodology that does not yield a well-defined statistical ensemble. In such a case, it is common to apply reweighting techniques of the weighted histogram variety that may yield per-snapshot weights. In order for the analysis features of CAMPARI to still be useful, it will then be necessary to deviate from the assumed unweighted (Boltzmann) averaging.
Therefore, this file accepts two alternative formats (per row). If a single integer is specified, the corresponding frame (numbered sequentially in the input trajectory and starting at 1) is added to the list of frames to analyze with a weight of unity. If an integer and a real number are provided, the frame specified by the integer is added to the list of frames to analyze with a weight given by the real number. If a file format is used that allows nonsequential access of trajectory snapshots, and if floating point weights are absent, the list thus obtained is processed "as is", which allows reordering of trajectories and the use of duplicate frames. In all other cases, the list is first ordered, and duplicate frames are removed (see elsewhere for details on this).
One exceptional use case of this file is given when the data are read directly from a PostgreSQL database (→ FMCSC_PDB_FORMAT is 6) and the option FMCSC_PSQL_R_TRACKBACK is enabled. Then, there must be exactly one integer in the file (no weight) that is the starting point for tracking back a trajectory by its actual history. This is akin to creating the list of input frames dynamically and relies on the the standard that is explained elsewhere (fields "previous_id", "previous_rank", and "previous_snap" in the snapshot tables). The expected number (or upper limit) of frames should be set correctly using FMCSC_NRSTEPS. If not, the run will terminate earlier, which can damage the integrity of some reported output properties. Importantly, this option is available on a per-replica basis in a parallel trajectory analysis run using the replica-exchange setup, see documentation for FMCSC_PSQL_R_TRACKBACK.
Section 6: Files relevant to specific analysis routines
(back to top)
Backbone segment distribution datafile:
Keyword: FMCSC_BBSEGFILERequired for: Analysis of φ/ψ-based backbone segment distributions → FMCSC_SEGCALC
A36x36 (i.e., 10 degree-spacing) Φ/Ψ-map indicating regions by integer codes that (can) form regular polypeptide secondary structure.
1 = β
2 = PII (Polyproline Type II Helix)
3 = Unusual Region ("Pass")
4 = αR (Right-Handed α-Helix)
5 = Inverse C7 Equatorial (γ'-Turn)
6 = Classic C7 Equatorial (γ-Turn)
7 = Unusual Region (Helix with 7 Residues per Turn)
8 = αL (Left-Handed α-Helix)
The file reproduces the visual picture a Ramachandran map gives for L-polypeptides and is read in accordingly. There are two default versions ("bbseg.dat" and "bbseg2.dat") in the data directory. The first one is a little tighter in its definitions (in particular PII and αR) while the second one is more lenient.
Input file with trace of parallel simulation run for analysis in post-processing:
Keyword: FMCSC_TRACEFILERequired for: (Un)scrambling of trajectories in parallel trajectory analysis runs on sets of trajectories generated for example by replica exchange simulations → FMCSC_REMC
Alternatively required for: Automatic adjustment of conformational space network in analyses relying on structural clustering to reflect reseeding history of a PIGS calculation
Any number of lines giving a total of Nnodes+1 integers, where Nnodes is the number of different replicas. The numbers in the first column have to be increasing monotonously, and the number in the remaining columns have to be such that integers are bound to the interval from 1 to Nnodes. In addition, if the run is meant to unscramble a replica exchange trajectory, every integer in the interval from 1 to Nnodes has to occur exactly once on every line. A failure to comply with this requirement will cause CAMPARI (MPI) to crash or hang.
In parallel trajectory analysis runs, this file is interpreted as a running map of some initial conformation with a set of conditions based on the assumption that swaps have occurred in the process (simulation) generating the input trajectories. Then, the desired outcome of using this functionality is to reorganize the set of trajectory data such that we obtain trajectories that are continuous in geometry. The most common application for this would be to "unscramble" replica exchange trajectories.
The provided step numbers (first column) are related to the frames in the input trajectories via keywords FMCSC_RE_TRAJSKIP and FMCSC_RE_TRAJOUT. The assumption is that for a given step, swaps have occurred first, then trajectory files were appended. In order for the procedure to work as intended, the current map for every snapshot in the input trajectories has to be present. It does not matter if there is more information, or whether the step numbers are exactly matched with FMCSC_RE_TRAJOUT.
Example:
30 4 3 2 1
60 1 3 2 4
90 1 2 3 4
120 1 4 2 3
Let us furthermore assume that FMCSC_RE_TRAJSKIP is 0, FMCSC_RE_TRAJOUT is 50, and there are 10 frames in the trajectories. The number of replicas obviously has to be 4. Then, the above file implies that for the first snapshot, CAMPARI assumes a mapping of "4 3 2 1" meaning that the first snapshot in the trajectory starting with "N_000_" (corresponding to the first replica), will end up being transferred to (and eventually analyzed and/or written by) the fourth replica. This is because the mapping current at the 30th step is, based on the file, the only one CAMPARI can assume current also for the 50th step, i.e., the first snapshot in the trajectories. By the same logic, the second snapshot would use the mapping "1 2 3 4" (from step 90), and all further snapshots would use the mapping "1 4 2 3" (from step 120). This implies that the second line in the input file is redundant (but does no harm). If a snapshot is computed to correspond to a step number smaller than any step number listed in this file, the default mapping is assumed (here,"1 2 3 4").
In serial trajectory analysis runs or parallel trajectory analysis runs in MPI averaging mode, this input file is relevant only if structural clustering is used. It is then used to correct the snapshot-to-snapshot connectivity map based on the knowledge of the history of reseeding protocol, e.g. from a PIGS calculation. Thus, the interpretation of the input is exactly as described for the PIGS output trace. This differs from the previous example in that the map in columns 2 to Nnodes+1 simply indicates a reseeding by reference to the index of the replica (the same integer can occur multiple times).
Example:
30 1 2 3 4
60 1 2 3 4
90 3 3 3 4
120 1 2 3 4
If we assume that FMCSC_RE_TRAJSKIP is 0, FMCSC_RE_TRAJOUT is 1, FMCSC_RE_TRAJTOTAL is 125, and FMCSC_NRSTEPS is 500 (4 concatenated trajectories of 125 snapshots each → FMCSC_TRACEFILE for details), then the only two reseeding events are seen at step 90 of the parallel simulation. Specifically, after completion of step 90 and writing to trajectories, the snapshots in the first and second replicas at replaced with that of the third replica at step 90 (see elsewhere for details). Unlike in the prior example, the line at step #120 does not imply any change in structure. Thus, the input file in this example will modify the extracted transition network of structural clusters as follows.
- The (directional) transition counts between clusters connected by snapshots 125 and 126, snapshots 250 and 251, and snapshots 375 and 376 will be decremented by 1 (concatenation points).
- The (directional) transition counts between clusters connected by snapshots 90 and 91 and snapshots 215 and 216 will be decremented by 1 (reseedings).
- The (directional) transition counts between clusters connected by snapshots 340 and 91 and snapshots 340 and 216 will be incremented by 1 (reseedings).
Suitable input files for this keyword (FMCSC_TRACEFILE) are created by CAMPARI itself during both replica exchange runs (keyword FMCSC_RETRACE enables the generation of output file N_000_REXTRACE.dat) and PIGS runs (the same keyword enables the generation of output file N_000_PIGSTRACE.dat). Note that it is quite tedious to map step numbers of an original parallel simulation run to the analysis of a single concatenated trajectory. For the read-in of the trace to function correctly, keywords FMCSC_RE_TRAJTOTAL (only for PIGS), FMCSC_RE_TRAJSKIP, and FMCSC_RE_TRAJOUT are all required. Note that this mapping to original step numbers is lost if a file with trajectory frames in random access mode is used (here, instead the step numbers become the line numbers of this file). This option (or the use of FMCSC_FRAMESFILE in general) is only supported in serial analysis run (i.e. not using MPI parallel analysis).
Requests for general PC analysis input file:
Keyword: FMCSC_PCCODEFILERequired for: Specific interatomic distance distribution (or pair correlation) functions or instantaneous output thereof → FMCSC_PCCALC and FMCSC_INSTGPC
Generalized pair correlations / distance distributions (or arbitrary application of tabulated potentials → TABCODEFILE) in systems with many atoms can be quite complicated and this input file tries to minimize the effort needed on the user-end to reward as diverse requests as possible. The file works in either of currently three modes listed below. Note that the input format also somewhat corresponds to the memory structure used to store the request.
Mode 1:
The first line specifies the mode and the number of listed specific atom-atom entries. These entries consist of the numbers of two atoms and with the integer code for the GPC component to which this atom-atom distance distribution should contribute. This specific list format is trivial to master but very limited when lots of analogous requests are wanted. It is probably best used in highly specific systems (such as a single polypeptide).
Example:
Suppose you study a peptide with three lysine residues and want to know the distribution of the three neutralizing counterions (atoms 301, 302, 303 in the system). Suppose the lysine sidechain nitrogens are atoms 27, 48, and 155, then the appropriate input file is as follows:
------------------------
1 9
27 301 1
27 302 1
27 303 1
48 301 1
48 302 1
48 303 1
155 301 1
155 302 1
155 303 1
------------------------
Mode 2:
Similar to mode 1, only that the list-format is interpreted to be a request to treat all molecules in the same analysis group analogously. Atom numbering must now refer to the first molecule in its group if the two molecules are different. If the atoms belong to the same molecule and the molecule is the first in its group, then the request is honored as an intramolecular distance distribution. To request intermolecular distance distributions for molecules in the same analysis group, the numbers should correspond to the first and second instance of this molecule type in the group. All other requests are ignored.
Example:
Take the same example, only that there is a high concentration of background salt. Using mode 1, the file with the same request would become fairly long. Also suppose that you want to know the chloride-chloride distribution function, and the sodium-chloride and sodium-sodium distribution functions. Let us assume the first two sodium atoms are 401 and 402 and that analysis groups correspond to molecule types (the default). Then, this is the appropriate input file (note it is actually shorter and now requests four unique distance distributions):
-----------------------
2 6
27 301 1
48 301 1
155 301 1
301 302 2
301 401 3
401 402 4
-----------------------
Mode 3:
This is a slightly different mode which honors requests in form of atom-atom matrices. The main disadvantages of this input format are that its sanity can easily be corrupted and that large molecules are difficult to handle. The idea is for the user to supply a list of analysis group-to-analysis group requests using header lines of "T i j" (intermolecular between types i and j) or "t i" (intramolecular within type i. The main advantage is that now explicit referencing to atom numbers is needed and that it can accomplish fairly complicated requests in a very compact format. This is best explained in an example.
Example:
Suppose you have a mix of SPC water (analysis group #1: 3 atoms) and urea (analysis group #2: 8 atoms) and want to know a variety of possible site-site distribution functions to characterize solution structure. Then an appropriate input file would look as follows:
--------------------------------------------------
3
T 1 2
1 0 2 2 3 3 3 3
0 4 0 0 0 0 0 0
0 4 0 0 0 0 0 0
T 1 1
5 6 6
0 7 7
0 0 7
T 2 2
0 0 0 0 0 0 0 0
0 0 0 0 8 8 8 8
0 0 0 0 9 9 9 9
0 0 0 0 9 9 9 9
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
--------------------------------------------------
Note that the first matrix is asymmetric (first type always determines number of rows). Here we request to measure the water O to urea C (#1), N (#2), and H (#3), and the water H to urea O (#4) distribution functions. The other two matrices are intermolecular for molecules in the same group and are by definition symmetric. Note that only the upper right half is read; the numbers must all be there, however. We request all possible water-water correlation functions (O-O, O-H, and H-H in #5, #6, and #7) and the O-H (#8) and N-H (#9) distribution functions for urea-urea. Of course it is still necessary to know the order of atoms within a molecule type.
Notes:
Inconsistent numbering in the components is removed and announced through log-output (i.e., components might be renumbered). Nothing stops a user from supplying illogical requests (i.e., for example to omit some of the 9's in the urea-urea matrix in the last example) since the code makes no assumptions about molecular symmetry. Furthermore, in particular for modes 1 and 2, nothing stops the user form supplying an atom pair more than once which can give rise to arbitrary weights when several pairs are pooled together. This is potentially very misleading and should be kept in mind when encountering surprising results. It is worthwhile to keep in mind that modes 2 and 3 can easily encompass a very large number of distances, which can make the analysis costly and instantaneous output very large. Also remember that analysis groups can split a molecule type into multiple groups but never combine multiple types into a single group, which is why the above input modes 2 and 3 continue to work in all circumstances. Lastly, CAMPARI will prohibit pooling data from inter- and intramolecular distances into the same component. This is because intermolecular distances are directly transformed to pair correlation functions whereas intramolecular ones are not. In any event, it is always recommended to double-check this input file.
The related functionality of reading in a map for tabulated potentials is almost identical to the one for generalized distance distribution analyses. The major differences lies in the fact that tabulated potentials do not respect analysis groups, i.e., they will always assume molecule type partitioning in modes 2 and 3.
Input file for tabulated Bessel functions:
Keyword: FMCSC_BESSELFILERequired for: Computation of predicted fiber diffraction patterns (preliminary implementation) → FMCSC_DIFFRCALC
This file should have a header line with two integers and a floating point number specifying the number of Bessel functions (order, "N"), the number of distance bins ("M"), and the distance resolution the actual file then has N columns and M rows of floating point numbers to supply the tabulated Bessel functions. Note that the first distance bin is assumed to be exactly zero. Also note that this file is very easily corrupted and that its sanity has to be ensured by the user. The file delivered with CAMPARI is called "BesselFunctions.dat" in the data/ directory and tabulates Bessel functions up to an order of 500 for 2001 distance bins. The latter corresponds to a maximum argument value of 500.0, as the distance resolution is set to 0.25. Finally, note that the whole file is read into memory. The file size will therefore be a good indicator of what the memory demand imposed by diffraction calculations in addition to the underlying calculation is.
Input file for PSQL simulation description:
Keyword: FMCSC_PSQL_SIMSUMOptional for: Depositing simulation data to the PostgreSQL database in trajectory analysis mode.
This is a plain-text file, and the only requirement is that individual lines are no longer than the default maximum string length defined by CAMPARI (200 characters by default), otherwise truncation will occur. It will be deposited as is into the field "sim_summary" of the table "simulations" of the chosen PostgreSQL database. The primary purpose is to allow summaries or log-files produced by other software or found in publications to remain associated with a set of simulation data.
Input files to provide atomic index sets:
Keywords: FMCSC_ALIGNFILE, FMCSC_PDB_ATOMMAP, FMCSC_CFILE (see FMCSC_CDISTANCE for details), FMCSC_SAVATOMFILE, FMCSC_TRAJIDXFILE, FMCSC_MOL2AUXINDEX, and FMCSC_MOL2AUXINPUTThis is the simplest input file format CAMPARI uses and consists of nothing but a number of lines specifying a single atomic index (assuming CAMPARI's intrinsic ordering → be careful with PDBNUCMODE) on each line. N lines will therefore constitute an index set of size N assuming all entries are valid (legal indices). In some cases, such an index file may be obtained by running an appropriate script over a CAMPARI-generated pdb file. Atom index sets are used in a variety of contexts.
Input files to provide atom pair index sets:
Keyword: FMCSC_CFILEThis input file type is very similar to atomic index input files. Here, instead of providing a single index entry on each line denoting an individual atom, instead a pair of indices needs to provided to specify a pair of atoms. This is currently only used in certain modes of structural clustering (see FMCSC_CDISTANCE and FMCSC_CFILE for details).
Example:
----------------
2 4
3 12
----------------
The above input file would define two pairs of atoms with indices of 2/4 and 3/12, respectively. This could for example be a set of (two) interatomic distances to be used for structural clustering. Note that double entries (entry order is ignored) are removed.
Input files to provide torsional index sets:
Keyword: FMCSC_CFILEThis input file type is technically identical to atomic index input files. The only difference is that here a list of system dihedral angle indices is provided. This is currently only used in structural clustering (see elsewhere for details). The intrinsic numbering of dihedral angles is most easily extracted as follows:
It is recommended to write a temporary input file contain all integers from 1 to Ndih, where Ndih is the total number of eligible dihedral angles in the system. If Ndih is not known, any large enough number will suffice. By supplying this temporary file as input to a CAMPARI run doing structural clustering with a distance criterion based on dihedral angles on the system of interest, CAMPARI will upon execution print the list of all selected dihedral angles (i.e., all dihedral angles) at the beginning of the run. This list can be saved and used to select the dihedral angles much more easily than by counting them off.
In general, the order of residues follows the sequence input file. For each residue, possible ω-angles come first and are followed by the φ-, ψ-, nucleic acid backbone and sidechain χ-angles. The first dihedral angle of any 5-membered ring pucker (the φ-angle in proline and similar polypeptide residues and the dihedral angle along the C3*-C4* bond in polynucleotides) is the only pucker degree of freedom to be included. In special cases, the sidechain χ-angles are followed by additional entries. If the system contains disulfide bridges, the Cα-Cβ-S-S angle is next. For the first (in terms of sequence) of the two residues of a disulfide bond, and additional entry is reserved for the Cβ-S-S-Cβ angle. Lastly, the user may request unsupported (nonnative) dihedral angles to be part of the set. This requires setting keyword FMCSC_TMD_UNKMODE to values larger than 0. The interpretation of which dihedral angles to make eligible is exactly as described for that keyword. The unsupported dihedral angles are always listed last and follow the order of Z matrix lines (atoms). The corresponding Z matrix lines are printed out in the initial summary and are also shown in output file FYC.dat when using the alternative output mode by supplying a negative value for keyword FMCSC_TOROUT.
Example:
----------------
Suppose the sequence input file specifies proline dipeptide (Ace-Pro-Nme) as the first and only molecule in the system. We assume FMCSC_TMD_UNKMODE is set to 3. Following the procedure outlined above for understanding the numbering requires all integers from 1 to 6 to be present in the input file, e.g.:
1
2
3
4
5
6
----------------
This will give the following output:
# 1 is UNS ( 4) angle of residue # 1 (ACE)
# 2 is OMEGA angle of residue # 2 (PRO)
# 3 is PHI angle of residue # 2 (PRO)
# 4 is PSI angle of residue # 2 (PRO)
# 5 is OMEGA angle of residue # 3 (NME)
# 6 is UNS ( 24) angle of residue # 3 (NME)
----------------
The two "UNS" dihedral angles are the (generally uninteresting) methyl spins in both capping groups. As an example, an input file targeting only the φ- and ψ-angles of the proline residue would require just the two entries "3" and "4" to be present. Note that the numbering will generally change on account of changes to the sequence, to FMCSC_TMD_UNKMODE, or to the presence of disulfide bonds. It is therefore recommended to always inspect the initial portion of what CAMPARI prints out in order to avoid errors caused by files that were not updated.
Input file to provide a coarse-grained (symbolic) rajectory:
Keyword: FMCSC_CLUFILERequired for: Using file-based clustering, i.e., setting keyword FMCSC_CMODE to either 6 or 7. This input file provides the symbolic trajectory itself as a list of integers.
Symbolic trajectories as written by CAMPARI itself are an appropriate input file format (see documentation on STRUCT_CLUSTERING.clu). The column to be used is selected by keyword FMCSC_CLUFILECOL. Note that clusters will be renumbered automatically if CAMPARI detects that the numbering in the input file is not continuous. All lines of the input file are read regardless of the settings for FMCSC_NEQUIL or FMCSC_CCOLLECT. In general, the input will be processed correctly if it is analyzed with the same settings as it was created. This is because the snapshot connectivity map (and all related input files) and the data read-in (if any) continue to work exactly as before in file-based clustering.
Input file to provide trajectory breaks:
Keyword: FMCSC_TRAJBREAKSFILERequired for: Avoiding that every microstate transition in the trajectory is counted toward output and analysis in the inferred graph (transition matrix) printed to output file STRUCT_CLUSTERING.graphml. By extension, this is relevant for all functionalities relying on this information (as summarized elsewhere) and their corresponding outputs. It also affects (independently) the results from the progress index method. Naturally, this is is irrelevant for any simulation not using clustering and related methods.
A trajectory usually is a series of continuous snapshots of a system. The implied continuity is used to infer the system's properties in terms of transitions, pathways, and dynamics. This is often done using network-based methods (most prominently in Markov state models). The connectivity of such a network or graph is defined by the edges and their probabilities, which are most often inferred from the trajectory itself. However, not all transitions may be qualitatively similar, e.g., the swaps in replica exchange calculations differ from the incremental evolution produced by the basic sampling engine. As another example, trajectories may be concatenated in certain parallel runs or in trajectory analysis mode. This input file can be used to eliminate such apparent transitions before the corresponding analyses are performed. Input file FMCSC_TRAJLINKSFILE provides the complementary facility of adding missing transitions.
Example:
----------------
10000
20000
30000
40000
50000
----------------
This file would tell CAMPARI that the transitions from the 10000th to the 10001st step, from the 20000th to the 20001st step, and so on, are all to be excluded from relevant analyses. Note that this input always refers to the basic and complete simulation length defined via FMCSC_NRSTEPS. This means that the numbers must not be adjusted to reflect settings for FMCSC_EQUIL, FMCSC_CCOLLECT, FMCSC_FRAMESFILE, or similar keywords. The use of a frames file should be handled with particular care. If the file accesses the trajectory in random access ("as is") mode, all step numbers are assumed to refer to the line number in the frames file rather than the index of the frame on that line. This is a general change of interpretation inherent to FMCSC_FRAMESFILE with certain input file formats. Conversely, if the file accesses the trajectory in strictly sequential mode, step numbers continue to refer to the original trajectory. In a parallel setting, this can apply separately and identically to the individual trajectories (RE mode) or globally to the concatenated trajectory (MPI averaging mode).
The actual coarse-grained trajectory reported in STRUCT_CLUSTERING.clu is naturally only printed for snapshots that ended up in the set to be analyzed, which means that any effective transition spanning one or more excluded step-to-step transitions is also excluded. Note that the removal of transitions can of course fracture the network, which may render some analyses impossible to perform or difficult to interpret. Lastly, users should be warned that a large value for NRSTEPS will lead to a large memory allocation when reading the file (irrespective of the file's actual size).
Input file to provide additional trajectory links:
Keyword: FMCSC_TRAJLINKSFILERequired for: Adding snapshot-based microstate transitions to an input trajectory to modify the inferred graph (transition matrix) printed to output file STRUCT_CLUSTERING.graphml. By extension, this is relevant for all functionalities relying on this information (as summarized elsewhere) and their corresponding outputs. It also affects (independently) the results from the progress index method. Naturally, this is is irrelevant for any simulation not using clustering and related methods.
A single trajectory is a series of continuous snapshots of a system. The implied continuity is used to infer the system's properties in terms of transitions, pathways, and dynamics. This is often done using network-based methods (most prominently in Markov state models). The connectivity of such a network or graph is defined by the edges and their probabilities. Networks are often constructed from sets of many trajectories whose start and end points may or may not be related. For analysis in CAMPARI, they are generally concatenated. It is impossible to infer the correct connectivity purely from the sequence of snapshots in the concatenated trajectory. As a complement to FMCSC_TRAJBREAKSFILE, this input file allows the user to manually specify missing transitions between snapshots. These occur, for example, in cases where several trajectories are branched off the same point and subsequently concatenated (as in the analysis of a PIGS simulation).
The first letter in the file indicates the mode:
'Y' or 'y': (attempted remapping of links requested)
'N' or 'n': (only exactly matched links are kept)
Example:
----------------
N
1000 2001
1000 3001
1000 4001
1000 5001
----------------
This file would tell CAMPARI that the 1000th snapshot in the (concatenated) input trajectory not only directly precedes the 1001st snapshot but also to the 2001st, 3001st, 4001st, and 5001st snapshots. Duplicates are not allowed. For every entry, a single integer count is added to the transition count matrix element corresponding to the transition from the cluster containing the snapshot in the first column to the cluster containing the snapshot in the second column (order therefore matters). Note that this input always refers to the basic and complete simulation length defined via FMCSC_NRSTEPS. This means that the numbers must not be adjusted to reflect settings for FMCSC_EQUIL, FMCSC_CCOLLECT, FMCSC_FRAMESFILE, or similar keywords. The use of a frames file should be handled with particular care. If the file accesses the trajectory in random access ("as is") mode, all step numbers are assumed to refer to the line number in the frames file rather than the index of the frame on that line. This is a general change of interpretation inherent to FMCSC_FRAMESFILE with certain input file formats. Conversely, if the file accesses the trajectory in strictly sequential mode, step numbers continue to refer to the original trajectory.
In the example, the mode of not remapping snapshots was chosen. This means that, if any of the snapshots in question is missing because of the aforementioned keywords, that particular link will not be used (a warning is provided). If the mode is 'Y' instead, CAMPARI will try to remap these snapshots to those that are available and nearest in time (sequence) in the original trajectory. Specifically, the first ones before and after are used for the first and second columns of the input file, respectively. In addition, there will be a check against added and destroyed links based on FMCSC_TRACEFILE and FMCSC_TRAJBREAKSFILE. If CAMPARI detects that there is no longer a "safe" connection between snapshots surrounding the unmapped entry, it will discard the link in question. Because the stacked use of all these modifications gets confusing quite easily, a snapshot-based link map will be created (including warnings) before the actual execution begins. Keyword FMCSC_CADDLINKMODE also provides somewhat overlapping functionality, but this is not at the snapshot level. Lastly, users should be warned that a large value for NRSTEPS will lead to a large memory allocation when reading the file (irrespective of the file's actual size).
Input file to provide the target set for committor probabilities:
Keyword: FMCSC_CLUFOLDFILERequired for: Defining a target set for committor probabilities (as reported in output files PFOLD_MINUS_xxx.dat, PFOLD_PLUS_xxx.dat, PFOLD_MINUS_yyy_CFEP_xxxxxxxx.dat, and PFOLD_PLUS_yyy_CFEP_xxxxxxxx.dat) comprised of one or more clusters based on multiple input snapshots. This is is irrelevant for any simulation not using clustering and related methods, not requesting the committor analysis, or for executables not linked against HSL.
This file uses a trivial single column input format where the user indicates the integers that correspond to the snapshots that are used to identify the clusters that compose the target (folded) set B in all committor analyses (see DOPFOLD and DOPFOLD_MINUS). The cluster or clusters have to be in the same strongly connected component as those for the alternative set (A) requested via FMCSC_CLUUNFOLDFILE.
Note that this input always refers to the basic and complete simulation length defined via FMCSC_NRSTEPS. This means that the numbers must not be adjusted to reflect settings for FMCSC_EQUIL, FMCSC_CCOLLECT, FMCSC_FRAMESFILE, or similar keywords. The use of a frames file should be handled with particular care. If the file accesses the trajectory in random access ("as is") mode, all step numbers are assumed to refer to the line number in the frames file rather than the index of the frame on that line. This is a general change of interpretation inherent to FMCSC_FRAMESFILE with certain input file formats. Conversely, if the file accesses the trajectory in strictly sequential mode, step numbers continue to refer to the original trajectory.
In case this file is not specified or not found, CAMPARI will use the value associated with keyword INISYNSNAP for this purpose, which defaults to using the cluster that has the largest number of snapshots.
Input file to provide the alternative set for committor probabilities:
Keyword: FMCSC_CLUUNFOLDFILERequired for: Defining an alternative set for committor probabilities (as reported in output files PFOLD_MINUS_xxx.dat, PFOLD_PLUS_xxx.dat, PFOLD_MINUS_yyy_CFEP_xxxxxxxx.dat, and PFOLD_PLUS_yyy_CFEP_xxxxxxxx.dat) comprised of one or more clusters based on multiple input snapshots. This is is irrelevant for any simulation not using clustering and related methods, not requesting the committor analysis, or for executables not linked against HSL.
This file is exactly identical to FMCSC_CLUFOLDFILE only that it lets the user specify the alternative (A) rather than the target (B) set.
Section 7: Files relevant to the NetCDF analysis mode:
(back to top)
Input file to provide binary data for the NetCDF analysis mode:
Keyword: FMCSC_NCDM_NCFILERequired for: Providing the data in NetCDF binary format to be mined.
Variables
The only variable that must be present in the NetCDF input data base is the one that contains the values of the degrees of freedom (features) per frame that the user wants to submit to structural clustering and downstream analyses. CAMPARI expects this variable to be named "featuresvals" in the data base, but this assumption can be changed with keyword NCDM_NMFV. This variable has to have two dimensions, the first one reflecting the number of frames (to which it can be given the UNLIMITED size) and the second one the number of features per frame. The name of these dimensions are inferred directly from the input NetCDF data base, but can not be changed when converting an ASCII file (defaul names are "nframes" for the first (possibly unlimited) dimension (number of frames) and "nfeats" for the second dimension (number of features)). One optional attribute to this variable can be specified and must be named "periodic_range" (see below). According to the NetCDF documentation names of attributes can be changed easily from the user side.
Two optional variables for feature weights are recognized by CAMPARI if present in the input NetCDF file. Static weights can be specified with variable "static_weights" and dynamic ones with variable "dynamic_weights". This naming is fixed, and in case the input data base does not comply with this standard, there are tools to also rename variables from the user side. The actual use of those weights depends on the requested analysis. For example, if dynamic weights are present in the data base but a distance criterion is selected such that dynamic weights are not supported, they will be completely discarded (the only exception to this rule being when converting an ASCII file, in that case weights are reflected in the converted file, but obviusly not used in analysis). As a final note on the use of feature weights, the NetCDF data mining mode supports inferring the weights directly from the data (see CDISTANCE and CMODWEIGHTS), regardless whether those have been specified in the input data base or not. This also means that input weights can be overwritten if this is desired (see CMODWEIGHTS).
Dimensions
As already mentioned above, there are two dimensions that characterize all the variables recognized by CAMPARI. The first dimension is the unlimited dimension (number of frames), which is the first dimension for the feature-containing variable NCDM_NMFV and for the dynamic weights (static weights obviously do not have this dimension as there is only one value per degree of freedom). The second one is the number of features per frame and is relevant to all the variables, viz., to the one that holds the values of the features NCDM_NMFV and to the static and dynamic weights, if specified.
Attributes
There is only one attribute that can influence the behavior of CAMPARI in NetCDF data mining mode. If the input data are periodic, the user can specify the periodicity range through the attribute "periodic_range" assigned to the feature-containing variable NCDM_NMFV. This attribute should contain two values, one for the left boundary and one for the right boundary of the periodic interval. In case a distance criterion that implies periodicity is specified but no periodicity attribute is assigned to the feature-containing variable, CAMPARI assumes it is analyzing angle data in degrees and uses the default periodic interval with boudaries at -180° and 180°. In general, periodic data are adjusted to be within the periodic interval in use (default or specified by the user). This means that, for example, the user does not have to adjust his/her periodic data to the reference periodic interval if the data acquisition has been done on different interavals, CAMPARI will adjust the values automatically, and this adjustment is NOT reflected in ANY possible output file (NCDM_WRTINPUT). However, the user has to be keep in mind that these acquisitions have to be consistent with the reference periodic interval. It is not possible to have, say, angles measured once in the interval [-180°180°] and once in [0°360°] as these intervals overlap. It is instead correct to mix measurements coming from, for instance, independent angle readouts once in the interval [-180°180°] and once in [180°540°]. It is worth noting also that, differently from what happens for ASCII conversion, when a NetCDF file is used as input, the keywork NCDM_PRDCRNG is completely ignored.
Input file to provide ASCII data to be converted in NetCDF standard and/or analyzed:
Keyword: FMCSC_NCDM_ASFILERequired for: Providing the data in ASCII format to be analysed with structural clustering and downstream routines and/or to be converted to the NetCDF standard required by CAMPARI for data mining.
CAMPARI expects to find an ASCII file with as many columns (features) as specified with keyword NCDM_NRFEATS. To subsample the features to be used in analysis, the only option is to provide an additional input file (see NCDM_CFILE). In case checking of the input ASCII file is explicitly requested, CAMPARI checks that every line of the input file contains the exact number of entries expected from NCDM_NRFEATS. This processing slows down considerably the reading-in of the data, and is disabled by default. The number of rows that correspond to the frames that are contained in the ASCII file should be specified with keyword NCDM_NRFRMS. This specification is important. Any other number than the exact number of frames contained in the ASCII file will produce unexpected and most likely wrong behavior. To subsample the frames of the input ASCII in analysis, the user can rely on the mutually exclusive usage of keywords CCOLLECT or NCDM_FRAMESFILE, which however are ignored in conversion (the entire ASCII file is always converted if conversion is requested, see NCDM_WRTINPUT). Keyword NCDM_NRFRMS must not account for the possible presence of features weights, which is checked upon by CAMPARI automatically. Indeed, in case the user wants to specify weights, those can be appended to the input ASCII file. If CAMPARI finds 2×NCDM_NRFRMS lines, it assumes that the input ASCII file contains dynamic weights, with the first line of weights (line NCDM_NRFRMS + 1) containing the weights corresponding to the degrees of freedom of the first snapshot and so on. If CAMPARI detects 2×NCDM_NRFRMS + 1, it assumes there are both dynamic and static weights, with the latter at the end of the file (last line). If the ASCII file is formed by NCDM_NRFRMS + 1 lines, the last one is interpreted as static weights. Any other combination will produce an error. The actual use of feature weights depends on the choice of CDISTANCE and keyword CMODWEIGHTS (and associated ones) may become relevant.
Input file to select specific degrees of freedom in NetCDF data mining mode:
Keyword: FMCSC_NCDM_CFILERequired for: Selecting specific features in analysis.
This simple single-column list of integers lets the user select (subsample) the features to be used for structural clustering and downstream analyses with respect to the set of degrees of freedom present in the input file (NCDM_NCFILE or NCDM_ASFILE).
Example:
----------------
1
3
4
----------------The above exemplified NCDM_CFILE file instructs CAMPARI to select only the first, third and fourth degrees of freedom of the original input file (NCDM_NCFILE or NCDM_ASFILE) for structural clustering and downstream routines. Duplicates, nulls and negatives are removed automatically. The features subsamplig is not reflected in the possible converted ASCII file (NCDM_WRTINPUT), but does reflect in the output summary database in case it is required when providing as input a NetCDF file (NCDM_NCFILE, mainly for debug, see NCDM_WRTINPUT).
Input file to select specific frames in NetCDF data mining mode:
Keyword: FMCSC_NCDM_FRAMESFILERequired for: Selecting specific frames in analysis.
With this single-column list of integers, the user can specify a set of frames from the input file (NCDM_NCFILE or NCDM_ASFILE) to be analyzed in structural clustering and downstream analyses.
Example:
----------------
1
3
4
3
107
----------------As the example above documents, duplicates are legal and the frames file does not need to be sorted, it will be processed exactly the way it is provided, consistently with the underlying NetCDF file format (see also FRAMESFILE). Nulls and negatives will be removed. However, a frame value that exceeds the number of frames present in the input file (NCDM_NCFILE or NCDM_ASFILE) will trigger an error. In case feature weights are present, they will be processed (and possibly reordered/duplicated) in order to match the frames specified with this input file. It is important to note that, differently from the standard execution mode of CAMPARI, the specification of a FRAMESFILE in the NetCDF data mining mode madates setting CCOLLECT to 1, viz., it is explicitly disabled the option to subsample a frames file with CCOLLECT. The action of the frames file is not reflected in any way in the NetCDF obtained from an input ASCII file, but it is reflected in the debug output when the input file is already in NetCDF format (see NCDM_WRTINPUT).