CAMPARI Tutorial 1

Tutorial 1: First CAMPARI Simulation - Torsional Space Monte Carlo Sampling of a Q_N-Peptide ("Polyglutamine") in Implicit Solvent

Preface:

A single, highly flexible peptide chain like polyglutamine (Q_N) is an example of a system that can be simulated efficiently by Monte Carlo (MC) algorithms, provided that N is small enough. Researchers in the Pappu lab have studied this system extensively, and it is one of the most often simulated systems within CAMPARI. It is therefore a fitting example system for an introductory tutorial to CAMPARI.
A homopolymer such as Q_N is an easy system to set up and run using CAMPARI. From a polymeric point of view, polyglutamine adopts collapsed (globular) and disordered conformations in aqueous milieus at ambient temperature and pressure. Despite being an exclusively polar system, polyglutamine is experimentally found to be extremely prone to aggregation and - in this capacity - is associated with several neurodegenerative diseases such as Huntington's. Polymeric properties and association of monomeric and dimeric Q_N were analyzed, for example, in this reference. At the end of this tutorial, you should be able to set up simulations that attempt to reproduce those data at least for the monomeric case.

Before we can start, however, make sure you have a working version of the serial or the threads-parallel CAMPARI executable and parameter files (normally in the subdirectory params/ ) available. If you still need these, please refer to the Installation section of the documentation.

If you have CAMPARI installed and compiled, there are three main steps that need to be done to run a successful CAMPARI simulation:

Create a key-file. The key-file contains all the instructions and parameters needed to run the simulation you want. An entire section of the documentation is dedicated to clearly laying out the purpose of every keyword (keywords). This is understandably very verbose, and it is precisely the role of tutorials to introduce groups of keywords in a direct application context. Some (but very few) keywords are essential and cannot be left out of any key-file. Most keywords have default values set or are not applicable to a specific calculation, so in a lot of cases they can be omitted. Often, however, it will help to write clearer key-files that contain more than the necessary keywords, especially for those that affect the majority of simulations. That way you can ensure that you are familiar with the most relevant options, which may save a lot of time and avoid running "bad" (as in, containing inadvertent errors) simulations. The template key-file used in this tutorial is generally a good starting point for Monte Carlo simulations within CAMPARI. Not surprisingly, we will (further below) start this tutorial by looking at this key-file and explaining its components.
Check and/or create auxiliary input files. The only auxiliary input files that are always needed are the sequence file and the parameter file (→PARAMETERS). These define the system to be simulated and allow CAMPARI to collect the necessary parameters to define the interactions amongst its components. Many additional keywords are references to file system locations where further auxiliary input files are placed. These often have very simple formats, and are only needed for specialized calculations (such as replica-exchange runs). Obviously, if any necessary input file does not exist, is not readable, or is in the incorrect format, CAMPARI will not run correctly and most likely exit with an indicative message. It is always recommended to scan the initial portion of the log-output produced by CAMPARI (for non-MPI runs, this, by default, is written to the terminal) to check for warnings. Many features are automatically disabled if required inputs are missing, and this may not always be desirable. In addition, segmentation faults sometimes occur with bad input files because CAMPARI uses Fortran formatted reads (i.e., variable type mismatches are poorly tolerated, empty lines are not tolerated everywhere, and very long strings can be problematic, e.g., file names: there is a macro in source file "macros.i" defining maximum string lengths). These language limitations are clearly annoying, and their only usefulness lies in pointing out inadvertent errors in input files.
Run CAMPARI. If all input files are present and the key-file complete, the command is as simple as:

<full path to CAMPARI>/bin/x86_64/campari -k foo.key
Note that the "x86_64" is an architecture or compiler string chosen during installation, so it may differ for your specific case. For the threads-parallel executable:

<full path to CAMPARI>/bin/x86_64/campari_threads -k foo.key
All output files will be created in the working directory. Remember that CAMPARI (by design) will overwrite almost all of these output files if the simulation is rerun in the same directory. Therefore, make sure to back up / move data from older runs that you wish to preserve.
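
Because reruns overwrite most output, it can help to move earlier results aside before starting again. The following is a minimal sketch of such a step, not part of CAMPARI itself; the function name backup_outputs and the file patterns are assumptions and should be adapted to the files your run actually produces.

```python
import shutil
import time
from pathlib import Path


def backup_outputs(workdir, patterns=("*.dat", "*.pdb", "*.dcd", "*.vmd", "log")):
    """Move files matching `patterns` from workdir into a timestamped subfolder.

    The patterns are illustrative guesses, not a list defined by CAMPARI.
    Returns the backup folder, or None if nothing matched.
    """
    workdir = Path(workdir)
    # glob() is non-recursive here, so files in earlier backup folders are ignored
    files = sorted({f for p in patterns for f in workdir.glob(p) if f.is_file()})
    if not files:
        return None
    target = workdir / time.strftime("backup_%Y%m%d_%H%M%S")
    target.mkdir(exist_ok=True)
    for f in files:
        shutil.move(str(f), str(target / f.name))
    return target
```

Calling backup_outputs(".") in the run directory before a rerun would leave the key-file and sequence file in place, since neither matches the default patterns.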

Note that this tutorial is primarily meant to familiarize new users with the use of CAMPARI in general. The example chosen is more or less arbitrary. Therefore, it will not always be clear as to why certain choices for run parameters (keywords) are suggested. Please refer to the other tutorials for cases that are constructed to accomplish very specific tasks.

Step-by-Step:

Step 1 - Create (by modification) an appropriate key-file for Monte Carlo simulations of Q_N

Download or copy the key-file found in the subdirectory examples/tutorial1/ to your current working directory and rename the file as "tutorial1.key". This can be used as a starting point for future simulations. CAMPARI offers many options for customizing the simulation, but often very little needs to be modified from key-file to key-file if the simulations are analogous (for example, simulations on the same system, but with different analyses or sampling methods).

First, open the key-file and inspect its contents. Notice the universal prefix used by almost all keywords, viz., "FMCSC_". This tag, which is largely historical, makes it somewhat easier to automatically find keywords both in the source code and in key-files. Its single most important role is that it allows the key-file parser to differentiate unused keywords from other lines lacking a leading comment character ('#'). We will only examine some of the more important keywords in this section. For all others, you can and should use the comprehensive documentation. The descriptions there are usually relatively detailed and contain many links to related keywords or input/output files allowing you to navigate efficiently through topically connected subjects found in different parts of the documentation:
Two keywords of fundamental interest pertain to the only two essential input files required by a standard CAMPARI run:

PARAMETERS - This is one of two keywords (the other being FMCSC_SEQFILE) that are absolutely essential to run CAMPARI in any fashion (both are no longer needed when using the data mining executable). This keyword gives the path to the parameter file that will be used in the simulation to define the interaction model between the particles present, and to provide the most fundamental atomic properties needed to even build biomolecules. Several canonical, biomolecular force fields are supported, such as OPLS-AA/L, CHARMM22, AMBER99, or GROMOS53. This keyword is one of two exceptions to the use of "FMCSC_" as a standard prefix, and for our simulation should be written as:

PARAMETERS <full path to folder>/campari/params/abs3.2_opls.prm
You can read up on the corresponding documentation to find out what the specified file corresponds to. It should become apparent later why we are using it in this example.
FMCSC_SEQFILE - This is the path to the sequence file for the simulation. Being the second essential input file, it defines the composition of the system as a sequence of "residues". Residues are one of the fundamental organizational units in CAMPARI. As mentioned, this file is always needed regardless of whether you are reading in an annotated coordinate file (via FMCSC_PDBFILE as a starting structure) or generating a random starting structure from the sequence file itself. CAMPARI will never automatically infer the system's sequence from an input file other than the sequence file itself. There is, however, a tool available to construct an appropriate sequence file from a PDB file as found in the data bank itself (discussed and used, for example, in tutorials 6, 16, and 17). We will discuss constructing the sequence file from scratch in detail in step 2 of this tutorial. Set the value for the keyword to "q20.in".
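
As a quick sanity check before running, one could verify that both essential keywords are present in a key-file. The sketch below assumes the simple layout seen above (keyword, whitespace, value, with '#' starting a comment line); CAMPARI's own parser may treat edge cases differently, and the function names are hypothetical.

```python
def parse_keyfile(text):
    """Return {keyword: value} for every non-empty line that is not a comment."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def missing_essentials(settings):
    """List those of the two essential keywords that are absent or empty."""
    return [k for k in ("PARAMETERS", "FMCSC_SEQFILE") if not settings.get(k)]


example = """\
# illustrative fragment; the parameter path is a placeholder
PARAMETERS /path/to/campari/params/abs3.2_opls.prm
FMCSC_SEQFILE q20.in
"""
print(missing_essentials(parse_keyfile(example)))  # prints []
```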

Below you can find short descriptions of keywords in 5 separate groups. We here list only those of particular interest to the simulation example chosen for this tutorial.

Basic setup

There are some very fundamental and consequently very important keywords that control the type, extent, and initial conditions for the calculation to be undertaken.

FMCSC_PDBANALYZE - This is the major switch between running a "fresh" CAMPARI simulation that uses an actual sampling engine vs. running CAMPARI in trajectory analysis mode. The latter refers to the possibility of feeding in trajectory data that can then be analyzed by CAMPARI, for example by its various analysis routines. Keep this keyword at 0 (default) to run a fresh simulation. This keyword is an example of a "simple logical" as it is often referred to in the documentation of keywords. This means that specifying a value of 1 instructs CAMPARI to turn the corresponding functionality on ("true") whereas all other specified values are interpreted as "false", and disable the corresponding functionality.
FMCSC_DYNAMICS - Use the default setting of 1 corresponding to a pure Monte Carlo (MC) simulation. Other settings of this keyword enable gradient-based methods such as molecular dynamics.
FMCSC_CARTINT - This controls the basic choice of degrees of freedom that the system is supposed to evolve in. It can be either rigid-body/torsional (internal) coordinates or Cartesian coordinates. In MC, only the former is available (value of 1).
FMCSC_ENSEMBLE - The thermodynamic ensemble assumed to underlie the simulation is specified by this keyword. The default setting of 1 corresponds to the canonical (NVT) ensemble, which is the most convenient and intuitive ensemble for a system in an implicit bath of solvent.
FMCSC_RANDOMIZE - Setting this keyword to option 1, which is also the default, will instruct CAMPARI to attempt to find a set of degrees of freedom that randomizes the starting structure in such a way that excluded volume is approximately obeyed. Randomization by this option is generally useful since it will supplement any structural input with randomized parts as needed. When running multiple simulations for the same system, randomized starting structures are an important condition for keeping the final results interpretable in statistical terms (they help understand the role of initial state bias). Randomization is controllable at different levels: options 0-3 for FMCSC_RANDOMIZE and keywords FMCSC_RANDOMTHRESH and FMCSC_RANDOMATTS.
FMCSC_NRSTEPS - This is the total number of elementary simulation (here MC) steps taken by the simulation. Here we will set this keyword to 21x10⁶. Based on prior experience, this simulation length ensures that we have sufficient data for the readouts we consider (if you are interested in simply following the technical steps involved in this tutorial, lower this and the setting for FMCSC_EQUIL below by two orders of magnitude to create a modified "tutorial" that finishes in a few minutes).
FMCSC_EQUIL - This is the number of total elementary simulation (here MC) steps discarded as equilibration steps. NRSTEPS minus EQUIL will be the total number of steps over which analysis will be run, with the frequency of analysis controlled by output settings discussed below. Note that no trajectory data will be produced during equilibration, either (→ FMCSC_XYZOUT).
FMCSC_TEMP - For constant temperature simulations, this defines the system (bath) temperature, and we should set it to the physiological temperature (310K) for this example.
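
The shortened "few minutes" variant mentioned for FMCSC_NRSTEPS and FMCSC_EQUIL can be prepared by hand, or with a small helper such as the sketch below. It assumes both values are written as plain integers in the key-file; the equilibration length in the example is a hypothetical placeholder, not the value from the template.

```python
import re


def scale_steps(keyfile_text, factor=100):
    """Divide the integer values of FMCSC_NRSTEPS and FMCSC_EQUIL by `factor`."""
    pattern = re.compile(r"^(\s*FMCSC_(?:NRSTEPS|EQUIL)\s+)(\d+)", re.MULTILINE)

    def shrink(match):
        # keep the keyword and spacing, replace only the number
        return f"{match.group(1)}{int(match.group(2)) // factor}"

    return pattern.sub(shrink, keyfile_text)


original = """\
FMCSC_NRSTEPS 21000000
# equilibration length below is a hypothetical value
FMCSC_EQUIL 1000000
FMCSC_TEMP 310
"""
print(scale_steps(original))
```

All other lines, including comments, pass through unchanged, so the result can be written to a second key-file for a quick test run.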

Box settings

These settings define the simulation container, and they almost always matter.

FMCSC_SHAPE - Setting 2 corresponds to using a spherical droplet. Note that with a single molecule in the system (see below), we essentially create conditions of "infinite dilution" assuming that the radius of the droplet is large enough, that the molecule is sufficiently far away from the boundary, and that it cannot move away on account of the Monte Carlo move set.
FMCSC_SIZE - For our simulation of a 20-mer of polyglutamine (Q₂₀), we will use a radius of 400Å, which gives a droplet diameter one order of magnitude larger than the contour length of the fully extended molecule.
FMCSC_BOUNDARY - By using setting 4, we create a half-harmonic potential defining the droplet's boundary that acts on all atoms within the system. Because of the size (see above), for this particular system this choice is not of importance (but this is clearly the exception rather than the rule).
FMCSC_SOFTWALL - For a half-harmonic boundary potential, this keyword sets the force constant. While this value cannot be exactly zero, it can be made arbitrarily small to minimize the impact of the boundary in cases where this would be undesirable. As discussed below, we will create a move set that keeps the polymer in the same place, and thus it is advisable to set a tiny value (1.0e-9) here.
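
The size argument above can be checked with a rough estimate. The sketch below assumes about 3.8 Å of contour length per residue (an approximate value for an extended chain, ignoring the caps) and uses a generic half-harmonic form k·max(0, r − R)²; the exact functional form, prefactor, and units used by CAMPARI should be taken from the documentation of FMCSC_BOUNDARY and FMCSC_SOFTWALL.

```python
N_RESIDUES = 20          # Q20, caps ignored
RISE_PER_RESIDUE = 3.8   # Å per residue; assumed rough estimate
DROPLET_RADIUS = 400.0   # Å, the FMCSC_SIZE value used in this tutorial


def half_harmonic(r, radius, k):
    """Generic half-harmonic wall: zero inside the droplet, quadratic outside.

    The prefactor convention is an assumption, not CAMPARI's definition.
    """
    return k * max(0.0, r - radius) ** 2


contour_length = N_RESIDUES * RISE_PER_RESIDUE
diameter = 2.0 * DROPLET_RADIUS
print(f"contour length ~ {contour_length:.0f} Å")      # ~ 76 Å
print(f"droplet diameter = {diameter:.0f} Å")          # 800 Å
print(f"ratio ~ {diameter / contour_length:.1f}")      # ~ 10.5
print(half_harmonic(410.0, DROPLET_RADIUS, 1.0e-9))    # tiny even 10 Å outside
```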

Note that a mismatch of the size of a periodic system relative to structural input, e.g., trajectory data being analyzed, can lead to errors that are difficult to understand at first.

Energy parameters

Look over the Hamiltonian section of the key-file. The specified values essentially correspond to the use of the ABSINTH implicit solvation model as published. You should not change them within the scope of this tutorial.
In general, Hamiltonian keywords consist of outside scaling factors that can assume any positive real number including zero (the latter disabling this term in the Hamiltonian), e.g., keyword FMCSC_SC_ATTLJ controls whether dispersive interactions assumed to be scaling with r^-6 act between nonbonded particles in the system.
The second class are dependent parameters that only matter for one or more specific terms in the Hamiltonian. These vary widely in their applicability, scope, and impact. For instance, FMCSC_SIGRULE can be used to set the combination rule for how to construct pairwise Lennard-Jones parameters σ_ij from terms σ_ii and σ_jj. Some of these keywords may be difficult to understand for someone not familiar with molecular force fields. In any case, we still strongly recommend reading the corresponding parts of the documentation.
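
To illustrate what a combination rule does, the sketch below shows the two rules most commonly found in molecular force fields, the arithmetic and the geometric mean. Which numerical option of FMCSC_SIGRULE selects which rule is deliberately not assumed here, and the σ values are hypothetical.

```python
import math


def sigma_arithmetic(sigma_ii, sigma_jj):
    """Arithmetic mean of the two like-pair size parameters."""
    return 0.5 * (sigma_ii + sigma_jj)


def sigma_geometric(sigma_ii, sigma_jj):
    """Geometric mean of the two like-pair size parameters."""
    return math.sqrt(sigma_ii * sigma_jj)


# hypothetical like-pair values in Å
print(f"{sigma_arithmetic(3.0, 4.0):.3f}")  # 3.500
print(f"{sigma_geometric(3.0, 4.0):.3f}")   # 3.464
```

The two rules agree for identical particles and differ more as σ_ii and σ_jj diverge, which is why the setting has to match the force field in use.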

Monte Carlo move set controls

Monte Carlo simulations in CAMPARI are performed in rigid-body / torsional space. For a single polypeptide chain, the native CAMPARI degrees of freedom are the backbone (φ, ψ, and ω) and sidechain (χ_1-3 for glutamine) dihedral angles. A list of native degrees of freedom is found in the description of sequence input. Exceptions include but are not limited to the sampling of ring pucker states in proline and sugars where bond angles of the ring will change slightly. Rigid-body motions of the chain can usually be turned off for the monomer, unless systems are meant to be studied under explicit confinement conditions.
Some of the occurring keywords are used to illustrate how to control the MC move set via the key-file. A comprehensive description is found elsewhere.

FMCSC_ALIGN - This is an important keyword that applies to almost all pivot-like moves, i.e., moves where a change in a single dihedral angle requires a decision as to which end to swivel around. The default option, 4, is not ideal here because stochastic swiveling means that there is no fixed base of motion. This would mean that the molecule moves around the droplet, which implies that it will contact the boundary at some point. Thus change it to option 3 and check that the initial structure is far away enough from the boundary (otherwise simply restart).
FMCSC_RIGIDFREQ - This is the frequency setting for any type of move sampling exclusively the rigid body coordinates of one or more molecules. In our example, sampling of the rigid-body motion of the chain is wasteful, so you should turn it off (set it to 0.0). Bear in mind, however, that sometimes unfortunate settings may place part of a molecule such that it overlaps with the boundary of the droplet. Such a conflict may bias the result and should be avoided, for example by choosing a larger droplet. You can eliminate this problem altogether by switching to periodic boundaries or by setting FMCSC_SOFTWALL to an extremely small value.
In general, in the MC sampler within CAMPARI, the probability of individual types of moves is determined in a hierarchical decision tree. Turning FMCSC_RIGIDFREQ off would therefore also disable a set of related move types, such as cluster rigid-body moves (→ FMCSC_CLURBFREQ). This principle applies in general, and every move set frequency keyword should indicate in its documentation how to calculate the expected number of moves of that given type. This should also clarify the hierarchy. The disadvantage of the binary decision tree is that frequency settings are only relative to the respective branch (see below).
FMCSC_CHIFREQ - Frequency of sidechain moves. To give an example of the decision tree in action, consider the following: If FMCSC_PARTICLEFLUCFREQ is zero (which it must be here since we have a constant particle number ensemble) and assuming that FMCSC_RIGIDFREQ was set to zero above, we can infer that a setting of 0.2 with a total simulation length of 21x10⁶ steps will produce ca. 4.2x10⁶ side chain moves. This serves as an example for how to compute net expected attempt totals for moves of a given type (note that we disregard the extremely specialized moves controlled by FMCSC_PHFREQ here, i.e., we rely on the default being zero).
FMCSC_CRFREQ - In general, concerted rotation moves describe a class of moves for which stretches of a polymer are sampled while the ends are either restrained or constrained to their original positions. By choosing a nonzero frequency here, we would rely on many default choices for dependent keywords that would go beyond the scope of this tutorial to explain. Therefore, keep this entire class of moves disabled (setting of 0.0).
FMCSC_OMEGAFREQ - The amide (ω-)bond of the peptide chain is sampled separately from the φ/ψ-angles based on the frequency setting of this keyword. This is an example for an improved move set that can take into account the peculiarities of specific degrees of freedom. Here, the large barrier associated with cis/trans isomerization and the narrow basins mean that move parameters should be different from those for, e.g., φ/ψ-angles. These move parameters are controlled by their own keywords: FMCSC_OMEGARDFREQ and FMCSC_OMEGASTEPSZ. These parameters are named analogously for many move types and explained for the next item.
FMCSC_PIVOTRDFREQ - Pivot-type moves of the φ/ψ-angles are the "default" move in CAMPARI, and their expected number is obtained from the remaining frequency settings (that is, they are the last remaining option in the decision tree). The role of this keyword is to control the frequency for a subset of those, specifically the fraction of moves randomly resetting the corresponding degrees of freedom on the interval from -180° to 180° (maximal step size). This is an example of a keyword that occurs in analogous form for other move types as well (see for example FMCSC_RIGIDRDFREQ or FMCSC_CHIRDFREQ).
FMCSC_OTHERFREQ - The list of native CAMPARI degrees of freedom obviously does not include degrees of freedom in unsupported residues. However, it does not include certain symmetric dihedral angles in supported residues either, e.g., the primary amide C-N bond as found also in glutamine. To sample either of these types of degrees of freedom, it is necessary to use this keyword, which controls what are all single dihedral angle pivot moves (again with subordinate parameters). These moves can also work on native degrees of freedom (excepting φ/ψ-angles in polypeptides and bonds in flexible rings). Here, the settings in the key-file mean that we will use them on native degrees of freedom only.
FMCSC_PIVOTSTEPSZ - The remaining φ/ψ-pivot moves are stepwise perturbations from the current values for the particular torsions. This keyword controls the magnitudes of these small perturbations. Again, there are several analogous keywords for other move types. As alluded to, for example, the step size for ω-pivot moves should generally be smaller.
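
The decision-tree arithmetic described for FMCSC_CHIFREQ can be written out as a short calculation. The sketch below only covers the three branches named in that example and lumps everything else into one remainder; the function name and branch labels are hypothetical, and the full order of branches should be taken from the keyword documentation.

```python
def expected_moves(nrsteps, branches):
    """Walk a sequential decision tree.

    `branches` is an ordered list of (name, frequency) pairs. Each frequency
    is the fraction of the moves still unassigned when that branch is reached.
    """
    remaining = float(nrsteps)
    counts = {}
    for name, freq in branches:
        counts[name] = remaining * freq
        remaining -= counts[name]
    counts["remaining move types"] = remaining
    return counts


branches = [
    ("particle fluctuation", 0.0),  # FMCSC_PARTICLEFLUCFREQ
    ("rigid body", 0.0),            # FMCSC_RIGIDFREQ
    ("sidechain", 0.2),             # FMCSC_CHIFREQ
]
counts = expected_moves(21_000_000, branches)
for name, n in counts.items():
    print(f"{name:22s} {n:12,.0f}")
```

With these settings the sidechain branch receives about 4.2x10⁶ attempts, matching the estimate given above, and the remaining 16.8x10⁶ steps are shared among all branches further down the tree.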

Note for all of the above that CAMPARI will parse the selections made for the move set, determine its applicability to the system at hand, and disable those components that cannot be used. In rare cases, it may also exit at the very beginning and ask for an explicit adjustment of the move set to avoid confusion. Also note that there are a few dihedral angle degrees of freedom not native to CAMPARI (i.e., they are frozen by default). These include the N-C bond in the sidechain primary amide groups and methyl groups at the terminal caps. They could be sampled with the help of keywords FMCSC_OTHERFREQ and associated keywords, but this is not explored here.

Output frequency and files for initial analysis

CAMPARI does a lot of analysis during the simulation depending on how you set keywords that control the frequency with which quantities of interest are evaluated, data are accumulated, and/or instantaneous output is written to corresponding output files. Probably the most ubiquitous and important output are trajectory data since they - as mentioned above - allow later reanalysis of quantities missed in the original simulation (→ FMCSC_PDBANALYZE). This is the traditional work flow in molecular simulations, viz., to separate completely trajectory generation and analysis.
Only 5 analysis keywords are discussed here in some detail:

FMCSC_BASENAME - This keyword will apply a particular prefix to a few of the output files. The most important output in this class is trajectory output.
FMCSC_XYZOUT - This keyword controls the frequency at which the Cartesian coordinates of the system are output to a trajectory file. Like every other analysis keyword ending in "OUT" or "CALC", setting this for example to 5000 means that the corresponding analysis/output is performed every 5000 steps. Here, a snapshot of the system's Cartesian coordinates will be output every 5000 steps.
FMCSC_XYZPDB - This keyword controls the type of trajectory output generated. By setting it to 3, we request the trajectory to be written in dcd-format, a common binary trajectory file format also used by programs such as CHARMM or NAMD.
FMCSC_ENOUT - This keyword controls the output interval for instantaneous energy data split by term (see ENERGY.dat). Set this to 5000, which will be sufficient to diagnose the mean boundary energy in particular.
FMCSC_POLCALC - This keyword controls the calculation and data accumulation interval (frequency) for polymeric descriptors of low computational cost (those that scale linearly with system size). For instance, this will accumulate data to produce in the end a histogram of the observed radii of gyration throughout the production phase of the simulation. Change this to be 500.
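
To get a feeling for how much data these intervals produce, one can divide the production length (FMCSC_NRSTEPS minus FMCSC_EQUIL) by the interval. This is only an approximate count, and the equilibration length used below is a hypothetical placeholder rather than the value from the template key-file.

```python
def n_events(nrsteps, equil, interval):
    """Approximate number of production-phase events for a given step interval."""
    return (nrsteps - equil) // interval


NRSTEPS = 21_000_000
EQUIL = 1_000_000  # hypothetical; use the value from your key-file

print(n_events(NRSTEPS, EQUIL, 5000))  # trajectory snapshots (FMCSC_XYZOUT): 4000
print(n_events(NRSTEPS, EQUIL, 500))   # polymer-property samples (FMCSC_POLCALC): 40000
```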

You should now have a complete key-file with appropriate settings. There are many other default keywords that were not discussed, so remember to refer to the full documentation of all keywords to understand what they can be used for and how they work.

Step 2 - Create a sequence file

The sequence file is a linear series of 3-letter codes (with possible suffixes) that define the molecular components of the simulation. CAMPARI natively supports standard amino acids, some post-translationally modified and nonstandard amino acids, capping groups, standard nucleotides, and many small molecules (including ions). Native support means that CAMPARI is, at the source code level, aware of the molecular topologies of these residues and how they can be connected (for polymers). Simulating residues that are not supported is also possible by means of a structural input file and, possibly, patches (see FMCSC_SEQFILE and FMCSC_PDB_TEMPLATE). There are specific capping groups like the acetyl unit (ACE) for polypeptide N-termini and the N-methylamide group (NME) for polypeptide C-termini. All terminal amino acids can also be made into (charged) N- or C-terminal residues simply by adding "_N" or "_C" after the 3-letter code, respectively. When constructing a chain, CAMPARI will connect residues into a single chain until it meets a valid C-terminal residue (either a capping group or a normal residue explicitly made C-terminal), at which point it begins to form a new molecule (that could be a small molecule or another polymer chain). A polyglutamine sequence with appropriate caps would therefore look like this:

ACE
GLN
GLN
....etc...
GLN
GLN
NME
END

Notice the "END" statement at the end of the sequence file. This line terminates the processing of sequence input exactly at that point, which means that any polymers started before must have been terminated. To learn more about sequence file construction and the available residues and small molecules, consult the corresponding documentation. For this tutorial, create a sequence file with the sequence for capped Q₂₀ in the same directory that your key-file resides in and name this file q20.in. As a final step, adjust other required paths in the key-file to match your local working environment.
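
Typing 20 identical lines by hand is error-prone, so the file can also be generated. The following is a small sketch that writes the capped sequence in the format shown above; the function name capped_polyq is hypothetical.

```python
def capped_polyq(n_gln):
    """Sequence-file text for ACE-(GLN)n-NME, terminated by END."""
    residues = ["ACE"] + ["GLN"] * n_gln + ["NME"]
    return "\n".join(residues + ["END"]) + "\n"


if __name__ == "__main__":
    # writes q20.in into the current working directory
    with open("q20.in", "w") as handle:
        handle.write(capped_polyq(20))
```

The resulting file has 23 lines: the two caps, 20 GLN entries, and the END statement.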

Step 3 - Run CAMPARI

Simply run the CAMPARI executable while providing the key-file from that folder as the only argument (this is assuming that the CAMPARI executables are in your path; if not, you must include the full path below):

campari -k tutorial1.key >& log &
If you want to use the threads-parallel executable, be sure to set the number of threads (keyword FMCSC_NRTHREADS) to the number of cores you can spare for the job. This also works with just a single core (i.e., there is no need to compile and use the fully serial version unless you intend to use features not (yet) supported by the threads parallelization):

campari_threads -k tutorial1.key >& log &
This will run the job in the background, and all log-output is redirected to the file "log". If something is incorrect, the program will in all likelihood terminate right away and tell you in hopefully comprehensible fashion what the problem is. Common problems include missing or incorrect input files, bad paths, and, as mentioned above, format mismatches in the key-file (e.g., providing a character variable to a keyword expecting an integer).
At the full number of steps, and depending on your processor speed, the simulation should run for several hours. If you just want to quickly see whether the simulation runs, and what it outputs, you can lower the number of steps by two orders of magnitude as suggested above. CAMPARI outputs a summary of the attempted calculation when starting a simulation, and provides a few global statistics at completion. Both of those are obtained as part of a standard output. This can be directed to a log-file, which is a good place to look for explanations if CAMPARI crashes or terminates unexpectedly. It is always useful to look at the initial summary right away, i.e., it is better to find setup errors before the bulk of the computing time has been invested.

Step 4 - Have a look at the output provided by CAMPARI

First check the last lines of said log file to see if CAMPARI completed correctly. It should have an end time and date of execution indicating that everything went well. Because we requested a number of reports, the initial output will contain what is essentially debugging information. It will be advisable to disable most standard reports for most calculations (although it is often helpful to keep reports for user-requested features, e.g., custom constraints and their corresponding report flag).
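
Checking the log can be partly automated. The sketch below collects lines that mention warnings or errors and returns the last few non-empty lines for inspection; it assumes that such messages contain the words "warning" or "error" in some capitalization, which should be verified against your own log.

```python
from pathlib import Path


def scan_log(text, n_tail=5):
    """Return (lines mentioning warnings or errors, last n_tail non-empty lines)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    flagged = [ln for ln in lines if "warning" in ln.lower() or "error" in ln.lower()]
    return flagged, lines[-n_tail:]


if __name__ == "__main__":
    log = Path("log")  # the file name used for redirection in step 3
    if log.exists():
        flagged, tail = scan_log(log.read_text(errors="replace"))
        print(f"{len(flagged)} line(s) mention warnings or errors")
        print("\n".join(tail))
```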

Now list the contents of the folder where CAMPARI was executed. Some files are always created if the simulation ran correctly, such as initial and final pdb-files (called POLYQ_START.pdb and POLYQ_END.pdb unless you changed the setting for FMCSC_BASENAME). You should also see the trajectory file that will be called POLYQ_traj.dcd (unless you changed the chosen file format). The trajectory with the full Cartesian (xyz-)coordinates will usually be obtained in binary form, for example in the highly compressed xtc-format (if permissible) or the more space-consuming dcd- or NetCDF formats. CAMPARI interfaces easily with the visualization software VMD that can open and manipulate most trajectory formats. These binary formats are often used by other simulation software as well (CHARMM for instance uses the dcd-format that we also use here setting FMCSC_XYZPDB to 3). For convenience, there is an automatic visualization script to be used with VMD that is going to be called POLYQ_VIS.vmd. It loads the CAMPARI-generated data and defines a few representations to view the structure. Note that trajectory output, unlike almost all other output in CAMPARI, is appended if you rerun the simulation several times in the same folder with the same settings.

CAMPARI can also take all the binary trajectory formats it writes as input for post-processing in trajectory analysis mode. This is often useful to refine or add certain analyses or change the settings, and can even be used to recompute energies for the structures present using a different Hamiltonian.

In the working directory, you should also see a file called POLYAVG.dat. This is the simplest output file provided on account of us choosing a value of 500 for FMCSC_POLCALC. By opening it, you should be able to read off the average radius of gyration (<R_g>) for the full simulation (minus equilibration) as the first data entry. The distribution of the radius of gyration is printed to RGHIST.dat, and further files being produced are RETEHIST.dat (histogram of end-to-end distance), RDHIST.dat (two-dimensional histogram of a reduced radius of gyration and asphericity), DENSPROF.dat (radial density profile), PERSISTENCE.dat (angular correlation function along backbone), and TURNS_RES.dat (average bending angles along backbone). If you ran the simulation to its full length, these data could reveal to you that capped Q₂₀ is collapsed under these conditions (roughly globular) with some preference for α-helices.

There are many other analyses that can be performed within CAMPARI. The section of the documentation on output files provides - at the top - a list of groups of analyses that are available within CAMPARI including links to the respective entries. By following these chains of links, you will usually be able to collect all the information (including needed keywords) required to perform, understand, and control these analyses. Whether these files are produced or not, will at the most fundamental level depend on their applicability to the system and on the choices for the various calculation interval flags (such as FMCSC_POLCALC discussed above).

This is the end of this tutorial. You should now have a good idea of what setting up a Monte Carlo simulation in CAMPARI entails (or at least where to find the necessary information to be able to do so). Other tutorials will expand on this knowledge and/or introduce you to new functionality within CAMPARI that was not touched upon at all thus far.

Step 5 - Create other key-files using the automatic key-file builder and rerun CAMPARI

To finish off this tutorial, which is primarily an exercise in familiarizing yourself with the structure of CAMPARI key-files and some of the fundamental keywords, we explore a different route of creating a key-file with the help of an interactive tool to create key-file templates. This tool, while incomplete and under continuous development, aims to simplify the creation of CAMPARI key-files by guiding the user through a series of questions. This has the advantage that it accounts for some conditional linkages between different keywords, that it prints groups of associated keywords together, that it adds explanatory comments to the key-file itself, and that it sets reasonable default values for options that are unlikely to require specific customization.
To run it, simply execute in the terminal:

ruby <full path to CAMPARI>/tools/keyfile_builder.rb
Answer the first question with "tutorial1_mc.key", then go through the questions. Your goal in this first task is to create a key-file similar to the one used above. The tool proceeds through a series of (partially branched) questions, and you should guide it toward a single simulation task (no MPI parallelism) using a simple MC move set focused on sampling polypeptides. The Hamiltonian should be the full ABSINTH model using topology-assisted cutoffs. You want to write a trajectory in dcd-format and collect information on simple polymer properties on the fly. Once the main questions are finished, the tool will proceed backwards through the options and ask for missing specifications (meaning those for which no easy default can be set, mostly because they are overly system-specific or require user information, e.g., names of files). At this stage you can either refuse to enter this information as part of the tool and edit the key-file template afterwards or you could fill in the missing specs directly in the workflow of the tool. Once everything is defined, you should run:

campari -k tutorial1_mc.key >& log_mc &
You can abort this run after a few seconds (but watch that the needed output, see below, is actually produced and does not vanish due to I/O buffering). Now find a way to produce a side-by-side view of the differences in "log" and "log_mc". Focus on the initial part of this output where warnings and errors encountered during the setup stages are recorded and the summary of the attempted calculation is printed. This is more useful and simpler than comparing the key-files because it also reports on settings left at default values or defined indirectly. Are there substantial differences? You will certainly find differences in move set parameters. Monte Carlo move sets are notorious for having a large number of parameters to define, all of which lack both stringent guidelines and clear optimization strategies, so these differences are normal. What is more important is whether the respective move sets enable the same sets of degrees of freedom to be sampled. If not, try to understand where the difference comes from. Secondly, you will most likely encounter a difference in how short-range interactions are truncated and in the system temperature. These are the most significant deviations, and they arise because the key-file builder assumes a safe default for temperature and because the template key-file that is part of this tutorial sets a more general option for FMCSC_MCCUTMODE.
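One possible way to obtain such a side-by-side view is sketched below. It assumes a POSIX shell and GNU diff (the long options used are GNU diff options); the function name compare_logs and the default of 200 lines for the length of the setup summary are arbitrary assumptions, not something prescribed by CAMPARI.

```shell
# Show only the differing lines of the first part of two log files, side by side.
# Usage: compare_logs FILE_A FILE_B [NLINES]
# NLINES defaults to 200, an arbitrary guess at the length of the setup summary.
# compare_logs is a hypothetical helper, not a CAMPARI tool.
compare_logs() {
    cl_a=$(mktemp) || return 2
    cl_b=$(mktemp) || return 2
    head -n "${3:-200}" "$1" > "$cl_a"
    head -n "${3:-200}" "$2" > "$cl_b"
    # GNU diff options: two-column output, identical lines hidden
    diff --side-by-side --suppress-common-lines --width=160 "$cl_a" "$cl_b"
    cl_rc=$?    # 0 = no differences, 1 = differences found
    rm -f "$cl_a" "$cl_b"
    return "$cl_rc"
}
```

Calling "compare_logs log log_mc" prints the lines of "log" on the left and those of "log_mc" on the right. Increase the third argument if the summary of the attempted calculation extends beyond the first 200 lines.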
Next, run the key-file generator again using "tutorial1_tmd.key" as the answer to the first question. This time, try to produce a key-file that replaces the Monte Carlo sampler with an internal coordinate space molecular dynamics sampler, uses threads, reads a PDB input file (take for example the final structure of the completed run above), and enables custom constraints. Repeat the same procedure as before, so:

campari_threads -k tutorial1_tmd.key >& log_tmd &
It might happen that this simulation does not start because the threshold checker detects the presence of large forces, which could lead to the dynamics simulation becoming unstable immediately. This is a precautionary mechanism that depends on the threshold set by RANDOMTHRESH and can be overridden using keyword FMCSC_UNSAFE. Now produce a side-by-side view of the differences in "log" and "log_tmd". It should become apparent that, naturally, the whole report on move set choices is gone, that the information on the base sampler is completely different, and that there are new blocks of information pertaining to the newly requested input files. This tutorial ends with a reminder that the interactive tool to create key-file templates is not complete yet and might sometimes produce undesired choices for specific keywords. Thus the recommended workflow is to produce a key-file, run the simulation once, inspect the summary portion, and refine the key-file accordingly. This recommendation holds independently of how the key-file is produced initially.
