CAMPARI Tutorial 15

Tutorial 15: What You Did Not Know You Could Do with CAMPARI - Part II: Molecular Tangram

Preface:

This tutorial covers some of the less common simulation and analysis features of CAMPARI, notably user-based constraints, hybrid Monte Carlo/molecular dynamics samplers, and spatial densities. It is a pure toy application with no real practical value or relevance. However, working with simpler species makes it easier to illustrate how to use these methods. We rely on unsupported residues (see tutorials 12 and 10), here molecules that are made to look like the 7 pieces of a tangram game. We will try to use simulation methodologies to both generate and solve tangram puzzles.

Step-by-Step:

Step 1 - Simulating tangram pieces

In the folder for this tutorial, you can find a PDB file containing artificial "molecules" corresponding to the 5 unique tangram pieces. They are not in any meaningful arrangement yet except that they lie in different xy-planes exactly. The molecules are composed of a single atom type emulating carbon (using simple numbering to distinguish different atoms in the same molecule), and they have the names "BIG", "NOR", SMA", "PAR", and "SQU". These names are arbitrary except that they should not clash with existing CAMPARI-supported residues, which can be found in sequence file documentation, or with automatically translated names, which are the ones used in the PDB or in other software packages (like DPR, which is D-proline in the PDB, or HSP, which is protonated histidine in CHARMM). The first three are the 3 types of triangles, "PAR" is the parallelogram, and "SQU" is the square. Open the file in your favorite molecular viewer. It should be clear from the beginning that, due to the fact that we represent geometric shapes as collections of atoms, we have granularity issues. This means that edges are not arbitrarily smooth, corners are not arbitrarily acute, etc. Of course, the representation alone is not that important, what will matter below is the interactions that these tangram pieces experience. We also have to use small offsets in one dimension for atoms since the Z-matrices as implemented in CAMPARI are not always able to deal with multiple strictly colinear/coplanar atoms in a row.
Now let us create a basic key file named "tutorial15.key":

PARAMETERS <full path to folder>/campari/params/oplsaal.prm FMCSC_SEQFILE tangram.in FMCSC_PDB_TEMPLATE <full path to folder>/campari/examples/tutorial15/tangram.pdb FMCSC_UNSAFE 1 FMCSC_SC_IPP 1.0 # this is the default FMCSC_SHAPE 1 FMCSC_BOUNDARY 4 # these two options set up an aperiodic cuboid FMCSC_SIZE 40. 40. 40. FMCSC_ORIGIN 0. 0. 0. FMCSC_RANDOMIZE 1 FMCSC_DYNAMICS 1 FMCSC_NRSTEPS 10000 FMCSC_EQUIL 1000 FMCSC_RIGIDFREQ 1.0 FMCSC_CUTOFFMODE 4 FMCSC_DISABLE_ANALYSIS 1
You should create the file tangram.in as a simple text file containing the pieces of the standard tangram game. Note that the order matters because we have all these unsupported entities in the same PDB template, which is processed sequentially:

SQU_N_C SMA_N_C SMA_N_C NOR_N_C BIG_N_C BIG_N_C PAR_N_C END
The other keywords set up the box, here a rectangular cuboid without periodic boundary conditions, define where CAMPARI is expected to find the basic information about the unsupported molecule types, instruct CAMPARI to create a random initial conformation obeying excluded volume, and define a short Monte Carlo run using only simple rigid-body moves. To verify that the basic setup is functional, execute the serial version of CAMPARI (version 4 required!) as follows (here and below, as in all other tutorials, you may have to add paths for your executables):

campari -k tutorial15.key > t15_test.log vmd yourcalc_END.pdb
With VMD (or an alternative viewer), you should see that the pieces are there and arranged randomly in 3D space. Thus, CAMPARI can simulate these molecules, at least in Monte Carlo.

Step 2 - Turning to two dimensions

Of course tangram is only a two-dimensional game. To generate an approximately two-dimensional starting structure, we could manually translate/rotate (here, only translate) the structural information in the template. This is doable but might be rather tedious in the general case. An alternative general solution would be to try to construct a random initial conformation that obeys position restraints. Absolute position restraints enabled and defined with the help of keywords FMCSC_SC_DREST and FMCSC_DRESTFILE are one of the few bias potentials observed during randomization. However, this is also somewhat tedious and will require a significant amount of refinement (try it out, though!). Due to the peculiar nature of our problem, we can resort to a simpler alternative: we will simply make the system enforce a 2D geometry. Change the following entry in the key-file:

FMCSC_SIZE 80. 80. 0.01
Now our system is a cuboid slice with a thickness of 0.01Å in the z-dimension. If the boundary terms are strong enough in relation to temperature, our system should eventually populate conformations where all pieces lie flat in that slice. This is because we use an atom-based softwall boundary condition meaning that all atoms experience a restoring force when moving out of the slice.
Run again:

campari -k tutorial15.key > t15_planarize.log mv yourcalc_END.pdb planar_fromMC.pdb vmd planar_fromMC.pdb
Once you inspect your newly generated file, you should find that all pieces are approximately in a plane now. If not, just run again until you have a satisfactory solution. You should notice a particularity in t15_planarize.log, however. In the final part, where it lists the acceptance statistics, you will find that the acceptance rate dropped to almost marginal numbers. This is because, in relation to our three-dimensional move set, the energy landscape is now highly frustrated. Pretty much any rigid-body move will cause an out-of-plane distortion. From looking at the structure, you will find that the planarity is approximate, as mentioned above. To rectify/improve on this, we can do a short minimization. Tutorial 8 gives an overview of minimization in internal coordinate spaces, which is what we are going to do here. One thing we might want to make sure is that CAMPARI has not detected any internal flexibility in our tangram pieces, they should remain rigid throughout. So edit/append the key-file as follows:

FMCSC_DYNAMICS 6 # edit existing keyword FMCSC_MINI_GRMS 0.0001 FMCSC_MINI_MODE 3 # BFGS quasi-Newton FMCSC_CHECKFREQ 100 # to get a bit more info on progression FMCSC_PDBFILE planar_fromMC.pdb FMCSC_TMDREPORT 1
The keyword FMCSC_TMDREPORT is a logical flag that allows you to request a list of degrees of freedom in internal coordinate space along with some statistics.

campari -k tutorial15.key > t15_minimize_planar.log mv yourcalc_END.pdb planar_fromMINI.pdb vmd planar_fromMINI.pdb
From "t15_minimize_planar.log", you should find that the final energy is about 0.5kcal/mol. You can also extract from the report printed before the summary that there are only rigid-body coordinates, a set of 6 currently unconstrained coordinates for each of the 7 molecules. Now, to keep this system in two dimensions, we should constrain our propagator(s) such that the z-axis is effectively removed. To do so, we can resort to user-defined constraints. In the atom-based input mode, we need to provide the indices of the atoms that are the respective first ones in their parent molecules. For each of these, we should put constraint requests for "Z" (remove z-axis from translation operations) and for "AB" (remove x- and y-axes as putative rotation axes). An automatic way to do so is the following (but you can just as well write this file manually):

awk 'BEGIN {print "A"}; {if ($3 == "C1") {printf("%d %s\n%d %s\n",$2,"Z",$2,"AB");}}' planar_fromMINI.pdb > tangram_2D.frz
From now, it might also matter more what atom type is inferred for the "atoms" in the pieces. This is because the atom type determines mass and Lennard-Jones size and interaction parameters. To avoid errors or confusion in this regard, we should ideally use a biotype patch to force them all to be identical. In our particular scenario this has the disadvantage, however, that it would change the PDB atom names. Thus, we stick to the slightly more tedious alternative of defining types and masses manually:

awk '{if ($1 == "HETATM") {printf("%d 3\n",$2);}}' planar_fromMINI.pdb > tutorial15.ljpi awk '{if ($1 == "HETATM") {printf("%d 12.0\n",$2);}}' planar_fromMINI.pdb > tutorial15.mpi
Atom type #3 is the sp² carbon atom type in the selected parameter file. We should test that the custom constraints work as expected, so modify/add keywords as follows:

FMCSC_DYNAMICS 1 # edit existing keyword FMCSC_PDBFILE planar_fromMINI.pdb # edit existing keyword FMCSC_SIZE 80. 80. 10.0 # edit existing keyword FMCSC_ORIGIN 0. 0. -5.0 # edit existing keyword FMCSC_MPATCHFILE tutorial15.mpi FMCSC_LJPATCHFILE tutorial15.ljpi FMCSC_SOFTWALL 0.1 FMCSC_FRZFILE tangram_2D.frz FMCSC_FRZREPORT 1 FMCSC_XYZOUT 200 # must occur below FMCSC_DISABLE_ANALYSIS in the key-file FMCSC_XYZPDB 3 # dcd
We change back to a Monte Carlo sampler, turn on the constraints, and increase the box in the z-dimension. This is not necessary but should have no effect if our constraints work as expected. We soften the walls of the container because the default is somewhat too stiff for gradient-based samplers (which we will use further below). We also turn on trajectory output so that we can inspect what the sampler is doing in slightly more detail.

campari -k tutorial15.key > t15_test_planar.log vmd -e yourcalc_VIS.vmd
In the trajectory animation, you should see first and foremost that the constraints are working as expected. It is also apparent that the xy-size of the box is sufficient for the Monte Carlo sampler to create large displacements. With excluded volume in place, any simple Monte Carlo move set will be sensitive to density: the lower the better for its efficacy. This concept is explained and or referred to in many places throughout the documentation. Of course, the 2D constraint in conjunction with the size of the pieces and the aperiodic boundary creates an additional problematic scenario: if the box is too small, it might actually become impossible for pieces to rearrange freely. If you take the rectangular packaging of a normal tangram game as reference, it is not possible to relocate pieces without taking them out (meaning in the language of this tutorial: without sampling the z-dimension).

Step 3 - Creating a puzzle

In order to create a pattern, we need to make the tangram pieces stick to each other. The stickier they are (in relation to simulation temperature), the more compact the final pattern should be, but the harder it will be to find such a compact pattern. In other words, the energy landscape will become more rugged and offer many different local minima that the sampler has trouble escaping from. The standard atom-atom interaction potentials are all radially symmetric, so the global energy minimum, given the constraints, should be a roughly circular, dense arrangement of the pieces. This is not necessarily possible given the shape of the pieces and their initial placement. In order to come up with a puzzle, we thus improve our sampler. Specifically, we are extending our MC move set by rigid-body moves acting on multiple molecules at a time, and we change to a hybrid sampler that alternates Monte Carlo and molecular dynamics segments. Generally speaking, we want to ensure that both the degrees of freedom being sampled and the Hamiltonian are exactly identical in both segments, which is true here with the caveat that a bath-coupled molecular dynamics trajectory always suffers from discretization errors (meaning that it never samples exactly the desired ensemble). Edit/append keywords as follows:

FMCSC_DYNAMICS 5 # edit existing keyword FMCSC_NRSTEPS 100000 # edit existing keyword FMCSC_CHECKFREQ 1000 # edit existing keyword FMCSC_CLURBFREQ 0.5 FMCSC_CLURBROTFREQ 0.6 # spinning about a global pivot FMCSC_CLURBSPINFREQ 0.6 # spinning separately around their axes FMCSC_CLURBMAX 3 # limits the number of molecules picked as a "cluster" FMCSC_SC_ATTLJ 2.5 FMCSC_HSSCALE 0.5 # globally reduce sigma parameters FMCSC_TMD_INTEGRATOR 1 FMCSC_TMD_INT2UP 4 FMCSC_TIMESTEP 0.02 FMCSC_TSTAT 2 FMCSC_CYCLE_MC_FIRST 1000 FMCSC_MCCUTMODE 2 # required for consistency between MC and MD Hamiltonians
The rigid body moves available via FMCSC_CLURBFREQ randomly pick more than one molecule and carry out concerted rotations/translations of different types on these molecules. As glue between molecules, we simply use the 6^th power dispersive attraction (FMCSC_SC_ATTLJ) but we decrease all the Lennard-Jones size parameters globally to 50% of their original values. This is so that the intermolecular contact distance can be approximately the same as the bond distance used within molecules. The internal coordinate-space integrator has some technical settings and requires both a time step (in ps) and a thermostat (to match the Monte Carlo ensemble), here the Andersen one. In a hybrid scheme as implemented in CAMPARI, dynamics and Monte Carlo segments alternate, and their minimum and maximum lengths are controllable via keywords FMCSC_CYCLE_MC_MIN and its cousins. The first segment is always a Monte Carlo one, and its length can be set separately. The goal of the calculation is to come up with an arrangement in which the pieces assemble to form a more or less homogeneous area covered by the pieces: puzzles in which one or more individual pieces are easily identifiable by their outline tend to be solved much more readily:

rm yourcalc_traj.dcd campari -k tutorial15.key > t15_create_puzzle.log vmd -e yourcalc_VIS.vmd
You might want to repeat this calculation a few times based on the final structure you obtain. Once you have something that looks like it would make a decent puzzle, you should copy that final PDB into a new file:

mv yourcalc_END.pdb tangram_puzzle.pdb
Now, depending on what exactly it looks like, we can minimize again (but it might be possible to skip this step):

FMCSC_DYNAMICS 6 # edit existing keyword FMCSC_NRSTEPS 10000 # edit existing keyword FMCSC_PDBFILE tangram_puzzle.pdb # edit existing keyword
Followed by another:

campari -k tutorial15.key > t15_minimize_puzzle.log mv yourcalc_END.pdb tangram_puzzle.pdb
In order to obtain the puzzle in a manner that abstracts the identity of the pieces, we turn to spatial density analysis. Our goal is to transform an "atomistic" structure into a three-dimensional map of density information. For this, we need to define a background (if any), choose a property for which to calculate the density, and define the parameters of the lattice on which to calculate this density. The resolution is important of course since it determines how "recognizable" the puzzle ends up being. Create a new key-file from scratch as follows:

PARAMETERS <full path to folder>/campari/params/oplsaal.prm FMCSC_SEQFILE tangram.in FMCSC_PDB_TEMPLATE <full path to folder>/campari/examples/tutorial15/tangram.pdb FMCSC_MPATCHFILE tutorial15.mpi FMCSC_LJPATCHFILE tutorial15.ljpi FMCSC_UNSAFE 1 FMCSC_SHAPE 1 FMCSC_BOUNDARY 4 # these two options set up an aperiodic cuboid FMCSC_SIZE 80. 80. 8.0 FMCSC_ORIGIN 0. 0. -4.0 FMCSC_PDBANALYZE 1 FMCSC_PDB_FORMAT 1 # expect a single pdb file (possibly with many models) FMCSC_PDBFILE tangram_puzzle.pdb FMCSC_NRSTEPS 1 FMCSC_DISABLE_ANALYSIS 1
These are just copied settings modified by turning on PDB analysis mode and telling CAMPARI to expect a single snapshot in PDB format to analyze. The only relevant exception is that we reduce again the system size in the z-dimension in order to be able to use a smaller lattice. The specific request to perform spatial density analysis is added like this:

FMCSC_EMCALC 1 # only a single snapshot to analyze anyway FMCSC_EMPROPERTY 1 # mass, the default FMCSC_EMDELTAS 1.5 1.5 1.5 # in x, y, z order and in Angstrom FMCSC_EMBUFFER 1.1 # extension shouldn't be necessary here but will be below FMCSC_EMBSPLINE 2 # linear interpolation FMCSC_EMBGDENSITY 0.0 # no assumed background
The spatial density analysis uses atom-centered B-splines to distribute the mass of an atom onto a regular lattice. The parameters of the lattice are set by the system size and by keyword FMCSC_EMDELTAS. The combination of the order of the B-splines and the lattice settings determines how distributed the density of an atom is. In our case, we want it to stay relatively close to its actual extent. In scientific applications this can be a way to mimic uncertainty and improve smoothness. The background is set to 0.0, which is both intuitive and necessary: without it, the density calculation is accounting for the fact that solutes displace solvent, which brings effective atomic volumes into play. This would be problematic because these volumes are corrected for interatomic overlap for atoms in the same molecule but not for atoms across molecules. The equation is given in the documentation for keyword FMCSC_EMCALC. Keyword FMCSC_EMBUFFER is important in aperiodic containers because atoms might stick out of the lattice. If you get an error in any of the subsequent CAMPARI calculations about this, you might have to increase this value further. Now put all these keywords into a file called tutorial15_analysis.key and execute:

campari -k tutorial15_analysis.key > t15_create_puzzle_map.log mv DENSITY.nc tutorial15_puzzle.nc chimera tutorial15_puzzle.nc # then use the "volume viewer" functionality
You should find a way to visualize this density, we recommend UCSF Chimera for this purpose. The density map is stored in NetCDF format. Inasmuch that is possible, the density should be homogeneous across the intermolecular interfaces. If it is not, we will make the puzzle to easy too solve in the subsequent step. In essence, it would be quite comparable to the appearance of faint outlines of individual pieces in a printed tangram puzzle.

Step 4 - Solving the puzzle

In order to solve the puzzle, we will take the abstract image, viz. the spatial density map and use it as a restraint potential in a simulation involving the 7 standard tangram pieces. For this, we can just edit tutorial15.key again. The logic of spatial density restraints is to transform the map to an interpretable physical density and to construct a potential on the difference between reference and estimated density per lattice cell. With a sufficiently high order of B-splines, the potential can be made to be smooth and have a continuous differential. Because this technique, which is explained in detail in the reference publication, predicts physical densities from instantaneous structures, it is not really necessary to also use excluded volume interactions. Thus, we make the following changes to existing keywords in tutorial15.key:

FMCSC_DYNAMICS 1 # edit existing keyword FMCSC_NRSTEPS 100000 # edit existing keyword FMCSC_CHECKFREQ 10000 # edit existing keyword FMCSC_PDBFILE planar_fromMINI.pdb # edit existing keyword FMCSC_SC_IPP 0.0 # edit existing keyword FMCSC_SC_ATTLJ 0.0 # edit existing keyword FMCSC_SIZE 80. 80. 8.0 # edit existing keyword FMCSC_ORIGIN 0. 0. -4.0 # edit existing keyword
Setting up the spatial density restraint potential can be a bit tedious, especially if the density map is from an external source. The basic information we need is how to interpret the density information in the map quantitatively, how to obtain the dimensions of both analysis and reference lattices, and how to compute the instantaneous densities from simulation structures. The last part is basically the same as in our analysis run in the last step, so you should add:

FMCSC_EMPROPERTY 1 # mass, the default FMCSC_EMBUFFER 1.1 # extension relevant here for x/y FMCSC_EMBSPLINE 2 # linear interpolation FMCSC_EMBGDENSITY 0.0 # no assumed background
The settings for the input map are more tricky and generally require some knowledge of the data in the map itself:

FMCSC_SC_EMICRO 0.005 # nonzero scale turns on potential FMCSC_EMMAPFILE tutorial15_puzzle.nc FMCSC_EMTHRESHOLD 0.1 # a value close to background FMCSC_EMBACKGROUND 0.0 # we know that this was the chosen background, so circumvent automatic detection FMCSC_EMHEURISTIC 3 # a technical setting only relevant for efficiency FMCSC_EMCALC 10 # collect spatial density (for free)
Without any further choices, CAMPARI will set up an analysis lattice that matches the reference lattice exactly. This is the preferred mode of operation but sometimes the resolution of the input map might be too fine, and this could be corrected/addressed using keyword FMCSC_EMREDUCE. Note that the scale factor for the spatial density restraint potential is fairly small, which is due to the extensive nature of the potential across grid cells.
The parameters that are most difficult to set are the threshold and the scale factor. The energy landscape created by spatial density restraints is astutely sensitive to these two values, and this is the reason why they are used as exchange parameters in a Hamiltonian replica exchange approach in the reference publication. Fortunately, the tangram game is simple enough that a simple serial calculation will generally be reasonable:

rm yourcalc_traj.dcd campari -k tutorial15.key > t15_solve_puzzle.log vmd -e yourcalc_VIS.vmd chimera DENSITY.nc tutorial15_puzzle.nc yourcalc_END.pdb tangram_puzzle.pdb
If the simulation is misbehaving, a likely reason is an insufficient buffer value, in particular if you play around with other parameters, such as the stiffness of the boundary. Note that it would be highly problematic to run this calculation without constraints that keep everything in the xy-plane because the box and thus the spatial density grids are much too small in the z-dimension relative to the sizes of the molecules. If everything runs smoothly, the trajectory visualization should allow a few important observations:

The target puzzle is most likely solved approximately but likely not perfectly. The errors might actually be rather similar to the errors in manually attempting to solve a (good) tangram puzzle, e.g., a small corner sticking out in the wrong place.
Pieces can overlap but generally the spatial density restraints alone are enough to ensure that the pattern is filled homogeneously.
The pieces can rearrange somewhat although, with the chosen settings, the big pieces have trouble rearranging substantially once they are in a reasonable position.

The comparison of densities should show you why it is difficult to solve the puzzle in the way it is meant in tangram, i.e., to have a unique solution for a geometrically exact match. The average simulation density will almost certainly look like the reference one for the puzzle but at a somewhat coarser resolution. Sampling and thermal fluctuations create ambiguity, and our restraints are not sufficiently discriminative between sampling around almost correct or exactly correct solutions. This is a caveat introduced by the reliance on sampling Boltzmann distributions at finite temperature because there is in fact a substantial energy gap between the reference solution in "tangram_puzzle.pdb" and a minimized version of an approximate solution. The visualization of the final simulation structure versus the puzzle might reveal that not a single piece is in its correct place. Repeat the above simulation a few times to understand how diverse the solutions can be. In fitting different three-dimensional protein structures (the pieces) into a map determined by low-resolution electron microscopy (the puzzle), the problem can be somewhat similar to the one here only that the ambiguity might be built into the map (due to limitations to resolution, particle heterogeneity, etc).
Finally, there are some things to try and questions to think about:

Change the masses of all the atoms in one of the big triangles to 2.0 (via "tutorial15.mpi"). What happens and why?
Experiment with stochastic annealing (available as option 4 for FMCSC_MINI_MODE, see the final part of Tutorial 8 for more details)/ For the calculation here, you only need to worry about keywords FMCSC_NRSTEPS (maximum run length), FMCSC_TEMP (standard, hot temperature), FMCSC_MINI_SC_TBATH (annealing, cold temperature), and FMCSC_MINI_SC_HEAT (the fraction of steps to spend searching before starting the annealing). Record the lowest energy you find from a number of such annealing runs and compare it to the energy you get as initial energy when using "tangram_puzzle.pdb" for FMCSC_PDBFILE.
Use keywords FMCSC_EMTHRESHOLD, FMCSC_EMFLATTEN, and FMCSC_EMTRUNCATE to change the interpretation of the input map such that it is nearly completely flat (meaning that there are basically two values: one background (empty) and one occupied). You can verify this by opening DENSITY_INPUT_PHYS.nc in UCSF Chimera or by looking at the contents of the file directly using the "ncdump" tool that is part of NetCDF. How does this affect the quality of solutions and why?
Are you able to get to the correct solution iteratively? To do so, identify a correctly placed piece in a given (short) run, rename the final PDB structure, and edit "tangram_2D.frz" to completely constrain this molecule. Then start the next run using the stored PDB structure as starting point. You might have to progressively decrease slightly the value of FMCSC_SC_EMICRO for the still movable pieces to be able to rearrange.
Is the puzzle (also) solvable using a different set of pieces that covers (approximately) the same area?

In summary, this tutorial has exposed you to working with unsupported residues, setting up advanced constraints, hybrid Monte Carlo/molecular dynamics samplers, and, most prominently, to spatial densities, both in analysis and as restraint potentials. Many observations in this tutorial, even though it is carried out on a completely artificial system, translate to problems faced in structural biology techniques.