Tutorial 9: Using Umbrella Sampling to Assess the Ability of Polyglutamine (Q30) to Access Conformations with High β-Content

Estimated simulation run time: 3 days

Requirements:

  1. 11 cpus for the MPI job
  2. User must run or adopt Matlab code to post-process simulation results

Preface:

Polyglutamine (polyQ) expansions have been implicated in several neurodegenerative diseases. These expansions are long, continuous repeats of glutamine (Q) residues present in several different disease related proteins. In healthy individuals, several proteins have polyQ expansions, but the expansion is not long enough to cause disease. For example, the wild-type protein "huntingtin" implicated in Huntington's disease (HD) contains a small polyQ repeat which does not cause disease. If this expansion becomes longer than a threshold of around 35 glutamine residues, then Huntington's disease (HD) develops within the human life span. The longer the polyQ expansion, the earlier affected individuals develop symptoms and the more severe the disease.

Individuals with polyQ expansion diseases contain polyQ-rich protein aggregates in the brain. These aggregates are thought to cause neuronal cell death. These Q-rich aggregates are rich in β-sheet. However, simulation and experimental evidence suggests individual polyQ peptides (monomers) are disordered, meaning that they do not adopt any repeating structure (implying that they do not contain any discernible amount of β-sheet). In earlier tutorials, you have simulated polyQ monomers. A good exploratory exercise would be to go back to these simulations and query the amount of β-sheet you observed (aside from trajectory data, potential output files helpful with this are DSSP.dat or ZSEC_HIST.dat). It should be very low.

If the β-sheet content of polyQ monomers is low, then why are large polyQ-rich aggregates rich in β-sheet? Simplistically, one can hypothesize that the high β-sheet content arises from two limiting cases, viz., a) monomers spontaneously forming β-sheet rich structures, then aggregating, or b) polyQ forming disordered aggregates and the intermolecular interactions and flanking protein sequences cause a conversion from disordered to β-sheet rich aggregates. The focus of this tutorial is to evaluate the free energy penalty for monomeric polyQ to form structures with high β-content. Note that this does not allow us to pick which hypothesis is correct, but is the first step in collecting evidence to build a case for which we appears more plausible.

To understand the context of these two hypotheses and the bigger picture of this research we direct you to work from the Wetzel group (evidence for hypothesis a) and to recent computational and experimental work that provides support for hypothesis b. There is additional evidence that flanking sequences play an important role in altering the mechanism of polyQ aggregation. The main body of work from which this tutorial is based is located in the following publication: Thermodynamics of β-Sheet Formation in Polyglutamine. Given that the focus of this tutorial is to learn how to use CAMPARI, most of the interesting and salient points of the research on which this tutorial is based are either glossed over or not covered at all.

Use of umbrella sampling and structural restraints:

First, some clarifications:

Up to this point, we have been referring to β-sheet content, which can be, for example, defined by the fraction of residues that make a specific pattern of intra-backbone hydrogen bonds. A precise definition is offered by Kabsch and Sander's DSSP algorithm and corresponds to the fraction of residues assigned to the "E", or "extended" structural classifier. β-Content, which is discussed below, is a superset (more generic) of β-sheet content and is the fraction of residues whose backbone φ/ψ-angles are contained in the β-basin of the Ramachandran map. The exact definition is given in the documentation on FMCSC_SC_ZSEC.

Existing experimental evidence suggests that polyQ forms disordered, collapsed structures with almost no secondary structure content. The goal of this tutorial is to ascertain the free energy penalty associated with a polyQ monomer adopting high β-content conformations. Thus, we need to know the relative probabilities of sampling high β-content conformations vis-à-vis those of visiting disordered conformations. These structural transitions are exceedingly rare and need to be well sampled to obtain accurate statistics. Using brute-force, equilibrium sampling approaches to study these rare events is highly inefficient and generally infeasible. To solve this problem, we use an artificial potential which biases sampling toward these rare conformational states, which enables us to obtain reliable statistics on these states and reconstruct the potential of mean force (PMF) along a reaction coordinate measuring β-content.

The harmonic structural restraint potential:
We use a harmonic structural restraint potential U to bias sampling of conformations with high β content. Our total potential Utot then becomes:

Utot = U + Wabs

where Wabs is the unbiased ABSINTH potential energy function. The documentation for keyword FMCSC_SC_ZSEC provides the definition for U.

fβ (the β-content) is a progress variable or reaction coordinate that corresponds to the fraction of residues that sample backbone dihedral angles from the β-basin in φ,ψ-space. fβ ranges from 0 to 1. Proteins which are almost entirely β-sheet have fβ values between 0.7 and 0.8. For a more detailed description of fβ, we refer you to the following work: Thermodynamics of β-Sheet Formation in Polyglutamine. There exists a more recent application using the fβ biasing potential to describe the monomeric Alzheimer's amyloid-β peptide. U is used to restrain a simulation to a particular target value of fβ, referred to as fβ0. Thus, given a large enough force constant, the fβ distribution of a restrained simulation should be approximately peaked at its value of fβ0.

To ensure adequate sampling in fβ space, we will run multiple simulations each at a different target fβ0 value. The set of fβ0 values we will use are: {0.0, 0.1, 0.25, 0.3, 0.4, 0.5, 0.6, 0.75, 0.8, 0.9, 1.0}. Replica exchange is used to further enhance sampling. If you have completed the tutorial on the FS-peptide, you already have some experience with using replica exchange. However, with this exercise, we exchange coordinates across simulations with different fβ0 values as opposed to different temperatures. For this tutorial, all of the data will be collected at a constant temperature of 298K.

We will post-process the simulation results using the WHAM method to obtain an unbiased probability distribution in fβ space. This allows us to compute the PMF as a function of fβ and answers the original question of what is the free energy penalty associated with accessing conformations with high β-content (high values of fβ).

Step-by-Step:

Step 1 - Build the appropriate key-file
Download the following key-file: fb.key and save it as fb.key in an empty folder for this tutorial. Open the key-file and let us look at the sections most relevant for this simulation:

FMCSC_SHAPE 2
FMCSC_SIZE 150
FMCSC_BOUNDARY 4

We will use a spherical droplet (FMCSC_SHAPE is 2) with a radius of 150Å and the setting for FMCSC_BOUNDARY of 4, which corresponds to the atom-based soft-wall boundary conditions (a good choice for most systems using continuum models).

FMCSC_NRSTEPS 41000000
FMCSC_EQUIL 1000000

This simulation will be run a total of 4.1x107 Monte Carlo (MC) steps. The first 106 of these steps will be considered equilibration, and will not be included in analysis output files and other system averages.

FMCSC_SC_ZSEC 1.0
FMCSC_ZS_FR_KB 150

FMCSC_SC_ZSEC is the scaling factor for the U harmonic restraint potential. Setting this to zero disables it. FMCSC_ZS_FR_KB is the kβ value (force constant = restraint strength) in the restraint potential. This potential is applied over all residues simultaneously, meaning that the average restraint strength per residue is FMCSC_ZS_FR_KB·N-1 where N is the number of residues. When comparing systems with different chain lengths, one should probably have an approximately constant value for FMCSC_ZS_FR_KB·N-1, thus FMCSC_ZS_FR_KB should be adjusted accordingly.

FMCSC_SEGCALC 100

The fβ histogram is updated, i.e., data are accumulated, every 100 MC steps. It is beneficial for this to happen at very high frequency since the calculation is not cost-intensive, and since we are interested in obtaining the best statistics possible. These distributions will serve as raw data in the WHAM post-processing step.

FMCSC_MPIAVG 0
FMCSC_REMC 1
FMCSC_REPLICAS 11
FMCSC_REDIM 1
FMCSC_RENBMODE 2
FMCSC_RESWAPS 10
FMCSC_REFREQ 10000

The FMCSC_REMC keyword enables replica exchange. By setting FMCSC_RENBMODE to 2, only neighbor swaps are attempted. This is common protocol for replica exchange simulations. Allowing non-neighbor swaps may appear to further boost sampling, but can also introduce bias which makes the results difficult to interpret even if the proper ensemble re-weighting techniques are used (compare comments in Tutorial 3 and see documentation on FMCSC_FRAMESFILE). The frequency of exchanges (swap cycles) is controlled by the FMCSC_REFREQ keyword. Lastly, FMCSC_RESWAPS refers to the number of swaps that will be attempted for each swap cycle and is chosen to be equal to the number of possible neighbors (FMCSC_REPLICAS - 1) here.

FMCSC_REFILE remc.in

This file defines the quantity that is to be swapped across replicas which is defined on the first line of this input file. Note that the first line is NOT the total number of replicas (set by FMCSC_REPLICAS). The remaining lines define the target fβ0 values for each replica. This automatically sets the value normally set through keyword FMCSC_ZS_FR_B for each replica. Download remc.in and place it in the same directory as the key-file.

FMCSC_SEQFILE q30.in

Download the sequence file, q30.in, and place it in the same directory as the key-file. This defines the system that CAMPARI will simulate, which here is a polypeptide with 30 glutamine residues. The ends are capped with acetyl (ACE) and N-methylamide (NME) groups, respectively, neither of which carries a net charge. The starting configuration is drawn randomly from an excluded volume-like ensemble trying to avoid steric overlaps. The completely random nature of the protocol used to generate these structures ensures that there is no conformational bias engendered by the starting configuration. The last thing to do is to adjust the explicit paths for PARAMETERS and FMCSC_BBSEGFILE in the key-file.
Step 2 - Launch the simulation
Use of replica exchange calculations requires a CAMPARI executable compiled with MPI. See the installation instructions to set this up correctly. The command you issue typically is:
mpirun -np 11 -prefix {PATH-TO-OPENMPI} campari_mpi -k fb.key 

The (-np) command indicates the number of replicas, and the (-prefix) command should be followed by the path to your installed version of MPI.
Step 3 - Sanity checking the simulation (before you waste too much time)
First of all, make sure that you have simulated what you expect. The fβ0 value is assigned one-to-one to the node number: node 0 (N_000_* output) corresponds to fβ0=0, while node 10 (N_010_* output) corresponds to fβ0=1.0. A quick sanity check is to look at the simulation trajectory and ensure that there is minimal β-content for low values of fβ0, and there is high β-content (polypeptide backbone forms linear segments or there are β-sheets present) for large fβ0. From the directory containing the simulation output, issue the following command (assuming presence of VMD):
vmd -e N_010___VIS.vmd
Note that there are in fact 3 underscores between "010" and "VIS". This is due to the selection of "_" as the simulation's basic name in the downloaded key-file. Note as well that you can do this before the simulation has finished running if you wish to monitor the progress of ongoing simulations.
From within the VMD console, type:
align_it
This does an RMSD alignment of the polypeptide across all frames to ease visualization. If you do this when the simulation is still running, there may be several drawing artifacts (long, nonsensical bonds) that occur because the pdb-file of the initial structure is used as a template. This issue resolves itself once the final pdb-file can be used, i.e., upon completion of the simulation.
Did you see what you expected? If not, there may be issues with your key-file. Open the N_0xx.log files to review what may have gone wrong. Repeat this for N_000___VIS.vmd and a few others for good measure.
Step 4 - Output and Analysis
First, download the Matlab code necessary to do the WHAM post analysis (Matlab is a proprietary software package used in science and engineering):
WHAM method: wham.m
Driver code: fbpmf.m

Place these files in the same directory as your simulation and key-file. Use Matlab (or an alternative interpreter) and run fbpmf.m (make sure you are in the same directory as where the simulation was run). You will obtain a plot of the PMF as a function of fβ. This will allow you to assess the free energy difference between conformations with high and low values of β-content. Be advised that the results you are analyzing come from only one independent replica exchange simulation and may not be sampled enough, i.e., the noise level may be high. Better statistics could of course be obtained by running multiple independent replica exchange simulations and averaging the results.

Discussion

By looking at the PMF along fβ as reaction coordinate, you will notice there is a large free energy barrier to accessing configurations with a high amount of β-content. This suggests that for monomeric Q30 (and probably for similar chain lengths as well), the adoption of structures high in β-content is rare. This provides some limited evidence that the high β-sheet content of polyQ-rich aggregates may be a consequence of intermolecular interactions and not from assembly of well-ordered monomers. For more details, see our publication from 2009 on this topic.
Design downloaded from free website templates.