Endonuclease PvuII (1PVI) DNA - GATTACAGATTACA
CAP - Catabolite gene Activating Protein (1BER)
DNA - GATTACAGATTACAGATTACA Endonuclease PvuII bound to palindromic DNA recognition site CAGCTG (1PVI) DNA - GATTACAGATTACAGATTACA TBP - TATA box Binding Protein (1C9B)
CAP - Catabolite gene Activating Protein (1BER)
GCN4 - leucine zipper transcription factor bound to palindromic DNA recognition site ATGAC(G)TCAT (1YSA)
GCN4 - leucine zipper transcription factor bound to palindromic DNA recognition site ATGAC(G)TCAT (1YSA)
GCN4 - leucine zipper transcription factor bound to palindromic DNA recognition site ATGAC(G)TCAT (1YSA)
GCN4 - leucine zipper transcription factor bound to palindromic DNA recognition site ATGAC(G)TCAT (1YSA)
GCN4 - leucine zipper transcription factor bound to palindromic DNA recognition site ATGAC(G)TCAT (1YSA)
TBP - TATA box Binding Protein (1C9B)
 

° 

Run molecular dynamics simulations

Running molecular dynamics simulations is one of YASARA's main applications. A lot of scientific insight can be gained from the analysis of MD trajectories, as long as one keeps in mind that MD simulations tend to look more realistic than they really are. They are also an essential tool for molecular modeling: during a simulation, you can freely interact with your protein and pull atoms around.

YASARA provides two different kinds of force fields: the NOVA force field which has been optimized for energy minimizations in vacuo, and the (Y)AMBER/YASARA force fields which are intended for realistic molecular dynamics simulations in aqueous solution.

° 

Preparing the topology

Before a simulation can be run, appropriate force field parameters have to be assigned to all the residues in the soup . While this is straightforward for well known ones like the standard 20 amino-acids or common ions, it is far from trivial for the infinitely many other molecules that nobody has explicitly parameterized yet.

Especially PDB files often contain ligands and cofactors at rather low resolution and without hydrogen atoms, and it is very challenging for a computer program to automatically analyze the molecule, assign the right bond orders and subsequently add the missing hydrogen atoms. Further complications arise from the fact that bond orders and protonation states depend on the pH you want to simulate.

A lot of efforts have been spent to make YASARA smart enough to do all this automatically, and in the majority of cases it is enough to follow these simple steps:

  • Click 'Options > Default pH' to set the pH you want to use during your simulation.
  • Click 'File > Load > PDB file' to load your protein and assign bond orders according to the chosen pH.
  • Click 'Edit > Clean > All' to add the missing hydrogen atoms and correct any problems.
  • Look at non-standard residues (e.g. press <F2>, then <F6>) while showing the bond orders to verify that YASARA has made the expected choices.

The problem of assigning bond orders and adding missing hydrogen atoms is an ambiguous one, in many cases multiple correct solutions exist. If YASARA does not make the intended assignments, there are several options:

  • Provide a hint for the favored solution by explicitly adding hydrogen atoms or setting bond orders to critical atoms.
  • Extend YASARA's chemical knowledge using the simple SMILES-based assignment of pH-dependent bond orders and protonation states in the GROUP_DATA section of the file yasara.def (present in YASARA Dynamics+). How that works is explained in detail there, and please forward your modifications to us, so that they can be included in the next update.
  • Add a complete topology entry for your residue to the TOPOLOGY_DATA section in the file yasara.def. This option is used for all the standard residues but should be avoided for the others, because it introduces a dependency on residue and atom names - and these are essentially random choices left to the crystallographers for all but the standard residues.

° 

Preparing the force field

You can only run a simulation if the force field 'knows' how to treat each residue in the soup. For AMBER-style force fields (including YAMBER), YASARA provides a fully automatic approach to parameter assignment called AutoSMILES, while the NOVA force field may require a bit of manual intervention.

The (Y)AMBER/YASARA force fields are aimed at realistic molecular dynamics simulations in aqueous solution, while NOVA has been developed for quick simulated annealing minimizations in vacuo.

To check if your structure requires special attention, click on Simulation > Force field and select the force field you want to use, then click Edit > Clean > All. YASARA now tries to prepare your structure for simulation, by adding missing atoms (hydrogens, side-chains, terminal oxygens) and deleting those that are not needed (atoms with alternate locations). If you get an error message, look at the indicated residue and try to fix the problem. Here are some hints:

  • If YASARA complains about an incomplete backbone and the indicated residue is a terminal one, just delete it.

  • If your protein contains bound metal ions, delete all bonds between the ions and the protein. Molecular dynamics force fields like (Y)Amber treat the interaction with metal ions purely electrostatically.

Click Simulation > Define simulation cell > Set automatically > OK to add a simulation cell. For molecular modeling purposes, you can also define a small cell that contains only a fraction of your protein, e.g. the active site. Then click Simulation > Simulator > Initialize to assign the force field parameters. At this point, you will get an error message if further attention is needed.

When changing a force field manually, keep in mind that installing a YASARA update may overwrite the force field files with newer versions, so make a backup of your changes. You can also create your own force field by saving the definition file under a different name and adding an entry for the new force field at the end of the file yasara/yasara.def.

° 

Deriving new (Y)AMBER force field parameters

Starting with version 6, YASARA can derive (Y)AMBER force field parameters for unknown molecules fully automatically, allowing to simulate 98% of the structures in the PDB at the touch of a button. Manual intervention is only needed in case of exotic metal ions. The approach behind the AutoSMILES algorithm can be summarized as follows:

  • Assignment of pH dependent fractional bond orders and protonation patterns, typing of ring systems by a graph-theoretic approach.
  • Identification of known molecules (from the force field definition file or the AMBER Parameter Database using SMILES strings. If no hit is found, proceed to step 3).
  • Calculation of semi-empirical AM1 Mulliken point charges[3]. This step involves a geometry optimization with the COSMO solvation model[4] and avoids fatal rearrangements sometimes found when optimizing highly charged molecules like ATP4- in vacuo.
  • Assignment of AM1BCC atom- and bond types.
  • Application of the 'AM1 Bond Charge Correction' to improve the AM1 charges and make them better represent the electrostatic potential around the molecule - just like RESP charges.
  • Further improvement of the AM1BCC charges using the known ideal RESP charges of similar molecule fragments, identified via SMILES strings.
  • Assignment of GAFF (General AMBER Force Field) atom types and remaining force field parameters.
  • In the end, the newly created parameters are cached for instantaneous availability next time.

Since all this is done automatically, only one step is required in practice: Press <F12> to run the simulation. YASARA also produces a detailed force field parameter assignment report in the console, which you can examine in detail to see what happened.

Figure: The steps of the AutoSMILES force field parameter assignment procedure.

If a residue contains more atoms than YASARA's QM module can handle , YASARA will try to split it up into smaller pieces that can easily be parameterized independently. While this works well for lipids, where each hydrophobic tail is usually parameterized separately, you may have to give YASARA a hint for other large molecules by following these steps:
  • Identify an aliphatic carbon that is as far away as possible from polar atoms, and - if removed - would split your residue in two parts in the soup (i.e. all atoms with lower numbers precede the carbon, all atoms with higher numbers follow it in the soup).
  • Mark the carbon, then right-click to activate the context menu and click Split > Atom.
  • In the bottom sequence selector, your residue should now have been split in exactly two parts. Now you can run the simulation.

If you nevertheless want to define force field parameters manually , this implies adding at least a residue topology in AMBER PREP format to one of the force field definition files (*.fof) in the yasara/fof subdirectory. The best location is probably gafftopo.fof, since this file is included in all force fields.

Try to follow these steps for AMBER-style force fields:

  • Look at the repository of AMBER force field parameters at http://pharmacy.man.ac.uk/amber/. Maybe your molecule has already been parameterized. Also try to Google it. Note that most of the AMBER Parameter Database is already included in YASARA by default.

  • Read the introduction to parameter fitting on page 287 of the AMBER 7 manual. (Downloadable from http://amber.scripps.edu).

  • Exit YASARA and open the force field definition file in a text editor. You can find the force fields at yasara/fof/ForceFieldName.fof, e.g. yasara/fof/yamber2.fof for the YamberII force field.

  • Go to the end of the file and add a topology entry for your residue. The format of these topology entries is also described on the AMBER website. YASARA does not use the bond length, angle and dihedral data, so these columns can be set to zero.

The information needed for every atom is the sequential number (column 1), the atom name in the PDB file (column 2), the force field atom type (column 3) and the point charge on the atom (last column). Just keep the header (including the three DUMMy atoms), and replace the name of the compound (line 1) and the three letter code of your residue (line 3, 'AGS' in the example below). The second line must stay empty.

If your ligand contains planar groups (around resonance or double bonds), you must also add improper dihedral entries (see IMPROPER statements below).

The LOOP statement used by AMBER is not needed and ignored.


N-Acetyl-D-glucosamine-6-sulfate ( 1' O and no 4' OH-group, Gaussian98, RESP)

AGS  INT    1
CORR OMIT DU   BEG
  0.000000
    1 DUMM   DU   M    0  -1  -2     0.0000    0.0000    0.0000  0.000
    2 DUMM   DU   M    1   0  -1     1.0000    0.0000    0.0000  0.000
    3 DUMM   DU   M    2   1   0     1.0000   90.0000    0.0000  0.000
    4 C1     AC   M    0   0   0     0.0000    0.0000    0.0000  0.124693
    5 C2     CT   M    0   0   0     0.0000    0.0000    0.0000  0.032432
    6 C3     CT   M    0   0   0     0.0000    0.0000    0.0000  0.107401
    7 C4     CT   M    0   0   0     0.0000    0.0000    0.0000  0.044439
    8 C5     CT   M    0   0   0     0.0000    0.0000    0.0000  0.098935
    9 C6     CT   M    0   0   0     0.0000    0.0000    0.0000  0.031706
   10 N      N    M    0   0   0     0.0000    0.0000    0.0000 -0.466264
   11 O1     OG   M    0   0   0     0.0000    0.0000    0.0000 -0.430845
   12 O3     OH   M    0   0   0     0.0000    0.0000    0.0000 -0.658535
   13 O5     OS   M    0   0   0     0.0000    0.0000    0.0000 -0.386715
   14 O6     OS   M    0   0   0     0.0000    0.0000    0.0000 -0.399527
   15 C2N    C    M    0   0   0     0.0000    0.0000    0.0000  0.749408
   16 O2N    O    M    0   0   0     0.0000    0.0000    0.0000 -0.632137
   17 CME    CT   M    0   0   0     0.0000    0.0000    0.0000 -0.448969
   18 HME    HC   M    0   0   0     0.0000    0.0000    0.0000  0.119617
   19 HME    HC   M    0   0   0     0.0000    0.0000    0.0000  0.119617
   20 HME    HC   M    0   0   0     0.0000    0.0000    0.0000  0.119617
   21 S      SO   M    0   0   0     0.0000    0.0000    0.0000  1.131685
   22 O1S    O2   M    0   0   0     0.0000    0.0000    0.0000 -0.603269
   23 O2S    O2   M    0   0   0     0.0000    0.0000    0.0000 -0.603269
   24 O3S    O2   M    0   0   0     0.0000    0.0000    0.0000 -0.603269
   25 H1     HC   M    0   0   0     0.0000    0.0000    0.0000  0.160956
   26 H2     HC   M    0   0   0     0.0000    0.0000    0.0000  0.127396
   27 H3     HC   M    0   0   0     0.0000    0.0000    0.0000  0.137259
   28 H4     HC   M    0   0   0     0.0000    0.0000    0.0000  0.139063
   29 H5     HC   M    0   0   0     0.0000    0.0000    0.0000  0.086498
   30 H6     HC   M    0   0   0     0.0000    0.0000    0.0000  0.085794
   31 H6     HC   M    0   0   0     0.0000    0.0000    0.0000  0.085794
   32 HN     H    M    0   0   0     0.0000    0.0000    0.0000  0.296182
   33 HO3    HO   M    0   0   0     0.0000    0.0000    0.0000  0.434306

LOOP
O5 C1

IMPROPER
 C2N  C2   N    HN
 CME  N    C2N  O2N

DONE

  • Update the force field by starting YASARA with the command line option -upd:
    
    yasara -upd
    
    

YASARA will then recompile all force fields and tell you if something went wrong. If you do not get an error message, restart YASARA and try to initialize the simulation again. If YASARA still complains, consider the following points:

  • Numbering of chemically equivalent hydrogens. The AMBER force fields do not follow the PDB convention that hydrogens bound to the same atom are numbered in the first column. YASARA corrects this problem for standard residues, but cannot guess the right answer for 'your' new residues. To avoid problems, do not additionally number hydrogens that are bound to the same atom. If you look at the example above, atoms 30 and 31 are bound to C6, and are both named H6, and neither H61/H62 nor 1H6/2H6.

  • If you created a topology for an unusual amino acid by copying from a standard residue, you also have to delete the numbers of chemically equivalent hydrogens as just described above.

References:

[1] Fast, efficient generation of high-quality atomic charges. AM1-BCC model: II. Parameterization and validation Jakalian A, Jack DB and Bayly CI (2002) J Comput Chem 23,1623-1641

[2] Development and Testing of a General Amber Force Field Wang J, Wolf RM, Caldwell JW, Kollman PA and Case DA (2004) submitted.

[3] MOPAC: A semiempirical molecular orbital program Stewart JJP (2000) J.Comp.Aided Mol.Des. 4,1-103

[4] Conductor-like screening model for real solvents: a new approach to the quantitative calculation of solvation phenomena Klamt A (1995) J.Phys.Chem. 99, 2224-2235

° 

Deriving new NOVA force field parameters

NOVA uses molecular trees to assign force field parameters. Most of the time, it is enough to follow this procedure:

  • Load a PDB file of the structure with the unknown ligand(s). If the PDB file does not contain CONECT records (no chemical bonds defined, the ligand appears as a point cloud in ball&stick display) let YASARA find the bonds (Edit > Find bonds in). In any case clean the structure (Edit > Clean), and save it in the yasara/fof directory.

  • Edit the file yasara/fof/nova.fof, search for the text 'OTHER MOLECULES' and insert the name of your PDB file below. If your PDB file also contains a protein, you should place a question mark '?' in front (this will tell YASARA to look only at new features in your structure and ignore the rest). This structure will now be used to learn equilibrium bond lengths and angles, it must therefore have an accurate covalent geometry.

Example:


;
;SOME OTHER MOLECULES
;====================
?MyStructureWithProtein.pdb
MyLigandAlone.pdb
smallmol.pdb
heparin.pdb

  • Update the force field by starting YASARA with the command line option -upd:
    
    yasara -upd
    
    
    YASARA will then recompile all force fields and tell you if something went wrong. If you do not get an error message, restart YASARA and try to initialize the simulation again.

  • If you still get the error message 'NOVA force field tree not found' and your molecule contains fused ring systems, simply use one of the other force fields which automatically detect and handle planarity.

  • You can always check that the bond types were assigned correctly by loading the PDB file and clicking on the atoms: the bond types are displayed in the lower left HUD.

  • If your residue contains unusual chemical groups that do not look like anything you find in proteins, DNA and sugars, the NOVA force field may not contain point charge definitions for the group. This will result in polar atoms without a charge. You can verify that by clicking on the polar atoms after initializing the simulation. The simulation HUD on the right lists all point charges on this atom. If there is more than one charge on an atom, you can cycle through them with <Home> and <End>. If you are really missing a charge and doing more than a quick&dirty modeling task, you must add an entry to the CHARGE_POSITION_DATA section in yasara/fof/nova.fof and recompile the force field.

  • The same principle holds for planar groups. If a group is not recognized as planar, a corresponding entry has to be added to the PLANE_CONFORMATION_PLANES section to avoid an out-of-plane distortion.

As NOVA has been optimized for proteins only, it is generally advised that you use the (Y)AMBER/YASARA force fields when working with small molecules.

° 

Running a simulation

As soon as the force field can cope with your structure, you are ready to run a simulation. Step-by-step instructions can be found in the help movie 'Accurate Simulations in Water'. Described below is the quickest way of running a simulation.

  • Create a project directory 'MyDir'.

  • Store either a PDB file (MyStructure.pdb) or a YASARA scene file (MyStructure.sce) of your structure there. Scene files must contain a simulation cell, and should therefore be used if you want to place the cell yourself.

  • Click Options > Macro & Movie > Set target and choose MyDir/MyStructure.pdb as the target.

  • Click Options > Macro & Movie > Play and choose your favorite molecular dynamics macro, e.g. md_run.mcr. If you do not like the default parameters, open yasara/mcr/md_run.mcr with a text editor, make the required changes and save it under a different name.

  • Wait and see how the simulation goes. The standard MD macros will take care of 'everything': Cleaning your structure, creating a simulation cell, filling it with water, placing counter ions, predicting pKa values and assigning protonation states, calibration, saving snapshots at regular intervals etc.

  • By default, YASARA updates the screen after each simulation step. While this approach maximizes interactivity, it also slows down the simulation. Click Simulation > Time step and increase the number at 'Update the display every...'. Note that this makes YASARA respond more slowly. Alternatively, you can run a simulation in console mode.

  • If you stopped a simulation and want to continue later, just play the md_run.mcr a second time.

To give you a reference point for the speed to expect: A simulation of Crambin (1CRN) using the default macro md_run.mcr with ~10000 atoms proceeds by 1 picosecond per minute or 1.44 nanoseconds per day on an AMD Opteron 1.8 GHz.

Figure: Crambin, ready for simulation in a water box

A note on simulating proteins with a net charge : the Particle Mesh Ewald algorithm used to calculate long range electrostatic interactions requires that the net charge of the simulation cell is zero, just like the charge of macroscopic objects in the real world. The macro md_run.mcr described here achieves this goal by adding counter ions to the system using the Neutralization Experiment, e.g. if the protein's net charge is negative due to excess Asp and Glu residues, YASARA will add Na+ ions to compensate.

° 

Analyzing a trajectory

It strongly depends on your question how you are going to analyze a simulation trajectory.

  • Click Options > Macro & Movie > Set target and choose MyDir/MyStructure (no extension!) as the target.

  • Click Options > Macro & Movie > Play and choose an analysis macro, e.g. md_analyze.mcr.

This macro will create a self-explanatory table MyDir/MyStructure_Analysis.tab which you can easily import in your favorite data visualization program (Excel, OpenOffice, XMGrace etc.). If you want more than the standard indicators (force field energies and RMSD from the starting structure), open yasara/mcr/md_analyze.mcr in a text editor and create your own version. Note that the energies in the table depend on the force field and cutoff that are chosen in the macro, so you may have to adapt it for your own simulations.

Here are a few hints for the analysis:

  • If you obtain an unexpected result, check how reproducible it is by running the simulation a second time with a different random number seed.

  • Care has been taken that molecular dynamics trajectories are entirely reproducible, i.e. if you run the same md_run.mcr a second time, using the same YASARA version on the same computer and assigning the same number of processors, you will obtain exactly the same trajectory again. Note however that the reproducibility may get lost when comparing different CPUs, operating systems or YASARA versions: AMD and Intel CPUs support different instruction sets for high performance calculations, which yield slightly different results. Also today's operating systems provide different mathematics libaries, which cause additional deviations. Finally, changes in the YASARA source code may also have a small influence on the results. In short: force field energies calculated for the same scene on different computers are likely to differ in the least significant digits, which are of no relevance anyway. The same is true for the calculated forces, which has a serious impact on simulations however: chaos theory requires that small initial differences grow exponentially with each simulation step, leading to very different trajectories and energies, just as if the simulation had been started with a different random number seed.

  • If you compare simulations run at different pH values and obtain quite different results, compare the covalent connectivity first: use the CompareAtom command to find those residues where YASARA assigned differing protonation states during the neutralization experiment, and investigate the region around these residues for an explanation of the structural deviations. Also check the significance of the result by running the simulation a second time with a different random number seed.

° 

Playing back a trajectory

  • Click Options > Macro & Movie > Set target and choose MyDir/MyStructure (no extension!) as the target.

  • Click Options > Macro & Movie > Play and choose a play-back macro, e.g. md_play.mcr.

Playing back a simulation requires that the force field can be initialized. So you may have create your own version of md_play.mcr, if you use a special force field or want to change some of the parameters (e.g. the play-back speed).