° 

Solve an NMR structure

To solve an NMR structure with YASARA, you need the sequence of the protein (in FASTA or PDB format) and a file with XPLOR restraints.

Then follow these steps:

  • Create a directory for your project.
  • Save the sequence there, using the filename 'sequence.fasta' or 'sequence.pdb'.
  • Save the file with XPLOR restraints using the filename 'restraints.tbl'
  • Click Options > Macro & Movie > Set target and choose the project directory.
  • Click Options > Macro & Movie > Play macro and choose the macro 'nmr_solve'
  • When the macro has finished, the file 'ensemble.pdb' contains your ensemble, and 'result.log' contains an analysis.
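Before playing the macro, it can help to verify that the project directory contains the files named in the steps above. The following Python sketch is not part of YASARA; the function name 'check_project' is hypothetical, and the only facts taken from this page are the expected filenames.

```python
from pathlib import Path

# Filenames expected by the steps above
SEQUENCE_NAMES = ("sequence.fasta", "sequence.pdb")
RESTRAINT_NAME = "restraints.tbl"

def check_project(project_dir):
    """Return a list of problems found in the project directory (empty if none)."""
    project = Path(project_dir)
    if not project.is_dir():
        return [f"{project} is not a directory"]
    problems = []
    # Either sequence file is acceptable
    if not any((project / name).is_file() for name in SEQUENCE_NAMES):
        problems.append("no sequence.fasta or sequence.pdb found")
    if not (project / RESTRAINT_NAME).is_file():
        problems.append("no restraints.tbl found")
    return problems
```

An empty list means that both input files are in place and the project directory can be chosen as the target.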

The entire refinement procedure has been implemented using the Yanaconda macro language, so that you can easily adapt everything to your own needs in a very flexible way.

The following sections describe the various stages needed to solve an NMR structure.

° 

Setting the parameters

In addition to the straightforward way of solving a structure, you can adapt the protocol at every step to your particular needs.

Before any work can be done, you need to set the default parameters, which is handled by the macro 'nmr_setdefaults'.

Obvious choices are:
  • The filenames for sequence, restraints, structure ensemble and analysis results.
  • The number of structures you want in the ensemble.

Not so obvious choices are:
  • The pH at which the NMR spectrum was recorded: this information will be used in the final explicit solvent refinement step to assign protonation states of amino acid side-chains.
  • The restraining function: YASARA supports the same functions as XPLOR (described in detail elsewhere in this documentation); the usual default is the 'SoftSquare' function.
  • The restraining parameters: These globally affect all restraints and define the distance averaging as well as the overall scaling factors. Scaling factors are usually larger than the XPLOR equivalents due to internal force calculation differences. YASARA uses two different parameter sets: 'defaultpar' for the final refinement stage and analysis, and 'strongpar' for crude refinement with stronger forces.
  • The option to correct cis-peptide bonds before prolines: Contrary to the other 19 amino acids, prolines have a reasonable chance of allowing a preceding cis-peptide bond. If the 'correctcispro' flag is set to 'yes', these cis-peptide bonds will always be corrected, and any cis-peptide bonds that really occur in the structure will thus be missed. It is therefore a good idea to also generate an ensemble with this flag set to 'no', and check if the lowest energy members all share a certain cis-proline.
  • The list of cysteines that are bridged: If you already know which pairs of cysteines are bridged, store their numbers in 'cysbridgelist'. Alternatively, set 'cysbridgelist' to 'Auto' and let YASARA automatically link cysteines close in space.

° 

Folding the structure

The first step is to fold the protein from the stretched-out conformation. This is done by the macro 'nmr_fold' using the NMRFolding experiment; the process is described in more detail in the documentation of that experiment.

At this stage, speed is more important than accuracy, and the structures generated this way are not realistic proteins yet. But they have helices at the right spot and the peptide chain running in the right direction to quickly arrive at the correct solution during the following molecular dynamics refinement.

The number of structures generated in this stage is specified by the 'structures' parameter in 'nmr_setdefaults' and is equal to the final number of ensemble members.

If you are running Linux, you can of course also skip this step and use other programs like Concoord to fold the structures.

° 

Refining the structure in vacuo

The second step is to convert the roughly folded decoys to realistic proteins with correct hydrogen bonding patterns. This is done by the macro 'nmr_refinevacuo', which runs molecular dynamics simulations in vacuo using the NOVA force field. It performs several refinement cycles, some of which are done without non-bonded interactions, so that atoms can pass through each other and kinetic traps like knots in the peptide chain can be resolved. For each ensemble member, the structure with the lowest restraint violation energy is kept and passed on to the next step.

° 

Refining the structure in explicit solvent

The third step is to improve the structural quality of the final ensemble members (like Ramachandran plot or 3D packing interactions) by using YASARA's most accurate simulation techniques: explicit solvent, electrostatics without cutoff and force fields optimized for protein structure refinement. This is done by the macro 'nmr_refinewater' in several cycles. The structure with the lowest restraint violation energy then becomes the final ensemble member.

° 

Analyzing the ensemble

The fourth and final step is to calculate a total energy for each ensemble member, which is now not limited to restraint violations, but includes force field and solvation energies. The results are saved as 'result.log'. If you own the Twinset, they also include the three most important WHAT_CHECKs: RAMCHK, BBCCHK and QUACHK.

The ensemble members are then sorted with respect to this energy, so that the first structure in the ensemble is the best, then superposed on their secondary structure elements and saved together in one PDB file, by default 'ensemble.pdb'.

All the details can be found in the macro 'nmr_analyze'.
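The sorting step described above can be pictured with a few lines of Python. This is only an illustration of the ordering rule (lowest total energy first), not the code used in 'nmr_analyze'; the member names and energy values are invented for the example.

```python
def rank_ensemble(energies):
    """Return the member names ordered from lowest (best) to highest total energy."""
    return [name for name, _ in sorted(energies.items(), key=lambda item: item[1])]

# Invented example values, not real results
energies = {"member1": -1520.4, "member2": -1611.9, "member3": -1498.0}
print(rank_ensemble(energies))
```

The first name in the returned list corresponds to the first structure in the saved ensemble.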

° 

Handling special cases like Cys-bridges, oligomers and metalloproteins

If you are trying to solve an unusual protein structure, here are some hints:

  • Treating cysteine bridges: The initial folding step is always done with protonated SG atoms and thus without cysteine bridges. If you know from some other experiment which cysteines are bridged, store the residue numbers in 'cysbridgelist' (see nmr_setdefaults). Alternatively, YASARA can automatically link cysteines that get close enough during the refinement in vacuo.

  • Treating metal binding sites with ions: Put the distance restraints involving ions into a second restraint file and run the initial folding step without any ions present. Then add the ions, place them at the center of the protein and then continue with step 2 and both restraint files.

  • Special residue numbering: If the first residue in the sequence does not have the number '1' in the restraint file, the easiest solution is to renumber the residues using the RenumberRes command, right after the linear peptide chain has been built in nmr_fold by the BuildMol command.

° 

Avoiding problems with hydrogen nomenclature

There are currently three common schemes for naming equivalent hydrogens bound to the same heavy atom: PDB, IUPAC/PDB3 and XPLOR. From an objective point of view, the PDB scheme is the smartest one: numbering hydrogens in the first column not only avoids a problematic misalignment of longer names, but also increases the information content: if hydrogen names differ only by the number in the first column, they are known to be bound to the same heavy atom. In July 2007, the PDB changed the naming scheme in all files to PDB V3, which is mostly like IUPAC and thus inherits its consistency problems. Trouble with hydrogen nomenclature is typically a major source of lost time in NMR structure determination and analysis.

Since user friendliness is a primary goal of YASARA, we searched hard for a solution. In the end, it turned out that many hydrogen-related problems in computational chemistry can be magically solved by learning from nature: if quantum chemistry itself can hardly distinguish these hydrogens and MD force fields thus assign identical charges, why force different names upon them? Consequently, YASARA simply removes the numbers from equivalent hydrogens.

While this approach is consistent within YASARA, other programs heavily depend on a certain hydrogen nomenclature. To ensure optimal interoperability, YASARA therefore lets you choose a certain atom naming scheme when saving a molecule in PDB or other formats.

Here are a few answers to common questions concerning this approach:

Q: My high quality NMR spectrum allows me to distinguish the chemical shifts of two methylene hydrogens. How do I assign the stereo-specific restraints?

A: Number the hydrogens any way you want in the XPLOR formatted restraint input file and use a floating assignment to let YASARA resolve the ambiguity. As a rule of thumb: if a side-chain rotamer depends on whether a floating assignment or an exact stereo-specific one is used, then the structure is underdetermined anyway. If you need to use stereo-specific assignments, check the nomenclature translation tables below and beware of 'quantum mechanical' tunneling during high temperature simulations, which can lead to deviations from naming conventions.

Q: I am analyzing Protein/DNA interactions and looking at double hydrogen bonds between Asn/Gln side-chains and DNA bases. How do I select the IUPAC HD22/HE22 hydrogen which is on the same side as the OD1/OE1 oxygen?

A: Use the 'with minimum distance' selection operator, here shown for residue 'i':

ListAtom HD2 Res (i) with minimum distance from OD1 Res (i)

Q: I need to visually debug the hydrogen naming mess of another program, how shall I do that in YASARA if the hydrogen numbers are removed? I really want the hydrogen numbers back.

A: Well, then continue reading.

  • When YASARA loads a PDB file, it sorts the equivalent hydrogens bound to the same heavy atom by their name in ascending order, then the hydrogen number is removed. The information about the original hydrogen numbering is thus implicitly retained by the rank order of the hydrogens in the soup.

  • One exception applies to the above procedure: if the hydrogens are part of a methylene or amide group and XPLOR atom names are used (HB1 instead of 1HB etc.), then the sort order is reversed. The reason is that XPLOR uses a reversed nomenclature for methylene and amide hydrogens when compared to the official PDB standard.

  • Unless the PDB file is loaded with corrections disabled ('Correct=No'), YASARA then swaps the hydrogens in methylene, amide and guanidine groups so that the official PDB conventions are met.

  • When you click on a hydrogen atom, its name is displayed in the HUD, together with its rank order in the group of equivalent hydrogens, e.g. HB (1 of 2). The hydrogen with the lowest atom number in the soup is ranked first.

  • This leads to the following translation tables for the various nomenclatures, in all of which YASARA adopts the official PDB numbering:

Methylene hydrogens:
YASARA          PDB     IUPAC/PDB3   XPLOR
HX (1 of 2)     1HX     HX2          HX2
HX (2 of 2)     2HX     HX3          HX1

Amide hydrogens (Asn/Gln):
YASARA          PDB     IUPAC/PDB3   XPLOR
HD2 (1 of 2)    1HD2    HD22         HD22
HD2 (2 of 2)    2HD2    HD21         HD21
HE2 (1 of 2)    1HE2    HE22         HE22
HE2 (2 of 2)    2HE2    HE21         HE21

All other hydrogens:
YASARA          PDB     IUPAC/PDB3   XPLOR
HX (1 of 2,3)   1HX     HX1          HX1
HX (2 of 2,3)   2HX     HX2          HX2
HX (3 of 3)     3HX     HX3          HX3

In short: the rank order displayed by YASARA is normally the same as the hydrogen number in the original PDB file. In methylene groups however, YASARA's rank order is one lower than the IUPAC/PDB3 nomenclature and flipped with respect to the XPLOR nomenclature. In amide groups, YASARA's rank order is flipped with respect to IUPAC/PDB3 and XPLOR nomenclatures.

  • When saving a PDB file or NMR restraints for use with other programs, YASARA provides a Format parameter that lets you select a specific hydrogen nomenclature.
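The translation tables above can also be written as a small lookup function, for example when post-processing files outside YASARA. The Python sketch below encodes only the rules stated in the tables; the function name 'hydrogen_names' and the group labels 'methylene', 'amide' and 'other' are hypothetical choices made for this example.

```python
def hydrogen_names(base, rank, group="other"):
    """Translate a YASARA hydrogen (base name plus rank order) to the other schemes.

    base  : name without number, e.g. 'HB' or 'HD2'
    rank  : rank order shown by YASARA, e.g. 1 for 'HB (1 of 2)'
    group : 'methylene', 'amide' (Asn/Gln) or 'other'
    """
    max_rank = 3 if group == "other" else 2
    if group not in ("methylene", "amide", "other") or not 1 <= rank <= max_rank:
        raise ValueError(f"invalid group/rank: {group!r}, {rank!r}")
    pdb = f"{rank}{base}"                    # PDB: number in the first column
    if group == "methylene":
        iupac = f"{base}{rank + 1}"          # IUPAC/PDB3 number is one higher
        xplor = f"{base}{3 - rank}"          # XPLOR order is flipped
    elif group == "amide":
        iupac = xplor = f"{base}{3 - rank}"  # flipped in both schemes
    else:
        iupac = xplor = f"{base}{rank}"      # same number everywhere
    return {"PDB": pdb, "IUPAC/PDB3": iupac, "XPLOR": xplor}
```

For example, hydrogen_names('HB', 2, 'methylene') yields the PDB name '2HB', the IUPAC/PDB3 name 'HB3' and the XPLOR name 'HB1', matching the second row of the methylene table.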

° 

Making floating assignments

To activate floating assignments for all hydrogen atoms, set floating= 'Element H' in nmr_setdefaults.mcr.

The technical details of floating assignments

When you assign two resonances (e.g. 1.5ppm and 1.6ppm) of which you know that they are HG1# and HG2# of a certain valine residue, but you do not know whether HG1# belongs to 1.5 or 1.6 ppm (and the same for HG2#), you can make a 'floating assignment' and leave the choice to YASARA. During the structure determination process, YASARA will then automatically pick the assignment that minimizes the restraint violations.

Since such an uncertainty in the assignment translates to an uncertainty in the atom positions, YASARA borrows the classical uncertainty indicator from X-ray crystallography - the B-factor - to handle floating assignments.

The procedure is as follows:

  • The default B-factor of atoms (e.g. a peptide chain built with BuildMol) is 0. If an assignment involves atoms with B-factor 0, it is assumed to be certain. Before you start a simulation with distance restraints or calculate violation energies, you tell YASARA which atoms to consider for floating assignments by setting their B-factors to 25. This is done automatically in the macro nmr_solve.mcr using the command 'BFactorAtom (floating),25'. If all atoms in an assignment have a B-factor > 0, this shows YASARA that some uncertainty is involved.

  • YASARA then analyzes all atoms with a B-factor > 0 (normally 25, see above) to find atoms or atom groups whose assignments could potentially be swapped to improve the fit to the restraints. This requires a) that the residue contains a second, chemically equivalent atom (group), b) that there is at least one restraint assigned to the atom (group), and c) that there are no other restraints assigned to a subset of the atom group (which would be a bug in the restraint file). Typical examples are the two hydrogens of methylene groups (CBeta of many amino acids etc.) and the two methyl groups of valine and leucine. The procedure does not rely on a priori knowledge about certain residues and will thus also work with unusual amino acids. YASARA sets the B-factors of all atoms identified as being part of floating assignments to 50.

  • During a simulation, or before calculating violation energies, YASARA analyzes the floating assignments to see if the fit can be improved by swapping the assignments (the percentage of assignments analyzed each simulation step can be influenced with the 'FloatGroups' parameter of the RestrainPar command). If the violation energy could be reduced by swapping the assignment, the B-factors of the involved atoms are set to 75.
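The swap decision in the last step can be illustrated with a toy calculation in Python. This is a sketch under simplifying assumptions, not YASARA's implementation: the penalty is a plain quadratic term above the upper distance bound (not the 'SoftSquare' function), each restraint has a single fixed partner position, and all names are hypothetical.

```python
import math

def violation_energy(distance, upper):
    """Simplified penalty (assumption): quadratic above the upper bound, zero below."""
    excess = max(0.0, distance - upper)
    return excess * excess

def total_energy(positions, restraints):
    """Sum the penalties; a restraint is (floating label, partner xyz, upper bound)."""
    return sum(violation_energy(math.dist(positions[label], partner), upper)
               for label, partner, upper in restraints)

def should_swap(positions, restraints):
    """Return True if exchanging the two floating labels lowers the violation energy."""
    first, second = sorted(positions)  # exactly two equivalent atoms (groups)
    swapped = {first: positions[second], second: positions[first]}
    return total_energy(swapped, restraints) < total_energy(positions, restraints)

# Two methylene hydrogens and one restraint each (invented coordinates)
positions = {"HB1": (0.0, 0.0, 0.0), "HB2": (2.0, 0.0, 0.0)}
restraints = [("HB1", (5.0, 0.0, 0.0), 3.5), ("HB2", (-3.0, 0.0, 0.0), 3.5)]
print(should_swap(positions, restraints))
```

In this example both restraints are violated by 1.5 A with the original labels and satisfied after the exchange, so the swap is accepted; in YASARA, the atoms involved would then carry a B-factor of 75.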

Using the B-factor to encode the floating assignment status has a number of advantages:

B-factor   Color     Meaning
0          Blue      Assignment is certain, this atom is not considered for floating assignments (the default)
25         Magenta   This atom is allowed to be part of floating assignments (set by you)
50         Red       This atom is permanently checked for floating assignments (set by YASARA)
75         Yellow    This atom is part of a swapped assignment (set by YASARA)

  • The floating assignment status is also saved in PDB files (the B-factor is the number on the far right side).

  • YASARA's selection language can be used to restrict floating assignments to certain atom groups:


# Activate floating assignments...
# ...for all hydrogens
BFactorAtom Element H, 25
# ...for the methyl groups of Leu 18:
BFactorAtom HD? Res Leu 18, 25
# ...for all methylene groups:
BFactorAtom Element H with bond to Element C and with 1 bond angle to Element H, 25
# ...for all methyl groups:
BFactorAtom Element H with bond to Element C and with 2 bond angles to Element H, 25


# List atoms that are allowed to be part of floating assignments:
ListAtom BFactor>0
# List atoms that are permanently checked for floating assignments:
ListAtom BFactor>25
# List atoms that are part of swapped assignments:
ListAtom BFactor>50
# Save a list of atoms that are part of swapped assignments:
LogAs swapped.lst,append=No, ListAtom BFactor=75

Since the output of the above commands may be inconvenient to parse when YASARA is coupled to an automated assignment program like ARIA, the ListFloat command provides a compressed output:


# List all floating assignments in object 3gb1
ListFloatObj 3gb1
# List only the swapped floating assignments in object 3gb1
ListFloatObj 3gb1,Type=swapped
# Save the swapped assignments to disk
LogAs swapped.tbl,append=No, ListFloatObj 3gb1,Type=swapped
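Because the floating assignment status is stored in the B-factor column, it can also be read back from a saved PDB file by an external script. The Python sketch below assumes the standard fixed-column PDB layout (atom name in columns 13-16, residue number in columns 23-26, B-factor in columns 61-66); the function name 'floating_status' is hypothetical, and the status texts are taken from the B-factor table in this section.

```python
# Meaning of the B-factor values used for floating assignments
FLOATING_STATUS = {
    0: "certain, not considered for floating assignments",
    25: "allowed to be part of floating assignments",
    50: "permanently checked for floating assignments",
    75: "part of a swapped assignment",
}

def floating_status(pdb_line):
    """Return (atom name, residue number, status) for an ATOM/HETATM record, else None."""
    if not pdb_line.startswith(("ATOM", "HETATM")):
        return None
    name = pdb_line[12:16].strip()           # columns 13-16
    resnum = int(pdb_line[22:26])            # columns 23-26
    bfactor = round(float(pdb_line[60:66]))  # columns 61-66
    return name, resnum, FLOATING_STATUS.get(bfactor, "unknown")
```

Applying the function to every line of 'ensemble.pdb' and keeping the entries with the status 'part of a swapped assignment' gives a list comparable to the one obtained with 'ListAtom BFactor=75'.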

To pass information about many different floating assignments back to YASARA, simply collect them in a text file, e.g. 'floating.txt':


HG? Residue 65 Segment "   A"
HB? Residue 73 Segment "   B"

Read this file in the YASARA NMR macro, and activate floating assignments for the listed atom groups by setting their B-factors to 25:


for group in file floating.txt
  BFactorAtom (group),25

The similarity to XPLOR syntax can be maximized by replacing 'Residue' with 'resid' and 'Segment' with 'segid'.
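If the list of floating assignments is produced by another program, a file like 'floating.txt' can be generated with a few lines of Python. The sketch below simply reproduces the line format of the example above, including the segment name right-aligned in a four-character quoted field; the function names and the 'xplor_keywords' switch are hypothetical.

```python
def floating_line(atoms, residue, segment, xplor_keywords=False):
    """Format one atom group selection, e.g. HG? Residue 65 Segment "   A"."""
    res_kw, seg_kw = ("resid", "segid") if xplor_keywords else ("Residue", "Segment")
    # Segment name is right-aligned to four characters, as in the example file
    return f'{atoms} {res_kw} {residue} {seg_kw} "{segment:>4}"'

def write_floating(path, groups, xplor_keywords=False):
    """Write one selection per line; groups is a list of (atoms, residue, segment)."""
    with open(path, "w") as handle:
        for atoms, residue, segment in groups:
            handle.write(floating_line(atoms, residue, segment, xplor_keywords) + "\n")

# Reproduces the two example lines shown above
write_floating("floating.txt", [("HG?", 65, "A"), ("HB?", 73, "B")])
```

Passing xplor_keywords=True writes 'resid' and 'segid' instead of 'Residue' and 'Segment'.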