Proteins are the molecular-scale machines that conduct and mediate the fundamental processes of life. The beautiful and varied ways in which they perform their tasks are a consequence of their incredible chemical and structural diversity.
While DNA and RNA each have only four different building blocks, proteins employ at least 20 amino acid building blocks, named for most of the letters in the alphabet. Long linear chains of amino acids fold up into complex and compact structures.
Human Myoglobin. PDB: 3RGK Structure: QuteMol
Designing atomic-scale structures from the ground up is well beyond our current technology, but recent breakthroughs in computational biology represent a huge leap forward in understanding how proteins fold. This quiz will investigate some of the foundational experiments that kick-started this field.
Early in the study of protein structures, most scientists believed that the 3D folded structure of proteins was "stapled" together in several locations by strong chemical bonds called disulfide bridges. Much as base pairs hold together double-stranded DNA and RNA, it was thought that copies of a certain amino acid called cysteine, or C for short, could come together like velcro to form strong bonds, the only chemical bonds that were known to occur in folded proteins.
Future Nobel Prize winner Christian Anfinsen used an active protein called an enzyme to study folding. This enzyme contained eight cysteines, which could pair up to form four bridges. Though he could not observe the exact atomic structure of the protein as it folded, he knew the protein was in the right arrangement when it performed its biological function: chopping up RNA.
Considering only these eight cysteines, how many possible arrangements of four bridges can appear in this enzyme?
Even though the eight cysteines could form 105 different bridge arrangements, Anfinsen observed a single arrangement the overwhelming majority of the time. He called this the native fold. How the protein selected this correct arrangement was unknown.
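The count of 105 can be checked by brute force. The sketch below is only an illustration: it labels the eight cysteines 1 through 8 (the labels are arbitrary and not residue numbers from the real enzyme) and lists every way to split them into four unordered pairs.

```python
def pairings(items):
    """Yield every way to split `items` into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    # Pair the first item with each possible partner, then pair up the rest.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


cysteines = list(range(1, 9))  # hypothetical labels 1..8
all_arrangements = list(pairings(cysteines))
print(len(all_arrangements))  # 7 * 5 * 3 * 1 = 105
```

The first cysteine has 7 possible partners, the next unpaired one has 5, then 3, then 1, which is where the product 7 × 5 × 3 × 1 comes from.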
Proteins and their bridges can be unfolded by heating and adding chemicals like alcohol which interfere with biological structures. When Anfinsen unfolded the enzyme, all of the bridges were broken and it became non-functional—it wouldn’t chop up any RNA.
What could Anfinsen conclude about the role of structure in protein function?
When he cooled the protein in alcohol, all four bridges re-formed, but instead of selecting the native fold, they occupied each of the 105 bridge combinations roughly equally. This mixture of differently arranged proteins showed only about 1% of the RNA-chopping activity expected from the pure protein.
The alcohol seemed to be interfering with the enzyme finding the correct configuration of bridges.
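If the four bridges re-form completely at random, the chance of landing on any one particular arrangement should be 1 in 105. The following is a minimal simulation of that idea, not a model of the real chemistry: the cysteine labels 0 through 7 and the choice of which pairing counts as "native" are arbitrary assumptions made for illustration.

```python
import random


def random_pairing(n_cysteines, rng):
    """Shuffle the cysteine labels and bridge them two at a time."""
    labels = list(range(n_cysteines))
    rng.shuffle(labels)
    return frozenset(
        frozenset(labels[i:i + 2]) for i in range(0, n_cysteines, 2)
    )


def native_fraction(trials=100_000, seed=0):
    """Estimate how often a random pairing matches one chosen arrangement."""
    rng = random.Random(seed)
    # Arbitrary stand-in for the native fold (an assumption, not real data).
    native = frozenset(frozenset(p) for p in [(0, 1), (2, 3), (4, 5), (6, 7)])
    hits = sum(random_pairing(8, rng) == native for _ in range(trials))
    return hits / trials


print(native_fraction(), 1 / 105)
```

The simulated fraction should sit close to 1/105, a little under 1%, so a scrambled population would be expected to contain very few correctly bridged enzymes.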
The randomly configured enzymes had almost no useful activity, since only a very small fraction would land in the native fold by chance. Anfinsen then removed the alcohol from this random population of proteins and returned the environment to the more familiar salty water found inside cells.
He observed that the enzyme's RNA-chopping activity slowly climbed back to its original level, as bridges broke apart and tended to re-form into the native fold all on their own.
What does this suggest about the role of bridges in determining protein structure?
Anfinsen's experiment indicated that the arrangement of bridges does not fully determine protein structure. The complete story involves some other force that pushes the protein towards its native fold. If a protein is pushed towards its native fold, the native fold must have lower energy than the alternatives. Just as the force of gravity pulls a ball towards lower gravitational potential energy, the forces at play in protein folding pull a protein towards its lowest-energy fold.
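One way to see how an energy difference can single out one arrangement among 105 is a toy Boltzmann calculation. This is a sketch under strong simplifying assumptions that are not part of Anfinsen's experiment: the 104 non-native arrangements are all given the same energy, the native arrangement sits a chosen gap below them, and the gap is measured in units of the thermal energy kT.

```python
import math


def native_population(energy_gap_kT, n_other=104):
    """Boltzmann probability of the native arrangement.

    Assumes the native state lies `energy_gap_kT` (in units of kT) below
    `n_other` alternative arrangements that all share the same energy.
    """
    return 1.0 / (1.0 + n_other * math.exp(-energy_gap_kT))


for gap in (0, 2, 5, 8, 10):
    print(f"gap = {gap:2d} kT -> native fraction = {native_population(gap):.3f}")
```

With no energy gap the native arrangement is just one among 105 equally likely options, while a gap of several kT is enough for it to dominate the population in this toy picture.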
The exact identity of the forces biasing the protein structure towards its native fold was unknown at the time, but Anfinsen proposed,
"Interactions between the functional groups of the side chains may exert, by a concerted action, a powerful set of forces that allow a significant fraction of molecules to favor a configuration resembling that of the native fold, even in the absence of stabilizing [bridges]."
This is the basis of the thermodynamic hypothesis, which states that the native structure of a protein is determined by interactions between all of the protein's amino acids which act in concert with the environment to form its native fold.
Just as in DNA and RNA, lower-energy protein structures have more bonds. But bridges are only one of many types of bonds that form as proteins fold. Some bridge configurations are lower in energy than others because they allow many other bonds to form. In the Folding chapter, we'll use information theory to tease out the other bonds involved, including hydrogen bonds, salt bridges, and π-π stacking between two or more amino acids.
Considering only four bridges, there are 105 possible bridge configurations, only one of which is the native fold. If we want to find the lowest energy configuration, we could count the total number of bonds in each of the 105 configurations. Then the lowest energy configuration is the one with the highest number of bonds, including bridges and all other types.
Why might this straightforward approach be doomed to failure?
The immensity of possible interactions and configurations of proteins is intimidating, but it turns out that proteins occupy only tiny islands in a vast sea of possible configurations. By finding simple patterns in protein configurations, we can focus only on these islands and make the protein folding problem much more tractable.
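To get a rough feel for the immensity mentioned above, the sketch below counts how many ways a set of sites could be joined into pairs as the number of sites grows. It assumes, purely for illustration, that every site can pair with every other site; real proteins have many more kinds of interaction and many geometric constraints, so this conveys the scale of the problem rather than a true count.

```python
def pairing_count(n_sites):
    """Number of ways to join an even number of sites into unordered pairs.

    This is the double factorial (n - 1)!! = (n - 1) * (n - 3) * ... * 3 * 1.
    """
    if n_sites % 2:
        raise ValueError("need an even number of sites")
    count = 1
    for k in range(n_sites - 1, 0, -2):
        count *= k
    return count


for n in (8, 20, 40, 100):
    print(n, pairing_count(n))
```

Eight sites give the familiar 105 arrangements, but the count grows so quickly that checking every arrangement one by one stops being practical after only a few dozen sites.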
A protein is a linear sequence of amino acids, each of which is constrained to a plane. The bonds connecting adjacent amino acids are free to rotate, so the relative orientation of two adjacent amino acids can be described using their dihedral angles. Naively, every pair of amino acids could have angles anywhere between -π and π radians, but that's not what we find.
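A dihedral angle is defined by four points: it measures the twist about the bond joining the middle two. The function below is a minimal sketch of that calculation with NumPy; the function name and the example coordinates are made up for illustration and do not come from any real structure.

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in radians, in the range -pi to pi.

    The angle is the twist about the p1-p2 bond between the plane
    containing (p0, p1, p2) and the plane containing (p1, p2, p3).
    """
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 /= np.linalg.norm(b1)  # unit vector along the central bond

    # Remove the components along the central bond, leaving only the
    # parts of b0 and b2 that are perpendicular to it.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1

    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.arctan2(y, x))


# Made-up coordinates: the outer points lie on the same side of the bond.
print(dihedral([1, 0, 0], [0, 0, 0], [0, 0, 1], [1, 0, 1]))
```

For a protein backbone, the two standard dihedral angles of residue i are phi, taken over the atoms C(i-1), N(i), CA(i), C(i), and psi, taken over N(i), CA(i), C(i), N(i+1).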
The Python environment below takes in the 3D structure of hemoglobin and measures the dihedral angles between every pair of adjacent amino acids. The program outputs a scatter plot of all the different orientations in the protein.
The two largest islands in protein configurational space are called alpha helices and beta sheets for the characteristic shapes of their folds. Together with several other common motifs, these are known as protein secondary structures. Though an amino acid could theoretically occupy a configuration with an arbitrary torsion angle, finding patterns of different configurations can help us improve our "guesses" of possible protein structures.
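As a rough illustration of how these islands could be used, the sketch below sorts a pair of backbone dihedral angles (phi, psi) into "alpha", "beta", or "other". The rectangular boundaries are crude assumptions chosen for this example, not an established definition; real secondary structure assignment relies on more careful criteria.

```python
import math


def classify(phi, psi):
    """Label a (phi, psi) pair, given in radians, using crude rectangles.

    The region boundaries (in degrees) are illustrative assumptions only.
    """
    phi_deg, psi_deg = math.degrees(phi), math.degrees(psi)
    if -160 <= phi_deg <= -20 and -120 <= psi_deg <= 50:
        return "alpha"
    if -180 <= phi_deg <= -45 and (psi_deg >= 90 or psi_deg <= -150):
        return "beta"
    return "other"


# Textbook-typical angles for each motif, converted to radians.
print(classify(math.radians(-57), math.radians(-47)))
print(classify(math.radians(-120), math.radians(130)))
```

Applying a function like this to every residue in a structure gives a quick, approximate picture of which parts of the chain sit on which island.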
Not only do different secondary structures appear as islands in a plot of possible dihedral angles, but different amino acids also tend to favor one secondary structure over the others due to their unique chemical properties. Knowledge of these patterns can help us even more in finding the native fold.
We will find these patterns by mining protein structures. The Python environment below has been prepared to generate torsion distributions for three different amino acid patterns: glycine (G), alanine (A), and any amino acid preceding a proline (P). You'll learn how to use Python to perform analysis on patterns of protein folding in an upcoming chapter.
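The grouping step behind such torsion distributions can be sketched as follows. This is not the code from the prepared environment; the function names are hypothetical, and the short sequence and angles in the usage lines are placeholders made up to show the mechanics, not measurements from any protein.

```python
import numpy as np


def group_torsions(sequence, phi, psi):
    """Sort (phi, psi) pairs into three patterns: 'G', 'A', and 'pre-P'."""
    groups = {"G": [], "A": [], "pre-P": []}
    for i, residue in enumerate(sequence):
        angles = (phi[i], psi[i])
        if residue in ("G", "A"):
            groups[residue].append(angles)
        # A residue of any type counts as 'pre-P' if the next one is proline.
        if i + 1 < len(sequence) and sequence[i + 1] == "P":
            groups["pre-P"].append(angles)
    return groups


def torsion_histogram(angle_pairs, bins=36):
    """2D histogram of (phi, psi) pairs over -pi to pi on both axes."""
    if not angle_pairs:
        return np.zeros((bins, bins))
    phis, psis = zip(*angle_pairs)
    counts, _, _ = np.histogram2d(
        phis, psis, bins=bins, range=[[-np.pi, np.pi], [-np.pi, np.pi]]
    )
    return counts


# Placeholder inputs for illustration only (angles in radians).
sequence = "GAPAG"
phi = [-1.0, -1.1, -1.2, -2.0, 1.5]
psi = [2.5, -0.8, 2.4, 2.3, 0.4]

groups = group_torsions(sequence, phi, psi)
for pattern, pairs in groups.items():
    print(pattern, len(pairs), int(torsion_histogram(pairs).sum()))
```

Note that a residue can fall into two groups at once, for example an alanine that sits directly before a proline. With real data, each histogram would be built from many structures and plotted as a map over phi and psi.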
What conclusions can you make about the structural tendencies of these amino acids using the "map" of amino acid configurations above?
In the Folding chapter, we will use computational approaches to mine protein and RNA sequences. By combining simple patterns, some cues from information theory, and knowledge of physics and chemistry, we will attempt to connect biological sequence to biological structure.
The holy grail of computational biology is protein design: building a protein from the ground up, at the atomic scale, to perform some function. We'll take a look at the state of the art in this field, and explore the clever mathematical shortcuts that are currently enabling designer enzymes, drugs, and biomaterials.