August 2009

Unfolding protein

Proteins can be unpredictable, kinking into shapes that help to determine these biological workhorses’ functions – or dysfunctions. A University of Washington biologist is using high-performance computers to untangle proteins.

The latter half of the 20th century brought an avalanche of discoveries about the basic building blocks of life. Scientists unlocked the DNA genetic code, for example, and learned to manipulate genes at will.

But the rules that govern protein folding – the process that lets these cellular workhorses function and make life possible – have proved difficult to crack. Stably transforming a linear string of chemically diverse amino acids, the building blocks of all proteins, into a three-dimensional form involves difficult-to-decode physics. Yet understanding protein folding is key to making headway in some of this century’s most vexing problems, from controlling carbon dioxide in the atmosphere to stopping the H1N1 flu.

Computational biologist David Baker leads one effort to crack those rules, combining computational prowess with an eclectic team intent on exploring this submicroscopic landscape. Despite the modernity of his challenges, Baker’s take on molecular modeling has more the feel of a grand Victorian expedition. It’s easy to imagine Baker, a Howard Hughes Medical Institute investigator at the University of Washington in Seattle, in khakis and a field jacket directing a group of intrepid explorers across an unexplored landscape. His enthusiasm conveys a sense that a new species could be waiting around every bend in the virtual terrain he travels.

And just as Victorian explorers often relied on interested amateur naturalists for routine fieldwork, Baker has enlisted legions of dedicated volunteers to assist him in determining protein structure by donating unused time on their home computers’ CPUs.

Even the name of his software, Rosetta, invokes the 19th-century French explorers who recovered the key to deciphering Egyptian hieroglyphics. Baker’s plans for Rosetta are no less ambitious. The software, which Baker and his many collaborators are continually refining, aims to decipher the essential keys that govern protein folding.

To describe how he solves protein structures, Baker borrows the naturalist’s descriptions of fieldwork. The analogy works because protein biochemists describe the protein-folding process as playing out in an energy landscape. In this landscape, the protein seeks its most stable shape: the conformation with the lowest energy the molecule can attain.

Rosetta tries to simulate that process. The program tests different folds until it arrives at the same shape the protein would form naturally. During the process, interim shapes represent the peaks and valleys of energy terrain.

“It’s like looking for the lowest elevation point in this massive landscape,” Baker says. “You could imagine sending out explorers, and they each go around. They report back to you the lowest elevation point they found until they cover all the possible elevation points. This takes a long time, and the larger the protein, the larger the energy landscape. The search problem gets harder and harder to find that lowest elevation point.”
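Baker’s picture of a lone explorer hunting for the lowest point can be sketched in a few lines of Python. This is a toy Monte Carlo search over an invented one-dimensional landscape, not Rosetta’s algorithm or energy function; the names toy_energy and monte_carlo_search, and every parameter value, are assumptions made for illustration.

```python
import math
import random


def toy_energy(x):
    """Hypothetical rugged 1-D landscape: a broad bowl with ripples."""
    return 0.1 * x * x + math.sin(3.0 * x)


def monte_carlo_search(energy, start, steps=5000, step_size=0.5,
                       temperature=0.5, seed=0):
    """Metropolis-style walk; returns the lowest-energy point visited."""
    rng = random.Random(seed)
    x = start
    e = energy(x)
    best_x, best_e = x, e
    for _ in range(steps):
        trial = x + rng.uniform(-step_size, step_size)
        trial_e = energy(trial)
        # Always accept downhill moves; accept uphill moves with a
        # probability that shrinks as the climb gets steeper.
        if trial_e <= e or rng.random() < math.exp((e - trial_e) / temperature):
            x, e = trial, trial_e
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


best_x, best_e = monte_carlo_search(toy_energy, start=8.0)
print(f"lowest point found: x={best_x:.3f}, energy={best_e:.3f}")
```

The occasional uphill move is what lets the walker climb out of a shallow valley instead of settling in the first dip it meets. A real protein has thousands of coordinates rather than one, which is why the search becomes so costly.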

Rosetta has been refined enough to take on larger challenges requiring orders of magnitude more computing power to decipher the energy function. Borrowing time from home computers is no longer sufficient to make the kind of progress Baker envisions.

A 2008 INCITE (Innovative and Novel Computational Impact on Theory and Experiment) award from the Department of Energy’s Office of Science allowed Baker’s team its first access to computing resources on a massive scale. The award provided 12 million processor hours on Intrepid, the IBM Blue Gene/P at Argonne National Laboratory’s Leadership Computing Facility.

Baker calls the big machines perfect for his task. Each processor acts as an explorer and “can carry out an independent search, but because they are networked together they can communicate so you know who has the lowest elevation point at any given time. So if someone finds a really low elevation then it can communicate that to other explorers.”

It’s as if a time traveler could swoop in and hand Stanley and Livingstone GPS units and cell phones. Suddenly, when one party located an interesting new landmark or got lost downriver, it could contact the other group to redirect its search. This kind of communication was simply impossible with a series of processors searching the energy landscape independently.
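The difference between isolated and networked explorers can be sketched on a toy landscape. The code below runs its explorers one after another in a single Python process rather than on a parallel machine, and it does not reproduce how Rosetta communicates on Intrepid; the landscape, the function names and the rule for sharing are all hypothetical.

```python
import math
import random


def toy_energy(x):
    """Hypothetical rugged 1-D landscape (not a real protein energy)."""
    return 0.1 * x * x + math.sin(3.0 * x)


def explore(n_explorers=8, rounds=20, steps_per_round=50,
            share=True, seed=1):
    """Run several downhill-only explorers; optionally share the best find.

    Returns the lowest energy any explorer holds at the end.
    """
    rng = random.Random(seed)
    positions = [rng.uniform(-10.0, 10.0) for _ in range(n_explorers)]
    for _ in range(rounds):
        # Each explorer searches locally on its own for a while,
        # keeping a trial step only if it leads downhill.
        for i in range(n_explorers):
            x = positions[i]
            for _ in range(steps_per_round):
                trial = x + rng.uniform(-0.3, 0.3)
                if toy_energy(trial) < toy_energy(x):
                    x = trial
            positions[i] = x
        if share:
            # Communication step: the explorer sitting at the highest
            # point abandons it and restarts near the current best find.
            best = min(positions, key=toy_energy)
            worst = max(range(n_explorers),
                        key=lambda i: toy_energy(positions[i]))
            positions[worst] = best + rng.uniform(-1.0, 1.0)
    return min(toy_energy(x) for x in positions)


print("independent explorers:", explore(share=False))
print("sharing explorers:    ", explore(share=True))
```

With sharing switched on, effort drifts toward the most promising region instead of being spent in valleys that have already proved shallow. Whether that helps on any given run depends on the landscape and the random seed, so the two printed numbers should be read as one illustration, not a benchmark.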

Intrepid’s communication abilities have greatly sped up the process of locating the most stable conformations, Baker says. He and his team more than doubled the size of the protein structures they could predict, from 80 to 189 amino acids.

That’s a good start, but there’s a long way to go. For example, the egg-white protein albumin is considered simple but contains 615 amino acids. Still, Rosetta consistently calculated high-resolution structures comparable to structures determined by expensive and time-consuming nuclear magnetic resonance (NMR) spectroscopy, considered the gold-standard measurement technique in structural biology.
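A rough count suggests why every added amino acid matters. The sketch below assumes, purely for illustration, that each amino acid can independently adopt three local shapes; that figure is a textbook-style simplification, not a number from Baker or Rosetta. Under that assumption the number of candidate conformations is 3 raised to the chain length.

```python
STATES_PER_RESIDUE = 3  # assumption for illustration only


def conformation_count(n_residues, states=STATES_PER_RESIDUE):
    """Number of shapes if every residue picks a state independently."""
    return states ** n_residues


def order_of_magnitude(n):
    """Power of ten of a positive integer, from its digit count."""
    return len(str(n)) - 1


# Chain lengths mentioned in the article: 80, 189 and 615 amino acids.
for n in (80, 189, 615):
    exponent = order_of_magnitude(conformation_count(n))
    print(f"{n} amino acids: roughly 10^{exponent} conformations")
```

Because the count grows exponentially, going from 80 to 189 amino acids does far more than double the size of the landscape, which is why better search strategies matter more than raw processor hours alone.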

The initial INCITE award allowed Baker’s team to make significant progress in developing strategies that take advantage of Intrepid’s communication abilities. Based on promising results obtained in 2008, Baker’s team has been awarded an additional 60 million CPU hours for this year.

About one-third of the time will be devoted to developing strategies for selecting and sharing structural features during folding calculations so Rosetta can more quickly converge on optimal folds. In addition, the group plans to develop tools to help protein biochemists solve the structures of biologically important proteins that are difficult or expensive to solve using conventional approaches such as X-ray crystallography and NMR.

“If we can reliably and accurately predict protein structures from sequence information alone it will be an incredible step forward and a huge cost savings,” Baker says. “We still have a long way to go, but the INCITE award will greatly speed our progress toward this important goal.”

Baker won’t be satisfied to simply learn nature’s protein-folding code. He wants to use that knowledge to design working proteins that nature never invented. In de novo design, the team starts with an idea, then must find amino acids that will fold into the shape it envisions.

In a March 2008 Science paper, Baker’s group described such a protein designed from scratch. Their invented protein breaks a carbon-carbon bond in a chemical reaction called a retro-aldol reaction. The aldol reaction is important for processing sugars and is enlisted by the pharmaceutical industry to synthesize various drugs.

To design a new protein – an enzyme that efficiently catalyzes a retro-aldol reaction – the researchers started with 72 designs for retro-aldolases using four possible arrangements of catalytic centers. One of their designs, in which a bound water molecule participates in the reaction, achieved reaction rates approaching enzymatic catalysis.

Taking the approach one step further, they designed a protein to perform a reaction that no known naturally occurring enzyme can do. The reaction, called a Kemp elimination, occurs when a proton is removed from a carbon atom. These early forays into protein design were completed even before the INCITE award. The boost in computational power has encouraged more ambitious plans.

“We are trying to design new enzymes that catalyze new chemical reactions,” Baker says. “We are actually working on an enzyme that would take carbon dioxide out of the air and turn it into sugar or fuel. We have a design right now that has some activity, but it’s not ready for prime time. So that’s an example of where we can use additional time on these massive machines to optimize our design.”

Keeping one step ahead of the seasonal flu can stymie public health professionals, who must predict which strains will circulate each year and then include inactive elements of those individual viruses in each year’s vaccine. When the viral fragments are injected into the body, the immune system produces antibodies directed against further invasion by those flu strains.

Doctors want a flu vaccine that works against all strains. But each year flu strains develop new variations that evade the immune system.

Researchers have identified universal structural elements on the influenza virus that induce rare neutralizing antibodies targeting all forms of influenza, but haven’t been able to translate that into a universal vaccine.

The problem involves complex protein-protein interactions, Baker says. He aims to start with what we already know about immune-pathogen interactions and design novel vaccines for influenza – and even for HIV, the virus causing AIDS.

With INCITE support, Baker and colleagues have developed a computational method for incorporating structural elements from a complex protein and transferring them to another, simpler protein scaffold.

First they identify a critical binding site. Next they identify sections from a second protein that could incorporate the crucial new binding site while simultaneously ensuring the protein can maintain its ability to bind its original partner.

The idea is to extract crucial binding elements from natural neutralizing antibodies that can effectively combat the flu virus, then transfer them to an easier-to-manufacture protein-based vaccine. Such rare antibodies often show a remarkable ability to overcome the flu’s immune-evasion strategies, Baker says.

As a first step in this vaccine-design strategy, the researchers are transferring the binding site from a known pathogenic bacterial protein, Streptococcal protein G, to a designed vaccine candidate.

Because Streptococcal protein G and human immunoglobulin G antibody structures are highly interactive, Baker says they are “ideal for our vaccine study.”

Early results testing the strategy look promising to Baker. He estimates that it will take roughly 8 million CPU hours this year to design a protein that incorporates this particular binding site. But he is already looking forward to 2010, when the medically significant task of designing a working vaccine against H1N1 or its next iteration will start.

It’s just the latest leg of Baker’s journey of discovery into the uncharted landscape of 21st-century science.