May 2022

Bargain proteins

A Flatiron Institute biologist uses supercomputers and their quantum cousins to streamline the search for  promising drugs.

An X-ray crystal structure of a self-assembling peptide helical bundle. The structure was designed on a  quantum annealer and validated with a classical simulation on large-scale high-performance computing resources. Lab researchers later synthesized and characterized it experimentally. The two mirror-image subunits (one made from L-amino acids in cyan, the other made from D-amino acids in orange) were designed to pack together. Image courtesy of Vikram Mulligan, Flatiron Institute.

Devising a drug to treat a disease isn’t easy. It means screening hundreds of thousands of compounds, testing a small pool of promising candidates first in tissue culture and later in animals, and then, after many intermediate steps, finding perhaps one molecule worthy of human testing.

Using powerful computers to design novel proteins with the best properties can save much of this time, effort and expense. Vikram Mulligan, a research scientist at the Flatiron Institute in New York, aims to make the process even quicker and cheaper.

“A protein’s function is uniquely determined by the way it folds, which in turn is determined by its sequence of amino-acid building blocks,” Mulligan says. “Untangling the sequence-fold-function relationship is challenging due to the vastness of both the possible sequence space and the possible conformational space” – the huge number of configurations a protein could fold into.

Unfortunately, most researchers lack the computational resources to model protein folding. Mulligan hopes to address that limit. With allocations of supercomputer time from the Department of Energy (DOE), he and collaborators are developing machine learning methods and quantum computing technology, which relies on the strange physics that dominate at the tiniest scales, to improve protein-folding models and make them accessible to the average scientist.

Mulligan’s quest to understand the sequence-fold-function relationship began as a University of Toronto doctoral student, when he used lab experiments to investigate the kinetics of protein folding and misfolding in late-onset neurodegenerative diseases. As a postdoctoral researcher, he later sought additional computational skills and joined David Baker’s University of Washington lab, one of the preeminent hubs for computationally designed proteins. Starting in the early 2000s, Baker developed Rosetta, an open-source software suite that hundreds of labs worldwide now use to predict and design protein structures.

Until Mulligan joined Baker’s lab, Rosetta usually was used to model proteins made from the 20 naturally occurring amino acids. Mulligan, however, saw potential in designing peptides – amino acid chains – made from nonnatural amino acids that differ from the natural ones in various ways, such as an extra chlorine atom here or a fully reconfigured side chain there. “This flexibility allows us to make structures of mixed handedness, where we have helices that spiral in opposite directions packing against one another and things like that,” Mulligan says. “Ultimately, it allows us to access much more diverse structures, which in turn means we can access more diverse functions.”

Increasing structural diversity, however, also means increasing the challenge of computationally exploring the space of possible protein sequences. Mulligan embraced the challenge. “The ultimate test of whether we really understand how proteins fold is when we try to make something new out of building blocks that nature doesn’t use.”

‘This would make all our old antibiotics useful again.’

The first proof-of-principle compound derived from nonnatural amino acids that Mulligan computationally designed at Baker’s lab was a peptide that binds to and inhibits New Delhi metallo-beta-lactamase 1, an enzyme implicated in antibiotic-resistant bacteria.  “If we can make something that inhibits this antibiotic-resistance factor then we can treat antibiotic-resistant infection using a combination of the inhibitor and existing antibiotics,” Mulligan says. “This would make all our old antibiotics useful again.”

Since joining Flatiron in 2018, Mulligan has worked on methods that use ever fewer computational resources to design proteins with ever greater precision. Now he’s pursuing a pair of projects with a million node-hours on Theta, a Cray XC40 supercomputer at the Argonne Leadership Computing Facility, via a DOE INCITE (Innovative and Novel Computational Impact on Theory and Experiment) allocation.

“On Theta, we can do a calculation in a day that might take us a week or two on smaller clusters,” Mulligan says. “That allows us to iterate very fast.” Theta’s mix of GPU- and CPU-based nodes are well suited to the research, he adds. “There are a number of computational problems related to protein folding that don’t map all that well to GPUs, so it is valuable to also have access to a lot of CPUs.”

For his first INCITE project, Mulligan will develop machine-learning methods with low computational cost that can approximate the output of demanding validation simulations. “As we develop new methods, we need to validate the method against a reliable, established method that might be more computationally expensive,” Mulligan says. “So, we will do a one-time run of a ton of calculations on Theta to produce the data on which we will train the machine-learning method.” Once trained, researchers can use the technique to perform design and validation tasks on smaller computing systems. “It’s a one-time expensive use of computation to enable a lot of cheap computations down the road.”

The second INCITE project focuses on two areas. First, Mulligan and his colleagues will use simulations of quantum computers running on Theta to design proteins using an energy function from Rosetta that’s based on classical physics. Second, they’ll attempt to improve computations of energies performed on standard computers by incorporating quantum mechanical energy calculations to complement the Rosetta energy function.

Quantum computers could implicitly consider every possible amino acid sequence and let researchers efficiently sample from the best ones. With collaborators Hans Melo at drug-design company Menten AI and Brian Weitzner at protein-engineering firm Outpace Bio, Mulligan has successfully mapped the protein design problem to quantum annealers, special-purpose quantum computers that solve optimization problems, such as finding the most stable folding state for proteins with specific amino acid sequences

 With Menten AI’s Alexey Galda and Gavin Crooks at the Berkeley Institute for Theoretical Science, Mulligan also is beginning to map the problem to general-purpose gate-based quantum computers. The team has validated the first real proteins that were designed on a quantum computer, working with New York University’s Paramjit “Bobby” Arora and his graduate student, Haley Merritt, to synthesize molecules, and with UCLA researchers Michael Sawaya and Todd Yeates to solve structures.

Although its rewards seem far off, quantum computing will open the door to exploring, for the first time, the full palette of thousands of nonnatural amino acids and other chemical building blocks available to researchers, Mulligan expects. “We hope this will be the extra little boost we need to design proteins that get across the cell membrane and bind to a target and have all the other properties we’d like to see in a drug.”