Niall Mangan uses data to explore the underlying mechanisms of energy systems, particularly ones for energy capture, transduction and storage.

Mangan, an assistant professor of engineering sciences and applied mathematics at Northwestern University, focuses on bioengineering to produce biofuels, semiconductor physics for new solar-cell materials, and green chemistry via electrocatalysis for possible carbon dioxide capture and sequestration.

Her goal is to understand energy systems directly from data. “While we can learn a lot from a mathematical model,” she says, “building one based on the physics and chemistry can take a long time, which motivated me to speed up the development process via data-driven methods, in which data allows us to select terms within a large set of possibilities that is too large to search through comprehensively.”

Mangan is in her first year of a five-year DOE Early Career Research Program award to use data-driven discovery of dynamic models to characterize energy systems. Sparse optimization, the strategy she’s using for this work, starts from a large set of candidate terms describing what could be happening in a system and keeps only the few that the data support. It has the potential to speed innovations in energy systems.

Rather than throwing raw computing power at this work, she wants to use the machines more thoughtfully. “One of the things we’re trying to solve is to avoid brute-force simulating everything that’s possible,” Mangan explains. “Instead, we’re using data and optimization algorithms that are a more intelligent way to search through the space of possibility efficiently.”
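The idea of selecting a few terms from a large set of candidates can be illustrated with a small sketch. The article does not say which algorithm Mangan's group uses, so the example below swaps in one common form of sparse regression, sequentially thresholded least squares, as an assumption. The candidate library, the threshold and the synthetic data are all hypothetical, and the derivative values are supplied directly rather than estimated from a measured time series as they would be in practice.

```python
import numpy as np

def build_library(x):
    """Candidate terms for dx/dt: 1, x, x^2, x^3 (a hypothetical library)."""
    return np.column_stack([np.ones_like(x), x, x**2, x**3])

def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    coef = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(coef) < threshold
        coef[small] = 0.0                 # discard weakly supported terms
        big = ~small
        if not big.any():
            break
        # Refit using only the terms that survived the threshold
        coef[big] = np.linalg.lstsq(theta[:, big], dxdt, rcond=None)[0]
    return coef

# Synthetic data from a known system, dx/dt = 2x - 0.5x^2, plus small noise
rng = np.random.default_rng(0)
x = np.linspace(0.1, 5.0, 200)
dxdt = 2.0 * x - 0.5 * x**2 + rng.normal(scale=0.01, size=x.size)

coef = stlsq(build_library(x), dxdt)
names = ["1", "x", "x^2", "x^3"]
selected = [n for n, c in zip(names, coef) if c != 0.0]
print("selected terms:", selected)
print("coefficients:", coef)
```

The point of the sketch is the search strategy: instead of simulating every combination of candidate terms, one regression with thresholding prunes the library to the terms the data actually require, and the surviving coefficients remain interpretable.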

As an undergrad, Mangan became concerned about climate change and the energy crisis and carefully considered the various ways she could help address them. Though she started out working in solar cells, her attention has turned to biological problems such as bioengineering of biofuels or sustainable bio-based production of chemicals traditionally produced from fossil fuels.

“I think about biological problems a lot now,” Mangan says, “but my background is building simple explanatory mathematical models that can help us explore what’s going on within relatively simple complex systems that experimentalists are trying to understand to improve production of biofuels or device physics within solar cells.”

After a while, Mangan noticed a pattern: She was building the same type of mathematical models over and over, and though the models changed, the process didn’t. “The challenge when building a model without data-driven methods is that you use scientific knowledge and intuition with an experimentalist to choose which interactions or functional forms in the model are most important for setting the behavior of the system,” she explains. “This process is the same, even though the simple explanatory models that come out are different each time.”

Her challenge now is to figure out which components are the most important to put into the models from a well-defined array of mechanisms. “I became interested in using data-driven methods coming out of improvements in computation, as well as optimization from machine learning. But I want to build models with all the characteristics of a more traditional mathematical model.”

Traditional models encode knowledge of the physics, chemistry and interaction terms that set a system's behavior. In her case, “traditional” means constructed by theorists from known physical terms with specific, interpretable meanings, so that one can tell which underlying mechanisms and interactions are most important. Black-box models, by contrast, allow only prediction.

Modelers usually build mathematical models only after spending a lot of time thinking about and looking at experiments, and talking to experimentalists, to fully understand the physics, chemistry and biology at play within the system of interest.

“There’s a long back-and-forth process of formalizing the interactions and mechanisms happening within the systems into a mathematical description that captures the really important parts of it,” she says, “but doesn’t necessarily try to recapitulate everything that’s happening.”

It’s crucial to determine what really matters in an energy system and what’s controlling its behavior — key parameters in biofuel or solar cell production, for example, can be tuned so that the system is extremely efficient.

And speed is critical because Mangan’s collaborators can often experimentally screen systems quickly — a hundred different systems a week — and it isn’t feasible for the mathematical modelers to keep up with them by building a different model for each system.

“Even if we could build these models faster, not every experiment is going to be informative or worth doing,” she points out. “But the faster we can build models, the more we can help experimentalists design and control their experiments to also be more efficient and waste fewer resources exploring things that aren’t going to work.”

This is where Mangan’s data-driven methods can accelerate the modeling process to keep pace with experimentation. “We use tons of data to train the models, marrying traditional physical and biological modeling with machine learning/statistical learning strategies,” she says. The approach lets her group take what they already know about systems and combine it “with high-throughput data for an even faster understanding of what’s important in the system so the experimentalists can make good choices. It’s an iterative cycle.”

Though thrilled about the new computational power available and machine learning, optimization and other burgeoning computational methods, Mangan thinks power resides in using what biologists and materials scientists have already learned. As researchers develop a mathematical model and begin to match experimental results, they discover which variables and interactions between variables are promising for experimentalists to study and which ones can be bypassed as less relevant.

She wants to start from all of a system’s possible mechanisms and use machine learning and other computational methods to select the ones that are important for a specific system or experiment.

“This allows us to build models quickly so we can use the power of analyzing these models for understanding which terms/physical interactions are most important, prediction, and error control — all things we and the experimentalists would like to understand,” Mangan says. “Models allow us to interrogate states and behaviors of the system that may be more difficult to directly measure but are still constrained by these measurements.”

Her goal is to automate discovery in a way that reduces the time spent on the somewhat redundant tasks in modeling and frees more time for thinking about what’s going on within a system, so researchers can solve more interesting problems. She wants “to understand what’s going on within these systems and these methods that can help us do that.”

One fun system to which Mangan’s group has already applied one of its models involves hibernation. Cody Fitzgerald, a postdoc, found data tracking the body temperature of Arctic ground squirrels while they hibernate. He learned that their internal body temperatures drop very low (down to minus-4 degrees centigrade) during hibernation but periodically warm back up to normal for a short time before cooling again.

“We took one of our data-driven models and, after messing around with some data to understand what we were working with, got a bunch of models,” Mangan says.

One of them yielded almost the same behavior as the data but also contained features that resembled as-yet-unmodeled biological hypotheses in the literature. The model suggests that something’s going on with squirrel metabolism, that some chemical decays over time at very low temperatures. When a squirrel gets cold enough, it gets a cue to wake up because it has run out of the molecule and needs to warm up to survive.
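The mechanism described above, a slow decay in the cold followed by a cue to rewarm, can be sketched as a toy simulation. This is not the model Mangan's group found; the article gives no equations, so the two-state structure, the function name `simulate_bouts` and every parameter value here are hypothetical, chosen only to make the cycle visible.

```python
def simulate_bouts(hours=2000.0, dt=0.1, decay=0.01, refill=0.5,
                   low=0.2, high=1.0):
    """Return the times (in hours) at which the toy animal starts to rewarm.

    All parameters are hypothetical and have no biological calibration.
    """
    m = high            # level of the hypothetical molecule
    torpid = True       # True: cold and torpid; False: warm and aroused
    arousal_times = []
    for step in range(int(hours / dt)):
        t = step * dt
        if torpid:
            m -= decay * m * dt       # slow first-order decay in the cold
            if m <= low:              # supply nearly exhausted: cue to rewarm
                torpid = False
                arousal_times.append(t)
        else:
            m += refill * dt          # fast replenishment while warm
            if m >= high:             # restocked: cool down again
                m = high
                torpid = True
    return arousal_times

arousal_times = simulate_bouts()
gaps = [b - a for a, b in zip(arousal_times, arousal_times[1:])]
print(len(arousal_times), "arousals")
print("hours between arousals:", [round(g, 1) for g in gaps[:3]])
```

Even this crude sketch reproduces the qualitative pattern in the data: long cold stretches interrupted by brief, regularly spaced warm periods. Whether a real molecule plays this role is exactly the open question the article describes.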

The model her group found from the data-driven approach captured exactly this type of behavior. “It’s a very nice, simple model,” Mangan says. “We don’t know if it’s right but are exploring possible models that came out of the algorithm. I’m excited the method was able to produce a model for a system that hadn’t previously been modeled in that way — but it does potentially represent a biological mechanism of what could be happening within that system.”