A Purdue University assistant professor of computer science leads a group effort to find new and better ways to generate high-performance computing codes that run efficiently on as many different kinds of supercomputer architectures as possible.
That’s the challenging goal Tiark Rompf has set for himself with his recent Department of Energy Early Career Research Program award – to develop what he calls “program generators” for exascale architectures and beyond.
“Programming supercomputers is hard,” Rompf says. Coders typically write software in so-called general-purpose languages. The languages are low-level, meaning “specialized to a given machine architecture. So when a machine is upgraded or replaced, one has to rewrite most of the software.”
As an alternative to this rewriting, which involves tediously translating low-level code from one supercomputer platform into another, programmers would prefer to use high-level languages “written in a way that feels natural” to them, Rompf says, and “closer to the way a programmer thinks about the computation.”
But high-level and low-level languages are far apart, with a steel wall of differences between the ways the two types of languages are written, interpreted and executed. In particular, high-level languages rarely perform as well as desired. Executing them requires special so-called smart compilers that must use highly specialized analysis to figure out what the program “really means and how to match it to machine instructions.”
Rompf and his group propose avoiding that reliance on smart compilers with an approach called generative programming, which he has worked on since before he received his 2012 Ph.D. from École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland. The idea is to create special programs structured so they’re able to make additional programs where needed.
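To make the idea of a program that writes other programs concrete, here is a minimal sketch in Python. It is not Rompf's system or code from his group; the function names `generate_power` and `compile_generated` are hypothetical and chosen only for illustration. The generator takes an exponent known ahead of time and emits the source of a specialized function with the multiplication loop already unrolled.

```python
def generate_power(n: int) -> str:
    """Return Python source for a function that computes x**n without a loop.

    Hypothetical helper: the exponent is known at generation time, so the
    repeated multiplication is unrolled into a single expression.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    body = " * ".join(["x"] * n) if n > 0 else "1"
    return f"def power_{n}(x):\n    return {body}\n"


def compile_generated(source: str, name: str):
    """Run generated source in a fresh namespace and return the named function."""
    namespace = {}
    exec(source, namespace)
    return namespace[name]


source = generate_power(4)
power_4 = compile_generated(source, "power_4")

print(source)      # the generated, specialized program
print(power_4(3))  # 81
```

The generated function does no work at run time to decide how many multiplications to perform; that decision was made once, when the code was produced. Real program generators apply the same principle to far larger computations and emit low-level code rather than Python.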
In a 2015 paper, Rompf and research colleagues at EPFL, Stanford University and ETH Zurich also called for a radical reassessment of high-level languages. “We really need to think about how to design programming languages and (software) libraries that embrace this generative programming idea,” he adds.
Program generators “are attractive because they can automate the process of producing very efficient code,” he says. But building them “has also been very hard, and therefore only a few exist today. We’re planning to build the necessary infrastructure to make it an order of magnitude easier.”
As he noted in his early-career award proposal, building program generators is extremely difficult for more reasons than just programmer-computer disharmony. Other obstacles include compiler limitations, differing capabilities of supercomputer processors, the changing ways data are stored and the ways software libraries are accessed. Rompf plans to use his five-year, $750,000 award to evaluate generative programming as a way around some of those roadblocks.
One idea, for instance, is to identify and create an extensible stack of intermediate languages that could serve as transitional steps when high-level codes must be translated into machine code. These also are described as “domain-specific languages” or DSLs, as they encode more knowledge about the application subject than general-purpose languages.
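A toy illustration of translating through intermediate steps, again in Python and under stated assumptions: the three-node vector DSL (`Vec`, `Scale`, `Add`) and the two lowering functions are invented for this sketch and are not part of any system described here. A high-level expression over whole vectors is first lowered to a per-element scalar expression, then to a single loop. Python source stands in for the machine code a real stack would eventually emit.

```python
from dataclasses import dataclass
from typing import Union


# Level 1: a tiny, hypothetical DSL over whole vectors.
@dataclass(frozen=True)
class Vec:
    name: str


@dataclass(frozen=True)
class Scale:
    factor: float
    arg: "Expr"


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Vec, Scale, Add]


def lower_to_scalar(e: Expr) -> str:
    """Level 1 -> level 2: one scalar expression per element index i."""
    if isinstance(e, Vec):
        return f"{e.name}[i]"
    if isinstance(e, Scale):
        return f"({e.factor!r} * {lower_to_scalar(e.arg)})"
    if isinstance(e, Add):
        return f"({lower_to_scalar(e.left)} + {lower_to_scalar(e.right)})"
    raise TypeError(f"unknown node: {e!r}")


def inputs(e: Expr) -> list:
    """Collect input vector names in order of first appearance."""
    if isinstance(e, Vec):
        return [e.name]
    if isinstance(e, Scale):
        return inputs(e.arg)
    names = inputs(e.left)
    for n in inputs(e.right):
        if n not in names:
            names.append(n)
    return names


def lower_to_loop(e: Expr, fn_name: str = "kernel") -> str:
    """Level 2 -> level 3: a single fused loop, emitted as Python source."""
    args = inputs(e)
    return (
        f"def {fn_name}({', '.join(args)}):\n"
        f"    out = [0.0] * len({args[0]})\n"
        f"    for i in range(len(out)):\n"
        f"        out[i] = {lower_to_scalar(e)}\n"
        f"    return out\n"
    )


program = Add(Scale(2.0, Vec("a")), Vec("b"))  # means: 2.0 * a + b
source = lower_to_loop(program)
namespace = {}
exec(source, namespace)
kernel = namespace["kernel"]

print(source)
print(kernel([1.0, 2.0], [10.0, 20.0]))  # [12.0, 24.0]
```

Because the DSL knows that `Scale` and `Add` act element by element, the lowering can fuse them into one loop with no temporary vector. That is the kind of subject-specific knowledge a general-purpose compiler would have to rediscover through analysis.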
Eventually, programmers hope to entirely phase out legacy languages such as C and Fortran, substituting only high-level languages and DSLs. Rompf points out that legacy codes can be decades older than the processors they run on, and some have been heavily adapted to run on new generations of machines, an investment that can make legacy codes difficult to jettison.
Generative programming was the basis for Rompf’s doctoral research. It was described as an approach called Lightweight Modular Staging, or LMS, in a 2010 paper he wrote with his EPFL Ph.D. advisor, Martin Odersky. That’s “a software platform that provides capabilities for other programmers to develop software in a generative style,” Rompf says.
LMS also underpins Delite, a software framework Rompf later developed in collaboration with a Stanford University group to build DSLs targeting parallel processing in supercomputer architectures – “very important for the work I’m planning to do,” he says.
While working at Oracle Labs between 2012 and 2014, Rompf started Project Lancet to integrate generative approaches into a virtual machine for high-level languages. A virtual machine is software that lets a real computer run selected programs. In the case of Lancet, the software executes high-level languages and then selectively compiles them to machine code.
Born and raised in Germany, Rompf joined Purdue in the fall of 2014. It’s “a great environment for doing this kind of research,” he says. “We have lots of good students in compilers, high-performance and databases. We’ve been hiring many new assistant professors. There are lots of young people who all want to accomplish things.”
He calls his DOE Early Career award a great honor. “I think there are many opportunities for future work in getting more of the DOE community in the interaction.” Although he is the project’s only principal investigator, he is collaborating with other groups at Purdue, ETH Zurich and Stanford and has received recent and related National Science Foundation research grants.
As a busy assistant professor, he has six graduate students on track to get their doctorates, plus a varying number of undergraduate assistants. Rompf also is a member of the Purdue Research on Programming Languages group (PurPL), with 10 faculty members and their students.
“It’s a very vibrant group, which like the Purdue computer science department has been growing a lot in recent years,” he says.
Now in its eighth year, the DOE Office of Science’s Early Career Research Program for researchers in universities and DOE national laboratories supports the development of individual research programs of outstanding scientists early in their careers and stimulates research careers in the disciplines supported by the Office of Science. The Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.