Computer Science, New Faces
September 2016

Have stencil, will lift

A University of Washington DOE early-career-awardee, with techniques called verified lifting and stencil programming, aims to boost speeds in climate models and other HPC simulations.

Stencil photo courtesy of WikiMedia Commons.

When he became a University of Washington assistant professor in 2015, Alvin Cheung vowed in a summary of his CV to develop innovative programming languages and techniques to help programmers be more productive while working with “large-scale, data-intensive applications” that have a “real-world impact.”

The newly minted software systems engineer is getting Department of Energy support for those research goals. Cheung recently received a $750,000, five-year DOE Early Career Award to develop ideas he hopes will free researchers from the burdens of rewriting programs to boost high-performance computing speeds in areas like physics and climate simulation.

“I think we are addressing a critical need,” he says. “Scientists don’t want to spend the time doing that. Their focus should be on the actual science, not on writing code.”

Cheung proposes using a program called verified lifting as a bridge to upgrade code initially implemented in the time-tested Fortran general purpose programming language (GPL). With verified lifting, code written using a GPL such as Fortran would automatically be transformed into code written in a newer and more restrictive domain specific language (DSL). The resulting application would perform better than the original Fortran implementation.

In an early demonstration Cheung describes in a new paper, portions of stencil programs – coding patterns of loops iterating over multidimensional data arrays – were automatically rewritten into a DSL called Halide. The resulting application executed up to 24 times faster.

Alvin Cheung and the STNG compiler, which bridges gaps in programming languages. Photo courtesy of the University of Washington (UW); schematic of UW and Massachusetts Institute of Technology.

Alvin Cheung and the STNG compiler, which bridges gaps in programming languages. (Click to enlarge.) Photo courtesy of the University of Washington (UW); schematic of UW, Adobe USA and Massachusetts Institute of Technology.

Stencil programs themselves are especially suitable for such high-end research codes as physical simulations, image processing and machine learning, all of which contain loops over multidimensional arrays representing, for instance, images or time-series spatial data.
When implemented using Fortran, a popular high-performance scientific computing language, stencils often contain multiple loops embedded within each other. Software engineers like Cheung can exploit these embedded loops to translate into DSL equivalents.

The process is laid out in the paper, “Verified Lifting of Stencil Computations,” delivered by Shoaib Kamil of Adobe Research USA in June at the Association for Computing Machinery Programming Language Design and Implementation Conference in Santa Barbara, California. Cheung and Shachar Itzhaky and Armando Solar-Lezama, both of the Massachusetts Institute of Technology, are coauthors.

There may be hundreds of domain-specific languages of potential interest to DOE, Cheung estimates. Each is specialized to optimize a limited number of applications – hence the name. But DSLs lack the flexibility of GPLs like Fortran.

“Modern DSLs pride themselves on providing efficient performance in their implementations,” he says. “If written using such DSLs, your programs will run fast.” Ordinarily, however, “this only works if you are willing to rewrite your existing programs using DSLs. And doing so is often a big pain.” Moreover, “choosing the wrong DSL can have a huge negative impact on performance. And making the choice is, unfortunately, difficult. In many cases, you actually don’t know until you try.”

So researchers who decide on a DSL upgrade often are left with two options, he says. The first is to hire specialists to rewrite their original, complex GPL code into the DSL equivalent – a three-step process.

Translations may fail if the compiler encounters Fortran-DSL incompatibilities.

First, “someone has to do the hard work of understanding what is going on in the original program.” The next step is “trying to search out whether this DSL, a completely different language, can be used to express the computations that were in the original program, and (guess) whether the rewritten program” will perform better.”

The third step is rewriting the program from scratch. Manually rewriting can easily take years, Cheung estimates, an effort much like Microsoft’s merging of its legacy Windows 98 and Windows 2000 programs into Windows XP.

To cut that time and expense, researchers could instead opt for automation with an off-the-shelf compiler that can “take in the source language and spit out – hopefully – the same program (semantically speaking) in the target language,” he says. Such a compiler “tries to translate it sentence by sentence, so to speak.” But such translations may fail if the compiler encounters Fortran-DSL incompatibilities, he adds, and compilers may not always exist for any arbitrary source-target language pairs.

In the paper, Cheung and his colleagues use verified lifting to avoid either scenario. A prototype compiler-synthesizer called STNG looks back and forth to reconstruct key stencil passages into equivalent Halide versions, first by identifying compatible loops in the original Fortran stencils.

The STNG compiler uses a technique called program synthesis that guides the search for, or synthesizes, “summaries” of loops that can work in both languages. This dynamic search for what Cheung calls the best candidate rewrite uses techniques from program analysis, program synthesis and heuristics (rules of thumb).

“Finally, we use a theorem prover to verify that the candidate rewrite the synthesizer found is indeed correct,” he says. STNG has been released under joint development with Adobe researchers.

Cheung is seeking potential collaborators at DOE national laboratories and at research universities like MIT, the University of California, Berkeley, and Arizona State to test the use of verified lifting software on programs that run on supercomputers. “The applications that we want to test could even run at the exascale level,” about a thousand times faster than today’s best machines, he predicts.

Cheung grew up in an immigrant family in San Francisco and was the first to attend college, earning separate undergraduate degrees in electrical engineering, computer engineering and music at Stanford University. He graduated in the top 10 percent of his class for each subject.

“I knew I was going to be either an engineer or a concert pianist,” he recalls. His perpetual interest in dismantling things as a kid and the lure of technology from Silicon Valley won out, as he did a quick Stanford master’s in electrical engineering, specializing in software systems.

Uncertain about whether to pursue further higher education, he spent the next two years as a staff software engineer at IBM’s Almaden Research Center “just to experience how it’s like working in a research environment.”

He finally decided on a Ph.D. program at MIT, where he won the George M. Sprowls Award for the best doctoral thesis in computer science for 2015.

His doctoral thesis described converting a legacy Java program into another database-querying language called SQL. “The origin of verified lifting stems from my thesis,” he says, “though at that point we didn’t call it that and didn’t realize its potential.”