A typical global climate model does about a thousand trillion calculations to simulate a single year’s worth of interactions among sun, air, earth and sea. Huge numbers of computations also are required to model a variety of dynamic systems, from the sluggish movements in Earth’s interior to the expansion of the universe.

These simulations need the power of high-performance computing (HPC) systems, but it can be tricky to use a large machine effectively. The work must be divided into chunks so different processor cores or nodes (multiple cores) can do separate calculations simultaneously. That’s the essence of parallel computing.

Dividing labor among nodes is no simple matter. Tasks must be allocated to make each group of cores as independent as possible. The challenge is to minimize the number of times in which any stream of calculations depends on data from another stream. Each dependency could make one sequence stop and wait for data from another – so the less data dependency between parallel sequences, the better.

At the same time, tasks must be split into groups of roughly equal size to ensure that each sequence of parallel calculations takes about the same amount of time. Otherwise, processors that complete their tasks early must wait for others to finish, wasting time and resources.

Researchers in the 1990s worked hard to improve how computers parceled out these tasks. Then a team from Sandia National Laboratories in New Mexico unveiled a new approach at the Supercomputing 95 industry conference that year in San Diego. The paper by Bruce Hendrickson and Robert Leland, “A Multilevel Algorithm for Partitioning Graphs,” showed how to break computational problems into roughly equal pieces, with a minimal number of data dependencies between chunks. The algorithm worked better and faster than any other approach and ultimately helped expand the use of graph algorithms in scientific computing from small, niche applications to the mainstream.

‘The goal of every scientist is to make a lasting contribution.’

Hendrickson and Leland implemented the method in their Chaco partitioning software and made it available to the computing community. Nearly 20 years later, the code still is downloaded about 250 times per year. Although many researchers have made algorithmic tweaks and implemented competing versions, the basic ideas Hendrickson and Leland pioneered still are at the heart of most partitioning tools for parallel computing.

In honor of this contribution, Hendrickson and Leland will receive the second Test of Time award at SC14, this year’s supercomputing conference, November 16-21 in New Orleans. The award was established to recognize an outstanding paper from a past conference “that has deeply influenced the HPC discipline,” say the meeting’s sponsors, the Computer Society of the Institute of Electrical and Electronics Engineers (IEEE) and the Association for Computing Machinery.

“The goal of every scientist is to make a lasting contribution to your field. It’s humbling to be told by your community that your work has stood the test of time,” says Hendrickson, senior manager for extreme scale computing at Sandia and affiliated professor of computer science at the University of New Mexico.

“What’s really nice about the award is it creates a natural opportunity to express gratitude to our colleagues,” says Leland, Sandia’s director of computing research. He singles out Hendrickson but “more generally to the others we worked with in that era and also to our laboratory and our sponsors for nurturing an environment in which this work could flourish.” Hendrickson and Leland particularly appreciate the support of DOE’s Applied Mathematics Research Program.

The pair credits computational scientist Horst Simon, now deputy director of Lawrence Berkeley National Laboratory, for opening the path to their algorithm in 1991. Simon’s important insight was that a computer problem could be represented as a graph, a map-like mathematical tool in which key entities such as cities or individuals are represented by vertices and connections between the vertices, such as highways or familial relationships, are depicted as lines or edges.

“Horst published a paper in the early ’90s arguing that (graph theory) was also a useful way of thinking about parallel computing,” Hendrickson says. “For many scientific computing problems, you can view little bits of the computation as vertices in a graph and data dependencies between the computations as edges.”

Once researchers represented a program in this form, a number of established mathematical tools could be used to break it into pieces for parallel computing. “Horst realized that this (problem) can be viewed as cutting a graph into pieces in such a way that each piece has about the same number of vertices, yet the number of edges between the pieces was minimized,” Hendrickson says.

Researchers tried various methods to slice up graphs. One approach cut the graph perpendicularly at the halfway point of its longest axis, handily splitting the vertices into two roughly equal parts. If the cut were made through a narrow region of the graph, a relatively small number of edges would cross between the parallel pieces.

A more effective approach depended less on geometry and more on sophisticated mathematical manipulations. This method, developed by Simon and collaborator Alex Pothen, called for the numerical heavy lifting of determining eigenvectors – mathematical tools that can help identify where to break graphs apart. Simon worked with another collaborator, Stephen Barnard, on multilevel numerical methods to accelerate computing these eigenvectors.

At Sandia, Hendrickson and Leland saw a way to put a new twist on this work. They looked at using these multilevel concepts directly on the partitioning problem, rather than for computing numerical values. Hendrickson says they developed “a simple and elegant approach that we implemented in our graph partitioning toolkit called Chaco.”

A key idea in the algorithm is to merge two adjacent vertices. Doing this many times results in a smaller graph that retains much of the original’s large-scale structure. As the algorithm repeats this process at multiple levels, the overall graph shrinks and the major sections become more self-evident and ready for partitioning.

“The approach was significantly faster than eigenvector-based partitioning, and generally produced better solutions,” Hendrickson says. “It has continued to be the approach of choice ever since, tweaked and improved by many others.”

Leland notes that graph partitioning now is used to solve a number of different problems, many of them unrelated to computational science. “It’s a very general framework for solving a wide variety of problems in which the relationships between entities are important and there is a need to identify regions or communities that interact internally more strongly than externally,” Leland says. “The multilevel approach gives a natural way to analyze this connectivity with more or less resolution.”

In fact, graph partitioning is such a general framework that two other research teams independently arrived at the same basic strategy at about the same time. The teams, however, weren’t working in parallel computing. One wanted to solve linear systems of equations. The other was interested in the placement of complex circuits.

“It seems the time was right for this discovery,” Hendrickson says.