Profiles in Computing
April 2007

Message passing passage

The big computers of today might not have been possible if some dedicated experts hadn’t gathered at an ‘unappetizing’ hotel in the 1990s.

The high-performance computers that enable today’s cutting-edge science would be radically different – and less efficient – if some dedicated experts hadn’t sequestered themselves in an isolated Dallas hotel in the early 1990s.

Those computer researchers, manufacturers and users created the Message Passing Interface (MPI) standard.  Today, MPI is used on virtually every parallel processing computer in the world, including BlueGene/L, the world’s fastest.  And it’s still spreading: About 1,000 times a month, users download MPICH, an MPI implementation created at Argonne National Laboratory, a Department of Energy facility near Chicago.  And MPICH is just one of several MPI implementations users can choose from.

MPI is dominant because it works with many computer languages and designs.  With MPI, “If you have an idea for a machine, you can build it and know what you’re going to run on it,” says Bill Gropp, an Argonne senior computer scientist who helped develop MPI and MPICH.  MPI is vital to today’s high-performance computers, Gropp says.  Without it, “We would have had large, scalable machines, but … I’m not sure we would have gotten as far.  I’m not sure we would have something like BlueGene.”

MPI is key to parallel processing, the technology that makes today’s high-performance computers possible.  Parallel processing breaks up a task so multiple processors can work on it simultaneously.  MPI exchanges data among the hundreds or thousands of processes running on those processors.
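
To make the idea concrete, here is a minimal sketch of MPI-style message passing.  It is not drawn from the article; it simply shows one process sending a single integer to another, and assumes an MPI compiler wrapper such as mpicc and a launcher such as mpiexec are available.

  /* Minimal illustrative sketch (not from the article): process 0 sends
     a number to process 1.  Build with mpicc, run with: mpiexec -n 2 */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, value;

      MPI_Init(&argc, &argv);                /* start the MPI runtime */
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* which process am I?   */

      if (rank == 0) {
          value = 42;
          /* send one int to the process with rank 1, message tag 0 */
          MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
      } else if (rank == 1) {
          /* receive one int from the process with rank 0 */
          MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                   MPI_STATUS_IGNORE);
          printf("rank 1 received %d from rank 0\n", value);
      }

      MPI_Finalize();                        /* shut down cleanly */
      return 0;
  }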

It became clear in the early days of parallel processing – the 1980s and early 1990s – that a message-passing standard was needed.  Back then, every company had its own message-passing software, and a scientist had to design applications specifically for it.  That meant the application would run only on that kind of computer, “so if someone came out with a better machine you were trapped,” Gropp says.

‘There was really nothing to do (but) work on what we were supposed to be doing.’

Programmers could use a portability layer that “translated” code for the vendor’s system.  But matching the layer’s semantics with the vendor’s software was tricky and time-consuming.  “You either had to have fuzzy semantics, which would cause obscure problems in the code, or precise semantics,” which made the program slow down, Gropp says.  He helped develop Chameleon, Argonne’s portability layer; the CH in MPICH stands for Chameleon.

[Figure: MPICH downloads over time, March 1994 through December 2006.]

Eventually, even computer makers saw the need for a common standard, and in April 1992 the late Ken Kennedy, a prominent Rice University computer researcher, organized a meeting to tackle it.  “He was the one who sort of kicked the anthill,” Gropp says.

Later that year an early MPI draft was put forward by Jack Dongarra and David Walker from DOE’s Oak Ridge National Laboratory; Tony Hey, then at Great Britain’s University of Southampton; and Rolf Hempel, then of GMD, a German research center for mathematics and computing.  That early draft was lacking, Gropp says, but it got others interested.  In November 1992, the scientists who used and researched parallel computers joined with hardware firms to found the MPI Forum, dedicated to developing a universal message-passing standard.

For most of 1993 forum members gathered every six weeks “at the same unappetizing hotel” in northern Dallas, Gropp says. The location was so remote, “There was really nothing to do (but) work on what we were supposed to be doing.” Between those meetings and e-mail exchanges, the group created a draft and presented it at the Supercomputing ’93 conference that November.  A final version came out the next May.

Every segment of the computer community – researchers, users and vendors – had a stake in MPI’s success.  Nonetheless, “Even when we were done we didn’t know if it would take off,” Gropp says.  After all, other standardization projects had failed to gain wide acceptance.

Gropp and his fellow Argonne researchers had a big role in ensuring MPI avoided that fate.  Early in the process Gropp committed his research group to produce a “rolling implementation” of the standard – coding up each draft as it evolved so it could be tested in real programs – “just to try to make sure … we didn’t make some sort of boneheaded mistake,” he says.

Gropp and his fellow Argonne researchers later realized the test could be an enduring contribution.  They released it as MPICH, a full MPI implementation, soon after MPI came out.  Having a full implementation meant programmers could use MPI almost immediately, so the standard didn’t just sit on a shelf.

Gropp and his fellow Argonne researchers developed MPICH as a side project to their research on parallel processing tools, which was supported by the U.S. Department of Energy’s Office of Advanced Scientific Computing Research (ASCR).  Although the ASCR project didn’t specifically mention MPI, “We had the freedom to do the right thing” and work on it, Gropp says.  If ASCR had required a formal proposal for it, “MPI would have died,” he adds.  “To have some flexibility there to pursue this opportunity in the (computing) community was … a tremendous benefit.”

Users embraced MPICH because it delivers good performance while demanding few of the computer’s resources.  The MPI standard also is attractive because:

  • It’s portable – it works with many different computer languages.
  • It’s fast – each implementation is optimized to run efficiently.
  • It supports libraries of subroutines programmers can use “off the shelf” (a brief sketch follows this list).
  • There are multiple open source implementations – users can try it at no cost.
  • There is a user community to teach it.
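
As a rough illustration of that “off the shelf” point – again, not taken from the article – the sketch below uses a single library routine, MPI_Reduce, to sum one number from every process onto process 0, with no hand-written communication code.

  /* Illustrative sketch: MPI_Reduce sums each process's rank onto rank 0.
     Build with mpicc, run with: mpiexec -n 4 (or any process count). */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, size, total;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      /* every process contributes its own rank; rank 0 receives the sum */
      MPI_Reduce(&rank, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

      if (rank == 0)
          printf("sum of ranks 0..%d is %d\n", size - 1, total);

      MPI_Finalize();
      return 0;
  }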

MPI also got a boost soon after its release when the Air Force specified it in a call for weather-modeling software.  That was “a big, welcome surprise,” Gropp says.

Research to improve MPI continues, much of it with support from ASCR.  The office backs work on multiple implementations, including MPICH and Open MPI.

Having multiple implementations is good, Gropp says: “It pushes us to do something better.  The users all win from that.” Research has led to new releases, the MPI-2 standard and the MPICH2 implementation, which added features such as parallel I/O and one-sided communication, and “There continues to be more to do to make sure MPI is effective on the next generation of hardware,” Gropp says.

MPI is likely to continue as the dominant message-passing library for parallel programming.  Gropp says he’s been told MPI even runs on satellites.

“I doubt that MPI is used in your bank ATM – although it could be – but in technical computing, it’s everywhere,” he adds.