Computer Science
April 2007

Sniffing out bad code

A program designed to find and fix bad computer code is now finding malicious programs and helping maintain software on some of the world’s biggest computers.

When Bart Miller became interested in code debugging, he had no idea it would lead him into the world of cyber sleuthing, where programmers duel in a world reminiscent of MAD magazine’s “Spy vs. Spy.”

In the early 1990s, Miller and graduate student Jeff Hollingsworth set out to build tools that could analyze the operation of programs running on high-performance computing systems. The key to their approach, called dynamic instrumentation or Dyninst, was developing a way to attach to a running program without knowing anything about its source code, analyze it, and patch it without disturbing it, all while it is still running.

Miller says that when he first presented the idea, other computer scientists met it with severe skepticism.

“People thought these ideas were crazy when we proposed them,” says Miller, a professor of computer science at the University of Wisconsin-Madison.  “The idea that you could grab onto a binary program without knowing anything about it and safely trace its behavior to control it, to monitor it, to do whatever you wanted to do, that just seemed to be a futile exercise.”
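A loose analogy may help readers picture the idea. The sketch below is not Dyninst and does not touch binary code; it only shows, in Python, how monitoring can be attached to a program that is already loaded, without editing its source, and later removed. The names `instrument`, `remove_instrumentation`, `checksum`, and `process` are hypothetical and chosen purely for illustration.

```python
import functools
import types


def instrument(program, name, counts):
    """Swap program.<name> for a wrapper that counts calls, then runs the original."""
    original = getattr(program, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        counts[name] = counts.get(name, 0) + 1   # the inserted "probe"
        return original(*args, **kwargs)         # behavior is otherwise unchanged

    setattr(program, name, wrapper)
    return original  # handle needed to undo the patch


def remove_instrumentation(program, name, original):
    """Restore the original function, leaving the program as it was found."""
    setattr(program, name, original)


# A stand-in "target program" (hypothetical); its source is never edited.
def checksum(data):
    return sum(data) % 256


def process(program, blocks):
    return [program.checksum(block) for block in blocks]


target = types.SimpleNamespace(checksum=checksum, process=process)

call_counts = {}
saved = instrument(target, "checksum", call_counts)
results = target.process(target, [[1, 2, 3], [250, 10]])
remove_instrumentation(target, "checksum", saved)

print(results)      # [6, 4]
print(call_counts)  # {'checksum': 2}
```

The program's results are the same with or without the probe; only the observer learns something new. Doing the equivalent on a compiled binary, with no source code at all, is the hard part the researchers describe.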

Miller says that without strong support from the Department of Energy’s Office of Advanced Scientific Computing Research, he would not have been able to pull the project together. But by 1995 the group had a working program they named Paradyn. Demand for Paradyn became so strong that the researchers pulled out the portion of the code that carried out the performance audits and assembled a publicly accessible library of Dyninst code that anyone could use to build program analysis and instrumentation tools. Miller says users worldwide have built tools for performance analysis, debugging, security auditing, and software engineering.

After those initial successes, Miller and Hollingsworth, now a professor of computer science at the University of Maryland and co-director of the Paradyn project, were sought after for their expertise in tracing code that had “gone bad.” Such code could be defective at the programming level, but it also could be infected with malicious commands. Combating malicious code, or malware, has become a full-time profession for many programmers. Miller consults with groups working in security research and cyber forensics.

“Cyber forensics, like any form of forensics, is saying, ‘What happened and how did it happen?’ and ‘Who done it?’” Miller says. “If you get infected with a piece of malicious code and you catch it, your big question is ‘What did it do to me?’ You need to be able to go in and do a fairly detailed analysis. You may need to let it run again, but in a carefully controlled way and watching what it does, blocking it if it tries to do something dangerous and studying it under these controlled conditions.”

Dyninst tools make it possible to burrow down into the binary code of a program and report back on what it’s doing. Such tools are invaluable to cyber-sleuths trying to stay on top of malicious code builders.

“The virus writers don’t want you to know what their code is doing,” Miller says. “They try to build code that is tamper-proof, so the code tries to detect when it’s being monitored and it shuts itself down. It really is kind of like a ‘Spy vs. Spy’ game. They build a defense, and we try to take it apart.”

One area of active research is analyzing code that has been stripped of its auxiliary information. These data structures, called symbol tables, contain the kinds of information that help program analysts understand how programs are put together.
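To see what is lost when a symbol table is removed, consider a minimal sketch. It assumes a toy symbol table that is nothing more than a sorted list of function start addresses and names, and it assumes an address belongs to the nearest function that starts at or before it (real symbol tables carry more detail, such as sizes and types). The function `describe_address` and the addresses and names below are hypothetical.

```python
import bisect


def describe_address(address, symbols):
    """Label an address using a toy symbol table.

    symbols: list of (start_address, name), sorted by start_address.
    An empty list models a stripped binary.
    """
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, address) - 1
    if i < 0:
        # No symbol covers this address: all the analyst has is a number.
        return f"{address:#x}"
    start, name = symbols[i]
    return f"{name}+{address - start:#x}"


# Hypothetical symbol table for an illustrative program.
symbols = [(0x1000, "main"), (0x1040, "read_config"), (0x10A0, "send_report")]

print(describe_address(0x1058, symbols))  # read_config+0x18
print(describe_address(0x1058, []))       # 0x1058
```

With the table, an analyst sees that the program is partway through something called `read_config`; without it, there is only a bare address, and the function boundaries themselves have to be rediscovered from the machine code.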

“A lot of proprietary codes, and certainly a lot of malicious codes, have been stripped of their symbol tables,” Miller says. “Analyzing a program that has been stripped of its symbol tables is quite complex.  But in the last year, we’ve built up Dyninst’s ability to analyze these stripped binary codes.  It has given us the ability to work with malicious code much more effectively.”

At the same time, Dyninst has become a key component of high-performance computing systems. In a collaboration with DOE’s Los Alamos National Laboratory in New Mexico, Miller helped scale up the Dyninst library to help maintain large-scale physics codes.

“There were just no debugging programs that would run on these very large-scale codes,” Miller says.  “When we first tried to run Dyninst on them it just died in all sorts of horrible and glorious ways.  But through several months of effort we were able to extend Dyninst’s ability to scale up.  And now there are several tools built by Los Alamos scientists to help maintain their physics codes.”

It’s easy to underestimate how difficult it is to build these kinds of programs, but Miller says the combination of his crack research team and collaborations with DOE laboratories has allowed the group to make progress quickly.

“Part of it is clever design, part of it is experience, but mostly I have a really good staff,” he says.