Profiles in Computing
August 2009

Conquering unfriendly skies

Through difficulty, Berkeley Lab’s Cecilia Aragon soars to the stars.

Cecilia Aragon is giving wing to a new way of doing science. It’s not simply an improved tool or technique. Rather, Aragon aims to launch our cognitive abilities into unexplored territory by combining the number-crunching power of computing with the uniquely human talent for pattern recognition.

Cecilia Aragon

That Aragon is a brilliant, creative pioneer in computer science is now so widely recognized as to seem almost routine. A staff scientist at Lawrence Berkeley National Laboratory (LBNL), she most recently received a 2009 Presidential Early Career Award for Scientists and Engineers. It is not the first time – nor likely will it be the last – that Aragon has been celebrated as a leader in her field.

Separate from her research exploits, she also is considered one of the nation’s top aerobatic pilots – an expert in torque rolls, tailslides, high negative “G” loops and knife-edge flight.

But Aragon’s scientific and personal achievements are even more remarkable considering the obstacles she overcame to get where she is today.

“As a child, I was almost cripplingly shy and afraid of everything,” Aragon says. Growing up the child of Hispanic immigrants in a mostly white Indiana community, she regularly experienced the kind of discrimination and social isolation that tends to push, and keep, people down.

Aragon remembers local merchants refusing her family service and seeing a home taken off the market when her parents made an offer to buy. Her school labeled her “slow,” and most of the instruction she got from adults around her was focused on telling her what she couldn’t do or even aspire to.

“I was afraid to do anything that would draw any attention to me,” Aragon says. The one thing she could do, without much notice, was dream about dancing and flying. She imagined levitating and soaring into the sky, moving in graceful, glorious arcs and loops amid the clouds, free from the constraints of mediocrity imposed upon her on the ground.

These dreams eventually manifested themselves as a love for mathematics and computation, another way in which mere mortals could – through another kind of abstract expression – escape the stultifying limitations of corporeal existence.

“I just loved the beauty and patterns of mathematics,” says Aragon, adding that at the same time she began to view computers as a means for extending her thoughts and imagination. “For me, using a computer is just thrilling. It’s an art form, like dancing, that allows us to greatly augment our brain’s abilities.”

This is what Aragon wants to do now for the scientific community. Just as she was once held back by the bias and ignorance of those around her, as well as her own fears, many researchers today are held back by data overload.

“It used to be much more time-consuming to collect rather than to analyze scientific data,” Aragon says. That’s not the problem today, she says. Due to the exponential growth in computing power over the past few decades, scientists in many fields are awash in a “data tsunami” that has overwhelmed their ability to fully make use of it.

As a member of LBNL’s Computational Research Division, Aragon is working on enhancing the “scientist-computer interaction” – a subfield of the broader arena known as human-computer interaction. Her main goal is maximizing the discovery potential for large-scale research collaborations.

This architecture diagram depicts Sunfall’s four main components: Search, which processes starfield images and reduces false-positive supernova candidates; Workflow Status Monitor, a Web-based program to facilitate collaboration and improve researchers’ awareness of data flow; Data Forklift, which coordinates and automates transfers of astronomical data; and Supernova Warehouse, a data management, workflow visualization and collaborative scientific analysis tool that centralizes data from multiple sources.

One example is Sunfall, an information visualization system Aragon’s team created to assist astronomers in identifying a rare and fleeting type of supernova that some believe could hold the key to understanding one of the central mysteries of the universe: dark energy.

A decade ago, teams of scientists created an international astrophysics project known as the Nearby Supernova Factory (SNfactory) to identify these unique stellar explosions, known as Type Ia supernovae. They are of special interest because of their extraordinarily uniform brightness, which allows scientists to use them as “standard candles” for measuring the expansion of the universe – and for indirectly observing the effect of dark energy on the expansion rate.

Unfortunately, they occur only rarely – a few times every millennium in each galaxy – and are detectable for only a few weeks to a couple of months. Standard astronomical observation generated only a handful of sightings each year. The SNfactory was designed to capture potential Type Ia supernovae by performing a wide-field analysis of the entire sky, collecting 50 to 80 gigabytes of images every night. The idea was to compare consecutive images daily to identify differences that could be supernovae.

“At the time, it was the largest data volume supernova search in existence,” Aragon says.

But these massive daily collections of night sky images taken from the ground were found to include a lot of visual “noise” from airplanes, cosmic rays and other kinds of interference. As a result, the SNfactory produced something like 500,000 “potential” stellar explosions for every true supernova discovery.
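The image-comparison idea, and the reason it throws off so many false alarms, can be illustrated with a toy sketch. This is not the SNfactory's search code; the function name `find_candidates`, the brightness threshold and the tiny pixel grids are all made up for illustration, and a real survey would need far more care (aligning the images and modeling noise, for instance).

```python
def find_candidates(reference, new_image, threshold):
    """Return (row, col) positions where new_image is brighter than
    reference by more than threshold.

    Both images are lists of rows of pixel brightness values with the
    same shape. Hypothetical helper for illustration only.
    """
    candidates = []
    for r, (ref_row, new_row) in enumerate(zip(reference, new_image)):
        for c, (ref_px, new_px) in enumerate(zip(ref_row, new_row)):
            # A pixel that brightened sharply between nights is a candidate.
            if new_px - ref_px > threshold:
                candidates.append((r, c))
    return candidates


# Made-up 3x3 "sky" from two nights: one pixel brightens sharply.
last_night = [[10, 10, 10],
              [10, 10, 10],
              [10, 10, 10]]
tonight = [[10, 11, 10],
           [10, 10, 90],
           [9, 10, 10]]

flagged = find_candidates(last_night, tonight, threshold=20)
```

A simple difference like this cannot tell a new star from a cosmic-ray hit or a passing airplane: anything that brightens a pixel gets flagged, which is why each candidate still needed further screening.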

This was better than the previous era of scarce detection, but it required a tremendous amount of tedious analysis as scientists sifted through the most promising images – whittled down to something like 1,000 per day – to separate the stellar wheat from the chaff.

“It was time-consuming, crude work but it was necessary because humans are still much better at pattern recognition than computers,” Aragon says.

In 2005, her team was brought in to help. The researchers began by applying machine-learning algorithms and refined statistical analyses. Their most critical idea, however, combined the computational tools with a novel, visually interactive system that boosted the power of human pattern recognition.

This software framework, dubbed Sunfall (for SUperNova Factory Assembly Line), was created in collaboration with a large group of astronomers, computer scientists and physicists. Now in operation for more than three years, it’s the first visual analytics system in production use for a major astrophysics project. Sunfall is building a stunning database of new Type Ia supernova observations and making it available to researchers worldwide.

“The idea is to allow the scientists to spend more of their time doing creative work rather than having to dig through the data,” Aragon says.

Sunfall does give scientists more control over their data, but one of Aragon’s former students said the real genius in the approach begins with her mentor’s emphasis on the personal side of the scientist-computer interaction.

“When she first came to the lab, the focus was just on solving technical problems,” says Sarah Poon, a software designer and systems engineer at LBNL. “Cecilia has been one of the lead evangelizers on the need to focus on usability, the social aspects of computing.”

Poon, who met Aragon in 2005 while a student at the University of California-Berkeley, said factoring in the social science side of the equation is even more important to success as scientific experiments become increasingly big and collaborative.

Aragon agrees. “To make the interface most effective, we need to better understand human visual perception, not to mention psychology and sociology,” she says. These aren’t routinely taught as part of computer science, she notes.

Dr. Marti Hearst, an information and computer scientist at UC Berkeley, became Aragon’s adviser when Aragon returned to the university in 2003 after nine years as a computer scientist at the NASA Ames Research Center. Hearst says her student’s focus then was on pilot training and visualization, a natural project since it combined her love for flying in reality and in cyberspace.

“My work is in trying to take abstract, complex information and present it visually,” Hearst says. “Cecilia wanted to take air currents we can’t see but can detect and make them visual to pilots to improve safety when landing.”

Aragon sped through her doctoral thesis in 18 months, an endeavor that led to a prototype system that translated large amounts of real-time airflow and turbulence data into a simple visual display that could be transparently presented to pilots as they viewed the landing area. Hearst and Aragon worked with 17 helicopter pilots to develop the system, in simulation, because helicopter landings involve some of the most hazardous airflow situations.

But these are just Aragon’s most recent contributions to science. She received her B.S. in mathematics in 1982 from the California Institute of Technology, and in 1989 she and Raimund Seidel published a report describing a new form of data structure that vastly improved the ability to store and access information.

It is called a “treap,” a method for managing a complex, ever-changing database by randomly assigning numeric priorities so that all data – no matter how far out on any branch of a binary search tree – is within easy reach. The method is now widely used in applications such as wireless networking and fast parallel computing.
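For readers who want to see the idea in code, here is a minimal sketch of a treap in Python. It is an illustration written for this article, not code from the 1989 report: the names `Node`, `insert`, `contains` and `in_order` are assumptions, and deletion is left out. Each key gets a random priority; keys are kept in binary-search-tree order while priorities are kept in heap order (a parent's priority is never lower than its children's), which keeps the tree shallow on average even when keys arrive already sorted.

```python
import random


class Node:
    def __init__(self, key):
        self.key = key
        self.priority = random.random()  # randomly assigned numeric priority
        self.left = None
        self.right = None


def rotate_right(node):
    # Lift the left child above node; search-tree order is preserved.
    pivot = node.left
    node.left = pivot.right
    pivot.right = node
    return pivot


def rotate_left(node):
    # Lift the right child above node; search-tree order is preserved.
    pivot = node.right
    node.right = pivot.left
    pivot.left = node
    return pivot


def insert(node, key):
    """Insert key into the treap rooted at node and return the new root."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
        if node.left.priority > node.priority:  # restore heap order
            node = rotate_right(node)
    elif key > node.key:
        node.right = insert(node.right, key)
        if node.right.priority > node.priority:  # restore heap order
            node = rotate_left(node)
    return node  # duplicate keys are ignored


def contains(node, key):
    """Ordinary binary-search-tree lookup."""
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False


def in_order(node):
    """Return all keys in sorted order."""
    if node is None:
        return []
    return in_order(node.left) + [node.key] + in_order(node.right)
```

To try it, start with `root = None` and call `root = insert(root, key)` for each key. Inserting keys in sorted order, which would turn a plain binary search tree into one long chain, here yields a tree whose shape depends only on the random priorities.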

Although Aragon is proud of the treap, and it is what she’s best known for in computer science circles, she’s more interested today in data-searching techniques.

“Of all our five senses, the visual sense has by far the highest bandwidth,” she says. Combining the power of computing to augment human visual perception and intuition fascinates her.

The biggest breakthroughs in science, she says, are going to come from large collaborations of researchers using information technologies to augment human creativity and imagination.

Aragon wants to play a key role in helping scientists learn to fly.