Concurrency Bug Research by Prof. Shan Lu Receives ASPLOS Influential Paper Award
A 2008 paper by UChicago CS Professor Shan Lu cataloging over 100 concurrency bugs in software written for multi-core processors received the Influential Paper Award from ASPLOS, the premier conference for computer architecture and systems. The research, conducted while Lu was a PhD student at the University of Illinois at Urbana-Champaign, established a taxonomy for the common software bugs faced by programmers working with multi-core architectures, which are now standard in most computing devices.
The award was announced at the 2022 edition of ASPLOS, the International Conference on Architectural Support for Programming Languages and Operating Systems, held in early March. Lu’s paper, “Learning from mistakes: a comprehensive study on real world concurrency bug characteristics,” was one of three to receive the recognition, which goes to papers at least ten years old that have made a major impact on the field.
When Lu wrote her paper with co-authors Soyeon Park, Eunsoo Seo, and her Ph.D. advisor Yuanyuan Zhou, multi-core processors were still a new feature in personal computing. As the push for faster chips through ever-smaller transistors ran up against physical limits, manufacturers started designing chips with multiple, connected processing units. With the right programming, software could take advantage of parallel processing on these architectures to run faster.
However, writing software for these more complicated systems also introduced new types of bugs that traditional detection methods, designed for sequential software running on single-core processors, could not catch. One major group is concurrency bugs, in which multiple “threads” running on different cores interfere with each other, corrupting data or crashing the program. These bugs may be rare, but they can be catastrophic, and they are very difficult to detect.
“It was widely believed that concurrency bugs existed in every multi-threaded software, because software developers had not been educated to reason about timing and synchronization among threads. Unfortunately, there were not many well-documented concurrency bugs, because for these bugs to happen, it depends on the timing,” Lu said. “It’s random, you may run the program 1000 times, and it fails once. So a lot of the time these bugs escaped into released software without being noticed by developers.”
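To make the idea of a timing-dependent bug concrete, here is a minimal Python sketch of a "lost update," in which two threads each try to add 1 to a shared counter. It is an illustration only and is not taken from Lu's paper or from any of the projects she studied. The function name `run_increments` and the use of a barrier to force the unlucky interleaving are assumptions made for the example; in real software the bad timing would show up only occasionally rather than on every run.

```python
import threading


def run_increments(use_lock: bool) -> int:
    """Two threads each add 1 to a shared counter; return the final value.

    Hypothetical example: the barrier forces the rare bad timing to happen
    on every run so that the bug can be observed reliably.
    """
    state = {"counter": 0}
    lock = threading.Lock()
    both_have_read = threading.Barrier(2)

    def unsafe_increment() -> None:
        seen = state["counter"]          # step 1: read the shared value
        both_have_read.wait()            # both threads now hold the stale value 0
        state["counter"] = seen + 1      # step 2: write; one update is lost

    def safe_increment() -> None:
        with lock:                       # read and write happen as one unit
            state["counter"] = state["counter"] + 1

    worker = safe_increment if use_lock else unsafe_increment
    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"]


if __name__ == "__main__":
    print("without lock:", run_increments(use_lock=False))
    print("with lock:   ", run_increments(use_lock=True))
```

Without the lock, both threads read 0 before either writes, so the counter ends at 1 instead of 2. With the lock, the read and the write cannot be separated by the other thread, and both updates survive.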
While it was common to see papers describing one or a small number of these bugs in a particular type of software, Lu and her colleagues were much more ambitious. By conducting an empirical study of four open-source software projects — MySQL, Apache, Mozilla, and OpenOffice — they were able to survey a large number of real-world concurrency bugs and study the common patterns that emerged.
The work created an important taxonomy of common multi-core concurrency bugs that the field has built upon over the last 14 years, developing bug detectors and fixes that address the most pervasive issues. As a result, the paper has been cited more than 1,100 times, according to Google Scholar.
“The challenge in the bug-finding community is that there are no benchmarks,” Lu said. “So people, when they work on a particular type of problem in this area, they will cite our paper and say that the problem they’re looking at is a major category, according to this empirical study.”
But the study’s citation count was just one reason ASPLOS recognized its impact. The format of the study, which cataloged a multitude of real-world bugs instead of studying only one or artificially “injecting” a bug, was unusual for the time and has since become a common method in the field, both in research and within industry. Even finding the right software projects to study was a challenge in the pre-GitHub era, when it wasn’t easy to find large, well-supported, open-source software projects with enough bug documentation to support the research.
In addition to influencing the field, the paper also set the future course for Lu’s own research. Her group has repeatedly replicated the model of conducting an empirical study of real-world bugs, both for further exploration of concurrency bugs and for bugs plaguing web applications, machine learning APIs, and other modern software approaches.
“I continued to work on this same concurrency bug problem for several years,” Lu said. “But then when I started to branch out to look at other problems, I always started with looking at what type of bugs are out there, then decided what to work on. So that paper is how I have been doing my research all these years.”