As rapid advances in computing hardware have led to dramatic improvements in
computer performance, the issues of reliability, availability, maintainability,
and cost of ownership are becoming increasingly important. Unfortunately,
software bugs continue to be frequent, accounting for as much as 40% of
computer system failures. Programmers on average inject 100 defects per thousand
lines of code. Software bugs can crash
the system, making the service unavailable. Moreover, ``silent'' bugs that go
undetected can corrupt data, producing wrong outputs or control commands and
destroying valuable information. According to the National Institute of
Standards and Technology,
software bugs cost the U.S. economy an estimated $59.5 billion annually, or
approximately 0.6% of the gross domestic product. Given the magnitude of
the problem, it is crucial that we find effective solutions soon.

Unfortunately, identifying and fixing software bugs is a task that requires
enormous human labor. Entire teams are dedicated to testing the software and
looking for anomalies. These anomalies are reported to developers, who attempt to find
the bug (or bugs) that cause them. Despite this enormous effort, software
released to end-users still contains numerous bugs. These bugs continue to
consume human time in the form of bug reporting at the user site, user-vendor
communication, and subsequent ``bug-fix'' software releases. We need, above all,
techniques that automate the process of debugging as much as possible.

A major difficulty with debugging is that many bugs appear only for a particular
combination of user inputs and/or hardware configurations. Moreover, some
particularly hard bugs, such as data races, occur only with a particular sequence
of interactions between threads in multi-threaded programs. As a result, many
bugs that occur in production runs will not be easily reproducible when the
program is recompiled with heavy instrumentation and executed in a debugging
run. Consequently, it is necessary to provide low-overhead debugging support
that can be triggered on production runs.

Overall, we envision a truly effective debugging system as one that is able to
detect, characterize (i.e., find the root cause of), recover from, and correct
software bugs automatically, on the fly, and on production runs. The goal of our research
is to build such a system with a revolutionary combination of innovations in
compilers, data mining
algorithms, computer hardware, and operating system support.