Operating System Research Group

Operating System Research on Energy, Reliability and Autonomy


Introduction

As rapid advances in computing hardware have led to dramatic improvements in computer performance, the issues of reliability, availability, maintainability, and cost of ownership are becoming increasingly important. Unfortunately, software bugs continue to be frequent, accounting for as much as 40% of computer system failures. Programmers on average inject 100 defects per thousand lines of code. Software bugs can crash the system, making the service unavailable. Moreover, "silent" bugs that go undetected can corrupt data, generate wrong outputs or control commands, and destroy valuable information. According to the National Institute of Standards and Technology, software bugs cost the U.S. economy an estimated $59.5 billion annually, or approximately 0.6% of the gross domestic product! Given the magnitude of the problem, it is crucial that we find effective solutions soon.

Unfortunately, identifying and fixing software bugs requires enormous human labor. Entire teams are dedicated to testing the software and looking for anomalies. These anomalies are reported to developers, who attempt to find the bug (or bugs) that causes them. Despite this effort, software released to end users still contains numerous bugs. These bugs continue to consume human time in the form of bug reporting at the user site, user-vendor communication, and subsequent "bug-fix" software releases. We need, above all, techniques that automate the process of debugging as much as possible.

A major difficulty with debugging is that many bugs appear only for a particular combination of user inputs and/or hardware configurations. Moreover, some particularly hard bugs, such as data races, occur only with a particular sequence of interactions between threads in multi-threaded programs. As a result, many bugs that occur in production runs are not easily reproducible when the program is recompiled with heavy instrumentation and executed in a debugging run. Consequently, it is necessary to provide low-overhead debugging support that can be triggered on production runs.

Overall, we envision a truly effective debugging system as one that is able to detect, characterize (i.e., find the root cause of), recover from, and correct software bugs automatically, on-the-fly, and on production runs. The goal of our research is to build such a system with a revolutionary combination of innovations in compilers, data mining algorithms, computer hardware, and operating system support.

Events

Bi-weekly Meeting: Friday 11:30am-12:30pm, 4102

Next meeting: Feb 20, 2004. Sudarshan Srinivasan will present FlashBack, a lightweight OS extension for rollback and deterministic replay for software debugging.

Funding

NSF Medium-ITR, 2003


Last updated: 05/26/2006.