CoLUA: Automatically Predicting Configuration Bug Reports and Extracting Configuration Options

Wei Wen, Tingting Yu, and Jane Huffman Hayes


Abstract

Configuration bugs are among the dominant causes of software failures. Software organizations often use bug-tracking systems to manage bug reports collected from developers and users. In order for software developers to understand and reproduce configuration bugs, it is vital for them to know whether a bug in the bug report is related to configuration issues; this is not often easily discerned due to a lack of easy to spot terminology in the bug reports. In addition, to locate and fix a configuration bug, a developer needs to know which configuration options are associated with the bug. To address these two problems, we introduce CoLUA, a two-step automated approach that combines natural language processing, information retrieval, and machine learning. In the first step, CoLUA selects features from the textual information in the bug reports, and uses various machine learning techniques to build classification models; developers can use these models to label a bug report as either a configuration bug report or a non-configuration bug report. In the second step, CoLUA identifies which configuration options are involved in the labeled configuration bug reports. We evaluate CoLUA on 900 bug reports from three large open source software systems. The results show that CoLUA predicts configuration bug reports with high accuracy and that it effectively identifies the root causes of configuration options.

Supplemental Data

The data and programs can be downloaded at this link