Automatically Detecting Technical Debt Discussions

White Paper
This study introduces (1) a dataset of expert labels of technical debt in developer comments and (2) a classifier trained on those labels.
Publisher

Software Engineering Institute

Abstract

Technical debt (TD) refers to suboptimal choices during software development that achieve short-term goals at the expense of long-term quality. Although developers often informally discuss TD, the concept has not yet crystalized into a consistently applied issue type when describing issues in repositories. Application of machine learning to locate technical debt can improve our understanding of TD and help develop practices to manage it. In this study, we manually labeled references to TD for 1,934 tickets in the Chromium issue tracker. We used these labels to train a classifier to estimate labels for an additional 475,000 tickets. Our classifier significantly outperforms key phrase search, and we conclude that discussion of TD appears in about 16% of the tracked Chromium issues. The prevalence of TD demonstrated by our results suggests the need to designate a new technical debt issue type in issue trackers.
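
To make the key phrase search baseline concrete, the following is a minimal sketch of how such a baseline might be scored against expert labels. It is not the study's method: the phrase list, the toy tickets, and the function names `mentions_td` and `precision_recall` are all assumptions made for illustration.

```python
import re

# Hypothetical key phrases; the phrase list used in the study is not given here.
KEY_PHRASES = ("technical debt", "tech debt", "workaround", "hack", "refactor")
_PATTERN = re.compile(
    "|".join(r"\b" + re.escape(phrase) + r"\b" for phrase in KEY_PHRASES),
    re.IGNORECASE,
)


def mentions_td(text: str) -> bool:
    """Flag a ticket as TD-related if it contains any key phrase."""
    return _PATTERN.search(text) is not None


def precision_recall(predicted, actual):
    """Compute precision and recall of boolean predictions against labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented toy tickets paired with an invented expert label (True = discusses TD).
TICKETS = [
    ("Temporary hack to unblock the release; clean up later.", True),
    ("Crash when opening settings page on startup.", False),
    ("This module has grown unmaintainable and slows every change.", True),
    ("Refactor the build script to support a new platform.", False),
]

predicted = [mentions_td(text) for text, _ in TICKETS]
actual = [label for _, label in TICKETS]
precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

On these toy tickets the baseline misses the third ticket, which describes TD without using any listed phrase, and wrongly flags the fourth, which uses "refactor" in a neutral sense. Errors of both kinds are one plausible reason a classifier trained on expert labels could outperform phrase matching.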