Software Engineering Institute | Carnegie Mellon University

Digital Library

Conference Paper

Blacklist Ecosystem Analysis

  • Abstract

    Motivation: We compare the contents of 86 Internet blacklists to provide a view of the whole ecosystem of blocking network touch points and blacklists. We aim to formalize and evaluate practitioners' tacit knowledge of the fatigue of playing "whack-a-mole" against resilient adversary resources. Method: Each list is compared only to lists of the same data type (domain name or IP address). Different phases of the study use different comparisons. Comparisons include how many lists an indicator is unique to; list sizes; expanded list characterization and intersection; pairwise intersections of all lists; and "following," a statistical test we define to determine whether one list adds elements shortly after another. Results: Based on a synthesis of multiple methods, domain-name-based indicators are unique to one list 96.16% to 97.37% of the time. IP-address-based indicators are unique to one list 82.46% to 95.24% of the time. Discussion: There is little overlap between blacklists. Though there are exceptions, the intersection between lists remains low even after expanding each list to a larger neighborhood of related indicators. Where lists do intersect, few consistently provide content before other lists. These results suggest that each blacklist describes a distinct sort of malicious activity and that, even after merging all lists, there is no global ground truth to acquire. Practical insights include (1) network defenders are advised to obtain and evaluate as many lists as practical, (2) "whack-a-mole" is inevitable due to list dynamics, barring a strategic change, and (3) academics comparing their results to one or a few blacklists to test accuracy are advised to reconsider this validation technique.
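
    Two of the comparisons named in the Method (how many lists an indicator is unique to, and pairwise intersections of all lists) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names, the dictionary layout, and the toy domain names are all assumptions made for illustration, and the abstract does not say how indicators were normalized or how the multiple methods were synthesized into the reported ranges.

```python
from collections import Counter
from itertools import combinations


def unique_fraction(lists):
    """Fraction of distinct indicators that appear on exactly one list."""
    # Count, for each indicator, how many lists contain it.
    counts = Counter(ind for members in lists.values() for ind in set(members))
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n == 1) / len(counts)


def pairwise_intersections(lists):
    """Intersection size for every unordered pair of list names."""
    return {
        (a, b): len(set(lists[a]) & set(lists[b]))
        for a, b in combinations(sorted(lists), 2)
    }


# Hypothetical toy data: list names and domains are invented placeholders.
toy_lists = {
    "list_a": {"bad.example", "evil.example", "spam.example"},
    "list_b": {"evil.example", "phish.example"},
    "list_c": {"malware.example"},
}

print(unique_fraction(toy_lists))
print(pairwise_intersections(toy_lists))
```

    In this toy data, four of the five distinct domains appear on only one list, and only one pair of lists shares any entry. As in the study, such a comparison is meaningful only between lists of the same data type.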


    See the conference paper at