Everything You Wanted to Know About Blacklists But Were Afraid to Ask

September 30, 2013 • White Paper

By Leigh B. Metcalf and Jonathan Spring


Publisher

Software Engineering Institute

Subjects

Situational Awareness

Abstract

This document compares the contents of 25 different common public-internet blacklists in order to discover any patterns in the shared entries. Some lists contain IP addresses, and other lists contain domain names; these types of lists form the two cohorts that are compared. The contents of the lists are compared directly.
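One way to picture the direct comparison is as pairwise set intersection within a cohort. The sketch below is only an illustration under stated assumptions. It is not the authors' code. The list names and entries are hypothetical, and entries are assumed to be already normalized (for example, domain names lowercased). For each pair, it reports the raw intersection size and the fraction of the smaller list that the intersection covers. That fraction is one plausible way to normalize overlap, not necessarily the paper's metric.

```python
from itertools import combinations


def pairwise_overlap(lists):
    """For every pair of named lists, return (shared count, fraction of the
    smaller list covered by the shared entries).

    Assumes entries are already normalized strings (hypothetical setup)."""
    results = {}
    for (name_a, a), (name_b, b) in combinations(sorted(lists.items()), 2):
        shared = a & b
        smaller = min(len(a), len(b))
        fraction = len(shared) / smaller if smaller else 0.0
        results[(name_a, name_b)] = (len(shared), fraction)
    return results


# Hypothetical domain-name cohort; real lists hold far more entries.
domain_lists = {
    "list_a": {"bad.example", "evil.example", "phish.example"},
    "list_b": {"evil.example", "spam.example"},
    "list_c": {"malware.example"},
}

for pair, (count, fraction) in pairwise_overlap(domain_lists).items():
    print(pair, count, f"{fraction:.2f}")
```

The IP-address cohort would be handled the same way, with addresses in place of domain names. The two cohorts are never mixed in a single comparison.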

The contents are also expanded to closely related identifiers using a passive DNS data source, and these expanded contents are also compared. The list contents are also compared temporally to determine which list, if either, consistently provided shared indicators before the other.
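The expansion and timing steps can be sketched in the same style. In this sketch, a plain dictionary stands in for the passive DNS source. The dictionary `pdns` and its contents are hypothetical, and the paper's actual data interface is not described here. Each identifier expands one hop to whatever the mapping links it to: a domain to the IP addresses it resolved to, or an IP address to the domains seen on it.

For the temporal check, each list is assumed to record the date it first listed each indicator. For the shared indicators, the sketch counts how often each list was earlier, and a tie counts for neither list.

```python
from datetime import date


def expand(entries, pdns):
    """Return the entries plus every identifier the hypothetical passive
    DNS mapping links them to, one hop out (no further recursion)."""
    expanded = set(entries)
    for entry in entries:
        expanded |= pdns.get(entry, set())
    return expanded


def first_seen_counts(first_a, first_b):
    """Given dicts mapping indicator -> date first listed on list A and
    list B, count which list was earlier for each shared indicator."""
    a_first = b_first = ties = 0
    for indicator in first_a.keys() & first_b.keys():
        if first_a[indicator] < first_b[indicator]:
            a_first += 1
        elif first_b[indicator] < first_a[indicator]:
            b_first += 1
        else:
            ties += 1
    return {"a_first": a_first, "b_first": b_first, "ties": ties}


# Hypothetical passive DNS observations, stored in both directions.
pdns = {
    "evil.example": {"192.0.2.10"},
    "192.0.2.10": {"evil.example", "other.example"},
}
print(sorted(expand({"evil.example"}, pdns)))

# Hypothetical first-listed dates for two lists.
list_a = {"evil.example": date(2013, 3, 1), "bad.example": date(2013, 3, 4)}
list_b = {"evil.example": date(2013, 3, 6), "bad.example": date(2013, 3, 2)}
print(first_seen_counts(list_a, list_b))
```

In this toy data, each list is first for one shared indicator. A split like that, with no consistent leader, is the kind of result the abstract describes as "no pattern to which list came first."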

The results demonstrate that, most of the time, list contents are unique. There is surprisingly little overlap between any two blacklists. Though there are exceptions to this pattern, the intersection between the lists generally remains low even after each list is expanded to a larger neighborhood of related indicators. The results also show that some lists do consistently provide content before certain other lists, but more often there is no intersection in the first place. When there is intersection, there is often no pattern to which list came first. These results suggest that each blacklist describes a distinct sort of malicious activity. The lists do not appear to converge on one version of all the malicious indicators for the internet at large. Network defenders are therefore advised to obtain and evaluate as many lists as practical, since it does not appear that any new list can be rejected out of hand as redundant.

When we began this analysis, the allegory of the blind men and the elephant came to mind. In the allegory, several blind men touch an elephant, trying to determine what it is. Because each man feels only one part of the animal, each forms his own idea of what it is: one thinks it is a wall, another a whip, and another a tree trunk. Seeing the entire elephant at once would reveal its true identity. The situation with blacklists on the public internet takes this allegory a step further. Like the blind men, we cannot see the entire situation at once, and unlike them, we are also trying to describe many kinds of "animals." We must first figure out what we are describing before we can accurately tie our reports together.
