Fishing for Phishes: Applying Capture-Recapture Methods to Estimate Phishing Populations

October 2007 White Paper
Rhiannon Weaver, M. P. Collins (Redjack)

In this paper, the authors describe how they estimate the extent of phishing activity by measuring the phishing population in terms of netblocks and by clustering phishing attempts into scams.

Publisher: Software Engineering Institute

Abstract

We estimate the extent of phishing activity on the Internet via capture-recapture analysis of two major phishing site reports. Capture-recapture analysis is a population estimation technique originally developed for wildlife conservation, but it is applicable in any environment in which multiple independent parties collect reports of an activity.
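As an illustration of the general idea, the sketch below applies the classical two-sample Lincoln-Petersen estimator to two report lists. This is an assumption made for illustration only: the abstract does not state which estimator the paper uses, and the function name and sample identifiers are hypothetical.

```python
def lincoln_petersen(report_a, report_b):
    """Estimate total population size from two independent reports.

    Classical two-sample estimator: N ~ (n_a * n_b) / m, where n_a and n_b
    are the number of distinct items in each report and m is the number of
    items that appear in both. Assumes the two reports are independent.
    """
    a = set(report_a)
    b = set(report_b)
    overlap = len(a & b)
    if overlap == 0:
        # With no shared items the estimator is undefined.
        raise ValueError("reports share no items; estimate is undefined")
    return len(a) * len(b) / overlap


# Hypothetical identifiers standing in for phishing sites in each report.
report_a = {"s1", "s2", "s3", "s4", "s5", "s6"}
report_b = {"s5", "s6", "s7", "s8"}

estimate = lincoln_petersen(report_a, report_b)
observed = len(report_a | report_b)
coverage = observed / estimate  # fraction of the estimated population seen
```

The ratio of distinct observed items to the estimate gives the kind of coverage figure the abstract reports, under the independence assumption stated above.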

Generating a meaningful population estimate for phishing activity requires addressing complex relationships between phishers and phishing reports. Phishers clandestinely occupy machines and add evasive measures to phishing URLs to evade firewalls and other fraud-detection measures. Phishing reports, in the meantime, may demonstrate a preference towards certain classes of phish.

We address these problems by estimating population in terms of netblocks and by clustering phishing attempts together into scams, which are phishes that demonstrate similar behavior on multiple axes. We generate population estimates using data from two different phishing reports over an 80-day period, and show that these reports capture approximately 40% of scams and 80% of CIDR /24 (256 contiguous addresses) netblocks involved in phishing.
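To make the netblock unit concrete, the following minimal sketch maps individual IPv4 addresses to their enclosing CIDR /24 netblocks using Python's standard `ipaddress` module. The helper names and sample addresses (taken from documentation ranges) are hypothetical and are not drawn from the paper's data or method.

```python
import ipaddress


def to_slash24(address):
    """Return the /24 netblock (256 contiguous addresses) containing address."""
    # strict=False lets us pass a host address and have the host bits masked.
    return ipaddress.ip_network(f"{address}/24", strict=False)


def netblocks(addresses):
    """Collapse a list of IPv4 addresses into the set of distinct /24 blocks."""
    return {str(to_slash24(a)) for a in addresses}


# Hypothetical hosts: the first two fall in the same /24 netblock.
hosts = ["192.0.2.1", "192.0.2.200", "198.51.100.7"]
blocks = netblocks(hosts)
```

Counting distinct netblocks rather than individual URLs or hosts means that several phishing sites in the same /24 range count as a single unit in the population.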