The Long "Taile" of Typosquatting Domain Names
August 2014 • Article
In this USENIX 2014 paper, the authors describe a methodology to improve existing solutions in identifying typosquatting domains and their monetization strategies.
Typosquatting is a speculative behavior that leverages Internet naming and governance practices to extract profit from users’ misspellings and typing errors. Simple and inexpensive domain registration motivates speculators to register domain names in bulk to profit from display advertisements, to redirect traffic to third party pages, to deploy phishing sites, or to serve malware. While previous research has focused on typosquatting domains which target popular websites, speculators also appear to be typosquatting on the “long tail” of the popularity distribution: millions of registered domain names appear to be potential typos of other site names, and only 6.8% target the 10,000 most popular .com domains.
Investigating the entire distribution can give a more complete understanding of the typosquatting phenomenon. In this paper, we perform a comprehensive study of typosquatting domain registrations within the .com TLD. Our methodology helps us to significantly improve upon existing solutions in identifying typosquatting domains and their monetization strategies, especially for less popular targets. We find that about half of the possible typo domains identified by lexical analysis are truly typo domains. From our zone file analysis, we estimate that 20% of the total number of .com domain registrations are true typo domains and their number is increasing with the expansion of the .com domain space. This large number of typo registrations motivates us to review intervention attempts and implement efficient user-side mitigation tools to diminish the financial benefit of typosquatting to miscreants.
This paper was originally presented at USENIX in August 2014. Links to the PDF file, slides, and BibTeX version of the paper are available on the USENIX site.