
When Threat Hunting Fails: Identifying Malvertising Domains Using Lexical Clustering

January 2018 Presentation
Matt Foley (Cisco Systems, Inc.), David Rodriguez (Cisco Systems, Inc.), Dhia Mahjoub (OpenDNS)

In this presentation, the authors discuss the current malvertising threat landscape: ad networks, exchanges, exploits, and popular infection points.




The authors introduce a real-time streaming pipeline, built on Kafka, that aims to stop the initial stage of the attack while it is still observable in DNS logs. The pipeline applies locality-sensitive hashing (LSH), a scalable clustering technique, to hostnames in order to identify permutations of words and characters such as “software”, “update”, “tech”, and “support”. They then discuss a novel belief propagation algorithm that runs over a client-hostname bipartite graph and surfaces the related file hosts that lie behind malicious advertisements. Finally, they disclose the anatomy of a malicious advertising campaign and show how file hosts are often reused across malvertising campaigns.
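To make the idea of lexical clustering with LSH more concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the abstract does not say which LSH family, tokenization, or parameters they use, so the choice of MinHash over character trigrams, the banding scheme, and every function name and default below (`shingles`, `minhash`, `lsh_buckets`, `candidate_pairs`, `num_perm`, `bands`) are assumptions made purely for illustration. The Kafka ingestion and the belief propagation step are left out.

```python
import hashlib
from collections import defaultdict


def shingles(hostname, k=3):
    """Return the set of character k-grams of a lowercased hostname."""
    s = hostname.lower()
    return {s[i:i + k] for i in range(max(1, len(s) - k + 1))}


def jaccard(a, b):
    """Exact Jaccard similarity of two sets, useful to verify candidates."""
    return len(a & b) / len(a | b)


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(tokens, num_perm=32):
    """MinHash signature: the minimum hash of the set under each seed."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_perm))


def lsh_buckets(hostnames, num_perm=32, bands=16):
    """Split each signature into bands; hosts sharing a band share a bucket."""
    rows = num_perm // bands
    buckets = defaultdict(set)
    for host in hostnames:
        sig = minhash(shingles(host), num_perm)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(host)
    return buckets


def candidate_pairs(hostnames, num_perm=32, bands=16):
    """Pairs of hostnames that collide in at least one LSH bucket."""
    pairs = set()
    for members in lsh_buckets(hostnames, num_perm, bands).values():
        ordered = sorted(members)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                pairs.add((ordered[i], ordered[j]))
    return pairs


if __name__ == "__main__":
    # Hypothetical hostnames, invented for this example only.
    hosts = [
        "software-update-tech.example",
        "tech-software-update.example",
        "update-support-tech.example",
        "weather.example",
    ]
    for a, b in sorted(candidate_pairs(hosts)):
        print(a, b, round(jaccard(shingles(a), shingles(b)), 2))
```

In this sketch, more bands with fewer rows per band make collisions more likely and so favor recall, while fewer bands with more rows favor precision. Because bucket collisions are probabilistic, candidate pairs are normally re-checked with an exact similarity such as `jaccard` before being treated as a cluster.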