
The Security Wolf of Wall Street: Fighting Crime with High-Frequency Classification and Natural Language Processing

January 2016 Presentation
Jeremiah O'Connor (OpenDNS), Thibault Reuille (OpenDNS)

This presentation focuses on how to build a scalable, real-time machine learning infrastructure.

Publisher:

CERT Division

Abstract

In a world where threat actors move fast and the Internet evolves in a non-deterministic fashion, turning threat intelligence into automated protection has proven to be a challenge for the information security industry. While traditional threat research methods will never go away, there is an increasing need for powerful decision models that can process data in real time and scale to incorporate increasingly rich sources of threat intelligence. In this presentation, given at FloCon 2016, the authors focus on one way to build a scalable, real-time machine learning infrastructure that operates on a massive amount of DNS data (approximately 80B queries per day). The authors offer a sneak peek into how OpenDNS does scalable data science and touch on two core components, Big Data engineering and Big Data science, specifically discussing how they are used to implement real-time threat detection systems for large-scale network traffic.