Comcast Security Analytics Platform

August 2020 Presentation
Gary Gabriel (Comcast), Mason Cheng (Comcast)

This presentation showed practical ways to process large-scale security-related data and analyze it using cloud-based infrastructure.





A conventional Security Information and Event Management (SIEM) system, while great at quickly searching through data and making basic correlations, is not built for adding customizable, machine learning-enabled analytics. The Comcast cybersecurity threat analytics team is developing solutions that make use of various security tools and the SIEM, supplementing and extending them with a data lake.

Comcast processes terabytes of security-related logs every day, from many different tools and in many different formats. In addition, it uses lookup data sources such as Active Directory and asset databases. To use all of this data for large-scale security analysis and modeling, the team processes these logs and lookup data through ETL pipelines that run as Apache Spark jobs.
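The ETL step described above can be illustrated with a minimal sketch. It is plain Python rather than Spark, and it is not Comcast's implementation: the two log formats, the field names, and the `ASSET_LOOKUP` table are all hypothetical. It only shows the general shape of the work, which is normalizing differently formatted logs into one schema and then enriching them from a lookup source. In Spark, the same steps would typically be expressed as DataFrame transformations followed by a join.

```python
import json
from datetime import datetime, timezone

# Hypothetical lookup table standing in for an asset database export.
ASSET_LOOKUP = {
    "10.0.0.5": {"hostname": "build-01", "owner": "ci-team"},
}


def parse_json_event(line):
    """Parse a JSON log line (hypothetical format) into the common schema."""
    raw = json.loads(line)
    return {
        "timestamp": datetime.fromtimestamp(raw["ts"], tz=timezone.utc).isoformat(),
        "src_ip": raw["src"],
        "action": raw["event"].lower(),
    }


def parse_kv_event(line):
    """Parse a space-separated key=value log line (hypothetical format)."""
    fields = dict(pair.split("=", 1) for pair in line.split())
    return {
        "timestamp": fields["time"],
        "src_ip": fields["ip"],
        "action": fields["action"].lower(),
    }


def enrich(event, lookup):
    """Left-join one event with the asset lookup on source IP."""
    asset = lookup.get(event["src_ip"], {})
    return {**event, "hostname": asset.get("hostname"), "owner": asset.get("owner")}


def run_etl(json_lines, kv_lines, lookup):
    """Normalize both log sources into one schema, then enrich every event."""
    events = [parse_json_event(line) for line in json_lines]
    events += [parse_kv_event(line) for line in kv_lines]
    return [enrich(event, lookup) for event in events]


records = run_etl(
    ['{"ts": 0, "src": "10.0.0.5", "event": "LOGIN"}'],
    ["time=1970-01-01T00:01:00+00:00 ip=10.9.9.9 action=DENY"],
    ASSET_LOOKUP,
)
```

Events whose source IP is missing from the lookup keep `None` for the enrichment fields instead of being dropped, which mirrors a left join and keeps unknown hosts visible to later analysis.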

In this presentation, given at FloCon 2020, the presenters explored the design and architecture of Comcast's threat analytics system. They described how to use large-scale data platforms built on Apache Spark and S3, with Airflow to manage complex ETL pipelines and orchestrate workflows. They presented how to develop analytical and ML pipelines and modules to detect cyber threats. They also discussed how Comcast enables the review of model output using notebooks and dashboards. Notebooks allow for initial evaluation of model output. Dashboards are used in later rounds, where the team improves the visualization, enhances the data with additional details, and expands the number of people reviewing the results.
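The orchestration idea can be sketched as a dependency graph of tasks that run in order. The example below uses Python's standard-library `graphlib` (Python 3.9 or later) as a stand-in for Airflow, which models workflows as DAGs of tasks. The task names and the dependency layout are assumptions made for illustration, not the pipeline shown in the presentation.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks; each name maps to the set of tasks it depends on.
PIPELINE = {
    "ingest_logs": set(),
    "load_lookups": set(),
    "normalize_and_enrich": {"ingest_logs", "load_lookups"},
    "score_with_model": {"normalize_and_enrich"},
    "publish_for_review": {"score_with_model"},
}


def run_pipeline(dag, actions):
    """Run each task once, only after all of its dependencies have run."""
    completed = []
    for task in TopologicalSorter(dag).static_order():
        actions[task]()  # call the placeholder work for this task
        completed.append(task)
    return completed


# Placeholder actions; a real pipeline would launch Spark jobs or model runs.
actions = {name: (lambda: None) for name in PIPELINE}
order = run_pipeline(PIPELINE, actions)
```

A scheduler such as Airflow adds scheduling, retries, and monitoring on top of this ordering; the sketch covers only the dependency resolution.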