Efficiently Standing Up a Cloud-Based Cybersecurity Data Lake with Minimal Resourcing

Presentation
This presentation highlights a quick and efficient approach to build a cybersecurity data lake, incorporating data that are unique to an organization, and providing coverage that is entirely flexible.
Publisher

Software Engineering Institute

Abstract

This presentation was given at FloCon 2023, an annual conference that focuses on applying any and all collected data to defend enterprise networks.

Cybersecurity is a field uniquely characterized by a constantly advancing knowledge base that attempts to match the ever-increasing sophistication of threat actors with a widely diverse range of goals and techniques. Vendor tools that can successfully address complex threat behaviors play a key role, but because of the pace of threat activity and its unpredictable nature, providing 100% coverage is not feasible. Moreover, deployment of the full range of functionality for many vendor tools requires years, particularly in complex environments, which results in cybersecurity gaps.

Vendor tools are meant to work for a wide variety of customers, yet they are seldom able to accurately incorporate or fully leverage the unique sources of data from any given organization. Gaps invariably remain, and organizations that have a level of independent command over their data are better poised to perform the deep analytics needed to fully elucidate a cybersecurity conundrum. The challenge is that the effort required to collect and analyze data at petabyte scale has traditionally been prohibitive. As a result, building a data lake to serve as a stop-gap solution, or to augment other tools, has not been seen as viable.

This presentation highlights a quick and efficient approach to building a cybersecurity data lake, incorporating data that are unique to an organization, and providing coverage that is entirely flexible. This approach to a lake can be used as a “rolling” stop-gap for a wide variety of purposes that can evolve as dictated both by changes in the threat landscape and by the ability of vendor tools to provide coverage. We will outline the process undertaken for this purpose at a large organization, including the strategy used, required skills, timeline, and how this tall order was made feasible with a limited budget and a minimal staff of only five people. We will share examples of challenges faced and techniques employed to overcome them, such as maximizing diverse skill sets, and we will highlight key lessons learned.

Participants will be exposed to a methodology for creating and operating a hybrid cloud-based data lake within a period of months rather than years. They will learn to apply the concept of a small, fusion-based team composed of individuals with varying backgrounds who work together closely on a specific capability and share accountability for architectural planning, deployment, and outcomes. Participants will be introduced to tools that allow key gaps and questions to be identified, clearly documented, and expeditiously evaluated, supporting logical and efficient decision-making even in complex scenarios.

Part of a Collection

FloCon 2023 Assets

This content was created for a conference series or symposium and does not necessarily reflect the positions and views of the Software Engineering Institute.