DeCypher: Cyber Knowledge Graph Queries Expressed through Natural Language
February 2023 • Presentation
Steven Noel (MITRE)
This session focuses on DeCypher, which represents the first known approach to natural language processing for constructing graph database queries for cyber situational understanding.
Software Engineering Institute
This presentation was given at FloCon 2023, an annual conference that focuses on applying any and all collected data to defend enterprise networks.
This talk examines DeCypher, a system for transforming natural language questions into graph knowledge base queries for cyber situational understanding. DeCypher allows analysts to state in plain English what they need from a cyber knowledge base, eliminating the need for specialized query language skills. It reduces users' cognitive load and bridges usability gaps, enabling more effective human-machine interaction and rapid understanding of cyberspace activities.
MITRE’s CyGraph tool has demonstrated advanced capabilities for cyber situational understanding, employing a graph model for reasoning about complex interrelationships. This supports the kind of deep (multi-step) correlation needed for effective cyber analytics such as tracing root causes, discovering adversarial scope, and assessing mission impacts. CyGraph ingests data from various sources and builds a unified graph knowledge base relevant to cyberattacks and mission impacts, bringing together isolated data and events into an ongoing overall picture for situational understanding and decision support. This includes data about network infrastructure, security posture, threats, and mission dependencies, all mapped to entities and relationships in the CyGraph knowledge base.
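The multi-step correlation described above can be illustrated with a small graph traversal. The following is only a minimal sketch: the node names, relationship types, and the `trace_path` helper are hypothetical and are not taken from CyGraph, which the abstract describes only at a high level. It shows how isolated facts (an alert, a network connection, a mission dependency) become a single chain once they share a graph.

```python
from collections import deque

# Hypothetical (source, relationship, target) edges. None of these names
# come from CyGraph; they only illustrate the kinds of data the text lists.
EDGES = [
    ("alert-17", "ON", "host-a"),
    ("host-a", "CONNECTS_TO", "host-b"),
    ("host-b", "RUNS", "db-service"),
    ("db-service", "SUPPORTS", "mission-logistics"),
    ("host-a", "HAS_VULN", "vuln-x"),
]


def build_adjacency(edges):
    """Index directed edges by their source node."""
    adjacency = {}
    for source, relation, target in edges:
        adjacency.setdefault(source, []).append((relation, target))
    return adjacency


def trace_path(edges, start, goal):
    """Breadth-first search over directed edges.

    Returns the shortest chain of (source, relationship, target) steps
    from start to goal, or None if goal is unreachable.
    """
    adjacency = build_adjacency(edges)
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, target in adjacency.get(node, []):
            if target not in visited:
                visited.add(target)
                queue.append((target, path + [(node, relation, target)]))
    return None


if __name__ == "__main__":
    # Trace from an alert to the mission function it could affect.
    for step in trace_path(EDGES, "alert-17", "mission-logistics"):
        print(" -[{1}]-> ".format(*step).join([step[0], step[2]]))
```

In this toy data the alert reaches the mission function in four steps, which is the kind of multi-hop question that is awkward to answer when the alert, topology, and dependency data live in separate tools.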
For ad hoc exploratory analysis and visualization, CyGraph allows users to formulate graph analytic queries to answer specific operational questions. DeCypher allows an analyst to express these queries in natural language, then applies machine learning to generate the corresponding formal query for CyGraph. The analyst can then explore the query results through interactive graph visualizations within CyGraph.
A key sub-task for DeCypher is intent classification, which predicts the overall category of a user question. DeCypher also performs named entity recognition, which identifies key information (entities) in a question. Semantic similarity checking then matches the recognized entities to elements of a data model inferred from a populated CyGraph knowledge base.
Compared with legacy methods for query formulation, analysts using DeCypher complete nearly twice as many time-limited tasks and complete them 21% faster. User satisfaction and confidence improve by 62%, and perceived usability improves by 49%. In terms of sub-task performance, DeCypher intent classification has an F1 score of 84% and entity recognition has an F1 score of 79%.
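The three sub-tasks named above (intent classification, entity recognition, and linking entities to the data model) can be sketched as a small pipeline. This is a toy illustration under stated assumptions, not DeCypher's implementation: the abstract says DeCypher uses machine learning, whereas this sketch swaps in keyword matching and string similarity from Python's standard library so that it stays self-contained. The intent names, keyword sets, model labels, and the Cypher-style output format are all hypothetical.

```python
import re
from difflib import SequenceMatcher

# Hypothetical intent categories and trigger words (a stand-in for a
# trained intent classifier).
INTENT_KEYWORDS = {
    "count": {"how", "many", "count", "number"},
    "path": {"path", "reach", "route", "between"},
    "list": {"show", "list", "which", "what"},
}

# Hypothetical node labels, standing in for a data model inferred from a
# populated knowledge base.
MODEL_LABELS = ["Host", "Vulnerability", "Alert", "MissionFunction"]

STOPWORDS = {"a", "all", "an", "are", "from", "have", "is", "of", "on",
             "the", "to", "with"}


def tokenize(question):
    return re.findall(r"[a-z]+", question.lower())


def classify_intent(question):
    """Pick the intent whose keywords overlap the question most."""
    tokens = set(tokenize(question))
    scores = {name: len(tokens & words) for name, words in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "list"  # default when nothing matches


def recognize_entities(question):
    """Toy entity recognition: keep words that are neither stopwords nor intent keywords."""
    ignored = STOPWORDS.union(*INTENT_KEYWORDS.values())
    return [tok for tok in tokenize(question) if tok not in ignored]


def link_entity(entity, labels=MODEL_LABELS, threshold=0.6):
    """Match an entity to the most similar model label, or None below threshold."""
    scored = [(SequenceMatcher(None, entity, label.lower()).ratio(), label)
              for label in labels]
    score, label = max(scored)
    return label if score >= threshold else None


def translate(question):
    """Combine the three steps into a Cypher-style query string (assumed target format)."""
    intent = classify_intent(question)
    linked = [link_entity(entity) for entity in recognize_entities(question)]
    labels = [label for label in linked if label is not None]
    if not labels:
        return None
    if intent == "path" and len(labels) >= 2:
        return (f"MATCH p = shortestPath((a:{labels[0]})-[*]-(b:{labels[1]})) "
                "RETURN p")
    pattern = f"(a:{labels[0]})"
    if len(labels) >= 2:
        pattern += f"--(b:{labels[1]})"
    if intent == "count":
        return f"MATCH {pattern} RETURN count(DISTINCT a)"
    return f"MATCH {pattern} RETURN a"


if __name__ == "__main__":
    print(translate("How many hosts have vulnerabilities?"))
    print(translate("Show all alerts"))
```

The decomposition keeps each step independently replaceable: a trained classifier or embedding-based similarity measure could stand in for `classify_intent` or `link_entity` without changing the query templates, and only `MODEL_LABELS` would need to change for a different knowledge base.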
DeCypher represents the first known approach to natural language processing for constructing graph database queries for cyber situational understanding. It decomposes the problem in a novel way, applying intent classification, named entity recognition, and model linking. It is domain-agnostic, so it can be readily applied to new environments. Our experiments demonstrate that with DeCypher, analysts can formulate ad hoc queries faster, with significant improvements in user satisfaction, confidence, and perceived usability.