We are developing scalable graph algorithms and a network analysis framework known as StreamWorks, whereby an analyst may monitor and analyze streaming computer network traffic data to identify emerging computer network intrusions and threats. Different types of graphical query patterns may be defined for specific types of cyberattacks, including various network scans, reflector attacks, flood attacks, viruses, and worms. StreamWorks will support subgraph matching on computer network attributes such as hostnames, IP addresses, protocols, ports, packet sizes, machine types, and message types. Subgraph pattern matching will be accelerated by collecting node and edge frequency information and using it to optimize search paths through a massive data graph. Computer network intrusion analysis will involve live network data streamed at high rates and the analysis of data graphs consisting of millions to billions of edges. For known patterns, specific graphical query patterns are collected in a library and continuously and efficiently matched against the dynamic graph as it is updated. Each graph query is captured as a subgraph join tree that decomposes the query graph into smaller search subpatterns. These smaller subpatterns signify precursor events that emerge before the full query pattern is complete. As precursor events are detected in data streams, they are matched to the nodes of different subgraph join trees. A match higher in a join tree indicates a higher probability that a specific type of attack is occurring. A similarity or confidence score may be computed for partial matches through training on collected computer network traffic data to measure how frequently partial subpatterns precede the full graph query pattern. For unknown patterns or zero-day exploits, the same analysis framework may be applied to track the emergence of small subpatterns as they appear in the data stream.
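The join-tree matching of known patterns described above can be illustrated with a minimal sketch. It is not the StreamWorks implementation: the class `TwoLeafJoinTree`, the helper `order_by_rarity`, the edge tuple layout `(source, destination, edge_type)`, and the example query (an `ssh` edge followed by a `dns` edge through a shared host) are all assumptions made for illustration. Each leaf of the tree matches a single edge type (a precursor event), and the root joins leaf matches that share the middle vertex.

```python
from collections import defaultdict


def order_by_rarity(edge_types, frequency):
    """Order query edge types so the rarest (most selective) comes first.

    `frequency` is a hypothetical map from edge type to its observed count.
    """
    return sorted(edge_types, key=lambda t: frequency.get(t, 0))


class TwoLeafJoinTree:
    """Join tree for the query  x -[first_type]-> y -[second_type]-> z.

    Each leaf stores partial matches (single edges); the root joins a
    leaf-1 edge and a leaf-2 edge whenever they share the vertex y.
    """

    def __init__(self, first_type, second_type):
        self.first_type = first_type
        self.second_type = second_type
        self.first_by_dst = defaultdict(list)   # leaf 1 matches, keyed by y
        self.second_by_src = defaultdict(list)  # leaf 2 matches, keyed by y

    def update(self, edge):
        """Process one streamed edge; return full matches it completes."""
        src, dst, etype = edge
        full_matches = []
        if etype == self.first_type:
            self.first_by_dst[dst].append(edge)
            full_matches += [(edge, e2) for e2 in self.second_by_src.get(dst, [])]
        if etype == self.second_type:
            self.second_by_src[src].append(edge)
            full_matches += [(e1, edge) for e1 in self.first_by_dst.get(src, [])]
        return full_matches

    def partial_match_count(self):
        """Number of precursor (leaf-level) matches currently tracked."""
        return (sum(len(v) for v in self.first_by_dst.values())
                + sum(len(v) for v in self.second_by_src.values()))


# Toy stream (illustrative only): the third edge completes the pattern.
tree = TwoLeafJoinTree("ssh", "dns")
for edge in [("a", "b", "ssh"), ("c", "d", "http"), ("b", "e", "dns")]:
    for match in tree.update(edge):
        print("full match:", match)
```

Because each new edge is joined only against stored partial matches that share a vertex with it, the work per update stays local to the affected part of the graph. A helper such as `order_by_rarity` suggests how collected frequency counts might decide which subpattern to search for first.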
The system may be seeded with hints to look for small graph patterns that involve rare events (based on collected statistics), events involving critical resources such as an authentication server, domain name server, or database, or particular host machines that analysts consider suspicious or of interest. When seeded subpatterns are found in the data stream, they are tracked and monitored within subgraph join trees. Here, subpatterns are joined based on specific criteria, such as when a subpattern grows beyond a certain threshold size, additional critical resources are introduced into a subpattern, or important types of interactions or communications are detected. Thus, full attack patterns may dynamically emerge from the small seeded patterns or hints. The initial seeded patterns may have confidence scores, either generated from collected statistics or assigned by analysts, which are then propagated up through the subgraph join tree. Additionally, StreamWorks will provide mechanisms for analysts to vet tracked subpatterns, improving analysis and performance by removing benign patterns from monitoring and assessment.
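One simple way to derive a confidence score from collected statistics is sketched below. It assumes the score for a join-tree node is the empirical fraction of that node's matches in training data that went on to complete the full pattern. The function name `node_confidence`, the node labels, and the counts are hypothetical and are not measured results. The rule by which scores propagate up the tree is not specified here, so the sketch does not model it.

```python
def node_confidence(times_matched, times_completed):
    """Fraction of matches at a join-tree node that later completed the
    full query pattern in the training data (0.0 if never matched)."""
    if times_matched == 0:
        return 0.0
    return times_completed / times_matched


# Hypothetical training counts (illustrative only, not measured data):
# node label -> (times the node matched, times the full pattern followed)
training_counts = {
    "leaf (single rare edge)": (200, 10),
    "inner join (two edges)": (40, 10),
    "root (full pattern)": (10, 10),
}

for node, (matched, completed) in training_counts.items():
    print(node, node_confidence(matched, completed))
```

With counts shaped like these, the score rises toward the root, consistent with the idea that a match higher in a join tree indicates a higher probability of an attack.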
In this presentation, we describe the emerging graph pattern approach and the system design of StreamWorks, and we demonstrate its emerging threat detection capabilities.