Augmented flow data has been used to greatly enhance anomaly detection, lightweight protocol analysis, and network situational awareness. In particular, as presented at FloCon 2014, new types of semantic augmentation also show promise for dynamic impact assessment, response selection, and alert prioritization. However, gathering adequate semantic data poses some unique challenges. Processing semantic information, whether through graph-based clustering or lexical analysis, is computationally expensive, and flow data often consists of thousands or millions of records per day even in modest networks. Scalability is therefore critical if augmented flows are to be used for practical analysis.
In this talk, we discuss strategies for optimizing the addition of semantic information to flow data so that it can be used in real time. These include hierarchical data labeling, parallelization, and statistical approximation. Visualizing large datasets such as flow data is also an ongoing challenge, and finding an effective presentation of semantic relevance data extends this problem. To assist with data exploration in this new dimension, we present a graph-based GUI tool designed specifically to expose those augmented flow records that are both organizationally relevant and heavily interacted with. We show how this visualization strategy can highlight records that may be unusual, such as strong relationships that are not mission relevant, and what they can add to overall network situational awareness.
Flow data has proven useful as a lightweight, informative data source across a broad spectrum of applications, from identifying anomalous behavior to clustering users and systems based on network communication patterns. As the mining of flow data continues, however, the community has identified applications in which network traffic summaries must be augmented with additional information to meet a specific challenge. We consider the problem of automating the identification of mission-centric organizational relationships. We also present an open-source distribution of our modeling engine and visualization tool.
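The selection criterion described for the visualization (records that are both organizationally relevant and heavily interacted with, plus strong relationships that are not mission relevant) can be illustrated with a minimal sketch. This is not the authors' tool or modeling engine: the `Edge` record, the `relevance` score in [0, 1], and both thresholds are hypothetical names and values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One host-to-host relationship summarized from flow records (hypothetical schema)."""
    src: str
    dst: str
    flow_count: int   # how heavily the two endpoints interact
    relevance: float  # assumed semantic relevance score in [0, 1]


def classify(edge: Edge, min_flows: int = 100, min_relevance: float = 0.5) -> str:
    """Label an edge for display; the thresholds are illustrative assumptions."""
    heavy = edge.flow_count >= min_flows
    relevant = edge.relevance >= min_relevance
    if heavy and relevant:
        return "highlight"   # relevant and heavily interacted with
    if heavy:
        return "unusual"     # strong relationship that is not mission relevant
    return "background"      # weak relationship, not emphasized


edges = [
    Edge("10.0.0.5", "10.0.0.9", flow_count=500, relevance=0.9),
    Edge("10.0.0.5", "203.0.113.7", flow_count=800, relevance=0.1),
    Edge("10.0.0.8", "10.0.0.9", flow_count=3, relevance=0.7),
]
labels = {(e.src, e.dst): classify(e) for e in edges}
```

Under these assumptions, the second edge is the kind of record the visualization is meant to surface as unusual: heavy traffic with a low relevance score.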