Software Engineering Institute | Carnegie Mellon University

Digital Library


Graph Based Role Mining Techniques for Cyber Security

  • Abstract

    Mathematical methods of role-mining on graphs have found applications across several types of communication networks (e.g., social media), where the communications are modeled as a graph and features of the graph's structure, such as node degree, are used to group the nodes into roles defined by these feature-based characteristics. In this talk, Kiri proposes tailoring existing role-mining techniques to enterprise networks, where the network graph is derived from NetFlow data captured by the enterprise. More specifically, nodes on the graph represent IPs, while an edge between two nodes represents the existence of a flow record in which one node is the source IP and the other is the destination IP. This approach allows for the possibility of a directed graph. Additionally, weights can be added to the edges to represent, for example, the number of bytes transmitted or the duration of the flow. When role-mining a NetFlow graph, we can go beyond graph properties when compiling a feature set for each node and also incorporate behavioral information from the NetFlow records that is not otherwise included on the graph. Examples include the median duration of all flows in which the IP participates, the total number of packets transmitted to and from the IP, and the number of different ports the IP talks to. We theorize that tracking the distribution of users into roles over time will allow the detection of service outages and cyber attacks, and will also allow enterprises to monitor the resiliency of their network. To do this type of tracking, we aim to define meaningful roles by which each node on the network can be classified.
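
As a rough illustration of the graph construction and per-node feature set described above, the following minimal Python sketch builds a directed, byte-weighted edge map from a handful of flow records and derives a few structural and behavioral features per IP. The record layout, the sample values, and the names `FLOWS`, `build_edges`, and `node_features` are assumptions made for this example and are not taken from the talk; "ports the IP talks to" is interpreted here as the distinct destination ports an IP contacts as a source.

```python
from collections import defaultdict
from statistics import median

# Hypothetical flow records (illustrative values, not real data):
# (src_ip, dst_ip, dst_port, bytes, packets, duration_seconds)
FLOWS = [
    ("10.0.0.1", "10.0.0.9", 443, 1200, 10, 0.5),
    ("10.0.0.2", "10.0.0.9", 443, 800, 8, 1.5),
    ("10.0.0.1", "10.0.0.9", 22, 300, 4, 2.5),
    ("10.0.0.9", "10.0.0.3", 53, 100, 1, 0.1),
]


def build_edges(flows):
    """Directed edges (src, dst), weighted by total bytes over all flows."""
    edges = defaultdict(int)
    for src, dst, _port, nbytes, _packets, _duration in flows:
        edges[(src, dst)] += nbytes
    return dict(edges)


def node_features(flows):
    """Per-IP features that mix graph structure with flow behavior."""
    successors = defaultdict(set)    # IPs this node sends to
    predecessors = defaultdict(set)  # IPs that send to this node
    durations = defaultdict(list)    # durations of flows the IP takes part in
    packets = defaultdict(int)       # packets to and from the IP
    ports = defaultdict(set)         # destination ports contacted as source

    for src, dst, port, _nbytes, npackets, duration in flows:
        successors[src].add(dst)
        predecessors[dst].add(src)
        ports[src].add(port)
        # A set avoids double counting a flow whose source equals its destination.
        for ip in {src, dst}:
            durations[ip].append(duration)
            packets[ip] += npackets

    return {
        ip: {
            "out_degree": len(successors[ip]),
            "in_degree": len(predecessors[ip]),
            "median_duration": median(durations[ip]),
            "total_packets": packets[ip],
            "distinct_ports": len(ports[ip]),
        }
        for ip in durations
    }
```

A feature table like the one returned by `node_features(FLOWS)` could then be handed to whichever grouping method is used to define roles; the choice of that method is outside the scope of this sketch.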

    There exist multiple types of meaning for potential roles to take on. First, we consider definitions that allow identification of nodes by type of hardware (e.g., server, workstation, router). Variations in the distribution of nodes across roles of this type could provide resiliency indicators. For example, if membership in a certain role goes down and does not quickly bounce back, this could mean that when that type of node goes down, there is no efficient backup plan to replace it or bring it back online. (Note that this example applies more to servers than to, say, workstations.) Alternatively, a set of role definitions that classifies nodes based on criticality (e.g., high, medium, low) could direct cyber security operators to the nodes requiring the most protection. This talk covers the technical details of our custom role-mining approach and presents initial experimental results from testing on multiple data sets, as well as an analysis of their accuracy and applicability to cyber security concerns.
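
The resiliency indicator described above (role membership that drops and does not quickly bounce back) can be sketched in Python as follows. This is only an illustration under stated assumptions: role assignments per time window are taken as given, and the function names `role_counts` and `sustained_drops`, the `drop_fraction` threshold, and the `recovery_windows` parameter are hypothetical choices for this example rather than part of the approach presented in the talk.

```python
from collections import Counter


def role_counts(assignments):
    """Count how many nodes hold each role in one time window.

    `assignments` maps an IP to its role label, e.g. {"10.0.0.9": "server"}.
    """
    return Counter(assignments.values())


def sustained_drops(windows, role, drop_fraction=0.5, recovery_windows=2):
    """Return indices of windows where membership in `role` drops and stays low.

    A window i is flagged when the count falls to at most
    (1 - drop_fraction) times the previous window's count and remains at or
    below that level for the next `recovery_windows` windows.
    Both thresholds are illustrative assumptions.
    """
    counts = [role_counts(w)[role] for w in windows]
    flagged = []
    for i in range(1, len(counts) - recovery_windows):
        baseline = counts[i - 1]
        if baseline == 0:
            continue  # nothing to drop from
        threshold = baseline * (1 - drop_fraction)
        following = counts[i : i + recovery_windows + 1]
        if all(c <= threshold for c in following):
            flagged.append(i)
    return flagged


def make_window(n_servers, n_workstations=3):
    """Build a hypothetical role assignment for one time window."""
    window = {f"srv{j}": "server" for j in range(n_servers)}
    window.update({f"ws{j}": "workstation" for j in range(n_workstations)})
    return window
```

For instance, server counts of 4, 4, 1, 1, 1, 4 across six windows would flag the third window (index 2), while a one-window dip such as 4, 1, 4, 4, 4 would not be flagged because membership recovers immediately.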

Part of a Collection

FloCon 2015 Collection