
Alchemy: Stochastic Data Augmentation for Malicious Network Traffic Detection

August 2020 Presentation
Bo Hu (NTT Group)

This presentation introduces a stochastic method called Alchemy that regenerates a set of feature vectors by randomly resampling the raw traffic data of each bag into several subsets.
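The resampling idea in the abstract can be illustrated with a minimal Python sketch. This is not the Alchemy implementation. The record schema (`bytes`, `dst_port`), the three features, and the helpers `bag_features` and `resample_bag` are hypothetical. The sketch also assumes that each subset is drawn independently by sampling a fixed fraction of the bag's raw records without replacement.

```python
import random
from statistics import mean
from typing import Dict, List

# Hypothetical schema: one raw traffic record in a bag.
Record = Dict[str, int]


def bag_features(records: List[Record]) -> List[float]:
    """Hypothetical feature vector for a set of raw records:
    [record count, mean bytes per record, distinct destination ports]."""
    return [
        float(len(records)),
        float(mean(r["bytes"] for r in records)),
        float(len({r["dst_port"] for r in records})),
    ]


def resample_bag(bag: List[Record], n_subsets: int, fraction: float,
                 seed: int = 0) -> List[List[float]]:
    """Draw n_subsets random subsets of a bag's raw records and
    compute one feature vector from each subset."""
    rng = random.Random(seed)  # seeded for reproducible subsets
    size = max(1, int(len(bag) * fraction))  # records per subset
    vectors = []
    for _ in range(n_subsets):
        subset = rng.sample(bag, size)  # sample without replacement
        vectors.append(bag_features(subset))
    return vectors
```

For example, a bag of 10 records resampled with `n_subsets=5` and `fraction=0.5` yields five feature vectors, each computed from 5 randomly chosen records. Because the features are recomputed from raw records rather than derived from an existing feature vector, one bag contributes several training examples that vary the way features computed from partial traffic would.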





Malware and botnets are abused for various types of cybercrime, such as data exfiltration, distributed denial of service (DDoS), and, more recently, data ransom. Existing signature-based network security techniques are designed to detect predefined, rule-based traffic patterns. However, because malware and botnets evolve continuously, these techniques struggle to defend against the growing variety and volume of these threats. Machine learning has become a promising alternative approach to network security. Many previous studies have aggregated traffic data into groups by host or flow, generated features from each group, and trained detection models on those features.
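Aggregating traffic into per-host groups, as the studies mentioned above do, might look like the following minimal sketch. The flow tuple layout `(src_host, dst_port, bytes)`, the helper names `group_by_host` and `host_features`, and the three features are illustrative assumptions, not a specific published feature set.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical flat flow log entry: (src_host, dst_port, bytes).
Flow = Tuple[str, int, int]


def group_by_host(flows: List[Flow]) -> Dict[str, List[Flow]]:
    """Aggregate flows into one group (bag) per source host."""
    bags: Dict[str, List[Flow]] = defaultdict(list)
    for flow in flows:
        bags[flow[0]].append(flow)
    return dict(bags)


def host_features(bag: List[Flow]) -> List[float]:
    """Hypothetical per-host features:
    [flow count, total bytes, distinct destination ports]."""
    return [
        float(len(bag)),
        float(sum(f[2] for f in bag)),
        float(len({f[1] for f in bag})),
    ]
```

Each host's feature vector would then become one training example for a detection model; grouping by flow instead of host only changes the key used in `group_by_host`.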

However, two problems degrade detection performance. The first is the scarcity of training data, because new types of malicious traffic are rare. The second is variation in feature values: when only a limited amount of traffic is observed, features are computed from incomplete data. Existing solutions augment the data to make detection models more robust to these problems. Unfortunately, the feature vectors they generate may not represent the nature of the traffic well, because most of these solutions synthesize new feature vectors solely from existing feature vectors, without considering the real distribution of the raw traffic.