Optimizing Seed Selection for Fuzzing

August 20, 2014 • Conference Paper

By

Alexandre Rebert (Carnegie Mellon University and ForAllSecure, Inc.), Sang Kil Cha (Carnegie Mellon University), Thanassis Avgerinos (Carnegie Mellon University), Jonathan Foote, David Warren, Gustavo Grieco (CIFASIS-CONICET), and David Brumley (Carnegie Mellon University)

In this paper, we focus on how to mathematically formulate and reason about one critical aspect in fuzzing: how best to pick seed files to maximize the total number of bugs found during a fuzz campaign.

Publisher

Software Engineering Institute

Abstract

This conference paper appears in the Proceedings of the 23rd USENIX Security Symposium (SEC'14), pp. 861-875.

Randomly mutating well-formed program inputs, or simply fuzzing, is a highly effective and widely used strategy for finding bugs in software. Beyond showing that fuzzers find bugs, there has been little systematic effort to understand the science of how to fuzz properly. In this paper, we focus on how to mathematically formulate and reason about one critical aspect of fuzzing: how best to pick seed files to maximize the total number of bugs found during a fuzz campaign. We design and evaluate six different algorithms using over 650 CPU days on Amazon Elastic Compute Cloud (EC2) to provide ground truth data. Overall, we find 240 bugs in 8 applications and show that the choice of algorithm can greatly increase the number of bugs found. We also show that current seed selection strategies as found in Peach may fare no better than picking seeds at random. We make our data set and code publicly available.
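To make the seed selection problem concrete, the sketch below shows one common way to reduce a seed corpus: a greedy set-cover heuristic over code coverage. This is an illustrative assumption and is not taken from the abstract, which does not name the six algorithms. The function name `greedy_seed_selection`, the file names, and the coverage map (seed name to a set of covered block IDs) are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_seed_selection(coverage: Dict[str, Set[int]]) -> List[str]:
    """Greedily pick seeds until the selected set covers every block
    that the full corpus covers (a standard set-cover approximation)."""
    # Everything any seed covers; empty if the corpus is empty.
    uncovered: Set[int] = set().union(*coverage.values())
    selected: List[str] = []
    while uncovered:
        # Iterate names in sorted order so ties break deterministically.
        best = max(sorted(coverage), key=lambda name: len(coverage[name] & uncovered))
        gained = coverage[best] & uncovered
        if not gained:
            break  # defensive: no seed adds new coverage
        selected.append(best)
        uncovered -= gained
    return selected


if __name__ == "__main__":
    # Hypothetical coverage data: seed file -> IDs of basic blocks it exercises.
    corpus = {
        "a.pdf": {1, 2, 3},
        "b.pdf": {2, 3},
        "c.pdf": {4},
    }
    print(greedy_seed_selection(corpus))
```

In this toy corpus, `b.pdf` covers nothing that `a.pdf` does not, so a coverage-based reduction would drop it. Whether such a reduced set actually finds more bugs than a random one is the kind of question the paper evaluates empirically.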
