Assessing Disclosure Risk in Anonymized Datasets

White Paper
In this paper, the authors propose a framework for estimating disclosure risk using conditional entropy between the original and the anonymized datasets.
Publisher

Software Engineering Institute

Abstract

Sharing log data is a valuable step toward improving network security. However, logs often contain sensitive information, and organizations are hesitant to share them. Anonymization methods increase protection by lowering the disclosure risk to a level considered safe. Accordingly, a metric for anonymity is needed to quantitatively assess the risk before log data is released. In this paper, we propose a general framework for estimating disclosure risk using conditional entropy between the original and the anonymized datasets. We demonstrate our approach using network log files.
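
To make the central quantity concrete, the following is a minimal sketch of a textbook plug-in estimate of the conditional entropy H(original | anonymized) over paired field values. It is not the authors' framework, which this page does not describe in detail. The function name `conditional_entropy`, the sample IP addresses, and the two toy anonymization schemes are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def conditional_entropy(original, anonymized):
    """Empirical H(original | anonymized) in bits.

    Hypothetical helper: `original[i]` and `anonymized[i]` are assumed to be
    the same field of the same log record before and after anonymization.
    """
    if len(original) != len(anonymized):
        raise ValueError("sequences must pair up record by record")
    n = len(original)
    if n == 0:
        return 0.0

    joint = Counter(zip(original, anonymized))  # counts of (x, y) pairs
    marginal = Counter(anonymized)              # counts of y alone

    h = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2(p(y) / p(x, y)), written in terms of raw counts
        h += (count / n) * math.log2(marginal[y] / count)
    return h


# Illustrative data only (assumed, not taken from the paper).
ips = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]

# Toy scheme 1: truncate every address to its /24 prefix.
truncated = ["10.0.0.0"] * 4

# Toy scheme 2: a one-to-one pseudonym per address.
pseudonyms = ["a", "b", "c", "d"]

print(conditional_entropy(ips, truncated))   # 2.0
print(conditional_entropy(ips, pseudonyms))  # 0.0
```

Under this reading, a higher value means the anonymized field leaves more uncertainty about the original. The 0.0 for the pseudonym scheme reflects only that the mapping is deterministic and invertible given the paired data; it does not model what an adversary actually knows, which a full risk framework would need to address.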

Part of a Collection

FloCon 2008 Collection

This content was created for a conference series or symposium and does not necessarily reflect the positions and views of the Software Engineering Institute.