
The Generation and Use of TLS Fingerprints

January 2019 Presentation
Blake Anderson (Cisco Systems, Inc.), David McGrew (Cisco Systems, Inc.), Keith Schomburg (Cisco Systems, Inc.)

In this presentation, the authors describe a TLS fingerprinting system, discuss common pitfalls in using this type of information, and analyze techniques that make effective use of their newly open-sourced TLS fingerprint database.

Publisher:

Cisco Systems, Inc.

Abstract

There are many TLS implementations in use by different applications and operating systems, each of which evolves as the protocol does. TLS fingerprints offer a way to identify client implementations from passive observations of sessions, and thus to make valuable inferences about the applications, libraries, and operating systems in use. Doing so reliably, however, requires a complete and regularly updated database of TLS fingerprints, accurate models of the prevalence of and relationships between libraries and processes, and a fingerprint definition that accommodates GREASE and admits a similarity measure. In this presentation, we describe a TLS fingerprinting system that meets these requirements. By fusing detailed network flow data and managed endpoint telemetry from an enterprise network, we have developed the first large-scale system to generate a TLS fingerprint database automatically and continuously. Additionally, our fingerprints naturally capture the intricacies of the information a TLS fingerprint conveys, i.e., each fingerprint is associated with a list of application names, hashes, and version numbers observed using the specified ClientHello parameters, sorted by their empirical prevalence. The fingerprint database is open-source and regularly updated. After the first month of data collection, we had generated nearly 1,000 unique TLS fingerprints that provide coverage for nearly 5,000 unique processes.
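
As an illustration of what "accommodates GREASE" can mean in practice, the following is a minimal sketch, not the authors' actual fingerprint format. It assumes a fingerprint is simply a tuple of the ClientHello version, cipher suite list, and extension type list, and that every GREASE code point (the 16 reserved values 0x0A0A, 0x1A1A, ..., 0xFAFA defined in RFC 8701) is replaced by a single placeholder so that randomized GREASE values do not produce distinct fingerprints. The names `is_grease`, `fingerprint`, and `FingerprintDatabase` are hypothetical, as is the in-memory layout of the database.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

GREASE_PLACEHOLDER = 0x0A0A  # assumption: one canonical value stands in for all GREASE values

Fingerprint = Tuple[int, Tuple[int, ...], Tuple[int, ...]]


def is_grease(value: int) -> bool:
    """True for the reserved GREASE code points of RFC 8701 (0x0A0A ... 0xFAFA)."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def normalize(values: Iterable[int]) -> Tuple[int, ...]:
    """Replace every GREASE value with the placeholder, preserving order."""
    return tuple(GREASE_PLACEHOLDER if is_grease(v) else v for v in values)


def fingerprint(version: int, cipher_suites: Iterable[int],
                extensions: Iterable[int]) -> Fingerprint:
    """Build a GREASE-insensitive fingerprint from ClientHello parameters."""
    return (version, normalize(cipher_suites), normalize(extensions))


class FingerprintDatabase:
    """Hypothetical store mapping each fingerprint to observed process names."""

    def __init__(self) -> None:
        self._counts: Dict[Fingerprint, Counter] = defaultdict(Counter)

    def observe(self, fp: Fingerprint, process: str) -> None:
        """Record one session in which `process` used fingerprint `fp`."""
        self._counts[fp][process] += 1

    def lookup(self, fp: Fingerprint) -> List[Tuple[str, int]]:
        """Return processes for `fp`, sorted by empirical prevalence."""
        if fp not in self._counts:
            return []
        return self._counts[fp].most_common()
```

Two ClientHellos from the same client that differ only in their GREASE values then map to the same key, and `lookup` returns the associated process names with the most frequently observed first.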

Additionally, we will present an analysis of TLS fingerprints in the wild, which our fingerprint database makes possible. First, we analyze the stability of the fingerprint database, i.e., the rate at which the environment introduces new TLS fingerprints and the database’s attribution efficacy over time. When our system observes a TLS session in the wild and the database lacks attribution information for that session, we return a set of the closest known fingerprints by using a similarity metric over the space of TLS fingerprints. We leverage our longitudinal data to quantify the effectiveness of this approach. Next, we analyze cases where the TLS fingerprint provides application attribution versus library attribution, and relatedly, the set of fingerprints that uniquely identifies a single application versus a set of applications. Finally, we will analyze a graph derived from the fingerprint database that highlights the evolutionary relationships between TLS fingerprints and different application versions.
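
The fallback to "closest known fingerprints" can be sketched as a nearest-neighbor search. The abstract does not say which similarity metric is used, so the example below assumes Levenshtein edit distance over a fingerprint represented as a flat sequence of integer code points. The function names `edit_distance` and `closest` and the dictionary-based database are hypothetical.

```python
from typing import Dict, List, Sequence


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance between two sequences of code points."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute x with y (free if equal)
            ))
        prev = curr
    return prev[-1]


def closest(query: Sequence[int],
            known: Dict[str, Sequence[int]],
            k: int = 3) -> List[str]:
    """Return labels of the k known fingerprints nearest to `query`."""
    ranked = sorted(known, key=lambda label: edit_distance(query, known[label]))
    return ranked[:k]
```

With this choice, a client that adds or drops a single cipher suite in a new release stays at distance 1 from its previous fingerprint, which is the kind of near match the approximate lookup is meant to surface. Other metrics, such as set-based overlap, would ignore ordering and could rank candidates differently.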