Software Engineering Institute | Carnegie Mellon University

Digital Library

  • January 2017
  • Nabu is a tool based on the work of NetSimile used for parsing, constructing, and comparing the structural graphs of a large collection of PDF documents.
  • Malware Analysis
  • Publisher: GitHub
  • Abstract

    This tool grew out of PDFrankenstein and now also stores JavaScript in the PDF database. A typical Nabu workflow consists of three steps:

    1. Building the Database: Nabu parses the specified PDFs and builds a graph database from them. (The PDFs are listed by full path, one per line, in a text file.)

    2. Scoring the Database: A list of files is provided to score the graphs for similarity. If the files are not present in the graph database, they are added. Nabu outputs the list in CSV format: subject, family, candidate, score.

    3. Drawing Clusters: Working from the graph database, Nabu draws dendrogram clusters. It uses scipy and matplotlib to draw a dendrogram of the set of PDFs based on their similarity scores. It currently uses the Canberra distance metric.
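
The scoring step and the Canberra metric can be illustrated with a minimal sketch. This is not Nabu's code: the feature vectors, the file names, the family label, and the helper names `canberra`, `score_rows`, and `to_csv` are all hypothetical. It also assumes that the score is the Canberra distance itself, so lower means more similar. Only the CSV column order (subject, family, candidate, score) is taken from the description above.

```python
import csv
import io


def canberra(u, v):
    """Canberra distance: sum of |a - b| / (|a| + |b|) over paired entries."""
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom:  # a 0/0 term contributes 0, the convention SciPy also uses
            total += abs(a - b) / denom
    return total


# Hypothetical per-PDF feature vectors (made-up values, for illustration only).
features = {
    "a.pdf": [1.0, 2.0, 3.0],
    "b.pdf": [1.0, 2.0, 3.0],
    "c.pdf": [3.0, 2.0, 1.0],
}


def score_rows(subject, family, features):
    """Score every other PDF against the subject, closest first."""
    rows = []
    for candidate, vec in features.items():
        if candidate != subject:
            score = canberra(features[subject], vec)
            rows.append((subject, family, candidate, score))
    return sorted(rows, key=lambda row: row[3])


def to_csv(rows):
    """Render rows in the column order subject, family, candidate, score."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["subject", "family", "candidate", "score"])
    for subject, family, candidate, score in rows:
        writer.writerow([subject, family, candidate, f"{score:.3f}"])
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(score_rows("a.pdf", "demo-family", features)))
```

For the drawing step, pairwise distances like these could be computed with `scipy.spatial.distance.pdist(..., metric="canberra")`, passed to `scipy.cluster.hierarchy.linkage`, and rendered with `scipy.cluster.hierarchy.dendrogram` on a matplotlib figure. Whether Nabu chains these calls in exactly this way is not stated here.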

Software Information

Published by GitHub
