Topic: A Novel Approach for Generating Synthetic Datasets for Digital Forensics
by Thomas Göbel
D19/2.03a, October 17, 2019 (Thursday), 12.00 noon
Keywords — Synthetic dataset generation, network traffic, operating system data
Increases in the quantity and complexity of digital evidence necessitate the development and application of advanced, accurate and efficient digital
forensic tools. Digital forensic tool testing helps assure the veracity of digital evidence, but it requires appropriate validation datasets.
These datasets are crucial to evaluating reproducibility and improving the state of the art. Datasets can be real-world or synthetic. While
real-world datasets have the advantage of relevance, the interpretation of results can be difficult because reliable ground truth may not exist.
In contrast, ground truth is easily established for synthetic datasets. This chapter presents the hystck framework for generating synthetic
datasets with ground truth. The framework supports the automated generation of synthetic network traffic as well as operating system and application
artifacts by simulating human-computer interactions. The generated data can be indistinguishable from data produced by normal
human-computer interactions. The modular structure of the framework enhances the ability to incorporate extensions that simulate new applications
and generate new types of network traffic.
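
To make the idea of a modular generator with built-in ground truth more concrete, the following is a minimal illustrative sketch in Python. It is not the hystck API: the names ACTIONS, register, browse, send_mail and Scenario are hypothetical, and the "simulators" only return a text description instead of driving a real application. The sketch shows only the general pattern, in which each simulated interaction is a pluggable module and every executed step is recorded so that the ground truth is known by construction.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical registry mapping an action name to a simulator function.
# A new application is supported by registering one more function.
ACTIONS: Dict[str, Callable[[dict], str]] = {}


def register(name: str):
    """Decorator that adds a simulator function to the registry."""
    def wrap(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        ACTIONS[name] = fn
        return fn
    return wrap


@register("browse")
def browse(params: dict) -> str:
    # Placeholder: a real module would drive a browser inside a guest system.
    return f"HTTP GET {params['url']}"


@register("send_mail")
def send_mail(params: dict) -> str:
    # Placeholder: a real module would operate a mail client.
    return f"SMTP message to {params['to']}"


@dataclass
class Scenario:
    """Runs a list of steps and records each one as ground truth."""
    ground_truth: List[dict] = field(default_factory=list)

    def run(self, steps: List[Tuple[str, dict]]) -> List[dict]:
        for index, (name, params) in enumerate(steps):
            artifact = ACTIONS[name](params)
            self.ground_truth.append({
                "step": index,
                "action": name,
                "params": params,
                "expected_artifact": artifact,
            })
        return self.ground_truth


scenario = Scenario()
log = scenario.run([
    ("browse", {"url": "http://example.org/"}),
    ("send_mail", {"to": "alice@example.org"}),
])
print(json.dumps(log, indent=2))
```

In this sketch the ground-truth log is written by the same code that triggers each action, which is the property that makes synthetic datasets easier to evaluate against than real-world captures.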