Vikas Gupta defended his Master's thesis on 'File Detection in Network Traffic Using Approximate Matching'

Virtually every day data breach incidents are reported in the news. Scammers, fraudsters, hackers and malicious insiders are raking in millions with sensitive business and personal information. Not all incidents involve cunning and astute hackers. The involvement of insiders is ever increasing. Data information leakage is a critical issue for many companies, especially nowadays where every employee has an access to high speed internet.
In the past, email was the only gateway to send out information but with the advent of technologies like SaaS (e.g. Dropbox) and other similar services, possible routes have become numerous and complicated to guard for an organisation.

Data is valuable, for legitimate purposes or criminal purposes alike. An intuitive approach to check data leakage is to scan the network traffic for presence of any confidential information transmitted. The existing systems use slew of techniques like keyword matching, regular expression pattern matching, cryptographic algorithms or rolling hashes to prevent data leakage. These techniques are either trivial to evade or suffer with high false alarm rate.

In this thesis, 'known file content' detection in network traffic using approximate matching is presented. It performs content analysis on-the-fly. The approach is protocol agnostic and filetype independent. Compared to existing techniques, proposed approach is straight forward and does not need comprehensive configuration. It is easy to deploy and maintain, as only file fingerprint is required, instead of verbose rules.