Topic: Duplicate Detection in Face Image Datasets
by Torsten Schlett
online Big Blue Button Room: D19/2.03a, July 20, 2023 (Thursday), 12.00 noon
Keywords — Face Recognition, Face Image Quality Assessment, Image Duplicate Detection, Dataset Cleaning, Error vs Discard Characteristic
Various face image datasets used for face recognition research were assembled by scraping images from the web. While this enables the datasets to include a large variety of real images, near-identical or even exactly identical images can be collected by accident.
Exact duplicates can be detected efficiently with exact hash functions, since identical image data always yields identical hash values. Detecting images with nearly equivalent content but, for example, different resolutions is less straightforward. In this talk, a simple near-identical duplicate detection approach using two image-hash functions is applied to a selection of face image datasets, and the effect of removing those duplicates on face recognition and face image quality assessment is examined.
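As a rough illustration of the two kinds of detection described above, the following Python sketch groups exact duplicates by SHA-256 digest and flags near-identical pairs with two perceptual hashes. The choice of average hash and difference hash, the 8x8 hash size, the max_distance threshold, and all function names are assumptions made for this example; the abstract does not state which two image-hash functions the talk uses. Images are represented as nested lists of grayscale values so that the sketch runs without third-party libraries.

```python
import hashlib
from collections import defaultdict


def exact_duplicate_groups(files):
    """Group file names whose raw bytes are identical (SHA-256 digest match).

    `files` maps a name to its byte content (hypothetical input format).
    """
    groups = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def resize_box(pixels, out_w, out_h):
    """Downscale a grayscale image (list of rows) by averaging pixel blocks."""
    in_h, in_w = len(pixels), len(pixels[0])
    out = []
    for oy in range(out_h):
        y0 = oy * in_h // out_h
        y1 = max((oy + 1) * in_h // out_h, y0 + 1)
        row = []
        for ox in range(out_w):
            x0 = ox * in_w // out_w
            x1 = max((ox + 1) * in_w // out_w, x0 + 1)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels, size=8):
    """One bit per cell: is the cell brighter than the mean of all cells?"""
    flat = [v for row in resize_box(pixels, size, size) for v in row]
    mean = sum(flat) / len(flat)
    return tuple(v > mean for v in flat)


def difference_hash(pixels, size=8):
    """One bit per cell: is the cell brighter than its right-hand neighbour?"""
    small = resize_box(pixels, size + 1, size)
    return tuple(row[x] > row[x + 1] for row in small for x in range(size))


def hamming(a, b):
    """Number of differing bits between two equal-length hashes."""
    return sum(x != y for x, y in zip(a, b))


def is_near_duplicate(img_a, img_b, max_distance=4):
    """Flag a pair only if BOTH hashes differ by at most max_distance bits.

    The threshold is an illustrative assumption, not a value from the talk.
    """
    return (
        hamming(average_hash(img_a), average_hash(img_b)) <= max_distance
        and hamming(difference_hash(img_a), difference_hash(img_b)) <= max_distance
    )
```

Requiring both hashes to agree is one conservative way to combine two hash functions: it lowers the chance of flagging different images as duplicates, at the cost of missing some true near-duplicates. Because both hashes are computed on a small downscaled version of the image, a copy at a different resolution tends to produce the same or a very similar hash, which an exact hash over the file bytes would not.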