Facial recognition systems have become pivotal in various security and identification applications due to their reliability and non-intrusiveness. However, the performance of these systems relies heavily on the quality of the captured biometric sample. A critical concern is the potential for bias across demographic groups defined by attributes such as ethnicity, gender, and age.
This thesis aims to evaluate the performance of several quality components of the Open Source Face Image Quality (OFIQ) framework across demographic groups to identify any potential biases. OFIQ, developed by the Federal Office for Information Security (BSI), is the reference implementation for the international standard ISO/IEC DIS 29794-5. This standard aims to provide a consistent method for assessing facial image quality. OFIQ evaluates facial images using 34 different quality assessment components. Each of OFIQ’s quality measures outputs a quality score in the range of 0 to 100, with a higher score indicating better quality.
The thesis examines the consistency and fairness of these quality components across demographic variables. For this purpose, two datasets, VGG-Face2 and Balanced Faces in the Wild (BFW), were selected for their broad demographic coverage. The work is limited to a subset of OFIQ quality components: luminance mean, luminance variance, under-exposure prevention, over-exposure prevention, natural color, and unified quality score.
Key findings reveal significant performance differences for these quality components across demographic groups. For ethnicity, the African American group exhibited very high discard rates in luminance mean and luminance variance, with discard gaps of up to 54 % and 17 %, respectively, indicating clear biases. The unified quality score, which is based on a model from MagFace, showed the worst performance for the East Asian group, with a mean discard gap of 10.03 % in the critical 0-50 scalar range. This suggests that biases in image quality assessment do not originate solely in OFIQ but also reflect biases inherent in the underlying MagFace model.
In terms of gender, males were assigned higher quality scores than females, especially for luminance mean, with discard percentage gaps of up to 25 %. For the unified quality score, males also scored better than females, with a maximum discard percentage difference of 11.6 %, indicating a moderate bias against females in these assessments.
For age, the senior group (age above 61) performed best with regard to the unified quality score, followed by the middle-aged group (aged between 26 and 60), and then the young group (aged 25 and under). The maximum difference in discard percentage between the young and the senior group was 13.68 %, indicating a significant bias against the young group.
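
The discard percentages and gaps reported above are not formally defined in this summary. As an illustration only, the following minimal sketch assumes that an image is discarded when its quality score falls below a chosen threshold, and that a group's discard gap is the difference, in percentage points, between its discard percentage and that of the group with the lowest discard percentage. The function names `discard_rates` and `discard_gaps`, the threshold, and the sample scores are hypothetical; they are not taken from the thesis or from OFIQ.

```python
def discard_rates(scores_by_group, threshold):
    """Return, per group, the percentage of scores below the threshold.

    Assumption: a sample is "discarded" when its score is strictly
    below the threshold.
    """
    rates = {}
    for group, scores in scores_by_group.items():
        discarded = sum(1 for s in scores if s < threshold)
        rates[group] = 100.0 * discarded / len(scores)
    return rates


def discard_gaps(scores_by_group, threshold):
    """Return, per group, the gap to the lowest discard percentage.

    Assumption: the gap is measured in percentage points relative to
    the best-performing group at this threshold.
    """
    rates = discard_rates(scores_by_group, threshold)
    best = min(rates.values())
    return {group: rate - best for group, rate in rates.items()}


# Made-up quality scores in the 0-100 range for two hypothetical groups.
example = {
    "group_a": [20, 40, 60, 80],
    "group_b": [55, 65, 75, 85],
}
rates = discard_rates(example, threshold=50)
gaps = discard_gaps(example, threshold=50)
print(rates)
print(gaps)
```

Under these assumptions, sweeping the threshold over a score range such as 0 to 50 and averaging the resulting gaps would give a mean gap per group. Whether this matches the exact procedure used in the thesis is not stated here.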