On June 30, 2025, Hans Heje successfully defended his MSc thesis entitled "Vision Language Models for Presentation Attack Detection".
Abstract:
Presentation Attack Detection (PAD) remains a critical challenge in biometric security, particularly for face recognition systems, which are vulnerable to presentation attacks. The current state of the art relies solely on the vision transformer and discards the text transformer. This thesis builds on that work by investigating whether integrating soft biometrics, specifically facial attributes, into vision language models improves PAD performance. A range of models and experiments is evaluated: zero-shot PAD, zero-shot PAD using attributes, feature extraction, fine-tuning of vision transformers, fine-tuning of vision and text transformers with attributes, guided fine-tuning using attributes, and model fusion. To boost performance, facial attributes are incorporated into the text encoder, and the foundation models' weights are based on FaRL, which is pretrained on significantly less data than the foundation models used in the state of the art. LoRA layers are added during training to facilitate efficient fine-tuning. The results demonstrate that combining facial attributes with FaRL can potentially enhance the robustness and adaptability of PAD models. The greatest improvements are observed in zero-shot scenarios, with an increase in AUC of 3.3 percentage points and a decrease in HTER of 4.6 percentage points. These findings highlight the potential of integrating soft biometrics with domain-specific foundation models to strengthen biometric security.
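For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal sketch of how AUC (area under the ROC curve) and HTER (Half Total Error Rate, the mean of the false acceptance and false rejection rates at a fixed threshold) are commonly computed. It is not taken from the thesis: the function names, the score convention (higher means more likely bona fide), and the example threshold are assumptions made for illustration, and the thesis may select its threshold differently.

```python
def split_by_class(scores, labels):
    """Split scores into attack (label 0) and bona fide (label 1) lists."""
    attacks = [s for s, y in zip(scores, labels) if y == 0]
    bona_fide = [s for s, y in zip(scores, labels) if y == 1]
    if not attacks or not bona_fide:
        raise ValueError("need at least one sample of each class")
    return attacks, bona_fide


def hter(scores, labels, threshold):
    """Half Total Error Rate at a fixed threshold.

    Assumed convention: a sample is accepted as bona fide
    when its score is greater than or equal to the threshold.
    """
    attacks, bona_fide = split_by_class(scores, labels)
    # False acceptance rate: attacks wrongly accepted as bona fide.
    far = sum(s >= threshold for s in attacks) / len(attacks)
    # False rejection rate: bona fide samples wrongly rejected.
    frr = sum(s < threshold for s in bona_fide) / len(bona_fide)
    return (far + frr) / 2


def auc(scores, labels):
    """AUC as the probability that a bona fide sample outscores an attack.

    Ties count as half a win. This pairwise form is quadratic in the
    number of samples and is meant for illustration only.
    """
    attacks, bona_fide = split_by_class(scores, labels)
    wins = sum((b > a) + 0.5 * (b == a) for b in bona_fide for a in attacks)
    return wins / (len(bona_fide) * len(attacks))


# Hypothetical scores, not results from the thesis.
example_scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
example_labels = [1, 1, 1, 0, 0, 0]
example_hter = hter(example_scores, example_labels, threshold=0.5)
example_auc = auc(example_scores, example_labels)
```

Both metrics lie between 0 and 1, so a change of "3.3 percentage points" in AUC corresponds to a difference of 0.033 on this scale. Lower HTER and higher AUC indicate better detection.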