Qingwen Zeng successfully defended his Master’s thesis titled “Presentation Attack Detection on ID Documents Using Multimodal Model.” The thesis is a joint effort between h-da (Germany) and DTU (Denmark).
Abstract:
Presentation Attack Detection (PAD) for ID documents is hindered by domain shifts and limited real training data. This work evaluates a compact multimodal framework based on the SmolVLM2 model, which jointly processes visual and textual information. The model is tested under zero-shot inference and under two fine-tuning strategies (generative and discriminative), both using a fixed task prompt. Deep learning and unimodal models serve as baselines. Experiments are conducted on genuine ID datasets (Chile, Mexico) and synthetic passport datasets (Poland, Portugal, Spain), following the ISO/IEC 30107-3 standard. Results show that zero-shot multimodal inference fails and that supervised adaptation is essential for reliable PAD. The generative fine-tuning strategy achieves the most stable performance on genuine IDs and improves cross-country robustness. However, performance on synthetic data remains inconsistent, indicating that current synthetic benchmarks do not fully capture real-world PAD challenges.
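For readers unfamiliar with ISO/IEC 30107-3 evaluation, the following is a minimal sketch of the two error rates the standard defines: APCER (the share of attack presentations accepted as bona fide) and BPCER (the share of bona fide presentations rejected as attacks). The abstract does not state which metrics or thresholds the thesis reports, so the function name `pad_error_rates`, the score convention (higher means more attack-like), and the default threshold are assumptions made for illustration only. The standard also reports APCER per attack instrument species, which this sketch omits.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PadErrorRates:
    apcer: float  # attack presentations wrongly classified as bona fide
    bpcer: float  # bona fide presentations wrongly classified as attacks


def pad_error_rates(
    scores: Sequence[float],
    labels: Sequence[int],
    threshold: float = 0.5,  # assumed operating point, not from the thesis
) -> PadErrorRates:
    """Compute APCER and BPCER at a fixed threshold.

    Assumed convention: a score >= threshold means "attack";
    label 1 marks an attack presentation, label 0 a bona fide one.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    attack_scores = [s for s, y in zip(scores, labels) if y == 1]
    bona_fide_scores = [s for s, y in zip(scores, labels) if y == 0]
    if not attack_scores or not bona_fide_scores:
        raise ValueError("need at least one attack and one bona fide sample")

    # Attacks scoring below the threshold slip through as bona fide.
    apcer = sum(s < threshold for s in attack_scores) / len(attack_scores)
    # Bona fide samples at or above the threshold are falsely rejected.
    bpcer = sum(s >= threshold for s in bona_fide_scores) / len(bona_fide_scores)
    return PadErrorRates(apcer=apcer, bpcer=bpcer)


# Toy data for illustration only; these are not results from the thesis.
example_scores = [0.9, 0.2, 0.8, 0.6, 0.1, 0.7, 0.3, 0.4]
example_labels = [1, 1, 1, 1, 0, 0, 0, 0]
example_rates = pad_error_rates(example_scores, example_labels)
```

Sweeping `threshold` over the score range and recording both rates at each step gives the trade-off curve from which an operating point is usually chosen.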
