da/sec scientific talk on Biometrics

Topic: Speaker Verification using i-Vectors

by Andreas Nautsch
FBI D14/3.03, November 13, 2014 (Thursday), 12.00 noon

Keywords: speaker recognition, i-vectors, statistical framework, duration, score normalization, fusion

Short Abstract (translated from German)

This master's thesis examined the still very young identity vector (i-vector) approach to biometric authentication by voice, i.e. speaker verification, with respect to speech duration effects. Two settings were considered: an industry-oriented scenario (3-5 s of speech, German digit strings 0-9) and an international research evaluation run by the US National Institute of Standards and Technology (NIST; text-independent, variable speech durations, multilingual). In the thesis, the performance of existing industry approaches was improved by more than 40% through system fusion at score level. In addition, an existing score normalization method was extended so that duration-based variations were attenuated: performance improved by 19% relative to the NIST baseline system, a result that was presented to the community in June 2014 at ISCA Odyssey 2014: The Speaker and Language Recognition Workshop in Joensuu, Finland.

Abstract

Speaker verification is becoming increasingly important as a biometric security solution in industrial, forensic, and governmental contexts. Telephone-based authentication concepts that serve data privacy purposes are growing in popularity, e.g., data encryption on mobile devices or user validation in contact centers. Furthermore, forensic speech analysis is relevant in, e.g., lawsuits where the origin of recorded cries for help is decisive in distinguishing between self-defence and homicide.
Current research emphasises text-independent scenarios, which, e.g., verify speakers on randomised pass-phrases of short duration, and the analysis of duration-variant speech samples ranging from one second to many minutes. Speaker characteristics are modelled by statistical patterns, and state-of-the-art research systems prefer template-probe comparisons to model-based comparisons, since model-based approaches were shown to be less accurate and computationally too expensive in duration-variant scenarios. In contrast, template-based systems are known to have disadvantages in short-duration scenarios. State-of-the-art research employs identity vectors (i-vectors), which describe the speaker-characteristic offset from a universal background model.
This thesis evaluates the applicability of i-vectors by comparing an i-vector system to well-established model-based approaches in a short-duration industry scenario. The i-vector approach is shown not only to operate robustly and quickly, but also to augment existing technologies, such that equal error rates below 0.5% can be achieved. Further, a new duration-mismatch compensation technique is presented that increases the robustness and performance of i-vector systems in duration-variant scenarios. This new method was evaluated within a current international evaluation of the National Institute of Standards and Technology (NIST), which examines state-of-the-art i-vector systems: the NIST baseline system was significantly outperformed, with a relative gain of 19% in terms of minimum detection cost. Furthermore, this thesis provides a speaker verification framework design based on the standard ISO/IEC 19795-1:2006, Biometric Performance Testing and Reporting, Part 1: Principles and Framework.
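
For readers unfamiliar with the metrics named in the abstract, the following is a minimal illustrative sketch, not the thesis implementation. It shows how an equal error rate can be approximated from comparison scores and how two systems could be combined at score level. The function names, the threshold sweep, and the weighted-sum fusion rule are assumptions made for illustration; the thesis may use different procedures.

```python
def error_rates(genuine, impostor, threshold):
    """Return (FMR, FNMR) when scores >= threshold are accepted.

    FMR: share of impostor comparisons wrongly accepted.
    FNMR: share of genuine comparisons wrongly rejected.
    """
    fmr = sum(s >= threshold for s in impostor) / len(impostor)
    fnmr = sum(s < threshold for s in genuine) / len(genuine)
    return fmr, fnmr


def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping every observed score as a threshold.

    Returns (eer, threshold) at the point where FMR and FNMR are closest.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        fmr, fnmr = error_rates(genuine, impostor, t)
        gap = abs(fmr - fnmr)
        if best is None or gap < best[0]:
            best = (gap, (fmr + fnmr) / 2, t)
    return best[1], best[2]


def fuse(scores_a, scores_b, weight=0.5):
    """Weighted-sum score-level fusion (an assumed, simple fusion rule).

    Both score lists must refer to the same comparisons in the same order
    and should be on comparable scales before they are combined.
    """
    return [weight * a + (1 - weight) * b for a, b in zip(scores_a, scores_b)]


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    genuine = [0.9, 0.8, 0.7, 0.4]
    impostor = [0.6, 0.3, 0.2, 0.1]
    eer, threshold = equal_error_rate(genuine, impostor)
    print(f"EER = {eer:.2%} at threshold {threshold}")
```

On real evaluation data, the fused scores would be passed to the same equal_error_rate function to check whether the combination lowers the error rate compared with either system alone.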

Rehearsal of CAST award presentation

Andreas Nautsch: Speaker Verification using i-Vectors. M.Sc. thesis, Hochschule Darmstadt, April 1, 2014.
Program of CAST-Workshop: CAST-Förderpreis IT-Sicherheit 2014