Scholars@Duke publication: Robust talking face video verification using joint factor analysis and sparse representation on GMM mean shifted supervectors

Publication type: Conference

Li, M; Narayanan, S

Published in: ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings

August 18, 2011

It has previously been demonstrated that systems based on block-wise local features and Gaussian mixture models (GMM) are well suited to video-based talking face verification because they offer the best trade-off among complexity, robustness and performance. In this paper, we propose two methods to enhance the robustness and performance of the GMM-ZTnorm baseline system. First, joint factor analysis is performed to compensate for session variability caused by different recording devices, lighting conditions, facial expressions, etc. Second, the difference between the universal background model (UBM) and the maximum a posteriori (MAP) adapted model is mapped into the GMM mean shifted supervector, whose over-complete dictionary becomes more incoherent. Then, for verification, the sparse representation computed by ℓ1-minimization with quadratic constraints is employed to model these GMM mean shifted supervectors. Experimental results show that the proposed system achieved equal error rates of 8.4% (group 1) and 10.5% (group 2) on the BANCA talking face video database following the P protocol, and outperformed the GMM-ZTnorm baseline with a relative error reduction of more than 20%. © 2011 IEEE.
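The abstract reports its results as equal error rates and a relative error reduction. As a reading aid, the sketch below shows one common way to approximate these two quantities from verification scores. It is not taken from the paper: the function names, the threshold sweep and the toy scores are assumptions made for illustration, and none of the numbers are results from the publication.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Illustrative helper (not from the paper). A trial is accepted when its
    score is greater than or equal to the threshold.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # False acceptance rate: impostor trials that pass the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # False rejection rate: genuine trials that fail the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_error_reduction(baseline_eer, system_eer):
    """Fraction by which the system lowers the baseline error rate."""
    return (baseline_eer - system_eer) / baseline_eer


# Toy scores, invented for this example only.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
toy_eer = equal_error_rate(genuine, impostor)
```

With these toy scores the false acceptance and false rejection rates meet at a threshold of 0.6, where each equals 0.25. A relative error reduction of 20% means, for instance, that a hypothetical baseline error of 0.5 drops to 0.4.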

Duke Scholars

Author: Ming Li, DKU Faculty

DOI: 10.1109/ICASSP.2011.5946773

ISSN: 1520-6149

Start / End Page: 1481 / 1484

Citation

APA: Li, M., & Narayanan, S. (2011). Robust talking face video verification using joint factor analysis and sparse representation on GMM mean shifted supervectors. In ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings (pp. 1481–1484). https://doi.org/10.1109/ICASSP.2011.5946773

Chicago: Li, M., and S. Narayanan. “Robust talking face video verification using joint factor analysis and sparse representation on GMM mean shifted supervectors.” In ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, 1481–84, 2011. https://doi.org/10.1109/ICASSP.2011.5946773.

ICMJE: Li M, Narayanan S. Robust talking face video verification using joint factor analysis and sparse representation on GMM mean shifted supervectors. In: ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings. 2011. p. 1481–4.

MLA: Li, M., and S. Narayanan. “Robust talking face video verification using joint factor analysis and sparse representation on GMM mean shifted supervectors.” ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, 2011, pp. 1481–84. Scopus, doi:10.1109/ICASSP.2011.5946773.

NLM: Li M, Narayanan S. Robust talking face video verification using joint factor analysis and sparse representation on GMM mean shifted supervectors. ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings. 2011. p. 1481–1484.
