Speed and Accuracy Improvement in Open Set Speaker Identification

Mohammad Mehdi Homayounpour, Hadieh Razazan

Abstract

Gaussian Mixture Models (GMMs) and Support Vector Machines (SVMs) exhibit uncorrelated error regions, so they can be combined to construct a classifier with higher performance. In this paper, an open-set, text-independent speaker identification system is presented. The system exploits the speaker-modeling capability of GMMs and the discriminative power of SVMs to increase speaker identification accuracy. In the training phase, a confusion matrix is obtained using a validation database and the GMM models. This matrix specifies groups of similar speakers, and SVM models are trained to distinguish between the speakers in each group. In the identification phase, a speaker is first identified by a first-level GMM classifier. If the identified speaker falls in a group of similar (confused) speakers, second-level classifiers, i.e. SVMs, are used to distinguish between the speakers of this group. The identification error rate was reduced from 4.15% when only GMMs were used to 1.7% when identification was done by the proposed serial hybrid of GMMs and SVMs. Grouping of speakers was also exploited to improve identification speed: in the identification phase, the speaker's group is determined first, and the speaker is then identified within this group. This approach reduced the identification time from 1.47 s (for the baseline system using GMMs and no grouping of speakers) to 0.75 s in the best case and 1.15 s in the worst case. World model and maximum score normalization methods were also applied and evaluated for open-set speaker identification. Both normalization techniques showed a considerable improvement in identification performance.
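The two-level scheme summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `group_confusable_speakers`, `identify`, `min_confusions`, `reject_threshold`, and `svm_decide` are hypothetical. The sketch assumes that groups are formed by linking any two speakers confused at least `min_confusions` times, and that open-set rejection is applied to the world-model-normalized GMM score before the SVM stage. The abstract does not confirm either choice.

```python
from typing import Callable, Dict, List, Optional, Sequence


def group_confusable_speakers(
    confusion: Sequence[Sequence[int]], min_confusions: int = 1
) -> List[List[int]]:
    """Group speakers that are linked by confusions (assumed grouping rule)."""
    n = len(confusion)
    parent = list(range(n))  # union-find forest over speaker indices

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            # Count confusions in both directions between speakers i and j.
            if confusion[i][j] + confusion[j][i] >= min_confusions:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for s in range(n):
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values())


def identify(
    gmm_scores: Sequence[float],
    world_score: float,
    groups: List[List[int]],
    svm_decide: Callable[[List[int]], int],
    reject_threshold: float,
) -> Optional[int]:
    """Return a speaker index, or None if the utterance is rejected as unknown."""
    # First level: pick the speaker whose GMM gives the highest log-likelihood.
    best = max(range(len(gmm_scores)), key=lambda s: gmm_scores[s])

    # Open-set decision (assumed placement): world-model normalization,
    # i.e. the log-likelihood ratio against a background model.
    if gmm_scores[best] - world_score < reject_threshold:
        return None

    # Second level: only speakers in a confusable group go to the SVMs.
    group = next(g for g in groups if best in g)
    if len(group) > 1:
        return svm_decide(group)  # hypothetical SVM stage for this group
    return best


# Toy validation confusion matrix: rows are true speakers, columns are GMM decisions.
confusion = [
    [10, 3, 0, 0],
    [2, 10, 0, 0],
    [0, 0, 10, 0],
    [0, 0, 0, 10],
]
groups = group_confusable_speakers(confusion, min_confusions=2)
```

In this toy matrix only speakers 0 and 1 are confused with each other, so they form the single multi-speaker group and only utterances attributed to them would reach the SVM stage.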

Keywords

speaker identification, open set, Gaussian mixture model, support vector machine

References