FarsBayan: A Farsi Speech Synthesizer based on Unit Selection Method

Mohamad Mehdi Homayounpour, Majid Namnabat

Abstract

In recent years, unit selection-based concatenative speech synthesis using a large corpus has attracted great attention because it produces more natural output speech than other known approaches. The method also has great potential for improvement, even though its basic idea seems simple. Its main components are a corpus containing variant instances of the units, two criteria for evaluating those instances, namely target cost and concatenation cost, and a search algorithm for identifying and selecting the best instances. In this paper, we present the structure of FarsBayan, our proposed unit selection synthesis system for the Farsi language. We describe the sub-costs that constitute the cost measures, the different methods for determining the sub-cost weights, and the pruning algorithms used to reduce the search space. The output speech was found to be remarkably fluent and natural. Its quality was evaluated using a subjective MOS (Mean Opinion Score) test, in which the system obtained a score of 3.8 for overall quality.

Keywords

Text-To-Speech Conversion, Concatenative Speech Synthesis, Unit Selection Speech Synthesis, Farsi Language, Target Cost, Concatenation Cost, Viterbi Search Algorithm, MOS Test
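
As an illustration of the components named in the abstract (candidate instances, a target cost, a concatenation cost, and a Viterbi search), the following is a minimal sketch of generic unit selection in Python. It is not the FarsBayan implementation: the function name select_units, the toy units, and the pitch-based cost functions are all hypothetical, and sub-cost weighting and pruning are omitted.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")  # target specification for one position (hypothetical type)
U = TypeVar("U")  # candidate unit drawn from the corpus (hypothetical type)


def select_units(
    targets: Sequence[T],
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[T, U], float],
    concat_cost: Callable[[U, U], float],
) -> Tuple[List[U], float]:
    """Viterbi search for the candidate sequence with minimum total cost.

    Assumes candidates[i] is a non-empty list of corpus units for targets[i].
    """
    if not targets:
        return [], 0.0

    # best[i][j]: minimum cost of any path ending at candidates[i][j]
    best = [[target_cost(targets[0], u) for u in candidates[0]]]
    # back[i][j]: index of the predecessor at position i - 1 on that path
    back = [[-1] * len(candidates[0])]

    for i in range(1, len(targets)):
        row_cost, row_back = [], []
        for u in candidates[i]:
            # Cheapest way to reach u from any unit at the previous position
            k, c = min(
                (
                    (k, best[i - 1][k] + concat_cost(p, u))
                    for k, p in enumerate(candidates[i - 1])
                ),
                key=lambda kc: kc[1],
            )
            row_cost.append(c + target_cost(targets[i], u))
            row_back.append(k)
        best.append(row_cost)
        back.append(row_back)

    # Trace back from the cheapest final unit
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path: List[U] = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    path.reverse()
    return path, total


# Toy data (invented for illustration): targets are desired pitches in Hz,
# and each candidate unit is a (label, pitch) pair.
toy_targets = [100, 110, 120]
toy_candidates = [
    [("a1", 100), ("a2", 130)],
    [("b1", 140), ("b2", 112)],
    [("c1", 118), ("c2", 90)],
]
toy_path, toy_total = select_units(
    toy_targets,
    toy_candidates,
    target_cost=lambda t, u: abs(t - u[1]),         # mismatch with the target
    concat_cost=lambda p, u: 0.5 * abs(p[1] - u[1]),  # pitch jump at the join
)
print([label for label, _ in toy_path], toy_total)
```

In this sketch each cost is a single term. A fuller system would typically compute each cost as a weighted sum of several sub-costs and prune the candidate lists before the search.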
