Subspace Gaussian mixture models for automatic speech recognition

Organisation: Edinburgh Research Archive, United Kingdom

Details

Title	:	Subspace Gaussian mixture models for automatic speech recognition
Researcher	:	Lu, Liang
Keywords	:	subspace model , speech recognition , noise , multilingual
Organisation	:	Edinburgh Research Archive, United Kingdom
Collaborators	:	Renals, Stephen , Ghoshal, Arnab
Year of publication	:	2013 (B.E. 2556)
Reference	:	http://hdl.handle.net/1842/8065
Source	:	-
Expertise	:	-
Related publications	:
Lu, L., Chin, K., Ghoshal, A., and Renals, S. (2012). Noise compensation for subspace Gaussian mixture models. In Proc. INTERSPEECH.
Lu, L., Chin, K., Ghoshal, A., and Renals, S. (2013). Joint uncertainty decoding for noise robust subspace Gaussian mixture models. IEEE Transactions on Audio, Speech, and Language Processing.
Lu, L., Ghoshal, A., and Renals, S. (2011). Regularized subspace Gaussian mixture models for cross-lingual speech recognition. In Proc. IEEE ASRU.
Lu, L., Ghoshal, A., and Renals, S. (2011). Regularized subspace Gaussian mixture models for speech recognition. IEEE Signal Processing Letters, 18(7):419–422.
Lu, L., Ghoshal, A., and Renals, S. (2012). Joint uncertainty decoding with unscented transforms for noise robust subspace Gaussian mixture models. In Proc. SAPA-SCALE Workshop.
Lu, L., Ghoshal, A., and Renals, S. (2012). Maximum a posteriori adaptation of subspace Gaussian mixture models for cross-lingual speech recognition. In Proc. ICASSP.
Lu, L., Ghoshal, A., and Renals, S. (2013). Noise adaptive training for subspace Gaussian mixture models. In Proc. INTERSPEECH.
Scope of content	:	-
Abstract/Description	:	In most state-of-the-art speech recognition systems, Gaussian mixture models (GMMs) are used to model the density of the emitting states in the hidden Markov models (HMMs). In a conventional system, the model parameters of each GMM are estimated directly and independently given the alignment. This results in a large number of model parameters to be estimated, and consequently a large amount of training data is required to fit the model. In addition, different sources of acoustic variability that affect the accuracy of a recogniser, such as pronunciation variation, accent, speaker factors and environmental noise, are only weakly modelled and factorised by adaptation techniques such as maximum likelihood linear regression (MLLR), maximum a posteriori adaptation (MAP) and vocal tract length normalisation (VTLN). In this thesis, we discuss an alternative acoustic modelling approach, the subspace Gaussian mixture model (SGMM), which is expected to deal with these two issues better. In an SGMM, the model parameters are derived from low-dimensional model and speaker subspaces that can capture phonetic and speaker correlations. Given these subspaces, only a small number of state-dependent parameters are required to derive the corresponding GMMs. Hence, the total number of model parameters can be reduced, which allows acoustic modelling with a limited amount of training data. In addition, the SGMM-based acoustic model factorises the phonetic and speaker factors, and within this framework other sources of acoustic variability may also be explored. In this thesis, we propose regularised model estimation for SGMMs, which avoids overtraining when the training data is sparse. We also take advantage of the structure of SGMMs to explore cross-lingual acoustic modelling for low-resource speech recognition. Here, the model subspace is estimated from out-of-domain data and ported to the target language system. In this case, only the state-dependent parameters need to be estimated, which reduces the amount of training data required. To improve the robustness of SGMMs against environmental noise, we propose to apply the joint uncertainty decoding (JUD) technique, which is shown to be efficient and effective. We report experimental results on the Wall Street Journal (WSJ) database and GlobalPhone corpora to evaluate the regularisation and cross-lingual modelling of SGMMs. Noise compensation using JUD for SGMM acoustic models is evaluated on the Aurora 4 database.
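Illustrative note (not part of the original record): the abstract's statement that each state's GMM is derived from a small state-dependent vector can be sketched roughly as below. This assumes the commonly published SGMM formulation, in which the mean of Gaussian i is a shared projection matrix times the state vector and the mixture weights are a softmax over shared weight vectors. The function name `sgmm_state_gmm`, the argument names `M`, `w`, `v` and the toy numbers are all hypothetical, and the speaker subspace, sub-states and shared covariances are left out for brevity.

```python
import math

def sgmm_state_gmm(M, w, v):
    """Derive one state's GMM means and weights from its low-dimensional vector.

    M: list of I matrices (each D rows by S columns), shared by all states (assumed layout)
    w: list of I weight-projection vectors of length S, shared by all states
    v: state-dependent vector of length S (the only per-state parameters here)
    """
    # Mean of Gaussian i is the matrix-vector product M_i v.
    means = [[sum(m * x for m, x in zip(row, v)) for row in M_i] for M_i in M]
    # Mixture weights are a softmax over the inner products w_i . v.
    logits = [sum(a * x for a, x in zip(w_i, v)) for w_i in w]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    return means, weights

# Toy sizes, chosen only for illustration: I=2 Gaussians, D=3 features, S=2 subspace dims.
M = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[2.0, 0.0], [0.0, -1.0], [0.5, 0.5]],
]
w = [[0.0, 0.0], [0.0, 0.0]]
v = [1.0, 2.0]
means, weights = sgmm_state_gmm(M, w, v)
```

In this sketch every additional state costs only S numbers (the vector `v`), while `M` and `w` are shared across states, which is the source of the parameter reduction described in the abstract.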