Letter-to-Sound Conversion in Persian Using Multilayer Perceptron Neural Networks

Subject areas: Electrical and Computer Engineering

1 - Amirkabir University of Technology
2 - Amirkabir University of Technology

Received: 1384/09/28 Accepted: 1385/05/28 Published: 1386/06/30

Keywords: letter-to-sound conversion, multilayer perceptron neural network, letter-sound alignment algorithm, Persian language

Abstract:

Building automatic letter-to-sound conversion systems for use in Persian text-to-speech systems is difficult because diacritics are not used in writing and, as a result, some vowels are hidden; such systems generally perform poorly for Persian. This paper examines the structure of a letter-to-sound conversion system with a three-layer architecture. The first layer is rule-based. The second layer consists of five multilayer perceptron neural networks and a controller section: the neural networks determine the phoneme sequence corresponding to the letters, and the controller checks the networks' outputs so that the final phoneme sequence for each word conforms to the Persian syllabification structure. The third layer contains a neural network that identifies geminated letters using the results of the previous stages. The components of the system are designed so that a rational phoneme sequence is ultimately produced for every word, where a rational phoneme sequence is one that follows the basic principles of phonemic transcription and the syllabification structure of Persian. The accuracy obtained is 88% for letters and 61% for words, which is very good performance for Persian letter-to-sound conversion.
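The kind of check the controller performs can be illustrated with a minimal sketch. It is not the authors' controller: it assumes the standard description of Persian syllables as CV, CVC, or CVCC, and it uses a hypothetical one-character symbol for each of the six vowels. The function names are illustrative only.

```python
import re

# Hypothetical one-character phoneme symbols for the six Persian vowels
# (an assumption for illustration; the paper's phoneme set is not given here).
VOWELS = set("aeoAiu")

# One or more syllables, each of the form CV, CVC, or CVCC.
SYLLABLE_PATTERN = re.compile(r"(?:CVC{0,2})+")


def cv_skeleton(phonemes):
    """Map a phoneme sequence to a string of C (consonant) and V (vowel)."""
    return "".join("V" if p in VOWELS else "C" for p in phonemes)


def is_syllabifiable(phonemes):
    """Return True if the sequence can be split into CV, CVC, CVCC syllables."""
    return SYLLABLE_PATTERN.fullmatch(cv_skeleton(phonemes)) is not None


print(is_syllabifiable(list("ketAb")))  # skeleton CVCVC
print(is_syllabifiable(list("stAr")))   # skeleton CCVC, starts with a cluster
```

A candidate phoneme sequence that fails such a check could be rejected or repaired before it is accepted as the final output; how the paper's controller actually handles this case is not stated in the abstract.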

English abstract:

Constructing letter-to-sound (LTS) conversion systems for Persian is a difficult task. Because some vowels are omitted in Farsi orthography, these systems generally perform poorly. This paper presents the structure of a letter-to-sound system with a three-layer architecture. The first layer is rule-based, and the second layer consists of five multilayer perceptron (MLP) neural networks and a controller section for determining pronunciations. The third layer has an MLP network that detects geminated letters using the results obtained from the previous steps. The proposed system is designed to produce a rational pronunciation for every word, where a rational pronunciation means a phonetic transcription that follows the correct Farsi syllabification structure and the basic rules of phonetics. The system achieves 88% letter accuracy and 61% word accuracy, which is quite satisfactory for a Farsi LTS system. The correct-letter criterion is the percentage of letters whose pronunciations have been determined correctly, and the correct-word criterion is the percentage of words for which the pronunciations of all constituent letters have been determined correctly.
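The two criteria defined at the end of the abstract can be written out as a short sketch. It assumes each word is already aligned so that every letter carries exactly one pronunciation label; the function name and the sample labels are hypothetical and are not taken from the paper's data.

```python
def letter_and_word_accuracy(predicted, reference):
    """Return (letter accuracy, word accuracy) for letter-aligned words.

    Each word is a list with one pronunciation label per letter.
    A word counts as correct only if all of its letters are correct.
    """
    total_letters = 0
    correct_letters = 0
    correct_words = 0
    for pred_word, ref_word in zip(predicted, reference):
        if len(pred_word) != len(ref_word):
            raise ValueError("words must have one label per letter")
        matches = sum(p == r for p, r in zip(pred_word, ref_word))
        total_letters += len(ref_word)
        correct_letters += matches
        correct_words += int(matches == len(ref_word))
    return correct_letters / total_letters, correct_words / len(reference)


# Hypothetical labels: the second word has one wrong letter out of four.
reference = [["ke", "t", "A", "b"], ["d", "u", "s", "t"]]
predicted = [["ke", "t", "A", "b"], ["d", "o", "s", "t"]]
print(letter_and_word_accuracy(predicted, reference))  # (0.875, 0.5)
```

Because one wrong letter makes the whole word wrong, word accuracy is always at or below letter accuracy, which is consistent with the gap between the reported 88% and 61%.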

