IMPROVING UZBEK MACHINE TRANSLATION THROUGH PARALLEL CORPORA: CHALLENGES AND SOLUTIONS

Botirova Nilufar Salimjon kizi

doi:10.5281/zenodo.15590463

Keywords

Corpus, corpus linguistics, parallel corpus, translation corpus, comparable corpus, segmentation, machine translation

Abstract

This paper examines the role of parallel corpora in modern translation studies, focusing on how they improve machine translation systems for the Uzbek language. Parallel corpora, which consist of texts in two or more languages aligned at the sentence or paragraph level, are essential for training neural machine translation systems. The paper outlines the main challenges in creating high-quality parallel corpora for underrepresented languages such as Uzbek: limited available resources, contextual mismatches, segmentation and alignment errors, and copyright issues. It discusses several solutions, including building open-access databases, leveraging existing machine translation systems, using modern alignment tools, and crowdsourcing. It also emphasizes the potential of parallel corpora to advance translation quality, support linguistic research, and promote the global recognition of the Uzbek language. Ultimately, the paper argues that parallel corpora are not only a scientific resource but also a technological tool that bridges the gap between human translators and machine translation systems.
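To make the notion of sentence-level alignment concrete, the following is a minimal Python sketch of a parallel corpus as a list of aligned sentence pairs, together with a simple length-ratio check that flags pairs likely to be misaligned. It is an illustration only and not a method described in the paper: the names `SentencePair`, `length_ratio`, `flag_suspect_pairs`, the threshold `max_ratio=2.0`, and the toy sentences are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentencePair:
    """One aligned unit of a parallel corpus (hypothetical structure)."""
    source: str  # e.g. an Uzbek sentence
    target: str  # e.g. its English translation


def length_ratio(pair: SentencePair) -> float:
    """Length of the longer side divided by the shorter side, in characters."""
    a, b = len(pair.source), len(pair.target)
    if min(a, b) == 0:
        return float("inf")  # an empty side is always treated as suspicious
    return max(a, b) / min(a, b)


def flag_suspect_pairs(pairs, max_ratio=2.0):
    """Return the pairs whose length ratio exceeds max_ratio.

    The threshold is an assumed value for illustration; a real pipeline
    would tune it for the language pair.
    """
    return [p for p in pairs if length_ratio(p) > max_ratio]


# Illustrative toy data, not taken from any real corpus.
corpus = [
    SentencePair("Men kitob o'qiyapman.", "I am reading a book."),
    SentencePair(
        "Bugun havo issiq.",
        "The weather is hot today, and tomorrow it is expected to rain heavily.",
    ),
    SentencePair("Rahmat.", ""),
]

suspects = flag_suspect_pairs(corpus)
for pair in suspects:
    print(repr(pair.source), "<->", repr(pair.target))
```

A character-length ratio is a crude signal: it catches merged sentences and empty segments, but not pairs of similar length with different meanings, which is where embedding-based alignment tools would be needed.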

About the author

Botirova Nilufar Salimjon kizi

PhD student, Uzbekistan State World Languages University

