Assessing interpretability in machine translation models for low-resource languages

Authors

Publisher

University of Pretoria

Abstract

In recent years, we have seen an increase in the adoption of Large Language Models (LLMs) across many different applications. A practical example is OpenAI’s ChatGPT, a tool based on InstructGPT that combines pre-training with question answering and fine-tuning through reinforcement learning from human feedback. A gap that still exists, the need for better coverage of low-resource languages, has led to a substantial amount of research on multilingual LLMs in the Natural Language Processing (NLP) domain, producing models such as NLLB-200, Glot500-m, and BLOOM. However, most of these black-box multilingual LLMs represent low-resource languages poorly, especially when applied to translation tasks, and their internal logic remains hidden from the user. This leaves one unable to account for or explain failures in real-life translation tasks. This research investigates the performance and interpretability of two models, an LLM and a small-scale model, trained on the low-resource language pairs Xhosa-Zulu and Tswana-Zulu. Both models use the transformer architecture. The research aims to evaluate (1) the differences in translation quality and interpretability between models of different scales, examining the role of attention mechanisms in capturing context and producing correct translations, (2) the impact of training dataset size on translation quality, and (3) the effectiveness of post-model eXplainable AI (XAI) methods in evaluating generated translations and model efficiency in low-resource language settings. The post-model methods used are attention-pattern analysis, BLEU scores, MMD scores, and human evaluation. We conclude that larger models handle linguistic complexities better, that training on larger datasets generally improves translation quality, and that diverse post-hoc evaluation methods are essential for a comprehensive assessment. This analysis contributes to a better understanding of the strengths and weaknesses of different model scales in machine translation, guiding future developments in XAI for machine translation of languages such as Swati, Tshiluba, Yoruba, and other low-resource languages.
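
As a rough illustration of two of the post-model methods named in the abstract, the sketch below computes a corpus-level BLEU score with sacrebleu and extracts cross-attention weights from a multilingual transformer checkpoint using recent versions of Hugging Face Transformers. The checkpoint name, language codes, and sentence pairs are placeholders chosen for illustration; they are not the dissertation's actual models or data.

import sacrebleu
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# BLEU scoring: system outputs compared against human references
# (placeholder sentences, not the dissertation's evaluation data).
hypotheses = ["umntwana udlala engadini"]
references = [["umntwana uyadlala engadini"]]
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.2f}")

# Attention-pattern analysis: run an encoder-decoder translation model
# with output_attentions=True and inspect its cross-attention weights.
# The checkpoint below is a placeholder; any seq2seq transformer works the same way.
model_name = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(
    model_name, src_lang="xho_Latn", tgt_lang="zul_Latn"
)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

batch = tokenizer(
    "umntwana udlala engadini",           # source sentence (placeholder)
    text_target="umntwana uyadlala engadini",  # reference target (placeholder)
    return_tensors="pt",
)
with torch.no_grad():
    out = model(**batch, output_attentions=True)

# cross_attentions holds one tensor per decoder layer, each of shape
# (batch, heads, target_len, source_len); averaging over heads gives a
# source-token alignment map for every target token.
alignment = out.cross_attentions[-1].mean(dim=1)[0]
print(alignment.shape)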

Description

Dissertation (MSc (Computer Science))--University of Pretoria, 2024.

Keywords

UCTD, Sustainable Development Goals (SDGs), Interpretability, Machine translation, Transformers, Low-resource languages

Sustainable Development Goals

SDG-04: Quality education

Citation
