Fine-tuning retrieval-augmented generation with an auto-regressive language model for sentiment analysis in financial reviews

dc.contributor.author: Mathebula, Miehleketo
dc.contributor.author: Modupe, Abiodun
dc.contributor.author: Marivate, Vukosi
dc.contributor.email: miehleketo.mathebula@tuks.co.za
dc.date.accessioned: 2025-01-27T06:27:49Z
dc.date.available: 2025-01-27T06:27:49Z
dc.date.issued: 2024-12
dc.description: DATA AVAILABILITY STATEMENT: The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.
dc.description: This article forms part of a special collection titled 'Applications of Data Science and Artificial Intelligence'.
dc.description.abstract: Sentiment analysis is a well-known task that has been used to analyse customer feedback, reviews, and media headlines to detect the sentiment or polarity of a given text. With the growth of social media and other online platforms, such as Twitter (now branded as X), Facebook, and blogs, it has been used in the investment community to monitor customer feedback, reviews, and news headlines about financial institutions’ products and services to ensure business success and prioritise aspects of customer relationship management. Supervised learning algorithms have been widely employed for this task, but their performance has been compromised by the brevity of the content and the presence of idiomatic expressions, sound imitations, and abbreviations. Additionally, the pre-training of a larger language model (PTLM) struggles to capture bidirectional contextual knowledge learnt through word dependency because the sentence-level representation fails to take broad features into account. We develop a novel structure called language feature extraction and adaptation for reviews (LFEAR), an advanced natural language model that combines retrieval-augmented generation (RAG) with a conversation format for an auto-regressive fine-tuning model (ARFT). This helps to overcome the limitations of lexicon-based tools and their reliance on pre-defined sentiment lexicons, which may not fully capture the range of sentiments in natural language, and to address questions on various topics and tasks. LFEAR is fine-tuned on Hellopeter reviews that incorporate industry-specific contextual information retrieval to show resilience and flexibility for various tasks, including analysing sentiments in reviews of restaurants, movies, politics, and financial products.
The proposed model achieved an average precision score of 98.45%, answer correctness of 93.85%, and context precision of 97.69% based on Retrieval-Augmented Generation Assessment (RAGAS) metrics. The LFEAR model is effective in conducting sentiment analysis across various domains due to its adaptability and scalable inference mechanism. It considers unique language characteristics and patterns in specific domains to ensure accurate sentiment annotation. This is particularly beneficial for individuals in the financial sector, such as investors and institutions, including those listed on the Johannesburg Stock Exchange (JSE), which is the primary stock exchange in South Africa and plays a significant role in the country’s financial market. Future initiatives will focus on incorporating a wider range of data sources and improving the system’s ability to express nuanced sentiments effectively, enhancing its usefulness in diverse real-world scenarios.
dc.description.department: Computer Science
dc.description.sdg: SDG-09: Industry, innovation and infrastructure
dc.description.sdg: SDG-12: Responsible consumption and production
dc.description.sponsorship: The Data Science for Social Impact (DSFI) Group at the University of Pretoria, funded by Google.
dc.description.uri: https://www.mdpi.com/journal/applsci
dc.identifier.citation: Mathebula, M.; Modupe, A.; Marivate, V. Fine-Tuning Retrieval-Augmented Generation with an Auto-Regressive Language Model for Sentiment Analysis in Financial Reviews. Applied Sciences (Switzerland) 2024, 14, 10782. https://doi.org/10.3390/app142310782.
dc.identifier.issn: 2076-3417 (online)
dc.identifier.other: 10.3390/app142310782
dc.identifier.uri: http://hdl.handle.net/2263/100302
dc.language.iso: en
dc.publisher: MDPI
dc.rights: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an Open Access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
dc.subject: Sentiment analysis
dc.subject: Prompt engineering
dc.subject: Conversational fine-tuning
dc.subject: Retrieval augmented generation assessment
dc.subject: SDG-09: Industry, innovation and infrastructure
dc.subject: SDG-12: Responsible consumption and production
dc.subject: Language feature extraction and adaptation for reviews (LFEAR)
dc.subject: Pre-training of a larger language model (PTLM)
dc.subject: Retrieval-augmented generation (RAG)
dc.subject: Auto-regressive fine-tuning model (ARFT)
dc.subject: Large language model (LLM)
dc.title: Fine-tuning retrieval-augmented generation with an auto-regressive language model for sentiment analysis in financial reviews
dc.type: Article

Files

Original bundle

Name: Mathebula_FineTuning_2024.pdf
Size: 4.46 MB
Format: Adobe Portable Document Format
Description: Article

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission