Combating hate : how multilingual transformers can help detect topical hate speech

dc.contributor.author: Srikissoon, Trishanta
dc.contributor.author: Marivate, Vukosi
dc.contributor.email: vukosi.marivate@cs.up.ac.za [en_US]
dc.date.accessioned: 2024-05-30T11:03:48Z
dc.date.available: 2024-05-30T11:03:48Z
dc.date.issued: 2023
dc.description.abstract: Automated hate speech detection is important for protecting people’s dignity, online experiences, and physical safety in Society 5.0. Transformers are sophisticated pre-trained language models that can be fine-tuned for multilingual hate speech detection. Many studies treat this application as a binary classification problem. Additionally, research on topical hate speech detection uses target-specific datasets containing assertions about a particular group. In this paper, we investigate multi-class hate speech detection using target-generic datasets. We assess the performance of mBERT and XLM-RoBERTa on high- and low-resource languages with limited sample sizes and class imbalance. We find that our fine-tuned mBERT models perform well in detecting gender-targeted hate speech. Our Urdu classifier produces a 31% lift over the baseline model. We also present a pipeline for processing multilingual datasets for multi-class hate speech detection. Our approach could be used in future work on topically focused hate speech detection for other low-resource languages, particularly African languages, which remain under-explored in this domain. [en_US]
dc.description.department: Computer Science [en_US]
dc.description.librarian: am2024 [en_US]
dc.description.sdg: SDG-09: Industry, innovation and infrastructure [en_US]
dc.description.sponsorship: The ABSA Chair of Data Science, the TensorFlow Award for Machine Learning Grant. [en_US]
dc.description.uri: https://easychair.org/publications/EPiC/Computing [en_US]
dc.identifier.citation: Srikissoon, T. & Marivate, V. 2023, 'Combating hate : how multilingual transformers can help detect topical hate speech', EPiC Series in Computing, vol. 93, pp. 203-215. DOI: 10.29007/1cm6. [en_US]
dc.identifier.issn: 2398-7340 (online)
dc.identifier.other: 10.29007/1cm6
dc.identifier.uri: http://hdl.handle.net/2263/96304
dc.language.iso: en [en_US]
dc.publisher: Easychair [en_US]
dc.rights: © 2023 EasyChair. [en_US]
dc.subject: Hate speech [en_US]
dc.subject: Machine learning [en_US]
dc.subject: Natural language processing [en_US]
dc.subject: SDG-08: Decent work and economic growth [en_US]
dc.title: Combating hate : how multilingual transformers can help detect topical hate speech [en_US]
dc.type: Article [en_US]

Files

Original bundle

Name: Srikissoon_Combating_2023.pdf
Size: 433.41 KB
Format: Adobe Portable Document Format
Description: Article

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission
Description: