Using NER and Doc2Vec to cluster South African criminal cases

dc.contributor.advisor: Marivate, Vukosi
dc.contributor.email: u13140443@tuks.co.za [en_US]
dc.contributor.postgraduate: Nchachi, Carel Kagiso
dc.date.accessioned: 2024-09-12T09:42:59Z
dc.date.available: 2024-09-12T09:42:59Z
dc.date.created: 2024
dc.date.issued: 2021
dc.description: Mini Dissertation (MSc (Computer Science))--University of Pretoria, 2021. [en_US]
dc.description.abstract: The judicial system is the central pillar of law and order across the world. It is responsible for maintaining order amongst citizens and for resolving the litigation that arises. Although this system has worked quite well, several challenges remain, such as racial bias in cases, a shortage of legal professionals and inconsistent rulings. These challenges need to be addressed in order to maintain law and order in society and to help strengthen the criminal justice system. Researchers have incorporated Natural Language Processing (NLP) techniques to help address some of these challenges, focusing primarily on three legal applications: Legal Judgment Prediction (LJP), Similar Case Matching (SCM) and Legal Question Answering (LQA) [28]. SCM focuses on identifying the relationships among cases using the available information. In other words, SCM is concerned with segmenting or grouping legal cases. This is especially useful for Common Law judicial systems, where judicial decisions are based on similar and representative cases from the past. South Africa uses this type of judicial system. Although good progress has been made in SCM applications, current models still face several challenges. These include using the entities found in a legal document to improve the matching of similar cases, and the interpretability of the models. In this research we focus on applying SCM to South African criminal cases by creating a model that is able to match similar crime cases together. This model also addresses the two challenges currently faced in SCM applications. We found that using a Named Entity Recognizer (NER) with a Paragraph Vector-Distributed Memory (PV-DM) model produced better results than using a conventional PV-DM or TF-IDF model. This model also overcomes the current SCM challenges because it uses the entities found in cases (extracted by the NER model) as its main variables. Since the entities help explain how the model mapped similar cases, the model is also interpretable. Based on the accuracy (similarity score) of the model, we can use it as a tool to segment criminal cases in real life. [en_US]
dc.description.availability: Unrestricted [en_US]
dc.description.degree: MSc (Computer Science) [en_US]
dc.description.department: Computer Science [en_US]
dc.description.faculty: Faculty of Engineering, Built Environment and Information Technology [en_US]
dc.identifier.citation: * [en_US]
dc.identifier.other: A2024 [en_US]
dc.identifier.uri: http://hdl.handle.net/2263/98152
dc.language.iso: en [en_US]
dc.publisher: University of Pretoria
dc.rights: © 2021 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
dc.subject: UCTD [en_US]
dc.subject: Similar Case Matching [en_US]
dc.subject: Judicial system [en_US]
dc.subject: Natural Language Processing (NLP) [en_US]
dc.subject: Named Entity Recognizer (NER) [en_US]
dc.title: Using NER and Doc2Vec to cluster South African criminal cases [en_US]
dc.type: Mini Dissertation [en_US]
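
The abstract describes matching cases on the entities they contain and using those shared entities to explain a match. The following is a minimal sketch of that idea only, not the dissertation's implementation. It makes two loudly stated substitutions so that it runs with the Python standard library alone: a small hypothetical gazetteer (`ENTITY_TERMS`) stands in for a trained NER model, and TF-IDF weighting over entity tokens stands in for the PV-DM document embeddings. The example case sentences, entity labels and function names are all invented for illustration.

```python
import math
from collections import Counter

# Hypothetical gazetteer standing in for a trained NER model (assumption).
ENTITY_TERMS = {
    "robbery": "CRIME",
    "fraud": "CRIME",
    "firearm": "WEAPON",
    "pretoria": "LOCATION",
    "durban": "LOCATION",
}


def extract_entities(text):
    """Return labelled entity tokens found in the text, e.g. 'CRIME:robbery'."""
    tokens = [t.strip(".,;:()").lower() for t in text.split()]
    return [f"{ENTITY_TERMS[t]}:{t}" for t in tokens if t in ENTITY_TERMS]


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) from lists of entity tokens."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed inverse document frequency, so every weight is positive.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def shared_entities(a, b):
    """Entities common to two cases: the human-readable reason for a match."""
    return sorted(set(a) & set(b))


# Invented example sentences, not taken from any real case.
cases = [
    "The accused used a firearm during the robbery in Pretoria.",
    "A robbery in Pretoria was committed with a firearm.",
    "The accused was convicted of fraud in Durban.",
]

entity_docs = [extract_entities(c) for c in cases]
vectors = tfidf_vectors(entity_docs)
sim_01 = cosine(vectors[0], vectors[1])
sim_02 = cosine(vectors[0], vectors[2])
explanation_01 = shared_entities(entity_docs[0], entity_docs[1])
```

In this sketch the first two cases share all of their entities and the third shares none, so `explanation_01` lists the entities that account for the match. In a PV-DM setting, the entity tokens would instead be fed to a document-embedding model and the similarity computed on the learned vectors.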

Files

Original bundle

Name: Nchachi_Using_2021.pdf
Size: 1.71 MB
Format: Adobe Portable Document Format
Description: Mini Dissertation
