Improving probabilistic record linkage with a single-layer neural network

Authors

Hamersma, Kris A.

Publisher

University of Pretoria. Faculty of Engineering, Built Environment and Information Technology. Dept. of Industrial and Systems Engineering

Abstract

Data analysis requires data to be of a high quality. Unfortunately this is not always the case, especially when data is extracted from different data sources. Where there is no unique identifier to match data records from multiple data sources, alternative methods need to be developed to match the records. Record linkage attempts to do this, primarily with deterministic and probabilistic approaches. Deterministic models require certain corresponding fields of a record pair to be identical before the pair is matched. Probabilistic methods use a set of equations called the Fellegi-Sunter formulae to calculate decision-making weights, which are used to score how well a record pair matches. If the matching score is above a certain threshold, the record pair is considered to be a match. This project investigates whether developing a learning algorithm that refines the weights will improve the probabilistic model's matching accuracy. The dataset used to train and test the record linkage models was a set of 92650 record pairs, some of which were matches and some of which were non-matches. It was found that a learning algorithm did improve the matching accuracy of the probabilistic model, although increasing the number of input features is likely to improve matching performance even more.
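
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the abstract does not state the network's inputs, activation or training rule, so the sketch assumes binary agree/disagree inputs per field, a sigmoid output and batch gradient descent on log loss. The m and u probabilities, the six record pairs and all function names are hypothetical. The sketch relies on one standard fact: the Fellegi-Sunter score is a linear function of the agreement vector, so its weights can initialise a single-layer network that is then refined on labelled pairs.

```python
import math

def fs_weights(m, u):
    """Fellegi-Sunter log2 weights per field.
    m[i] = P(field i agrees | match), u[i] = P(field i agrees | non-match)."""
    agree = [math.log2(mi / ui) for mi, ui in zip(m, u)]
    disagree = [math.log2((1 - mi) / (1 - ui)) for mi, ui in zip(m, u)]
    return agree, disagree

def fs_score(gamma, agree, disagree):
    """Sum the agreement or disagreement weight of each field (gamma[i] is 1 or 0)."""
    return sum(a if g else d for g, a, d in zip(gamma, agree, disagree))

def to_linear(agree, disagree, threshold):
    """Rewrite 'score - threshold' as w . gamma + b, i.e. a single linear layer."""
    w = [a - d for a, d in zip(agree, disagree)]
    b = sum(disagree) - threshold
    return w, b

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(gamma, w, b):
    return sigmoid(sum(wi * g for wi, g in zip(w, gamma)) + b)

def log_loss(pairs, labels, w, b, eps=1e-12):
    total = 0.0
    for gamma, y in zip(pairs, labels):
        p = min(max(predict(gamma, w, b), eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(pairs)

def refine(pairs, labels, w, b, lr=0.1, epochs=200):
    """Batch gradient descent on log loss (assumed learning rule)."""
    w = list(w)
    n = len(pairs)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for gamma, y in zip(pairs, labels):
            err = predict(gamma, w, b) - y
            for i, g in enumerate(gamma):
                grad_w[i] += err * g
            grad_b += err
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical m/u probabilities for three compared fields.
m = [0.95, 0.90, 0.85]
u = [0.05, 0.10, 0.20]
agree, disagree = fs_weights(m, u)
w0, b0 = to_linear(agree, disagree, threshold=0.0)

# Hypothetical labelled record pairs: agreement vectors and match labels.
pairs = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (0, 1, 0), (0, 0, 1), (0, 0, 0)]
labels = [1, 1, 1, 0, 0, 0]

w1, b1 = refine(pairs, labels, w0, b0)
loss_before = log_loss(pairs, labels, w0, b0)
loss_after = log_loss(pairs, labels, w1, b1)
print(loss_before, loss_after)
```

Starting from the Fellegi-Sunter weights rather than random values means the untrained network reproduces the probabilistic model's decisions exactly, so any change in accuracy after training can be attributed to the refinement step.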

Description

Mini Dissertation (B Eng. (Industrial and Systems Engineering))--University of Pretoria, 2017.

Keywords

Mini-dissertations (Industrial and Systems Engineering)
