Enhancing digital text collections with detailed metadata to improve retrieval
Publisher
University of Pretoria
Abstract
Digital text collections are increasingly important, as they enable researchers to explore new ways of interacting with texts through the use of technology. Various tools have been developed to facilitate exploring and searching text collections at a fairly coarse level of granularity. Ideally, it should be possible to filter the results at a finer level of granularity to retrieve only the specific instances in which the researcher is interested.
The aim of this study was to investigate to what extent detailed metadata could be used to enhance texts in order to improve retrieval. To do this, the researcher had to identify metadata that would be useful as filter criteria and find ways in which these metadata could be applied to or encoded in texts. The researcher also had to evaluate existing tools to determine to what extent they support retrieval on a fine-grained level. After identifying useful metadata and reviewing existing tools, the researcher could suggest a metadata framework for encoding texts on a detailed level. Metadata in five different categories were used, namely morphological, syntactic, semantic, functional and bibliographic. A further contribution of this metadata framework was the addition of in-text bibliographic metadata, for use where sections of a text have properties that differ from those of the main text.
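As an illustration only, the idea of attaching metadata from the five categories to a single word, with in-text bibliographic metadata overriding the properties of the main text, might be sketched as follows. The class, function and property names (AnnotatedWord, effective_bibliographic, "tense", "role", "language") are hypothetical and are not taken from the thesis; only the five category names come from the abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# The five category names come from the abstract. Everything else in this
# sketch (class name, property keys, sample values) is a hypothetical example.
CATEGORIES = ("morphological", "syntactic", "semantic", "functional", "bibliographic")


@dataclass
class AnnotatedWord:
    """A word plus metadata grouped by category."""
    text: str
    metadata: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def annotate(self, category: str, key: str, value: str) -> None:
        # Reject categories that are not part of the framework.
        if category not in CATEGORIES:
            raise ValueError(f"unknown metadata category: {category}")
        self.metadata.setdefault(category, {})[key] = value

    def get(self, category: str, key: str) -> Optional[str]:
        return self.metadata.get(category, {}).get(key)


def effective_bibliographic(word: AnnotatedWord, document_level: Dict[str, str]) -> Dict[str, str]:
    """Merge document-level bibliographic metadata with in-text overrides."""
    merged = dict(document_level)
    merged.update(word.metadata.get("bibliographic", {}))
    return merged


# Hypothetical sample: a Latin word quoted inside an English text.
document_level = {"language": "English", "author": "Main Author"}
word = AnnotatedWord("veni")
word.annotate("morphological", "tense", "perfect")
word.annotate("syntactic", "role", "predicate")
word.annotate("bibliographic", "language", "Latin")  # in-text override
```

In this sketch the quoted word keeps the author of the main text but reports its own language, which is the kind of distinction in-text bibliographic metadata is meant to capture.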
The suggested framework had to be tested to determine if retrieval was indeed improved. In order to do so, a selection of texts was encoded with the suggested framework and a prototype was developed to test the retrieval. The prototype receives the encoded texts and stores the information in a database. A graphical user interface was developed to enable searching in the database in an easy and intuitive manner.
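A minimal sketch of how encoded words could be stored in a database and retrieved by their properties is given below. It assumes an in-memory SQLite database and a single hypothetical table, annotation, with one row per word property; the prototype's actual schema, database system and interface are not described in this abstract.

```python
import sqlite3

# Hypothetical schema: one row per (word occurrence, metadata property).
SCHEMA = """
CREATE TABLE annotation (
    text_id  TEXT,
    position INTEGER,
    word     TEXT,
    category TEXT,
    prop     TEXT,
    val      TEXT,
    PRIMARY KEY (text_id, position, category, prop)
)
"""


def build_index(rows):
    """Load encoded word properties into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def search(conn, filters, word=None):
    """Return (text_id, position, word) for occurrences matching ALL filters.

    filters is a list of (category, prop, val) tuples.
    """
    filters = list(dict.fromkeys(filters))  # drop duplicate filters
    if not filters:
        raise ValueError("at least one filter is required")
    clauses = " OR ".join(["(category = ? AND prop = ? AND val = ?)"] * len(filters))
    params = [part for f in filters for part in f]
    sql = f"SELECT text_id, position, word FROM annotation WHERE ({clauses})"
    if word is not None:
        sql += " AND word = ?"
        params.append(word)
    # An occurrence qualifies only if every filter matched one of its rows.
    sql += " GROUP BY text_id, position, word HAVING COUNT(*) = ?"
    sql += " ORDER BY text_id, position"
    params.append(len(filters))
    return conn.execute(sql, params).fetchall()


# Hypothetical sample data: "run" occurs once as a verb and once as a noun.
rows = [
    ("t1", 0, "run", "morphological", "pos", "verb"),
    ("t1", 0, "run", "semantic", "sense", "motion"),
    ("t1", 5, "run", "morphological", "pos", "noun"),
    ("t2", 3, "walk", "morphological", "pos", "verb"),
    ("t2", 3, "walk", "semantic", "sense", "motion"),
]
conn = build_index(rows)
verbs_named_run = search(conn, [("morphological", "pos", "verb")], word="run")
motion_verbs = search(conn, [("morphological", "pos", "verb"), ("semantic", "sense", "motion")])
```

Filtering on the word form alone would return both occurrences of "run"; adding the morphological filter narrows the result to the verb, which is the finer-grained retrieval the abstract describes.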
The prototype demonstrates that it is possible to search for words or phrases with specific properties when detailed metadata are applied to texts. The fine-grained metadata from five different categories enable retrieval at a finer level of granularity and specificity. It is therefore recommended that detailed metadata be used to encode texts in order to improve retrieval in digital text collections.
Keywords: metadata, digital humanities, digital text collections, retrieval, encoding
Description
Thesis (DPhil (Information Science))--University of Pretoria, 2020.
Keywords
UCTD, Information science, Metadata, Digital humanities, Retrieval, Encoding, Digital text collections
Citation
Ball, LH 2020, Enhancing digital text collections with detailed metadata to improve retrieval, DPhil (Information Science) Thesis, University of Pretoria, Pretoria, viewed yymmdd <http://hdl.handle.net/2263/79015>