Automatic self-similarity based form labelling of classical-period piano sonata movements from audio recordings

dc.contributor.advisor: Jacobs, J. Pieter
dc.contributor.email: u15005357@tuks.co.za
dc.contributor.postgraduate: Burger, Paul Alwyn Desmond
dc.date.accessioned: 2025-02-10T07:44:39Z
dc.date.available: 2025-02-10T07:44:39Z
dc.date.created: 2025-05
dc.date.issued: 2025-01
dc.description: Dissertation (MEng (Computer Engineering))--University of Pretoria, 2025.
dc.description.abstract: Musical form is defined as the overall structure of a piece of music. It is the arrangement of musical units of harmony, melody, and rhythm in ways that show repetition or variation, and it is a musicological property that can be used to group pieces that display the same compositional structure. The labelling of musical form types (for the purpose of, e.g., querying online music databases) using raw audio alone is a relatively unexplored area in music information retrieval research. This study investigates a methodology through which eight form types found in Classical-period piano sonatas can be labelled successfully by using self-similarity matrices, based on features derived directly from raw audio, as input to a convolutional neural network. The work is novel in that it is the first to propose and investigate passing an entire self-similarity matrix to a convolutional neural network for the purpose of overall musical form identification. The eight form types that the study seeks to label correctly are found in piano sonatas composed by Mozart, Beethoven, Haydn, Clementi and Czerny; the raw audio of these sonatas was obtained from YouTube. The study focuses specifically on sonatas composed for solo piano because of the availability of state-of-the-art piano transcription software that can be used to generate features known as piano rolls. This method circumvents the potential difficulties of inferring form labels in a bottom-up manner, based on audio segment boundary detection and segment matching, by generating form labels directly from the audio. A custom dataset was developed for the experiments, created from a representative collection obtained from several musicological sources. It was found that different musicologists sometimes assign different form labels to the same movement of the same piece. For this reason, the task was framed as a multi-label classification problem, as opposed to the multi-class classification approach followed by other researchers. Experiments were performed to determine the best musical features from which to derive self-similarity matrices. The results in this dissertation indicate that self-similarity matrices based on piano rolls are superior to the other representations evaluated; this conclusion is supported by a hypothesis test comparing self-similarity matrices based on velocity piano rolls with self-similarity matrices based on mel-spectrograms. Self-similarity matrices based on velocity piano rolls achieved a macro-average area under the receiver operating characteristic curve (ROC-AUC) score of 0.823 and a coverage score of 2.045 on the custom dataset when evaluated using a 20-fold cross-validation testing protocol. The methodology developed in this study was shown to outperform an alternative approach reported in the literature in terms of several commonly applied performance metrics. The study also considered more nuanced aspects of form recognition: performers will sometimes opt not to play repeats notated in the score, and analysis of the model's outputs for different performances of the same movements showed that the model is robust to performers omitting notated repeats.
dc.description.availability: Unrestricted
dc.description.degree: MEng (Computer Engineering)
dc.description.department: Electrical, Electronic and Computer Engineering
dc.description.faculty: Faculty of Engineering, Built Environment and Information Technology
dc.description.sdg: None
dc.identifier.citation: *
dc.identifier.doi: https://doi.org/10.25403/UPresearchdata.28376270
dc.identifier.other: A2025
dc.identifier.uri: http://hdl.handle.net/2263/100630
dc.language.iso: en
dc.publisher: University of Pretoria
dc.rights: © 2023 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
dc.subject: UCTD
dc.subject: Music structure analysis
dc.subject: Form recognition
dc.subject: Music information retrieval
dc.title: Automatic self-similarity based form labelling of classical-period piano sonata movements from audio recordings
dc.type: Dissertation
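
Note: The abstract above outlines a pipeline of piano-roll features, a frame-wise self-similarity matrix, a convolutional neural network classifier, and multi-label evaluation with macro-average ROC-AUC and coverage. The sketch below is illustrative only and does not reproduce the dissertation's implementation; the cosine similarity measure, array shapes, toy random data, and scikit-learn metric calls are assumptions chosen purely for demonstration.

```python
# Illustrative sketch only: not the dissertation's code. Shapes, similarity
# measure and toy data below are assumptions chosen for demonstration.
import numpy as np
from sklearn.metrics import roc_auc_score, coverage_error

def self_similarity_matrix(piano_roll: np.ndarray) -> np.ndarray:
    """Cosine self-similarity between time frames of a (pitches x frames) piano roll."""
    frames = piano_roll.T.astype(float)                # one row per time frame
    norms = np.linalg.norm(frames, axis=1, keepdims=True)
    frames = frames / np.maximum(norms, 1e-9)          # guard against silent frames
    return frames @ frames.T                           # (frames x frames) SSM

rng = np.random.default_rng(0)

# Toy velocity piano roll: 88 piano keys by 200 time frames.
ssm = self_similarity_matrix(rng.integers(0, 128, size=(88, 200)))
print("SSM shape:", ssm.shape)

# Multi-label evaluation as named in the abstract, on hypothetical model
# scores for 8 form-type labels over 50 movements.
y_true = rng.integers(0, 2, size=(50, 8))
y_score = rng.random(size=(50, 8))
print("macro ROC-AUC:", roc_auc_score(y_true, y_score, average="macro"))
print("coverage:", coverage_error(y_true, y_score))
```

Cosine similarity is only one common choice for building such matrices; the dissertation's actual feature processing, similarity measure, and network architecture should be taken from the full text.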

Files

Original bundle

Name: Burger_Automatic_2025.pdf
Size: 10.55 MB
Format: Adobe Portable Document Format
Description: Dissertation

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission