Automatic self-similarity based form labelling of classical-period piano sonata movements from audio recordings

dc.contributor.advisor: Jacobs, J. Pieter
dc.contributor.email: u15005357@tuks.co.za
dc.contributor.postgraduate: Burger, Paul Alwyn Desmond
dc.date.accessioned: 2025-02-10T07:44:39Z
dc.date.available: 2025-02-10T07:44:39Z
dc.date.created: 2025-05
dc.date.issued: 2025-01
dc.description: Dissertation (MEng (Computer Engineering))--University of Pretoria, 2025.
dc.description.abstract: Musical form is defined as the overall structure of a piece of music. It is the arrangement of musical units of harmony, melody, and rhythm in ways that show repetition or variation, and it is a musicological property that can be used to group pieces that display the same compositional structure. The labelling of musical form types (for the purpose of, e.g., querying online music databases) using raw audio alone is a relatively unexplored area in music information retrieval research. This study investigates a methodology through which eight form types found in Classical-period piano sonatas can be labelled successfully by using self-similarity matrices, based on features derived directly from raw audio, as input to a convolutional neural network. The work is novel in that it is the first to propose and investigate passing an entire self-similarity matrix to a convolutional neural network for the purpose of overall musical form identification. The eight form types that the study seeks to label correctly are found in piano sonatas composed by Mozart, Beethoven, Haydn, Clementi and Czerny; the raw audio of these sonatas was obtained from YouTube. The study focuses specifically on sonatas composed for solo piano because of the availability of state-of-the-art piano transcription software that can be used to generate features known as piano rolls. This method circumvents the potential difficulties of inferring form labels in a bottom-up manner, based on audio segment boundary detection and segment matching, by generating form labels directly from the audio. A custom dataset was developed for the experiments, created from a representative collection obtained from several musicological sources. It was found that different musicologists sometimes assign different form labels to the same movement of the same piece. For this reason, the task was framed as a multi-label classification problem, as opposed to the multi-class classification approach followed by other researchers. Experiments were performed to determine the best musical features from which to derive self-similarity matrices. The results in this dissertation indicate that self-similarity matrices based on piano rolls are superior to the other representations evaluated; this conclusion is supported by a hypothesis test comparing self-similarity matrices based on velocity piano rolls with self-similarity matrices based on mel-spectrograms. Self-similarity matrices based on velocity piano rolls achieved a macro-average area under the receiver operating characteristic curve (ROC-AUC) score of 0.823 and a coverage score of 2.045 on the custom dataset when evaluated using a 20-fold cross-validation testing protocol. The methodology developed in this study was shown to outperform an alternative approach reported in the literature in terms of several commonly applied performance metrics. The study also considered more nuanced aspects of form recognition: performers will sometimes opt not to play repeats notated in the score, and analysis of the model's outputs for different performances of the same movements showed that the model is robust to performers omitting notated repeats.
dc.description.availability: Unrestricted
dc.description.degree: MEng (Computer Engineering)
dc.description.department: Electrical, Electronic and Computer Engineering
dc.description.faculty: Faculty of Engineering, Built Environment and Information Technology
dc.description.sdg: None
dc.identifier.citation: *
dc.identifier.doi: https://doi.org/10.25403/UPresearchdata.28376270
dc.identifier.other: A2025
dc.identifier.uri: http://hdl.handle.net/2263/100630
dc.language.iso: en
dc.publisher: University of Pretoria
dc.rights: © 2023 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
dc.subject: UCTD
dc.subject: Music structure analysis
dc.subject: Form recognition
dc.subject: Music information retrieval
dc.title: Automatic self-similarity based form labelling of classical-period piano sonata movements from audio recordings
dc.type: Dissertation
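
Note: The abstract above outlines a pipeline of piano-roll features, a frame-wise self-similarity matrix, a convolutional neural network classifier, and multi-label evaluation with macro-average ROC-AUC and coverage. The sketch below is illustrative only and does not reproduce the dissertation's implementation; the cosine similarity measure, array shapes, toy random data, and scikit-learn metric calls are assumptions chosen purely for demonstration.

```python
# Illustrative sketch only: not the dissertation's code. Shapes, similarity
# measure and toy data below are assumptions chosen for demonstration.
import numpy as np
from sklearn.metrics import roc_auc_score, coverage_error

def self_similarity_matrix(piano_roll: np.ndarray) -> np.ndarray:
    """Cosine self-similarity between time frames of a (pitches x frames) piano roll."""
    frames = piano_roll.T.astype(float)                # one row per time frame
    norms = np.linalg.norm(frames, axis=1, keepdims=True)
    frames = frames / np.maximum(norms, 1e-9)          # guard against silent frames
    return frames @ frames.T                           # (frames x frames) SSM

rng = np.random.default_rng(0)

# Toy velocity piano roll: 88 piano keys by 200 time frames.
ssm = self_similarity_matrix(rng.integers(0, 128, size=(88, 200)))
print("SSM shape:", ssm.shape)

# Multi-label evaluation as named in the abstract, on hypothetical model
# scores for 8 form-type labels over 50 movements.
y_true = rng.integers(0, 2, size=(50, 8))
y_score = rng.random(size=(50, 8))
print("macro ROC-AUC:", roc_auc_score(y_true, y_score, average="macro"))
print("coverage:", coverage_error(y_true, y_score))
```

Cosine similarity is only one common choice for building such matrices; the dissertation's actual feature processing, similarity measure, and network architecture should be taken from the full text.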

Files

Original bundle

Name: Burger_Automatic_2025.pdf
Size: 10.55 MB
Format: Adobe Portable Document Format
Description: Dissertation

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission