Automatic self-similarity based form labelling of classical-period piano sonata movements from audio recordings
Publisher
University of Pretoria
Abstract
Musical form is defined as the overall structure of a music piece. It is the arrangement of musical units of harmony, melody, and rhythm in ways that show repetition or variation. It is a musicological property that can be used to group pieces that display the same structure in their composition. The labelling of musical form types (for the purpose of, e.g., querying online music databases) by utilising raw audio alone is a relatively unexplored area in the field of music information retrieval research.
This study investigates the potential of a methodology for labelling eight form types found in Classical piano sonatas, using self-similarity matrices based on features derived directly from raw audio as input to a convolutional neural network. The work is novel in that it is the first to propose and investigate passing an entire self-similarity matrix to a convolutional neural network for the purpose of overall musical form identification. The eight form types occur in piano sonatas composed by Mozart, Beethoven, Haydn, Clementi and Czerny, and the raw audio of these sonatas was obtained from YouTube. The study focuses specifically on sonatas composed for solo piano because of the availability of state-of-the-art piano transcription software, which can be used to generate features known as piano rolls. By generating form labels directly from the audio, the method circumvents the potential difficulties of inferring form labels in a bottom-up manner from audio segment boundary detection and segment matching. A custom dataset was developed for the experiments, created from a representative collection obtained from several musicological sources. In some instances, different musicologists assigned different form labels to the same movement of the same piece. For this reason, the study was framed as a multi-label classification problem, as opposed to the multi-class classification approach followed by other researchers.
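As an illustration of the kind of input described above, the following is a minimal sketch of how a self-similarity matrix could be computed from a piano roll. It is not the dissertation's implementation: the cosine similarity measure, the function name `self_similarity_matrix`, the treatment of silent frames, and the toy piano roll are all assumptions made for the example.

```python
import math


def self_similarity_matrix(frames):
    """Cosine self-similarity between every pair of time frames.

    frames: list of equal-length feature vectors, one per time frame
            (for example, the key velocities of one piano-roll column).
    Returns an n x n list of lists; entry [i][j] compares frame i with frame j.
    """
    norms = [math.sqrt(sum(v * v for v in frame)) for frame in frames]
    n = len(frames)
    ssm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            if norms[i] == 0.0 or norms[j] == 0.0:
                # Assumption: a silent frame is similar only to another silent frame.
                sim = 1.0 if norms[i] == norms[j] else 0.0
            else:
                dot = sum(a * b for a, b in zip(frames[i], frames[j]))
                sim = dot / (norms[i] * norms[j])
            # Cosine similarity is symmetric, so fill both halves at once.
            ssm[i][j] = ssm[j][i] = sim
    return ssm


# Hypothetical toy "velocity piano roll": 4 frames x 3 keys in an A B A B pattern.
toy_roll = [
    [80, 0, 0],   # A
    [0, 64, 64],  # B
    [80, 0, 0],   # A (repeat)
    [0, 64, 64],  # B (repeat)
]
ssm = self_similarity_matrix(toy_roll)
```

In this toy case the repeated frames produce high similarity away from the main diagonal, which is the kind of repetition structure a convolutional neural network could pick up when given the whole matrix as an image-like input.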
Experiments were performed to determine the best musical features to use for deriving self-similarity matrices. The results in this dissertation suggest that self-similarity matrices based on piano rolls are superior to the other feature types that were evaluated. This conclusion is supported by a hypothesis test that compares self-similarity matrices based on velocity piano rolls with self-similarity matrices based on mel-spectrograms. Self-similarity matrices based on velocity piano rolls achieved a macro-average area under the receiver operating characteristic curve (ROC-AUC) score of 0.823 and a coverage score of 2.045 on the custom dataset when evaluated using a 20-fold cross-validation testing protocol. The methodology developed in this study was shown to outperform an alternative approach reported in the literature on several of the typically applied performance metrics. The study also considered more nuanced aspects of form recognition. Performers sometimes opt not to play repeats notated in the score. By analysing the outputs of the model for different performances of the same movements, the model was shown to be robust to performers omitting notated repeats.
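To make the two reported metrics concrete, the sketch below computes a macro-average ROC-AUC and a coverage value for a multi-label problem in plain Python. The abstract does not define "coverage score"; this example assumes the standard multi-label coverage error (the definition used by scikit-learn's `coverage_error`), in which lower is better and the value is the average number of top-ranked labels needed to include every true label. The function names and the toy labels and scores are hypothetical and are not taken from the dissertation.

```python
def roc_auc(y_true, y_score):
    """ROC-AUC for one label: the probability that a positive example
    is scored above a negative one, counting ties as one half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def macro_roc_auc(Y_true, Y_score):
    """Unweighted mean of the per-label ROC-AUC values."""
    n_labels = len(Y_true[0])
    per_label = [
        roc_auc([row[k] for row in Y_true], [row[k] for row in Y_score])
        for k in range(n_labels)
    ]
    return sum(per_label) / n_labels


def coverage_error(Y_true, Y_score):
    """Average number of top-ranked labels needed to cover all true labels."""
    total = 0
    for truth, scores in zip(Y_true, Y_score):
        true_scores = [s for t, s in zip(truth, scores) if t == 1]
        if not true_scores:
            continue  # a sample with no true labels contributes 0
        lowest = min(true_scores)
        total += sum(1 for s in scores if s >= lowest)
    return total / len(Y_true)


# Hypothetical toy data: 4 movements, 3 form labels; movement 2 carries two labels.
Y_true = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]]
Y_score = [[0.9, 0.2, 0.1], [0.6, 0.7, 0.1], [0.4, 0.8, 0.3], [0.5, 0.1, 0.3]]

macro_auc = macro_roc_auc(Y_true, Y_score)
coverage = coverage_error(Y_true, Y_score)
```

Under this assumed definition, a coverage value close to the average number of true labels per movement indicates that the true labels sit near the top of the model's ranking.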
Description
Dissertation (MEng (Computer Engineering))--University of Pretoria, 2025.
Keywords
UCTD, Music structure analysis, Form recognition, Music information retrieval
Sustainable Development Goals
None