Dynamic Topic Models for the Classification of Music Files

Start: 05/01/2017 - 4:15pm
End  : 05/01/2017 - 5:15pm

Applied Math Seminar

Rebecca Garnett (NAWCWD China Lake)


With the advent of large-scale digital music repositories and personalized streaming radio software, there is a growing need for effective, autonomous methods of music categorization. Most published research in this area builds on the physics of sound propagation and seeks algorithmic parallels to the human auditory system in order to classify music into genres. However, deep neural network architectures are currently the state of the art for many classification problems. These deep networks typically require large amounts of data, long training times, and extensive computational resources, which limits where they can be deployed effectively. Motivated by the invariant scattering convolution networks of Bruna and Mallat (J. Bruna and S. Mallat, "Invariant scattering convolution networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1872-1886, 2013), this work presents preliminary studies aimed at overcoming these limitations.

Bruna and Mallat demonstrated that respecting natural symmetries and adding robustness to deformations through nonlinear operations can substantially improve classification. This study applies these ideas to classify musical audio signals based on learned representations of the dynamics of their spectrograms. First, Nonnegative Matrix Factorization (NMF) was used to obtain a representation of each spectrogram. Next, the nonlinear max-pooling operator was applied to add stability and reduce computational complexity. Finally, a Hidden Markov Model (HMM) was trained to characterize the signal dynamics of each genre, and samples were classified according to how well they fit each HMM. Using HMMs yielded a time-independent model, while the nonlinear pooling step added robustness to deformations. The method was tested on the well-studied GTZAN genre dataset, with the final classification performed by a multi-class Support Vector Machine (SVM).
An 86% correct classification rate was achieved.
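
The NMF, max-pooling, and per-genre HMM steps described above can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the speaker's implementation: the NMF uses Lee-Seung multiplicative updates for the Frobenius-norm cost, pooling is non-overlapping max-pooling over time with a hypothetical window `width`, each genre HMM has diagonal-Gaussian emissions whose parameters are assumed to be already trained (for example by Baum-Welch, not shown), and a sample is assigned to the genre whose HMM gives the highest forward-algorithm log-likelihood. The final multi-class SVM stage is not reproduced, because the abstract does not say how its inputs were formed. All function names are hypothetical.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Factor a nonnegative spectrogram V (freq x time) as V ~ W @ H.

    Assumption: Lee-Seung multiplicative updates for the Frobenius-norm cost.
    W holds spectral templates; H holds their activations over time.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_time)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry of W and H nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def max_pool_time(H, width):
    """Non-overlapping max-pooling of activations along time (assumed window)."""
    rank, n_time = H.shape
    n_frames = n_time // width  # trailing columns that do not fill a window are dropped
    return H[:, : n_frames * width].reshape(rank, n_frames, width).max(axis=2)


def logsumexp(a, axis=None):
    """Numerically stable log(sum(exp(a))) along an axis."""
    m = np.max(a, axis=axis, keepdims=True)
    m = np.where(np.isfinite(m), m, 0.0)
    out = m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))
    return out.item() if axis is None else np.squeeze(out, axis=axis)


def log_gaussian_diag(X, means, variances):
    """Log density of each row of X (T x d) under each state's diagonal
    Gaussian (means, variances: n_states x d). Returns a T x n_states array."""
    diff = X[:, None, :] - means[None, :, :]
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)
    log_norm = np.sum(np.log(2 * np.pi * variances), axis=1)
    return -0.5 * (mahal + log_norm[None, :])


def hmm_log_likelihood(X, start_prob, trans_prob, means, variances):
    """log p(X | HMM) via the forward algorithm in log space."""
    log_b = log_gaussian_diag(X, means, variances)
    log_A = np.log(trans_prob)
    log_alpha = np.log(start_prob) + log_b[0]
    for t in range(1, len(X)):
        # alpha_t(j) = sum_i alpha_{t-1}(i) * A[i, j] * b_j(x_t)
        log_alpha = logsumexp(log_alpha[:, None] + log_A, axis=0) + log_b[t]
    return logsumexp(log_alpha)


def classify(X, genre_hmms):
    """Pick the genre whose (pre-trained) HMM best explains the sequence X."""
    scores = {g: hmm_log_likelihood(X, *p) for g, p in genre_hmms.items()}
    return max(scores, key=scores.get), scores
```

For a spectrogram `V`, the observation sequence in this sketch would be `max_pool_time(nmf(V, rank)[1], width).T`, one row per pooled frame. The sketch factors each spectrogram independently; whether the talk shared one NMF dictionary across files is not stated. One plausible way to feed an SVM would be to use the per-genre log-likelihoods in `scores` as a feature vector, but that too is an assumption.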

Emmy Noether Room, Millikan 1021, Pomona College