Classifying Music Genres From Scratch: Part III

Anupam Vashist · Published in Karma and Eggs · Feb 16, 2020

Music information retrieval using librosa and Python

In the previous article, we built a sample pipeline to extract musical information from the songs we downloaded, which we can save as a .pkl or a .csv file. The next step is to check whether we can spot some differentiating features that could be used to detect the genre of a song.
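For reference, here's a minimal sketch of reloading that saved feature table; the filename `features.csv` and the use of pandas are assumptions for illustration, not the repo's actual layout.

```python
import pandas as pd

# Reload the feature table produced by the extraction pipeline
# (the filename is a placeholder for whatever was saved in Part II).
df = pd.read_csv("features.csv")   # or: df = pd.read_pickle("features.pkl")
print(df.shape, df.columns.tolist())
```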

We mainly focus on attributes that define the musical feel of a song: its rhythm, timbre and dynamics. We will use tempo for rhythm; spectral rolloff, spectral centroid, zero-crossing rate and spectral bandwidth for timbre; and oa_spec for spectral energy. Let's see what the data looks like.
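As a refresher, here is roughly how these features come out of librosa for a single snippet. This is a sketch, not the repo's exact pipeline, and oa_spec is assumed here to be the mean STFT magnitude.

```python
import librosa
import numpy as np

def snippet_features(y, sr):
    """Rhythm, timbre and energy features for one audio snippet."""
    # Rhythm: global tempo estimate (BPM)
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)

    # Timbre: frame-wise features, summarised by their mean
    spec_cent = np.mean(librosa.feature.spectral_centroid(y=y, sr=sr))
    spec_bw = np.mean(librosa.feature.spectral_bandwidth(y=y, sr=sr))
    rolloff = np.mean(librosa.feature.spectral_rolloff(y=y, sr=sr))
    zcr = np.mean(librosa.feature.zero_crossing_rate(y))

    # Spectral energy ("oa_spec") -- assumed here to be the mean STFT
    # magnitude; the repo may define it differently.
    oa_spec = np.mean(np.abs(librosa.stft(y)))

    return [tempo, spec_cent, spec_bw, rolloff, zcr, oa_spec]

y, sr = librosa.load("some_song.mp3")  # placeholder path
print(snippet_features(y, sr))
```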

Pairplots of the extracted features
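The pairplot above is essentially a one-liner with seaborn. The column names below (including the `genre` hue column) are assumed to match the dataframe loaded earlier.

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Assumed column names; rename to match your saved feature table.
feature_cols = ["tempo", "spec_cent", "spec_bw", "rolloff", "zcr", "oa_spec"]
sns.pairplot(df, vars=feature_cols, hue="genre", plot_kws={"alpha": 0.3})
plt.show()
```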

I could drop one of spec_cent or rolloff if I had a resource constraint, but since there is higher variability at higher values of both, I'd like to keep both. Let's see how these values differ by genre.

Note that we are plotting the mean of the features over a sample of ~180K snippets from various genres and bands. In particular, the timbre features (spec_cent, spec_bw and rolloff) show higher variation than spectral energy and tempo. Let's check whether we can see some clusters in PCA or t-SNE plots. The labels here are signatures of all the combinations of genres found in the snippets; a particular song does not stick to a single genre all the way through, and there are pieces of songs with tunes similar to some other close genre (like RnB and rock ballads!).
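Here's a rough sketch of both projections with scikit-learn. Since t-SNE scales poorly with ~180K points, a random subsample is assumed:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Standardise so no single feature dominates the variance
X = StandardScaler().fit_transform(df[feature_cols])
codes = df["genre"].astype("category").cat.codes  # colour by genre label

# PCA: project onto the two axes of highest variance
X_pca = PCA(n_components=2).fit_transform(X)

# t-SNE on a subsample -- it is too slow for all ~180K snippets
idx = np.random.choice(len(X), 10_000, replace=False)
X_tsne = TSNE(n_components=2).fit_transform(X[idx])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
ax1.scatter(X_pca[:, 0], X_pca[:, 1], c=codes, s=2, cmap="tab10")
ax1.set_title("PCA")
ax2.scatter(X_tsne[:, 0], X_tsne[:, 1], c=codes.iloc[idx], s=2, cmap="tab10")
ax2.set_title("t-SNE")
plt.show()
```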

https://github.com/karmaNeggs/MIRaGE/

In the PCA plot, we have reduced the features to only two components, which represent the data along the axes of highest variance. There are some clusters of grey and pink closer to the middle, while jazz, RnB and pop are close genres and are spread along the periphery. If we reduce the data using t-SNE and can spot a pattern, a parametric model will surely be able to learn it well, and it looks like the same story: we have spread-out rock and jazzy music on the periphery.

https://github.com/karmaNeggs/MIRaGE/

Now we'll set up a Keras neural net to do multi-class classification of genres.
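A minimal sketch of such a network; the layer sizes, dropout rate and train/test split are assumptions, not the exact architecture from the repo:

```python
from sklearn.model_selection import train_test_split
from tensorflow import keras

X_train, X_test, y_train, y_test = train_test_split(
    X, codes.values, test_size=0.2, random_state=42)

n_genres = codes.nunique()

model = keras.Sequential([
    keras.layers.Dense(128, activation="relu", input_shape=(X.shape[1],)),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(n_genres, activation="softmax"),  # one unit per genre
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # integer genre labels
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=40,
          validation_data=(X_test, y_test))
```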

After training this model for 40 epochs, train accuracy maxed out at 64% and test accuracy at 65%, which is a good sign. The reason for the low accuracy is the one we discussed above: a song may be labelled as Hip-Hop but contain some trance-like beats in a short 3–4 second snippet, and such snippets get mis-classified. Let's see which features are most responsible for genre differentiation.
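One way to get those importances is SHAP's model-agnostic KernelExplainer; whether the repo uses this or another explainer is an assumption. The background and test samples are kept small here because KernelExplainer is slow:

```python
import shap

# A small background sample keeps KernelExplainer tractable
background = X_train[:100]
explainer = shap.KernelExplainer(model.predict, background)
shap_values = explainer.shap_values(X_test[:200])

# Per-class (genre) feature importance summary
shap.summary_plot(shap_values, X_test[:200], feature_names=feature_cols)
```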

Shapley values: class = genre, features = input features

Features 2, 3 and 1 are tempo, spec_cent and spec_bw respectively, followed by rolloff and zero-crossing rate in top-down order. As we discussed above, timbre features seem to be more prominent in differentiating between genres, and tempo, of course, is a major factor.

To test this model, we'll break a sample song into small snippets, as we did for the test case, and try to detect the genre of each snippet. We'll then check the frequency of each genre across all snippets, and may the mode win.
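A sketch of that majority vote, reusing the hypothetical `snippet_features` helper from earlier with 3-second snippets (in practice, the features should also pass through the same scaler used at training time):

```python
from collections import Counter

def predict_song_genre(path, model, snippet_seconds=3):
    """Split a song into snippets, classify each, and return the modal genre."""
    y, sr = librosa.load(path)
    hop = snippet_seconds * sr

    votes = []
    for start in range(0, len(y) - hop + 1, hop):
        feats = np.array(snippet_features(y[start:start + hop], sr))
        # NOTE: apply the training-time StandardScaler here in practice
        probs = model.predict(feats.reshape(1, -1))
        votes.append(int(np.argmax(probs)))

    # Frequency of each predicted genre -- may the mode win
    return Counter(votes).most_common(1)[0][0]
```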

All the related code is on this GitHub repo as a collection of quick Jupyter notebooks. Here's a snapshot of the code: I've fed in a 3-minute EDM song, and it seems the mode approach works well.

https://github.com/karmaNeggs/MIRaGE/

Making this model was fun. We learned the basic concepts of musical structure, the information hidden in musical frequencies, and how to extract all of this to understand the magic called music. Next, I'll focus on more EDA, try to gather other features as well (chroma features, tonnetz, etc.), and dig into how the bands, genres and emotions associated with music differ from each other.

Moreover, I'll focus on sub-genres of rock, from classic, hard and punk to metal, progressive and psychedelic. Please keep yourself posted ;)
