Classifying Music Genres From Scratch: Part II

Music information retrieval using librosa and Python

Anupam Vashist
Karma and Eggs

--

Hello awesome people. This is a continuation of the previous article, where we discussed the intuition behind genres in music and wrote a short code snippet to download songs with band and genre labels, saving them in .wav format for further processing. In this article, we'll extract features from these wav files and create a master database for future analysis.

Here's the mandatory introduction to librosa, an awesome music information retrieval library which will help us extract relevant features from our wav files. For a particular song, I'll be interested in its artist/band and genre as external features. The features integral to a song are its rhythm, its tempo and how that tempo is distributed over the song, auto-correlation to find how vibrant the song is, what tunes are used, how much energy the song contains, and how alive (or dull) the music is.

First up, we'll load a .wav file using librosa.load:
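A minimal sketch of the loading step ("song.wav" is a placeholder path for one of our downloaded tracks):

```python
import librosa

# Load the audio; librosa resamples to 22050 Hz mono by default.
# "song.wav" is a placeholder for one of our downloaded files.
y, sr = librosa.load("song.wav")

print(y.shape, sr)  # amplitude array and sample rate
```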

“y” contains amplitude samples of audio data, sampled at rate “sr”, usually 22050 or 44100 samples per second.

Depending on how often particular amplitude values recur in our sample, we get the set of frequencies that make up "y". To obtain these frequencies we apply a Fourier transform to the sample, or go straight to Mel-frequency cepstral coefficients (MFCCs), which mimic the process the human ear uses to decode frequencies. Plotting these frequencies along a time axis gives us a spectrogram in the frequency vs. time domain (we'll be using spectrograms very often). Alternatively, we can perform a constant-Q transform (CQT) to check which pitch classes are active in our sample "y". Here's what a CQT result looks like:
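A minimal sketch of computing a spectrogram, MFCCs, and the CQT-based chroma with librosa (the file path is a placeholder):

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

y, sr = librosa.load("song.wav")  # placeholder path

# Short-time Fourier transform -> spectrogram (frequency vs. time)
D = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)

# MFCCs: a compact, perceptually motivated summary of the spectrum
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)

# Chroma from a constant-Q transform: which pitch classes are active over time
chroma = librosa.feature.chroma_cqt(y=y, sr=sr)

fig, ax = plt.subplots(nrows=2, sharex=True)
librosa.display.specshow(D, y_axis="log", x_axis="time", ax=ax[0])
ax[0].set(title="Log-frequency spectrogram")
librosa.display.specshow(chroma, y_axis="chroma", x_axis="time", ax=ax[1])
ax[1].set(title="Chroma (CQT)")
plt.show()
```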

Spectral centroid: a higher value corresponds to more of the signal's energy being concentrated in higher frequencies. Basically, it measures the spectral shape and position.

Brightness of sound: used to identify the timbre of the music, i.e., a collective measure of which instruments are being used, inferred from the quality of the sounds they produce.

Here’s a link to an awesome reference paper:

https://www.asee.org/documents/zones/zone1/2008/student/ASEE12008_0044_paper.pdf

  • Roll-off frequency is defined for each frame as the center frequency for a spectrogram bin such that at least roll_percent (0.85 by default) of the energy of the spectrum in this frame is contained in this bin and the bins below. This can be used to, e.g., approximate the maximum (or minimum) frequency by setting roll_percent to a value close to 1 (or 0).
  • Zero-crossing rate (ZCR) is the rate of sign changes along a signal. ZCR is low for voiced parts and high for unvoiced parts, whereas energy is high for voiced parts and low for unvoiced parts; it is also a key feature for classifying percussive sounds.
  • Spectral bandwidth is a measure of the spread of the spectrum around its centroid.
  • Tempo: the estimated beats per minute (BPM) of the song.
  • Tonnetz (German: tone-network) is a conceptual lattice diagram representing tonal space first described by Leonhard Euler in 1739. Various visual representations of the Tonnetz can be used to show traditional harmonic relationships in European classical music.

Here’s how to extract the above features using librosa.
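A minimal sketch for one file (the path is a placeholder, and collapsing each feature to its mean over time is my simplification; the pipeline below works on much smaller chunks):

```python
import librosa
import numpy as np

y, sr = librosa.load("song.wav")  # placeholder path

# Tempo (BPM) and beat frames
tempo, beats = librosa.beat.beat_track(y=y, sr=sr)

# Frame-level spectral features; each returns shape (1, n_frames)
centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
bandwidth = librosa.feature.spectral_bandwidth(y=y, sr=sr)
rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr, roll_percent=0.85)
zcr = librosa.feature.zero_crossing_rate(y)

# Pitch-class and harmonic features
chroma = librosa.feature.chroma_cqt(y=y, sr=sr)   # (12, n_frames)
tonnetz = librosa.feature.tonnetz(y=y, sr=sr)     # (6, n_frames)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)

# Collapse each feature to its mean over time for a flat feature row
features = {
    "tempo": float(tempo),
    "beat_count": len(beats),
    "centroid": float(np.mean(centroid)),
    "bandwidth": float(np.mean(bandwidth)),
    "rolloff": float(np.mean(rolloff)),
    "zcr": float(np.mean(zcr)),
    **{f"chroma_{i}": v for i, v in enumerate(np.mean(chroma, axis=1))},
    **{f"tonnetz_{i}": v for i, v in enumerate(np.mean(tonnetz, axis=1))},
    **{f"mfcc_{i}": v for i, v in enumerate(np.mean(mfcc, axis=1))},
}
```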

We'll make a pipeline to extract all relevant features from all of our wav files. For this, we'll first need to break the song into "intro", "chorus", "verse" and "outro". We'll be doing a quick approximation for this.

Usually, as a song is centered around a lyrical theme, its music is also structured into repeating, memorable sequences of central emphasis, called the chorus, and deeper, more detailed sequences called verses. By this definition, the chorus is the sequence most highly correlated with the rest of the song. Using this logic, we'll break the song into snippets to introduce structure to our data: first we break the song into auto-correlated snippets, then each snippet is broken into chunks of 1/5th of a second, and from these small chunks we extract the data.
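Here's one way that approximation could look. This is a sketch that uses librosa's agglomerative segmentation as a stand-in for the correlation-based split; the file path and the section count of 4 are my assumptions:

```python
import librosa
import numpy as np

y, sr = librosa.load("song.wav")  # placeholder path

# Cluster chroma frames into 4 contiguous sections, a rough stand-in
# for intro / verse / chorus / outro (4 is an assumed section count).
chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
bounds = librosa.segment.agglomerative(chroma, 4)  # frame indices of section starts
starts = librosa.frames_to_samples(bounds)         # convert to sample indices
ends = np.append(starts[1:], len(y))

# Chop each section into 1/5th-second chunks for feature extraction
chunk = sr // 5
snippets = []
for s, e in zip(starts, ends):
    section = y[s:e]
    for i in range(0, len(section) - chunk + 1, chunk):
        snippets.append(section[i:i + chunk])
```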

Here's a link to the GitHub repo containing the code for breaking the wav into snippets, as well as extracting the following features from each wav file:

Band, genre, tempo, spectral centroid, spectral bandwidth, roll-off, zero-crossing rate, chroma CQT, tonnetz, MFCCs, segment mean of "y", snippet count, beat count, etc.
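To sketch what assembling the master database might look like (the catalogue list, file paths and column set below are placeholders of mine, not necessarily the repo's):

```python
import librosa
import numpy as np
import pandas as pd

# Placeholder catalogue of (band, genre, wav_path) from the download step
tracks = [("Some Band", "rock", "song.wav")]

rows = []
for band, genre, path in tracks:
    y, sr = librosa.load(path)
    tempo, beats = librosa.beat.beat_track(y=y, sr=sr)
    rows.append({
        "band": band,
        "genre": genre,
        "tempo": float(tempo),
        "beat_count": len(beats),
        "mean_y": float(np.mean(np.abs(y))),
        "zcr": float(np.mean(librosa.feature.zero_crossing_rate(y))),
        # ...plus the spectral, chroma, tonnetz and MFCC columns from above
    })

pd.DataFrame(rows).to_csv("master_features.csv", index=False)
```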

Following this, we'll have sufficient information about the songs we have downloaded. In my experiment I downloaded 100 songs and extracted ~180,000 data points, ready to be analyzed. Next, we'll get into a quick EDA and build a basic (but powerful) song genre detector.

Cheers!
