How to Create a Music Visualizer with Three.js

In an attempt to learn THREE.js (the WebGL-based 3D rendering framework) and the Web Audio API, I made something that visualises music in a very simple way. This article documents the whole process.

Final thing first:

(Just use an .mp3 / .mp4 / .wav file to see it work. If you don't have one handy, you can use this)

A Primer on WebAudio API

HTML5's <audio> tag, when combined with the Web Audio API, becomes quite powerful. It lets you process audio and add effects dynamically to any kind of audio source.

The Web Audio API involves handling audio operations inside an audio context and has been designed to allow modular routing. Basic audio operations are performed with audio nodes, which are linked together to form an audio routing graph. Several sources — with different types of channel layouts — are supported even within a single context. This modular design provides the flexibility to create complex audio functions with dynamic effects.

The audio pipeline starts by creating an audio context. It should have at least one audio source, which can be thought of as an entry point for external files, mic input, oscillators, etc. Once we have a source in place, the signal is processed and moved along the pipeline by audio nodes. After processing, the signal(s) are routed to the audio destination, of which there can be only one in the whole context.
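As a minimal sketch of that pipeline (the element id "track" is just an assumption for this example): one source, one processing node (a GainNode standing in for any effect), and the single destination.

```javascript
// create the context - everything happens inside it
var context = new AudioContext();

// 1. a source: here, an existing HTML5 <audio> element on the page
var audio = document.getElementById('track');
var source = context.createMediaElementSource(audio);

// 2. zero or more processing nodes; a GainNode is the simplest "effect"
var gain = context.createGain();
gain.gain.value = 0.8; // attenuate the signal slightly

// 3. route source -> effect -> the context's single destination (the speakers)
source.connect(gain);
gain.connect(context.destination);
```

Each node's output feeds the next node's input, which is what makes the routing modular: you can insert, remove, or rewire nodes without touching the rest of the graph.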

Modular Routing

The simplest illustration has a single source and a destination, without any effects or processing, inside the context. Why would anyone use this? Maybe they just want to play the sound without any changes.

On the left is an example of a much more complex setup, which can also be made using this API.

Let’s look at the relevant code from the visualiser:

// get the audio file from the array of files the user uploaded
audio.src = URL.createObjectURL(files[0]);
// load the file, and then play it - all using the HTML5 audio element's API
audio.load();
audio.play();

Enter WebAudio API

var context = new AudioContext(); // create the context
var src = context.createMediaElementSource(audio); // create a source inside the context
var analyser = context.createAnalyser(); // create an analyser node in the context
src.connect(analyser); // connect the source to the analyser node
analyser.connect(context.destination); // connect the analyser to the destination node

analyser.fftSize = 512;
var bufferLength = analyser.frequencyBinCount; // bin count = fftSize / 2
var dataArray = new Uint8Array(bufferLength);

Let's skip a couple of other things going on in between
function render() { // this function runs on every update
  analyser.getByteFrequencyData(dataArray); // refresh the frequency data

  // slice the array into two halves
  var lowerHalfArray = dataArray.slice(0, (dataArray.length / 2) - 1);
  var upperHalfArray = dataArray.slice((dataArray.length / 2) - 1, dataArray.length - 1);

  // do some basic reductions/normalisations
  var lowerMax = max(lowerHalfArray);
  var lowerAvg = avg(lowerHalfArray);
  var upperAvg = avg(upperHalfArray);

  var lowerMaxFr = lowerMax / lowerHalfArray.length;
  var lowerAvgFr = lowerAvg / lowerHalfArray.length;
  var upperAvgFr = upperAvg / upperHalfArray.length;

  /* use the reduced values to modulate the 3d objects */
  // these are the planar meshes above and below the sphere
  makeRoughGround(plane, modulate(upperAvgFr, 0, 1, 0.5, 4));
  makeRoughGround(plane2, modulate(lowerMaxFr, 0, 1, 0.5, 4));

  // this modulates the sphere's shape (makeRoughBall is the sphere's
  // counterpart to makeRoughGround)
  makeRoughBall(ball,
    modulate(Math.pow(lowerMaxFr, 0.5), 0, 1, 0, 8),
    modulate(upperAvgFr, 0, 1, 0, 4));
}
Skipping a few other things here unrelated to the WebAudio API
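The code above leans on a few small helpers (max, avg, modulate) that are not shown in this excerpt. A minimal version of them could look like this; the exact bodies are my reconstruction, not copied from the project.

```javascript
// largest value in an array of numbers
function max(arr) {
  return arr.reduce(function (a, b) { return Math.max(a, b); });
}

// arithmetic mean of an array of numbers
function avg(arr) {
  var total = arr.reduce(function (sum, b) { return sum + b; }, 0);
  return total / arr.length;
}

// fraction of the way val sits between minVal and maxVal
function fractionate(val, minVal, maxVal) {
  return (val - minVal) / (maxVal - minVal);
}

// map val from the range [minVal, maxVal] to the range [outMin, outMax]
function modulate(val, minVal, maxVal, outMin, outMax) {
  return outMin + fractionate(val, minVal, maxVal) * (outMax - outMin);
}
```

For example, modulate(0.5, 0, 1, 0, 8) maps the halfway point of [0, 1] to the halfway point of [0, 8], i.e. 4.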

With respect to the Web Audio API, our only purpose in using it in this project is to extract attributes of the audio signal and use them to update the visuals. If you look at the code above, the `analyser node` helps us analyse these musical attributes in real time. Since it does not interfere with the signal and does not change it in any way, it's the perfect interface for our use.

This interface (Analyser Node Interface) represents a node which is able to provide real-time frequency and time-domain analysis information. The audio stream will be passed un-processed from input to output.

The default FFT size is 2048, but we chose a lower resolution of 512 because it's far cheaper to compute. The very primitive beat-detection method I used does not need high resolution, and there will be additional computations for the real-time 3D updates, so we can safely go ahead with this value for starters.

The buffer length is equal to the bin count, which is half of the FFT size. For an FFT size of 512, the buffer length is 256 data points. This means that on every update we have 256 data points covering the whole audio frequency spectrum to use for the visualisation; this is dataArray in the code.
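Each of those 256 bins covers a fixed slice of the spectrum: bin i corresponds to a frequency of roughly i * sampleRate / fftSize. A small sketch (assuming the common 44100 Hz sample rate; in real code you would read context.sampleRate) shows what each array index represents:

```javascript
// width of one FFT bin in Hz
function binWidth(sampleRate, fftSize) {
  return sampleRate / fftSize;
}

// approximate frequency represented by dataArray[i]
function binToFrequency(i, sampleRate, fftSize) {
  return i * binWidth(sampleRate, fftSize);
}

var sampleRate = 44100; // assumption for this sketch
var fftSize = 512;

binWidth(sampleRate, fftSize);            // ~86 Hz per bin
binToFrequency(128, sampleRate, fftSize); // the midpoint of dataArray sits near 11 kHz
```

So "the lower half of dataArray" in the text above really does mean everything below roughly 11 kHz, which comfortably contains the bass and most of the beat energy.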

The way I used this data is simple: I chopped the array in the middle into two halves. The upper half contains the higher frequencies (roughly the treble) and the lower half contains the lower frequencies. I wanted something quick, so I used the aggregate of the lower frequencies to approximate the beat, while the upper half's consolidated values were used to drive the texture. Both were reduced to a more natural range of values through some simple transformations.

Making the scene in THREE.js

The basic building blocks:

I started with only the sphere in the scene and transformed it. As this turned out to be a simple task, I added the two planar meshes later on to give the scene more depth. The good part about THREE.js is that once you are done with the basic setup, it's fairly easy to focus on adding or removing things.

For the scene we need a few basic things:

  • A Scene, which contains everything
  • A Light, or two
  • A Camera
  • A renderer, that renders everything
  • All the objects — here, the sphere & the two planes
  • Groups (optional) — they help you group objects together and maintain some order
  • A render loop, which updates the scene when needed
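Putting those building blocks together, a minimal THREE.js setup looks roughly like this. This is a sketch, not the project's actual code; names like group and ball, and the specific geometry and material choices, are placeholders.

```javascript
var scene = new THREE.Scene(); // contains everything

var camera = new THREE.PerspectiveCamera(
  45, window.innerWidth / window.innerHeight, 0.1, 1000);
camera.position.set(0, 0, 100);

var renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(window.innerWidth, window.innerHeight);
document.body.appendChild(renderer.domElement);

// a light or two
var light = new THREE.SpotLight(0xffffff);
light.position.set(-10, 40, 20);
scene.add(light);

// a group keeps related objects together
var group = new THREE.Group();
var ball = new THREE.Mesh(
  new THREE.IcosahedronGeometry(10, 4),
  new THREE.MeshLambertMaterial({ color: 0xff00ee, wireframe: true })
);
group.add(ball);
scene.add(group);

// the render loop
function render() {
  // ...update the meshes from the audio data here...
  renderer.render(scene, camera);
  requestAnimationFrame(render);
}
render();
```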

You can look at the rest of the code to see everything in detail. If you'd like a proper introduction to THREE.js, the official documentation and examples are a good place to start.

The core idea of the visualiser was to modulate the sphere's size based on the beat signature and deform its surface based on the vocals. To make it interesting, I used some procedural noise to add physical texture to the ball, using the audio data as input.

There are a few battle-tested methods to add noise, and I settled on Simplex noise (more details here). You can see the code below to understand how the deformation is done in practice.
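As a sketch of the idea (the function name and exact scaling are my reconstruction, not the project's code): for every vertex of the sphere, sample the noise field at the vertex's position, then push the vertex radially outward, scaling the displacement with the audio-derived values. The noise sample is passed in as a plain number here so the geometry logic stays independent of the noise library.

```javascript
// displace a single vertex radially: bassFr drives overall size,
// treFr scales the noise sample that gives the surface its texture
function displaceVertex(vertex, radius, noiseValue, bassFr, treFr) {
  // normalise to the unit sphere, then scale back out to the new distance
  var length = Math.sqrt(
    vertex.x * vertex.x + vertex.y * vertex.y + vertex.z * vertex.z);
  var distance = radius + bassFr + noiseValue * treFr;
  return {
    x: (vertex.x / length) * distance,
    y: (vertex.y / length) * distance,
    z: (vertex.z / length) * distance
  };
}

// usage with a simplex noise source (e.g. the simplex-noise package),
// where rf is a "roughness factor" that stretches the noise field:
// geometry.vertices.forEach(function (v) {
//   var n = noise.noise3D(v.x * rf, v.y * rf, v.z * rf);
//   var p = displaceVertex(v, geometry.parameters.radius, n, bassFr, treFr);
//   v.set(p.x, p.y, p.z);
// });
```

Because the displacement is applied along the line from the sphere's centre through each vertex, the ball keeps its overall roundness while the noise makes the surface ripple.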

When we put everything together, we have a simple audio visualiser. Let me know what you think about it, along with any ideas, improvements, etc.
