
Can we teach a neural network to make music?

We have been creating music since prehistoric times, and this art has evolved into the many genres we enjoy today. Our approach to music ranges from studying music theory to musical improvisation. Can we go a step further and create a model capable of composing music?
It is possible to create a model that can compose music, but we still need to ask ourselves the following questions:

  • Can the music reach the level of human creation?
  • Can we teach it more genres of music?

Neural network vs. human

Here are six clips. Try to guess which ones the network created and which ones a human composed.

[Audio clips 1 to 6]

The network created the melodies in clips 3, 4, and 6. For now, it is still easy to tell the network’s ‘masterpieces’ apart from human compositions.

Difference between composition and improvisation

The act of composing typically includes creating music notation, such as sheet music or a “score”, which is then performed by the composer or by other instrumentalists or singers. Improvisation is the spontaneous creation of music: performers compose in the moment, employing compositional techniques with or without preparation.
Now that we know the difference between these two terms, which one should we associate with our future model? It is probably more accurate to say that the network improvises while learning, because it comes up with new melodies on top of a set form.
However, our final goal is for the model to create new melodies and new forms by itself, in other words, to compose.

How do we approach this problem?

To create a model, we first have to define its input and its expected output. Let’s start with the obvious answer: the output is a new piece of music that is pleasant to listen to and that makes it difficult to tell whether a person or a machine learning model is behind it. But what does that actually mean?

What is music?

For humans, music is an art form and cultural activity whose medium is sound organized in time. Its common elements include pitch, rhythm, tempo, dynamics, and the ‘color’ of the sound (timbre). Different styles of music may emphasize, de-emphasize, or simply omit some of these elements.

At its core, a piece of music is a chord sequence with an associated melody.
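To make “a chord sequence with an associated melody” concrete, here is a minimal sketch that encodes both as MIDI note numbers (60 is middle C, and each semitone adds 1). The progression, the melody, and the helper names `triad` and `fits_chord` are illustrative assumptions, not part of the original model.

```python
# Semitone offsets above the root for two common root-position triads.
TRIAD_INTERVALS = {
    "major": (0, 4, 7),
    "minor": (0, 3, 7),
}


def triad(root: int, quality: str) -> list[int]:
    """Return the MIDI pitches of a root-position triad (hypothetical helper)."""
    return [root + step for step in TRIAD_INTERVALS[quality]]


def fits_chord(note: int, chord: list[int]) -> bool:
    """True if the melody note shares a pitch class (note name) with a chord tone."""
    return note % 12 in {pitch % 12 for pitch in chord}


# A made-up I-V-vi-IV progression in C major (C, G, Am, F), one chord per bar,
# paired with a made-up melody of one note per bar.
progression = [triad(60, "major"), triad(55, "major"), triad(57, "minor"), triad(53, "major")]
melody = [76, 74, 72, 72]

for chord, note in zip(progression, melody):
    print(chord, note, fits_chord(note, chord))
```

Real melodies also use passing notes outside the chord, so `fits_chord` is only a rough way to see how the two streams relate, not a rule a model must follow.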

Where is the pattern?

Music can be divided into genres and subgenres. Certain genres have patterns in their melody and style that differentiate them from other genres. Long-term dependencies, such as themes and rhythmic figures that recur across a song, are at the heart of a musical style. We can use these dependencies to our advantage and design a model that composes music in a specific genre. Because genres differ so distinctly, let’s focus on a single genre for now.

Let’s create a model that takes chords and composes a melody to go with them.

What kind of model to use?

Feed-forward networks are not up to the task of composing music for a simple reason: they cannot store information about the past, so they cannot keep track of where they are in the song. We could instead use a recurrent neural network (RNN), which keeps track of the music so far in its hidden layer, acting as memory. In practice, however, this does not work as well as expected: music composed this way was poorly structured, both thematically and rhythmically. The likely culprit is the vanishing gradient problem in RNNs: gradients shrink as they are propagated back through many time steps, so the network struggles to learn music’s long-term dependencies.
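The vanishing gradient problem can be seen even in a one-unit RNN. The sketch below assumes a toy recurrence h_t = tanh(w · h_{t-1} + u · x_t) with hand-picked, hypothetical weights and a sine wave standing in for 50 time steps of music features. By the chain rule, the gradient of the current state with respect to the first one is a product of factors w · (1 − h_t²), so it shrinks geometrically whenever those factors are smaller than 1.

```python
import math


def rnn_gradient_through_time(w: float, u: float, inputs: list[float], h0: float = 0.0) -> list[float]:
    """Run a one-unit tanh RNN and return |dh_t/dh_0| after every step t.

    Recurrence: h_t = tanh(w * h_{t-1} + u * x_t).
    Since dh_t/dh_{t-1} = w * (1 - h_t**2), the gradient back to h_0
    is the running product of these per-step factors.
    """
    h = h0
    grad = 1.0
    grads = []
    for x in inputs:
        h = math.tanh(w * h + u * x)
        grad *= w * (1.0 - h * h)
        grads.append(abs(grad))
    return grads


# Toy input sequence (hypothetical stand-in for music features).
sequence = [math.sin(0.3 * t) for t in range(50)]
grads = rnn_gradient_through_time(w=0.9, u=1.0, inputs=sequence)

for t in (1, 10, 25, 50):
    print(f"step {t:2d}: |dh_t/dh_0| = {grads[t - 1]:.2e}")
```

Here every factor is at most 0.9, so after 50 steps the gradient is below 0.9⁵⁰ and a note heard 50 steps ago has almost no influence on learning. When the factors are larger than 1 in magnitude, the product can instead grow, which is the related exploding gradient problem.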

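To avoid these issues, we use the long short-term memory (LSTM) architecture. LSTM is an RNN architecture whose memory cell is controlled by gates and updated additively, which lets gradients flow across many time steps and mitigates the vanishing gradient problem. Like other RNNs, it has feedback connections, which in theory allow it to compute anything a Turing machine can. It can process not only single data points (such as images) but also entire sequences of data (such as speech or video).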
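The sketch below shows one step of a single-unit LSTM cell in the common variant with a forget gate. The weights are hypothetical and hand-picked (a large forget-gate bias keeps the forget gate close to 1), the input is the same kind of toy sine sequence, and `lstm_step` is an illustrative name rather than the model used in this post.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x: float, h_prev: float, c_prev: float,
              params: dict[str, tuple[float, float, float]]) -> tuple[float, float, float]:
    """One step of a single-unit LSTM cell.

    params maps each gate name to (input weight, recurrent weight, bias).
    Returns the new hidden state, the new cell state, and the forget gate value.
    """
    def pre(gate: str) -> float:
        w, u, b = params[gate]
        return w * x + u * h_prev + b

    i = sigmoid(pre("input"))    # how much new content to write
    f = sigmoid(pre("forget"))   # how much of the old cell state to keep
    o = sigmoid(pre("output"))   # how much of the cell state to expose
    g = math.tanh(pre("cell"))   # candidate content
    c = f * c_prev + i * g       # additive cell-state update
    h = o * math.tanh(c)
    return h, c, f


# Hypothetical weights; the forget gate ignores its inputs and sits near sigmoid(4).
params = {
    "input": (0.5, 0.1, 0.0),
    "forget": (0.0, 0.0, 4.0),
    "output": (0.5, 0.1, 0.0),
    "cell": (1.0, 0.2, 0.0),
}

h, c = 0.0, 0.0
carry = 1.0  # product of forget gates = dc_t/dc_0 along the direct cell-state path
for t in range(50):
    h, c, f = lstm_step(math.sin(0.3 * t), h, c, params)
    carry *= f

print(f"direct-path gradient dc_50/dc_0 = {carry:.3f}")
```

The key line is `c = f * c_prev + i * g`. Because the cell state is updated by addition rather than being squashed through a weight and a nonlinearity at every step, the direct gradient path from one cell state to the previous one is just the forget gate `f`; when `f` stays close to 1, information and gradients survive across many steps. A practical model would use vector-valued gates and a library implementation such as PyTorch’s `torch.nn.LSTM`.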

Training

During training, our model observes pairs consisting of a chord and its corresponding melody. These pairs come from MIDI files and are represented as a piano-roll matrix, in which each note at each time step is either “on” or “off” (one or zero). We will use snippets of songs by the Beatles.
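A piano roll can be built from a list of note events once a MIDI file has been parsed and its timing quantized into steps; that parsing step is omitted here. In this minimal sketch, the note events, the eight-step length, and the helper name `piano_roll` are hypothetical. Rows are time steps and columns are the 128 MIDI pitches.

```python
# Hypothetical note event: (MIDI pitch, start step, end step), end exclusive.
NoteEvent = tuple[int, int, int]


def piano_roll(notes: list[NoteEvent], num_steps: int, num_pitches: int = 128) -> list[list[int]]:
    """Build a binary piano roll: roll[t][p] == 1 if pitch p is on at step t."""
    roll = [[0] * num_pitches for _ in range(num_steps)]
    for pitch, start, end in notes:
        for t in range(start, min(end, num_steps)):
            roll[t][pitch] = 1
    return roll


# A C major chord held for 4 steps, then a G major chord for 4 steps.
chord_notes = [(60, 0, 4), (64, 0, 4), (67, 0, 4), (55, 4, 8), (59, 4, 8), (62, 4, 8)]
# A melody with one note sounding at a time.
melody_notes = [(72, 0, 2), (76, 2, 4), (74, 4, 8)]

chords_roll = piano_roll(chord_notes, num_steps=8)
melody_roll = piano_roll(melody_notes, num_steps=8)

# One (chord vector, melody vector) training pair per time step.
training_pairs = list(zip(chords_roll, melody_roll))

print([p for p, on in enumerate(chords_roll[0]) if on])  # pitches on at step 0
print([p for p, on in enumerate(melody_roll[5]) if on])  # pitches on at step 5
```

The two prints show `[60, 64, 67]` (the C major chord) and `[74]` (the melody note). The step resolution is a design choice: finer steps capture rhythm more precisely but make the sequences, and therefore the dependencies the LSTM must learn, longer.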

Testing

During testing, we provide the network with chords, and the output should be a pleasant-sounding melody that goes with them.
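How the network’s output becomes notes is not described here, so the following is only one plausible sketch. It assumes the network outputs a probability for each of the 128 MIDI pitches at every time step (matching the piano-roll encoding) and that the melody is monophonic. The helper names, the 0.5 threshold, and the probabilities below are hypothetical values for illustration, not actual model output.

```python
from typing import Optional


def decode_melody(step_probs: list[list[float]], threshold: float = 0.5) -> list[Optional[int]]:
    """Turn per-step pitch probabilities into a monophonic melody.

    At each time step, keep the most probable pitch if its probability
    reaches the threshold; otherwise emit a rest (None).
    """
    melody: list[Optional[int]] = []
    for probs in step_probs:
        best = max(range(len(probs)), key=probs.__getitem__)
        melody.append(best if probs[best] >= threshold else None)
    return melody


def one_hot_probs(pitch: int, p: float, size: int = 128) -> list[float]:
    """Hypothetical helper: a probability vector with a single nonzero pitch."""
    probs = [0.0] * size
    probs[pitch] = p
    return probs


# Made-up outputs for three time steps.
fake_output = [one_hot_probs(72, 0.9), one_hot_probs(76, 0.7), one_hot_probs(74, 0.3)]
print(decode_melody(fake_output))
```

With the default threshold, the third step is below 0.5 and becomes a rest, so the result is `[72, 76, None]`. Raising or lowering the threshold trades silence for more, and possibly less confident, notes.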

Compare and contrast

[Audio: chords]

[Audio: network’s melody]

[Audio: actual melody]

Conclusion

We can teach a neural network to make music. The results are still far from perfect, but they are an important first step toward a network that composes music skillfully.

About the Author

Naida Agić is a software developer at BPUE, where she works on the AIML team. She is passionate about machine learning and is responsible, dedicated, goal-oriented, and fearless in problem solving. One of her biggest passions is mathematics, and she is one of the talents at the Faculty of Natural Sciences and Mathematics in Sarajevo.
