Further Thoughts on Musical Evo-Devo

I keep thinking off and on about writing a program to apply evo-devo to musical composition (part 2). There are tons of problems to solve, of course, but I may have made some progress.

Cells → MIDI

The foremost of these, in my mind, is one of implementation: once you’ve wound up with a 2-D array of “cells” (corresponding to measures, or to individual notes, or something), each with its individual soup of activated genes and whatnot, how exactly do you convert that to a series of notes, in a form that can be converted into a MIDI or MP3 file?

One possible approach is to have a receptor in the “cell wall” that can be activated by “enzymes” and assigned a value between 0 and 127 (corresponding to the 128 MIDI note values). When the overall program decides that it’s time to stop growing and start transcribing, it can simply look at each cell in turn and see which notes are activated; or, if the receptor isn’t activated, that corresponds to a rest. The length of the note is determined by the width of the cell.
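A minimal sketch of that transcription step, assuming a hypothetical Cell record with a note field for the receptor value (None when the receptor isn't activated) and a width measured in quarter-note units. None of these names come from an existing program, and the output is a plain list of events rather than an actual MIDI file.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Cell:
    """Hypothetical cell: one note (or rest) on a single instrument line."""
    note: Optional[int]  # MIDI note number 0-127, or None for a rest
    width: float         # length in quarter-note units (an assumption)


def transcribe(row: List[Cell]) -> List[Tuple[float, float, int]]:
    """Walk one row of cells and return (start, duration, note) events."""
    events = []
    clock = 0.0
    for cell in row:
        if cell.note is not None:
            if not 0 <= cell.note <= 127:
                raise ValueError(f"not a MIDI note number: {cell.note}")
            events.append((clock, cell.width, cell.note))
        # A rest produces no event but still takes up time.
        clock += cell.width
    return events


# Quarter-note C, eighth rest, half-note E.
line = [Cell(60, 1.0), Cell(None, 0.5), Cell(64, 2.0)]
print(transcribe(line))
```

An event list like this should be straightforward to hand to whatever MIDI-writing library ends up being used.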

One problem with this approach is that it doesn’t allow chords. But this need not be a problem, at least at first. We can apply the “Depeche Mode rule”: in the early days, the band Depeche Mode had a rule that they wouldn’t have chords in their music, because on synthesizers of the day, chords sounded particularly fake. So at least for a first cut, we could restrict each instrument to only playing one note at a time, and see what comes out.

One way to fix the no-chords problem, in turn, would be to lift the restriction that there must be exactly one line per instrument. In the past, I’ve assumed that if a cell splits horizontally, it must make two measures out of one; and if it splits vertically, it creates two instruments out of one.

But say we allow a cell to split vertically, with both daughter cells being played on the same instrument. One of them could say “I am the root note” and play C; the other could see that its sibling is the root note, and play E. Thus, the two together would play a C-E chord.

This also solves another problem, that of allowing different fingers to play notes of different lengths. You could, for instance, have the thumb and middle finger play a whole-note C-E chord, while the ring finger and pinky play quarter notes. By allowing instruments to split vertically, and then horizontally, we allow this sort of thing to occur.
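Here is a rough sketch of how several sub-lines on the same instrument might be flattened into one list of events. The representation (each voice as a list of note-and-width pairs, widths in quarter-note units) is purely an assumption for illustration.

```python
from typing import List, Optional, Tuple

# One voice: (MIDI note or None for a rest, width in quarter-note units).
Voice = List[Tuple[Optional[int], float]]
Event = Tuple[float, float, int]  # (start, duration, note)


def merge_voices(voices: List[Voice]) -> List[Event]:
    """Combine the voices of one instrument into a single sorted event list."""
    events = []
    for voice in voices:
        clock = 0.0  # every voice starts at the beginning of the passage
        for note, width in voice:
            if note is not None:
                events.append((clock, width, note))
            clock += width
    return sorted(events)


ROOT = 60              # C
sibling = ROOT + 4     # E: the sibling cell plays four semitones above the root
thumb = [(ROOT, 4.0)]            # whole note
middle = [(sibling, 4.0)]        # whole note, forming the C-E chord
ring = [(67, 1.0)] * 4           # quarter notes
pinky = [(72, 1.0)] * 4          # quarter notes
for event in merge_voices([thumb, middle, ring, pinky]):
    print(event)
```

Events that share a start time are simply notes sounding together, so chords fall out of the merge without any special handling.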

Note lengths

I mentioned above that the length of a note should be the width of the cell. By this I mean that every cell should have a certain width. If there is a standard width that corresponds to a quarter note, then cells two units wide would be half notes; one-half unit wide, eighth notes; one and a half units, dotted-quarter notes; three units, dotted-half notes; and so forth.
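The mapping from widths to note values can be written down as a small lookup table. This is only a sketch: it takes the quarter note as the unit (so a dotted quarter is 1.5 units and a dotted half is 3), and the table and function names are made up for the example.

```python
from fractions import Fraction

# Widths in quarter-note units mapped to conventional note names.
NOTE_NAMES = {
    Fraction(1, 4): "sixteenth",
    Fraction(1, 2): "eighth",
    Fraction(3, 4): "dotted eighth",
    Fraction(1): "quarter",
    Fraction(3, 2): "dotted quarter",
    Fraction(2): "half",
    Fraction(3): "dotted half",
    Fraction(4): "whole",
}


def note_name(width) -> str:
    """Return the note name for a cell width, or "unnamed" if there is none."""
    return NOTE_NAMES.get(Fraction(width), "unnamed")


for w in (0.5, 1, 1.5, 2, 3, 1.25):
    print(w, note_name(w))
```

Widths such as 1.25 have no simple name, which is one reason to nudge cells toward standard lengths.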

Of course, each cell must be allowed to change size: otherwise, every note in the final piece would have the same length, which would be boring. At the same time, we don’t want to encourage cells to take on arbitrary lengths without regard to what their neighbors (the other instruments) are doing: music has a rhythm, which means that for the most part, the drummer and bass player will be playing notes at the same time.

Imagine, then, that the width of a cell is represented by a spring: left to its own devices, it is one unit long. But it can be compressed to become shorter, or extended to become longer, though this takes some effort. Now imagine that each cell is glued to its neighbors above and below, each of which has its own spring that determines its length.

Let’s say we have a stack of four such cells, one above the other. Three of them want to be one unit long, and the fourth wants to be two units long. Since they’re glued together, they are forced to begin and end at the same time. But the fourth cell, which is compressed into one half of its desired length, will push to make the entire stack wider, while the first three, which are happy the way they are, will resist this widening.

The entire stack could then resize itself to a compromise width, like 1.25 units. Or we could institute tyranny of the majority: three cells want to be one unit long, and the one that disagrees has to shut up and take its lumps. We could also institute “peer pressure”, in which, in addition to all of the above, each stack also wants to be the same width as its neighbors. This can all be adjusted, depending on what gives better results.
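One way to make the spring picture concrete, as a sketch only: treat each cell as an ideal spring with a rest length and a stiffness, and let the stack settle at the width that minimizes the total spring energy, which works out to a stiffness-weighted average of the rest lengths. The stiffness values, the peer_pressure weight, and the function names are all assumptions, not part of any existing design.

```python
from collections import Counter
from typing import Optional, Sequence


def compromise_width(rest_lengths: Sequence[float],
                     stiffness: Optional[Sequence[float]] = None,
                     neighbor_widths: Sequence[float] = (),
                     peer_pressure: float = 0.0) -> float:
    """Equilibrium width of a stack of glued springs.

    Minimizes sum(k_i * (w - r_i)**2) plus peer_pressure * (w - n_j)**2 for
    each neighboring stack width n_j; the minimum is a weighted mean.
    """
    if stiffness is None:
        stiffness = [1.0] * len(rest_lengths)  # all springs equally stiff
    pull = sum(k * r for k, r in zip(stiffness, rest_lengths))
    pull += peer_pressure * sum(neighbor_widths)
    total = sum(stiffness) + peer_pressure * len(neighbor_widths)
    return pull / total


def majority_width(rest_lengths: Sequence[float]) -> float:
    """Tyranny of the majority: the most common desired width wins."""
    return Counter(rest_lengths).most_common(1)[0][0]


stack = [1, 1, 1, 2]
print(compromise_width(stack))  # equal stiffness: plain average
print(majority_width(stack))
print(compromise_width(stack, neighbor_widths=[1.0, 1.0], peer_pressure=0.5))
```

With equal stiffness, the stack of three one-unit cells and one two-unit cell settles at 1.25 units; raising peer_pressure pulls it toward the widths of the neighboring stacks.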

Coevolving populations

I said above that each cell would play a single note of a given length. But perhaps this entire approach is misguided. Perhaps it would be better to have each cell correspond to a measure, rather than an individual note. The music for that measure could be drawn from a separate pool of “organisms”. After all, specific sequences of four quarter notes can be used in many songs in different genres; there’s no need for each “song organism” to reinvent them separately.

In fact, it could be a good idea to have more than one such pool: one for notes, and another for rhythms, since rhythms are an important part of genre. Blues uses a lot of [eighth, dotted-quarter, eighth, dotted-quarter] (da-dah, da-dah), while techno and EBM use a lot of [eighth, eighth, eighth, eighth-rest, eighth, eighth, eighth, eighth-rest] (da-da-dah, da-da-dah). Reggae, ska, polka, and others have their own distinctive rhythms.

A one-measure cell could then bind itself to the note sequence E-C-F-B, and also to the rhythm [eighth, dotted-quarter, eighth, dotted-quarter] to produce [eighth E, dotted-quarter C, eighth F, dotted-quarter B].
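The binding step might look something like this. It is a sketch under the assumption that a tune is a list of note names and a rhythm is a list of length names, with rests marked by a "-rest" suffix as in the examples above; the function name bind is made up.

```python
from typing import List, Optional, Tuple


def bind(tune: List[str], rhythm: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Pair each sounded slot of a rhythm with the next note of a tune.

    Rests consume no note and are emitted as (length, None).
    """
    sounded = [slot for slot in rhythm if not slot.endswith("-rest")]
    if len(sounded) != len(tune):
        raise ValueError("rhythm has %d sounded slots but tune has %d notes"
                         % (len(sounded), len(tune)))
    notes = iter(tune)
    measure = []
    for slot in rhythm:
        if slot.endswith("-rest"):
            measure.append((slot[:-len("-rest")], None))
        else:
            measure.append((slot, next(notes)))
    return measure


blues = ["eighth", "dotted-quarter", "eighth", "dotted-quarter"]
print(bind(["E", "C", "F", "B"], blues))
```

Refusing mismatched lengths is just one possible policy; a real system might instead repeat or truncate the tune to fit.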

We now have four coevolving symbiotic populations: instruments (where each organism is a set of software synthesizer settings), rhythms (each organism is a series of note lengths), tunes (each organism is a series of notes without length), and songs (each organism chooses from the other populations and organizes them).

Another reason for having a separate pool of tunes is that they can also be used for things like key progressions.

As has been mentioned before, one of the big problems with this project is that of selection: a program can’t tell the difference between a good song and a bad one; ultimately, this has to be determined by a human. But if we split out rhythms and tunes, we can seed the initial population from a collection of MIDI files, which contain musical elements that humans like.

Nor is there any reason to confine ourselves to evolutionary algorithms for tunes and rhythms: we can use Markov chains or neural networks to learn what works and what doesn’t. When a human listens to the final composition and decides that he doesn’t like it, the system will only kill the composition organism; but it can look up which tunes and rhythms it used, and adjust the weights on its corresponding Markov chain or neural network to teach it that that was a mistake.
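For the Markov-chain variant, the feedback loop could be as simple as the following sketch. The class name, the first-order (one note of context) model, and the halving penalty are all assumptions chosen to keep the example short.

```python
import random
from collections import defaultdict
from typing import List


class TuneChain:
    """First-order Markov chain over note names, with adjustable weights."""

    def __init__(self) -> None:
        # weights[a][b] is how strongly note a tends to be followed by b.
        self.weights = defaultdict(lambda: defaultdict(float))

    def learn(self, tune: List[str], amount: float = 1.0) -> None:
        """Reinforce every transition in a tune (e.g. one taken from a seed file)."""
        for a, b in zip(tune, tune[1:]):
            self.weights[a][b] += amount

    def penalize(self, tune: List[str], factor: float = 0.5) -> None:
        """Scale down the transitions used by a tune the listener rejected."""
        for a, b in zip(tune, tune[1:]):
            if b in self.weights.get(a, {}):
                self.weights[a][b] *= factor

    def generate(self, start: str, length: int, rng=random) -> List[str]:
        """Sample a tune, stopping early if a note has no known successor."""
        tune = [start]
        while len(tune) < length:
            options = self.weights.get(tune[-1])
            if not options:
                break
            names = list(options)
            tune.append(rng.choices(names, weights=[options[n] for n in names])[0])
        return tune


chain = TuneChain()
chain.learn(["E", "C", "F", "B"])
chain.learn(["E", "G", "F", "B"])
chain.penalize(["E", "G", "F", "B"])  # the listener disliked this one
print(dict(chain.weights["E"]))
print(chain.generate("E", 4))
```

Because the penalty is multiplicative, a disliked transition becomes rarer but never disappears entirely, so the chain can still rediscover it in a different context.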

As with instruments, there should probably be some way for a composition not just to say “give me a rhythm that works”, but more specifically “give me a rhythm that works for blues”. (Obviously, this shouldn’t be the ASCII string “blues”, but an arbitrary machine-readable marker that the system can use to pick an appropriate rhythm or tune.)
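A sketch of what such a lookup might look like, assuming each rhythm in the pool carries a set of opaque integer markers. The class, the marker values, and the fall-back-to-anything behavior are all hypothetical.

```python
import random
from typing import FrozenSet, Iterable, List, Tuple

Rhythm = Tuple[str, ...]


class RhythmPool:
    """Pool of rhythms, each tagged with arbitrary machine-readable markers."""

    def __init__(self) -> None:
        self._items: List[Tuple[Rhythm, FrozenSet[int]]] = []

    def add(self, rhythm: Iterable[str], markers: Iterable[int]) -> None:
        self._items.append((tuple(rhythm), frozenset(markers)))

    def pick(self, marker: int, rng=random) -> Rhythm:
        """Return a rhythm carrying the marker, or any rhythm if none matches."""
        if not self._items:
            raise LookupError("the rhythm pool is empty")
        matches = [r for r, tags in self._items if marker in tags]
        candidates = matches or [r for r, _ in self._items]
        return rng.choice(candidates)


MARKER_A = 0x2A  # opaque values; the system never needs to know what they mean
MARKER_B = 0x17

pool = RhythmPool()
pool.add(["eighth", "dotted-quarter", "eighth", "dotted-quarter"], [MARKER_A])
pool.add(["eighth", "eighth", "eighth", "eighth-rest"] * 2, [MARKER_B])
print(pool.pick(MARKER_A))
```

The markers themselves could be subject to evolution too: a composition only has to end up asking for a marker that some useful rhythms happen to carry.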