
4. RESULTS

For our tests, we collected a set of 100 MIDI tracks, arranged as 10 examples for each of 10 genres. Our intention was to provide a reasonable selection of the kind of music typically heard on contemporary popular music radio. The genres were also chosen with an eye to two things: distinctions that could plausibly be discerned in the rhythm patterns (according to our intuitions), and the availability of enough suitable MIDI files. We verified that each selected file was a well-produced replica and a satisfactory representative of its class, but used no more specific selection criteria. The ten categories can be seen on the axes of the confusion matrix in figure 2.

To evaluate the raw tempo and downbeat extraction, we auditioned each tracking result. For each track, we resynthesized the original drum pattern along with added tone pips indicating the system’s chosen downbeats and cycle length. In two of the cases the arbitrary initial note extraction returned irregular drum patterns for which no period could be decided. In nine of the remaining 98 cases (9.2%), the period chosen by the system was wrong. It was almost always half the length of the perceived period (i.e. a tempo twice as fast). That left 89 tracks with the correct tempo. Of these, 53 had patterns of 4 or 8 beats rather than the basic 2-beat bass/snare pattern. Of those 53, approximately half (25) had the downbeat at the right point within the sequence; in the others, the downbeat was shifted by an even number of beats. This is a secondary error, since the extracted pattern was basically appropriate. Still, the interpolation space would be better if the automatic downbeat placement came closer to the subjective impression. We return to this in the discussion.
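
The ‘shifted downbeat’ error can be made concrete with a small alignment check. The sketch below uses a hypothetical helper, `downbeat_shift_beats`. It is not the procedure used to evaluate the system, which relied on listening. It assumes two envelopes of equal length that span the same whole number of beats. It tries every whole-beat circular rotation of the extracted pattern and returns the one whose dot product with the reference is largest.

```python
import numpy as np

def downbeat_shift_beats(extracted, reference, n_beats):
    """Number of whole beats by which `extracted` is rotated relative to `reference`.

    Hypothetical helper: both inputs are equal-length 1-D envelopes spanning
    `n_beats` beats, and only whole-beat rotations are considered.
    """
    extracted = np.asarray(extracted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if extracted.shape != reference.shape:
        raise ValueError("patterns must have the same length")
    samples_per_beat = len(reference) // n_beats
    scores = []
    for beat in range(n_beats):
        # Undo a candidate shift of `beat` beats and score the match by dot product.
        rolled = np.roll(extracted, -beat * samples_per_beat)
        scores.append(float(np.dot(rolled, reference)))
    return int(np.argmax(scores))
```

Consider a 4-beat pattern whose strongest hit has moved from beat 1 to beat 3. The function returns 2, an even shift of the kind described above. An exact match returns 0.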

4.1. Classification task

The 100 extracted patterns, each represented by a 1200-point envelope, were then fed to PCA to extract the eigenrhythms. The mean pattern and top five eigenrhythm bases are illustrated in figure 3. Explaining 90% of the variance requires 25 dimensions. Restricting ourselves to the top N eigenrhythms gives the N-dimensional projection of the set of rhythms that minimizes squared-error distortion. This projection can also be seen as a kind of generalization, smoothing away ‘insignificant’ differences between patterns. The reduced space can be used for classification. For instance, each pattern can be treated in turn as unknown and classified on the basis of its k nearest neighbors among the remaining patterns (leave-one-out k-NN classification). This is not as good a predictor of classifier performance as using test data separate from the data used to derive the eigenrhythm space. We note, however, that the genre labels were not involved in that stage, i.e. the PCA ‘model’ does not encode prior knowledge of the true class of the test examples.
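
A minimal NumPy sketch of this pipeline follows. It assumes the patterns are stacked as the rows of a 100 × 1200 array. PCA is computed through the singular value decomposition of the mean-removed data, which yields the same bases as an eigendecomposition of the covariance matrix. The function names `eigenrhythms`, `dims_for_variance` and `project` are illustrative, not taken from the original system.

```python
import numpy as np

def eigenrhythms(patterns):
    """PCA of an (n_patterns, n_points) array of rhythm envelopes.

    Returns the mean pattern, the eigenrhythm bases (one per row, ordered by
    decreasing variance) and the fraction of total variance each basis explains.
    """
    X = np.asarray(patterns, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean  # remove the mean pattern before the eigen-analysis
    # Rows of vt are the principal directions, i.e. the eigenvectors of the
    # covariance matrix; s**2 is proportional to the variance along each one.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = s ** 2
    return mean, vt, var / var.sum()

def dims_for_variance(explained, target=0.9):
    """Smallest number of leading bases whose cumulative variance reaches target."""
    n = int(np.searchsorted(np.cumsum(explained), target)) + 1
    return min(n, len(explained))

def project(patterns, mean, bases, n_dims):
    """Coordinates of each pattern on the top n_dims eigenrhythms."""
    return (np.asarray(patterns, dtype=float) - mean) @ bases[:n_dims].T
```

`dims_for_variance(explained, 0.9)` answers the ‘how many dimensions for 90% of the variance’ question, which gives 25 for the collection described here. `project(patterns, mean, bases, n)` gives the N-dimensional coordinates used for classification.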


Figure 2. Confusion matrix for genre classification based on eigenrhythms. Classification used the single nearest neighbor in the space of the top four eigenrhythms. Overall classification accuracy was 21% in this case, compared to 10% from random guessing.

We performed this classification while searching over the number of PCA dimensions (from 2 to 40) and the number of neighbors used in the k-NN classification. Our results were generally weak. One of the best-performing combinations was the simple case of using 4 dimensions and classifying according to the single nearest neighbor. The 10 × 10 confusion matrix for this case is shown in figure 2. The overall classification accuracy is only 21%, but this is significantly better than random guessing (which would give 10%). The confusion matrix also reveals some cliques of greater discrimination, including {country, blues} and {rock, hiphop, punk}. Rhythm and Blues (randb) is recognized correctly 4 out of 10 times, which is also how often disco is recognized as house. All of these details seem to make sense in view of the musical character of the different classes.
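
The leave-one-out evaluation and the parameter search can be sketched as follows. This is an illustrative reconstruction, not the authors' code. It uses plain Euclidean distance in the projected space and breaks voting ties in favour of the nearer neighbor. The candidate values of k in `search` are assumptions, since the text does not list which were tried. `all_coords` is assumed to hold each pattern's coordinates on all eigenrhythms, so the first n columns give the n-dimensional projection.

```python
import numpy as np
from collections import Counter

def loo_knn_predict(coords, labels, k=1):
    """Leave-one-out k-NN: label each point by a vote of its k nearest others."""
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    # Pairwise squared Euclidean distances between all projected patterns.
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a pattern may not vote for itself
    preds = []
    for row in d2:
        nearest = np.argsort(row, kind="stable")[:k]  # closest first
        # most_common orders equal counts by first occurrence, so a tied
        # vote goes to the tied class whose member is nearest.
        preds.append(Counter(labels[nearest].tolist()).most_common(1)[0][0])
    return np.array(preds)

def confusion_matrix(true, pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    m = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(true, pred):
        m[index[t], index[p]] += 1
    return m

def search(all_coords, labels, dims=range(2, 41), ks=(1, 3, 5)):
    """Leave-one-out accuracy for every (n_dims, k) pair."""
    all_coords = np.asarray(all_coords, dtype=float)
    labels = np.asarray(labels)
    return {
        (n, k): float(np.mean(loo_knn_predict(all_coords[:, :n], labels, k) == labels))
        for n in dims
        for k in ks
    }
```

To reproduce the setting of figure 2 under these assumptions, select the (4, 1) entry of the search. Then build the confusion matrix from `loo_knn_predict(all_coords[:, :4], labels, 1)`.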

4.2. Eigenrhythms

For greater insight into the classification performance, and to see what the "eigenrhythm" concept has actually captured, it is interesting to look at the top eigenrhythms individually, as in figure 3. The top left panel shows the overall mean pattern, which is subtracted from every pattern prior to the eigen-analysis. This panel is nonnegative: white shows regions of no energy, and darker shades of gray indicate progressively more intense beats. The remaining patterns generally have both positive and negative portions. Each can be given a positive or negative weight to add to or subtract from different beats in the mean pattern, i.e. increasing or reducing the contrast between their positive and negative extrema. These patterns are shown with positive excursions in green and negative values in red, fading to white as the value tends to zero (with apologies to those viewing this paper in monochrome!). Eigenrhythm 1 is mainly positive, following the mean pattern. It also emphasizes three things: the 16th notes (at samples 25, 75, 125, etc.) in the snare and hi-hat; the snare beats on the eighth notes at samples 50, 150 and 350; and the bass drum simultaneous with the main snare beats at samples 100 and 300 (quarter-note beats 2 and 4).
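
In symbols (our notation, not the paper's), let x be a pattern envelope, μ the mean pattern, e_i the i-th eigenrhythm and w_i its weight. The figure then visualizes the standard PCA decomposition:

```latex
\hat{\mathbf{x}} = \boldsymbol{\mu} + \sum_{i=1}^{N} w_i \, \mathbf{e}_i ,
\qquad
w_i = \mathbf{e}_i^{\top} \left( \mathbf{x} - \boldsymbol{\mu} \right),
\qquad
\mathbf{e}_i^{\top} \mathbf{e}_j = \delta_{ij}
```

A positive w_i strengthens the beats where e_i is positive (green) and weakens those where it is negative (red); a negative w_i does the reverse. Among all N-dimensional linear projections, keeping the top N eigenrhythms minimizes the squared error between x and its reconstruction, averaged over the patterns used to derive them.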

Eigenrhythm 2 contrasts the eighth-note hi-hats with the snare and bass hits on beats 2 and 4, along with a snare eighth-note ‘echo’ at samples 150 and 350. The third basis contrasts hi-hat beats on the 16th-note ‘off’ beats with a simple quarter-note rhythm. A negative coefficient here therefore introduces a double-speed hi-hat, and a positive weight gives half speed. Basis 4 contrasts off-beat bass drums at samples 50, 150, 250 and 350 with hi-hat beats alongside the main snares at beats 2 and 4. It also includes some fast snare beats between samples 200 and 300. It also shows some evidence of 6/8 patterns, with hi-hat features around samples 67, 167, 267, etc. (i.e. two-thirds of the way through each quarter-note beat). The final eigenrhythm in the figure contrasts snare and bass on beats 2 and 4, and can also be seen to provide for more complex structures in hi-hat and bass drum.
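
The sample positions quoted above can be translated into metrical positions. The text places beats 2 and 4 at samples 100 and 300, so the sketch assumes the figure's axis has 100 samples per quarter note. `SAMPLES_PER_BEAT`, `GRID` and the subdivision labels are our own illustrative choices.

```python
from fractions import Fraction

# Assumption inferred from the text: beats 2 and 4 fall at samples 100 and 300.
SAMPLES_PER_BEAT = 100

# Candidate subdivisions of a quarter-note beat (hypothetical labels).
GRID = {
    Fraction(0): "on the beat",
    Fraction(1, 4): "16th",
    Fraction(1, 3): "triplet",
    Fraction(1, 2): "8th off-beat",
    Fraction(2, 3): "triplet",
    Fraction(3, 4): "16th",
    Fraction(1): "on the beat",  # rounds up onto the following beat
}

def beat_position(sample):
    """Return (1-based beat number, exact fraction through that beat)."""
    beat, offset = divmod(sample, SAMPLES_PER_BEAT)
    return beat + 1, Fraction(offset, SAMPLES_PER_BEAT)

def describe(sample):
    """Snap a sample index to the nearest subdivision in GRID."""
    beat, frac = beat_position(sample)
    nearest = min(GRID, key=lambda g: abs(g - frac))
    if nearest == 1:
        return beat + 1, GRID[nearest]
    return beat, GRID[nearest]
```

With these assumptions, sample 167 maps to the triplet position of beat 2. This matches the ‘two-thirds of the way through each quarter-note beat’ reading. Sample 25 maps to the first 16th-note off-beat.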

Overall, we begin to see that the individual eigenrhythms encode particular features of the different rhythmic patterns. These features reflect the character of a piece more than its genre, and operate at the level of groups of notes rather than individual events. Although genre is only weakly predicted by nearest neighbors in the eigenspace, we can ask whether other perceived properties are preserved. Figure 4, which shows all the training points projected onto the first two eigenrhythms, is relevant here. The names are hard to read where the text overlaps, but what stands out are the clusters, including those around (2, 1) and (4, -1). (The polarity of eigenrhythm 1 has been flipped in this image for clarity.) We listened to the individual tracks involved. The first cluster consists of straight-ahead 4/4 patterns with eighth-note hi-hat patterns, snare on beats 2 and 4, and some variations in the bass drum within the basic eighth-note grid. Patterns in the second cluster share the same basic bass drum on beats 1 and 3 with snare on beats 2 and 4. They differ in the hi-hat, which plays either a 6/8 ‘syncopated’ rhythm or a simple quarter-note beat. Thus, once tempo variation is removed, the eigenrhythm space does indeed cluster drum patterns with clearly discernible perceptual similarities.
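
A sketch of how such a two-dimensional view and its clusters might be examined is below. It assumes a mean pattern `mean` and a matrix `bases` holding one eigenrhythm per row, as produced by any PCA of the pattern set. The helpers `project_2d` and `near` are hypothetical, and the cluster `radius` is an arbitrary choice. The sign flip mirrors the polarity change noted for eigenrhythm 1. Eigenvectors are defined only up to sign, so flipping one reorients the plot without changing any distances.

```python
import numpy as np

def project_2d(patterns, mean, bases, flip_first=True):
    """Coordinates on the first two eigenrhythms, optionally negating the first.

    Negating a basis (and hence its coordinates) mirrors the plot but leaves
    all pairwise distances, and therefore the clusters, unchanged.
    """
    coords = (np.asarray(patterns, dtype=float) - mean) @ np.asarray(bases)[:2].T
    if flip_first:
        coords[:, 0] *= -1
    return coords

def near(coords, names, center, radius):
    """Names of patterns whose 2-D coordinates lie within `radius` of `center`."""
    d = np.linalg.norm(coords - np.asarray(center, dtype=float), axis=1)
    return [n for n, di in zip(names, d) if di <= radius]
```

For example, `near(coords, track_names, (2, 1), 0.5)` would list the tracks in the first cluster described above (`track_names` being a hypothetical list of track titles), ready for listening.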

Download: Long Version, ISMIR Version (pdf format)