- Michael Jackson, Original Solo Recordings, 1972-1997

Last week I was at Music Hackday, messing around with music data at The Guardian offices in King's Cross, London. The outcome is this Michael Jackson doughnut, which spans the whole of his solo recording career, showing which of his tracks have been the most popular and loved by Last.fm listeners, as well as which releases have been the most influential and loved. The graphic was produced programmatically, so the same code could generate one for other artists quite easily.

Some notes on how to read this information graphic: The size of each slice is proportional to the number of plays of that particular track or release compared to the total number of plays of Michael Jackson's material on Last.fm. The darkness of the slice indicates how loved that track or album is by the listener base. Tracks are organised by album, by date, starting with his first solo record, Ben, from 1972.

Some notes on the metrics: The graphic takes into account six months' worth of Last.fm listening data: 3,606,823 individual plays of his tracks by 1,432,458 listeners worldwide. The loving data spans a shorter time period, a little over two months; this period contains data which falls both before and after his death. The love data for releases is normalised by track count, so the darkness of the blue hue expresses the average number of loves per track on said release.
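For anyone wanting to reproduce the two metrics, here is a minimal Python sketch of the arithmetic described above. It is not the code used for the graphic: the `Release` fields, the function names and the sample numbers are all made up for illustration, and scaling the shade relative to the most-loved release is only an assumption about how the darkness might be mapped.

```python
from dataclasses import dataclass


@dataclass
class Release:
    title: str
    plays: int        # total plays of the release's tracks
    loves: int        # total loves across the release's tracks
    track_count: int  # number of tracks on the release


def slice_angle(plays: int, total_plays: int) -> float:
    """Angle in degrees of a slice sized by its share of all plays."""
    return 360.0 * plays / total_plays


def loves_per_track(release: Release) -> float:
    """Loves normalised by track count, so long albums are not favoured."""
    return release.loves / release.track_count


def shade(release: Release, releases: list[Release]) -> float:
    """Darkness from 0.0 to 1.0, relative to the most-loved release (an assumption)."""
    darkest = max(loves_per_track(r) for r in releases)
    return loves_per_track(release) / darkest


# Hypothetical numbers, not real Last.fm data.
releases = [
    Release("Album A", plays=600, loves=90, track_count=9),
    Release("Album B", plays=300, loves=40, track_count=10),
]
total_plays = 1000  # larger than the sum above, so some of the circle stays empty

for r in releases:
    print(r.title, slice_angle(r.plays, total_plays), loves_per_track(r), shade(r, releases))
```

Because each slice is measured against the total number of plays rather than the sum of the plays shown, anything not assigned to a release simply leaves part of the circle undrawn.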

In the case of (countless) albums featuring re-releases of original works, such as Number Ones, I have aggregated play counts and merged them into a single play count for the original recording (e.g. Billie Jean). This is why you don't see releases like HiStory (Book 1) or The Essential Michael Jackson in the graphic. The play counts on original tracks which are featured on those records, however, do contribute to the tracks shown in the graphic, as I'm trying to get a feel for which of his original recordings are the most influential overall.

Some notes on the tech: I spent much of the 4-8 hours it took to produce the software coming up with ways of cleaning the data. The gap in the doughnut represents the proportion of plays which relate to live versions, endless remixes (official & unofficial) and collaborations (duets etc). I chose to keep these in the dataset in order to represent proportionally how much attention goes towards that kind of material. I also stripped a lot of seemingly badly tagged or dubiously attributed tracks using a bunch of filters aided by my observation of the dataset. I used simple Levenshtein distances to merge track names together and a few simple transforms to help the merging process. The merging operation was crucial because some tracks have been officially re-released over 30 times and the metadata can vary a little each time. The graphic was produced using PHP & ActionScript with JSON as the transport mechanism between the two.
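To give a flavour of the merging step, here is a rough Python sketch of combining play counts for near-identical track names using Levenshtein distance. It is an illustration rather than the PHP that was actually used: the `normalise` transforms, the `max_distance` threshold, the length-based cap and the sample play counts are all assumptions. Parenthetical suffixes such as "(Live)" are deliberately left in place, so those plays fall into the unmatched total rather than being folded into the original recording.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalise(title: str) -> str:
    """Hypothetical transforms: lower-case, drop punctuation, collapse spaces."""
    title = re.sub(r"[^a-z0-9 ]+", "", title.lower())
    return re.sub(r"\s+", " ", title).strip()


def merge_play_counts(plays, canonical_titles, max_distance=2):
    """Fold each raw title's plays into the closest canonical title.

    Returns (merged counts per canonical title, total plays left unmatched).
    """
    keys = {title: normalise(title) for title in canonical_titles}
    merged = {title: 0 for title in canonical_titles}
    unmatched = 0
    for raw_title, count in plays.items():
        candidate = normalise(raw_title)
        best = min(canonical_titles, key=lambda t: levenshtein(candidate, keys[t]))
        distance = levenshtein(candidate, keys[best])
        # Cap the tolerance for short titles so "Bad" and "Ben" never merge.
        allowed = min(max_distance, len(keys[best]) // 4)
        if distance <= allowed:
            merged[best] += count
        else:
            unmatched += count
    return merged, unmatched


# Hypothetical play counts, not real Last.fm data.
plays = {
    "Billie Jean": 100,
    "billie jean": 20,
    "Billie Jeans": 5,
    "Billie Jean (Live)": 7,
    "Beat It": 50,
    "Beat It!": 3,
}
merged, unmatched = merge_play_counts(plays, ["Billie Jean", "Beat It"])
print(merged, unmatched)
```

A fixed edit-distance threshold is a blunt instrument, which is why the tolerance here shrinks for short titles; a real dataset would need more filters than this.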

Popularity and love data are from Last.fm, release dates are from MusicBrainz, and Discogs was used for authoritative discographic data.

Lastly, just a note on the motivation - MJ was primarily a childhood experience for me (I was 6 when Bad was released), and the hours I've spent with the dataset have given me the time to reflect on what a grip his older material has had on my generation's collective identity. As the notion of a mainstream, backed as it is by broadcast media, gives way to thousands of niches in disparate networks, it's hard not to see MJ as a relic of an age of mass cultural experience that is slowly receding into the distance. R.I.P. MJ.

Right-hand side is inflation adjusted. By Barry Ritholtz.

Ebenezer Howard

- Ebenezer Howard, The Garden Cities of To-Morrow (1902)

Last.fm chart arc

One of our talented data miners, Martin, put these chart arcs up on our data playground last week. They show chart positions and movements of artists in our profiles, along with popularity information. Turns out I probably have the least 'mainstream' taste of any of the Last.fm staff. For further explanation and all the staff chart arcs, visit the Last.fm chart arcs page on our playground.

From the IBM Archives photo gallery.

My father worked with the IBM 1620, the Control Data Corporation's CDC 6600 & the Cray 1 supercomputer, which looks like a monolith.

These last two were designed by Seymour Cray.

The 1620 took punched cards and required the entire operating system to be fed in every time it was switched on, since its only storage was 60,000 digits of volatile memory. It got the nickname CADET (Can't Add, Doesn't Even Try) through its use of lookup tables instead of adders to perform arithmetic.
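To show what arithmetic by lookup table means in practice, here is a small Python sketch that adds decimal numbers digit by digit using nothing but a precomputed table. It is a simplified illustration, not a model of the 1620's actual table layout or carry handling; the names `ADD_TABLE` and `add_decimal` are made up, and the table is built with ordinary Python arithmetic for brevity, whereas on a table-driven machine it would be loaded into memory.

```python
# Table mapping a pair of decimal digits to (sum digit, carry out).
# Built with ordinary arithmetic here purely for brevity.
ADD_TABLE = {
    (a, b): ((a + b) % 10, (a + b) // 10)
    for a in range(10)
    for b in range(10)
}


def add_decimal(x: str, y: str) -> str:
    """Add two non-negative decimal strings using table lookups only."""
    width = max(len(x), len(y))
    x, y = x.zfill(width), y.zfill(width)
    digits = []
    carry = 0
    # Work from the least significant digit to the most significant.
    for dx, dy in zip(reversed(x), reversed(y)):
        digit, carry_out = ADD_TABLE[(int(dx), int(dy))]
        if carry:
            # Apply the incoming carry with a second lookup.
            digit, extra = ADD_TABLE[(digit, 1)]
            carry_out = carry_out or extra
        digits.append(str(digit))
        carry = carry_out
    if carry:
        digits.append("1")
    return "".join(reversed(digits))


print(add_decimal("1620", "6600"))
```

Once the table exists, the loop never performs an addition on the digits themselves; it only looks answers up, which is the idea behind the nickname.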

The cognitive style characteristics of the standard default PP presentation: foreshortening of the evidence and thought, low spatial resolution, a deeply hierarchical single-path structure as the model for organising every type of content, breaking up of the narrative and data into slides and minimal fragments, rapid temporal sequencing of thin information rather than focused spatial analysis, conspicuous decoration and Phluff, a preoccupation with format not content, and an attitude of commercialism that turns everything into a sales pitch.

- Edward Tufte