From CS 294-10 Visualization Fa13
Billboard charts through the ages
I'm very interested in music, so I went hunting for some interesting datasets in that realm. I stumbled upon the Whitburn Project, which collects information about songs and their positions on the Billboard charts. The dataset also includes information about genres.
I'm interested in all sorts of things in this dataset, but one thing I'm wondering is when the tides turned in favor of hip-hop/rap in the Hot 100.
I plotted five major genres (R&B, Rap, Rock, Pop, and Vocal) and the number of songs appearing on the charts for each genre per year since 1995. At the start of this chart, you can see an obvious trend of an increasing number of rap songs on the charts, but I immediately ran into a problem with the dataset. The genres, which are manually entered by the caretakers of the dataset, are missing or sparse for the most recent several years. I'm going to try to focus my attention on questions that do not involve genre (or focus on genre only for older music).
The real thing: Chart longevity.
Even disregarding genre, I have tons of questions about this dataset. I'm curious to know how many weeks songs tend to stay on the charts. Moreover, I'd like to know more about what kind of songs become mega-smashes. Do they tend to come from artists that have had mega-smashes before? What progression do these songs take on the charts? Here is a histogram of the number of weeks spent on the chart for every song on the charts since 1950. The bins are ten-week periods (roughly 2.5 months each).
Roughly half of the songs stay on the chart for fewer than 10 weeks, and the vast majority of songs don't make it past the 20-week mark. I had to exclude a "null" category on the x-axis, which means some of the songs in the data did not have the field that indicates the number of weeks spent on the chart.
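A minimal sketch of this kind of binning in Python, in case it helps to see it outside Tableau. It assumes each song has an integer weeks-on-chart value, or None when the field is missing; the function name and the sample values are made up for illustration and are not taken from the Whitburn data.

```python
from collections import Counter

def bin_weeks_on_chart(weeks_values, bin_width=10):
    """Count songs per bin of weeks on chart.

    Returns (histogram, missing), where histogram maps the lower edge of
    each bin (0, 10, 20, ...) to a song count, and missing is the number
    of songs with no weeks-on-chart value (the "null" category).
    """
    counts = Counter()
    missing = 0
    for weeks in weeks_values:
        if weeks is None:
            missing += 1  # tallied separately instead of plotted
            continue
        lower_edge = (weeks // bin_width) * bin_width
        counts[lower_edge] += 1
    return dict(sorted(counts.items())), missing

# Made-up sample values, only to show the shape of the result.
sample = [3, 9, 12, 20, None, 55, 7, None]
histogram, missing = bin_weeks_on_chart(sample)
```

With these bins, a song with 0 to 9 weeks lands in the first bin, 10 to 19 in the second, and so on. Tableau's own bin edges may be drawn differently.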
I'm now going to filter my data by the songs that lasted over 1 year on the charts. Here are those songs:
A bit surprisingly, I recognize all of these songs. They are all from my musical lifetime, meaning that I remember when (nearly) all of them were released. None of the songs between 1950 and, roughly, 1995 charted for over a year. Here's a chart illustrating this:
It also shows the peak position that each song attained on the chart. Many of the songs peaked at number 1 on the chart, which is not surprising given their longevity. I labeled the songs that peaked at 8 or below (below being worse... confusing terminology in this case). I also labeled "Radioactive," which is still on the chart.
Again notice that all of these songs are from the last 20 years. I now want to investigate these artists.
At this point I ran into some trouble with Tableau because I wanted to aggregate over an aggregation (for example, find the median number of tracks per artist in two groups: artists with a 52+ week song, and all other artists). I found some tutorials online that answered a lot of my questions, but in the process I realized that I was less interested in the question of how these artists differed from "average" artists, considering that (1) there are only 30 artists who have had year+ hits, and (2) most artists are one-hit wonders:
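For reference, the "aggregate over an aggregation" step is easy to express outside Tableau. This is a minimal Python sketch, assuming the data can be reduced to (artist, weeks on chart) pairs with one pair per track; the function name and sample pairs are hypothetical.

```python
from collections import defaultdict
from statistics import median

def median_tracks_per_artist(tracks, threshold=52):
    """Two-level aggregation over (artist, weeks_on_chart) pairs.

    First aggregation: count tracks and find the longest run per artist.
    Second aggregation: take the median track count within two groups,
    artists with at least one track charting more than `threshold` weeks
    and everyone else. Returns (median_long_hit, median_others).
    """
    track_counts = defaultdict(int)
    longest_run = defaultdict(int)
    for artist, weeks in tracks:
        track_counts[artist] += 1
        longest_run[artist] = max(longest_run[artist], weeks)

    long_hit = [n for a, n in track_counts.items() if longest_run[a] > threshold]
    others = [n for a, n in track_counts.items() if longest_run[a] <= threshold]
    return (
        median(long_hit) if long_hit else None,
        median(others) if others else None,
    )

# Made-up artists and values, for illustration only.
sample_tracks = [
    ("A", 60), ("A", 10), ("A", 5),
    ("B", 8),
    ("C", 53),
    ("D", 12), ("D", 20),
]
medians = median_tracks_per_artist(sample_tracks)
```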
Here's one snapshot from my exploration of these artists with 52+ week songs. This graph shows the Billboard Hot 100 history for each of these artists, comparing the highest spot of each song with the number of weeks it appeared on the chart:
Many of these artists had relatively few songs on the charts (e.g., Next, Jason Mraz), while others have had several songs, most of which follow a linear trend between peak position and weeks on the chart up to a certain point, after which the outliers begin (e.g., Santana, Lonestar, Faith Hill).
Moving toward my final visualization, I made this one that shows the trajectory of each of the 52+ week songs. This is pretty much impossible to read, but we can see something interesting:
Notably, all of these songs fall off the chart somewhere around position 50. I figured that Billboard must have some rule that dictates this, and sure enough, Billboard's Hot 100 has a concept called "recurrent tracks": if a song has been on the Hot 100 for over 20 weeks and drops below position 50, the song is removed from the charts. The goal of this policy is to keep the charts "as current as possible and to give proper representation to new and developing artists and tracks" [Billboard Hot 100 on Wikipedia]. I was excited to find this for two reasons. First, I was expecting this graph to look horrendous when I made it (and it does), but it still showed me something interesting. Second, the criterion for having a song on the Hot 100 for over 52 weeks is more difficult than I realized: a song actually has to stay within the top 50 after its initial rise.
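The recurrent rule as stated above can be written as a small predicate. This is only a sketch of the rule as described here (over 20 weeks on the chart and below position 50); the function name is hypothetical, and Billboard's actual policy may have details this ignores.

```python
def is_recurrent(weeks_on_chart, position, max_weeks=20, cutoff=50):
    """Return True if a song would be removed under the rule as described:
    more than `max_weeks` weeks on the chart and ranked below `cutoff`
    (a larger position number means a worse rank)."""
    return weeks_on_chart > max_weeks and position > cutoff

# A song in week 21 at position 51 is removed; at position 50 it stays.
removed = is_recurrent(21, 51)
stays = is_recurrent(21, 50)
```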
In order to get the data into the above form, I wrote some Python scripts to reshape the data. I'm sure Tableau could have accomplished these transformations automatically, but it was much faster for me this way. Originally, the data had one row per track, but I changed the data to have one row per track per week.
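The scripts themselves aren't shown here, but a reshape of this kind can be sketched in a few lines. The sketch below assumes each track is a dict with hypothetical keys "artist", "track", and "positions" (a list of weekly chart positions, with None for weeks off the chart); the real Whitburn column layout may differ.

```python
def to_long_format(tracks):
    """Turn one row per track into one row per track per week.

    Each input track carries a list of weekly positions; each output row
    holds a single (week, position) pair. Weeks with no position are skipped.
    """
    rows = []
    for track in tracks:
        for week, position in enumerate(track["positions"], start=1):
            if position is None:
                continue  # song was not on the chart that week
            rows.append({
                "artist": track["artist"],
                "track": track["track"],
                "week": week,
                "position": position,
            })
    return rows

# Made-up example rows, not real chart data.
wide_rows = [
    {"artist": "Artist A", "track": "Song 1", "positions": [88, 61, None, 40]},
    {"artist": "Artist B", "track": "Song 2", "positions": [97]},
]
long_rows = to_long_format(wide_rows)
```

The long format is what makes a per-week line chart straightforward: week goes on the x-axis, position on the y-axis, and each track becomes its own line.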
33 songs have spent over 1 year on the Billboard Hot 100 chart since its inception. The goal of this chart is to show how these songs progress on the chart. The blue line shows the typical (median) progression of one of these songs over its first 52 weeks; the songs tend to ascend slowly, only reaching their peak halfway through the year, at week 26. Surprisingly, many of these megahits did not reach #1 on the charts. The chart also illustrates variability in the chart progression of these hits, with two examples: The Black Eyed Peas - "I Gotta Feeling" (2009, orange) and Imagine Dragons - "Radioactive" (2012, green). Among these 33 songs, "I Gotta Feeling" was the fastest to reach its peak, #1 in week 3. On the other hand, "Radioactive" had a very slow rise on the charts, reaching its peak at #3 only after 47 weeks (note: "Radioactive" is still on the Hot 100 chart). These two examples show the extremes of how these songs can progress, and they're virtually opposites in terms of their positive/negative slopes. Finally, this visualization calls attention to the Billboard Hot 100's concept of a "recurrent track." When a song has been on the chart for over 20 weeks and dips below position 50, it is removed to make way for fresher songs.
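A minimal sketch of how a median progression line and a "week of peak" could be computed, assuming each song is a list of weekly positions (index 0 is week 1, None means no position that week). The function names and the three short trajectories are made up for illustration.

```python
from statistics import median

def median_progression(trajectories, n_weeks=52):
    """Median chart position across songs for each of the first n_weeks.

    Weeks where no song has a position yield None.
    """
    result = []
    for week in range(n_weeks):
        positions = [
            t[week] for t in trajectories
            if week < len(t) and t[week] is not None
        ]
        result.append(median(positions) if positions else None)
    return result

def peak_week(trajectory):
    """1-based week in which a song first reaches its best (lowest) position."""
    charted = [(pos, week) for week, pos in enumerate(trajectory, start=1)
               if pos is not None]
    best_position = min(pos for pos, _ in charted)
    return min(week for pos, week in charted if pos == best_position)

# Made-up trajectories, three weeks each.
songs = [
    [90, 50, 10],
    [80, 40, 20],
    [70, 60, 30],
]
typical = median_progression(songs, n_weeks=3)
fast_peak = peak_week([5, 1, 1, 3])
```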
This chart answers my question about how megahits progress on the charts by showing an average progression as well as the two extreme progressions to emphasize variability. The labeling of the recurrent zone also helps to understand the end of the megahits' progression, as they cannot dip below 50 after their initial rise.
Finally, for your listening pleasure/misery, here's an Rdio playlist of all 33 of these megahits: http://rd.io/x/QI2xLyDEpg/