A2-ChristopherVolz
From CS294-10 Visualization Fa07
Assignment 2
Online Data Set
- The 100 most popular names by state from 1960 to 2004
- The data is available at http://www.ssa.gov/cgi-bin/namesbystate.cgi. I wrote a small Python script that scraped the data from the site and wrote it to a CSV file, which I then imported into Tableau.
Questions to be answered
- What is the variation of the top 10 most popular names by state?
- Do popular names flow from populous states to less populous states?
Methods
The most popular names were determined by summing all occurrences of each name across every state and year for which data had been recorded (all 50 states plus DC, between 1960 and 2004). Charting the top five names and their occurrences over time produced a relatively easy-to-read visualization. (And it turns out I have a very popular name.)
Top 5 Names across all states over Time
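The aggregation described above can be sketched outside Tableau. This is a minimal sketch, not the script actually used: it assumes each record is a `(state, year, name, count)` tuple, which is a hypothetical layout for the scraped CSV, and the function names `top_names` and `yearly_series` are made up for illustration.

```python
from collections import Counter, defaultdict


def top_names(rows, n=5):
    """Sum counts per name over every state and year; return the n largest."""
    totals = Counter()
    for _state, _year, name, count in rows:
        totals[name] += count
    return [name for name, _ in totals.most_common(n)]


def yearly_series(rows, names):
    """For each selected name, total its count per year across all states."""
    series = {name: defaultdict(int) for name in names}
    for _state, year, name, count in rows:
        if name in series:
            series[name][year] += count
    # Sort by year so each series is ready to plot as a line over time.
    return {name: dict(sorted(by_year.items())) for name, by_year in series.items()}
```

Calling `yearly_series(rows, top_names(rows))` would give one year-to-count mapping per name, which is the shape of data behind a chart like the one above.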
In order to determine whether name popularity flowed from populous states to less populous states, I selected the top and bottom five states, as measured by the number of names reported.
I created similar charts for the five most populous states and the five least populous states, and while differences existed, the graphs weren't terribly revealing.
Top 5 Names across 5 Most Populous States over Time
Top 5 Names across 5 Least Populous States over Time
Multiple View of Total Number of Names across all states, Most Populous states and Least Populous states over Time
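Selecting the most and least populous states by the number of names reported could look roughly like the following. It is only a sketch under the same assumption of hypothetical `(state, year, name, count)` records; `rank_states` is an illustrative name, and the reported-name total is used as a stand-in for population, as in the text.

```python
from collections import Counter


def rank_states(rows, k=5):
    """Return (k largest, k smallest) states by total reported name count."""
    totals = Counter()
    for state, _year, _name, count in rows:
        totals[state] += count
    # most_common() with no argument lists every state, largest total first.
    ranked = [state for state, _ in totals.most_common()]
    # With fewer than 2 * k states the two lists will overlap.
    return ranked[:k], ranked[-k:]
```

Filtering the records to each of the two groups and re-running the per-year totals would give the inputs for the per-group charts.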
Discussion
The initial graph charting popular names against time made it obvious that each name had a period of extreme popularity that accounted for the majority of its occurrences. It also revealed that every one of the most popular names over the whole time frame was in a state of decline. I wanted to know whether this was due to a greater variety of names being chosen or to a small subset of other names gaining in popularity instead. Unfortunately, I was unable to figure out how to chart this in Tableau. I'm fairly confident it's possible; I just didn't have the time to learn the ins and outs of the program well enough to work out how best to do it.
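One way to approach the variety question outside Tableau is to track what share of each year's reported names the top names account for. The sketch below assumes the same hypothetical `(state, year, name, count)` records, and `top_share_by_year` is an illustrative name. Because the data set only lists the 100 most popular names per state, the share is relative to listed names, not to all births, so it can only hint at an answer.

```python
from collections import defaultdict


def top_share_by_year(rows, names):
    """Fraction of each year's reported count that belongs to the given names."""
    selected = set(names)
    total = defaultdict(int)
    top = defaultdict(int)
    for _state, year, name, count in rows:
        total[year] += count
        if name in selected:
            top[year] += count
    # Skip years with a zero total to avoid dividing by zero.
    return {year: top[year] / total[year] for year in sorted(total) if total[year] > 0}
```

A share that falls steadily over time would be consistent with choices spreading across more names, while a share that stays flat as individual names fade would point toward a few other names taking their place.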
The Multiple View of the data shows that different states had different growth periods, assuming that the incidence of names is related to the total number of births in a given state. This seems like a reasonable assumption, but it should be cross-checked against census data to make sure such a correlation exists. What's most interesting about this view, and what I think would bear further investigation, is why the least populous states showed a population surge 8-10 years before the most populous states.
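The cross-check mentioned above could be done by computing a Pearson correlation between a state's yearly name totals and its yearly birth counts. This is a sketch only: the birth figures would have to come from an outside source such as census or vital statistics tables, which this data set does not include, and `pearson` is an illustrative helper, not part of any tool used here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

A coefficient close to 1 for each state would support reading the name totals as a proxy for births; a weak or negative value would mean the growth periods in the Multiple View need a different explanation.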
Regarding whether popular names flowed from populous states to less populous states, there is no clear correlation. Again, this might be a symptom of my inexperience with Tableau rather than a lack of data. The biggest problems I ran into using Tableau were figuring out how to overlay different data sets and how to manipulate the data I had in a meaningful way. For some tasks I ultimately had to resort to reformatting or extracting data from the CSV file by hand rather than having Tableau do the calculations for me. This was, obviously, not ideal.




