From CS294-10 Visualization Sp11
I initially decided I wanted to explore the techniques of brushing and linking. I felt they could be useful for finding non-obvious patterns in the data and could also be very helpful in a collaborative environment. Protovis seems to have good support for these techniques, so I decided to use this tool. In terms of data, I played a lot with different datasets. I wanted multi-dimensional data that could benefit from brushing and linking. After a lot of trial and error, I found the Internet and American Life Project. Sadly, I didn't have any application that could open SPSS files, so I gave up and kept looking for other sources. However, after learning about DataWrangler in class, I realized that it could help me clean up the data. So I downloaded the SPSS data from the Pew Center and uploaded it to DataWrangler, naively thinking it would be able to handle it. It turns out that DataWrangler only works with simple text formats and cannot read SPSS files. The problem was that I was now really interested in the Pew Center data and was sure I wanted to do something with it. So, after unsuccessfully searching online for SPSS translators or converters, I found out that SPSS itself can export data as CSV. I therefore installed a trial version from the IBM website to finally be able to play with the data.
The SPSS software worked pretty well. I was able to interact with the survey data and export it to CSV. I then loaded the CSV file into Tableau to understand the data and find patterns that might be worth visualizing. While trying to make sense of the data, I also read the questionnaire to understand how the questions had been asked and to anticipate some conclusions. I realized that the questions in this survey wouldn't be enough to support the interesting conclusions I was hoping for, so I went back to the Pew Center website to look for other datasets. That was when I finally found the Work/Generation gap/Woodstock survey. This study contains questions that might reveal interesting relationships: it asks about religion, race, happiness, and work, all of which I thought would be worth exploring. So I went ahead, downloaded the data, converted it to CSV in SPSS, loaded it into Tableau, and started playing around with it. I found this exercise really useful: by creating quick visualizations of different categories, I realized that the best way to make sense of the data probably wouldn't be a brushing and linking visualization, but a dynamic parallel coordinates visualization that could easily represent the multi-dimensionality of the data and let a researcher find interesting patterns. Thus, I selected a set of questions from the dataset and decided to build a parallel coordinates visualization with them.
The most important objective of the visualization is the ability to find interesting patterns in the survey data. For example, what kind of work makes people happier? Are there any differences in happiness across races, sexes, household situations, or ages? Who earns more money: Republicans, Democrats, or Independents? What's the relationship between education, work, and satisfaction? Does the type of music people like influence their level of education or their income?
To achieve that goal, a parallel coordinates visualization with dynamic querying seemed like a good solution. Inspired by the examples on the Protovis website, I decided to create a similar visualization for this dataset. The visualization should look like the following:
The data was converted from CSV to JSON using a converter by Jason Parker.
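The converter itself isn't reproduced here. As a rough illustration of what this conversion step involves, here is a minimal TypeScript sketch (it is not Jason Parker's converter). It assumes a simple CSV with a header row and no quoted fields or embedded commas, and it keeps numeric answer codes as numbers so they can later be placed on numeric axes. The names `csvToRecords` and `SurveyRecord` are hypothetical.

```typescript
// Hypothetical helper, not the converter used for this project.
// Assumes: the first line is a header, and fields contain no quotes or embedded commas.
type SurveyRecord = Record<string, number | string>;

function csvToRecords(csv: string): SurveyRecord[] {
  const lines = csv.trim().split(/\r?\n/);
  const headers = lines[0].split(",").map((h) => h.trim());
  return lines.slice(1).map((line) => {
    const fields = line.split(",");
    const record: SurveyRecord = {};
    headers.forEach((name, i) => {
      const raw = (fields[i] ?? "").trim();
      // Keep numeric survey codes as numbers; leave anything else (or empty) as a string.
      const num = Number(raw);
      record[name] = raw !== "" && !Number.isNaN(num) ? num : raw;
    });
    return record;
  });
}

const sample = "age,income,happy\n34,5,1\n61,3,2";
console.log(JSON.stringify(csvToRecords(sample)));
// → [{"age":34,"income":5,"happy":1},{"age":61,"income":3,"happy":2}]
```

A real SPSS export may contain quoted text fields, so a full RFC 4180 parser would be needed in that case.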
The zip version of the code is attached. My files are located in the /pew folder. The rest of the folders only contain required libraries. To see the visualization, open /pew/pew.html in a browser (tip: Protovis is way faster in Google Chrome with such a huge dataset).
The final visualization does not show the exact number of respondents in each category. I wanted to include a label indicating how many responses fell within the slider selection. This would let people see accurately how strong the effects of characteristics like religion or music are. However, I couldn't figure out how Protovis handles individual data elements within the array.
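The count itself does not depend on Protovis internals. In a parallel coordinates view with dynamic querying, each axis carries a selected range, and a respondent stays highlighted only if every value falls inside the corresponding range. Counting the records that pass that test gives the number the label would show. The sketch below assumes the slider state is stored as one inclusive `{ min, max }` pair per dimension; `isSelected`, `countSelected`, and the sample data are hypothetical.

```typescript
// Assumed shape of the slider state: one inclusive [min, max] range per dimension.
type Range = { min: number; max: number };
type Filter = Record<string, Range>;
type Respondent = Record<string, number>;

// A respondent is selected only if every filtered dimension lies inside its range.
// A missing value compares as false, so that respondent is excluded.
function isSelected(d: Respondent, filter: Filter): boolean {
  return Object.keys(filter).every((dim) => {
    const { min, max } = filter[dim];
    return d[dim] >= min && d[dim] <= max;
  });
}

// Number of respondents that survive the current slider selection.
function countSelected(data: Respondent[], filter: Filter): number {
  return data.filter((d) => isSelected(d, filter)).length;
}

// Hypothetical sample data, not taken from the Pew survey.
const data: Respondent[] = [
  { age: 25, happy: 1 },
  { age: 40, happy: 2 },
  { age: 67, happy: 1 },
];
const filter: Filter = { age: { min: 30, max: 70 }, happy: { min: 1, max: 1 } };
console.log(`${countSelected(data, filter)} of ${data.length} respondents selected`);
// → 1 of 3 respondents selected
```

Recomputing this count in whatever handler updates the slider ranges would keep the label in sync with the highlighted lines.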
Similarly, I wanted to let users select the specific dimensions they would like to compare. The survey data is very rich and supports many different analyses. For example, some users might be more interested in the effect of music, while others might care more about the relationship between education and satisfaction. My idea was to let users choose, via checkboxes, which dimensions of the data would be displayed. I experimented with jQuery and found an interesting example that used dynamic selection outside of the canvas, but I could not get it working. All the approaches I tried are left commented out in my code. Finally, I would have liked to show labels for the categorical data (e.g., high school, college, or graduate education) instead of their numerical codes. I will definitely keep working on improving this visualization and learning Protovis: I am very interested in dynamic visualizations on web pages, and Protovis seems to be a terrific tool for that purpose.
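Both missing features come down to small pieces of logic that could sit outside the drawing code. The sketch below is one possible approach, not the code in the attached zip: the list of axes to draw is derived from the set of checked dimension names (keeping the original axis order), and each axis looks up a text label for a numeric answer code, falling back to the number when no label is known. The `codebook` entries, including the education codes, are hypothetical placeholders rather than the actual Pew coding, and `visibleDimensions` and `labelFor` are hypothetical names.

```typescript
// Hypothetical codebook: the real numeric codes come from the Pew questionnaire.
const codebook: Record<string, Record<number, string>> = {
  educ: { 1: "High school", 2: "College", 3: "Graduate" },
};

// Axes to draw, in their original order, restricted to the dimensions the user checked.
function visibleDimensions(all: string[], checked: Set<string>): string[] {
  return all.filter((dim) => checked.has(dim));
}

// Tick label for a coded answer; falls back to the raw number when no label exists.
function labelFor(dim: string, code: number): string {
  return codebook[dim]?.[code] ?? String(code);
}

const allDims = ["age", "educ", "income", "happy"];
const checked = new Set(["happy", "educ"]);
console.log(visibleDimensions(allDims, checked).join(", ")); // → educ, happy
console.log(labelFor("educ", 2), labelFor("income", 4)); // → College 4
```

In a page with real checkboxes, the `checked` set would be rebuilt from the checkbox states on each change before redrawing the axes.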
I spent approximately 15-20 hours developing this visualization. Finding a good dataset took a long time, since I wanted to study a topic I am passionate about. The Pew Center proved to be a great resource, but cleaning the data and converting it between formats was also challenging. Creating simple visualizations in Protovis wasn't difficult, but adding new elements or combining the visualization with other elements of the web page was hard.