From CS294-10 Visualization Sp11
Data Domain and Interaction Technique
For this assignment, I wanted to implement an interactive data visualization component on top of my current research project, Opinion Space. In Opinion Space, users are mapped onto a 2-D plane based on their opinions. More specifically, every user of Opinion Space rates a set of 5 propositions on a continuous scale, which gives us a 5-dimensional vector representing each participant's opinions. We project this vector onto a 2-D plane to obtain the Opinion Space. The more two users differ in opinion, the farther apart they appear in the space. Effectively, Opinion Space is a starfield display in which each point is a participant in the system and the spatial distance between two points reflects the difference in their opinions.
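The description above does not say which projection is used. As a minimal sketch, assuming principal component analysis (PCA) computed by power iteration, the step from 5-dimensional opinion vectors to 2-D points could look like the following. The function names and the sample vectors are hypothetical and are not taken from the Opinion Space code.

```python
import random


def _dominant_eigenvector(matrix, iterations):
    """Approximate the eigenvector with the largest eigenvalue by power iteration."""
    dims = len(matrix)
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    v = [rng.random() for _ in range(dims)]
    length = sum(x * x for x in v) ** 0.5
    v = [x / length for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:  # no variance left in any direction
            break
        v = [x / norm for x in w]
    return v


def pca_2d(vectors, iterations=200):
    """Project equal-length vectors onto their top two principal components."""
    n, dims = len(vectors), len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dims)]
    centered = [[v[d] - means[d] for d in range(dims)] for v in vectors]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
           for i in range(dims)]
    axes = []
    for _ in range(2):
        axis = _dominant_eigenvector(cov, iterations)
        eigenvalue = sum(axis[i] * cov[i][j] * axis[j]
                         for i in range(dims) for j in range(dims))
        # Deflate: subtract the axis just found so the next pass finds the next one.
        cov = [[cov[i][j] - eigenvalue * axis[i] * axis[j] for j in range(dims)]
               for i in range(dims)]
        axes.append(axis)
    return [tuple(sum(r[d] * axis[d] for d in range(dims)) for axis in axes)
            for r in centered]


# Hypothetical answers to 5 propositions for 5 participants.
opinions = [
    [0, 0, 1, 1, 1],
    [4, 0, 1, 1, 1],
    [0, 1, 1, 1, 1],
    [4, 1, 1, 1, 1],
    [2, 0.5, 1, 1, 1],
]
points_2d = pca_2d(opinions)
```

Because these sample vectors only vary along two of the five dimensions, the 2-D distances match the original 5-D distances. With real answers, some information is lost in the projection and distances are only approximated.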
Opinion Space also collects metadata from its users, either directly as part of registration or through an external source (e.g. if the system is deployed for an organization, the organization provides us with information on its users). I planned to build dynamic query filters whose fields are populated by this metadata. A user could then filter the starfield visualization based on the metadata. For example, a user can filter out all the males and then keep only those between the ages of 30 and 40, so that the Opinion Space shows only the females aged 30-40. Once a user has filtered the points down to some desired query set (e.g. all females between 30 and 40), the user can then choose to color the remaining points by a field. For example, after filtering the points in Opinion Space down to the females between 30 and 40, a user can color the remaining points by region, so that all users from the northwest are colored red, all users from the south are colored green, and all users from the east are colored yellow.
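The filter-then-color example above can be sketched in a few lines. This is only an illustration in Python, not the Flex code of the actual system. The records, field names, and the `apply_filters` and `color_by` helpers are hypothetical.

```python
# Hypothetical participant records; the real metadata is not public.
participants = [
    {"id": 1, "gender": "female", "age": "30-40", "region": "northwest"},
    {"id": 2, "gender": "male", "age": "30-40", "region": "south"},
    {"id": 3, "gender": "female", "age": "20-30", "region": "east"},
    {"id": 4, "gender": "female", "age": "30-40", "region": "south"},
]


def apply_filters(points, excluded):
    """Keep the points that match none of the excluded values for any field."""
    return [p for p in points
            if all(p[field] not in values for field, values in excluded.items())]


def color_by(points, field, colors):
    """Map each point id to the color assigned to its value of `field`."""
    return {p["id"]: colors[p[field]] for p in points}


# Filter out the males and the 20-30 age bucket, leaving females aged 30-40.
remaining = apply_filters(participants, {"gender": {"male"}, "age": {"20-30"}})

# Color what is left by region.
region_colors = {"northwest": "red", "south": "green", "east": "yellow"}
colored = color_by(remaining, "region", region_colors)
```

Here `remaining` holds participants 1 and 4, and `colored` assigns red to participant 1 (northwest) and green to participant 4 (south).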
Dynamic query filters are a good choice for starfield displays as "points of light are convenient because they are small yet highly visible, could be color coded, are selectable objects, and can be displayed rapidly" (Shneiderman's Visual Information Seeking paper). As a part of Opinion Space, these points also offer additional information on the distribution of opinions. Accordingly, as a user filters down the points, they may begin to discover clusters or trends of opinion in the data. Also, the metadata used to construct these filters tends to be ordinal and/or nominal. Therefore, segmentation by color is an effective tool, as color is well suited to distinguishing ordinal and nominal values.
For this project, I used data from an Opinion Space deployed for an online focus group. The topic of discussion was the US automotive industry. This instance of Opinion Space had 2105 participants and the sponsoring organization provided metadata on all the participants. The data included income, age, gender, make of car, region, segment of car, and education level. As I imported the data, I grouped age and income into buckets, making them ordinal fields. The rest of the fields are nominal.
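Grouping a numeric field such as age into buckets can be sketched as follows. The bucket boundaries and the `to_bucket` helper are assumptions for illustration. The boundaries actually used in the import are not stated here.

```python
from bisect import bisect_right

# Assumed lower bounds of the age buckets; not the project's real boundaries.
AGE_EDGES = [20, 30, 40, 50, 60]


def to_bucket(value, edges):
    """Return (rank, label) for the bucket containing `value`.

    Each bucket includes its lower bound and excludes its upper bound,
    so 30 falls in "30-40". The rank preserves the ordinal order.
    """
    rank = bisect_right(edges, value)
    if rank == 0:
        label = f"under {edges[0]}"
    elif rank == len(edges):
        label = f"{edges[-1]}+"
    else:
        label = f"{edges[rank - 1]}-{edges[rank]}"
    return rank, label


ages = [34, 18, 60, 30]
buckets = [to_bucket(age, AGE_EDGES) for age in ages]
```

Keeping the rank next to the label means the buckets can still be sorted in their natural order, which is what makes the field ordinal rather than nominal.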
Below is a screenshot of the Opinion Space Automotive Industry interface:
Storyboard of Features
Dynamic Query Filters
Using the filters, a user can filter the points on the display with two functions: 1) Filter out and 2) Show only. As I was creating the storyboard mockup, I realized that only the filter out function was needed to allow all possible queries to be executed, since showing only some values of a field is equivalent to filtering out all of its other values. However, I included the show only function as a convenience. Also, to make querying easier, I wanted there to be forward and backward buttons that allow the user to undo and redo filters. Lastly, there's a refresh button to reset all the points.
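The two filter functions and the backward, forward, and refresh buttons can be sketched as a small history of point sets. This is a hypothetical Python illustration, not the Flex implementation. The class name, its methods, and the sample records are all assumptions, and the refresh button is assumed to clear the history as well.

```python
class FilterHistory:
    """Undo/redo over snapshots of the currently visible points."""

    def __init__(self, points):
        self._all = list(points)
        self._states = [list(points)]
        self._index = 0

    @property
    def visible(self):
        return self._states[self._index]

    def _push(self, points):
        # Applying a new filter discards any states that had been undone.
        self._states = self._states[: self._index + 1] + [points]
        self._index += 1

    def filter_out(self, field, values):
        self._push([p for p in self.visible if p[field] not in values])

    def show_only(self, field, values):
        # Expressed through filter_out: remove every other value of the field.
        others = {p[field] for p in self.visible} - set(values)
        self.filter_out(field, others)

    def back(self):
        if self._index > 0:
            self._index -= 1

    def forward(self):
        if self._index < len(self._states) - 1:
            self._index += 1

    def reset(self):
        # Assumption: refresh restores every point and clears the history.
        self._states = [list(self._all)]
        self._index = 0


# Hypothetical records for illustration.
points = [
    {"id": 1, "gender": "female", "age": "30-40"},
    {"id": 2, "gender": "male", "age": "30-40"},
    {"id": 3, "gender": "female", "age": "20-30"},
]
history = FilterHistory(points)
history.filter_out("gender", {"male"})
history.show_only("age", {"30-40"})
history.back()  # undo the age filter; the gender filter still applies
```

Storing whole snapshots is the simplest choice and is fine for a couple of thousand points. Storing the filter operations and replaying them would use less memory at the cost of more bookkeeping.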
After filtering the points down to a desired query set, a user can color segment the points based on a field. This allows a user to quickly compare a set of points by a certain field rather than needing to repeatedly apply a filter and step backwards. When the user chooses to segment by color, the system creates a palette and legend and displays them on the bottom left of the space.
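Creating the palette and legend can be sketched as assigning one color to each distinct value of the chosen field among the points still visible. The hex colors, the `build_legend` helper, and the sample records below are illustrative assumptions, not the palette or code used in the project.

```python
# Illustrative hex colors only.
PALETTE = ["#e41a1c", "#377eb8", "#4daf4a", "#984ea3", "#ff7f00"]


def build_legend(points, field, palette=PALETTE):
    """Assign one palette color to each distinct value of `field` still visible."""
    values = sorted({p[field] for p in points})
    if len(values) > len(palette):
        raise ValueError("more categories than palette colors")
    return dict(zip(values, palette))


# Hypothetical points remaining after filtering.
visible = [
    {"id": 1, "region": "south"},
    {"id": 4, "region": "east"},
    {"id": 7, "region": "south"},
]
legend = build_legend(visible, "region")
point_colors = {p["id"]: legend[p["region"]] for p in visible}
```

Because the legend is built only from the values present in the visible points, any value that has already been filtered away does not take up a color or a legend entry.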
This visualization was implemented on the existing Opinion Space system. All of the coding was done in Flash/Flex.
Above is a screenshot of the finished product. The application supports two features: 1) dynamic query filtering and 2) color segmentation. A user can filter down the points on the right by selecting a category and value to filter by. After narrowing the points down to a specific query set, the user can also segment the points by color on any field.
Because the data from the organization is not public, I cannot post a live version, but I can demonstrate the functionality of the system in person to anyone who would like to see it.
The final product was different from the initial storyboard in a few key ways stated below.
Changes between storyboard and the final implementation
- After playing around with the queries a bit, I realized there were two key flaws. The first was that I had originally designed the filters as two drop-down menus, one for the category and one for its possible values. This required me to select items from the possible values drop-down menu many times if I wanted to filter out multiple values for a category (e.g. removing all but two age ranges). Accordingly, I changed the possible values drop-down into a list where you can select multiple values.
- The second flaw was that it was difficult to remember which filters I had used. I added a summary list that displayed which filters had been applied, how many points were removed, and how many points remained.
- For color segmentation, I used a palette from Cynthia Brewer's ColorBrewer site. Also, I refined the palette generation to skip any values that had already been filtered out. For instance, if I had filtered out all points with age = 20-30, then that value would not be included in the dynamic palette generation.
- Total Time Spent: This application took around 20 hours in total to finish. Around 5 hours went into parsing, importing, and checking the data to ensure that it was all well formed. Development took up the rest of the time, with the software going through 3 iterations before the final product.
- Time Consuming Sections: A significant portion of the time was spent processing the data and ensuring that there were no errors in importing it. As I was transferring data from Excel to a Python script and then to a MySQL database, I had to make sure that each value was saved with the correct data type. With strings, I also found that the same value was sometimes encoded with both an uppercase and a lowercase first letter (e.g. "ford" and "Ford").
- Undo/Redo: I had expected this to be quite simple, and at first it was, but it grew in complexity as I coded more features into the filters (multiple selection and color segmentation). I now realize that there is a lot to consider in making this efficient and keeping the code elegant.
- Feedback from others: After the first two iterations, I sought feedback from others, which helped improve the usability of the filters. The make field of the filters originally combined make and model. I changed this field to show only the make, which let the data cluster into groups better. This was a case where the data was too granular to make an effective filter query.
- Future Work: While implementing this application, new ideas for features and functions came up. One idea that will be implemented in the next version of this application is text analysis.
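The inconsistent capitalization described under Time Consuming Sections (e.g. "ford" and "Ford") is the kind of problem a small cleaning pass can catch before the data reaches the database. The sketch below is a hypothetical Python illustration, not the import script that was actually used. It assumes that the most common spelling of each value is the one to keep.

```python
from collections import Counter, defaultdict


def canonical_spellings(raw_values):
    """Map each case-insensitive key to its most common trimmed spelling."""
    variants = defaultdict(Counter)
    for raw in raw_values:
        cleaned = raw.strip()
        variants[cleaned.casefold()][cleaned] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in variants.items()}


def normalize(raw, canonical):
    """Replace a raw value with the canonical spelling for its key."""
    return canonical[raw.strip().casefold()]


# Hypothetical raw values for the make field.
makes = ["Ford", "ford", "Ford ", "Toyota"]
canonical = canonical_spellings(makes)
cleaned_makes = [normalize(m, canonical) for m in makes]
```

After this pass, every variant of the same make collapses to a single value, so the filter list shows one "Ford" entry instead of two.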