From CS294-10 Visualization Sp11
Data Domain and Storyboard
As is evident from my AS2, which was on global attitudes, I have an interest in geographic information. I decided to build a geographic visualization consisting of an interactive map. User interaction with the map would include brushing and the ability to manually select regions whose data or statistics can then be viewed; these controls support the exploratory aspects of the application. Color overlays would help visualize variables across regions of the map. I intended the overall visual style to resemble that of a choropleth map.
After searching for suitable data, I found several sources that I thought could be combined into the visualization I wanted.
- From the US 2010 Census, a large data set containing extensive demographic data for all ~3200 US counties.
- An SVG defining the geographic boundaries of all the counties in the US.
- A list of the names of all US counties and their corresponding FIPS codes.
Both the census data and the SVG identify counties by their FIPS codes. With some tinkering, the demographic data and the SVG could be read into the application and correlated by joining on FIPS code. The plain-English county names can be joined the same way, resulting in a clean data set with both demographic and geospatial information. With this in mind, I created a storyboard for my application.
The main screen of the visualization should show a map of US counties, colored according to a demographic variable that the user chooses from a menu bar.
Because the counties are likely to appear small on screen, it is not possible to label every county directly with its name. Instead, the user can hover the cursor over a county to see information about it in a tooltip.
Viewing information about a single county is boring, so the user can click to select multiple counties. The application will calculate statistics over the selection and display them. These calculations are not a simple average of values: per-county measures such as percentages must be weighted by county population or another appropriate base count.
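The FIPS join can be sketched in Java, the language Processing is built on. This is a minimal sketch, not the applet's actual code: it assumes the three sources have already been parsed into maps keyed by FIPS code, and the names `FipsJoin`, `County`, `normalizeFips`, and `join` are hypothetical. County FIPS codes are five digits (two for the state, three for the county), so every key is zero-padded before joining; a code like `01001` easily loses its leading zero when read as a number.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class FipsJoin {
    // Hypothetical joined record: one county's name, population and SVG outline.
    record County(String fips, String name, long population, String svgPath) {}

    // Zero-pad to the 5-digit form (2-digit state + 3-digit county), e.g. "1001" -> "01001".
    static String normalizeFips(String raw) {
        String digits = raw.trim();
        return "0".repeat(Math.max(0, 5 - digits.length())) + digits;
    }

    // Re-key a map by normalized FIPS code so all three sources agree.
    static <V> Map<String, V> normalizeKeys(Map<String, V> in) {
        Map<String, V> out = new HashMap<>();
        in.forEach((k, v) -> out.put(normalizeFips(k), v));
        return out;
    }

    // Inner join: keep only counties present in all three (already parsed) sources.
    // The result is sorted by FIPS code for deterministic iteration.
    static Map<String, County> join(Map<String, Long> populationByFips,
                                    Map<String, String> pathByFips,
                                    Map<String, String> nameByFips) {
        Map<String, Long> pops = normalizeKeys(populationByFips);
        Map<String, String> paths = normalizeKeys(pathByFips);
        Map<String, String> names = normalizeKeys(nameByFips);
        Map<String, County> joined = new TreeMap<>();
        for (Map.Entry<String, Long> e : pops.entrySet()) {
            String path = paths.get(e.getKey());
            String name = names.get(e.getKey());
            if (path != null && name != null) {
                joined.put(e.getKey(), new County(e.getKey(), name, e.getValue(), path));
            }
        }
        return joined;
    }
}
```

Because this is an inner join, a county missing from any one source (for example, one with no outline in the SVG) is silently dropped; logging the unmatched keys is a cheap way to check the data.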
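Hover lookup amounts to a point-in-shape test in map coordinates. Below is a minimal Java 2D sketch, assuming each county outline has already been parsed from the SVG into a `java.awt.Shape` and that the current pan/zoom is held in an `AffineTransform`; both are assumptions, and `HoverPicker` and `CountyShape` are hypothetical names.

```java
import java.awt.Shape;
import java.awt.geom.AffineTransform;
import java.awt.geom.NoninvertibleTransformException;
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;
import java.util.List;

class HoverPicker {
    // Hypothetical pairing of a county's FIPS code with its outline in map coordinates.
    record CountyShape(String fips, Shape outline, Rectangle2D bounds) {
        CountyShape(String fips, Shape outline) {
            this(fips, outline, outline.getBounds2D());  // compute bounds once, not per frame
        }
    }

    // Returns the FIPS code of the county under the mouse, or null if there is none.
    static String pick(List<CountyShape> counties, AffineTransform view,
                       double mouseX, double mouseY) {
        Point2D mapPt;
        try {
            // Undo the current pan/zoom so the test runs in the SVG's coordinate space.
            mapPt = view.inverseTransform(new Point2D.Double(mouseX, mouseY), null);
        } catch (NoninvertibleTransformException e) {
            return null;  // e.g. a zoom of 0; nothing sensible to pick
        }
        for (CountyShape c : counties) {
            // Cheap bounding-box rejection before the exact point-in-shape test.
            if (c.bounds().contains(mapPt) && c.outline().contains(mapPt)) {
                return c.fips();
            }
        }
        return null;
    }
}
```

The bounding-box check rejects most of the ~3,200 counties cheaply, so the exact `contains` test only runs on a few candidates per mouse move.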
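A minimal sketch of the population weighting, assuming each selected county contributes a rate (such as a percentage) together with its population; `WeightedStats` and `weightedMean` are hypothetical names. The combined rate is the sum of rate × population divided by the total population of the selection.

```java
class WeightedStats {
    // Population-weighted mean of per-county rates: sum(rate_i * pop_i) / sum(pop_i).
    // rates[i] and populations[i] describe the same county; returns NaN for an empty selection.
    static double weightedMean(double[] rates, long[] populations) {
        if (rates.length != populations.length) {
            throw new IllegalArgumentException("rates and populations differ in length");
        }
        double weightedSum = 0.0;
        long totalPopulation = 0L;  // long, so large selections cannot overflow
        for (int i = 0; i < rates.length; i++) {
            weightedSum += rates[i] * populations[i];
            totalPopulation += populations[i];
        }
        return totalPopulation == 0 ? Double.NaN : weightedSum / totalPopulation;
    }
}
```

For two counties at 10% and 50% with populations of 900 and 100, the weighted mean is 14%, whereas a naive average of the two rates would report 30%.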
Clicking and dragging allows the user to easily select entire regions spanning multiple counties at a time.
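The storyboard does not say whether a drag sweeps out a rectangle or a brush stroke. Assuming a rubber-band rectangle, selection reduces to a rectangle-intersection test over the county outlines. This Java 2D sketch uses hypothetical names (`DragSelect`, `dragRect`, `select`) and assumes the outlines and the rectangle share the same map coordinates.

```java
import java.awt.Shape;
import java.awt.geom.Rectangle2D;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class DragSelect {
    // Rectangle spanned by the press point and the current drag point, in any drag direction.
    // A zero-area rectangle (a plain click) intersects nothing; clicks would use a point test instead.
    static Rectangle2D dragRect(double x0, double y0, double x1, double y1) {
        return new Rectangle2D.Double(Math.min(x0, x1), Math.min(y0, y1),
                                      Math.abs(x1 - x0), Math.abs(y1 - y0));
    }

    // Returns the existing selection plus every county whose outline overlaps the rectangle.
    static Set<String> select(Map<String, Shape> outlines, Rectangle2D rect, Set<String> selection) {
        Set<String> result = new LinkedHashSet<>(selection);
        for (Map.Entry<String, Shape> e : outlines.entrySet()) {
            // Per the Shape contract, intersects() may conservatively return true for complex shapes.
            if (e.getValue().intersects(rect)) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

A conservative `true` from `intersects` occasionally selects a county that only comes close to the rectangle, which is usually acceptable for interactive selection.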
Basic map controls include panning. Since the left mouse button is used for selection and dragging, the right mouse button pans the map.
As can be expected of any map application, the mouse wheel is responsible for zooming in and out of the map.
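Panning and zooming can be captured by a small hand-written camera. This sketch assumes the map is drawn at `screen = map * zoom + offset` (in Processing, `translate(offsetX, offsetY)` followed by `scale(zoom)`); `MapCamera` and its methods are hypothetical. Zooming about the cursor adjusts the offset so that the map point under the mouse stays fixed.

```java
class MapCamera {
    double zoom = 1.0;                    // screen pixels per map unit
    double offsetX = 0.0, offsetY = 0.0;  // screen position of the map origin

    // Right-button drag: shift the map by the mouse movement, in screen pixels.
    void pan(double dx, double dy) {
        offsetX += dx;
        offsetY += dy;
    }

    // Mouse wheel: scale by `factor` (e.g. 1.1 per notch, an assumed value) about the cursor.
    // Solving mouse = map * zoom' + offset' for the unchanged map point gives the new offset.
    void zoomAbout(double mouseX, double mouseY, double factor) {
        offsetX = mouseX - (mouseX - offsetX) * factor;
        offsetY = mouseY - (mouseY - offsetY) * factor;
        zoom *= factor;
    }

    // Inverse mapping, used for hit testing and selection.
    double toMapX(double screenX) { return (screenX - offsetX) / zoom; }
    double toMapY(double screenY) { return (screenY - offsetY) / zoom; }
}
```

Zooming about the cursor rather than the window center keeps the county the user is looking at in place, which matches the behavior users expect from map applications.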
Please go to the link above to launch the visualization. There is a description of the applet there (not reproduced here to avoid clutter).
Changes from storyboard to implementation
- After implementing tooltips, I decided to change the hover behavior. Instead of showing tooltips near the cursor, I moved the information popup to the bottom-right corner of the application. The tooltips I originally implemented were distracting and obscured the map as the mouse moved around, probably due in part to the high density of counties on the map.
- I initially intended to use a heat-map-style color map (reds, oranges, yellows, etc.) to encode the quantitative demographic variables. I soon found this color mapping clumsy, as interpolating and mapping values across several hues was complicated and non-intuitive. I also remembered that temperature-map-style visualizations are generally ineffective, so I switched to a simple gradient between two shades (e.g., blue->black, green->black).
- After building the application, it became apparent that the range of several variables could not be mapped effectively onto the 256 shades available in the color encoding. For example, county population ranges from over 9 million in Los Angeles County to fewer than 50 people in the most sparsely populated counties. Differences spanning that many orders of magnitude made it difficult to set an effective color scale without sacrificing too much granularity. I considered logarithmic scales, but they seemed to undermine the intuitive reading that color encoding affords. I ultimately introduced user-adjustable legends to address this issue.
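The two-shade gradient with an adjustable legend can be sketched as a linear color scale that clamps values outside a user-chosen range. This is a sketch under assumptions: colors are packed `0xRRGGBB` integers, the legend exposes a `[lo, hi]` range, and `TwoShadeScale` is a hypothetical name. In Processing itself, `lerpColor` could perform the interpolation step.

```java
final class TwoShadeScale {
    private final int fromRgb, toRgb;  // gradient endpoints as 0xRRGGBB (assumed encoding)
    private double lo, hi;             // legend range chosen by the user

    TwoShadeScale(int fromRgb, int toRgb, double lo, double hi) {
        this.fromRgb = fromRgb;
        this.toRgb = toRgb;
        setRange(lo, hi);
    }

    // Called when the user adjusts the legend (hypothetical UI hook).
    void setRange(double lo, double hi) {
        if (!(hi > lo)) throw new IllegalArgumentException("hi must be greater than lo");
        this.lo = lo;
        this.hi = hi;
    }

    // Linear position of value within [lo, hi], clamped so outliers saturate at the end colors.
    int colorFor(double value) {
        double t = Math.max(0.0, Math.min(1.0, (value - lo) / (hi - lo)));
        int r = lerp((fromRgb >> 16) & 0xFF, (toRgb >> 16) & 0xFF, t);
        int g = lerp((fromRgb >> 8) & 0xFF, (toRgb >> 8) & 0xFF, t);
        int b = lerp(fromRgb & 0xFF, toRgb & 0xFF, t);
        return (r << 16) | (g << 8) | b;
    }

    // Per-channel linear interpolation, rounded to the nearest 8-bit level.
    private static int lerp(int a, int b, double t) {
        return (int) Math.round(a + (b - a) * t);
    }
}
```

With a fixed range of 0 to 10 million, a county of 50,000 people sits at t = 0.005, barely one shade above the bottom of the gradient; narrowing the legend to 0 to 100,000 moves the same county to the middle of the scale.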
I estimate that I spent roughly 25-30 hours on the assignment. Most of it went to coding; another significant chunk went to finding and manipulating the data sets.
- Performance. The applet runs slowly on less powerful machines. This has less to do with data-crunching tasks such as selecting counties or calculating statistics and more to do with graphics. The application renders with Java 2D, which offers limited acceleration, and draws 3,000+ shapes per frame, each potentially consisting of hundreds of points. This is very taxing in a real-time environment. I tried switching to accelerated renderers such as OpenGL, but they did not render the irregularly shaped SVG outlines as nicely, so I stuck with Java 2D.
- Aggregate Statistics. Calculating aggregate statistics over multiple counties was non-trivial. Each county's measure must be weighted by its population to avoid skewed statistics. At one point (due partly to bad programming on my part), I ran into integer overflow when aggregating large variables such as total population. Getting the aggregate statistics right was very gratifying. When it was all done, I selected the entire country and inspected the aggregate statistics calculated by the program: the reported total population, ethnic proportions, etc. for the US matched official statistics. This was to be expected, but it still felt good to be assured that things were working.
- Need to pay attention to libraries. I coded my own camera transformation and button functions in Processing, and only found out afterward, when someone told me, that a button library for Processing exists. It would have saved me a lot of time, as coding buttons is quite tedious. Someone also mentioned existing camera methods in the API, though I found these behaved strangely with Processing running in 2D mode.
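The integer overflow mentioned under Aggregate Statistics is easy to reproduce. The write-up does not show the original code, so this is an illustrative sketch with made-up inputs and hypothetical names: summing population × percentage in a Java `int` (which Processing also uses) silently wraps past 2,147,483,647, whereas widening to `long` before multiplying avoids the wrap, and `Math.multiplyExact`/`Math.addExact` turn it into an exception.

```java
class OverflowDemo {
    // Accumulates population * percent in int: wraps silently past Integer.MAX_VALUE.
    static int weightedSumInt(int[] pops, int[] pcts) {
        int sum = 0;
        for (int i = 0; i < pops.length; i++) {
            sum += pops[i] * pcts[i];
        }
        return sum;
    }

    // Same sum, widened to long before multiplying so neither product nor total can wrap here.
    static long weightedSumLong(int[] pops, int[] pcts) {
        long sum = 0L;
        for (int i = 0; i < pops.length; i++) {
            sum += (long) pops[i] * pcts[i];
        }
        return sum;
    }

    // Stays in int but throws ArithmeticException on overflow instead of wrapping.
    static int weightedSumChecked(int[] pops, int[] pcts) {
        int sum = 0;
        for (int i = 0; i < pops.length; i++) {
            sum = Math.addExact(sum, Math.multiplyExact(pops[i], pcts[i]));
        }
        return sum;
    }
}
```

With populations of 20 million and 10 million at 100% and 50%, the true sum is 2,500,000,000, but the `int` version returns -1,794,967,296, a negative total that skews every statistic derived from it.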