A3-DavidPoll
From CS294-10 Visualization Fa08
Planning
Data Set
I intend to use 2008 polling data (collected by Real Clear Politics) for a variety of races as my data set. My goal is to visualize trends and similarities in polling data throughout these races. Data will be collected through screen scrapes (made much simpler by the standard format used for the data pages on RealClearPolitics). Polls include date ranges, polling institution, candidates, vote percentages, and sample sizes, although I don't intend to take sample size into account. Users will be able to select candidate + race combinations to plot together on a graph over time. They will be able to focus on particular pieces of data by filtering by polling institution and date range.
Visualization
The core of the visualization will be the graph of the data. It will use 2-d position to encode time and voting percentage. It will use hue to encode party, and if multiple candidates of the same party appear, it will use value to distinguish between them. Moving averages will be plotted using lines, while individual polling data for each day will be plotted using circles.
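As a rough illustration of the moving-average lines described above, here is a minimal Python sketch. The write-up does not say which kind of moving average is used or how wide the window is, so the trailing 7-day window, the `(date, percentage)` tuple layout, and the function name `moving_average` are all assumptions made for this example. The application itself is written in .NET, not Python.

```python
from datetime import date, timedelta


def moving_average(polls, window_days=7):
    """Return one (day, average) pair per day that has at least one poll.

    `polls` is a list of (date, percentage) tuples. The average for a day
    covers every poll dated within the trailing `window_days` days,
    including that day. The window width is an assumed parameter.
    """
    days = sorted({d for d, _ in polls})
    result = []
    for day in days:
        start = day - timedelta(days=window_days - 1)
        in_window = [pct for d, pct in polls if start <= d <= day]
        result.append((day, sum(in_window) / len(in_window)))
    return result
```

Each returned pair would become one vertex of the average line, while the raw `(date, percentage)` tuples would be drawn as the circles.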
Sliders will be used to change the date range being visualized (in conjunction with Calendar controls and textboxes to make entering specific dates simpler). Checkboxes will be used to filter by candidate and race, as well as by polling institution. Furthermore, users will be able to filter by polls of likely voters, registered voters, or by choosing the sample size range. Here is a rough sketch of the UI:
Dynamic Query
Adjusting the sliders, checking/unchecking checkboxes, or typing data into the various fields will immediately cause the graph to change to reflect those choices. For example, if "ABC" is unchecked from the choice of polling institutions, the graph would immediately stop showing ABC polling data, and the averages would be adjusted to exclude that information.
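The "uncheck ABC" example can be sketched as a filter followed by a recomputed average. This is only an illustration: the `Poll` record, its field names, and the helpers `apply_filters` and `average_pct` are hypothetical, and a plain mean stands in for whatever averaging the application actually performs.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Poll:
    # Hypothetical record layout; the real application scrapes more fields.
    institution: str
    candidate: str
    day: date
    pct: float


def apply_filters(polls, institutions, start, end):
    """Keep polls from the checked institutions that fall inside the date range."""
    return [p for p in polls if p.institution in institutions and start <= p.day <= end]


def average_pct(polls):
    """Plain mean of the visible polls, or None when nothing is visible."""
    return sum(p.pct for p in polls) / len(polls) if polls else None
```

Every change to a checkbox or slider would call `apply_filters` again with the new selections and redraw from its result, so the plotted points and the averages always reflect the same filtered set.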
Furthermore, the list of polling institutions is filtered based upon the races selected. New institutions are automatically selected when a race is added, but can be deselected by the user. In addition, I will have a "filter" textbox for the races (since the quantity of races is great), which will allow users to begin typing either a candidate's name or a race name, and the list will be filtered to races that match those criteria with each key press. Note that this filter textbox is not shown in the diagram above (sorry!)
Details
Placing the mouse over a dot or line will provide a summary of the actual data used to plot the point/line. For a point, if multiple polls plot the same percentage, all of them will be shown. Clicking on the point will populate a "details" portion of the UI with data about the point and links to the actual poll data from the various institutions. These details will include the entire row of data from the source, showing the candidate's vote percentage as well as those of his opponents, whether or not those candidates' data has been plotted.
In addition, mousing over a point will highlight any other points from the same poll. If the poll is multi-day, points for each day may be highlighted. If more than one candidate for the poll is graphed, points for both candidates will be highlighted. Furthermore, if only a single poll is represented by a particular dot, a line will be plotted for the averages of polls from that institution when the mouse is over a point.
Implementation
For the "RealClearPolitics Polling Visualizer", I used .NET 3.5 SP1 and its corresponding UI technology, WPF. As such, it requires these to be installed on whichever machines wish to run it.
You can find .NET 3.5 SP1 here: .NET Framework 3.5 Service Pack 1
Results
Description
The RealClearPolitics.com Polling Visualizer allows you to juxtapose data from the vast amounts of polling data accumulated on the RealClearPolitics.com website, allowing you to compare trends over time for candidates in varied races. Within the application, one can change his view of the data, choosing specific candidates in particular races, selecting which polling institutions to consider, and filtering by date or polling sample size and type. The application presents a graph showing polling data for each candidate in each race over time. Furthermore, it provides a graphical representation (a line plot) of the moving average of the polls for each candidate. Filtering the data adjusts the averages, allowing users to ignore "outlier" polling institutions.
Selecting among the multitude of races can be tricky, so some dynamic filtering of the races is included. Typing in the filter box instantly filters the races to match your query (where the race must have words or candidates' names that begin with each of the words in the filter).
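The matching rule in parentheses can be sketched in a few lines of Python. The function name `matches_filter` and its arguments are hypothetical, and case-insensitive matching is an assumption; the write-up only states that every word in the filter must be a prefix of some word in the race name or a candidate's name.

```python
def matches_filter(query, race_name, candidates):
    """Return True if every word of `query` is a prefix of some word in
    the race name or in any candidate's name (case-insensitive, assumed)."""
    words = " ".join([race_name, *candidates]).lower().split()
    terms = query.lower().split()
    return all(any(word.startswith(term) for word in words) for term in terms)
```

With this rule, the query "oba gen" would match a race named "General Election" with candidates "Obama" and "McCain", while "bam" would not, because no word begins with it. An empty query matches every race, which gives the unfiltered list.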
The graph supports hovering over graphed items to see more details about the relevant data. Clicking on a polling data point exposes details about the relevant poll, including a link back to the original polling data.
Changes
Aside from a few minor layout changes, the visualization is mostly identical to the original storyboard. There are two main exclusions (made due to the limited amount of time I could dedicate to this assignment, but which would not be too difficult to add with a few more hours of work):
- Hover-based trend lines
- This feature would have shown a line for all of the polls from the institution whose poll was being focused on. This makes it easy to see whether the institution generally overpolls or underpolls a given candidate compared to the average. The current architecture supports this, but the feature is not yet implemented.
- Hover-based highlighting of related data points
- This feature would have highlighted all of the data points (for any candidate) generated by the poll related to the hovered-over point. This was mostly a convenience feature that would make it easier to see how two candidates compared in a particular poll. It would require some minor architectural changes, although nothing too severe. In the meantime, hovering over a data point highlights only that point.
That being said, a few minor additions were made.
- Resizable data points
- Depending on the user's task, either the individual data points or the average lines may be the primary focus. As such, the user (using the "View" menu) can change the size of the plotted data points, for example shrinking them to bring the average lines more clearly into focus.
- Stock and Live data
- Using the File->Refresh menu, the user can choose to look at either stock or live data. I included this feature so that the visualization can be seen even if the user is not connected to the internet.
- Dynamic resizing
- The user has the ability to resize the filter panels in order to devote more screen real estate to his visualization.
- Y-axis zooming
- In hindsight, this probably should have been in my original storyboard, but the feature is extremely useful for looking, for example, at low-polling candidates such as independents, and for seeing changes more clearly.
- Subtle animations
- This was mostly added because of some performance issues. The reality is that, for General Election national polling, there are too many data points for them all to be rendered at high speeds. As a result, I allow them to load progressively, and data points fade in to reinforce the impression of progressive loading (as the next group of data points fades in, the current one is becoming even more opaque).
- Candidate Coloring
- In the list of races, candidates' names are colored according to their party, making it easy for users to identify the candidates' names and the parties to which they belong.
- Browsing the RealClearPolitics.com website
- The application includes a tab that loads up with the RealClearPolitics.com website for initial browsing. I added this primarily because (1) it was easy to do and (2) I'm doing lots of screenscraping off the RealClearPolitics.com website, and I wanted to make sure I was directing some actual traffic there :)
Download
Source
Source (Visual Studio 2008 solution): Image:RCPPollVis.zip
Note: Code written by me appears in the RCPGrapher project. The other projects are external libraries with tiny edits to support my needs.
Binaries
Requires .NET Framework 3.5 Service Pack 1: Image:RCPPVBin.zip
Instructions
- To run the application, double-click RCPGrapher.exe in the binaries zip. Note: all files in the zip must be extracted to a single folder.
- The application will start up with a tab showing the RealClearPolitics.com website.
- Switch to the "Stats" tab.
- Click File->Refresh.
- Choose either Live or Stock data. The list of races on the right side of the window will be populated. This can take some time -- please be patient. Nonetheless, you should be able to begin interacting with the polls while this loads asynchronously.
- Check a few of the polls and see the graph updated. Note the sliders below the graph -- when a new race is added, only the bounds of the slider will change, so if you were looking at a race with very little polling data you may need to adjust the slider to get a more useful picture.
- Play around, and enjoy! Don't forget to try clicking on the data points and adjusting the scant few options in the menus!
Commentary
Building this application in WPF was an enlightening experience. I have spent a lot of time in the past (thanks to my employment with Microsoft) working with XAML in Silverlight and (to a lesser extent) in WPF. The declarative approach to building this visualization is extremely powerful, and makes it easy to generalize visualizations. As part of this project, I built a "graph" control that is fully templatable. Those dots on the graph could just as easily have been buttons, 3D images, videos, or even other graphs, and the scaling on the graph is completely customizable. Changing the graph to a bar chart would be as easy as templating those controls to be rectangles and setting their width and height appropriately. I might continue to pursue this path for the final project, as I've built a significant amount of infrastructure to make this as easy as possible for myself.
All of my work also adhered fairly strictly to a ViewModel-based approach (the ViewModel pattern), with the exception of behaviors, which I didn't find myself needing for the most part. There is almost no UI code except in my custom graphing control, and all of that code focuses on layout (with zero model interactions).
Unfortunately, one of the drawbacks of all this flexibility is that it impacts performance. Everything works just fine when there are only a few hundred data points on the screen. As that number goes up to a thousand (or a few thousand), things start slowing down. This can be seen by trying to change the bounds of the graph while the general election data is all visible. The problem isn't so much the continual redrawing, but rather the continuous application and reapplication of the appropriate templates.
In addition, I made significant use of LINQ (Language Integrated Query) for the dynamic queries. Unfortunately, it's quite clear that LINQ's existing infrastructure is not really built with dynamic and live queries in mind. With every query I made, in order to keep the UI coherent, I had to manually recompose and re-run the query, and then update an observable collection (one that fires notifications when changed, allowing controls to bind to it), removing and adding items as appropriate. It seems quite possible to build an implementation of LINQ that is specifically geared toward observable collections, and a preliminary search shows that some third parties are in fact working on this problem. The language feature itself is amazingly elegant, and making it accessible for UI development (allowing it to follow the observer pattern) would go a long way toward making extremely rich and efficient UI/visualization development simple.
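The manual "re-run the query, then patch the observable collection" step might look roughly like the following Python sketch. It is an analogy, not the application's code: the real implementation uses .NET observable collections and LINQ, whereas `ObservableList` and `sync` here are hypothetical stand-ins written for illustration, and items are assumed to be hashable.

```python
class ObservableList:
    """Tiny stand-in for an observable collection: notifies on each change."""

    def __init__(self):
        self.items = []
        self.listeners = []  # callables invoked as listener(action, item)

    def add(self, item):
        self.items.append(item)
        self._notify("add", item)

    def remove(self, item):
        self.items.remove(item)
        self._notify("remove", item)

    def _notify(self, action, item):
        for listener in self.listeners:
            listener(action, item)


def sync(target, query_result):
    """Patch `target` so it holds exactly the items in `query_result`.

    Only the differences are applied, so bound UI elements for unchanged
    items are left alone instead of being torn down and rebuilt.
    """
    wanted = list(query_result)
    wanted_set = set(wanted)
    # Remove items that the re-run query no longer returns.
    for item in [i for i in target.items if i not in wanted_set]:
        target.remove(item)
    # Append items that are new in the query result.
    present = set(target.items)
    for item in wanted:
        if item not in present:
            target.add(item)
            present.add(item)
```

Note that this sketch does not reorder surviving items to match the query result. A query layer that understood observable collections natively would make the `sync` step unnecessary, which is the adjustment to LINQ wished for above.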
I spent a lot of time working on performance and shuffling around data appropriately in order to get all of the filtering and querying right. I wish this were much simpler, and as I said, having that adjustment to LINQ would likely have reduced my model code by half.
Interestingly, I expected to need to put much more effort into the screen-scraping and data collection portion of the work. It turned out to be much easier and more straightforward than dealing with the visualization and UI, to a large extent because of the synchronous programming model that can be used for the data acquisition itself. With the visualization, I had to consider that any interaction could happen in any order and at any time, even overlapping previous actions. I'm sure I've still got bugs related to this, but I tried to hit the main scenarios.
Time Spent on this Assignment: ~30 hours (4 hours for data collection, 10 hours for templatable/flexible graphing control, 8 hours building the ViewModel/performance tweaking, 6 hours designing UI and animations, 2 hours other tweaking and general UI spiffiness)

