A2-OmarKhan
From CS294-10 Visualization Fa07
Visualization Software Assignment
i used tableau in this assignment.
data
i got my data from http://lib.stat.cmu.edu/datasets/. i chose the 2000 presidential election in florida dataset. see http://lib.stat.cmu.edu/datasets/fl2000.txt.
questions
my initial question was whether unusual voting results could be gleaned from the data, and if so, what were they and could they have an effect on the outcome?
of course, no one can forget the hanging chads and all the complaints about voting machines in florida after the election. to drill down on my question, i broke it into two parts:
1) the data should make it clear that even the tiniest change in votes could make a difference (see counts). otherwise, this all seems moot.
2) if (1) follows, then the data can likely illustrate two ways that voting mechanisms could have caused voters to vote differently than they intended, thereby skewing the results:
(a) voting for the wrong person
(b) voiding a ballot by marking it incorrectly
data format and manipulations
the initial data gave one row for each florida county, with the following columns: "county" "technology" "columns" "under" "over" "Bush" "Gore" "Browne" "Nader" "Harris" "Hagelin" "Buchanan" "McReynolds" "Phillips" "Moorehead" "Chote" "McCarthy"
technology refers to the type of voting machine, columns refers to the number of columns on the ballot, under is the number of undervotes, over is the number of overvotes, and the remaining columns give the final certified counts reported by the florida division of elections for each candidate.
first, since my question focused on 2(a) and i didn't plan to go deeper into the actual technology or ballot marking failures for this assignment, i removed the "technology", "columns", "over" and "under" columns. then, using python, i transformed the data to have 3 columns: county, candidate, and number of votes. i put the result into an excel spreadsheet.
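the reshaping step described above can be sketched roughly as follows. this is only an illustration: the actual script isn't shown here, the function name wide_to_long is made up, and it assumes the rows of fl2000.txt have already been parsed into dictionaries keyed by column name.

```python
# Sketch only: `wide_to_long` is a hypothetical helper, not the original script.
# Assumes each input row is a dict keyed by the column names listed above.
CANDIDATES = ["Bush", "Gore", "Browne", "Nader", "Harris", "Hagelin",
              "Buchanan", "McReynolds", "Phillips", "Moorehead", "Chote",
              "McCarthy"]


def wide_to_long(rows, candidates=CANDIDATES):
    """Turn one-row-per-county dicts into (county, candidate, votes) tuples.

    Columns that are not candidates ("technology", "columns", "under",
    "over") are dropped simply by never being copied.
    """
    long_rows = []
    for row in rows:
        for name in candidates:
            long_rows.append((row["county"], name, int(row[name])))
    return long_rows


# Made-up numbers for illustration, not taken from the dataset.
example = [{"county": "A", "technology": "x", "columns": "1",
            "under": "10", "over": "5", "Bush": "100", "Gore": "90"}]
print(wide_to_long(example, candidates=["Bush", "Gore"]))
```

each output tuple corresponds to one row of the 3-column table (county, candidate, number of votes).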
data exploration
first, i answered (1). gore had 2,911,417 certified votes and bush 2,911,215, a difference of only 202 votes. so any widespread glitch could have caused significant problems.
next, i needed some way to identify unusual voting for 3rd party candidates. first, there were too many third party candidates to visualize all of them, so i only kept candidates with at least 10,000 votes total across the entire state. this left bush, gore, nader (far left consumer advocate), buchanan (far right conservative), and browne (libertarian party candidate). based on the extreme leanings of these 3rd party candidates, i conjectured that voting patterns for each of these candidates would differ from county to county depending on the voting outcome for the major party candidates in each county.
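a minimal sketch of the statewide totals, the bush/gore margin and the 10,000-vote cutoff, assuming the data is already in (county, candidate, votes) form. the function names are hypothetical and the numbers in the example are made up, not taken from the dataset.

```python
from collections import defaultdict


def statewide_totals(long_rows):
    """Sum votes per candidate over all counties."""
    totals = defaultdict(int)
    for _county, candidate, votes in long_rows:
        totals[candidate] += votes
    return dict(totals)


def keep_major(long_rows, threshold=10_000):
    """Keep only rows for candidates with at least `threshold` votes statewide."""
    totals = statewide_totals(long_rows)
    return [row for row in long_rows if totals[row[1]] >= threshold]


# Made-up rows for illustration only.
rows = [("A", "Bush", 6000), ("A", "Gore", 5500), ("A", "Harris", 40),
        ("B", "Bush", 5000), ("B", "Gore", 5600), ("B", "Harris", 25)]
totals = statewide_totals(rows)
margin = abs(totals["Bush"] - totals["Gore"])  # statewide gap between the two
kept = keep_major(rows)                        # the Harris rows are dropped
print(totals, margin, len(kept))
```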
to determine a county by county voting pattern for the 3rd party candidates, i computed the percentage of votes received by each candidate per county. i used tableau to generate this data, and then placed it in my spreadsheet. finally, i defined a county to be a 'gore' county if gore beat bush by more than 10 percentage points, a 'bush' county if bush won by more than 10 percentage points, and a 'close' county otherwise. i used python to do this calculation.
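the percentage and labeling steps might look something like this in python. it's a sketch under assumptions: the percentages were actually generated in tableau, the function names are invented for illustration, and the denominator is assumed to be all votes in the rows passed in for that county.

```python
def county_shares(long_rows):
    """Return {county: {candidate: percent of that county's votes}}."""
    by_county = {}
    for county, candidate, votes in long_rows:
        by_county.setdefault(county, {})[candidate] = votes
    shares = {}
    for county, counts in by_county.items():
        total = sum(counts.values())  # assumed to be nonzero
        shares[county] = {c: 100.0 * v / total for c, v in counts.items()}
    return shares


def classify(share, cutoff=10.0):
    """Label a county 'gore', 'bush' or 'close' from its percentage shares."""
    gap = share["Gore"] - share["Bush"]
    if gap > cutoff:
        return "gore"
    if gap < -cutoff:
        return "bush"
    return "close"  # a gap of exactly 10 points still counts as close


# Made-up rows for illustration only.
rows = [("A", "Bush", 60), ("A", "Gore", 35), ("A", "Nader", 5),
        ("B", "Bush", 48), ("B", "Gore", 50), ("B", "Nader", 2)]
shares = county_shares(rows)
labels = {county: classify(share) for county, share in shares.items()}
print(labels)
```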
the result is the visualization below. the percentage of votes in each county for each of the 3rd party candidates is plotted on a 1d scatter plot. each 3rd party candidate gets three scatter plots: one for 'bush' counties, one for 'gore' counties, and one for 'close' counties. the points in each scatter are sized based on the number of votes received. the median for each scatter is marked with a black line, and a dotted black line marks one standard deviation above the mean (i considered indicating the mean as well, but decided that the median was more visually telling and things were getting cluttered). i used color redundantly to help convey the winner of the county (red for republican, blue for democrat and grey for too close). finally, i put the county name on the data points (tableau decided which data points to label).
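the two reference lines on each scatter can be reproduced outside tableau with something like the sketch below. the function name is hypothetical, and it assumes the sample standard deviation; i haven't confirmed which variant tableau uses for its reference lines.

```python
from collections import defaultdict
from statistics import mean, median, stdev


def reference_lines(points):
    """points: iterable of (candidate, county_type, percent).

    Returns {(candidate, county_type): (median, mean + one sample std dev)}.
    """
    groups = defaultdict(list)
    for candidate, county_type, pct in points:
        groups[(candidate, county_type)].append(pct)
    lines = {}
    for key, values in groups.items():
        # stdev needs at least two points; use zero spread for a lone point
        spread = stdev(values) if len(values) > 1 else 0.0
        lines[key] = (median(values), mean(values) + spread)
    return lines


# Made-up percentages for illustration only.
points = [("Nader", "bush", 1.0), ("Nader", "bush", 2.0),
          ("Nader", "bush", 3.0), ("Nader", "close", 4.0)]
print(reference_lines(points))
```

a point that sits well above the dotted line for its group is the kind of outlier the plot is meant to surface.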
interesting observations immediately jump out:
1) the voting behavior for 3rd party candidates in the close counties looks somewhat different compared to bush and gore counties, though buchanan 'close' compared to buchanan 'gore' is similar, as is browne 'bush' and browne 'close'.
2) buchanan got an unusually large percentage of the vote in palm beach county compared to his showing in other 'gore' (blue) counties. this is even stranger because we know buchanan is a far right candidate. it is especially important because, based on the size of the blue circle, we know this is a significant number of votes.
3) nader, a left leaning candidate, had an unusually good showing in a number of bush counties, namely desoto and lee.
4) alachua county is an extreme outlier for both nader and browne. it's likely worth exploring it in more detail.
if i were to continue this study, i would then determine how i could integrate the type of ballot, undervotes and overvotes, and perhaps demographics into a new visualization that drilled down on the results in the outlier counties, especially palm beach, desoto and lee.
