A2-ScottMurray

From CS294-10 Visualization Fa08


This page documents my process for Assignment 2.


1. Domain Selection

I've been wanting to work with political campaign finance data for some time. In 1998, I discovered Open Secrets, a website run by The Center for Responsive Politics, which provides a friendly interface to public data submitted to the Federal Election Commission. This includes all individual contributions of $200 or more, which campaigns are required to report.

At the time, the FEC didn't offer this data on its own website. (It may not even have had a website yet.) So opensecrets.org was performing a real public service by collecting the data in person from the FEC and sharing it with the public online. Otherwise, for Joe Citizen to acquire that data, he would have had to physically travel to the FEC's headquarters in Washington, DC. As a result, few people ever saw the data, and, although it was technically "publicly available," it essentially remained secret, except from journalists and other muckrakers.

I remember punching in friends' names and being shocked to discover that one of them had given $5,000—the maximum allowed amount—to a particularly unsavory candidate. Upon confronting my friend, she explained that her father had made several such donations, each in the name of a different family member, in order to circumvent the $5,000 individual limit. And just like that, the website had exposed something that would have otherwise remained secret forever: that my friend's dad had broken the law in order to funnel more money toward his candidate of choice.

I'm curious to see if visualization tools can help reveal anything else of interest hiding in this data set.

2. Framing the Question

There are many big-picture questions one could ask of the campaign contribution data, but I'm more interested in its personal side. What about my friends and family members? What about people who live in my city, my neighborhood, on my block? What kinds of political contributions are they making, and what do those say about the politics of my local community? Let's begin with this question:

  • How do the political contributions of my neighbors compare against those of the nation as a whole?

This question should give us some flexibility for exploring the data later on, so we can compare donations by amount, party, or specific candidates or campaigns.

3. Finding the Data

Opensecrets.org was my first stop, but its interface has changed since my last visit. Detailed name-and-address level data is no longer available; the site now presents mostly high-level summary information. Open Secrets doesn't make any of the raw data available directly from their website, though they will provide a custom subset of the data for a fee. They also provide some information through APIs, but not at the level of individual contributions.

The site pointed me to some other APIs that looked promising (including those offered by the open-source site GovTrack.us), but, in the end, it seemed best to go straight to the source, the FEC, and download the complete data file of contributions by individuals during the 2007-2008 election cycle (64 MB).

Note that there are some legal restrictions on the use of this data. I believe my project would be considered one of the permissible uses.

4. Examining the Data

The 320-megabyte FEC data file was provided in a plain-text, fixed-width COBOL format. Tableau doesn't support COBOL, so I started by importing the data into Access. After 10 minutes of manually specifying field widths, the data was imported successfully. I then exported it as a CSV file, and imported the CSV into Tableau. That process took about 20 minutes, due to the size of the data set.
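For anyone who would rather script the fixed-width-to-CSV conversion than specify field widths by hand in Access, here is a minimal Python sketch of the same idea. The field names and widths in `LAYOUT` are hypothetical, chosen only for illustration; the real widths would have to come from the FEC's record layout documentation, which is not reproduced on this page.

```python
import csv
import io

# Hypothetical layout: (field name, width in characters).
# These widths are assumptions, NOT the actual FEC record layout.
LAYOUT = [
    ("committee_id", 9),
    ("contributor_name", 20),
    ("city", 15),
    ("state", 2),
    ("amount", 7),
]


def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width line into a dict of stripped field values."""
    record = {}
    start = 0
    for name, width in layout:
        record[name] = line[start:start + width].strip()
        start += width
    return record


def fixed_width_to_csv(lines, out, layout=LAYOUT):
    """Write each non-blank fixed-width line as a CSV row; return the row count."""
    writer = csv.DictWriter(out, fieldnames=[name for name, _ in layout])
    writer.writeheader()
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        writer.writerow(parse_record(line, layout))
        count += 1
    return count


if __name__ == "__main__":
    # One invented record, padded to match the hypothetical layout above.
    sample = ("C00000001" + "DOE, JANE".ljust(20)
              + "SAN FRANCISCO".ljust(15) + "CA" + "0002300")
    buffer = io.StringIO()
    fixed_width_to_csv([sample], buffer)
    print(buffer.getvalue())
```

Because the function reads one line at a time, passing it an open file handle instead of a list should let it stream through a large file without loading the whole thing into memory.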

At that point, I realized that the file contained information only on the transactions, and not on who the money was given to, such as the party, candidate, or campaign's name. So I sought out the FEC file with the committee data (each candidate's campaign is represented by an FEC-registered "committee"), seeing that the committees (and, therefore, their parties) could be linked to individual contributions by a shared ID number.

I downloaded the committee data, imported the COBOL into Access, and then exported as a CSV. After importing into Tableau, I found it challenging to join the two tables, so I went back to Access and joined the tables there instead. Then I wrote a query in Access to show only records relevant to presidential campaigns, and I excluded unneeded data fields in order to speed up Tableau's performance later. Working with the full 320 MB file was far too slow, as Tableau would take several minutes to recalculate each change to the visualization, and I hoped that by excluding all of the unneeded data, it would run a little faster.
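The join and filter described above were done in Access. As a rough equivalent, the sketch below does the same thing with Python's built-in `sqlite3` module. The table names, column names, and sample rows are all invented for illustration, and the use of `'P'` to mark a presidential committee is an assumption about how the committee file codes committee types.

```python
import sqlite3


def join_presidential(contributions, committees):
    """Join contributions to committees on the shared committee ID,
    keep only presidential committees, and return a trimmed column set.

    contributions: iterable of (committee_id, state, city, amount)
    committees:    iterable of (committee_id, party, committee_type)
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contributions "
        "(committee_id TEXT, state TEXT, city TEXT, amount INTEGER)"
    )
    conn.execute(
        "CREATE TABLE committees "
        "(committee_id TEXT PRIMARY KEY, party TEXT, committee_type TEXT)"
    )
    conn.executemany("INSERT INTO contributions VALUES (?, ?, ?, ?)", contributions)
    conn.executemany("INSERT INTO committees VALUES (?, ?, ?)", committees)
    rows = conn.execute(
        """
        SELECT m.party, c.state, c.city, c.amount
        FROM contributions AS c
        JOIN committees AS m ON m.committee_id = c.committee_id
        WHERE m.committee_type = 'P'  -- assumed code for presidential committees
        ORDER BY c.rowid
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    # Invented sample data, not real FEC records.
    committees = [("C1", "DEM", "P"), ("C2", "REP", "P"), ("C3", "DEM", "H")]
    contributions = [
        ("C1", "CA", "SAN FRANCISCO", 500),
        ("C2", "TX", "AUSTIN", 250),
        ("C3", "CA", "OAKLAND", 100),
    ]
    for row in join_presidential(contributions, committees):
        print(row)
```

Selecting only the party, state, city, and amount columns mirrors the goal of dropping unneeded fields so that the downstream tool has less data to recalculate.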

5. Creating the Visualization

After importing the merged data set, in Tableau I defined calculated fields to count the number of contributions to different parties. This shows the counts of individual contributors by state:

I then tried to break out contributors to Democratic and Republican campaigns separately:

Note, however, how the bars in the left and right columns are exactly the same length. I wasn't using the right data; I corrected that and improved the presentation somewhat:

Now confident that the numbers were accurate, I added the three other parties that are captured in the data (Independent, Libertarian, and Green) and color-coded them:

This was interesting, but I was losing sight of my goal: to compare the national data against those of my local community. So next I removed the states to see just the national breakdown:

At this point, I realized that I'd been working with the numbers of individual contributors, but the dollar value of the contributions is much more interesting to me. So I made some more calculated fields to derive the sums of contributions to campaigns of each party, and displayed them as:

Then, while maintaining the same scale, I filtered to exclude all non-California based contributions:

...and did the same to show only those contributions made from San Francisco-based individuals:

I would have liked to perform an even finer filter, showing contributions only from people in my neighborhood or street, for example, but street address information was not available in the data set.
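The three views above (national, California only, San Francisco only) amount to the same sum-by-party calculation under progressively narrower filters. The sketch below shows that logic in plain Python. It is not how Tableau computes its calculated fields, and the row shape `(party, state, city, amount)` and the sample values are assumptions made for illustration.

```python
from collections import defaultdict


def totals_by_party(rows, state=None, city=None):
    """Sum contribution amounts per party, optionally restricted to one
    state and/or city. Each row is (party, state, city, amount)."""
    totals = defaultdict(int)
    for party, row_state, row_city, amount in rows:
        if state is not None and row_state != state:
            continue
        if city is not None and row_city != city:
            continue
        totals[party] += amount
    return dict(totals)


if __name__ == "__main__":
    # Invented sample rows, not real FEC figures.
    rows = [
        ("DEM", "CA", "SAN FRANCISCO", 500),
        ("DEM", "CA", "OAKLAND", 200),
        ("REP", "CA", "SAN FRANCISCO", 100),
        ("REP", "TX", "AUSTIN", 250),
    ]
    print("National:     ", totals_by_party(rows))
    print("California:   ", totals_by_party(rows, state="CA"))
    print("San Francisco:", totals_by_party(rows, state="CA", city="SAN FRANCISCO"))
```

Filtering on state and city together, rather than city alone, avoids mixing in contributions from same-named cities in other states.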

I then struggled for quite some time to integrate the three previous images. At this point, I thought I could best answer my original question by displaying the national, state-level, and local contribution amounts all in the same visualization, to make it easy to compare the politics of the country as a whole against those of California and, more locally, San Francisco. Essentially, I wanted to overlay the three charts on top of each other. Tableau may have the ability to do this, but I wasn't able to find it. The "pages" feature lets you create multiple views of the same data and then "play" through them (like a flipbook), but I wanted to display them all at the same time, overlapping. Stuck, I turned to Photoshop to create the final composite image I imagined.

6. Final Visualization

Presidential campaign contributions made by individuals during the 2007-2008 election cycle, by political party and contributor location

Image:US-campaign-contributions.png

Using FEC data on individual contributors to presidential election campaigns, this bar chart shows that both California- and San Francisco-based individuals contribute significant portions of the total amounts received by campaigns of each party. It is also clear that Democratic campaigns have received over 1.5 times as much as Republican ones in this election cycle, at the national, state, and city levels. Contributions to Independent and Libertarian campaigns barely even register in comparison, and Green ones not at all.


