From CS 294-10 Visualization Sp10


Step 1: Pick a Domain

2010.02.19: Worldwide Oil Consumption
For many years now, experts have been warning us about finite natural resources. In particular, they warn about the coming crisis of peak oil, when oil extraction begins to decline and oil demand exceeds supply.

2010.02.20: US Oil Consumption and Prices
From 2003 to 2008, the price of oil rose to record highs, and then plunged as the 2008 global economic recession took hold. This made me wonder whether the price fluctuations were purely due to changes in supply and demand, or whether there was some other factor at play such as speculation.

Step 2: Pick a Question

2010.02.19: What is the current balance between oil supply and demand, and how far are we from peak oil? (In other words, when did, or when will, oil production begin to decline?)

2010.02.20: How much of an impact have changes in US petroleum supply and demand had on US crude oil prices?

Step 3: Data Sources

I decided to use official energy statistics provided by the US federal government's Energy Information Administration. In particular, I used the following tables from the Forecasts and Analysis of Energy Data page for my initial visualization.

Initial Visualization

2010.02.19: I initially decided to plot worldwide oil demand and supply over time on a line chart. However, I was disappointed to find that, so far, worldwide production and consumption have been quite evenly matched (no peak oil yet), and both have been climbing at a fairly steady rate, which made for a rather boring visualization.


2010.02.19: Consequently, I decided to narrow my focus to US oil production and consumption. The difference between the two lines would correspond to net imports (imports minus exports), much like Playfair's iconic Chart of Imports and Exports to and from England in VDQI.


2010.02.20: I soon realized that my visualization could tell a more interesting story by relating US oil supply and demand to US oil prices, so I added the price of crude oil (in $ per barrel) to my plot.


2010.02.21: I was dissatisfied that dissimilar units ($ per barrel and millions of barrels) were being plotted along the same axis, so I found out how to create a dual-axis chart, with one axis dedicated to the quantity of consumption and production (in millions of barrels of oil) and the opposite axis showing the price of oil (in $ per barrel).


2010.02.21: I noticed that it was unclear which axis each line should be read against, so I color-coded the quantity axis and its data points in orange, and the price axis and its data points in green. An interesting anomaly is that although demand began dropping in 2008, prices spiked in that year, possibly due to excessive speculation.


2010.02.22: 2008 was an especially interesting year in this graph, so I decided to refine the granularity of my data source so that it would display data by month rather than by year. I spent a lot of time trying to get rid of the separation between years. I think the problem is that my data source uses nested fields (multiple months within a year). According to the documentation, this is fine when importing from multidimensional data sources such as an enterprise database, but for relational data sources like Excel, I need to merge the two fields into a single date field. http://www.tableausoftware.com/onlinehelp/v5.1/online/Output/wwhelp/wwhimpl/js/html/wwhelp.htm



I think that the final visualization coherently answers my question of how changes in US petroleum supply and demand have impacted US crude oil prices. It is clear that from 1991 to 2007, US oil consumption rose steadily while production declined. As demand outstripped supply, the shortfall was presumably covered by imports, and this growing gap drove prices up to record highs. However, the 2008 economic recession greatly narrowed the gulf between oil demand and supply, and consequently caused prices to plunge in 2009.

2010.02.23: Everything above this line is part of my original submission for Monday; this postscript was written on Tuesday.

The gaps between the years really bothered me, so I decided to concatenate the separate month and year fields into a single unified date field, as described in the help documentation. Unfortunately, due to inconsistencies between Tableau's data parser and my spreadsheet program's internal date representation, every date collapsed into the range 12/30/1899 to 01/01/1900. I tried several fixes, such as: http://www.tableausoftware.com/forum/strange-filter-parameters-oracle-date-%2526amp%3B-time-field and, after multiple failed attempts, I eventually got Tableau to recognize the correct dates by converting them to the ISO 8601 format.
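For reference, the merge-and-convert step can also be done in a short script before the data ever reaches Tableau. The sketch below is only an illustration in Python, not what I actually ran: the function name `to_iso_date`, the `MONTHS` list, and the sample rows are all made up for the example, and it assumes the month arrives either as a number or as a three-letter English abbreviation.

```python
from datetime import date

# Assumed month labels; a real export may spell these differently.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def to_iso_date(year, month):
    """Merge separate year and month values into one ISO 8601 date string.

    `month` may be a number (1-12) or a three-letter English abbreviation.
    The day is fixed to the first of the month because the data is monthly.
    """
    if isinstance(month, str):
        # Normalize e.g. " jun" or "JUNE" to "Jun" before the lookup.
        month = MONTHS.index(month.strip()[:3].title()) + 1
    return date(int(year), int(month), 1).isoformat()

# Hypothetical (year, month) pairs standing in for the nested fields.
rows = [(2008, "Jun"), (2008, "Jul"), (2009, 1)]
iso_dates = [to_iso_date(y, m) for y, m in rows]
print(iso_dates)  # ['2008-06-01', '2008-07-01', '2009-01-01']
```

Writing the dates out as plain YYYY-MM-DD text sidesteps the spreadsheet's internal date representation entirely, which is presumably why the ISO 8601 route worked where the others failed.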


Below is the final result. From an aesthetic standpoint, I would have liked to draw the oil price as a series of dots (a candlestick chart would be even better) instead of a line, to make it clear that it uses the ($ per barrel) axis rather than the (million barrels) axis. Since Tableau does not let you vary the style of individual data series within a single chart, the color-based differentiation will have to suffice. Overall I am quite pleased with the result, but I think that I could have achieved similar results in a fraction of the time using my existing spreadsheet program.


In hindsight, I could have avoided much frustration by doing more preprocessing of the data before feeding it to Tableau, or by choosing another dataset. My initial approach was to download the dataset as XLS and import it directly into Tableau. Unfortunately, it seems that the layout of that dataset and its use of merged cells caused Tableau to mangle the data series labels (see the earlier revisions of the first 4 charts for examples).

I got much better results when I downloaded the data in HTML format, transposed the table in my spreadsheet program, and exported it as CSV. In this case, Tableau made far fewer parsing and interpretation errors, although date/time format misinterpretation proved to be a recurring annoyance (like the 1899 issue mentioned above).
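The transpose-and-export step could also be scripted rather than done by hand in a spreadsheet. The following is a rough Python sketch under that assumption; the helper names `transpose` and `to_csv_text` are invented for illustration, and the table contents are placeholders rather than real EIA figures.

```python
import csv
import io

def transpose(rows):
    """Swap rows and columns of a rectangular table (a list of lists)."""
    return [list(col) for col in zip(*rows)]

def to_csv_text(rows):
    """Serialize a table to CSV text, one line per row."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()

# Placeholder table: one row per series, one column per year.
wide = [
    ["Series", "2007", "2008"],
    ["Production", "p2007", "p2008"],
    ["Consumption", "c2007", "c2008"],
]

# After transposing, each row is one time period and each column one series.
by_period = transpose(wide)
print(to_csv_text(by_period))
```

With one row per time period and one column per series, the file has a single flat header row and no merged cells, which is the kind of layout that gave me fewer import problems.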

I think Tableau is a very nifty tool for creating visualizations; the interface is very easy to use as long as you do not stray from the most commonly used set of tasks. Unfortunately, I encountered a wide gulf of execution whenever I tried to customize elements of my visualization (e.g., how do I eliminate the gap between the years?), as well as a couple of moderately frustrating and time-consuming bugs. I expect that with more practice I can become more proficient with Tableau and other visualization tools, and better understand their individual quirks and limitations.
