A2-MaxwellPretzlav

From CS294-10 Visualization Fa08

NorCal/SoCal Weather

OS X's weather widget 9:30 AM Tuesday, September 23, 2008

One thing I've noticed while living in Berkeley is that the general daytime temperature tends to be roughly 5-10 degrees cooler than it is in my hometown of Venice, CA whenever I check. I'm curious to see how true this really is, or if it just seems that way.

Notebook

Finding the Data

The NCDC download interface

I began by googling "california weather history data", which led me to the California data exchange selector. It turns out this website has temperature data for North Oakland but not Los Angeles, so I kept searching until I found the National Climatic Data Center. After blundering about this god-awful website for some time, I finally came across Quality Controlled Local Climatological Data, which allowed me to download daily weather data one month at a time. I downloaded weather data for Santa Monica Municipal Airport (about ten blocks from my parents' house) and Oakland Metro Airport (as close as I could get to Berkeley) for all of 2007 (a month at a time ... grr!).

Data Massaging

The data I received from NCDC came as individual CSV files, one per month. Unfortunately, nobody seems to have told the NCDC how to actually format data: each supposedly "PLAIN ASCII" file arrived wrapped in HTML, with metadata lines before the table and monthly summary lines after it, looking like this:

<html><head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"></head><body>Daily Summary
<pre>Month/Year: 12/2007
Station Location: SANTA MONICA MUNI AIRPORT (93197)
Lat: 34.016 
Lon: -118.451
Elev: 142 ft. above sea level
Data,Headers,Here
Tables,of,data,here
...,...,...,...
...,...,...,...
HDD monthly total:,328, 
HDD monthly departure:,M
HDD Season to date total:,M
HDD Season to date departure:,M
CDD monthly total:,0, 
CDD monthly departure:,M
CDD Season to date total:,, 
CDD Season to date departure:,, 
greatest 24-hr precipitation:,0.20, 
greatest 24-hr precipitation date:,18-19, 
greatest 24-hr snowfall:,M
greatest 24-hr snowfall date:,M
greatest snow depth:,M
greatest snow depth date:,M
sea level pressure max:,30.44, 
sea level pressure max date:,02, 
sea level pressure max time:,2215, 
sea level pressure min:,29.57, 
sea level pressure min date:,01, 
sea level pressure min time:,0019, 
number of days with max temp >= 90:,0, 
number of days with max temp <= 32:,0,s
number of days with thunderstorms:,0, 
number of days with min temp <= 32:,0, 
number of days with min temp <= 0:,0, 
number of days with heavy fog:,1, 
number of days with precipitation >= .01 inch:,0, 
number of days with precipition >= .10 inch:,0, 
number of days snowfall >= 1.0 inch:,M
</pre> 
</body></html>


So I wound up running each file through a little Unix command to strip off all the extraneous non-data lines.

tail -n +8 ${NUM}.txt | tail -r | tail -n +34 | tail -r > ${NUM}.csv 

Was I expected to simply copy and paste the data from the resulting web-page? The page they showed resulted from clicking a button that claimed to give ASCII output...
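For anyone without BSD tail (the -r flag used above is a BSD extension found on OS X and is not part of GNU tail), here is a minimal Python sketch of the same trimming step. The function name strip_wrapper and its parameters are my own, not anything NCDC provides; the defaults of 7 and 33 simply mirror the +8 and +34 offsets in the command, and may need adjusting for other files.

```python
def strip_wrapper(lines, header_lines=7, footer_lines=33):
    """Drop the leading and trailing non-data lines from a list of lines.

    Equivalent in spirit to:
        tail -n +8 | tail -r | tail -n +34 | tail -r
    The default counts are assumptions taken from that command.
    """
    end = len(lines) - footer_lines
    # If the file is too short to contain any data rows, return nothing.
    return lines[header_lines:end] if end > header_lines else []


# Tiny illustration with made-up lines: 2 header lines, 1 footer line.
sample = ["<html>", "<pre>", "date,max,min", "20070102,69,49", "</pre>"]
print(strip_wrapper(sample, header_lines=2, footer_lines=1))
```

Slicing a list avoids the double reversal that the tail pipeline needs in order to cut lines off the end of the file.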

I also had to run each file through the following sed substitution, which inserts slashes into the date at the start of each row:

 sed -E 's/([[:digit:]]{4})([[:digit:]]{2})([[:digit:]]{2})/\1\/\2\/\3/' ${NUM}.csv

as it appears neither Spotfire nor Tableau can handle dates in the YEARMONTHDAY format that the NCDC uses.
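The same date rewrite can be sketched in Python with the standard re module. The name slash_date is hypothetical; the pattern is a direct translation of the sed expression above, and count=1 mirrors the fact that sed without the g flag only changes the first match on each line.

```python
import re

# Four digits, two digits, two digits: the YEARMONTHDAY form used by NCDC.
DATE_RE = re.compile(r"([0-9]{4})([0-9]{2})([0-9]{2})")


def slash_date(line):
    """Rewrite the first YYYYMMDD run in a line as YYYY/MM/DD."""
    return DATE_RE.sub(r"\1/\2/\3", line, count=1)


print(slash_date("20070102,69,49,59"))
```

Like the sed version, this rewrites the first run of eight digits on a line whether or not it is really a date, so it relies on the date being the first such field in every row.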

Additionally, the tables were organized like so:

date      max temperature  min temperature  avg temperature  other stuff  ...
20070102  69               49               59               blah         ...

It turns out that neither Spotfire nor Tableau likes the data this way ... I spent over an hour digging through help files and guides trying to get Tableau to easily generate a line chart with average, max, and min graphed on the same axis. I ended up creating a new table in Excel and copying and pasting the data into it.

The table (click to download xls) now has this format:

Location  Date        Temp  Type
SM        2007/01/02  69    max
SM        2007/01/02  49    min
SM        2007/01/02  59    avg
OAK       2007/01/02  50    avg
...       ...         ...   ...

Unless I missed some very basic functionality, it appears that both Tableau and Spotfire only accept data in this very specific (and redundant) layout, and really don't like having to draw connections between data organized in other ways.
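I did this reshaping by hand in Excel, but the step from one row per day to one row per measurement is mechanical. Below is a rough Python sketch of it, not what I actually ran. The function name to_long is made up, and treating "M" as a missing-value marker in the daily rows is an assumption based on the "M" entries in the monthly summary lines.

```python
def to_long(location, wide_rows):
    """Turn (date, max, min, avg) rows into (location, date, temp, type) rows."""
    long_rows = []
    for date, tmax, tmin, tavg in wide_rows:
        for temp, kind in ((tmax, "max"), (tmin, "min"), (tavg, "avg")):
            if temp == "M":  # assumption: "M" marks a missing value
                continue
            long_rows.append((location, date, int(temp), kind))
    return long_rows


# The example row from the wide table above.
for row in to_long("SM", [("2007/01/02", "69", "49", "59")]):
    print(*row)
```

Each wide row becomes up to three long rows, which is where the redundancy in the new table comes from: the location and date are repeated for every measurement.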

Visualizing

The first thing I saw when trying to visualize the weather data in Tableau

Now, finally, I can visualize my data. Tableau lets me quickly drop the correct items on the correct axes (now that my data is in the form Tableau wants).

When I switch to day view, I can even exclude the few instances of missing data from the data set:

Image:pretzlavTableauExclude.png

A little more massaging gives me this view:

Image:pretzlavFirstWeatherImage.jpg

Here I can already see some of the patterns I've been looking for in the data. Santa Monica consistently averages above Oakland. Interesting things to note: Santa Monica had a number of days in January and February when temperatures reached the mid-80s, something not seen in Oakland until May. Additionally, Santa Monica's spring-through-fall lows never get nearly as low as Oakland's. Both places tend to rise and fall at the same times, with the annual peak in late August/early September (this differs from much of the East Coast, where the annual peak tends to be in July).

Refining

My first attempt at visualizing the day-to-day difference in temperature; it is fairly hard to read.
Trying to view the difference on a monthly basis by using monthly averages.

While this visualization is very interesting and data rich, it still doesn't directly answer my original question. To answer that, I'd like to graph the difference between the two locations over time. My first attempt used Tableau's "Table Calculation" feature, but it only seemed able to show the difference between the sum of both temperatures and a certain temperature, or the average temperature (which seemed to have the same value as the temperature itself?!). There seemed to be no way to create a "Calculated Field" that was the difference between two different measurements of another field (as if I would need to manually create a "Difference in Temperature" field). Finally I managed to create a Table Calculation that displayed the difference between the two temperatures as one of the shown lines.

The difference in temperatures including Average "Trend Lines"

Tableau still insisted on displaying both lines, despite the fact that the Oakland temperature minus the Oakland temperature is of course zero. If I filtered out the flat Oakland line, it stopped being able to calculate the difference. I did discover that by adding "Trend Lines" I can see an average line, and it does appear that Santa Monica is consistently about 5° warmer than Oakland, with slightly more separated lows and closer highs.

To verify Tableau was doing the correct calculation (since the interface to create the Table Calculations was confusing and I wasn't sure if it was showing me what I thought I was seeing) I decided to re-create the difference data directly in Excel and graph it directly. This turned out to be identical to the difference line generated in Tableau, so I was doing the correct thing. Now I can easily see the daily difference in average temperature between Santa Monica and Oakland.
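The check itself is just a per-date subtraction. Here is a minimal Python sketch of that calculation over rows in the Location/Date/Temp/Type layout described earlier; I did the real check in Excel, and the function name daily_difference is hypothetical.

```python
def daily_difference(long_rows, a="SM", b="OAK", kind="avg"):
    """Return {date: temp at a minus temp at b} for dates present at both."""
    by_loc = {a: {}, b: {}}
    for location, date, temp, row_kind in long_rows:
        if row_kind == kind and location in by_loc:
            by_loc[location][date] = temp
    # Only dates with a reading at both locations can be compared.
    shared = sorted(by_loc[a].keys() & by_loc[b].keys())
    return {date: by_loc[a][date] - by_loc[b][date] for date in shared}


rows = [
    ("SM", "2007/01/02", 59, "avg"),   # from the example table
    ("OAK", "2007/01/02", 50, "avg"),  # from the example table
    ("SM", "2007/01/03", 60, "avg"),   # made-up row with no Oakland match
]
print(daily_difference(rows))
```

Dates missing from either location are skipped rather than treated as zero, which matches excluding the missing days from the chart.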

Daily average difference between Santa Monica and Oakland for the year 2007

However, the graph is so erratic it is difficult to see much more than that the general trend is definitely positive. By averaging over the months, one can much more easily see the breakdown in difference.
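Averaging over the months amounts to grouping the daily differences by the year/month part of the date and taking the mean of each group. A small Python sketch of that step follows, assuming dates in the YYYY/MM/DD form used in my table; monthly_mean is a made-up name and the numbers in the example are invented for illustration, not taken from the 2007 data.

```python
from collections import defaultdict


def monthly_mean(daily_diff):
    """Average a {date: difference} mapping by month ("YYYY/MM")."""
    buckets = defaultdict(list)
    for date, diff in daily_diff.items():
        buckets[date[:7]].append(diff)  # "2007/01/02" -> "2007/01"
    return {month: sum(vals) / len(vals) for month, vals in sorted(buckets.items())}


# Invented values, only to show the shape of the result.
print(monthly_mean({"2007/01/02": 9, "2007/01/03": 7, "2007/04/01": 1}))
```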

Difference in Average Temperature Between Santa Monica and Oakland by Month.  I could not get Tableau to show the scale on the left axis for some mysterious reason.

Here, finally, the difference in average temperature between the two places is clear, and one can even see how it changed from month to month: Santa Monica was considerably warmer on average than Oakland in January, while in April the two were very close.

All this staring at difference data started making me wonder how the temperature difference might change with the time of day. I never really pay close attention to the differences in temperature at, say, 2:00 AM. Most of the time when I check the weather it's when I get up in the morning, around 8:00 or 9:00 AM. I decided to check some hourly data to see what this looked like. Since the hourly observations for a single month generate a good deal of data, I decided to compare only three months: January, April, and September, since they're all times when I'm in Berkeley and they're each quite different weather-wise. I was also curious to compare January and April, since January seems to have the largest difference between the two places and April the smallest.

I downloaded the hourly data for those three months of 2007 and repeated the massaging steps described above. This data turned out to be somewhat more complex than the simplified, organized daily data. I spent some time learning the difference between Dry-Bulb Temperature and Wet-Bulb Temperature, and decided to go with Dry-Bulb (plain air) temperature for my purposes.
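To compare the two places by time of day, the hourly dry-bulb readings for each location can be averaged per hour and then subtracted. The Python sketch below shows one way to do that; it is not the procedure I followed in Tableau, the function names are hypothetical, it assumes each observation has already been reduced to an (hour, temperature) pair, and the readings in the example are invented.

```python
from collections import defaultdict


def mean_by_hour(observations):
    """Average (hour, dry_bulb_temp_f) pairs into {hour: mean temperature}."""
    buckets = defaultdict(list)
    for hour, temp in observations:
        buckets[hour].append(temp)
    return {hour: sum(vals) / len(vals) for hour, vals in buckets.items()}


def hourly_difference(obs_a, obs_b):
    """Mean temperature at a minus mean at b, for hours present at both."""
    mean_a, mean_b = mean_by_hour(obs_a), mean_by_hour(obs_b)
    shared = sorted(mean_a.keys() & mean_b.keys())
    return {hour: mean_a[hour] - mean_b[hour] for hour in shared}


# Invented readings: two 9:00 observations per location, plus unmatched hours.
santa_monica = [(9, 60), (9, 62), (10, 65)]
oakland = [(9, 54), (9, 56), (11, 50)]
print(hourly_difference(santa_monica, oakland))
```

Averaging each location first and subtracting afterwards means the two stations don't need observations at exactly the same timestamps, only within the same hour of the day.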

Final Visualization

Image:PretzlavHourlyAverageTemp.png

This visualization shows the difference in average hourly temperature between Santa Monica, CA and Oakland, CA for the months of January, April, and September 2007. The important feature to note is that all three graphs show the largest difference from around 8:00 to 10:00 AM, which happens to be the time when I most frequently check the weather and notice the difference between the two places. Also of note is that in January Santa Monica's rise over Oakland came several hours later on average than in September, with April lying somewhere in between (but considerably less pronounced). Across the three months, the average difference in temperature is 6.8°F at 9:00 AM and 5.8°F at 10:00 AM.

Exploring these two data sets showed me exactly why I've noticed the temperature differences that I have. Not only does Santa Monica average around 5°F warmer than Oakland, the times when I happen to check the weather are also the times when the difference is most pronounced.


