From CS294-10 Visualization Sp11




Data domain: a database of delays for flights departing SFO, joined with the latitude and longitude of each destination airport.

The visualization technique: a Focus + Context timeline with time selection, plus a radial plot of the bearing to each destination airport.






Reason for using this technique: I could have plotted each airport on a flat map, but I was more interested in where weather patterns occurred; also, great-circle routes are not straight lines on most map projections. So I calculated the initial bearing (along a great circle) from SFO to each destination airport and plotted it radially. (Comment: I'm not a map expert, an airline routing expert, or a weather expert, but this is one simple reason why this kind of graph can show more information than a plain map.)

What I hoped to visualize was clusters of delayed flights in one direction, which would indicate storms in the month I chose (January 2010). The overlapping dates in the focus area would show the density of flight delays. The interaction technique I hoped to enable was sizing the window and then sliding it over the data to detect patterns.
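The bearing calculation can be sketched with the standard initial-bearing formula for a great circle. This is a minimal illustration, not necessarily what get_the_data.py does; the function name and the airport coordinates below are approximate values chosen for the example.

```python
import math

def initial_bearing(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2.

    Inputs are in decimal degrees; the result is in degrees,
    measured clockwise from north, in the range [0, 360).
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0

# Approximate coordinates, assumed for illustration only.
SFO = (37.62, -122.38)
JFK = (40.64, -73.78)

sfo_to_jfk = initial_bearing(SFO[0], SFO[1], JFK[0], JFK[1])
```

With these coordinates the bearing to JFK comes out roughly east-northeast, even though JFK is only a few degrees of latitude north of SFO; this is the kind of difference from a flat map that the radial plot captures.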

I could have used the Arc technique, but this requires an ordering of nodes - the example shown used a community detection algorithm, and I wasn't ready to apply this to my data.

Final Product


This shows all flights departing SFO in January 2010. (I used just one month to keep the data easy for the browser to load.)

The airport icon is the center of the radial plot. Around it are wedges: the shade of red represents how often there were delays in that direction in the selected time period, and the length of each wedge represents how many delayed flights there were in one day. I chose an opacity of 0.05 since it could represent up to twenty stacked days of delays. The length of the wedge is linear in the number of delays for one day; this could be problematic, since a wedge's area grows faster than its radius. To limit this effect, I kept the angle of each wedge small. The angle is still large enough that one can clearly distinguish major directions of flights, such as those towards Seattle, Boise and JFK.

The airport names are scattered in their direction: the distance from the center is not the actual distance, but is there only to separate overlapping labels. There's a redundant encoding of delay density in the shade of the wedge.

By adjusting the size and position of the context window, the user can zero in on parts of the month with lots of delays. From about January 16 to 22, there was an east coast storm:
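As a rough check on the opacity choice, here is a small sketch of how stacked translucent layers combine. It assumes standard "source-over" alpha compositing, which is how overlapping semi-transparent shapes are normally blended in a browser; the function name is hypothetical.

```python
def stacked_opacity(alpha, layers):
    """Combined opacity of `layers` identical overlapping layers.

    Assumes source-over compositing: each layer lets (1 - alpha)
    of whatever is beneath it show through.
    """
    return 1.0 - (1.0 - alpha) ** layers

one_day = stacked_opacity(0.05, 1)
twenty_days = stacked_opacity(0.05, 20)
```

Under this assumption, twenty stacked days at 0.05 reach a combined opacity of about 0.64 rather than fully saturating, so directions with many delayed days remain distinguishable from each other.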


Here's the overall data, showing that the direction with the most delays in a single day was actually towards LAX/SAN. Chico and Boise are rarely delayed.


The main conclusion I drew from playing with the data: delays were rarely isolated. When one direction is delayed, departures in other directions are often affected as well. In retrospect, this should have been obvious, since the aircraft for departing flights often arrive from delayed areas.

Changes between storyboard and final

I haven't implemented arriving flights yet. If I were to do this, I would add some kind of 'arrow' mark to indicate the direction. I added a focus chart of the total delay time. This has more 'spiky' behavior than the context chart. For example, in the storm from the 16th to the 22nd, the plot of total minutes delayed shows a huge spike around the 18th.

The context chart shows the total volume of flights, and below that, the total volume of delayed flights. Thus it's easy to eyeball what percentage of flights were delayed. The labels of airports didn't cooperate well, and I had to essentially randomize their radius while sticking them inside a Wedge mark. The labels overlap and don't reflect actual distance.

Development Process

All my materials are here: File:Flight bundle.zip

The first part was getting the data: The sources I used were http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time and http://www.partow.net/miscellaneous/airportdatabase/

The Python script get_the_data.py reads in the on-time data file and keeps only the trips that depart from SFO. It also reads in the list of airport codes with their latitudes and longitudes, and maps each airport code to its bearing from SFO. Finally, the script joins the trips with the bearings and outputs two JSON files, one for the trips and one for the directions.
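The filter-and-join step described above might look roughly like the following. This is a sketch under assumptions, not the actual get_the_data.py: the column names (ORIGIN, DEST, FL_DATE, DEP_DELAY), the output field names, the function names, and the sample rows are all made up for illustration, and the coordinates are approximate.

```python
import json
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0

def build_outputs(flights, airports, origin="SFO"):
    """Keep flights departing `origin` and attach a bearing to each.

    flights:  list of dicts with (assumed) keys ORIGIN, DEST, FL_DATE, DEP_DELAY
    airports: dict mapping airport code -> (latitude, longitude)
    """
    o_lat, o_lon = airports[origin]
    directions = {
        code: bearing_deg(o_lat, o_lon, lat, lon)
        for code, (lat, lon) in airports.items()
        if code != origin
    }
    trips = [
        {
            "date": f["FL_DATE"],
            "dest": f["DEST"],
            # Treat a missing delay value as zero minutes.
            "delay": float(f["DEP_DELAY"] or 0),
            "bearing": directions[f["DEST"]],
        }
        for f in flights
        if f["ORIGIN"] == origin and f["DEST"] in directions
    ]
    return trips, directions

# Made-up sample data, for illustration only.
sample_airports = {
    "SFO": (37.62, -122.38),
    "JFK": (40.64, -73.78),
    "SEA": (47.45, -122.31),
    "LAX": (33.94, -118.41),
}
sample_flights = [
    {"ORIGIN": "SFO", "DEST": "JFK", "FL_DATE": "2010-01-18", "DEP_DELAY": "35.0"},
    {"ORIGIN": "LAX", "DEST": "SFO", "FL_DATE": "2010-01-18", "DEP_DELAY": "10.0"},
    {"ORIGIN": "SFO", "DEST": "SEA", "FL_DATE": "2010-01-19", "DEP_DELAY": ""},
]

trips, directions = build_outputs(sample_flights, sample_airports)
trips_json = json.dumps(trips)            # would be written to the trips file
directions_json = json.dumps(directions)  # would be written to the directions file
```

Keeping the bearings in a separate dictionary means each airport's direction is computed once, and the same values can be written out as the second JSON file.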

How much time did I spend developing the application?

Doing the data integration: ~2 hours

Figuring out Protovis: unspeakable amounts of time. I struggled a lot with getting angles to work, and ended up encoding them as wedges. Converting between degrees and radians was also tricky. Finally, I had to do multiple iterations of my Python script to get the list of destinations and delays in the right format. I learned a lot about how the .data method in Protovis works, though, and feel like I could build a similar application much, much faster with this knowledge.

Actually playing with parameters of application and writing features: ~4 hours
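The degrees-versus-radians issue mentioned above can be illustrated with a small conversion sketch. It assumes a screen coordinate system where angles are in radians, angle 0 points along the positive x-axis (to the right), and y grows downward so that increasing angles turn clockwise; whether this matches a particular charting library's convention should be checked against its documentation. The function names and the default wedge width are hypothetical.

```python
import math

def bearing_to_screen_angle(bearing_deg):
    """Convert a compass bearing (degrees clockwise from north)
    to a screen angle in radians (0 = right, clockwise positive)."""
    return math.radians(bearing_deg) - math.pi / 2

def wedge_angles(bearing_deg, width_deg=4.0):
    """Start and end angles (radians) of a narrow wedge centered on a bearing."""
    center = bearing_to_screen_angle(bearing_deg)
    half_width = math.radians(width_deg) / 2
    return center - half_width, center + half_width

east_start, east_end = wedge_angles(90.0)
```

Doing the conversion once in the data script, rather than in the drawing code, keeps the degree-based bearings and the radian-based wedge angles from getting mixed up.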
