From CS 294-10 Visualization Sp10


Wiki Book

Dataset 1: I initially looked for data close to my own experience. Having been a GSI for two consecutive semesters in a large undergraduate course, I thought of using that course's data to uncover (hidden) links between the students' final performance (grades) and their performance throughout the semester (each of the 12 HWs, the two midterms, etc.). I argued that this would help detect trends and, accordingly, scaffold the education of the students who need it so that everyone does well.

However, I was denied permission to use, analyze, or post the data, since regulations require student scores to be kept private. (10 Feb 2010)

Dataset 2: I decided to work with an entirely different domain. In my master's thesis, I had worked on modeling deformable objects. In one of my (simulation) experiments, I had a patch of a deformable object, represented as points in 3D space, and I applied two point forces of known magnitude in arbitrary directions in R^3. I used FEA simulation software to determine the deformation of the specimen surface at all points. For the visualization assignment in this course, I wished to determine the relation between forces and displacements, especially those that are perpendicular to each other, because only such results would reflect the material properties of the specimen, such as elasticity and isotropy.

I tried importing a representative surface file, the load (force) data, the displacement data, and other details. However, I realized that rendering the surface itself was the best way to visualize a large number of 3D points and their displacements. Tableau was not particularly helpful, and using C++ with OpenGL to visualize the points worked better. But that is already done in many software programs (solid modelers, etc.), so I decided not to reinvent the wheel. (15 Feb 2010)

Dataset 3: This one is a little closer to me, as I travel (for leisure) quite a bit, mostly by air. Domestic air travel in the US is well developed, so I decided to work on air travel data, an area I am particularly interested in. The Department of Transportation (DOT) website offers a huge amount of regularly updated data, and I decided to use it.

For this assignment, I decided to answer the following question: what is the relation between the actual distance between two points in the country and the airfare one pays to travel between them? Put simply, I did not wish to get into the deep problem of how airlines calculate airfare; I just wanted to find out how far the simple assumption that a longer distance means more money actually holds.
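One simple way to put a number on the "longer distance means more money" assumption is a correlation coefficient between distance and fare. The sketch below is only an illustration, not what was done in Tableau: the function name `pearson` and the sample values are placeholders invented for the example, not DOT figures.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Placeholder values for illustration only (NOT real DOT data).
distances_miles = [300, 350, 1200, 2500]
fares_usd = [150.0, 120.0, 210.0, 330.0]

r = pearson(distances_miles, fares_usd)
print(f"correlation between distance and fare: {r:.2f}")
```

A value near 1 would support the simple assumption; a value well below 1 would suggest that factors other than distance matter.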

Huge data sets are available from the DOT here[1]; I decided to use only the latest data (3rd quarter, 2009), specifically Table 6[2].

Pre-processing: Since the data is already in a Tableau-readable CSV format, I did not have to do much pre-processing.

My first visualization looked something like this (I cropped it; only part of the screenshot is shown here): File:a3-1.jpg


Owing to the volume of data, however, I decided to focus on only a few markets/airports from which flights originated. I did this using a Tableau filter on the origin city. This in no way reduced the value of the data: to answer my original question, even a few representative markets/distances are sufficient. So I decided to close in on the three SF Bay Area airports and the flights operated through them. Note also that this DOT table lists only those city pairs where the average number of passengers per day is at least 10. (So there is no mention of a flight going to some corner of Alabama from the small San Jose airport!)
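The same origin filter can be sketched outside Tableau. This is a minimal illustration under stated assumptions: the column names (`origin`, `destination`, `passengers_per_day`, `avg_fare`), the origin labels, and every row of sample data are hypothetical and do not come from the DOT file. The passenger threshold mirrors the 10-per-day cutoff that the DOT table already applies.

```python
import csv
import io

# Hypothetical origin labels; the real DOT table may spell city names differently.
BAY_AREA_ORIGINS = {"San Francisco", "Oakland", "San Jose"}


def filter_markets(rows, origins, min_passengers=10.0):
    """Keep rows whose origin is in `origins` and whose daily traffic meets the threshold."""
    return [
        row
        for row in rows
        if row["origin"] in origins
        and float(row["passengers_per_day"]) >= min_passengers
    ]


# Placeholder rows for illustration only (NOT real DOT data).
SAMPLE_CSV = """origin,destination,passengers_per_day,avg_fare
San Francisco,City A,500,100.0
Oakland,City B,900,80.0
San Jose,City C,4,300.0
Chicago,City D,700,150.0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
kept = filter_markets(rows, BAY_AREA_ORIGINS)
for row in kept:
    print(row["origin"], "->", row["destination"])
```

Here the San Jose row is dropped for falling below the passenger threshold and the Chicago row for not being a Bay Area origin.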

Additional data and additions to the question: While working, I realized that the data had a few more aspects that could help get more out of it; for example, it includes airline information. So I decided to include it in the final visualization, so that both the effect of distance on airfare and its variation across airlines can be seen.

My final visualization shows:
1. An anomaly in airfare calculation, where distance is clearly NOT the deciding factor (SFO-Santa Barbara is a shorter distance than SFO-Santa Ana, yet it has a higher fare).
2. The embedded hue indicates that the more passengers a route carries, the lower the fare (same example: SFO-Santa Barbara vs. SFO-Santa Ana).
3. Oakland is the major Southwest hub in the Bay Area, as Southwest (WN) is the leading carrier on most sectors from there.
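The kind of anomaly described in the first finding (a shorter route costing more than a longer one) can also be searched for programmatically. The sketch below is an assumption-laden illustration: `find_fare_inversions` is a hypothetical helper, and the route labels, distances, and fares are placeholders, not values from the DOT table.

```python
from itertools import combinations


def find_fare_inversions(routes):
    """Return (shorter_route, longer_route) label pairs where the shorter route has the higher fare.

    `routes` is a list of (label, distance, fare) tuples.
    """
    inversions = []
    for a, b in combinations(routes, 2):
        # Order the pair by distance so `shorter` is never the longer route.
        shorter, longer = (a, b) if a[1] <= b[1] else (b, a)
        if shorter[1] < longer[1] and shorter[2] > longer[2]:
            inversions.append((shorter[0], longer[0]))
    return inversions


# Placeholder routes for illustration only (NOT real DOT data).
sample_routes = [
    ("Route A", 100, 200.0),
    ("Route B", 300, 150.0),
    ("Route C", 500, 400.0),
]

inversions = find_fare_inversions(sample_routes)
for shorter, longer in inversions:
    print(f"{shorter} is shorter than {longer} but costs more")
```

This compares every pair of routes, which is fine for a few filtered markets but grows quadratically with the number of records.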


Self-critique: Some "decoding" is still needed before a viewer can tell which answer to look for. It would also help if I could reduce the number of records.
