A2-CalvinArdi
From CS294-10 Visualization Fa08
Pre-Visualization
Domain
Given our financial situation, it'd be interesting to use a dataset with something related to finances or the economy. I browsed through data360 on a whim and found a few comprehensive datasets on 30-year mortgage rates as well as the personal savings rate. A lot of the data on the site tends to revolve around the US economy, e.g., exchange rates, and so on.
Another potentially interesting domain is social networks: either data gathered from user information submitted to a social networking site, or some sort of survey on the demographics or usage of such sites.
Initial Question
Is there some sort of correlation between the savings rate and the mortgage rate? If the mortgage rate goes down, do people tend to save more? Or perhaps the opposite, where if the mortgage rate goes up, people will also save more and tighten the purse strings in case they go up further?
Visualization Software
I've never used either of the visualization tools, but I wouldn't mind giving both a try. At the moment, it seems like Tableau is a popular choice, and registering and signing up for it was rather easy (though it's a bit disappointing that the software is Windows-only).
Worklog
Software Orientation
The data import wasn't that bad: the data was tab-delimited, so I only had to replace the tabs with commas, a trivial task with the help of awk. More specifically, a command along these lines does the job for the personal savings data set:
awk 'BEGIN { OFS = "," } { print $1, $2 }' PSAVERT.txt > newTest
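For anyone without awk at hand, the same tab-to-comma conversion can be sketched with Python's standard csv module. This is only an illustration: the function name tabs_to_csv and the sample rows (placeholder values, not real savings figures) are made up for the example, and it assumes a plain two-column tab-delimited file.

```python
import csv
import io


def tabs_to_csv(src, dst):
    """Copy tab-delimited rows from file object src to dst as comma-separated rows."""
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        if row:  # skip completely blank lines
            writer.writerow([field.strip() for field in row])


# Placeholder rows for illustration only (not real data).
sample = io.StringIO("DATE\tVALUE\n2000-01-01\t1.5\n2000-02-01\t2.5\n")
out = io.StringIO()
tabs_to_csv(sample, out)
print(out.getvalue())
```

With real files, the two StringIO objects would be replaced by open("PSAVERT.txt", newline="") and an output file opened for writing with a .csv extension.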
(Side note: did I mention that having either dual monitors or one large monitor helps out with this sort of thing?)
Importing the data is easy if everything goes the right way. Things I noticed so far:
- Have the correct extension (.csv) or things won't work
- Life is easier if you label your columns first
One neat feature I noticed is that Tableau immediately recognized the date format, so I could break the data down by year, quarter, month, day, and so on. For the first image, I decided to put the year on the x-axis.
On the y-axis, there are several ways to aggregate the data. Going by year, it originally took the sum of all the monthly rates; a simple click changed that to the average, which gives the first visualization (I did a quick export of the image; the last two digits of each year are missing for some reason):
Seeing that a lot of "ink" was being used to display the bars, I switched to a different style of graph.
Hence, we can see an interesting trend in the savings rate so far. Playing around with the data, the savings rate actually becomes negative in some months if we break it down by month. Is there an explanation for the trend in savings? Why is it that we're saving less and less as the years go by? More importantly, does the trend follow the way mortgage rates rise and fall?
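The sum-versus-average switch on the y-axis is easy to reproduce outside Tableau. The sketch below is a rough illustration, not what Tableau does internally: the function name aggregate_by_year is hypothetical, dates are assumed to be strings starting with a four-digit year, and the rows are placeholder values rather than real rates.

```python
from collections import defaultdict


def aggregate_by_year(rows, how="mean"):
    """Group (date, rate) pairs by year and aggregate each group.

    rows: iterable of (date_string, rate), date_string starting with 'YYYY'.
    how:  'sum' (the default described for the first chart) or 'mean'.
    """
    by_year = defaultdict(list)
    for date, rate in rows:
        by_year[int(date[:4])].append(rate)
    if how == "sum":
        return {year: sum(vals) for year, vals in sorted(by_year.items())}
    if how == "mean":
        return {year: sum(vals) / len(vals) for year, vals in sorted(by_year.items())}
    raise ValueError("how must be 'sum' or 'mean'")


# Placeholder monthly values for illustration only.
rows = [("2000-01-01", 2.0), ("2000-02-01", 4.0), ("2001-01-01", 1.0)]
print(aggregate_by_year(rows, how="sum"))
print(aggregate_by_year(rows, how="mean"))
```

Summing monthly rates makes a year with twelve observations look twelve times larger than a single month, which is why the average is the more sensible aggregate for a rate.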
Two Data Sets
Attempting to import two data sets from .csv files was a bit more difficult. Initial imports opened up new workbooks; I looked through the help manual to see what could be done, but it seems like such "joins" can only be done with an actual database (e.g., Access, *SQL, etc). It would have been neat to see it join two files with the date as the key (since the two datasets had the same format and their dates coincided at the first of every month).
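The kind of join described here, two files keyed on the date, can be sketched in a few lines of Python before handing a single file to Tableau. This is a minimal sketch under assumptions: the column names DATE and VALUE are guesses at the file layout, join_on_date is a hypothetical helper, and the sample values are placeholders, not real rates.

```python
import csv
import io


def join_on_date(savings_file, mortgage_file):
    """Inner-join two CSV file objects (assumed columns: DATE, VALUE) on DATE.

    Returns a list of (date, savings_rate, mortgage_rate) tuples, keeping only
    dates that appear in both files, in the order of the savings file.
    """
    mortgage = {row["DATE"]: row["VALUE"] for row in csv.DictReader(mortgage_file)}
    joined = []
    for row in csv.DictReader(savings_file):
        date = row["DATE"]
        if date in mortgage:
            joined.append((date, float(row["VALUE"]), float(mortgage[date])))
    return joined


# Placeholder data for illustration only; the second series starts later.
savings_csv = "DATE,VALUE\n2000-01-01,1.0\n2000-02-01,2.0\n2000-03-01,3.0\n"
mortgage_csv = "DATE,VALUE\n2000-02-01,7.0\n2000-03-01,8.0\n"
merged = join_on_date(io.StringIO(savings_csv), io.StringIO(mortgage_csv))
for record in merged:
    print(record)
```

Because this is an inner join, months that exist in only one of the files are dropped, so a series that starts later simply shortens the result.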
However, even though I was only working with flat files, I was impressed that Tableau has (or seems to have) support for connecting to many different kinds of databases. It would certainly be interesting to interact with and visualize an extremely large dataset stored in a database.
I could have opted for an Excel workbook, but to keep things simple I opened the .csv file in OpenOffice's Excel equivalent and added a new column, copying and pasting the mortgage rates into the same .csv file as the savings rate. Unfortunately, the mortgage rate data starts about 20 years later than the savings rate data, so we'll have to make do without those years.
Update #1
Still attempting to get things working; it's proving to be a bit tougher than I thought (or I may just be mindlessly clicking around). I'd like to somehow show the "range" in addition to the average; at some points the savings rate is actually negative (which I take to mean people are spending more than they earn), and I would like to represent that somehow.
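One way to think about the "range" is as a per-year minimum and maximum reported alongside the average. The sketch below is an illustration only: yearly_summary is a hypothetical helper, and the monthly values are placeholders chosen so that one month is negative while the yearly average stays positive.

```python
from collections import defaultdict


def yearly_summary(rows):
    """Return {year: {'min': ..., 'mean': ..., 'max': ...}} for (date, rate) pairs.

    Dates are assumed to be strings that start with a four-digit year.
    """
    by_year = defaultdict(list)
    for date, rate in rows:
        by_year[int(date[:4])].append(rate)
    return {
        year: {"min": min(vals), "mean": sum(vals) / len(vals), "max": max(vals)}
        for year, vals in sorted(by_year.items())
    }


# Placeholder monthly values for illustration only.
rows = [("2005-01-01", 1.0), ("2005-02-01", -1.0), ("2005-03-01", 3.0)]
summary = yearly_summary(rows)
print(summary[2005])
```

A yearly average can hide a negative month entirely, which is why plotting the minimum (or a min-to-max band) would surface the months where the rate dipped below zero.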
Software Orientation Pt. 2
Struggled with the software a bit, and then realized that some of the "dragging and dropping" is available as a feature in places I didn't really think about. The following links helped tremendously:
- online help - search for "measure names" and click on the first link.
- Tableau Forum - in particular, a topic about charting multiple variables.
The following are some screenshots of my experimentation:
We can start to see through the bar graph that the savings rate was the highest when the mortgage rate was the highest. A rough reading of the graphs above and below puts the mortgage rate somewhere between about 6% and 16% (give or take 0.5%) over the whole period; in recent years it has been hovering around 6% without any drastic change. Perhaps this has something to do with the housing market itself, and not just the mortgage... The savings rate, however, continues to decrease steadily, with a slight increase in 2007/08.
I truncated the data set so that only values from 1971 onward are displayed:
I was about to settle on this image until I found a few more interesting ways of displaying the data. As Scott discovered, Tableau does have its limitations; for example, overlaying different chart types is not possible in the current version of the software. In particular, I wanted the bars to be different colors to make it easier to tell apart the average savings rate and the average mortgage interest rate. Although you can control the color of the entire graph, there is no obvious way to set the color of each individual "bar" or category of values.
The solution ends up being fairly simple, but not exactly obvious to a new user. Drag "Measure Values" onto the "Rows" shelf, and a new shelf of "Measure Names/Values" will appear. You can aggregate the data as you wish (taking counts or averages of a value), and then drag the same "Measure Values" label to the "Color" box in the "Marks" shelf. Doing this lets you set the color of each variable, like so:
Another small gripe: it's pretty neat to be able to drag a calculation or a quick calculation onto "Size" in the "Marks" shelf. I would have liked to encode the year-over-year percentage change of each rate as size, to give a quick idea of how much the rate changed during that year, but because of the way Measure Values are used, I'm unable to do so for each individual graph.
We can finally compare the two trends together, and although they don't follow each other exactly, when interest rates fall, so does the savings rate. Can we extract more detail? Using some of Tableau's built-in features, I calculated the percentage change from year to year and plotted it using the same method as above. I couldn't find a way to have two sets of "Measure Values", so I created two sheets and superimposed them on each other using an image editor. (The JPEG exports from Tableau, however, aren't of great quality.)
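For reference, the year-to-year percentage change can be written out explicitly. This is a minimal sketch of the usual formula, (current - previous) / |previous| x 100, and not necessarily how Tableau's quick calculation is implemented; percent_change is a hypothetical helper and the yearly averages are placeholder values.

```python
def percent_change(yearly):
    """Return {year: percentage change versus the previous year in the data}.

    yearly: dict mapping year -> value. The first year has no predecessor and
    is omitted, as is any year whose predecessor is exactly zero.
    """
    years = sorted(yearly)
    changes = {}
    for prev, cur in zip(years, years[1:]):
        base = yearly[prev]
        if base != 0:
            # abs() keeps the sign meaningful when the base value is negative.
            changes[cur] = (yearly[cur] - base) * 100.0 / abs(base)
    return changes


# Placeholder yearly averages for illustration only.
yearly = {2000: 10.0, 2001: 8.0, 2002: 10.0}
print(percent_change(yearly))
```

Note that since the inputs are themselves percentages, the result is a percentage change of a percentage: a drop from 10% to 8% is a 20% change, not a 2% one.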
Final Visualization
The end product is two graphs superimposed on one another. The trends of the personal savings and mortgage interest rates are graphed with respect to year; both seem to hit their peak around 1981-1982. As the years go by, we notice a steady, almost linear decrease in both the interest rate and the personal savings rate. The savings rate, however, is not directly tied to mortgage rates, as we see a remarkable decrease in the savings rate from 2003-2005.
The second graph visualizes the year-over-year percentage change in each of the rates. In 1983, we notice a very sharp percentage change, possibly due to the economy coming out of the early-1980s recession, with rates bouncing back up in 1984. The remaining years don't seem to show either rate affecting the other; these changes may be by-products of other economic indicators, like the stock market, GDP, or housing prices.
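The "doesn't seem to show any correlation" reading could be checked numerically with a Pearson correlation coefficient over the two series of yearly changes. The sketch below is only an illustration of that check, not something done for this assignment: pearson is a hand-written helper, and the inputs are placeholder numbers.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Placeholder series for illustration only.
print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

A value near +1 or -1 would suggest the two rates move together (or opposite to each other), while a value near 0 would back up the visual impression that the yearly changes are unrelated. Either way, correlation alone would not show that one rate drives the other.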
