A3-MarkHowison
From CS294-10 Visualization Fa07
Data Domain
I am going to use data generated by comparing the price performance of a stock portfolio against a hypothetical portfolio in which each stock is replaced by an S&P 500 ETF. This effectively tracks the portfolio's performance against the S&P 500 by simulating what would have happened if all the money in the portfolio had instead been invested directly in the S&P 500. To obtain this data, I will start with a table of tickers, buy dates, and cost bases for the portfolio. Next, I will crawl Yahoo Finance, or a similar service, to obtain daily prices for the stocks and the ETF. Finally, I will compute a summary table showing, for each day, the total value of each portfolio and the difference between the two.
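The comparison for a single holding can be sketched as follows. This is only an illustration under simplifying assumptions: prices are already downloaded into arrays indexed by trading day starting at the buy date, and dividends and splits are ignored. The class and method names (`BenchmarkSketch`, `valueSeries`, `difference`) are hypothetical and are not taken from the SnPbench source.

```java
import java.util.Arrays;

/** Hypothetical sketch: one holding versus the same money placed in an S&P 500 ETF. */
final class BenchmarkSketch {

    /**
     * Daily value of a position bought for costBasis at prices[0].
     * prices[i] is the closing price on day i, counted from the buy date.
     */
    static double[] valueSeries(double costBasis, double[] prices) {
        double shares = costBasis / prices[0];
        double[] values = new double[prices.length];
        for (int i = 0; i < prices.length; i++) {
            values[i] = shares * prices[i];
        }
        return values;
    }

    /** Per-day difference: value of the actual holding minus value of the ETF benchmark. */
    static double[] difference(double costBasis, double[] stockPrices, double[] etfPrices) {
        if (stockPrices.length != etfPrices.length) {
            throw new IllegalArgumentException("price series must cover the same days");
        }
        double[] stock = valueSeries(costBasis, stockPrices);
        double[] etf = valueSeries(costBasis, etfPrices);
        double[] diff = new double[stock.length];
        for (int i = 0; i < diff.length; i++) {
            diff[i] = stock[i] - etf[i];
        }
        return diff;
    }

    public static void main(String[] args) {
        // Made-up prices, purely for illustration.
        double[] stock = {10.0, 11.0, 12.0};
        double[] etf = {100.0, 105.0, 110.0};
        System.out.println(Arrays.toString(difference(1000.0, stock, etf)));
    }
}
```

Summing the per-holding series over all tickers, after aligning each one to its own buy date, would give the portfolio-level summary table.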
Visualization Technique
There are two time-series graphs of interest for this data set: (1) a plot of total value of the portfolios over time, and (2) a plot of the difference in value between the portfolios over time. Both graphs might contain interesting features at different levels of detail, for instance the overall trend of the portfolios, localized increases or declines in value, or cyclical patterns. To best show these different levels, I will generate two or more copies of each graph with aspect ratios calculated by Jeff and Maneesh's algorithm for banking to 45 degrees. Since stock data is likely to produce fairly stochastic results, it may be that the algorithm is unable to find strong enough peaks in the power spectrum. I will test a number of different portfolios to investigate this potential pitfall. Jeff has suggested that one solution might be to switch from a Fourier to a wavelet analysis of the spectrum.
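To get a feel for whether a series has "strong enough peaks" in its power spectrum, a rough check can be sketched with a naive O(n²) discrete Fourier transform. This is an assumption-laden illustration, not the procedure from the banking algorithm itself: a real implementation would use an FFT library, and the peak-to-mean ratio below is just one possible heuristic for peak strength. All names (`PowerSpectrumSketch`, `powerSpectrum`, `dominantFrequency`, `peakToMeanRatio`) are hypothetical.

```java
/** Hypothetical sketch: power spectrum of a time series via a naive DFT. */
final class PowerSpectrumSketch {

    /** Returns |X_k|^2 for k = 0..n/2, computed on the mean-removed series. */
    static double[] powerSpectrum(double[] x) {
        int n = x.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        double mean = 0.0;
        for (double v : x) {
            mean += v;
        }
        mean /= n;

        double[] power = new double[n / 2 + 1];
        for (int k = 0; k <= n / 2; k++) {
            double re = 0.0;
            double im = 0.0;
            for (int t = 0; t < n; t++) {
                double angle = 2.0 * Math.PI * k * t / n;
                re += (x[t] - mean) * Math.cos(angle);
                im -= (x[t] - mean) * Math.sin(angle);
            }
            power[k] = re * re + im * im;
        }
        return power;
    }

    /** Index of the strongest frequency bin, skipping the DC bin at k = 0. */
    static int dominantFrequency(double[] power) {
        int best = 1;
        for (int k = 2; k < power.length; k++) {
            if (power[k] > power[best]) {
                best = k;
            }
        }
        return best;
    }

    /** Illustrative heuristic: peak power divided by the mean non-DC power. */
    static double peakToMeanRatio(double[] power) {
        double sum = 0.0;
        for (int k = 1; k < power.length; k++) {
            sum += power[k];
        }
        double mean = sum / (power.length - 1);
        return mean == 0.0 ? 0.0 : power[dominantFrequency(power)] / mean;
    }
}
```

A ratio close to 1 would suggest a flat, noise-like spectrum, which is the case where picking out trend curves is likely to be unreliable.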
Storyboard for Interface
The first tab allows the user to input a portfolio of stocks. I'm not going to implement sell dates yet, since it complicates things. Hitting "Update" will download all the relevant price data.
The second tab will display time-series plots banked to 45 degrees. There will also be a simple dynamic query that allows the user to filter out stocks from the portfolio.
Implementation
Download: SnPbench.zip
The Java source code is located in
SnPbench/src/edu/berkeley/howison/snpbench/
The PHP data crawler is located at
SnPbench/data/DataCrawler.php
To Run
On Windows: double-click "run.bat"
On Mac/Linux: open a terminal, navigate to the SnPbench directory, and type
java -cp dist/SnPbench.jar edu.berkeley.howison.snpbench.SnPbench
If you start the program any other way, it may not be able to find the data directory.
Overview
The data source is a portfolio of stocks corresponding to the Motley Fool's 2005 Hidden Gems newsletter stock picks. That is, it simulates a portfolio in which the holdings are Hidden Gems stock picks, each purchased with equal cost basis during the month it was recommended. The purchases, therefore, run from Jan to Dec 2005, and the portfolio is held without change until the present time. The S&P Benchmarker creates a second, hypothetical portfolio of S&P 500 ETF holdings purchased in the same amount and at the same time as each holding in the actual Hidden Gems portfolio. It then calculates the percent gain for each portfolio and plots the difference between the Fool portfolio and the S&P portfolio as a time series, providing a benchmark of the performance of the Fool's stock advice for that period.
Using the multi-scale banking to 45 degrees algorithm from Heer & Agrawala (2006), S&P Benchmarker automatically generates trend curves using Fourier analysis and suggests aspect ratios by banking each trend curve to 45 degrees via average-absolute-slope banking. The controls allow the user to:
- Change the granularity of the data by pruning by number of days (the default setting keeps one data point for every 2 days).
- Change the threshold parameter alpha in the algorithm, where a higher value more aggressively prunes out trend curves that lead to similar aspect ratios.
- Apply one of the aspect ratios to the time series plot. The slider always displays 1.62 (i.e., the Golden Ratio) plus the calculated aspect ratios for each trend curve. (WARNING: There is a bug in the program, and it does not always repaint after selecting a new aspect ratio. When this happens, move the aspect ratio slider around to force a repaint.)
- Dynamically filter specific tickers out of the portfolio. The filtered portfolio represents the scenario in which the filtered stock was never purchased, despite being recommended by Hidden Gems.
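The average-absolute-slope step can be sketched as follows. If the plot has aspect ratio a = width / height, a data slope s appears on screen as s · Rx / (a · Ry), where Rx and Ry are the data ranges on each axis. Requiring the mean absolute on-screen slope to equal 1 (that is, 45 degrees) gives a = mean|s| · Rx / Ry. The code below is one common formulation of that idea, not a copy of the SnPbench implementation, and the names (`BankingSketch`, `averageAbsoluteSlopeAspect`) are hypothetical.

```java
/** Hypothetical sketch: bank a curve to 45 degrees by average absolute slope. */
final class BankingSketch {

    /**
     * Returns the aspect ratio (width / height) at which the mean absolute
     * on-screen slope of the line segments equals 1.
     */
    static double averageAbsoluteSlopeAspect(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equal-length series of at least 2 points");
        }
        double minX = x[0], maxX = x[0], minY = y[0], maxY = y[0];
        double slopeSum = 0.0;
        int segments = 0;
        for (int i = 1; i < x.length; i++) {
            minX = Math.min(minX, x[i]);
            maxX = Math.max(maxX, x[i]);
            minY = Math.min(minY, y[i]);
            maxY = Math.max(maxY, y[i]);
            double dx = x[i] - x[i - 1];
            if (dx == 0.0) {
                continue; // skip vertical segments, whose slope is undefined
            }
            slopeSum += Math.abs((y[i] - y[i - 1]) / dx);
            segments++;
        }
        double rangeX = maxX - minX;
        double rangeY = maxY - minY;
        if (segments == 0 || rangeY == 0.0) {
            throw new IllegalArgumentException("series is degenerate (flat or vertical)");
        }
        return (slopeSum / segments) * rangeX / rangeY;
    }

    public static void main(String[] args) {
        // A single symmetric peak: both segments sit at 45 degrees when width = 2 * height.
        double[] x = {0.0, 1.0, 2.0};
        double[] y = {0.0, 1.0, 0.0};
        System.out.println(averageAbsoluteSlopeAspect(x, y));
    }
}
```

Applying this function to each trend curve, rather than to the raw series, would yield one candidate aspect ratio per curve, which is what the slider offers alongside 1.62.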
Final Writeup
I decided to jettison the tab for entering the portfolio data, and instead wrote a PHP script that downloads a static set of data to use in the visualization (the Hidden Gems 2005 stock picks). I also added controls to the application for adjusting the behavior of the multibanking algorithm and the time series plot.
I spent roughly 4 hours corralling data, which included: (1) writing a PHP script that downloaded and processed data from Yahoo Finance and (2) loading that data into HSQLDB, a Java-based lightweight SQL server. Since I am more familiar with MySQL, I hit several snags trying to use HSQLDB for the first time.
I spent another 4 hours implementing and debugging the multibanking algorithm, including time spent looking for an FFT library.
Finally, it took me somewhere on the order of 12 hours to remember how to program in Java, to create a GUI, and to locate and figure out the API for a Java plotting library. I ended up choosing JFreeChart, which has the major drawback of poor support for resizing data plots. Initially, I wanted to use prefuse, but I was unsure how I would implement a static time series plot with it. In the end, I wish I had written my own simple plotting component using Java2D, since it might have taken less time than figuring out how to use JFreeChart.


