A2-NivayAnandarajah

From CS294-10 Visualization Fa08

Step 1: Selecting a Domain of Interest

I found this to be a rather daunting task due to the variety of information available on the web. There was information on nearly every interest I tried, so I opted for one that has a statistical air about it.

One thing I have heard way too much of in my life is sports statistics, basketball in particular. There is nothing people love more than spouting out arbitrary sports statistics. Arbitrary is the key word in this sentence. The numbers are never compiled to reveal anything about the nature of the game or how it should be played. This, I knew, would be an interesting topic to tackle.

Step 2: Posing an Initial Question

Just like any sport, basketball is centered around the timeless debate of offense versus defense. Every coach likes to think defense wins games, while players like to play as if offense wins games.

A simple visualization could clearly identify what wins basketball games: good offense or good defense?

Step 3: Finding a Database to Answer the Question

I chose to get data from NCAA collegiate records. Where NBA analysis is complicated due to the powerhouse performance of star players, college games still showcase consistent offensive and defensive performances.

I first tried my luck searching the various database sites (Swivel, LexisNexis, etc.) but couldn't locate exactly what I wanted. I fell back on my good friend Google and found a solid data set: http://www.kenpom.com/stats.php?y=2008

The data set contains statistics for the 340 Division 1 NCAA teams for the 2008 season. The author combines offensive and defensive statistics to create raw offensive and defensive efficiency scores. He then adjusts these scores based on the difficulty of the contest and ranks all teams in each of these categories. Additionally, he provides win percentage and a tempo score. These statistics alone should help me visualize an answer to my question.

Step 4: Examining the Data

The data simply required column labels and was then ready to be imported into Tableau. Tableau accepted everything with minimal difficulty.

There was a large amount of data, much of which was irrelevant or redundant for answering my question. I chose to focus on the adjusted offensive efficiency rank, the adjusted defensive efficiency rank, and win percentage.
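For readers who would rather script this step than do it by hand, labeling the columns and keeping only the three measures of interest might look like the minimal sketch below. The column names, their order, and the sample rows are hypothetical placeholders, not the actual layout or values of the kenpom.com export.

```python
import csv
import io

# Hypothetical column labels; the real export may name or order these differently.
COLUMNS = ["team", "win_pct", "adj_off_rank", "adj_def_rank", "tempo"]
# The columns this analysis focuses on (plus the team name for identification).
KEEP = ["team", "adj_off_rank", "adj_def_rank", "win_pct"]


def label_rows(raw_text, columns=COLUMNS, keep=KEEP):
    """Parse headerless CSV text, label each field, and keep selected columns."""
    rows = []
    for line in csv.reader(io.StringIO(raw_text)):
        if not line:
            continue  # skip blank lines
        if len(line) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(line)}")
        labeled = dict(zip(columns, (field.strip() for field in line)))
        rows.append({name: labeled[name] for name in keep})
    return rows


# Placeholder rows for illustration only (not real team statistics).
SAMPLE = "Team A,0.90,1,3,67.1\nTeam B,0.55,40,12,70.4\n"
labeled_rows = label_rows(SAMPLE)
print(labeled_rows[0])
```

The values stay as strings here; converting them to numbers would be a separate step before plotting.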

Step 5: Initial Visualization

My initial instinct was to simply plot the offensive efficiency adjusted rank and defensive efficiency adjusted rank versus win percentage. I had to force Tableau to show all values before it would display any. This was the result:

Image:Bball1_first.jpg

As if it wasn't already abundantly clear, this is a terrible visualization that does not bring any resolution to the question at hand. The data does show that the better the offense, the better the team does, and the better the defense, the better the team does. However, it is hard to compare the two sets of values against one another, and forcing the reader to judge relative positions by eye is unreliable.

Additionally, this graph shows the large amount of variation in the data. The noise needs a clearer way of being visualized.

Step 6: Evolution of Visualization

At this point, I started looking at additional questions this data may be able to answer.

The first was: do winning teams need a balanced offense and defense?

By plotting offense and defense rank on the same plot, we can clearly see that winning teams have both good offense and good defense. By scaling the size of the circles, I could minimize overlap. There are very few outliers that have good offense and bad defense or vice versa. We can additionally see that, for a given defensive level, a team needs a dramatic increase in offensive efficiency to reach the next level of win percentage. This stands as a testament to the exponentially competitive nature of the league.

For the sake of curiosity, I wanted to posit the idea that teams with higher tempo have better win percentages. Tableau allowed me to quickly see the error of this hypothesis:

Image:Bball3_tempo.jpg

The visualization illustrates a clear lack of correlation between tempo and win percentage: the points are scattered evenly and randomly across the plot. I found this to be a surprising result.
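The visual impression of "no correlation" could be checked numerically with a Pearson correlation coefficient, where a value near 0 indicates no linear relationship. The sketch below is only an illustration of that check, not something done in the original analysis; the tempo and win percentage values are made-up placeholders.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values for illustration only (not real team statistics).
tempo = [60.0, 65.0, 70.0, 75.0]
win_pct = [0.5, 0.7, 0.3, 0.5]
r = pearson(tempo, win_pct)
print(f"tempo vs. win percentage: r = {r:.3f}")
```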

In order to identify which mode of play was more effective, I had to ask what the difference between offense and defense was for a given team. To plot the difference of these values, serious massaging of the data was required. The efficiency data given in separate columns had to be placed into one large efficiency column. A separate column had to be created to label which entries corresponded to defense and which to offense. Then the rank and team name values had to be redundantly applied to these new entries. The result allows comparing values on the same graph:

Image:bball4_combo.jpg

This graph at least shows that sometimes defense was more important and sometimes offense was. Towards the higher-ranking teams, defense tends to stay more consistent. This still does not give a clear answer, though.
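The data massaging described above (stacking the offense and defense columns into one efficiency column, adding a label column, and repeating the rank and team name) is a wide-to-long reshape. A minimal sketch of it follows; the field names and sample values are hypothetical placeholders, since the actual spreadsheet layout is not shown here.

```python
def to_long(rows):
    """Turn one row per team into two rows per team (offense and defense)."""
    long_rows = []
    for row in rows:
        for side, column in (("offense", "adj_off"), ("defense", "adj_def")):
            long_rows.append({
                "team": row["team"],         # repeated for both new entries
                "rank": row["rank"],         # repeated for both new entries
                "side": side,                # label column: offense or defense
                "efficiency": row[column],   # single combined efficiency column
            })
    return long_rows


# Placeholder rows for illustration only (not real team statistics).
wide = [
    {"team": "Team A", "rank": 1, "adj_off": 120.0, "adj_def": 90.0},
    {"team": "Team B", "rank": 2, "adj_off": 110.0, "adj_def": 95.0},
]
long_rows = to_long(wide)
for entry in long_rows:
    print(entry)
```

Each team ends up with one offense row and one defense row, which is the shape needed to draw both measures on a shared axis.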

I decided to take a subset of data and asked how offense and defense played a role in the top 20 teams. This resulted in:

Image:bball5_combo.jpg

This graph made use of raw score to allow easy comparison of defense and offense. Unfortunately, the redundant information at the bottom could not be removed. The smaller data segment still shows no visual trends in the data.

At this point, I am accepting the fact that neither offense nor defense is more important than the other. The only trend that arises is the need for a balanced team with strong attributes of both.
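Taking the top 20 subset is a simple sort-and-slice. As a rough sketch, assuming each team is a record with a hypothetical `rank` field where 1 is best:

```python
def top_n(rows, n=20, key="rank"):
    """Return the n rows with the smallest value of `key` (rank 1 is best)."""
    return sorted(rows, key=lambda row: row[key])[:n]


# Placeholder teams for illustration only, deliberately listed out of order.
teams = [{"team": f"Team {i}", "rank": i} for i in range(30, 0, -1)]
top20 = top_n(teams)
print([row["rank"] for row in top20])
```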

Step 7: Final Visualization

The only trend discovered in the examination of the data was the need for a balanced team with strong levels of both offense and defense. The final visualization was created to clearly showcase this balance and its relation to win percentage:

Image:bball6_final3.jpg

This visualization in essence defuses my original question. Although more thorough analysis was necessary to discount the potential for any trend favoring offense or defense, this overview graph illustrates a larger overarching trend.

In all, I thought this was a very interesting assignment. Tableau wasn't as easy as it appeared to be (massaging data in order to compare it was tedious). It is still a quick tool for seeing all the possibilities. I was surprised to see the question change scope and target so much. But I guess that's what rapid visualizations will do for you.


