A2-KetrinaYim

From CS294-10 Visualization Fa08

Stage 1: The Hunt for Data

Being an avid gamer, I decided that my domain would involve video games. And since sales are such a vital part of the video game industry (often overriding considerations of general game quality) I opted to look for data tables of video game sales. Following that decision was a simple question:

  • Which video game generated the most revenue?

Eventually, I found a suitable CSV table on Swivel for games released in 2006. There were 100 entries and nine columns in the table: Rank (integer), Game Title (string), Platform (string), Copies Sold (integer), Revenue (integer), Review Score (real number between 0.0 and 1.0), Release Month (date), Genre (string), and Publisher (string). A couple details to note are that the data involves sales in North America only, and that Rank is an ordinal value allowing one to arrange games by number of copies sold (though this is redundant since the number of copies sold is already one of the columns).
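For readers who want to reproduce the setup outside Tableau, the nine columns could be read into typed records roughly as follows. This is only a sketch: the exact header spellings, the sample rows, and the `parse_row` and `load_games` helpers are assumptions made for illustration and are not taken from the Swivel file.

```python
import csv
import io

# Placeholder rows only: titles and figures are invented to show the layout.
SAMPLE_CSV = """Rank,Game Title,Platform,Copies Sold,Revenue,Review Score,Release Month,Genre,Publisher
1,Game A,Platform X,2000000,100000000,0.85,August,Sports,Publisher P
2,Game B,Platform Y,1500000,60000000,0.70,November,Action,Publisher Q
"""


def parse_row(row):
    """Convert the string fields of one CSV row to the types described above."""
    return {
        "rank": int(row["Rank"]),
        "title": row["Game Title"],
        "platform": row["Platform"],
        "copies_sold": int(row["Copies Sold"]),
        "revenue": int(row["Revenue"]),
        "review_score": float(row["Review Score"]),  # between 0.0 and 1.0
        # The date format of the original column is unknown, so it stays a string.
        "release_month": row["Release Month"],
        "genre": row["Genre"],
        "publisher": row["Publisher"],
    }


def load_games(text):
    """Parse CSV text into a list of typed dictionaries."""
    return [parse_row(r) for r in csv.DictReader(io.StringIO(text))]


games = load_games(SAMPLE_CSV)

# Rank is redundant with Copies Sold: sorting by copies sold reproduces it.
by_copies = sorted(games, key=lambda g: g["copies_sold"], reverse=True)
ranks = [g["rank"] for g in by_copies]
```

Sorting by copies sold and reading off the ranks is a quick way to confirm that the Rank column carries no extra information.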

Stage 2: Visualization Generation

Before importing the data into Tableau, I converted the table into an Excel workbook (it wasn't necessary, but the formatting I added to make the table more readable would have been lost if I had saved as CSV). After fixing a few minor character conversion issues (for some reason all apostrophes in the original CSV were replaced with a set of characters) and merging a couple of genres that differed only by wording (such as Racing and Racer), the table was ready to be loaded into Tableau. It did not take much for me to arrive at an answer to my question, as I simply had to make Game Title the rows and Revenues Generated the columns. Tableau automatically selected a bar graph, which resulted in this rather long graph due to the significant number of video games released in 2006.
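The two cleanup steps could also be scripted instead of done by hand. The sketch below rests on assumptions: the write-up does not say which characters replaced the apostrophes, so the code guesses the common case of a curly apostrophe encoded as UTF-8 and read as Windows-1252, and only the Racing/Racer pair in the alias table comes from the text. The helper names and the sample title are hypothetical.

```python
# Assumption: the stray characters were the usual UTF-8-read-as-Windows-1252
# rendering of a curly apostrophe. The write-up does not say what they were.
MOJIBAKE_APOSTROPHE = "\u00e2\u20ac\u2122"  # displays as "â€™"

# Only the Racing/Racer pair is named in the text; other pairs would go here.
GENRE_ALIASES = {"Racer": "Racing"}


def fix_apostrophes(text):
    """Replace the garbled character sequence with a plain apostrophe."""
    return text.replace(MOJIBAKE_APOSTROPHE, "'")


def normalize_genre(genre):
    """Map genre labels that differ only by wording onto one spelling."""
    genre = genre.strip()
    return GENRE_ALIASES.get(genre, genre)


def clean_row(row):
    """Return a cleaned copy of a row, leaving the input untouched."""
    cleaned = dict(row)
    cleaned["title"] = fix_apostrophes(row["title"])
    cleaned["genre"] = normalize_genre(row["genre"])
    return cleaned


# Invented example row showing both fixes at once.
raw = {"title": "Hero\u00e2\u20ac\u2122s Quest", "genre": "Racer"}
print(clean_row(raw))
```

Keeping the aliases in a table makes it easy to review exactly which genres were merged.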

Tableau's interface made the graph easier to read, however, since the revenue axis stayed in place while I could scroll through the many bars. It became immediately apparent that Madden NFL 07 was the overwhelming winner in terms of revenue. That was when two new questions arose:

  • For each publisher, which game generated the most revenue in 2006?
  • Which publisher earned the most revenue in 2006?

Stage 3: Visualization Refinement

To address these new inquiries, I added Publisher to the rows, causing the games to be grouped by publisher (left). I also produced a graph showing which publisher earned the most revenue (right), but I ended up being more interested in the amount individual games fetched, so I did not develop the second question further.

It was a rather convenient way to look at all the games a publisher had made, and it allowed me to focus on one set at a time. This also helped me discover that Electronic Arts (unsurprisingly) had not only released the most games, but also earned the most money.
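The grouping that Tableau performed when Publisher was added to the rows can be approximated in a few lines. The following is a sketch with invented placeholder records, not the 2006 data; the function names are hypothetical.

```python
from collections import defaultdict

# Placeholder records; names and revenue figures are invented.
games = [
    {"title": "Game A", "publisher": "Publisher P", "revenue": 90},
    {"title": "Game B", "publisher": "Publisher P", "revenue": 40},
    {"title": "Game C", "publisher": "Publisher Q", "revenue": 70},
]


def top_game_per_publisher(rows):
    """For each publisher, return the title with the highest revenue."""
    best = {}
    for row in rows:
        current = best.get(row["publisher"])
        if current is None or row["revenue"] > current["revenue"]:
            best[row["publisher"]] = row
    return {pub: row["title"] for pub, row in best.items()}


def publisher_totals(rows):
    """Return (total revenue, number of games) keyed by publisher."""
    revenue = defaultdict(int)
    count = defaultdict(int)
    for row in rows:
        revenue[row["publisher"]] += row["revenue"]
        count[row["publisher"]] += 1
    return {pub: (revenue[pub], count[pub]) for pub in revenue}


top = top_game_per_publisher(games)
totals = publisher_totals(games)
# Publisher with the largest summed revenue.
leader = max(totals, key=lambda pub: totals[pub][0])
```

Returning the game count alongside the revenue total covers both observations made about the leading publisher: most games released and most money earned.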

Stage 4: Question Overhaul

At this point, I began wondering if revenue was the best variable to be looking at when considering a game's popularity, particularly because I still had a hard time believing that Madden NFL 07 earned so much more than any other game, even those of the same genre. Because there was no indication otherwise, it was possible that revenue included rentals, which often occur when people just want to try a game before buying. Using the number of copies sold would instead focus on the people who found the game appealing enough to own it. I also chose to organize games by genre instead of publisher, allowing me to explore more possibilities for questions than a grouping by publisher would have. These considerations changed my question drastically:

  • For each genre, which game sold the most copies in 2006?

Putting Genre and Game Title in the rows and Copies Sold in the columns produced the following graph. I also added data labels to the bars to make the values easier to read when the graph is viewed outside of Tableau.

Not counting genres that contained only one game, it was rather surprising to see that in some genres the sales numbers were relatively close (as was the case for Action games), while in others one game outsold all the rest by a vast margin (as in the Licensed and Sports categories).
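One way to put a number on how far the best-seller is ahead within a genre is the ratio of its copies sold to the runner-up's. This is a sketch of that idea, not something done in Tableau for this assignment; the records are invented placeholders and `lead_over_runner_up` is a hypothetical helper.

```python
from collections import defaultdict

# Placeholder records; titles, genres, and figures are invented.
games = [
    {"title": "Game A", "genre": "Genre X", "copies_sold": 500},
    {"title": "Game B", "genre": "Genre X", "copies_sold": 450},
    {"title": "Game C", "genre": "Genre Y", "copies_sold": 900},
    {"title": "Game D", "genre": "Genre Y", "copies_sold": 300},
    {"title": "Game E", "genre": "Genre Z", "copies_sold": 200},
]


def lead_over_runner_up(rows):
    """Map each genre to (best-selling title, best / runner-up ratio).

    Genres with a single game are skipped, as are genres whose runner-up
    sold zero copies (the ratio would be undefined).
    """
    by_genre = defaultdict(list)
    for row in rows:
        by_genre[row["genre"]].append(row)

    result = {}
    for genre, members in by_genre.items():
        if len(members) < 2:
            continue
        ranked = sorted(members, key=lambda r: r["copies_sold"], reverse=True)
        best, second = ranked[0], ranked[1]
        if second["copies_sold"] == 0:
            continue
        result[genre] = (best["title"], best["copies_sold"] / second["copies_sold"])
    return result


leads = lead_over_runner_up(games)
```

A ratio near 1 corresponds to a genre with closely matched sales, and a large ratio to a genre dominated by a single title.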

Stage 5: An Attempt to Go Beyond Bars

Knowing that the dataset also included review scores of each game, I decided to try adding Review Score to the visualization to answer the most perplexing questions of all:

  • Is there a correlation between a game's rating and the number of copies sold?
  • Is there a correlation between genre and rating?

Naturally, I assumed higher-quality games would sell better than average or below-average games. I also expected games of certain genres, such as licensed games, to be of poorer quality than games from other genres, especially if the game's primary purpose is to act as merchandising for legions of fans ready to buy anything related to their "object of worship." However, I was less certain of a rating's impact on sales figures. Initially, I was set on using bars to represent both copies sold and review score, but I ended up with a chart that was somewhat difficult to read and gave too much emphasis to rating (left). Part of the problem was that Tableau did not allow me to choose a different color for the review score bars or to give the two axes different lengths.

Switching to a scatter plot (right) did not help much, since many of the points overlapped. Tableau did allow me to mouse over a data point to see all of its details, but in a static image the names of the games were lost, which meant I couldn't answer the simple question of which game sold the most copies. Then I discovered that Tableau featured a bar graph where a measure could be represented by color.

Final Stage: The Data Visualized

Having each bar colored according to its corresponding game's rating seemed to be a compact way to show sales figures and ratings, especially since I realized my questions were not focused on the specific value of the review score. Thus it was sufficient to use color to give viewers an idea of whether the game was good, average, or poor. I chose a gradient that ran from green for good scores through gray for average scores to red for low scores. I had the low end begin at 0.3, since none of the games went below that value and it created a greater color difference between average and low-scoring games.
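The color encoding amounts to a diverging scale with a clamped low end. The sketch below shows one way to compute such a scale by linear interpolation. It does not describe Tableau's internals: the midpoint (halfway between 0.3 and 1.0) and the RGB values are assumptions chosen for illustration, and `score_to_rgb` is a hypothetical helper.

```python
LOW, HIGH = 0.3, 1.0        # low end starts at 0.3, as described above
MID = (LOW + HIGH) / 2      # assumption: gray sits halfway along the scale

# Arbitrary RGB endpoints chosen for this sketch.
RED = (200, 30, 30)
GRAY = (160, 160, 160)
GREEN = (30, 160, 30)


def _lerp(a, b, t):
    """Linearly interpolate between two RGB triples for t in [0, 1]."""
    return tuple(round(x + (y - x) * t) for x, y in zip(a, b))


def score_to_rgb(score):
    """Map a review score to a color on the red-gray-green scale."""
    s = min(max(score, LOW), HIGH)  # clamp scores outside the scale
    if s <= MID:
        return _lerp(RED, GRAY, (s - LOW) / (MID - LOW))
    return _lerp(GRAY, GREEN, (s - MID) / (HIGH - MID))


for score in (0.3, MID, 1.0):
    print(score, score_to_rgb(score))
```

Starting the scale at 0.3 rather than 0.0 spends the whole red-to-gray range on scores that actually occur, which is what makes low and average games easier to tell apart.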

The end product was the bar graph below (caption is included in the image), for which the only complaint I could make is that Tableau ought to allow more customization in the placement of the legend. Games are grouped by genre and listed alphabetically by title. The bars represent the number of copies sold, and a data label containing the number provides a redundant encoding that makes it easier to read bars that are far from the axis at the bottom, though the labels would not be necessary when the visualization is viewed in Tableau. The bars allow sales to be compared among games more quickly than a comparison of numbers alone would, so one can swiftly determine the best-seller of each genre (and still, Madden NFL 07 was on top in copies sold). Color comparison between the bars also allows viewers to easily get a sense of each game's rating.

This helped me determine that the correlation between rating and sales was more pronounced within each genre than it was overall. With the exception of Licensed and Action games, high-scoring games tended to have bigger sales. As for the relationship between genre and rating, that was evident from the colors of the bars in each group. As expected, many of the lowest-scoring games were in the Licensed category. Shooter and Role-Playing had many of the higher scorers, most likely due to the considerable amount of time invested in design and development for games of these types. The true sign of the visualization's effectiveness was that it didn't take much more than a quick scroll through the graph to answer both questions.
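The comparison between the overall trend and the trend within each genre was judged by eye from the colored bars. A numeric check could use Pearson's correlation coefficient, computed once over all games and once per genre. The sketch below swaps in that technique as an assumption; the records are invented placeholders built so that the two views disagree, and the function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    n = len(xs)
    if n < 2:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # no variation in one of the variables
    return sxy / sqrt(sxx * syy)


def score_sales_correlations(rows):
    """Return (overall correlation, {genre: correlation within that genre})."""
    overall = pearson(
        [r["review_score"] for r in rows],
        [r["copies_sold"] for r in rows],
    )
    by_genre = defaultdict(list)
    for r in rows:
        by_genre[r["genre"]].append(r)
    per_genre = {
        genre: pearson(
            [r["review_score"] for r in members],
            [r["copies_sold"] for r in members],
        )
        for genre, members in by_genre.items()
    }
    return overall, per_genre


# Placeholder records; genres, scores, and sales are invented.
games = [
    {"genre": "Genre X", "review_score": 0.5, "copies_sold": 100},
    {"genre": "Genre X", "review_score": 0.6, "copies_sold": 200},
    {"genre": "Genre X", "review_score": 0.7, "copies_sold": 300},
    {"genre": "Genre Y", "review_score": 0.8, "copies_sold": 10},
    {"genre": "Genre Y", "review_score": 0.9, "copies_sold": 20},
    {"genre": "Genre Y", "review_score": 1.0, "copies_sold": 30},
]

overall, per_genre = score_sales_correlations(games)
```

In this made-up sample each genre shows a strong positive relationship while the pooled data does not, which is the kind of pattern that grouping the bars by genre makes visible.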

End Credits: Tableau vs. Spotfire

Towards the last stages of visualization development, I figured out how to get Spotfire working on my computer and gave it a try. Though I liked Spotfire's instantly accessible filter sliders and the way it lets users decide which columns and rows of a data table to import, I found Spotfire to be less flexible than Tableau in generating visualizations. For example, multiple dimensions can only be added to the Y-axis in Spotfire, while in Tableau the only restriction is the visualization's readability. I was also slightly disturbed by Spotfire forcing the whole graph to be visible in the window, even when it meant showing only every fifth game title in the columns or packing the bars so close together that they formed a blob made of rectangles. Both systems have their good and bad points, but for now Tableau is the more user-friendly of the two.


