From CS294-10 Visualization Sp11
I wanted to explore a dataset related to things I'm interested in. I played around with looking at Twitter trend data, but found some nice Starcraft 2 datasets courtesy of [SC2Ranks].
The dataset I will be using is the top 5000 Diamond-league-ranked Starcraft 2 players from October 5th, 2010. Note that since then a new tier called Master league has been created, but at the time Diamond represented the highest tier of players. For anyone confused, these are basically the top players in terms of the online ladder system.
My goal in exploring this dataset is to discover which subpopulation of SC2 players tends to perform best. To that end, I will explore different dimensions of the dataset and attempt to drill down to a relatively focused set of players.
Artificial Additions to Dataset
There were two modifications I made to the dataset:
I added a "rank" column numbering the entries from 1-5000. This was for some reason not present in the original data, but is implicit. The points column serves almost the same purpose, but the wins/losses also have some weight, and there are many cases where players have the same points and I wanted a strict ranking.
I also had to clean the race column. In some cases a player had multiple races listed, for example Zerg/Protoss, which made it ambiguous how to categorize the player. I resolved this using the following rules:
- If there is a majority race, use the majority, e.g. Zerg/Zerg/Protoss => Zerg and Terran/Terran => Terran.
- Otherwise, use Random as the race, e.g. Zerg/Protoss => Random, Zerg/Random => Random.
I feel that this is a fair way of dealing with the issue, although it may to some extent artificially inflate the proportion of Random players.
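The two rules can be sketched in a few lines of Python. This is a minimal illustration, not the script actually used for the cleanup: the function name `normalize_race` is hypothetical, the `/` separator is taken from the examples, and "majority" is assumed to mean strictly more than half of the listed races.

```python
from collections import Counter


def normalize_race(raw):
    """Collapse a '/'-separated race string into a single race.

    Assumption: a "majority" means strictly more than half of the entries.
    """
    races = [part.strip() for part in raw.split("/") if part.strip()]
    if not races:
        raise ValueError("no race given")
    # most_common(1) returns the most frequent race together with its count.
    race, count = Counter(races).most_common(1)[0]
    # Keep the race only if it is a strict majority; otherwise use Random.
    return race if 2 * count > len(races) else "Random"
```

Under this reading, `Zerg/Zerg/Protoss` maps to Zerg, while `Zerg/Protoss` and `Zerg/Random` both map to Random, matching the examples in the rules.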
Score/Rank by Race
The initial visualization shows average points by race. I restricted the axis to the 1400-1600 point range to make the differences more obvious. From initial observations, Terran has the highest average points, while Random has the lowest. The Random result mostly makes sense, since players who don't practice with a specific race tend not to do as well. However, saying that Terran does the best in general might be a bit premature.
I feel that this chart gives a much better picture of the situation, although it is harder to read. It uses rank instead of points; while the two should be approximately equivalent, rank gives a better spread. For rank, the lower the value, the better the player. The graph is additionally bucketed by rank, into buckets of size 500, which gives a better sense of the distribution within each race.
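A rough sketch of the bucketing, assuming each player is a hypothetical `(rank, race)` pair with ranks running from 1 to 5000. The helper names and the sample rows are illustrative only; they are not the tooling or the data behind the chart.

```python
from collections import Counter

BUCKET_SIZE = 500  # bucket width used for the chart


def rank_bucket(rank, size=BUCKET_SIZE):
    """Return the inclusive (low, high) rank range of the bucket holding `rank`."""
    low = (rank - 1) // size * size + 1
    return (low, low + size - 1)


def bucket_counts_by_race(players):
    """Count players per (race, bucket); `players` is an iterable of (rank, race)."""
    counts = Counter()
    for rank, race in players:
        counts[(race, rank_bucket(rank))] += 1
    return counts


# Made-up rows for illustration only, not taken from the dataset.
sample = [(1, "Terran"), (499, "Terran"), (500, "Zerg"), (501, "Zerg"), (5000, "Random")]
counts = bucket_counts_by_race(sample)
```

Subtracting 1 before the integer division keeps rank 500 in the first bucket and rank 501 in the second, so every bucket holds exactly 500 ranks.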
Judging by this new visualization, my original assumption wasn't far off at all. If anything, it dramatizes the differences. Terran has a disproportionately large number of players in the top 500, and Random has a disproportionately small number. The distributions for Zerg and Protoss seem more even.
Rank by Region
I next wanted to visualize a similar measurement by ladder region. Learning from my previous attempt, I bucketed the rank for each region. However, I additionally chose to show each bucket as a percentage of that region's total players. The reason is that some regions, like Europe and USA, have a disproportionately large number of players in the dataset compared to the other regions, which would give a misleading impression of what is being seen. Finally, I ordered the regions by average rank, best first.
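The per-region normalization can be sketched as follows, assuming hypothetical `(rank, region)` pairs. The function names, the placeholder region labels `A` and `B`, and the sample rows are all made up for illustration.

```python
from collections import defaultdict


def bucket_shares_by_region(players, size=500):
    """Return {region: {bucket_index: share of that region's players}}.

    `players` is an iterable of (rank, region); bucket 0 holds ranks 1..size.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for rank, region in players:
        counts[region][(rank - 1) // size] += 1
    shares = {}
    for region, buckets in counts.items():
        total = sum(buckets.values())
        # Divide by the region's own total so large regions don't dominate.
        shares[region] = {b: n / total for b, n in buckets.items()}
    return shares


def regions_by_average_rank(players):
    """Order regions by mean rank, best (numerically lowest) first."""
    ranks = defaultdict(list)
    for rank, region in players:
        ranks[region].append(rank)
    return sorted(ranks, key=lambda region: sum(ranks[region]) / len(ranks[region]))


# Made-up rows for illustration only.
sample = [(10, "A"), (600, "A"), (20, "B"), (30, "B"), (40, "B"), (4000, "B")]
shares = bucket_shares_by_region(sample)
order = regions_by_average_rank(sample)
```

Because each region's shares sum to 1, a small region and a large region can be compared bucket by bucket without the raw player counts skewing the picture.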
This visualization supports the generalization that Korea has the best Starcraft players. Its proportion of players in the top 500, or even the top 1000, is unrivaled by any other region. In addition, its average rank is also the best, followed by Europe. Taiwan, despite being in 3rd place, seems to have a disproportionately low number of high-ranked players.
Win Ratio, Total Games by Rank
I was now curious to see the disparity in win ratio between ranks. However, my initial attempt was almost incomprehensible. I still wanted to pull some information out of this, so I tried total games played instead of win ratio, but even that didn't produce much of interest. However, I was given hope by the ever-so-slight rise among what looks like the top 100 players. I then resorted once again to bucketing by rank:
This visualization proved much more compelling. The win ratio is slightly higher for the top 500, and slightly lower for the bottom 500, but is generally still rather flat. However, there is a clear correlation between the games played and the rank. It seems practice really does make perfect.
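A sketch of the per-bucket averages, under two assumptions that the writeup does not spell out: win ratio is taken as wins / (wins + losses), and games played as wins + losses. The `(rank, wins, losses)` row layout, the function name, and the sample rows are hypothetical.

```python
from collections import defaultdict


def bucket_averages(players, size=500):
    """Return {bucket_index: (mean win ratio, mean games played)}.

    `players` is an iterable of (rank, wins, losses).
    Assumptions: win ratio = wins / (wins + losses); games = wins + losses.
    """
    ratios = defaultdict(list)
    games = defaultdict(list)
    for rank, wins, losses in players:
        played = wins + losses
        if played == 0:
            continue  # skip players with no games to avoid dividing by zero
        bucket = (rank - 1) // size
        ratios[bucket].append(wins / played)
        games[bucket].append(played)
    return {
        b: (sum(ratios[b]) / len(ratios[b]), sum(games[b]) / len(games[b]))
        for b in ratios
    }


# Made-up rows for illustration only.
sample = [(1, 300, 100), (2, 150, 50), (4999, 50, 50)]
averages = bucket_averages(sample)
```

Averaging within buckets is what smooths out the per-player noise that made the unbucketed chart hard to read.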
I made these two graphs to compare against my previous visualizations on race and region. While they don't change what I think about those visualizations, it is interesting to note that the relatively low number of players from Korea compared to USA and Europe doesn't stop it from having the highest-ranked players. The proportion of each race, however, corresponds almost perfectly to the average score of each race. This might mean that Terrans have a higher average score simply because of their population, rather than because they are intrinsically better.
My final visualization limits the scope to the top 100 ranked players, out of the total 5000 in my dataset. This is to confirm the validity of my generalizations on the dataset as a whole. The regions and races are ordered by number of records in the top 100. Missing columns mean that the specific region/race combination didn't even make an appearance in the top 100.
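The top-100 breakdown can be approximated with a filter followed by a group-by. This sketch assumes hypothetical `(rank, region, race, games)` rows and, as a simplification, sorts region/race combinations by count rather than ordering the two axes separately. The function name, the placeholder regions `X` and `Y`, and the sample rows are made up.

```python
from collections import defaultdict


def top_n_summary(players, n=100):
    """Summarise the top-n players by (region, race).

    `players` is an iterable of (rank, region, race, games).
    Returns a list of ((region, race), count, mean games), largest count first.
    Combinations absent from the top n simply do not appear.
    """
    games = defaultdict(list)
    for rank, region, race, played in players:
        if rank <= n:
            games[(region, race)].append(played)
    summary = [(key, len(vals), sum(vals) / len(vals)) for key, vals in games.items()]
    return sorted(summary, key=lambda row: row[1], reverse=True)


# Made-up rows for illustration only.
sample = [
    (1, "X", "Terran", 400),
    (2, "X", "Terran", 200),
    (3, "Y", "Zerg", 500),
    (150, "Y", "Protoss", 900),  # outside the top 100, so excluded
]
summary = top_n_summary(sample)
```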
Korean Terrans appear as the most prolific, as expected. The average games, however, don't pan out as well: Terrans in Korea actually play the fewest games. Terrans in Southeast Asia seem to play a disproportionately large number of games compared to their presence in the top 100. Nevertheless, players in Korea do play more games on average than players in the other regions.