From CS294-10 Visualization Sp11
Domain and Question
I wanted to answer some questions related to my research on the Opinion Space project. Opinion Space is a new innovation tool that maps participants in a discussion onto the 2-D plane, so that participants with similar opinions are closer together and those with differing opinions are farther apart. Each participant answers a discussion question, and the answer is evaluated and rated by other participants in the space. For more information and a 3-minute demo video, go to: http://opinion.berkeley.edu
We recently deployed Opinion Space with an organization interested in using it to engage a focus group on questions relating to the US automotive industry. In this instance of Opinion Space, we allowed users to write responses and rate each other's responses for a 2-week period. After that seeding period of ratings, we introduced a leaderboard featuring the top 20 responses in the space. We then allowed users to view the leaderboard and rate those responses.
My question is: after we introduced the leaderboard, how much did the ranking of the top 20 comments change? Also, if there was a reordering at all, what was its general trend?
I retrieved the unique rankings of the top 20 comments between the time the leaderboard was introduced and the time the site closed. This data fit the following model:
rank 1 | ... | rank 20 | time
Here, rank 1-20 and time are the dimensions and the comment ids are the measures. I first tried to use the raw data as is, but could not generate any substantial visualization in Tableau, because the data was difficult to manage in this form. What I was really interested in was using the comment ids as a dimension. So next, I processed the data with a Python script that created a new CSV with the following data model:
comment id | rank | time
This way, I treated comment id and time as dimensions and the rank as the measure.
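The reshaping step described above can be sketched roughly as follows. This is a minimal sketch, not the original script: the header names (`rank 1`, ..., `time`), the function name `wide_to_long`, and the three-rank sample with made-up comment ids and time labels are all assumptions for illustration.

```python
import csv
import io


def wide_to_long(wide_file, long_file):
    """Rewrite rows of (rank 1, ..., rank N, time) as (comment id, rank, time)."""
    reader = csv.DictReader(wide_file)
    # Assumed header convention: one column per rank, named "rank <n>".
    rank_columns = [name for name in reader.fieldnames if name.startswith("rank ")]
    writer = csv.writer(long_file, lineterminator="\n")
    writer.writerow(["comment id", "rank", "time"])
    for row in reader:
        for column in rank_columns:
            rank = int(column.split()[1])
            # In the wide layout the cell value is the comment id holding that rank.
            writer.writerow([row[column], rank, row["time"]])


# Made-up sample with three ranks instead of twenty.
sample = io.StringIO(
    "rank 1,rank 2,rank 3,time\n"
    "101,102,103,t0\n"
    "102,101,103,t1\n"
)
output = io.StringIO()
wide_to_long(sample, output)
long_csv = output.getvalue()
print(long_csv)
```

Each wide row becomes one long row per rank, which is the shape that lets a tool treat comment id and time as dimensions and rank as the measure.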
After loading the data into Tableau, I first generated a plot resembling a parallel-coordinates or bump chart that illustrated the change in rank for each comment, with each comment drawn as a separate line. The result is the following giant visualization:
My initial question was answered: yes, there was significant reordering after the leaderboard was introduced. As an aside, it is also important to note that there were not many new introductions into the top 20. This is partly due to the nature of introducing a leaderboard: the responses on it get more exposure, and hence enter a feedback cycle of receiving more and more ratings.
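One way to put rough numbers on these two observations is to compare consecutive leaderboard snapshots. The sketch below is an assumption on my part rather than part of the original analysis: it sums the absolute rank changes of comments present in both snapshots and lists comments that newly entered. The snapshot dictionaries and their ids are hypothetical.

```python
def total_rank_shift(before, after):
    """Sum of absolute rank changes for comments present in both snapshots."""
    shared = before.keys() & after.keys()
    return sum(abs(before[c] - after[c]) for c in shared)


def new_entries(before, after):
    """Comment ids on the later leaderboard that were absent from the earlier one."""
    return sorted(after.keys() - before.keys())


# Hypothetical snapshots mapping comment id -> rank.
day_a = {"a": 1, "b": 2, "c": 3}
day_b = {"a": 3, "b": 2, "d": 1}

shift = total_rank_shift(day_a, day_b)
newcomers = new_entries(day_a, day_b)
print(shift, newcomers)
```

A shift near zero between snapshots would suggest the list has settled, while a short `newcomers` list would match the observation that few new comments entered the top 20.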
My next question was: were comments for the most part moving in one direction (i.e. only getting lower in ranking or only getting higher in ranking), or was there fluctuation in both directions? Another way to frame the question is: were comments, for the most part, moving towards their true rank and reaching something of an equilibrium? As the final ranking is what we're interested in, to answer this I filtered the view to show only the comments that ended up in the top 5 and those that began in the top 5.
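The filter described above was applied in Tableau, but the same selection can be sketched in code. This is a minimal sketch under assumptions: records are `(comment id, rank, time)` tuples, times sort chronologically, and the sample data and function names are hypothetical.

```python
def top_k_at(records, time, k=5):
    """Comment ids ranked k or better in the snapshot taken at `time`."""
    return {cid for cid, rank, t in records if t == time and rank <= k}


def began_and_ended_in_top(records, k=5):
    """Return (ids in the top k at the first snapshot, ids in the top k at the last)."""
    times = sorted({t for _, _, t in records})
    return top_k_at(records, times[0], k), top_k_at(records, times[-1], k)


# Hypothetical records: (comment id, rank, time), with two snapshots.
records = [
    ("a", 1, 0), ("b", 2, 0), ("c", 3, 0),
    ("c", 1, 1), ("a", 2, 1), ("b", 3, 1),
]

began, ended = began_and_ended_in_top(records, k=2)
stayed = began & ended  # comments in the top k at both ends
print(sorted(began), sorted(ended), sorted(stayed))
```

Comments in both sets correspond to the ones that appear in both of the filtered graphs.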
Comments that began in the top 5
We see very nice general trends of these comments moving, for the most part, in a pattern towards their final resting positions. Notable examples include comments 715 and 804, which quickly plummet from the top 5 and continue on a downward trend. Comments 5 and 799 prove their value, as they hold on to about the same position throughout the entire visualization. Finally, comment 575 starts at the top and gradually falls until it rests in the middle of the top 20. So far so good.
Comments that ended in the top 5
We saw equally nice trends in this graph. Comment 410 has quite an inspiring trend, as it began at rank 17 of 20 and continually rose in the ranks until it ended up as the top comment overall. Comments 623 and 720 do see a bit of fluctuation, but primarily on day 14, when most of the change in the rankings occurred. After day 14, they exhibit a general upward trend into the top 5 comments. Comments 799 and 5 are repeats from the last graph.
It seemed like things were going very well, so I decided to do a sanity check. I took a look at the final few days to see if there were any abnormal trends in the top 10 or so responses. Given that most of the comments followed a general path, I was looking for anything that looked like it had too steep a slope (positive or negative). The results were the following:
Possible strange behavior
Comment 172 stood out, as it was not on the leaderboard on day 13, was on the leaderboard for only a portion of day 14, and then quickly rose through days 15-17, a period when most other comments did not experience a large amount of change. This abnormal behavior could be an outlier in the data, a data processing error, a result of a bug in the system, or a result of a user acting maliciously. This will require more analysis.
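I did this check by eye, but a rough automated version might look like the sketch below. It is an illustration under assumptions, not the analysis actually performed: records are `(comment id, rank, time)` tuples, and the threshold `max_step` and the sample data are hypothetical.

```python
from collections import defaultdict


def steep_moves(records, max_step=3):
    """Flag rank changes between consecutive snapshots that exceed max_step.

    Returns (comment id, time before, time after, rank change) tuples,
    where a negative change means the comment moved up the leaderboard.
    """
    by_comment = defaultdict(list)
    for comment_id, rank, time in records:
        by_comment[comment_id].append((time, rank))

    flagged = []
    for comment_id, series in by_comment.items():
        series.sort()  # order each comment's observations by time
        for (t0, r0), (t1, r1) in zip(series, series[1:]):
            if abs(r1 - r0) > max_step:
                flagged.append((comment_id, t0, t1, r1 - r0))
    return flagged


# Hypothetical records: "a" drifts by one place, "b" jumps six places.
records = [("a", 1, 1), ("a", 2, 2), ("b", 10, 1), ("b", 4, 2)]
suspicious = steep_moves(records)
print(suspicious)
```

A flagged comment is only a candidate for closer inspection; a large jump alone does not distinguish an outlier from a processing error or malicious rating.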
I chose this visualization as the final visualization because it illustrates the comments eventually reaching an equilibrium in the last 4 days of voting. I chose to omit day 14, as this made the visualization much more manageable without losing too much data (although comment 410's spectacular rise to the top is no longer visible). The original question was whether the ranks of the top comments changed after the leaderboard was introduced. Once that was established, the question of real value became clear: does introducing a leaderboard after a period of seed ratings help move comments to their true rank in the top 20? While the other visualizations also succeed in illustrating a general move towards equilibrium, this one does the same without becoming too unwieldy.