A3-NickDotyHeatherDolan
From CS294-10 Visualization Fa08
Domain
We are interested in the social networking domain and want to visualize connections, incoming and outgoing, between people. More specifically, we will look at the Amherst College Planworld blog community and the subscription connections between users.
Research questions
Are connections being made across class years at Amherst College? If so, are these connections made within years flanking the year of interest? Or do they extend beyond that? Are connections generally made by a few people or several people?
These questions might be of interest to the school. Connections from past to current students might be a way of assessing alumni activity. They might also interest prospective students who want a strong alumni community to draw on after graduation. Examining these questions may also show how many people publish in an undergraduate community versus how many simply consume.
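As a rough illustration of how the research questions above could be checked directly against the data, the sketch below counts subscriptions by the gap between class years and counts how many cross-year subscriptions each reader makes. The function names and the data layout (a dict of user id to class year, and a list of (reader, author) pairs) are assumptions made for this example, not the project's actual format.

```python
from collections import Counter


def year_gap_histogram(users, edges):
    """Count subscriptions by the absolute gap between class years.

    `users` maps a user id to a class year and `edges` is a list of
    (reader, author) pairs; both layouts are assumed for illustration.
    """
    gaps = Counter()
    for reader, author in edges:
        gaps[abs(users[reader] - users[author])] += 1
    return gaps


def cross_year_readers(users, edges):
    """Count the cross-year subscriptions made by each reader."""
    counts = Counter()
    for reader, author in edges:
        if users[reader] != users[author]:
            counts[reader] += 1
    return counts


# Hypothetical sample data.
users = {"u1": 2005, "u2": 2005, "u3": 2006, "u4": 2009}
edges = [("u1", "u2"), ("u1", "u3"), ("u3", "u4"), ("u1", "u4")]

gaps = year_gap_histogram(users, edges)
readers = cross_year_readers(users, edges)
print(sorted(gaps.items()))
print(sorted(readers.items()))
```

A gap of 1 corresponds to the flanking years; larger gaps show connections that extend beyond them. A reader count concentrated in a few users would suggest that a few people make most of the cross-year connections.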
Visualization and interaction
We want to visualize the various connections (and types of connections) between people in a social network. Our visualization can show the relative number of connections between groups and whether the connection is subscription or publication.
Interaction can allow users to investigate the data: clicking on a class year expands to show more detailed information for each person in that class year. Mousing over an individual person highlights their connections to other users and whether those connections are incoming or outgoing.
We think building on the design of the dependency graph will help quickly answer our research questions. It provides the right level of granularity for the data that we have and, while social connections aren't dependencies, the ring structure lends itself to investigating connections between people. With some other structure (rather than a ring), the lines between years and users would be harder to distinguish. We also share the same goal: to show the size and number of connections at first glance, while providing the opportunity to drill down into nodes and the particular types of edges.
Design
Aspects of the existing design we want to leverage include:
- The ring structure
- Depicting the type of connections (dependencies in existing diagram) by using color. In our case, the color of the connection line would indicate one of three types of connections.
Additions and changes:
- We plan to represent each of 10 class years as a group around the ring.
- Connections between classes will be represented by a single line. The width of the line will depict the number (percentage?) of connections between classes.
- Classes can be expanded to see connections at an individual level.
- We will need to use three, instead of two, colors for the connection lines as there are three types of connections:
- Consumption Connection - a user reads (consumes) information published by another student, but the other student does not reciprocate or the consuming student does not publish.
- Published Connection - a student publishes work that is read by another student, but does not read the other student's work or the other student does not publish. The consumption and published connections are inverse to one another. Depending on the name moused over, you see a published or a consumption connection.
- Shared Connection - Both students publish and read each other's work.
The type of connection will be encoded by the color of the edge (see second image above).
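The three connection types can be sketched as a small classification over directed subscription pairs. This is a minimal sketch, assuming subscriptions are stored as a set of (reader, author) pairs; the `ConnectionType` enum and the `classify` function are hypothetical names, not part of the project code.

```python
from enum import Enum


class ConnectionType(Enum):
    CONSUMPTION = "consumption"  # viewer reads the other user only
    PUBLISHED = "published"      # the other user reads the viewer only
    SHARED = "shared"            # both read each other


def classify(viewer, other, subscriptions):
    """Return the connection type as seen from `viewer`, or None.

    `subscriptions` is a set of (reader, author) pairs; this layout is
    an assumption for illustration.
    """
    reads = (viewer, other) in subscriptions
    is_read = (other, viewer) in subscriptions
    if reads and is_read:
        return ConnectionType.SHARED
    if reads:
        return ConnectionType.CONSUMPTION
    if is_read:
        return ConnectionType.PUBLISHED
    return None


# Hypothetical sample data.
subscriptions = {("ann", "bob"), ("bob", "ann"), ("ann", "cat")}
print(classify("ann", "bob", subscriptions))
print(classify("ann", "cat", subscriptions))
print(classify("cat", "ann", subscriptions))
```

The same edge between "ann" and "cat" is a consumption connection from one side and a published connection from the other, which matches the description of the two types as inverses that depend on the name moused over.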
Data Set
We have data for connections between students at Amherst College for the past 10 years. As well as usernames, we can infer class years and some data about usage of the system (when they last logged in, how often they've published, etc.). We don't have (or can't use) other personal data like major, location, etc. Usernames may need to be anonymized before publishing.
We combined users and their connections into a single JSON file.
Final Visualization
Our final visualization is a circular connectivity graph. It shows connections between individuals in 8 class years. The portion of the circle's circumference that a class year fans out over is proportional to the size of that class year. By default, the graph displays all consumption (reading) connections for the 8 class years in blue. Double-clicking on a particular year shows only the connections between individuals in the selected class year and people in other class years. This allows publishing and consumption trends to be examined for individual class years. Reciprocal relationships, in which two people both publish and read each other's work, are also shown in this state, in red.
Since our research interests were not in specific individuals but in trends across class years, and in the interest of privacy, we did not display individual user names or IDs.
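The proportional layout around the ring can be sketched as follows. This is only an illustration of the idea, not the Flare layout code the application uses; the `arc_spans` function and the sample sizes are hypothetical, and the write-up does not specify exactly which count determines a year's size.

```python
def arc_spans(class_sizes):
    """Map each class year to a (start_angle, end_angle) pair in degrees.

    Each year's arc is proportional to its share of the total in
    `class_sizes`, a dict of year -> size (an assumed input).
    """
    total = sum(class_sizes.values())
    if total == 0:
        return {}
    spans = {}
    start = 0.0
    for year in sorted(class_sizes):
        sweep = 360.0 * class_sizes[year] / total
        spans[year] = (start, start + sweep)
        start += sweep
    return spans


# Hypothetical sizes: the middle year is twice as large as the others.
spans = arc_spans({2006: 100, 2007: 200, 2008: 100})
for year, (start, end) in spans.items():
    print(year, start, end)
```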
We've had some issues with deploying the Flex/Flare application, but this page may work. Be aware that this may take over a minute to load, and can be both processor and memory intensive.
If this page doesn't work for some reason, download the application.
Application performance is not ideal. There is a delay between double-clicking a class year and the graph updating.
Application for Windows: Media:DependencyGraph.zip
Deviations from Design in Implementation
We are using the circular graph with two views, similar to the two in the storyboard, but with a few changes:
- The individuals are not labeled in the implementation. It is still possible to see individual connections.
- When viewing all years at once, there is not a single connection line with a width indicating the number of connections. Instead, all individual connections are represented by lines using the same color and are grouped by class year. We take advantage of the micro/macro feature that Tufte talks about: by showing all of the individual connections, users can easily see the macro situation.
- Lines fan out over a portion of the circle indicating how many connections exist for a class year. It's also possible to see the density of connections with this implementation.
- The implementation does not highlight individual connections when mousing over a particular connection, but instead only when clicking on a particular year. (Changing this many edges on mouse over doesn't appear to be viable in Flex. Double-clicking helps imply how long the process will take.)
- Two colors show connections instead of the three originally proposed in the storyboard. Two are sufficient for examinations across class years.
- We weren't able to visualize the entire data set at once because the large number of records caused performance problems and hit limitations in the Flash Player, so we reduced the number of classes from 10 to 8. (The full data set has 2500 users and 40,000 edges. We reduced this to roughly 1000 users.)
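Reducing the data set to a subset of class years means dropping both the users outside those years and any edge that touches a dropped user. A minimal sketch of that step is shown below, assuming a dict of user id to class year and a list of (reader, author) pairs; the `restrict_to_years` function and the sample data are hypothetical.

```python
def restrict_to_years(users, edges, keep_years):
    """Keep users in `keep_years` and edges whose endpoints both survive.

    `users` maps user id -> class year and `edges` is a list of
    (reader, author) pairs; both layouts are assumed for illustration.
    """
    kept_users = {uid: year for uid, year in users.items() if year in keep_years}
    kept_edges = [
        (reader, author)
        for reader, author in edges
        if reader in kept_users and author in kept_users
    ]
    return kept_users, kept_edges


# Hypothetical sample data.
users = {"u1": 2001, "u2": 2005, "u3": 2006}
edges = [("u1", "u2"), ("u2", "u3"), ("u3", "u2")]

kept_users, kept_edges = restrict_to_years(users, edges, {2005, 2006})
print(kept_users)
print(kept_edges)
```

Filtering edges after filtering users avoids dangling references to users that are no longer in the data.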
Source Code
Source code is available here. In order to compile, you'll need the Flex and Flare libraries. You'll also need to import the Flare sample applications library as a reference (for the progress bar and a few other conveniences).
Media:DependencyGraphSource.zip contains all the source, templates and build files. Users may need to make a change based on the location of fonts on their system. Also, this project depends on the Flex, Flare and Flare.Apps libraries.
Division of Labor
Initial Design Discussions - Nick and Heather
Design Draft and Storyboard - Heather, reviewed by Nick
Getting Flare Up and Running, experimental code - Nick and Heather
Data Wrangling and Shipping Project Code - Nick
Cross Platform and Browser Testing - Heather
Final Write Up - Heather, reviewed by Nick
Development Process
Initial Design Discussions:
- We spent an hour or two discussing the initial design. We knew that we wanted to use the dependency graph, and most of the discussion was about how connections between people differ from the dependencies between Flare libraries. We wanted to be able to show reciprocal relationships between users, which was not an existing feature of the dependency graph.
Design Draft and Storyboard:
- The mock-ups and storyboard, based on the initial discussion, took several hours to create in Photoshop and document.
Getting started:
- Neither of us was familiar with Flash, ActionScript, Flare, or Flex, so we both spent many hours getting the development environments set up, working with tutorials, and getting the example project to compile and run.
Dealing with the Data:
- Nick spent a few hours manipulating the data into a useful format and creating the JSON file. We started with everything in SQL, anonymized and joined the data, pulled out just the fields we needed into a tab-delimited text dump, and then used Perl to convert the dump into a single JSON file with two arrays, one for users and one for edges.
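The conversion from a tab-delimited dump to a single JSON file with two arrays can be sketched as below. The project used Perl for this step; the sketch uses Python instead, and the column order and the field names (`id`, `year`, `source`, `target`) are assumptions for illustration, since the write-up does not list the actual fields.

```python
import csv
import io
import json


def build_dataset(users_tsv, edges_tsv):
    """Combine two tab-delimited streams into one dict with two arrays.

    Assumed columns: users are (id, class year), edges are
    (reader id, author id).
    """
    users = [
        {"id": row[0], "year": int(row[1])}
        for row in csv.reader(users_tsv, delimiter="\t")
    ]
    edges = [
        {"source": row[0], "target": row[1]}
        for row in csv.reader(edges_tsv, delimiter="\t")
    ]
    return {"users": users, "edges": edges}


# Hypothetical, already anonymized sample rows standing in for the dump.
users_tsv = io.StringIO("u1\t2006\nu2\t2008\n")
edges_tsv = io.StringIO("u1\tu2\n")

dataset = build_dataset(users_tsv, edges_tsv)
print(json.dumps(dataset, indent=2))
```

With real files, the `io.StringIO` objects would be replaced by file handles opened with `newline=""`, as the `csv` module recommends.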
Coding:
- Nick spent 8 to 10 hours on the actual code implementation. Much of this came down to fighting issues with scaling and performance (as well as the difficulty in debugging Flex applications). Several design decisions had to be made based purely on performance issues.
- We started with the demo application code for visualizing Flare library dependencies. Substantial changes were made (to reading data, processing data, visualization parameters, and interaction), but we still owe the basic structure (and the skeleton code) to the sample application.



