From CS294-10 Visualization Sp11
Exploring Web site stats in Real-Time
I have used Google Analytics and other services for website statistics. However, one thing they cannot do is tell me which users are visiting which pages or downloading which files. Hence, I have stored this information manually, and my database is full of this data. I wanted to produce a web interface for querying these web statistics, so that I could easily determine the most popular pages and group the results by various columns. A user of this system can find out which files are most downloaded, which users are downloading the most files, which pages are most visited, etc.
I wanted to make an exploratory interface first, so that you could continually ask new questions. I want to be able to filter by date and change the grouping and the columns being searched. This allows the data to be explored and, more importantly, tailored to the questions a user may have about popular files or pages on a website.
To keep this simple, the first sketch I made had only a bar chart to display data. I thought about the most intuitive way to allow a user to specify queries and, for the sake of simplicity, decided to use HTML forms. Although a form is not quite in line with direct manipulation, it is a metaphor many people understand. The main reason for this mechanical interface is that the data I have access to lives in MySQL. A query against it is inherently structured, so a form seemed like a good fit.
As you can see in the sketch, there are select menus for which data to look at and which data to group by, as well as a date filter. These are used to filter and query the data.
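As a rough illustration of how form selections like these could map onto a grouped, date-filtered query, here is a minimal sketch. The table and column names (`events`, `file_name`, and so on), the sample rows other than "tornado.hip", and the use of an in-memory SQLite database instead of MySQL are all assumptions made so the example runs on its own; this is not the project's actual server code.

```python
import sqlite3

# Hypothetical schema: one row per page visit or file download.
# These names are illustrative only, not taken from the project.
GROUP_COLUMNS = {"file": "file_name", "user": "user_name", "page": "page_name"}
EVENT_TYPES = {"downloads": "download", "visits": "visit"}


def build_query(metric, group_by, start, end, limit=10):
    """Return (sql, params) for a grouped hit count within a date range."""
    if metric not in EVENT_TYPES or group_by not in GROUP_COLUMNS:
        raise ValueError("unknown metric or grouping")
    # The column name comes from a fixed whitelist, so interpolating it is safe;
    # all user-supplied values are passed as bound parameters.
    column = GROUP_COLUMNS[group_by]
    sql = (
        f"SELECT {column} AS label, COUNT(*) AS hits FROM events "
        "WHERE event_type = ? AND event_date BETWEEN ? AND ? "
        f"GROUP BY {column} ORDER BY hits DESC LIMIT ?"
    )
    return sql, (EVENT_TYPES[metric], start, end, limit)


def run_demo(group_by="file"):
    """Run the query against a few made-up rows in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (event_type TEXT, event_date TEXT, "
        "file_name TEXT, user_name TEXT, page_name TEXT)"
    )
    rows = [
        ("download", "2011-03-01", "tornado.hip", "alice", "Tornado"),
        ("download", "2011-03-02", "tornado.hip", "bob", "Tornado"),
        ("download", "2011-03-03", "smoke.hip", "alice", "Smoke"),
        ("visit", "2011-03-03", None, "bob", "Smoke"),
    ]
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", rows)
    sql, params = build_query("downloads", group_by, "2011-03-01", "2011-03-31")
    return conn.execute(sql, params).fetchall()
```

Changing only the `group_by` argument reuses the same query shape, which is what makes the form feel exploratory: the same downloads can be viewed per file or per user.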
The idea was for the interface to be exploratory. In the sketch below, the user changes the group-by parameter to get another perspective on a particular query.
I spent a good amount of time working on the project. However, much of that time went into exploring Protovis and writing server-side code to communicate via JSON. Now that I have a framework in place, I think it would be easier to add features and make a more refined version of this project.
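For readers unfamiliar with this kind of exchange, the following is a small sketch of what a JSON response for the chart might look like. The field names `label` and `hits` are assumptions chosen for illustration, and the project's real server-side code is not written in Python.

```python
import json


def rows_to_json(rows):
    """Serialize (label, hits) pairs as a JSON array of objects.

    The "label" and "hits" keys are hypothetical; any stable pair of
    names that the charting code agrees on would work.
    """
    payload = [{"label": label, "hits": hits} for label, hits in rows]
    return json.dumps(payload)


def json_to_rows(text):
    """Inverse of rows_to_json, as the client side would decode it."""
    return [(item["label"], item["hits"]) for item in json.loads(text)]
```

Keeping the payload as a flat list of objects means the same response can feed both the bar chart and a results table.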
Changes between concept and implementation
I added a few features. For example, you will see below that there is a table that populates in addition to the graphics. This came later, as a result of wanting to know which Web page is associated with a given piece of data. Additionally, I added a color frequency control, which you will find below. I also added the ability to change the limit on the number of records, which was not in my original sketch.
The first thing you will see is a form:
Once you submit the form, it populates a table that gives the name of each file or user in the grouping, along with a link in case the user wishes to explore which page on the website the data refers to.
A few considerations were made for accessibility. For example, you can bookmark the page because it uses the GET parameters to create the graphics. This makes it so you can send a graphic to a colleague or friend.
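To make the bookmarking idea concrete, here is a minimal sketch of encoding the form state into a URL's query string and reading it back. The base URL and the parameter names (`metric`, `group`, `limit`) are placeholders assumed for the example, not the ones the live page uses.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def bookmark_url(base, **state):
    """Encode the current form state as GET parameters on the base URL."""
    # Sorting the keys gives the same URL for the same state every time.
    return f"{base}?{urlencode(sorted(state.items()))}"


def state_from_url(url):
    """Recover the form state from a bookmarked URL (values are strings)."""
    query = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in query.items()}


# Hypothetical usage with placeholder names:
url = bookmark_url("http://example.com/stats", metric="downloads", group="file", limit=10)
```

Because the whole query lives in the URL, opening the link reproduces the same graphic without any server-side session.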
Graphics and Queries
The graph contains labels that are associated with the data. At the top of each bar is the value of the column data, that is, the number of hits or downloads for a page, file, or user. Depending on the data, you will see a black, vertically aligned label giving the name of the item. For example, in the figure below, you will see the label "Tornado : tornado.hip" when querying for downloads grouped by files. This means that the file "tornado.hip" on the Web page "Tornado" is the most downloaded.
By default, all queries are ordered by hits in descending order. The sort order cannot currently be changed, which is a limitation of the software, but it could easily be added as another option.
Some Controls and Filters
As mentioned above, filtering was an important option for being able to explore the data set. Controls are provided for data grouping, color, date, and number of records. Other variables could be added in future implementations.
After a few iterations, I realized it might be nice to change the color of the individual columns, so I added a color frequency control that the user can change to create different color schemes.
The user also has the ability to change the limit on the number of records. I capped this at 100 on the server side for security reasons.
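The write-up does not say how the color frequency is turned into colors. One common technique that fits the description is sine-wave color cycling, where each bar's index is multiplied by a frequency and fed to three phase-shifted sine waves. The sketch below shows that technique only as an assumed interpretation; the function name and constants are hypothetical.

```python
import math


def bar_colors(count, frequency):
    """Return one hex color per bar using phase-shifted sine waves.

    A small frequency gives a slow gradient across the bars; a large
    frequency makes neighboring bars differ sharply.
    """
    phases = (0.0, 2 * math.pi / 3, 4 * math.pi / 3)  # red, green, blue
    colors = []
    for index in range(count):
        # Map sin() from [-1, 1] into the 1..255 range of a color channel.
        channels = [
            round(math.sin(frequency * index + phase) * 127 + 128)
            for phase in phases
        ]
        colors.append("#{:02x}{:02x}{:02x}".format(*channels))
    return colors
```

With a frequency of 0 every bar gets the same color, which is a handy sanity check when wiring the control to the chart.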
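A server-side cap like this can be sketched as a small clamping function. The maximum of 100 comes from the text above; the default of 10 and the minimum of 1 are assumptions made for the example.

```python
MAX_RECORDS = 100     # cap stated in the write-up
DEFAULT_RECORDS = 10  # assumed fallback when the parameter is missing or invalid


def clamp_limit(raw):
    """Turn a raw GET parameter into a safe record limit between 1 and 100."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        # Missing or non-numeric input falls back to the default.
        return DEFAULT_RECORDS
    return max(1, min(value, MAX_RECORDS))
```

Clamping on the server matters because the limit arrives in the URL, so a user could otherwise request an arbitrarily large result set by editing it.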
Grouping was necessary because it can give you information about the same pieces of data from different perspectives. What downloads were most popular? What users downloaded the most files? Which pages were visited most and which users are visiting those pages?
Overall, it's been fun to look at the different data, although I think I would add more features. For one, I would want to inspect the data directly on the graph itself rather than through a form. Additionally, it would be nice to give the user the ability to remove outliers or to query for data on specific files.
You can view it live here: http://www.dan-lynch.com/visProj
The code is here (you will not be able to connect to the database, as the code must be run on the same server for security reasons): File:Dla3.zip