From CS294-10 Visualization Sp11
- Matt Can
In computer science, we often visualize relationships and structured objects with graphs. When we want to analyze several structures for their similarities and differences, we can reduce this to the problem of graph comparison. The goal of this project is to develop novel visualization techniques and accompanying algorithms to facilitate graph comparison.
To help motivate this problem, here are some examples of the kinds of objects we would like to compare:
- Social Networks - We take the social networks of two individuals (at some fixed depth) and want to know which groups of friends they share. Naturally, we encode these as graphs, which we want to compare.
- Recipes - There exist many recipes for chocolate chip cookies, and we want to know how these compare. One approach is to encode the recipes as graphs, where nodes represent ingredients and actions, and edges represent ingredient flows. This creates DAGs from the recipes, and now we want to compare them. This example generalizes from recipes to any process that can be represented by a DAG.
- Parse Trees - We want to visualize how syntactically similar two sentences are, so we compare their parse trees.
This project builds on previous work in dynamic graph visualization, the problem of visualizing the changes in graphs over time, as nodes and edges are added and removed. Dynamic graph visualization is related to graph comparison, and some techniques developed for it may transfer over. But with graph comparison, the focus is on highlighting shared substructure among two or more objects, not tracking the structural changes of a single object over time. If appropriate, we will evaluate our methods against those that exist for dynamic graph visualization.
Initial Problem Presentation
- Link to slides here
Brandon Liu - Apr 04, 2011 03:51:12 pm
It seems like there may be a lot of interesting related work in computational biology, for example, visualization of protein interactions
Julian Limon - Apr 04, 2011 07:52:31 pm
I like this idea a lot. However, I believe that the problem may be too hard to grasp if the task is too general. You may be able to narrow it down by choosing the most useful examples, such as cases where the labels are known and the graphs can be compared unambiguously. I particularly like the ability to use this for recipes. However, controlled vocabularies would be needed for both actions and ingredients to make sure you can make the best out of the data.
Michael Cohen - Apr 05, 2011 12:12:39 am
I really like this idea and have a couple of thoughts:
- I'm not sure you need a carefully controlled vocabulary to make good use of domain knowledge. For instance, in the recipe example, simply knowing that a box contains an ingredient gives you 100% certainty that it should match another ingredient, and a pretty high degree of confidence that ingredients with the same name are exact matches (the only exception I can think of is if one of the recipes uses the same ingredient more than once). By "fixing" the ingredient matches first, you can make the task of matching the more ambiguous action boxes more tractable. Note that this doesn't require limiting the content of the kinds of boxes.
- Brushing and linking would be a big help to users comparing graphs side-by-side.
Siamak Faridani - Apr 05, 2011 01:23:58 am
Great Idea. One possible domain for this problem might be commit trees for Git. I personally really struggle with Git's version controlling (although much better than SVN) so I am looking forward to using something like this for visualizing forks and merges in Git :)
I feel there are two aspects to this project,1- how you define diffs (algorithms) 2- how you represent them. I think finding an intuitive way of representing these diff graphs might be a very useful contribution to the field.
David Wong - Apr 05, 2011 02:10:34 am
For your project, I think it'd be much easier to tackle a more specific problem, say the analysis of social networks, than to go ahead and try to general problem as the way graphs change in one domain can be completely different from the way graphs change in another domain. With a narrower problem space, you can also cater the visualization to be more meaningful for that specific domain, (eg a genearl diff versus track changes in word). On another note, it can be interesting to try more than one "graph diff" method on your graphs and visually compare them.