From CS294-10 Visualization Fa08
Schema search visualization
Let's create two super great awesome visualizations for schema search!
Databases and other structured data (spreadsheets, HTML) have schemas: blueprints of the way data is laid out.
Schemr is a metadata repository and search engine for database schemas. The backend is a relational database with a nifty extensible data model. A web service exposes an API for schema manipulation. Schemr represents search terms as a "search graph"; in addition to keyword searches (like web search), it supports searching with a schema fragment. The search algorithm uses simple schema matching techniques to produce similarity matrices between the search graph and a target schema. In other words, Schemr produces vectors of similarity scores between the search graph elements and the repository's corpus of schemas.
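The kind of similarity matrix described above can be sketched roughly as follows. Schemr's actual matchers are not described on this page, so this sketch assumes a simple name-based matcher (Jaccard overlap of character trigrams); the class `NameSimilarity` and its methods are hypothetical names used only for illustration.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, not Schemr's actual matcher.
class NameSimilarity {

    // Break a name into lowercase character trigrams, e.g. "name" -> {"nam", "ame"}.
    static Set<String> trigrams(String name) {
        String s = name.toLowerCase(Locale.ROOT);
        Set<String> grams = new HashSet<>();
        if (s.length() < 3) {
            grams.add(s); // names shorter than three characters count as one gram
            return grams;
        }
        for (int i = 0; i + 3 <= s.length(); i++) {
            grams.add(s.substring(i, i + 3));
        }
        return grams;
    }

    // Jaccard similarity of the two trigram sets; always in [0, 1].
    static double similarity(String a, String b) {
        Set<String> ga = trigrams(a);
        Set<String> gb = trigrams(b);
        Set<String> common = new HashSet<>(ga);
        common.retainAll(gb);
        Set<String> union = new HashSet<>(ga);
        union.addAll(gb);
        return union.isEmpty() ? 0.0 : (double) common.size() / union.size();
    }

    // One row per repository schema element, one column per search graph element.
    static double[][] matrix(String[] schemaElements, String[] searchElements) {
        double[][] m = new double[schemaElements.length][searchElements.length];
        for (int r = 0; r < schemaElements.length; r++) {
            for (int c = 0; c < searchElements.length; c++) {
                m[r][c] = similarity(schemaElements[r], searchElements[c]);
            }
        }
        return m;
    }
}
```

Calling `NameSimilarity.matrix(new String[]{"course_name", "price"}, new String[]{"course", "name"})` gives a 2x2 table in which the `course_name` row scores higher against `course` than the `price` row does.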
Currently, Schemr search results visualize only a single repository schema at a time, with a few ad hoc visual encodings to show the quality of match. There is no interactivity.
The server is implemented as a Java web service and runs inside a J2EE app server, like Tomcat. The client is an Eclipse RCP application that uses SWT, Swing, and the Prefuse visualization toolkit.
This is the first component of an open source information integration project.
The repository has several hundred database schemas. These schemas come from domains such as real estate, automobiles, and e-commerce, and include both relational and XML schemas.
Search results are essentially tables: columns are search graph elements, rows are repository schema elements, and each cell value is a similarity score between 0 and 1.
- "Listing View" - Show listing of search results by page; N results on a page
- Show roughly the size and relevance of each result
- Allow drill-in to a particular schema
- "Schema view" - Show interactive correspondences between search schema and repository schema
- Allow selection of a region of schema elements
- Highlight matching elements
- Decrease focus on non-relevant elements.
- "Listing View" - Small multiples preview of search results.
- "Schema View" - Interactive visualization of schema elements and mappings
Schemr Visualization shows schema search results from a query composed of the Berkeley course catalog schema and a couple of keywords. The query results are cached locally and initially presented in a small-multiples view. The user can then interact with the results and optionally open a schema result in the detailed view.
One difference between the current application and what was specified above is the visualization of semantic correspondences between the query (schema + keywords) and the resulting schemas. So far I could not find a good way to visualize the mappings with lines. Instead, the visualization encodes correspondences as text color, shape size, and color.
The source code is available via svn co http://broccoli.cs.berkeley.edu/svn/broccoli/openii/VisClassSchemrClient/
The application for OS X can be downloaded here: http://www.eecs.berkeley.edu/~kuangc/projects/schemr-visclass.tar.gz (45 MB)
The Windows version can be downloaded here: http://www.eecs.berkeley.edu/~kuangc/projects/schemr-visclass.zip (39 MB) (warning: the Windows version may need some memory args tweaking...)
Schemr was a project I worked on at Google this summer. It requires that the application be written for the Eclipse rich client platform (RCP). One tricky part is that Eclipse requires its UI to use SWT, whereas the visualization package I chose to use was Prefuse, written on Swing.
A large part of the development time this time around went into figuring out the correct interaction model between the AWT event dispatch thread used by Swing and its counterpart in SWT, the UI thread. Figuring out when and where to dispatch to a different thread, whether synchronously or asynchronously, to prevent illegal thread access was challenging. If I had to do it again, I'd definitely not use two different GUI packages. Learning Prefuse, and in particular the interactions, took some time as well. I found Jeff's examples helpful, and hope there will be more documentation in the future.
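The thread-hopping pattern described above can be modeled without either toolkit. In the sketch below, two single-threaded executors stand in for the SWT UI thread and the Swing event dispatch thread; this is an assumption made so the example runs on its own. In the real application the hops would be `Display.asyncExec` / `Display.syncExec` on the SWT side and `SwingUtilities.invokeLater` / `SwingUtilities.invokeAndWait` on the Swing side. The class `UiThreadHop` and its methods are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Toolkit-free model: each executor stands in for one toolkit's UI thread.
class UiThreadHop {
    static final ExecutorService SWT_UI = single("swt-ui");
    static final ExecutorService SWING_EDT = single("swing-edt");

    static ExecutorService single(String name) {
        return Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, name);
            t.setDaemon(true); // do not keep the JVM alive for the demo
            return t;
        });
    }

    // Guard in the spirit of a toolkit's "illegal thread access" check.
    static void checkThread(String expected) {
        String actual = Thread.currentThread().getName();
        if (!actual.equals(expected)) {
            throw new IllegalStateException(
                "illegal thread access: on " + actual + ", expected " + expected);
        }
    }

    // An SWT-side event updates the Swing-side visualization, then reports
    // back to an SWT widget. Every hop is asynchronous, so neither UI thread
    // ever blocks waiting on the other.
    static CompletableFuture<String> selectSchema(String schemaName) {
        CompletableFuture<String> status = new CompletableFuture<>();
        SWT_UI.execute(() -> {
            checkThread("swt-ui");           // handling the SWT event
            SWING_EDT.execute(() -> {
                checkThread("swing-edt");    // safe to touch Swing components here
                String rendered = "rendered " + schemaName;
                SWT_UI.execute(() -> {
                    checkThread("swt-ui");   // safe to touch SWT widgets here
                    status.complete(rendered);
                });
            });
        });
        return status;
    }
}
```

Asynchronous hops are the safer default in this model: if each UI thread made a synchronous call into the other at the same time, both would wait forever.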
Since Schemr requires a Postgres DB and a Java application server (Tomcat), I decided to cache a sample search as serialized objects using Java Serialization. While this limits the scope of the demo to a single search, it decreases the work involved in setting up the system for evaluation: one download, rather than a long setup process. It also sped up development, as I didn't have to make round trips to the server.
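Caching a search this way might look roughly like the sketch below. The result type `CachedSearchResult` and the helper `SearchCache` are hypothetical; Schemr's real classes are not shown on this page. The only requirement Java Serialization imposes is that everything written implements `Serializable`.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical result type standing in for Schemr's real search result classes.
class CachedSearchResult implements Serializable {
    private static final long serialVersionUID = 1L;
    final String schemaName;
    final double[][] similarity; // rows: schema elements, columns: search graph elements

    CachedSearchResult(String schemaName, double[][] similarity) {
        this.schemaName = schemaName;
        this.similarity = similarity;
    }
}

class SearchCache {
    // Write the whole result list to one file; I/O failures become unchecked.
    static void save(Path file, List<CachedSearchResult> results) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(new ArrayList<>(results)); // ArrayList is Serializable
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read the list back; the client can then run without the server.
    @SuppressWarnings("unchecked")
    static List<CachedSearchResult> load(Path file) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (List<CachedSearchResult>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("cache was written by incompatible classes", e);
        }
    }
}
```

The trade-off is the one noted above: the cache file is tied to the classes that wrote it, so it suits a fixed demo better than a long-lived store.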
Schemr has two visualizations:
For the small-multiples view, I decided to use a node-link tree layout to show the graph structure of schema results. Within each view, the user can pan (left-click drag), zoom (right-click drag), center (right-click), and open the schema (right double-click). Next, I plan to present a different small-multiples view of bar charts quantifying the % of matching elements, the quality of match per matching element, and the size of the resulting schema. This alternative view would be both faster to render and give a different, more quantitative way to evaluate the match. I plan to add this soon.
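The three quantities for the planned bar-chart view could be derived from a result's similarity table along these lines. This is only a sketch under stated assumptions: a search graph element counts as "matching" when its best score in any row reaches a threshold, and the threshold value is something the caller picks. Neither rule comes from Schemr itself, and `MatchSummary` is a hypothetical name.

```java
// Hypothetical summary of one search result, computed from its similarity table
// (rows: repository schema elements, columns: search graph elements).
class MatchSummary {
    final double matchedFraction;  // share of search graph elements with a match
    final double meanMatchQuality; // mean best score over the matched elements
    final int schemaSize;          // number of elements in the result schema

    private MatchSummary(double matchedFraction, double meanMatchQuality, int schemaSize) {
        this.matchedFraction = matchedFraction;
        this.meanMatchQuality = meanMatchQuality;
        this.schemaSize = schemaSize;
    }

    static MatchSummary of(double[][] table, double threshold) {
        int rows = table.length;
        int cols = rows == 0 ? 0 : table[0].length;
        int matched = 0;
        double qualitySum = 0.0;
        for (int c = 0; c < cols; c++) {
            // Best score this search element achieves against any schema element.
            double best = 0.0;
            for (int r = 0; r < rows; r++) {
                best = Math.max(best, table[r][c]);
            }
            if (best >= threshold) {
                matched++;
                qualitySum += best;
            }
        }
        double fraction = cols == 0 ? 0.0 : (double) matched / cols;
        double quality = matched == 0 ? 0.0 : qualitySum / matched;
        return new MatchSummary(fraction, quality, rows);
    }
}
```

Each small multiple would then need only three bars per result, which is part of why such a view should render faster than a full tree layout.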
For the detailed schema view, I use the same node-link layout to show a bigger version of the schema. Correspondences are encoded with text color, and the quality of match is encoded as size. The type of schema element (schema title, element, attribute, or relationship) is encoded as node color.
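As a rough illustration of these encodings, the mapping from data to visual variables can be written as three small functions. The palette, the size range, and the names below are all assumptions made for this sketch; the actual application sets its encodings through Prefuse, which is not shown here.

```java
import java.awt.Color;

// Hypothetical encoding rules; the colors and size range are arbitrary choices.
class SchemaEncoding {
    enum ElementType { SCHEMA, ELEMENT, ATTRIBUTE, RELATIONSHIP }

    static final double MIN_SIZE = 1.0;
    static final double MAX_SIZE = 3.0;

    // Node color distinguishes the kind of schema element.
    static Color nodeColor(ElementType type) {
        switch (type) {
            case SCHEMA:       return new Color(31, 119, 180);
            case ELEMENT:      return new Color(44, 160, 44);
            case ATTRIBUTE:    return new Color(255, 127, 14);
            case RELATIONSHIP: return new Color(148, 103, 189);
            default:           return Color.GRAY;
        }
    }

    // Node size grows linearly with match quality; scores are clamped to [0, 1].
    static double nodeSize(double score) {
        double s = Math.max(0.0, Math.min(1.0, score));
        return MIN_SIZE + s * (MAX_SIZE - MIN_SIZE);
    }

    // Text color flags whether the element corresponds to something in the query.
    static Color textColor(boolean hasCorrespondence) {
        return hasCorrespondence ? Color.RED : Color.DARK_GRAY;
    }
}
```

Keeping the three channels separate (node color for type, size for quality, text color for correspondence) means no visual variable carries two meanings at once.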
One thing that I struggled with and didn't figure out in time is how to automatically find the centroid of the tree and zoom to accommodate it in the panel. There is an action that does the same; I plan to add this soon.
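Independent of Prefuse, the geometry behind such a zoom-to-fit step can be sketched with plain `java.awt.geom` types. Note that this sketch centers on the bounding box of the node positions rather than a true centroid, since fitting needs the extent of the tree anyway. `FitToPanel` and its parameters (including the margin) are hypothetical.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;

// Hypothetical helper: computes a view transform that fits all nodes in a panel.
class FitToPanel {

    // Smallest rectangle containing every node position (needs at least one node).
    static Rectangle2D bounds(Point2D[] nodes) {
        Rectangle2D box = new Rectangle2D.Double(nodes[0].getX(), nodes[0].getY(), 0, 0);
        for (Point2D p : nodes) {
            box.add(p); // grows the rectangle to include p
        }
        return box;
    }

    // Transform that maps the box center to the panel center and scales the box
    // uniformly so that it fits inside the panel minus a margin on each side.
    static AffineTransform fit(Rectangle2D box, double panelW, double panelH, double margin) {
        double availW = panelW - 2 * margin;
        double availH = panelH - 2 * margin;
        double sx = box.getWidth() > 0 ? availW / box.getWidth() : Double.POSITIVE_INFINITY;
        double sy = box.getHeight() > 0 ? availH / box.getHeight() : Double.POSITIVE_INFINITY;
        double scale = Math.min(sx, sy);
        if (Double.isInfinite(scale)) {
            scale = 1.0; // a single point has no extent to fit; keep the scale as is
        }
        AffineTransform t = new AffineTransform();
        // Applied to a point in reverse order: move the box center to the origin,
        // scale, then move the origin to the panel center.
        t.translate(panelW / 2, panelH / 2);
        t.scale(scale, scale);
        t.translate(-box.getCenterX(), -box.getCenterY());
        return t;
    }
}
```

Taking the smaller of the two scale factors preserves the aspect ratio, so the tree is never stretched to fill the panel.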