From CS 294-10 Visualization Sp10



The data domain I'll be using is database profiles. In my research under Joe Hellerstein I am building a tool that analyzes a database and creates a profile of information about it. The base element of the visualization will be a column of a table. Depending on the database from which the profile is extracted, there may be many such elements. Regardless of the number of elements, several features will be extracted from each column (and from pairs of columns), so the data will be high-dimensional. Much of the backend functionality for collecting this data is already written.


The interactive visualization will be a graph of nodes, representing columns, grouped by the table from which they originate. This preserves the structure of the database in the visualization, making it easier for users to relate what they see to the raw data. Each node will have 3 characteristics: size, left fill, and right fill. They will denote the data variability (how much the values differ from each other), unique value percentage, and null value percentage, respectively. This will make important characteristics of columns immediately visible.

There will be 2 kinds of edges in the graph: undirected edges will denote columns that are similar, either absolutely or by substring resemblance; directed edges will denote columns whose values are detected to be composites of 2 or more other columns' values. This will let users see relationships in their data that can be used to better administer the database. There will be a slider to set the similarity threshold above which these edges appear. This will allow finer control over what kind of information is revealed to the user: at a high threshold, connections reveal properties like foreign keys; at a low threshold, connections can reveal interesting anomalies or redundancies in data management.

When a single node is selected, details-on-demand will show a summary of information on that column. When multiple nodes are selected, details-on-demand will show the estimated strength and size of a join on those columns. Knowing this is a crucial step in creating queries.
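To make the per-column features and the edge threshold concrete, here is a minimal Python sketch. It is not the profiler's actual code: the function names, the sample columns, the choice of Jaccard overlap as the similarity score, and the definition of unique percentage as distinct non-null values over total rows are all assumptions made for illustration.

```python
from itertools import combinations


def profile_column(values):
    """Return unique and null percentages for one column (None marks a null).

    Assumption: unique percentage = distinct non-null values / total rows.
    """
    total = len(values)
    if total == 0:
        return {"unique_pct": 0.0, "null_pct": 0.0}
    non_null = [v for v in values if v is not None]
    return {
        "unique_pct": 100.0 * len(set(non_null)) / total,
        "null_pct": 100.0 * (total - len(non_null)) / total,
    }


def jaccard(a, b):
    """Assumed similarity score: overlap of the two columns' distinct values."""
    set_a = {v for v in a if v is not None}
    set_b = {v for v in b if v is not None}
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def similarity_edges(columns, threshold):
    """Return undirected edges (name_a, name_b, score) with score >= threshold."""
    edges = []
    for (name_a, vals_a), (name_b, vals_b) in combinations(sorted(columns.items()), 2):
        score = jaccard(vals_a, vals_b)
        if score >= threshold:
            edges.append((name_a, name_b, score))
    return edges


# Hypothetical sample data: column name -> list of values.
columns = {
    "orders.customer_id": [1, 2, 2, 3, None],
    "customers.id": [1, 2, 3, 4],
    "customers.name": ["Ann", "Bob", "Cy", "Dee"],
}
```

With this sample, a threshold of 0.5 keeps only the edge between `customers.id` and `orders.customer_id`, which is the kind of foreign-key-like link a high slider setting is meant to surface. Lowering the threshold admits weaker overlaps.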


Main view


Anatomy of a node


Join view

Final Submission

View online: here

Source Code: File:DBProfile.zip

The final visualization turned out fairly true to my original proposal, with only a couple of exceptions due to difficulty of implementation. Instead of being arranged in a matrix, the table areas are set up side by side. This was easier to implement and has the benefit of making it easy to spot tables whose columns share a name. Instead of bisecting the nodes and filling the halves based on null percentage and unique value percentage, I encoded the percentage of unique values in the column into the size of the node, and the percentage of null values into the color value (redder = more null values). The final difference is the lack of directed edges denoting composite fields, due to time constraints. I also added the option to filter out tables, at the professor's suggestion. I used sample data from my research project to try out the visualization. Future versions of the visualization will hook into the database profiler dynamically. I used Flare, and the biggest challenge in this assignment was hooking up all the interactions in the visualization, including the slider, which is implemented in JavaScript and hooked up to Flex.
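The final encoding (unique percentage to node size, null percentage to redness) can be sketched as two small mapping functions. This is a Python illustration, not the Flare code in the submission: the function names, the radius range, and the white-to-red color ramp are assumptions chosen only to show the idea.

```python
def node_radius(unique_pct, min_r=4.0, max_r=20.0):
    """Map unique-value percentage (0-100) linearly onto a node radius.

    min_r and max_r are hypothetical pixel sizes, not values from the project.
    """
    frac = max(0.0, min(100.0, unique_pct)) / 100.0
    return min_r + frac * (max_r - min_r)


def node_color(null_pct):
    """Map null percentage (0-100) onto a white-to-red hex color.

    Redder means more nulls: the green and blue channels fade as nulls increase.
    """
    frac = max(0.0, min(100.0, null_pct)) / 100.0
    fade = round(255 * (1.0 - frac))
    return "#ff{0:02x}{0:02x}".format(fade)
```

Under these assumptions, a column with no nulls is drawn white (`#ffffff`) and a column that is entirely null is drawn pure red (`#ff0000`), while out-of-range inputs are clamped so they cannot produce invalid sizes or colors.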
