Multidimensional Data Visualization
From CS294-10 Visualization Sp11
Class on Feb 7, 2011
- Polaris: A System for Query, Analysis and Visualization of Multi-dimensional Relational Databases. Stolte, Tang, and Hanrahan. IEEE Transactions on Visualization and Computer Graphics, 8(1), Jan 2002. (pdf)
- Multidimensional detective. A. Inselberg. Proc. IEEE InfoVis 1997. (pdf)
- Dynamic queries, starfield displays, and the path to Spotfire. Shneiderman. (html)
Jvoytek - Feb 07, 2011 01:32:09 pm
It seems to me that both the scenarios in the readings provide methods for filtering the dimensions available in a multi- or n-dimensional data set to a more manageable size. This is necessary because whenever I'm presented with a visualization of such complexity, I often think I'd need to know what I was looking at before it made any sense, in which case the value of the visualization lies in finding novel relationships between data points rather than in getting an overview. It seems to me that there is an intrinsic difference between the types of automatically generated n-dimensional displays in the papers and more crafted displays meant to "tell a story" rather than ask a question or allow knowledgeable practitioners to investigate a complex data set.
Dan - Feb 07, 2011 04:58:51 pm
The first paper by Inselberg discusses how to analyze data displayed with parallel coordinates. His analysis is strongest where he discusses "obtaining clues" in the data, seeing where different variations deviate or show patterns in clusters. He also mentions that interactivity in the software displaying the graphics is important for determining what a particular line in the graphic consists of; at first, there is no way to tell anything except global trends. In all, it was a good delineation of a process for finding relationships within multivariate datasets.
I think in general the display of n-dimensional data is a complex task, and the context can really affect how you should choose a visualization.
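To make the parallel-coordinates idea discussed above concrete, here is a minimal sketch of the mapping: each data row becomes one polyline with a vertex on every axis, and "brushing" a range on one axis picks out the rows whose lines pass through it. The function names (`to_polylines`, `brush`) and the sample rows are hypothetical, chosen for illustration only; this is not Inselberg's software.

```python
def to_polylines(rows):
    """Map each row to a polyline: one (axis_index, height) vertex per dimension.

    Heights are rescaled to [0, 1] per axis so that dimensions with
    different units can share the same vertical extent.
    """
    n_dims = len(rows[0])
    lows = [min(r[d] for r in rows) for d in range(n_dims)]
    highs = [max(r[d] for r in rows) for d in range(n_dims)]
    polylines = []
    for r in rows:
        line = []
        for d in range(n_dims):
            span = highs[d] - lows[d]
            # A constant column has no spread; place it mid-axis.
            height = (r[d] - lows[d]) / span if span else 0.5
            line.append((d, height))
        polylines.append(line)
    return polylines


def brush(rows, dim, lo, hi):
    """Return indices of rows whose value on axis `dim` lies in [lo, hi]."""
    return [i for i, r in enumerate(rows) if lo <= r[dim] <= hi]


# Hypothetical three-dimensional data: (yield, cost, defects).
rows = [
    (0.9, 120.0, 3),
    (0.5, 200.0, 7),
    (0.1, 160.0, 5),
]
polylines = to_polylines(rows)
selected = brush(rows, dim=0, lo=0.4, hi=1.0)  # rows with yield in [0.4, 1.0]
```

An interactive tool would redraw only the `selected` polylines in a highlight color, which is one way to find out what a particular line consists of.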
David Wong - Feb 07, 2011 07:26:40 pm
I found the Polaris paper to be particularly interesting. Their ability to link data that was selected from one perspective to another is an incredibly useful tool in data exploration. As most data tables are multidimensional, this system seems very useful in the everyday world. One aspect that I thought was missing from this paper was the ability to manipulate data and see the change, which was noted in the Inselberg paper. I think this would be a helpful addition to the interactivity of Polaris. If possible, I'd like to read more in the literature about interactive visualization systems that help users explore and spot trends in data.
One question I had about the Inselberg paper was that it seemed the analysis they performed when querying specific trends in the data could have been found through some sort of logistic regression (normal or regularized), with the other dimensions being the features and the quality and/or yield as the indicator variable. This way you can identify which features are contributing to the specific variable. While I realize that regression won't include any semantic knowledge of the data, it seems more rigorous and isn't subject to the idea that "you can't be unlucky all the time!", which isn't the most encouraging guarantee.
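As a rough illustration of the regression idea in the comment above, here is a minimal L2-regularized logistic regression fitted by plain gradient descent. Everything in it is an assumption made for the sketch: the two standardized features ("temperature" and "humidity"), the tiny balanced dataset, and the hyperparameters. It is not the analysis from the Inselberg paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, l2=0.1, lr=0.5, steps=500):
    """Fit weights and bias by batch gradient descent with an L2 penalty."""
    n, d = len(rows), len(rows[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this row under the current model.
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_b += err
            for j in range(d):
                grad_w[j] += err * x[j]
        # Average the data gradient, then add the L2 shrinkage term.
        weights = [w - lr * (g / n + l2 * w) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical standardized features: (temperature, humidity).
# The label (1 = high yield) depends only on temperature.
rows = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
labels = [0, 0, 1, 1]
weights, bias = fit_logistic(rows, labels)
```

In this toy setup the temperature weight ends up clearly positive while the humidity weight stays at zero, which is the kind of "which features contribute" reading the comment has in mind. On real data the weights would also need standardized inputs and some measure of uncertainty before being interpreted.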
boheekim - Feb 07, 2011 08:06:45 pm
I think these readings emphasize the point we've discussed in class, about how visualizations can reveal issues with your data. Inselberg stresses the importance of recognizing our own expectations of our data and testing our assumptions. I also really appreciated his "don't let the picture scare you" attitude in the beginning. I often find that the visualizations I end up needing to navigate are packed with data, which is good, but can make them frightening to approach.
Saung Li - Feb 07, 2011 09:17:07 pm
I like the whole idea of Polaris where it helps generate visualizations from given types of data. Since it knows what graphics are good for ordinal vs quantitative types of data, it can help people avoid using bad representations of the data. The huge emphasis on efficient database management allows for greater interactivity so the user can create the visualizations more effectively. It would be interesting to see Polaris support animations for data like growth over time, as this would help users understand the data through a presentation-style format. A supplemental project to Polaris could be some program that can automatically format data that can be fed into Polaris, along with automatic information retrieval. This, of course, seems pretty challenging since data can appear in all sorts of formats. Perhaps a new standard for interfaces for data can be created by the community to support such automation.
Manas Mittal - Feb 08, 2011 12:16:08 am
I like the mathematical formalism that is introduced by the Polaris paper. It adds to my model of thinking about visualizations. One thing that stood out in the Polaris work was the 'drilling down' and 'drilling up', and the view of visualization as data exploration rather than merely a search for output, i.e., the exploration journey is more important than the end point.
Maneesh mentioned that this paper was 'dense', and was one of the key papers in the last decade. I am not quite sure why - to me, it seems to be largely a paper discussing how this visualization toolkit is implemented, i.e., it's something a product manager in a company would define.
Michael Hsueh - Feb 08, 2011 12:23:27 am
I really liked Inselberg's examples of multidimensional exploration. By extending visualization methods to facilitate exploration, we are able to gain insights too complex to obtain through lone, static images. But the methodology goes much further by allowing computational models using hypersurfaces and bounding envelopes. This is significant, as it extends the reach of visualization to encompass not only analysis but also decision making. Computational models mean we can interactively explore the consequences of tweaking variables. Of course, we can already do this using statistics, but the visual component gives us much more power. Numerous types of complex problems involving decision making (e.g., optimizations) can be helped by this computational methodology.
I appreciated the 'detective' approach described by Inselberg. It shows that visualizations can be useful sometimes only through an iterative process. Rather than seeing the story all at once, we may need a series of visualizations to understand some subtle and important relationships in the data. Information from each individual visualization must be composited and used to refine further visualizations. Visualization then becomes a forward process in which parts of the 'story' are told incrementally.
Matthew Can - Feb 08, 2011 03:35:56 pm
I agree with Manas. I think a big contribution of Polaris is the table algebra, its tight integration with the relational database model, and its direct mapping onto the axes of the visual display. This is what allows users of Polaris to focus their attention on the data rather than the particulars of the visualization (that is, compare it to Excel).
Regarding David's comment about interactive data manipulation, Polaris does support data transformations (though I'm not quite sure what you had in mind). For example, the operations include data partitioning, aggregation (summation, average, etc), sorting, and filtering. I would agree though that Polaris only scratches the surface of what's possible with interaction techniques for data visualization. It provides just brushing and tooltips.
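For readers unfamiliar with the transformations listed above, here is a minimal sketch of filtering, partitioning, aggregation, and sorting chained together over a small table. The records, field names, and the `summarize` helper are hypothetical and only illustrate the kind of operations meant; Polaris itself issues these as database queries.

```python
from collections import defaultdict

# Hypothetical sales records.
records = [
    {"region": "West", "product": "Coffee", "sales": 120},
    {"region": "West", "product": "Tea", "sales": 80},
    {"region": "East", "product": "Coffee", "sales": 200},
    {"region": "East", "product": "Tea", "sales": 40},
    {"region": "East", "product": "Coffee", "sales": 60},
]


def summarize(records, key, measure, min_value=0):
    """Filter, partition by `key`, sum `measure`, and sort descending."""
    # Filter: keep only records at or above the threshold.
    kept = [r for r in records if r[measure] >= min_value]
    # Partition: group the measure values by the key field.
    groups = defaultdict(list)
    for r in kept:
        groups[r[key]].append(r[measure])
    # Aggregate: one total per group.
    totals = {k: sum(v) for k, v in groups.items()}
    # Sort: largest total first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


by_region = summarize(records, "region", "sales", min_value=50)
# by_region is [("East", 260), ("West", 200)]
```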
Sally Ahn - Feb 08, 2011 04:26:13 pm
Like Michael, I also enjoyed Inselberg's "detective" paradigm--and as most real datasets involve many dimensions, the "multidimensional detective." His paper underscores the need for interactivity in any visualization tool for complex data. He emphasizes evaluating such tools with "real and necessarily challenging datasets," and he does exactly that by rigorously describing the analysis process with the VLSI and economic datasets. One point I found interesting was that this process relies on the human to perform many iterations, which heavily involves visual cues. Can we design a system that can perform such iterations automatically? Inselberg touches upon this in the last part of the paper, where he describes "intelligent agents" that may "at least partially" perform such iterations. The difficulty of that task seems to be that, in Inselberg's words, "each multivariate dataset and problem has its own 'personality' requiring…variations in the discovery scenarios and calls." I would be interested to hear about such systems that may have been developed since the publication of this paper.
Krishna - Feb 08, 2011 04:30:10 pm
As much as I liked the parallel coordinates idea, I am not sure how it would work for cases when a subset of the dimensions should be viewed together on a single axis. For example, if I have 30 features that describe rows of processes and the last 15 of them are histogram bins, I am not sure how parallel coordinates would work. Specifically, I feel the flow of lines from one parallel axis to another impedes the use of this idea to depict dimensions that represent some form of aggregates, or dimensions which are best visualized using histograms or pie charts. Would collapsing these axes into a single axis work?
I liked the short writeup on dynamic queries; I think this is certainly a powerful technique for visualizing multidimensional data. There was an interesting note on how one such system allows the user to 'dynamically' change the axis. I am not sure how that would work, both technically and in terms of user experience; I need to see it to believe it.
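On the technical side, a dynamic query can be thought of as a set of range predicates, one per slider, that is re-evaluated whenever a slider moves; changing an axis is then just a change in which fields are projected to x and y. The sketch below assumes this simple model, and the film records, field names, and helper functions are all hypothetical.

```python
# Hypothetical film records for a starfield-style display.
films = [
    {"title": "A", "year": 1975, "length": 95, "rating": 7.1},
    {"title": "B", "year": 1984, "length": 120, "rating": 8.0},
    {"title": "C", "year": 1988, "length": 150, "rating": 6.5},
    {"title": "D", "year": 1995, "length": 100, "rating": 7.8},
]


def dynamic_query(items, ranges):
    """Keep items whose value lies inside every slider's (lo, hi) range."""
    return [
        it for it in items
        if all(lo <= it[field] <= hi for field, (lo, hi) in ranges.items())
    ]


def project(items, x_field, y_field):
    """Choose which two fields are drawn on the x and y axes."""
    return [(it[x_field], it[y_field]) for it in items]


sliders = {"year": (1970, 1990), "length": (90, 130)}
visible = dynamic_query(films, sliders)        # films A and B

sliders["year"] = (1980, 1990)                 # the user drags the year slider
visible_after = dynamic_query(films, sliders)  # only film B remains

points = project(visible_after, "year", "rating")      # original axes
repointed = project(visible_after, "length", "rating")  # x axis swapped
```

A real system has to make the re-filtering fast enough to keep up with slider movement, which this linear scan would not do for large datasets.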
Michael Cohen - Feb 08, 2011 06:29:15 pm
Having used Tableau (née Polaris) a bit now, I think it's useful but perhaps not quite as general-purpose as the article would lead us to believe. I've hit a bit of a wall in my A2 where I need to plot two different data sets on the same set of axes, and since they don't have a "relationship" in the relational sense, Tableau won't do it. I was excited to see the reference to "layers" in the article, but apparently that feature didn't make the transition from Polaris to Tableau.
My conclusion from this experience and from the article is that the decision to relate visualization strategies to database logic makes some otherwise-difficult tasks easier, but also makes some otherwise-straightforward tasks more difficult. If we're exploring a data set that conforms well to a relational model, Tableau can be handy, but it seems to have some pretty hard limits as complexity increases. Perhaps Tableau is to visualization as iMovie is to video. (If that sounds like a put-down, I don't mean it that way; being able to do the simple things quickly and easily is valuable.)
Michael Porath - Feb 08, 2011 11:36:39 pm
Coming back to Tableau, it reminded me why I liked using the tool, aside from the fact that it only runs on Windows. The process of using Tableau matches the process I use when creating data visualizations. You start off with the data, massage it, transform it, and then you visualize.
Although you can customize pretty much anything, the results you get when using Tableau are a little bit too generic. However, I'm inclined to kickstart new projects with it because it seems a great prototyping tool. Say you have a large data set; your final visualization can be very complex and potentially interactive. Using the software, you can quickly get an idea of what the data looks like, sift through the data, find outliers, and see correlations. In essence, Tableau is a prototyping tool.
Karl He - Feb 09, 2011 04:38:07 am
The key takeaway from this lecture and the readings seems to be that high-dimensional data is hard to interpret. Systems like Polaris play an interesting role, allowing you to easily generate visualizations that help you make sense of the data. Instead of trying to create a visualization based on your conclusions about the data, you can draw conclusions about the data from the visualizations.
The danger I see in this is seeing trends that only occurred by chance. There is definitely some level of sanity-checking that will be needed when working with tools like this, lest people start drawing correlations between global warming and pirates.
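The risk of chance trends can be demonstrated with pure noise. The sketch below, which assumes nothing beyond Python's standard library, compares one random "target" column against many unrelated random columns and reports the strongest correlation found. The sample size, column count, and seed are arbitrary choices for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


random.seed(0)  # fixed seed so the run is repeatable
n_samples, n_columns = 10, 200

# Every column is independent Gaussian noise; no real relationship exists.
target = [random.gauss(0, 1) for _ in range(n_samples)]
noise = [[random.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_columns)]

# Search all noise columns for the one that best "explains" the target.
strongest = max(abs(pearson(target, col)) for col in noise)
print(f"strongest correlation found in pure noise: {strongest:.2f}")
```

With so few samples and so many candidate columns, the strongest correlation is usually substantial even though every column is noise, which is why an exploratory tool that makes it easy to try many views also calls for some sanity-checking.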
Julian Limon - Feb 09, 2011 12:56:40 pm
The Polaris article describes in great detail the fundamental logic behind Tableau. After having used Tableau for a while, I appreciate the great effort that was put into the software to provide an (at least mathematically) optimal visualization for a given dataset. The very first visualization that appears after pressing "Show Me" provides a very good approximation that does the work. It may not be the most beautiful solution, but it displays all the data points in a way that ties the data to the best possible mark. However, I'm still not clear about the table algebra that we saw in class. I understand the operations, but I am not clear about its usage in the software. It would be good to see an example in class using the table algebra in Tableau.
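As a rough aid for the table algebra question above, here is a minimal sketch of the three operators from the Polaris paper applied to ordinal fields: concatenation (+), cross (×), and nest (/). It assumes the reading that each ordinal field contributes the ordered set of its values, and that nest keeps only the combinations that actually occur in the data. The records and helper names are hypothetical, and the sketch says nothing about how Tableau implements this internally.

```python
from itertools import product

# Hypothetical records with two ordinal fields.
records = [
    {"quarter": "Q1", "month": "Jan"},
    {"quarter": "Q1", "month": "Feb"},
    {"quarter": "Q2", "month": "Apr"},
]


def domain(records, field):
    """Ordered set of a field's values (first-seen order), as 1-tuples."""
    seen = []
    for r in records:
        if r[field] not in seen:
            seen.append(r[field])
    return [(v,) for v in seen]


def concat(a, b):
    """A + B: the entries of A followed by the entries of B."""
    return a + b


def cross(a, b):
    """A x B: every entry of A paired with every entry of B."""
    return [x + y for x, y in product(a, b)]


def nest(records, field_a, field_b):
    """A / B: like cross, but only pairs that occur together in some record."""
    existing = {(r[field_a], r[field_b]) for r in records}
    pairs = cross(domain(records, field_a), domain(records, field_b))
    return [p for p in pairs if p in existing]


quarters = domain(records, "quarter")
months = domain(records, "month")

side_by_side = concat(quarters, months)     # 5 headers: Q1, Q2, Jan, Feb, Apr
all_pairs = cross(quarters, months)         # 6 headers, including (Q2, Jan)
real_pairs = nest(records, "quarter", "month")  # 3 headers that exist in the data
```

Each resulting entry corresponds to one row or column of panes in the table, so the difference between cross and nest shows up as whether empty panes such as (Q2, Jan) are drawn.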
Brandon Liu - Feb 10, 2011 01:29:41 pm
In response to comments about Tableau: something I found interesting was that the Polaris system (and its current commercial incarnation) is targeted at business intelligence users, not computer programmers. Check out the tutorials at http://www.tableausoftware.com/learn/training. My question from a usability perspective is whether Tableau is more like 1) a more usable Excel, or 2) a more usable visualization tool.