From CS294-10 Visualization Sp11
- David Wong
Visual search interfaces let users browse and identify search results visually instead of scanning the ranked lists returned by standard search engines. Shneiderman's work on starfield displays for visual information seeking is an example of prior work in this area. Further work has addressed dynamic queries for visual information seeking and, more generally, visual search interfaces. More recent work has looked at visual search interfaces on the Internet, such as search engines for books and other items.
I want to investigate a visual search interface that projects items, such as news articles and/or wines, onto a starfield display where users can explore items using standard map navigation commands (pan, zoom). The projection will combine textual and quantitative data using a method called canonical correlation analysis (CCA). I plan to run a topic modeling analysis on the text of each item's description, featurize the text, and combine the resulting feature vectors with the quantitative data in CCA to yield an item's coordinates. Within the interface, a user can view topic clouds as well as query by text.
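The projection step described above can be sketched roughly as follows. This is only one possible reading of the plan, not the project's actual implementation. It assumes NumPy is available, uses a made-up `WINES` list, and swaps in a truncated SVD of word counts (latent semantic analysis) as a simple stand-in for the topic model. The helper names `text_features`, `orthonormal_basis`, and `cca_coordinates` are hypothetical. Each item's 2D position is taken to be its first two canonical variates on the text side.

```python
import numpy as np

# Hypothetical toy data: (description, rating, price in dollars).
WINES = [
    ("crisp citrus and green apple with bright acidity", 88.0, 14.0),
    ("ripe cherry and plum with soft tannins", 90.0, 22.0),
    ("dark berry oak and vanilla with firm tannins", 93.0, 45.0),
    ("light citrus and pear with crisp finish", 86.0, 11.0),
    ("plum cherry and spice with smooth finish", 89.0, 19.0),
    ("oak vanilla and dark chocolate with firm structure", 94.0, 60.0),
    ("green apple and pear with bright finish", 87.0, 13.0),
    ("dark berry and spice with soft oak", 91.0, 30.0),
]


def text_features(descriptions, n_topics=2):
    """Word counts reduced by truncated SVD (LSA), standing in for a topic model."""
    vocab = sorted({word for d in descriptions for word in d.split()})
    index = {word: j for j, word in enumerate(vocab)}
    counts = np.zeros((len(descriptions), len(vocab)))
    for i, d in enumerate(descriptions):
        for word in d.split():
            counts[i, index[word]] += 1.0
    centered = counts - counts.mean(axis=0)
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :n_topics] * s[:n_topics]


def orthonormal_basis(M, tol=1e-10):
    """Orthonormal basis for the column space of the mean-centered matrix M."""
    centered = M - M.mean(axis=0)
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, s > tol * s.max()]


def cca_coordinates(X, Y, n_components=2):
    """Return the X-side canonical variates and the canonical correlations."""
    Bx, By = orthonormal_basis(X), orthonormal_basis(Y)
    # Singular values of Bx^T By are the canonical correlations.
    U, s, _ = np.linalg.svd(Bx.T @ By, full_matrices=False)
    return Bx @ U[:, :n_components], s[:n_components]


descriptions = [d for d, _, _ in WINES]
quantitative = np.array([[rating, price] for _, rating, price in WINES])
coords, correlations = cca_coordinates(text_features(descriptions), quantitative)
```

Here `coords` has one row per wine and two columns, which could be used directly as x/y positions on the starfield. The Y-side variates, or an average of both sides, would be equally defensible choices.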
As for evaluation, I plan to compare click-through rate against standard list search interfaces. I plan to evaluate this on a dataset of wines, since wine descriptions share common features that topic modeling can capture well. Quantitative rating data for wines can also be obtained online.
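As a minimal sketch of the click-through comparison, assuming each session logs how many results were shown and how many were clicked: the `SESSIONS` records and the function name below are hypothetical, and the numbers are invented purely to show the arithmetic. They are not results.

```python
from collections import defaultdict

# Hypothetical session logs: (interface, results_shown, results_clicked).
SESSIONS = [
    ("starfield", 10, 3),
    ("starfield", 10, 2),
    ("list", 10, 1),
    ("list", 10, 2),
]


def click_through_rates(sessions):
    """Aggregate clicks / results shown for each interface."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for interface, n_shown, n_clicked in sessions:
        shown[interface] += n_shown
        clicked[interface] += n_clicked
    # Skip interfaces with nothing shown to avoid dividing by zero.
    return {name: clicked[name] / shown[name] for name in shown if shown[name] > 0}


rates = click_through_rates(SESSIONS)
```

With real logs, the per-interface rates would still need a significance test before drawing any conclusion.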
Initial Project Presentation
Jvoytek - Apr 04, 2011 04:05:25 pm
I wonder if there's a way to better associate the visuals you're using with the content you're searching for. For example the visualization looks like a field of stars which is pretty far removed from choosing wine. Maybe even placing the points on a white background would make the connection between the visualization and the object you're looking for more inline.
Brandon Liu - Apr 04, 2011 04:15:22 pm
A graphical search display is difficult to scale up with lots of results; an interesting design challenge would be how to show just a subset of results to explore, while still giving an idea of the overall density of the total result set.
Julian Limon - Apr 04, 2011 07:55:35 pm
I believe this is a very interesting problem to tackle, David! Multi-dimensional spaces produce cognitive overload and non-traditional categories might get lost in the long tail if there's no easy way to see them. I wonder whether you could compare the 2D visualization with traditional scatterplot matrices (using the most common wine categories) to provide the users with more tools to understand the space.
I'm particularly interested in the techniques you're planning to use to parse textual descriptions and project them into a visualization. I'll be looking forward to learning more about the solution you end up choosing and why it was chosen.
As I mentioned in class, I believe that one other datapoint you might include is the author or source of the description. By learning which sources are more trustworthy to the user, you might increase their weight in the final projection.
Michael Cohen - Apr 05, 2011 12:19:19 am
I think it would be helpful for some dimensions to (optionally) be hard filters rather than preferences, especially with a space as big as wines. For instance, maybe I know that I'm only willing to go for a cheap wine tonight, under $X/bottle. If I can completely block out more expensive wines, that both takes dead ends off the table, and potentially allows the two dimensions of the plot to represent more information that I do care about, because now the variance within price will be much smaller.
Siamak Faridani - Apr 05, 2011 01:31:05 am
I feel that for the wine application a recommendation system might be a better solution. You might also look at a DM method called t-SNE (t-distributed Stochastic Neighbor Embedding). It has much better clustering properties than PCA or CCA; for example, if two wines are in the same region, they are very similar. The only problem with t-SNE is that the proximity of two clusters has no meaning. For example, if two wines are far from each other in t-SNE, you cannot really claim that they are more different than two wines that are a little closer.
Sally Ahn - Apr 05, 2011 02:16:15 am
I was wondering about the visual variables used in this type of visualization. For example, what do the different colors represent, why do the sizes of the dots vary (it almost gives a depth-like dimension), what does the opacity signify, etc. There's quite a lot of different variables being used, which may be a visual overload, so I think conducting user tests is a great idea.
Matthew Can - Apr 05, 2011 02:56:44 am
Great problem you're addressing. Wine is a good domain for this, but I can imagine that the techniques you develop could be applicable in other areas as well (any product with multiple attributes, really).
Since the principal axes of the "wine space" don't correspond to any real world dimensions, I think the user evaluation of this visualization can provide useful insight on how people make sense of these kinds of plots that reduce multidimensional data onto the 2D plane. You might find a set of design principles for helping users create good mental models of these visualizations. I think the evaluation deserves as much thought as the other parts of this project.