From CS294-10 Visualization Sp11
- David Wong
Visual search interfaces let users browse and identify search results visually rather than scanning the ranked lists produced by standard search engines. Shneiderman's work on starfield displays for visual information seeking is an example of prior work in this area. Further work has since been done on dynamic queries for visual information seeking and, more generally, on visual search interfaces. More recent work has looked at visual search interfaces on the Internet, such as search engines for books and other items.
I want to investigate a visual search interface that projects items, such as news articles and/or wines, onto a starfield display where users can explore items using standard map navigation commands (pan, zoom). The projection will be computed from a combination of textual and quantitative data using a method called canonical correlation analysis (CCA). I plan to run a topic modeling analysis on the text of each item's description, featurize the text, and combine the resulting feature vectors with the quantitative data in CCA to yield an item's coordinates. Within the interface, a user can view topic clouds as well as query by text.
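The projection step can be sketched roughly as follows. This is a minimal illustration of regularized CCA using NumPy, not the project's actual implementation. The synthetic `topic_weights` matrix stands in for the output of a topic model, `quant_data` stands in for ratings and similar numeric fields, and the function names, the `reg` parameter, and the choice to use the text-side canonical variates as the 2D coordinates are all assumptions made for the example.

```python
import numpy as np

def inv_sqrt(m):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(m)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def cca_coordinates(text_feats, quant_feats, k=2, reg=1e-3):
    """Project items onto the top-k canonical directions of the text view.

    Returns (coords, correlations). `reg` is a small ridge term
    (an assumption) that keeps the covariance matrices invertible.
    """
    x = text_feats - text_feats.mean(axis=0)
    y = quant_feats - quant_feats.mean(axis=0)
    n = x.shape[0]
    cxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    cyy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    cxy = x.T @ y / (n - 1)
    wx, wy = inv_sqrt(cxx), inv_sqrt(cyy)
    # Singular values of the whitened cross-covariance are the
    # canonical correlations, sorted from largest to smallest.
    u, s, _ = np.linalg.svd(wx @ cxy @ wy, full_matrices=False)
    coords = x @ (wx @ u[:, :k])
    return coords, s[:k]

# Synthetic stand-in data: 200 items, 5 topic weights, 3 numeric fields.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
topic_weights = latent @ rng.normal(size=(2, 5)) + 0.5 * rng.normal(size=(200, 5))
quant_data = latent @ rng.normal(size=(2, 3)) + 0.5 * rng.normal(size=(200, 3))

coords, corrs = cca_coordinates(topic_weights, quant_data)
print(coords.shape, corrs)
```

Each row of `coords` would be one item's (x, y) position on the starfield. Whether to plot the text-side variates, the quantitative-side variates, or an average of the two is a design choice this sketch does not settle.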
As for evaluation, I plan to compare click-through rate against standard list search interfaces. I plan to evaluate this over a dataset of wines, as the descriptions of wines have common features that can be modeled well through topic modeling. Quantitative rating data for wines can also be obtained online.
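One way such a click-through comparison might be computed is sketched below. The counts are hypothetical placeholders, not results, and the two-proportion z-statistic is just one reasonable choice of test, not a method stated in the proposal.

```python
import math

def click_through_rate(clicks, impressions):
    """Fraction of displayed results that were clicked."""
    return clicks / impressions

def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """z-statistic for the difference between two click-through rates."""
    p_a = clicks_a / n_a
    p_b = clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / std_err

# Hypothetical counts for illustration only.
starfield_clicks, starfield_views = 120, 1000
list_clicks, list_views = 90, 1000

z = two_proportion_z(starfield_clicks, starfield_views, list_clicks, list_views)
print(click_through_rate(starfield_clicks, starfield_views),
      click_through_rate(list_clicks, list_views), z)
```

A |z| above roughly 1.96 would correspond to a difference that is significant at the 5% level under the usual normal approximation.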
Initial Project Presentation
Jvoytek - Apr 04, 2011 04:05:25 pm
I wonder if there's a way to better associate the visuals you're using with the content you're searching for. For example the visualization looks like a field of stars which is pretty far removed from choosing wine. Maybe even placing the points on a white background would make the connection between the visualization and the object you're looking for more inline.
Brandon Liu - Apr 04, 2011 04:15:22 pm
A graphical search display is difficult to scale up with lots of results; an interesting design challenge would be how to show just a subset of results to explore, while still giving an idea of the overall density of the total result set.
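One possible approach to this is to bin results into a grid, draw a single representative per cell, and attach the cell's count as a density cue. The sketch below assumes each result is an `(x, y, score)` tuple; the function name and the "highest score wins" rule are illustrative choices only.

```python
from collections import defaultdict

def summarize_by_grid(points, cell_size):
    """Group (x, y, score) points into square cells.

    For each non-empty cell, keep the highest-scoring point as the
    representative and record how many points fell in the cell.
    """
    cells = defaultdict(list)
    for x, y, score in points:
        key = (int(x // cell_size), int(y // cell_size))
        cells[key].append((x, y, score))
    summary = {}
    for key, members in cells.items():
        best = max(members, key=lambda p: p[2])
        summary[key] = {"representative": best, "count": len(members)}
    return summary

# Hypothetical projected results.
results = [(0.1, 0.1, 1.0), (0.2, 0.3, 5.0), (1.5, 0.2, 2.0)]
grid = summarize_by_grid(results, cell_size=1.0)
print(grid)
```

The count could then drive dot size or opacity, so the display shows a subset while still hinting at how many results sit behind each dot.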
Julian Limon - Apr 04, 2011 07:55:35 pm
I believe this is a very interesting problem to tackle, David! Multi-dimensional spaces produce cognitive overload and non-traditional categories might get lost in the long tail if there's no easy way to see them. I wonder whether you could compare the 2D visualization with traditional scatterplot matrices (using the most common wine categories) to provide the users with more tools to understand the space.
I'm particularly interested in the techniques you're planning to use to parse textual descriptions and project them into a visualization. I'll be looking forward to learning more about the solution you end up choosing and why it was chosen.
As I mentioned in class, I believe that one other datapoint you might include is the author or source of the description. By learning which sources are more trustworthy to the user, you might increase their weight in the final projection.
Michael Cohen - Apr 05, 2011 12:19:19 am
I think it would be helpful for some dimensions to (optionally) be hard filters rather than preferences, especially with a space as big as wines. For instance, maybe I know that I'm only willing to go for a cheap wine tonight, under $X/bottle. If I can completely block out more expensive wines, that both takes dead ends off the table, and potentially allows the two dimensions of the plot to represent more information that I do care about, because now the variance within price will be much smaller.
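The effect of a hard price filter on the remaining variance can be illustrated with a small sketch. The wine records and the $20 cutoff below are made up for the example.

```python
from statistics import pvariance

# Hypothetical wine records for illustration.
wines = [
    {"name": "A", "price": 12.0},
    {"name": "B", "price": 15.0},
    {"name": "C", "price": 18.0},
    {"name": "D", "price": 45.0},
    {"name": "E", "price": 90.0},
]

def hard_filter(items, max_price):
    """Drop every item above the price cap before any projection is run."""
    return [w for w in items if w["price"] <= max_price]

cheap = hard_filter(wines, max_price=20.0)
var_all = pvariance([w["price"] for w in wines])
var_cheap = pvariance([w["price"] for w in cheap])
print(len(cheap), var_all, var_cheap)
```

Because the filtered set has far less price variance, a variance-driven projection run on it has less reason to spend one of its two axes on price.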
Siamak Faridani - Apr 05, 2011 01:31:05 am
I feel that for the wine application a recommendation system might be a better solution. You might also look at a DM method called t-SNE (t-distributed Stochastic Neighbor Embedding). It has much better clustering properties than PCA or CCA; for example, if two wines are in the same region they are very similar. The only problem with t-SNE is that the proximity of two clusters has no meaning. For example, if two wines are far from each other in t-SNE, you cannot really claim that they are more different than two wines that are a little closer.
Sally Ahn - Apr 05, 2011 02:16:15 am
I was wondering about the visual variables used in this type of visualization. For example, what do the different colors represent, why do the sizes of the dots vary (it almost gives a depth-like dimension), what does the opacity signify, etc. There's quite a lot of different variables being used, which may be a visual overload, so I think conducting user tests is a great idea.
Matthew Can - Apr 05, 2011 02:56:44 am
Great problem you're addressing. Wine is a good domain for this, but I can imagine that the techniques you develop could be applicable in other areas as well (any product with multiple attributes, really).
Since the principal axes of the "wine space" don't correspond to any real world dimensions, I think the user evaluation of this visualization can provide useful insight on how people make sense of these kinds of plots that reduce multidimensional data onto the 2D plane. You might find a set of design principles for helping users create good mental models of these visualizations. I think the evaluation deserves as much thought as the other parts of this project.
Michael Hsueh - Apr 05, 2011 08:53:16 pm
I had the same thought as Michael Cohen regarding hard vs soft filters. Going the other direction, perhaps I'd like to keep the price of a bottle low... unless I find a wine that really matches perfectly all my other preferences. I think the flexibility of your visual variables could help in this regard. It would be interesting to see what can be done to visualize and distinguish results with "fuzzy" filters.
Manas Mittal - Apr 05, 2011 09:41:15 pm
I was intrigued by the axes along which you'll evaluate different wines. I remember you mentioned using some form of lexical analysis to find key words (actually, have you considered using TF/IDF?). In addition to that, once you get all the variables, you could use PCA or Fisher Linear Discriminant (FLD) techniques. FLD would potentially generate completely synthetic axes, which is perhaps a bad thing, but then, if people are able to perceive along those axes, we should redefine the terminology for wine quality along those axes. Indeed, this is how assessment theory works in general.
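For reference, a bare-bones TF/IDF weighting can be sketched in a few lines. This version uses whitespace tokenization and the plain `tf * log(N / df)` form, which is one of several common variants. The sample descriptions are invented.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf is the term's share of the document's tokens. idf is
    log(number of documents / number of documents containing the term).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Invented wine descriptions for illustration.
descriptions = [
    "dry red with cherry",
    "dry white with citrus",
    "dry red with oak",
]
weights = tf_idf(descriptions)
print(weights[0])
```

Words that appear in every description ("dry", "with") receive zero weight, while words unique to one description ("cherry") are weighted most heavily, which is what makes the scheme useful for picking out distinguishing key words.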
Saung Li - Apr 05, 2011 10:43:18 pm
The first step where users describe what they are looking for may need to be more specific. For example, some users may want to emphasize certain aspects such as price over the other qualities, so this should be taken into account. Also, there may be some people who don't exactly know their own preferences for some of the qualities, so recommendations could be provided.
Karl He - Apr 06, 2011 03:16:35 am
I like that you already have a good idea of what your visualization will end up like. It may be a better idea to have discrete points instead of sliders, e.g. 1-5 stars for acidity and just a thumbs up or thumbs down for whether you liked the wine. This would make it easier to deal with the incoming data, as well as make the users less indecisive.
Dan - Apr 06, 2011 12:20:51 pm
Very cool idea. Analyzing text, and looking at featurization functions. I wonder though, will people who are interested in Wine use an interface this complex? I wonder if there is a way to use another type of visualization. However, this could just be an arbitrary data set that could work with this as a general visualization platform.
Michael Porath - Apr 07, 2011 04:04:35 pm
As a follow-up to my comment in class, here are some of my thoughts:
- You might want to isolate a few dimensions and provide filtering and/or brushing and linking with your 2D graph. For example, people would probably want to filter by type of wine (white, red, rosé) or price rather than having those included in the dimensionality reduction. I mostly see the qualitative dimensions (like tannins, acidity, dryness etc.) as candidates for the dimensionality reduction, while the more quantitative/ordinal ones could act as filters (again, price, red/white etc.).