Text Visualization

From CS294-10 Visualization Sp11


Lecture on April 18, 2011

Slides


Readings

  • Information Visualization for Search Interfaces, Marti Hearst, Search User Interfaces, Chapter 10. (html)
  • Information Visualization for Text Analysis, Marti Hearst, Search User Interfaces, Chapter 11. (html)
  • Mapping Text with Phrase Nets. Frank van Ham, Martin Wattenberg, Fernanda B. Viégas. IEEE InfoVis 2009. (pdf)

Julian Limon - Apr 18, 2011 11:50:03 am

Hearst makes a very important distinction between visualizations to aid search and visualizations to aid text analysis. She argues that complex visualizations usually do not perform better than text-based faceted search. For a regular user only looking for a certain piece of information (i.e. search is a means and not an end in itself), the omnipresent document list with highlighting might be more than enough. However, I would be curious to know if text-sized graphs like Tufte's sparklines could be implemented to augment search.

On the other hand, visualization can be very helpful for text analysis tasks. Those users will use visualization techniques to explore a corpus and make sense of the data. They need to see the forest and the trees and may rely heavily on techniques like brushing and linking to discover patterns and infer relations. Before reading Hearst's chapters, I had only been exposed to word clouds or network diagrams as visualization techniques for text analysis. Nevertheless, Hearst presented a number of different techniques that have been explored and have yielded good results. Visualizations such as DocuBurst, PaperLens and SeeSoft may provide inspiration for more adventurous visualizations in text analysis.

Krishna - Apr 19, 2011 01:40:16 pm

It looks like most of the text visualization techniques are motivated by summary statistics, such as word counts, or short-term relations and concordances between words, phrases, etc. There seems to be no push towards techniques that try to 'visually narrate' the contents of a text. For example, the table of contents of a text does that, though naively. Even if they are not precise, current computational linguistic techniques can at least find approximate topical relations in text. The question is how to visualize such narrative hierarchies and topical structures in text. As an example, imagine reading a 1000-page tome: a narrative visualization could help users select sections of the text and read them in a non-serial way. A naive approach would be to improvise on top of the table of contents by visually emphasizing how past readers of the text have read the book.

On a side note, check out http://www.sccs.swarthmore.edu/users/08/ajb/tmve/wiki100k/browse/topic-list.html - it is a faceted browser where the facets are derived from a topic modeling algorithm (LDA).

Brandon Liu - Apr 19, 2011 02:10:26 pm

Related to the Wikipedia edit histories in lecture and the citation visualizations in the readings: I found a great visualization of Wikipedia deletion discussions: http://notabilia.net/ . It's not exactly 'text visualization' since I think the votes for/against are given as data (and not extracted from the text) but it is an interesting method of showing evolving opinions in a discussion.

Saung Li - Apr 19, 2011 03:35:27 pm

I agree with Krishna that most of the research in text visualization has been focusing on simple attributes such as word count and relationships such as "X and Y". It would be interesting to see how visualization can be used to summarize the actual content of the text, and to see more complicated relationships between words. I really liked the discussion on tag clouds in class. My first encounter with tag clouds was on Facebook, which showed the words I used most often in my status updates. These look good aesthetically and let me quickly get the gist of what words I use often. However, after getting the gist, it becomes more and more difficult to analyze the rest of the words as they become smaller. It then becomes a matter of trying to analyze the visualization instead of analyzing the text, thereby defeating its purpose of making the text easier to understand. Thus, I think it is great for something like the Facebook application, but if the user really wants to analyze something then tag clouds aren't the way to go.

Dan - Apr 19, 2011 03:59:25 pm

Visualizing the search results as starfields and clusters was interesting... reminded me of Opinion Space. The tag clouds are interesting looking, but it's hard to get anything useful from them. I think in terms of webpages where there are categories/tags displayed, it can be interesting to randomly stumble into topics of interest, but overall there isn't much use if you know what you are looking for. In terms of text visualization, I think it's important to pull out context whenever possible, which means that natural language processing and visualization should meet in the context of text visualization. In many cases a word is used in many different ways, and I think ignoring that could create inconsistent results.

Siamak Faridani - Apr 19, 2011 11:19:55 pm

I didn't know the entire Marti Hearst book is available online for free. What interested me the most was seeing systems that combined many of the earlier HCI elements; for example, the BETA interface uses the TileBar elements in addition to bar charts and other traditional elements. It is also interesting that all of these different models for search have failed as general web search interfaces. In the first chapter she mentions that a user interface should be simple, and then in the tenth chapter she goes on to explain these complicated interfaces, many of which are not that intuitive. I personally did not like the fact that the impact of Wordle was downplayed a little. Wordles may not be the most informative visualizations, but I really believe they are the most intuitive. One problem with Wordle visualizations is that they are not interactive. You cannot click on a Wordle and go to the documents (it violates the first principle of text visualization), although other word clouds do allow that. The ones that are on WordPress blogs are typically very useful (at least to me).

As for phrase nets, I am wondering for what texts we can use it and how much overhead it has for the person who develops the visualization.

Michael Hsueh - Apr 19, 2011 11:45:39 pm

I like Krishna's point about visualizing higher level relationships in text. Extracting these relationships is definitely an interesting challenge that might draw on NLP techniques. Reading the Hearst articles, I sensed a distinction between search and exploration-oriented tasks. We see that many text visualizations resort to concordances for lack of meaningful ways to visualize large amounts of nominal data. Many of the comments in class about the weaknesses of tag clouds were based on the fact that search tasks are difficult with them. Hearst's survey revealed this to generally be the case for concordance maps. Other impressive visualizations such as DocuBurst, while offering good summary views of data, suffer from the same shortcomings. The distinction prompts us to ask ourselves whether a concordance map or even something as simple as a list of word counts is most useful for the task at hand, which can be either exploratory or search oriented in nature. I personally think that one important reason for the popularity of tag clouds is simply their aesthetic appeal and simplicity (someone seeing a tag cloud for the first time probably won't need much time or instruction to figure out how to read it). This probably has as much to do with their success as any particular merit based on visualization effectiveness. I've seen people include Wordle printouts on t-shirts and in notebooks. Bar charts, not so much.

Michael Cohen - Apr 20, 2011 12:45:57 am

I'm skeptical of the value of the "Partisan Words" visualizations. It's not clear to me what story the spatial layout of the dots is trying to tell, especially for the large numbers of dots for which there's no room for a label. The list of words at right is actually clearer and more usable (for instance, it's clear that pro-choicers use "woman" whereas pro-lifers use "mother", which gives you at least some insight into the rhetoric). Laying out the dots horizontally by overall frequency doesn't seem to add any useful information (since overall frequency is highly correlated to partisan-ness anyway) and takes up space that could be better used to show other information. For instance, there could be two lists with lines drawn between words that are semantically related, like "mother" and "woman" or "trade" and "wage".

Manas Mittal - Apr 20, 2011 02:11:02 am

I found the van Ham et al. paper quite interesting in that it shows how a rather trivial analysis mechanism (orthographic) can supplant a more complex one (syntactic) without losing much. I also found the related work section quite interesting.
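To make the orthographic idea concrete, here is a minimal sketch of how "X and Y" style edges could be pulled out of raw text with a regular expression. This is only an illustration under my own assumptions, not the implementation from the paper; the function name `phrase_net_edges` and the sample text are hypothetical.

```python
import re
from collections import Counter

def phrase_net_edges(text, connector="and"):
    """Count (X, Y) word pairs that match the pattern 'X <connector> Y'.

    Hypothetical helper for illustration only. The lookahead lets a word
    act as Y in one match and as X in the next ("a and b and c").
    """
    pattern = re.compile(
        r"\b(\w+)\s+" + re.escape(connector) + r"\s+(?=(\w+)\b)",
        re.IGNORECASE,
    )
    edges = Counter()
    for match in pattern.finditer(text):
        # Lowercase both words so "Pride" and "pride" map to the same node.
        edges[(match.group(1).lower(), match.group(2).lower())] += 1
    return edges

sample = "Pride and Prejudice. Sense and Sensibility. pride and prejudice and zombies."
edges = phrase_net_edges(sample)
for (x, y), weight in edges.most_common():
    print(f"{x} -> {y} (weight {weight})")
```

Each counted pair would become a directed edge in the net, with the count as the edge weight. No parser is needed, which is the appeal of the orthographic approach, at the cost of matching some pairs that are not really related.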

With regard to Krishna's point about semantic meaning extraction, as mentioned, it is a hard linguistics problem. I like the idea of showing what the users read first etc., and I can imagine this would be useful even in cases like web pages (for example, what are the most read articles in the New York Times? How would we encode that information?).

I have also been thinking about these things both in the context of email ideas and also for my visualization. What kind of information would you like to visualize about your spending data patterns? Is the location important? Can we do that visualization? What kinds of strings are important? Can we show patterns (do people who go to coffee shops also go to Costco)? This kind of exploration is interesting and useful.

Karl He - Apr 20, 2011 06:18:49 am

It appears that the textual visualizations which are the most effective are the ones that are very specific. The visualization regarding presidents mentioning other presidents is a good example: it is very specific about what it will accomplish and is very easy to understand. This goes back to Krishna's point that most visualizations do not "narrate" the text. While they can convey limited information, there doesn't appear to be a good way to visualize any text as a whole.

Matthew Can - Apr 21, 2011 08:01:34 pm

I'm excited by some of the future work that's possible with phrase nets, especially with interactive visualization. For example, it would be useful if a user could click on a word in the phrase net and have the system generate suggestions for queries centered on that word (perhaps based on the most common terms that co-occur). The user could select one of those queries and the system would generate a new phrase net. And in addition to lexical and syntactic relationships, one could imagine a system that supports more abstract relationships. Here's an example. Suppose I run the phrase net on text from a political blog, and I want to learn some things about Obama. I could type in queries that contain the word "Obama", but many times in the blog post, the author will reference Obama with pronouns. Words like "he" and "him" will be referring to Obama. The phrase net would be much stronger if it could employ some kind of coreference resolution to be able to make such relationships.
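As a rough sketch of the query suggestion idea, the code below ranks the terms that most often share a sentence with the clicked word. It is only an illustration under my own assumptions: the function name `cooccurring_terms` and the sample sentences are hypothetical, there is no stop word filtering, and it does nothing about the pronoun problem, which would need a real coreference resolver.

```python
import re
from collections import Counter

def cooccurring_terms(sentences, word, top_n=3):
    """Return up to top_n terms that most often share a sentence with `word`.

    Hypothetical helper for illustration only. Each term is counted at
    most once per sentence, and ties are returned in arbitrary order.
    """
    word = word.lower()
    counts = Counter()
    for sentence in sentences:
        tokens = set(re.findall(r"\w+", sentence.lower()))
        if word in tokens:
            counts.update(tokens - {word})
    return [term for term, _ in counts.most_common(top_n)]

sentences = [
    "Obama signed the bill",
    "Obama praised the bill",
    "Congress passed a budget",
]
suggestions = cooccurring_terms(sentences, "Obama", top_n=2)
print(suggestions)
```

A real system would probably drop stop words such as "the" before ranking, and each suggested term could then seed a new phrase net query.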

Sally Ahn - Apr 21, 2011 09:04:48 pm

The examples in Hearst's book suggest that exploration/analysis of text is better suited for visualization than search tasks. Hearst writes that the nominal nature of textual data creates a great challenge in visualizing text, and many of the examples work around this by associating the text with quantitative attributes, such as frequency and dates. Since these quantitative attributes do not reveal the semantic information of the text itself, we see what Krishna observed: a lack of text visualizations that can "narrate" the content of the text. Text "content" is enormously complex and dense data to tackle, which may explain why text visualizations applied to search seem to be less successful than those applied to exploration and analysis. In the former, the user is ultimately interested in the semantic meaning of the text in its entirety, so the search interface must allow access to as much information about the text data as possible. On the other hand, exploration and analysis tools can have more specific goals that do not require presenting all of the text's semantic information (such as the Naming Names example, where only the debate participants' names are needed), and so the visualization can easily reduce the density of textual data. I think Phrase Nets addresses this challenge by providing an overview of text in a map that suggests its content while also reducing the data density of the entire text. It would be interesting to see how this approach can be taken further to provide a more comprehensive view of the visualized text.


