Tag Clouds

From CS294-10 Visualization Fa07

Lecture on Oct 10, 2007


Here's a different version of Marti's talk about how to add facets to tags: http://www.ischool.berkeley.edu/~hearst/talks/tags07.ppt


Readings

  • Chapter 10: Modern Information Retrieval, Hearst. Focus on sections 4-7. (html)

Brian Gawalt - Oct 10, 2007 01:59:01 pm

One thing that we didn't get a chance to address was the number of cycles involved in producing a tag cloud. It may be a lousy visualization, but is it really so bad when viewed as a function of the effort it takes to render one? It seems like naive tag-clouding is several orders of magnitude quicker than executing, say, RawSugar's clustering algorithm.
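
To put a rough shape on the cost argument: a naive tag cloud is essentially one counting pass over the tags plus a sort. The following is only a minimal sketch in Python. The function name `tag_cloud_sizes`, the point sizes, and the choice of log scaling are assumptions made for illustration, not anything shown in lecture.

```python
from collections import Counter
import math


def tag_cloud_sizes(tags, min_pt=10, max_pt=36):
    """Map each tag to a font size (in points) based on its raw frequency.

    `tags` is a flat list with one entry per tag assignment.
    The point range and log scaling are illustrative assumptions.
    """
    counts = Counter(tags)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    sizes = {}
    for tag, n in counts.items():
        if hi == lo:
            # All tags are equally frequent, so give them all the largest size.
            weight = 1.0
        else:
            # Log scaling keeps a few very popular tags from dwarfing the rest.
            weight = (math.log(n) - math.log(lo)) / (math.log(hi) - math.log(lo))
        sizes[tag] = round(min_pt + weight * (max_pt - min_pt))
    # Alphabetical order, the usual tag cloud layout.
    return dict(sorted(sizes.items()))


if __name__ == "__main__":
    sample = ["python"] * 8 + ["ruby"] * 2 + ["cfml"]
    print(tag_cloud_sizes(sample))
```

The work here is linear in the number of tag assignments (plus a sort over the distinct tags), which is the sense in which the naive approach is cheap compared with clustering.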

Hannes Hesse - Oct 10, 2007 04:11:09 pm

I have a few, disparate thoughts on some of the topics we touched on today:

  • One major function of tag clouds appears to be to provide quick access to a summary of what the blog, site, whatever, is about. In the case of blogs, tags often at least partly reflect the words used in the blog posts themselves. In fact, some blogging engines provide tag recommenders: After writing a blog post, the author can click a button and see a list of recommended tags based on some text analysis (Yahoo provides a tag recommendation service). It would be interesting to compare manually assigned tags on blogs and individual blog posts with automatically computed text summaries. I would argue that the value of assigning tags decreases the smaller the difference is (because machines could apply these tags automatically, or a tag cloud on any given corpus could be computed on the fly, like Amazon's 'concordance' cloud).

  • Tag clouds would be more useful if they were faceted. In the case of Flickr, meaningful facets could be 'places', 'events', 'people', 'colors' and so on. Many of the existing tags could probably be automatically classified into these facets by using dictionaries and ontologies like WordNet. Tag clouds could then be augmented to support more interactive browsing along these facets. This could include highlighting, grouping by, or hiding individual facets. Further value could be added by making some of these facets hierarchical. Place tags would profit from this (the tag 'sanfrancisco' would then be part of 'california'). Again, in the case of place names, some of this could be automated. Ambiguities like 'broadway' could be resolved by looking at other tags for the item in question (Is 'newyork' one of them?) or looking at the frequency distributions of this tag (most photos tagged 'broadway' are, in fact, about New York City).
  • If there is value in the co-occurrence patterns of pairs or tuples of tags, users should be able to explore these. Using spatial proximity to encode co-occurrence is difficult because for each tag, there are generally many other co-occurring tags. A network visualization seems more suitable. Building this into a tag cloud could be achieved, for example, by drawing weighted links between co-occurring tags in a cloud when hovering over a tag. As a side effect, unwanted artifacts like the high co-occurrence of 'new' and 'york' could be discovered and dealt with.
  • A tag cloud that summarizes an entire blog (or any other collection of documents) typically encodes the total term frequencies by font size or color. But currently, detail pages for individual items or blog posts do not make use of this (they typically just display all tags for this document in a list). It could be useful to encode the inverse document frequency of each tag in a document by size (IDF = 1/number of documents that are tagged with this tag). This would provide a sense of whether this tag is rare or widely used in the collection (in the example from today's lecture, the tag 'cfml' would be in very small print because it is widely used in the blog). Generally, terms that appear in large print in the blog-wide tag cloud would appear in small print in the clouds for individual blog posts.
  • Using existing ontologies, meta-tag clouds could be generated from tag clouds. For example, a tag cloud that contains many names of countries and programming languages could be dumbed down this way by aggregating the individual country tags to one big tag 'countries'.
  • It was brought up that tag clouds do not encode change at all. There are several ways to do this, with varying degrees of interactivity: A simple way would be to compute the relative growth for each tag (relative growth = times this tag was used in the past x days / mean of the usage of all tags in the past x days). More interactively, the number of times a tag was used in the past x days could be encoded with font size, and x could be manipulated with a slider (or two markers on a timeline). Both require that the items are tagged with a timestamp.
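
The two formulas in the list above (IDF = 1/number of documents tagged with the tag, and relative growth = uses in the past x days / mean usage of all tags in the past x days) can be sketched in Python roughly as follows. The function names and the input shapes are assumptions for illustration. In particular, the mean here is taken only over tags that were used at least once in the window, which is one possible reading of "all tags".

```python
from collections import Counter
from datetime import datetime, timedelta


def inverse_document_frequency(posts):
    """Return {tag: 1 / number of posts carrying that tag}.

    `posts` is a list of tag collections, one per post (assumed input shape).
    """
    doc_freq = Counter(tag for tags in posts for tag in set(tags))
    return {tag: 1.0 / n for tag, n in doc_freq.items()}


def relative_growth(events, now, days):
    """Return {tag: uses in the window / mean uses per tag in the window}.

    `events` is a list of (tag, timestamp) pairs, so every tag assignment
    needs a timestamp. Tags with no uses in the window are left out, and
    the mean is taken over the remaining tags only (an assumption).
    """
    cutoff = now - timedelta(days=days)
    recent = Counter(tag for tag, ts in events if ts >= cutoff)
    if not recent:
        return {}
    mean = sum(recent.values()) / len(recent)
    return {tag: n / mean for tag, n in recent.items()}


if __name__ == "__main__":
    posts = [{"cfml", "flex"}, {"cfml"}, {"cfml", "ajax"}, {"cfml"}]
    print(inverse_document_frequency(posts))

    now = datetime(2007, 10, 10)
    events = [
        ("a", now - timedelta(days=1)),
        ("a", now - timedelta(days=2)),
        ("b", now - timedelta(days=1)),
        ("b", now - timedelta(days=40)),
    ]
    print(relative_growth(events, now, days=7))
```

A slider over x, as suggested above, would amount to calling `relative_growth` again with a different `days` value and re-rendering the font sizes.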

Karen Hsu - Oct 10, 2007 08:06:00 pm

Tag recommendation can increase convergence and improve overall tag quality. As on del.icio.us, recommendations for tags are based on those that other users have previously used. Also interesting to consider are tags generated by other methods; I recently came across this textual content-based tag generator that doesn't seem very helpful from the small sample of websites that I tested (it really only worked well with news articles), but it has potential. You can imagine a more sophisticated algorithm (than simply using, say, term frequency) that can more accurately auto-generate tags.

I also really like the idea of faceted tag clouds. Hannes gives a great example of their usefulness for place tags, and I think it really plays into Shneiderman's visual information-seeking mantra of "overview first, zoom and filter, then details-on-demand." For more information on how to add facets to tags, Prof. Marti Hearst has a different version of her talk available here.

Maneesh Agrawala - Oct 12, 2007 06:42:44 am

Hannes - Interesting thoughts. Marti wanted me to tell you and the class that a different version of her talk is available at:

http://www.ischool.berkeley.edu/~hearst/talks/tags07.ppt

It talks about how to add facets to tags.

Omar - Oct 16, 2007 10:55:06 pm

marti talked about tag clouds showing the trend, with two senses of trend: what's hot now and how things have changed over time. i don't think tag clouds do a good job on either front. in the first case, you don't know that what you're seeing is hot now unless the tag cloud is only for the last month, or some distinguishable unit of time. often, the time frame of a tag cloud is not indicated.

for the second part (how things are changing over time) you need to rely on an individual's memory. they need to remember how the tag cloud was, and how it is now. it'd be neat to see a tag cloud that integrates, using other standard visualization dimensions, changes in the cloud over time.

Robin Held - Oct 16, 2007 11:15:05 pm

During lecture, the notion of tag redundancy came up. We briefly talked about how to deal with multiple tags that mean the same thing, but are spelled or abbreviated slightly differently. A potential problem from such redundant tags is that users searching for certain content may use a specific tag and only get a small subset of what's really available. It seems like Quicken has a simple way to combat the issue. New entries into the register usually include the date and amount of purchase, as well as the payee and category (groceries, dining, etc). One typically ends up with a few payees that are used many times per year, and it's useful to be able to summarize how many transactions are performed with each of them while budgeting. However, if the payee name is entered differently even by one character, the system considers them to be completely different individuals/businesses. Quicken's auto-complete mechanism fights the issue by automatically filling in the rest of the payee field as one begins to type in letters. One can then quickly see how one previously spelled the payee name, and use it again. This should be familiar to anyone who has seen auto-fill at work in a web browser. Quicken also allows one to manually manage the list of payee names, so spelling mistakes or rare payees can be removed. It seems like such a system would also be useful for tagging. After one uploads a file or links a page and begins to enter a tag, the tags assigned by other users could show up to provide suggestions and hopefully keep redundancy to a minimum.
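
The auto-complete idea above could look something like the following sketch in Python. It does not reflect how Quicken or any tagging site actually implements it. The function name `suggest_tags`, the case folding, and the "most-used first" ranking are all assumptions for illustration.

```python
from collections import Counter


def suggest_tags(prefix, existing_tags, limit=5):
    """Return the most-used existing tags that start with what was typed so far.

    `existing_tags` is a flat list with one entry per prior tag assignment,
    for example everything other users have already applied to the item.
    """
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    counts = Counter(tag.lower() for tag in existing_tags)
    matches = [tag for tag in counts if tag.startswith(prefix)]
    # Most popular first; ties are broken alphabetically so output is stable.
    matches.sort(key=lambda tag: (-counts[tag], tag))
    return matches[:limit]


if __name__ == "__main__":
    used = ["sanfrancisco", "sanfrancisco", "sanfran", "sandiego", "newyork"]
    print(suggest_tags("san", used))
```

Ranking by prior use nudges people toward the spelling that is already dominant, which is the convergence effect the Quicken payee field relies on.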

James Andrews - Oct 17, 2007 02:40:40 am

If tag clouds are primarily social (and not purely info visualization), I wonder how much their use depends on the way the tags are generated? The lecture mentioned a blog sentiment that 'artificial' tag clouds (created by a company instead of users) weren't a valid use of tags, but what about generating tags from games like The ESP Game [1] or just from searches, as opposed to explicit label inputs? It seems that these would have fundamentally different social meaning: 'normal' tagging indicates the type of content the community internally uses for itself, while ESP Game tagging would indicate the type of content people who aren't in the community would use to describe things on the site, leaning toward common and broad terms by the nature of the ESP Game itself. And working from search terms would represent what visitors are interested in, instead of what content creators are creating. So it seems important at least to clearly mark the way the tag cloud was created, but it also seems like there might be better visualizations for different methods of making tags.

Since tags are social, it might also be interesting to filter tags by author, or groups of authors. So I can still see the tag cloud for all of Flickr, or for all pictures a specific user put on Flickr, but can also see the tag cloud formed from labels my friends added. Does Flickr (or anything) support this?

David Jacobs - Oct 17, 2007 09:33:41 am

Robin: Are you suggesting some sort of auto-complete when querying a database of tags, or when entering the tags initially? For queries, I think this would be a great idea since the primary difficulty is knowing which tags are out there (preventing no results searches -- I also think this is one of the reasons tag clouds caught on). This doesn't do much to prevent redundancies, however. So if we're talking about auto-complete when entering tags, my only complaint is that it might limit creativity in tagging. I can't really say much without some kind of study, but I'm willing to bet people generate fewer unique ideas when they're presented with suggestions (I'm reminded of the game Taboo here). What might be effective is to merge tags based on their similarity after the tag has been entered. For example, after a user misspells "San Fracnisco", the system would ask if it meant the same thing as "San Francisco", and correct the tag accordingly.
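
The "did you mean" check described above could be approximated with fuzzy string matching. This is a sketch using `difflib.get_close_matches` from the Python standard library. The function name `find_near_duplicate` and the 0.85 similarity cutoff are assumptions for illustration, and a real system would probably want to tune the cutoff against actual tag data.

```python
import difflib


def find_near_duplicate(new_tag, known_tags, cutoff=0.85):
    """Return the closest existing tag to `new_tag`, or None.

    None is returned when the tag already exists verbatim (nothing to ask
    about) or when no existing tag is similar enough to suggest a merge.
    """
    if new_tag in known_tags:
        return None
    matches = difflib.get_close_matches(new_tag, known_tags, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    known = ["San Francisco", "San Diego", "New York"]
    candidate = find_near_duplicate("San Fracnisco", known)
    if candidate is not None:
        print(f'Did you mean "{candidate}"?')
```

Because the check runs after the tag has been entered, it does not put suggestions in front of the user while they are still thinking of tags, which is the creativity concern raised above.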

Amanda Alvarez - Oct 17, 2007 07:09:05 pm

Marti mentioned that the reason tagging has become so popular now, and that we can (finally) get this metadata that other methods failed to produce, is because it is a brainless activity - no constraints, simple, single words, etc. Should we then be surprised that the resulting collections of tags (tag clouds) are not the best visualizations in the world, or are hard to use? The beauty (and nuisance?) of tags seems to be in the fact that, just as tagging itself is unconstrained, so is the aggregate of tags, which has a myriad of functions. The next step with tags would seem to be a way for the user to order them according to some preferred function/purpose (as in, how do I like my classifications, what am I looking for). You want the tags to tell you about trends? You want a frequency or alphabetical ordering (which you can already get)? Do you want a tooltip to pop up with similar tags or details about that tag? Do you want a particular hierarchy of tags by date, location, author, category? This is getting into the idea of meta-tag clouds, above. Do you like some particular visual design that you want the cloud to conform to, or are you concerned with popularity of tags, or navigation, or social links? I guess basically something like the Flamenco interface, but simpler and immediate. There is a lot more scope for using a bunch of different visual variables in the tag clouds, that is for sure.

Jimmy - Oct 17, 2007 09:44:05 pm

Marti’s user study about tag clouds shows that many interviewees did not realize that alphabetical ordering is the standard. To me, this is not a surprise. I was not aware of the alphabetical order either. The layout is distracting enough that I didn’t notice how it’s ordered. The varying font sizes make it difficult to pick up on the alphabetical ordering. This makes me wonder if the alphabetical order really makes sense, if it’s really useful, or if everyone is just following convention. Will this order help us more easily recognize the trend or summary of the text? I feel that ordering by frequency would make more sense. It’s more likely to give us a sense of what’s going on in the text. The tag cloud alternative shown at the end of class provides a good illustration of frequency ordering.

Ken-ichi - Oct 18, 2007 09:20:22 am

I'm actually a fan of tag clouds, mostly for characterizing the author of the tags rather than the things they are tagging. I just took a look at some of the tag clouds of people I know on delicious, and thought the clouds were a fun way to discover both the things people were interested in saving and how they used tags. One friend used the "readlater" tag frequently, while another used "awesome" quite a bit. Coder friends had large tags like "python" or "ruby," while others had tags relating to specific research topics. When I know the people involved, these can actually turn into conversation topics, so in some situations tags can be a way to get to know someone better. I also really like the way delicious shows related tags when you click on a tag, so that if you clicked on the "artists" tag in my tag cloud, you might notice that I've also tagged several of the same bookmarks with "nature," and you could click through to see all artists who create art about nature.

Regarding Marti's comment about the way tag clouds make the eye jump around, I agree this is a bit unsettling, but I often find that it forces me to slow down. Perhaps the presence of such chaos combined with the semantic promise of words can create an incentive to stop, consider, and pick apart the visualization. I remember Tufte saying somewhere that while you want to avoid clutter, you should never be afraid of frightening off readers with too much detail. If readers are genuinely interested, they will spend the time to understand a visualization, and will ultimately benefit from relevant details. Maybe tag clouds say more about you to people who are genuinely interested in you than to strangers.

Mcd - Oct 21, 2007 10:41:38 pm

Faceted tagging is something that I've wondered about for some time, but in a different sense. For a course last semester on the organization of information in collections (Prof. Ray Larson, iSchool), I wrote a paper exploring faceting tags not by semantic relationships among the tags themselves, but as a means of encoding the relationship between tags themselves. I analyzed the top ten or so tags from the 50 most popular URLs on del.icio.us (all-time), and ultimately coded each one as Topical (e.g. "news"), Descriptive (e.g. "New York Times"), or Subjective (e.g. "funny"). I considered such analysis as a potential method for disambiguation.

Such a simple semantic architecture for tags could easily be encoded to enrich tag clouds, for example as the color of the tag, which could potentially aid in navigation through the cloud.

I'm not sure if it would be helpful for the final projects, but if anyone is interested in looking at the paper, I'd be happy to share it.

N8agrin - Oct 22, 2007 11:06:49 am

Thanks, Marti, for posting the extra slides, they are very interesting.

I've often thought about the utility of tags and visualizing a tag space in a way that would be more navigable. Generally, I've thought that tag spaces would benefit from taking a more faceted approach, but considering Hannes' comments I've begun to question whether my ideas are truly based on faceted navigation or if they would simply be another method of navigating a given tag space.

My biggest concern about Hannes' comments, and perhaps facets in general, is that there is generally a presupposed need for some higher order of organization. As Hannes and Marti suggest, these hierarchies could be determined via querying large ontologies like WordNet, but I wonder if they couldn't be discovered within the tag set itself by analyzing the nature of co-occurrence of each tag. Some tags that signify higher order organization, such as the tag "ants", might have a specific pattern of co-occurrence. Other tags that are more specific, such as "Solenopsis invicta", might have a completely different pattern of co-occurrence. Perhaps visualizing these co-occurrences as a network, as I believe Hannes also mentioned, would make the patterns more obvious and help discover the tags that could act as the facets of the larger tag set?
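
As a starting point for exploring this, the co-occurrence network can be built with a simple pair count. The sketch below is in Python, and the function names and input shape are assumptions for illustration. Treating "number of distinct neighbors" as a signal for broader tags is only a guess at what such a pattern might look like, not an established method.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(items):
    """Count how often each pair of tags appears on the same item.

    `items` is an iterable of tag collections, one per tagged item.
    Keys are (tag_a, tag_b) tuples with tag_a < tag_b, so each pair
    is counted once regardless of the order the tags were entered in.
    """
    pairs = Counter()
    for tags in items:
        for a, b in combinations(sorted(set(tags)), 2):
            pairs[(a, b)] += 1
    return pairs


def neighbor_counts(pairs):
    """Return {tag: number of distinct tags it co-occurs with}.

    This is each tag's degree in the co-occurrence network.
    """
    neighbors = defaultdict(set)
    for a, b in pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {tag: len(others) for tag, others in neighbors.items()}


if __name__ == "__main__":
    items = [
        ["ants", "solenopsis invicta", "texas"],
        ["ants", "macro"],
        ["ants", "solenopsis invicta"],
        ["ants", "picnic"],
    ]
    pairs = cooccurrence_counts(items)
    print(pairs.most_common(3))
    print(neighbor_counts(pairs))
```

The pair counts are the edge weights for a network view, and the neighbor counts are one cheap way to compare the co-occurrence pattern of a broad tag against that of a specific one.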

Hazel Onsrud - Nov 05, 2007 09:21:01 pm

In response to Jimmy:

I am a frequent delicious user and find the alphabetical ordering of the tag clouds to be very useful. I agree, the clouds themselves seem to serve very few useful purposes (mainly I use them simply to decrease the length of the list that would be required if I listed all of my tags without a cloud format), but I find that being able to put them in alphabetical order at least allows me to find what tags I may think I have at a glance or note common misspellings or similar words when weeding my controlled vocabulary.
