Discussion of Good and Bad Visualizations

From CS294-10 Visualization Fa07

Class on Sep 5, 2007


Readings

  • Chapter 3: The Power of Representation, In Things That Make Us Smart. Norman. (pdf)
  • Chapter 4: Data-Ink and Graphical Redesign, In The Visual Display of Quantitative Information. Tufte.
  • Chapter 5: Chartjunk, In The Visual Display of Quantitative Information. Tufte.
  • Chapter 6: Data-Ink Maximization and Graphical Design, In The Visual Display of Quantitative Information. Tufte.

Optional Readings

  • The representation of numbers. Zhang and Norman. (pdf)

Kristal Sauer - Sep 05, 2007 09:16:55 pm

I really enjoyed seeing all the visualizations today. I compiled a list of traits of good vs. bad visualizations based on the discussion, and I thought I would share it here:

Good Visualizations

  • Color conveys meaning (ie, on roads, interconnect)
  • Makes a complicated concept easy
  • Conveys information about 3D structure
  • Makes the topic interesting; makes the viewer want to explore the data
  • Intuitive
  • Position conveys meaning
  • The text and visualization agree/work together
  • Color consistency
  • Meaning is evident at a glance, but more info is embedded and may be understood at further examination
  • Pictures are used for explanation (vs. a potentially ambiguous description)
  • Icons match real life (ie, photos)
  • No sifting through data is required
  • Small multiples
  • Color scheme works
  • Appropriate use of photography + useful overlay (this was one of my favorites)
  • Meaning is evident even when visualization is viewed out of context
  • Small volume blown up to a larger view (context is retained)
  • Axes are present
  • To scale
  • Multi-exposure photograph (reduces redundancy while highlighting the dynamic portion of the image)

Bad Visualizations

  • Data is presented in a table, when a graph could be used
  • Not to scale
  • Color is arbitrary
  • Axes of neighboring graphs are not aligned (this makes it hard to compare directly)
  • Shadows may create ambiguity
  • Not readily comprehensible
  • Information is easy to understand; visualization may be unnecessary
  • Too many annotations
  • Misleading title
  • Intended audience is unclear
  • Colors are indistinguishable from each other
  • Text is too small
  • Irrelevant information
  • Missing or illogical ordering
  • Mislabeling
  • Wasted space (ie, too much white space, or figure is too large)
  • Trends don't pop

It was interesting how some of these came up a lot in various people's visualizations. I noticed that the axis alignment problem was recurring. Some of the best incorporated realistic images/icons with colored interconnect.

Ariel Rokem - Sep 08, 2007 01:41:13 pm

After thinking about it a bit more - I had two more comments on the visualization that Amanda showed (the image taken from Kandel's textbook):

1. There is a difference of scale that is not emphasized enough, imho. The image of the columnar structure on the right of the image is, in fact much(!) smaller than the image on the left suggests. The chunk extracted from the image of the brain would have been much(!!) larger than the image on the right suggests.

2. There is a factual error in the image of the brain on the left. The chunk is extracted from the left image from a very different location from where these projections go to in the human brain. I think that this error stems from the fact that this is where these projections go to in the rat brain (which looks quite different than the human brain).

I apologize for the nitpicking.

Ken-ichi - Sep 08, 2007 02:59:06 pm

One thing that struck me during the class presentations and during the readings was the wild variation in what people count as "good" and "bad" visualization. There were several in-class examples of "good" visualizations that I didn't think much of, and one or two of the "bad" ones that I thought showed potential. The same thing happened in the readings. As beautiful as many of Tufte's redesigns were, many of them seemed preposterous while I was reading. One that comes to mind is his redesigned box plot, where the box becomes another line offset by just the width of the line, and the median line becomes a gap, also only as wide as the line. It does convey the same data with fewer marks, but it's illegible! Anyone with even marginally substandard vision would have serious problems reading that graph, and everyone would have difficulty comparing multiple such graphs.

I guess my point is information visualization seems to be far more art than science. Everything we've learned in class and all the excellent points made by Norman and Tufte are just heuristics. Useful to a point, but never to be used without question.

Jimmy - Sep 08, 2007 04:13:36 pm

Tufte's data-ink principles help us design clear and concise graphs and avoid the use of redundant data-ink and non-data-ink. He provided some good examples, like removing the grid ticks in the graphic that shows the periodicity of properties of chemical elements. However, he also mentioned that some essential parts of the graphics are not removable, such as the curves that show the periodicity. So I am wondering whether data-ink maximization is really necessary. The problem is that we might fail to visualize the underlying statistics well in pursuit of data-ink maximization. In another of Tufte's examples, "The Function of Criticism" in chapter 4, I don't see the benefit of removing half of each bar in the bar chart. The redesigned graphic is oversimplified and kind of unintuitive. As people are more used to the bar chart than the "half-bar" chart, erasing redundant data-ink doesn't seem to be very effective in this example. I think it's not always good to try to simplify the visualization, and redundant data-ink is not always bad. What's more important is to make the visualization easy to understand, regardless of how much data-ink is erased.
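
For reference, Tufte defines the data-ink ratio as the data-ink divided by the total ink used to print the graphic. A minimal sketch of that arithmetic follows; the function name and the ink amounts are made up for illustration and are not measurements of any chart in the book.

```python
def data_ink_ratio(data_ink: float, total_ink: float) -> float:
    """Share of a graphic's ink that encodes data (Tufte's data-ink ratio)."""
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    if not 0 <= data_ink <= total_ink:
        raise ValueError("data_ink must lie between 0 and total_ink")
    return data_ink / total_ink


# Hypothetical ink budgets in arbitrary units (assumed values, not from Tufte):
# the same data-ink, with half of the total ink erased in the redesign.
standard = data_ink_ratio(data_ink=40, total_ink=100)
redesign = data_ink_ratio(data_ink=40, total_ink=50)
print(standard, redesign)  # 0.4 0.8
```

The ratio doubles in the redesign, but nothing in the number says whether the result is easier to read, which is the concern raised above.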

Teale Fristoe - Sep 08, 2007 04:55:17 pm

Having spent so much time scrutinizing visualizations lately, I find myself able to find problems with even the most attractive, informative images. I was hoping to use this ability to criticize some of Tufte's images, but some others have already beaten me to it. So, instead, I'm afraid I'm going to have to pick on one of the good visualizations presented in class.

For his good visualization, David Sun chose a binary tree of investment options, which shows a tournament style decision making process for choosing which investment option is best. This immediately struck me as a problem, as the second bullet on the first page (13) of The Visual Display of Quantitative Information mentions that graphical displays should make the viewer think about the substance rather than the methodology. However, in the case of this visualization, I actually think having the methodology apparent is a good thing. The problem with using a tournament style elimination process to determine the best investment is that it can be very misleading depending on how the seeding works out. For example, in this decision tree, it's possible that Emerging Market Sovereign Bonds (I believe the investment beat by Global REITs in the first round) is superior to any of the other investments, but because it lost in the first round it appears to be one of the worst investments.
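
The seeding problem can be sketched in a few lines. This is a toy single-elimination bracket, assuming a power-of-two number of entrants and that the higher score always wins; the labels A to D and their scores are hypothetical and do not correspond to the investments in the actual chart.

```python
def single_elimination(seeds, strength):
    """Play a bracket in seed order; return the round in which each entrant exits.

    The champion is recorded with one round more than the final.
    Assumes len(seeds) is a power of two and that strengths are distinct.
    """
    exit_round = {}
    current = list(seeds)
    rnd = 1
    while len(current) > 1:
        winners = []
        # Pair neighbours: (0, 1), (2, 3), ...
        for a, b in zip(current[0::2], current[1::2]):
            winner, loser = (a, b) if strength[a] > strength[b] else (b, a)
            exit_round[loser] = rnd
            winners.append(winner)
        current = winners
        rnd += 1
    exit_round[current[0]] = rnd  # champion
    return exit_round


# Hypothetical scores: B is the second-best option overall.
strength = {"A": 9, "B": 8, "C": 3, "D": 2}
result = single_elimination(["A", "B", "C", "D"], strength)
print(result)  # {'B': 1, 'D': 1, 'C': 2, 'A': 3}
```

With this seeding, B goes out in round 1 while the much weaker C reaches the final, so reading quality off the round of elimination would rank C above B.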

I think the conclusion is that I'm becoming dissatisfied with all visualizations I'm encountering. I guess that's just what happens when you spend so much time analyzing anything, though. Ultimately, it's more a matter of minimizing problems than maximizing quality, because perfection is unattainable. As others have pointed out and I agree, even Tufte, who wrote the book on quality visualizations, comes up with some seriously problematic ones.

N8agrin - Sep 09, 2007 10:31:55 am

At times reading Tufte is immensely amusing. Seemingly out of thin air he declares certain visualizations inappropriate or excessive. True, his data-ink principle (more commonly known, I would think, as less is more) is an acceptable stance for developing an informative visualization. However, as others have pointed out, he quickly takes this to an extreme and lauds the results (see his examples of reduced box-plots and bar graphs).

As Tufte himself states, visualizations should tell a story. Much like a good book, whose words need to be chosen carefully, not in excess or scarcity, the richness of an information graphic is dependent on just the right amount of aesthetic, obvious detail and compelling data. If all narratives were boiled down to Tufte's Spartan aesthetics, Moby Dick would read in one line "A man dies a tragic death trying to kill a white whale."

These overly reduced images help to show general trends, but make referencing the underlying data nearly impossible. Instead, I believe that an effort should always be made to help show the data of a visualization in its context. Tufte also argues for information in context, but then seemingly contradicts himself in an effort to produce visualizations which follow a minimalistic aesthetic. Creating context might mean providing otherwise 'unnecessary' axis and gridlines, but if they can help the viewer understand the purpose of the information as well as the quantitative data underlying the image, it has succeeded in telling its story.

Aside from this critique, Tufte's Maximizing Data-Ink chapter is enlightening. I particularly liked his concept of a range plot and found his argument for such visualizations coherent and sound. The dot-dash-dot plot may take the principle too far, but in certain scenarios may also provide a worthwhile context that could display an aspect of the underlying data in an unforeseen way.

David Jacobs - Sep 09, 2007 05:28:17 pm

Speaking of good visualizations... Here's a presentation with some very interesting interactive visualizations of world census data (I've also posted it in the Visualization gallery section). It's a little long, but quite good:

Hans Rosling: Myths about the developing world

I find it hard to believe that anyone would criticize the quality of Rosling's visualizations, but many of the slides Rosling uses would likely be declared overly complicated by Tufte's data-ink rules. This leads me to ask whether Tufte's data-ink rules apply to digital media, where ink doesn't mean the same thing. Is the appropriate analogue for ink screen space? This probably works for static visualizations, but what about animations, where the dimension of the visualization is increased? Does the data-ink ratio translate to data-<some kind of area-time product> ratio? Even if it does, how are we supposed to handle visualizations with interactive components?

Omar - Sep 09, 2007 11:12:16 pm

n8agrin: i don't think data-ink can be compressed into the comment less-is-more. less-is-more might imply removing data (ie, focus on one dimension or reduce dimensionality; take more summary statistics) but most, if not all, of tufte's examples retain the data and lose nothing but the fluff. my feeling on data ink is that removing all non-data ink is a good exercise to engage in as we're learning about effective visualization -- of course stylistic elements will be brought back in, but it's important to know what is style, and what is substance, and how each element is actually functioning in your data graphic.

david jacobs: motion does seem interesting, i think maneesh said we'd be covering animation when he discussed the course summary. just like the deceptions you can insert into still data graphics, i imagine motions can imply many things that the data does not necessarily suggest (for instance, the standard growth motion of small to large might imply something, or the speed/change in speed of the animation, that is not supported in the data). that being said, the gapminder stuff is quite awesome to behold :)

Hazel Onsrud - Sep 09, 2007 10:58:45 pm

I agree with Ken-ichi: despite all the helpful hints such as those Kristal laid out in this discussion (and which have been abundant in our readings), visualization really does seem to be an art over all else. In other visual design books, such as The Art of Color: The Subjective Experience and Objective Rationale of Color by Johannes Itten, this opinion is reiterated. Further, Itten notes that we all have colors we are partial to (that we find harmonious). He notes, for example, that if fashion dictates that the spring palette include a color set that a designer is not used to, he or she could spend days laboring over minute decisions that, with a palette more in their comfort colors, they could easily make. While I certainly agree with this contention (personally, I find it easier to choose some color combinations than to find a pleasing balance with others), I wonder how this “familiar” concept of design elements extends beyond color and hinders/facilitates our ability to create visualizations in a more general sense. Personally, I cannot tell if I would be more prone to visualize items in a certain manner, but after better familiarizing myself with more types I wouldn’t be surprised to find that I become more at ease with some types of visualizations (say webbed maps) rather than, for example, bar charts. Yet, I wonder if this preference is as ingrained in our individual human nature as Itten suggests with our color preferences.

Daisy Wang - Sep 09, 2007 11:20:28 pm

Tufte's data/ink principle is too paranoid about the efficiency of data representation, while ignoring the fact that the redundancy in representing the same data in different ways can actually help the audience to understand or catch the idea faster. Just like when you are explaining something, you try to explain it in different ways to different people.

For example, different shadings of bars will make it easier for people to refer to each category; the grid lines in a chart can make it easier to measure the distance between points.

I think for each visualization, there is a certain amount of data the designer wants to convey to the audience, which should be fixed: D; and suppose the amount of ink we use is: INK. I imagine the effectiveness of the visualization will be a bell curve with respect to INK. The curve peaks at the point where adding more ink starts to clutter the visualization.
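
A toy version of this picture, purely as an illustration: it assumes a Gaussian shape, and the peak position and width are made-up numbers in arbitrary ink units, not anything measured or claimed in the readings.

```python
import math


def effectiveness(ink: float, peak_ink: float = 50.0, width: float = 20.0) -> float:
    """Hypothetical bell curve: effectiveness of a graphic as a function of ink used.

    D (the amount of data) is held fixed; peak_ink and width are assumed values.
    """
    return math.exp(-((ink - peak_ink) ** 2) / (2 * width ** 2))


# Too little ink and too much ink score equally badly in this toy model.
for ink in (10, 50, 90):
    print(ink, round(effectiveness(ink), 3))
```

Under these assumptions, both erasing ink below the peak and adding ink above it lower the score, which is the sense in which maximizing the data-ink ratio can overshoot.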

Kenrick Kin - Sep 10, 2007 12:46:30 am

Does anyone else think Tufte's version of the quartile plot looks good (p124)? I personally felt like I needed to rub my eyes to make sure I was seeing clearly. I was amused by Tufte's chapters on data-ink. He removes as much ink as possible, then decides to throw things back in because they "look good," or so he says (ie with the baseline of the bar chart). I do agree that the baseline looks good, but that's because in general I think too much white space makes the viewer feel lost and that things are just floating. I see his point with the range-frame, but when the axes aren't connected, it appears as if the pieces don't belong together (although this could be the result of me expecting plots to have conventional frames). I don't think it's a waste of time to draw extra lines if it helps ground the viewer even if it doesn't provide extra data. Of course, this is all just my opinion, just as it's his opinion that his quartile plot looks good, but that is the nature of graphical design.

Danielarosner - Sep 10, 2007 07:42:09 am

Norman's claim that we remember best what we can measure or represent is interesting. It seems to ignore information that is more emotive or "visceral," as Norman might describe it. Representation can have characteristics that enable us to make emotional links to other information, maybe reminding us of an experience with a friend that we hold dear. This type of information can be more subjective and more difficult to represent since the viewer's personal memories or experience are critical to one's associations. But that shouldn't imply that we forget or give the information little weight.

Charlotte Wickham - Sep 10, 2007 01:30:36 pm

I have to agree with Ken-ichi: the redesigned boxplot is incredibly hard to read, and I find it has a similar vibrating effect to the Moire effects condemned by Tufte in the previous chapter. My general opinion of the redesigns in the chapter is that they are all taken a step too far. Take the redesigned bar chart. The final version has small divisions in the bars at 5% intervals, but how would a viewer know they were every 5%? I agree with the data-ink maximization principle in general but I think it needs to be carefully applied.

Robin Held - Sep 10, 2007 02:37:38 pm

One of the key issues with Tufte's style of writing seems to be a lack of clarity on the universality of his "principles." He typically writes in a clear, definitive manner, and imparts a tone of finality to most of his statements. As a result, when he breaks from his guidelines, it seems hypocritical. He should give more consideration to the contextual nature of the principles. Without doing so, he runs the risk of alienating readers who disagree with inflexible, one-size-fits-all rules. For instance, Tufte's treatment of box plots has already been mentioned in comments above. I completely agree that his high "data-to-ink ratio" box lines are difficult to read. In that instance, he doesn't put enough effort into discussing a scenario where such a line/box plot WOULD be useful. So the reader is left believing that his example is horrible and is likely to distrust Tufte's conclusions on that specific topic.

James Andrews - Sep 10, 2007 09:33:21 pm

For many of the visualizations that Tufte praises and I dislike, I think Tufte ignores the fact that there is a sort of common visual language that we all know, and that we are conditioned to recognize. When I see a bar chart, I instantly understand "that's a bar chart," and can go straight to extracting the data it presents. When I see a redesigned bar chart, as on page 101 of "Quantitative," it's disconcerting; the bars no longer conform to any standard I recognize, so I must first take the time to puzzle out what kind of data it's supposed to represent. If I were exploring my own data, and thus drawing my own graphs, it would be fine to freely 'invent' a new visual language (since I invented it, I would understand it). But if I intend to present information to other people, I'd much prefer to speak their language and have them understand me instantly, rather than present graphs that may confuse and distract.

His favorite visualization -- the map of Napoleon's losses in Russia -- is a great example of a visualization that does not work without a relatively large amount of additional explanation. It packs the information in, and, if you wish to sit down with it and study it, it's great. But we could get the information faster if the unusual language (eg, line thickness for troop size, the 'linking' of a temperature graph with a map path) were avoided, and familiar language (eg, the underlying map) were strengthened to be more instantly recognizable. Additional redundant elements, such as arrows to indicate direction, could also help.

His data ink property is additionally troubling because it doesn't seem well grounded in human vision. Our time to understand an image is not proportional to the amount of ink used in the image's creation. A shaded bar in a bar chart, for example, is actually easier for me to see and understand than an unshaded bar, since it stands out from the background. If the 'better solution' were to use that space more effectively, sure, you might add extra data instead of shading. But just leaving the bar 'un-inked' to maximize the data ink ratio would make the visualization less effective.

His suggestion, on that note, that Chernoff faces be cut in half to eliminate redundancy, is especially troubling. The key idea behind Chernoff faces is to take advantage of our special ability to process the features of human faces: if we mutilate those faces, isn't much of the benefit lost?

Amanda Alvarez - Sep 11, 2007 09:47:51 am

In reading the chapter from Norman, and the Zhang & Norman paper, I found myself trying to fit together and match up the two analyses of representations they presented. Now, reading the comments above, I feel that some of Tufte's 'arbitrary' design decisions could be better understood if they were examined in Norman's framework. So I'll start with how I think the two readings match up.

 * Norman: Things (= representations) make us smart by shifting the task from reflective to experiential.
 * Zhang&Norman: Representations are distributed across the external and the internal.
 * Internal = Reflective; allows modification and action on representations.
 * External = Experiential; activates perceptual processes, is direct, efficient and intuitive, reduces task difficulty, 
   allows actions in the world.

The principle of naturalness is fulfilled by a completely external representation. Arabic numeration, for instance, is not natural: it requires a lot of learning, and while the steps of calculation are external, the representation of dimensions is largely internal (hence the tradeoff, the system is sometimes better for representation than calculation).

Things that make us smart are externalizations, external representations. The more these external representations match up with our perceptual capabilities and create an experiential effect, the better (cf. naturalness). These representations fit best with us when we can internalize and digest them. But things that are entirely external cannot make us smart, because we cannot process them. We cannot act reflectively on external (experiential) representations.

Which brings me back to Tufte: Erasing (redundant) data-ink increases the externality of a representation, which people tend not to like, as the above comments attest. The external representations don't create an experiential effect (although they should). The balance between the two types of representations has been upset by a shift to the external. One of the reasons for the criticism of Tufte is that people recognize that the representations have ceased to be reflective. The internal representation, the one where we forget the world and create an abstraction, has been lost. Without this representation, we cannot generate new knowledge or metarepresentations.

Tufte's focus is on data-ink efficiency. In erasing data-ink, he is erasing that which makes the graph reflective; the redundant stuff that is removed might also be internal, ie. we might have collective memory or understanding of what it represents. Tufte is able to take the visualizations from being distributed representations with versatility (and widespread intelligibility) to stripped down externalizations because some of the images (eg. boxplots) have a limited use or audience. The increase in efficiency is, however, met with a narrowing of the range of tasks for which the representation can be used. Tasks outside a narrow range become more difficult, not easier, and a large part of the audience experiences no intuitions or direct perceptual insights; the experiential possibility plummets as the representation becomes more external.

Norman: "If the representations are just right, then new experiences, insights, and creations can emerge." (p. 47) It is about getting the abstractions right. As we noted in class, Tufte is actually dealing with a limited set of representations (static data graphs printed on paper), and most of the time his strong pronouncements probably get the abstractions right for this particular set. Most of the time, representations distributed across both the external and internal will end up being more versatile, and will generate more new knowledge (one particularly needs to remember the reflective, and the possibility it introduces of creating metarepresentations).

Finally... In the readings there was this general idea that cognitive factors influence the evolution of representational systems, and vice versa (hence we have 'cognitive artifacts'). It seems like our perceptual constraints make up a lot of this influence. Can a representational system ever really make a task harder or easier, or reveal anything new? I think the answer lies in the use of the word 'artifact'; it is only some accidental little increments that make up the perceived ease and increase in knowledge.

James O'Shea - Sep 11, 2007 05:25:36 pm

It is clear many people disagree with Tufte's extreme approach to maximizing the data-ink ratio in his redesigns of some of the traditional graphs (e.g. boxplots). I'll expand upon Omar's comment a little to emphasize that Tufte's redesigned graphics are a good thought exercise. It is one thing for him to say that removing clutter, avoiding redundancies, and focusing on the data are good practices, but it is another to actually see these principles taken to the extreme. I found it helpful to see his redesigned box-plots, however preposterous they may be, because they forced me to really think about what is essential in a visualization. I feel like it gives me a better understanding of how a graph or chart is composed.

I think conceptually it is important to examine these ideas, but I did find myself thinking that it seemed a little outdated. I agree with David that many of Tufte's principles need to be re-examined with respect to the use of technology and computer graphics. It is still necessary to think about the essence of visualizations (as with the data-ink ratio), but I do think the rules change a bit when dealing with massive data sets (which are becoming more common through the use of technology and increased computing power) and the interactive visualization tools needed for exploring them.

Jonathan Chung - Sep 11, 2007 09:18:33 pm

On the topic of Good/Bad visualizations, I am reminded of something a colleague of mine came across and forwarded to me. It's called the "Periodic Table of Visualizations" and is located here:


While it is by no means a complete list of visualization methods, I do like how it categorizes them and briefly states what sorts of data or intents fit best with each given type. Based on the visualizations I saw today, many of which could be linked to visualizations here, one common mistake that I saw was the classic case of a visualization that is bad because it's the "wrong" type of visualization chosen for the given purpose.

One example of this was the one in which a pie chart was used to depict the wins and losses of various football teams. This would work adequately (though not optimally) if there were actual data to plot, but in the case of a 1-0 record, it became a rather confusing and amusing picture. A more obvious example from Monday's lecture is the one in which a bar chart was used to plot nominal data pertaining to cars and countries of origin.

Athulan - Sep 11, 2007 11:49:08 pm

I would like to chime in with some criticism of Tufte as well. He does not seem to address the constraints placed on visuals when they appear in print. Many times visuals look "bad" because of the limitations of the printing method. This especially is the case when he discusses the "vibrations" in the visuals, especially the cross-hatching. He recommends grayscale variations instead of the cross-hatching, but this was not possible in many cases as grayscale was not available - which is why people turned to cross-hatches in the first place. I would argue a good visualization is something that makes the best of a given medium/format and not the "best" in an isolated universe (as Tufte would like them to be).

I also think that sometimes having high-density visuals (like the train timings) makes the information a bit difficult to convey. At least, the visuals take a while to decipher (again, like the Napoleon march, which someone mentioned earlier here). I am not sure if I would always go for the packed visual. Many times when I am reading a technical publication I am skimming visuals for interesting trends and not for density of information. An ideal visual (IMHO) would be one that "grabbed" the viewer by thrusting an interesting trend forward and which had dense information which became apparent on closer examination. Of course, this is probably not that easy to accomplish...

Mark Howison - Sep 12, 2007 12:35:25 am

In response to James:

While I agree that there is to some extent a "common visual language" for simple visualizations such as bar or pie charts, I would also speculate that there are differing (and sometimes even incompatible) norms between disciplines and professions when looking at more complex visualizations. For instance, in my example of a bad graphic, showing the results of a 2x2 factorial study by connecting repeat conditions with lines may be the norm for educational psychologists, yet to someone unfamiliar with that discipline the lines could imply a time series. As another example, physicists often use Feynman diagrams to represent and even perform calculations for particle interactions. Yet, to someone not familiar with the norms of Feynman diagram construction, they may appear as nothing more than a collection of squiggly lines with some Greek letters thrown in. (Feynman diagrams, by the way, are a great example of a cognitive artifact.)

Carroll - Sep 12, 2007 10:41:59 pm

From reading the above comments it seems clear that Tufte's idea of maximizing the data-ink ratio (and his general affinity for custom visualization types for particular data sets) has a major caveat: it may increase the time needed to understand the data model. Taking the junk ink to its theoretical minimum has a certain attractiveness in terms of efficiency of the encoding, but it will wind up being less efficient for the reader who could comprehend a more standard visualization much more quickly. For the bar chart example it seems like the ink he erased is actually more easily decipherable than the ink he left, simply because it looks somewhat like a traditional bar chart.

On a completely separate note, I just wanted to share a very cool visualization I came across called the cortical homunculus. Some of you may have seen a similar drawing before in an anatomy textbook; it's a fairly common representation. Basically, the sizes of the body parts of the homunculus are proportional to the amount of space used in the brain to control them.
