Data and Image Models
From CS294-10 Visualization Fa07
Lecture on Aug 29, 2007
- The eyes have it, Shneiderman. (html)
- The structure of the information visualization design space. Card & Mackinlay. (ieee)
- Chapter 2: Graphical Integrity, In The Visual Display of Quantitative Information. Tufte.
- Chapter 3: Sources of Graphical Integrity, In The Visual Display of Quantitative Information. Tufte.
- On the theory of scales of measurement. Stevens. (jstor)
 Ken-ichi - Aug 28, 2007 10:56:40 pm
One thing that's always bothered me about Tufte's emphasis on data density in visualizations is that he always seems to assume the graphic is a substantive document in its own right, deserving the same level of scrutiny as the newspaper article it might accompany. This seems to be his assumption when he laments the paucity of scatterplots in American newspapers in these readings. I guess this is probably valid if you read a physical newspaper sitting down over coffee, where you might have the time and the inclination to pore over details. But maybe simpler, less dense displays are more appropriate when a publisher assumes readers will be skimming, glancing at graphs to gain a summarized, reduced view of an article's implications rather than a view as finely articulated as the article text itself.
 Kenghao Chang - Aug 28, 2007 11:46:47 pm
Card and Shneiderman both proposed taxonomies for information visualization, but the two taxonomies come from different perspectives. Shneiderman's is top-down and application-driven, while Card's looks at the more basic elements of visualization. For example, although both taxonomies include data as a variable, Card's work focuses on the expressibility of data (nominal, ordered, and quantitative), whereas Shneiderman's definition of data comes from how data are organized and inter-related. Moreover, Card adds graphical properties (those perceived by our retinal system) as another dimension of the taxonomy, whereas Shneiderman takes the application-oriented view and builds his taxonomy on how people interact with data.
Tufte's chapters on graphical integrity are mainly about how distorting data misleads people.
 David Sun - Aug 29, 2007 01:48:06 am
I'm not convinced that Tufte's use of "relational graphics" as the sole metric for ranking the integrity (or what he calls "sophistication") of various printing presses is a good idea. Certain data sources lend themselves naturally to time-series plots (rather than relational plots); examples include fluctuations in the stock market and exchange rates (I feel it is somewhat unjustified to rank Wall Street Journal, naturally one of the biggest sources of data of this kind, last in the graphical sophistication table (Table 1)). A closely related issue is that relational graphics could easily mislead the reader into believing the existence of a causal relationship between variables when correlation is the strongest connection supported by evidence.
 James O'Shea - Aug 29, 2007 07:12:32 am
Shneiderman and Card both propose variations of a visual taxonomy. This is obviously a difficult, if not impossible, task, and any attempt to exhaustively organize the literature is probably going to fail in some respect. On the other hand, I think there is tremendous utility in having a framework with which to think about these things, despite knowing there will be shortcomings. I think one area they fail to adequately address is the mapping between the viewer's perception of the visualization and the data itself. Card expands the visualization vocabulary beyond Shneiderman's data types and data actions to involve graphical properties such as retinal encodings and spatial positions. He also suggests that the success of a visualization often depends upon an effective mapping between this vocabulary and the data. I think this is actually an important point which deserves more discussion. The degree to which viewers are able to make this mapping between what they see and the structure of the "real" data is often a critical issue which is not always understood. Card hints at this while discussing the Table Lens and the Hyperbolic Browser, both of which require the viewer to invert the mapping from a distorted visualization to the actual data set. Some visualizations depend on this decoupling (and the proper perception of it) more than others. How viewers do this, and whether they can do it adequately in all cases, is often poorly understood and warrants special consideration.
 Omar - Aug 29, 2007 09:34:25 am
Ken-ichi: i'd say the new york times is going tufte's way. for the past few years they have been increasingly using data graphics, usually quite elaborate ones, though the elaborate parts often go towards catching the reader's attention (does counting in donkeys vs elephants really help the reader better understand the dem/rep difference?). actually, what the new york times is doing is really interesting from an attractiveness perspective. i'll be the first to say that sometimes tufte's examples and his own data graphics are downright boring. i'd love to see more focus on attracting readers to a data graphic while still conveying the data you want to convey. i think the new york times is trying hard at that -- it'd be interesting to know who they are commissioning to make their data graphics.
 Danielarosner - Aug 29, 2007 02:49:10 pm
A little research into American meat classification (such as "AA" and "AAA") showed that we were a bit off in class. The value of A's seems to run counter to our prediction in class, and the categories are nominal, not ordinal. In a study of Indonesian meats, the Food and Agriculture Organization posted the following key:
AAA - Meat produced is qualified for export
AA - Meat produced is qualified for domestic trade
A - Meat produced is qualified for meat trade only within the municipality/city
 Teale Fristoe - Aug 29, 2007 06:28:10 pm
First, Danielarosner, I would argue that the classification still is ordinal, as fit for international trade implies fit for domestic trade, etc.
Second, with response to the discussion of Playfair's visualization of imports versus exports, I believe Playfair did use a visual cue to differentiate between different levels of trade deficit and surplus. While color does indicate the single nominal, discrete values of deficit and surplus, the size of the color chunk indicates the amount of deficit or surplus. So, right after the crossing point, when there are only a few more exports than imports, there is also only a small amount of yellow. By using this strategy, Playfair is able to communicate both the big picture (England is now exporting more than it is importing) and the details (this is how much more England is exporting than importing).
Finally, I was a little confused about Bertin's classification of orientation as only communicating nominal values. Do speedometers and similar displays not use orientation to communicate a high precision of quantitative data? Am I missing something here?
 Amanda Alvarez - Aug 29, 2007 07:24:10 pm
David Sun: It's true that certain data, particularly the kind dealt with in the Wall Street Journal, lend themselves to time-series plots. I think Tufte's point is that the viewer of the graphic has the ability to decode and appreciate sophisticated relations; he notes that even children are good at this. Yes, a correlation can mislead someone, but Tufte's argument is that we should not assume that the viewer will be misled. In fact, we want to "confront statements about cause and effect with evidence, showing how one variable affects another" (p. 82), i.e., we should not run away from positing causal relationships. (Hopefully the viewer also knows 'correlation does not imply causation', or as Tufte puts it, "Correlation is not causation but it sure is a hint.") Looking at it the other way, we may wonder whether it wouldn't be a good idea to have a variety of graphics, and not just 100% relational content. Presumably the non-relational graphics have some information value, so perhaps a better metric would be the data/ink ratio advocated by Tufte. After all, the simple graphics can also have Tufte's qualities of excellence and integrity.
Teale Fristoe: In the case of the speedometer, I don't think the oriented needle is the 'datum'; it is just an accessory to indicate the (approximate) numerical value. It is the relative positions of the numbers in the display that code the quantitative information.
 Ariel Rokem - Aug 30, 2007 01:16:47 pm
One thing that I did not understand in class: what is "deconstructing" a visualization? Is it simply deciphering what the data model is and what the image model is? Related to that (I think): Is there really an analytic way to do "analytic graphics"? That is, is there an algorithm we can apply to any visualization that will tell us whether we have extracted all the information we need in a way that the human system can then look at it and understand whatever it was we wanted understood?
 Robin Held - Aug 31, 2007 10:58:13 am
Chapter 3 of the Tufte reading seems to generally emphasize balance. At first, he seems almost personally offended by the domination of graphic artists in most publications. However, by the end of the chapter, the message seems to be that one needs to include both the original author of the data/words AND an artist to create an optimal presentation. After all, a graphic artist may be more knowledgeable with regards to use of white spaces, colors, etc., while the author knows what he/she wants to convey to the reader. Tufte also points out how many publications shy away from sophisticated graphics, but are willing to include advanced diction and grammar. When an article refuses to include well-developed figures, the reader is cheated of an opportunity to gain insight into the topic that can't be acquired from text alone. Tufte makes a good case for the balanced use of sophisticated text AND graphics to effectively convey information to one's audience.
 Jimmy - Aug 31, 2007 03:51:17 pm
Tufte proposed the principle "show data variation, not design variation", stressing the importance of sticking to the truth of the data when it's visualized. A distorted data visualization will be misleading, as people tend to believe graphics == data and ignore what the underlying statistics are. He also provided some examples showing how easily we might be deceived by distorted graphics. I agree that we should avoid too much exaggeration in graphics, but I am wondering if a little bit of design variation is tolerable. If the visualization of data is designed for everyone, we want to convey the statistics efficiently and clearly. The readers might not want to look into the details, but they need to know the big picture of what the statistics may imply. So some design variation might be good to highlight the important parts.
Tufte also mentioned that the statistical graphics in college and high school textbooks are more sophisticated than those in news publications. I agree that people might not have difficulty understanding the graphics, as he said even a twelve-year-old child could understand the relational graphics. But it could be the case that we don't need much sophistication in graphics for news publications like the Wall Street Journal.
 Mark Howison - Sep 03, 2007 10:56:41 am
James O'Shea wrote: The degree to which viewers are able to make this mapping between what they see and the structure of the "real" data is often a critical issue which is not always understood.
Indeed, this is a critical issue for understanding not only the role of visualizations, but also the nature of expertise or "professional vision" within a given domain that uses visualization. For instance, in a comparison of an expert bird-watcher versus a novice viewing a bird, the expert might immediately notice specific features of the bird, such as its beak shape, color, etc. and use these to classify the bird as a particular species, while the novice may just see a bird and notice that it flies. In the math/science education literature, this distinction is usually called deep structure vs. surface features. There is a frequently cited study by Chi et al. (1981) in which they presented freshman physics undergraduates (novices) and grad students and postdocs (experts) with physics textbook problems/diagrams, then asked them to group them based on similarity. The experts identified categories based on the deeper structure of fundamental laws, such as conservation of energy or F=ma, while novices tended to categorize based on surface features such as the formulas or variable names in the problem, or the nature of the diagrams (e.g. both involve something rotating).
From the anthropology/sociology literature, there is a study by Goodwin (1994) on how professionals see differently by virtue of the perspectives that they have been indoctrinated into through their professional training. He terms this phenomenon professional vision and provides an in-depth case study of how video footage from the Rodney King trial was reframed by a defense witness who was an expert on the dynamics of police arrests. By highlighting certain features of the video data, the expert witness presented an interpretation of police officers undertaking normal, professional police practices for dealing with an uncooperative criminal, such that the jury actually saw the footage differently from their first impression of a helpless man being unnecessarily beaten. The video data itself was unchanged, but the interpretation or "mapping" changed drastically.
 Kristal Sauer - Sep 03, 2007 08:43:48 pm
Shneiderman points out that humans are quite skilled at examining a visual display and noticing information about patterns and relationships. Clearly, this is an important fact to consider when presenting one's information. It strikes me that humans' proficiency in visual analysis also leads to a potentially useful application of computer vision -- analysis of experimental output (i.e., graphs and figures). Understanding how people process visual information could provide another method for transitioning from experiment to discovery. This could be especially useful if we understood how expert analysts and experimenters are able to visually identify important results and then emulate their processes for use by others.
 David Jacobs - Sep 03, 2007 09:45:33 pm
One of the interesting things I've noticed while reading the Tufte books (I just got them a few days ago) is that many of the visualizations are not written in English. For the more historical visualizations, it makes sense because they originated from non-English speaking cultures. Even the modern charts presented, however, are fairly regularly not in English. I was thinking that maybe Tufte presents other language visualizations to demonstrate that a good visualization shouldn't need to rely on written text to convey its meaning. Anyone else have a thought?
 James Andrews - Sep 04, 2007 05:14:22 am
(in response to David, above:) While there are good visualizations that don't rely on language, I don't think that's a general principle and I don't think Tufte is trying to illustrate it via his choice of examples. The non-English visualizations are generally presented along with explanations, and so still rely on some written language to make sense. In particular, the "wondrously complex" timetable for the Java railroad line (pg 24 of Envisioning Information) comes to mind: without Tufte's detailed explanation it was completely incomprehensible to me. And some of the English visualizations rely heavily on written language, and would not have the same effect without it; the Criminal Activity chart on page 31 of Envisioning, for example, would lose much of its impact if one didn't read and understand the words "Murder" and "Pistol Whipping a Priest". Finally, it would often be very hard to obey Tufte's "Graphics must not quote data out of context" principle without relying on the written word.
 Jonathan Chung - Sep 04, 2007 09:38:10 pm
To bring closure to the "meat classification" discussion in class, what I think we were getting at was the USDA's measure for classifying various grades of meat. This does not follow the A/AA/AAA system, but rather, a bucket system which uses designations such as Prime, Choice and Select, which can be readily seen at your local independent supermarket or butcher (Safeway and large chains opt out of this system).
There are actually 8 grades in all, but only 3 are sold in stores with an official stamp. The rest are relegated to "store brand" status or for canned/packaged goods.
The way they perform the grading is also interesting and would be classified as a hybrid of nominal and ordered. This bears an explanation.
The eight grades are as follows:
Prime - Choice - Select - Standard / Commercial - Utility / Cutter / Canner
While there is a clear progression from a high quality grade to a low quality one, in the two lower echelons, the grades within those levels are not meant to be better or worse than the others. They are just within that level and use different labels for different uses. For example, canner meat is used in canned goods while utility meat is used in prepared foods such as frozen dinners.
As for actual grading method, inspectors will evaluate "for tenderness, juiciness, and flavor," traits that sound too fuzzy to put on paper but break down to something where a trained inspector can take a quick glance and know where that carcass falls on the scale. And as diners, most of us can taste that a prime steak bears those 3 aforementioned qualities in greater abundance than a choice or select steak. Or in other words, the prime steak is juicy and tender whereas a lower steak is tougher.
I do not know off the top of my head what makes meat fall into the standard vs. commercial grade, nor do I know what the difference is between utility, cutter and canner meat besides their usage. All I know is that the differentiation there is based on usage of the meat rather than its inherent quality, hence making that classification nominal rather than ordered.
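The "hybrid of nominal and ordered" idea can be sketched as a partial order: grades in different tiers compare, grades in the same tier do not. This is only an illustration in Python. The tier numbers and the `compare` helper are assumptions made for the sketch, and the grouping follows the description in this post rather than any official USDA document.

```python
from typing import Optional

# Tier numbers are illustrative assumptions (lower number = higher quality).
# Grades sharing a tier are treated as nominally distinct, not ranked.
TIER = {
    "Prime": 0,
    "Choice": 1,
    "Select": 2,
    "Standard": 3, "Commercial": 3,
    "Utility": 4, "Cutter": 4, "Canner": 4,
}

def compare(a: str, b: str) -> Optional[int]:
    """Return 1 if a is the higher grade, -1 if lower, 0 if identical,
    and None when the grades share a tier and so cannot be ordered."""
    if a == b:
        return 0
    if TIER[a] == TIER[b]:
        return None
    return 1 if TIER[a] < TIER[b] else -1

print(compare("Prime", "Select"))   # ordered: different tiers
print(compare("Cutter", "Canner"))  # nominal: same tier, no ordering
```

A fully ordinal scale would never return None, and a fully nominal one would return None for every pair of distinct grades.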
 Athulan - Sep 04, 2007 10:39:07 pm
I came across this very neat visual (Dangerous Liaisons -- you need to scroll for this) in an online article from the magazine Foreign Policy.
This is a nice illustration of how to show more than two dimensions in a 2D visual. This visual presents data from a survey of sexual behavior in different countries, and shows four variables. The Y-axis has the average number of sexual partners in a country, the X-axis has the % of people having unprotected sex in a country, the size of the country bubbles represents the average rate of sexually transmitted diseases, and the color of the bubble shows the average age of first sexual intercourse. The selection of the axes, coloring, and sizes very effectively conveys the main point of the article. We can see color and size trends across the plot, indicating that these are correlated with the quantities on the X and Y axes. The annotations explain some cases as examples and leave the reader to make all the correlations.
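One way to deconstruct the graphic described above is to write down which variable rides on which visual channel. The following Python sketch does only that. The `Encoding` class and the channel names are hypothetical conveniences for the illustration, not something from the article or the readings.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Encoding:
    variable: str  # what is measured
    channel: str   # visual property that carries it

# Variable-to-channel assignments as described in the post above.
ENCODINGS = [
    Encoding("average number of sexual partners", "y position"),
    Encoding("% of people having unprotected sex", "x position"),
    Encoding("average rate of sexually transmitted diseases", "bubble size"),
    Encoding("average age of first sexual intercourse", "bubble color"),
]

def channels_are_distinct(encodings):
    """True when no visual channel is asked to carry two variables."""
    channels = [e.channel for e in encodings]
    return len(channels) == len(set(channels))

for e in ENCODINGS:
    print(f"{e.channel:>12} <- {e.variable}")
print("distinct channels:", channels_are_distinct(ENCODINGS))
```

Four variables fit in a 2D plot here because each one gets its own channel: two positional and two retinal.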
 Kenrick Kin - Sep 04, 2007 11:03:43 pm
I would like to see Tufte write an updated version of chapter 3 in The Visual Display of Quantitative Information. All of his examples are from 1980 and earlier and I would hope that graphic artists and statisticians have now found some kind of synergy. With current authoring tools, statisticians do not even need to be great graphic artists to design pleasing graphs - all there really is to choose from is color. Is there more integrity in today's publications or have people found more subtle ways to twist visualizations into producing a desired message? I agree with Omar that there should be some talk about the attractiveness of a graphic to lure readers. Photos certainly lure readers, and if possible, visualizations should too, regardless of how interesting the pure statistics may be.
 K7lim - Sep 05, 2007 12:09:01 am
I'm taken with Shneiderman's breakdown of: Overview first, zoom and filter, then details-on-demand. I think it's telling that many laypeople will point to Google Maps and mashups based therein as amazing examples of visualization of data.
Not only does map data translate well to the screen, as Tufte points out, but the online, draggable, zoomable, searchable Google Maps lend themselves directly to the Overview first, zoom and filter, then details-on-demand.
Overview first: looking at a big map of a general area.
Zoom: you literally zoom down to the point that you want.
Filter: by choosing a route or a set of points via search query, you filter the rest of the land out. By choosing map view vs. satellite view, you can run the data through a filter, either distilling roads as lines, or enabling quick filtering of "green open spaces."
Details-on-demand: once a series of points has been chosen, details about the positions are a click away.
Visualizations that do not have such direct and literal applications to Shneiderman's breakdown have a much tougher time appealing to the intuitive sense of goodness that users bring.
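The four steps of the mantra can be sketched as four small operations over a set of map points. This is a rough Python illustration, not how Google Maps works. The `Place` class, the sample data, and the function names are all invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Place:
    name: str
    kind: str      # e.g. "cafe", "park"
    lat: float
    lon: float
    details: str

# Hypothetical sample data, invented for illustration.
PLACES = [
    Place("Cafe A", "cafe", 37.87, -122.27, "Open 7am-6pm"),
    Place("Park B", "park", 37.88, -122.25, "Has a picnic area"),
    Place("Cafe C", "cafe", 37.80, -122.41, "Open 8am-4pm"),
]

def overview(places):
    """Overview first: summarize the whole collection as a count and bounding box."""
    lats = [p.lat for p in places]
    lons = [p.lon for p in places]
    return {"count": len(places),
            "lat": (min(lats), max(lats)),
            "lon": (min(lons), max(lons))}

def zoom(places, lat_range, lon_range):
    """Zoom: keep only the places inside the requested viewport."""
    return [p for p in places
            if lat_range[0] <= p.lat <= lat_range[1]
            and lon_range[0] <= p.lon <= lon_range[1]]

def filter_kind(places, kind):
    """Filter: drop everything that does not match the query."""
    return [p for p in places if p.kind == kind]

def details(places, name):
    """Details-on-demand: fetch one item's full record only when asked."""
    return next((p.details for p in places if p.name == name), None)

visible = filter_kind(zoom(PLACES, (37.85, 37.90), (-122.30, -122.20)), "cafe")
print(overview(PLACES)["count"], [p.name for p in visible], details(visible, "Cafe A"))
```

Each step narrows what the previous one returned, which is why the sequence feels natural on a map.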
 Karen Hsu - Sep 05, 2007 09:04:07 am
I think deconstruction just refers to critical examination of a visualization, extracting its makeup and then also maybe providing an explanation for the choices behind it. Deconstructable information includes the data model, image model, and element encodings used, as well as things like data set size, uninformative elements, etc.
 Nate - Sep 16, 2007 10:57:12 pm
Is a clock an example of orientation encoding ordinal data, in contrast to Bertin's "Levels of Organization," or do those encode information through position (the numbers around the circumference) and the orientation of the arms is secondary? Seeing that others picked up on this as well, I think orientation definitely plays a role in encoding ordinal data (see Movado watches).
Do visualizations in popular media serve a different purpose than information communication, such as decoration? (Echoes Omar's description of Tufte's illustrations as boring.)
 N8agrin - Sep 16, 2007 10:42:08 am
Shneiderman's advice on information seeking, to 'overview first, zoom and filter, then details-on-demand', is well taken considering the interactive visualizations made possible by modern computers. This contrasts with Tufte's insights on 'static' visualizations, which do not necessarily offer the opportunity to more closely inspect data.
I did not find Card and Mackinlay's approach to classifying information visualizations to be very useful. I found their use of tables to classify data in a visualization to be confusing and difficult to decode. It wasn't readily obvious what was gained by classifying visualizations using this method. While I wouldn't argue that information visualizations are wholly extemporaneous works of art, I find it difficult to believe that codifying many types of data will provide a set of coherent useful visualization patterns that are not already somewhat obvious. The one advantage of this approach, and perhaps this is the point which I missed, is that using these principles, machines could easily interpret what types of data they are being presented with and automatically build 'best-fit' visualizations.
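To make the 'best-fit' idea concrete, here is a minimal Python sketch of a machine picking visual channels from data types. It is a toy under stated assumptions. The preference lists in `PREFERRED_CHANNELS` are made up for the illustration and are not taken from Card and Mackinlay's tables, and the field names are hypothetical.

```python
# Illustrative preference lists per data type: an assumption made for this
# sketch, not a ranking taken from Card and Mackinlay's paper.
PREFERRED_CHANNELS = {
    "quantitative": ["position", "size", "color"],
    "ordinal": ["position", "color", "size"],
    "nominal": ["position", "color", "shape"],
}

def assign_channels(fields):
    """Greedily give each (name, data_type) field its most preferred free channel.

    Position is treated as two slots (x and y); every other channel is used once.
    """
    free = {"position": 2, "size": 1, "color": 1, "shape": 1}
    assignment = {}
    for name, data_type in fields:
        for channel in PREFERRED_CHANNELS[data_type]:
            if free.get(channel, 0) > 0:
                free[channel] -= 1
                assignment[name] = channel
                break
        else:
            assignment[name] = None  # no suitable channel left
    return assignment

# Hypothetical field names, for illustration only.
print(assign_channels([
    ("price", "quantitative"),
    ("year", "ordinal"),
    ("weight", "quantitative"),
    ("maker", "nominal"),
]))
```

A greedy pass like this depends on field order. A real system would presumably weigh alternatives against each other, which is where a well-defined taxonomy would earn its keep.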
Again, I found Tufte's advice regarding data integrity to be very insightful and appropriate. His examples of 'lying' data were profound and pointed out errors, abstractions and exaggerations that PowerPoint has always helped me include in my own visualizations. Particularly useful are Tufte's breakdown of the ratio of the effect shown in the image to the effect in the data, and his demonstration of how an image is misread when its visual proportions exaggerate the underlying data.
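The ratio mentioned above is what Tufte calls the Lie Factor: the size of the effect shown in the graphic divided by the size of the effect in the data, where each effect is a relative change. A small Python sketch of the arithmetic follows. The function names and the example numbers are hypothetical and chosen only to make the calculation easy to follow.

```python
def effect_size(first: float, second: float) -> float:
    """Relative change from the first value to the second."""
    return abs(second - first) / abs(first)

def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Effect shown in the graphic divided by the effect present in the data."""
    return effect_size(graphic_first, graphic_second) / effect_size(data_first, data_second)

# Hypothetical numbers: the data doubles (10 -> 20) while the drawn bar
# grows from 1 cm to 4 cm.
print(lie_factor(1.0, 4.0, 10.0, 20.0))  # graphic effect 3.0 / data effect 1.0
```

A value near 1 means the drawing is proportional to the data. Values well above 1 mean the graphic overstates the change.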