Reading #4

Profiles in Badness
from Kieran Healy, Carl Bergstrom, & Jevin West

Read Healy's introductory chapter from Data Visualization for Social Science:
- Look at Data: What Makes Bad Figures Bad

Read Bergstrom & West's Calling Bullshit essays:
- Misleading axes on graphs
- The Principle of Proportional Ink

Use the tag “R4” when you post your assessment of the readings and the questions raised.

The texts discussed for this occasion address how effectively quantitative information is communicated and the recommended process for creating visual representations of data. Across the readings, the main idea I take from the texts is that, as a designer, apart from the aesthetic side of the process, you always have to take into account who your audience is and what the main purpose of your work is (the main idea that you want to communicate with your graphs).

The readings provide us with a comprehensive set of examples that help us distinguish between good and bad visualizations. In order to avoid bad practices, we need to keep in mind the visual habits we have and how they might affect our visual relationship with the representation of patterns and quantitative information.

Another important element that we always need to consider is the crucial role that labeling and proportion play in our visual perception. One rule that is important to remember is to always start at zero when it comes to bar graphs, while this is not necessarily the case for line graphs.

Nevertheless, the main idea is to evaluate the data case by case. Therefore, we can say that it is possible to bend the rules if the data we are working with allow us to do so.

In conclusion, in order to achieve a successful design, we need to be clear, concise, and efficient so our readers can receive our message quickly. We must also never forget that we have a responsibility to never mislead the public.

The three texts we read for today all discuss the process of creating visuals to communicate quantitative information effectively and, more specifically, accurately. The first, a chapter from Kieran Healy, begins by identifying reasons we might be concerned with visualizing data at all: the main point here is that summary statistics conceal the nuances of their constituent data, that visualizations can illuminate these nuances, but that poorly constructed visualizations can give rise to additional problems.

In an effort to characterize poor data visualizations, the chapter identifies three areas of concern: visual aesthetics, the substance of underlying data, and habits of human visual perception. While oftentimes poor visualizations include problems in more than one of these areas together, the claim of the chapter is that distilling down these larger problems into the three smaller groups will help illuminate them. Toward that end, the chapter provides examples of poor visualizations in each area, analyzing why each one fails to be successful.

While problems with data and with aesthetics are a bit more intuitive, problems with the visual perceptive habits of humans require more formal knowledge. For this reason, the chapter then details various studies, theories, patterns, and factoids about visual perception, oftentimes relating them to data visualization specifically.

Throughout the chapter, and in its conclusion, there is a message that there are many theories and rules for effective data visualization and that it is important to be aware of them. However, also clearly stated is that there can be highly effective visualizations that break these conventions; in these cases it seems important to break conventions with arguable intention.

The following two articles discuss proportion and axis-labelling in graphs. With respect to axes, the consensus is that bar graphs should always begin at zero whereas line graphs need not necessarily do so (and a line graph may sometimes even be problematic because it does). The underlying idea is that bar graphs represent absolute quantities, so starting at an arbitrary non-zero point obscures the true quantity represented by the bars. In contrast, line graphs illustrate changes in one variable with respect to another, meaning they depict relative quantities. Forcing an axis to start at zero in this case may make the changes in the chart so subtle that they become imperceptible, illustrating stability when that might not actually be the case. Here again, of course, the data need to be dealt with on a case-by-case basis, and appropriate axes should be shown given the context of the data.
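To make the bar-chart point concrete, here is a minimal sketch of how a non-zero baseline inflates the apparent difference between two bars. The function name `apparent_ratio` and the numbers are my own illustrative assumptions, not something taken from the articles.

```python
def apparent_ratio(larger: float, smaller: float, baseline: float = 0.0) -> float:
    """Ratio of the drawn bar heights when the value axis starts at `baseline`."""
    if baseline >= smaller:
        raise ValueError("baseline must be below the smaller value")
    return (larger - baseline) / (smaller - baseline)


# Hypothetical values: two bars whose true values differ by 10%.
true_ratio = apparent_ratio(55, 50)            # axis starts at zero
truncated_ratio = apparent_ratio(55, 50, 45)   # axis starts at 45

print(f"ratio of the values: {true_ratio:.2f}")
print(f"ratio of the bars as drawn on the truncated axis: {truncated_ratio:.2f}")
```

With the axis starting at 45, the taller bar is drawn twice as tall as the shorter one even though the underlying values differ by only a tenth, which is the distortion the zero-baseline rule for bar graphs is meant to prevent.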

The final article, discussing proportion in graphs, is closely related. It furthers the previous argument that bar chart axes need to start at zero with the idea that each bar in a bar chart should have an area proportional to the numerical figure it is meant to represent, which is only possible if each bar is measured from zero. The article then discusses this idea with various other plots, from line graphs to bubble charts and more, notably reiterating (from the previous two texts) that additional dimensions (3-D, for example) should only be used in cases where the data very much necessitate it, never simply as an aesthetic preference. It provides an illuminating example of a 3-D bar plot displaying the incidence of renal disease with respect to systolic and diastolic blood pressure; here, there are two (somewhat) independent variables and a third that is dependent on them both, which is an apt scenario for using this sort of plot.
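For bubble charts, the same principle can be sketched numerically. The snippet below is only an illustration under my own assumptions (the helper names and the values 10 and 20 are hypothetical): it compares the ink used when a value is mapped to a circle's area with the ink used when the value is mapped directly to the radius.

```python
import math


def radius_for_area(value: float, area_per_unit: float = 1.0) -> float:
    """Radius of a circle whose area is proportional to `value`."""
    return math.sqrt(value * area_per_unit / math.pi)


def circle_area(radius: float) -> float:
    """Area (the 'ink') of a filled circle."""
    return math.pi * radius ** 2


# Hypothetical values: one quantity is twice the other.
small, large = 10.0, 20.0

# Proportional ink: scale the area by the value.
ink_ratio_area_scaled = (
    circle_area(radius_for_area(large)) / circle_area(radius_for_area(small))
)

# Common mistake: use the value itself as the radius.
ink_ratio_radius_scaled = circle_area(large) / circle_area(small)

print(round(ink_ratio_area_scaled, 6), round(ink_ratio_radius_scaled, 6))
```

Scaling the radius squares the visual difference, so a value that is twice as large ends up with four times the ink.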

Problems typically encountered with data likely fall into three categories: aesthetic, substantive and perceptual.

Since every individual brings the lens of their own life to every inference it's shocking there aren't more, but simplification of categorization seems reasonable within this scope.

But it is more likely that a visual upholds one or two of these pillars and fails to hold the third to the same standard than that it fails entirely at one or all three. As mentioned, it is more common to fall short on the use of good data than it is to implement bad design, though the latter is not impossible.

Hard and fast rules seem to weaken the further we deviate from the standard, so while Tufte lays out principles, sometimes the best implementation is the rebellious breaking or bending of a rule, in exactly the right way with exactly the right data that makes for a successful representation.

But when manipulating data to infer an idea, as much consideration needs to be applied to the design choice as to the mechanism and form of display and to the intentional presentation. I find it particularly interesting that in this quest for the proper balance of all three, there is a 'tour de force' in Minard's "Napoleon's Retreat," which even then is an elusive, perfect storm that is more the exception than the rule.

Nonetheless, Tufte's timeless message on Graphical Excellence (1983) holds true as principles to always consider:

Graphical Excellence is the well-designed presentation of interesting data - a matter of substance, of statistics, and of design... [It] consists of complex ideas communicated with clarity, precision, and efficiency... [It] is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space... [It] is nearly always multivariate... And graphical excellence requires telling the truth about the data.

Unlike previous weeks, this reading had multiple sources that more or less communicate the same message. It is a message that has been embedded in my brain since the first year of undergrad: “The designer has a responsibility to his/her design.” The key term “responsibility” is sometimes thrown out the door. These three readings are a reminder of what happens when we forget the impact each design has. I feel this is intensified when the design incorporates data.

Most of the rules mentioned in “Misleading Axes on Graphs” were taught to me early on. There isn’t much to debate. However, it is important to mention these rules. Typically, the established convention for plotting data should be maintained. If the author wants to test another theory, it may confuse the readers. Again, it’s about the story that the designer wants to tell. The content should drive the visuals or the experience.

“Proportional Ink” was another reading that focused on the RGB and the CMYK. Going back to the shades of color and the size of the shading is crucial when designing a graph. It can easily be misleading. The key term that stuck was “proportion.” We tend to think of proportion of white space to the content, but we also have to keep in mind the grays in between. In addition, the concept of “3D” should only be used when the information reflects a three dimensional nature.

Finally, “Look at Data” was a compare-and-contrast reading that allowed me to look at information from a different perspective. Before this class, I was aware of “pretty” graphs and “ugly” charts. But this reading has shown me there is another layer that is far more important. (On a side note, Excel tables and graphs need a revamp.) What I thought was funny was the “frustrating” Tufte argument. I’m sure that others in class will disagree; however, “graphical excellence” is not just preferred but, especially nowadays, expected.

Again, it echoes the first point of this response, “The designer has a responsibility to his/her design,” but now to add “to avoid misinterpretation of the truth.”

The breakdown of visual data forms in the Healy chapter is introduced by an affirmation of data visualisation's fundamental value in the overall data science pipeline: there are purportedly hard limits to descriptive statistics, and visual reformulations can act as checks against their authority.

Healy then follows up with caveats in a discussion of the intrinsic faults and necessary tradeoffs in the process of creating such visual representations. But the manifold of errors in the construction (and subsequent biases in the interpretation) of the representations can be generalised to a fairly universal degree, beyond visual or numerical elements, and into human cognition as a whole.

Anscombe's quartet and Hewitt's analysis do not just exemplify bad data; the data sets demonstrate a tendency to place faith in the explanatory power of summary stats beyond their purview and application. More generally, our instincts drive us towards repeated recognition of certain patterns and structures, even (as noted by Healy) when presented with evidence to the contrary, a tendency which can be an outright terrible match for the sheer complexity, probabilistic nature, and stochastic processes of the observable world:

https://fivethirtyeight.com/features/the-media-has-a-probability-problem/

https://www.newyorker.com/magazine/2005/12/05/everybodys-an-expert

The upshot is that we're all less objective than we might typically expect ourselves to be, across a range of dimensions and levels of consciousness. In terms of truth-seeking, this seems a problem that can less be solved outright than simply mitigated to the best of our abilities, by a continuous process of expanding the perspectives through which we aggregate, analyse, and represent information, and subsequently cross-referencing to reduce the area in which error can propagate. The result is a probabilistic narrowing of uncertainty, rather than a deterministic elimination.

Impressions

The readings on Profiles in Badness initially caused some anxiety around how easy it may be for designers to create misleading graphs. However, upon closer examination and comprehension, the main points were too sound to be forgotten once articulated.

Takeaways

The arguments I heard loud and clear were:

• Never truncate the y-axis (Bar, Area)
• Fit plot to your data points (Scatter, Line)
• Use ink that is proportional to data
• Aesthetically: simplify, avoid junk
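
The first two takeaways can be sketched as a small helper that picks y-axis limits by chart type. This is only an illustration of the rule of thumb: the function name `y_limits`, the 5% padding, and the sample values are my own assumptions.

```python
def y_limits(values, chart_type, padding=0.05):
    """Suggest (low, high) y-axis limits for a chart type.

    'bar' and 'area' charts always include zero; 'line' and 'scatter'
    charts are fitted to the data range with a small margin.
    """
    low, high = min(values), max(values)
    if chart_type in ("bar", "area"):
        return (min(0, low), max(0, high))
    if chart_type in ("line", "scatter"):
        margin = (high - low) * padding
        return (low - margin, high + margin)
    raise ValueError(f"unknown chart type: {chart_type}")


temps = [14.1, 14.3, 14.2, 14.6, 14.8]  # hypothetical measurements
print(y_limits(temps, "bar"))
print(y_limits(temps, "line"))
```

As the readings stress, this is a default rather than a law; the data and the audience still decide whether the default fits.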

Thoughts

Ongoing debates referenced in the readings that I find intriguing and would like to discuss and pick apart at another time:

• How do we make judgement calls on which visual perceptions that research finds in most readers can be re-trained by persisting with new visual techniques (evolve), and which are innate weak points ("leave alone"; humans cannot "unsee")?
• When were 3D charts ever a thing?
• On creating "memorable" charts and graphics - when is it wise, or when does it oversimplify dynamic data trends as a "fact of truth"?

Readings

Misleading Axes on Graphs

The points made in this article are already well known to me. I can appreciate the section on multiple axes on one graph. I often feel those types of graphs are confusing and often suggest causal relationships that may not exist. However, I feel another issue here is that in most of the charts presented there is a lack of accountability; namely the author of the chart rarely chooses to include their name in the work. I think this simple act may go a long way to cause graph authors to re-think if their work is truthful or misleading.

The gun deaths chart is probably the most provocative example in the article. My belief is that respecting the cultural norms of the audience is important; flipping an axis from the common convention is only going to mislead, whatever intentions the author has. Thus I would place a higher value on legibility and interpretability. It is not clear to me what value the author was drawing from, other than novelty.

Proportional Ink

This article covers some fairly well established territory again, and I agree with just about everything stated. I think the most interesting of the examples is the Time Magazine “causes of death.” The author makes a great point that comparisons can be made which the graph author likely did not consider (the massive ink space devoted to toddlers' accidents vs. seniors' accidents). Also interesting is the author making the case for some legitimate uses of 3D, which I had not previously given much thought to.

Chapter 2 “Look at Data”

I thought this book chapter served as a great overview of many common topics in data visualization. Already familiar with Anscombe’s Quartet, I found the Jan van Hove small multiple of “same correlation” scatterplots really interesting. I’m not sure if this was attempted before, but what amazed me is that it was produced in 2016. It seems so clear & useful, and a great reminder of how visualization can really assist in discovering interesting patterns that may be missed if using only the most common quantitative statistical measures.
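
As a quick check of the Anscombe point, the sketch below computes the mean of y and the Pearson correlation for each of the four data sets using only the standard library. The numbers are the standard published Anscombe values; the helper `pearson_r` and the dictionary names are my own.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Summary statistics per data set: (mean of y, correlation of x and y).
summary = {name: (mean(ys), pearson_r(xs, ys)) for name, (xs, ys) in quartet.items()}

for name, (y_mean, r) in summary.items():
    print(f"{name:>3}: mean(y) = {y_mean:.2f}, r = {r:.3f}")
```

The four rows come out nearly identical, yet a scatterplot of each set looks completely different, which is exactly why the summary numbers alone are not enough.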

The Tufte/Holmes “debate” always struck me as ridiculous. To me, it’s always been a question of competing values, not of “proper chart construction”. The Tufte box plot is a good reminder of how minimalism doesn’t always communicate more efficiently. The “violin” plot always struck me as superior – if the desired value is to communicate more information in a smaller space, that is, the very essence of efficiency. But the violin plot wasn’t technically feasible in the 1980s and prior (the box plot of course being a Tukey invention).

I think the NY Times “Essential to live in a Democracy” plot is a really important example. I think this is a tremendous inherent problem in many of these survey scales. In the business world, Likert scales and NPS scales are some of the most poorly represented data. I am wondering if there is a word for this phenomenon – when someone answers 6 out of 10 on a survey and another person answers 3 out of 10, it doesn’t necessarily mean that the person who answered 6 feels 2x as strongly as the person who answered 3.

The section describing the work of Cleveland/McGill and later Heer/Bostock is interesting. I don’t remember seeing this level of quantification of decoding before. I would be interested in understanding how this is changing over time. I see more scatterplots than ever in the New York Times and Wall St Journal. Is this because the readers are now familiar with the chart type, or is it simply that these publications cater to a higher educated audience?

Finally, I’d like to comment on the discussion of axes that fail to include zero on line graphs. As the prior article noted, and I agree, sometimes there are very good reasons not to include a 0 baseline. The argument that “graphs that don’t go to zero are a thought crime” fails to distinguish between types of graphs. I believe it is an error to automatically assume malice or bias in situations like this.

Our reading for this week centered around how data visualization specialists employ strategies and techniques that, at their most innocent, can cause misinterpretation of what is presented, and at their most malicious, can actively deceive their audience.

I was fascinated by the details of human perception covered by Healy's 'Look at Data' introduction, and the extent to which our perceptive strengths and weaknesses affect a visualization's ability to communicate its message clearly. A great example of this effect, and one that I admit I had not realized or focused much on in the past, is color palette. I have always chosen color palettes for visualizations based on aesthetic preference, but never realized the perceptive effect these colors can have on interpretation. By displaying a range of palettes that do a 'good' job of accurately representing sequential and categorical differences, Healy makes it easy to see how nonuniform jumps in color differences could lead to value judgements on information that is not actually present in the data being visualized.

Healy's detail of the experiment showing our cascading ability to correctly estimate the differences between two values based on the visualization used was exceptionally useful. Healy stresses the strengths and weaknesses we have in identifying differences - what stuck with me the most is the idea that having a shared or common scale (over length encoding, for example) is supremely helpful.

Healy talks about data ink, and this point is hammered home by Bergstrom and West (which, as a short aside, I found terribly entertaining reading). Through this reading we see many examples of violators of best practices laid out by the authors, and how these examples breed misperception. While everything these authors laid out makes complete sense, I would love to see some more examples that aren't horrendously bad but rather walk the line between acceptable and problematic, so we can become more nuanced and detailed in our process of avoiding such pitfalls as designers.

From the class I'm curious to know the answers to two questions:

(1) Do we believe in Bergstrom and West's (somewhat) hard-and-fast rule that bar charts (or any chart displaying a value by area) should ALWAYS include the zero axis? Are there any examples where breaking this rule may be appropriate?

(2) I found it interesting (and surprising) that Healy gave some credit to Nigel Holmes's 'Monstrous Costs' visualization. What is our sentiment - do we agree that this visualization is a little more memorable (in a positive way), or that visualizations that are sometimes a little more lax on maximizing data ink are more digestible?

The three readings for this week all come more or less under the rubric of "best practice". While it could be said that all three share common themes, the two shorter posts from "Visualization" both read as more functional and utilitarian which would make sense given their demographic. The last from "Data Visualization for Social Science" is certainly a more nuanced approach and incorporates into its argument matters of historical record, physicality and psychology.

The first from Visualization, "Misleading axes on graphs", encourages readers, similar to Drucker, to regard the act of codifying data visually as a conscious process of telling a story. The choices we have, for example including (or not) a zero point in a bar or line chart, determine what story we are telling and how we have chosen to tell it. While in some cases there may be good reasons for deviating from the norm, in the vast majority of cases assayed here, the root cause is most likely disingenuousness or, in the best case, incompetence. The takeaway is ultimately that the line graph is a visual tool used to show deltas in variables, while bar graphs are better for asserting magnitudes.

The second of the two, while equally informative, was perhaps the most entertaining, highlighting a parade of bad-faith or simply awful counter-examples to the principle of "proportional ink". The main focus of this piece is that the amount of ink used to represent a data point should be directly proportional to its value. Attempts to fulfill secondary objectives (e.g., making the visualization more interesting by the use of 3D bar charts, donut charts, etc.) should be avoided if (as they often do) they tend to obscure the viewer's perception of relative or absolute values in favor of visual spice.

The last article, and my favorite, considers what limitations we as consumers of visualizations have and which of these points should be incorporated into any ethical/functional considerations. Additionally, it considers what ethical choices we face as producers of such stories when we omit data or use the wrong tools or bad data to support a story we would like to tell.

Again the theme of prioritizing aesthetics vs clarity is revisited. That said, of most interest to me were the sections dealing with geometric and physical limitations of perception (I'm currently reading "Thinking Fast and Slow" by Daniel Kahneman with several thematically similar chapters). Taken as a whole, these are issues that defy any sort of remediation. The fix in almost all cases is simply "don't do that" since the problem is one of how humans are physically instantiated and not one of policy or ethics. We are simply wired to respond more or less positively to various rhetorical assertions based on how they are encoded.

In contrast to their original assertion that a "boiled down" list of principles is difficult to enumerate, I was grateful for the section on the Cleveland and McGill study. While I understand the concern, being provided a model framework is useful until I can internalize it well enough to understand when exceptions are reasonable. To this end (and in response to these blogs) I have already considered several times how I might remake our first two assignments in light of these guidelines and how my usage either supported or violated best practice.

Honestly, as subjects for response papers, there wasn't much here to debate. Every point is well-taken and I appreciate the skill required to bring this in an understandable fashion to non-experts. I look forward to being able to use these pointers in future class projects.

Interpretation and its Discontents

I was struck in this reading by how the nature of our interpretive capabilities as human beings can often lead us astray. This was most notable in visual illusions, such as Edward Adelson's checkershadow. I have looked at this illusion multiple times, and even spent hours looking at different demonstrations of it, but it still amazes me how robust this human capability of meaning creation/interpretation is. Even knowing that our brain is interpreting values differently from those hitting our retina doesn't help "fix" the perceptual experience of how it appears.

This also came to the forefront in a different way with Healy's treatment of 'randomness.' It amazes me how we as humans generally perceive the Poisson-based distribution to have more structure than the Matérn distribution, despite the fact that algorithmically the first is more "random" than the second.
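
To see the difference for myself, I put together a rough sketch of the two kinds of point pattern. It is an approximation under my own assumptions: a fixed number of uniform points stands in for the Poisson process, the second pattern uses Matérn type I hard-core thinning (Healy's figure may use a different variant), and the seed, point counts, and minimum distance are arbitrary.

```python
import math
import random


def uniform_points(n, rng):
    """n points placed independently and uniformly in the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]


def hard_core_thin(points, min_dist):
    """Matérn type I thinning: drop every point with a neighbour closer than min_dist."""
    kept = []
    for i, p in enumerate(points):
        crowded = any(
            j != i and math.dist(p, q) < min_dist
            for j, q in enumerate(points)
        )
        if not crowded:
            kept.append(p)
    return kept


def nearest_gap(points):
    """Smallest distance between any two points in the pattern."""
    return min(
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    )


rng = random.Random(0)  # fixed seed so the run is repeatable
poisson_like = uniform_points(200, rng)
matern_like = hard_core_thin(uniform_points(400, rng), min_dist=0.04)

print(len(poisson_like), nearest_gap(poisson_like))
print(len(matern_like), nearest_gap(matern_like))
```

The independent points are free to land almost on top of each other, which produces the clumps and gaps we read as structure, while the thinned pattern enforces a minimum spacing and so looks more evenly "random" to the eye.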

In these readings, these lapses in rationality are posed as a constraint on our ability to directly perceive "objective" values in our world, but I also see this as the feature of human beings that gives power and salience to data visualization. This unconscious/subconscious/nonconscious (whatever it may be) "pop-out"-edness is less about the physical ink on paper and more about our embodied humanness.

I believe that these interpretative discrepancies with "reality" also come in the cognitive framing of a given subject matter. I would be interested in learning more about how the conceptual metaphor at play (regardless of whether it is instantiated in visual or textual form) affects the viewers' interpretation of a visualization. These readings focus primarily on the visual side of these things, but I am also interested in the priming effect a large headline can have on the reading of a visualization, even if it entirely bypasses consciousness. For example, a line graph showing a slowing trend of GDP could have a title of "Due for a Rebound?" or "Economy in a Sinkhole" and may create significantly different takeaways from the same perceptual stimuli.

In short, I think these cognitive biases should always remain at the forefront of the design process -- not only because of the potential to mislead, but also because these biases are what make human interpretation of data visualization possible at all.
