Profiles in Badness
The three texts we read for today all discuss how to create visuals that communicate quantitative information effectively and, above all, accurately. The first, a chapter from Kieran Healy, begins by identifying reasons we might be concerned with visualizing data at all. Its main point is that summary statistics conceal the nuances of their constituent data and that visualizations can illuminate these nuances, but that poorly constructed visualizations can give rise to additional problems.
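To make the point about summary statistics concrete, here is a minimal sketch in Python. The two samples are made up for illustration and do not come from the readings; the helper names `summarize` and `text_histogram` are likewise hypothetical. The samples share the same mean and standard deviation, yet even a crude text histogram shows that they are shaped very differently.

```python
from statistics import mean, pstdev

# Made-up illustrative samples (an assumption, not data from the readings).
clustered = [2, 4, 4, 4, 5, 5, 7, 9]   # a single hump near the middle
split = [3, 3, 3, 3, 7, 7, 7, 7]       # two separate groups, nothing in between


def summarize(values):
    """Return (mean, population standard deviation) for a list of numbers."""
    return mean(values), pstdev(values)


def text_histogram(values):
    """Return a crude histogram with one row per integer value."""
    rows = []
    for v in range(min(values), max(values) + 1):
        rows.append(f"{v:>2} | " + "#" * values.count(v))
    return "\n".join(rows)


if __name__ == "__main__":
    for name, data in [("clustered", clustered), ("split", split)]:
        print(name, summarize(data))   # identical summaries for both samples
        print(text_histogram(data))    # visibly different shapes
```

The summaries alone cannot tell these samples apart; only the picture can, which is roughly the motivation the chapter gives for visualizing data in the first place.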
To characterize poor data visualizations, the chapter identifies three areas of concern: visual aesthetics, the substance of the underlying data, and the habits of human visual perception. Poor visualizations often have problems in more than one of these areas at once, but the chapter claims that separating the larger problem into these three groups helps illuminate it. Toward that end, the chapter provides examples of poor visualizations in each area and analyzes why each one fails.
While problems with data and with aesthetics are fairly intuitive, problems rooted in human visual perception require more formal knowledge. For this reason, the chapter then details various studies, theories, patterns, and findings about visual perception, often relating them to data visualization specifically.
Throughout the chapter, and in its conclusion, the message is that there are many theories and rules for effective data visualization and that it is important to be aware of them. However, the chapter also states clearly that highly effective visualizations can break these conventions; in these cases it seems important that the conventions be broken deliberately, with a reason that can be defended.
The other two articles discuss proportion and axis-labelling in graphs. With respect to axes, the consensus is that bar graphs should always begin at zero whereas line graphs need not (and starting a line graph at zero can sometimes even be a problem). The underlying idea is that bar graphs represent absolute quantities, so starting at an arbitrary non-zero point obscures the true quantity represented by each bar. In contrast, line graphs illustrate changes in one variable with respect to another, meaning they depict relative quantities. Forcing the axis to start at zero in this case may make the changes so subtle that they become imperceptible, suggesting stability when that might not actually be the case. Here again, of course, the data need to be dealt with on a case-by-case basis, and the axes should be chosen to suit the context of the data.
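The distortion from a non-zero bar baseline can be put into numbers. The sketch below is only an illustration: the function name `drawn_ratio` and the values 55, 50, and 45 are assumptions chosen for the example, not figures from the articles. It compares how much taller one bar is drawn than another when the axis starts at zero versus at an arbitrary higher point.

```python
def drawn_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar heights for values a and b.

    A bar's drawn height is its value minus the point where the axis starts,
    so the ratio only matches a / b when the baseline is zero.
    """
    if baseline >= min(a, b):
        raise ValueError("baseline must be below both values")
    return (a - baseline) / (b - baseline)


# Hypothetical values: 55 is 10% larger than 50.
true_ratio = drawn_ratio(55, 50)            # axis starts at zero
truncated_ratio = drawn_ratio(55, 50, 45)   # axis starts at 45

if __name__ == "__main__":
    print(f"zero baseline:      first bar drawn {true_ratio:.2f}x as tall")
    print(f"baseline set to 45: first bar drawn {truncated_ratio:.2f}x as tall")
```

With the axis starting at 45, a 10% difference is drawn as a bar twice as tall, which is the sense in which a truncated axis obscures the absolute quantities the bars are meant to show.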
The final article, which discusses proportion in graphs, is closely related. It extends the argument that bar chart axes need to start at zero with the idea that each bar should have an area proportional to the number it is meant to represent, which is only possible if each bar is measured from zero. The article then applies this idea to various other plots, from line graphs to bubble charts and more, notably reiterating (from the previous two texts) that additional dimensions (3-D, for example) should only be used where the data genuinely require them, never as a purely aesthetic preference. It provides an illuminating example of a 3-D bar plot displaying the incidence of renal disease with respect to systolic and diastolic blood pressure; here, there are two (somewhat) independent variables and a third that depends on them both, which is an apt scenario for this sort of plot.
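The same proportionality idea can be sketched for bubble charts. This is an illustration under stated assumptions rather than the article's own method: the function names and the values 10 and 20 are hypothetical. If the area of a circle is to be proportional to the value it represents, the radius has to grow with the square root of the value; scaling the radius directly by the value exaggerates differences.

```python
import math


def radius_for_value(value, area_per_unit=1.0):
    """Radius of a circle whose area equals value * area_per_unit."""
    return math.sqrt(value * area_per_unit / math.pi)


def circle_area(radius):
    """Area of a circle with the given radius."""
    return math.pi * radius ** 2


# Hypothetical values: the second is twice the first.
small, large = 10, 20

# Area-proportional sizing: doubling the value doubles the drawn area.
proportional_area_ratio = (
    circle_area(radius_for_value(large)) / circle_area(radius_for_value(small))
)

# Naive sizing (radius set equal to the value): doubling the value
# quadruples the drawn area, overstating the difference.
naive_area_ratio = circle_area(large) / circle_area(small)

if __name__ == "__main__":
    print(f"area-proportional sizing: {proportional_area_ratio:.2f}x the area")
    print(f"radius-proportional sizing: {naive_area_ratio:.2f}x the area")
```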