Reading #5

Subtleties of Color
by Robert Simmon

The use of color to display data is a solved problem, right? Just pick a palette from a drop-down menu (probably either a grayscale ramp or a rainbow), set start and end points, press “apply,” and you’re done. Although we all know it’s not that simple, that’s often how colors are chosen in the real world. As a result, many visualizations fail to represent the underlying data as well as they could.

Read the blog series.
And/or watch the lecture.

Use the tag “R5” when you post your assessment of the readings and the questions raised.

The reading for this week gave us the opportunity to reflect on the importance of color and its role in depicting data.

With the help of multiple examples, Simmon pays special attention to the differences between the way our computers render colors and the way we perceive them. He argues that our perception aligns more closely with the properties of lightness, hue and saturation, while keeping in mind at all times that a correct use of color "enhances clarity, aids storytelling, and draws a viewer into your dataset".

The author invites and encourages us to be conscious of our limited ability to perceive color. Apart from the need for a palette whose colors are as different from each other as possible, Simmon emphasizes that we should limit the number of categories to 12, although I personally think that this is a very high number.

The final general "rule" is to maintain a continuity of associations we already have with certain objects and their colors.

The main conclusion of this reading is not to complicate patterns and pre-existing associations, while keeping in mind that "there is no perfectly objective view of color".

This week's reading was closer to the field I have been studying for the past 7 years. However, it focused on the use of color in data rather than on general color theory. It was a helpful review and a light read. I agree with most of what Simmon explains, and I enjoyed the general umbrella of “show[ing] patterns and relationships that are otherwise hidden…”

A constant battle for a designer is carrying a color from screen to print and back to screen again. What gets “lost in translation” between RGB and CMYK was always a weak point. Now, however, the designer must also think of color in terms of the nonlinear human eye and the linear computer system. Working with lightness translates to both worlds. The NASA Ames example in particular was a beautiful selection of colors because of its continuous change in lightness. Walking through the different data types (sequential, divergent, and qualitative) and the ideal use of color for each was extremely helpful.

I was interested to read about the use of unnatural colors in data interpretation. The painful rainbow gradient is a popular choice, when in reality it serves its function poorly. A more realistic color choice allows the concept to be expressed more intuitively. Similarly, missing or invalid data should be clearly separated from valid data: instead of typing NA in a table or leaving a blank space, giving it a strong color allows it to stand out.
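A minimal Python sketch of that idea, assuming a hypothetical five-step blue ramp and a hypothetical magenta "no data" color (the hex values and the name `color_for` are illustrative, not taken from the reading): missing values (None or NaN) get their own color instead of being folded into the ramp or left blank.

```python
import math

# Hypothetical light-to-dark sequential ramp (illustrative hex values).
RAMP = ["#deebf7", "#9ecae1", "#6baed6", "#3182bd", "#08519c"]
# Hypothetical "no data" color, chosen to stand apart from the ramp.
NO_DATA = "#ff00ff"


def color_for(value, vmin, vmax):
    """Map a value to a ramp color, or to NO_DATA if it is missing."""
    # Treat None and NaN as missing rather than as a number on the scale.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return NO_DATA
    # Normalize into [0, 1], clamping values outside the range.
    t = (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)
    # Pick one of the discrete ramp steps.
    index = min(int(t * len(RAMP)), len(RAMP) - 1)
    return RAMP[index]


values = [0.0, 2.5, None, float("nan"), 10.0]
print([color_for(v, 0.0, 10.0) for v in values])
```

The design choice is that "missing" is decided before any arithmetic, so a NaN can never land somewhere on the ramp by accident.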

Ending the whole blog with helpful tools is the highlight of the reading. It is one thing to read about color and another thing to test color with a specific purpose in mind.

Patrick Serr
Reading #5 - 10/25/17


Through a series of blog posts under the heading “Subtleties of Color,” Robert Simmon examines the ways in which human perception and cognition should affect the choices designers make when creating data visualizations. His points are rooted in the study of numerous texts by thinkers across disciplines (as outlined in the sixth and final segment of the series) as well as scientific papers and working models proposed by professional organizations (primarily cartographers and NASA scientists). His language is clear enough to be understood by the layperson, while the visual examples he provides, along with specific step-by-step instructions for using tools, provide a level of richness from which students and professionals alike can glean novel insights.

“Subtleties of Color” is broken into six sections: an introduction to color theory and the HCL color model, which is more closely attuned to human perception; three sections describing color strategies appropriate to different types of data sets; and finally a wrap-up of useful tools and his sources.

The middle three sections (“The ‘Perfect’ Palette,” “Different Data, Different Colors,” and “Connecting Color to Meaning”) all cover some new theoretical ground compared to previous class readings, and are worth close examination. Simmon’s straightforwardness regarding visual strategies is refreshing and provides the aspiring designer with helpful rules of thumb. Understanding that differences in lightness are the ones the eye perceives most easily explains his model for palettes: combine a linear, proportional change in lightness with a simultaneous change in hue and saturation. His selection of visual examples reinforces the point while sparking ideas for the many ways this might be applied to different data sets. His explanation of how color usage differs between sequential data sets and what he terms “divergent” and “qualitative” (or “categorical”) data sets makes intuitive sense and is satisfying to see in practice.

I especially appreciated his acknowledgement in Part 4 of the ways in which theoretical models and color rules run into real-life data to produce unexpected results. His discussion of a topographical map with a “heat map” palette from red to light yellow illuminates how even a perfectly logical set of color choices may require tweaking based on the figure-ground relationships created by specific visualizations.

Simmon’s discussion of the role of aesthetics, while understandably vague, stands as the one point at which he departs from the pragmatic tone of the rest of his writing. His advice (“I can only encourage you to keep your eyes open”) is accurate but leaves something to be desired. Where other authors in our course readings have taken a stab at identifying what separates beautiful or tasteful visualizations from mundane or mediocre ones, Simmon seems content to move on. Perhaps he would better serve the reader by either omitting the discussion of aesthetics altogether or dedicating an entire segment to a more developed treatment.

This series of blog posts discusses, in the context of effectively visualizing data, certain aspects of color theory and patterns in the way humans perceive color. The first post discusses the differences between how computers generate color (as a combination of red, green, and blue) and how humans perceive color (with respect to lightness, hue, and saturation). In doing this, it illustrates the reasons why the RGB color space model is not intuitive for humans to use. As an alternative, it presents the idea of CIE color spaces, which correspond better with human intuition.
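To see why RGB is not intuitive, one can compute the CIE L* lightness of the pure sRGB primaries. The sketch below is a minimal Python example, assuming sRGB input with channels in [0, 1] and the standard Rec. 709 / D65 luminance weights; the function names are hypothetical. Although each primary is "full intensity" in RGB terms, their L* values come out at roughly 32 for blue, 53 for red, 88 for green, and 97 for yellow.

```python
def srgb_to_linear(c):
    """Undo the sRGB transfer curve for one channel in [0, 1]."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def lightness(r, g, b):
    """CIE L* (0 = black, 100 = white) of an sRGB color with channels in [0, 1]."""
    # Relative luminance Y, using the Rec. 709 / sRGB weights (D65 white).
    y = (0.2126 * srgb_to_linear(r)
         + 0.7152 * srgb_to_linear(g)
         + 0.0722 * srgb_to_linear(b))
    # Convert Y to L*, using the linear segment near black.
    delta = 6 / 29
    f = y ** (1 / 3) if y > delta ** 3 else y / (3 * delta ** 2) + 4 / 29
    return 116 * f - 16


for name, rgb in {"red": (1, 0, 0), "green": (0, 1, 0),
                  "blue": (0, 0, 1), "yellow": (1, 1, 0)}.items():
    print(f"{name:>6}: L* = {lightness(*rgb):.1f}")
```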

The second post begins to discuss how to incorporate human perceptual patterns into data visualizations that accurately reflect the underlying data. Specifically, it notes that humans do not perceive color along a rainbow as changing linearly (as they do along a greyscale gradient), which explains why visualizations that map a rainbow color scheme to data overemphasize certain quantities and underemphasize others. In the same vein, it notes (as last week's reading did) that our perception of a color is influenced by the colors surrounding it, so the same color can look completely different against contrasting surroundings.
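The non-linearity of the rainbow can be checked numerically. This sketch assumes an HSV hue sweep from red to blue as a stand-in for a rainbow palette (real rainbow colormaps differ in detail) and uses Python's standard `colorsys.hsv_to_rgb`; the helper names are hypothetical. It computes CIE L* at each step and compares it with an evenly spaced grey ramp: the grey ramp's lightness only ever increases, while the rainbow's rises and falls, which is why equal steps in the data do not look like equal steps in color.

```python
import colorsys


def srgb_to_linear(c):
    """Undo the sRGB transfer curve for one channel in [0, 1]."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def lightness(r, g, b):
    """CIE L* of an sRGB color with channels in [0, 1] (D65 white)."""
    y = (0.2126 * srgb_to_linear(r)
         + 0.7152 * srgb_to_linear(g)
         + 0.0722 * srgb_to_linear(b))
    delta = 6 / 29
    f = y ** (1 / 3) if y > delta ** 3 else y / (3 * delta ** 2) + 4 / 29
    return 116 * f - 16


def is_monotonic(values):
    """True if the sequence never changes direction."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


STEPS = 9
# Stand-in "rainbow": HSV hue swept from red (0) to blue (2/3) at full S and V.
rainbow = [lightness(*colorsys.hsv_to_rgb(2 / 3 * i / (STEPS - 1), 1.0, 1.0))
           for i in range(STEPS)]
# Greyscale ramp with equal steps in the encoded sRGB value.
grey = [lightness(v, v, v) for v in (i / (STEPS - 1) for i in range(STEPS))]

print("rainbow L*:", [round(x) for x in rainbow])
print("grey L*:   ", [round(x) for x in grey])
print("rainbow monotonic:", is_monotonic(rainbow))
print("grey monotonic:   ", is_monotonic(grey))
```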

The third post builds on the ideas discussed in the second. While the second deals only with mapping color to data that is sequential in nature, the third describes what to do if the data is divergent, meaning it moves away from a central point in two different directions. The main solution here is to treat divergent data as two sequences running in opposite directions from the center, and accordingly to map two contrasting gradients onto them. The post also discusses categorical data, noting that here, rather than a gradual continuum, the colors need to be as different from each other as possible. It notes that human perceptual ability generally limits the number of categories that can be differentiated by color to 12.
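Treating divergent data as two sequences can be sketched as a function that maps a value in [-1, 1] to a CIE LCh triple (lightness, chroma, hue in degrees). This is a hypothetical illustration, not Simmon's palette: the neutral midpoint lightness, the endpoint lightness and chroma, and the two hues (roughly blue and orange-red) are all assumptions. Each side is its own sequential ramp, growing darker and more saturated with distance from the neutral center.

```python
def diverging_lch(value, l_mid=95.0, l_end=35.0, c_end=60.0,
                  hue_neg=250.0, hue_pos=40.0):
    """Map a value in [-1, 1] to a CIE LCh triple for a diverging palette.

    All parameters are hypothetical: a near-white neutral midpoint (l_mid),
    dark endpoints (l_end, c_end), and one hue per side.
    """
    v = max(-1.0, min(1.0, value))  # clamp into [-1, 1]
    t = abs(v)  # distance from the central point
    # Each side is a sequential ramp: lightness falls and chroma rises
    # linearly with distance from the neutral midpoint.
    lightness = l_mid + (l_end - l_mid) * t
    chroma = c_end * t
    # At v == 0 the chroma is zero, so the hue has no visible effect.
    hue = hue_neg if v < 0 else hue_pos
    return (lightness, chroma, hue)


for v in (-1.0, -0.5, 0.0, 0.5, 1.0):
    L, C, h = diverging_lch(v)
    print(f"{v:+.1f} -> L={L:.1f} C={C:.1f} h={h:.0f}")
```

Because both halves share the same lightness schedule, equal distances from the center read as equally strong on either side.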

The fourth post discusses additional topics in using color to represent information, focusing mainly on aligning the use of color with human intuition. It begins by noting that if colors are used to represent a phenomenon that is already associated, physically or culturally, with certain colors, those existing associations should be preserved as far as possible. It also discusses the issues and techniques involved in displaying multiple datasets together using color; here, it is important to clearly differentiate the colors used for each dataset (or for data versus no data). The post finishes by discussing situations where it may be necessary to break certain “rules” outlined previously, noting that critical judgement and respect for aesthetics are always necessary.

The fifth post concludes by detailing tools for practically implementing the ideas discussed in the previous posts.

Robert Simmon, Subtleties of Color

Simmon argues that the problem of representing numbers with color can be solved with basic principles of color theory. I appreciated the simple but effective example of the Mars image translated directly into a paint-by-number, where human effort completed a task while computation lagged behind. It illustrates how deeply the psychology and biology of color are ingrained in human experience; it's a shame we so often manage to work against these natural tendencies.

But aesthetic appeal, reliability, and understanding become intrinsically linked through these patterns and perceptions. Simmon effectively highlights some of these possible misperceptions and then outlines principles to guide better decision making and best practices (even if only in the most relative terms) for using color to "illuminate data".

"The purpose of visualization, any data visualization, is to illuminate data. To show patterns and relationships that are otherwise hidden in an impenetrable mass of numbers."

Color can be interpreted only through perception. As in many other readings, internal bias, biological limitations, and cultural associations can all affect how we use and interpret color, and the goal is to avoid at least the very worst of these by considering the lightness, hue and saturation of each element. Association and relativity all come into play, as do basic judgment about the known data being represented (for example, using a scale from yellow to dark blue to show ocean depth rather than the primary-color scale that is typically used) and an understanding of the limits of perception (we can't see a "dark yellow"; darkened yellow reads as olive or brown).

"Color has an objective reality, but there is no perfectly objective view of color"-

Simmon breaks down best practices with a "perfect palette" theory and some guidelines for applying it:

a kind of spiral in color space that cycles through a variety of hues while continuously increasing in lightness

Our visual system is driven primarily by lightness; hue and saturation are secondary. Most important is that lightness varies in a perceptually accurate way.
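The "spiral" idea can be sketched in Python: lightness rises linearly while hue rotates at constant chroma, and each CIE LCh point is converted to an sRGB hex string through Lab and XYZ (D65 white). The parameter values and function names are hypothetical, not Simmon's, and out-of-gamut colors are simply clipped channel by channel, which a careful tool would avoid by reducing chroma instead.

```python
import math

# D65 reference white for CIE XYZ (Y normalized to 1).
XN, YN, ZN = 0.95047, 1.0, 1.08883


def _f_inv(t):
    """Inverse of the CIE Lab companding function."""
    delta = 6 / 29
    return t ** 3 if t > delta else 3 * delta ** 2 * (t - 4 / 29)


def _gamma(c):
    """Linear light -> sRGB-encoded channel, clipped to [0, 1]."""
    c = min(max(c, 0.0), 1.0)
    return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055


def lch_to_hex(L, C, h_deg):
    """CIE LCh(ab) -> sRGB hex string; out-of-gamut channels are clipped."""
    # LCh -> Lab (polar to Cartesian).
    a = C * math.cos(math.radians(h_deg))
    b = C * math.sin(math.radians(h_deg))
    # Lab -> XYZ.
    fy = (L + 16) / 116
    x = XN * _f_inv(fy + a / 500)
    y = YN * _f_inv(fy)
    z = ZN * _f_inv(fy - b / 200)
    # XYZ -> linear sRGB.
    r = 3.2404542 * x - 1.5371385 * y - 0.4985314 * z
    g = -0.9692660 * x + 1.8760108 * y + 0.0415560 * z
    bl = 0.0556434 * x - 0.2040259 * y + 1.0572252 * z
    return "#" + "".join(f"{round(_gamma(c) * 255):02x}" for c in (r, g, bl))


def spiral_palette(n, l_start=20.0, l_end=90.0, chroma=40.0,
                   h_start=260.0, h_turn=-180.0):
    """Hypothetical spiral: lightness rises linearly while hue rotates."""
    steps = []
    for i in range(n):
        t = i / (n - 1)
        steps.append((l_start + (l_end - l_start) * t,
                      chroma,
                      (h_start + h_turn * t) % 360))
    return steps


palette = spiral_palette(7)
print([lch_to_hex(*lch) for lch in palette])
```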

Sequential data: data that runs from low to high values is best represented by a light-to-dark scale (e.g., slate blue to navy).

Divergent data (e.g., profit and loss): use divergent palettes. Our visual system is better at picking up dark, saturated colors, so use a neutral color for the central point. This way you highlight the outliers and keep middle values from being associated with either side.

Qualitative data (e.g., land cover, European political parties): colors should be as distinct as possible to differentiate the categories, with 7 ± 2 as the ideal palette size.
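One way to see why the number of categories must stay small: place n hypothetical category colors at evenly spaced hues around the CIE LCh circle (fixed lightness and chroma, both assumptions) and measure the smallest CIE76 difference (plain Euclidean distance in Lab) between any two. Because the points sit on a circle of radius C, that minimum is 2C·sin(π/n), so it shrinks steadily as categories are added. Real qualitative palettes also vary lightness and must stay within the sRGB gamut, which this sketch ignores.

```python
import itertools
import math


def categorical_lab(n, lightness=65.0, chroma=50.0, hue_offset=30.0):
    """Hypothetical categorical palette: n hues evenly spaced around the
    LCh hue circle at fixed lightness and chroma, returned as Lab triples."""
    colors = []
    for i in range(n):
        h = math.radians(hue_offset + 360.0 * i / n)
        colors.append((lightness, chroma * math.cos(h), chroma * math.sin(h)))
    return colors


def delta_e76(c1, c2):
    """CIE76 color difference: Euclidean distance in Lab."""
    return math.dist(c1, c2)


def min_pairwise_difference(colors):
    """Smallest difference between any two colors in the palette."""
    return min(delta_e76(a, b) for a, b in itertools.combinations(colors, 2))


for n in (5, 7, 9, 12, 20):
    print(n, round(min_pairwise_difference(categorical_lab(n)), 1))
```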

Consider accessibility, such as low vision and color blindness: avoid red/green/brown palettes.

For presentation and accessibility, refer to Tufte, Cynthia Brewer, and Colin Ware.

Use intuitive, semantically associated colors, matched palettes, and figure-ground contrast. Use color for cultural references, as a hierarchy for layering, and to mark transition points in data that isn't diverging.

One useful practice is to use color to differentiate data from no data, "so before we're even aware the eye is discerning what's what."

Ben Shneiderman's mantra: overview first, zoom and filter, then details on demand (for interactive data visualizations). Use color for hierarchy and layers to show relationships.

- ColorBrewer: select a palette based on data type
- chroma.js: a tool that interpolates in LCH space
- NASA color tool: build a palette, pick a hue, palette wheel

Simmon tip and final thought:

Use standard design tools (grouping, careful use of line, typography, and color) to make visualizations more coherent.

The emphasis Simmon places on the neurological and cognitive bases of colour theory seems to strike a fairly universal note for the construction of any system built across several layers of abstraction. In such systems, because each layer is built on top of the previous one while adding or changing certain structural dimensions, the earlier layers can impose constraints on later ones in non-obvious and unexpected ways.

One example Simmon mentions is simultaneous contrast, which describes how our perception of a patch of colour is affected by the colours adjacent to it. The effect is to some extent "hard coded" into our basic physiology, and therefore needs to be taken into consideration in higher-level design decisions.

This kind of connection between layers of abstraction within a complete system seems natural on the face of it, but can be hard to conceptualise or identify when dealing with the layers individually. I'm reminded of the way the network protocol suite is designed and developed from the physical and hardware levels up into the domain of high-level software. I'm thinking in particular of the 8b/10b encoding used by some Ethernet physical layers (such as Gigabit Ethernet over fibre), which serves no purpose in terms of the content being transmitted, but matters to the underlying physical layer because it keeps the signal DC-balanced, with roughly equal numbers of ones and zeros on the line.
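The DC-balance idea can be illustrated with the two 10-bit code groups for the K28.5 control symbol: 0011111010 (used when running disparity is negative) and 1100000101 (used when it is positive). This Python sketch is a simplification: it tracks the cumulative count of ones minus zeros rather than the real scheme's per-sub-block running disparity, and it ignores the 5b/6b and 3b/4b lookup tables entirely; the function names are hypothetical. Choosing the form that counteracts the current imbalance keeps the line balanced, whereas resending one form lets the imbalance drift.

```python
def disparity(code):
    """Number of ones minus number of zeros in a code group given as a bit string."""
    ones = code.count("1")
    return ones - (len(code) - ones)


# The two 10-bit encodings of the K28.5 control symbol.
K28_5 = {"RD-": "0011111010", "RD+": "1100000101"}


def send_balanced(n):
    """Send K28.5 n times, picking the form that counteracts the current imbalance.

    Simplification: tracks cumulative ones-minus-zeros instead of the real
    per-sub-block running disparity rules.
    """
    imbalance = 0
    trace = []
    for _ in range(n):
        code = K28_5["RD-"] if imbalance <= 0 else K28_5["RD+"]
        imbalance += disparity(code)
        trace.append(imbalance)
    return trace


def send_naive(n):
    """Send the same K28.5 form every time, ignoring the imbalance."""
    imbalance = 0
    trace = []
    for _ in range(n):
        imbalance += disparity(K28_5["RD-"])
        trace.append(imbalance)
    return trace


print("balanced:", send_balanced(6))  # balanced: [2, 0, 2, 0, 2, 0]
print("naive:   ", send_naive(6))     # naive:    [2, 4, 6, 8, 10, 12]
```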

This passage from the Programmer's Compendium seems to illustrate the designer's responsibility to accommodate such issues and constraints, no matter how one may wish they fell beyond the system's purview:

We'll stop here, because we're already beyond the scope of what can be considered programming, but there are many more protocol issues to accommodate the physical layer. In many cases, the solutions to hardware problems lie in the software itself, as in the case of the 8b/10b coding used to correct DC offset. This is perhaps a bit disconcerting to us as programmers: we like to pretend that our software lives in a perfect Platonic world, devoid of the vulgar imperfections of physicality. In reality, everything is analog, and accommodating that complexity is everyone's job, including the software's.

Lumpy Color Spaces

I read these blog posts a few days ago and then gave them some time to see what would float up over the following days. The first thing that struck me, and kept coming back, was the asymmetry of our perceptual color space.

One side of this is the fact that "objectively uniform" adjustments to color values, such as changing the hue or saturation by the same unit, are not necessarily perceived as linear transitions. As Simmon writes when describing the shape of the NASA Ames color tool, our perceptual color space is "lumpy."

I used to take sketching classes and my teacher used to say that our heightened sensitivity to greens is due to the fact that green is the inverse of red and we are sensitive to red because of its resemblance to blood.

His explanation wasn't exactly accurate (it doesn't account for why we are actually more sensitive to green than to red, or what "inverse of" means in terms of our rods and cones), but I do think it is interesting to consider how these asymmetries in our perception are rooted in some sort of "biological heuristic," honed over time to best identify the things most important to the human species.

Knowing this, I think, can really inform color choices in information visualization -- by knowing what our perceptual systems are calibrated to notice (or not notice), we can truly develop effective interpretive systems that function in collaboration with our perceptual biases.

Robert Simmon’s blog series and video “Subtleties of Color” present a compact overview of the most important considerations when using color. In the opening paragraph he describes how people typically choose colors from a “drop-down menu,” set “start and end points,” and click “apply.” Simmon is addressing the problem that many tools default to color palettes that are inappropriate for data visualization, and that even well-constructed palettes can be applied poorly.

Simmon proposes that the highest-quality examples of color theory come from the world of cartography, a field with a long history that has had ample opportunity for refinement over the years. It’s hard to argue with this, given that a data vis legend like Bertin was also a cartographer and developed his system through his trade.

Simmon’s review of color theory, which he remarks in the video covers only the very basics, emphasizes perceptual color spaces. His main point is that we need to design color palettes based on human perceptual abilities, not simply on mathematical properties. For example, “lightness” (value) dominates human perception, and we must consider this when choosing a palette. In the video this is clarified with the comment “There is no such thing as dark yellow,” meaning that yellow as a hue will always tend to be perceived as bright.

Simmon also makes the point that choosing color for visualization is essentially a mapping exercise. This is made very clear in the literal example of the Mars “sketch,” and later in the discussion of divergent, sequential, and qualitative data. For example, it is useful to discuss “diverging data” before discussing “diverging palettes.” Likewise, identifying breakpoints in the data is essential before mapping color to it. For me, this highlights a limitation in a tool like Tableau, which encourages applying color palettes before really mapping the data.
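A minimal sketch of "breakpoints first, colors second": the class boundaries are decided from the data, and only then is each class assigned a color. The breakpoints, the four-color palette, and the function names are hypothetical; `bisect.bisect_right` from Python's standard library finds the class index, sending values that equal a breakpoint to the upper class.

```python
import bisect

# Hypothetical breakpoints, chosen from the data before any color is applied.
BREAKS = [0.0, 10.0, 50.0]
# Hypothetical palette: one color per class (len(BREAKS) + 1 classes).
PALETTE = ["#f7fcb9", "#addd8e", "#41ab5d", "#005a32"]


def classify(value, breaks=BREAKS):
    """Index of the class a value falls into; values equal to a break go up."""
    return bisect.bisect_right(breaks, value)


def class_color(value):
    """Color for a value, looked up only after it has been classified."""
    return PALETTE[classify(value)]


for v in (-5, 0.0, 25, 100):
    print(v, classify(v), class_color(v))
```

Keeping the classification step separate means the breakpoints can be revised without touching the palette, and vice versa.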

I thought the example of using “categories” of color (e.g. three shades of green for forest) was helpful, as were the examples contrasting “natural” and “unnatural” palettes for phytoplankton. The argument appears to be: create affordances that respect where the user currently is, and don’t overcomplicate the system if it will cause you to lose the audience. This ties into the larger theme that proper color choice is a matter of “aesthetics and judgement.” I believe that most of these “judgement calls” could actually be codified, but that would be a large undertaking.

In this week's reading, Robert Simmon presents foundational color theory in the context of data visualization, specifically focusing on his extensive expertise in topographical map-making. In his blog post and lecture, Simmon does a masterful job conveying concise information in an impactful way. This being my first exposure to color theory, this reading fascinated me and highlighted (pun intended) decision points regarding color that I never before considered, but are crucial to our work as data designers.

Simmon explains how color can be so powerful and expressive in displaying quantitative information, yet how it can also easily mislead. A great microcosm of this interplay comes through in his lecture, when he references a debate between himself and a fellow NASA visualizer who advocates the rainbow palette. His coworker sticks with the rainbow palette because it does a better job at "showing detail," but Simmon's claim is that it manufactures detail that isn't present in the data itself rather than illuminating insight inherent in the information being analyzed.

Color theory, and Simmon's orientation to it, rests on how we perceive color and the relationships between colors. Simmon shows us that there is an "existing solution" around which colors we should and should not use: existing research largely answers the question of how we perceive color (in terms of saturation, hue, and lightness), yet he highlights how color can still be misused in visualizations. Most notable for me was how some palettes can attribute unequal jumps in value to equal jumps in the data, simply because of how we interpret and perceive the colors chosen for those values (this is his contention about the rainbow palette I explained above). His common-sense approach to ascribing value to colors (using colors that have cultural or logical significance for the data being examined) also adds to the breadth of his approach, and it is one I have no real qualms with. I will definitely be using his perspective (and palettes) in my visualizations moving forward.

This week's reading is a quick primer on color theory. All the content rests on a truism introduced at the beginning of the post: "The purpose of visualization, any data visualization, is to illuminate data. To show patterns and relationships that are otherwise hidden in an impenetrable mass of numbers."

Simmon starts with the rudiments of color spaces and their respective deficiencies. RGB, a color space I've used most of my life, clearly has uneven brightness, which (prior to this talk) I'd never noticed. HSV is somewhat better, but it also has uneven brightness, and again it devotes a disproportionate share of its perceptual space to colors like greens. Both can also numerically represent colors we cannot actually perceive as such, like dark yellow. A solution for this is the CIE color space. Specifically, in this talk he advocates CIE LCh, which seems functionally similar to HSV but with a color engine that takes into account the wetware that will do the perceiving.

Regardless of our choice of color space, and although CIE is very good at accounting for the quirks of human vision, it still does not speak to other issues such as our perception of gray scale (which varies with the surrounding grays) or red-green color blindness. The latter in particular was an issue I'd never considered prior to this talk, but it was made all the more salient when one of his colleagues admitted to not being able to decipher side-by-side spectrums (red-green vs. brown), which Simmon admitted to using professionally.

Following this, he enumerates use cases for three categories of data (sequential, divergent, and qualitative), with clear examples of each. In the majority of cases these are best served, respectively, by a single ramp, two ramps meeting at a middle ground, and several distinct colors.

With these mechanics out of the way, the rest of the post/talk deals with palette colors and how they can serve or diminish intelligibility. Under this rubric he introduces a handful of axioms which I found quite useful as a beginner. These range from the more obvious, such as using intuitive colors (e.g., blue for water) and using different colors for complementary data sets to reduce unnecessary ink (e.g., the combined dataset where he could forgo drawing coastlines), to some I would have been less likely to consider, such as the impact of culturally associated colors.

What struck me most about this whole presentation is the degree to which the subject matter still seems to be something professionals have to internalize, as opposed to a set of best practices that covers 99% of cases. By way of example, I showed a few of my friends the sand dune imagery that Simmon suggested was problematic, and without exception they all interpreted the data correctly.