Bad Data Essays-R4

Healy's breakdown of visual data forms is introduced by an affirmation of data visualisation's fundamental value in the overall data science pipeline: there are, purportedly, hard limits to descriptive statistics, and visual reformulations can act as checks against their authority.

Healy then follows up with caveats, discussing the intrinsic faults and necessary tradeoffs in creating such visual representations. But the manifold errors in constructing the representations (and the subsequent biases in interpreting them) generalise to a fairly universal degree: beyond visual or numerical elements, and into human cognition as a whole.

Anscombe's quartet and Hewitt's analysis do not just exemplify bad data; the data sets demonstrate our tendency to place faith in the explanatory power of summary statistics beyond their purview and application. More generally, our instincts drive us towards repeated recognition of certain patterns and structures, even (as Healy notes) when presented with evidence to the contrary, a tendency that can be an outright terrible match for the sheer complexity and stochastic, probabilistic nature of the observable world:

https://fivethirtyeight.com/features/the-media-has-a-probability-problem/

https://www.newyorker.com/magazine/2005/12/05/everybodys-an-expert
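
Anscombe's point is easy to verify directly. The sketch below (plain Python; statistics.correlation and statistics.linear_regression require Python 3.10+) computes the headline summary statistics for all four sets in the quartet:

```python
import statistics as st

# Anscombe's quartet: four data sets with near-identical summary
# statistics but radically different shapes when plotted.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

for name, (x, y) in quartet.items():
    slope, intercept = st.linear_regression(x, y)
    print(f"{name:>3}: mean(y)={st.mean(y):.2f}  var(y)={st.variance(y):.2f}  "
          f"r={st.correlation(x, y):.3f}  fit: y = {intercept:.2f} + {slope:.2f}x")
```

Every row prints essentially the same line (mean(y) ≈ 7.50, var(y) ≈ 4.12, r ≈ 0.816, fitted line y = 3.00 + 0.50x); only plotting reveals the curve in II, the outlier in III, and the single leverage point in IV.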

The upshot is that we're all less objective than we might expect ourselves to be, across a range of dimensions and levels of consciousness. In terms of truth-seeking, this is a problem less solved outright than mitigated to the best of our abilities: by continuously expanding the perspectives through which we aggregate, analyse, and represent information, and by cross-referencing them to shrink the space in which error can propagate. The result is a probabilistic narrowing of uncertainty rather than a deterministic elimination.
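
On that closing point, a toy sketch (every name and parameter here is an illustrative assumption, not anything drawn from Healy): each "perspective" below measures the same underlying quantity with independent noise plus a small shared bias. Averaging more perspectives steadily narrows the typical error, but only down to the bias floor:

```python
import random
import statistics as st

random.seed(1)

TRUTH = 42.0        # the quantity every perspective tries to measure
SHARED_BIAS = 0.5   # hypothetical systematic error common to all perspectives

def noisy_view():
    """One imperfect 'perspective': shared bias plus independent noise."""
    return TRUTH + SHARED_BIAS + random.gauss(0, 5.0)

for n in (1, 4, 16, 64, 256):
    # Cross-reference n perspectives by averaging them; repeat over many
    # trials to estimate the typical error that remains.
    estimates = [st.mean(noisy_view() for _ in range(n)) for _ in range(2000)]
    typical_error = st.mean(abs(e - TRUTH) for e in estimates)
    print(f"n={n:>3} perspectives: typical |error| = {typical_error:.2f}")
```

The error shrinks roughly with the square root of the number of independent perspectives, then flattens at the shared bias: narrowing, not elimination.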