The Data Detective

Harford has written a pile of books on economics and ran a show decoding the world of statistics. I picked up The Data Detective: Ten Easy Rules to Make Sense of Statistics (2021) to see if it might be useful for undergraduates in the social sciences. Harford is a good storyteller, hence the pile of books, but the ten commandments and the golden rule (be curious) are quite vague (essentially: avoid bias, use different data types, put data in context, know the source, question big data and algorithms, be open to changing your mind). Useful, but not specific. At times the author speaks to fellow data nerds, but the majority of the content is introductory (too generic for university students). It is well suited to the mass market.

The book is a response to the "statistics lie" narrative, but it is also a cautionary tale about how to engage with statistics carefully. The pitch: "I want to convince you that statistics can be used to illuminate reality with clarity and honesty. To do that, I need to show you that you can use statistical reasoning for yourself, sizing up the claims that surround you in the media, on social media, and in everyday conversation. I want to help you evaluate statistics from scratch, and just as important, to figure out where to find help that you can trust." (p. 9)

A few notes:

"The counterintuitive result is that presenting people with a detailed and balanced account of both sides of the argument may actually push people away from the center rather than pull them in. If we already have strong opinions, then we'll seize upon welcome evidence, but we'll find opposing data or arguments irritating. This biased assimilation of new evidence means that the more we know, the more partisan we're able to be on a fraught issue." (p. 36)

"A randomized controlled trial (RCT) is often described as the gold standard for medical evidence. In an RCT, some people receive the treatment being tested while others, chosen at random, are given either a placebo or the best known treatment. An RCT is indeed the fairest one-shot test of a new medical treatment, but if RCTs are subject to publication bias, we won't see the full picture of all the tests that have been done, and our conclusions are likely to be badly skewed." (pp. 125-6)

"Modern data analytics can produce some miraculous results, but big data is often less trustworthy than small data. Small data can typically be scrutinized; big data tends to be locked away in the vaults of Silicon Valley. The simple statistical tools used to analyze small datasets are usually easy to check; pattern-recognizing algorithms can all too easily be mysterious and commercially sensitive black boxes." (p. 183)
