
Calling Bullshit
The Art of Skepticism in a Data-Driven World
Categories
Business, Nonfiction, Self Help, Psychology, Philosophy, Science, Politics, Technology, Audiobook, Sociology
Content Type
Book
Binding
Hardcover
Year
2020
Publisher
Random House
Language
English
ASIN
0525509186
ISBN
0525509186
ISBN13
9780525509189
Calling Bullshit Plot Summary
Introduction
We live in an era where data and statistics have become the primary currency of persuasion. From political campaigns to corporate marketing, from scientific research to social media, quantitative information surrounds us in unprecedented volume and complexity. Yet this abundance has created a paradox: while we have more facts at our fingertips than ever before, distinguishing genuine insight from sophisticated deception has become increasingly challenging. The proliferation of misleading statistics, deceptive visualizations, and algorithmic black boxes has created an environment where even well-educated individuals struggle to separate signal from noise.

This exploration of modern deception techniques offers a systematic approach to developing statistical skepticism without requiring advanced mathematical training. By examining how selection bias distorts our understanding of reality, how visual representations can manipulate perception, and how correlation gets falsely elevated to causation, we gain practical tools for navigating our information landscape. The focus remains not on technical statistical details but on fundamental questions about how information is gathered, represented, and interpreted. These critical thinking skills serve not merely as defensive measures against manipulation but as essential components of informed citizenship in a quantitative age.
Chapter 1: The Bullshit Epidemic: How Deception Evolved in the Information Age
Deception is not a modern invention. Even Plato complained about the Sophists, who were indifferent to truth and interested only in winning arguments through rhetorical tricks. But to understand modern bullshit, we must look beyond human civilization to the evolutionary roots of deception. Animals have been deceiving one another for hundreds of millions of years as a survival strategy. The mantis shrimp performs threatening displays during molting periods when its powerful claws cannot actually function - an empty threat that works because potential predators find the risk of confrontation too great. Ravens demonstrate even more sophisticated deception by fake-caching food when watched by other ravens, pretending to stash snacks while actually keeping them hidden.

Humans have elevated deception to unprecedented levels through two critical cognitive advantages. First, we possess a theory of mind - the ability to think about how others will interpret our actions and use this understanding to our advantage. Second, we have developed complex language systems that allow us to convey virtually unlimited messages. Together, these capabilities enable us to model how our communications will affect others, creating opportunities for both efficient information transfer and sophisticated manipulation.

The ubiquity of bullshit stems from three fundamental factors. First, nearly everyone - from corporations to politicians to individuals - is trying to sell something, creating constant incentives for persuasion over accuracy. Second, humans possess the cognitive tools to determine what kinds of deception will be effective in specific contexts. Third, our complex language systems allow for infinite varieties of misleading communication. Perhaps most importantly, as Italian software engineer Alberto Brandolini observed, "The amount of energy needed to refute bullshit is an order of magnitude bigger than to produce it." This asymmetry between the ease of creating misinformation and the difficulty of correcting it gives deception a structural advantage.

The asymmetry manifests dramatically in cases like the persistent myth linking vaccines to autism. Despite overwhelming scientific evidence disproving any connection, this falsehood continues to circulate based largely on a thoroughly discredited 1998 study. Though the paper was retracted and its author lost his medical license, the damage was done. Millions of research dollars and countless scientific hours have been devoted to repeatedly disproving this claim, yet the myth endures, contributing to decreased vaccination rates and preventable disease outbreaks worldwide.

The digital age has accelerated this problem rather than solving it. Despite unprecedented access to fact-checking resources through smartphones and the internet, misinformation spreads faster than ever. Research on social media platforms confirms what Jonathan Swift observed in 1710: "falsehood flies, and truth comes limping after it." False rumors consistently reach more people than accurate information, even after being debunked. After the 2013 Boston Marathon bombing, a fabricated story about an eight-year-old Sandy Hook survivor dying in the blast reached over 92,000 people despite rapid fact-checking efforts exposing the falsehood.
Chapter 2: Black Box Deception: Understanding How Modern Bullshit Works
Modern deception often operates through metaphorical "black boxes" - processes or technologies that audiences cannot easily penetrate or evaluate. While sociologist Bruno Latour originally applied this concept to scientific claims built upon specialized equipment and techniques, the same principle applies to contemporary bullshit. Effective deception rarely takes the form of simple, easily refuted lies. Instead, it shields claims from investigation by wrapping them in layers of complexity, jargon, and inaccessible methodology.

Consider a simple claim: "Cat people earn higher salaries than dog people." When presented without context, this assertion invites immediate skepticism. However, if the claim comes packaged with references to TED Talks about personality types and workplace success, it becomes more challenging to dismiss. If further supported by research studies employing statistical terminology like "ANCOVA" and "log-transformed data," the claim becomes effectively shielded behind technical language that most people cannot evaluate. This represents the convergence of lies and bullshit: a potentially false claim concealed by rhetorical artifices that prevent straightforward verification.

The key insight for detecting this form of deception is that you rarely need to understand the internal workings of analytical black boxes to evaluate claims emerging from them. Any black box system takes in data and produces results. Most often, deception occurs either because the input data are flawed or because the output results are misrepresented. Rather than attempting to decipher complex statistical methodologies, focus on whether the data collection process was sound and whether the conclusions logically follow from the reported results.

This approach proves particularly valuable when confronting claims based on machine learning algorithms. These systems represent perhaps the ultimate black boxes - even their creators often cannot fully explain how they reach specific conclusions. However, machine learning systems remain entirely dependent on their training data. If an algorithm learns patterns from biased or unrepresentative data, it will inevitably produce biased results regardless of its technical sophistication. By focusing on data quality rather than algorithmic complexity, anyone can identify potential problems without specialized technical knowledge.

Similarly, when evaluating statistical claims, consider whether researchers have adequately accounted for confounding variables or selection effects. A study comparing pet owners in New York City with those in upstate New York might find that cat owners earn higher salaries, but this could simply reflect the practical difficulties of keeping dogs in expensive urban environments rather than any intrinsic relationship between pet preference and income. Identifying such alternative explanations requires no statistical expertise - only careful consideration of how the data were collected and what factors might influence both variables.

The black box model of deception explains why technical complexity often serves as camouflage for weak arguments. By creating barriers to verification, purveyors of bullshit exploit the natural human tendency to defer to apparent expertise. Developing effective bullshit detection skills therefore requires not technical mastery but strategic skepticism focused on inputs, outputs, and logical connections rather than methodological details.
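The advice to scrutinize inputs and outputs rather than internals can be sketched in a few lines of Python. Everything here is invented for illustration: the "black box" is just an average, and the biased sample stands in for any flawed data-collection process.

```python
import random

random.seed(1)

def black_box(sample):
    # Stand-in for any opaque analysis; internally it is beyond reproach.
    return sum(sample) / len(sample)

# Hypothetical population of 10,000 salaries (roughly log-normal).
population = [random.lognormvariate(11, 0.5) for _ in range(10_000)]

# Flawed input: the survey only reaches the best-paid tenth of the population.
biased_sample = sorted(population)[-1_000:]

honest_estimate = black_box(population)
inflated_estimate = black_box(biased_sample)

# The box computes both averages correctly; only the input data differ,
# yet the biased sample makes typical salaries look far higher than they are.
print(inflated_estimate > honest_estimate)
```

The point of the sketch is that no inspection of `black_box` would reveal the problem; only asking how `biased_sample` was gathered does.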
Chapter 3: Statistical Manipulation: When Numbers Tell Convenient Lies
Numbers possess a unique persuasive power in contemporary discourse. Unlike words, which we recognize as human constructs, numbers seem objective - direct representations of reality rather than interpretations. They suggest precision and scientific rigor, appearing to exist independently of those reporting them. This perceived objectivity makes numbers ideal vehicles for deception, as audiences often accept numerical claims with less scrutiny than verbal assertions.

This deference to quantitative information manifests in common phrases like "the data never lie" or requests to "just see the raw numbers." Yet this perspective fundamentally misunderstands how numerical information functions. Even when a measurement or calculation is technically accurate, it can still mislead when presented without appropriate context or in ways that prevent fair comparisons. Understanding how numbers can deceive requires recognizing the various ways they can be manipulated to support predetermined narratives.

Summary statistics represent a common source of numerical deception. Means, medians, and modes condense complex distributions into single values, but choosing inappropriate summary measures can dramatically misrepresent underlying patterns. Politicians exploit this when proposing tax policies that primarily benefit wealthy constituents. By reporting the mean tax savings across all taxpayers, they can claim their plan will save families an "average" of several thousand dollars annually, even when the median family - one in the middle of the income distribution - would receive nothing. This technically accurate but deeply misleading use of averages exploits mathematical literacy gaps in the general public.

Percentages create particularly fertile ground for manipulation because they can be calculated and presented in multiple ways. When comparing two values, percentages can be calculated relative to either the higher or lower value, producing dramatically different results. When bitcoin's value dropped from $19,211 to $12,609 in December 2017, this could be described as either a 34 percent decrease (relative to the initial value) or a 52 percent decrease (relative to the final value). Both calculations are mathematically correct but convey substantially different impressions of market volatility.

The confusion deepens when comparing percentage values themselves. Consider a sales tax increase from 4 percent to 6 percent. This represents a 2 percentage point increase in absolute terms, but a 50 percent increase in relative terms (since 6 is 50 percent greater than 4). Advocates for tax increases typically emphasize the smaller percentage point figure, while opponents highlight the larger relative percentage. Neither representation is inherently wrong, but each serves different rhetorical purposes.

Perhaps the most insidious form of numerical deception involves what economist Paul Romer calls "mathiness" - formulas and expressions that mimic the appearance of mathematical rigor while disregarding logical coherence. Consider the "Trust Equation" promoted by consulting firms: Trust = (Credibility + Reliability + Authenticity) / Self-Interest. This appears to offer a rigorous framework for understanding interpersonal dynamics, but fundamental questions remain unanswered: How would these abstract qualities be measured? In what units? Why would these particular variables combine in this specific mathematical form? The equation provides an illusion of precision without actual analytical value.
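The percentage arithmetic above is easy to verify directly. A short Python sketch using the figures from the text:

```python
def pct_change(delta, base):
    """Express a change as a percentage of a chosen base value."""
    return 100 * delta / base

# Bitcoin's December 2017 drop: $19,211 -> $12,609.
drop = 19211 - 12609                       # a $6,602 decline
vs_initial = pct_change(drop, 19211)       # ~34% of the starting price
vs_final = pct_change(drop, 12609)         # ~52% of the ending price

# Sales tax rising from 4% to 6%:
point_increase = 6 - 4                     # 2 percentage points (absolute)
relative_increase = pct_change(6 - 4, 4)   # 50 percent (relative)

print(round(vs_initial), round(vs_final), point_increase, round(relative_increase))
# prints: 34 52 2 50
```

Both descriptions of each change are arithmetically correct; the rhetorical work is done entirely by the choice of base.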
Chapter 4: Selection Bias: How Sampling Distorts Our View of Reality
Selection bias represents one of the most pervasive and problematic sources of statistical deception. It occurs when the individuals or cases sampled for analysis differ systematically from the population about which conclusions are being drawn. This fundamental flaw undermines the validity of subsequent analysis regardless of how sophisticated the statistical methods employed might be.

Consider a simple example: A professor wants to know how often students miss class at her university. She surveys students present during a Friday afternoon lecture and finds they report missing an average of two classes per semester. This seems implausibly low given that only about two-thirds of seats are typically occupied. The problem becomes obvious when we recognize that students answering the question aren't a random sample of all enrolled students - they're precisely those who attend class regularly enough to be present during the survey. Those who frequently miss class are systematically excluded from the sample, creating a distorted picture of attendance patterns.

Insurance companies exploit selection effects in their advertising. Companies frequently claim that new customers save hundreds of dollars annually by switching to their coverage. How can multiple competing insurers make similar claims? The answer lies in who switches. Different insurance companies use different algorithms to determine rates, weighting factors like driving record, mileage, and storage conditions differently. When drivers shop for insurance, they naturally look for companies whose algorithms would lower their rates substantially. The only people who switch are those who will save significantly by doing so, creating a biased sample that makes every insurer appear to offer superior rates.

Selection bias creates particularly confusing statistical paradoxes when comparing individual and group perspectives. In Portugal, approximately 60 percent of families with children have only one child, yet about 60 percent of children have siblings. This apparent contradiction makes perfect sense when we recognize that multi-child families each contribute multiple children to the population, while single-child families contribute only one. Similarly, universities often boast about small average class sizes while students report predominantly experiencing large classes. Both can be correct simultaneously - if a department offers many small classes and a few large ones, most classes will indeed be small, but most students will experience large classes because that's where most students are enrolled.

Institutions frequently exploit this distinction between class-level and student-level statistics. A university might truthfully advertise an average class size of 18 students while most undergraduates find themselves in lecture halls with hundreds of peers. Neither representation is technically false, but focusing exclusively on class-level statistics creates a misleading impression of the typical student experience.

Selection effects appear in virtually every domain where sampling occurs. A psychiatrist might observe that throughout her career, she has treated many patients suffering from excessive anxiety but none suffering from insufficient anxiety. This makes perfect sense - people with too little anxiety have no reason to seek psychiatric treatment. Understanding selection bias requires constantly asking who or what might be systematically excluded from our observations and how this exclusion might distort our understanding of the broader population.
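The class-size paradox can be reproduced with a toy enrollment table (numbers invented for illustration): the same schedule yields two truthful but very different averages depending on whether you average over classes or over students.

```python
# Hypothetical schedule: 18 ten-person seminars and 2 three-hundred-person lectures.
class_sizes = [10] * 18 + [300] * 2

# Class-level average: what the brochure reports.
per_class_avg = sum(class_sizes) / len(class_sizes)

# Student-level average: each of the 780 students reports the size of the
# class they are sitting in, so a 300-person lecture is counted 300 times.
per_student_avg = sum(s * s for s in class_sizes) / sum(class_sizes)

print(per_class_avg, round(per_student_avg))
# prints: 39.0 233
```

Most classes really are small, yet the typical student sits in a class of about 233. Both statistics are accurate; only one describes the student experience.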
Chapter 5: Visual Deception: The Power and Danger of Data Visualization
Data visualizations possess extraordinary communicative power. A well-designed graph can instantly reveal patterns that would remain obscure in tables of numbers. This efficiency makes visualizations invaluable tools for honest communication, but it also creates opportunities for sophisticated deception. Understanding how visual representations can mislead is essential for navigating our increasingly graphical information environment.

One of the most common visual manipulations involves axis truncation. Bar charts should generally extend their vertical axes to zero because these visualizations emphasize absolute magnitudes through bar height. When designers start the vertical axis at a higher value, they dramatically exaggerate differences between values. What might be a modest 5% difference can appear as a towering disparity when the axis begins at, say, 45%. This technique appears frequently in political messaging, advertising, and even scientific publications when authors wish to emphasize small differences that might otherwise appear insignificant.

Another prevalent form of visual deception violates what statistician Edward Tufte calls the "principle of proportional ink" - the idea that the area of ink used to represent a value should be proportional to that value. Three-dimensional pie charts frequently violate this principle because perspective distorts proportions, making slices in the foreground appear larger than equally sized slices in the background. Similarly, bubble charts can mislead when designers scale circles by diameter rather than area, causing visual representations to grow disproportionately faster than the values they represent.

Color choices represent another subtle but powerful form of visual manipulation. Using red for one political party and blue for another in the United States immediately triggers partisan associations. Selecting color scales that transition through yellow can create the impression of a "danger zone" in the middle of a distribution. Even the choice between sequential and diverging color schemes influences how viewers interpret data, potentially suggesting value judgments about which parts of a distribution are "normal" versus "extreme."

Dual-axis charts, which plot two different variables with different scales on the same graph, offer particularly rich opportunities for deception. By carefully adjusting the relative scales, designers can create the illusion of correlation between unrelated variables or mask relationships between connected ones. This technique frequently appears in attempts to link unrelated phenomena, such as vaccination rates and disease prevalence, by visually aligning trends that have no statistical relationship.

Perhaps most troubling are what Tufte calls "ducks" - graphs where decoration overwhelms purpose. USA Today pioneered this approach with visualizations using lipstick tubes as bars in charts about cosmetics spending or ice cream cones as pie charts about popular brands. These designs sacrifice clarity for superficial visual appeal, making it harder for readers to accurately interpret the underlying data. Similarly, "glass slippers" force data into visual forms designed for entirely different types of information, creating a false sense of rigor through inappropriate structure.
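Tufte's principle makes the bubble-chart pitfall easy to quantify. A short sketch comparing two values where one is exactly twice the other:

```python
import math

v_small, v_big = 1.0, 2.0   # the second value is twice the first

# Misleading: scale the circle's *diameter* by the value. Area grows with
# the square of the diameter, so twice the value gets roughly 4x the ink.
ink_ratio_by_diameter = (math.pi * (v_big / 2) ** 2) / (math.pi * (v_small / 2) ** 2)

# Proportional ink: scale the *area* by the value, so the radius grows as
# the square root and twice the value gets roughly 2x the ink.
r_small, r_big = math.sqrt(v_small), math.sqrt(v_big)
ink_ratio_by_area = (math.pi * r_big ** 2) / (math.pi * r_small ** 2)

print(round(ink_ratio_by_diameter), round(ink_ratio_by_area))
# prints: 4 2
```

Diameter scaling makes a 2x difference look like a 4x difference; the distortion worsens as the gap between values grows.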
Chapter 6: Correlation vs. Causation: The Fundamental Error in Statistical Reasoning
The distinction between correlation and causation represents perhaps the most fundamental concept in statistical reasoning, yet it remains persistently misunderstood and misrepresented in public discourse. Correlation - the tendency of two variables to move together - is relatively easy to establish statistically. Causation - the determination that changes in one variable directly produce changes in another - requires much stronger evidence and logical frameworks.

Media headlines frequently blur this distinction, transforming correlational findings into causal claims. A study showing that people who exercise regularly have lower cancer rates becomes "Exercise Prevents Cancer" in news coverage. This transformation occurs through subtle linguistic shifts - replacing "associated with" or "linked to" with active verbs like "reduces," "prevents," or "causes." These shifts fundamentally misrepresent the underlying evidence and can lead to misguided personal and policy decisions.

Consider a 2016 study published in JAMA reporting that people who exercise less have increased rates of thirteen different cancers. This observational study identified correlations but could not establish causality - perhaps exercise reduces cancer risk, or perhaps people who don't exercise have other characteristics that increase their cancer susceptibility. The press largely ignored this critical distinction, with headlines like "Exercise Can Lower Risk of Some Cancers by 20%" (Time) and "Exercising Drives Down Risk for 13 Cancers, Research Shows" (Los Angeles Times) suggesting direct causal relationships not supported by the evidence.

Several mechanisms create correlations without causation. Reverse causality occurs when the presumed effect actually causes the presumed cause - for instance, when disease symptoms prompt medication use, creating a correlation between medication and illness that reverses the causal direction. Confounding variables create spurious correlations when an unmeasured third factor influences both variables under study. Socioeconomic status, for example, affects both coffee consumption and heart disease risk, potentially creating a correlation between coffee and heart health that disappears when accounting for this confounder.

The problem extends beyond media simplification. Even original scientific articles sometimes make this error. A study of children in San Francisco revealed that those who consumed more milk fat were less likely to be severely obese. The authors correctly cautioned that this correlation does not demonstrate causality, but then titled their article "Full Fat Milk Consumption Protects Against Severe Childhood Obesity in Latinos" and suggested reconsidering recommendations promoting lower-fat milk. This linguistic shift from correlation to causation appeared in the very same paper that explicitly acknowledged the limitations of correlational evidence.

When we don't know which way causality flows, we should avoid making prescriptive claims. In 2018, Zillow reported a negative correlation between housing price increases and birth rate declines. Cities with the largest housing price increases showed greater declines in fertility rates for women aged 25-29. The Zillow report carefully avoided claiming causation, noting several alternative explanations. But when MarketWatch covered the findings, their headline asserted causality: "Another Adverse Effect of High Home Prices: Fewer Babies." This transformation from correlation to causation exemplifies how nuanced research findings become oversimplified causal stories in public discourse.
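A small simulation makes the confounding story concrete. The model below is invented for illustration: a "socioeconomic status" variable raises coffee consumption and lowers heart-disease risk, producing a correlation between coffee and heart health even though neither causes the other.

```python
import random

random.seed(0)

def corr(xs, ys):
    # Pearson correlation, computed from scratch to keep the sketch self-contained.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

ses = [random.gauss(0, 1) for _ in range(10_000)]      # hidden confounder
coffee = [s + random.gauss(0, 1) for s in ses]         # SES raises coffee intake
heart_risk = [-s + random.gauss(0, 1) for s in ses]    # SES lowers heart risk

# Raw correlation is strongly negative: coffee appears to "protect" the heart.
raw = corr(coffee, heart_risk)

# Remove the confounder's (known, in this toy model) contribution: near zero.
adjusted = corr([c - s for c, s in zip(coffee, ses)],
                [h + s for h, s in zip(heart_risk, ses)])
```

In this toy world we know the true model and can subtract the confounder exactly; real studies must measure and statistically adjust for suspected confounders, which is precisely where observational research gets hard.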
Chapter 7: Artificial Intelligence: Separating Genuine Advances from Hype
Artificial intelligence has generated cycles of hype and disappointment since its earliest days. In 1958, The New York Times reported on the Navy's "embryo of an electronic computer" that would eventually "walk, talk, see, write, reproduce itself and be conscious of its existence." This "embryo" was the perceptron - a simple logical circuit designed to mimic a biological neuron. Its inventor, Frank Rosenblatt, predicted his machines would think like humans, recognize faces, translate speech, and perhaps attain consciousness.

More than six decades later, many of these ambitious predictions have indeed come true. Modern facial recognition, virtual assistants, and machine translation systems all rely on neural networks conceptually similar to what Rosenblatt envisioned. Yet contemporary AI coverage remains characterized by the same pattern of grandiose predictions and inflated expectations. Understanding the genuine capabilities and limitations of artificial intelligence requires separating substantive advances from marketing hyperbole.

Modern machine learning represents a fundamental inversion of traditional programming. In classical software development, programmers write explicit instructions telling computers what to do with input data. Machine learning instead provides computers with training data and correct answers, allowing algorithms to discover patterns and generate their own programs for classifying new information. This approach has proven remarkably effective for specific tasks like image recognition and language processing, but it remains entirely dependent on the quality of training data.

This data dependence creates both the power and vulnerability of AI systems. With high-quality, representative training data, machine learning can identify subtle patterns invisible to human analysts. With biased or unrepresentative data, these same systems will faithfully reproduce and potentially amplify existing prejudices. Facial recognition algorithms trained primarily on light-skinned faces perform poorly on darker-skinned individuals. Hiring algorithms trained on historical hiring decisions perpetuate gender and racial biases embedded in those decisions. These systems don't create bias independently; they reflect patterns in their training data with unprecedented efficiency.

Media coverage often exacerbates misconceptions about AI capabilities through anthropomorphization and sensationalism. When Facebook researchers observed chatbots developing simplified communication patterns, headlines proclaimed "AI Creates Its Own Language" and suggested researchers "shut down" the project to prevent a robot uprising. The reality was far more mundane - the systems had simply developed inefficient communication patterns that failed to achieve their designed purpose of human-like conversation.

The gap between AI capabilities and public perception creates significant risks. While researchers and technologists debate whether artificial general intelligence might someday pose existential threats, immediate concerns about algorithmic bias, privacy violations, and automated decision-making receive insufficient attention. As researcher Zachary Lipton observes, "Policy makers are earnestly having meetings to discuss the rights of robots when they should be talking about discrimination in algorithmic decision making."

Maintaining appropriate skepticism toward AI claims requires focusing on fundamentals rather than technical complexity. Questioning the source and quality of training data, examining how systems are evaluated, and considering what biases might be embedded in the development process provides a framework for assessing AI claims without requiring deep technical expertise. The most important insight may be that machine learning, despite its remarkable capabilities, remains fundamentally limited by its training data - there is no magical algorithm that can spin flax into gold.
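The point that models reproduce their training data can be shown with a deliberately tiny classifier. Everything below is fabricated: two features per candidate, a handful of historical decisions biased toward "university A," and a nearest-centroid rule standing in for a real learning algorithm.

```python
def centroid(rows):
    # component-wise mean of a list of feature vectors
    return [sum(col) / len(col) for col in zip(*rows)]

def predict(model, x):
    # classify by squared distance to each class centroid
    dists = {label: sum((a - b) ** 2 for a, b in zip(c, x))
             for label, c in model.items()}
    return min(dists, key=dists.get)

# Features: [years_of_experience, attended_university_A]
# Historical decisions favored university A regardless of experience.
hired    = [[2, 1], [3, 1], [5, 1], [4, 1]]
rejected = [[6, 0], [7, 0], [3, 0], [8, 0]]

model = {"hire": centroid(hired), "reject": centroid(rejected)}

# A highly experienced candidate from the "wrong" school is turned away:
print(predict(model, [9, 0]))
# prints: reject
```

The algorithm has no opinion about universities; it has simply learned that past rejections cluster around school-0 candidates, so it extends the pattern. Auditing the training data, not the distance formula, is what reveals the problem.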
Summary
The ability to detect and refute statistical deception has become an essential survival skill in our data-saturated world. Throughout this exploration, we've seen how misleading information manifests across domains - from selection bias that distorts our understanding of reality to visual manipulations that exploit perceptual vulnerabilities, from correlation-causation errors that generate unfounded recommendations to AI hype that confuses potential with achievement. The common thread connecting these diverse forms of deception is their exploitation of cognitive shortcuts and technical complexity to bypass critical evaluation.

Perhaps the most valuable insight is that statistical skepticism requires no advanced mathematical training. While technical knowledge certainly helps, the core skills involve asking fundamental questions about how information was gathered, what it actually represents, and what alternative explanations might exist. By focusing on data collection processes rather than analytical black boxes, questioning whether visual representations accurately reflect underlying values, and maintaining healthy skepticism toward causal claims based solely on correlational evidence, anyone can become more resistant to quantitative manipulation. These critical thinking habits serve not merely as defensive measures against deception but as essential components of informed citizenship in an increasingly quantitative world.
Best Quote
“To tell an honest story, it is not enough for numbers to be correct. They need to be placed in an appropriate context so that a reader or listener can properly interpret them.” ― Carl T. Bergstrom, Calling Bullshit: The Art of Skepticism in a Data-Driven World
Review Summary
Strengths: The book is systematic and comprehensive in presenting the various ways misinformation can be identified and understood, and it effectively teaches readers how to recognize and question dubious information.
Weaknesses: The writing style is relatively dull, lacking the engaging humor of similar works like "Bad Science."
Overall Sentiment: Mixed. While the book is appreciated for its thoroughness and educational value, it is critiqued for its lack of engaging writing style.
Key Takeaway: The book is a valuable resource for learning to identify misinformation, although it may not be as entertaining as other works in the same genre.
Calling Bullshit
By Carl T. Bergstrom