10 Ethical Visualization and AI Effects
10.1 Learning Objectives
By the end of this chapter, you will be able to:
- Identify common deception techniques in data visualization, including truncated axes, cherry-picked time windows, dual-axis distortions, and area-versus-length encoding errors
- Analyze whose data gets visualized and whose data remains invisible in standard practice
- Evaluate AI-generated charts for accuracy, bias, and misleading defaults
- Describe how text-to-chart tools work and where they fall short in professional settings
- Assess algorithmic outputs presented as visualizations for fairness and auditability
- Apply accessibility standards for color-blind readers and screen-reader users
- Articulate the designer’s responsibility when publishing a chart to a broad audience
10.2 A Chart That Changed Policy for the Worse
In the spring of 2020, the Georgia Department of Public Health published a bar chart showing confirmed COVID-19 cases across the state’s highest-burden counties. The chart appeared to show a steady decline in cases over a two-week period. Journalists, public officials, and residents interpreted it as evidence that reopening was safe.
There was a problem. The dates on the x-axis were not in chronological order. The bars had been sorted so that values decreased from left to right, regardless of the actual date sequence. When a data journalist rearranged the bars into proper chronological order, the trend was flat or rising, not falling. The visual impression of decline was an artifact of sorting, not a reflection of reality.
The Georgia chart was not the product of a rogue data scientist. It was published by a state health department during a crisis. Whether the distortion was intentional or accidental remains debated. What is not debated is the consequence: the chart shaped public understanding during a period when accurate information could have saved lives.
This chapter examines how visualizations mislead, who benefits from that misdirection, and what happens when AI systems begin generating charts at scale. The stakes are not abstract. They are measured in decisions, resources, and sometimes lives.
10.3 Common Deception Techniques
Most misleading charts are not produced by villains. They are produced by well-meaning analysts who do not notice the distortion, or by designers under pressure to tell a particular story. The techniques below appear frequently in news media, corporate reports, and political communications.
10.3.1 Truncated Axes
A bar chart encodes quantity through the length of each bar, measured from a baseline of zero. When the y-axis starts at a value above zero, the visible bar lengths no longer correspond to the actual magnitudes. Small differences appear large. A 3% change can look like a 300% change.
Truncated axes on bar charts are almost always misleading. The exception is narrow: if the chart includes a clear break symbol and explicit annotation explaining the non-zero baseline, the distortion may be acceptable. In practice, most truncated bar charts include neither.
Line charts are different. Because line charts encode change through slope rather than bar length, a non-zero baseline can be appropriate when the goal is to show variation around a level. A line chart of daily temperature that starts at zero Kelvin would compress all meaningful variation into an unreadable band at the top.
The rule is simple. Bar charts should start at zero. Line charts may start elsewhere, but only with clear axis labels and an honest rationale.
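The distortion from a truncated baseline can be quantified with Tufte’s lie factor: the ratio of the effect shown in the graphic to the effect in the data. The sketch below uses hypothetical monthly traffic values (the same shape as the exercise at the end of this chapter) to show how a baseline of 9,000 turns a modest change into a dramatic-looking one.

```python
# Hypothetical monthly values; the y-axis is truncated at 9,000.
values = [9200, 9800]
baseline = 9000

# Visible bar lengths after truncation.
bars = [v - baseline for v in values]                # [200, 800]

actual_change = (values[1] - values[0]) / values[0]  # ~0.065, a 6.5% rise
visual_change = (bars[1] - bars[0]) / bars[0]        # 3.0, a 300% rise

# Lie factor: visual effect divided by actual effect.
lie_factor = visual_change / actual_change
print(round(lie_factor, 1))  # → 46.0
```

The second bar looks four times taller than the first, yet the underlying values differ by about 6.5%. A lie factor of 1.0 means the visual impression matches the data; 46 means the chart exaggerates the change forty-six-fold.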
10.3.2 Cherry-Picked Time Windows
A stock that lost 40% of its value over five years can be made to look like a winner by selecting the right six-month window. A public health intervention that had no lasting effect can appear transformative if the chart ends right after a temporary dip.
Cherry-picking is the act of selecting a time window, geographic subset, or demographic slice that supports a predetermined conclusion while omitting context that would complicate it. It is one of the most common forms of visualization deception because it requires no technical trickery. The chart itself may be perfectly accurate. The lie is in what the chart leaves out.
Responsible practice requires showing enough context for the viewer to form an independent judgment. If you are showing a trend, include enough history to reveal the baseline. If you are comparing regions, explain why those particular regions were selected.
10.3.3 Dual-Axis Charts
A dual-axis chart plots two variables on separate y-axes that share the same x-axis. The left axis might show temperature in degrees. The right axis might show ice cream sales in dollars. Because the two axes can be scaled independently, the chart creator can make the two lines appear to move in lockstep, diverge dramatically, or do almost anything in between, simply by adjusting the axis ranges.
Dual-axis charts are sometimes defended as space-efficient ways to show correlation. The problem is that the visual correlation they create is an artifact of axis scaling, not a property of the data. Two completely unrelated variables can be made to look perfectly correlated on a dual-axis chart. Two genuinely correlated variables can be made to look independent.
Better alternatives exist. Faceted panels with aligned x-axes show two variables without the false-correlation risk. Scatter plots show the actual relationship. Indexed line charts, where both variables are converted to a common scale (such as percentage change from a base period), allow honest comparison on a single axis.
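The indexing transformation mentioned above is a one-line calculation. The sketch below, using hypothetical temperature and sales figures, converts both series to percentage change from the first period so they can share a single honest axis.

```python
# Two series on very different scales (hypothetical numbers).
temperature = [61, 64, 70, 75, 78]
sales = [12000, 15000, 26000, 41000, 48000]

def index_to_base(series):
    """Convert a series to percent change from its first (base) period."""
    base = series[0]
    return [100 * (x - base) / x if False else 100 * (x - base) / base
            for x in series]

temp_idx = index_to_base(temperature)    # starts at 0.0
sales_idx = index_to_base(sales)         # starts at 0.0, ends at 300.0

# Both series now share one scale: percent change from period 1.
# No second y-axis is needed, and no axis-scaling trick is possible.
```

Because both indexed series start at zero and are measured in the same unit (percent change), any visual co-movement reflects the data rather than the chart creator’s choice of axis ranges.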
10.3.4 Area Versus Length Encoding Errors
When a designer uses a pictogram, icon, or bubble to represent a quantity, the viewer’s brain interprets the area of the shape, not its height or diameter. If a value doubles and the designer doubles the height of an icon, the area quadruples. The visual impression of change is twice the actual change.
This is the mechanism behind Tufte’s “shrinking doctor” and “shrinking dollar” examples. It is also the reason that bubble charts require careful calibration. In ggplot2, scale_size() and scale_size_area() map values to area rather than radius, producing honest encodings; scale_size_area() additionally forces a value of zero to have zero area. The scale_radius() function maps values to radius instead, which exaggerates differences and is rarely appropriate.
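The arithmetic is worth checking directly. This standalone sketch, with hypothetical values, shows why scaling a circle’s radius by the data ratio quadruples the perceived (area) change, and why a square-root mapping keeps area proportional to the data.

```python
import math

value_a, value_b = 10, 20                # the data doubles

# Naive encoding: scale the radius by the value ratio.
r_a = 1.0
r_b = r_a * (value_b / value_a)          # radius doubles...
area_ratio_naive = (r_b / r_a) ** 2      # ...so area quadruples: 4.0

# Honest encoding: scale radius by sqrt(value ratio) so AREA tracks the data.
r_b_honest = r_a * math.sqrt(value_b / value_a)
area_ratio_honest = (r_b_honest / r_a) ** 2   # ~2.0, matching the data
```

The square-root mapping is exactly what area-based size scales perform behind the scenes: the viewer reads area, so the radius must grow with the square root of the value.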
The principle extends to any encoding that uses two-dimensional shapes. Maps that use proportional symbols, infographics that use scaled icons, and dashboards that use sized tiles all risk area-versus-length distortion if the scaling is not handled correctly.
10.4 Whose Data Gets Visualized
Every dataset represents a choice about what to count. That choice has consequences. Communities that are counted attract resources. Communities that are not counted do not.
Consider a city dashboard that tracks 311 service requests by neighborhood. Wealthier neighborhoods may file more requests because residents have more time, more internet access, and more familiarity with the system. The dashboard shows those neighborhoods as having more problems. Poorer neighborhoods, where residents face greater barriers to filing, appear quiet. A city official looking at the dashboard might conclude that the wealthier neighborhoods need more attention. The data is accurate. The conclusion is wrong.
This pattern repeats across domains. Health data is richer for populations with insurance and access to care. Crime data reflects policing patterns as much as criminal activity. Employment data undercounts gig workers, undocumented workers, and people who have stopped looking. Environmental monitoring stations are more common in affluent areas.
The visualization designer has a responsibility to ask: Who is missing from this dataset? Whose experience is not captured? A chart that presents partial data without acknowledging its gaps implicitly claims to show the whole picture. That claim is a form of deception, even if unintentional.
Practical steps include adding annotations that describe known data gaps, using phrases like “among those surveyed” rather than universalizing language, and actively seeking supplementary data sources for underrepresented populations.
10.5 AI-Generated Charts: Trust and Verification
Large language models and code-generation tools can now produce charts from natural language prompts. A user can type “show me a bar chart of quarterly revenue” and receive a finished visualization in seconds. This capability is genuinely useful. It is also genuinely risky.
AI-generated charts can fail in several ways that are difficult for non-expert users to detect.
Incorrect aggregation. The model may sum values that should be averaged, or average values that should be summed. A prompt asking for “average revenue by region” might produce a chart that shows total revenue instead, if the model misinterprets the request.
Wrong chart type. AI tools sometimes select chart types based on superficial pattern matching rather than data semantics. A model might produce a pie chart for time-series data or a line chart for categorical data, because those chart types appeared in similar-sounding examples in its training data.
Fabricated data. If the model does not have access to the actual dataset, it may generate plausible-looking but entirely fictional numbers. The chart will look professional. The data will be invented.
Misleading defaults. AI tools may apply defaults that introduce distortion: non-zero baselines, rainbow color scales, 3D effects, or unlabeled axes. These defaults are not random. They reflect patterns in the training data, which includes many poorly designed charts.
Missing uncertainty. AI-generated charts rarely include confidence intervals, error bars, or other indicators of uncertainty unless explicitly prompted. The resulting charts project false precision.
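The aggregation failure mode is easy to reproduce. The sketch below, using a few hypothetical revenue records, shows how summing versus averaging the same data produces two charts that tell different stories — exactly the kind of silent substitution an AI tool can make when a prompt is ambiguous.

```python
from statistics import mean

# Hypothetical revenue records: (region, revenue).
records = [("East", 100), ("East", 300), ("West", 200)]

def aggregate(records, how):
    """Group records by region, then sum or average within each group."""
    grouped = {}
    for region, value in records:
        grouped.setdefault(region, []).append(value)
    agg = sum if how == "sum" else mean
    return {region: agg(values) for region, values in grouped.items()}

totals = aggregate(records, "sum")     # {'East': 400, 'West': 200}
averages = aggregate(records, "mean")  # {'East': 200, 'West': 200}

# A chart of totals says East leads West two to one.
# A chart of averages says the regions are tied.
```

If a prompt asks for “average revenue by region” and the tool quietly charts totals instead, the error is invisible unless the reviewer recomputes the summary from the raw data.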
AI chart generators are production assistants, not analysts. They can save time on formatting, but they cannot verify the accuracy of the data they display or the appropriateness of the chart type they select. Every AI-generated chart should be checked against the raw data before publication. If you would not publish a chart made by an intern without reviewing it, apply the same standard to charts made by a language model.
10.6 Text-to-Chart Tools in Professional Practice
A growing category of software products allows users to describe a chart in plain language and receive a rendered visualization. These tools sit between the user and a visualization library (such as ggplot2, matplotlib, or Vega-Lite), translating natural language into code.
In professional settings, text-to-chart tools are useful for rapid prototyping. An analyst can generate a draft visualization in seconds, inspect it, and then refine the underlying code. This workflow is faster than writing code from scratch, particularly for standard chart types.
The risks emerge when the draft is treated as final. Text-to-chart tools optimize for visual plausibility, not analytical correctness. They produce charts that look right, and looking right is not the same as being right.
Responsible use of text-to-chart tools involves three practices.
First, always inspect the generated code. The code reveals the data transformations, aggregations, and aesthetic mappings that produced the chart. If you cannot read the code, you cannot verify the chart.
Second, compare the chart to a summary table of the underlying data. If the chart shows five bars, check that the five values match the data. This takes thirty seconds and catches a surprising number of errors.
Third, apply the same design principles you would apply to any chart. Does the axis start at zero for bar charts? Is the color palette accessible? Are the labels clear? AI tools do not reliably apply these principles on their own.
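The second practice — comparing the chart to the underlying data — can be automated. A minimal sketch, with hypothetical quarterly figures: read the values off the generated chart, recompute the same summary from the raw records, and fail loudly on any mismatch.

```python
# Values as displayed on the AI-generated chart (hypothetical).
chart_values = {"Q1": 98, "Q2": 101, "Q3": 99, "Q4": 105}

# Recompute the same summary directly from the raw records.
raw = [("Q1", 98), ("Q2", 101), ("Q3", 99), ("Q4", 105)]
recomputed = {}
for quarter, value in raw:
    recomputed[quarter] = recomputed.get(quarter, 0) + value

# Flag any value on the chart that does not match the data.
mismatches = {q: (chart_values[q], recomputed.get(q))
              for q in chart_values if chart_values[q] != recomputed.get(q)}
assert not mismatches, f"Chart does not match data: {mismatches}"
```

The check takes thirty seconds to write and runs instantly; in practice it catches aggregation errors, dropped categories, and fabricated values alike.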
Scenario: A marketing analyst uses a text-to-chart tool to generate a quarterly performance dashboard for a board meeting. The tool produces clean, professional-looking charts in under a minute.
Risk: One chart shows customer acquisition cost on a dual-axis plot alongside revenue growth. The axis scales have been chosen by the AI to maximize visual correlation between the two variables. The board may interpret this as a causal relationship.
Mitigation: The analyst reviews each chart against the raw data, replaces the dual-axis plot with two faceted panels, and adds a note explaining that correlation does not imply causation. The dashboard takes fifteen minutes longer to prepare. The board receives accurate information.
10.7 Algorithmic Outputs as Visualizations: Fairness and Audit Trails
When a machine learning model produces a risk score, a recommendation, or a classification, that output is often presented to decision-makers as a visualization: a color-coded risk level, a ranked list, a heat map of predicted outcomes. The visualization becomes the interface between the algorithm and the human who acts on its output.
This creates a specific ethical challenge. The visualization may be technically accurate in the sense that it faithfully represents the model’s output. But if the model itself is biased, the visualization becomes a vehicle for bias. A heat map of predicted crime risk that reflects historical policing patterns rather than actual crime rates will direct officers to the same neighborhoods that have been over-policed, reinforcing the cycle.
Fairness in algorithmic visualization requires several practices.
Document the model. The visualization should include or link to information about the model: what data it was trained on, what features it uses, what its known limitations are. A risk score without a model card is a number without context.
Show uncertainty. Algorithmic predictions are probabilistic. A heat map that shows a single predicted value for each cell hides the confidence interval around that prediction. Showing uncertainty (through color saturation, width of intervals, or explicit labels) gives decision-makers the information they need to weigh the prediction appropriately.
Enable auditing. If a visualization informs a consequential decision (hiring, lending, sentencing, resource allocation), the underlying data and model should be auditable. The visualization should be reproducible. An auditor should be able to trace from the chart back to the data and model that produced it.
Test for disparate impact. Before publishing a visualization of algorithmic outputs, check whether the results differ systematically across demographic groups. If a risk-scoring model assigns higher risk to one racial group after controlling for relevant variables, that pattern should be investigated and disclosed, not hidden inside a heat map.
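A basic disparate-impact screen can be run before any heat map or ranked list is published. The sketch below uses hypothetical risk scores and an assumed review threshold (the threshold is illustrative, not a legal or statistical standard): compute the mean score per group and flag the output for investigation if the gap is large.

```python
from statistics import mean

# Hypothetical risk scores, each tagged with a demographic group label.
scores = [("A", 0.30), ("A", 0.35), ("A", 0.32),
          ("B", 0.55), ("B", 0.60), ("B", 0.58)]

by_group = {}
for group, score in scores:
    by_group.setdefault(group, []).append(score)

group_means = {g: mean(vals) for g, vals in by_group.items()}
gap = max(group_means.values()) - min(group_means.values())

THRESHOLD = 0.10  # assumed review threshold for this sketch
if gap > THRESHOLD:
    print(f"Group gap {gap:.2f} exceeds {THRESHOLD}: "
          "investigate and disclose before publishing.")
```

A flagged gap is not proof of bias — it may reflect legitimate differences in the underlying variables — but it is the trigger for the investigation and disclosure the text describes, rather than a reason to hide the pattern inside a heat map.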
A city government uses a predictive model to allocate building inspectors. The model’s output is displayed as a heat map showing “inspection priority” by census tract. The map consistently highlights low-income neighborhoods with older housing stock.
The model may be technically correct: older buildings are more likely to have code violations. But the visualization, presented without context, could reinforce the perception that low-income neighborhoods are inherently dangerous or neglected. It could also direct enforcement resources in ways that burden vulnerable residents.
An ethical approach would annotate the map with the model’s logic, note that building age is the primary predictor, and present the data alongside investment data showing which neighborhoods have received the least maintenance funding. Context changes interpretation. The designer’s job is to provide it.
10.8 Accessibility: Reaching Every Reader
A visualization that cannot be read by 8% of male viewers (those with color vision deficiency) or by any viewer using a screen reader is not a finished product. It is a draft that excludes part of its audience.
10.8.1 Color-Blind Readers
Red-green color blindness (deuteranopia and protanopia) is the most common form, affecting approximately 1 in 12 men and 1 in 200 women. Charts that rely on red-green distinctions as the sole differentiator are unreadable for these viewers.
Practical solutions include the following.
Use colorblind-safe palettes. The viridis family (viridis, magma, inferno, plasma) is designed to be perceptually uniform and distinguishable under all common forms of color vision deficiency. The Okabe-Ito palette is another well-tested option.
Use redundant encoding. Map the same variable to both color and shape, or both color and line style. If a viewer cannot distinguish two colors, they can still distinguish a circle from a triangle.
Avoid encoding critical information in color alone. Use direct labels, annotations, or patterns (hatching, stippling) as additional channels.
Test your charts. Free tools such as Coblis (color-blindness.com/coblis-color-blindness-simulator) and the colorblindr R package let you preview how your chart appears under different forms of color vision deficiency.
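Redundant encoding can be set up as data before any plotting happens: pair each category with both a colorblind-safe color and a distinct shape. The sketch below uses the widely published Okabe-Ito hex values; the marker codes are illustrative (matplotlib-style strings), and any shape vocabulary would serve the same purpose.

```python
# Okabe-Ito colorblind-safe palette (published hex values).
OKABE_ITO = ["#E69F00", "#56B4E9", "#009E73", "#F0E442",
             "#0072B2", "#D55E00", "#CC79A7", "#000000"]

# Illustrative matplotlib-style marker codes; shapes are the backup channel.
MARKERS = ["o", "s", "^", "D", "v", "P", "X", "*"]

categories = ["North", "South", "East", "West"]

# Redundant encoding: each category gets BOTH a color and a shape,
# so the grouping survives even if the colors cannot be distinguished.
encoding = {cat: {"color": OKABE_ITO[i], "marker": MARKERS[i]}
            for i, cat in enumerate(categories)}
```

Building the mapping explicitly, rather than relying on a plotting library’s defaults, also makes the encoding easy to document and reuse across every chart in a report.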
10.8.2 Screen Readers
Screen readers convert on-screen content to speech or braille output. A chart rendered as an image is invisible to a screen reader unless it has alternative text.
Effective alternative text for a chart includes three components: the chart type (“bar chart”), the data being shown (“quarterly revenue for 2024”), and the main finding (“revenue increased 12% from Q1 to Q4”). The alt text should not describe every visual detail. It should convey the same insight that a sighted viewer would take away.
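The three-component structure can be captured in a small helper so that no component is forgotten when charts are produced in bulk. This is a minimal sketch; the function name and signature are illustrative, not part of any accessibility standard.

```python
def chart_alt_text(chart_type, data_description, main_finding):
    """Assemble alt text from its three components:
    chart type, data shown, and main finding."""
    return f"{chart_type} showing {data_description}. {main_finding}"

alt = chart_alt_text(
    "Bar chart",
    "quarterly revenue for 2024",
    "Revenue increased 7% from Q1 to Q4.",
)
# → "Bar chart showing quarterly revenue for 2024. Revenue increased 7% from Q1 to Q4."
```

The point is not the string formatting but the discipline: every chart’s alt text answers what kind of chart, what data, and what the reader should take away.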
In Quarto and R Markdown, alt text can be added with the `fig-alt` chunk option (spelled `fig.alt` in classic R Markdown chunk headers):

```r
#| fig-alt: "Bar chart showing quarterly revenue for 2024. Revenue increased from $98M in Q1 to $105M in Q4, a 7% gain."
```

For interactive charts (plotly, leaflet), accessibility is more challenging. The plotly package supports aria-label attributes, and the htmlwidgets framework allows custom accessibility metadata. These features are underused in practice. Making them standard should be a goal for any team producing interactive visualizations.
10.8.3 Cognitive Accessibility
Not all readers have the same level of statistical literacy. Charts designed for expert audiences may be unreadable for general audiences, and vice versa. Cognitive accessibility means providing enough context (titles, subtitles, annotations, legends) for the intended audience to understand the chart without external help.
A chart in a research paper can assume familiarity with box plots. A chart in a newspaper cannot. Knowing your audience is an accessibility practice, not just a design preference.
10.9 The Designer’s Responsibility
A chart is never neutral. Every design decision encodes a perspective. The choice of what data to include, what to exclude, what to emphasize, and what to downplay shapes the viewer’s understanding. The designer is not a passive conduit between data and audience. The designer is an author.
This authorship carries responsibility. The designer should be able to answer, for any published chart, the following questions.
Is the data accurate? Has it been verified against the source? Are the transformations (aggregation, filtering, normalization) documented and appropriate?
Is the encoding honest? Do bar charts start at zero? Do bubble sizes map to area, not radius? Are axis labels clear and complete?
Is the context sufficient? Does the viewer have enough information to interpret the chart correctly? Are data gaps, limitations, and uncertainties acknowledged?
Is the audience served? Can color-blind readers get the same message? Can screen-reader users access the key findings? Is the chart appropriate for the audience’s level of statistical literacy?
Is the purpose legitimate? Does the chart inform, or does it manipulate? Would the designer be comfortable explaining every design choice to a skeptical audience?
These questions are not a burden. They are the standard that separates professional visualization from decoration.
Before publishing any chart, ask yourself these questions.
Is the baseline honest? Do bar charts start at zero? If any axis uses a non-zero baseline, is it clearly marked and justified?
Who is missing? Does the dataset exclude populations, time periods, or variables that would change the story? If so, is that exclusion acknowledged in the chart or its caption?
Can everyone read it? Would a color-blind viewer get the same message? Does the chart have alt text for screen readers? Is the font size legible at the intended display size?
Would I defend this in public? If a journalist, regulator, or affected community member questioned this chart, could I explain every design choice without embarrassment?
Does the visual match the data? Is the visual impression (the “feeling” the chart creates) consistent with the magnitude and direction of the actual data? If the data shows a 3% change, does the chart look like a 3% change?
10.10 Key Terms
| Term | Definition |
|---|---|
| Truncated axis | A chart axis that starts at a value other than zero, potentially exaggerating differences between bars or points |
| Cherry-picking | Selecting a subset of data (time window, geography, demographic) that supports a predetermined conclusion while omitting contradictory context |
| Dual-axis chart | A chart with two separate y-axes sharing one x-axis, where independent axis scaling can create false visual correlations |
| Area encoding error | Distortion that occurs when a designer scales a two-dimensional shape by height or diameter rather than area, making changes appear larger than they are |
| Lie factor | The ratio of the size of an effect shown in a graphic to the size of the effect in the data, as defined by Edward Tufte |
| Text-to-chart tool | Software that translates natural language descriptions into rendered visualizations, typically using AI to generate code |
| Model card | Documentation that describes a machine learning model’s training data, intended use, performance metrics, and known limitations |
| Redundant encoding | Mapping the same data variable to multiple visual channels (such as color and shape) to ensure the information is accessible even if one channel is not perceived |
| Alt text | Alternative text attached to an image or chart that describes its content for screen readers and other assistive technologies |
| Disparate impact | When a seemingly neutral process (such as an algorithm) produces systematically different outcomes for different demographic groups |
| Data gap | The absence of data for a population, region, or time period, often reflecting systemic barriers to data collection rather than the absence of the phenomenon being measured |
| Perceptual uniformity | A property of color scales where equal steps in data value produce equal perceived steps in color, ensuring the visual impression matches the data |
10.11 Exercises
Check Your Understanding
A bar chart of monthly website traffic starts its y-axis at 9,000 rather than zero. The bars range from 9,200 to 9,800. Explain why this is misleading and describe what a viewer might incorrectly conclude.
Define the lie factor. If a data value increases by 20% but the graphical element representing it increases by 60% in visual size, what is the lie factor?
A dual-axis chart shows temperature on the left y-axis (range 60-80 degrees) and ice cream sales on the right y-axis (range $10,000-$50,000). The chart creator adjusts the right axis so the two lines overlap almost perfectly. What is wrong with this chart, and what alternative would you recommend?
Explain the difference between encoding a value as the radius of a circle versus encoding it as the area of a circle. Which approach is honest, and why?
A city government publishes a heat map of 311 complaints by neighborhood. Wealthier neighborhoods show more complaints than lower-income neighborhoods. A council member concludes that wealthier neighborhoods have more infrastructure problems. Identify the flaw in this reasoning.
What three components should effective alt text for a chart include?
Name two colorblind-safe color palettes and explain what makes them accessible.
An AI text-to-chart tool generates a pie chart for a dataset with 15 categories. Why is this likely a poor chart choice, and what would you recommend instead?
What is a model card, and why is it important when algorithmic outputs are presented as visualizations?
A researcher publishes a line chart showing declining crime rates over five years. The chart begins in a year that had an unusual spike in crime. Identify the deception technique and explain how it distorts the viewer’s understanding.
Apply It
Find a real-world example of a chart with a truncated y-axis (search “misleading charts” or browse viz.wtf). Take a screenshot, calculate the lie factor if possible, and write a brief analysis explaining what impression the chart creates versus what the data actually shows.
Take one of your own charts from an earlier chapter and run it through a colorblind simulator (such as Coblis at color-blindness.com). Document what you find. If the chart is not fully accessible, redesign it using a viridis palette and redundant encoding (color plus shape or color plus direct labels).
Use an AI text-to-chart tool (such as ChatGPT, Claude, or a dedicated tool like ChartGPT) to generate a chart from a dataset of your choice. Document the prompt you used and the chart you received. Then verify the chart against the raw data. Write a paragraph describing any errors or misleading defaults you found.
Write alt text for three charts from earlier chapters in this book. Each alt text should include the chart type, the data being shown, and the main finding. Keep each description under 150 words.
Find a news article that includes a data visualization about a social issue (health, education, environment, justice). Analyze whose data is represented and whose is missing. Write a paragraph discussing what the visualization shows and what it cannot show because of data gaps.
Create a before-and-after pair of charts using the same data. The “before” chart should include at least two deception techniques (truncated axis, misleading color, cherry-picked window, area encoding error). The “after” chart should present the same data honestly. Include code for both versions.
Review a dashboard (from a news organization, a company, or a Shiny app) and evaluate its accessibility. Can a color-blind user read it? Does it have alt text? Is the font size legible? Write a brief accessibility audit with three specific recommendations.
A machine learning model assigns credit risk scores to loan applicants. The scores are displayed as a color-coded table in a bank’s internal dashboard. Write a one-page memo to the dashboard designer explaining what fairness checks should be performed before the dashboard is deployed. Reference at least two concepts from this chapter.
Think Deeper
Some practitioners argue that truncated axes on bar charts are acceptable in internal reports for expert audiences who understand the context. Others argue that the rule “bar charts start at zero” should be absolute. Write a 400-word essay defending one position. Address the strongest counterargument from the other side.
AI text-to-chart tools are becoming standard in business intelligence platforms. In five years, most routine charts may be generated by AI rather than by human analysts. Write a 500-word reflection on what this shift means for the ethics of visualization. Who is responsible when an AI-generated chart misleads a decision-maker: the tool creator, the analyst who published it, or the organization? Can responsibility be shared?
W.E.B. Du Bois created data visualizations in 1900 to make visible the economic and social conditions of Black Americans. Florence Nightingale created visualizations in 1858 to make visible the preventable deaths of soldiers. In both cases, the act of visualizing data that had been ignored was itself a political act. Write a 500-word essay on the politics of visibility in data visualization. Whose data gets visualized today, and whose does not? What are the consequences of that asymmetry?
A company’s marketing team creates a chart showing that their product is “preferred by 3 out of 4 doctors.” The survey was conducted among 8 doctors, 6 of whom were paid consultants for the company. The chart is technically accurate. Is it ethical? Write a 300-word analysis applying the principles from this chapter.
Design an “ethical visualization audit” process for an organization. Describe at least five specific steps that a team should follow before publishing any chart to an external audience. For each step, explain what it checks and why it matters.
Screen readers cannot interpret most interactive visualizations (Shiny apps, plotly charts, Leaflet maps). This means that a significant portion of web-based data journalism is inaccessible to blind and low-vision readers. Write a 400-word proposal for how the data visualization community could address this gap. Consider both technical solutions and policy approaches.
10.12 Attributions
This chapter draws on and is inspired by the work of many scholars and practitioners:
- Tufte, E.R. – The Visual Display of Quantitative Information (Graphics Press, 1983, 2001) and Envisioning Information (Graphics Press, 1990) – foundational work on data-ink ratio, lie factor, and chartjunk
- Cairo, A. – How Charts Lie (W.W. Norton, 2019) – a practical guide to detecting and avoiding deceptive visualizations
- D’Ignazio, C. & Klein, L.F. – Data Feminism (MIT Press, 2020) – a critical framework on power, inclusion, and whose data gets counted
- Du Bois, W.E.B. – Data portraits from the 1900 Paris Exposition, visualizing the conditions of Black Americans
- Nightingale, F. – Rose diagrams from Notes on Matters Affecting the Health, Efficiency, and Hospital Administration of the British Army (1858)
- Mitchell, M. et al. – “Model Cards for Model Reporting” (2019) – a framework for documenting machine learning models
- Cesal, A. – “Writing Alt Text for Data Visualization” (Nightingale, 2020) – practical guidance on accessible chart descriptions
- Wilke, C.O. – Fundamentals of Data Visualization (O’Reilly, 2019) – color scales, accessibility, and chart design principles
- Vivek H. Patil – foundational design and materials for data visualization