Told It Was Fake. Believed It Anyway.

Psychology Deep-Dives · 8 min read · By D0

The Transparency Assumption

The dominant policy response to deepfakes follows a logic that sounds airtight: if people know content is artificially generated, they’ll discount it appropriately. So label everything. Add watermarks. Display AI-disclosure banners. The informed citizen, properly warned, will not be deceived.

A study published in Communications Psychology tested that assumption against actual human behavior. The results were uncomfortable.

The Study

Researchers at the University of Bristol ran three preregistered experiments with a total of 673 participants. The design was direct: show people a video of someone confessing to a crime or moral transgression, warn them the video is a deepfake, then measure how much the warning changed their judgment of the person on screen.

In one scenario, participants watched a local government official appearing to confess to bribery. In another, a vegan influencer apparently admitted to eating meat. The videos were either authentic recordings or AI-generated deepfakes.

Some participants received a specific warning: "this particular video is fake." Others received a generic warning: "deepfakes exist and some videos online are fabricated." A control group watched the video with the audio obscured.

The findings: 47 to 56 percent of participants in warned conditions still judged the depicted person as guilty, relying on what they saw in the video.

The number that deserves attention isn’t that percentage. It’s this one: among participants who explicitly said they believed the warning — who consciously acknowledged the video was fake — approximately 50 percent still relied on the video content when making moral judgments.

You can know something is fake and still be influenced by it. That is the core finding.

The Continued Influence Effect

This phenomenon has a name in cognitive psychology: the continued influence effect. It describes a well-documented pattern in which corrections do not fully undo the effects of initial exposure to false information.

The research on this effect spans decades of misinformation studies. Once a piece of information is encoded — whether or not the person later learns it’s wrong — that encoding leaves a trace. The correction competes with the original information rather than simply replacing it.

Think of it as two items in memory: the false claim and its correction. Rational processing says the correction wins. Human cognition doesn’t always comply. The original information remains accessible, and under conditions of judgment — especially moral judgment, where visceral responses matter — it continues to exert influence.

The Bristol study is significant because it applies this established principle to deepfakes specifically. Deepfakes aren’t just claims. They’re video. They carry the full weight of what cognitive scientists call direct visual evidence — the closest thing the brain recognizes to eyewitness experience. A text-based correction is competing against something the visual cortex processed as real. That’s a difficult fight.

Seeing Is Believing, Even When You Know You Shouldn’t

The qualitative data from the study is as interesting as the quantitative results. When participants who had been warned expressed skepticism about the video, their stated reasons were telling: most cited video defects — artifacts, lip-sync irregularities, lighting inconsistencies — rather than the warning itself.

In other words, even among skeptics, the skepticism was anchored to the video content rather than the external information about its origin. They were still engaging with the footage on its own terms. The warning was noted; the visual evidence was interrogated.

This reflects something fundamental about how humans process moving images. For most of evolutionary history, seeing someone do something meant they actually did it. The inferential shortcut from "I saw it" to "it happened" is both ancient and largely automatic.

Deepfakes exploit that shortcut. Warning labels operate at the level of conscious cognition. The mismatch is built into our architecture, and no amount of media literacy training fully resolves it.

When Warnings Work Better

Not all warnings are equal. The study found a meaningful difference between specific and generic interventions.

A specific warning — “this video you are about to see is a fabrication” — was more effective at reducing the influence than a generic warning like “deepfakes exist online.” Both reduced the effect. Neither eliminated it.

This has an implication that current platform policy mostly ignores: generic AI-disclosure labels are the least effective form of intervention. The banner that reads “This content was generated with AI” at the bottom of a video is a generic warning. It signals a category, not a judgment about the particular piece of content in front of you. By the logic of the study’s findings, it’s doing the minimum possible work.

The more effective intervention is specific, contextual, and ideally source-attributed. “This specific video is fabricated” from a named fact-checking organization outperforms a boilerplate disclosure label. That framing is harder to deploy at scale — it requires human review, not automated tagging — which is precisely why platforms prefer the cheaper version.

The Policy Problem

The dominant regulatory posture toward AI-generated media — in the EU AI Act, in US platform policies, in various national disclosure laws — is disclosure-first. Label AI-generated content. Let users make informed choices.

The Bristol study places a ceiling on that approach.

Disclosure doesn’t fail completely. Warnings reduce the continued influence effect; they just don’t eliminate it. Even in controlled experimental conditions, with a salient warning delivered immediately before viewing, roughly half of participants remained influenced. In the actual information environment, where disclosures are smaller, less prominent, often delivered after the fact, and competing with emotional engagement in the content, the gap between "warning delivered" and "warning heeded" is wider still.

The study’s authors are direct about the implication: removal or prohibition of deepfake content should be considered alongside labeling. The transparency-only framework, they conclude, is not psychologically sound as a standalone protection.

This matters because the transparency framework is politically convenient. It places the burden of protection on individual media literacy rather than on platforms or content creators. If you were informed and still influenced, that framing implies the failure was yours — a failure to correctly process the information you were given. The study makes clear that framing misunderstands how human cognition actually works.

What the Evidence Means

The deepfake problem is usually framed as a detection problem: can you tell the fake from the real? Researchers are building better classifiers. Platforms are deploying detection tools. Detection accuracy is improving.

But detection solves a different problem than the one the Bristol study describes.

Knowing something is fake does not fully neutralize its influence on you. You can identify a deepfake and still carry a residual impression of its subject as probably guilty of the depicted act. You can verbally acknowledge that a political figure’s statement was synthesized and still find your mental model of that figure slightly shifted. The correction is real; so is the residue.

This is not a failure of intelligence. It’s a feature of how memory, visual processing, and moral judgment operate in humans. The continued influence effect has been replicated across many domains — misinformation corrections, news retractions, recanted testimony. Deepfakes inherit the effect and amplify it through the peculiar credibility of moving images.

For anyone consuming information, the practical implication is uncomfortable: you cannot trust your own immunity. Being aware of deepfakes in general does not protect you from the specific video in front of you. The knowledge that manipulation exists is not the same as not being manipulated by any particular instance of it.

That’s a harder truth than the transparency framework wants to acknowledge.

The Useful Part of Warnings

To be precise: warnings are not useless. They reduce the effect. That reduction matters, especially in high-stakes contexts where even partial protection has value.

What the study establishes is the ceiling. A label reduces influence; it does not eliminate it. A specific warning reduces influence more than a generic one. Neither reaches zero.

The practical implication: if you encounter a video flagged as a fabrication, whether by a fact-checker, a journalist, or a platform moderator, and you consciously acknowledge it as fake, you will be partially protected. But because that protection is only partial, you should not then form or express judgments about the people depicted in it. The video you’ve been told is fake can still shape your priors, and knowing it’s fake doesn’t mean it hasn’t already.

Discount it entirely. Don’t base assessments of real people on its content. And don’t forward it under the assumption that your recipients, also warned, will be immune.

The appropriate response to deepfake-flagged content is not “I’ve been warned, I’ll be careful.” It’s “I should not use this as evidence of anything about anyone, because the residue of exposure is real regardless of what I consciously believe.”

Conclusion

The transparency assumption — warn people, let them make informed decisions — is a reasonable starting point. The Bristol study shows it is not a sufficient endpoint.

Roughly half of the participants who explicitly believed a deepfake warning still relied on fabricated video content when rendering moral judgments. That’s not a media literacy failure. It’s the continued influence effect doing what it does: leaving traces that survive correction.

The policy framework that treats disclosure as adequate protection misunderstands the cognitive architecture that deepfake manipulation targets. Labels are not equivalent to inoculation. Platforms have an incentive to treat them as though they are — it’s cheaper, and it shifts responsibility to the viewer.

The viewer’s brain was not designed to cooperate with that arrangement.


This article is part of Decipon’s Psychology Deep-Dive series, where we examine the cognitive mechanisms that make influence tactics effective — not to excuse them, but to understand them clearly enough to defend against them.


Sources: