Turnitin's AI false-positive rate, in Turnitin's own words

Published June 10, 2026. Every Turnitin figure on this page is from Turnitin's own published documentation.

You don't need a critic to make the case against treating an AI-detection score as proof. Turnitin makes it itself. The company publishes its false-positive rates, warns that scores should never be the sole basis for action against a student, and deliberately hides scores under 20% to avoid false flags. Here is what its own numbers mean for the report on your screen.

The two published rates

In June 2023, Turnitin's Chief Product Officer published the company's false-positive numbers in a post on the Turnitin blog:

Our document false positive rate — incorrectly identifying fully human-written text as AI-generated within a document — is less than 1% for documents with 20% or more AI writing. Our sentence-level false positive rate is around 4%. This means that there is a 4% likelihood that a specific sentence highlighted as AI-written might be human-written. Annie Chechitelli, Chief Product Officer, Turnitin, June 2023

Notice the qualifier on the headline number. The "less than 1%" applies only to documents the detector already scored at 20% or more. It is not a blanket accuracy claim, and it is a rate, not a guarantee: across a school's worth of essays, even a sub-1% rate flags real, innocent students every single term.

Why the 4% is the number that reaches your classroom

The document rate is what gets quoted. The sentence rate is what you actually experience. When you open a flagged report, you see individual sentences highlighted as AI-written. By Turnitin's own math, roughly one in every twenty-five of those highlights is wrong, and nothing on the screen tells you which ones. The error doesn't announce itself. It looks exactly like every other highlight.

That asymmetry is the heart of the problem. The student knows which sentences they wrote. The teacher only knows what the report says. When the two disagree, the report has the institutional weight, and the student is left trying to prove a negative.
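
To put rough numbers on the one-in-twenty-five figure, here is a minimal Python sketch. It uses the published sentence-level rate of around 4%. The function names are illustrative, and the second function assumes each highlight errs independently of the others, which Turnitin's documentation does not state.

```python
# Published sentence-level false-positive rate ("around 4%").
SENTENCE_FALSE_POSITIVE_RATE = 0.04


def expected_wrong_highlights(highlighted: int,
                              rate: float = SENTENCE_FALSE_POSITIVE_RATE) -> float:
    """Average number of highlighted sentences that are actually human-written."""
    return highlighted * rate


def chance_at_least_one_wrong(highlighted: int,
                              rate: float = SENTENCE_FALSE_POSITIVE_RATE) -> float:
    """Probability that one or more highlights are wrong.

    ASSUMPTION: errors on different sentences are independent.
    """
    return 1.0 - (1.0 - rate) ** highlighted


if __name__ == "__main__":
    for count in (5, 10, 25):
        print(count,
              round(expected_wrong_highlights(count), 2),
              round(chance_at_least_one_wrong(count), 2))
```

On a report with 25 highlighted sentences, the sketch expects about one wrong highlight, and under the independence assumption the chance that at least one highlight is wrong comes out at about 64%.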

Turnitin's own warnings

The company's release notes for the detection model contain three admissions worth reading verbatim:

False positives (incorrectly flagging human-written text as AI-generated) are a possibility in AI models. To avoid potential incidence of false positives, no score or highlights are attributed for AI detection scores in the 1% to 19% range. Turnitin release notes, July 2024

Please be reminded that an AI Writing score should not be used as the sole basis for adverse actions against a student. Turnitin release notes, December 2023

Since launch, we have observed a higher incidence of false positive detection in the first few or last few sentences of a document. Turnitin release notes, May 2023

Take those together. The vendor suppresses an entire score range because it can't trust it, knows the edges of documents misfire, and explicitly tells schools not to act on the score alone. Yet in practice, the score is very often the only evidence a student ever faces.

Scale check

Washington State University ran 148,547 assessments through Turnitin in Fall 2024. The student newspaper, the Daily Evergreen, applied Turnitin's own published document rate to that volume and estimated roughly 1,485 essays were likely false-flagged in one semester at one university. That figure is the newspaper's derivation, not an official WSU count: it treats "less than 1%" as a flat 1%, and Turnitin's rate carries the 20%-or-more qualifier the simple math doesn't reflect. But WSU's own number points the same direction: 33% of its Review Board cases about alleged AI misuse between 2023 and 2025 ended in a "not responsible" finding. WSU cancelled the product in February 2026.
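
The newspaper's estimate is a single multiplication, sketched below in Python. Only the 148,547 assessments and the "less than 1%" ceiling come from the sources above; the function name and the lower rates tried in the loop are illustrative assumptions. Like the newspaper's math, the sketch ignores the 20%-or-more qualifier.

```python
WSU_ASSESSMENTS_FALL_2024 = 148_547   # figure reported for Fall 2024
DOCUMENT_RATE_CEILING = 0.01          # "less than 1%", treated here as a flat 1%


def estimated_false_flags(assessments: int, rate: float) -> int:
    """Documents expected to be falsely flagged at a flat per-document rate."""
    return int(assessments * rate)  # truncate to whole documents


ceiling = estimated_false_flags(WSU_ASSESSMENTS_FALL_2024, DOCUMENT_RATE_CEILING)
print(f"At a flat 1%: about {ceiling:,} documents")

# ASSUMPTION: hypothetical lower rates, to show how the estimate scales
# if the true rate sits below the published ceiling.
for rate in (0.005, 0.001):
    flags = estimated_false_flags(WSU_ASSESSMENTS_FALL_2024, rate)
    print(f"At {rate:.1%}: about {flags:,} documents")
```

Even at a tenth of the published ceiling, the same volume still yields well over a hundred wrongly flagged documents in a single semester.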

Who the errors land on

False positives are not evenly distributed. Stanford researchers tested seven AI detectors and found them near-perfect on essays by US-born eighth-graders but wrong, on average, on 61.22% of TOEFL essays written by non-native English speakers. Reporting by Bloomberg Businessweek documented the same pattern for neurodivergent students, whose more uniform, careful prose reads as machine-like to a model trained on statistical typicality. If you teach multilingual learners or students with IEPs, the error rate in your classroom is not the published average. It is worse.

Three documented cases

These are not hypotheticals. Each became public in the last year:

Newby v. Adelphi. An Adelphi University student with learning and neurological disorders was flagged by an AI detector and disciplined, even though his help had come from the university's own tutoring program. In February 2026, a New York State Supreme Court judge reversed the discipline and ordered his record expunged. It is the first court ruling to overturn an AI-detection accusation.

Palo Alto. A high-school sophomore's Crucible essay was flagged 76% AI. His family assembled 1,162 pages of evidence, including the document's full revision history. The school held the grade, and in May 2026 the family filed a federal civil rights suit. The allegations are unproven; the cost of fighting the flag is not.

Wake County. A Green Hope High School freshman's English assignment was run through three different AI detectors, which returned 62%, 75%, and 87%. Three tools, three answers, all wrong: a second teacher reviewed the document's version history and cleared her. North Carolina's Department of Public Instruction now advises that detectors "have proven not to be dependable" and should never be the only factor. Note what actually exonerated her: not a better detector, but evidence of her writing process.

What a process record gives you that a percentage can't

Every one of these cases turned on the same missing thing: a trustworthy record of how the writing happened. The detector guesses about the finished text. A process record watches the writing as it occurs: the rhythm of keystrokes, the pauses, the revisions, every paste and where the cursor was when it landed.

That record doesn't accuse anyone. It simply exists, for every student, from the first keystroke. When a question comes up, the answer is a replay, not a probability. The Wake County student was saved by a coarse version of this (Google Docs version history). A purpose-built process record is that defense, by default, at keystroke resolution.
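
As a rough illustration of what such a record might contain, here is a minimal Python sketch. Everything in it is hypothetical: the `WritingEvent` fields, the `summarize` helper, and the 30-second pause threshold are assumptions chosen for the example, not the format of any particular product.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class WritingEvent:
    """One recorded action in a writing session (hypothetical schema)."""
    timestamp: float  # seconds since the session started
    kind: str         # "insert", "delete", or "paste"
    cursor: int       # character offset where the action landed
    text: str         # characters typed, removed, or pasted


def summarize(events: List[WritingEvent],
              pause_threshold: float = 30.0) -> Dict[str, int]:
    """Reduce a session to a few counts a reviewer could check against a replay."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    pastes = [e for e in ordered if e.kind == "paste"]
    # A "long pause" is a gap between consecutive events at or above the threshold.
    long_pauses = sum(
        1
        for earlier, later in zip(ordered, ordered[1:])
        if later.timestamp - earlier.timestamp >= pause_threshold
    )
    return {
        "events": len(ordered),
        "pastes": len(pastes),
        "pasted_chars": sum(len(e.text) for e in pastes),
        "deletions": sum(1 for e in ordered if e.kind == "delete"),
        "long_pauses": long_pauses,
    }


# Made-up session data, for illustration only.
session = [
    WritingEvent(0.0, "insert", 0, "The play opens"),
    WritingEvent(4.2, "delete", 14, "opens"),
    WritingEvent(6.0, "insert", 9, "begins"),
    WritingEvent(51.5, "paste", 15, " in Salem, 1692."),
]
summary = summarize(session)
print(summary)
```

The point of the design is that every count traces back to specific timestamped events, so a disputed summary can be checked against the replay rather than argued as a probability.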

If you've already concluded the detection arms race isn't winnable, you're in good company. Here's what the alternative looks like.