
How to Edit AI Drafts So They Read Like You Wrote Them

Vadim Kravcenko
Jul 14, 2025 · 12 min read

TL;DR: AI detectors are noisy enough that editing to "beat" them is the wrong target — Sadasivan et al. (2023) showed recursive paraphrasing collapses every detector class they tested (some down to 15–25% accuracy), and OpenAI withdrew their own classifier in July 2023. The job that matters is making the draft read like you wrote it. AI drafts produce a small, repeatable set of mechanical tells: em-dash overuse, antithesis chains, parenthetical uniformity, hedge openers, structural linearity, and vague pronoun handoffs. Each has a deterministic fix. Past the mechanical pass, the judgment work (verifying claims, swapping generic examples for specific ones, killing citations you can't source) is what separates an edited draft from another edited AI draft. Plan thirty minutes per piece for the whole pass.

Why detection isn't the right target

I edit two or three AI drafts a week: my own writing run through a model for compression, drafts a contractor sent over for sign-off, and the occasional piece I started in a model and finished by hand. The detector question came up twice last month. Both times the right answer was the same: ignore the detector, edit the text.

The detector question is louder than it should be because the tools sell certainty they don't have. A 2023 University of Maryland paper stress-tested the best classifiers on the market under a recursive paraphrase pass. The drops were not subtle. The retrieval-based detector fell from 100% accuracy to 25%. DetectGPT's AUROC fell from 82% to 18%. Watermarking detectors, which were supposed to be the robust ones, fell from a 99% true-positive rate to 15% after five rounds. The pattern across every detector in the paper: a thin layer of paraphrasing breaks them.

Figure: what detectors catch reliably versus what they get wrong. The right side is the one most editors haven't priced in.

OpenAI withdrew their own AI Text Classifier in July 2023, citing a "low rate of accuracy" — the tool only caught 26% of AI-written text and falsely flagged 9% of human writing. Stanford's Liang et al. (2023) reported an average false-positive rate of 61% against non-native English writers on TOEFL essays, which means a non-native operator running a clean human draft through a detector gets flagged most of the time. None of those facts add up to "detectors are useless." They add up to "detectors are a noisy second opinion, not an optimization target."

The reframe is simple. The signal that matters is whether a human reader would say "this reads like AI." Fix that and the detector question solves itself. Most of the way to fixing it is mechanical.

The six tells that account for most of it

From reviewing roughly two hundred AI drafts across Claude, GPT-4-family models, and Gemini through 2024 and 2025, the same six patterns show up at high enough density to read as AI to a human reader. Model differences exist (Claude leans harder into antithesis; GPT tends to over-hedge), but the six are model-stable enough to teach as one list.

Figure: the six mechanical tells, the shape each takes in a draft, and the edit that removes it. Cap each at the threshold in the right column; the cumulative effect on a draft is bigger than fixing any one in isolation.

Why these six and not twelve. Every long list I've seen of "AI tells" collapses to these when you ask which patterns a reader actually notices versus which ones are stylistic preferences. The list below is the working subset. If your draft has none of them at high density, it probably already reads as human. If it has four or more, no amount of polish on the other twenty stylistic suggestions you'll find online matters.

Reading the remediation table

Here's the table you'd keep open in a second tab while editing. The left column names the tell. The middle column shows the shape it takes in a draft. The right column gives the fix and the threshold to apply.

| Tell | What it looks like | Fix |
| --- | --- | --- |
| Em-dash overuse | Five or more em-dashes per 1,000 words, often three in a single paragraph | Cap at two per 1,000 words; replace the rest with periods, commas, or parentheses |
| Antithesis chains | "X is not Y. It is Z." appearing three or more times in the piece | Cap at two per article; rewrite the rest as straight assertions |
| Parenthetical uniformity | Every parenthetical is the same type (all definitions, all asides, or all source-flags) | Mix three types per article: definition, aside, and source-flag |
| Hedge openers | Sentences starting with "It's worth noting," "Importantly," "Notably," "Crucially" | Delete the hedge; lead with the claim itself |
| Structural linearity | Every paragraph is three sentences, every section is three paragraphs | Break the meter: one-sentence paragraphs, varied section lengths, occasional long paragraph |
| Vague pronoun handoffs | "This," "It," or "They" opening a paragraph with no clear referent in the previous one | Replace with the explicit noun phrase |

Em-dash overuse is the most visible tell and the easiest fix. Models default to high density because the training data over-represents long-form journalism that punctuates this way. A draft will come out with eight or nine em-dashes per thousand words, sometimes three in a single sentence. Search the document, count the hits, cap at two per thousand. Twenty seconds of work, biggest single drop in the "reads like AI" surface area.
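If you'd rather make the count a command than an eyeball pass, here's a minimal sketch in Python, assuming a plain-text draft; the two-per-thousand cap is the threshold from the table, and the function name is mine, not part of any tool.

```python
# Minimal sketch: em-dash density per 1,000 words, flagged against the cap.
def em_dash_density(text: str, cap_per_thousand: float = 2.0) -> dict:
    words = len(text.split())
    dashes = text.count("\u2014")  # the em-dash character itself
    per_thousand = (dashes / words * 1000) if words else 0.0
    return {
        "em_dashes": dashes,
        "per_thousand": round(per_thousand, 1),
        "over_cap": per_thousand > cap_per_thousand,
    }

# A 1,200-word draft with nine em-dashes reports 7.5 per thousand: well over the cap.
```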

Antithesis chains are the rhetorical move where a sentence sets up a negation and then asserts the alternative. "It's not about visibility. It's about authority." Models reach for the pattern because it's a low-cost way to sound confident, and it's all over the instructional writing they trained on. Cap at two per article, rewrite the rest as straight claims. The piece sounds less performative immediately.
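The pattern is also easy enough to grep for. A minimal sketch, with the caveat that the regex only catches the common "is not ... It is ..." shape and its contractions, not every phrasing of the move:

```python
import re

# Rough pattern for "X is not Y. It is Z." chains; tune the character window to taste.
ANTITHESIS = re.compile(
    r"\b(?:is not|isn't|it's not)\b[^.?!]{0,80}[.?!]\s+(?:It is|It's)\b",
    re.IGNORECASE,
)

def antithesis_hits(text: str, cap: int = 2) -> dict:
    hits = ANTITHESIS.findall(text)
    return {"count": len(hits), "over_cap": len(hits) > cap}
```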

The triage flow: edit, rewrite, or throw out

Most editor time gets wasted on drafts that should have been thrown out at minute three. Naming the throw-out category saves the rest of the day, and a four-question triage handles it in about five minutes.

Figure: the four-question triage flow and its three terminal answers (edit, rewrite from outline, or throw out and re-prompt). The "throw out" verdict shows up more often than editors expect, and the five-minute triage is what saves the next ninety.

Question one: does the draft have at least one specific claim, example, or number that isn't generic? "Marketing teams should align around the customer journey" is not specific. "Marketing teams that publish more than two pieces a week see decay onset at six months instead of fourteen" is. If the draft has zero specifics, the model wasn't given enough material to work with, and editing won't fix that.

Question two: is the structure salvageable, or does it need to be re-thought? Read the H2 list out loud. If the argument flows, the structure is fine and the remaining work is mechanical and judgment-level editing. If the H2 list reads as a stack of topic-adjacent paragraphs with no thesis through-line, the structure is the problem and editing the paragraphs won't help.

Question three: are the factual claims correct? Spot-check one number, one citation, and one named tool reference. If two of three need correcting, the draft has hallucinations all the way through, and you'll be cleaning them up for the next hour.

Question four: how many of the six tells appear at high density? If two or fewer are at high density, the mechanical pass is short. If four or more are, the draft was generated with default sampling and minimal post-processing, and the edit job is bigger.

The three terminal answers. Specific examples present, structure good, claims sound, two or fewer high-density tells: edit in place, plan forty-five to ninety minutes. Structure good but specifics missing or claims wrong: rewrite from outline using the draft as a brief, plan ninety to a hundred and fifty minutes. Generic throughout, structure tangled: throw out and re-prompt with a tighter brief. About fifteen to twenty percent of drafts in my queue end up in the throw-out bucket. Worth the five minutes to find out.
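If it helps to see the verdicts as one decision table, here's a minimal sketch; the inputs mirror the four questions, the strings mirror the three terminal answers, and the edge cases (solid specifics inside a tangled structure, say) still need a human call.

```python
# Approximate mapping from the four triage questions to the three terminal answers.
def triage(has_specifics: bool, structure_ok: bool,
           claims_sound: bool, high_density_tells: int) -> str:
    if not has_specifics and not structure_ok:
        return "throw out and re-prompt"   # generic throughout, structure tangled
    if not has_specifics or not claims_sound:
        return "rewrite from outline"      # use the draft as a brief, 90-150 minutes
    if not structure_ok:
        return "rewrite from outline"      # the H2 list has no thesis through-line
    # Two or fewer high-density tells means a short mechanical pass; four or more, a longer one.
    return "edit in place"                 # 45-90 minutes
```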

What an edited paragraph actually looks like

The mechanical fixes are easier to see in a worked example than to describe. Here's a representative paragraph from a recent draft (a piece about page-decay signals), in two passes.

Figure: the before-and-after voice print, with the mechanical changes called out by category. The four annotated edits between the two versions are responsible for most of the "reads like a person wrote it" shift.

The draft version, as the model produced it: "Content decay is not a slow process. It is a steep one — and most operators miss it entirely. It's worth noting that the typical decay curve runs faster than people expect. This is why measuring it matters: without a baseline, you'll never know if a page is recovering or sliding further. Importantly, the signal lives in the trend, not the snapshot."

The edited version: "Content decay runs steeper than most operators expect. The typical curve drops eight to twelve percent in monthly clicks within six months of the last meaningful update, and the signal lives in the trend, not the snapshot. When I audited my own portfolio last quarter, three of seven pages on the watch list had passed the eight-percent threshold; one was already at fifteen and I'd missed it because I was watching the snapshot."

Four changes between them. The em-dash construction removed. The antithesis ("not a slow process. It is a steep one") replaced with a straight assertion. The vague "This is why" handoff replaced with the explicit referent. And one specific example added at the end: a number, a portfolio audit, a piece of operator experience the model couldn't have known. The fourth edit does more work than the first three combined. Most "reads like AI" reactions evaporate when a real example shows up.

The judgment calls the checklist doesn't cover

Naming what stays human is an honest admission of the mechanical pass's limits. Four classes of edit fall outside any checklist, and each is the kind of work that makes an edited AI draft different from another edited AI draft.

Factual claims. Models invent numbers at a rate that surprises operators new to AI editing. A draft claimed "63% of marketers report measuring content ROI," cited as if it were a real survey. The real Content Marketing Institute number from the closest year was 41%, and the survey it came from measured something different. Verify every number against a primary source you can name. If you can't name it, drop the claim.

Citations. Same pattern, worse than numbers. Models invent paper titles, author names, and quoted lines. A draft will cite "Smith and Patel (2022)" with a quote that doesn't appear in any paper they wrote. The knowledge-based trust read covers the broader angle; the working rule is verify every reference before publish.

Framing. The angle and thesis are editorial decisions, not editing tasks. If a draft's framing is wrong (the piece is selling certainty where the topic deserves nuance, or it's hedging where the topic deserves a clear take), no mechanical edit will fix the piece. Rewrite the lead and intro; the rest of the draft can usually be kept.

Examples. This is the single highest-impact edit in any AI draft. The generic example ("a marketing team," "a SaaS company") gets replaced with a specific one from your own work. One real example outweighs five hedge-removals on the "reads like a human" axis, and it's the part a reader will remember a week later.

When the upstream problem is voice, not editing

The escape valve. Some drafts come out generic because the brief was generic. The edit pass can't fix a voice that was never defined upstream. If you find yourself triaging three drafts in a row to "throw out" or "rewrite from outline," the model isn't the problem; the brief is.

The symptom is consistent. The drafts all read as adjacent to your topic without any of your point of view. Every fix you'd want to make is at the framing level. The mechanical edits clean up the surface and leave the piece sounding like it could be anyone's. That's a voice-strategy problem, not an edit-workflow problem, and the fix is upstream.

The voice-strategy companion piece covers how to define a voice that a model can actually be steered toward (style sheet, signature moves, banned formulations). For operators past the editor stage who want to formalize the voice profile and bake it into recurring prompts, the agentic workflows piece is the next read.

Detector reality, in three short paragraphs

The honest read on what AI detectors actually do. Three categories.

What they catch reliably: verbatim model output with default sampling. If you're shipping unedited GPT or Claude or Gemini text, detectors will flag it. They were trained to. The detector tells you something you already know.

What they catch poorly: paraphrased output (Sadasivan: retrieval-based detectors fall to 25% accuracy and watermark detectors to 15% under recursive paraphrasing), mixed human-and-AI passages, and drafts past a moderate revision depth. Once a human has done the six-tell pass, most detectors lose confidence.

What they get wrong: non-native English writers (Liang et al., 2023: 61% average false-positive rate on TOEFL essays), heavily-hedged academic writing, and anything with low burstiness. If you write in a hedged or formal register, your unaided human prose may score "AI" on a detector. That's the detector's bias, not a signal about your writing. Run your draft through our detector tool if you want a sanity check, but treat the output as a second opinion, not a score to optimize against.

What I actually do in the thirty minutes before publish

Here's the whole article operationalized into a thirty-minute pre-publish pass. Four steps, time-boxed.

Triage, five minutes. Answer the four questions, get a verdict. If the verdict is "throw out," stop here and re-prompt with a tighter brief; you'll get the saved twenty-five minutes back. If it's "edit in place" or "rewrite from outline," carry on.

Mechanical sweep, ten minutes. Search for em-dashes, count, cap at two per thousand words. Grep for "It is not" and "is not a" antithesis patterns, cap at two per article. Scan paragraph starts for hedge openers, delete. Scan paragraph starts for vague "This/It/They," replace with the noun phrase. Read three paragraphs for parenthetical uniformity, mix the types.
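The paragraph-start checks in that sweep are scriptable too. A minimal sketch, assuming a plain-text draft with blank lines between paragraphs; the hedge and pronoun lists are the ones from the tell table, not an exhaustive set:

```python
import re

HEDGES = ("It's worth noting", "Importantly", "Notably", "Crucially")
VAGUE_START = re.compile(r"^(?:This|It|They)\b")

def sweep_paragraph_starts(draft: str) -> dict:
    paragraphs = [p.strip() for p in draft.split("\n\n") if p.strip()]
    return {
        # Delete the hedge; lead with the claim itself.
        "hedge_openers": [p[:60] for p in paragraphs if p.startswith(HEDGES)],
        # Replace the pronoun with the explicit noun phrase.
        "vague_openers": [p[:60] for p in paragraphs if VAGUE_START.match(p)],
    }
```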

Judgment pass, ten minutes. Verify one factual claim against a primary source. Replace one generic example with a specific one from your own work. Re-read the intro and check the framing — does the lead carry the thesis, or does it dance around it? Rewrite the lead if it dances. The refresh-strategy piece covers the parallel pass when the AI draft is updating existing content rather than greenfield.

Sanity check, five minutes. Read the first two paragraphs out loud. If they don't sound like you, the lead still needs work. Read the last paragraph; if it ends on a generic flourish, cut the last sentence.

What the detector question really is

The detector question is a proxy for a different one: "did anyone actually edit this." The first is unanswerable in any precise way; the second is answerable in thirty minutes. Reframe the question, and the work follows.

Two directional reads if you want to go deeper. The AI-stuffed-blogs piece covers the failure mode this editing prevents, and the AI-website migration piece covers the broader strategy context for operators using AI at scale.

FAQ

Will Google penalize content that was AI-drafted? Google's stated position in the March 2024 spam-policy update is that AI assistance itself isn't penalized; what they penalize is content created at scale without editorial oversight and without value to the reader. Edit the draft, add the specific example, and the piece is well inside policy. The black-hat-trap piece covers the broader policy framing.

Are AI detectors actually accurate? No, not in any way you can rely on. Sadasivan et al. (2023) showed recursive paraphrasing drops several detector classes to 15–25% accuracy, Liang et al. (2023) found a 61% average false-positive rate against non-native English writers on TOEFL essays, and OpenAI withdrew their own classifier in July 2023 citing low accuracy (it correctly identified only 26% of AI text and flagged 9% of human text as AI). Treat detector output as a noisy second opinion, not an optimization target.

How long should editing an AI draft take? Thirty to ninety minutes for an in-place edit, ninety to a hundred and fifty for a structure-rewrite. If you're sinking three hours into a single piece, the right call was probably to throw it out and re-prompt with a tighter brief.

Which is more important — the mechanical edits or the judgment work? The judgment work, by a wide margin. The mechanical pass handles about eighty percent of the "reads like AI" surface; the judgment work is what makes the edited draft different from another edited AI draft, and the specific-example swap is the single highest-impact change. The content-decay context is a useful reference for the audit-side judgment.

What if a client asks whether the content was AI-drafted? Be honest. Most clients are fine with "AI-drafted, hand-edited" when the editing is real. The brand risk isn't the AI; it's hiding the AI. The editing is the deliverable.
