In 2023, the conventional wisdom hardened: software was about to eat data entry the way it had eaten typesetting. Large language models could read scanned documents, parse handwritten forms, even reason about ambiguous fields. The labour-intensive middle of the data pipeline — the part where a human looks at a thing and decides what it is — was, supposedly, finished.
Three years later, the businesses winning with AI tell a different story. The teams shipping production models are spending more on human review, not less. The companies with the cleanest customer data are running human review across every batch. The litigation firms whose pay-and-time analyses hold up under cross-examination are the ones who paid people to read every payslip, not the ones who automated the read.
The pattern is consistent enough to be a thesis: the value of careful, supervised, human data work has gone up — not down — in the AI era. This essay is about why, what companies get wrong by skipping this layer, and three industries where the work is heating up right now.
§ 01 The myth of the automated middle
The reasoning behind the "AI will eat data entry" thesis was never wrong, exactly. It was just incomplete. Machines really did get better at reading documents. OCR accuracy on clean printed text crossed 99.5%; layout-aware models crossed 95% on semi-structured forms. For a class of high-volume, well-formatted documents — invoices that look the same every month, government forms that haven't changed since 1998 — automation does the job.
What the thesis missed is that most enterprise data isn't clean. Most of it is the long-tail material that lives in basements, emails, decade-old PDFs, photographs of whiteboards, scanned faxes, and Excel files maintained by someone who left the company in 2019. It is full of edge cases, smudges, partial entries, contradictions, and inferences that require knowing what the business actually does.
"The 80% of documents a model can handle were never the bottleneck. The bottleneck was always the 20% that need a human, and that 20% is exactly where the financial, legal, and reputational risk sits."
(Field note from a Sloper engagement)
This is the central problem with treating manual data entry as a cost-cutting line item. The most automatable parts are also the parts where the cost of being wrong is lowest. The unautomatable parts — the messy 20% — are where insurance claims get denied, lawsuits get lost, and AI training datasets get poisoned.
§ 02 What companies get wrong
Every six months we field a call that starts the same way: "We tried to automate this and it didn't work. The output looks plausible but the numbers are wrong, and we can't tell which numbers." The shape of the failure is always one of three:
Failure 1 — Hallucinated confidence
The system extracted a value. It returned a confidence score of 0.94. The value was wrong. Confidence scores tell you how certain the model is — not whether the model is correct. Without a human review pass, errors compound silently. By the time someone notices a downstream report looks off, ten thousand records have to be re-keyed, and nobody knows which ones were the bad ones.
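One way to see the gap between confidence and correctness is to bucket a human-reviewed sample by the model's stated confidence and look at how often each bucket was actually right. The following is a minimal sketch, assuming a hypothetical list of `(confidence, is_correct)` pairs where the second value comes from a reviewer; the function name and the sample data are illustrative, not taken from any particular tool.

```python
from collections import defaultdict

def accuracy_by_confidence_band(reviewed):
    """Observed accuracy per ten-point confidence band.

    `reviewed` is an iterable of (confidence, is_correct) pairs, where
    is_correct comes from a human reviewer, not from the model.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for confidence, is_correct in reviewed:
        # Work in whole percentage points to sidestep float edge cases;
        # a confidence of exactly 1.0 is folded into the top band.
        band = min(round(confidence * 100) // 10, 9)
        totals[band] += 1
        correct[band] += int(is_correct)
    return {
        f"{band * 10}-{band * 10 + 10}%": correct[band] / totals[band]
        for band in sorted(totals)
    }

# Hypothetical review sample: (model confidence, reviewer says correct?)
sample = [(0.94, False), (0.96, True), (0.91, True), (0.97, True),
          (0.55, True), (0.52, False)]
bands = accuracy_by_confidence_band(sample)
print(bands)
```

In this made-up sample the 90-100% band is right only three times out of four. The numbers are invented, but the check itself is cheap: it needs nothing more than a reviewed sample, which is exactly what an unreviewed pipeline does not have.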
Failure 2 — Schema drift
The form changed. A new field appeared. An old field's meaning subtly shifted. The model — trained six months ago — kept extracting the same fields and assigning them to the wrong columns. Humans notice this on their first batch. Models don't.
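The structural half of this failure, a field appearing or disappearing, can be caught mechanically before a batch is loaded. Here is a minimal sketch, assuming each extracted record is a plain dictionary and the expected field names are known in advance; `detect_schema_drift` and every field name below are hypothetical.

```python
def detect_schema_drift(expected_fields, batch_records):
    """Compare the field names seen in a batch against the expected schema."""
    expected = set(expected_fields)
    seen = set()
    for record in batch_records:
        seen.update(record.keys())
    return {
        "unexpected": sorted(seen - expected),  # fields the schema does not know
        "missing": sorted(expected - seen),     # fields no record in the batch had
    }

# Hypothetical schema and batch: the form was revised, one field was
# renamed and a new one was added.
expected = ["claim_id", "claimant_name", "incident_date", "amount"]
batch = [
    {"claim_id": "C-1", "claimant_name": "A. Rao",
     "date_of_loss": "2024-03-01", "amount": "1200", "policy_tier": "gold"},
    {"claim_id": "C-2", "claimant_name": "B. Khan",
     "date_of_loss": "2024-03-04", "amount": "800", "policy_tier": "silver"},
]
drift = detect_schema_drift(expected, batch)
print(drift)
```

A check like this only sees names. It will not notice an old field whose meaning has shifted while its label stayed the same, which is the part a person catches on the first batch.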
Failure 3 — Missing context
An insurance claim form has a checkbox for "permanent disability." The box is checked. The doctor's note attached says "temporary, expected to resolve in 6 weeks." The model extracts "permanent disability = yes." A human reads both documents and flags the contradiction. This is the work that pays.
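Once both documents have been keyed, routing the disagreement to a person is simple. The sketch below assumes the claim form and its attachments have already been reduced to dictionaries sharing field names; the function, the field `disability_type`, and the document name `doctor_note` are all hypothetical. Reading the doctor's note well enough to produce "temporary" in the first place is the hard part, and it is not what this code does.

```python
def find_contradictions(claim_form, supporting_docs, fields):
    """Return (field, form_value, doc_name, doc_value) for every disagreement.

    Only fields present in both the form and a supporting document are
    compared; a missing value is not treated as a contradiction.
    """
    conflicts = []
    for field in fields:
        form_value = claim_form.get(field)
        if form_value is None:
            continue
        for doc_name, doc in supporting_docs.items():
            doc_value = doc.get(field)
            if doc_value is not None and doc_value != form_value:
                conflicts.append((field, form_value, doc_name, doc_value))
    return conflicts

# Hypothetical keyed values for the example in the text.
claim_form = {"claim_id": "C-77", "disability_type": "permanent"}
supporting_docs = {"doctor_note": {"disability_type": "temporary"}}

flags = find_contradictions(claim_form, supporting_docs, ["disability_type"])
print(flags)
```

Anything in `flags` goes to a reviewer rather than being resolved automatically, in line with the escalation habit described later in this essay.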
[Stat panel; figures not recoverable: "still require human review", "annually (USD, IBM est.)", "under our QA SLA"]
IBM, "The Four V's of Big Data" / Gartner Data Quality Market Survey 2024 — both widely cited; figures are directional, treat as orders of magnitude.
§ 03 Where the work is heating up
The interesting thing about a sector everyone says is dying is how much new work is showing up in it. Three industries where Sloper has watched the demand curve bend upward, not down:
Cleaning, labelling, and enriching the data that trains the models.
Every foundational AI company that has shipped a serious product in the last 18 months has a sub-organisation — sometimes hundreds of people, sometimes thousands — doing nothing but human review of model output. They're called annotators, raters, reviewers, or "human-in-the-loop." The titles change. The work doesn't.
For applied AI companies smaller than the big labs, this is a buy-not-build problem. The cost-effective path is partnering with a vendor whose sole specialty is supervised human data work — clean labels, consistent rubrics, audit trails, and a human-error-rate they can underwrite.
This is the largest and fastest-growing slice of our pipeline today.
[Stat panel; figures not recoverable: "per keyer before live work", "per disputed sample", "per labeled record"]
Pay & time records, keyed for plaintiffs' counsel in employment cases.
US wage-and-hour litigation runs on data. A class-action complaint alleging unpaid overtime can involve thousands of plaintiffs, each with years of timecards, payslips, schedule changes, and meal-break attestations. Most of these records exist as scanned PDFs of paper originals — printed from a payroll system that was retired four years ago, photographed in a dim warehouse during discovery, or extracted from a vendor system that exports nothing useful.
To put a damages model in front of a judge, plaintiffs' counsel needs every relevant field — clock-in, clock-out, meal breaks, premium pay, deductions — keyed and normalized into a single schema. Defence counsel needs the same on their side, faster, to scope exposure and negotiate. Both sides spend money on this. Both sides need the keying to be defensible under cross-examination — meaning every record traceable, every uncertainty flagged, every change logged.
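"Every record traceable, every uncertainty flagged, every change logged" maps naturally onto an append-only history attached to each keyed field. The following is a minimal sketch of that idea, not a description of any production system; the class names, the record identifier format, and the actor labels are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

@dataclass(frozen=True)
class AuditEvent:
    actor: str                 # who made the change
    action: str                # e.g. "keyed", "reviewed", "corrected"
    old_value: Optional[str]
    new_value: Optional[str]
    at: datetime               # when, in UTC

@dataclass
class KeyedField:
    record_id: str
    name: str
    value: Optional[str] = None
    uncertain: bool = False
    history: List[AuditEvent] = field(default_factory=list)

    def set_value(self, actor, action, new_value, uncertain=False):
        """Record the change first, then apply it; history is never rewritten."""
        self.history.append(
            AuditEvent(actor, action, self.value, new_value,
                       datetime.now(timezone.utc))
        )
        self.value = new_value
        self.uncertain = uncertain

# Hypothetical timecard field: keyed with a doubt, then corrected on review.
clock_in = KeyedField("EMP-0042/2021-03-04", "clock_in")
clock_in.set_value("keyer_17", "keyed", "08:5?", uncertain=True)
clock_in.set_value("reviewer_03", "corrected", "08:57")

for event in clock_in.history:
    print(event.actor, event.action, event.old_value, "->", event.new_value)
```

The design choice that matters is that a correction adds an event instead of overwriting one, so the question "what did this field say before review, and who changed it?" can be answered from the record itself.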
This is work that machines are bad at and that has hard deadlines tied to discovery schedules. It is, in our experience, one of the most reliable B2B engagements in the data services market — and one of the least talked-about.
[Stat panel; figures not recoverable: "mid-size class action", "under discovery clock", "for cross-examination"]
Three decades of paper records — finally being moved.
India spent forty years building paper-first record systems and is now spending the next ten digitising them. Three sectors lead:
Banking — handwritten loan applications, branch ledgers from pre-CBS migration, KYC files held under RBI's seven-year retention requirement. Every public-sector bank has rooms full of these. Every private bank has a digitisation project running and a budget for next year's tranche. The work involves not just keying but matching: the same customer across paper applications from 2008 and digital records from 2024 — same person, different transliterations of the name.
Insurance — handwritten claims forms in motor, health, and crop insurance still arrive at branch offices daily, especially outside metros. Adjudicating them at scale requires structured data. The IRDAI's push toward straight-through claims processing is, behind the scenes, a massive structured-data ingestion problem. There's no shortcut.
Government archives — land records, court files, municipal birth and death registers. The Digital India and SVAMITVA programmes have given this work a tailwind it didn't have a decade ago. The market here is large, slow-moving, and mostly procurement-driven; the firms that win are the ones with documented quality processes and clean security audits, not the ones with the lowest per-page rate.
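The matching problem described under Banking, the same person under different transliterations, is usually approached by generating candidate pairs for a person to confirm. Below is a minimal sketch using the standard library's `difflib.SequenceMatcher`; the 0.8 threshold, the helper names, and the example names are assumptions chosen for illustration, and a real project would likely combine name similarity with other keys such as date of birth or address.

```python
import re
from difflib import SequenceMatcher

def normalise(name):
    """Lowercase and collapse everything except a-z into single spaces."""
    return re.sub(r"[^a-z]+", " ", name.lower()).strip()

def name_similarity(a, b):
    """Similarity in [0, 1] between two names after normalisation."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

def candidate_matches(paper_name, digital_names, threshold=0.8):
    """Digital-record names similar enough to send to a human for confirmation."""
    scored = [(name_similarity(paper_name, n), n) for n in digital_names]
    return sorted([(s, n) for s, n in scored if s >= threshold], reverse=True)

# Hypothetical names: one paper application, two digital records.
candidates = candidate_matches(
    "Lakshmi Venkataraman",
    ["Laxmi Venkatraman", "Rajesh Kumar"],
)
print(candidates)
```

The output is a shortlist, not a merge decision. Two different customers can have near-identical names, so the final call stays with a reviewer who can see both files.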
[Stat panel; figures not recoverable: "paper records", "collectively read & transcribe", "compliant by design"]
§ 04 The cost of skipping the layer
Every business that runs on data spends money on bad data — they just don't always know how much, or where the line item lives. It shows up as customer complaints, denied claims, regulatory fines, broken integrations, lost auctions, missed signals. Industry research has tried to put numbers on this for years; estimates vary by methodology, but the orders of magnitude line up.
[Chart: Where bad data shows up, by share of total cost]
Indicative split synthesised from Gartner, Experian and IBM market reports — directional, not normative.
The point isn't the precise percentages. The point is that bad data doesn't fail loudly in one place; it leaks slowly out of many places. Which is why most CFOs underestimate it, and most CIOs underfund the cleanup.
§ 05 How we think about doing it well
Doing this work well isn't an accident. It's a process that looks unglamorous from the outside and obvious from the inside. Five things matter, and they all matter at once.
- Supervised environments. Operators work on company-managed devices with role-based access, encrypted storage, and TLS 1.3 in transit. No personal devices on programs. Every keyer signs a mutual NDA and IP-assignment contract before going live. This is a security floor, not a feature.
- Trained humans. Forty-plus hours of paid training per keyer before they touch live data. New rubric? More training. Cheaper to teach than to clean up.
- Sampled QA, not theatre. Random samples reviewed by a second pair of eyes; statistical accuracy targets at field level (≥99.5%) — not document level, where errors hide.
- Audit trails by default. Every record carries a chain: who keyed it, who reviewed it, what changed, when. Not just for compliance; for the day someone asks "is this right?" and you need to answer in less than a minute.
- Conversation with the client. Edge cases get escalated, not guessed. The keyers flag, the QA team escalates, and the client decides. This is slower than guessing. It is also why the output gets used.
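The "sampled QA" point above can be made concrete with a small calculation. The sketch below takes second-pass review results per field, computes the observed accuracy, and compares a conservative lower bound against the 99.5% target. Using a 95% Wilson lower bound as the pass rule is an assumption made here for illustration, as are the function names and the sample data.

```python
import math
from collections import defaultdict

def wilson_lower_bound(correct, total, z=1.96):
    """Lower end of the Wilson score interval for a proportion (z=1.96 ~ 95%)."""
    if total == 0:
        return 0.0
    p = correct / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denom

def field_level_report(sampled_checks, target=0.995):
    """sampled_checks: iterable of (field_name, is_correct) from second-pass review."""
    totals, correct = defaultdict(int), defaultdict(int)
    for name, ok in sampled_checks:
        totals[name] += 1
        correct[name] += int(ok)
    report = {}
    for name in totals:
        lower = wilson_lower_bound(correct[name], totals[name])
        report[name] = {
            "observed": correct[name] / totals[name],
            "lower_bound": lower,
            "meets_target": lower >= target,
        }
    return report

# Hypothetical sample: 2,000 reviewed values for each of two fields.
checks = ([("clock_in", True)] * 2000
          + [("deductions", True)] * 1960
          + [("deductions", False)] * 40)
report = field_level_report(checks)
for name, stats in report.items():
    print(name, round(stats["observed"], 4), stats["meets_target"])
```

In this invented sample `clock_in` clears the target and `deductions` does not. A single pooled figure would blur the two together; reporting per field shows where the retraining or rubric change is needed. The lower bound also makes sample size visible: a perfect score on a small sample is not enough to demonstrate 99.5%.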
Have data that needs moving?
If you've got a backlog of paper, a poisoned dataset, or a litigation deadline — start with a small paid pilot. We'll scope it in 48 hours and walk you through the security posture before any data moves.