The challenge
A logistics operator processed more than a million shipping documents a year by hand — bills of lading, customs forms, invoices — across fourteen formats and several languages. Every keystroke was a chance to misroute a container.
They did not want a demo that worked on clean PDFs. They wanted a system that cleared the easy documents on its own and routed the hard ones to a person, with the confidence to tell the difference.
What we built
We built an extraction pipeline with retrieval-grounded models and a human-in-the-loop review queue. Low-confidence fields are flagged and sent to a reviewer; everything else clears straight through. Evals run before features ship, so accuracy is measured rather than assumed.
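The routing rule described above can be sketched in a few lines. This is a minimal illustration, not the production logic: the `Field` type, the `route` function, the field names, and the 0.90 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical cutoff; a real value would be tuned against an evaluation set.
REVIEW_THRESHOLD = 0.90


@dataclass
class Field:
    name: str
    value: str
    confidence: float  # extraction confidence in [0, 1]


def route(fields: list[Field], threshold: float = REVIEW_THRESHOLD) -> tuple[str, list[str]]:
    """Return the queue for a document and the names of its flagged fields."""
    flagged = [f.name for f in fields if f.confidence < threshold]
    return ("review" if flagged else "straight_through", flagged)


clean = [
    Field("consignee", "Acme GmbH", 0.99),
    Field("port_of_loading", "Rotterdam", 0.97),
]
doubtful = [
    Field("consignee", "Acme GmbH", 0.99),
    Field("hs_code", "8471.30", 0.62),
]

print(route(clean))     # ('straight_through', [])
print(route(doubtful))  # ('review', ['hs_code'])
```

Returning the flagged field names alongside the decision lets a reviewer go straight to the doubtful values instead of rechecking the whole document.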
Reviewers correct in a purpose-built interface, and every correction feeds the next evaluation set. The system gets measurably better at the documents this operator actually sees, not the ones a benchmark imagines.
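One way the correction loop could work is sketched below: each reviewer correction becomes an expected value in the evaluation set, and a candidate model is scored against those values before it ships. The `Correction` record, the function names, and the sample data are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correction:
    doc_id: str
    field: str
    predicted: str  # what the pipeline extracted
    corrected: str  # what the reviewer entered


def to_eval_cases(corrections):
    """Map (doc_id, field) to the reviewer-confirmed value."""
    return {(c.doc_id, c.field): c.corrected for c in corrections}


def field_accuracy(eval_cases, predict):
    """Share of eval cases where predict(doc_id, field) matches the expected value."""
    if not eval_cases:
        return 0.0
    hits = sum(
        1
        for (doc_id, field), expected in eval_cases.items()
        if predict(doc_id, field) == expected
    )
    return hits / len(eval_cases)


corrections = [
    Correction("doc-1", "hs_code", "8471.80", "8471.30"),
    Correction("doc-2", "consignee", "Acme Gmbh", "Acme GmbH"),
]
eval_cases = to_eval_cases(corrections)

# Stand-in for a candidate model: it fixes doc-2 but still misreads doc-1.
candidate = {("doc-1", "hs_code"): "8471.80", ("doc-2", "consignee"): "Acme GmbH"}
accuracy = field_accuracy(eval_cases, lambda d, f: candidate.get((d, f)))
print(accuracy)  # 0.5
```

Because the cases come from documents that were actually sent to review, the score reflects the operator's own mix of formats rather than a generic benchmark.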
Evals before features. A model in production earns trust by being measured, not by being promised.
Protocore · Engineering principles
The outcome
The pipeline now clears 92% of documents straight through across fourteen document types, eight times faster than the manual process it replaced. The operator's team spends its hours on the exceptions that genuinely need judgment.