Field notes
AI · Jul 2026

Structured output you can trust

Every LLM feature that touches a database eventually needs an answer, not an essay. A price, a category, a set of line items — something a downstream system can act on without a human reading it first. The tempting shortcut is to ask the model for JSON and parse whatever comes back. That works in the demo and fails the first time the model wraps its output in a friendly sentence.

Free-form text is not data

A language model is trained to produce plausible text, not valid documents. Ask for JSON and you will usually get JSON, until the day it prefixes the object with "Here is the result", or emits a trailing comma, or invents a field because the input was ambiguous and it wanted to be helpful. Regex and try-catch parsing turn these into silent corruption or 3 a.m. pages. The failure is not rare enough to ignore and not common enough to catch in a demo, which is the worst possible frequency.
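To make those failure modes concrete, here is a minimal sketch using only Python's standard json module. The sample strings and field names are hypothetical, invented for illustration rather than taken from a real model. Two of the four outputs fail to parse, and the one with an invented field parses without complaint, which is the silent case.

```python
import json

# Hypothetical raw model outputs, one per failure mode described above.
SAMPLES = {
    "clean": '{"category": "invoice", "total": 42.5}',
    "prose_prefix": 'Here is the result: {"category": "invoice", "total": 42.5}',
    "trailing_comma": '{"category": "invoice", "total": 42.5,}',
    "invented_field": '{"category": "invoice", "total": 42.5, "confidence": "high"}',
}


def try_parse(raw):
    """Return (ok, value_or_error_message) instead of raising."""
    try:
        return True, json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"{exc.msg} at char {exc.pos}"


results = {name: try_parse(raw) for name, raw in SAMPLES.items()}

for name, (ok, detail) in results.items():
    print(f"{name:15} ok={ok} {detail}")
```

Note that a bare parse has no opinion about the extra "confidence" key. Catching that requires a schema, not a parser.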

Make the schema the contract

Stop asking politely and constrain the output. Most major providers now expose structured outputs or tool-calling backed by a JSON Schema, and the strongest guarantee comes from constrained decoding, where the model is only allowed to emit tokens that keep the output valid against the grammar. That moves correctness from hope to mechanism, and it makes the schema the interface: the same artifact defines what the model must produce, what your validator checks, and what your types enforce at the boundary. Keep it tight: enums instead of free strings, required fields marked required, ranges declared, additionalProperties set to false. Every degree of freedom you leave open is one the model will eventually exercise in a way you did not plan for.
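As an illustration of a tight schema, the sketch below declares enums, required fields, a numeric range, and additionalProperties set to false. The invoice fields and allowed values are assumptions made up for this example. The shape_errors helper is a deliberately small stand-in that understands only the keywords this schema uses; in practice a full JSON Schema validator library would do this job.

```python
# Hypothetical schema for an invoice-extraction feature; names are illustrative.
INVOICE_SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "required": ["category", "currency", "total"],
    "properties": {
        "category": {"type": "string", "enum": ["invoice", "receipt", "credit_note"]},
        "currency": {"type": "string", "enum": ["USD", "EUR", "GBP"]},
        "total": {"type": "number", "minimum": 0, "maximum": 1_000_000},
    },
}

_TYPES = {"string": str, "number": (int, float), "object": dict}


def shape_errors(value, schema, path="$"):
    """Check only the keywords used above; NOT a complete JSON Schema validator."""
    expected = _TYPES[schema["type"]]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, expected):
        return [f"{path}: expected {schema['type']}"]

    errors = []
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: {value!r} is not one of {schema['enum']}")
    if "minimum" in schema and value < schema["minimum"]:
        errors.append(f"{path}: {value} is below minimum {schema['minimum']}")
    if "maximum" in schema and value > schema["maximum"]:
        errors.append(f"{path}: {value} is above maximum {schema['maximum']}")

    if schema["type"] == "object":
        props = schema.get("properties", {})
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}.{key}: required field missing")
        for key, item in value.items():
            if key in props:
                errors.extend(shape_errors(item, props[key], f"{path}.{key}"))
            elif schema.get("additionalProperties") is False:
                errors.append(f"{path}.{key}: unexpected field")
    return errors


print(shape_errors({"category": "bill", "total": -1, "note": "hi"}, INVOICE_SCHEMA))
```

The same dictionary can be handed to a provider's structured-output feature and to the validator at the boundary, which is what keeps the schema a single source of truth.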

Validate, repair, fail loudly

Constrained decoding gets you valid syntax, not valid meaning. The object can parse cleanly and still carry a date in the future, a total that does not match the line items, or a category that does not exist. So validate twice: the schema for shape, then business rules for sense. When validation fails, feed the specific error back to the model and let it repair — one bounded retry, not an open loop. If it still deviates, do not paper over it. Reject the result, emit a typed error, and route to a fallback or a human. A structured feature that fails loudly is debuggable; one that guesses is a hazard.
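The validate, repair, fail-loudly loop can be sketched as follows. Everything named here is an assumption for illustration: generate is a hypothetical callable that takes a prompt and returns the model's raw text, the invoice fields are made up, and the shape check is reduced to a presence test where a real pipeline would run the full schema.

```python
import json
from datetime import date


class StructuredOutputError(Exception):
    """Typed failure the caller can route to a fallback or a human."""

    def __init__(self, stage, errors):
        super().__init__(f"{stage}: {'; '.join(errors)}")
        self.stage = stage
        self.errors = errors


def business_errors(doc, today):
    """Rules for sense, applied only after the shape is known to be valid."""
    errors = []
    try:
        if date.fromisoformat(doc["issued_on"]) > today:
            errors.append("issued_on is in the future")
    except (TypeError, ValueError):
        errors.append("issued_on is not an ISO date")
    line_sum = round(sum(item["amount"] for item in doc["line_items"]), 2)
    if line_sum != round(doc["total"], 2):
        errors.append(f"total {doc['total']} does not match line items {line_sum}")
    return errors


def check(raw, today):
    """Return (doc, errors); doc is None whenever errors is non-empty."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(doc, dict):
        return None, ["top-level value is not an object"]
    # Stand-in for full schema validation.
    missing = [k for k in ("issued_on", "total", "line_items") if k not in doc]
    if missing:
        return None, [f"missing field: {k}" for k in missing]
    errors = business_errors(doc, today)
    return (None, errors) if errors else (doc, [])


def extract(prompt, generate, today, max_repairs=1):
    """One initial attempt plus a bounded number of repairs, then a typed error."""
    doc, errors = check(generate(prompt), today)
    for _ in range(max_repairs):
        if doc is not None:
            return doc
        repair_prompt = (
            f"{prompt}\n\nYour previous output was rejected:\n- "
            + "\n- ".join(errors)
            + "\nReturn a corrected JSON object only."
        )
        doc, errors = check(generate(repair_prompt), today)
    if doc is None:
        raise StructuredOutputError("validation", errors)
    return doc
```

The retry count is a parameter with a default of one, so the loop stays bounded by construction. The caller decides what a StructuredOutputError means: a fallback model, a review queue, or a rejected request.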

None of this survives contact with a model upgrade unless you measure it. Treat structured output like any other contract and put it under evals: a fixed corpus of inputs, the expected objects, and metrics for both parse rate and field-level accuracy. The eval is what tells you a new model version quietly regressed on a rare field before your customers do, and it is what lets you change models without holding your breath.
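A minimal sketch of the two metrics named above, parse rate and field-level accuracy, might look like this. The corpus format (pairs of raw model output and expected object) and the field names are assumptions for illustration. One design choice is worth stating: an output that fails to parse counts as a miss on every expected field, so a parsing regression cannot hide behind good accuracy on the outputs that did parse.

```python
import json


def evaluate(cases):
    """cases: list of (raw_output, expected_dict) pairs from a fixed corpus."""
    if not cases:
        raise ValueError("eval corpus is empty")

    parsed = 0
    hits = {}
    totals = {}
    for raw, expected in cases:
        try:
            actual = json.loads(raw)
            ok = isinstance(actual, dict)
        except json.JSONDecodeError:
            actual, ok = None, False
        if ok:
            parsed += 1
        for field, want in expected.items():
            totals[field] = totals.get(field, 0) + 1
            # Unparseable outputs score zero on every field.
            if ok and actual.get(field) == want:
                hits[field] = hits.get(field, 0) + 1

    return {
        "parse_rate": parsed / len(cases),
        "field_accuracy": {f: hits.get(f, 0) / n for f, n in totals.items()},
    }


# Hypothetical four-case corpus; vat_id plays the role of the rare field.
CORPUS = [
    ('{"category": "invoice", "vat_id": "DE123"}', {"category": "invoice", "vat_id": "DE123"}),
    ('{"category": "receipt", "vat_id": null}', {"category": "receipt", "vat_id": "FR456"}),
    ('Here is the result: {"category": "invoice"}', {"category": "invoice", "vat_id": "DE789"}),
    ('{"category": "invoice", "vat_id": "IT001"}', {"category": "invoice", "vat_id": "IT001"}),
]

report = evaluate(CORPUS)
print(report)
```

Running the same corpus against each candidate model version and comparing the per-field numbers is what surfaces a regression confined to one rarely populated field.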

A schema the model can violate is a suggestion. A schema the decoder enforces is a contract. Only one of them survives production.
— Protocore · AI engineering

The pattern is boring on purpose: constrain the decoder, make the schema the single source of truth, validate for meaning, repair once, and fail in a way you can see. Boring is what lets a document-intelligence system read more than 1M documents and hold 92% straight-through processing across three model upgrades — because the output was never a hope, it was a shape the pipeline could trust.
