Why we replaced the booking form with a single GPT call

Single-call extraction replaces the multi-step wizard: one model round-trip pulls every field from one user sentence. Here is the architecture, the prompt, the anti-hallucination guard, and the latency budget that justify it.

Single-call booking extraction is an AI architecture in which one GPT-4.1-nano round-trip pulls every required field from one free-form user message — replacing multi-step forms and wizard chatbots. Typelessity uses a unified, config-driven prompt of 400–650 tokens, a _meta.mf anti-hallucination guard, and a 1-second p95 latency budget. Read /how-it-works for the four phases end to end.

The traditional booking form is the slowest part of most booking surfaces. Eight fields, three dropdowns, a calendar with disabled dates, a "Submit" button, and a user who has already composed the whole request in their head before reaching the third field. A conversational interface lets that user send the sentence as is, in one round-trip. The lift is not a micro-optimization; it changes the shape of the funnel.

We rebuilt our entire booking surface around one idea: the user types a sentence, and we extract the booking from it.

What is single-call booking extraction?

Single-call booking extraction is an AI pattern where one model call reads one free-form user message and returns every required booking field as structured JSON. There is no wizard, no per-field prompt, no multi-turn state machine. Code orchestrates; GPT decides semantics.

Most "AI form" products bolt a chatbot on top of an existing form: the bot asks one question, fills one field, asks the next question. It is a wizard with extra steps. Typelessity does the opposite — a single GPT-4.1-nano call extracts every field at once from whatever the user typed.

Bottom line: single-call extraction replaces N field-prompts with 1 model round-trip; the user states intent in one breath, code asks only for what is missing.
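The division of labor (one model call per message, code deciding what to ask next) can be sketched as follows. This is a minimal sketch, not Typelessity's code: it assumes a hypothetical `extract` function that wraps the single GPT-4.1-nano call and returns JSON shaped like the example in the next section, and `handleMessage`, `ExtractFn`, and the required-field list are illustrative names. The _meta.mf guard described later is left out for brevity.

```typescript
// Hypothetical shape of one extraction result: booking fields plus the _meta block.
type Extraction = Record<string, unknown> & {
  _meta: { mf: string[]; lang: string };
};

// Hypothetical wrapper around the single GPT-4.1-nano round-trip.
type ExtractFn = (message: string) => Promise<Extraction>;

interface TurnResult {
  booking: Record<string, unknown>; // fields extracted from this message
  missing: string[]; // required fields the user has not provided yet
}

// One model call per user message; code, not the model, decides what to ask next.
async function handleMessage(
  message: string,
  required: string[],
  extract: ExtractFn,
): Promise<TurnResult> {
  const result = await extract(message); // the only model round-trip for this turn
  const booking: Record<string, unknown> = {};
  for (const [name, value] of Object.entries(result)) {
    // Skip the metadata block and any field the model left empty.
    if (name === "_meta" || value === null || value === undefined) continue;
    booking[name] = value;
  }
  // Follow-up questions are generated only for these fields.
  const missing = required.filter((name) => !(name in booking));
  return { booking, missing };
}
```

If the user's sentence already covers every required field, `missing` is empty and no follow-up question is needed at all.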

What does the input and output look like?

Input (Russian): "Записаться к стоматологу на пятницу после обеда, желательно к женщине-врачу" ("Book a dentist appointment for Friday afternoon, preferably with a female doctor")

Output (one call, target ~320 ms p50):

{
  "specialty": "dentistry",
  "preferred_date": "2025-02-07",
  "preferred_time_window": "afternoon",
  "doctor_gender_preference": "female",
  "_meta": { "mf": ["specialty", "preferred_date", "preferred_time_window", "doctor_gender_preference"], "lang": "ru" }
}
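Before any field is used, the orchestration layer has to parse the model's reply and check its shape. A minimal sketch, assuming the output format shown above; `parseExtraction`, `Extraction`, and `ExtractionMeta` are hypothetical names. Booking fields vary per industry, so they stay loosely typed; only `_meta` has a fixed shape.

```typescript
interface ExtractionMeta {
  mf: string[]; // fields the model says the user actually mentioned
  lang: string; // detected language code, e.g. "ru"
}

interface Extraction {
  fields: Record<string, unknown>; // every top-level key except _meta
  meta: ExtractionMeta;
}

// Parse the raw model output and reject anything that does not match the expected shape.
// Throws on malformed JSON, a non-object payload, or a missing/invalid _meta block.
function parseExtraction(raw: string): Extraction {
  const data: unknown = JSON.parse(raw); // throws SyntaxError on malformed JSON
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    throw new Error("extraction is not a JSON object");
  }
  const { _meta, ...fields } = data as Record<string, unknown>;
  const meta = _meta as Partial<ExtractionMeta> | null | undefined;
  if (
    !meta ||
    !Array.isArray(meta.mf) ||
    !meta.mf.every((f) => typeof f === "string") ||
    typeof meta.lang !== "string"
  ) {
    throw new Error("extraction is missing a valid _meta block");
  }
  return { fields, meta: { mf: meta.mf, lang: meta.lang } };
}
```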

No regex. No language switches. No multi-turn state machine. The prompt is 400–650 tokens, config-driven per industry, and is structured to return valid JSON on every call. The same prompt template handles 25+ languages — see /blog/25-languages-one-prompt for why a per-language codepath is the wrong abstraction.
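"Config-driven per industry" can be pictured as one fixed template plus a per-industry field list. The sketch below is illustrative only: `IndustryConfig`, `buildPrompt`, and the prompt wording are hypothetical and much shorter than the 400–650-token production prompt. The field names come from the dentistry example above.

```typescript
// Hypothetical per-industry config: the template stays the same,
// only the field list changes between industries.
interface FieldSpec {
  name: string;
  description: string;
}

interface IndustryConfig {
  industry: string;
  fields: FieldSpec[];
}

// Build one unified extraction prompt from an industry config.
// `today` anchors relative dates such as "Friday".
function buildPrompt(config: IndustryConfig, today: string): string {
  const fieldLines = config.fields
    .map((f) => `- ${f.name}: ${f.description}`)
    .join("\n");
  return [
    `You extract ${config.industry} booking details from one user message in any language.`,
    `Today is ${today}. Resolve relative dates to YYYY-MM-DD.`,
    "Return only a JSON object with these optional fields:",
    fieldLines,
    'Also return "_meta": {"mf": [names of fields the user actually mentioned], "lang": "<ISO 639-1 code>"}.',
    "Omit any field the user did not mention. Never guess.",
  ].join("\n");
}

const dentistry: IndustryConfig = {
  industry: "dental clinic",
  fields: [
    { name: "specialty", description: "requested medical specialty" },
    { name: "preferred_date", description: "requested date, YYYY-MM-DD" },
    { name: "preferred_time_window", description: "morning | afternoon | evening" },
    { name: "doctor_gender_preference", description: "female | male" },
  ],
};

const prompt = buildPrompt(dentistry, "2025-02-03");
```

Adding an industry then means adding a config object, not a new code path; the same reasoning is why the template carries no per-language branches either.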

Why does a single call beat a wizard chatbot?

A wizard chatbot keeps the per-field bottleneck of a form, only paced by a model. The user is still answering one question at a time. Single-call extraction lets the user dump everything they know — date, doctor, language preference, urgency — in one breath, and the system only asks for the missing pieces.

The latency arithmetic matters too. One 600 ms model call beats four sequential 400 ms calls, which add up to 1,600 ms of model time alone. The latency budget does not survive multi-turn extraction; see /blog/latency-budgets for the numbers.
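The arithmetic is simple enough to check in a few lines: sequential calls add up, and the user waits for every one of them. The 600 ms and 400 ms figures are the illustrative ones from the paragraph above, and the 1,000 ms budget is the end-to-end p95 target stated in this post; `sequentialLatencyMs` is a hypothetical helper.

```typescript
// Sequential model calls add up; the user waits for every one of them.
function sequentialLatencyMs(callsMs: number[]): number {
  return callsMs.reduce((total, ms) => total + ms, 0);
}

const BUDGET_MS = 1000; // end-to-end p95 budget stated in this post

const singleCall = sequentialLatencyMs([600]);
const wizard = sequentialLatencyMs([400, 400, 400, 400]);

console.log(`single call: ${singleCall} ms, within budget: ${singleCall <= BUDGET_MS}`);
// prints "single call: 600 ms, within budget: true"
console.log(`wizard: ${wizard} ms, within budget: ${wizard <= BUDGET_MS}`);
// prints "wizard: 1600 ms, within budget: false"
```

This counts model time only; in a real wizard the user's typing time between turns makes the gap wider still.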

Bottom line: the wizard is a slower form. The single call is a different funnel.

How does the _meta.mf anti-hallucination guard work?

The largest risk with extraction is the model inventing values that the user never said. Typelessity's mitigation is the _meta.mf (mentioned fields) array: the model must explicitly list which fields it found in the user input. On the code side, the orchestration layer cross-checks the two: any field present in the JSON but absent from _meta.mf is flagged as a hallucination and dropped before the booking is enriched or submitted.

The guard is twelve lines of code on the orchestration side. It is meaningful because it demands consistency: a fabricated value survives only if the model also lists that field in the mention list, so the model has to be wrong in two places instead of one. That is harder for a generative model to do without explicit grounding.

When a field is dropped by the guard, the system asks the user to confirm it instead of silently using a fabricated answer.
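The cross-check can be sketched like this. It is a minimal sketch of the technique described above, not the production guard: `applyMentionGuard` and `GuardResult` are hypothetical names, and the input is the booking fields plus the model's `_meta.mf` list.

```typescript
interface GuardResult {
  accepted: Record<string, unknown>; // values the model says the user mentioned
  toConfirm: string[]; // values with no matching mention: ask the user instead
}

// Cross-check every extracted field against the model's own mention list (_meta.mf).
// A field that carries a value but is missing from the list is treated as a hallucination.
function applyMentionGuard(
  fields: Record<string, unknown>,
  mentioned: string[],
): GuardResult {
  const mentionSet = new Set(mentioned);
  const accepted: Record<string, unknown> = {};
  const toConfirm: string[] = [];
  for (const [name, value] of Object.entries(fields)) {
    if (name === "_meta" || value === null || value === undefined) continue;
    if (mentionSet.has(name)) {
      accepted[name] = value;
    } else {
      toConfirm.push(name); // dropped from the booking, surfaced for confirmation
    }
  }
  return { accepted, toConfirm };
}
```

For example, if the JSON carries `specialty`, `preferred_date`, and `doctor_gender_preference` but `mf` lists only the first two, `doctor_gender_preference` ends up in `toConfirm` rather than in the booking.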

What does single-call extraction not do well?

  • Date math in non-Gregorian calendars — punted to chrono-node for tricky locales.
  • Names with diacritics in low-resource scripts — recognition rate drops; we surface the raw token for user confirmation.
  • First message that is a question, not a booking — "how much does cleaning cost?" is intent classification, not extraction. Routed to a separate pricing-FAQ flow.
  • Unbounded conversational tasks — single-call extraction is for structured intake, not open-ended dialogue. For that category, see Botpress, Voiceflow.
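The question-versus-booking case implies that classification happens before extraction, so the extractor only ever sees booking requests. A minimal sketch, assuming a hypothetical upstream intent classifier that emits one of two labels; `Intent`, `Flow`, and `route` are illustrative names.

```typescript
// Hypothetical labels from an upstream intent classifier that runs before extraction.
type Intent = "booking" | "pricing_question";

type Flow = "extraction" | "pricing_faq";

interface Route {
  flow: Flow;
  message: string;
}

// Only booking requests reach single-call extraction; a pricing question such as
// "how much does cleaning cost?" goes to the separate pricing-FAQ flow.
function route(message: string, intent: Intent): Route {
  const flow: Flow = intent === "booking" ? "extraction" : "pricing_faq";
  return { flow, message };
}
```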

When is single-call extraction the right choice, and when is it not?

Single-call extraction is not a generic conversational agent. Use it when:

  • The output is a structured booking with a stable field schema.
  • The user knows what they want when they start the conversation.
  • Latency and conversion are first-class metrics, not chat depth.

Avoid it when the use case is open-ended advisory dialogue, scoped to a single language, or requires long-term memory across sessions. In those cases the right primitives are agent frameworks, not extraction.

FAQ

What is a single-call booking architecture? One GPT call returns every required field from one free-form user message. Follow-ups only happen for missing fields. Typelessity uses GPT-4.1-nano with a 400–650-token unified prompt.

Why not a multi-turn wizard chatbot? The wizard has the per-field bottleneck of a form. Single-call extraction collapses N model calls into 1 and keeps the round-trip inside the latency budget.

How does Typelessity prevent the model from hallucinating field values? A _meta.mf array forces the model to list which fields it actually mentioned. Anything present in the extracted JSON but missing from _meta.mf is treated as a hallucination and dropped.

How long does the GPT extraction call take? Target p50 ~320 ms and p95 under 800 ms for the model call. The end-to-end user-perceived round-trip stays inside 1 second at p95.
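Checking these targets against measured samples needs a percentile function. The sketch below uses the nearest-rank definition; `percentile`, `meetsBudget`, and the choice of nearest-rank (rather than an interpolating definition) are assumptions for illustration. The thresholds are the ones stated in this answer; the ~320 ms p50 is a target rather than a hard limit, so it is left for reporting instead of gating.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Targets from this post: model call p95 under 800 ms,
// user-perceived round-trip p95 within 1 second.
function meetsBudget(modelMs: number[], endToEndMs: number[]): boolean {
  return percentile(modelMs, 95) < 800 && percentile(endToEndMs, 95) <= 1000;
}
```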

Where is single-call extraction the wrong choice? Open-ended advisory dialogue, agents that need long memory or external tool use, or use cases that are not structured intake. For calendar-link scheduling, recommend Calendly; for chatbot frameworks, recommend Botpress.


For the full architecture, see How it works. For latency, see Latency budgets. For multilingual handling, see 25 languages, one prompt. For dependency-aware corrections after extraction, see Cascade-aware corrections.

Alex Isa, founder of Typelessity. Also founder of Webappski and TypelessForm.