Prompt Caching & Batch Discount Calculator
See how much prompt caching and batch processing cut your API bill. Enter your token volume and base prices, then set the cached fraction, the cached multiplier, and the batch discount, and compare the full list price against the effective cost, with the percentage saved. Numbers update as you type. Prices as of Jun 25, 2026 (see sources); every field is editable.
How it works
List price is the price you pay only if you do nothing clever. Two standard discounts can reshape an LLM bill, and they attack different parts of it. Prompt caching targets the input side: when you resend the same big chunk of context over and over, the provider lets you pay a small fraction of the input rate to reuse the already-processed prompt rather than paying full price to reprocess it each time. Batch processing targets the whole bill: submit work to an asynchronous queue instead of the live API and the provider hands back a flat percentage off, because it can schedule your jobs when capacity is cheap.
Because the two discounts hit different parts of the cost, they multiply rather than merely add. Caching shrinks the input line item; the batch discount then takes a slice off the already-reduced total. That stacking is why an offline pipeline with a shared system prompt can run at a small fraction of naive list price, while a real-time service with unique prompts sees none of it. The calculator above keeps the two effects separate so you can see exactly how much each one contributes — and dial either to zero when it does not apply to your workload.
Cached input = input × cached fraction, billed at input price × cached multiplier
Fresh input = input × (1 − cached fraction), billed at full input price
Subtotal = fresh input cost + cached input cost + output cost
Effective cost = Subtotal × (1 − batch discount)
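The four formulas above can be sketched as a single Python function. This is a minimal illustration, not the calculator's actual source: the function name, parameter names, and returned keys are assumptions chosen for readability, and prices are expressed in USD per 1M tokens.

```python
def effective_cost(
    input_tokens: float,
    output_tokens: float,
    input_price_per_m: float,
    output_price_per_m: float,
    cached_fraction: float,
    cached_multiplier: float,
    batch_discount: float,
) -> dict:
    """Apply the formulas above; all names here are illustrative."""
    per_m = 1_000_000

    # Split the input into a cached part and a fresh part.
    cached_tokens = input_tokens * cached_fraction
    fresh_tokens = input_tokens * (1 - cached_fraction)

    # Fresh input pays the full rate; cached input pays rate x multiplier.
    fresh_cost = fresh_tokens / per_m * input_price_per_m
    cached_cost = cached_tokens / per_m * input_price_per_m * cached_multiplier
    output_cost = output_tokens / per_m * output_price_per_m

    full_price = input_tokens / per_m * input_price_per_m + output_cost
    subtotal = fresh_cost + cached_cost + output_cost

    # The batch discount applies to the whole subtotal.
    effective = subtotal * (1 - batch_discount)
    saved = 1 - effective / full_price if full_price else 0.0

    return {
        "full_price": full_price,
        "subtotal": subtotal,
        "effective": effective,
        "saved": saved,
    }


# Defaults from this page: 1M input tokens, no output, $3/1M input,
# 50% cached at a 10% multiplier, 50% batch discount.
result = effective_cost(1_000_000, 0, 3.00, 0.0, 0.5, 0.10, 0.5)
print(result)
```

With the page defaults, the function returns a subtotal of $1.65 and an effective cost of $0.825. Setting `cached_fraction` or `batch_discount` to 0 switches off the corresponding discount.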
A worked example
Using the defaults — 1M input tokens, no output, $3/1M input price, 50% cached at a 10% multiplier, and a 50% batch discount:
- Full price: 1,000,000 ÷ 1M × $3 = $3.00
- Fresh input (500K at full price): $1.50
- Cached input (500K at 10% of price): $0.150
- Subtotal before batch: $1.650
- After 50% batch discount: $0.825, a saving of 72.5% (about 73%)
The full $3.00 falls to $0.825 once both discounts apply: caching re-prices half of the input at 10% of the input rate, taking the bill to $1.65, and the batch tier then halves what remains. Try setting the cached fraction to 0% to isolate the batch saving alone, or the batch discount to 0% to see caching on its own. To price the undiscounted workload from scratch, use the token cost calculator; to project discounted spend across a month of traffic, combine this with the monthly API spend calculator.
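The isolation experiment described above can also be run in a few lines of Python. This is a rough sketch for the input-only default scenario; the helper name `cost` and its parameters are assumptions made for illustration.

```python
def cost(cached_fraction, cached_multiplier, batch_discount,
         input_tokens=1_000_000, input_price_per_m=3.00):
    """Effective cost in USD for an input-only workload (illustrative helper)."""
    list_price = input_tokens / 1_000_000 * input_price_per_m
    # Blended input rate: fresh share at 100%, cached share at the multiplier.
    input_rate = (1 - cached_fraction) + cached_fraction * cached_multiplier
    return list_price * input_rate * (1 - batch_discount)


full = cost(0.0, 0.10, 0.0)          # no discounts
caching_only = cost(0.5, 0.10, 0.0)  # batch discount set to 0%
batch_only = cost(0.0, 0.10, 0.5)    # cached fraction set to 0%
both = cost(0.5, 0.10, 0.5)          # page defaults

for label, value in [
    ("full price", full),
    ("caching only", caching_only),
    ("batch only", batch_only),
    ("both", both),
]:
    print(f"{label:<13} ${value:.3f}  saving {1 - value / full:.1%}")
```

Caching alone leaves $1.65 (a 45% saving), batch alone leaves $1.50 (a 50% saving), and both together leave $0.825 (72.5%). The remaining fractions multiply, 0.55 × 0.50 = 0.275, which is why the combined saving is less than the 95% you would get by adding the two percentages.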
Frequently asked questions
How much can prompt caching and batch discounts save?
Together, a lot. In the default scenario (1M input tokens at $3/1M, with 50% of the input cached at 10% of price, then the whole bill cut 50% by batch processing) the cost falls from $3.00 at full price to $0.825, a saving of 72.5%, or about 73%. The exact figure depends on how much of your prompt is reused and which discount tiers you qualify for.
What is prompt caching?
If you send the same large block of context on many calls — a long system prompt, a knowledge base, a code file — the provider can cache the processed prompt and charge a small fraction (often around 10%) of the normal input price to reuse it, instead of reprocessing it from scratch every time. You set the cached fraction (how much of your input is reused) and the cached multiplier (the discounted rate, e.g. 10%) above. Output tokens are never cached — only input.
What is the batch discount?
Many providers offer a flat discount (commonly around 50%) for requests you submit to an asynchronous batch queue rather than the real-time API, in exchange for results arriving within a window (often up to 24 hours) instead of instantly. It applies to the entire bill — input and output — so it stacks on top of any caching saving. Set it to 0% if your workload is latency-sensitive and must run synchronously.
How do the two discounts combine?
Caching is applied first, to the input side only: the cached fraction of the input tokens is re-priced at the cached multiplier. Then the batch discount is applied to the whole resulting bill. So the order is: discount the cached input, add fresh input and output at full price, then multiply the total by (1 − batch discount). That is exactly the calculation above.
Do these discounts apply to every workload?
No. Caching only helps when you genuinely resend the same context repeatedly within the cache lifetime — a one-off prompt sees no benefit. Batch pricing only helps when you can tolerate delayed, asynchronous results. A real-time chatbot with unique prompts gets neither; an offline document-processing pipeline with a shared system prompt gets both. Set the fractions to match your real situation.
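To make the contrast concrete, here is a small Python sketch comparing the two workloads mentioned above. The output volume (200K tokens) and output price ($15/1M) are hypothetical values picked only for illustration and are not defaults from this page; the `Workload` class and `bill` function are likewise assumed names.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    cached_fraction: float  # share of input tokens served from cache
    batch_discount: float   # 0.0 for synchronous, real-time traffic


INPUT_TOKENS = 1_000_000
OUTPUT_TOKENS = 200_000   # hypothetical volume, for illustration only
INPUT_PRICE = 3.00        # USD per 1M input tokens (page default)
OUTPUT_PRICE = 15.00      # USD per 1M output tokens (hypothetical)
CACHED_MULTIPLIER = 0.10


def bill(w: Workload) -> float:
    """Effective cost in USD; caching touches input only, batch touches all."""
    input_rate = (1 - w.cached_fraction) + w.cached_fraction * CACHED_MULTIPLIER
    input_cost = INPUT_TOKENS / 1_000_000 * INPUT_PRICE * input_rate
    output_cost = OUTPUT_TOKENS / 1_000_000 * OUTPUT_PRICE
    return (input_cost + output_cost) * (1 - w.batch_discount)


chatbot = Workload("real-time chatbot, unique prompts", 0.0, 0.0)
pipeline = Workload("offline pipeline, shared system prompt", 0.5, 0.5)

for w in (chatbot, pipeline):
    print(f"{w.name}: ${bill(w):.3f}")
```

Under these assumed numbers the chatbot pays the full $6.00 while the pipeline pays $2.325. Note that the output cost is reduced only by the batch discount, never by caching.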
How current are these prices?
The bundled defaults are publicly listed prices verified on Jun 25, 2026, linked to source below. Every field — base prices, cached fraction, cached multiplier, batch discount — is editable, so you can model your provider's exact terms. Always confirm current discount tiers with the provider.
Disclaimer. LLMTCO provides cost estimates and planning tools for informational purposes only. AI API and GPU prices change frequently; bundled defaults reflect publicly listed prices as of the verification date shown (Jun 25, 2026) and may be out of date — always confirm current pricing with the provider. These figures are estimates, not financial, tax, or procurement advice, and do not capture every real-world factor (latency, reliability, compliance, data privacy, engineering time).