Question 1

How much can prompt caching and batch discounts save?

Accepted Answer

Together, a lot. In the default scenario (1M input tokens at $3 per 1M, with 50% of the input cached at 10% of the normal price, then the whole bill cut 50% by batch processing), the cost falls from $3.00 at full price to $0.825, a saving of 72.5%. The exact figure depends on how much of your prompt is reused and which discount tiers you qualify for.
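The default-scenario arithmetic can be written out step by step. This is only a worked restatement of the numbers quoted in the answer (no output tokens are included, matching the $3.00 full-price figure); the exact saving is 72.5%, which rounds to 73%.

```latex
\begin{align*}
\text{cached input} &= 0.5 \times \$3.00 \times 0.10 = \$0.15 \\
\text{fresh input}  &= 0.5 \times \$3.00 = \$1.50 \\
\text{after batch}  &= (\$0.15 + \$1.50) \times (1 - 0.50) = \$0.825 \\
\text{saving}       &= 1 - \frac{\$0.825}{\$3.00} = 72.5\%
\end{align*}
```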

Question 2

What is prompt caching?

Accepted Answer

If you send the same large block of context on many calls — a long system prompt, a knowledge base, a code file — the provider can cache the processed prompt and charge a small fraction (often around 10%) of the normal input price to reuse it, instead of reprocessing it from scratch every time. You set the cached fraction (how much of your input is reused) and the cached multiplier (the discounted rate, e.g. 10%) above. Output tokens are never cached — only input.
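As a rough illustration of how the cached fraction and cached multiplier interact, the sketch below computes a blended price per 1M input tokens. The function name `effective_input_price` and its parameter names are hypothetical, chosen for this example; it assumes cached tokens are billed at the base price times the multiplier and does not model any provider-specific cache-write or storage fees.

```python
def effective_input_price(base_price_per_m: float,
                          cached_fraction: float,
                          cached_multiplier: float) -> float:
    """Blended price per 1M input tokens when part of the input is reused from cache.

    Hypothetical helper: assumes cached tokens cost base_price * cached_multiplier
    and the remaining tokens cost the full base price.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    fresh = (1.0 - cached_fraction) * base_price_per_m
    cached = cached_fraction * base_price_per_m * cached_multiplier
    return fresh + cached


# Default scenario from this page: $3 per 1M, 50% cached, cached at 10% of price.
print(round(effective_input_price(3.00, 0.5, 0.10), 4))  # 1.65
```

With a cached fraction of 0 the function returns the base price unchanged, which matches the point that a one-off prompt gets no caching benefit.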

Question 3

What is the batch discount?

Accepted Answer

Many providers offer a flat discount (commonly around 50%) for requests you submit to an asynchronous batch queue rather than the real-time API, in exchange for results arriving within a window (often up to 24 hours) instead of instantly. It applies to the entire bill — input and output — so it stacks on top of any caching saving. Set it to 0% if your workload is latency-sensitive and must run synchronously.

Question 4

How do the two discounts combine?

Accepted Answer

Caching is applied first, to the input side only: the cached fraction of the input tokens is re-priced at the cached multiplier, and the rest of the input stays at full price. Then the batch discount is applied to the whole resulting bill. So the order is: discount the cached input, add fresh input and output at full price, then multiply the total by (1 − batch discount). That is exactly the calculation above.
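The order of operations described here can be sketched in a few lines. This is a minimal illustration, not the calculator's actual source: the function `discounted_cost` and all parameter names are assumptions made for the example, and prices are taken to be per 1M tokens.

```python
def discounted_cost(input_tokens: int,
                    output_tokens: int,
                    input_price_per_m: float,
                    output_price_per_m: float,
                    cached_fraction: float,
                    cached_multiplier: float,
                    batch_discount: float) -> float:
    """Apply caching to the input side first, then the batch discount to the total.

    Hypothetical helper mirroring the steps described in the answer above.
    """
    input_full = input_tokens / 1_000_000 * input_price_per_m
    output_full = output_tokens / 1_000_000 * output_price_per_m

    # Step 1: re-price the cached share of the input at the cached multiplier.
    cached_input = input_full * cached_fraction * cached_multiplier
    # Step 2: fresh input and all output stay at full price.
    fresh_input = input_full * (1 - cached_fraction)
    subtotal = cached_input + fresh_input + output_full
    # Step 3: the batch discount applies to the whole resulting bill.
    return subtotal * (1 - batch_discount)


# Default scenario: 1M input tokens at $3 per 1M, no output tokens,
# 50% cached at 10% of price, 50% batch discount.
cost = discounted_cost(1_000_000, 0, 3.00, 0.0, 0.5, 0.10, 0.5)
print(round(cost, 3))              # 0.825
print(f"{1 - cost / 3.00:.1%}")    # 72.5%
```

Setting `batch_discount` to 0 models a synchronous workload, and setting `cached_fraction` to 0 models prompts with no reused context.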

Question 5

Do these discounts apply to every workload?

Accepted Answer

No. Caching only helps when you genuinely resend the same context repeatedly within the cache lifetime — a one-off prompt sees no benefit. Batch pricing only helps when you can tolerate delayed, asynchronous results. A real-time chatbot with unique prompts gets neither; an offline document-processing pipeline with a shared system prompt gets both. Set the fractions to match your real situation.

Question 6

How current are these prices?

Accepted Answer

The bundled defaults are publicly listed prices, verified on Jun 25, 2026 and linked to their sources below. Every field (base prices, cached fraction, cached multiplier, batch discount) is editable, so you can model your provider's exact terms. Always confirm current discount tiers with the provider.

Prompt Caching & Batch Discount Calculator
