Context-Window Cost Calculator
See the cost hidden inside a big context window: when you resend a large context (retrieved documents, long instructions, chat history) on every API call, the input bill multiplies. Enter your context size, the number of calls, and the input price, and get the cost per call, the total, and a table of how it scales with context size. This is the silent cost driver in RAG and long-context apps. Numbers update as you type. Prices as of Jun 25, 2026; every field is editable.
| Context size (tokens) | Cost per call | Total (10 calls) |
|---|---|---|
| 10K | $0.0300 | $0.30 |
| 50K | $0.1500 | $1.50 |
| 100K | $0.3000 | $3.00 |
| 200K | $0.6000 | $6.00 |
| 500K | $1.5000 | $15.00 |
| 1M | $3.0000 | $30.00 |
Total cost is a straight line through the origin in context size: double the window, double the bill. The table re-renders as you change the calls or price above.
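As a rough sketch of how such a table could be produced, the snippet below applies cost per call = context tokens ÷ 1,000,000 × input price to each context size, using the default values of 10 calls and $3.00 per million input tokens. The function names (`context_cost_table`, `format_tokens`) are illustrative, not part of the calculator itself.

```python
def context_cost_table(sizes, price_per_million, calls):
    """Return (context_tokens, per_call_usd, total_usd) for each context size."""
    rows = []
    for tokens in sizes:
        per_call = tokens / 1_000_000 * price_per_million
        rows.append((tokens, per_call, per_call * calls))
    return rows


def format_tokens(tokens):
    """Render a token count the way the table does, e.g. 100000 -> '100K'."""
    if tokens >= 1_000_000:
        return f"{tokens // 1_000_000}M"
    return f"{tokens // 1_000}K"


# Default inputs assumed from the page: $3.00 per 1M input tokens, 10 calls.
SIZES = [10_000, 50_000, 100_000, 200_000, 500_000, 1_000_000]
table_rows = context_cost_table(SIZES, price_per_million=3.00, calls=10)

for tokens, per_call, total in table_rows:
    print(f"| {format_tokens(tokens)} | ${per_call:.4f} | ${total:.2f} |")
```

Changing `calls` or `price_per_million` rescales every row by the same factor, which is why the table stays a straight line in context size.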
How it works
The reason a context window costs money on every call comes down to one fact: the model keeps no memory between requests. Whatever the model needs to "know" for a given answer (the system instructions, the retrieved passages, the conversation so far, the document you are asking about) has to travel with the request. There is nowhere on the provider's side to leave it. So each call carries the full context as input tokens, and each call is billed for those input tokens at the input price. The window is not a stored asset you pay for once; it is a payload you re-ship, and re-pay for, every time.
This is precisely why RAG and long-context applications surprise people on the bill. A retrieval pipeline might pull 100,000 tokens of supporting material into the prompt to answer a one-sentence question. The output is tiny and cheap; the input is enormous and repeated. If that same fat context is sent across ten queries, you have paid for it ten times. Teams that budget from output length ("the answers are short, so it'll be cheap") miss the dominant term entirely. The cost driver is the size of the resent context multiplied by how often you resend it, and it scales linearly in each of those two factors.
That linearity is the lever. Because total cost is context tokens × price × calls, every token you keep out of the window, and every call you avoid resending it on, comes straight off the bill. Returning the top three retrieved chunks instead of the top thirty cuts the retrieved portion of the input by an order of magnitude, often with little loss in answer quality. Caching the fixed portion of the context re-prices those repeated tokens at a fraction of the normal input price. Summarising long history stops the prompt from growing steadily with every turn. The calculator makes the trade-offs concrete: change the context size and watch the table, and you will see exactly what each token of window is costing you.
Cost per call = context tokens ÷ 1,000,000 × input price
Total context cost = context tokens ÷ 1,000,000 × input price × calls
Input (context) cost only; the model's generated output is billed separately.
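The two formulas above translate directly into code. The following is a minimal sketch, not the calculator's actual implementation; the function and parameter names are assumptions chosen for readability, and the input check is an added convenience.

```python
TOKENS_PER_MILLION = 1_000_000


def cost_per_call(context_tokens, input_price_per_million):
    """Input cost (USD) of sending the context once."""
    if context_tokens < 0 or input_price_per_million < 0:
        raise ValueError("context_tokens and price must be non-negative")
    return context_tokens / TOKENS_PER_MILLION * input_price_per_million


def total_context_cost(context_tokens, input_price_per_million, calls):
    """Input cost (USD) of resending the same context on every call."""
    if calls < 0:
        raise ValueError("calls must be non-negative")
    return cost_per_call(context_tokens, input_price_per_million) * calls


# Example: a 100K-token context at $3.00 per 1M input tokens, resent 10 times.
per_call = cost_per_call(100_000, 3.00)
total = total_context_cost(100_000, 3.00, calls=10)
print(f"per call: ${per_call:.2f}, total: ${total:.2f}")
```

Output tokens are deliberately left out, matching the note above that generation is billed separately.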
A worked example
Using the defaults (a 100K-token context resent across 10 calls on a Claude Sonnet-class model from Anthropic, at $3.00 per million input tokens):
- Per call: 100,000 ÷ 1,000,000 × $3.00 = $0.30
- Across 10 calls: $0.30 × 10 = $3.00
- Shrink the context to 50K: per call halves to $0.15, total $1.50
- Grow it to 1M: per call rises to $3.00, total $30.00
The pattern is unmistakable: the only thing standing between $0.30 a call and $3.00 a call is the size of the context you choose to resend. That is the whole argument for retrieving less, caching the fixed parts, and summarising history. To add the output cost and get a complete bill, use the token cost calculator; to model a multi-turn chat where the history grows on its own, see the cost per conversation calculator; and to convert a document length into a token count for the context field, start with the token estimator. The full method is on the methodology page.
Frequently asked questions
Why does a large context window cost so much?
Because the context is not stored on the server between calls — the model is stateless, so you must resend the entire context on every single call and you are billed for those input tokens each time. A 100K-token context costs $0.30 to process once at $3.00/1M; across 10 calls that is $3.00. The window size multiplies straight into your bill.
Is this the hidden cost driver in RAG and long-context apps?
Yes. In retrieval-augmented generation (RAG) and long-document workflows, the answer the model writes is usually short and cheap — the expense is the big block of retrieved chunks, instructions, or document text you prepend to every query. Teams often size their bill from output length and are blindsided when the real driver is the fat, repeated input context. This calculator isolates exactly that cost so you can see it before it shows up on the invoice.
How do I reduce context-window cost?
Four levers, in rough order of impact: retrieve less (return only the top few relevant chunks instead of stuffing the window), cache the fixed part of the context so repeated tokens are re-priced at a fraction, summarise or compress long history into a compact memory, and raise the relevance bar so you are not paying to resend low-value tokens. The first and second usually give the biggest savings.
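To make the first two levers concrete, here is a hedged sketch comparing a baseline prompt against "retrieve less" and "retrieve less plus caching". Every number in it is hypothetical: the chunk size, the size of the fixed instructions, and especially `CACHED_PRICE_FRACTION`, which stands in for whatever discount a provider applies to cached input tokens. The caching model is also simplified (full price on the first call, discounted afterwards, no cache-write surcharge), so treat the result as an illustration rather than a quote.

```python
def context_cost(tokens, price_per_million, calls):
    """Input cost (USD) of sending `tokens` on each of `calls` calls."""
    return tokens / 1_000_000 * price_per_million * calls


PRICE = 3.00                  # USD per 1M input tokens (page default)
CALLS = 10                    # page default
FIXED_TOKENS = 20_000         # hypothetical: instructions sent on every call
CHUNK_TOKENS = 1_000          # hypothetical: size of one retrieved chunk
CACHED_PRICE_FRACTION = 0.10  # hypothetical discount for cached tokens

# Baseline: fixed instructions plus the top 30 chunks on every call.
baseline = context_cost(FIXED_TOKENS + 30 * CHUNK_TOKENS, PRICE, CALLS)

# Lever 1: return only the top 3 chunks.
retrieve_less = context_cost(FIXED_TOKENS + 3 * CHUNK_TOKENS, PRICE, CALLS)

# Lever 2: additionally cache the fixed part. Simplified model: it is billed
# at full price once, then at the discounted rate on the remaining calls.
fixed_first_call = context_cost(FIXED_TOKENS, PRICE, 1)
fixed_cached_calls = context_cost(
    FIXED_TOKENS, PRICE * CACHED_PRICE_FRACTION, CALLS - 1
)
variable_part = context_cost(3 * CHUNK_TOKENS, PRICE, CALLS)
retrieve_less_cached = fixed_first_call + fixed_cached_calls + variable_part

for label, cost in [
    ("baseline", baseline),
    ("retrieve less", retrieve_less),
    ("retrieve less + cache", retrieve_less_cached),
]:
    print(f"{label:>22}: ${cost:.3f}")
```

Under these assumptions, trimming retrieval shrinks only the variable part of the prompt, while caching attacks the fixed part that trimming cannot touch, which is why the two levers compound.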
Does the cost grow with the number of calls too?
Linearly. Each call resends the whole context, so total cost is context tokens × price × calls. Doubling the call count doubles the input cost; doubling the context size also doubles it. Both multiply together, which is why a large window combined with high call volume escalates fast. The table above shows the context dimension; change the calls field to scale the other.
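A quick way to check the "both multiply together" claim is to double each input in turn. This is a small sketch using the page's default numbers; the function name is illustrative.

```python
def total_context_cost(context_tokens, price_per_million, calls):
    """Total input cost (USD): context tokens / 1M x price x calls."""
    return context_tokens / 1_000_000 * price_per_million * calls


base = total_context_cost(100_000, 3.00, 10)
double_calls = total_context_cost(100_000, 3.00, 20)
double_context = total_context_cost(200_000, 3.00, 10)
double_both = total_context_cost(200_000, 3.00, 20)

# Ratios relative to the baseline bill.
print(f"calls x2:   {double_calls / base:.1f}x")
print(f"context x2: {double_context / base:.1f}x")
print(f"both x2:    {double_both / base:.1f}x")
```

Doubling either input doubles the bill, and doubling both quadruples it, because the two factors multiply rather than add.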
Does this include the output (generation) cost?
No — this tool isolates the input cost of resending the context, because that is the part people under-estimate. The model's reply is billed separately at the output price. For a complete input-plus-output bill, take your numbers to the token cost calculator, or model a full multi-turn chat with the cost per conversation calculator.
How current are these prices?
The bundled input price is a publicly listed rate verified on Jun 25, 2026, linked to source below. It is a convenience default — the price field is editable, so the calculator stays correct even if a default goes stale. Always confirm current pricing with the provider.
Disclaimer. LLMTCO provides cost estimates and planning tools for informational purposes only. AI API and GPU prices change frequently; bundled defaults reflect publicly listed prices as of the verification date shown (Jun 25, 2026) and may be out of date — always confirm current pricing with the provider. These figures are estimates, not financial, tax, or procurement advice, and do not capture every real-world factor (latency, reliability, compliance, data privacy, engineering time).