Monthly LLM API Spend Calculator
Turn your traffic into a budget. Enter how many requests you expect per month and the average input and output tokens per request, pick a model, and get your projected monthly and annual API spend, plus the cost of a single request. Numbers update as you type. Prices are as of Jun 25, 2026, each linked to its source, and every price is editable.
| Side | Tokens per request | Requests per month | Monthly tokens |
|---|---|---|---|
| Input | 1,000 | 100,000 | 100M |
| Output | 300 | 100,000 | 30M |
How it works
Most teams do not think in tokens — they think in users, sessions, and calls. This calculator bridges that gap. You give it the unit you actually plan around — a request — and the average size of one, and it multiplies up to the monthly token volume the API will meter and bill. The math is the same per-token pricing as everywhere on the site, just driven from the top down: requests first, tokens second, dollars last.
A request-rate view is useful for budgeting because traffic is the thing that grows. Token sizes per request tend to stay stable for a given product, but request counts can multiply overnight when a feature ships or a campaign lands. By separating the two, you can hold the per-request shape constant and slide the request count to see your bill at 1×, 10×, and 100× scale, which is exactly the question a finance review asks. The annual figure makes the stakes concrete: it is the number that lands in a yearly budget.
Monthly input tokens = requests × input tokens/request
Monthly output tokens = requests × output tokens/request
Monthly spend = (monthly input ÷ 1M × input price) + (monthly output ÷ 1M × output price)
Annual spend = monthly spend × 12
Cost per request = monthly spend ÷ requests
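The four formulas above translate directly into code. Below is a minimal Python sketch, assuming both prices are quoted in USD per 1M tokens; `estimate_spend` and `SpendEstimate` are hypothetical names chosen for illustration, not the calculator's actual source.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class SpendEstimate:
    monthly_input_tokens: int
    monthly_output_tokens: int
    monthly_spend: float
    annual_spend: float
    cost_per_request: float


def estimate_spend(
    requests_per_month: int,
    input_tokens_per_request: float,
    output_tokens_per_request: float,
    input_price_per_m: float,
    output_price_per_m: float,
) -> SpendEstimate:
    """Apply the monthly/annual/per-request formulas; prices are USD per 1M tokens."""
    if requests_per_month <= 0:
        raise ValueError("requests_per_month must be positive")
    # Monthly token volumes: requests x tokens per request, for each side.
    monthly_input = requests_per_month * input_tokens_per_request
    monthly_output = requests_per_month * output_tokens_per_request
    # Price each side separately, then add.
    monthly_spend = (
        monthly_input / TOKENS_PER_MILLION * input_price_per_m
        + monthly_output / TOKENS_PER_MILLION * output_price_per_m
    )
    return SpendEstimate(
        monthly_input_tokens=round(monthly_input),
        monthly_output_tokens=round(monthly_output),
        monthly_spend=monthly_spend,
        annual_spend=monthly_spend * 12,
        cost_per_request=monthly_spend / requests_per_month,
    )


if __name__ == "__main__":
    # Defaults: 100,000 requests, 1,000 in / 300 out, $3 / $15 per 1M tokens.
    est = estimate_spend(100_000, 1_000, 300, 3.0, 15.0)
    print(f"Monthly: ${est.monthly_spend:,.2f}")
    print(f"Annual:  ${est.annual_spend:,.2f}")
    print(f"Per request: ${est.cost_per_request:.4f}")
```

With the defaults it reports $750.00 per month, $9,000.00 per year, and $0.0075 per request, matching the worked example.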
A worked example
Using the defaults of 100,000 requests/month, each averaging 1,000 input and 300 output tokens, on Claude Sonnet-class (Anthropic) at $3 per 1M input tokens and $15 per 1M output tokens:
- Monthly input: 100,000 × 1,000 = 100M tokens → $300.00
- Monthly output: 100,000 × 300 = 30M tokens → $450.00
- Monthly spend: $300.00 + $450.00 = $750.00
- Cost per request: $750.00 ÷ 100,000 = $0.0075
- Annual spend: $750.00 × 12 = $9,000.00
Even though input tokens outnumber output more than three to one here (100M vs 30M), output is still the larger cost ($450.00 vs $300.00), a direct consequence of output being priced five times higher. Raise the request count to confirm that the bill scales linearly, then try a cheaper model to see how much headroom a switch would buy. To break a single request down token by token, use the token cost calculator; to decide whether this annual figure justifies self-hosting, see the API vs self-hosting comparator.
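To see the linear scaling in numbers, the sketch below holds the default per-request shape and prices fixed and multiplies the request count by 1×, 10×, and 100×, printing input and output cost separately. It is a minimal illustration under the same pricing assumptions as the worked example; `spend_by_side` is a hypothetical helper name.

```python
TOKENS_PER_MILLION = 1_000_000


def spend_by_side(requests, in_per_req, out_per_req, in_price_per_m, out_price_per_m):
    """Return (input_cost, output_cost) in USD per month; prices are USD per 1M tokens."""
    input_cost = requests * in_per_req / TOKENS_PER_MILLION * in_price_per_m
    output_cost = requests * out_per_req / TOKENS_PER_MILLION * out_price_per_m
    return input_cost, output_cost


def main():
    base_requests = 100_000  # default monthly request count
    for multiplier in (1, 10, 100):
        requests = base_requests * multiplier
        # Per-request shape and prices stay at the worked-example defaults.
        input_cost, output_cost = spend_by_side(requests, 1_000, 300, 3.0, 15.0)
        monthly = input_cost + output_cost
        print(
            f"{multiplier:>3}x {requests:>10,} req | "
            f"input ${input_cost:>10,.2f} | output ${output_cost:>10,.2f} | "
            f"monthly ${monthly:>11,.2f} | annual ${monthly * 12:>12,.2f}"
        )


if __name__ == "__main__":
    main()
```

Because every term is a product with the request count, each column grows by exactly the multiplier, and the input/output split stays the same at every scale.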
Frequently asked questions
How do I estimate my monthly LLM API bill?
Start from traffic, not tokens. Multiply your number of requests per month by the average input and output tokens per request, then price each side. The formula is requests × ((input/req ÷ 1M × input price) + (output/req ÷ 1M × output price)). With the defaults — 100,000 requests of 1,000 in / 300 out on Claude Sonnet-class (Anthropic) — that is $750.00/month, or $9,000.00 per year.
What counts as one request?
One request is a single API call: one prompt in, one completion out. In a chatbot, each user turn is usually one request — but remember the whole conversation history is resent every turn, so average input tokens per request climb as conversations get longer. If your product is conversational rather than one-shot, the cost-per-conversation calculator models that growth directly.
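One simple way to model that growth, assuming the full history is resent on every turn, every user message and every reply has a fixed size, and nothing is truncated or cached: on turn k the input is the system prompt plus k user messages plus the k − 1 replies so far. The token sizes in the example are illustrative placeholders, and the function names are hypothetical; this is a sketch, not how the cost-per-conversation calculator is implemented.

```python
def input_tokens_per_turn(system_tokens, user_tokens, reply_tokens, turns):
    """Input tokens billed on each turn of one conversation.

    Simplifying assumptions: the whole history is resent every turn, each user
    message and each assistant reply has a fixed size, and nothing is truncated
    or cached. Turn k (1-indexed) sends the system prompt, k user messages,
    and the k - 1 replies generated so far.
    """
    if turns < 1:
        raise ValueError("turns must be at least 1")
    return [
        system_tokens + k * user_tokens + (k - 1) * reply_tokens
        for k in range(1, turns + 1)
    ]


def average_input_per_request(system_tokens, user_tokens, reply_tokens, turns):
    """Mean input tokens per request across all turns of the conversation."""
    per_turn = input_tokens_per_turn(system_tokens, user_tokens, reply_tokens, turns)
    return sum(per_turn) / len(per_turn)


if __name__ == "__main__":
    # Illustrative sizes only: 500-token system prompt, 100-token user
    # messages, 300-token replies.
    for turns in (1, 5, 10, 20):
        avg = average_input_per_request(500, 100, 300, turns)
        print(f"{turns:>2} turns: average input/request = {avg:,.0f} tokens")
```

Since input grows linearly with the turn number, the per-request average over a conversation sits well above the first turn's size, and that average is the figure that belongs in the input tokens/request field.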
How do I find my average tokens per request?
If you are already in production, take your provider's reported input and output token totals for a representative day and divide each by that day's request count. Before launch, estimate from a typical prompt: count the words in your system message, the retrieved context, and the user message, and multiply by about 1.33 to get tokens; do the same for a typical answer. The defaults here (1,000 in, 300 out) describe a fairly compact RAG-style call.
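Both estimation paths reduce to a line or two of arithmetic. A minimal Python sketch, assuming you have already pulled one representative day's totals from your provider's usage reporting (this sketch does not call any provider API) and treating 1.33 tokens per word as the rough English heuristic quoted above; the word counts in the example and the function names are hypothetical.

```python
TOKENS_PER_WORD = 1.33  # rough rule of thumb for English text, as quoted above


def average_tokens_per_request(day_input_tokens, day_output_tokens, day_requests):
    """Production path: one representative day's token totals / that day's requests."""
    if day_requests <= 0:
        raise ValueError("day_requests must be positive")
    return day_input_tokens / day_requests, day_output_tokens / day_requests


def words_to_tokens(*word_counts):
    """Pre-launch path: total the word counts of each prompt part, convert to tokens."""
    return round(sum(word_counts) * TOKENS_PER_WORD)


if __name__ == "__main__":
    # Hypothetical word counts: system message, retrieved context, user message.
    input_estimate = words_to_tokens(150, 550, 50)
    # Hypothetical length of a typical answer, in words.
    output_estimate = words_to_tokens(225)
    print(f"Estimated tokens/request: {input_estimate} in, {output_estimate} out")
```

The word-count path is only a starting point; once real traffic exists, the production path replaces it.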
Why show the annual figure too?
Because procurement and budgeting happen yearly. A workload that looks cheap at $750.00 a month is a $9,000.00 line item over a year — large enough to justify negotiating committed-use discounts, evaluating a cheaper model, or weighing self-hosting. The annual view also makes the cost of scaling traffic 10× visceral.
How can I bring this number down?
Three levers, in order of usual impact: cut output length (the more expensive side), trim the prompt you resend on every call (prompt caching can substantially lower the cost of fixed context that repeats across calls), and switch to a cheaper model for the requests that do not need a frontier one. The provider price comparison shows the same volume across every model at once.
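A quick way to rank the levers for your own workload is to change one input at a time and compare against the baseline. The sketch below does that for the defaults; the cheaper-model prices are hypothetical placeholders, not a quote for any real model, and prompt caching is left out because how cached input is billed varies by provider. Names such as `compare` and `SCENARIOS` are illustrative.

```python
TOKENS_PER_MILLION = 1_000_000


def monthly_spend(requests, in_per_req, out_per_req, in_price_per_m, out_price_per_m):
    """Monthly USD spend; prices are USD per 1M tokens."""
    per_request = (
        in_per_req / TOKENS_PER_MILLION * in_price_per_m
        + out_per_req / TOKENS_PER_MILLION * out_price_per_m
    )
    return requests * per_request


# Worked-example defaults.
BASELINE = {
    "requests": 100_000,
    "in_per_req": 1_000,
    "out_per_req": 300,
    "in_price_per_m": 3.0,
    "out_price_per_m": 15.0,
}

# Each scenario overrides a single input. The cheaper-model prices are
# hypothetical placeholders, not a quote for any real model.
SCENARIOS = {
    "halve output length": {"out_per_req": 150},
    "trim prompt by 30%": {"in_per_req": 700},
    "hypothetical cheaper model": {"in_price_per_m": 0.50, "out_price_per_m": 2.50},
}


def compare(baseline, scenarios):
    """Print each scenario's monthly spend and its saving versus the baseline."""
    base = monthly_spend(**baseline)
    print(f"{'baseline':<28} ${base:>9,.2f}/month")
    for name, overrides in scenarios.items():
        cost = monthly_spend(**{**baseline, **overrides})
        saving_pct = 100 * (base - cost) / base
        print(f"{name:<28} ${cost:>9,.2f}/month ({saving_pct:4.1f}% saved)")


if __name__ == "__main__":
    compare(BASELINE, SCENARIOS)
```

With these inputs, halving output length ($525.00/month) saves more than trimming the prompt by 30% ($660.00/month), which matches the ordering above; your own ratios will depend on your token mix and prices.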
How current are these prices?
The bundled defaults are publicly listed prices verified on Jun 25, 2026, each linked to its source. Both price fields are editable, so the calculator stays correct even if a default goes stale. Always confirm current pricing with the provider.
Disclaimer. LLMTCO provides cost estimates and planning tools for informational purposes only. AI API and GPU prices change frequently; bundled defaults reflect publicly listed prices as of the verification date shown (Jun 25, 2026) and may be out of date — always confirm current pricing with the provider. These figures are estimates, not financial, tax, or procurement advice, and do not capture every real-world factor (latency, reliability, compliance, data privacy, engineering time).