LLM API Cost & Self-Hosting TCO Comparator
Answer the only question that matters for inference spend: is it cheaper to call the API or self-host an open-weight model? Enter your monthly token volume and choose a GPU to get the monthly cost both ways, the break-even volume, the cost-vs-volume curve, and 12/24/36-month TCO. Numbers update as you type. Prices are as of Jun 25, 2026, each linked to its source, and every price is editable.
[Chart: monthly cost vs. token volume. Series: API (grows with volume), self-hosting (fixed), break-even point.]
| Horizon | API | Self-hosting | Difference (self-hosting minus API) |
|---|---|---|---|
| 12 months | $720 | $4,179 | +$3,459 |
| 24 months | $1,440 | $8,357 | +$6,917 |
| 36 months | $2,160 | $12,536 | +$10,376 |
Each horizon is the monthly cost multiplied by the number of months. TCO assumes the GPU is rented at the chosen utilization; for amortization of owned hardware, see the GPU TCO calculator.
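The horizon table can be reproduced from the two monthly figures. This is a minimal sketch that assumes flat monthly costs (no price changes, discounting, or volume growth), which matches the numbers in the table; `tco_rows` is a hypothetical helper name, not part of the calculator.

```python
# Sketch: rebuild the 12/24/36-month TCO table from flat monthly costs.
HORIZONS_MONTHS = (12, 24, 36)


def tco_rows(api_monthly: float, selfhost_monthly: float):
    """Return (months, api_total, selfhost_total, selfhost_minus_api) per horizon."""
    rows = []
    for months in HORIZONS_MONTHS:
        api_total = api_monthly * months
        selfhost_total = selfhost_monthly * months
        rows.append((months, api_total, selfhost_total, selfhost_total - api_total))
    return rows


if __name__ == "__main__":
    api = 10 * 3.00 + 2 * 15.00       # $60.00/mo with the page defaults
    selfhost = 1.59 * 730 * 0.30      # $348.21/mo with the page defaults
    for months, a, s, d in tco_rows(api, selfhost):
        sign = "+" if d >= 0 else "-"
        print(f"{months} months: API ${a:,.0f} | self-host ${s:,.0f} | {sign}${abs(d):,.0f}")
```

A positive difference means self-hosting costs more over that horizon.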
How the comparison works
Two cost structures meet here. The API is pure marginal cost: you pay per token, so cost is a straight line through the origin. Self-hosting is mostly fixed cost: the GPU bills by the hour whether or not you use it, so its line is roughly flat until you saturate capacity. They cross at the break-even volume.
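The two shapes described above can be sketched as cost functions. The calculator itself treats self-hosting as a single flat GPU cost; the step behavior "until you saturate capacity" is modeled here with a HYPOTHETICAL per-GPU capacity (`TOKENS_PER_GPU_MONTH`), a placeholder rather than a benchmark, so treat this purely as an illustration of the curve shapes.

```python
import math

# Defaults from this page.
PRICE_IN, PRICE_OUT = 3.00, 15.00   # $ per 1M input / output tokens
INPUT_SHARE = 10 / 12               # 10M input out of 12M total tokens
GPU_MONTHLY = 1.59 * 730 * 0.30     # $348.21/mo for one rented GPU

# HYPOTHETICAL: tokens one GPU can serve per month. Placeholder only, not a
# benchmark; measure your own model, GPU, and serving stack.
TOKENS_PER_GPU_MONTH = 500e6


def api_cost(tokens: float) -> float:
    """API: a straight line through the origin (pay per token)."""
    blended_per_1m = INPUT_SHARE * PRICE_IN + (1 - INPUT_SHARE) * PRICE_OUT
    return tokens / 1e6 * blended_per_1m


def selfhost_cost(tokens: float) -> float:
    """Self-hosting: flat per GPU, stepping up one GPU each time capacity fills."""
    gpus = max(1, math.ceil(tokens / TOKENS_PER_GPU_MONTH))
    return gpus * GPU_MONTHLY


if __name__ == "__main__":
    for tokens in (10e6, 50e6, 100e6, 1e9):
        api, selfhost = api_cost(tokens), selfhost_cost(tokens)
        cheaper = "API" if api < selfhost else "self-host"
        print(f"{tokens / 1e6:>6.0f}M tokens: API ${api:,.2f} vs self-host ${selfhost:,.2f} -> {cheaper}")
```

The `max(1, ...)` keeps at least one GPU running even at zero volume, which is what makes the self-hosting line flat near the origin.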
API/month = (input ÷ 1M × price_in) + (output ÷ 1M × price_out)
Self-host/month = $/hour × 730 × utilization + overhead   (730 ≈ average hours in a month: 8,760 ÷ 12)
Blended price/1M = (input × price_in + output × price_out) ÷ (input + output)
Break-even tokens = Self-host monthly ÷ blended price/1M × 1,000,000   (total input + output tokens, at the same input:output mix)
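The four formulas translate directly into functions. This is a minimal sketch with hypothetical function names; overhead defaults to zero and a month is taken as 730 hours, as in the formulas above.

```python
def api_monthly(input_tokens, output_tokens, price_in, price_out):
    """API/month = (input / 1M x price_in) + (output / 1M x price_out)."""
    return input_tokens / 1e6 * price_in + output_tokens / 1e6 * price_out


def selfhost_monthly(usd_per_hour, utilization, overhead=0.0, hours_per_month=730):
    """Self-host/month = $/hour x 730 x utilization + overhead."""
    return usd_per_hour * hours_per_month * utilization + overhead


def blended_price_per_1m(input_tokens, output_tokens, price_in, price_out):
    """Token-weighted average price per 1M tokens across input and output."""
    total = input_tokens + output_tokens
    return (input_tokens * price_in + output_tokens * price_out) / total


def break_even_tokens(selfhost_cost, blended_per_1m):
    """Total monthly tokens at which API spend equals the self-host cost."""
    return selfhost_cost / blended_per_1m * 1e6


if __name__ == "__main__":
    api = api_monthly(10e6, 2e6, 3.00, 15.00)
    selfhost = selfhost_monthly(1.59, 0.30)
    blended = blended_price_per_1m(10e6, 2e6, 3.00, 15.00)
    tokens = break_even_tokens(selfhost, blended)
    print(f"API ${api:.2f}/mo, self-host ${selfhost:.2f}/mo, "
          f"blended ${blended:.2f}/1M, break-even {tokens / 1e6:.1f}M tokens/mo")
```

With the page defaults this reproduces the worked example: $60.00, $348.21, $5.00, and 69.6M tokens/month.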
A worked example
With the defaults, 10M input + 2M output tokens/month on a Claude Sonnet-class model (Anthropic) at $3 input / $15 output per 1M tokens, compared with an NVIDIA A100 80GB at $1.59/hour and 30% utilization:
- API: (10M ÷ 1M × $3) + (2M ÷ 1M × $15) = $60.00/mo
- Self-host: $1.59 × 730 × 0.30 = $348.21/mo
- Blended API price: $60.00 ÷ 12M tokens = $5.00 per 1M tokens
- Break-even: $348.21 ÷ $5.00 × 1M ≈ 69.6M tokens/month
So at this volume the API is far cheaper. Self-hosting only pays off past roughly 69.6M tokens/month, and only if you can keep the GPU busy enough to serve that volume. Change any input above to model your own case.
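One way to gauge "busy enough" is to convert the monthly break-even volume into an average throughput. This sketch ASSUMES utilization means the fraction of the 730 monthly hours the GPU is billed and serving, and it lumps input and output tokens into a single average rate; real prefill and decode speeds differ, so this is a sanity check, not a capacity benchmark. `required_tokens_per_second` is a hypothetical helper.

```python
SECONDS_PER_HOUR = 3600


def required_tokens_per_second(monthly_tokens, utilization, hours_per_month=730):
    """Average input + output tokens/s needed during the billed (busy) hours.

    Assumption: utilization = fraction of the month the GPU is billed and serving.
    """
    busy_seconds = hours_per_month * utilization * SECONDS_PER_HOUR
    return monthly_tokens / busy_seconds


if __name__ == "__main__":
    break_even = 69.642e6  # tokens/month from the worked example
    for util in (0.30, 1.00):
        rate = required_tokens_per_second(break_even, util)
        print(f"utilization {util:.0%}: about {rate:.1f} tokens/s on average")
```

At 30% utilization the break-even volume works out to roughly 88 tokens/s sustained during billed hours; spread over the whole month it is 26.5 tokens/s.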
Frequently asked questions
At what volume does self-hosting an LLM become cheaper than the API?
Self-hosting has a roughly fixed monthly cost (the GPU runs whether you use it or not), while API cost grows with every token. The crossover is the break-even volume: self-hosting monthly cost ÷ blended API price per token. In the default scenario that's about 69.6M tokens/month. Below it the API wins; above it self-hosting wins on raw cost.
Is self-hosting really cheaper than paying for the API?
Only at high, steady volume. At 10M input + 2M output tokens/month, the API costs about $60.00/mo versus $348.21/mo to rent the GPU at 30% utilization — the API is far cheaper. Self-hosting only pays off once you can keep the GPU busy enough to spread its fixed cost across many tokens.
What costs does the self-hosting estimate include?
The rented-GPU estimate is hourly rate × hours in the month × utilization, plus any operational overhead you enter (DevOps time, monitoring, redundancy). It does not include hidden costs like idle capacity, reliability engineering, or latency trade-offs — those are caveats in the verdict, not dollar figures. See the methodology.
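Because overhead is added to the fixed side, it moves the break-even volume linearly. A minimal sketch using the page defaults and HYPOTHETICAL overhead amounts ($500 and $2,000/month are illustrative placeholders, not estimates of real DevOps cost):

```python
def break_even_with_overhead(usd_per_hour, utilization, overhead, blended_per_1m, hours=730):
    """Break-even total tokens/month once a fixed monthly overhead is included."""
    selfhost = usd_per_hour * hours * utilization + overhead
    return selfhost / blended_per_1m * 1e6


if __name__ == "__main__":
    for overhead in (0, 500, 2000):  # hypothetical $/month overhead figures
        tokens = break_even_with_overhead(1.59, 0.30, overhead, 5.00)
        print(f"overhead ${overhead:,}/mo -> break-even about {tokens / 1e6:.1f}M tokens/mo")
```

Every additional $5 of monthly overhead adds another 1M tokens/month to the break-even volume at the default $5.00 blended price.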
Why does the output token price matter so much?
Output tokens are usually 3–5× more expensive than input tokens, and generation is the slow part. A workload that is output-heavy (long completions) costs far more per request than an input-heavy one (long context, short answer). A higher output share raises the blended price per 1M tokens, which lowers the break-even volume.
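To see the size of the effect, compare two workloads with the same 12M total tokens but mirrored mixes, using the default $3/$15 prices and the default $348.21/month self-hosting cost. A minimal sketch:

```python
PRICE_IN, PRICE_OUT = 3.00, 15.00       # $ per 1M input / output tokens
SELFHOST_MONTHLY = 1.59 * 730 * 0.30    # $348.21/mo with the page defaults


def blended_price_per_1m(input_tokens, output_tokens):
    """Token-weighted average price per 1M tokens."""
    total = input_tokens + output_tokens
    return (input_tokens * PRICE_IN + output_tokens * PRICE_OUT) / total


if __name__ == "__main__":
    for label, inp, out in [("input-heavy", 10e6, 2e6), ("output-heavy", 2e6, 10e6)]:
        blended = blended_price_per_1m(inp, out)
        break_even = SELFHOST_MONTHLY / blended * 1e6
        print(f"{label}: ${blended:.2f}/1M blended, break-even {break_even / 1e6:.1f}M tokens/mo")
```

The input-heavy mix blends to $5.00 per 1M tokens (break-even about 69.6M), while the output-heavy mix blends to $13.00 (break-even about 26.8M).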
How current are these prices?
The bundled defaults are publicly listed prices verified on Jun 25, 2026, each linked to its source. They are convenience defaults only — every price is an editable input, so the calculator stays correct even if a default goes stale. Always confirm current pricing with the provider.
Disclaimer. LLMTCO provides cost estimates and planning tools for informational purposes only. AI API and GPU prices change frequently; bundled defaults reflect publicly listed prices as of the verification date shown (Jun 25, 2026) and may be out of date — always confirm current pricing with the provider. These figures are estimates, not financial, tax, or procurement advice, and do not capture every real-world factor (latency, reliability, compliance, data privacy, engineering time).