LLM Break-Even Volume Calculator

Find the single number that decides the API-versus-self-hosting question: the break-even volume, the monthly token throughput at which a fixed-cost GPU finally undercuts pay-per-token API pricing. Enter your all-in self-hosting cost and your blended API price per million tokens, and get the crossover instantly. With the defaults below, self-hosting breaks even at about 106.7M tokens/month. Numbers update as you type. Prices as of Jun 25, 2026; every input is editable.

Costs
Self-hosting breaks even at 106.7M tokens/month. Below that the API is cheaper; above it the fixed-cost GPU wins. That is about 8.9× your current volume of 12M tokens/month.
Break-even volume: 106.7M
Exact tokens / month: 106,666,667
Multiple of your volume: 8.9×
Self-host per 1M tokens: $3.00
Break-even is a cost line, not a verdict. Crossing it means self-hosting is cheaper per token — but only if you can keep the GPU that busy. It says nothing about latency, reliability/uptime, compliance & data residency, or the engineering time to run inference yourself. Weigh those alongside the number.

How break-even works

Two cost structures collide here. API pricing is marginal: you pay a fixed amount per token, so total cost is a straight line through the origin — double the tokens, double the bill. Self-hosting is fixed: you rent (or own) a GPU that bills by the hour regardless of how many tokens flow through it, so its monthly cost barely moves as volume changes. As volume grows, the API line climbs while the self-hosting line stays flat, and at some point they meet. That meeting point is the break-even volume.

Mathematically it is the volume at which API spend equals the self-hosting monthly cost. Since API spend = volume × price per token, dividing the fixed monthly cost by the price per token gives the volume. Prices are quoted in dollars per 1M tokens, so in practice you divide by the per-1M price and multiply by 1,000,000.

Formula.
Break-even tokens/month = Self-hosting monthly cost ÷ Blended API price per 1M × 1,000,000
  = $320 ÷ $3 × 1,000,000
  = 106,666,667 tokens/month (≈ 106.7M)
Multiple of current volume = Break-even tokens ÷ Your monthly volume
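
The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code; the function and parameter names are assumptions made for this example, and the inputs are the page defaults ($320/month, $3 per 1M tokens, 12M tokens/month).

```python
def break_even_tokens(self_host_monthly_usd: float, api_price_per_1m_usd: float) -> float:
    """Monthly token volume at which API spend equals the fixed self-hosting cost."""
    if api_price_per_1m_usd <= 0:
        raise ValueError("API price per 1M tokens must be positive")
    return self_host_monthly_usd / api_price_per_1m_usd * 1_000_000


def multiple_of_volume(break_even: float, monthly_volume_tokens: float) -> float:
    """How many times larger the break-even volume is than the current volume."""
    if monthly_volume_tokens <= 0:
        raise ValueError("monthly volume must be positive")
    return break_even / monthly_volume_tokens


# Page defaults: $320/month self-hosting, $3 per 1M tokens, 12M tokens/month.
tokens = break_even_tokens(320.0, 3.0)
print(f"{tokens:,.0f} tokens/month")                       # 106,666,667 tokens/month
print(f"{multiple_of_volume(tokens, 12_000_000):.1f}x")    # 8.9x
```

The guard clauses are a design choice for the sketch: a zero price or zero volume has no meaningful break-even, so raising an error is clearer than returning infinity.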

A worked example

Suppose you have priced a self-hosted setup at $320 per month — say a mid-range GPU rented part-time plus a little operational overhead — and your workload runs at a blended $3 per 1M tokens on the API. Plug those in:

  • Cost per token on the API: $3 ÷ 1,000,000 = $0.000003 per token
  • Break-even: $320 ÷ $3 × 1,000,000 = 106,666,667 tokens/month
  • That rounds to ≈ 106.7M tokens/month
  • Against an example current volume of 12M/month, break-even is about 8.9× higher

So at 12M tokens/month your volume would need to grow roughly 8.9× to reach the crossover, and the API stays far cheaper until your steady volume climbs near 106.7M/month. Raise the self-hosting cost and break-even rises with it (you need more volume to justify a pricier rig); raise the blended API price and break-even falls (the API gets expensive sooner). Change any input above to model your own case.
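
To see both sides of the crossover, the sketch below compares the two monthly bills at a given volume. It assumes the self-hosting cost is perfectly flat (no capacity limit on the GPU), which is a simplification; the function names and the 200M tokens/month data point are illustrative choices, not figures from the calculator.

```python
def api_monthly_cost(volume_tokens: float, api_price_per_1m_usd: float) -> float:
    """API spend grows linearly with volume and passes through the origin."""
    return volume_tokens / 1_000_000 * api_price_per_1m_usd


def cheaper_option(volume_tokens: float,
                   self_host_monthly_usd: float,
                   api_price_per_1m_usd: float) -> str:
    """Return which option costs less at this volume (cost only)."""
    api = api_monthly_cost(volume_tokens, api_price_per_1m_usd)
    return "API" if api < self_host_monthly_usd else "self-hosting"


# 12M is the example volume from the text; 200M is a hypothetical high-volume case.
for volume in (12_000_000, 200_000_000):
    api = api_monthly_cost(volume, 3.0)
    winner = cheaper_option(volume, 320.0, 3.0)
    print(f"{volume:>11,} tokens: API ${api:,.2f} vs self-hosting $320.00 -> {winner}")
```

At 12M tokens the API bill is $36 against $320 for the GPU; at the hypothetical 200M it is $600, so the fixed-cost option wins.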

Frequently asked questions

What is the break-even volume for self-hosting an LLM?

It is the monthly token volume at which the API and self-hosting cost the same amount. Below it the API is cheaper; above it self-hosting wins on raw cost. The formula is self-hosting monthly cost ÷ blended API price per 1M tokens × 1,000,000. With the defaults ($320/mo self-hosting and $3 per 1M tokens), that is about 106.7M tokens/month.

Why is the API a straight line and self-hosting flat?

The API is pure marginal cost: every token you send is billed, so the cost line passes through the origin and rises with volume. Self-hosting is mostly fixed cost: the GPU bills by the hour whether it is busy or idle, so its line is roughly flat until you saturate it. The two lines cross at the break-even volume.

My current volume is far below break-even — should I still self-host?

On cost alone, no. If break-even is 106.7M/month and you are running 12M/month, about one-ninth of that, the API is dramatically cheaper: at $3 per 1M tokens, 12M tokens cost $36 on the API versus $320 for a mostly idle GPU. Self-hosting only pays off once you can keep the GPU busy enough to spread its fixed cost across a large, steady token volume.

What should I put in for the self-hosting monthly cost?

Use the all-in monthly figure: rented GPU ($/hour × 730 × utilization, where 730 is the average number of hours in a month and utilization is the fraction of those hours you are billed for) or owned-hardware amortization, plus operational overhead such as DevOps time, monitoring and redundancy. The API vs self-hosting comparator computes that monthly figure for you, and you can paste it straight into this calculator.
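
As a rough sketch of the rented-GPU case, the snippet below applies the $/hour × 730 × utilization formula and adds overhead. The hourly rate, utilization and overhead values are hypothetical numbers chosen only so the total lands on the $320 default; they are not quoted prices, and the function name is an assumption for this example.

```python
HOURS_PER_MONTH = 730  # average hours in a month, as used in the formula above


def self_host_monthly_cost(gpu_usd_per_hour: float,
                           utilization: float,
                           overhead_usd_per_month: float) -> float:
    """All-in monthly cost: billed GPU hours plus operational overhead."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be a fraction between 0 and 1")
    gpu_cost = gpu_usd_per_hour * HOURS_PER_MONTH * utilization
    return gpu_cost + overhead_usd_per_month


# Hypothetical inputs: $0.50/hour GPU, billed 60% of the month, $101/month overhead.
cost = self_host_monthly_cost(0.50, 0.60, 101.0)
print(f"${cost:,.2f} per month")  # $320.00 per month
```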

How do I get the blended API price per 1M tokens?

Blend the input and output prices by your actual mix: blended = (input tokens × input price + output tokens × output price) ÷ total tokens. An output-heavy workload has a higher blended price (output is usually 3–5× input), which lowers the break-even volume; an input-heavy one raises it.
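
The blending formula can be sketched as follows. The prices ($1.50 input and $6.00 output per 1M tokens, a 4× ratio) and the 8M/4M token mix are hypothetical values picked so the result matches the $3 default; substitute your provider's listed prices and your own traffic mix.

```python
def blended_price_per_1m(input_tokens: float,
                         output_tokens: float,
                         input_price_per_1m_usd: float,
                         output_price_per_1m_usd: float) -> float:
    """Token-weighted average of input and output prices, in USD per 1M tokens."""
    total_tokens = input_tokens + output_tokens
    if total_tokens <= 0:
        raise ValueError("total token count must be positive")
    weighted = (input_tokens * input_price_per_1m_usd
                + output_tokens * output_price_per_1m_usd)
    return weighted / total_tokens


# Hypothetical mix: 8M input tokens at $1.50/1M and 4M output tokens at $6.00/1M.
price = blended_price_per_1m(8_000_000, 4_000_000, 1.50, 6.00)
print(f"${price:.2f} per 1M tokens")  # $3.00 per 1M tokens
```

Because both prices are already per 1M tokens, the weighted average stays in the same unit and can go straight into the break-even formula.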

Does a lower break-even volume mean self-hosting is better?

A lower break-even simply means the crossover happens sooner — it falls when self-hosting is cheap (low $/mo) or the API is expensive (high blended price). Whether you benefit depends on whether your steady volume is above that point. Cost is only one axis; latency, reliability, compliance and engineering time matter too.

Disclaimer. LLMTCO provides cost estimates and planning tools for informational purposes only. AI API and GPU prices change frequently; bundled defaults reflect publicly listed prices as of the verification date shown (Jun 25, 2026) and may be out of date — always confirm current pricing with the provider. These figures are estimates, not financial, tax, or procurement advice, and do not capture every real-world factor (latency, reliability, compliance, data privacy, engineering time).