Question 1

What is the break-even volume for self-hosting an LLM?

Accepted Answer

It is the monthly token volume at which the API and self-hosting cost the same amount. Below it the API is cheaper; above it self-hosting wins on raw cost. The formula is self-hosting monthly cost ÷ blended API price per token. With the defaults ($320/mo self-hosting and $3 per 1M tokens), that is about 106.7M tokens/month.

Question 2

Why is the API a straight line and self-hosting flat?

Accepted Answer

The API is pure marginal cost: every token you send is billed, so the cost line passes through the origin and rises with volume. Self-hosting is mostly fixed cost: the GPU bills by the hour whether it is busy or idle, so its line is roughly flat until you saturate it. The two lines cross at the break-even volume.

Question 3

My current volume is far below break-even — should I still self-host?

Accepted Answer

On cost alone, no. If break-even is 106.7M/month and you are running 12M/month — roughly 8.9× below it — the API is dramatically cheaper because you would be paying for a mostly-idle GPU. Self-hosting only pays off once you can keep the GPU busy enough to spread its fixed cost across a large, steady token volume.

Question 4

What should I put in for the self-hosting monthly cost?

Accepted Answer

Use the all-in monthly figure: rented GPU ($/hour × 730 × utilization) or owned-hardware amortization, plus operational overhead such as DevOps time, monitoring and redundancy. The API vs self-hosting comparator computes that monthly figure for you, and you can paste it straight into this calculator.

Question 5

How do I get the blended API price per 1M tokens?

Accepted Answer

Blend the input and output prices by your actual mix: blended = (input tokens × input price + output tokens × output price) ÷ total tokens. An output-heavy workload has a higher blended price (output is usually 3–5× input), which lowers the break-even volume; an input-heavy one raises it.

Question 6

Does a lower break-even volume mean self-hosting is better?

Accepted Answer

A lower break-even simply means the crossover happens sooner — it falls when self-hosting is cheap (low $/mo) or the API is expensive (high blended price). Whether you benefit depends on whether your steady volume is above that point. Cost is only one axis; latency, reliability, compliance and engineering time matter too.

LLM Break-Even Volume Calculator

How break-even works

A worked example

Frequently asked questions