LLM Cost-vs-Volume Curve
See the whole picture in one chart: how API cost and self-hosting cost behave as your monthly token volume grows, and exactly where the two lines cross. The API climbs from zero with every token; self-hosting starts high and stays flat. In the default scenario below, 10M input + 2M output tokens/month on a Claude Sonnet-class model (Anthropic) versus an NVIDIA A100 80GB at 30% utilization, they cross at about 69.6M tokens/month. Prices as of Jun 25, 2026; see sources.
[Chart: monthly cost against monthly token volume. Legend: API (grows with volume), self-hosting (fixed), break-even.]
How to read the curve
Two cost structures are plotted against the same horizontal axis of monthly token volume. The API line is marginal cost: it starts at zero and rises in a straight line, because every token carries the same price. Its slope is the blended price per token — $5.00 per 1M here — so a steeper line means a pricier, usually output-heavy, workload. Double the volume and you double the API bill; that is why the line never flattens.
The self-hosting line is fixed cost: the GPU bills $1.59/hour × 730 hours × utilization no matter how many tokens you push through it, so it sits as a nearly horizontal line at $348.21/month. Because it does not respond to volume, the cost per token falls as you serve more — the same monthly bill spread across more output.
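To make the falling per-token cost concrete, here is a minimal Python sketch using the default figures quoted above. The function names are illustrative and not part of any tool, and the volumes in the loop are arbitrary sample points. The sketch assumes the GPU can actually serve each volume shown, which is not modeled here.

```python
# Default-scenario figures quoted in the text.
GPU_USD_PER_HOUR = 1.59
HOURS_PER_MONTH = 730
UTILIZATION = 0.30


def self_hosting_monthly_usd() -> float:
    """Flat monthly bill: hourly rate x hours x utilization."""
    return GPU_USD_PER_HOUR * HOURS_PER_MONTH * UTILIZATION


def self_hosting_usd_per_million(tokens_per_month: float) -> float:
    """Spread the flat bill over the tokens served (hypothetical helper)."""
    return self_hosting_monthly_usd() / (tokens_per_month / 1_000_000)


if __name__ == "__main__":
    # Sample volumes only; GPU throughput limits are not modeled.
    for tokens in (12e6, 69.6e6, 200e6):
        per_million = self_hosting_usd_per_million(tokens)
        print(f"{tokens / 1e6:6.1f}M tokens/month -> ${per_million:.2f} per 1M")
```

The monthly bill is identical in every row; only the divisor changes, which is why the effective price per 1M tokens drops as volume rises.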
The decision is read at the crossing. To the left of where the lines meet, the rising API line is still below the flat self-hosting line, so the API is cheaper. To the right, the API line has overtaken the fixed cost and self-hosting wins. The vertical "your volume" marker shows where today's workload sits relative to that crossing — in the default scenario it falls well to the left, so the API is the cheaper option at 12M tokens/month.
API line: cost = volume ÷ 1,000,000 × blended price = volume × $5.00/1M
Self-hosting line: cost = $1.59/hour × 730 × 30% = $348.21/month (flat)
Lines cross at: (self-hosting monthly ÷ blended price per 1M) × 1,000,000 = ($348.21 ÷ $5.00) × 1,000,000 ≈ 69.6M tokens/month
Your volume marker: 12M tokens/month
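The four formulas above can be sketched as small Python functions. This is one possible encoding under the page's default figures, not the calculator's actual implementation; all function and parameter names are assumptions introduced for illustration.

```python
def api_cost_usd(tokens: float, blended_usd_per_million: float = 5.00) -> float:
    """API line: cost rises linearly from zero with volume."""
    return tokens / 1_000_000 * blended_usd_per_million


def self_hosting_cost_usd(
    usd_per_hour: float = 1.59,
    hours_per_month: float = 730,
    utilization: float = 0.30,
) -> float:
    """Self-hosting line: a flat monthly figure, independent of volume."""
    return usd_per_hour * hours_per_month * utilization


def break_even_tokens(self_hosting_usd: float, blended_usd_per_million: float) -> float:
    """Volume at which the API line reaches the flat self-hosting line."""
    return self_hosting_usd / blended_usd_per_million * 1_000_000


def cheaper_option(tokens: float) -> str:
    """Read the decision at a given volume (defaults only)."""
    return "api" if api_cost_usd(tokens) < self_hosting_cost_usd() else "self-hosting"


if __name__ == "__main__":
    crossing = break_even_tokens(self_hosting_cost_usd(), 5.00)
    print(f"Lines cross at about {crossing / 1e6:.1f}M tokens/month")
    print(f"At 12M tokens/month the cheaper option is: {cheaper_option(12e6)}")
```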
A worked example
Take the default scenario point by point. The API charges $3/$15 per 1M for input/output; at 10M input + 2M output tokens that blends to $5.00 per 1M, the slope of the API line:
- API at 12M/mo: (10M ÷ 1M × $3) + (2M ÷ 1M × $15) = $60.00/mo
- Self-hosting (flat): $1.59 × 730 × 0.30 = $348.21/mo
- Blended price (the API slope): $5.00 per 1M tokens
- Lines cross at: $348.21 ÷ $5.00 × 1M ≈ 69.6M tokens/month
On the chart, the "your volume" marker at 12M sits far left of the 69.6M crossing, so the API line is still well under the self-hosting line: the API wins by a wide margin at this volume. Self-hosting only overtakes once steady volume climbs past the crossing — and only if you can keep the GPU that busy. To explore other models, GPUs, utilization levels and editable prices, use the full API vs self-hosting comparator, or pin down the single crossover number with the break-even volume calculator.
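The arithmetic of the worked example can be reproduced in a few lines of Python. This is a sketch under the default prices quoted above ($3/$15 per 1M input/output tokens); the constant and function names are hypothetical, chosen only for readability.

```python
INPUT_USD_PER_MILLION = 3.00    # default input price from the example
OUTPUT_USD_PER_MILLION = 15.00  # default output price from the example


def api_monthly_usd(input_tokens: float, output_tokens: float) -> float:
    """Bill input and output tokens at their separate per-1M prices."""
    return (
        input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
        + output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    )


def blended_usd_per_million(input_tokens: float, output_tokens: float) -> float:
    """Total API bill divided by total volume in millions (the API slope)."""
    total_millions = (input_tokens + output_tokens) / 1_000_000
    return api_monthly_usd(input_tokens, output_tokens) / total_millions


if __name__ == "__main__":
    input_tokens, output_tokens = 10_000_000, 2_000_000
    api = api_monthly_usd(input_tokens, output_tokens)
    blended = blended_usd_per_million(input_tokens, output_tokens)
    self_hosting = 1.59 * 730 * 0.30
    crossing = self_hosting / blended * 1_000_000

    print(f"API at 12M/mo:       ${api:.2f}")
    print(f"Self-hosting (flat): ${self_hosting:.2f}")
    print(f"Blended price:       ${blended:.2f} per 1M")
    print(f"Lines cross at:      {crossing / 1e6:.1f}M tokens/month")
```

Note that the blended price depends on the input/output mix, so the $5.00 slope holds only for the 10M + 2M split used here.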
Frequently asked questions
How do I read the cost-vs-volume curve?
The horizontal axis is monthly token volume; the vertical axis is monthly cost. The API line rises from the origin — its slope is the price per token. The self-hosting line is roughly flat — the GPU bills the same whether it is busy or idle. Where they cross is the break-even volume (here ≈ 69.6M tokens/month). To the left of the crossing the API is cheaper; to the right, self-hosting is.
Why is the API line straight and the self-hosting line flat?
API cost is purely marginal: every token is billed, so total cost is volume × price-per-token — a straight line through zero. Self-hosting cost is mostly fixed: $1.59/hour × 730 hours × utilization is the same regardless of how many tokens flow, so the line stays nearly horizontal until you saturate the GPU.
What does the slope of the API line tell me?
The slope is the blended price per token — $5.00 per 1M tokens in this scenario. A steeper line means a more expensive (often output-heavy) workload, which makes the API line cross the flat self-hosting line sooner, lowering the break-even volume.
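To illustrate how an output-heavy mix steepens the slope and pulls the crossing left, the sketch below sweeps the share of output tokens while holding the flat cost at the default $348.21/month. The shares other than 2/12 (the default mix) are made-up sample points, and the helper names are assumptions.

```python
SELF_HOSTING_USD = 1.59 * 730 * 0.30  # default flat line, about $348.21


def blended_from_output_share(
    output_share: float,
    input_usd_per_million: float = 3.00,
    output_usd_per_million: float = 15.00,
) -> float:
    """Weighted average price per 1M tokens for a given output fraction."""
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be between 0 and 1")
    return (
        (1.0 - output_share) * input_usd_per_million
        + output_share * output_usd_per_million
    )


def break_even_millions(output_share: float) -> float:
    """Crossing volume, in millions of tokens per month."""
    return SELF_HOSTING_USD / blended_from_output_share(output_share)


if __name__ == "__main__":
    # 2/12 is the default 10M input + 2M output mix; the rest are samples.
    for share in (0.0, 2 / 12, 0.5, 1.0):
        slope = blended_from_output_share(share)
        print(
            f"output share {share:5.1%}: slope ${slope:5.2f}/1M, "
            f"crossing {break_even_millions(share):6.1f}M tokens/month"
        )
```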
My volume is left of the crossing — what should I do?
Stay on the API. At 12M tokens/month you are below the 69.6M break-even, so the API (≈ $60.00/mo) is far cheaper than a mostly-idle GPU (≈ $348.21/mo). The two lines only justify self-hosting once your steady volume sits to the right of where they cross.
How does utilization move the self-hosting line?
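Utilization scales the fixed cost: at 30% you pay for 30% of the GPU's hours. Higher utilization raises the flat line (you pay more per month) but lets you serve far more tokens for that money. With the formula used here (break-even = self-hosting monthly ÷ blended price), a higher flat line moves the crossing to the right: at 60% the line sits at $696.42/month and the crossing at about 139.3M tokens/month. Whether that helps or hurts depends on how much of the added capacity your volume actually fills. The full comparator lets you tune every input.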
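As a rough illustration of the utilization effect, the sketch below applies the page's own formula (hourly rate × 730 × utilization) at a few utilization levels and recomputes the crossing against the default $5.00 blended price. The levels other than 30% are arbitrary sample points, the function names are hypothetical, and GPU throughput limits are not modeled.

```python
GPU_USD_PER_HOUR = 1.59
HOURS_PER_MONTH = 730
BLENDED_USD_PER_MILLION = 5.00


def flat_line_usd(utilization: float) -> float:
    """Height of the self-hosting line at a given utilization (0 to 1)."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return GPU_USD_PER_HOUR * HOURS_PER_MONTH * utilization


def crossing_millions(utilization: float) -> float:
    """Break-even volume in millions of tokens per month."""
    return flat_line_usd(utilization) / BLENDED_USD_PER_MILLION


if __name__ == "__main__":
    for utilization in (0.15, 0.30, 0.60, 1.00):
        print(
            f"utilization {utilization:4.0%}: flat line "
            f"${flat_line_usd(utilization):8.2f}/mo, "
            f"crossing {crossing_millions(utilization):6.1f}M tokens/month"
        )
```

Under this formula the flat line and the crossing both scale in direct proportion to utilization; a different billing model (for example, a reserved GPU billed for every hour) would behave differently.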
How current are these prices?
The defaults are publicly listed prices verified on Jun 25, 2026, each linked to its source. They are convenience defaults; in the full comparator every price is an editable input, so the curve stays correct even if a default goes stale. Always confirm current pricing with the provider.
Disclaimer. LLMTCO provides cost estimates and planning tools for informational purposes only. AI API and GPU prices change frequently; bundled defaults reflect publicly listed prices as of the verification date shown (Jun 25, 2026) and may be out of date — always confirm current pricing with the provider. These figures are estimates, not financial, tax, or procurement advice, and do not capture every real-world factor (latency, reliability, compliance, data privacy, engineering time).