Question 1

How do I read the cost-vs-volume curve?

Accepted Answer

The horizontal axis is monthly token volume; the vertical axis is monthly cost. The API line rises from the origin — its slope is the price per token. The self-hosting line is roughly flat — the GPU bills the same whether it is busy or idle. Where they cross is the break-even volume (here ≈ 69.6M tokens/month). To the left of the crossing the API is cheaper; to the right, self-hosting is.
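The crossing point can be worked out directly by setting the two costs equal. The following is a minimal sketch using this scenario's figures ($5.00 per 1M tokens, $1.59/hour, 730 hours, 30% utilization); the function and parameter names are illustrative, not taken from the comparator itself.

```python
HOURS_PER_MONTH = 730  # hours billed per month in this scenario


def break_even_tokens(price_per_million, gpu_hourly_rate, utilization):
    """Monthly token volume at which API cost equals self-hosting cost."""
    # Flat monthly cost of the GPU (the horizontal line).
    self_hosting_monthly = gpu_hourly_rate * HOURS_PER_MONTH * utilization
    # The API line is volume * price, so the crossing is cost / price.
    return self_hosting_monthly / price_per_million * 1_000_000


volume = break_even_tokens(price_per_million=5.00,
                           gpu_hourly_rate=1.59,
                           utilization=0.30)
print(f"Break-even: {volume / 1e6:.1f}M tokens/month")
# prints: Break-even: 69.6M tokens/month
```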

Question 2

Why is the API line straight and the self-hosting line flat?

Accepted Answer

API cost is purely marginal: every token is billed, so total cost is volume × price-per-token, a straight line through zero. Self-hosting cost is mostly fixed: the monthly GPU bill, $1.59/hour × 730 hours × utilization (≈ $348.21 at 30% utilization), is the same regardless of how many tokens flow, so the line stays nearly horizontal until you saturate the GPU.
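The two shapes can be seen by writing each cost as a function of volume. This is a rough sketch with this scenario's numbers as defaults; the function names are hypothetical, and GPU saturation is deliberately not modeled, so the self-hosting cost here is perfectly flat.

```python
def api_cost(tokens, price_per_million=5.00):
    """Marginal cost: proportional to volume, zero at zero tokens."""
    return tokens / 1_000_000 * price_per_million


def self_hosting_cost(tokens, gpu_hourly_rate=1.59, hours=730, utilization=0.30):
    """Fixed cost: `tokens` is intentionally unused in this simple model."""
    return gpu_hourly_rate * hours * utilization


for tokens in (0, 12_000_000, 50_000_000, 100_000_000):
    print(f"{tokens / 1e6:>5.0f}M tokens  "
          f"API ${api_cost(tokens):>7.2f}  "
          f"self-hosting ${self_hosting_cost(tokens):>7.2f}")
```

The API column grows in step with volume while the self-hosting column does not change, which is the straight-versus-flat picture on the chart.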

Question 3

What does the slope of the API line tell me?

Accepted Answer

The slope is the blended price per token — $5.00 per 1M tokens in this scenario. A steeper line means a more expensive (often output-heavy) workload, which makes the API line cross the flat self-hosting line sooner, lowering the break-even volume.
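To see how the slope moves the crossing, the sketch below holds the flat self-hosting cost at this scenario's ≈ $348.21/month and varies the blended price. The $2.50 and $10.00 prices are made-up comparison points, not listed prices.

```python
SELF_HOSTING_MONTHLY = 348.21  # flat monthly GPU cost in this scenario


def break_even_millions(price_per_million):
    """Break-even volume, in millions of tokens, for a given blended price."""
    return SELF_HOSTING_MONTHLY / price_per_million


# 2.50 and 10.00 are hypothetical prices chosen only for comparison.
for price in (2.50, 5.00, 10.00):
    print(f"${price:.2f} per 1M tokens -> "
          f"break-even at {break_even_millions(price):.1f}M tokens/month")
```

Under these assumptions, doubling the price per token halves the break-even volume.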

Question 4

My volume is left of the crossing — what should I do?

Accepted Answer

Stay on the API. At 12M tokens/month you are well below the 69.6M break-even, so the API (12M × $5.00 per 1M ≈ $60.00/mo) is far cheaper than a mostly idle GPU (≈ $348.21/mo). Self-hosting is justified only once your steady volume sits to the right of the crossing.
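The same comparison can be expressed as a small helper. This is only a sketch: `compare` is a hypothetical name, the defaults are this scenario's figures, and an exact tie is arbitrarily reported as self-hosting.

```python
def compare(tokens_per_month, price_per_million=5.00, self_hosting_monthly=348.21):
    """Return (cheaper option, API cost, monthly difference) for a volume."""
    api = tokens_per_month / 1_000_000 * price_per_million
    # Strictly cheaper API wins; a tie falls through to self-hosting.
    cheaper = "api" if api < self_hosting_monthly else "self-hosting"
    return cheaper, api, abs(self_hosting_monthly - api)


cheaper, api, difference = compare(12_000_000)
print(f"{cheaper} is cheaper: API ${api:.2f}/mo, difference ${difference:.2f}/mo")
```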

Question 5

How does utilization move the self-hosting line?

Accepted Answer

Utilization scales the fixed cost: at 30% you pay for 30% of the GPU's hours. Higher utilization raises the flat line (you pay more per month), which on this chart moves the crossing to the right, but it also lets you serve far more tokens for that money. Whether self-hosting wins therefore depends on how much of the added capacity you actually fill. The full comparator lets you tune every input.
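A quick sweep shows how utilization shifts the flat line in this page's simple model. It assumes the $1.59/hour rate, 730 hours, and a fixed $5.00 per 1M tokens API price; the 10%, 60%, and 100% levels are illustrative, and how many tokens the GPU can actually serve at each level is not modeled.

```python
def self_hosting_monthly(utilization, gpu_hourly_rate=1.59, hours=730):
    """Flat monthly GPU cost when paying for a fraction of the month's hours."""
    return gpu_hourly_rate * hours * utilization


def break_even_millions(utilization, price_per_million=5.00):
    """Volume (millions of tokens) where the API line meets the flat line."""
    return self_hosting_monthly(utilization) / price_per_million


# 0.30 matches this scenario; the other levels are hypothetical.
for utilization in (0.10, 0.30, 0.60, 1.00):
    print(f"{utilization:>4.0%} utilization: "
          f"${self_hosting_monthly(utilization):>8.2f}/mo, "
          f"break-even at {break_even_millions(utilization):>6.1f}M tokens/month")
```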

Question 6

How current are these prices?

Accepted Answer

The defaults are publicly listed prices verified on Jun 25, 2026, each linked to its source. They are convenience defaults; in the full comparator every price is an editable input, so the curve stays correct even if a default goes stale. Always confirm current pricing with the provider.

LLM Cost-vs-Volume Curve
