Methodology: how every cost is calculated
No black boxes. Every number on this site comes from one of the formulas below. Each is plain arithmetic, so it can be verified once and stays valid, and each is checked against known examples before release. Prices are convenience defaults; you can override every one of them.
1. API cost
Commercial APIs bill per token, input and output priced separately (output is typically 3–5× input). Prices are quoted per million tokens.
Example: 1,000,000 input tokens at $3/1M plus 500,000 output tokens at $15/1M = $3.00 + $7.50 = $10.50.
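A minimal Python sketch of the API-cost arithmetic, assuming prices are expressed in dollars per million tokens as above. The function and parameter names are illustrative, not the site's actual code.

```python
def api_cost(input_tokens: int, output_tokens: int,
             input_price_per_m: float, output_price_per_m: float) -> float:
    """API cost in dollars; both prices are $ per 1M tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Worked example: 1M input tokens at $3/1M, 500k output tokens at $15/1M.
print(f"${api_cost(1_000_000, 500_000, 3.0, 15.0):.2f}")  # $10.50
```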
2. Self-hosting cost (rented GPU)
A rented GPU bills by the hour whether or not it is busy, so its cost is essentially fixed. We use a 730-hour month (365×24÷12).
Example: $1.50/hour × 730 × 30% = $328.50/month (before overhead). Overhead is your estimate of DevOps time, monitoring and redundancy.
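A sketch of the rented-GPU formula in Python. Two assumptions are worth flagging: we read the 30% in the example as the share of the 730-hour month the instance is kept running (every hour it runs is billed, busy or not), and we model overhead as a flat monthly dollar figure. Both readings and the names (`rented_gpu_monthly_cost`, `fraction_of_month`, `overhead_usd`) are hypothetical.

```python
HOURS_PER_MONTH = 365 * 24 / 12  # 730.0 hours


def rented_gpu_monthly_cost(hourly_rate: float, fraction_of_month: float = 1.0,
                            overhead_usd: float = 0.0) -> float:
    """Monthly dollars for a rented GPU.

    fraction_of_month: assumed share of the 730 hours the instance is running.
    overhead_usd: assumed flat monthly estimate for DevOps, monitoring, redundancy.
    """
    compute = hourly_rate * HOURS_PER_MONTH * fraction_of_month
    return compute + overhead_usd


print(f"${rented_gpu_monthly_cost(1.50, 0.30):.2f}")  # $328.50 before overhead
```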
3. Self-hosting cost (owned hardware)
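Owned hardware is a one-time purchase, so we spread its price evenly over its expected service life (straight-line amortization) and then add monthly power and overhead.
Example: a $15,000 server amortized over 36 months contributes $416.67/month before power and overhead.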
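The amortization step as a short Python sketch. Power and overhead are taken here as flat monthly dollar inputs; that is an assumption, since the site's own power formula is not spelled out, and the function name is hypothetical.

```python
def owned_hardware_monthly_cost(purchase_price: float, months: int,
                                power_usd: float = 0.0,
                                overhead_usd: float = 0.0) -> float:
    """Straight-line amortization plus assumed flat monthly power and overhead."""
    if months <= 0:
        raise ValueError("amortization period must be at least one month")
    amortized = purchase_price / months
    return amortized + power_usd + overhead_usd


print(f"${owned_hardware_monthly_cost(15_000, 36):.2f}")  # $416.67
```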
4. Blended price per million tokens
To compare a mixed input/output workload against a fixed self-hosting cost, we collapse the two prices into one blended rate weighted by the token mix.
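One way to compute that blended rate, sketched in Python: weight each price by its share of the total tokens. Reusing the section 1 mix (1M input, 500k output at $3 and $15) gives $7.00 per 1M, and 1.5M tokens × $7.00/1M reproduces the $10.50 total. Names are illustrative.

```python
def blended_price_per_m(input_tokens: int, output_tokens: int,
                        input_price_per_m: float,
                        output_price_per_m: float) -> float:
    """Single $/1M rate weighted by the input/output token mix."""
    total = input_tokens + output_tokens
    if total == 0:
        raise ValueError("token mix must contain at least one token")
    input_share = input_tokens / total
    return input_share * input_price_per_m + (1 - input_share) * output_price_per_m


print(f"${blended_price_per_m(1_000_000, 500_000, 3.0, 15.0):.2f}")  # $7.00
```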
5. Break-even volume
API cost is a straight line through the origin; self-hosting is a flat line. They cross where the two monthly costs are equal.
Example: a $320/month GPU against a $3/1M blended price breaks even at $320 ÷ $3 × 1M ≈ 106.7M tokens/month. Below it the API wins; above it self-hosting wins on raw cost.
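The crossing point as a Python sketch: set blended price × volume ÷ 1M equal to the fixed monthly cost and solve for volume. The function name is hypothetical.

```python
def break_even_tokens(self_hosting_monthly_usd: float,
                      blended_price_per_m: float) -> float:
    """Monthly token volume at which API cost equals the fixed self-hosting cost."""
    if blended_price_per_m <= 0:
        raise ValueError("blended price must be positive")
    return self_hosting_monthly_usd / blended_price_per_m * 1_000_000


volume = break_even_tokens(320.0, 3.0)
print(f"{volume / 1e6:.1f}M tokens/month")  # 106.7M tokens/month
```

Because the API line passes through the origin with a positive slope and the self-hosting line is flat, there is exactly one crossing whenever both costs are positive; the guard rejects a zero or negative price, where no crossing exists.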
6. Self-hosting cost per million tokens
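We divide the hourly price by the tokens the GPU actually produces in an hour: throughput (tokens/s) × 3,600 × utilization.
Example: $1.50/hour at 2,000 tok/s and 100% utilization is 7.2M tokens/hour, ≈ $0.21 per 1M tokens.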
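A Python sketch of the per-million-token cost, treating utilization as the fraction of each paid hour the GPU is actually generating tokens. The function name is illustrative.

```python
def self_hosted_cost_per_m(hourly_rate: float, tokens_per_second: float,
                           utilization: float = 1.0) -> float:
    """Dollars per 1M tokens for a GPU billed by the hour."""
    tokens_per_hour = tokens_per_second * 3600 * utilization
    if tokens_per_hour <= 0:
        raise ValueError("throughput and utilization must be positive")
    return hourly_rate / tokens_per_hour * 1_000_000


print(f"${self_hosted_cost_per_m(1.50, 2_000):.2f} per 1M tokens")  # $0.21 per 1M tokens
```

Utilization sits in the denominator, so halving it doubles the unit cost: the same GPU at 50% utilization comes to about $0.42 per 1M tokens.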
7. VRAM and model fit
Memory needed is parameters × bytes per parameter (set by quantization: 2 bytes at fp16, 0.5 at int4), multiplied by ~1.2 for KV-cache and activation headroom.
Example: a 70B model needs ≈ 168 GB at fp16 (140 GB of weights × 1.2, more than two 80 GB GPUs hold, so three) but ≈ 42 GB at int4 (35 GB × 1.2, which fits one 48 GB GPU).
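The fit check as a Python sketch. It applies the ~1.2× headroom uniformly, so a 70B model comes to 168 GB at fp16 (140 GB of weights alone) and 42 GB at int4. It also assumes memory divides perfectly across identical GPUs, which is optimistic for real multi-GPU serving; the function names are hypothetical.

```python
import math

# Bytes per parameter for common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}
HEADROOM = 1.2  # ~20% extra for KV-cache and activations


def vram_needed_gb(params_billion: float, precision: str) -> float:
    """Approximate GB: parameters x bytes per parameter x headroom."""
    return params_billion * BYTES_PER_PARAM[precision] * HEADROOM


def gpus_needed(params_billion: float, precision: str, gpu_gb: float) -> int:
    """Fewest identical GPUs whose combined memory covers the estimate."""
    return math.ceil(vram_needed_gb(params_billion, precision) / gpu_gb)


for precision, gpu_gb in [("fp16", 80), ("int4", 48)]:
    need = vram_needed_gb(70, precision)
    count = gpus_needed(70, precision, gpu_gb)
    print(f"70B @ {precision}: {need:.0f} GB -> {count} x {gpu_gb} GB GPU(s)")
```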
Assumptions & limits
- Token estimates from text use ≈ words × 1.33 (or characters ÷ 4) — a stated heuristic, not exact tokenization.
- Self-hosting figures are infrastructure cost only. They exclude latency, reliability engineering, and idle waste beyond the utilization you enter — those are caveats, not dollars.
- Prices are public list prices as of their verification date; negotiated, committed-use, or regional pricing will differ.
- FX (EUR/GBP) is fetched daily and cached, with a bundled fallback rate labelled "indicative" if the source is unavailable.
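The two token heuristics from the first bullet, as a minimal Python sketch. Splitting on whitespace is our assumption about what counts as a word, the function names are hypothetical, and a real tokenizer will give different counts.

```python
def estimate_tokens_from_words(text: str) -> int:
    """Heuristic: about 1.33 tokens per whitespace-separated word."""
    return round(len(text.split()) * 1.33)


def estimate_tokens_from_chars(text: str) -> int:
    """Heuristic: about 4 characters per token."""
    return round(len(text) / 4)


sample = "No black boxes. Every number on this site comes from a formula."
print(estimate_tokens_from_words(sample), estimate_tokens_from_chars(sample))
```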
Found an error in a formula or a stale price? Tell me — corrections are welcome and credited.