Methodology: how every cost is calculated

No black boxes. Every number on this site comes from one of the formulas below. Each is plain arithmetic — verifiable once and stable forever — and is checked against known examples before release. Prices are convenience defaults; you can override every one of them.

1. API cost

Commercial APIs bill per token, with input and output priced separately (the output price is typically 3–5× the input price). Prices are quoted per million tokens.

API/month = (input_tokens ÷ 1,000,000 × price_in) + (output_tokens ÷ 1,000,000 × price_out)

Example: 1,000,000 input at $3/1M plus 500,000 output at $15/1M = $3.00 + $7.50 = $10.50.
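As a minimal sketch, the formula above can be written as a small Python function. The function and parameter names are illustrative assumptions, not the site's actual code; both prices are in dollars per million tokens.

```python
def api_monthly_cost(input_tokens: float, output_tokens: float,
                     price_in: float, price_out: float) -> float:
    """Monthly API cost in dollars; prices are per 1,000,000 tokens."""
    input_cost = input_tokens / 1_000_000 * price_in
    output_cost = output_tokens / 1_000_000 * price_out
    return input_cost + output_cost


# Worked example from the text: 1M input at $3/1M plus 500k output at $15/1M.
print(f"${api_monthly_cost(1_000_000, 500_000, 3.0, 15.0):.2f}")  # $10.50
```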

2. Self-hosting cost (rented GPU)

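A rented GPU bills by the hour for as long as it is provisioned, whether or not it is busy, so its cost is fixed with respect to token volume. We use a 730-hour month (365×24÷12); utilization in the formula below scales that to the share of those hours you actually rent.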

Self-host/month = $/hour × 730 × utilization + overhead

Example: $1.50/hour × 730 × 30% = $328.50/month (before overhead). Overhead is your estimate of DevOps time, monitoring and redundancy.
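The same arithmetic as a short Python sketch. The names are hypothetical, utilization is passed as a fraction (0.30 for 30%), and the range check is an added assumption rather than something the formula requires.

```python
HOURS_PER_MONTH = 730  # 365 * 24 / 12, as stated above


def rented_monthly_cost(price_per_hour: float, utilization: float,
                        overhead: float = 0.0) -> float:
    """Monthly rented-GPU cost in dollars; utilization is a fraction in [0, 1]."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return price_per_hour * HOURS_PER_MONTH * utilization + overhead


# Worked example from the text: $1.50/hour at 30%, before overhead.
print(f"${rented_monthly_cost(1.50, 0.30):.2f}")  # $328.50
```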

3. Self-hosting cost (owned hardware)

Owned/month = capex ÷ amortization_months + (watts ÷ 1000 × 24 × 30 × utilization × $/kWh) + overhead

Example: a $15,000 server amortized over 36 months contributes $416.67/month before power and overhead.
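A minimal Python sketch of the owned-hardware formula, with hypothetical names. It keeps the 24 × 30 = 720 power-hours exactly as written in the formula, which differs slightly from the 730-hour month used for rented GPUs.

```python
def owned_monthly_cost(capex: float, amortization_months: int, watts: float,
                       utilization: float, price_per_kwh: float,
                       overhead: float = 0.0) -> float:
    """Monthly owned-hardware cost in dollars: amortization + power + overhead."""
    amortized = capex / amortization_months
    # Energy drawn in a 30-day month, scaled by utilization (a fraction in [0, 1]).
    kwh = watts / 1000 * 24 * 30 * utilization
    power = kwh * price_per_kwh
    return amortized + power + overhead


# Amortization-only figure from the text: $15,000 over 36 months,
# with power and overhead set to zero.
cost = owned_monthly_cost(15_000, 36, watts=0, utilization=0, price_per_kwh=0)
print(f"${cost:.2f}")  # $416.67
```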

4. Blended price per million tokens

To compare a mixed input/output workload against a fixed self-hosting cost, we collapse the two prices into one blended rate weighted by the token mix.

Blended price/1M = (input × price_in + output × price_out) ÷ (input + output)
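As an illustration, here is the blended rate in Python, applied to the workload from section 1. The function name is an assumption, and the zero-volume guard is added only to avoid dividing by zero.

```python
def blended_price_per_million(input_tokens: float, output_tokens: float,
                              price_in: float, price_out: float) -> float:
    """Token-weighted average price in dollars per 1,000,000 tokens."""
    total = input_tokens + output_tokens
    if total <= 0:
        raise ValueError("token volume must be positive")
    return (input_tokens * price_in + output_tokens * price_out) / total


# Section 1 workload: 1M input at $3/1M, 500k output at $15/1M.
print(f"${blended_price_per_million(1_000_000, 500_000, 3.0, 15.0):.2f}")  # $7.00
```

As a consistency check, $7.00 per 1M across 1.5M total tokens gives $10.50, the same monthly cost as in section 1.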

5. Break-even volume

API cost is a straight line through the origin; self-hosting is a flat line. They cross where the two monthly costs are equal.

Break-even tokens/month = Self-host monthly ÷ blended price/1M × 1,000,000

Example: a $320/month GPU against a $3/1M blended price breaks even at $320 ÷ $3 × 1M ≈ 106.7M tokens/month. Below it the API wins; above it self-hosting wins on raw cost.
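The crossing point can be sketched in Python as follows. The names are hypothetical and the guard against a non-positive price is an added assumption.

```python
def break_even_tokens(self_host_monthly: float,
                      blended_price_per_million: float) -> float:
    """Tokens per month at which API cost equals the fixed self-hosting cost."""
    if blended_price_per_million <= 0:
        raise ValueError("blended price must be positive")
    return self_host_monthly / blended_price_per_million * 1_000_000


# Worked example from the text: $320/month against a $3/1M blended price.
tokens = break_even_tokens(320.0, 3.0)
print(f"{tokens / 1_000_000:.1f}M tokens/month")  # 106.7M tokens/month
```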

6. Self-hosting cost per million tokens

Self-host $/1M = ($/hour ÷ 3600) ÷ (tokens_per_second × utilization) × 1,000,000

Example: $1.50/hour at 2,000 tok/s and 100% utilization ≈ $0.21 per 1M tokens.
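A short Python sketch of the same calculation: it converts the hourly price to a per-second cost and divides by the effective throughput. The function and parameter names are assumptions for illustration.

```python
def self_host_price_per_million(price_per_hour: float, tokens_per_second: float,
                                utilization: float = 1.0) -> float:
    """Self-hosting cost in dollars per 1,000,000 tokens generated."""
    effective_tps = tokens_per_second * utilization
    if effective_tps <= 0:
        raise ValueError("effective throughput must be positive")
    cost_per_second = price_per_hour / 3600
    return cost_per_second / effective_tps * 1_000_000


# Worked example from the text: $1.50/hour, 2,000 tok/s, 100% utilization.
print(f"${self_host_price_per_million(1.50, 2000, 1.0):.2f}")  # $0.21
```

Halving utilization halves the effective throughput, so the per-token cost doubles.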

7. VRAM and model fit

Memory for the weights is parameters × bytes per parameter (set by the quantization level). We multiply that by ~1.2 to leave headroom for the KV-cache and activations.

VRAM (GB) ≈ params(B) × bytes_per_param × 1.2  |  fp16 = 2, int8 = 1, int4 = 0.5 bytes

Example: a 70B model needs ≈ 168 GB at fp16 (140 GB of weights plus headroom, so more than two 80 GB GPUs can hold) but ≈ 42 GB at int4 (fits one 48 GB GPU).
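The estimate can be sketched in Python as below. The lookup table and names are illustrative assumptions; passing headroom=1.0 gives the weights-only figure, which for a 70B model at fp16 is 140 GB, against 168 GB once the 1.2× headroom is applied.

```python
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}
HEADROOM = 1.2  # rough allowance for KV-cache and activations


def vram_gb(params_billion: float, precision: str,
            headroom: float = HEADROOM) -> float:
    """Approximate VRAM in GB for a model of the given size and precision."""
    return params_billion * BYTES_PER_PARAM[precision] * headroom


for precision in ("fp16", "int8", "int4"):
    weights_only = vram_gb(70, precision, headroom=1.0)
    with_headroom = vram_gb(70, precision)
    print(f"{precision}: {weights_only:.0f} GB weights, {with_headroom:.0f} GB with headroom")
```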

Assumptions & limits

  • Token estimates from text use ≈ words × 1.33 (or characters ÷ 4) — a stated heuristic, not exact tokenization.
  • Self-hosting figures are infrastructure cost only. They exclude latency, reliability engineering, and idle waste beyond the utilization you enter — those are caveats, not dollars.
  • Prices are public list prices as of their verification date; negotiated, committed-use, or regional pricing will differ.
  • FX (EUR/GBP) is fetched daily and cached, with a bundled fallback rate labelled "indicative" if the source is unavailable.
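The token-estimate heuristic in the first bullet above can be sketched in a few lines of Python. The function names are hypothetical, and rounding to the nearest whole token is an added assumption; a real tokenizer will give different counts.

```python
def estimate_tokens_from_words(text: str) -> int:
    """Heuristic: about 1.33 tokens per whitespace-separated word."""
    return round(len(text.split()) * 1.33)


def estimate_tokens_from_chars(text: str) -> int:
    """Heuristic: about one token per 4 characters."""
    return round(len(text) / 4)


sample = "The quick brown fox jumps over the lazy dog"
print(estimate_tokens_from_words(sample))  # 9 words -> 12
print(estimate_tokens_from_chars(sample))  # 43 characters -> 11
```

The two heuristics rarely agree exactly, which is a useful reminder that either one is only an estimate.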

Found an error in a formula or a stale price? Tell me — corrections are welcome and credited.