LLM·TCO

Guides

The reasoning behind the numbers — how inference cost works, and how to read the calculators.

  • How LLM inference cost is calculated
    Token math, input vs output pricing, and the $/1M-tokens model that underpins every LLM cost estimate.
  • API vs self-hosting: how break-even actually works
    Fixed vs marginal cost, the crossover volume, and why the break-even point is all about utilization.
  • The hidden costs of self-hosting LLMs
    Idle GPU time, DevOps and engineering hours, reliability and redundancy: why the hourly rate is not the real cost.
  • Quantization and cost: int8/int4 economics
    How quantization shrinks VRAM and shifts throughput, and when it changes the API-vs-self-hosting verdict.
  • Batching, caching & throughput: cutting $/token
    How batch processing, prompt caching and higher throughput lower the effective cost per token.
  • When does self-hosting actually pay off?
    A decision framework that weighs cost break-even against latency, compliance, privacy and engineering time.
  • GPU sizing for LLM inference: the VRAM math
    Parameters times bytes per parameter plus KV-cache headroom: how to size GPUs for any model.
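
Two of the formulas named in the guide summaries above can be sketched in a few lines. This is a minimal illustration, not the site's calculator: the function names, the example prices, and the 20% KV-cache headroom default are all assumptions made for the sketch.

```python
def api_cost_usd(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost under a $/1M-tokens pricing model, with input and output priced separately."""
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)


def vram_gb(params_billion: float, bytes_per_param: float,
            kv_headroom_fraction: float = 0.2) -> float:
    """Rough VRAM estimate: parameters times bytes per parameter, plus KV-cache headroom.

    The 0.2 default headroom is an assumed placeholder, not a recommended value;
    real KV-cache needs depend on context length and batch size.
    """
    weights_gb = params_billion * bytes_per_param  # 1e9 params * bytes / 1e9 bytes per GB
    return weights_gb * (1 + kv_headroom_fraction)


if __name__ == "__main__":
    # Hypothetical prices: $3 per 1M input tokens, $15 per 1M output tokens.
    print(api_cost_usd(2_000_000, 500_000, 3.0, 15.0))
    # Hypothetical 7B-parameter model at 2 bytes per parameter (fp16).
    print(vram_gb(7, 2))
```

Moving the same hypothetical model from fp16 (2 bytes per parameter) to int4 (0.5 bytes per parameter) cuts the weight footprint by a factor of four in this estimate, which is the kind of shift the quantization guide is about.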

The real cost of LLM inference. Free, source-cited calculators for the cost of AI inference — API vs self-hosting.

Publisher: Redbit S.r.l.s.
Viale della Grande Muraglia 494, 00144 Roma, Italy
VAT IT15237911001

© 2026 Redbit S.r.l.s. — All rights reserved. Cost estimates for informational purposes only; not financial advice.