LLM Provider Price Comparison
Compare commercial LLM APIs on a single workload, side by side. Enter your monthly input and output token volume and the table below ranks the bundled models from cheapest to most expensive, showing each one's monthly cost and blended price per million tokens. The table re-ranks as you type. Prices were verified on the dates shown in the Verified column; the volume is editable.
| Model / provider | $/1M in | $/1M out | Blended /1M | Monthly cost | Verified |
|---|---|---|---|---|---|
| Llama-70B via managed API (e.g. Groq/Together) (cheapest) | $0.60 | $0.80 | $0.63 | $7.60 | Jun 25, 2026 |
| Gemini Flash-class (Google) | $0.30 | $2.50 | $0.67 | $8.00 | Jun 25, 2026 |
| GPT-4o-class (OpenAI) | $2.50 | $10.00 | $3.75 | $45.00 | Jun 25, 2026 |
| Claude Sonnet-class (Anthropic) | $3.00 | $15.00 | $5.00 | $60.00 | Jun 25, 2026 |
Ranked cheapest first for the volume above. Cost = (input ÷ 1M × $/1M in) + (output ÷ 1M × $/1M out). Each row shows the verification date of its bundled prices.
How it works
The trap in any pricing page is the headline number. One provider advertises a low input price, another a low output price, and neither tells you what you will pay — because that depends on the shape of your traffic. This comparison removes the guesswork by applying one workload to every model at once: the same input and output token volumes run through each provider's two prices, and the results are sorted by the only figure that matters to your budget, total monthly cost.
Reading the table left to right, the two price columns show each provider's raw rates, the blended column collapses them into a single rate for your specific mix, and the cost column converts that into dollars. Because the ranking is driven by your blended cost rather than a marketing number, the order can change when you change the mix: shift the workload toward heavy output and the models with cheap generation rise; shift it toward long context with short answers and the cheap-input models win. That sensitivity is the whole point — a model that is cheapest for a summarisation service may be the most expensive for a content generator.
Monthly cost = (input ÷ 1,000,000 × $/1M in) + (output ÷ 1,000,000 × $/1M out)
Blended price / 1M = (input × $/1M in + output × $/1M out) ÷ (input + output)
Rows are sorted ascending by monthly cost; the lowest is flagged "cheapest".
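The two formulas and the sort rule above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the `Model` class and the function names are hypothetical, and the prices are the bundled defaults copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    price_in: float   # USD per 1M input tokens
    price_out: float  # USD per 1M output tokens


def monthly_cost(m: Model, input_tokens: int, output_tokens: int) -> float:
    """Monthly cost = (input / 1M * $/1M in) + (output / 1M * $/1M out)."""
    return (input_tokens / 1_000_000 * m.price_in
            + output_tokens / 1_000_000 * m.price_out)


def blended_price(m: Model, input_tokens: int, output_tokens: int) -> float:
    """Average USD per 1M tokens, weighted by the input/output mix."""
    total = input_tokens + output_tokens
    if total == 0:
        raise ValueError("token volume must be non-zero")
    return (input_tokens * m.price_in + output_tokens * m.price_out) / total


def rank(models: list[Model], input_tokens: int, output_tokens: int) -> list[Model]:
    """Sort ascending by monthly cost; the first element is the cheapest."""
    return sorted(models, key=lambda m: monthly_cost(m, input_tokens, output_tokens))


# Bundled defaults from the table (verified Jun 25, 2026).
BUNDLED = [
    Model("Claude Sonnet-class (Anthropic)", 3.00, 15.00),
    Model("GPT-4o-class (OpenAI)", 2.50, 10.00),
    Model("Gemini Flash-class (Google)", 0.30, 2.50),
    Model("Llama-70B via managed API", 0.60, 0.80),
]

for m in rank(BUNDLED, 10_000_000, 2_000_000):
    cost = monthly_cost(m, 10_000_000, 2_000_000)
    blended = blended_price(m, 10_000_000, 2_000_000)
    print(f"{m.name}: ${cost:.2f}/mo, ${blended:.2f} blended per 1M")
```

Note that the blended price is simply the monthly cost divided by the total volume in millions, so sorting by either figure gives the same order for a fixed workload.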
A worked example
At the default volume — 10M input + 2M output tokens/month — the four bundled models rank like this:
- Llama-70B via managed API (e.g. Groq/Together): (10M ÷ 1M × $0.60) + (2M ÷ 1M × $0.80) = $7.60/mo (cheapest)
- Gemini Flash-class (Google): (10M ÷ 1M × $0.30) + (2M ÷ 1M × $2.50) = $8.00/mo
- GPT-4o-class (OpenAI): (10M ÷ 1M × $2.50) + (2M ÷ 1M × $10.00) = $45.00/mo
- Claude Sonnet-class (Anthropic): (10M ÷ 1M × $3.00) + (2M ÷ 1M × $15.00) = $60.00/mo
The cheapest option, Llama-70B via managed API (e.g. Groq/Together) at $7.60, costs 87% less than the dearest for the identical workload, roughly an eightfold difference and a reminder that on a fixed task, provider choice alone can swing the bill several-fold. The ranking is also sensitive to the mix. With these bundled prices, cut output below about 1.76M tokens while keeping input at 10M and Gemini Flash-class overtakes Llama-70B, because its input price is half as much while its output price is about three times as high. To drill into a single provider's bill, use the token cost calculator; to project a chosen provider across a year of traffic, use the monthly API spend calculator.
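The worked example can be checked with a short script. The sketch below assumes the bundled prices from the table; the dictionary layout and helper names are illustrative only. It also re-runs the ranking with a lower output volume (10M input + 1M output) to show how the order responds to the mix.

```python
# Bundled prices from the table: name -> (USD per 1M input, USD per 1M output).
PRICES = {
    "Llama-70B via managed API": (0.60, 0.80),
    "Gemini Flash-class": (0.30, 2.50),
    "GPT-4o-class": (2.50, 10.00),
    "Claude Sonnet-class": (3.00, 15.00),
}


def costs(input_m: float, output_m: float) -> dict[str, float]:
    """Monthly cost per model; volumes are given in millions of tokens."""
    return {
        name: input_m * p_in + output_m * p_out
        for name, (p_in, p_out) in PRICES.items()
    }


def ranking(input_m: float, output_m: float) -> list[str]:
    """Model names sorted from cheapest to most expensive."""
    c = costs(input_m, output_m)
    return sorted(c, key=c.get)


def saving_vs_dearest(input_m: float, output_m: float) -> float:
    """Fraction saved by choosing the cheapest model over the dearest."""
    c = costs(input_m, output_m)
    cheapest, dearest = min(c.values()), max(c.values())
    return (dearest - cheapest) / dearest


print(ranking(10, 2))                      # default volume
print(f"{saving_vs_dearest(10, 2):.0%}")   # 87%
print(ranking(10, 1)[0])                   # Gemini Flash-class
```

At 10M input + 1M output, Gemini Flash-class costs $5.50 against $6.80 for Llama-70B, so the top two rows swap while the bottom two stay in place.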
Frequently asked questions
Which LLM API is the cheapest?
It depends entirely on your input/output mix, but for the default volume here (10M input + 2M output tokens/month) the lowest-cost option is Llama-70B via managed API (e.g. Groq/Together) at $7.60/month, versus $60.00 for the most expensive, Claude Sonnet-class (Anthropic). That is a $52.40 (87%) gap for the identical workload. Change the volume above and the ranking can shift, because models weight input and output differently.
Why does the cheapest provider change with my token mix?
Each model has its own ratio of input to output price. A model with cheap input but pricey output wins on long-context, short-answer work and loses on long-generation work. That is why this tool ranks on your blended cost rather than on a single headline number — the blended price per 1M already weights each side by how much of it you actually use.
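One way to make this concrete is to solve for the output-to-input ratio at which two models cost the same. Setting the two cost formulas equal gives output ÷ input = (A's input price − B's input price) ÷ (B's output price − A's output price). The function below is a hypothetical helper, not part of the tool, and uses the bundled prices as example inputs.

```python
from typing import Optional


def break_even_output_ratio(
    a_in: float, a_out: float, b_in: float, b_out: float
) -> Optional[float]:
    """Output/input token ratio at which models A and B cost the same.

    Returns None when no positive ratio exists, which happens when one
    model is at least as cheap on both input and output.
    """
    d_in = a_in - b_in     # how much more A charges for input
    d_out = b_out - a_out  # how much more B charges for output
    if d_out == 0 or d_in / d_out <= 0:
        return None
    return d_in / d_out


# Llama-70B ($0.60 in, $0.80 out) vs Gemini Flash-class ($0.30 in, $2.50 out).
ratio = break_even_output_ratio(0.60, 0.80, 0.30, 2.50)
print(round(ratio, 3))  # 0.176

# GPT-4o-class is cheaper than Claude Sonnet-class on both sides: no crossover.
print(break_even_output_ratio(2.50, 10.00, 3.00, 15.00))  # None
```

With these prices, Llama-70B is cheaper whenever output exceeds about 17.6% of input, and Gemini Flash-class is cheaper below that. The default mix (2M output on 10M input, or 20%) sits just above the crossover.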
Are these the same models, just at different prices?
No — they are different model classes with different capabilities. A frontier model and a fast/cheap model are not interchangeable for every task. Treat this as a cost map, not a quality ranking: use it to find the cheapest model that is good enough for each slice of your workload, and route accordingly.
What is the blended price per million?
It is the single average rate you pay across your specific mix: (input × input price + output × output price) ÷ (input + output). It collapses the two-dial pricing into one comparable number, so the Blended /1M column lets you line every provider up on equal terms for your workload.
Should I just pick the cheapest row?
Only after weighing quality, latency, rate limits, region availability, and data-handling terms. The cheapest model that meets your quality bar is the right answer — not the cheapest model overall. Many teams run a mix: a cheap model for the bulk of traffic and a frontier model for the hard requests.
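The cost of such a mix can be estimated with a simple weighted sum. The sketch below rests on a strong assumption that is not stated by the tool: requests routed to each model have the same input/output shape, so the monthly volume splits in proportion to the traffic share. The function names and the 10% frontier share are illustrative.

```python
def cost(input_m: float, output_m: float, price_in: float, price_out: float) -> float:
    """Monthly cost in USD; volumes are in millions of tokens."""
    return input_m * price_in + output_m * price_out


def routed_cost(
    input_m: float,
    output_m: float,
    cheap: tuple[float, float],
    frontier: tuple[float, float],
    frontier_share: float,
) -> float:
    """Cost when a share of traffic goes to the frontier model.

    Assumes both slices of traffic have the same token mix, so volume
    splits proportionally to frontier_share (a simplification).
    """
    if not 0.0 <= frontier_share <= 1.0:
        raise ValueError("frontier_share must be between 0 and 1")
    cheap_part = (1 - frontier_share) * cost(input_m, output_m, *cheap)
    frontier_part = frontier_share * cost(input_m, output_m, *frontier)
    return cheap_part + frontier_part


# Default volume, 90% to Llama-70B and 10% to Claude Sonnet-class.
mixed = routed_cost(10, 2, cheap=(0.60, 0.80), frontier=(3.00, 15.00), frontier_share=0.10)
print(f"${mixed:.2f}/mo")  # $12.84/mo
```

Under that assumption the mixed setup costs $12.84 a month, compared with $7.60 for the cheap model alone and $60.00 for the frontier model alone. In practice hard requests are often longer, so real figures would need per-slice token volumes.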
How current are these prices?
The bundled defaults are publicly listed prices, each verified on the date shown in its row and linked to its source below. They are convenience defaults; the volume is editable so you can re-rank providers for your own workload. Always confirm current pricing with the provider.
Disclaimer. LLMTCO provides cost estimates and planning tools for informational purposes only. AI API and GPU prices change frequently; bundled defaults reflect publicly listed prices as of the verification date shown (Jun 25, 2026) and may be out of date — always confirm current pricing with the provider. These figures are estimates, not financial, tax, or procurement advice, and do not capture every real-world factor (latency, reliability, compliance, data privacy, engineering time).