LLM API Pricing Table
Input and output token prices for the major commercial LLM APIs, in USD per 1,000,000 tokens. Every row is dated and links to the provider's official pricing page, so you can see exactly when each number was checked and confirm it yourself. Prices are as of Jun 25, 2026. These are public list prices, and every one of them is an editable input in our calculators.
| Model | Provider | Input $/1M | Output $/1M | Context | Verified | Source |
|---|---|---|---|---|---|---|
| Claude Sonnet-class (Anthropic) | Anthropic | $3.00 | $15.00 | 200K tok | Jun 25, 2026 | pricing ↗ |
| GPT-4o-class (OpenAI) | OpenAI | $2.50 | $10.00 | 128K tok | Jun 25, 2026 | pricing ↗ |
| Gemini Flash-class (Google) | Google | $0.30 | $2.50 | 1M tok | Jun 25, 2026 | pricing ↗ |
| Llama-70B via managed API (e.g. Groq/Together) | Hosted open-weight | $0.60 | $0.80 | 128K tok | Jun 25, 2026 | pricing ↗ |
Want these numbers in a live calculation? Drop your token volumes into the token cost calculator or rank every provider side by side in the provider price comparison.
Input vs. output pricing
LLM APIs bill the two halves of a request differently. Input tokens (your prompt, system message and any retrieved context) are the cheaper side; output tokens (the model's generated reply) cost more because generation is the slow, compute-heavy part. In the table above the output price is 4–5× the input price for the two flagship rows, and the full range runs from roughly 1.3× (Llama-70B) to 8.3× (Gemini Flash-class). That asymmetry is why an output-heavy workload (long completions, code, essays) costs far more per request than an input-heavy one with a long prompt and a short answer. When you estimate spend, always split your monthly volume into input and output buckets rather than using a single blended figure.
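To make the asymmetry concrete, here is a minimal sketch that computes the output-to-input price ratio for each row of the table above and then prices two requests with the same total token count. The function names and the two request shapes (4,000 in / 200 out versus 200 in / 4,000 out) are illustrative assumptions, not figures from any provider.

```python
# List prices from the table above, USD per 1,000,000 tokens: (input, output).
PRICES = {
    "Claude Sonnet-class": (3.00, 15.00),
    "GPT-4o-class": (2.50, 10.00),
    "Gemini Flash-class": (0.30, 2.50),
    "Llama-70B via managed API": (0.60, 0.80),
}


def output_input_ratio(input_price, output_price):
    """How many times more an output token costs than an input token."""
    return output_price / input_price


def request_cost(input_tokens, output_tokens, input_price, output_price):
    """USD cost of a single request at per-1M-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


ratios = {
    name: round(output_input_ratio(inp, out), 2)
    for name, (inp, out) in PRICES.items()
}

# Hypothetical request shapes with the same total of 4,200 tokens each.
in_price, out_price = PRICES["Claude Sonnet-class"]
input_heavy = request_cost(4_000, 200, in_price, out_price)
output_heavy = request_cost(200, 4_000, in_price, out_price)

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name}: output costs {ratio}x input")
    print(f"input-heavy request:  ${input_heavy:.4f}")
    print(f"output-heavy request: ${output_heavy:.4f}")
```

At these list prices the output-heavy request costs roughly four times as much as the input-heavy one, even though both move the same number of tokens.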
How to read this table
Each price is quoted per 1,000,000 tokens, the unit every major provider now uses. A token is roughly ¾ of an English word, so 1M tokens is on the order of 750,000 words. To get a monthly bill, multiply your input tokens ÷ 1M by the input price, do the same for output, and add them. The Context column is the maximum number of tokens the model can attend to in one request — it caps how much prompt plus history you can send, not how much you are billed. The Verified column shows the date we last checked that row against the source; a ⚠️ marks a row that has aged past our staleness threshold and should be re-confirmed.
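The bill arithmetic described above can be sketched in a few lines. The function names and the monthly volume (500M input tokens, 100M output tokens) are hypothetical; the prices are the GPT-4o-class row from the table, and the word estimate uses the rough ¾-word-per-token rule of thumb, which varies by language and tokenizer.

```python
def monthly_bill(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Monthly cost in USD, given token volumes and prices per 1,000,000 tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


def approx_words(tokens, words_per_token=0.75):
    """Rough English word count for a token count (rule of thumb only)."""
    return tokens * words_per_token


# Hypothetical volume at the GPT-4o-class list prices ($2.50 in, $10.00 out).
bill = monthly_bill(500_000_000, 100_000_000, 2.50, 10.00)

if __name__ == "__main__":
    print(f"Estimated monthly bill: ${bill:,.2f}")
    print(f"1M tokens is about {approx_words(1_000_000):,.0f} words")
```

Here the input side contributes $1,250 and the output side $1,000, for $2,250 in total, which shows why the two buckets need to be estimated separately.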
Where the data comes from
Every figure is transcribed from the provider's own public pricing page, linked in the Source column, on the verification date shown. These are standard list prices for the listed model tier; they do not include volume discounts, committed-use or enterprise agreements, batch-API discounts, prompt-caching rebates, or regional surcharges, any of which can move your effective rate substantially. We pin a model "class" (e.g. Sonnet-class, GPT-4o-class) rather than chasing every point release, because the headline price is what matters for planning.
Frequently asked questions
Are these the real prices I'll pay?
They are the public list prices on the verification date. Your effective rate can be lower with batch APIs, prompt caching, or committed-use discounts, and higher with premium regions or long-context surcharges. Treat the table as a planning baseline and confirm your tier on the provider's billing page.
Why is output so much more expensive than input?
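Generating tokens is autoregressive: the model produces them one at a time, so output dominates the compute and latency of a request. Providers price that in. At the 4–5× ratios of the flagship rows above, output becomes the larger share of the bill once output volume exceeds roughly a fifth to a quarter of input volume, so it can drive the bill even when it is a minority of total tokens.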
What does the context column mean for cost?
Context is a capacity limit, not a price. A 1M-token context window lets you send a very large prompt, but you still pay the input rate for every token you actually send. Bigger context only costs more when you fill it.
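A small sketch of that distinction, using the Gemini Flash-class row ($0.30 per 1M input tokens, 1M-token context). The function name is hypothetical, "1M" is taken to mean 1,000,000 tokens, and the check is simplified: it looks only at the prompt and ignores any room a provider reserves for output or any long-context surcharge.

```python
def input_cost_for_prompt(prompt_tokens, context_limit, input_price_per_m):
    """Input-side cost in USD for one request.

    The context limit only decides whether the prompt fits; the cost
    depends solely on the tokens actually sent.
    """
    if prompt_tokens > context_limit:
        raise ValueError(
            f"prompt of {prompt_tokens} tokens exceeds context limit of {context_limit}"
        )
    return prompt_tokens / 1_000_000 * input_price_per_m


CONTEXT_LIMIT = 1_000_000   # assumed reading of "1M tok"
INPUT_PRICE = 0.30          # USD per 1M input tokens, from the table

small_prompt = input_cost_for_prompt(20_000, CONTEXT_LIMIT, INPUT_PRICE)
full_window = input_cost_for_prompt(1_000_000, CONTEXT_LIMIT, INPUT_PRICE)

if __name__ == "__main__":
    print(f"20K-token prompt: ${small_prompt:.4f}")
    print(f"Full 1M window:   ${full_window:.4f}")
```

The same model with the same window costs fifty times more on the input side when the window is filled than when a 20K-token prompt is sent.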
Can I override these prices?
Yes — that is the whole point. Every price here is a convenience default. In the token cost calculator and the API vs. self-hosting comparator you can type your own negotiated rate, so the math stays correct even if a default goes stale.
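One way to picture defaults with overrides is sketched below. This is not the calculators' actual implementation; the function name and the negotiated rate of $2.00 / $8.00 are hypothetical, and only the default prices come from the table.

```python
# Default list prices from the table, USD per 1,000,000 tokens: (input, output).
DEFAULT_PRICES = {
    "Claude Sonnet-class": (3.00, 15.00),
    "GPT-4o-class": (2.50, 10.00),
    "Gemini Flash-class": (0.30, 2.50),
    "Llama-70B via managed API": (0.60, 0.80),
}


def effective_prices(overrides=None):
    """Return the defaults with any user-supplied rates layered on top.

    The defaults are copied, so an override never mutates the shared table.
    """
    prices = dict(DEFAULT_PRICES)
    prices.update(overrides or {})
    return prices


# Hypothetical negotiated rate for one model; all other rows keep their defaults.
prices = effective_prices({"GPT-4o-class": (2.00, 8.00)})

if __name__ == "__main__":
    for model, (inp, out) in prices.items():
        print(f"{model}: ${inp:.2f} in / ${out:.2f} out per 1M tokens")
```

Keeping overrides separate from the defaults means a stale default affects only the rows you have not replaced with your own rate.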
Disclaimer. LLMTCO provides cost estimates and planning tools for informational purposes only. AI API and GPU prices change frequently; bundled defaults reflect publicly listed prices as of the verification date shown (Jun 25, 2026) and may be out of date — always confirm current pricing with the provider. These figures are estimates, not financial, tax, or procurement advice, and do not capture every real-world factor (latency, reliability, compliance, data privacy, engineering time).