GPU Pricing Table for LLM Inference

This table lists inference GPUs with their VRAM, hourly cloud rental rates across providers, an approximate purchase price, and the headline specs that drive throughput. Cloud rates are shown for RunPod, Lambda and AWS; a dash (—) means a provider does not list that card. Prices are as of Jun 25, 2026. Hourly rates move with availability, so confirm the live rate before you commit.

Inference GPU pricing & specs — verified Jun 25, 2026
| GPU | VRAM (GB) | RunPod $/hr | Lambda $/hr | AWS $/hr | Buy $ | TFLOPS fp16 | TDP (W) | Verified | Source |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NVIDIA A100 80GB | 80 | $1.59 | $1.29 | $4.10 | $15,000 | 312 | 400 | Jun 25, 2026 | pricing ↗ |
| NVIDIA H100 80GB | 80 | $2.99 | $2.49 | $6.88 | $28,000 | 990 | 700 | Jun 25, 2026 | pricing ↗ |
| NVIDIA L40S 48GB | 48 | $0.99 | $0.89 | $1.96 | $9,000 | 362 | 350 | Jun 25, 2026 | pricing ↗ |
| NVIDIA RTX 4090 24GB | 24 | $0.69 | — | — | $1,800 | 165 | 450 | Jun 25, 2026 | pricing ↗ |

Turn an hourly rate into a monthly bill with the GPU cloud cost calculator, or compare renting against buying over several years with the GPU TCO calculator.

Cloud rent vs. buy economics

Two very different cost structures sit in this table. Renting a GPU in the cloud is a pure operating cost: you pay by the hour, with no capital outlay, and stop paying the moment you shut the instance down, which is ideal for spiky or short-lived workloads. Buying the card is a capital cost you amortize over its life, plus power, cooling, hosting and maintenance. The crossover is largely about utilization: at, say, $1.59/hr a card costs about $1,160/month if you run it 24×7 (roughly 730 hours), so a $15,000 card pays for itself in about 13 months of continuous use, before power and hosting, but only if you keep it busy. Idle owned hardware is the most expensive compute there is.
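The payback arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not the site's calculator: the function names are hypothetical, a month is assumed to be 730 hours (8,760 hours per year divided by 12), and power, cooling and hosting are deliberately left out.

```python
HOURS_PER_MONTH = 730  # assumption: average month, 8,760 hours / 12


def monthly_rental_cost(rate_per_hour: float) -> float:
    """Monthly bill for renting one card 24x7 at the given hourly rate."""
    return rate_per_hour * HOURS_PER_MONTH


def simple_payback_months(buy_price: float, rate_per_hour: float) -> float:
    """Months of avoided 24x7 rent needed to equal the purchase price.

    Ignores power, cooling, hosting and maintenance.
    """
    return buy_price / monthly_rental_cost(rate_per_hour)


# A100 80GB figures from the table: $1.59/hr rental, $15,000 to buy.
a100_monthly = monthly_rental_cost(1.59)
a100_payback = simple_payback_months(15_000, 1.59)
print(f"${a100_monthly:,.0f}/month, payback in {a100_payback:.1f} months")
```

With the table's A100 figures this gives roughly $1,161 per month and a payback just under 13 months of continuous use.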

How to read this table

VRAM is the hard constraint: it decides which models fit (see the open-weight VRAM table). The three cloud columns are list rates for on-demand instances; spot or interruptible pricing is often far cheaper, and rates vary by region and availability, so treat these as a snapshot, not a quote. Buy $ is an approximate street price for the card alone, not a complete server. TFLOPS fp16 is a rough proxy for raw throughput (higher is faster), and TDP is the card's rated power draw, which drives your electricity and cooling bill. To estimate energy cost, convert watts to kilowatts (divide by 1,000), then multiply by run-hours and your $/kWh.
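As a rough sketch of the energy estimate, the snippet below converts TDP from watts to kilowatts before multiplying by hours and price. The function name and the $0.12/kWh rate are assumptions for illustration; treating TDP as the actual draw gives an upper-bound figure, since a card rarely sits at its rated power the whole time.

```python
def energy_cost(tdp_watts: float, hours: float, price_per_kwh: float) -> float:
    """Upper-bound electricity cost: watts -> kW, times hours, times $/kWh."""
    kilowatts = tdp_watts / 1000
    return kilowatts * hours * price_per_kwh


# H100 80GB (700 W TDP from the table) running a 730-hour month
# at a hypothetical $0.12/kWh electricity rate.
h100_month = energy_cost(tdp_watts=700, hours=730, price_per_kwh=0.12)
print(f"${h100_month:.2f} per month")
```

Under those assumptions the H100 uses 511 kWh in a month, or about $61 of electricity before cooling overhead.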

Where the data comes from

Hourly rates and the verification date come from the provider pricing pages linked in the Source column; the specs (VRAM, TFLOPS, TDP) are the manufacturer's published figures, and the Buy $ column uses approximate street prices. Cloud pricing is the most volatile data on this site, because providers change rates frequently and availability shifts hour to hour, so a ⚠️ marks any row that has aged past our staleness threshold. As everywhere on the site, these are convenience defaults: every rate is an editable input in the calculators, so your estimate stays correct even when a default drifts.
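A staleness check of this kind could look something like the sketch below. The threshold value is not stated on this page, so the 30 days used here is purely a placeholder, and the function names are hypothetical rather than the site's own code.

```python
from datetime import date

# Assumption: placeholder value; the site's real staleness threshold is not stated.
STALENESS_THRESHOLD_DAYS = 30


def is_stale(verified: date, today: date,
             threshold_days: int = STALENESS_THRESHOLD_DAYS) -> bool:
    """True when a row's verification date is older than the threshold."""
    return (today - verified).days > threshold_days


def row_label(gpu: str, verified: date, today: date) -> str:
    """Prefix the GPU name with a warning marker when the row is stale."""
    return f"⚠️ {gpu}" if is_stale(verified, today) else gpu


verified_on = date(2026, 6, 25)
print(row_label("NVIDIA H100 80GB", verified_on, today=date(2026, 7, 10)))
print(row_label("NVIDIA H100 80GB", verified_on, today=date(2026, 8, 1)))
```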

Frequently asked questions

Why does the price of the same GPU vary so much across providers?

Hyperscalers like AWS bundle in networking, managed services, support and SLAs, so their on-demand rate is typically higher than specialist GPU clouds such as RunPod or Lambda. The right choice depends on whether you need that surrounding platform or just raw compute.

What does a dash (—) in a cloud column mean?

It means that provider did not publicly list that GPU as of the verification date, not that it is free. Availability changes often, so a card missing today may appear later.

When is buying cheaper than renting?

Roughly when you can keep the card busy near 24×7 for longer than its payback period — often about a year of continuous use for high-end cards. Below that utilization, renting usually wins because you stop paying when idle. The GPU TCO calculator models the full multi-year comparison including power.
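To see how utilization stretches the payback period, here is a small sketch that nets electricity out of the avoided rent. It is not the GPU TCO calculator: the function name, the 730-hour month and the $0.12/kWh rate are assumptions, cooling and hosting are ignored, and the card is assumed to draw power only while busy.

```python
HOURS_PER_MONTH = 730  # assumption: average month, 8,760 hours / 12


def payback_months(buy_price: float, rent_per_hour: float, tdp_watts: float,
                   price_per_kwh: float, utilization: float) -> float:
    """Months until avoided rent, net of electricity, covers the purchase price."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    # Each busy hour saves the rental rate but costs the owner electricity.
    net_saving_per_hour = rent_per_hour - tdp_watts / 1000 * price_per_kwh
    if net_saving_per_hour <= 0:
        return float("inf")  # owning never pays back at these rates
    return buy_price / (net_saving_per_hour * HOURS_PER_MONTH * utilization)


# A100 80GB from the table: $15,000 to buy, $1.59/hr to rent, 400 W TDP.
for busy in (1.0, 0.5, 0.25):
    months = payback_months(15_000, 1.59, 400, 0.12, busy)
    print(f"{busy:.0%} busy: {months:.1f} months")
```

Under these assumptions the payback period scales inversely with utilization: halving the busy hours doubles the months needed, which is why lightly used hardware tends to favor renting.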

Do these rates include electricity?

Cloud rates do: power is baked into the hourly price. The Buy $ column does not: if you own the hardware, add energy (TDP in watts ÷ 1,000 × hours × $/kWh), cooling and hosting. The TDP column gives you the watts to start from.

Disclaimer. LLMTCO provides cost estimates and planning tools for informational purposes only. AI API and GPU prices change frequently; bundled defaults reflect publicly listed prices as of the verification date shown (Jun 25, 2026) and may be out of date — always confirm current pricing with the provider. These figures are estimates, not financial, tax, or procurement advice, and do not capture every real-world factor (latency, reliability, compliance, data privacy, engineering time).