GPU Cloud Cost Calculator
Estimate the real monthly bill for a rented cloud GPU. Pick an instance or type your hourly rate, set how much of the month it is actually working, and get the monthly cost, the annual cost, and the effective cost per hour of genuine use, which is the number that tells you whether idle time is quietly inflating your spend. Numbers update as you type. Rates were verified on Jun 25, 2026 against public provider pricing; the rate is editable.
At 30% utilization the GPU does about 219 productive hours out of 730 in the month.
How it works
Cloud GPUs are billed by the hour, so the monthly cost is almost embarrassingly simple: multiply the hourly rate by the number of hours you keep the instance, then add any fixed overhead for orchestration, storage, egress, or engineering time. The only judgement call is the hours figure, and that depends entirely on your billing model. If you pay for a dedicated instance that runs continuously, the hours are fixed at the full month and there is no way to dial them down without releasing the machine. If you use spot, serverless, or autoscaled capacity that spins down when idle, then your effective hours are the month times your true duty cycle — and lowering utilization genuinely lowers the bill.
We model this with a single utilization slider so both cases are visible. Set utilization to 100% and you see the cost of an always-on reserved instance. Set it to your real duty cycle and, for usage-based billing, you see what you actually pay. The annual figure simply multiplies the monthly by twelve, which is the right horizon for budgeting and for comparing against the up-front capital of buying a card.
The most useful output is the effective cost per hour of use. It answers a different question from the sticker rate: not "what does an hour cost?" but "what does each productive hour really cost me, once idle time and overhead are folded in?" When overhead is zero and billing is purely usage-based, it equals the hourly rate. As soon as you add overhead, or pay for idle reserved time, it rises — and a high effective rate is a strong signal that you are either over-provisioned or should be looking at owned hardware.
Monthly cost = hourly rate × 730 × utilization + overhead (usage-based billing; a reserved instance is billed for all 730 hours regardless of utilization)
Annual cost = monthly cost × 12
Productive hours = 730 × utilization
Effective $/hour-of-use = monthly cost ÷ productive hours
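As a rough illustration, the four formulas above can be written as a few Python functions. This is a minimal sketch assuming usage-based billing; the function and parameter names are our own for illustration and are not taken from the calculator's source.

```python
HOURS_PER_MONTH = 730  # conventional month: 365 * 24 / 12


def monthly_cost(hourly_rate, utilization, overhead=0.0):
    """Usage-based monthly bill: rate x 730 x utilization + overhead."""
    return hourly_rate * HOURS_PER_MONTH * utilization + overhead


def annual_cost(hourly_rate, utilization, overhead=0.0):
    """Annual bill: twelve times the monthly bill."""
    return monthly_cost(hourly_rate, utilization, overhead) * 12


def productive_hours(utilization):
    """Hours per month the GPU spends doing real work."""
    return HOURS_PER_MONTH * utilization


def effective_rate(hourly_rate, utilization, overhead=0.0):
    """Monthly bill divided by productive hours."""
    hours = productive_hours(utilization)
    if hours <= 0:
        raise ValueError("utilization must be greater than 0")
    return monthly_cost(hourly_rate, utilization, overhead) / hours
```

Utilization is passed as a fraction (0.3 for 30%). The guard in `effective_rate` is there because the effective rate is undefined when the GPU does no productive work at all.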
A worked example
Using the defaults, a GPU at $1.50/hour run at 30% utilization with no extra overhead:
- Monthly: $1.50 × 730 × 0.3 = $328.50/mo
- Annual: $328.50 × 12 = $3,942/yr
- Productive hours: 730 × 0.3 = 219 hours
- Effective $/hour-of-use: $328.50 ÷ 219 = $1.50
Because overhead is zero and we are assuming usage-based billing, the effective rate lands right back at the $1.50 sticker, exactly what you would hope. Now add $200/mo of overhead and the effective rate rises to about $2.41 ($528.50 ÷ 219). Switch instead to a reserved instance billed for the full 730 hours, with no overhead, and it reaches $5.00 ($1,095 ÷ 219). That gap is the cost of idleness and operations that the hourly rate hides.
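To make the overhead and reserved-instance scenarios concrete, here is a small Python sketch. It assumes a reserved instance is billed for all 730 hours while productive hours still follow utilization; the `reserved` flag and the function name are illustrative assumptions, not the calculator's actual interface.

```python
HOURS_PER_MONTH = 730  # conventional month: 365 * 24 / 12


def effective_hourly(hourly_rate, utilization, overhead=0.0, reserved=False):
    """Monthly bill divided by productive hours.

    reserved=True bills every hour of the month regardless of utilization
    (an assumption based on the text's description of reserved billing).
    """
    if utilization <= 0:
        raise ValueError("utilization must be greater than 0")
    billed_hours = HOURS_PER_MONTH if reserved else HOURS_PER_MONTH * utilization
    monthly = hourly_rate * billed_hours + overhead
    return monthly / (HOURS_PER_MONTH * utilization)


baseline = effective_hourly(1.5, 0.3)                       # usage-based, no overhead
with_overhead = effective_hourly(1.5, 0.3, overhead=200.0)  # usage-based, $200/mo overhead
reserved = effective_hourly(1.5, 0.3, reserved=True)        # billed for all 730 hours

print(f"{baseline:.2f} {with_overhead:.2f} {reserved:.2f}")  # 1.50 2.41 5.00
```

The reserved case works out to exactly the sticker rate divided by utilization, so a GPU that is busy 30% of the time costs a little over three times its listed rate per productive hour.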
Once you have a monthly figure, compare it against owning the same card in the GPU TCO calculator, and against paying the API per token in the API vs self-hosting comparator. To turn this hourly rate into a price per million tokens, feed it into the throughput cost calculator; to size capacity for a target request rate, use the throughput planner. Browse all current rates in the GPU pricing dataset, and see full derivations in the methodology.
Frequently asked questions
How is the monthly cost of a rented cloud GPU calculated?
Hourly rate × hours in the month × utilization, plus any flat overhead. We use a conventional 730-hour month (365×24÷12). At the defaults that is $1.50 × 730 × 30% = $328.50/month. If you reserve the instance 24/7 you are billed for every hour whether or not it is doing work, so utilization only reduces the bill if you actually stop and start the machine.
Does a lower utilization really cut my bill?
Only if your billing is genuinely usage-based: spot, serverless, or autoscaled capacity that you spin down when idle. If you hold a dedicated instance reserved around the clock, you pay for the full 730 hours regardless, and "utilization" simply describes how much useful work you got for that fixed spend. The "effective cost per hour of use" card makes that visible: it divides the whole monthly bill by the hours the GPU was productive.
What is the effective cost per hour of use?
It is the total monthly cost divided by the hours the GPU spent doing real work. At the defaults — $328.50/mo over 219 productive hours — that is $1.50/productive hour. When overhead is zero and billing is purely usage-based it equals the raw hourly rate; add overhead or pay for idle reserved time and it climbs above the sticker rate.
Cloud rental or buying the GPU outright?
Renting wins for bursty, experimental, or short-lived workloads — you pay only for what you use and avoid capital risk. Owning wins at high, sustained, long-term utilization, where you stop paying the provider's margin. Put the rental monthly figure here next to the owned figure from the GPU TCO calculator to see the crossover for your duty cycle.
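One way to picture the crossover is to solve the usage-based rental formula for the utilization at which it matches an owned monthly cost. The Python sketch below does that under simplifying assumptions: the owned cost is treated as a flat monthly figure that does not vary with utilization, and the $600/mo input is a purely hypothetical placeholder, not an output of the GPU TCO calculator.

```python
HOURS_PER_MONTH = 730  # conventional month: 365 * 24 / 12


def break_even_utilization(hourly_rate, owned_monthly, rental_overhead=0.0):
    """Utilization at which usage-based rental equals the owned monthly cost.

    Solves hourly_rate * 730 * u + rental_overhead = owned_monthly for u.
    A result above 1.0 means renting stays cheaper even at 100% utilization.
    """
    if hourly_rate <= 0:
        raise ValueError("hourly_rate must be positive")
    return (owned_monthly - rental_overhead) / (hourly_rate * HOURS_PER_MONTH)


# Hypothetical input: $600/mo stands in for whatever owned figure you compute.
u = break_even_utilization(hourly_rate=1.5, owned_monthly=600.0)
print(f"Break-even utilization: {u:.1%}")  # Break-even utilization: 54.8%
```

Below the break-even duty cycle, renting is cheaper under these assumptions; above it, owning is. A real comparison would also need the factors a flat monthly figure hides, such as power draw that scales with load.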
How do I turn this into a cost per token?
You need throughput. Once you know sustained tokens-per-second for your model on this GPU, the throughput cost calculator converts the hourly rate into dollars per million tokens — and shows how idle time inflates it.
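The underlying arithmetic is simple enough to sketch in Python. This is the generic conversion from an hourly cost to dollars per million tokens, not necessarily the exact method the throughput cost calculator uses, and the 1,000 tokens-per-second figure is a hypothetical input chosen only for illustration.

```python
def cost_per_million_tokens(cost_per_hour, tokens_per_second):
    """Convert an hourly cost into dollars per million tokens generated."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return cost_per_hour / tokens_per_hour * 1_000_000


# Hypothetical throughput of 1,000 tokens/s on a $1.50/hour GPU.
at_sticker = cost_per_million_tokens(1.5, 1000)

# Same GPU reserved 24/7 at 30% utilization: effective rate is 1.5 * 730 / 219 = 5.00.
at_effective = cost_per_million_tokens(1.5 * 730 / 219, 1000)

print(f"{at_sticker:.3f} {at_effective:.3f}")  # 0.417 1.389
```

Feeding in the effective cost per hour of use instead of the sticker rate is what exposes the idle-time penalty: the per-token price scales by the same factor as the effective rate.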
Are these hourly rates current?
The bundled cloud rates were verified on Jun 25, 2026 against public provider pricing and each links to its source. They are convenience defaults only — the hourly field is editable, so the calculator stays correct even when a default goes stale. Always confirm the current rate, region, and commitment terms with the provider.
Disclaimer. LLMTCO provides cost estimates and planning tools for informational purposes only. AI API and GPU prices change frequently; bundled defaults reflect publicly listed prices as of the verification date shown (Jun 25, 2026) and may be out of date — always confirm current pricing with the provider. These figures are estimates, not financial, tax, or procurement advice, and do not capture every real-world factor (latency, reliability, compliance, data privacy, engineering time).