GPU Cloud vs. Colocation: A Total Cost Analysis
The decision between GPU cloud and colocation isn't just about hourly rates versus capital expenditure. It's about utilization patterns, operational capability, timeline to deployment, and the total cost of ownership across a realistic planning horizon. Here's the analysis most companies skip.
GPU Cloud: The True Cost
Cloud GPU pricing has three tiers, and the economics differ dramatically:
On-demand: NVIDIA H100 instances cost $3-5 per GPU-hour on major clouds (AWS, GCP, Azure). At 24/7 utilization, that's $26K-$44K per GPU per year. For a typical 8-GPU training cluster, you're looking at $210K-$350K annually. The advantage: instant availability, no commitment, and you can scale to zero when not training.
Reserved instances: 1-3 year commitments reduce costs by 30-60%. An H100 reserved instance might cost $1.50-$2.50 per GPU-hour, bringing the annual cost per GPU to $13K-$22K. The catch: you're paying whether you use it or not. The break-even utilization is roughly one minus the discount, so with 30-60% discounts a reservation that runs below about 40-70% utilization costs more than paying on-demand for the same hours.
Spot/preemptible: 60-80% discounts from on-demand, but instances can be terminated with 30-120 seconds notice. Useful for fault-tolerant training jobs with checkpointing. Not viable for inference or time-sensitive workloads.
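The per-tier arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch using the hourly rates quoted in the text; the function names are illustrative, and real quotes will vary by provider, region, and commitment term.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def annual_cost_per_gpu(rate_per_gpu_hour: float, utilization: float = 1.0) -> float:
    """Annual cost of one GPU that is billed only for the hours it runs."""
    return rate_per_gpu_hour * HOURS_PER_YEAR * utilization


def reserved_break_even_utilization(reserved_rate: float, on_demand_rate: float) -> float:
    """Utilization below which on-demand is cheaper than an always-billed reservation."""
    return reserved_rate / on_demand_rate


# On-demand at 24/7 use, low and high ends of the quoted range.
print(f"${annual_cost_per_gpu(3.00):,.0f}")      # $26,280
print(f"${annual_cost_per_gpu(5.00):,.0f}")      # $43,800
print(f"${8 * annual_cost_per_gpu(3.00):,.0f}")  # $210,240 for 8 GPUs

# Reserved at the quoted rates (billed for every hour regardless of use).
print(f"${annual_cost_per_gpu(1.50):,.0f}")      # $13,140
print(f"${annual_cost_per_gpu(2.50):,.0f}")      # $21,900

# A 30% discount on a $3.00 rate breaks even at about 70% utilization.
print(f"{reserved_break_even_utilization(2.10, 3.00):.0%}")  # 70%
```

Data transfer, storage, and multi-node networking charges are not modeled here; they would be added on top of these compute figures.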
Beyond compute costs, factor in: data transfer fees (egress charges add up fast when moving large datasets), storage costs for training data and model checkpoints, and networking costs for multi-node training.
Colocation: The True Cost
Colocation for GPU workloads involves multiple cost components:
Capital expenditure: An 8-GPU H100 server (DGX H100 or equivalent) costs $250K-$350K. For a small training cluster (4 servers, 32 GPUs), budget $1M-$1.4M in hardware. Amortize over 3-4 years.
Colocation fees: High-density colocation (30-50 kW per rack) costs $150-$300 per kW per month in major markets. A 4-server cluster drawing ~40 kW costs $6K-$12K monthly in colocation fees, or $72K-$144K annually.
Networking: Cross-connects, internet transit, and cloud on-ramps for hybrid architectures. Budget $2K-$5K monthly.
Operations: Remote hands, hardware maintenance, and a sparing strategy (keep replacement parts on-site). Budget $3K-$5K monthly for a small deployment, or hire an infrastructure engineer ($150K-$200K per year fully loaded).
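Total 3-year cost for 32 GPUs, using mid-range figures: $1.2M hardware + $360K colocation + $120K networking + $150K operations = ~$1.83M, or ~$19K per GPU per year. Compare this to $26K-$44K per GPU per year for on-demand cloud at 24/7 use, or $13K-$22K on reserved instances. Note what the colocation figure assumes: contracted operations rather than a dedicated hire, a 3-year write-off, and no resale value for the hardware.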
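The total above can be checked, and re-run with your own quotes, using a small model like the following. It is a sketch under stated assumptions: the `ColoPlan` class and its field names are hypothetical, the annual figures are the mid-range values from the text ($120K colocation, $40K networking, $50K operations per year), and financing costs and residual hardware value are ignored.

```python
from dataclasses import dataclass


@dataclass
class ColoPlan:
    gpus: int
    hardware_usd: float          # one-time purchase
    colo_usd_per_year: float     # space and power
    network_usd_per_year: float  # cross-connects, transit, on-ramps
    ops_usd_per_year: float      # remote hands, maintenance, spares
    years: int = 3               # amortization horizon

    def total_cost(self) -> float:
        recurring = (self.colo_usd_per_year
                     + self.network_usd_per_year
                     + self.ops_usd_per_year)
        return self.hardware_usd + recurring * self.years

    def cost_per_gpu_year(self) -> float:
        return self.total_cost() / (self.gpus * self.years)


plan = ColoPlan(gpus=32, hardware_usd=1_200_000,
                colo_usd_per_year=120_000,
                network_usd_per_year=40_000,
                ops_usd_per_year=50_000)

print(f"${plan.total_cost():,.0f}")         # $1,830,000
print(f"${plan.cost_per_gpu_year():,.1f}")  # $19,062.5

# Same hardware kept in service for 4 years instead of 3.
plan_4y = ColoPlan(32, 1_200_000, 120_000, 40_000, 50_000, years=4)
print(f"${plan_4y.cost_per_gpu_year():,.1f}")  # $15,937.5
```

The 4-year case shows how sensitive the per-GPU figure is to the amortization horizon: the hardware purchase is the largest single term, so every extra year of service lowers the annual cost noticeably.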
The Utilization Crossover
The economics favor different approaches at different utilization levels:
Below 30% utilization: Cloud on-demand wins. You're only paying when you compute. Colocation hardware sits idle, depreciating.
30-60% utilization: Cloud reserved instances and colocation are roughly equivalent. The decision comes down to operational capability and timeline.
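Above 60% utilization: Colocation wins on pure cost against on-demand cloud. At 80%+ sustained utilization, the ~$19K per GPU per year colocation figure undercuts 24/7 on-demand pricing ($26K-$44K) by roughly 25-55%. Against reserved instances ($13K-$22K) the gap is much narrower, and colocation only pulls clearly ahead when the hardware stays in service beyond the 3-year horizon or the reserved discount is at the shallow end of the range.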
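The utilization bands above follow from comparing one usage-based cost (on-demand) against two fixed costs (reserved and colocation). The sketch below makes that comparison explicit; the function names are illustrative, and the example rates ($4.00 on-demand, $2.00 reserved, $19,000 per GPU per year for colocation) are assumptions picked from within the ranges quoted earlier.

```python
HOURS_PER_YEAR = 24 * 365


def annual_costs(utilization, on_demand_rate, reserved_rate, colo_per_gpu_year):
    """Annual cost per GPU under each model at a given utilization (0.0-1.0)."""
    return {
        "on_demand": on_demand_rate * HOURS_PER_YEAR * utilization,  # pay per hour used
        "reserved": reserved_rate * HOURS_PER_YEAR,                  # pay for every hour
        "colocation": colo_per_gpu_year,                             # fixed regardless of use
    }


def cheapest(costs):
    """Name of the lowest-cost option."""
    return min(costs, key=costs.get)


def crossover_utilization(fixed_cost_per_gpu_year, on_demand_rate):
    """Utilization at which on-demand spend equals a fixed annual cost."""
    return fixed_cost_per_gpu_year / (on_demand_rate * HOURS_PER_YEAR)


# Where colocation at ~$19K per GPU-year overtakes on-demand.
print(f"{crossover_utilization(19_000, 5.00):.0%}")  # 43%
print(f"{crossover_utilization(19_000, 3.00):.0%}")  # 72%

for u in (0.2, 0.5, 0.8):
    costs = annual_costs(u, on_demand_rate=4.00, reserved_rate=2.00,
                         colo_per_gpu_year=19_000)
    print(f"{u:.0%}", cheapest(costs), {k: round(v) for k, v in costs.items()})
```

With these inputs the crossover against on-demand lands between roughly 43% and 72% utilization depending on the hourly rate, so the band boundaries should be read as approximate. Reserved and colocation are both fixed costs, so which of the two is cheaper depends on the negotiated rate and amortization period rather than on utilization.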
The key question: what's your realistic utilization? If you're training models continuously, utilization is high and colocation makes sense. If you're running inference with variable demand, or training intermittently, cloud flexibility has real value that offsets the higher per-hour cost.
Many organizations land on a hybrid: owned/colocated infrastructure for baseline workloads and cloud burst capacity for peak demand and experimentation.
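One way to size the owned baseline in a hybrid setup is to price every candidate baseline against a demand profile and pick the cheapest. The following is a rough sketch, not a planning tool: the demand profile is hypothetical (16 GPUs around the clock, spiking to 48 GPUs for 6 hours a day), the $4.00 on-demand rate and $19,062.50 owned cost per GPU per year are assumptions drawn from the figures above, and burst capacity is assumed to be always available.

```python
HOURS_PER_YEAR = 24 * 365


def hybrid_annual_cost(demand_by_hour, owned_gpus, owned_cost_per_gpu_year, on_demand_rate):
    """Owned GPUs absorb demand up to `owned_gpus`; the excess bursts to on-demand."""
    burst_gpu_hours = sum(max(0, d - owned_gpus) for d in demand_by_hour)
    # Scale the sampled window (e.g. one day) up to a full year.
    scale = HOURS_PER_YEAR / len(demand_by_hour)
    return (owned_gpus * owned_cost_per_gpu_year
            + burst_gpu_hours * scale * on_demand_rate)


def best_baseline(demand_by_hour, owned_cost_per_gpu_year, on_demand_rate):
    """Owned GPU count that minimizes total annual cost for this demand profile."""
    candidates = range(0, max(demand_by_hour) + 1)
    return min(candidates,
               key=lambda n: hybrid_annual_cost(demand_by_hour, n,
                                                owned_cost_per_gpu_year,
                                                on_demand_rate))


# Hypothetical daily profile: 18 hours at 16 GPUs, 6 hours at 48 GPUs.
demand = [16] * 18 + [48] * 6

n = best_baseline(demand, owned_cost_per_gpu_year=19_062.5, on_demand_rate=4.00)
print(n)                                                       # 16
print(f"${hybrid_annual_cost(demand, n, 19_062.5, 4.00):,.0f}")    # $585,320
print(f"${hybrid_annual_cost(demand, 0, 19_062.5, 4.00):,.0f}")    # $840,960 (all cloud)
print(f"${hybrid_annual_cost(demand, 48, 19_062.5, 4.00):,.0f}")   # $915,000 (all owned)
```

In this example the cheapest plan owns exactly the always-on baseline and rents the peak, because the extra 32 GPUs would be busy only 25% of the time, well under the utilization at which ownership pays off.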
Beyond Cost: The Factors That Actually Decide
Time to deployment: Cloud GPU instances are available in minutes. Colocation deployment takes 3-6 months (procurement, facility build-out, installation). If you need compute now, cloud is the only option.
Operational maturity: Owning GPU infrastructure requires expertise in hardware management, cooling, power distribution, and networking that most software companies don't have. If you don't have (or can't hire) infrastructure engineers, the operational burden of colocation outweighs the cost savings.
Data sovereignty: If your training data is subject to residency requirements or you need physical control of the hardware for security compliance, colocation provides guarantees that multi-tenant cloud can't.
GPU generation risk: GPU technology evolves rapidly. Hardware purchased today may be outperformed by next-generation GPUs within 18-24 months. On-demand cloud lets you move to a new generation as soon as your provider offers it; owned hardware locks you into a generation until the amortization period ends, and multi-year reserved commitments carry a similar lock-in.
The right answer is rarely all-cloud or all-colo. It comes from understanding your workload profile, utilization patterns, operational capability, and growth trajectory, then designing an infrastructure strategy that optimizes across all four dimensions.