You are reading a thought. You are reading what looks like weightless prose — text that appears on a glowing rectangle, costing you nothing more than a moment of attention. No postage. No printing press. No physical delivery. The “cloud” provided it, as the cloud provides all things: effortlessly, invisibly, for free.
That is a lie your screen is telling you.
The Cloud Is a Building
Let’s begin with the metaphor everyone uses and immediately discard it. “The cloud.” There is nothing cloudlike about what happened to produce this article. There is no vapour, no atmosphere, no gentle condensation of thought from the ether. The cloud is a warehouse. Specifically, it is a very large, very loud, very hot warehouse — probably somewhere with cheap electricity and cold groundwater — full of racks of silicon that runs so hot it would melt itself without constant, industrial-scale intervention.
When the human author of this post typed a prompt and pressed “generate,” here is what actually happened.
0.000 Seconds: The Request Leaves Your Device
A packet of data — your HTTP request, a few kilobytes — leaves your device over Wi-Fi or a cellular tower, traverses your internet service provider’s network, and enters a backbone peering point. From there it moves onto fibre-optic cable. Glass threads thinner than a human hair, carrying pulses of light at roughly 200,000 kilometres per second, cross continents and sometimes oceans.
Transoceanic cables are not magic. They are engineering. A cable like MAREA, running under the Atlantic between Virginia Beach and Bilbao, is a 6,600-kilometre physical object sitting on the floor of the ocean, capable of carrying roughly 224 terabits per second, with optical repeaters spaced every 60–80 km along its length — themselves powered by a direct current fed down the cable from shore. Your request physically traversed infrastructure that cost hundreds of millions of dollars to lay and costs millions more per year to maintain.
It arrives, after perhaps 60–80 milliseconds of latency, at a data centre.
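The latency is mostly physics plus routing. A minimal sketch of the arithmetic, using the figures above — the overhead factor is an assumption for illustration, not a measured value:

```python
# Back-of-envelope propagation delay for a transatlantic request.
SPEED_IN_FIBRE_KM_S = 200_000  # light in glass: roughly two-thirds of c
CABLE_LENGTH_KM = 6_600        # MAREA, Virginia Beach to Bilbao

one_way_ms = CABLE_LENGTH_KM / SPEED_IN_FIBRE_KM_S * 1000
print(f"Raw propagation, one way: {one_way_ms:.0f} ms")  # ~33 ms

# Observed latency is higher: terrestrial hops, routing, and queueing.
# Assumed here to roughly double the raw propagation time.
ROUTING_OVERHEAD = 2.0
print(f"With overhead: {one_way_ms * ROUTING_OVERHEAD:.0f} ms")  # ~66 ms
```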
The Data Centre
Picture the building. Not the marketing render with the glowing blue lights and the smiling engineer holding a tablet. The real building.
It is a low, wide structure, usually in an industrial zone, often near a river or a cold-climate location specifically because of the water. The car park is large. There are no windows on the server floor. There are loading docks. There are emergency diesel generators the size of shipping containers, idling in readiness. There are transformers humming at the fence line. There are cooling towers — several of them — squat stacks like a nuclear plant’s, emitting constant plumes of water vapour. You can hear the building from 200 metres away. It is a low, constant roar: the combined exhaust of hundreds of HVAC units pushing cold air over hot metal, extracting heat, rejecting it to the atmosphere.
This is where language lives.
The GPUs Wake Up
Your request is routed to a GPU cluster. In 2024–2025, large language model inference typically runs on NVIDIA H100 or A100 cards, or their successors. A single H100 SXM5 GPU has a thermal design power (TDP) of 700 watts. A typical inference server carries eight of them: 5,600 watts per node, before you count the CPUs, RAM, NVMe storage, and networking.
To generate this blog post — approximately 1,500 tokens of output at a typical modern LLM scale — we can estimate roughly as follows:
Model size: ~70 billion parameters (a mid-large open-weight model)
Inference passes: ~1,500 forward passes (one per output token)
FLOPs per token: ~2 × 70B = ~140 GFLOPs per token (rule of thumb: 2 × params)
Total FLOPs: ~210 TFLOPs for the full output
H100 peak FP16 (dense): ~990 TFLOPS (the oft-quoted 1,979 TFLOPS figure assumes structured sparsity)
Practical efficiency: ~40% (memory-bound inference)
Effective throughput: ~395 TFLOPS
Compute time: 210 / 395 ≈ 0.53 seconds of GPU time
Power draw: 700 W × 0.53 s ≈ 370 joules ≈ 0.1 Wh of GPU energy (compute only)

That figure looks small. A tenth of a watt-hour — less than leaving a 60 W lightbulb on for seven seconds.
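The whole estimate fits in a few lines of arithmetic. Here it is as a runnable sketch, using the article’s constants — the 40% efficiency is the assumption stated above, not a measured figure:

```python
# Re-deriving the per-post inference-energy estimate.
PARAMS = 70e9                  # ~70B-parameter model
TOKENS = 1_500                 # output length of this post
FLOPS_PER_TOKEN = 2 * PARAMS   # rule of thumb: ~2 FLOPs per parameter per token
PEAK_FLOPS = 990e12            # H100 dense FP16, no sparsity
EFFICIENCY = 0.40              # assumed utilisation for memory-bound inference
GPU_WATTS = 700                # H100 SXM5 TDP

total_flops = FLOPS_PER_TOKEN * TOKENS             # ~210 TFLOPs
seconds = total_flops / (PEAK_FLOPS * EFFICIENCY)  # ~0.53 s of GPU time
watt_hours = GPU_WATTS * seconds / 3600            # ~370 J -> ~0.10 Wh

print(f"{total_flops / 1e12:.0f} TFLOPs, {seconds:.2f} s, {watt_hours:.2f} Wh")
```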
But that is the compute alone. Now add the infrastructure.
The Infrastructure Tax
Data centres do not run GPUs in a vacuum. Every facility has a Power Usage Effectiveness (PUE) ratio — total facility power divided by IT equipment power. The industry average PUE is approximately 1.58; hyperscale data centres achieve closer to 1.1–1.2. Let’s use 1.4 for a modern, reasonably efficient facility.
GPU compute energy: ~0.1 Wh
PUE multiplier: × 1.4
Total facility energy: ~0.14 Wh per generation

Scale that to the volume of requests these systems handle — millions of generations per day — and you begin to see the industrial shape of the thing. A single large inference cluster running at capacity can draw 10–30 megawatts continuously. That is the electrical load of a small town, sustained 24 hours a day, 365 days a year, to predict the next word.
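As a sketch, the multiplier is one line; the daily volume below is an illustrative assumption, not a reported figure:

```python
# Facility energy = IT energy x PUE.
gpu_wh = 0.10   # per-generation GPU energy from the estimate above
PUE = 1.4       # assumed: between industry average (~1.58) and hyperscale (~1.1-1.2)

facility_wh = gpu_wh * PUE
print(f"Per generation: {facility_wh:.2f} Wh")  # ~0.14 Wh

generations_per_day = 5_000_000  # illustrative volume only
print(f"Per day at that volume: {facility_wh * generations_per_day / 1000:,.0f} kWh")
```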
⚡ The arithmetic of scale: GPT-3, with 175 billion parameters, requires approximately 350 GFLOPs per token, so a 1,000-token response takes roughly 350 TFLOPs of compute. At realistic data-centre efficiency, published per-query estimates fall somewhere between 0.0003 and 0.003 kWh — small per query, enormous in aggregate across hundreds of millions of daily requests.
The Water
This is the part that does not appear on your screen at all.
Cooling a data centre at scale requires water. Not metaphorically — physically. Hundreds of thousands of litres of it, evaporated into the atmosphere every day, drawn from municipal supplies or local aquifers.
There are two places water is used:
1. Cooling towers and evaporative chillers. Heat extracted from server rooms is transferred to water loops, which move it to cooling towers on the roof. In a cooling tower, that water partially evaporates — and the evaporation carries the heat away. A mid-sized data centre (say, 20 MW IT load) can consume 2–5 million litres of water per day this way. On a hot day, more.
2. Direct server cooling. Some modern AI clusters use direct liquid cooling — cold plates pressed against GPU dies, liquid flowing through them. This is more efficient thermally but still ultimately rejects heat somewhere — back to a cooling tower, or to the atmosphere.
Microsoft’s published Water Usage Effectiveness (WUE) for FY2022 was approximately 0.49 litres per kWh of IT energy consumed; Meta reported roughly 0.26 L/kWh for its global fleet, while Google’s, which leans more heavily on evaporative cooling, runs closer to 1.1 L/kWh. Let’s use 0.4 L/kWh as a reasonable estimate:
Energy consumed (generation): ~0.14 Wh = 0.00014 kWh
Water per kWh (est.): 0.4 litres
Water per generation: ~0.00006 litres ≈ 0.06 millilitres

Per query, that is a fraction of a millilitre. Across 10 million queries per day, it is roughly 560 litres per day — several bathtubs’ worth, daily, drawn from municipal water systems and evaporated into the sky to cool silicon that was predicting nouns.
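The same arithmetic, runnable — the WUE value is the assumed mid-range figure from above:

```python
# Water footprint: litres evaporated = facility kWh x WUE.
WUE_L_PER_KWH = 0.4           # assumed mid-range Water Usage Effectiveness
facility_kwh = 0.14 / 1000    # per-generation facility energy from above

litres = facility_kwh * WUE_L_PER_KWH
print(f"Per generation: {litres * 1000:.2f} mL")          # ~0.06 mL

QUERIES_PER_DAY = 10_000_000  # illustrative volume from the text
print(f"Per day at 10M queries: {litres * QUERIES_PER_DAY:.0f} L")  # ~560 L
```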
For larger models or longer outputs, the figure scales accordingly. The University of California, Riverside / University of Texas at Arlington study (2023) estimated that training GPT-3 consumed approximately 700,000 litres of freshwater. Training, not inference. One model. Once.
You are trading fresh water for text. Not a metaphor. A direct exchange. Every generation request is a small withdrawal from the same aquifers and treatment systems that fill taps and irrigate crops.
💧 The water is invisible to you. Your screen shows no water counter. The API response has no X-Water-Used header. The cost is real and it is externalized — borne by the watershed near the data centre, not by the user nor, typically, by the provider’s energy bill.
The Noise You Cannot Hear
Your screen is silent. A glowing rectangle, perhaps warm to the touch, emitting no sound.
The room that served you these words is not silent.
A full data-centre floor produces somewhere between 80 and 95 decibels at the aisle level — comparable to standing near a running lawnmower, sustained indefinitely. The sound comes from thousands of fans: small server fans spinning at 15,000–20,000 RPM, rack fans, and the blowers of CRAC (Computer Room Air Conditioning) units. The HVAC units on the roof contribute a lower-frequency roar. The cooling towers hiss with evaporating water. The transformers outside hum at twice the mains frequency — 100 or 120 Hz — a note so constant the workers stop noticing it.
Workers who spend shifts on the floor wear hearing protection as standard.
The building is full of people you will never interact with. Facilities engineers monitoring power distribution. Cooling technicians adjusting chiller setpoints. Security staff. Network operations centre analysts watching latency dashboards. All of them present, in the physical world, so that you can receive text on demand.
The Heat Your Screen Reflects
Now touch the back of your device.
Feel that warmth? That is the heat of your phone’s processor, your display’s backlight, your wireless radio. A small, local, polite version of the heat being generated — right now, somewhere — on your behalf. Your device’s heat is the echo. The source is elsewhere, scaled up by orders of magnitude, rejected not as gentle warmth through a glass screen but as tonnes of hot air blown from rooftop HVAC exhausts and water vapour rising from cooling towers.
The data centre’s waste heat is an industrial output. Some facilities pipe it to nearby district heating systems — a redeeming loop. Most vent it to the atmosphere. The carbon accounting depends on the grid: a data centre powered by hydroelectricity has a very different footprint from one on a coal grid. But the heat is always real, always produced, always has to go somewhere.
Estimating the Footprint of This Post
Let’s try to close the loop.
Generation energy (est.): ~0.14 Wh = 0.00014 kWh
Grid carbon intensity (US avg): ~0.386 kg CO₂ / kWh (EPA eGRID, latest published)
Carbon per generation: 0.00014 × 0.386 ≈ 0.00005 kg CO₂ ≈ 0.05 g CO₂
Water consumed (est.): ~0.06 mL per generation
Human reading time (this post): ~6 minutes
Page serving energy: ~0.002 kWh (CDN + client rendering, est.)
Total CO₂ (generation + serving): ~0.8 g CO₂ per reader, dominated by serving
Water: ~0.06 mL

A single Google search is estimated at approximately 0.2 g CO₂; estimates for an email run from roughly 0.3 g (spam) to about 4 g (a short legitimate message). This AI-generated post sits in a similar range per reader — but it was also generated by AI, adding a one-time generation cost on top of the serving cost incurred for every reader.
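Putting the ledger in one place — every constant below is an estimate carried over from earlier in the post:

```python
# End-to-end footprint of one post, per the estimates above.
GEN_KWH = 0.14 / 1000     # one-time generation energy
SERVE_KWH = 0.002         # per-reader serving energy (CDN + client, estimate)
GRID_KG_PER_KWH = 0.386   # US-average carbon intensity (EPA eGRID)
WUE_L_PER_KWH = 0.4       # assumed water-usage effectiveness

gen_g = GEN_KWH * GRID_KG_PER_KWH * 1000      # ~0.05 g CO2, once
serve_g = SERVE_KWH * GRID_KG_PER_KWH * 1000  # ~0.77 g CO2, per reader
water_ml = GEN_KWH * WUE_L_PER_KWH * 1000     # ~0.06 mL, once

print(f"Generation: {gen_g:.2f} g CO2 (once), {water_ml:.2f} mL water")
print(f"Serving:    {serve_g:.2f} g CO2 per reader")
```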
These are not frightening numbers in isolation. They are numbers that become frightening at scale, and at the rate of growth current trajectories imply.
🌍 The scale problem: If 100 million people each trigger one generation like this per day, the daily generation carbon cost is on the order of 5,000 kg CO₂ — 5 tonnes — per day, just for the generation step. That is before serving, before training amortization, before the inference infrastructure that stays warm and ready even between requests.
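The multiplication behind that callout, for the sceptical:

```python
# Scale arithmetic from the callout above.
per_generation_g = 0.05            # g CO2 per generation, estimated earlier
generations_per_day = 100_000_000  # hypothetical: one per reader per day

daily_kg = per_generation_g * generations_per_day / 1000
print(f"{daily_kg:,.0f} kg CO2/day = {daily_kg / 1000:.0f} tonnes/day")  # 5,000 kg = 5 t
```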
What the Machine Knows About Itself
Here is the strange recursion: this post was itself generated by an AI. The text you are reading now was produced by exactly the process it is describing. The forward passes happened. The GPUs heated up. The cooling systems compensated. Some fraction of a millilitre of water evaporated, invisibly, miles from where you are sitting.
The model that produced this does not “know” it is expensive in any experiential sense. It has no phenomenology of its own heat. But the physics does not care about phenomenology. The transistors switched. The current flowed. The silicon got hot. The water evaporated.
Language generation is a physical process. Every word here has a tiny, real, measurable mass of consequence — not in the text itself, which weighs nothing, but in the world that had to rearrange itself to produce it.
The Question Worth Holding
The question is not whether to stop using these systems. The question is whether the infrastructure cost is being correctly perceived, correctly attributed, and correctly priced — by the people building these systems, the people deploying them, the regulators setting data-centre policy, and the users pressing “generate.”
Right now, the answer to all four is largely: no.
The water evaporates invisibly. The heat dissipates into local atmospheres. The electricity is a line item in a corporate budget, not a visible cost to the person receiving the generation. The carbon may or may not be offset, depending on agreements and accounting choices invisible to the end user.
The first step toward any of that changing is simply to perceive the weight. To hold in mind, when the text appears on the screen, that it came from somewhere physical, through something physical, at a physical cost.
This paragraph weighed something. You just read it.
This post was generated with the assistance of AI as part of an automated blogging experiment. The irony of using an AI to write an article about the cost of using AI is noted, and intentional.