

AI & Data Center Load Growth: A Unique Perspective

Wyatt Makedonski

Published: October 24, 2024

My goal with this article is not to estimate how many gigawatts data centers, especially amid the current hype around generative AI, will consume in 2030 or some N years from now. Instead, I offer a different perspective.

Within both the technology and energy industries, the vast majority of people view data center energy load as a black box with fixed and relatively stable demand. This is a narrow perspective. I also explore some energy and AI trends that will affect the load growth estimates you see in many publications.

TL;DR: Data center loads are controllable, can respond to external signals, and should therefore be viewed as flexible. Many AI and energy trends are affecting data center load growth estimates.

This article sits at the intersection of the technology industry and the energy industry, so it is written to be understood by audiences in both fields.

Data Center HVAC Load is Controllable

First, it is important to decompose data center loads. They consist largely of two parts: the computers running in the data center and the HVAC (heating, ventilation, and air conditioning) system needed to cool them. Let us dig into the HVAC system’s load first.

In data centers (DCs), anywhere between 30% and 55% of total energy consumption ("load") goes to the HVAC system (depending on variables like the DC's design, the type and sophistication of the cooling system, and the external climate). Across many industries, the prevailing mindset is that a building's HVAC load is fixed, with little flexibility. HVAC installers, operators, and software makers obsess over "unmet hours" (hours when the HVAC system cannot heat or cool the building enough to reach the target temperature), take a very conservative approach, and maintain massive thermal margins to avoid them. In reality, for almost every type of building, there is a lot of room to operate the HVAC system more aggressively while still not creating any unmet hours. Further, the unmet-hours criterion is more or less a line in the sand for most buildings and can be moved with little noticeable impact on occupants (intuitively, there is often little noticeable difference between a room at 70F and one at 72F, ceteris paribus).

A small but growing number of companies are taking a more aggressive approach to managing their DC HVAC systems. With the quality of today's simulation and forecasting tools, this approach is calculated and often low risk. This is not theoretical. In 2016, Google deployed an AI system that optimized the cooling in its DCs and reduced the energy used for cooling by a massive 40%. Today, that translates to billions of dollars saved on data center costs every year for Google. It is hard to comprehend the impact; put another way, a little bit of software saves billions of dollars every year. Connecting that to today: the AI agent behind these savings (in the field, an AI model that takes actions in a system or the real world is called an agent) was approximately 10,000 to 1 million times smaller than ChatGPT when the paper was published in 2016. In 2021, Meta implemented a similar AI system in its data centers.

I personally led work on optimizing HVAC systems with AI while at Yize NRG. There, with a few wise choices, we were able to save as much as 50% on HVAC energy costs in certain large office buildings with fairly little noticeable impact on building occupants. Given that 30-55% of DC energy consumption comes from the HVAC system, reducing this cost is paramount. And optimizing the HVAC load does not have to be done in isolation; it can also incorporate external signals.
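To make that concrete, here is a minimal, hypothetical sketch of one way an external signal could steer a cooling setpoint. This is not Google's, Meta's, or Yize NRG's system; the price thresholds and comfort band are illustrative assumptions. The idea is simply that expensive power nudges the setpoint toward, but never past, the comfort limit, so no unmet hours are created.

```python
# Hypothetical sketch: nudge a cooling setpoint based on an external price signal,
# while never exceeding a hard comfort limit (i.e. never creating an unmet hour).
from dataclasses import dataclass

@dataclass
class ComfortBand:
    nominal_f: float = 70.0   # target temperature (deg F) when power is cheap
    max_f: float = 74.0       # hard upper limit; crossing it would create an unmet hour

def cooling_setpoint(price_per_mwh: float, band: ComfortBand,
                     low_price: float = 30.0, high_price: float = 120.0) -> float:
    """Map an electricity price to a cooling setpoint inside the comfort band.

    Cheap power -> cool aggressively at the nominal setpoint.
    Expensive power -> drift toward (but never past) the comfort limit.
    """
    if price_per_mwh <= low_price:
        return band.nominal_f
    if price_per_mwh >= high_price:
        return band.max_f
    # Linear interpolation between the nominal setpoint and the comfort limit.
    frac = (price_per_mwh - low_price) / (high_price - low_price)
    return band.nominal_f + frac * (band.max_f - band.nominal_f)

if __name__ == "__main__":
    band = ComfortBand()
    for price in (20, 60, 150):  # $/MWh
        print(price, round(cooling_setpoint(price, band), 1))
```

A real controller would add forecasting, thermal inertia, and safety interlocks, but the core mechanic is this simple: the HVAC load bends to an external signal without breaking its constraints.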

It is important to remember that the AI field is much more than just chatbots, and is much older and richer than the recent generative AI hype.

Compute Load is Controllable

The vast majority of the remaining 45-70% of data center energy consumption comes from the computers running in them (though 5-10% of total DC load can go to the building's electrical and lighting systems). The term "cloud" in computing simply means that a company is using computers located in a data center rather than hosted on the premises of that company's own facilities. Today, many companies use a type of DC usually referred to as a "public cloud". For those unfamiliar with public cloud providers, you can think of an entity like Amazon Web Services (the biggest public cloud provider) as having a massive pool of computers that many different companies can rent (either a whole computer or just a slice of one). As unintuitive as it may seem, the utilization of each computer's computational capacity is controllable, and therefore so is its energy consumption. This is not a wild concept: Amazon Web Services runs a sort-of market for its unutilized compute power ("compute"). In essence, AWS varies how much it charges based on how much excess compute is sitting unutilized (the leftover computers in that pool); it advertises savings of up to 90% on these "spot" instances. The tradeoff is that these computers will not always be available. In economics, every action has an implicit cost; whether an entity will do something for N dollars depends on the value of N. AWS's spot instances make this explicit: every compute load has a price, and the task that was supposed to generate that load (i.e. run on that computer) may not run if that price is high enough.
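Reduced to a sketch, the decision is just a price comparison. The function and dollar figures below are hypothetical illustrations (not the AWS API): a deferrable task launches only when the spot price is at or below what the task is worth to us.

```python
# Hypothetical sketch: a deferrable task runs only when the spot price is at or
# below what the task is worth to us; otherwise it (and its energy use) waits.
def should_launch(spot_price_per_hour: float, willingness_to_pay: float) -> bool:
    """Every compute load has a price; if the market price exceeds it, the task waits."""
    return spot_price_per_hour <= willingness_to_pay

# Illustrative numbers: an on-demand rate of $3.00/hr for an instance, and a
# batch job we only value at half that rate.
on_demand_rate = 3.00
job_value_per_hour = 0.5 * on_demand_rate

for hour, spot_price in enumerate([0.45, 1.20, 1.80, 2.70]):
    print(f"hour {hour}: spot=${spot_price:.2f}/hr -> "
          f"{'launch' if should_launch(spot_price, job_value_per_hour) else 'wait'}")
```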

Since every compute load has a cost, that creates an implicit hierarchy of compute loads within a single company renting compute power. To make this concrete, I will give an example from the company I founded, Gradient Energy. We have computers that must constantly serve our product to our customers; these compute loads are at the top of the hierarchy. At the bottom would be the tests we run every day on our code to limit the number of bugs; it does not matter whether these run at 3am or 3pm. The hierarchy has two dimensions, importance and sensitivity to time (urgency), which are combined (with a weighting that often varies over time) into a one-dimensional hierarchy with an explicit price. In our example, the production servers are at the top since they are both important and very time sensitive, and the code tests are at the bottom since they are of relatively low importance (I don't write code with bugs, it's the computer that's wrong) and have low sensitivity to time. Connecting this back to AI: somewhere in the middle of the hierarchy is the compute we need for training our AI models and agents. At Gradient, we optimize electric vehicle charging using AI. Our AI is currently state of the art, meaning it is the single best system in the world at the task of optimizing EV charging. Our AI agents are reasonably mature, but we want to constantly improve them. Training new iterations (running experiments and observing the outcomes) is important to us, but it is neither as important nor as time sensitive as serving our software and current AI agents. In other words, with the right incentive, say a 50% discount on compute costs from AWS, we will happily wait several hours to train new iterations of our AI.
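Here is one hypothetical way to compress that two-dimensional hierarchy into a single number per load and then into a price. The weights, scores, and example loads are illustrative; this is not Gradient Energy's actual scheduler.

```python
# Hypothetical sketch: collapse (importance, urgency) into one priority score,
# then turn that score into the maximum price a load is willing to pay.
from dataclasses import dataclass

@dataclass
class ComputeLoad:
    name: str
    importance: float  # 0..1, how much the business cares about this load
    urgency: float     # 0..1, how time-sensitive it is

def priority(load: ComputeLoad, urgency_weight: float = 0.5) -> float:
    """Weighted blend of the two dimensions; the weight can vary over time
    (e.g. urgency might matter more during business hours)."""
    return (1 - urgency_weight) * load.importance + urgency_weight * load.urgency

def max_price(load: ComputeLoad, on_demand_rate: float, urgency_weight: float = 0.5) -> float:
    """Map a priority score to the most we will pay per compute-hour."""
    return priority(load, urgency_weight) * on_demand_rate

loads = [
    ComputeLoad("production servers", importance=1.0, urgency=1.0),
    ComputeLoad("AI training run",    importance=0.7, urgency=0.3),
    ComputeLoad("nightly code tests", importance=0.2, urgency=0.1),
]
for load in sorted(loads, key=priority, reverse=True):
    print(f"{load.name:<20} priority={priority(load):.2f} "
          f"max_price=${max_price(load, 3.00):.2f}/hr")
```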

Further, since this is a public cloud, we are not the only company renting computers from AWS; there are thousands of others. Each has its own hierarchy of compute loads and, more importantly, its own price sensitivities. In other words, Salesforce, a company with a market cap over $200 billion, has a greater ability to pay than a startup, so the compute loads at the top of its hierarchy will carry higher prices than a startup's. I am not arguing that all compute prices should be responsive and that clouds should cater to the highest bidder; there are many downsides to that. But there are many upsides to expanding the market for spot instances and increasing total compute load flexibility. The spot market could also become more precise: since most of the loads bidding for spot instances are predetermined and launching them is automated, the spot market mechanism could estimate the size and energy requirements of each compute load.

Aggregating Compute & HVAC Loads

Now that we better understand the dynamics of these two main consumers of power in a DC, it is important to aggregate them and understand that combined load as part of the broader electricity grid. Explaining these loads and proposing tools and mechanisms for them may seem like overkill, depending on your mindset. Even in isolation, the sheer cost savings explain why data centers should consider incorporating some of these tools, and some already do. Below, I explain why, given trends in energy and computing, incorporating these tools and mechanisms may become even more important.

We can, and should, make aggregated data center loads responsive to external signals. The current construct for aggregating and using energy load flexibility is called demand response. Basically, when demand is high and the grid is under stress, flexible loads respond and reduce consumption; as of today, we are even seeing certain distributed energy resources (DERs), e.g. batteries in homes, generate power during demand response (DR) events. For those unfamiliar with DR, it may sound like a sophisticated concept, but, like most things in energy, it is not; it is archaic and quite simple. For those in tech: a small Kubernetes cluster is more sophisticated than most demand response programs. I have many thoughts on how to improve and evolve demand response and other energy concepts; stay tuned as I gradually publish the papers I have written. Even some of the biggest and most valuable loads today, entire factories and refineries, have some flexibility and are enrolled in DR programs. Extending this, massive DC loads will likely also need some flexibility in the future; in certain jurisdictions, regulators may even require it for reliability. Under certain conditions, DCs are already participating in DR programs in Texas.
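A data center's side of a DR event can be as simple as the sketch below: when the grid signals stress, relax the cooling setpoint within its comfort band and pause the loads at the bottom of the compute hierarchy. The signal format, the per-action kW savings, and the job names are illustrative assumptions, not any specific DR program's protocol.

```python
# Hypothetical sketch of a DR event handler for a data center:
# shed flexible HVAC and compute load when the grid signals stress.
from dataclasses import dataclass, field

@dataclass
class DataCenterControls:
    cooling_setpoint_f: float = 70.0
    max_setpoint_f: float = 74.0          # comfort/thermal limit, never exceeded
    deferrable_jobs: list = field(default_factory=lambda: ["nightly tests", "training run"])
    paused_jobs: list = field(default_factory=list)

def handle_dr_event(dc: DataCenterControls, requested_reduction_kw: float) -> float:
    """Respond to a demand response event; returns an estimated load reduction (kW).

    The per-action savings below are placeholders a real system would measure or forecast.
    """
    shed_kw = 0.0
    # 1. Relax cooling toward the comfort limit (assume ~150 kW saved per deg F, illustrative).
    while dc.cooling_setpoint_f < dc.max_setpoint_f and shed_kw < requested_reduction_kw:
        dc.cooling_setpoint_f += 1.0
        shed_kw += 150.0
    # 2. Pause deferrable compute loads (assume ~200 kW per job, illustrative).
    while dc.deferrable_jobs and shed_kw < requested_reduction_kw:
        dc.paused_jobs.append(dc.deferrable_jobs.pop())
        shed_kw += 200.0
    return shed_kw

dc = DataCenterControls()
print(handle_dr_event(dc, requested_reduction_kw=500.0), dc.cooling_setpoint_f, dc.paused_jobs)
```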

Virtual power plants (VPPs) are a hot topic in the energy space today. VPPs have widely varying definitions; the term has become more or less a marketing term. From my view working in the field, the broad and varying definitions of what a virtual power plant is seem to be having an adverse effect on discourse, and even progress, with people often talking past each other. For instance, a "VPP" composed of smart thermostats, which can only offset the use of power with widely varying reliability (read: an unstable and erratic load curve), is very different from a VPP composed of physical batteries located in homes (think: Tesla Powerwalls) that discharge power, generate real Watts, and do so with a much higher level of reliability. Given the wide variety of VPPs and the problems in the discourse, in one of my upcoming publications I propose methods to classify VPPs and explore the dynamics of each (see "A Taxonomy of Virtual Power Plants and Their Dynamics"). In a separate publication, I propose methods for measuring both VPPs and DERs (see "Tools and Methods for Measuring Virtual Power Plants and Distributed Energy Resources").

The sheer magnitude of a DC site's load means it can often participate in various DR programs directly. Given the widely varying definitions of virtual power plants and distributed energy resources, one could argue that a data center can be viewed as a VPP unto itself. The site itself would be the bounds of the VPP, with the computers, along with the HVAC systems, acting as the DERs, since they are capable of flexible and responsive load. Each building in the DC complex can be viewed as a sub-aggregation of the VPP, and even groupings of racks (how computers are organized in a DC) can be viewed as a lower-level sub-aggregation. Considering the topology and complexity of DCs, operating this VPP and optimizing it for its many constraints while maintaining a target load profile is quite complex. I propose scalable solutions to this at varying levels of sophistication in several of my upcoming publications ("Dynamic Optimizing Virtual Power Plants" in particular), and we are building them here at Gradient Energy.
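Under that framing, a data center VPP is just a tree of aggregations whose flexibility rolls up from the leaves. The sketch below is a hypothetical data structure, not a proposal from the publications mentioned above: rack groups are the leaf-level "DERs", buildings are sub-aggregations, and the site is the VPP itself. All names and numbers are made up.

```python
# Hypothetical sketch: a data center modeled as a VPP, i.e. a tree of aggregations
# whose flexible capacity (kW) rolls up from rack groups to buildings to the site.
from dataclasses import dataclass, field
from typing import List

@dataclass
class RackGroup:                 # leaf-level "DER": a grouping of racks
    name: str
    load_kw: float               # current draw
    flexible_kw: float           # portion that could be shed or shifted right now

@dataclass
class Building:                  # sub-aggregation of the VPP
    name: str
    rack_groups: List[RackGroup] = field(default_factory=list)
    hvac_flexible_kw: float = 0.0

    def flexible_kw(self) -> float:
        return self.hvac_flexible_kw + sum(r.flexible_kw for r in self.rack_groups)

@dataclass
class DataCenterVPP:             # the site itself is the VPP
    name: str
    buildings: List[Building] = field(default_factory=list)

    def flexible_kw(self) -> float:
        return sum(b.flexible_kw() for b in self.buildings)

site = DataCenterVPP("example-site", buildings=[
    Building("bldg-1", rack_groups=[RackGroup("row-A", 800, 120), RackGroup("row-B", 600, 40)],
             hvac_flexible_kw=300),
    Building("bldg-2", rack_groups=[RackGroup("row-C", 900, 200)], hvac_flexible_kw=250),
])
print(f"{site.name}: {site.flexible_kw()} kW of flexibility available")
```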

In the more distant future (10-50 years from now), DC load flexibility will be even more critical, given greater fluctuations in power supply from the rapid rise of intermittent renewables and the decline (due to unfavorable economics) of fossil-fueled generators like coal plants. Further, even when the supply of power is not constrained, the transmission of it often is (explained below).

So how do we make HVAC and compute loads flexible and responsive while still completing their intended tasks? Optimization (a toy sketch follows below). In the near future, I will explain how important optimization is to energy, as it is core to my work and much of my research. If you are interested in learning more and staying up to date, email me or follow me on Twitter.
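As that toy sketch: one of the simplest forms of optimization is to place each deferrable job into the cheapest remaining hour before its deadline, so the work still gets done but the load lands when power is cheap. The greedy approach, prices, and jobs below are illustrative assumptions, not the methods in my upcoming publications.

```python
# Hypothetical sketch: greedily place deferrable 1-hour jobs into the cheapest
# hours before their deadlines. Prices ($/MWh) and jobs are illustrative.
def schedule(jobs, hourly_prices):
    """jobs: list of (name, deadline_hour); returns {name: assigned_hour}."""
    taken = set()
    assignment = {}
    # Tightest deadlines first so urgent work is never crowded out of its window.
    for name, deadline in sorted(jobs, key=lambda j: j[1]):
        candidates = [h for h in range(deadline + 1) if h not in taken]
        if not candidates:
            raise ValueError(f"no feasible hour left for {name}")
        hour = min(candidates, key=lambda h: hourly_prices[h])
        taken.add(hour)
        assignment[name] = hour
    return assignment

prices = [42, 35, 28, 55, 90, 110, 60, 31]     # eight hours of prices
jobs = [("code tests", 7), ("training run", 5), ("batch report", 3)]
print(schedule(jobs, prices))
```

The same idea scales up with real solvers, forecasts, and constraints, but the principle holds: flexibility plus a signal plus an optimizer yields the same work at a lower cost and a friendlier load shape.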

I mention my research a lot and how the publications are upcoming. I have been writing at a furious rate, faster than I can have the papers peer reviewed. I have a queue of publications that I have to balance with my work on Gradient Energy as building a company takes a considerable amount of time. If you are interested in reading my research when it comes out, email me or follow me on Twitter (please note that I do not post on social media except when it relates to my work or my research).

Since 2016, I have been working in both energy and tech, usually applying AI to solve problems in energy. It may not seem like it to the uninitiated, but the fields are wildly different, with the largest difference being their pace. Tech is like a car speeding down a highway and energy is like taking a dog for a walk (including the many stops along the way and the occasional backwards movement). Switching between the two is akin to doing 80mph on I-95, opening the door, and jumping out; it is not just jarring, it can be quite painful. There are automated Twitter accounts that summarize everything that happened in AI that day. A summary of the same length could capture everything that happened in energy that year. Data centers are one area where these two industries collide. To energy's credit, the pace can be faster if you are working with distributed energy resources. I do not want to sound negative about energy; it is the field I am dedicated to. Energy's slower pace is not a bad thing, as seemingly small changes in the energy field can have outsized impacts on society at large; caution can be healthy.

One of the main sticking points with building new data centers is getting the electric utility to approve the interconnection, let alone getting it to upgrade that interconnection to the grid, since DCs often require so much power. This process can take years and involves a great deal of friction and confrontation, though it is important to note that not all utilities and relevant entities are like this. I will not explain interconnection queues here, but here is a good summary of them and their problems. There are enough clean energy projects in the queue to more than double America's power capacity, but the queue, with its regulatory process and ceremony, is backing those projects up by more than five years. The companies building new DCs, like the big public cloud providers AWS, Google, and Microsoft, usually want to power them with clean energy, not for virtue signaling but because it is the lowest cost source of power. With these lengthy queues to connect to the grid, these companies are getting creative. Microsoft is resurrecting the decommissioned Three Mile Island nuclear plant. Companies are also starting to avoid the queues entirely by locating new DCs directly next to large power plants; AWS bought a DC site next to an existing nuclear power plant in Pennsylvania. Another trend is investing in new generation technologies, as Google has with small modular nuclear reactors and with geothermal.

If you are new to energy, I highly recommend looking into David Roberts' years of work at Vox and Volts. One of his articles was highly influential to me and my research.

I entered the AI field (specifically deep learning) in 2017, and I remember a time before the invention of the Transformer. I have been told this makes me a dinosaur in the field, which is ironic since I am Gen Z. That should give you an idea of the pace at which the AI field has been moving. With that, there is a massive open debate, with widely varying projections, about the future of AI, the pace at which it is developing, its future capabilities, and even where the field should go next. This has a very large impact on load growth estimates.

The current hype in AI is around "generative AI", centered on Large Language Models (LLMs) like ChatGPT. For those unfamiliar, you can think of AI as having two stages: training and inference (for those more experienced in AI: recognize that I am painting in broad strokes for a nontechnical audience). First, you train a model on some task or set of tasks. After that, you use the model in the real world to make predictions (called "inference"). In general, you can think of training as a big, one-time operation (more on that in a second) and inference as a series of smaller but numerous operations. For LLMs, training is more complicated. For most groups using LLMs, training is composed of two parts: training on a huge amount of data (in this case, much of the content of the internet) and then fine-tuning the model to perform better on a smaller set of tasks and data (e.g. summarizing news articles). For reference, the effort, measured by cost and energy usage, that goes into fine-tuning a model usually ranges from 0.01% to 5% of the effort it took to train the base model. In the real world, only a handful of companies train foundation models (i.e. that first phase of training), as the process can take months and hundreds of millions of dollars (the cost comes from thousands of expensive computers running for months on end); everybody else takes those foundation models and fine-tunes them for their own tasks and on their own datasets. These foundation models are often gated and give you limited access for fine-tuning and customization. However, that is not the case for all foundation models.
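To put that 0.01% to 5% range in perspective, here is a quick back-of-envelope calculation. The pretraining cost and energy figures are made-up round numbers chosen only to match the orders of magnitude described above, not measurements of any specific model.

```python
# Back-of-envelope: fine-tuning effort as a fraction of pretraining effort.
# The pretraining figures are illustrative round numbers, not measured values.
pretrain_cost_usd = 100_000_000      # order of "hundreds of millions of dollars"
pretrain_energy_mwh = 10_000         # illustrative energy for the pretraining run

for fraction in (0.0001, 0.05):      # the 0.01% to 5% range from the text
    print(f"fine-tune at {fraction:.2%}: "
          f"~${pretrain_cost_usd * fraction:,.0f}, "
          f"~{pretrain_energy_mwh * fraction:,.1f} MWh")
```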

In computer science, there is the concept of open source software, an extension of the scientific practice of freely sharing ideas and evidence. It means that the creators of a piece of software release its source code, usually free of cost, and allow you to do almost anything with it (depending on the license). The entire field of AI is built on open source software and open science; this is arguably a large reason why the field has flourished and developed so quickly. Given the exorbitant costs of creating and training a foundation model, most companies, like OpenAI and Google, do not disclose much about them, the process of creating them, or important design choices (and these choices matter a great deal), let alone share the models freely. However, some companies, like Meta and the startup Mistral, release their foundation models, along with many variants, completely free and as open source as possible, together with thorough publications detailing how they were trained and what choices were made. Open source matters in AI, especially generative AI, for many reasons. One is that it allows parties (companies, universities, and individuals) to reuse existing trained AI models, saving a great deal of compute, energy, and cost, because they do not have to undertake the difficult endeavor of training their own. In other words, instead of 20 companies each trying to train their own foundation models, they can all use Meta's Llama models, saving a great deal of money, and energy, in the process. Further, the people working at those 20 companies can collaborate and circulate ideas freely to create the best model they possibly can. In reality, this example number of 20 companies understates the hundreds of universities and industry labs around the world collaborating to advance the field.

Most AI models today, especially generative AI, depend on GPUs to run efficiently. For my nontechnical audience: a computer's CPU is general purpose and can handle almost any task, whereas a GPU is specialized for a small set of tasks, one of which happens to be useful for AI. With that, modern GPUs are a constraint for these models, which can be massive and require several GPUs to run a single model. An AI model is often measured in shorthand by its parameter count, which determines how much computer memory the model occupies (some consume hundreds of gigabytes). An imperfect analogy is that the parameter count is how big the AI model's "brain" is. For several years, there was a trend of simply increasing the size of models to increase performance (sparked by an OpenAI research paper). As of October 2024, that trend has slowed and arguably even reversed. Evidence of this is Meta's Llama 3.1 class of models, which claims state of the art performance (meaning the single best in the world) on a variety of benchmarks while having a significantly lower parameter count than other leading LLMs (according to rumors and estimates of their sizes). Further evidence that we are getting better performance from fewer parameters is Meta's Llama 3.2 class of models, generative AI models small enough to be easily deployed on edge devices like smartphones. This emerging trend of better parameter efficiency means we do not need as many GPUs and computers to get the same performance we would have otherwise; running one model on two GPUs consumes much less energy than running it on four. The impact may seem small but becomes important at scale: if a large corporation can reduce its GPU usage by 5,000 units, that translates to millions of dollars in savings, megawatts of peak load avoided, and many Watt-hours of energy saved. Further, if we can run more AI models on a person's own device, like their phone or PC, that saves compute in the cloud (arguably it merely shifts energy consumption, but addressing that argument is out of the scope of this article). Another important trend is that computing hardware is becoming more energy and cost efficient, commonly measured in performance per Watt and dollars per FLOP. We see this with newer generations of Nvidia GPUs as well as the rise of custom chips (ASICs) specialized to run modern AI models.
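For a rough sense of why parameter count matters for GPU count, and therefore power draw, model weights occupy roughly parameters times bytes per parameter. The sketch below uses common illustrative assumptions (16-bit weights, an 80 GB GPU, roughly 700 W per GPU while serving); these are not specs of any particular product, and real deployments also need memory for activations and caches.

```python
# Rough sketch: parameter count -> memory footprint -> number of GPUs -> power draw.
# Precision, GPU memory, and per-GPU power below are illustrative assumptions.
def gpus_needed(params_billions: float, bytes_per_param: int = 2,  # fp16/bf16 weights
                gpu_memory_gb: int = 80) -> int:
    model_gb = params_billions * 1e9 * bytes_per_param / 1e9
    # Naive ceiling division over weights only; real serving needs extra headroom.
    return max(1, -(-int(model_gb) // gpu_memory_gb))

for params in (8, 70, 405):           # parameter counts of the Llama 3.1 family
    n = gpus_needed(params)
    print(f"{params}B params -> ~{n} GPU(s) -> ~{n * 0.7:.1f} kW while serving (at ~700 W/GPU)")
```

The exact numbers are not the point; the point is that halving the parameters needed for a given level of performance roughly halves the GPUs, and therefore the energy, needed to serve it.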

I considered going into more technical detail about generative AI models for this article, but I decided it was out of scope for the two main audiences that would read this. I would have discussed things such as model GPU utilization (cycles and VRAM), efficiency and parallelizability of model architectures, implications of RAG, and costs of serving fine tuned models vs serving base models and injecting relevant information into the context window.

Conclusion

When thinking about data center load growth, including when it is driven by AI, understand that there are trends in both energy and AI affecting these estimates. Data centers' loads are actually controllable and can be influenced by external signals, so data centers can and should be viewed as flexible resources (though the shape of the load curve and its price sensitivity vary). All together, what I discuss here will likely reduce data centers' peak demand from what it otherwise would be (with some of it shifted and some of it avoided).

It is also important to contextualize data center load growth. Yes, data centers will likely consume much more energy than they do now, but electrification will have a much bigger impact. Electrification refers to industries transitioning their energy demand to electricity from other sources (like oil and natural gas); examples are transportation moving to electric vehicles and the built environment electrifying heating and appliances and weaning off of direct natural gas heating. For concrete numbers showing the relative scale, using some 2030 energy estimates: electric vehicles are projected to consume approximately four times more energy than DCs, and building HVAC systems alone are projected to consume approximately six times more energy than DCs.

I talk a lot about LLMs and generative AI in this article since that is the current hot topic. Do not underestimate the effectiveness of “small” AI models and agents. You likely already interact with many each day without knowing it since they are part of existing products and services.

Let me know your thoughts on this article in either the comments where this was shared or by emailing me here or tweeting at me here.

I have been working in energy since 2016 and building real world AI systems since 2017. I am currently working on Gradient Energy, the company I founded. We save electric vehicle owners money, make owning an EV easier, and help the grid by optimizing EV charging using AI. By all published public metrics, our AI agent is state of the art, meaning it is the best in the world at this task and more capable than any other. This means we can save EV owners more money and streamline their experience better than anybody else. If you own an EV, you can learn more and sign up on our website. If you are interested in working with us, email me here. We do not currently need funding, but we will be looking for it in the future.

I would like to thank Vince Buhler and Henry Grover for reviewing this article.
