
GPU Clock Control Saves Up to 14% Energy in LLM Training – What This Means for Businesses

Dr. Maik Bunzel
01.06.2026 · 5 min read

When Training an AI Model Consumes More Electricity Than 5,000 Households in a Year

The debate about the energy appetite of large language models (LLMs) is no longer an academic footnote. Estimates suggest that training GPT-4 alone consumed approximately 50 gigawatt-hours in 2023 – enough to power 5,000 American households for an entire year, at roughly 10,000 kilowatt-hours per household. Since then, the computational demands of so-called frontier models have continued to rise, even though the major AI laboratories rarely disclose precise consumption figures. For companies looking to integrate AI into their value chain or train their own models, this is a strategic issue – not only from a cost perspective, but also in light of ESG requirements and regulatory frameworks.

The Approach: Dynamic Clock Frequency Control at the Kernel Level

Researchers at the University of Twente in the Netherlands have now published an approach that could noticeably reduce the energy consumed in training. Their method is based on a proven but previously underutilized technique: Dynamic Voltage and Frequency Scaling (DVFS). Rather than keeping the clock frequencies of the GPU cores and the memory static, this approach adjusts them in real time to match actual utilization.

GPUs typically have two separately adjustable clocks: one for the compute cores and one for the memory. While the compute-intensive portion of a calculation is running, the memory clock can be throttled – and vice versa. This sounds straightforward, but in practice it has been hampered by insufficient granularity. Earlier implementations adjusted the frequency only coarsely within each training iteration – that is, once for the forward pass and once for backpropagation.

The key innovation from the University of Twente lies in shifting the frequency adjustment to the kernel level. On a GPU, computations are broken down into small units of work known as kernels, each of which runs in parallel across many cores. A single layer of a neural network can consist of around 40 such kernels. By optimizing the clock frequencies individually for each kernel, the researchers were able to intervene with far greater precision and unlock energy-saving potential that had previously gone untapped.
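The idea of choosing clocks per kernel can be illustrated with a minimal Python sketch. It is not the researchers' implementation: the kernel names, clock values, and measurements in `PROFILES` are invented for illustration, and the selection rule (lowest energy among settings that slow a kernel by at most `max_slowdown`) is an assumed, deliberately simple policy.

```python
from typing import Dict, Tuple

Setting = Tuple[int, int]          # (core clock in MHz, memory clock in MHz)
Measurement = Tuple[float, float]  # (runtime in ms, energy in mJ)

BASELINE: Setting = (1800, 9500)   # hypothetical default clocks

# Hypothetical profiling data, invented for illustration only.
PROFILES: Dict[str, Dict[Setting, Measurement]] = {
    "matmul_qkv": {                # compute-bound: memory clock matters little
        (1800, 9500): (2.00, 600.0),
        (1800, 5000): (2.01, 540.0),
        (1200, 9500): (2.90, 520.0),
    },
    "layernorm": {                 # memory-bound: core clock matters little
        (1800, 9500): (0.50, 140.0),
        (1200, 9500): (0.50, 115.0),
        (1800, 5000): (0.85, 130.0),
    },
}


def pick_setting(profile: Dict[Setting, Measurement],
                 max_slowdown: float = 0.01) -> Setting:
    """Return the lowest-energy setting within the allowed slowdown."""
    base_time, _ = profile[BASELINE]
    limit = base_time * (1 + max_slowdown)
    # The baseline itself always satisfies the limit, so this is never empty.
    feasible = {s: m for s, m in profile.items() if m[0] <= limit}
    return min(feasible, key=lambda s: feasible[s][1])


def plan(profiles: Dict[str, Dict[Setting, Measurement]],
         max_slowdown: float = 0.01) -> Dict[str, Setting]:
    """Choose one clock setting per kernel."""
    return {name: pick_setting(p, max_slowdown) for name, p in profiles.items()}


print(plan(PROFILES))
```

With these made-up numbers, the compute-bound kernel keeps its core clock and lowers the memory clock, while the memory-bound kernel does the opposite. A coarser scheme that picks one setting for the whole pass could not serve both kernels equally well.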

Results: 14% Energy Savings with Only 0.6% Time Overhead

Experimental validation was carried out using the GPT-3-XL model with 1.3 billion parameters on an NVIDIA RTX 3080 Ti. The researchers focused on training a single layer and identified an optimal combination of frequency settings that yielded energy savings of up to 14 percent – with a time overhead of just 0.6 percent. This makes the method deployable with virtually no performance trade-off.
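To make the two percentages concrete, the following Python sketch shows how such figures are typically derived from a baseline run and a tuned run. The measurement values are hypothetical and were chosen only so that the result mirrors the reported 14 percent and 0.6 percent; they are not data from the study.

```python
def relative_change(baseline: float, tuned: float) -> float:
    """Return (tuned - baseline) / baseline; a negative value is a reduction."""
    return (tuned - baseline) / baseline


# Hypothetical measurements, chosen only to mirror the reported percentages.
baseline_energy_j, tuned_energy_j = 100.0, 86.0
baseline_time_ms, tuned_time_ms = 1000.0, 1006.0

energy_saving = -relative_change(baseline_energy_j, tuned_energy_j)
time_overhead = relative_change(baseline_time_ms, tuned_time_ms)

# Average power in watts = energy in joules / time in seconds.
baseline_power_w = baseline_energy_j / (baseline_time_ms / 1000)
tuned_power_w = tuned_energy_j / (tuned_time_ms / 1000)

print(f"energy saving: {energy_saving:.1%}, time overhead: {time_overhead:.1%}")
print(f"average power: {baseline_power_w:.1f} W -> {tuned_power_w:.1f} W")
```

Because the runtime barely changes, almost the entire saving in this example comes from a lower average power draw rather than from a shorter run.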

It is important to understand that the 14% represents a best-case scenario. Switching between clock frequencies is not instantaneous, and in the experimental environment, switching delays were not fully accounted for. How significant this effect is in practice depends heavily on the hardware used. Newer GPU generations such as NVIDIA's Blackwell architecture offer considerably faster switching times and could therefore come closer to capturing the full savings potential.
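How switching delays can erode the best-case figure is sketched below in Python. The cost model is an assumption made for illustration: every change of clock setting between consecutive kernels is charged a fixed latency during which the GPU draws a fixed power. The kernel energies, the latencies, and the 200 W figure are all hypothetical.

```python
from typing import List, Tuple

Setting = Tuple[int, int]  # (core clock in MHz, memory clock in MHz)


def net_savings(kernels: List[Tuple[Setting, float]],
                baseline_energy_mj: float,
                switch_ms: float,
                switch_power_w: float) -> float:
    """Fractional energy saving after charging a penalty per clock switch.

    kernels: (setting, energy in mJ) per kernel, in execution order.
    """
    switches = sum(1 for a, b in zip(kernels, kernels[1:]) if a[0] != b[0])
    tuned_energy_mj = sum(energy for _, energy in kernels)
    penalty_mj = switches * switch_ms * switch_power_w  # 1 W * 1 ms = 1 mJ
    return (baseline_energy_mj - tuned_energy_mj - penalty_mj) / baseline_energy_mj


# Hypothetical schedule: compute-bound and memory-bound kernels alternate.
KERNELS = [((1800, 5000), 540.0), ((1200, 9500), 115.0),
           ((1800, 5000), 540.0), ((1200, 9500), 115.0)]
BASELINE_MJ = 2 * (600.0 + 140.0)  # same kernels at default clocks (invented)

for latency_ms in (0.0, 0.01, 0.5):
    saving = net_savings(KERNELS, BASELINE_MJ, latency_ms, switch_power_w=200.0)
    print(f"switch latency {latency_ms} ms: net saving {saving:.1%}")
```

In this toy model, a short switching latency costs only a fraction of a percentage point, whereas a latency comparable to the kernel runtimes turns the saving into a loss. This is why the hardware's switching speed determines how much of the best case is reachable.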

"We optimize for energy savings without performance loss. In the real world, performance is the ultimate measure of success." – Jeffrey Spaan, doctoral researcher at the University of Twente

Why Automatic GPU Regulation Is Not Enough

Modern GPUs already feature their own DVFS mechanisms that respond internally to load fluctuations. One might therefore assume that the hardware resolves this issue on its own. However, the crucial difference lies in lookahead: the GPU's internal mechanism has no knowledge of the sequence of upcoming kernels and must always act reactively. The Twente approach, by contrast, benefits from complete knowledge of the training workload and can plan frequency adjustments proactively – a fundamental advantage that is structurally out of reach for automated on-the-fly systems.
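The difference between reacting and planning can be shown with a deliberately simplified Python toy model. It assumes a reactive governor that sets the clocks according to the kernel it has just observed, so it is always one step behind. Real hardware governors are more sophisticated, and the kernel sequence and setting names here are hypothetical.

```python
from typing import List

# Hypothetical kernel sequence and the ideal clock profile for each kernel type.
SEQUENCE = ["compute", "memory", "compute", "compute", "memory"]
IDEAL = {"compute": "high_core_low_mem", "memory": "low_core_high_mem"}


def proactive(seq: List[str]) -> List[str]:
    """A planner that knows the whole sequence sets the ideal clocks up front."""
    return [IDEAL[kind] for kind in seq]


def reactive(seq: List[str], default: str = "high_core_low_mem") -> List[str]:
    """A governor that only sees the previous kernel lags one step behind."""
    settings = [default]
    for previous in seq[:-1]:
        settings.append(IDEAL[previous])
    return settings


def mismatches(seq: List[str], settings: List[str]) -> int:
    """Count kernels that run with clocks unsuited to their type."""
    return sum(setting != IDEAL[kind] for kind, setting in zip(seq, settings))


print("proactive mismatches:", mismatches(SEQUENCE, proactive(SEQUENCE)))
print("reactive mismatches:", mismatches(SEQUENCE, reactive(SEQUENCE)))
```

Whenever the kernel type changes, the reactive variant runs that kernel with the wrong profile. The planner avoids this because a training workload repeats the same known kernel sequence in every iteration.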

This point is also highly relevant for companies that run their own AI infrastructure. Those who know and can predict their workloads – for example through structured ML pipelines and workflow orchestration – create the foundation needed to apply such optimization methods in the first place.

Assessment from a Business Perspective

Dr. Maik Bunzel, founder and CEO of mabucon.eu, regularly emphasizes in his day-to-day consulting work that efficiency in AI operations is not purely a technical question, but a strategic lever: "Companies that want to scale AI must understand energy consumption and compute costs as business parameters – not as an IT detail. What applies to LLM training today will be equally critical tomorrow in the operation of AI agents and autonomous workflows."

The research findings from Twente are therefore also relevant for companies that do not train their own frontier models. The underlying principles – granular resource control, forward-looking workload planning, and continuous optimization – apply equally to the operation of LLM-based agent systems, retrieval-augmented generation (RAG) pipelines, and other AI-driven automation infrastructure.

Implications for the AI Infrastructure of the Future

The research team is currently working on a tool that can automatically calculate and implement optimal frequency control for any workload. Should this approach achieve broad adoption, it would have far-reaching consequences:

  • Lower training costs: Cloud providers and AI labs could structurally reduce operating costs without sacrificing model quality.
  • Sustainability goals: Companies with ESG commitments benefit from measurable CO₂ savings across their AI value chains.
  • Hardware roadmaps: The incentive is growing to deploy newer GPU generations with faster switching times – a compelling argument for investment decisions toward modern Blackwell or successor architectures.
  • Software-side optimization: The method demonstrates that significant efficiency gains are achievable through intelligent software-hardware co-design – without new chips or new model architectures.

What Remains – and What Lies Ahead

The approach from the Netherlands is emblematic of a broader development: the AI industry is beginning to view efficiency not as the opposite of capability, but as a complementary dimension. While public discourse frequently focuses on new model sizes and benchmark records, the real-world competitiveness of AI systems is increasingly determined at the level of infrastructure, operating costs, and scalability.

Dr. Maik Bunzel, who supports companies through mabucon.eu in implementing agentic AI systems and AI-driven automation workflows, sees a clear message in this for the mid-market: "Anyone who wants to scale AI seriously needs to start integrating energy efficiency and infrastructure costs into their AI strategy now at the latest. The technology is maturing, and those who optimize early will have a structural advantage later."

Whether DVFS-based clock control becomes a standard tool in LLM training depends on how quickly suitable automation tools become available and whether GPU manufacturers make the relevant interfaces accessible to broader user groups. The academic proof of concept has been established. Translating it into production-ready systems is the next challenge – and one in which the interplay between research, hardware development, and enterprise demand will be decisive.
