Mathematical Breakthrough in LLMs? What's Behind Subquadratic's Claims

Dr. Maik Bunzel

21.06.2026 · 7 min read

A Startup Shakes the Foundations of Modern Language Models

In the world of large language models (LLMs), genuine architectural breakthroughs are rare. New model generations usually improve through incremental gains in training data, parameter count, or fine-tuning methods. That is why it caused such a stir when the Miami-based AI startup Subquadratic emerged from stealth mode and claimed to have solved one of the fundamental mathematical problems of modern LLMs – a problem that has been slowing the industry down for nearly a decade. Initial skepticism was high, and the first evidence was thin. Yet there are now independent evaluation results that at least give pause for thought.

The Core Problem: Why Transformers Are So Expensive

To understand why Subquadratic's claims are so explosive, one needs to briefly examine how today's LLMs work. The dominant architectural principle since 2017 has been the Transformer, described in the landmark paper "Attention Is All You Need" by Google researchers. At the heart of every Transformer is a mechanism known as Dense Attention.

Dense Attention works, in simplified terms, as follows: each word (more precisely, each token) in a text is encoded as a list of numbers, a vector. Each token's vector is then compared with the vectors of all other tokens, for every possible pair. For a text containing 10,000 tokens, this produces nearly 50 million pairwise comparisons (10,000 × 9,999 / 2 = 49,995,000). And the insidious part: the number of computations does not grow linearly but quadratically with the length of the text. Double the number of tokens, and the computational effort quadruples. This effect is known as quadratic scaling, and it is a major reason why LLMs are notoriously energy-hungry and expensive to operate, especially on long inputs.
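The arithmetic can be checked in a few lines of Python. The sketch below is purely illustrative: it counts token pairs for growing text lengths and includes a minimal, unoptimized version of scaled dot-product attention in which each token is represented by a short list of numbers. The function names are made up for this example and do not come from any particular library.

```python
import math
from math import comb


def pairwise_comparisons(n_tokens: int) -> int:
    """Number of unordered token pairs: n * (n - 1) / 2."""
    return comb(n_tokens, 2)


def dense_attention(queries, keys, values):
    """Minimal scaled dot-product attention over lists of vectors.

    Every query is scored against every key, so the work grows with
    len(queries) * len(keys), i.e. quadratically for self-attention.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        # One score per key: this inner loop is the quadratic part.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        top = max(scores)  # subtract the maximum for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w / total * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs


for n in (10_000, 20_000, 40_000):
    print(f"{n:>6} tokens -> {pairwise_comparisons(n):>12,} pairs")
```

Each doubling of the token count roughly quadruples the number of pairs, which is the quadratic growth described above.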

For companies seeking to run LLM-based workflows at scale, this characteristic is not an academic problem but a very real cost brake. Dr. Maik Bunzel, founder and managing director of mabucon.eu, regularly observes this bottleneck in practice: "Many of our clients hit their limits precisely when it comes to processing very large document volumes or extensive codebases in an automated way. The computational effort makes such scenarios economically unviable today in many cases."

Sparse Attention: The Idea Behind the Promise

Subquadratic relies on an approach known in the research community as Sparse Attention. The core idea: not all relationships between tokens in a text are equally relevant. A model does not necessarily have to compare every word with every other word in order to grasp the meaning of a document. Sparse Attention selectively determines which token pairs are actually compared with one another – and skips the rest.

This sounds elegant, but it is anything but trivial. Earlier attempts to make Sparse Attention production-ready frequently failed because the simplified selection rules (for example: "always compare the first word with the fifth") were too rigid to capture the complexity of natural language. The performance of these models lagged behind Dense Attention systems.
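A fixed selection rule of this kind can be sketched as a sliding window, a common fixed sparse pattern in which each token is compared only with its nearest neighbours. The window size and the function names below are assumptions chosen for illustration, not a description of any specific earlier system.

```python
def is_compared(i: int, j: int, window: int) -> bool:
    """Fixed rule: token i looks at token j only if they are at most
    `window` positions apart. The content of the text plays no role."""
    return abs(i - j) <= window


def sliding_window_pairs(n_tokens: int, window: int) -> int:
    """Count the (query, key) pairs that the fixed rule allows."""
    total = 0
    for i in range(n_tokens):
        first = max(0, i - window)
        last = min(n_tokens - 1, i + window)
        total += last - first + 1
    return total


for n in (1_000, 2_000, 4_000):
    print(f"{n:>5} tokens -> {sliding_window_pairs(n, window=2):>6} allowed pairs")

# The rigidity: tokens 0 and 500 are never compared, however relevant
# they may be to each other.
print(is_compared(0, 500, window=2))
```

The number of allowed pairs now grows linearly with the text length, but a relationship between two distant tokens is simply never examined, which is the weakness described above.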

What sets Subquadratic apart, according to the company: its model, called SubQ, selects the relevant token pairs dynamically, not according to a fixed pattern but computed anew for each input text. The exact details of how this selection works are kept as a trade secret. This is not unusual in the AI industry, but it does make external verification more difficult.
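Since SubQ's selection mechanism is undisclosed, the following is explicitly not Subquadratic's method. It is a generic textbook-style sketch of content-dependent selection, here a simple top-k rule in which each query keeps only its k best-matching keys. All names and the choice of top-k are assumptions made for illustration.

```python
import math


def top_k_sparse_attention(queries, keys, values, k):
    """Content-dependent sparse attention: each query attends only to
    the k keys it scores highest against (hypothetical illustration)."""
    d = len(keys[0])
    outputs, selections = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, key)) / math.sqrt(d) for key in keys]
        # Selection depends on the input itself, not on fixed positions.
        chosen = sorted(range(len(keys)), key=lambda j: scores[j], reverse=True)[:k]
        top = max(scores[j] for j in chosen)
        weights = {j: math.exp(scores[j] - top) for j in chosen}
        total = sum(weights.values())
        outputs.append([
            sum(weights[j] / total * values[j][c] for j in chosen)
            for c in range(len(values[0]))
        ])
        selections.append(sorted(chosen))
    return outputs, selections


keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
queries = [[1.0, 0.0], [0.0, 1.0]]

outputs, selections = top_k_sparse_attention(queries, keys, values, k=1)
print(selections)  # which key each query kept
print(outputs)
```

Note that this naive version still scores every pair before discarding most of them, so it saves no computation in the scoring step. A practical system would need a cheap way to propose candidate pairs without looking at all of them, and that is exactly the part Subquadratic has not published.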

What the Independent Tests Show

The decisive step from claim to evidence came when Subquadratic published results from an independent evaluation conducted by the company Appen. The results are remarkable:

Speed: According to Appen, SubQ was 56 times faster in a pure speed test than models using FlashAttention, an established, already optimized implementation of exact (dense) attention.
Coding performance: On LiveCodeBench, a benchmark for real-world competitive programming tasks, SubQ achieved 89.7% – a result that places it on par with leading coding models from OpenAI, Google DeepMind, and Anthropic.
Context window: SubQ is said to support a context window of up to 12 million tokens. For comparison: most current frontier models work with around one million tokens. In the so-called Needle-in-a-Haystack test – in which a model is tasked with extracting specific information from vast bodies of text – SubQ achieved 98% accuracy according to Appen, at both 6 and 12 million tokens.
Cost savings: According to the company, processing a specific benchmark run with Anthropic's Opus model costs around $2,600 – with SubQ, allegedly just $8. This figure has so far not been independently verified, as SubQ is not yet generally available.
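How a Needle-in-a-Haystack test is put together can be sketched as a small harness. Appen's actual test setup is not described in the published results, so everything below is an assumption for illustration: the filler sentence, the needle, the function names, and above all `toy_model`, which is a trivial stand-in for a real LLM call.

```python
def build_haystack(filler: str, needle: str, n_sentences: int, depth: float) -> str:
    """Repeat a filler sentence and place the needle at a relative
    depth (0.0 = start of the text, 1.0 = end of the text)."""
    sentences = [filler] * n_sentences
    sentences.insert(round(depth * n_sentences), needle)
    return " ".join(sentences)


def toy_model(context: str, question: str) -> str:
    """Hypothetical stand-in for a model: returns the first sentence
    that mentions 'passcode'. A real test would call an LLM here."""
    for sentence in context.split(". "):
        if "passcode" in sentence:
            return sentence
    return ""


def needle_accuracy(model, depths, n_sentences=1000) -> float:
    """Share of needle positions at which the model retrieves the fact."""
    needle = "The passcode for the archive is 7391."
    filler = "The weather report for the region was unremarkable."
    hits = 0
    for depth in depths:
        context = build_haystack(filler, needle, n_sentences, depth)
        answer = model(context, "What is the passcode for the archive?")
        hits += "7391" in answer
    return hits / len(depths)


print(needle_accuracy(toy_model, depths=[0.0, 0.25, 0.5, 0.75, 1.0]))
```

In a real evaluation, the haystack would be scaled up to millions of tokens and the needle placed at many depths, since models often retrieve facts less reliably from the middle of a long context.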

"This could be a game changer, because models struggle with speed and inefficiency. But when you have kind of shocking results, it's really not as credible when you say it yourself." – Jeanine Sinanan-Singh, Appen, Director of Generative AI Research

Limitations and Open Questions

Benchmarks do not provide a complete picture of a model's capabilities. They measure performance under controlled, specific conditions and are no substitute for deployment across a broad range of real-world tasks. Furthermore, Subquadratic has so far made SubQ accessible to only a very limited number of users, even though the waitlist reportedly includes tens of thousands of interested parties, among them more than 500 enterprise customers.

The fact that the precise mechanism behind the dynamic token selection is not disclosed also makes a full scientific assessment impossible. Experienced AI engineers point out that in this area "pretty much everything has already been tried" and that sparse attention has nonetheless failed to become a lasting alternative to dense attention at the frontier level. The community's verdict remains correspondingly divided: groundbreaking architecture or well-staged hype?

SubQ is also not a general-purpose model intended to replace existing systems across the board. The company explicitly positions it for two scenarios: coding tasks and the processing of very large datasets. For other tasks – such as creative writing, complex reasoning chains, or multimodal processing – no comparable evidence has been presented to date.

What Does This Mean for Businesses?

Regardless of how the debate around Subquadratic evolves, it highlights a central issue for AI-driven enterprise processes: The cost and energy question surrounding LLMs is real and strategically relevant. Anyone planning automation projects today that rely on analyzing large volumes of documents – such as contracts, technical documentation, codebases, or research reports – will quickly encounter economic limitations.

Dr. Maik Bunzel sees developments like SubQ as a signal that could drive the entire industry forward: "If it is confirmed that Sparse Attention architectures can genuinely deliver frontier-level performance at a fraction of the computational cost, this fundamentally changes the calculus for many automation scenarios. Not only for large corporations, but especially for mid-sized businesses that have so far been hesitant due to operating costs."

A nuanced view remains important: new architectural approaches take time before they are production-ready and reliably scalable. Businesses should closely monitor whether and how SubQ becomes accessible to a broader audience in the coming months – and what independent real-world tests follow. Until then, the bottom line is: the promise is substantial, but proof in production deployment has yet to be delivered.

Outlook: The End of the Transformer Era?

Subquadratic CEO Justin Dangel puts it provocatively: "We don't believe anyone will still be building on Transformers in a few years." As expected, this thesis is controversial within the AI research community. Transformers are not only technologically dominant – they are the foundation of immense investments in hardware (particularly Nvidia GPUs), training pipelines, and infrastructure.

Nevertheless: the history of technology shows that fundamental architectural shifts are possible when the efficiency advantages are large enough. Whether Subquadratic actually initiates this transition or whether SubQ remains an interesting niche product will have to be demonstrated in the next phase of public scrutiny – including real API access and peer-reviewed publications.

For businesses building AI strategies today, the practical takeaway is: keep an eye on technological developments like this one, design pilot projects with flexibility, and, when selecting AI solutions, treat the underlying architecture as a strategic criterion alongside benchmark scores.