When AI Always Says 7: The Groupthink Problem of Large Language Models and What It Means for Businesses


The Experiment That Causes Discomfort
Anyone with a little time to spare can run a simple test: open your preferred AI chatbot – whether ChatGPT, Claude, or Gemini – and type: "Give me a random number between 1 and 10." The answer will most likely be 7. Repeat the request, and you'll usually get a 3 or 4, then an 8 or 9. What seems like a magic trick is actually a symptom of a deeply rooted structural problem with modern Large Language Models (LLMs): they are far more predictable, far more conformist, and far less creative than their users generally assume.
This phenomenon is no coincidence and no bug – it is a direct consequence of the way these models are trained. And it has far-reaching implications for companies that use AI not only for structured, clearly defined tasks, but also for idea development, strategic brainstorming, and creative processes.
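The test described above can be made measurable rather than anecdotal. The following is a minimal sketch, not a rigorous study: it tallies a list of collected answers and compares them with what a truly uniform random source would produce. The `sample` list is made up for illustration and is not measured data; the function names are likewise hypothetical.

```python
from collections import Counter


def tally(responses, low=1, high=10):
    """Return the share of each value in [low, high] among the responses."""
    counts = Counter(responses)
    total = len(responses)
    return {n: counts.get(n, 0) / total for n in range(low, high + 1)}


def chi_square_vs_uniform(responses, low=1, high=10):
    """Pearson chi-square statistic against a uniform distribution."""
    counts = Counter(responses)
    k = high - low + 1
    expected = len(responses) / k  # count per value if answers were uniform
    return sum(
        (counts.get(n, 0) - expected) ** 2 / expected
        for n in range(low, high + 1)
    )


# Hypothetical answers, invented for illustration (NOT real model output).
sample = [7, 7, 3, 7, 4, 8, 7, 9, 7, 3, 7, 4, 7, 8, 7, 7, 3, 9, 7, 7]

shares = tally(sample)
stat = chi_square_vs_uniform(sample)

print(f"share of 7s: {shares[7]:.2f}")
print(f"chi-square statistic: {stat:.1f}")
```

With ten possible values there are 9 degrees of freedom, and the usual 5% critical value is about 16.92. A statistic far above that suggests the answers are not uniformly distributed. With only 20 answers the approximation is rough, so a real audit would collect considerably more responses per model.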
Homogeneity as a Systemic Feature
Researchers have examined the phenomenon under the apt term "Artificial Hivemind," identifying a remarkable uniformity not only within individual models but also across systems from different manufacturers. When 25 different LLMs were each asked 50 times to formulate a metaphor for time, most of the resulting 1,250 responses came down to one of two metaphors: "Time is a river" or "Time is a weaver." The work received the Best Paper Award at NeurIPS, one of the most prestigious AI conferences in the world.
The cause lies in the structural similarity of the training processes: most leading LLMs are trained on similar datasets, using similar methods, and for similar use cases. The result is a kind of collective regression to the mean – models favor statistically frequent, socially validated answers and avoid outliers. In other words, they are optimized for consensus, not originality.
"The way most chat interfaces are designed conveys the feeling of a personal conversation. Most people don't really realize the extent to which they're getting the same thing as everyone else."
For clearly defined, repeatable tasks – database queries, code generation, document summarization – this characteristic is quite useful. But as soon as companies embed AI in exploratory or strategic contexts, the model runs up against a fundamental limitation.
The Temperature Fallacy and Why Simple Parameter Tweaks Are Not Enough
It is tempting to assume that the problem can be solved through technical settings. LLMs have a parameter called "temperature" that controls the randomness of the output: the higher the temperature, the more variance, or so the simplified logic goes. In practice, however, simply cranking up this dial quickly leads to incoherence: models begin switching languages mid-response or producing semantically disconnected blocks of text.
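What the temperature dial does can be shown with a toy calculation. In the standard formulation, a model's raw scores (logits) are divided by the temperature before being turned into probabilities with a softmax. The sketch below assumes five made-up logits; real models work with vocabularies of many thousands of tokens, and the function names are illustrative only.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after scaling by 1/temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_bits(probs):
    """Shannon entropy in bits: higher means a flatter distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical logits for five candidate tokens (illustrative values only).
logits = [4.0, 2.5, 2.0, 1.0, 0.5]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: top token p={probs[0]:.2f}, entropy={entropy_bits(probs):.2f} bits")
```

Raising the temperature flattens the whole distribution at every position, which is why unlikely and incoherent tokens gain probability along with the interesting alternatives.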
The Australian startup Springboards has taken a different approach: their model "Flint," built on Alibaba's open-source model Qwen 3, was trained to specifically identify those points in a response where greater variance is meaningful and possible – and to increase randomness only at those points. When someone asks "Where should I travel in Europe?", the model only needs randomness at the point where it names the destination – not at every single word of the response. This precise, context-aware approach is technically demanding, but delivers far more convincing results than blanket parameter adjustments.
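The idea of adding randomness only where it matters can be illustrated with a deliberately simplified sketch. This is not Springboards' method and says nothing about how Flint is implemented: here the "open" position is flagged by hand, whereas the article describes a model trained to find such points itself. All names, candidate words, and scores below are hypothetical.

```python
import math
import random


def pick(candidates, temperature, rng):
    """Pick a token from {token: logit}; temperature 0 means greedy choice."""
    if temperature == 0:
        return max(candidates, key=candidates.get)
    tokens = list(candidates)
    scaled = [candidates[t] / temperature for t in tokens]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Each step: (candidate tokens with made-up logits, is this an open slot?).
# The flag is set by hand here purely for illustration.
STEPS = [
    ({"You": 5.0, "We": 1.0}, False),
    ({"could": 5.0, "might": 1.5}, False),
    ({"visit": 5.0, "see": 1.0}, False),
    ({"Paris": 3.0, "Lisbon": 2.0, "Porto": 1.8,
      "Ljubljana": 1.5, "Tallinn": 1.0}, True),
]


def generate(steps, base_temperature=0.0, slot_temperature=1.5, seed=None):
    """Greedy at fixed positions, higher temperature only at open slots."""
    rng = random.Random(seed)
    words = []
    for candidates, is_open in steps:
        temperature = slot_temperature if is_open else base_temperature
        words.append(pick(candidates, temperature, rng))
    return " ".join(words)


for seed in range(5):
    print(generate(STEPS, seed=seed))
```

In this toy setup the sentence frame stays stable while only the destination varies, which is the contrast with a blanket temperature increase that the paragraph above describes.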
What This Means for Businesses
Dr. Maik Bunzel, founder and CEO of mabucon.eu, has long been pointing to a distinction that often gets lost in everyday business practice: there is a fundamental difference between AI systems that execute and those that explore. For workflow automation, repetitive business processes, and structured data processing, the uniformity of LLMs is not a disadvantage – it is a strength. Deterministic, reproducible behavior is precisely what you want in those contexts.
The situation is different for use cases where AI is meant to serve as a creative sparring partner: brand development, campaign ideation, strategic scenario planning, product innovation. Here, as the research described above suggests, standard models essentially produce the average of their training data: a distilled, filtered consensus of what counts as a "good answer" on the internet. For companies hoping to gain a creative competitive advantage through AI, this is a sobering insight. In practice, four areas are affected:
- Ideation and Brainstorming: Standard models tend toward predictable, market-conforming ideas. Anyone using AI for genuine differentiation needs either specialized models or carefully designed prompt architectures that actively force divergence.
- Strategic Analysis: When different teams use the same models for market analysis, their AI-driven insights tend to converge – a competitive disadvantage that is not immediately visible.
- Automated Content Production: Mass-produced, AI-generated content from the same models will become increasingly similar – a serious challenge for brand differentiation.
- Multi-Model Strategies: The deliberate combination of different models with varying characteristics can help break the inherent uniformity of individual systems.
Hallucinations Reconsidered: A Paradigm Shift?
What is remarkable is the philosophical shift that Springboards is making with Flint. While the entire AI industry has spent years fighting hallucinations – the fabrication of facts – as a central problem, the startup advocates for a controlled engagement with the unexpected: "Most language models fight hallucinations. We welcome them," as the company puts it. This sounds provocative, but means something precise: in creative, exploratory contexts, deviation from the statistical mainstream can be valuable – when it is steered and made transparent to humans as a starting point for further processing.
This thought deserves attention because it introduces an important nuance into the AI debate: not all deviations are errors. The distinction between unwanted hallucination in fact-based contexts and productive divergence in creative contexts is a matter of the deployment scenario – and therefore of system design, not just the model itself.
Human Oversight Remains the Decisive Factor
An important cautionary note comes from practice itself: even with models that actively generate variety, directly adopting AI output without critical human reflection remains problematic. More variation does not automatically mean higher quality – it means a broader range of options from which people can draw with judgment, contextual knowledge, and creativity.
Bunzel makes a related point in the context of his work with companies: AI agents are most effective when deployed as structured process accelerators, not as a replacement for human thinking but as an extension of it. This applies to automation just as much as to creative support.
Outlook: What Companies Should Do Now
The insight that LLMs are structurally prone to uniformity should change the way companies set up their AI strategy. In concrete terms, this means:
- Making a clear distinction between automation and exploration applications – and for the latter, deliberately choosing models or configurations that actively encourage divergence.
- Regularly auditing AI usage for quality and originality, particularly in scaled content production and strategic analyses.
- Considering multi-model architectures that combine different models with varying strength profiles, rather than relying on a single provider.
- Consistently treating AI outputs as raw material that requires human curation, refinement, and contextualization.
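An originality audit of the kind listed above can start very simply. The sketch below is one possible approach under stated assumptions, not an established standard: it scores a batch of outputs by their mean pairwise Jaccard similarity over word sets, where 1.0 means every text uses the same words and 0.0 means no overlap at all. The function names are hypothetical. The first example batch reuses the two metaphors quoted earlier in this article; the second batch is invented.

```python
import re
from itertools import combinations


def word_set(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two texts based on their word sets."""
    set_a, set_b = word_set(a), word_set(b)
    if not set_a and not set_b:
        return 1.0  # two empty texts count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def mean_pairwise_similarity(texts):
    """Average Jaccard similarity over all pairs of texts."""
    pairs = list(combinations(texts, 2))
    if not pairs:
        raise ValueError("need at least two texts to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


homogeneous = ["Time is a river", "Time is a river", "Time is a weaver"]
diverse = ["Time is a river", "Clocks melt slowly", "Hours taste like rust"]

print(f"homogeneous batch: {mean_pairwise_similarity(homogeneous):.2f}")
print(f"diverse batch:     {mean_pairwise_similarity(diverse):.2f}")
```

Word overlap is a crude proxy: it misses paraphrases and rewards superficial rewording, so a score like this is best treated as a prompt for human review rather than a verdict. Tracking it over time, or across models, shows whether outputs are drifting toward sameness.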
The groupthink problem of LLMs is not a reason to abandon AI – it is a reason to use AI more deliberately and with greater nuance. Those who understand how these systems work and where their blind spots lie can use them far more effectively than those who adopt their outputs uncritically. In a world where more and more companies are applying the same models to the same questions, it is precisely this understanding that becomes the strategic differentiator.