Two roads to AI’s future
A tale of two trajectories
We are at an interesting juncture in the journey of artificial intelligence.
On one hand, over the past few years, generalist AI agents modeled on human cognition and communication have emerged as versatile assistants in our daily lives, operating through language and social interaction. This murkily defined plateau of human-like behavior has been anticipated since before the Turing Test of the 1950s, and is now being reached in many respects under the term “AGI”.
On the other hand, for many years now, AI systems have been showing themselves to be superhuman pattern-recognizing “oracles” in specialized, structured domains. From decoding the secrets of proteins to making weather prediction tractable, such systems – which often fall under the disciplinary umbrellas of “machine learning” and “statistics” – frequently surpass what human experts can accomplish or even perceive.
These two trajectories in AI development now progress side by side, each with distinct objectives, training needs, and roles in society. They are deeply intertwined with complementary strengths, and together they form a useful framework for understanding where AI is today, where it’s headed, and how to best harness it.
By comparing these two paths’ goals and challenges, we can understand:
- Why generalist language agents capture broad commercial value across society, but differently across domains
- Why pattern-recognizing AI in science demands superhuman abilities
- Why some people are awed by LLM-based AIs, while others remain unmoved
- How they might interact with and complement each other
- What bottlenecks each must overcome, as a function of the application
- Why AGI is not the end of any story, but the beginning of a new chapter
The role of human feedback in each of these two trajectories is crucially different. In the superhuman pattern-recognition paradigm, practitioners often set aside personal intuition and learn to trust the data. By contrast, developing human-like AI behavior demands close alignment with human processes and values, not just replicating results. In other words, success for social-minded AI is defined by how well it mirrors human thinking and meets human-driven criteria, whereas success for specialist oracle-like AI tends to be measured by quantitative performance metrics, even when its conclusions deviate from common intuition.
Ultimately, viewing AI’s evolution from these different perspectives helps explain how AI has developed so far, and how it may unfold.
Human language and accumulations of thought
Literacy is a crucial skill – it’s the gateway to influencing things beyond our immediate surroundings. The ability to use human language was one of the key breakthroughs in our development as intelligent beings – language is a remarkable general-purpose toolkit that lets us navigate virtually anything we encounter in society, covering every concept we can perceive. It is, in effect, the price of entry into modern human culture. AI models that can work with general human language are therefore bound to have far greater societal impact than models that cannot.
The practical power of language lies in how human societies accumulate knowledge through it. For millennia, we’ve stored expertise in tacit forms – through oral traditions, apprenticeships, and imitation. As writing became cheaper and more essential, this spoken wisdom transformed into documented knowledge, enabling increasingly complex knowledge-based systems. The internet now serves as a vast repository of our generation’s collective understanding, providing the raw material to build generalist AI agents.
Yet language is not monolithic. While general human language excels at its purpose – providing flexible tools to describe our ever-changing world – specialized fields develop their own linguistic ecosystems. The carefully measured play-calling of a football quarterback and the dense notation of quantum physics have this in common. These domain-specific languages evolve to serve particular needs: precision, efficiency, or even deliberate obfuscation. They demonstrate that while human language provides the foundation, specialist knowledge often requires specialist expression to be effectively communicated and manipulated.
This linguistic specialization has profound implications for AI development. When we train models on general internet text, we’re essentially teaching them to navigate the broad toolkit of human communication. But when we need AI to excel in specific domains – whether medicine, law, or materials science – we must grapple with the fact that expertise in these fields exists in specialized linguistic forms that depart significantly from everyday language. The challenge becomes:
How do we bridge the gap between the generalist linguistic capabilities that make AI socially useful and the specialist expertise that makes it powerful in understanding the natural world?
The relationship between language and thought – long studied in linguistics – takes on higher stakes in the current AI era. To the extent that language shapes reasoning capability, the linguistic data we use to train AI systems fundamentally influences their capabilities and limitations. This is why the role of human feedback differs so dramatically between our two AI trajectories: for generalist agents, we need systems that mirror human linguistic, social, and thought processes, while for scientific AI, we often need to transcend human linguistic limitations entirely.
Generalist agents in human society
Generalist agents such as large language models (LLMs) are changing the face of our world. They are built to operate as humans do with language, allowing them to mirror human communication and reasoning and collaborate with humans.
These models are trained on broad internet-scale corpora – books, websites, conversations – to emulate human linguistic and cultural patterns: drafting emails, explaining concepts, and solving problems across countless domains. Their value lies not in statistical perfection but in immediate utility – can they help humans complete tasks naturally and effectively? Because language is a universal interface to human society, these agents can serve countless roles with minimal domain-specific customization, which is why generative AI is now expected to have a tremendous impact on every knowledge-manipulation role in society.
These language-based systems face their own bottlenecks, even in simply integrating into societal roles as a human would. Their generalization to new concepts can be shallow, for instance in analogical reasoning and abstract problem solving. And they are trained on text alone, without grounding in physical reality. As research progresses, these bottlenecks are gradually being overcome by encoding additional tacit knowledge into the systems.
The role of human feedback in training is paramount for these systems. Techniques like chain-of-thought prompting and intermediate supervision have been found to help generalist agents simulate procedural reasoning and get closer to robust cognition. Unlike scientific AI, where human intuition must often be set aside in favor of the data, generalist agents must index heavily on human processes and judgments. Their objective functions are inherently harder to define – “success” means matching human expectations across countless implicit dimensions.
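To make the idea concrete, here is a minimal sketch of how a chain-of-thought prompt differs from a direct prompt; the `query_model` helper mentioned in the comments is a hypothetical stand-in for whatever LLM inference call is available, not a specific API:

```python
# Minimal sketch of chain-of-thought prompting, assuming a hypothetical
# query_model(prompt) helper that returns text from any LLM endpoint.

def direct_prompt(question: str) -> str:
    # Ask for the answer alone; the model's intermediate reasoning stays hidden.
    return f"Question: {question}\nAnswer:"

def chain_of_thought_prompt(question: str) -> str:
    # Ask the model to externalize intermediate steps before committing to an
    # answer, so humans can supervise the process, not just the final result.
    return (
        f"Question: {question}\n"
        "Think through the problem step by step, "
        "then state the final answer on its own line."
    )

# Usage (query_model is a placeholder for any inference call):
# reasoning_and_answer = query_model(chain_of_thought_prompt("..."))
```

The point of the sketch is simply that the supervision target shifts from the answer alone to the human-legible process that produces it.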
Statistical generalization therefore matters less than immediate human utility: can the system draft emails, explain concepts, or solve problems in ways that feel natural and helpful to humans? In other words, an LLM-based AI proves itself not by excelling on some theoretical data distribution, but by doing a human task as well as (or better than) a human would, all without human biological limitations like attention span and fatigue.
Superhuman pattern recognition in science and tech
Savant-like AI oracle models excel at finding subtle patterns in vast scientific data – far beyond what human perception alone can achieve. In fields like biotechnology, materials science, and physics, there has been an explosion of complex data that no single human (or even team of humans) could ever fully parse. Modern AI systems thrive in these highly structured domains by detecting patterns and solutions invisible to us.
The numerous achievements are striking:
- Deep learning models can analyze biomedical images or genomic sequences with such precision that they sometimes outperform expert physicians in tasks like identifying diseases from scans.
- AlphaFold and successor systems have achieved breakthroughs in predicting protein structures with computation that no human could ever match.
- In drug discovery, weather forecasting, quantum chemistry and more, AI models routinely sift through terabytes of experimental data to make conclusions at scales far beyond human capacity.
Why do these scientific domains demand superhuman ability? The answer lies in their complexity and volume of data. A biotech AI might need to correlate millions of genetic markers with disease outcomes, or envision how a slight change in a molecule’s structure alters its behavior. These systems often integrate first-principles knowledge – such as physics simulations – to structure the learning process, and their success can be measured against well-defined objectives. Humans are fundamentally limited by our biology in processing data at this scale.
These AIs operate independently of human interaction; they do not require language to achieve their goals. This allows them to scale and optimize far more quickly in some dimensions. But they often lack common sense or the ability to explain their reasoning in human terms.
A recurring bottleneck is out-of-distribution generalization. Even the most capable scientific model can falter when presented with data that deviates from its training set. This is a known issue in fields like genomics and materials science, where generalization to novel scenarios can be difficult.
Humans have progressed in these situations by using strong inductive biases that encode significant tacit knowledge beyond what is in a given dataset. Injecting such domain knowledge – reaction mechanisms, symmetry groups, biophysical constraints, known gene networks – focuses search toward productive regions of solution space. AlphaFold’s Evoformer architecture succeeded precisely because it encoded protein-specific evolutionary priors that generic language models lack. Even single expert demonstrations provide invaluable guidance: one documented synthesis route can anchor an entire molecular generation policy.
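As a toy illustration of one such inductive bias (not any particular system’s method), a predictor can be made rotation-invariant by construction simply by averaging its outputs over random rotations of the input; `base_model` below is a hypothetical placeholder for any coordinate-based property predictor:

```python
# Minimal sketch of injecting a symmetry inductive bias: averaging a predictor
# over rotations of a 3D point cloud (e.g. atom coordinates) approximates a
# rotation-invariant prediction. "base_model" is an assumed placeholder, not a
# specific library's API.
import numpy as np

def random_rotation(rng: np.random.Generator) -> np.ndarray:
    # QR decomposition of a Gaussian matrix gives a random orthogonal matrix;
    # fix the column signs and determinant to obtain a proper rotation.
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q *= np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1.0
    return q

def invariant_predict(base_model, coords: np.ndarray,
                      n_samples: int = 32, seed: int = 0) -> float:
    # Average the base model's output over randomly rotated copies of the
    # input, so the prediction no longer depends on molecular orientation.
    rng = np.random.default_rng(seed)
    preds = [base_model(coords @ random_rotation(rng).T) for _ in range(n_samples)]
    return float(np.mean(preds))
```

Production systems typically bake such symmetries directly into the architecture rather than averaging at inference time, but the sketch captures the principle: the prior narrows the search space before any data is seen.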
For those working deep in these specialist fields, attitudes toward AI often differ from the lay public’s. The languages experts use are highly technical and removed from everyday speech. This has far-reaching consequences that hinder collaboration, as their tacit knowledge can be hard for outsiders – and even insiders – to grasp. Meanwhile, the education of new experts deliberately deemphasizes the role of intuition in favor of empirical, measurable observation. As a result, domain scientists and engineers tend to view AI advances in their arena as separate from the buzz around LLM-based methods. They are not easily moved by a fluent chatbot’s verbal manipulations, recognizing the distinction between such systems and the type of AI that can solve their superhumanly hard pattern-recognition problems.
Bridging the trajectories: interaction and integration
There is growing convergence between these paths. By now, it is common to build systems in which a generalist LLM delegates sub-tasks to domain-specific models. The LLM orchestrates, explains, and packages the specialist results for human users. Here, language serves as glue across modules.
This modular paradigm – generalist shell with specialist cores – allows agent systems to achieve both depth and breadth. But it requires solving hard problems (a minimal orchestration sketch follows the list below), such as:
- Translating domain-specific outputs (e.g., a spectral feature) into human-understandable terms.
- Deciding when to invoke specialist models.
- Maintaining coherence across complex, multi-stage pipelines.
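For concreteness, the sketch below shows a minimal, hypothetical version of this “generalist shell, specialist cores” layout; the oracle registry, keyword routing rule, and `llm_summarize` helper are illustrative assumptions rather than any existing framework’s API:

```python
# Minimal sketch of a generalist-shell / specialist-core pipeline. The oracle
# registry, routing rule, and llm_summarize helper are illustrative
# placeholders, not a particular framework's API.
from typing import Callable, Dict, Optional

# Specialist "oracles": each maps a structured request to a structured result.
ORACLES: Dict[str, Callable[[str], dict]] = {
    "protein_structure": lambda seq: {"task": "fold", "input": seq},
    "weather_forecast": lambda region: {"task": "forecast", "input": region},
}

def route(request: str) -> Optional[str]:
    # Decide when to invoke a specialist; in practice the generalist LLM
    # itself would make this call based on the conversation.
    if "protein" in request.lower():
        return "protein_structure"
    if "weather" in request.lower():
        return "weather_forecast"
    return None  # stay in the generalist path

def handle(request: str, llm_summarize: Callable[[dict], str]) -> str:
    oracle_name = route(request)
    if oracle_name is None:
        return llm_summarize({"task": "chat", "input": request})
    result = ORACLES[oracle_name](request)
    # The generalist shell translates the specialist output into prose.
    return llm_summarize(result)
```

Even in this toy form, the hard problems from the list above are visible: the routing rule decides when to delegate, and the final summarization step is where domain-specific outputs must be translated into human-understandable terms.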
We also face issues of tacit knowledge transfer. These present themselves directly when we supervise an agent to be “like a good human mentor” – to pass on not just answers but reasoning processes. Agents can generalize well in this manner, mirroring how human apprentices learn: with examples, guidance, and feedback loops. But such tacit knowledge is often not so easily identified, and in specialist domains it is often confined to a few domain experts or waiting to be discovered in a good embedding of the data.
Remarkably, the machine learning foundations underlying all these systems – generalist and specialist systems alike – are shared. Statistical learning is a universal toolkit when applied properly; pattern-recognition techniques have been found to work across domains. Therefore, one might imagine a future superintelligence that seamlessly bridges the two modes, subsuming all this functionality in one system.
But such a unified AI will not emerge overnight, just as it took years of data accumulation for today’s training methods to work, in language and other scientific domains. Each domain-specific subsystem still requires extensive training and refinement, especially to make long sequences of good decisions and to find the proverbial needle in a haystack when solving complex tasks. Though engineering may well get much faster with the right AI tools, scientific understanding remains bottlenecked by the elicitation of inductive biases and tacit knowledge needed to reach, and then exceed, human-level success.
In any given scientific or technical domain, the speed of integration of these abilities is likely to be driven by the state of such elicitation. The more relevant tacit knowledge that is available, the faster human-level performance can be achieved and surpassed.
It’s difficult to imagine what superhuman-level performance will look like, especially in domains like biochemistry and the physical sciences where the necessary data are so vast. But there is a high expectation that more exists in the combination of data sets than has been found so far. We are now, for the first time, starting to measure and design objects of biology and chemistry at the scale of their function – signaling and ligand molecules at the micromolecular level, antibodies at the macromolecular level, tissues at the level of individual cells and their microenvironment, and so on. Most of the mysteries of drug discovery remain to be addressed, and there is reason for optimism that many answers can be found in the deluge of modern data.
Future implications
The future likely belongs to a synthesis: systems that combine the scientific prowess of specialist oracles with the communicative flexibility of social agents. Specialist subsystems will push forward scientific frontiers while generalists help disseminate and apply these insights to human goals. Our role is to guide this convergence thoughtfully, ensuring alignment with human values while maximizing beneficial capabilities.
Several key implications emerge from this way of thinking:
Distinct engineering targets: Modern discourse often conflates learning to understand the world on one hand with being useful within human institutions on the other. These represent fundamentally different challenges. AlphaFold 3 achieves superhuman accuracy at protein structure prediction yet has zero awareness of quarterly objectives or team dynamics. GPT-4-class agents integrate seamlessly into email threads and development environments despite having modest scientific capabilities compared to AlphaFold. Each optimizes for different success metrics – objectively well-defined metrics on one hand, and alignment with human society on the other. AI in the Silicon Valley sense often means agents that can talk, plan, and execute – the “social mind” – while AI in science means “models that optimize specific objectives.” These communities optimize for different outcomes. Tool-augmented agents can bridge this gap in various ways, by maintaining conversational interfaces while delegating to domain-specific oracles when tasks exceed their competence threshold.
Fragmentation in specialist domains: Physical science data inhabit wildly different mathematical spaces – crystallographic densities, single-cell expression matrices, mass spectra – each with unique signal properties, symmetries, and evaluation metrics. Because these constraints are intrinsic to the data, unified super-models rarely achieve economies of scale outside textual modalities. The empirical track record bears this out: separate state-of-the-art models exist for protein folding, weather nowcasting, and quantum-chemistry energies, and they improve fastest when kept semi-autonomous. Attempts to lump them together have usually stalled on label scarcity or incompatible inductive biases.
Consolidation at the interface layer: While specialist models resist unification, the interface layer – systems that translate natural language intent into orchestrated specialist actions – benefits enormously from consolidation. Multi-agent frameworks show steep returns to scale when a single cognitive workspace can search, invoke tools, verify results, and plan coherently. This would lead to “one super-agent, many specialist oracles” as the dominant architecture. Language-conditioned policy models excel here because natural language provides a rich, overloaded toolkit for coordination and planning.
Language as a bottleneck: Because literacy already bottlenecks human capital, models that engage us through language enjoy outsized influence. But language operates at human cognitive bandwidth – suitable for our serial, attention-limited processing. In data-rich scientific domains, structured representations – molecular graphs, reaction templates, genomic tensors – carry orders of magnitude more information per unit. Future systems will likely ingest these native formats directly (as specialist foundation models already do), translating to prose only for final human consumption.
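As a small illustration of what such a native format might look like, here is a hypothetical minimal molecular-graph container; the field names are assumptions made for illustration, not a standard schema:

```python
# Minimal sketch of a structured "native" representation: a molecular graph
# carrying far more machine-usable detail than a prose description of the
# same molecule. Field names are illustrative, not a standard schema.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class MolecularGraph:
    atoms: List[str]                      # element symbols, e.g. ["C", "O", "H"]
    bonds: List[Tuple[int, int, float]]   # (atom_i, atom_j, bond order)
    coords: List[Tuple[float, float, float]] = field(default_factory=list)

    def to_prose(self) -> str:
        # Translate to language only at the human-facing boundary.
        return f"A molecule with {len(self.atoms)} atoms and {len(self.bonds)} bonds."

# Example: a rough sketch of formaldehyde (CH2O)
formaldehyde = MolecularGraph(
    atoms=["C", "O", "H", "H"],
    bonds=[(0, 1, 2.0), (0, 2, 1.0), (0, 3, 1.0)],
)
```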
The value of inductive biases: Without strong inductive biases, exploration becomes computationally intractable. As pattern recognition methods improve and the marginal gains there become smaller, the elicitation of tacit knowledge is becoming an increasingly key challenge in scientific AI.
The strategic implications here vary for different stakeholders, who each have different roles to play.
Companies will invest in shared agent platforms while building or licensing specialized oracles where intellectual property provides competitive advantage. Researchers can add the most value by advancing the translation layer – developing data schemas, API protocols, and verification frameworks that allow seamless oracle integration with the guarantees a business needs. Regulators, in turn, will need to consider the full system of agents plus oracles rather than treating model weights in isolation, as the emergent behavior of integrated systems may differ dramatically from what their components would suggest.
Intelligence is not monolithic. By embracing specialist oracles and social minds as distinct kinds of AI subsystem, we can build AI systems that augment human capability across all dimensions, from scientific discovery to daily communication and collaboration. The path forward for now clearly requires both approaches, and orchestrating their complementary strengths.