The Future of Agentic AI
Why Small Language Models Will Rule the Enterprise
I think the future of Agentic AI resides within Small Language Models (SLMs).
To be clear, I’m not talking about general-purpose SLMs that are just smaller versions of their larger counterparts. The real power lies in models trained meticulously on highly specific data from a particular niche while retaining a surprising degree of generalization. The primary decision-making architecture will draw on its specialized training data, while broader, general-purpose reasoning will be handled by a separate, language-based architecture.
The challenges in accomplishing this are significant, but not insurmountable:
Lack of Quality Data: The biggest question is where to find the niche data to train these SLMs. Unlike the vast, general web data that fuels LLMs, this information is often proprietary and expensive, and curating it requires deep domain expertise.
Training Costs: While SLMs are smaller, the initial training cost to create a specialized, high-performance model can still be high. This often means that only enterprises with substantial resources can afford to build them, though this barrier is slowly lowering.
Loss of Accuracy: SLMs have a reputation for hallucinations and inaccuracies, but I believe this is a misconception. When trained on a constrained, highly relevant dataset, they can be remarkably accurate within their domain. The key is to recognize their limitations and use them only where they are designed to perform.
So, why do I have this opinion?
The High Cost of LLMs: LLMs burn a lot of compute per token. When you pass a query to a general-purpose LLM like GPT, it processes every token the same way, whether the query is simple or hard. It might achieve accuracy, but at the cost of high compute and, consequently, high financial cost. In enterprise settings, where millions of queries may be processed daily, cost is of utmost importance.
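To make the cost argument concrete, here is a back-of-the-envelope sketch of how per-token pricing scales with query volume. Every number in it (query volume, tokens per query, both prices) is a hypothetical placeholder chosen for easy arithmetic, not a quote from any provider; substitute your own figures.

```python
def daily_cost(queries_per_day: int, tokens_per_query: int,
               price_per_million_tokens: float) -> float:
    """Daily spend = total tokens processed x price per token."""
    total_tokens = queries_per_day * tokens_per_query
    return total_tokens * price_per_million_tokens / 1_000_000


# Hypothetical placeholder inputs, NOT real pricing or traffic data.
QUERIES_PER_DAY = 2_000_000
TOKENS_PER_QUERY = 1_000
LARGE_MODEL_PRICE = 10.0   # assumed price per million tokens
SMALL_MODEL_PRICE = 0.5    # assumed price per million tokens

if __name__ == "__main__":
    large = daily_cost(QUERIES_PER_DAY, TOKENS_PER_QUERY, LARGE_MODEL_PRICE)
    small = daily_cost(QUERIES_PER_DAY, TOKENS_PER_QUERY, SMALL_MODEL_PRICE)
    print(f"large model: {large:,.0f} per day")
    print(f"small model: {small:,.0f} per day")
    print(f"ratio: {large / small:.0f}x")
```

The point of the sketch is only that cost grows linearly with volume, so any per-token saving is multiplied by every query the enterprise serves.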
The Inefficiency of Monolithic Architectures: What if there were an architecture that could trigger only the relevant neural networks, in isolation, for a given query? It could achieve a similar level of accuracy to a large LLM at a fraction of the compute cost. This is exactly what a hybrid SLM-based architecture enables. Architectures like Mixture of Experts (MoE) and Mixture of Recursions (MoR) are currently being tested in enterprise, but they have their own challenges. A key hurdle with MoE is that all experts often need to be loaded into VRAM for execution, even if only a few are used for a specific query. This makes MoE models hard to run efficiently for multidisciplinary or cross-functional queries, which draw on many different experts. MoR, on the other hand, is an emerging architecture that reuses a shared stack of layers and dynamically assigns a different recursion depth to each token. While it promises better parameter efficiency and adaptive computation, it's still in the early stages of development and has its own engineering complexities.
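The two ideas above can be illustrated with a toy sketch. It is not how any production MoE or MoR model is implemented: the experts are plain scalar functions, the gate scores are passed in by hand rather than produced by a learned router, and all function names are made up for illustration. It shows top-k expert selection, the gap between resident and active parameters, and a shared block applied to each token a different number of times.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(gate_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_forward(x, experts, gate_scores, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_scores, k))


def memory_footprint(num_experts, params_per_expert, k):
    """Return (resident, active) parameter counts.

    Every expert stays resident because any of them may be chosen next,
    while only k of them do work for a given input.
    """
    return num_experts * params_per_expert, k * params_per_expert


def mor_forward(token_states, depths, shared_block):
    """Apply one shared block to each token `depth` times (toy MoR)."""
    outputs = []
    for state, depth in zip(token_states, depths):
        for _ in range(depth):
            state = shared_block(state)
        outputs.append(state)
    return outputs


# Toy experts and hand-picked gate scores (illustrative assumptions).
EXPERTS = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
GATE_SCORES = [0.1, 2.0, 1.5, -1.0]

if __name__ == "__main__":
    print(top_k_route(GATE_SCORES, k=2))
    print(moe_forward(3.0, EXPERTS, GATE_SCORES, k=2))
    print(memory_footprint(num_experts=8, params_per_expert=7, k=2))
    print(mor_forward([1.0, 1.0], [1, 3], lambda h: 2 * h))
```

In the `memory_footprint` call, only a quarter of the parameters are active for a given input, yet all of them occupy memory, which is the VRAM hurdle described above.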
The Data Privacy Problem: The other major issue with these general-purpose LLMs is that they are often delivered as SaaS applications. When you send a query, you're sending your data to an external provider, and its privacy then depends on that provider's policies rather than your own controls. For many industries, from healthcare to finance, this is a non-starter. A private, in-house SLM addresses this problem by keeping all proprietary data on-premises.
The future of Agentic AI isn't about one monolithic model to rule them all. It's about a federation of specialized, efficient, and cost-effective SLMs, each an expert in its own domain, orchestrated by a general-purpose reasoning engine. This architecture will deliver the performance and specificity enterprises need, without the prohibitive costs and privacy risks of today’s one-size-fits-all models.
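As a rough illustration of that federation, here is a minimal sketch of an orchestrator that sends each query to a domain specialist and falls back to a general-purpose model when no specialist fits. Everything in it is hypothetical: the `Specialist` and `Orchestrator` classes, the keyword sets, and the stub models that simply echo the query. The keyword-overlap rule is a stand-in; a real system would more likely route with a learned classifier or with the general-purpose reasoning engine itself.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Specialist:
    """A domain SLM plus the vocabulary used to recognize its domain."""
    name: str
    keywords: frozenset
    answer: Callable[[str], str]


def stub_model(name: str) -> Callable[[str], str]:
    """Placeholder for a real model call; it just tags the query."""
    return lambda query: f"[{name}] {query}"


class Orchestrator:
    def __init__(self, specialists: List[Specialist],
                 generalist: Callable[[str], str], min_hits: int = 1):
        self.specialists = specialists
        self.generalist = generalist
        self.min_hits = min_hits

    def route(self, query: str) -> str:
        """Return the name of the model that should handle the query."""
        words = set(re.findall(r"[a-z]+", query.lower()))
        best_name, best_hits = "generalist", 0
        for spec in self.specialists:
            hits = len(words & spec.keywords)
            if hits > best_hits:
                best_name, best_hits = spec.name, hits
        # Out-of-domain queries go to the general-purpose model.
        return best_name if best_hits >= self.min_hits else "generalist"

    def ask(self, query: str) -> str:
        name = self.route(query)
        for spec in self.specialists:
            if spec.name == name:
                return spec.answer(query)
        return self.generalist(query)


# Hypothetical domains and keywords, for illustration only.
orchestrator = Orchestrator(
    specialists=[
        Specialist("clinical", frozenset({"patient", "dosage", "diagnosis"}),
                   stub_model("clinical")),
        Specialist("finance", frozenset({"invoice", "ledger", "audit"}),
                   stub_model("finance")),
    ],
    generalist=stub_model("generalist"),
)

if __name__ == "__main__":
    for q in ["What is the dosage for this patient?",
              "Summarize the invoice and ledger entries",
              "Tell me a joke"]:
        print(orchestrator.ask(q))
```

The fallback path reflects the earlier point about recognizing an SLM's limits: a specialist only answers when the query clearly sits inside its domain.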