While AI technologies have been around for some time, they captured everyone’s imagination with the release of large language models (LLMs) and tools like ChatGPT. These models have given us a general language interface that can seemingly answer any question.
In the past year, we have also seen the introduction of Agents, which link language models to a broader set of tools and complex workflows. These agents are bringing AI into more complex processes and changing the nature of work. Yet, for all the hype and attention, real-world success has been mixed. Many AI projects fail, and as a result, many business leaders remain cautious about implementing the technology.
Technology meets reality
Sitting at the intersection of AI and data, I have been thinking about the gap between the technology and its successful adoption.
First, it is clear to me that the technology is very real. Language models have been rapidly adopted by document-centric functions like marketing, software development, and research. And agents are adding the capability to take on more complex tasks, make semi-autonomous decisions, and interact with external tools.
Meanwhile, the adoption of LLM and Agent technologies in core business operations is much less consistent. Failure rates are high, and two challenges consistently appear.
- The underlying data is not of high enough quality or adequately identified to be used effectively by AI models.
- Integrating AI systems into production workflows is often much more difficult than anticipated.
Thinking about these problems has taken me back to what worked with earlier forms of AI.
Before the introduction of LLMs, we applied AI, in the form of Machine Learning and Optimization, to well-scoped problems. If an organization’s underlying data were inconsistent, we could use brute force to organize the subset of data needed for a particular problem. Enterprises now aim to use LLMs across the entire organization and to solve more general use cases. There is no subset of data anymore. We need quality data at scale across the entire enterprise.
A case in point illustrates the challenge:
I have a customer who is implementing AI to help manage and predict their sales process. They have an industry-leading CRM system that has been in production for 15 years with a significant number of customizations. The “Opportunity” table is at the center of their implementation, and it has 250 columns with overlapping calculations, including four conflicting versions of “Purchase Order.”
Unfortunately, no AI system can make sense of that structure.
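To make the problem concrete, here is a minimal sketch of that kind of table. The column names and comments are invented for illustration; nothing here comes from the customer’s actual system.

```python
# Hypothetical slice of a heavily customized "Opportunity" table; the
# column names below are invented, not taken from any real CRM.
OPPORTUNITY_COLUMNS = {
    "po_number":         "Original purchase order field from the base CRM",
    "po_number_custom":  "Added in a regional customization years later",
    "purchase_order_id": "Populated by an ERP integration, sometimes null",
    "legacy_po_ref":     "Carried over from the system this CRM replaced",
    # ... plus roughly 246 more columns with overlapping calculations
}

def resolve_purchase_order(row: dict) -> str | None:
    """Which value should an AI trust? Without a semantic model,
    any precedence order chosen here is a guess."""
    for col in ("purchase_order_id", "po_number_custom", "po_number", "legacy_po_ref"):
        if row.get(col):
            return str(row[col])
    return None
```

A human analyst resolves this ambiguity through tribal knowledge; a model reading the raw schema has no such recourse.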
Everything old is new again
Those of us with (figurative) gray beards remember building semantic models as part of every data project. Sadly, as software development methodologies evolved, semantic modeling became a lost art. Modern development methodologies split large projects into smaller teams: a lead, three or four developers, a tester or two, and maybe a fractional product manager. While these teams can produce functionality quickly, they focus on the data needs of their component, not the larger enterprise.
While we could work around the compartmentalization of data with previous generations of AI, that approach is wholly inadequate for taking advantage of the newest capabilities.
LLMs and Agents need a broader (ultimately enterprise-wide) view of data, and they need context and semantics. This gives them the foundation to actually reason and to interact with applications in a complex workflow.
To get there, we must couple our modern iterative development practices with business modeling. An important side effect of comprehensive data modeling is that it speeds up our development process. The tools our developers use will understand concepts like who a customer is, and the meaning and role of a purchase order. And they will be able to write quality code much faster.
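A business model can be as lightweight as a set of shared definitions that both humans and tools can read. The sketch below is purely illustrative; the entity names, definitions, and relationships are assumptions, not drawn from any particular modeling tool or CRM.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """A business concept defined once, in business language."""
    name: str
    definition: str
    relationships: dict[str, str] = field(default_factory=dict)

CUSTOMER = Entity(
    name="Customer",
    definition="A legal entity that has signed at least one contract with us.",
    relationships={"places": "Purchase Order"},
)

PURCHASE_ORDER = Entity(
    name="Purchase Order",
    definition="The buyer-issued document that authorizes a sale.",
    relationships={"placed_by": "Customer", "fulfills": "Opportunity"},
)

# A coding assistant that can read these definitions no longer has to guess
# which of 250 CRM columns means "purchase order" -- it consults the model.
```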
To succeed, we need to bring the data architect back into the picture. This person is a domain expert, trained in modeling techniques. Our software engineers will thank us. Our LLMs and agents will thank us (literally, I believe). And our users will thank us.
What these eyes have seen…
My career has spanned seven generations of data technology, so far. As long as I have been working, vendors have said the same thing, over and over:
“Move your data to our platform and THEN we will deliver value.”
This was true with the first generation of data warehouses, and it is still true today. Modern cloud vendors now charge customers for storage and processing. When you get “paid by the drink,” you want everyone to belly up to your bar.
Yet, despite a couple of decades of trying, we have largely failed to consolidate data and systems. Even when we do consolidate some data, it does not add the context that AI requires. So, if we make consolidation a prerequisite for AI adoption, we are going to be sorely disappointed.
The surprising solution
We build ONE semantic business layer for the organization. It lives outside of all the data stores and applications and adds business context to them. Critically, the coding assistants, LLMs, and agents consult this business layer directly. Through the business layer, the AI knows how all the enterprise data is related and can find that data in the physical databases and APIs; a sketch of how such a lookup might work follows the list below.
This new layer allows us to:
- Leave the data we have in place.
- Evolve applications independently of each other.
- Document the systems we have in both end-user and technical terms.
- Enable AI to range across all enterprise data and applications and find new and interesting knowledge in that data.
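To sketch what “consulting the business layer” might look like in practice, an agent could resolve a business term to the physical places the data lives, without moving any of it. The terms, systems, and mappings below are invented for illustration; a real semantic layer carries far richer context (definitions, relationships, lineage).

```python
# Purely illustrative registry: business terms mapped to the physical
# systems that hold them. Only the description is centralized here;
# the data itself stays in place.
SEMANTIC_LAYER: dict[str, dict[str, str]] = {
    "customer": {
        "crm": "account.account_id",
        "billing_api": "/customers/{id}",
    },
    "purchase order": {
        "crm": "opportunity.purchase_order_id",
        "erp": "po_header.po_number",
    },
}

def locate(business_term: str) -> dict[str, str]:
    """Translate a business term into the physical locations of the data."""
    try:
        return SEMANTIC_LAYER[business_term.strip().lower()]
    except KeyError:
        raise KeyError(f"No semantic definition for {business_term!r}") from None

# An agent asking where purchase orders live gets concrete targets:
print(locate("Purchase Order"))
# {'crm': 'opportunity.purchase_order_id', 'erp': 'po_header.po_number'}
```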
Two significant changes are needed to make this work.
- We need to elevate information modeling to a core business function. We may start the exercise at a project level, but we need this to evolve into an enterprise function.
- We need to stop spinning our wheels consolidating data platforms and focus on adding business value to the data we already have.
AI technology will undoubtedly continue to evolve at breakneck speed. For it to succeed at scale, we need to change our core data strategy.