From Boardroom to Tech Stack
The AI providers publish detailed evaluations of their own models, reports they call Model System Cards. In the technical sections of these documents, the providers' own research scientists report hallucination rates ranging from 23% to 79%.
Those are not adversarial red-team results. Those are self-reported accuracy failures on the models that enterprises are building production applications on top of right now.
Dominique Shelton Leipzig, founder of Global Data Innovation and a governance advisor who has trained over 1,600 CEOs and board members on AI risk, surfaced these numbers in a recent column for FT Agenda. Her observation was striking: not one of the executives she had briefed had previously seen the error rates. The data was not hidden. It was published on the providers' own websites. It simply never entered the sales cycle or the boardroom.
The Information Gap Is the Risk
The typical enterprise AI deployment follows a familiar pattern. A line of business builds an application on top of a foundation model. The application ships. Usage scales. And nobody in the C-suite has reviewed the underlying model's accuracy characteristics because that data lives in a research document that the sales team has likely never read either.
This is not a theoretical concern. Credible research from MIT, Berkeley, and Brown has established that model drift, the degradation of accuracy over time, is endemic to large language models. It will not be eliminated through prompt engineering, retrieval-augmented generation, or future model iterations. The error rates that exist today will persist in some form tomorrow. The only variable is whether an organization detects them before they cause harm.
The consequences of undetected failure are already appearing in the public record. Surgical AI that misidentified instrument positioning, leading to patient injuries. Student risk-assessment models that flagged thousands of children based on misinterpreted slang. Autonomous vehicle programs scrapped after board members were named individually in derivative suits. Shelton Leipzig calculates that cybercrime has cost the global economy nearly $60 trillion over the past decade and warns that AI-driven incidents could multiply that figure by an order of magnitude.
The Governance Blueprint Exists
The good news is that the leadership behaviors required for successful AI deployment are well understood. Shelton Leipzig's TRUST framework identifies five practices that correlate with 85 to 90 percent of successful enterprise AI outcomes: Triage (select use cases aligned with business strategy), Right Data (ensure accuracy and rights compliance in training data), Uninterrupted Testing, Monitoring and Auditing (continuous verification against human standards), Supervision (accountability structures with rapid escalation), and Technical Documentation (audit trails that allow teams to identify when drift began and correct it).
She uses an analogy that resonates: continuous AI monitoring is like an alarm system on your home. If you have no sensors on the windows, you will not know when a burglar enters. The same principle applies to high-risk AI. Without continuous testing against accuracy metrics, an organization is running blind.
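To make the alarm-system analogy concrete, here is a minimal sketch, in Python, of what a scheduled accuracy check against a human-verified evaluation set can look like. The threshold, baseline, and alerting hook are hypothetical placeholders for whatever an organization's own subject matter experts and incident process define; it illustrates the pattern, not any particular product.

```python
# A minimal sketch of continuous accuracy monitoring (hypothetical names,
# not any vendor's API). A scheduled job scores the model against a
# human-verified evaluation set and alerts when results deviate.
from dataclasses import dataclass


@dataclass
class EvalResult:
    passed: int   # cases the model answered correctly
    total: int    # cases in the human-verified evaluation set

    @property
    def accuracy(self) -> float:
        return self.passed / self.total if self.total else 0.0


ACCURACY_FLOOR = 0.95   # hard minimum defined by subject matter experts
DRIFT_TOLERANCE = 0.02  # how far results may slip from the approved baseline


def check_for_drift(result: EvalResult, baseline: float) -> None:
    """Compare the latest scheduled evaluation against the approved baseline."""
    if result.accuracy < ACCURACY_FLOOR:
        send_alert(f"Accuracy {result.accuracy:.1%} is below the {ACCURACY_FLOOR:.0%} floor")
    elif baseline - result.accuracy > DRIFT_TOLERANCE:
        # Gradual degradation is still drift, even before the hard floor is hit.
        send_alert(f"Accuracy slipped {baseline - result.accuracy:.1%} from baseline")


def send_alert(message: str) -> None:
    print(f"[ALERT] {message}")  # stand-in for a paging or ticketing integration


# Example: a nightly run scores 940 of 1,000 reference cases correctly.
check_for_drift(EvalResult(passed=940, total=1000), baseline=0.97)
```

In this run the 94.0% result falls below the floor and triggers an alert. The point is simply that the comparison happens on every run, not once a quarter.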
This framework gives boards and CEOs the clarity they need on what to demand from their organizations. The harder question is how to actually execute it.
From Policy Document to Production Infrastructure
Governance frameworks only matter if they translate into the technology stack. A compliance policy that exists as a PDF on a shared drive does not prevent an employee from pasting client data into a Copilot-enabled document. A board resolution on AI oversight does not automatically generate the audit trail a regulator will request.
This is the gap we built APERION to close.
SmartFlow is a centralized AI governance control plane that sits between an enterprise's applications and the models they depend on. All AI traffic routes through it. Every request and response is inspected, classified, and logged in real time. When the standards that leadership has defined are violated, whether that is PII leakage, an unauthorized model query, a compliance boundary crossing, or an anomalous usage pattern, SmartFlow does not just record it. It stops it. It triggers an escalation workflow. A manager gets notified. A CISO gets alerted. The violation is captured in an immutable audit log that feeds directly into regulatory examination packages.
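What that looks like at the level of a single request can be sketched in a few lines. The example below is illustrative only; the allow-list, pattern rules, and function names are hypothetical stand-ins rather than SmartFlow's actual interfaces, but the shape is the pattern described above: classify the request, decide, record the decision, and escalate on violation.

```python
# Illustrative in-path policy check (hypothetical names and rules,
# not SmartFlow's actual interfaces).
import re
from datetime import datetime, timezone

ALLOWED_MODELS = {"approved-model-v1"}                 # models leadership has sanctioned
PII_PATTERNS = [re.compile(r"\b\d{3}-\d{2}-\d{4}\b")]  # e.g. a US SSN format


def inspect_request(user: str, model: str, prompt: str) -> dict:
    """Classify a request before it reaches the model; block and escalate on violation."""
    violations = []
    if model not in ALLOWED_MODELS:
        violations.append("unauthorized_model")
    if any(p.search(prompt) for p in PII_PATTERNS):
        violations.append("pii_leakage")

    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "model": model,
        "decision": "block" if violations else "allow",
        "violations": violations,
    }
    append_audit_log(event)   # every decision is recorded, allowed or blocked
    if violations:
        escalate(event)       # notify the manager / CISO workflow
    return event


def append_audit_log(event: dict) -> None:
    print("AUDIT", event)     # stand-in for an append-only audit store


def escalate(event: dict) -> None:
    print("ESCALATE", event["violations"])  # stand-in for paging or ticketing


inspect_request("analyst@example.com", "shadow-model", "Client SSN is 123-45-6789")
```

This request would be blocked twice over: it names a model outside the approved list and carries a PII pattern in the prompt. Both facts land in the audit log before anyone has to ask for them.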
This is what Shelton Leipzig's "U" in TRUST looks like when it moves from recommendation to reality. Uninterrupted testing, monitoring, and auditing is not a quarterly report from your external auditor. It is a 24/7 sensor network embedded in the AI traffic path, enforcing standards that the organization's own subject matter experts defined and alerting in real time when something deviates.
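The audit trail itself deserves a note. A common way to make a log tamper-evident is to chain each entry to a hash of the one before it, so that any later edit or deletion breaks verification. The sketch below shows the idea in miniature under that assumption; a production deployment would back it with an append-only or WORM store rather than an in-memory list.

```python
# Tamper-evident audit trail via hash chaining (a sketch, not a product API).
import hashlib
import json


class AuditTrail:
    def __init__(self) -> None:
        self.entries: list[dict] = []
        self._last_hash = "0" * 64  # genesis value for the chain

    def append(self, event: dict) -> None:
        record = {"event": event, "prev_hash": self._last_hash}
        record["hash"] = self._digest(record)
        self.entries.append(record)
        self._last_hash = record["hash"]

    def verify(self) -> bool:
        """Recompute the chain; any edited or deleted entry breaks verification."""
        prev = "0" * 64
        for record in self.entries:
            body = {"event": record["event"], "prev_hash": record["prev_hash"]}
            if record["prev_hash"] != prev or record["hash"] != self._digest(body):
                return False
            prev = record["hash"]
        return True

    @staticmethod
    def _digest(body: dict) -> str:
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


trail = AuditTrail()
trail.append({"decision": "block", "violation": "pii_leakage"})
trail.append({"decision": "allow", "violation": None})
print(trail.verify())  # True; flips to False if any earlier record is altered
```

The entries, together with a successful verification, are the kind of artifact a regulatory examination package can actually include.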
The governance blueprint tells you what to build. The control plane is how you build it.
What Comes Next
This is the first in a series we are calling "From Boardroom to Tech Stack." Over the next several posts, we will walk through each element of the TRUST framework and show exactly what the governance infrastructure looks like when it moves from strategy into production. We will cover how policy engines enforce use-case triage, how content classification protects data integrity, how agent identity systems create accountability at machine speed, and how audit trails generate the regulatory documentation that keeps enterprises out of enforcement actions.
Dominique Shelton Leipzig's full column in FT Agenda, "Boards May Find AI Error Rates Too Big a Risk Without Guardrails," is available here. Her book, TRUST: Responsible AI, Innovation, Privacy & Data Leadership, won the 2024 getAbstract Business Impact Book of the Year.
Ready to govern your AI infrastructure?
See how SmartFlow gives regulated industries complete AI sovereignty.
Request a Demo
View Documentation