Resource Mismatch.
Pilots routed through flagship-class APIs. The demo looks impressive. Then unit economics fail at production scale because nobody costed the inference. We baseline cost-per-task during the MVP, not after deployment.
Technology doesn’t fix broken processes — it scales the dysfunction. Every Digital Employee we ship is gated by a five-step framework. We fix what’s broken first, then add AI, then measure. And we kill the agents that fail their KPIs.
Automation amplifies a good process. It amplifies a bad process at the same rate.
The methodology below is the discipline behind every production agent we deploy. It is the reason our outcomes hold under audit — and the reason we refuse engagements that try to skip steps 1 and 2.
Every handoff, every exception, every queue. Rank opportunities by ROI, not by which team is loudest. The map is unglamorous and almost always reveals that the “AI problem” is a process problem.
Before a single line of AI code, eliminate redundant steps and simplify decisions by hand. Automation amplifies a good process; it amplifies a bad process at the same rate. This is the step everyone wants to skip and the one that determines whether the deployment lasts.
Pilot a vertical slice on real data, real users, real edge cases. Same stack that will run in production. Not a demo — a working agent inside the actual operational context.
“We deployed an agent” is not a metric. “Onboarding fell from 16 weeks to 1.9 weeks” is. Shared dashboard, not a quarterly slide. Contract clause: if the numbers don’t move against the baseline, we don’t collect.
Implementations that don’t deliver get killed. No sunk-cost grinding. Loyalty to outcomes, not technology. The discipline to kill is what makes the rest of the framework credible — clients trust the process precisely because we’re prepared to walk away from a deployment we built.
Open-weight and frontier models combined per task. Incentives aligned to your outcomes, not to platform consumption volume.
Smallest model that does the work. The cost difference between tiers can be a factor of ten — and the demo model is rarely the production model.
POPIA is a deployment constraint, not a checkbox. Known jurisdiction, auditable identity, from day one.
Every agent on an orchestration layer with audit trails, guardrails, versioning. Not a Python script. Validated at Gate 1 (working MVP) and Gate 2 (production sign-off).
Grounded in your verified data using RAG. Validation loops cross-reference answers against source documents at runtime.
PII masked before reaching the model. Agents run in secure private-cloud instances. POPIA-aligned deployment is the default.
Responses tested across scenario sets before production. Human-in-the-loop governance enforced on sensitive processes.
Input sanitisation blocks prompt injection. Every decision logged. Each agent operates under its own managed identity.
The technology works. The architecture does not. These are the patterns we’ve seen kill otherwise sensible deployments — and the discipline we apply to avoid each.
Pilots routed through flagship-class APIs. The demo looks impressive. Then unit economics fail at production scale because nobody costed the inference. We baseline cost-per-task during the MVP, not after deployment.
RAG and MCP confused. RAG is the library — what the agent reads. MCP is the bridge — what the agent can do. Treating one as the other produces agents that hallucinate retrieval or execute actions without context.
MCP servers built fast rather than right. Schema hygiene, permission scoping, tool-description quality — those decide whether a server scales past the demo. We treat the MCP layer as production software from week one.
The Diagnostic & MVP is paid, four to twelve weeks, and produces a working agent on real data. Before the first line of agent code we baseline the process you want to fix.