Not every problem needs an agent. After 8 years deploying WhatsApp-native AI across African financial services, we've settled on three principles at FCB.ai: when to build an agent versus a workflow, how to keep the architecture boringly simple, and how to stop your model from making 'stupid' decisions in production.
Three lessons from deploying WhatsApp agents in production across African financial services.
Every week I get the same pitch from vendors, consultants, and sometimes our own team: "Let's put an agent on it." Customer service? Agent. Lead qualification? Agent. Internal ops? Agent.
After 8 years building WhatsApp-native AI for banks, insurers, and telcos — from TFG's debt collections at 5,000 customers per month to Botswana Life, Air Caraibes, and AFMA Maroc — I can tell you most of those use cases shouldn't be agents. They should be workflows. Or classifiers. Or a well-written prompt chain.
Here are the three principles we actually operate by at FCB.ai. None of them are glamorous. All of them have saved us from shipping the wrong thing.
1. Don't build an agent unless the task earns it
An agent is a model in a loop — choosing its own trajectory, calling tools, interpreting feedback, deciding when it's done. That autonomy is expensive. Every turn burns tokens. Every tool call adds latency. Every hallucination compounds.
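To make "model in a loop" concrete, here is a minimal sketch of the pattern using the Anthropic Python SDK. The fetch_policy tool and its stub handler are illustrative placeholders, not our production code:

```python
# Minimal agent loop: the model picks tools, sees results, decides when it's done.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

tools = [{
    "name": "fetch_policy",  # hypothetical tool, for illustration only
    "description": "Fetch a customer's policy details by policy number.",
    "input_schema": {
        "type": "object",
        "properties": {"policy_number": {"type": "string"}},
        "required": ["policy_number"],
    },
}]

def run_tool(name: str, args: dict) -> str:
    # In production this hits a backend system; stubbed here.
    return '{"status": "active", "premium": 420.0}'

messages = [{"role": "user", "content": "Is policy BW-1042 still active?"}]

while True:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # substitute whichever model you deploy
        max_tokens=1024,
        tools=tools,
        messages=messages,
    )
    if response.stop_reason != "tool_use":
        break  # the model decided it's done
    # Feed every tool result back in and let the model choose its next step.
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": block.id,
         "content": run_tool(block.name, block.input)}
        for block in response.content if block.type == "tool_use"
    ]})

print(response.content[0].text)
```

Every pass through that while loop is a paid model call, which is exactly why the next four filters exist.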
Before we greenlight an agent architecture on any client project, we run four filters:
Is the problem genuinely ambiguous? If you can draw the decision tree on a whiteboard, build the decision tree. A debt collection reminder flow with three payment outcomes is not an agent problem — it's a workflow. We've seen teams burn six-figure budgets putting reasoning loops around tasks a rules engine solves in 40 lines.
Does the task value justify the token spend? A WhatsApp agent exploring freely costs 30-50K tokens per session. At scale, that's real money. For high-volume, low-margin interactions — FAQ deflection, balance checks, policy lookups — a workflow captures 80% of the value at 5% of the cost. We reserve agentic behaviour for the cases where a human agent would also need to think: complex claims triage, multi-product upsell, disputed collections.
Are the critical capabilities de-risked? Before we scope, we stress-test the bottleneck. For a claims agent, can the model reliably classify loss types from unstructured French+Arabic customer messages? If no, we don't scope the agent — we scope the classifier first, prove it, then wrap it.
What's the cost of a bad decision, and how fast do we catch it? This is the one most teams skip. In regulated financial services, an agent that autonomously promises a refund, commits to a policy change, or misquotes a premium is a compliance incident. We constrain scope accordingly: read-only by default, human-in-the-loop for anything irreversible, full audit trail on every tool call. Yes, that caps scalability. That's the trade-off you accept to deploy in an FSCA or CIMA-regulated environment.
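A sketch of that constraint pattern, assuming a hypothetical GuardedTool wrapper; the shape is what matters, not the names:

```python
# Irreversible tools require human approval; every call is logged either way.
from dataclasses import dataclass
from typing import Callable, Optional

def audit_log(tool_name: str, args: dict, **meta) -> None:
    print({"tool": tool_name, "args": args, **meta})  # stands in for real logging

@dataclass
class GuardedTool:
    name: str
    handler: Callable[..., str]
    reversible: bool  # balance lookups are reversible; refunds are not

def execute(tool: GuardedTool, approved_by: Optional[str] = None, **kwargs) -> str:
    if not tool.reversible and approved_by is None:
        audit_log(tool.name, kwargs, status="pending_approval")
        return "escalated_to_human"  # queued for a person, not silently dropped
    result = tool.handler(**kwargs)
    audit_log(tool.name, kwargs, status="executed", approved_by=approved_by)
    return result
```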
Rule of thumb we use internally: if the task is ambiguous, high-value, verifiable, and low-consequence-on-error — build the agent. If any of those four is missing, build the workflow.
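For the workflow side of that rule, here is roughly what the three-outcome collections reminder from the first filter looks like as a plain branch. Outcome values and handler names are illustrative:

```python
# The collections reminder as a workflow: explicit, auditable, zero tokens per branch.
from enum import Enum

class PaymentOutcome(Enum):
    PAID = "paid"
    PROMISE_TO_PAY = "promise_to_pay"
    DISPUTED = "disputed"

def handle_reminder(outcome: PaymentOutcome, customer_id: str) -> str:
    if outcome is PaymentOutcome.PAID:
        return f"send_receipt:{customer_id}"
    if outcome is PaymentOutcome.PROMISE_TO_PAY:
        return f"schedule_followup:{customer_id}:+3d"
    return f"escalate_to_human:{customer_id}"  # disputes always reach a person
```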
2. Keep the architecture boringly simple
Every production agent we run — whether it's a Botswana Life policy assistant, a TFG collections bot, or an AGMA insurance broker co-pilot — shares the same three components:
- Environment — where the agent operates (WhatsApp, in our case, plus the backend systems behind it)
- Tools — what it can actually do (fetch policy, send document, escalate, schedule callback)
- System prompt — goals, constraints, tone, escalation rules
That's it. The product surface is wildly different across clients. The backbone is nearly identical.
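Written down as data, the backbone is no more than this. A sketch, with illustrative field names rather than our production schema:

```python
# The whole backbone: environment, tools, system prompt. Nothing else upfront.
from dataclasses import dataclass, field

@dataclass
class AgentSpec:
    environment: str                    # "whatsapp" plus the backends behind it
    tools: list = field(default_factory=list)  # JSON tool schemas
    system_prompt: str = ""             # goals, constraints, tone, escalation rules

policy_assistant = AgentSpec(
    environment="whatsapp",
    tools=[],  # fetch_policy, send_document, escalate, schedule_callback
    system_prompt="You are a policy assistant. Never quote premiums. Escalate disputes.",
)
```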
The mistake I see repeatedly — including mistakes we've made — is over-engineering upfront. Custom memory layers. Multi-agent orchestration frameworks. Elaborate routing logic. All of it slows iteration, and iteration speed is the only thing that matters in the first three months of a deployment.
Our sequence is always:
- Ship a working agent with the three components above.
- Get it in front of real users (we onboard clients in weeks, not quarters — that's the whole point).
- Watch what breaks.
- Then optimise: cache trajectories to cut cost, parallelise tool calls to cut latency, stream progress to build user trust.
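As one concrete instance of that optimisation step, a sketch of parallelising tool calls: when the model requests several independent tools in one turn, run them concurrently. Both async handlers are hypothetical stand-ins for backend calls:

```python
# Two independent tool calls from one model turn: ~0.4 s total instead of ~0.8 s.
import asyncio

async def fetch_policy(policy_number: str) -> str:
    await asyncio.sleep(0.4)  # stands in for a backend round trip
    return f"policy {policy_number}: active"

async def fetch_balance(account: str) -> str:
    await asyncio.sleep(0.4)
    return f"balance for {account}: 1250.00"

async def run_turn() -> list:
    return await asyncio.gather(fetch_policy("BW-1042"), fetch_balance("ACC-7"))

print(asyncio.run(run_turn()))
```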
Optimisation before real traffic is theatre. Build the simplest thing that could work, put it in production, and let the data tell you what to harden.
3. Put yourself inside the agent's context window
This is the habit that most distinguishes teams that ship from teams that demo.
Your agent doesn't see the user. It doesn't see your Notion docs. It doesn't see the Slack thread where you explained the business rule. It sees 10-20K tokens of context, some tool descriptions, a static snapshot of the world, and it has to decide what to do next — with its eyes effectively closed between actions.
When an agent makes a "stupid" decision, 90% of the time it's because we didn't give it what it needed. The fix isn't smarter prompting. It's empathy for the model's actual epistemic situation.
What we do on every project:
- Run the task yourself, constrained to the agent's context. No peeking at the CRM, no domain knowledge from the last client call. Just what's in the prompt and the tool outputs. You'll immediately see what's missing.
- Ask Claude to critique the prompt. Is anything ambiguous? Would you ask for more parameters on any tool? Why did you choose this action at step 4? The model will tell you, bluntly, where your instructions fall apart.
- Audit trajectories post-hoc. When an agent fails a conversation, paste the full transcript back into Claude and ask why it made that call. This has saved us dozens of debugging hours per client.
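A sketch of that last audit step with the Anthropic SDK; the file name, model ID, and framing question are placeholders:

```python
# Paste the failed trajectory back into Claude and ask it to explain the call.
import anthropic

client = anthropic.Anthropic()

with open("failed_conversation.txt") as f:  # the full transcript, tool calls included
    transcript = f.read()

review = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": (
            "Here is a full transcript of one of our WhatsApp agent's "
            "conversations, including tool calls.\n\n"
            f"{transcript}\n\n"
            "The agent escalated when it should have resolved. Walk through its "
            "reasoning turn by turn: what context was it missing, and which "
            "instruction or tool description led it astray?"
        ),
    }],
)
print(review.content[0].text)
```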
This sounds obvious. It isn't practised. Teams spend weeks tuning temperature and model size before they spend an afternoon sitting in the agent's seat.
What this looks like at FCB.ai
Concretely, these principles shape how we scope every deployment:
| Use case | Agent or workflow? | Why |
|---|---|---|
| Policy FAQ deflection | Workflow + RAG | Predictable decision tree, low token budget per turn |
| Collections reminder + payment link | Workflow | Rules-based, compliance-sensitive, fully auditable |
| Complex claims triage (multi-product) | Agent | Genuinely ambiguous, high value, verifiable outcome |
| Broker co-pilot (GrowthPilot-style) | Agent | Open-ended, proactive, human-in-the-loop by design |
| Document capture + KYC validation | Workflow with model calls | Deterministic pipeline, regulated output |
The pattern is consistent: agents earn their place in the top quartile of complexity and value. Everywhere else, we deploy workflows that run cheaper, faster, and with cleaner audit trails — which matters when your clients are insurers, banks, and telcos with regulators watching.
The meta point
AI engineering in 2026 isn't about building the most sophisticated architecture. It's about matching architecture to the problem — and having the discipline to reach for the simpler tool 80% of the time.
The teams winning in production aren't the ones with the most agents. They're the ones who know exactly when not to build one.
Antoine Paillusseau
CEO, FCB.ai
