The past two years of AI investment in marketing teams have, with some honourable exceptions, produced a lot of chatbots. Customer-service chatbots. Internal knowledge-base chatbots. Lead-qualification chatbots. Sales-coaching chatbots. The pattern is consistent: a team identifies an interaction-heavy process, points an LLM at it, and ships an interface where a human asks questions and gets answers.
These projects aren't bad. Some save genuine time. But they share a structural ceiling that's going to start to bite in 2026, and it's worth being explicit about it.
The ceiling: a chatbot is, by design, an interaction layer. It makes a single human-machine exchange faster. It doesn't, on its own, change the shape of the work. The human still initiates, decides, reviews, follows up, integrates the output into the next step. The chatbot has compressed one moment of friction in a longer chain. The chain is still the same length.
The actual leverage point is one level up.
What an agent is, and why it matters
An agent is a system that takes on a multi-step task end-to-end. It plans. It calls tools. It makes intermediate decisions. It produces output that's closer to a finished deliverable than to a conversational reply. It only escalates to a human when it hits genuine ambiguity that requires judgment — and even then, the escalation is structured, not "here's a chat window, ask me something".
The mechanical difference between a chatbot and an agent is significant but easy to under-appreciate. Both use the same underlying models. Both take natural-language input. Both can return natural-language output. The difference is what happens between input and output.
In a chatbot, the model receives a prompt, generates a response, stops. In an agent, the model receives a goal, plans a sequence of actions, executes those actions (calling APIs, querying databases, drafting documents, checking its own work, retrying when it fails), and only returns to a human when the goal is met or when it needs explicit input to proceed.
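To make that loop concrete, here's a minimal sketch in Python. Nothing in it is a real framework: call_model, run_tool and goal_met are hypothetical stand-ins for whatever model and tool layer you actually use, and the control tokens are arbitrary.

```python
# Minimal sketch of the chatbot/agent difference. All helpers below are
# hypothetical stand-ins, not any particular framework's API.

def call_model(prompt: str) -> str:
    """Stand-in for a single LLM call."""
    return f"<model output for: {prompt}>"

def run_tool(action: str) -> str:
    """Stand-in for a tool call: an API request, a database query, a drafting step."""
    return f"<result of: {action}>"

def goal_met(goal: str, history: list) -> bool:
    """Stand-in for a completion check (in practice another model call or a validator)."""
    return len(history) >= 3

def chatbot_turn(question: str) -> str:
    # Chatbot: one prompt in, one response out. The human drives everything around it.
    return call_model(question)

def run_agent(goal: str, max_steps: int = 10) -> dict:
    # Agent: a goal in; a loop of plan -> act -> check; a deliverable (or a
    # structured escalation) out. The human is only involved at the edges.
    history = []
    for _ in range(max_steps):
        plan = call_model(f"Goal: {goal}\nDone so far: {history}\nNext action?")
        if "NEEDS_HUMAN" in plan:   # genuine ambiguity: hand back a structured question
            return {"status": "escalated", "question": plan, "history": history}
        result = run_tool(plan)
        history.append((plan, result))
        if goal_met(goal, history):
            return {"status": "done", "deliverable": result, "history": history}
    return {"status": "escalated", "question": "step budget exhausted", "history": history}
```

The loop is trivial here, but the shape is the point: the human supplies a goal and gets back either a finished deliverable or a structured escalation, never a half-finished conversation.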
That difference compounds. A single chatbot exchange might save five minutes. A well-designed agent can take a workflow that previously consumed a full afternoon of senior time and run it in the background while the senior person works on something else.
We've been building these agents for clients and for our own products for the last eighteen months. The pattern that keeps repeating: the magnitude of the leverage scales with how much of the workflow you let the agent own. A chatbot owns one turn. An agent owns the whole loop.
A concrete example
The clearest example I can offer is from our own product, amivisible.co (architecture walkthrough here), an audit pipeline for AI search visibility. The end deliverable is a long-form audit report — typically forty to sixty pages — covering a brand's visibility across ChatGPT, Perplexity, Google AI Overviews and Copilot, with competitor comparison, share-of-voice analysis and prioritised recommendations.
A year ago, we tried running this as a chatbot-assisted workflow. The shape was: a human runs structured prompts against the AI engines, captures responses, classifies them, asks the chatbot to summarise particular sections, drafts the report, has the chatbot help with phrasing in places, formats the output, generates the charts. The chatbot saved time on writing and summarisation. The human was still the conductor of the whole orchestra.
The agentic version of the same pipeline is structurally different. We give the agent a target brand and a competitor set. It autonomously runs the prompt set against the engines, captures and validates the responses, classifies brand and competitor mentions, computes share-of-voice metrics, builds the data exhibits, drafts each section of the report, has a separate quality-assurance agent check the draft for errors and inconsistencies, makes corrections, produces a finished PDF. The human is involved at the start (defining scope) and at the end (sense-checking the output). The middle five hours of work are gone.
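For a sense of the shape, here's a heavily simplified sketch of that kind of pipeline. This is not the actual amivisible.co implementation; every function is a hypothetical stub standing in for engine queries, model calls, chart building and PDF tooling.

```python
# Simplified sketch of an end-to-end audit pipeline. All stages are stubs;
# the real versions wrap engine queries, model calls and rendering tooling.

ENGINES = ["chatgpt", "perplexity", "google_ai_overviews", "copilot"]

def run_prompt_set(engine, brand, competitors):
    return [f"<{engine} response mentioning {brand}>"]      # stub: query the engine

def validate(raw):
    return [r for r in raw if r]                            # stub: drop malformed captures

def classify_mentions(responses, brand, competitors):
    return {name: sum(name in r for r in responses) for name in [brand, *competitors]}

def compute_share_of_voice(mentions):
    total = sum(mentions.values()) or 1
    return {name: count / total for name, count in mentions.items()}

def build_exhibits(metrics):
    return [f"chart: {name} at {share:.0%}" for name, share in metrics.items()]  # stub: charts

def draft_report(brand, metrics, exhibits):
    return f"Audit for {brand}\n{exhibits}\nShare of voice: {metrics}"  # stub: section drafts

def qa_review(draft):
    return []                                               # stub: separate QA-agent pass

def apply_corrections(draft, issues):
    return draft                                            # stub: fix flagged issues

def render_pdf(final):
    return final.encode()                                   # stub: produce the PDF

def run_audit(brand, competitors):
    """Scope in, finished report out; the human sense-checks the result at the end."""
    responses = []
    for engine in ENGINES:
        responses.extend(validate(run_prompt_set(engine, brand, competitors)))
    mentions = classify_mentions(responses, brand, competitors)
    metrics = compute_share_of_voice(mentions)
    exhibits = build_exhibits(metrics)
    draft = draft_report(brand, metrics, exhibits)
    final = apply_corrections(draft, qa_review(draft))
    return render_pdf(final)
```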
The cost difference is substantial. The first agentic pipeline we built returned an audit at roughly a tenth of the human-hour cost of the manual version, with comparable or better consistency. We now run it at scale.
The point of the example isn't that we're clever. The point is that the leverage came from changing the layer the AI was operating at: from "answer my question" to "own the workflow". That shift is exactly the leverage point most marketing teams miss.
Why most teams aren't getting there
Two reasons.
First, agentic systems are operationally harder to ship than chatbots. A chatbot can fail gracefully — the user gets a bad answer, asks a follow-up, moves on. An agent failing in the middle of a five-step workflow can leave you with half-completed work, partial database writes, and an output that looks finished but is wrong in ways that take time to detect. Shipping a production-grade agentic system means building error handling, observability, retry logic, escalation paths and quality-assurance loops; it's closer to running a production service than to standing up a chatbot.
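To give a flavour of what that plumbing involves, here's a minimal sketch of one small piece: a logged, retried step runner with an explicit escalation path. The structure is illustrative only, not a prescription.

```python
# Minimal sketch of logged, retried step execution with a structured escalation
# path. Illustrative only; the logger setup and exception contract are assumptions.

import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")

class NeedsHuman(Exception):
    """Raised when a step cannot be completed without human judgment."""

def run_step(name, fn, retries=3, backoff=2.0):
    """Run one workflow step with logging, retries and escalation on repeated failure."""
    for attempt in range(1, retries + 1):
        try:
            log.info("step=%s attempt=%d starting", name, attempt)
            result = fn()
            log.info("step=%s attempt=%d ok", name, attempt)
            return result
        except NeedsHuman:
            raise                                  # don't retry judgment calls; escalate
        except Exception as exc:
            log.warning("step=%s attempt=%d failed: %s", name, attempt, exc)
            time.sleep(backoff * attempt)          # back off before retrying
    raise NeedsHuman(f"step {name} failed after {retries} attempts")
```

Running every stage of a pipeline through something like this is what keeps failures visible in the logs and routes genuine judgment calls to a person, instead of letting them surface as a report that looks finished but isn't.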
Second, the path to 'an agent would be useful' requires different analytical work. You have to map the workflow, identify the decision points, understand which decisions can be made by a model and which must be made by a human, define the data the agent needs access to, define the tools it can call, and design the interfaces between agent and human. That's process engineering, not prompt engineering. Most teams that have invested in AI in the last two years have built up prompt-engineering capability, not process-engineering capability.
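One hedged sketch of what that mapping exercise can produce, before any agent code is written: the workflow expressed as data, with each decision point labelled as model-decidable or human-judgment. The step names and fields are illustrative, not a real schema.

```python
# Illustrative output of the process-engineering step: the workflow as data,
# with ownership, tools and data access made explicit. Names are hypothetical.

from dataclasses import dataclass

@dataclass
class Step:
    name: str
    decided_by: str        # "model" or "human"
    tools: tuple = ()      # what the agent may call at this step
    data: tuple = ()       # what it needs read access to

AUDIT_WORKFLOW = [
    Step("define_scope",      decided_by="human"),
    Step("run_prompt_set",    decided_by="model", tools=("engine_api",)),
    Step("classify_mentions", decided_by="model", data=("brand_list", "competitor_list")),
    Step("compute_metrics",   decided_by="model"),
    Step("draft_report",      decided_by="model", tools=("report_templates",)),
    Step("final_sense_check", decided_by="human"),
]
```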
This is why the gap between teams shipping chatbots and teams shipping agents is wider than you'd expect from the underlying technology. The technology is broadly the same. The required organisational capability isn't.
Where to start
If you're running an in-house team or a service business, the practical starting point is to identify a workflow with the following properties:
- It's high-volume or high-frequency, so the cumulative time saved is meaningful.
- It has a clear, structured output — a report, a brief, a draft, a classification.
- The decision points within it are mostly within what a model can do well, with a small number of genuine human-judgment moments.
- The cost of an error is recoverable — the work has a review cycle before it goes external, so an agent's mistake gets caught.
A workflow that meets these criteria is a candidate. A workflow that doesn't — anything where errors are catastrophic, anything where every decision is genuinely high-judgment — isn't, at least not yet. Don't try to build an agent for those. Build a chatbot, or do nothing.
The second piece of advice, said with feeling: invest in the unsexy parts. Observability. Logging. Quality-assurance loops. Human-in-the-loop interfaces. Failure-mode handling. The shiny demo of an agent doing something impressive isn't the hard part. The hard part is making it reliable enough to actually trust with real work, every day, for months.
Why this matters in the broader picture
The companies that will quietly outperform over the next three years aren't the ones with the cleverest chatbot. They're the ones that have thought carefully about which of their workflows can be owned end-to-end by agentic systems, and have done the operational work to make those systems reliable in production.
The ROI gap between the two camps is going to widen, faster than most boards are pricing in. A chatbot saves a percentage of a task. An agent removes the task from the team's calendar. Those aren't the same kind of return, and the difference compounds when you start running multiple agents in concert.
If you're still in the chatbot phase, that's fine — most organisations are. But the question worth pinning to the wall is this: which workflows in our operation could be owned end-to-end by an agent, and what would change in the business if that happened?
That question is where the real leverage is. The chatbot question is a distraction.