AI · Project Management · Systems Thinking

Building AI Agents That Actually Work: What a Decade of Project Management Taught Me

5 min read

I started my career writing code. iOS apps, production systems, the kind of work where a bug means a crash and a crash means a one-star review. That discipline — build, ship, watch it break, fix it, ship again — shaped how I think about everything that came after.

A decade later, I'm building AI agents. Not as a machine learning engineer — I'm a project manager. But the agents I build are different from most I've seen, because they come from that same discipline: they're designed to work in the real world, not in a demo.

The Demo Trap

Most AI agent projects start with a demo that impresses a room full of executives. The agent summarises a document, drafts a response, chains three tools together. Everyone claps. Budget gets approved.

Six months later, the agent sits unused. Not because it doesn't work — it works exactly as demonstrated. The problem is that the demo scenario represented about 15% of the actual use cases. The other 85% involve messy inputs, ambiguous instructions, edge cases that weren't in the training data, and users who phrase things differently from the prompt engineer.

The question that separates agents that ship from agents that demo is this: what happens when the input is nothing like what you expected?

I've built fifteen agent systems over the past two years. Meeting processors, document drafters, knowledge retrieval systems, communication assistants, project operations tools. The ones that stuck — the ones people actually use every day — share three properties.

Three Properties of Agents That Last

First, they're narrow. Each agent does one thing well. My meeting processor doesn't also draft emails. My communication assistant doesn't also manage tasks. The temptation to build a general-purpose assistant is strong — "just give it all the tools and let it figure it out." In practice, a narrow agent with deep context outperforms a broad agent with shallow context every time.

Second, they're transparent. They show their work. When my meeting summary agent produces minutes, it cites the transcript. When my document drafter makes assumptions, it flags them. Users don't need to trust the agent blindly — they can verify. Trust that can be verified is the only trust that scales.
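The post doesn't show its internals, but the "trust that can be verified" idea can be sketched as a data shape: every line of output carries either a pointer back to its evidence or an explicit assumption flag. The names here (`MinuteItem`, `render`, the timestamp format) are illustrative, not the author's actual system.

```python
from dataclasses import dataclass, field

@dataclass
class MinuteItem:
    """One line of meeting minutes, tied back to its evidence."""
    text: str
    transcript_refs: list[str] = field(default_factory=list)  # e.g. timestamps
    is_assumption: bool = False  # set when the claim isn't directly supported

def render(items: list[MinuteItem]) -> str:
    """Render minutes so every claim is verifiable at a glance."""
    lines = []
    for item in items:
        suffix = f" [{', '.join(item.transcript_refs)}]" if item.transcript_refs else ""
        prefix = "(assumed) " if item.is_assumption else ""
        lines.append(f"- {prefix}{item.text}{suffix}")
    return "\n".join(lines)

minutes = [
    MinuteItem("Budget approved for Q3", transcript_refs=["00:12:40"]),
    MinuteItem("Deadline likely mid-August", is_assumption=True),
]
print(render(minutes))
```

The point of the structure is that the reviewer never has to wonder which lines to double-check: cited lines can be spot-verified against the transcript, and assumptions announce themselves.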


Third, they're embedded in existing workflows. The agents that failed were the ones that required people to change how they work. The agents that succeeded slotted into processes that already existed — they made an existing step faster, not different. A project manager who already writes meeting minutes will adopt a tool that drafts them. A project manager who doesn't write meeting minutes won't start because you gave them an AI.

The PM Advantage

Here's something I didn't expect: being a project manager made me better at building AI agents than most engineers I've worked with.

Not because of technical skill — the engineers are better at that. But because project management is fundamentally about understanding how people actually work, not how they say they work. It's about mapping processes before optimising them. It's about knowing that the real requirement is never in the brief — it's in the conversation you have when someone says "well, actually, what I really need is..."

When I design an agent, I start with the workflow, not the model. Who uses this? When? What are they trying to accomplish? What do they do when the tool isn't there? What's their tolerance for errors? How will they know if the output is wrong?

These aren't technical questions. They're PM questions. And they determine whether the agent gets used or gets abandoned.

Failure Modes First

The most important design document for any agent system isn't the architecture diagram. It's the failure mode inventory.

Before I write a line of prompt, I list every way the agent can fail. Wrong input format. Missing context. Ambiguous instruction. Hallucinated output. Correct output that's misinterpreted. Correct output that's right but unhelpful. Each failure mode gets a response — sometimes that's a fallback, sometimes it's a clarifying question, sometimes it's a graceful refusal.
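A failure mode inventory like the one described can be as simple as a lookup table that routes each known failure to a deliberate response. This is a minimal sketch under assumed names (`Response`, `FAILURE_MODES`, `handle`), not the author's actual implementation.

```python
from enum import Enum

class Response(Enum):
    FALLBACK = "use a safe default"
    CLARIFY = "ask the user a question"
    REFUSE = "decline gracefully"

# Hypothetical inventory: every anticipated failure mode gets a chosen response
FAILURE_MODES = {
    "wrong_input_format": Response.FALLBACK,
    "missing_context": Response.CLARIFY,
    "ambiguous_instruction": Response.CLARIFY,
    "hallucination_risk": Response.REFUSE,
}

def handle(failure_mode: str) -> Response:
    # An unlisted failure mode is itself a failure mode: refuse rather than guess
    return FAILURE_MODES.get(failure_mode, Response.REFUSE)
```

The useful property is the default branch: anything the inventory didn't anticipate gets a graceful refusal instead of a confident guess.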

An agent that says "I don't have enough context to answer this reliably" is more valuable than one that confidently produces something plausible but wrong.

The agents I've seen fail in production almost always fail the same way: they give a confident answer when they should have asked a question. This is a design failure, not a model failure. The model does what you tell it to do. If you don't design for uncertainty, the model won't either.
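Designing for uncertainty can be made concrete with an evidence gate: before the agent answers, check whether it has enough supporting context, and below a threshold, return a question instead of an answer. The relevance check here is deliberately naive (shared words) and the names (`answer_or_ask`, `min_evidence`) are assumptions for illustration.

```python
def answer_or_ask(question: str, context: list[str], min_evidence: int = 2) -> str:
    """Gate the answer on evidence: below the threshold, ask instead of guessing."""
    terms = set(question.lower().split())
    # Naive relevance check: context lines sharing at least one word with the question
    evidence = [c for c in context if terms & set(c.lower().split())]
    if len(evidence) < min_evidence:
        return ("I don't have enough context to answer this reliably. "
                "Could you share more detail?")
    return f"Based on {len(evidence)} relevant notes: ..."  # real generation goes here
```

A production version would use the model itself (or retrieval scores) to judge sufficiency, but the shape is the same: uncertainty is a first-class branch in the design, not an afterthought.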

What I'd Tell Someone Starting

If you're building AI agents — whether you're an engineer, a PM, or somewhere in between — here's what I'd tell you:

Start with the workflow, not the technology. Build the narrowest useful version first. Design your failure modes before your success modes. Make the agent's reasoning visible. Embed it in a process people already follow. And measure adoption, not capability — the best agent in the world is worthless if nobody uses it.

The technology will keep improving. The models will get better, the tools will get easier, the costs will come down. But the fundamentals won't change: you're building something for people, and people are complicated. A decade of project management taught me that. The AI just makes it more obvious.