Product management for probabilistic machines
AI agents are probabilistic coworkers. Collaborating on software with them requires product management, QA paranoia, and better docs. Read: "Everything Starts Out Looking Like a Toy" #272

Hi, I’m Greg 👋! I write weekly product essays covering system “handshakes”, the expectations for workflows, and the jobs to be done for data. What is Data Operations? was the first post in the series.
This week’s toy: I may be late to the party, but n8n is pretty interesting, especially the self-hosted version.
What’s striking to me is that the gap between thinking “I could connect this bit of data with that over there and do it on a schedule” and actually building it is shrinking rapidly. Pretty soon we’re going to be able to spin up a novel workflow by talking to Claude or OpenAI (I’m working on this right now).
Edition 272 of this newsletter is here - it’s October 13, 2025.
Thanks for reading! Let me know if there’s a topic you’d like me to cover.
The Big Idea
A short long-form essay about data things
⚙️ Product management for probabilistic machines
The first time I used ChatGPT, it felt like magic. I asked a question, got an answer, and it was almost like talking to search results in real time. As I added context to conversations, some of the responses seemed formulaic. Asking a similar question got a similar kind of answer.
At that point (maybe late 2023), AI seemed like a parlor trick. I didn’t think about how things would change when the models added additional modalities.
And then the models got better. After a while, I decided using AI in conversation was more like using a “thinking typewriter”. I’d sit with a question, type a prompt, and watch as the ghost in the machine reflected back a plausible outline, a plan for action, and suggestions for improving (or changing?) the idea.
And then the models got better. As the outputs stacked up and I got more comfortable with using AI as a tool, I realized that I had less and less idea what was actually going on when I asked AI to do something for me.
What do you do when a computer can take action without explicit commands? Define a new way of working that’s a lot more like working with other non-deterministic processes: your co-workers.
We are all AI agents, in a way. Besides the fact that we have better long-term memory than an AI process, building software in workflows is a lot like the task of orchestrating various agents to take on tasks, report progress, and identify when they are stuck. We are also familiar with the idea of defining objectives and key results, and tying those outcomes to a timeline where we make promises to other groups of (agentic) workers.
When the computer can act, our job stops being “ask better questions” and starts being “design a system that delivers consistent outcomes despite the chaos of probabilistic models.”
The non-deterministic elephant in the room
Every workflow (agentic or not) begins with a contradiction. We expect determinism: that we will send instructions and always get a known answer (or type of answer), yet we are not entirely sure what will happen. Exceptions still happen in states we have not defined.
When you’re dealing with people in a workflow and you want to account for an exception, you write a standard operating procedure.
If you see this blinking red light, tell someone!
If you get an error code, copy it into this box so that someone knows about it!
Knowing the exception has occurred is the first step in handling it.
The same is true with agentic workflows. Models are wired to predict the next most likely token. That means that when you give instructions to a non-deterministic model, it helps a whole lot if you can tell it when it’s “done.” This is not unlike giving instructions to a toddler. When you need them to wear a sweater, ask them which one they’d like to wear rather than asking them to design a new type of garment.
Closed questions, or evals, help you determine whether the non-deterministic model landed on the deterministic answer you expected. They also let you know when the rest of the system is shifting under your feet because the ground truth has changed. If a new model suddenly tells you the sky is green and your ground truth says the sky is blue, there might be something wrong.
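To make the “closed question” idea concrete, here is a minimal sketch of an eval harness; the golden questions and function names are illustrative, not any particular framework’s API.

```python
# A minimal, illustrative eval harness: closed "golden questions" with known
# answers check whether a non-deterministic model still lands on the
# deterministic answer you expect. All names here are hypothetical.
GOLDEN_QUESTIONS = [
    {"prompt": "What color is a clear daytime sky? Answer in one word.", "expected": "blue"},
    {"prompt": "Is 17 a prime number? Answer yes or no.", "expected": "yes"},
]

def run_evals(ask_model):
    """ask_model is whatever function calls your model and returns a string."""
    failures = []
    for q in GOLDEN_QUESTIONS:
        answer = ask_model(q["prompt"]).strip().lower()
        if q["expected"] not in answer:
            failures.append((q["prompt"], answer))
    return failures  # non-empty means the model (or the ground truth) has shifted
```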
Because AI agents are essentially co-workers (even if they are not sentient), the whole idea of prompt engineering needs to be rethought.
This is product management. We’re defining jobs to be done, mapping out roles for a small team of synthetic experts, setting guardrails for when they should stop, ask for help, or escalate. We are programming strategy and process into a set of contexts that the model will try to inhabit, one token at a time.
Treat the data factory like a product, not a magic trick
If you’ve built software before, you already know what to do. Start with the job. What outcome are you delegating?
Break it into responsibilities. Give each agent a backstory that constrains how it should behave. Provide tools where precision is non-negotiable. Define success metrics that aren’t just vibey confidence scores but concrete checks:
Was the invoice actually paid?
Did the customer receive a useful response?
Did the data land where it was supposed to?
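As a hedged sketch of what “job plus concrete checks” can look like in code (the class, fields, and checks below are illustrative, not tied to any specific agent framework):

```python
from dataclasses import dataclass, field
from typing import Callable

# Illustrative only: one way to pin an agent to a job, a constraining backstory,
# and success checks that are real predicates rather than vibes.
@dataclass
class AgentJob:
    name: str
    backstory: str                      # constrains how the agent should behave
    tools: list[str]                    # where precision is non-negotiable
    success_checks: list[Callable[[dict], bool]] = field(default_factory=list)

    def succeeded(self, run_result: dict) -> bool:
        return all(check(run_result) for check in self.success_checks)

invoice_agent = AgentJob(
    name="invoice-follow-up",
    backstory="You chase unpaid invoices politely and never promise discounts.",
    tools=["billing_api", "email_sender"],
    success_checks=[
        lambda r: r.get("invoice_status") == "paid",       # was the invoice actually paid?
        lambda r: r.get("customer_reply_useful") is True,  # did the customer get a useful response?
        lambda r: r.get("record_written") is True,         # did the data land where it was supposed to?
    ],
)
```

The details will differ, but the point is that success becomes a predicate you can run, not an adjective you can argue about.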
Agents thrive when you give them an outcome. Scope it narrowly, sequence their work, and teach them how to call for backup.
The agentic pattern is a factory with probabilistic machines instead of deterministic ones. You still need a production schedule, test fixtures, and quality control.
You still plan the work and work the plan. The difference is that the machines in this factory get bored, forget what they were doing, and occasionally hallucinate a new process because the old one no longer fits inside their context window.
That weirdness is manageable if you accept it as the baseline and design your workflow around human supervision and automated checks.
It’s tough to help an amnesiac AI do good work
Anyone who has shipped software knows the comfort of a green test suite. It doesn’t guarantee anything, but it gives you a signal: the things we agreed mattered are still behaving.
Agent teams need the same scaffolding. You want unit tests that catch the deterministic pieces: API calls, data transformations, compliance checks. You also need smoke tests that verify end-to-end flows on representative scenarios. And because the model is developing its own approach every time, you need heuristic monitors that catch “this feels off” signals even when the numbers look fine.
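Here is a hedged sketch of that layering, written pytest-style; the transform and pipeline functions are stand-ins for your own code.

```python
# Illustrative tests in three layers: deterministic unit tests, an end-to-end
# smoke test, and a heuristic monitor for "this feels off" signals.
# normalize_invoice and run_agent_pipeline are stand-ins for your own code.

def normalize_invoice(raw: dict) -> dict:
    # Deterministic piece: a plain transform, easy to pin down with exact assertions.
    return {"amount_cents": int(float(raw["amount"].replace(",", "")) * 100)}

def run_agent_pipeline(ticket: dict) -> dict:
    # Stand-in for the real agentic flow; the smoke test only cares about the contract.
    return {"status": "resolved", "reply": "Your invoice INV-42 is marked paid."}

def test_unit_deterministic_transform():
    assert normalize_invoice({"amount": "1,200.00"})["amount_cents"] == 120000

def test_smoke_end_to_end():
    result = run_agent_pipeline({"id": "INV-42", "issue": "payment status"})
    assert result["status"] in {"resolved", "escalated"}
    assert len(result["reply"]) > 20  # a useful reply, not a shrug

def heuristic_monitor(outputs: list[str]) -> list[str]:
    # Heuristics catch drift the hard numbers miss: suspiciously short answers
    # or canned refusals that spike even while the tests stay green.
    return [t for t in outputs if len(t) < 40 or "as an ai" in t.lower()]
```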
The tricky part is that agents operate with short-term memory. Once they fall out of the context window, they forget what they promised, which customers they disappointed, and what you learned yesterday.
Your job is to compensate for that amnesia. Build harnesses that replay past incidents. Store decision logs that agents can reload when they wake up. Add reminders that force an agent to summarize what just happened before you hand it the next task. These are not nice-to-have features; they are the price of working with a stochastic teammate.
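A minimal sketch of that kind of harness, with hypothetical file paths: an append-only decision log the agent reloads before it starts the next task.

```python
import json, datetime, pathlib

# Illustrative: append-only decision log the agent reloads when it "wakes up,"
# plus the raw material for a forced summary step. Paths are hypothetical.
LOG_PATH = pathlib.Path("agent_context/decision_log.jsonl")

def record_decision(task_id: str, decision: str, rationale: str) -> None:
    LOG_PATH.parent.mkdir(exist_ok=True)
    entry = {
        "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "task_id": task_id,
        "decision": decision,
        "rationale": rationale,
    }
    with LOG_PATH.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def reload_context(last_n: int = 20) -> list[dict]:
    # What the agent reads back before starting the next task.
    if not LOG_PATH.exists():
        return []
    lines = LOG_PATH.read_text().splitlines()[-last_n:]
    return [json.loads(line) for line in lines]
```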
Documenting process is the through line to success
When humans work together, memory is social. We share stories, war rooms, one-pagers, and Slack threads. Agents don’t attend the war room, and they can’t scroll Slack. The only memory they have is the one you give them. That’s why every successful agent operation I’ve seen eventually builds a `claude.md`, an `AgentContext` directory, a living binder of everything the system has decided so far. It is not busywork. It is the way you give a probabilistic system a deterministic backbone.
Treat documentation as operating system files, not meeting notes. Record the objectives, the playbooks, the golden questions, the escalation paths. Log every significant decision with enough context that a fresh agent (or a tired human) can reconstruct how you got here.
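One hypothetical layout for that binder (yours will differ; the point is that it lives in files an agent can actually read):

```
agent_context/
  claude.md             # objectives, roles, and standing instructions for the agents
  playbooks/            # one file per workflow: steps, tools, stop conditions
  golden_questions.md   # closed questions with known answers, used as evals
  escalation_paths.md   # who (or what) gets pinged when an agent is stuck
  decision_log.jsonl    # append-only record of significant decisions and rationale
```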
Think of it as version control for judgment. When the agent inevitably produces a surprising outcome, you have a place to compare intention to reality. When a new team member joins, you can hand them a map that keeps them from repeating last month’s mistakes. Documentation is the bridge between the stochastic and the deterministic. Without it, every run begins from zero.
How do you build “better” in a world without certainty?
The hardest part of running agents is that success is rarely obvious. When the system fails, alarms go off, customers complain, and the logs fill with errors. When the system works, you hear silence.
There is no reliable sensation for “this is working better than last week.” You need to invent that sensation. Pick a handful of heuristics that matter: response time, number of escalations, human edits per output, customer satisfaction, dollars recovered. Track them obsessively. Look for slope, not snapshots. The goal is not to reach perfection; it is to know whether your factory is trending in the right direction.
Heuristics also keep you honest about trade-offs. You might accept a higher escalation rate if the escalations happen faster. You might tolerate occasional rework if the agent is uncovering new patterns you never would have seen. The point is to make those trade-offs explicit.
When you’re working with nondeterministic machines, gut feel is a coin flip. Instrumentation is what lets you say, “We are still inside our guardrails,” or, “We need to intervene right now.”
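A minimal sketch of “slope, not snapshots,” using an illustrative metric and a deliberately simple trend calculation:

```python
from statistics import mean

# Illustrative: weekly metric snapshots, read as a trend rather than a single
# number. The metric name and values are hypothetical.
weekly_escalation_rate = [0.21, 0.19, 0.18, 0.16]  # most recent week last

def slope_per_week(series: list[float]) -> float:
    """Average week-over-week change; negative means escalations are trending down."""
    deltas = [b - a for a, b in zip(series, series[1:])]
    return mean(deltas) if deltas else 0.0

if slope_per_week(weekly_escalation_rate) > 0:
    print("Escalations are trending up; time to intervene.")
```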
Building for AI agents also helps humans improve
Here is the quiet gift of all this structure: everything you do to make agents reliable also makes human teams better.
The act of writing down the workflow clarifies it for everyone.
The decision log becomes a shared memory for humans who might otherwise reinvent the plan. The golden questions morph into interview guides, onboarding checklists, and customer support macros. The QA harness you build for agents can catch human errors too. The difference between an agent runbook and a human playbook starts to disappear, and that is a win.
When you accept the agentic paradigm, you’re really accepting a new standard for operational excellence. You are forced to articulate your process, observe it in detail, and improve it continuously. That discipline does not disappear when you bring people back into the loop. In fact, it makes them more effective. Building for agents is not a detour from human-centered design; it’s an accelerator.
Adding AI to the workflow loop
If you’re still living with the thinking typewriter, start with a single delegation. Pick a workflow that is messy, repetitive, and annoying. Define the job, write the playbook, list the golden questions, and build a tiny set of tests.
Run the agent in shadow mode next to a human. Each time it surprises you, update the documentation. When it succeeds, capture what “success” looked like so you can tell the difference next time. Only then should you turn the agent loose with autonomy. Scale slowly. What you’re learning is not just how to automate a task; you’re learning how to manage a team of stochastic contributors.
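A hedged sketch of shadow mode; the ticket fields and the run_agent callable are hypothetical stand-ins for your own system.

```python
# Illustrative shadow-mode loop: the agent runs alongside the human, its output
# is logged and compared, but only the human's answer ships.
def shadow_run(tickets: list[dict], run_agent) -> list[dict]:
    divergences = []
    for ticket in tickets:
        agent_answer = run_agent(ticket)
        human_answer = ticket["human_resolution"]  # what actually shipped
        if agent_answer.strip() != human_answer.strip():
            divergences.append({"ticket": ticket["id"],
                                "agent": agent_answer,
                                "human": human_answer})
    return divergences  # each surprise becomes a documentation update
```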
As you expand, invest in three muscles: orchestration, observability, and memory.
Orchestration keeps the work sequenced and the roles clear.
Observability tells you when something drifts.
Memory lets you reboot a process without losing the wisdom that accumulated along the way.
Everything else, like new tools, better models, and shinier dashboards, is a multiplier on top of those fundamentals.
Building with agents demands a mindset shift from prompt tinkering to process stewardship. You are not summoning magic; you are engineering a system that navigates uncertainty on your behalf. That system only works when you combine deterministic anchors with nondeterministic creativity. It thrives when you give it context, constraints, and a way to ask for help. It collapses when you treat it like a mirror that will always reflect your intention.
What’s the takeaway? The future of work belongs to teams that can choreograph agentic action with the discipline of product managers and the paranoia of QA engineers. Document relentlessly, test obsessively, and measure what matters even when the signals are fuzzy. Do that, and you can turn an amnesiac AI workforce into a reliable partner.
Links for Reading and Sharing
These are links that caught my 👀
1/ We’re gonna need a bigger hard drive - Storage as a service is a product badly needed in today’s world. I’m not talking about iCloud, Google Photos, or other services that you can buy from a hyperscaler. I mean personal data storage you can spin up the same way you spin up a computer on AWS. The problem? Doing this is complicated, and the product management challenges are large. I think it’s still a very interesting product area.
2/ No one understands AI - The team at Crazy Egg ran a survey earlier this year that found that 97% of users didn’t understand AI features. I don’t think that means those features were 3% effective, but it probably means that regular consumers are really not sure how to use features that infuse AI just yet.
3/ How do you measure AI impact? - If it’s true that users don’t know how to use AI yet, how do you measure the impact of your AI efforts? Here’s what several top companies do. (It looks suspiciously like the way we measure success today: find an outcome and see if it happens less, gets done faster, or has better quality because of AI.)
What to do next
Hit reply if you’ve got links to share, data stories, or want to say hello.
The next big thing always starts out being dismissed as a “toy.” - Chris Dixon