
How AI Agents Work on the Plant Floor (Explained)

By Jason Osajima, former VP of AI at a $250M manufacturer

Quick answer

How AI agents work on the plant floor, explained by an operator: the perceive-decide-act loop, where agents fit, and what they can't do yet.

An AI agent on the plant floor is software that watches a stream of data, decides what to do next based on a goal you gave it, takes an action through systems you already run, and checks whether the action worked. That's the whole loop, and the agent closes it without a person clicking the button. The interesting part isn't the model underneath. It's that the software acts and then verifies its own work, which is what separates an agent from the chatbot your team already pastes things into.

I ran this at a $250M manufacturer. We didn't start with anything exotic. We started with a scheduler that kept getting overridden at 6am because the night shift logged a downtime event nobody saw until standup.

An agent that reads the MES event log, flags the conflict, and re-sequences the next four jobs before the morning meeting isn't magic. But it saved us roughly 40 minutes a day of expediting and prevented about one missed customer shipment per month. That's the bar. Real, boring, measurable.

The four-step loop, in plant terms

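Every agent, no matter how it's marketed, runs the same cycle:

  1. Perceive: it watches a stream of data, such as the MES event log or an inbox.
  2. Decide: it picks the next step based on the goal you gave it.
  3. Act: it takes the action through systems you already run.
  4. Check: it verifies whether the action worked.

This matches how Anthropic describes an agent in its 2024 engineering guide: an LLM autonomously using tools in a loop, gaining ground truth from the environment at each step.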

The difference between an agent and a copilot is the Act and Check steps. A copilot answers. An agent does the thing and confirms it landed. If you want the deeper split, we cover it in AI agents vs copilots.
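One pass of that cycle can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the function names (`perceive`, `decide`, `act`, `check`, `escalate`) and the in-memory schedule are hypothetical stand-ins for whatever your MES and scheduler actually expose.

```python
from typing import Callable, Optional


def run_once(
    perceive: Callable[[], Optional[dict]],
    decide: Callable[[dict], Optional[dict]],
    act: Callable[[dict], None],
    check: Callable[[dict], bool],
    escalate: Callable[[dict], None],
) -> str:
    """Run one pass of the loop and report what happened."""
    event = perceive()            # Perceive: read the next event, if any
    if event is None:
        return "idle"
    action = decide(event)        # Decide: pick an action against the goal
    if action is None:
        return "no_action"
    act(action)                   # Act: write through an existing system
    if check(action):             # Check: read back and confirm it landed
        return "confirmed"
    escalate(action)              # An unverified write goes to a person
    return "escalated"


# Hypothetical in-memory stand-ins for an MES event log and a job schedule.
schedule = {"next_jobs": ["J1", "J2", "J3", "J4"]}
events = [{"type": "downtime", "machine": "M2", "blocked_job": "J1"}]
escalations: list = []


def perceive() -> Optional[dict]:
    return events.pop(0) if events else None


def decide(event: dict) -> Optional[dict]:
    if event["type"] != "downtime":
        return None
    blocked = event["blocked_job"]
    # Move the blocked job to the back of the next four jobs.
    reordered = [j for j in schedule["next_jobs"] if j != blocked] + [blocked]
    return {"new_sequence": reordered}


def act(action: dict) -> None:
    schedule["next_jobs"] = list(action["new_sequence"])


def check(action: dict) -> bool:
    return schedule["next_jobs"] == action["new_sequence"]


print(run_once(perceive, decide, act, check, escalations.append))  # prints "confirmed"
```

A copilot would stop after `decide` and hand the suggestion to a person. The `act` and `check` calls are what make this an agent.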

What makes it an "agent" and not just automation

You already have automation. PLCs, fixed RPA scripts, scheduled reports. Those follow rules you hard-coded, and they break the moment reality drifts off the script — a vendor renames a column, a form gets an extra field, a supplier writes "qty" instead of "quantity."

An agent handles the drift. Because the reasoning step uses a language model, it can read a packing slip it's never seen before, figure out which number is the quantity, and map it to your PO. When it's not sure, it asks.

That tolerance for messy, unstructured input is the actual unlock, and the plant floor is nothing but messy input. For a side-by-side on the older tooling, see agentic AI vs RPA for manufacturing operations.

Fixed scripts vs agents, at a glance

                          | Fixed RPA / scripts        | AI agent
Input                     | Structured, exact format   | Messy, unstructured, varies
Breaks on change          | Yes, silently              | Adapts or asks
Handles a new vendor form | Needs a developer          | Often handles it day one
Knows when it's unsure    | No                         | Yes, escalates
Build time                | Weeks per workflow         | Days
Best for                  | High-volume, never-changes | Variable, judgment-light

Neither is better. They're different tools. The agent shines exactly where your scripts keep falling over.

How an agent plugs into your existing systems

An agent doesn't replace your stack. It sits on top of it and reaches into each system through whatever interface that system exposes — an API, a database connection, or an RPA bot driving the screen when there's no API at all.

This is where the ISA-95 standard for enterprise-control system integration earns its keep. ISA-95 maps a plant into levels: Level 1–2 is sensing and control, Level 3 is the MES, Level 4 is the ERP. An agent usually lives between Level 3 and Level 4, reading MES events and writing back to the ERP.

A concrete data path

Take order acknowledgment. Here's the actual flow for one PO:

  1. A customer email lands with a PDF attached.
  2. The agent reads the PDF, extracts customer, part number, quantity, and requested date.
  3. It checks the part master and open orders in the ERP for duplicates and price.
  4. It drafts the sales order and, in approve mode, posts it for a clerk to confirm.
  5. It logs the action and reads back the order number to confirm the write stuck.
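Steps 3 through 5 of that flow can be sketched as follows. Everything named here is an assumption made for illustration: `FakeERP` is an in-memory stand-in rather than any real ERP's API, the `SO-` numbering is invented, and the PDF extraction in step 2 is assumed to have already produced an `ExtractedPO`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExtractedPO:
    customer: str
    part_number: str
    quantity: int
    requested_date: str  # ISO date string


class FakeERP:
    """In-memory stand-in; a real ERP sits behind an API or a database."""

    def __init__(self, part_master: dict):
        self.part_master = part_master  # part number -> unit price
        self.orders: dict = {}          # order number -> order record

    def find_duplicate(self, po: ExtractedPO) -> Optional[str]:
        for number, order in self.orders.items():
            if order["po"] == po:
                return number
        return None

    def post_draft(self, po: ExtractedPO, unit_price: float) -> str:
        number = f"SO-{len(self.orders) + 1:05d}"
        self.orders[number] = {
            "po": po,
            "unit_price": unit_price,
            "status": "pending_approval",  # approve mode: a clerk confirms
        }
        return number

    def read_order(self, number: str) -> Optional[dict]:
        return self.orders.get(number)


def acknowledge(po: ExtractedPO, erp: FakeERP, audit_log: list) -> str:
    """Check the ERP, draft for approval, log the action, and read back."""
    if po.part_number not in erp.part_master:
        audit_log.append(f"escalated: unknown part {po.part_number}")
        return "escalated"
    duplicate = erp.find_duplicate(po)
    if duplicate is not None:
        audit_log.append(f"skipped: duplicate of {duplicate}")
        return "duplicate"
    number = erp.post_draft(po, erp.part_master[po.part_number])
    audit_log.append(f"drafted {number} for clerk approval")
    # Read back the order number to confirm the write stuck.
    if erp.read_order(number) is None:
        audit_log.append(f"escalated: {number} not found on read-back")
        return "escalated"
    return number


erp = FakeERP(part_master={"PN-1001": 12.50})
log: list = []
po = ExtractedPO("Acme", "PN-1001", 200, "2025-03-14")
print(acknowledge(po, erp, log))  # first pass drafts an order
print(acknowledge(po, erp, log))  # the same PO again is flagged as a duplicate
```

The read-back at the end is the Check step: the agent does not report success until it has seen the order in the system it wrote to.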

Nothing here demands ripping out a system. The agent speaks to the ERP the same way a person or an integration would. We go deeper on the wiring in integrating AI agents with your ERP and MES.

Where agents actually fit first

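Don't start with the moonshot. Start where you're already paying people to move data between two screens. The highest-return first agents I've seen across mid-market plants: order acknowledgment and entry, supplier follow-up, downtime triage, quality NCR drafting, and shipping document assembly.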

Notice what these share: high volume, a clear right answer, a human can verify the output in seconds, and a mistake is annoying but not catastrophic. That's the screening rule. If a single agent error could stop the line or ship bad product unchecked, that workflow waits until you've earned trust. For a ranked menu, see our 15 AI agent use cases for manufacturing operations.
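The screening rule is simple enough to write down as a checklist function. This is a sketch under stated assumptions: the thresholds (`min_runs_per_month`, `max_verify_seconds`) and the example volumes are hypothetical placeholders, so set them from your own data.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    name: str
    runs_per_month: int
    has_clear_right_answer: bool
    seconds_to_verify: int
    error_is_catastrophic: bool  # could stop the line or ship bad product unchecked


def is_first_agent_candidate(
    w: Workflow,
    min_runs_per_month: int = 500,  # assumed cutoff for "high volume"
    max_verify_seconds: int = 30,   # assumed cutoff for "verify in seconds"
) -> bool:
    """Apply the four screening tests; all must pass."""
    return (
        w.runs_per_month >= min_runs_per_month
        and w.has_clear_right_answer
        and w.seconds_to_verify <= max_verify_seconds
        and not w.error_is_catastrophic
    )


candidates = [
    Workflow("order acknowledgment", 1200, True, 15, False),
    Workflow("pricing exception", 40, False, 600, False),
    Workflow("line release", 900, True, 20, True),
]
for w in candidates:
    print(w.name, is_first_agent_candidate(w))
```

A workflow that fails only the last test is not rejected for good. It waits until the agent has earned trust on lower-stakes work.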

The human stays in the loop (on purpose)

Nobody serious runs a plant agent fully unattended on day one. You run it in three stages:

  1. Shadow: the agent does the work and shows you what it would do. You compare against your team for two weeks. You're measuring its accuracy, not trusting it yet.
  2. Approve: the agent drafts the action, a person clicks yes. You watch the approve rate climb. When people are approving 95%+ of its drafts unchanged, you move on.
  3. Auto with exceptions: the agent acts on the clear cases and only routes the genuinely ambiguous ones to a person. That last 5% is where your people add value now.
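The three stages amount to a routing rule plus a promotion gate. The sketch below assumes the 95% approve-rate bar from the list above. The `min_samples` floor and the names `route` and `ready_to_promote` are hypothetical additions for illustration.

```python
from enum import Enum


class Stage(Enum):
    SHADOW = "shadow"
    APPROVE = "approve"
    AUTO_WITH_EXCEPTIONS = "auto_with_exceptions"


def route(stage: Stage, is_clear_case: bool) -> str:
    """Decide what happens to one drafted action at a given stage."""
    if stage is Stage.SHADOW:
        return "log_only"            # record the draft; a person still does the work
    if stage is Stage.APPROVE:
        return "queue_for_approval"  # a person clicks yes on every action
    return "execute" if is_clear_case else "queue_for_approval"


def approve_rate(decisions: list) -> float:
    """Share of drafts a person approved (True) versus rejected (False)."""
    return sum(decisions) / len(decisions) if decisions else 0.0


def ready_to_promote(
    decisions: list,
    threshold: float = 0.95,
    min_samples: int = 200,  # assumed floor so a lucky streak doesn't promote
) -> bool:
    return len(decisions) >= min_samples and approve_rate(decisions) >= threshold


history = [True] * 195 + [False] * 5
print(route(Stage.AUTO_WITH_EXCEPTIONS, is_clear_case=False))
print(approve_rate(history), ready_to_promote(history))
```

How the agent decides that a case is clear is the hard part, and this sketch leaves it out on purpose. It only shows where that judgment plugs in.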

This staging isn't just operational caution. The NIST AI Risk Management Framework (2023) asks for the same things: defined human oversight, and agreed measures of success before a system goes live.

It's also how you keep finance and quality comfortable. You're not asking them to trust a black box. You're showing them a measured accuracy number first. We lay out the full pattern in human-in-the-loop AI for operations.

What agents can't do yet

Straight talk, because the hype skips this. Agents are weak where the cost of being wrong is high and the answer is genuinely judgment-heavy: pricing exceptions, safety calls, anything regulatory where you need a defensible audit trail of why.

They also degrade quietly. An agent that was 96% accurate can slip to 88% when a supplier changes their format, and you won't notice unless you're tracking accuracy as a metric, not a vibe. Agentic systems carry their own named risks too: the OWASP Top 10 for LLM Applications lists "excessive agency," where the agent takes actions beyond its intended scope, and failures can cascade and compound before anyone catches them.
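Tracking accuracy as a metric can start as a rolling window with an alert line. This is a minimal sketch: the window size and the 92% alert threshold are assumptions, and `correct` is whatever your reviewers or downstream checks record for each action.

```python
from collections import deque


class AccuracyMonitor:
    """Rolling accuracy over the most recent `window` checked actions."""

    def __init__(self, window: int = 500, alert_below: float = 0.92):
        self.results = deque(maxlen=window)  # old results fall off automatically
        self.alert_below = alert_below

    def record(self, correct: bool) -> None:
        self.results.append(correct)

    def accuracy(self) -> float:
        return sum(self.results) / len(self.results) if self.results else 0.0

    def drifting(self) -> bool:
        # Only alert once the window is full, so early noise doesn't page anyone.
        window_full = len(self.results) == self.results.maxlen
        return window_full and self.accuracy() < self.alert_below


monitor = AccuracyMonitor(window=100, alert_below=0.92)

for i in range(100):
    monitor.record(i % 25 != 0)               # 96 of 100 correct
print(monitor.accuracy(), monitor.drifting())

for i in range(100):
    monitor.record(i % 25 not in (0, 8, 16))  # supplier changes format: 88 of 100
print(monitor.accuracy(), monitor.drifting())
```

The second batch fully replaces the first in the window, so the monitor reports the post-change accuracy rather than a blended average.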

And they cost real money per action. A workflow that runs 50,000 times a month needs a unit-economics check, not just an accuracy check. Build the monitoring before you build the trust.
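A unit-economics check is plain arithmetic. In the sketch below only the 50,000 runs per month comes from the text above. The cost per action, minutes saved, and labor rate are hypothetical numbers chosen to show the calculation.

```python
def unit_economics(
    runs_per_month: int,
    cost_per_action: float,        # model and infrastructure cost per run, in dollars
    minutes_saved_per_run: float,  # human time the agent replaces per run
    labor_rate_per_hour: float,    # loaded hourly cost of that time
) -> dict:
    agent_cost = runs_per_month * cost_per_action
    labor_saved = runs_per_month * minutes_saved_per_run / 60 * labor_rate_per_hour
    return {
        "agent_cost": round(agent_cost, 2),
        "labor_saved": round(labor_saved, 2),
        "net": round(labor_saved - agent_cost, 2),
        # Above this cost per action the workflow loses money.
        "break_even_cost_per_action": round(
            minutes_saved_per_run / 60 * labor_rate_per_hour, 4
        ),
    }


result = unit_economics(
    runs_per_month=50_000,
    cost_per_action=0.04,       # hypothetical
    minutes_saved_per_run=0.5,  # hypothetical
    labor_rate_per_hour=40.0,   # hypothetical
)
print(result)
```

The break-even figure is the one to watch. If each run only saves a few seconds of human time, a small per-action cost can erase the return even when accuracy is fine.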

Why most of this stalls (and how to not be in that bucket)

Here's the uncomfortable number. MIT's Project NANDA found in 2025 that about 95% of enterprise generative AI pilots produced no measurable business return, drawing on a review of more than 300 publicly disclosed AI initiatives. The failure wasn't weak models. It was data readiness, workflow integration, and no defined outcome before the build started.

That tracks with what I saw. The pilots that died were the ones chasing a demo instead of a number. The ones that lived picked a boring, high-volume task, defined the hours saved up front, and staged the trust.

McKinsey's State of AI surveys in 2024 and 2025 show the same split: adoption is everywhere, but value is concentrated in the few firms that rewired the workflow rather than bolting a chatbot onto it. If you want the autopsy in detail, read the AI pilot-to-production gap.

The takeaway for an ops leader

How AI agents work on the plant floor comes down to one shift: software that doesn't just answer but acts and checks its own work, and that tolerates the mess your existing scripts can't. The technology is ready for the boring, high-volume, judgment-light gaps between your systems. It's not ready to run the plant.

Start where a clerk is retyping data. Stage the trust through shadow, approve, then auto-with-exceptions. Measure accuracy as a hard number, and watch it for drift.

Want to see which five workflows in your plant are the right first agents? We run a free First 5 Agents teardown — you walk us through your day, we map the five highest-return, lowest-risk candidates with rough hours saved on each. Book a 30-minute call and you'll leave with a ranked list whether or not we ever work together.

Frequently asked questions

What is an AI agent on the plant floor?

An AI agent is software that perceives plant data, decides on an action against a goal you set, acts through your existing systems, and checks whether the action worked — all without a person triggering each step. It differs from a chatbot because it completes the task and confirms the result, rather than just answering a question. On a plant floor it typically reads MES events or supplier documents and writes back to your ERP.

How is an AI agent different from RPA or a fixed script?

A fixed RPA script follows hard-coded rules and breaks silently when reality drifts — a renamed column, a new form field, a different word for "quantity." An AI agent uses a language model to read messy, unstructured input, adapt to formats it hasn't seen, and escalate when it's unsure. RPA is best for high-volume work that never changes; agents are best for variable, judgment-light work that scripts keep falling over on.

Are AI agents safe to run unattended in a factory?

Not on day one, and serious operators don't try. You run agents in three stages — shadow (it shows what it would do while you measure accuracy), approve (a person confirms each action), then auto-with-exceptions (it acts on clear cases and routes ambiguous ones to a human). This staged human oversight mirrors the NIST AI Risk Management Framework and keeps finance and quality comfortable because you show them a measured accuracy number before anything goes live.

What plant-floor tasks should I automate with an agent first?

Start where people retype data between two screens: order acknowledgment and entry, supplier follow-up, downtime triage, quality NCR drafting, and shipping document assembly. The screening rule is high volume, a clear right answer, a human can verify the output in seconds, and a mistake is annoying but not catastrophic. Anything that could stop the line or ship bad product unchecked waits until the agent has earned trust.

Why do so many manufacturing AI projects fail?

MIT's Project NANDA found in 2025 that about 95% of enterprise generative AI pilots produced no measurable business return — not because the models were weak, but because of poor data readiness, weak workflow integration, and no defined outcome before the build began. The pilots that succeed pick a boring, high-volume task, define the hours saved up front, and stage the trust gradually. Buying from specialized partners also outperformed internal builds in the same study.

