Data Readiness for AI in Manufacturing: A Checklist

By Jason Osajima, former VP of AI at a $250M manufacturer · Updated June 2026

Quick answer

A practical checklist for COOs: what actually has to be true of your data before an agent works, and what's a myth.

Data readiness for AI in manufacturing is not one company-wide milestone you reach before you start. It's a per-use-case bar, and you clear it one agent at a time by checking whether the specific data that agent needs is accessible, complete enough, consistent, current, and trusted. The data for a supplier-document agent has nothing to do with the data for a demand-planning agent, so you score each candidate on those five dimensions, build the ones that pass, and fix the rest later.

I ran AI at a $250M furniture manufacturer with data that was, charitably, a mess. The phrase "data readiness" had become a reason to delay every project on the roadmap. We shipped anyway.

You've heard the line in your own plant. "We can't do agents until the data's clean." It's half right. The data does need to clear a bar. But the bar is lower and far more specific than the consultants pitching you a two-year data-lake build want you to believe.

Why "get the data ready" is the wrong goal

The core mistake is treating readiness as one giant binary state your whole company has to reach. It isn't. The data a supplier-document agent reads is unrelated to the data a demand-planning agent reads.

So you don't "get your data ready." You get the data for one agent ready, ship it, then move to the next. That reframing alone kills the multi-year delay.

The instinct to fix everything first has a body count. Gartner predicted that at least 30% of generative AI projects would be abandoned after the proof of concept by the end of 2025, with poor data quality named among the top causes (Gartner, 2024). In my experience, most stalled teams aren't blocked by missing data. They're blocked by trying to fix all of it at once.

The five-question readiness check

For any agent you're considering, run its data through five questions. This is the whole framework. Pass, and you build. Fail, and you know the exact dimension to fix instead of waving at "data quality" in the abstract.

Accessible — Can software read this data without a human typing? A DB query, an API, a scheduled export. If a person has to copy-paste it, the agent can't use it.
Complete enough — Are the fields the agent needs actually populated, most of the time? Not perfect. Most of the time.
Consistent — Is an entity represented the same way across records? One spelling per supplier, one format per part number, consistent units.
Current — Is the data fresh enough for the decision? A planning agent tolerates yesterday's data. A live-status agent doesn't.
Trustworthy — Do the people who'd use the agent already trust this source? If ops doesn't believe the ERP's lead-time field today, an agent reading it won't change their mind.
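
Three of the five questions (complete enough, consistent, current) can be checked roughly on a sample export before anyone debates them in a meeting. The sketch below is one way to do that, not a standard method: the function names, the sample rows, and the field names `supplier` and `lead_time_days` are all hypothetical, and the thresholds for "good enough" are left for you to choose per agent.

```python
from datetime import datetime, timedelta


def fill_rate(records, field):
    """Completeness: share of records where `field` is populated."""
    if not records:
        return 0.0
    filled = 0
    for record in records:
        value = record.get(field)
        if value is not None and str(value).strip() != "":
            filled += 1
    return filled / len(records)


def spelling_variants(records, field):
    """Consistency: raw values that collapse to the same key once case,
    spacing, and punctuation are ignored."""
    groups = {}
    for record in records:
        raw = record.get(field)
        if not raw:
            continue
        key = "".join(ch for ch in raw.lower() if ch.isalnum())
        groups.setdefault(key, set()).add(raw)
    # Keep only the entities that are spelled more than one way.
    return {key: raws for key, raws in groups.items() if len(raws) > 1}


def is_fresh(last_updated, max_age, now):
    """Currency: True when the data is no older than the decision tolerates."""
    return now - last_updated <= max_age


# Hypothetical sample rows standing in for an ERP export.
records = [
    {"supplier": "Acme Corp.", "lead_time_days": 14},
    {"supplier": "ACME CORP", "lead_time_days": None},
    {"supplier": "Birch & Co", "lead_time_days": 21},
    {"supplier": "", "lead_time_days": 7},
]

print(fill_rate(records, "lead_time_days"))  # 0.75
print(spelling_variants(records, "supplier"))
```

Accessibility and trust are not measurable this way. Those two you answer by asking whether a query or export exists, and by asking the people who would use the agent.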

How to score it

Score each question 0, 1, or 2 — no, partial, or yes. Add them up for a number out of 10.

8 or above: ready to build now.
5 to 7: buildable with scoping — narrow the agent to the clean subset of data.
Below 5: fix the specific failing dimension first, or pick a different agent.
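
The scoring rule is simple enough to keep in a spreadsheet, but a minimal sketch makes the thresholds explicit. The dimension keys and the example answers are illustrative assumptions; only the 0/1/2 scale and the 8 and 5 cut-offs come from the rule above.

```python
DIMENSIONS = ("accessible", "complete", "consistent", "current", "trusted")


def score_readiness(answers):
    """Sum five 0/1/2 answers (no, partial, yes) and map the total to a verdict."""
    for dimension in DIMENSIONS:
        if answers.get(dimension) not in (0, 1, 2):
            raise ValueError(f"{dimension!r} must be scored 0, 1, or 2")
    total = sum(answers[dimension] for dimension in DIMENSIONS)
    if total >= 8:
        verdict = "build now"
    elif total >= 5:
        verdict = "build with narrowed scope"
    else:
        verdict = "fix the failing dimension or pick another agent"
    return total, verdict


# Hypothetical answers for an order-status agent.
order_status = {
    "accessible": 2,
    "complete": 1,
    "consistent": 1,
    "current": 2,
    "trusted": 1,
}
print(score_readiness(order_status))  # (7, 'build with narrowed scope')
```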

The trust dimension does real work here, and the standards bodies agree it belongs on the list. The NIST AI Risk Management Framework treats data quality and provenance as core inputs to trustworthy AI under its MAP and MEASURE functions (NIST, 2023). An agent built on a field ops already distrusts inherits that distrust on day one.

What readiness does NOT require

Just as important is killing the myths that drive the two-year delay. None of the following is a prerequisite for your first agent.

You do not need a data lake or warehouse. Nice to have. Not required. An agent reading a read replica or a nightly export works fine. We ran production agents with no central data platform at all.
You do not need perfect data. You need data good enough that the agent beats the status quo — which is often a human guessing or reading a stale report. The bar is "better than today," not "flawless."
You do not need all your data. You need the slice one agent uses. Ignore the other 95% until an agent needs it.
You do not need a master data management program first. ISO 8000 — the international standard for data quality and master data — is a worthy multi-year discipline (ISO 8000-1:2022). It is not a prerequisite for a supplier-lookup agent.

The risk runs the other way too. Gartner projects that organizations will abandon 60% of AI projects unsupported by AI-ready data through 2026 (Gartner, 2025). "AI-ready" there doesn't mean perfect — it means fit for the specific use case, which is exactly what the five-question check measures.

Readiness by use case

Different agents have wildly different data demands. Here's the realistic picture for the common first agents a mid-market manufacturer considers.

Agent	Data needed	Typical readiness	Why
Supplier-doc intelligence	PDFs: specs, certs, datasheets, POs	High	Documents don't need clean structure — RAG handles unstructured text
Order-status / exception lookup	ERP order + status tables	Medium-high	Usually accessible; depends on consistency
Ops-review prep	ERP + BI exports	Medium	Needs joined data; tolerates batch latency
Demand / inventory Q&A	Planning + inventory data	Medium	Needs consistency and currency to be trusted
Quality / defect analysis	MES + inspection records	Lower	Often messy, free-text, inconsistent capture

Why document agents win first

Notice the pattern: document-heavy agents are the easiest to make ready. Retrieval-augmented generation, the technique introduced by Lewis et al., lets a model pull relevant text chunks at query time and answer over them — no clean schema required (Lewis et al., 2020).
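
To make "pull relevant text chunks at query time" concrete, here is a toy retrieval step. It is a sketch under loud assumptions: it ranks chunks by plain word overlap (cosine similarity over word counts) instead of the learned dense retriever that Lewis et al. describe, the chunks are invented, and the final prompt would still have to be sent to a model of your choice. The point is only that nothing here needs a schema.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[token] * b[token] for token in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return up to k chunks most similar to the query, best first."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine(query_vec, Counter(tokenize(c))), c) for c in chunks]
    scored = [(score, c) for score, c in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


def build_prompt(query, chunks, k=2):
    """Assemble the text a language model would be asked to answer from."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return f"Answer using only the context below.\n{context}\nQuestion: {query}"


# Hypothetical chunks, as if extracted from supplier PDFs.
chunks = [
    "Datasheet: hinge H-220 is rated for 40 kg load and 50,000 cycles.",
    "Certificate: supplier Birch & Co holds FSC certification through 2027.",
    "PO 1182: 500 units of hinge H-220, delivery in 6 weeks.",
]
print(build_prompt("What load is hinge H-220 rated for?", chunks))
```

In a real build you would swap the word-overlap scoring for an embedding model and a vector index. The shape of the pipeline (chunk, retrieve, answer over the retrieved text) stays the same.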

A supplier-doc agent works on the PDFs sitting in a SharePoint folder right now. The data is already "ready" by definition. That's why it's so often the best first build, and why teams that start there get a win in 30 days instead of stalling. If you're still weighing candidates, our guide on how to prioritize your first AI use case walks the trade-offs.

Where structured-data agents get harder

Structured-data agents (order status, demand Q&A) demand more on consistency and currency. They're still very buildable; they just lean harder on the third and fourth questions of the check.

Quality and defect agents tend to fail the consistency check because shop-floor capture is free-text-heavy and inconsistent. That doesn't mean never. It means later, after you've cleaned that domain or paired the agent with a human-in-the-loop reviewer for the ambiguous cases.

Fix data in the pipeline, not the source

When a use case scores partial on consistency or completeness, the wrong move is a project to clean the source system. That's slow, political, and it blocks the agent for months.

The right move is to handle it in the data pipeline feeding the agent. This is well-trodden ground — analysts have long noted that preparing and cleaning data eats the majority of a data team's time, often cited in the 50–80% range (The New York Times, 2014). You don't pay that tax by reforming the source. You pay a small slice of it, scoped to one agent.

Three pipeline moves that work

Normalize in the staging layer. Map supplier-name variants, standardize units and dates, trim part-number junk — all in the pipe, leaving the source untouched.
Make the agent honest about gaps. When data is missing or ambiguous, the agent says "I don't have that" or "two possible matches." An agent that admits uncertainty earns more trust than one that confidently guesses.
Validate before output. Range checks and business rules catch obviously-wrong data before it reaches a user.
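
The three moves above fit in a few dozen lines. This is a minimal sketch, not a production pipeline: the alias table, the field names, and the 1 to 365 day lead-time rule are assumptions made up for illustration, and in practice the alias table is something you grow as the agent meets new variants.

```python
import re

# Hypothetical alias table: normalized spelling -> canonical supplier name.
SUPPLIER_ALIASES = {
    "acme corp": "Acme Corp",
    "acme corporation": "Acme Corp",
    "birch co": "Birch & Co",
}
LEAD_TIME_RANGE_DAYS = (1, 365)  # assumed business rule


def canonical_supplier(raw):
    """Map a raw supplier string to its canonical name, or None if unknown."""
    key = " ".join(re.sub(r"[^a-z0-9 ]", " ", (raw or "").lower()).split())
    return SUPPLIER_ALIASES.get(key)


def stage(rows):
    """Move 1: normalize into new rows; the source rows are left untouched."""
    return [
        {
            "supplier": canonical_supplier(row.get("supplier")),
            "part": (row.get("part") or "").strip().upper(),
            "lead_time_days": row.get("lead_time_days"),
        }
        for row in rows
    ]


def is_valid(row):
    """Move 3: range check so obviously wrong values never reach a user."""
    days = row["lead_time_days"]
    low, high = LEAD_TIME_RANGE_DAYS
    return isinstance(days, (int, float)) and low <= days <= high


def lead_time_answer(staged, supplier, part):
    """Move 2: admit gaps and ambiguity instead of guessing."""
    name = canonical_supplier(supplier)
    if name is None:
        return "I don't have that."
    part = part.strip().upper()
    hits = [
        r for r in staged
        if r["supplier"] == name and r["part"] == part and is_valid(r)
    ]
    if not hits:
        return "I don't have that."
    values = sorted({r["lead_time_days"] for r in hits})
    if len(values) > 1:
        options = ", ".join(f"{v} days" for v in values)
        return f"{len(values)} possible matches: {options}"
    return f"{values[0]} days"


# Hypothetical source rows with the usual mess.
rows = [
    {"supplier": "ACME Corp.", "part": " h-220 ", "lead_time_days": 14},
    {"supplier": "Acme Corporation", "part": "H-220", "lead_time_days": 21},
    {"supplier": "Birch & Co", "part": "P-10", "lead_time_days": 9999},
    {"supplier": "Birch & Co", "part": "P-11", "lead_time_days": 30},
]
staged = stage(rows)
print(lead_time_answer(staged, "acme corp", "h-220"))   # 2 possible matches: 14 days, 21 days
print(lead_time_answer(staged, "Birch & Co", "P-10"))   # I don't have that.
print(lead_time_answer(staged, "Birch & Co", "p-11"))   # 30 days
```

Note that the out-of-range 9999 is dropped rather than repaired. Deciding what the right value should be is a source-system question; the pipeline's only job is to keep it away from the user.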

This keeps cleanup scoped to what one agent needs. And it compounds — each agent you ship leaves its data domain a little cleaner for the next one. Getting the staging and validation right is also half the battle in connecting agents to your ERP and MES, where the data actually lives.

The readiness sequence that works

Don't run a company-wide data-readiness program. Run this loop instead. It's the same engine that gets a pilot to production in 90 days without stalling on data.

Pick a candidate agent. Score its data on the five questions.
Act on the score. 8+, build it. 5–7, narrow scope to the clean subset and build that. Under 5, pick a different agent or fix the one failing dimension.
Ship, measure, prep the next. While the first agent runs, clean the data the next agent will need.
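
Run across several candidates at once, the loop becomes a small triage: rank by score, build the top one, and name the weakest dimension for everything else. A sketch, assuming you already have 0/1/2 answers per candidate; the agent names and scores below are invented for illustration.

```python
def plan_builds(candidates):
    """Rank candidate agents by readiness and attach the next action.

    `candidates` maps an agent name to its five 0/1/2 dimension scores.
    """
    plan = []
    for agent, scores in candidates.items():
        total = sum(scores.values())
        weakest = min(scores, key=scores.get)  # first lowest-scoring dimension
        if total >= 8:
            action = "build"
        elif total >= 5:
            action = f"narrow scope around '{weakest}', then build"
        else:
            action = f"fix '{weakest}' first or pick another agent"
        plan.append((agent, total, action))
    # Highest score first: that is the agent to ship while the rest get prepped.
    plan.sort(key=lambda item: item[1], reverse=True)
    return plan


# Hypothetical scores for three candidate agents.
candidates = {
    "defect_analysis": {
        "accessible": 1, "complete": 1, "consistent": 0, "current": 1, "trusted": 1,
    },
    "supplier_docs": {
        "accessible": 2, "complete": 2, "consistent": 2, "current": 2, "trusted": 1,
    },
    "order_status": {
        "accessible": 2, "complete": 2, "consistent": 1, "current": 1, "trusted": 1,
    },
}
for agent, total, action in plan_builds(candidates):
    print(f"{agent}: {total}/10 -> {action}")
```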

Start where the data is already ready

Begin with a document-heavy agent — supplier-doc intelligence is the usual winner — because its data is ready today. Get a win on the board in 30 days.

Then use that momentum, and the data you cleaned along the way, to take on the structured-data agents. The context here matters: McKinsey's 2024 survey found 70% of high-performing AI adopters still cite data difficulties, yet they ship anyway by sequencing the work (McKinsey, 2024). Readiness is a checklist you run per agent, not a destination you reach before starting.

Frequently asked questions

What does "data readiness for AI" actually mean for a manufacturer?

It means the specific data one AI agent needs is accessible to software, populated most of the time, consistent across records, fresh enough for the decision, and trusted by the people who'd use it. It is not a company-wide state. You assess it per use case, so a supplier-document agent can be fully ready while a defect-analysis agent is not.

Do I need a data lake or warehouse before building AI agents?

No. An agent reading a nightly export, a read replica, or an existing API works fine, and many production agents run with no central data platform at all. A data lake is helpful for analytics at scale, but it's never a prerequisite for your first agent. Treating it as one is the most common reason AI initiatives stall for over a year.

How clean does my data have to be for AI to work?

Clean enough that the agent's output beats the current status quo, which is usually a human guessing or reading a stale report. Document-based agents using retrieval-augmented generation (Lewis et al., 2020) answer over messy, unstructured text without any clean schema. For structured data, you fix gaps in the pipeline feeding the agent rather than reforming the source system.

Which AI agent should a manufacturer build first?

Start with a document-heavy agent such as supplier-document intelligence, because PDFs and specs sitting in a shared folder are already "ready" with no schema cleanup. It's the fastest path to a win in 30 days. Once that's live, the momentum and the data you cleaned along the way make the harder structured-data agents far easier to ship next.

Why do so many manufacturing AI projects fail on data?

Gartner attributes a large share of abandoned generative AI projects to poor data quality, and projects that 60% of AI projects unsupported by AI-ready data will be abandoned through 2026. The deeper cause is usually scope, not the data itself — teams try to fix all their data at once instead of readying the slice one agent needs. Sequencing the work per use case is what separates the projects that ship from the ones that stall.
