Forecast Accuracy Benchmarks for Manufacturers (2026)

By Jason Osajima, former VP of AI at a $250M manufacturer · Updated June 2026

Quick answer

Realistic 2026 forecast accuracy benchmarks for mid-market manufacturers, measured monthly at the item-location level as 1 − WMAPE, run roughly 82–90% for smooth, high-volume A-items and 50–65% for lumpy or intermittent SKUs. There is no single target number. The right benchmark is a grid of SKU tier crossed with demand profile, measured at your true lead-time lag, with bias and forecast value added sitting right next to accuracy.

I built and ran the demand planning function at a $250M industrial manufacturer. The numbers below are the ones I'd actually hold a team to, broken out the way they should be.

Why most published benchmarks are useless

The forecast accuracy benchmarks floating around LinkedIn don't survive contact with a real manufacturing portfolio. They're either pulled from CPG case studies, where demand is smooth and high-frequency, or invented to sell software. Neither describes lumpy B2B demand, long lead times, or a SKU base where 20% of items drive 80% of the cash.

Here's the rule that fixes half the problem: a forecast accuracy benchmark without an aggregation level and a time bucket attached is a vanity number. A "92% accurate" claim means nothing until you know whether it's quarterly at the national level (easy) or weekly at the item-location level (brutal). Nail those down before you compare yourself to anyone.

The other half of the problem is forecastability. Some demand simply can't be forecast well, and the coefficient of variation tells you which. Research on demand-characteristic-driven forecasting confirms that CV is the single most influential feature affecting accuracy, because a high CV means variance dominates the mean and the signal-to-noise ratio collapses (Nature Scientific Reports, 2025). You can't benchmark a lumpy spare part against a smooth A-item and expect anything sane.

The benchmarks, by SKU tier and demand profile

These are realistic 2026 ranges for monthly, item-location-level forecast accuracy, measured as 1 − WMAPE (weighted by volume). I use the coefficient of variation (CV = standard deviation ÷ mean demand) to separate smooth from lumpy, because that's what determines forecastability.

SKU tier	Smooth (CV < 0.5)	Variable (CV 0.5–1.0)	Lumpy/intermittent (CV > 1.0)
A-items (top 20% of SKUs by revenue)	82–90%	68–80%	50–65%
B-items (next 30% of SKUs)	72–82%	58–70%	40–55%
C-items (long tail)	60–72%	45–58%	manage by stock, not accuracy

If your A-item smooth-demand accuracy sits at 90%, you're best-in-class and should redeploy effort elsewhere. If it's at 70%, you're leaving working capital and service on the table. That gap is the first place to dig.
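Here's a minimal sketch of the two calculations behind the table: 1 − WMAPE and the CV band. The function names and sample numbers are mine, for illustration only, and I'm assuming population standard deviation for CV (sample standard deviation is an equally defensible choice).

```python
from statistics import mean, pstdev

def wmape_accuracy(actuals, forecasts):
    """Return 1 - WMAPE: one minus total absolute error over total actual volume."""
    total_actual = sum(actuals)
    if total_actual == 0:
        raise ValueError("WMAPE is undefined when total actual demand is zero")
    abs_error = sum(abs(a - f) for a, f in zip(actuals, forecasts))
    return 1 - abs_error / total_actual

def coefficient_of_variation(demand):
    """CV = standard deviation / mean demand (population std dev is an assumption)."""
    mu = mean(demand)
    if mu == 0:
        raise ValueError("CV is undefined when mean demand is zero")
    return pstdev(demand) / mu

def cv_band(cv):
    """Map a CV value to the column headings used in the benchmark table."""
    if cv < 0.5:
        return "smooth"
    if cv <= 1.0:
        return "variable"
    return "lumpy/intermittent"

# Illustrative monthly history for one item-location (made-up numbers).
actuals = [100, 120, 90, 110]
forecasts = [110, 115, 100, 100]

accuracy = wmape_accuracy(actuals, forecasts)
band = cv_band(coefficient_of_variation(actuals))
print(round(accuracy, 3), band)  # 0.917 smooth
```

To score a whole tier, pool every item-location-period in that tier into the same two sums rather than averaging per-SKU percentages. That pooling is what makes the number volume-weighted.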

For a fuller treatment of how the tiers and demand bands interact, see our guide to ABC-XYZ inventory analysis.

Adjust for time bucket

The table above is monthly. Shift the bucket and the bar moves:

Weekly: subtract roughly 8–12 points. Weekly is where replenishment lives and where most teams are quietly bleeding.
Quarterly: add 5–10 points. Looks great in board decks, useless for production scheduling.
Lag matters too. Accuracy measured at lag-1 (one period ahead) flatters you. Measure at the lag that equals your replenishment or production lead time. If your lead time is 12 weeks, a 1-week-ahead number is theater.
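To make the lag point concrete, here's a rough sketch that scores the same periods at two different lags. It assumes you archive forecast snapshots keyed by target period and lag; the `snapshots` layout, the function name, and the numbers are hypothetical.

```python
def accuracy_at_lag(actuals, snapshots, lag):
    """1 - WMAPE using only the forecasts made `lag` periods before each target period."""
    abs_error = 0.0
    total_actual = 0.0
    for period, actual in actuals.items():
        forecast = snapshots.get((period, lag))
        if forecast is None:
            continue  # no snapshot archived at this lag for this period
        abs_error += abs(actual - forecast)
        total_actual += actual
    if total_actual == 0:
        raise ValueError("no scoreable volume at this lag")
    return 1 - abs_error / total_actual

# Made-up example: the same two months, forecast 1 month ahead and 3 months ahead.
actuals = {"2026-01": 100, "2026-02": 120}
snapshots = {
    ("2026-01", 1): 95, ("2026-02", 1): 125,
    ("2026-01", 3): 70, ("2026-02", 3): 150,
}

print(round(accuracy_at_lag(actuals, snapshots, lag=1), 3))  # 0.955
print(round(accuracy_at_lag(actuals, snapshots, lag=3), 3))  # 0.727
```

If your system overwrites old forecasts instead of archiving them, you can't run this at all, and fixing that comes before any benchmarking.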

Classify demand before you benchmark it

CV alone is a shortcut. The cleaner method is the Syntetos-Boylan scheme, which uses both the average inter-demand interval (ADI) and the squared coefficient of variation (CV²) to sort SKUs into smooth, erratic, intermittent, and lumpy, with cutoffs of 1.32 for ADI and 0.49 for CV² (Syntetos & Boylan, 2005).

The point isn't academic. An item with frequent zeros (high ADI) needs a different forecast method and a different benchmark than one that's just noisy. Treat them as one bucket and your scorecard lies to you.
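A minimal sketch of that classification, using the 1.32 and 0.49 cutoffs quoted above. A few things here are my assumptions: ADI is approximated as total periods divided by periods with nonzero demand, CV² is computed on the nonzero demand sizes with population variance, and values exactly on a cutoff fall into the higher class.

```python
from statistics import mean, pvariance

ADI_CUTOFF = 1.32
CV2_CUTOFF = 0.49

def classify_demand(demand):
    """Sort one SKU's demand history into smooth, erratic, intermittent, or lumpy."""
    sizes = [d for d in demand if d > 0]
    if not sizes:
        return "no demand"
    adi = len(demand) / len(sizes)               # average inter-demand interval (approximation)
    cv2 = pvariance(sizes) / mean(sizes) ** 2    # squared CV of nonzero demand sizes
    if adi < ADI_CUTOFF:
        return "smooth" if cv2 < CV2_CUTOFF else "erratic"
    return "intermittent" if cv2 < CV2_CUTOFF else "lumpy"

# Made-up histories, one per class.
print(classify_demand([10, 11, 9, 10, 12, 10]))                    # smooth
print(classify_demand([1, 30, 2, 25, 1, 40]))                      # erratic
print(classify_demand([0, 0, 5, 0, 0, 6, 0, 0, 5, 0, 0, 4]))       # intermittent
print(classify_demand([0, 0, 1, 0, 0, 20, 0, 0, 2, 0, 0, 30]))     # lumpy
```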

The metrics that belong on a manufacturer's scorecard

Forget reporting 100 − MAPE off the ERP. On a manufacturing portfolio it's actively misleading, because low-volume SKUs detonate the average and MAPE is undefined whenever actual demand is zero. Use these instead.

WMAPE weights error by volume or margin, so the number tracks the P&L rather than the long tail. It stays well-defined for intermittent series full of zeros, which is exactly where plain percentage errors fall apart (Nature Scientific Reports, 2025).
Bias (mean percentage error) belongs on its own line, and you watch the sign. Chronic over-forecasting is the engine of obsolescence; chronic under-forecasting is the engine of expediting.
MASE for intermittent and spare-parts SKUs. It divides your error by the error of a naive baseline, so it's scale-free and well-defined even with zeros, which is why Hyndman proposed it specifically for intermittent demand (Hyndman, 2006).
Forecast Value Added (FVA) asks whether the planner's override actually beats the naive or statistical baseline. Often it doesn't.
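Here's a rough sketch of bias and MASE on a series full of zeros. Two assumptions to flag: bias is computed as a volume-weighted signed error (positive means over-forecasting) rather than a per-period mean percentage error, which would break on zero-demand periods, and MASE is scaled by the in-sample error of a one-step naive forecast. Function names and numbers are illustrative.

```python
def weighted_bias(actuals, forecasts):
    """Signed error as a share of actual volume; positive means over-forecasting."""
    total_actual = sum(actuals)
    if total_actual == 0:
        raise ValueError("bias share is undefined when total actual demand is zero")
    return sum(f - a for a, f in zip(actuals, forecasts)) / total_actual

def mase(history, actuals, forecasts):
    """Forecast MAE divided by the in-sample MAE of a one-step naive forecast."""
    naive_errors = [abs(history[i] - history[i - 1]) for i in range(1, len(history))]
    scale = sum(naive_errors) / len(naive_errors)
    if scale == 0:
        raise ValueError("MASE is undefined when the history never changes")
    mae = sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)
    return mae / scale

# Made-up spare-part series: zeros in both the history and the scored periods.
history = [0, 4, 0, 0, 6, 0]
actuals = [0, 5, 0]
forecasts = [2, 2, 2]

print(round(weighted_bias(actuals, forecasts), 2))  # 0.2 (over-forecasting)
print(round(mase(history, actuals, forecasts), 3))  # 0.583 (beats naive)
```

A MASE below 1 means the forecast beat the naive baseline on average; above 1 means it lost to it.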

Our deeper dives on MAPE vs WMAPE and how to measure and fix forecast bias walk through the formulas with worked examples.

What world-class actually looks like

Best-in-class isn't a single accuracy number. It's low bias plus positive forecast value added. The top teams I've benchmarked run near-zero bias and a statistical baseline their planners reliably beat.

Top-quartile manufacturers typically run A-item monthly accuracy in the low-to-mid 80s for smooth demand. They treat the long tail as a stocking-policy decision, not a forecasting contest. And they segment everything, because a single company-wide accuracy number with no bias tracking is the laggard's tell.

The FVA piece deserves emphasis. Michael Gilliland's framework at SAS popularized the practice of generating a naive forecast for every item and comparing the human-touched forecast against it (SAS, Gilliland 2015). In a meaningful share of teams, the analyst override makes the forecast worse than doing nothing. That's a benchmark too, and it's free to measure.
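A minimal FVA sketch, assuming the naive forecast is simply last period's actual and that value added is measured as the drop in WMAPE. Both are my choices for illustration, not the only way to define it, and the numbers are made up.

```python
def wmape(actuals, forecasts):
    """Total absolute error divided by total actual volume."""
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / sum(actuals)

def naive_forecast(prior_actual, actuals):
    """Naive baseline: each period's forecast is the previous period's actual."""
    return [prior_actual] + list(actuals[:-1])

def forecast_value_added(actuals, baseline, override):
    """WMAPE(baseline) - WMAPE(override); positive means the override helped."""
    return wmape(actuals, baseline) - wmape(actuals, override)

actuals = [100, 110, 105, 120]
baseline = naive_forecast(prior_actual=95, actuals=actuals)  # [95, 100, 110, 105]
override = [120, 125, 120, 130]                              # planner-adjusted forecast

fva = forecast_value_added(actuals, baseline, override)
print(round(fva, 3))  # -0.057: the override made the forecast worse
```

Run the same comparison against your statistical baseline as well as the naive one, so you can see which step in the process adds or destroys value.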

Pick the right method for the demand shape

Benchmarks and methods are linked. You can't hit the lumpy-SKU band with a model built for smooth demand.

For intermittent and spare-parts items, the workhorse is still Croston's method, which forecasts demand sizes and inter-demand intervals separately because single exponential smoothing breaks on sporadic demand (Croston review, ResearchGate). Modern ML approaches help, but recent peer-reviewed work shows that for intermittent demand, careful feature engineering matters more than architectural complexity (PMC, 2025).

Translation for a planning leader: don't expect a new neural net to rescue a lumpy SKU. Expect the right method, clean classification, and honest measurement to do most of the work. See our breakdown of AI vs statistical forecasting for where each actually earns its keep.
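For reference, Croston's method fits in a few lines. This is a bare-bones sketch: the smoothing constant of 0.1 and the initialization from the first nonzero demand are my assumptions, and it leaves out the bias corrections used in later variants of the method.

```python
def croston(demand, alpha=0.1):
    """Return Croston's per-period demand-rate forecast after the last observation."""
    size = None        # smoothed nonzero demand size
    interval = None    # smoothed number of periods between demands
    periods_since = 1  # periods elapsed since the last nonzero demand
    for d in demand:
        if d > 0:
            if size is None:
                # Initialize both estimates from the first observed demand.
                size, interval = float(d), float(periods_since)
            else:
                # Update size and interval separately, only when demand occurs.
                size += alpha * (d - size)
                interval += alpha * (periods_since - interval)
            periods_since = 1
        else:
            periods_since += 1
    if size is None:
        return 0.0  # no demand observed yet
    return size / interval

# Made-up spare part: 6 units every third period.
print(croston([0, 0, 6, 0, 0, 6, 0, 0, 6]))  # 2.0 units per period
```

The estimates only move in periods with demand, which is why a run of zeros doesn't drag the forecast toward zero the way single exponential smoothing would.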

Translate the benchmark into dollars

A benchmark only earns its keep if it connects to cash. Here's the chain.

Bias → stranded inventory. A persistent +6% bias on your A-items quietly builds excess you'll later discount or write off. On a $40M inventory base, that's millions parked in the wrong SKUs.
Accuracy → safety stock. Tighter, less-biased forecasts let you cut safety stock without dropping fill rate. In the standard safety stock formula, stock scales with the standard deviation of forecast error rather than the variance, so halving forecast error variance on a SKU cuts its safety stock by roughly 29% (a factor of 1/√2) at the same service level.
Lead-time-lag accuracy → expediting. Being accurate at lag-1 but wrong at lag-12 means you're constantly air-freighting. Benchmark at your real lead time or the number lies to you.
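The safety stock link can be sketched with the textbook formula, safety stock = z × σ × √(lead time). That formula is my assumption here: it presumes roughly normal, independent per-period forecast errors and targets a cycle service level rather than fill rate. The inputs are made up.

```python
from math import sqrt
from statistics import NormalDist

def safety_stock(sigma_error, lead_time_periods, service_level=0.95):
    """Textbook safety stock: z * sigma of per-period forecast error * sqrt(lead time)."""
    z = NormalDist().inv_cdf(service_level)  # z-score for the target cycle service level
    return z * sigma_error * sqrt(lead_time_periods)

sigma = 40.0   # std dev of monthly forecast error, in units (illustrative)
lead_time = 3  # months

before = safety_stock(sigma, lead_time)
after = safety_stock(sigma / sqrt(2), lead_time)  # forecast error variance halved
reduction = 1 - after / before

print(round(reduction, 3))  # 0.293
```

Because z and lead time cancel out of the ratio, the percentage cut depends only on how much you shrink the error standard deviation, not on the service level you pick.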

The upside is real and well-documented. McKinsey reports that AI-driven forecasting can cut errors by 20–50%, reduce lost sales from unavailability by up to 65%, and pull inventory down 20–30% (McKinsey, 2023). Those gains start with knowing which cell of the grid you're under-performing.

How to use these benchmarks this quarter

Pull your accuracy by SKU tier and CV band. Most teams have never cut it this way, and the picture is always uglier and more actionable than the blended number.
Find the gap between your A-item smooth-demand accuracy and the 82–90% band. That gap, times the working capital it would free, is your business case.
Measure FVA on your top 50 SKUs. Kill the overrides that lose to baseline. That's the cheapest accuracy gain you'll ever book.
Re-baseline at your true lead-time lag. Watch the comfortable numbers get honest.
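The first two steps can be sketched as a grid of pooled 1 − WMAPE per cell, compared against the lower bound of each band in the table. The row layout and function names are hypothetical, and the sample rows are invented.

```python
from collections import defaultdict

# Monthly item-location bands from the benchmark table, as (low, high).
# None marks the cell managed by stocking policy rather than accuracy.
BENCHMARKS = {
    ("A", "smooth"): (0.82, 0.90), ("A", "variable"): (0.68, 0.80), ("A", "lumpy"): (0.50, 0.65),
    ("B", "smooth"): (0.72, 0.82), ("B", "variable"): (0.58, 0.70), ("B", "lumpy"): (0.40, 0.55),
    ("C", "smooth"): (0.60, 0.72), ("C", "variable"): (0.45, 0.58), ("C", "lumpy"): None,
}

def accuracy_grid(rows):
    """rows: (tier, band, actual, forecast) tuples. Returns {(tier, band): 1 - WMAPE}."""
    abs_err = defaultdict(float)
    volume = defaultdict(float)
    for tier, band, actual, forecast in rows:
        abs_err[(tier, band)] += abs(actual - forecast)
        volume[(tier, band)] += actual
    return {cell: 1 - abs_err[cell] / volume[cell] for cell in volume if volume[cell] > 0}

def gaps_to_benchmark(grid):
    """Shortfall below the low end of each band; 0.0 means the cell is inside or above it."""
    gaps = {}
    for cell, accuracy in grid.items():
        band = BENCHMARKS.get(cell)
        if band is None:
            continue  # stocking-policy cell or unknown cell: no accuracy target
        gaps[cell] = max(0.0, band[0] - accuracy)
    return gaps

rows = [
    ("A", "smooth", 100, 130), ("A", "smooth", 200, 170),
    ("B", "lumpy", 10, 4), ("B", "lumpy", 0, 3),
]
grid = accuracy_grid(rows)
gaps = gaps_to_benchmark(grid)
for cell in sorted(gaps):
    print(cell, round(grid[cell], 2), round(gaps[cell], 2))
```

Sort the resulting gaps by the inventory value sitting in each cell and you have the order in which to work the list.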

If your demand planning operation isn't yet cutting accuracy this way, our demand planning maturity model shows what the next stage looks like.

The bottom line

Forecast accuracy benchmarks for manufacturers in 2026 aren't a single target. They're a grid of SKU tier by demand profile, measured at your real lead-time lag, with bias and FVA sitting right next to accuracy. Compare yourself to the right cell in the table, not to a CPG case study, and the gaps that convert to cash become obvious.

Want your portfolio benchmarked against these numbers properly? PlanForge runs a free planning-maturity and stranded-inventory teardown. We segment your accuracy by tier and CV band, quantify the working capital trapped in biased SKUs, and hand you a prioritized fix list. Book a 30-minute call and bring your last six months of forecast-versus-actual.

Frequently asked questions

What is a good forecast accuracy for a manufacturer?

It depends entirely on SKU tier and demand profile. For smooth, high-volume A-items measured monthly at the item-location level, 82–90% (as 1 − WMAPE) is good to best-in-class. For lumpy or intermittent A-items, 50–65% can be excellent. For lumpy items in the long tail, manage by stocking policy rather than chasing an accuracy number.

Should I use MAPE or WMAPE to benchmark forecast accuracy?

Use WMAPE for any real manufacturing portfolio. Plain MAPE is undefined whenever actual demand is zero and lets low-volume SKUs distort the average, so it overstates how bad you are on the items that don't matter and hides problems on the ones that do. WMAPE weights error by volume or margin, keeping the number aligned with the P&L.

How do I benchmark forecast accuracy for intermittent or spare-parts demand?

Don't use MAPE-based accuracy at all. Use MASE, which scales your error against a naive baseline and stays well-defined even when the series is full of zeros, as recommended in Hyndman's 2006 work on intermittent-demand metrics. Pair it with Croston's method for the forecast itself and a service-level target rather than an accuracy target.

What time bucket and lag should I measure accuracy at?

Measure at the time bucket and lag that match how you actually replenish or schedule production. If your production lead time is 12 weeks, a one-week-ahead accuracy number is meaningless theater. Weekly buckets typically run 8–12 points below monthly, and lag-1 accuracy always flatters you relative to the lead-time lag that drives real decisions.

Is forecast accuracy the right metric to optimize?

Not by itself. Track bias separately, because a forecast can be accurate on average yet chronically over- or under-biased, which drives obsolescence or expediting. Then measure Forecast Value Added to confirm your planners' overrides actually beat a naive baseline. In many teams, some overrides make the forecast worse, and killing those is the cheapest accuracy gain available.
