Industrial AI has a credibility problem, and computer vision is where it shows up first. The pitch deck always works. A camera looks at a part, a box appears around it, a number lands on the screen, the room nods. Then the same model meets the actual plant floor and the number is wrong. Different light, faster line, a smudge on the lens, an angle nobody planned for. The demo that earned the budget cannot survive the place it was built for.
I want to talk about that gap, because it is the whole story. The interesting part of industrial AI is not the model that reads a clean frame in a lab. It is the model that keeps being useful when the conditions move. Two projects our team ran on a metals producer's floor make the point better than any benchmark, and one of them is a project I am happy to tell you went sideways.
Both used cameras and computer vision on real production. One held up and changed how acceptance worked. One showed exactly where production computer vision breaks. You need both to understand what shipping this stuff actually means.
What industrial AI looks like when it survives the floor
The first project was a scrap-quality acceptance system at a metals producer. Incoming material arrives, someone has to judge its grade, contamination, and class, and that judgment decides what gets paid and what goes into the furnace. The old way was a person looking at a pile and making a call. Subjective, slow, and wide open to both honest error and outright fraud. Two inspectors look at the same load and disagree. A supplier learns which inspector is lenient. The plant pays for material it did not get.
What we built was not a single model. It was a hardware-and-software acceptance station. Cameras plus weighing equipment plus computer vision, scoring contamination, class, and density on every load, and wired into the plant systems so the result was a record, not an opinion. The thing that mattered was not the accuracy of any single read. It was coverage. The station inspected 100% of incoming material, the same way, every time, with the output logged.
That is the part people underrate about industrial AI in heavy industry. The win was not "the model is smarter than the inspector". The win was that subjectivity left the process. Human bias dropped out, the fraud window closed, and acceptance became something you could audit instead of argue about. A consistent machine that is right most of the time and always shows its work beat an expert who is occasionally brilliant and impossible to check.
Computer vision in manufacturing is harder than the demo admits
The second project is the one I tell people about when they think this is solved.
The task sounded simple. Count steel rods on a rolling line. The rods come down a fast line in bundles, and someone has to know how many are in each batch. Counting by eye is error-prone, and the batches do not separate cleanly, so the human number drifts. A vision model that counts rods reliably saves real money and removes a dull, mistake-prone job. A classic computer vision case in manufacturing.
In training it looked great. The detection model hit around 80.8% accuracy on the data we had. Good enough to keep going, good enough to put in a deck. Then we ran it on new video from the line and accuracy fell to about 61%.
Nothing was wrong with the model. Everything was wrong with the assumption that the line would look like the training set. The camera angle on the real footage was different. The line ran at a speed that smeared rods across frames. Motion blur turned clean edges into mush. On the demo run, the model counted 709 rods against an actual 717. Close, and useless. Close does not pass acceptance when eight missing rods is eight missing rods.
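To make the "close, and useless" point concrete, here is a minimal sketch of the arithmetic. The function names and the zero-rod tolerance are assumptions for illustration, not the plant's actual acceptance rule.

```python
def count_error(counted: int, actual: int) -> tuple[int, float]:
    """Return (rods missed, relative error) for one batch count."""
    miss = actual - counted
    return miss, abs(miss) / actual


def passes_acceptance(counted: int, actual: int, tolerance: int = 0) -> bool:
    """Pass only if the count is within `tolerance` rods of the truth.

    tolerance=0 is an assumed rule: every rod has to be accounted for.
    """
    return abs(actual - counted) <= tolerance


miss, rel = count_error(709, 717)
print(f"missed {miss} rods, {rel:.1%} relative error")  # missed 8 rods, 1.1% relative error
print("passes acceptance:", passes_acceptance(709, 717))  # passes acceptance: False
```

A relative error just over one percent looks fine on a slide. Against a rule that wants every rod accounted for, it is a fail.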
Why object detection breaks between the lab and the line
That rod-counting drop is not a one-off embarrassment. It is the default outcome of object detection moving from curated footage to a live line, and it is worth being blunt about the mechanism.
A detection model learns the distribution it was trained on. On a rolling line, the distribution is hostile and it shifts on you:
- Camera angle. Move the mount a few degrees and the rod cross-sections the model learned to recognize change shape. The model never saw the new view.
- Line speed. Faster motion means fewer usable frames and more overlap between objects that were distinct in training.
- Blur. Motion blur and focus hunting destroy the edges that detection leans on. A rod becomes a streak.
- Separation. Rods touch, bundles merge, and a counter that assumed gaps starts double-counting or skipping.
None of these show up when you score a model on a held-out slice of the same clean recording. All of them show up the moment the camera is bolted to a real line. This is the unglamorous truth of AI visual inspection in production. The model is the easy 20% of the work. The other 80% is fighting the physical conditions until the input the model sees on a Tuesday afternoon looks enough like what it was trained on.
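The blur item in the list above is easy to see in a toy example. This is a sketch under loud assumptions: a synthetic one-dimensional scanline stands in for a camera frame, a box filter stands in for motion smear, and mean squared gradient stands in for "how much edge is left". None of it is the project's code.

```python
def motion_blur(scanline: list[float], length: int) -> list[float]:
    """Box-blur a 1-D scanline over `length` pixels (odd), zero-padded at the ends."""
    half = length // 2
    n = len(scanline)
    return [
        sum(scanline[max(0, i - half): min(n, i + half + 1)]) / length
        for i in range(n)
    ]


def edge_energy(scanline: list[float]) -> float:
    """Mean squared difference between neighbouring pixels: a crude sharpness proxy."""
    diffs = [b - a for a, b in zip(scanline, scanline[1:])]
    return sum(d * d for d in diffs) / len(diffs)


# Synthetic scanline: four bright 4-pixel "rods" on a dark background.
scanline = [0.0] * 64
for start in range(8, 64, 16):
    for i in range(start, start + 4):
        scanline[i] = 1.0

blurred = motion_blur(scanline, length=9)  # smear wider than a rod

print("edge energy, sharp:  ", edge_energy(scanline))
print("edge energy, blurred:", edge_energy(blurred))
print("peak brightness after blur:", max(blurred))
```

Once the smear is wider than the rod, the peak brightness drops and the hard edges turn into shallow ramps. A detector trained on the sharp version has very little left to lock onto.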
| | The scrap station | The rod counter |
|---|---|---|
| Conditions | Controlled acceptance point, fixed camera and lighting | Fast live line, variable angle and speed |
| Training to production | Held up, coverage was the win | 80.8% in training, ~61% on new video |
| Demo result | Consistent 100% inspection, auditable | 709 counted vs 717 actual |
| Lesson | Control the environment, win on consistency | Match training to real conditions or it drifts |
What the accuracy drop actually teaches you
I keep the rod-counting number in the deck on purpose. A vendor who only shows you the scrap station is selling you the easy half of industrial AI and hiding the half that decides whether your project ships.
The difference between the two projects was not model quality. It was how much control we had over the conditions the camera saw. The scrap acceptance station worked because we owned the environment. Fixed mount, controlled framing, a defined moment of capture. The rod counter struggled because the line owned the environment, and the line does not care what your training set looked like.
So the lesson is not "computer vision does not work in manufacturing". It clearly does: the scrap station ran on real material at full coverage. The lesson is where the work lives. Before anyone talks about model architecture, the questions that decide the project are: can we fix the camera, can we control the lighting and framing, can we capture training data that actually matches production speed and angle, and do we have a way to keep checking accuracy as conditions drift. Get those right and a modest model ships. Get them wrong and a brilliant model gives you 709 when the answer was 717.
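The last question, a way to keep checking accuracy as conditions drift, can be as simple as comparing model counts against periodic manual spot checks. The sketch below is one possible shape for that, not what ran at the plant. The class name, window size, and threshold are all assumptions, and apart from the 709 vs 717 pair the counts in the usage lines are made up for illustration.

```python
from collections import deque


class CountAuditMonitor:
    """Rolling check of model counts against manually audited counts."""

    def __init__(self, window: int = 50, min_match_rate: float = 0.95):
        # Keep only the most recent `window` audit results.
        self.results = deque(maxlen=window)
        self.min_match_rate = min_match_rate

    def record(self, model_count: int, audited_count: int) -> None:
        """Store whether the model matched the audited count exactly."""
        self.results.append(model_count == audited_count)

    @property
    def match_rate(self) -> float:
        """Share of recent audits where the model was exactly right."""
        return sum(self.results) / len(self.results) if self.results else 1.0

    def drifting(self) -> bool:
        """True once the recent match rate falls below the threshold."""
        return self.match_rate < self.min_match_rate


monitor = CountAuditMonitor(window=4, min_match_rate=0.75)
for model_count, audited_count in [(717, 717), (500, 500), (709, 717), (480, 483)]:
    monitor.record(model_count, audited_count)

print(monitor.match_rate, monitor.drifting())  # 0.5 True
```

The design choice is that the audit, not the model's own confidence, is the source of truth. A drifting camera produces confident wrong counts.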
That also tells you how to scope a pilot honestly. You do not validate object detection on the footage you trained it on. You validate it on new video, from the real line, at the real speed, and you watch what happens to the number. The drop from 80.8% to 61% was not a failure of the project. It was the project doing its job, surfacing the gap before it cost anyone a production decision.
How to scope industrial AI without buying the demo
If you run a plant and someone is showing you a clean vision demo, the useful question is not how accurate the model is. It is what changes between that demo and your floor.
The way our team runs it:
- Map the conditions first. Before any model, we look at where the camera can mount, how the lighting behaves, how fast the process moves, and how much that varies across a shift. The hard constraints decide what is feasible.
- Train and validate on production reality. We capture data at real angle and real speed, then validate on held-out new footage from the line, not a slice of the training recording. If accuracy falls, we find out now.
- Win on what you can control. Acceptance points, inspection stations, and fixed-camera checks are where computer vision in heavy industry pays off fast, because you own the environment. Live, fast, variable lines are harder and we say so.
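One concrete way to avoid validating on "a slice of the training recording" is to split by recording rather than by frame, so every validation frame comes from footage the model never saw. A minimal sketch, assuming frames are tagged with a recording ID. The function name, the ID scheme, and the file names are hypothetical.

```python
import random
from collections import defaultdict


def split_by_recording(frames, holdout_fraction=0.25, seed=0):
    """Split (recording_id, frame_path) pairs so whole recordings are held out.

    Frames from one recording never land on both sides of the split.
    """
    by_recording = defaultdict(list)
    for recording_id, frame_path in frames:
        by_recording[recording_id].append(frame_path)

    recording_ids = sorted(by_recording)
    random.Random(seed).shuffle(recording_ids)

    # Always hold out at least one full recording.
    n_holdout = max(1, round(len(recording_ids) * holdout_fraction))
    holdout_ids = set(recording_ids[:n_holdout])

    train = [(r, f) for r, f in frames if r not in holdout_ids]
    valid = [(r, f) for r, f in frames if r in holdout_ids]
    return train, valid


# Hypothetical data: 4 recordings, 3 frames each.
frames = [(f"rec{r}", f"frame_{r}_{i}.jpg") for r in range(4) for i in range(3)]
train, valid = split_by_recording(frames)

print(len(train), len(valid))  # 9 3
```

This only protects you if the held-out recordings were shot under production conditions. Holding out a fourth clean lab recording proves nothing about the line.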
The team behind these projects has run 100+ enterprise AI engagements across mining, metals, energy, and finance, with senior specialists only. The pattern is the same every time. The model is rarely the bottleneck. The conditions are. Honest projects fight the conditions early and keep measuring accuracy against reality instead of against the demo.
If you have a process where a person is making a slow, subjective, or error-prone visual judgment, there is probably a case here, and there is probably a hard part that the demo will not show you. Book a call and we will figure out which half of the problem you actually have. You can also read more of the work our team has shipped, or the heavy industry and mining side of it if that is your floor.
The clean frame in the demo is the easy part. The camera angle, the line speed, and the blur are the reason production computer vision is hard, and the reason it is worth scoping like an engineer instead of buying it like a slide.