# Diffusion Models for Industrial Defect Detection: 98.4% Accuracy at PROFACTOR GmbH
## The Result First
At PROFACTOR GmbH (Steyr, Austria), I built a machine vision pipeline that detects defects in inkjet-printed building components with 98.4% accuracy in a real-time production environment. This is not a benchmark number — it ran on the factory floor, on live production data, with real consequences.
The project was part of Zer0P — a zero-defect manufacturing initiative funded by the Government of Upper Austria. The goal: eliminate defects in inkjet-printed construction components before they leave the production line.
## The Problem: Why Standard Computer Vision Fails Here
Industrial defect detection has properties that make standard classification approaches break down:
- **Defects are rare**: In a production run of 1,327 components, only ~20–30% are defective. Classic classification models become overconfident on the majority class.
- **Defects are diverse**: 8 distinct feature types (edge quality, dot density, distance measurements, angle precision), each with its own failure modes.
- **Few or no labeled anomaly examples at training time**: You have thousands of GOOD parts, but rarely enough BAD ones to train a supervised classifier.
- **Real-time constraint**: 50ms per component max, running on an edge GPU.
Standard approaches (ResNet classifier, simple thresholding) fail because they need labeled defect examples and can't generalize to unseen defect types.
## The Architecture: YOLOv8 + Conditional Diffusion Model
### Stage 1: YOLOv8 Feature Extraction
YOLOv8 was used as a feature backbone, not as a detector. Instead of using its bounding box outputs, I extracted the intermediate feature maps from the backbone and used them as the input representation.
Why YOLO? The model had already been pre-trained on the component geometry. Its features encode spatial relationships, edge sharpness, and pattern regularity — exactly what matters for inkjet quality.
```python
# Extract features from the YOLOv8 backbone
backbone_features = yolo_model.model[:10](image)  # layers 0-9 only
# Shape: [B, 512, H/32, W/32]
```
### Stage 2: Conditional Diffusion Model as Generative Classifier
The diffusion model was trained to model p(features | class) — the distribution of YOLO features for GOOD components, conditioned on feature type (edge, dots, distance, etc.).
At inference:

1. Extract YOLO features from the component being inspected
2. Score the features against each class-conditional distribution
3. Low score across all classes → the component is anomalous (BAD)

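
The decision rule in step 3 can be sketched in a few lines. This is a minimal illustration rather than the deployed code: the `classify` helper, the class names, the score values, and the threshold are all hypothetical, and the convention that a higher score means a better fit (for example a negated denoising error) is an assumption.

```python
from typing import Dict, Tuple


def classify(scores: Dict[str, float], threshold: float) -> Tuple[str, str]:
    """Return (verdict, best_class) from per-class fit scores.

    Assumption: a higher score means the features fit that class better.
    """
    best_class = max(scores, key=scores.get)
    # BAD only when even the best-matching class scores below the threshold
    verdict = "GOOD" if scores[best_class] >= threshold else "BAD"
    return verdict, best_class


# Hypothetical scores for two inspected components
print(classify({"edge": -0.12, "dots": -0.95, "distance": -1.40}, threshold=-0.5))
print(classify({"edge": -2.10, "dots": -1.85, "distance": -2.60}, threshold=-0.5))
```

The first component fits at least one class well and passes; the second scores low against every class and is flagged.
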
This is the same framework as my Master's thesis OOD work — industrial data was one of the validation domains.
### Multi-Head Conditioning
Because the 8 feature types (angle, dist1, dist6, dots, edge1-4) behave very differently, a single shared model fails. Solution: multi-head conditioning — one learned embedding per feature type, all sharing the same U-Net backbone.
```python
import torch.nn as nn

class MultiHeadConditionalDiffusion(nn.Module):
    def __init__(self, n_features=8, embed_dim=256):
        super().__init__()
        # One embedding per feature type
        self.feature_embeddings = nn.Embedding(n_features, embed_dim)
        # ConditionalUNet is the shared denoising backbone, defined elsewhere
        self.unet = ConditionalUNet(cond_dim=embed_dim)

    def forward(self, x, t, feature_type_idx):
        # Look up the conditioning vector for the requested feature type
        cond = self.feature_embeddings(feature_type_idx)
        return self.unet(x, t, cond)
```
## Results by Feature Type

| Feature | Baseline AUROC | Method AUROC | Δ (percentage points) |
|---------|----------------|--------------|-----------------------|
| dots | 0.956 | 0.963 | +0.7 |
| dist6 | 0.936 | 0.941 | +0.5 |
| dist1 | 0.887 | 0.857 | −3.0 |
| angle | 0.817 | 0.568 | unreliable (few BAD samples) |
| edge1 | 0.796 | 0.818 | +2.2 |
| edge2 | 0.813 | 0.803 | −1.0 |
| edge3 | 0.744 | 0.875 | +13.1 |
| edge4 | 0.762 | 0.736 | −2.6 |

Overall AUROC: 86.7% baseline → 87.3% with separation loss (not statistically significant at p<0.05 after Holm correction — small dataset, high variance). The 98.4% accuracy figure refers to the binary classification accuracy (GOOD/BAD) in the production deployment.
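
For readers unfamiliar with the Holm correction, the step-down procedure is short enough to sketch. The `holm_reject` function and the p-values below are illustrative only; they are not the project's analysis code or its results.

```python
from typing import List


def holm_reject(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Holm-Bonferroni step-down: one reject flag per hypothesis, in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared with alpha / (m - k + 1)
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values are retained as well
    return reject


# Made-up p-values, not results from the project
print(holm_reject([0.001, 0.04, 0.03, 0.20]))
```

In this made-up example, 0.03 and 0.04 would pass an uncorrected 0.05 cutoff but are not rejected after correction, which is the effect that matters when comparing several feature types at once.
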
## Production Deployment
The deployed system runs as a Docker container on an NVIDIA Jetson AGX at the factory edge:
- **Inference latency**: ~35ms per component (well within 50ms budget)
- **Throughput**: handled 28 components/minute on the production line
- **Uptime**: ran continuously for 6 weeks without restart
Key engineering decisions:

- INT8 quantization (TensorRT) without measurable accuracy loss
- Ring buffer for streaming inference
- Alert threshold calibrated to FPR < 5% (human review only for borderline cases)
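
One common way to calibrate an alert threshold against an FPR target is to take a low quantile of scores from held-out GOOD parts. The sketch below is an assumption about how this could be done, not the deployed procedure: `calibrate_threshold`, the score values, and the convention that a higher score means a better fit (so a part is flagged when its score falls below the threshold) are all hypothetical.

```python
import math
from typing import List


def calibrate_threshold(good_scores: List[float], max_fpr: float = 0.05) -> float:
    """Pick a threshold so the share of GOOD scores strictly below it is < max_fpr.

    Assumption: higher score = better fit; a part is flagged when score < threshold.
    """
    ordered = sorted(good_scores)
    n = len(ordered)
    # Largest number of GOOD parts that may be flagged while staying strictly below max_fpr
    allowed = max(math.ceil(max_fpr * n) - 1, 0)
    # With "score < threshold" flagging, ordered[allowed] flags at most `allowed` GOOD parts
    return ordered[allowed]


# Hypothetical validation scores from 100 GOOD parts
good_scores = [i / 100 for i in range(100)]
threshold = calibrate_threshold(good_scores)
fpr = sum(s < threshold for s in good_scores) / len(good_scores)
print(threshold, fpr)
```

The threshold depends only on GOOD parts, which fits a setting where BAD examples are scarce; the resulting FPR should still be checked on data not used for calibration.
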
## Lessons for Production ML in Manufacturing
- **Generative models beat discriminative ones when defect examples are scarce** — you only need GOOD data
- **Feature type matters more than architecture** — the angle feature had AUROC ~0.82 but only 2–4 BAD samples per fold, making it unreliable regardless of method
- **Statistical rigor is non-negotiable** — a single run showing improvement can be noise. We used 5-fold CV + Holm correction
- **Edge deployment is a different beast** — what works on a GPU server may not fit on an edge device
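
The point about the angle feature is easy to see numerically: with only a couple of BAD samples in a fold, one sample changing rank moves AUROC a long way. The sketch below uses a plain pairwise AUROC with made-up scores, and assumes a higher anomaly score means more likely BAD; none of the numbers come from the project.

```python
from typing import List


def auroc(bad_scores: List[float], good_scores: List[float]) -> float:
    """AUROC as the fraction of (BAD, GOOD) pairs ranked correctly; ties count half.

    Assumption: a higher anomaly score means more likely BAD.
    """
    wins = 0.0
    for b in bad_scores:
        for g in good_scores:
            if b > g:
                wins += 1.0
            elif b == g:
                wins += 0.5
    return wins / (len(bad_scores) * len(good_scores))


# Hypothetical anomaly scores for 8 GOOD parts in one fold
good = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
print(auroc([0.9, 0.95], good))  # both BAD parts rank above every GOOD part
print(auroc([0.9, 0.35], good))  # one BAD part slips into the GOOD range
```

With two BAD parts, a single misranked sample drops AUROC from 1.0 to 0.6875, so fold-to-fold differences of that size say little about the method.
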
Full technical report: [Diffusion-Based Multi-class Defect Detection](https://ahmed-3m.github.io/Diffusion-Based%20Multi-class%20Defect%20Detection.pdf)
Questions or want to discuss industrial computer vision? Reach me at ahmed.mo.0595@gmail.com.
## Frequently Asked Questions
### What accuracy did the industrial defect detection system achieve at PROFACTOR?
98.4% binary classification accuracy (GOOD/BAD) in the production deployment. The AUROC was 86.7% baseline, improving to 87.3% with separation loss (not statistically significant on the small industrial dataset).
### What is the Zer0P project at PROFACTOR GmbH?
Zer0P is a zero-defect manufacturing initiative funded by the Government of Upper Austria. It aims to eliminate defects in inkjet-printed construction components using machine vision and AI-based quality control.
### How does multi-head conditioning work in diffusion models for industrial QC?
Each feature type (edge, dots, distance, angle) gets its own learned embedding while sharing a single U-Net backbone. This allows the model to handle 8 distinct feature types with different defect patterns using one model.