What is Ahmed Mohammed's expertise in AI/ML?

Ahmed Mohammed is an AI/ML Engineer specializing in computer vision, diffusion models, and out-of-distribution detection. He achieved 99.03% AUROC on CIFAR-10 OOD detection in his Master's thesis at JKU Linz and 98.4% defect detection accuracy at PROFACTOR GmbH. He is proficient in PyTorch, YOLOv8, conditional diffusion models, and full-stack ML engineering.

Faultrix is an AI-powered construction quality control SaaS founded by Ahmed Mohammed. It analyzes building photos and generates ÖNORM B 2110-compliant technical reports in under 1 minute. The platform features SHA-256 evidence chain, DSGVO compliance, and AES-256 encryption. Visit faultrix.com for more information.

Where is Ahmed Mohammed based and how can I contact him?

Ahmed Mohammed is based in Linz, Austria. He can be contacted via email at ahmed.mo.0595@gmail.com, on LinkedIn at linkedin.com/in/ahmed-3m, or through his portfolio at ahmed-3m.github.io. He is open to senior AI/ML roles and research collaborations.

What industries does Ahmed Mohammed work with?

Ahmed Mohammed works primarily with industrial manufacturing, construction, and quality control sectors. His work at PROFACTOR GmbH focused on industrial anomaly detection for inkjet-printed building components (Zer0P project, funded by Upper Austria government). Faultrix serves the Austrian construction industry with AI-powered building analysis and ÖNORM-compliant reporting.

What is Ahmed Mohammed's educational background?

Ahmed Mohammed holds a Master of Science in Artificial Intelligence from Johannes Kepler University Linz (JKU), where his thesis on conditional diffusion models for out-of-distribution detection was supervised by Prof. Sepp Hochreiter (inventor of LSTM). He also holds a Bachelor of Science in Mechatronics Engineering from Eastern Mediterranean University in Cyprus.

What was Ahmed Mohammed's Master's thesis about?

Ahmed Mohammed's Master's thesis at JKU Linz (2026) is titled 'Conditional Diffusion Models as Generative Classifiers for Out-of-Distribution Detection'. He achieved 99.03% AUROC on CIFAR-10, improving the baseline by 18.8 percentage points through a novel class-conditional separation loss. The work was supervised by Prof. Sepp Hochreiter and applied to industrial quality control with multi-head conditioning.

Is Ahmed Mohammed available for hire or freelance AI/ML projects?

Ahmed Mohammed is open to senior AI/ML engineering roles and research collaborations. He specializes in computer vision, diffusion models, and out-of-distribution detection. Based in Linz, Austria, he is available for positions in the DACH region and remote roles. Contact: ahmed.mo.0595@gmail.com or LinkedIn: linkedin.com/in/ahmed-3m

What AI services does Faultrix offer for the construction industry?

Faultrix is an AI-powered construction quality control SaaS. It analyzes building site photos and generates ÖNORM B 2110-compliant technical reports in under 1 minute. The platform provides SHA-256 evidence chains for legal defensibility, DSGVO-compliant data handling, and AES-256 encryption. It serves building inspectors, construction companies, and real estate developers primarily in Austria, Germany, and Switzerland. Visit faultrix.com for details.

What types of AI problems does Ahmed Mohammed solve?

Ahmed Mohammed specializes in: (1) industrial defect detection and anomaly detection systems using YOLO and diffusion models — achieving 98.4% accuracy in production at PROFACTOR GmbH; (2) out-of-distribution detection for production AI safety — 99.03% AUROC on CIFAR-10 benchmark; (3) full-stack AI product development, from research to deployed SaaS (Faultrix); (4) computer vision pipelines for manufacturing and construction industries.

Why use YOLO as a feature extractor instead of a detector?

The 8 feature types in inkjet-printed building components (edge quality, dot density, distance measurements, angular precision) do not naturally decompose into bounding boxes. The whole component image is the input; the output is a single GOOD/BAD classification per feature type.

What inference latency did the production system achieve?

~35ms per component using FP16 precision on an NVIDIA Jetson AGX edge device. INT8 quantization achieved 18ms but showed occasional instability on specific edge/angle features, so FP16 was used in production.

How was the model evaluated with only 1,327 images?

5-fold stratified cross-validation (seed=42), stratified on GOOD/BAD ratio per feature type. No data leakage between folds. Per-feature AUROC was reported to capture heterogeneity across feature types.

# YOLOv8 for Industrial Quality Control: Architecture Decisions That Moved the Needle

## Not a Tutorial — A Post-Mortem

Most YOLO tutorials show you how to train on COCO and get to 90% mAP in 20 lines of code. That's not industrial quality control.

In a real factory, you have: class imbalance (3:1 GOOD:BAD), image acquisition noise from production lighting, ~1,300 images total (not 118,000), and a 50ms inference budget. Here's what actually moved the needle at PROFACTOR GmbH.

## What We Used YOLO For (Not Bounding Boxes)

The first unconventional decision: we did not use YOLOv8 as a detector. We used it as a feature extractor.

The 8 feature types in inkjet-printed building components (edge quality, dot density, distance measurements, angular precision) don't naturally decompose into bounding boxes. The whole component image is the input; the output is a single GOOD/BAD classification per feature type.

YOLOv8's backbone was pre-trained on general visual patterns and then fine-tuned on our component geometry. We extracted the bottleneck features (after layer 9 of the backbone, before the detection head) and fed them into a downstream diffusion model. Details in the [companion post on diffusion models](/blog/diffusion-models-anomaly-detection).

## Data Engineering: The Real Work

With 1,327 components across 8 feature types:

Class distribution problem: 70-80% GOOD, 20-30% BAD depending on feature type. The angle feature had only 2–4 BAD samples per fold — we marked it as unreliable and excluded it from performance comparisons.

What worked for augmentation:

- Random brightness/contrast shifts (±20%): simulates production lighting variation
- Horizontal/vertical flips: components are orientation-invariant
- CutMix between GOOD samples: forces feature robustness

What didn't work:

- Synthetic defect generation (GANs): the synthetic defects didn't match real failure modes
- Heavy geometric augmentation: component geometry is measurement-critical; rotating a distance feature changes its meaning
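The augmentations that worked can be sketched in a few lines of NumPy. This is a minimal illustration, not our training code: the function names are hypothetical, images are assumed to be single-channel float arrays in [0, 1], and "±20%" is interpreted here as a contrast factor in [0.8, 1.2] plus a brightness shift of up to 0.2 of the intensity range. Because CutMix is applied only between GOOD samples, the label stays GOOD and no label mixing is needed.

```python
import numpy as np

def random_brightness_contrast(img, rng, limit=0.2):
    """Scale contrast and shift brightness by up to +/-limit; img is float in [0, 1]."""
    contrast = 1.0 + rng.uniform(-limit, limit)
    brightness = rng.uniform(-limit, limit)
    mean = img.mean()
    out = (img - mean) * contrast + mean + brightness
    return np.clip(out, 0.0, 1.0)

def random_flip(img, rng):
    """Flip horizontally and/or vertically, each with probability 0.5."""
    if rng.random() < 0.5:
        img = img[:, ::-1]
    if rng.random() < 0.5:
        img = img[::-1, :]
    return img

def cutmix_good(img_a, img_b, rng, max_frac=0.5):
    """Paste a random rectangle from img_b into a copy of img_a (both GOOD, same shape)."""
    h, w = img_a.shape[:2]
    box_h = int(rng.integers(1, max(2, int(h * max_frac) + 1)))
    box_w = int(rng.integers(1, max(2, int(w * max_frac) + 1)))
    top = int(rng.integers(0, h - box_h + 1))
    left = int(rng.integers(0, w - box_w + 1))
    out = img_a.copy()
    out[top:top + box_h, left:left + box_w] = img_b[top:top + box_h, left:left + box_w]
    return out

# Random arrays stand in for two GOOD component images.
rng = np.random.default_rng(42)
good_a = rng.random((64, 64))
good_b = rng.random((64, 64))
mixed = cutmix_good(good_a, good_b, rng)
augmented = random_flip(random_brightness_contrast(mixed, rng), rng)
```

Note that nothing here rotates or rescales the image, which matches the point above about measurement-critical geometry.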

Cross-validation setup: 5-fold stratified CV (seed=42), stratified on GOOD/BAD ratio per feature type. No data leakage between folds. This is non-negotiable when N=1,327.
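For readers who want to see what the split does, here is a dependency-free sketch of stratified k-fold. It is an illustration under assumptions: the label list is made up, and in practice scikit-learn's `StratifiedKFold(n_splits=5, shuffle=True, random_state=42)` would be the usual choice. The split would be run once per feature type.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, n_splits=5, seed=42):
    """Yield (train_idx, val_idx) pairs that keep the GOOD/BAD ratio in every fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a near-equal share of the class.
        for pos, idx in enumerate(indices):
            folds[pos % n_splits].append(idx)

    for k in range(n_splits):
        val = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, val

# Hypothetical labels for one feature type: 75% GOOD, 25% BAD.
labels = ["GOOD"] * 150 + ["BAD"] * 50
splits = list(stratified_kfold(labels))
for train_idx, val_idx in splits:
    # No sample may appear on both sides of a split.
    assert not set(train_idx) & set(val_idx)
```

With a fixed seed the folds are reproducible, so every model variant is compared on exactly the same splits.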

## Training Details That Matter

```python
# Key hyperparameters that moved the needle
config = {
    'lr': 1e-4,                    # lower than default (small dataset)
    'batch_size': 16,              # constrained by GPU memory on edge device
    'epochs': 150,                 # with early stopping patience=15
    'warmup_epochs': 5,
    'lr_scheduler': 'cosine',
    'weight_decay': 1e-4,
    'freeze_backbone_epochs': 10,  # freeze backbone, train head first
    'unfreeze_lr_factor': 0.1,     # 10x lower LR for backbone after unfreeze
}
```

The freeze-then-unfreeze strategy was critical. With only ~1,300 images, fine-tuning the full backbone from the start caused catastrophic forgetting of the pre-trained low-level features. We froze the backbone for 10 epochs, then unfroze it at a 10× lower learning rate than the head.
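To make the schedule concrete, here is a small sketch that computes the head and backbone learning rates per epoch from the hyperparameters above. The function name and the exact way warmup, cosine decay, and unfreezing interact are assumptions for illustration. A backbone learning rate of 0.0 stands in for a frozen backbone; in a real training loop you would typically also disable gradients for those parameters.

```python
import math

def learning_rates(epoch, base_lr=1e-4, epochs=150, warmup_epochs=5,
                   freeze_backbone_epochs=10, unfreeze_lr_factor=0.1):
    """Return (head_lr, backbone_lr) for a 0-indexed epoch."""
    if epoch < warmup_epochs:
        # Linear warmup: reach base_lr at the end of the warmup phase.
        head_lr = base_lr * (epoch + 1) / warmup_epochs
    else:
        # Cosine decay from base_lr towards zero over the remaining epochs.
        progress = (epoch - warmup_epochs) / max(1, epochs - warmup_epochs)
        head_lr = 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
    # Backbone is frozen first, then follows the head schedule at a reduced LR.
    if epoch < freeze_backbone_epochs:
        backbone_lr = 0.0
    else:
        backbone_lr = head_lr * unfreeze_lr_factor
    return head_lr, backbone_lr

for epoch in (0, 4, 9, 10, 149):
    head_lr, backbone_lr = learning_rates(epoch)
    print(f"epoch {epoch:3d}  head={head_lr:.2e}  backbone={backbone_lr:.2e}")
```

The two values map naturally onto two optimizer parameter groups, one for the head and one for the backbone.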

## Evaluation: What AUROC Tells You vs What It Doesn't

We reported AUROC as the primary metric because it's threshold-independent. The 98.4% accuracy figure is a threshold-dependent metric calibrated at FPR=5% — meaning we accept up to 5% false positives (GOOD called BAD) in exchange for the highest possible defect recall.
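Calibrating a threshold at a fixed false-positive rate can be sketched as follows. This assumes a higher score means "more likely BAD"; the function names and the toy scores are hypothetical, not our production values.

```python
import math

def threshold_at_fpr(good_scores, max_fpr=0.05):
    """Smallest threshold t such that the fraction of GOOD scores > t is <= max_fpr.

    Assumes a non-empty list and 0 <= max_fpr < 1.
    """
    ordered = sorted(good_scores)
    # Number of GOOD samples we are allowed to flag as BAD (epsilon guards float rounding).
    allowed_fp = math.floor(max_fpr * len(ordered) + 1e-9)
    return ordered[len(ordered) - allowed_fp - 1]

def rates(good_scores, bad_scores, threshold):
    """Return (false_positive_rate, defect_recall) when flagging score > threshold."""
    fpr = sum(s > threshold for s in good_scores) / len(good_scores)
    recall = sum(s > threshold for s in bad_scores) / len(bad_scores)
    return fpr, recall

# Toy anomaly scores: 100 GOOD components and 5 BAD ones.
good_scores = [i / 100 for i in range(100)]
bad_scores = [0.90, 0.95, 0.97, 0.99, 1.20]

threshold = threshold_at_fpr(good_scores, max_fpr=0.05)
fpr, recall = rates(good_scores, bad_scores, threshold)
```

The threshold should be chosen on validation folds only and then applied unchanged to held-out data; otherwise the reported accuracy is optimistic.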

Per-feature AUROC was more informative:

- dots: 0.956 (reliable, many samples, clear defect signature)
- dist6: 0.936 (reliable)
- edge3: 0.744 (hard problem, high variance across folds)
- angle: 0.817 ± 0.138 (std!) (unreliable, too few BAD samples)

Reporting a single overall accuracy hides this heterogeneity. In a real deployment, you'd set different thresholds per feature type.
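A per-feature report like the one above boils down to computing AUROC per fold and summarizing mean and standard deviation per feature type. Here is a dependency-free sketch using the pairwise (Mann-Whitney) definition of AUROC; the data layout and the toy scores are assumptions for illustration.

```python
from statistics import mean, stdev

def auroc(good_scores, bad_scores):
    """Probability that a random BAD sample scores higher than a random GOOD one.

    Ties count as half a win. Quadratic in sample count, fine for small folds.
    """
    wins = 0.0
    for b in bad_scores:
        for g in good_scores:
            if b > g:
                wins += 1.0
            elif b == g:
                wins += 0.5
    return wins / (len(bad_scores) * len(good_scores))

def summarize(fold_scores):
    """Map {feature: [(good_scores, bad_scores) per fold]} to {feature: (mean, std)}."""
    summary = {}
    for feature, folds in fold_scores.items():
        values = [auroc(good, bad) for good, bad in folds]
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[feature] = (mean(values), spread)
    return summary

# Toy scores for two feature types and two folds each (not real project data).
toy = {
    "dots": [([0.1, 0.2, 0.3], [0.7, 0.9]), ([0.2, 0.3, 0.1], [0.8, 0.6])],
    "angle": [([0.1, 0.5, 0.3], [0.4]), ([0.2, 0.3, 0.6], [0.9])],
}
report = summarize(toy)
```

With only one BAD sample per fold, as in the toy "angle" entry, a single misranked sample swings the fold AUROC heavily, which is the same effect behind the large standard deviation reported for the real angle feature.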

## Production: From 35ms Research to 35ms Production

The model that ran in research (FP32, batch inference on A100) and the models we evaluated for production (reduced precision, streaming on Jetson AGX) had the same architecture but different weight precision. INT8 was the first candidate.

INT8 quantization with TensorRT calibration:

- Calibration set: 100 GOOD samples, balanced across feature types
- Accuracy loss: <0.3% AUROC, which was acceptable
- Latency improvement: 35ms (FP16) → 18ms (INT8), roughly a 2× speedup
- Memory reduction: 4× smaller model footprint
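Building a balanced calibration set is simple arithmetic: 100 samples over 8 feature types gives 12 per type with 4 left over, so four types get 13. A minimal sketch, assuming a hypothetical mapping from feature type to available GOOD sample IDs (the feature names and pool sizes below are placeholders, and the TensorRT calibrator that would consume the samples is not shown):

```python
import random

def balanced_calibration_set(good_by_feature, total=100, seed=42):
    """Pick `total` GOOD sample ids spread as evenly as possible across feature types."""
    rng = random.Random(seed)
    features = sorted(good_by_feature)
    base, extra = divmod(total, len(features))
    selected = {}
    for i, feature in enumerate(features):
        # The first `extra` feature types take one additional sample.
        quota = base + (1 if i < extra else 0)
        # random.sample raises ValueError if a pool is smaller than its quota.
        selected[feature] = rng.sample(good_by_feature[feature], quota)
    return selected

# Placeholder pools: 8 feature types with 120 GOOD sample ids each.
pools = {f"feature_{k}": [f"f{k}_img_{n}" for n in range(120)] for k in range(8)}
calibration = balanced_calibration_set(pools)
counts = {feature: len(ids) for feature, ids in calibration.items()}
```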

We ultimately ran at FP16 in production (not INT8) because INT8 showed occasional instability on specific edge/angle features. FP16 at 35ms was within the 50ms budget with margin.
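A latency budget is best checked against a tail percentile rather than the mean. The harness below is a generic sketch, not our benchmarking code: `infer` is a hypothetical callable standing in for one model forward pass, and the 95th percentile is an assumed choice. On a GPU you would also need to synchronize the device before reading the clock, otherwise asynchronous kernels make the numbers look too good.

```python
import time
from statistics import quantiles

def measure_latency_ms(infer, sample, warmup=10, runs=100):
    """Time `infer(sample)` repeatedly and return per-call latencies in milliseconds."""
    for _ in range(warmup):
        infer(sample)  # warm caches before timing
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings

def within_budget(timings_ms, budget_ms=50.0, percentile=95):
    """Return (ok, value) where value is the requested latency percentile."""
    cut_points = quantiles(timings_ms, n=100)  # 99 cut points: 1st..99th percentile
    value = cut_points[percentile - 1]
    return value <= budget_ms, value

# A cheap stand-in workload instead of a real model.
timings = measure_latency_ms(sum, list(range(1000)), warmup=2, runs=20)
ok, p95 = within_budget(timings)
```

At 35ms against a 50ms budget, FP16 leaves 15ms of headroom, or 30% of the budget.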

## What I'd Do Differently

- **Collect more BAD samples aggressively from day one**: the angle feature's unreliability was entirely a data problem, not a model problem
- **Use Monte Carlo Dropout as a confidence signal** in addition to the diffusion score: it gives a second, orthogonal uncertainty estimate
- **Log everything from the start**: Weights & Biases was added mid-project; early experiment results were lost in local files

The full technical report is published: [Diffusion-Based Multi-class Defect Detection PDF](https://ahmed-3m.github.io/Diffusion-Based%20Multi-class%20Defect%20Detection.pdf).