Data pipelines
Streaming + batch, schema-enforced at the seams, with backpressure and replay. Idempotent transforms with end-to-end lineage so the question 'where did this number come from?' has a one-click answer.
- Kafka
- dbt
- Spark
- Flink
- Lineage
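A minimal sketch of what "idempotent transforms with lineage" can look like in practice. Everything here is an assumption made for illustration: the `record_key` and `apply_transform` helpers, the `normalise_amount` step name, and the in-memory dict standing in for a real sink. The idea is only that each output row is keyed by a hash of its input, so replaying a batch overwrites rows instead of duplicating them, and each row carries a pointer back to its source.

```python
import hashlib
import json


def record_key(record: dict) -> str:
    """Deterministic key: the same input record always hashes to the same key."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def apply_transform(records, sink: dict, step: str = "normalise_amount") -> dict:
    """Upsert transformed rows into `sink`, keyed by the hash of their source record.

    Replaying the same batch overwrites identical rows instead of appending
    duplicates, which is what makes the step idempotent.
    """
    for record in records:
        key = record_key(record)
        sink[key] = {
            "amount_cents": round(record["amount"] * 100),
            # Lineage: which step produced this row, and from which input.
            "lineage": {"step": step, "source_key": key, "source": record},
        }
    return sink


# Hypothetical batch; a real pipeline would read from a topic or table.
batch = [{"id": 1, "amount": 12.5}, {"id": 2, "amount": 3.2}]
sink = {}
apply_transform(batch, sink)
snapshot = json.dumps(sink, sort_keys=True)
apply_transform(batch, sink)  # replay the same batch; the sink is unchanged
```

Answering "where did this number come from?" then reduces to reading the `lineage` field on the row in question.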
Enterprise data, shipped as intelligence.
Most ML lives and dies in a notebook. We aren't in that business.
We engineer the systems around the model: the pipelines that feed it, the evaluation harnesses that catch it failing, the deployment surface that lets it ship without taking the rest of the platform down. The model is the easy part; the system around it is what makes it production-grade.
We treat eval as the live system, not a chart from before launch. We treat drift as inevitable, not surprising. We treat the production loop as the actual product, and the offline metric as a single signal in a much larger conversation.
From notebook to production. Training pipelines as first-class systems, model registries as the source of truth, canary deploys and shadow traffic, instant rollback when an eval metric crosses its threshold.
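A rollback rule tied to an eval metric can be very small. The sketch below is illustrative only: `EvalGate`, its `max_drop` tolerance, and its `min_samples` floor are hypothetical names and values, not a description of any particular deployment tool.

```python
from dataclasses import dataclass


@dataclass
class EvalGate:
    """Hypothetical gate: roll back when the canary falls too far below the baseline."""

    max_drop: float = 0.02   # assumed tolerance, in absolute metric points
    min_samples: int = 500   # assumed minimum canary traffic before deciding

    def decide(self, baseline: float, canary: float, samples: int) -> str:
        if samples < self.min_samples:
            return "hold"       # not enough evidence yet
        if baseline - canary > self.max_drop:
            return "rollback"   # the canary dropped past the tolerance
        return "promote"


gate = EvalGate()
decision = gate.decide(baseline=0.91, canary=0.86, samples=1000)
```

Keeping the rule this explicit means the rollback condition can be reviewed and versioned alongside the model it guards.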
Defect detection on real factory floors, OCR on real claims documents, identity verification on real ID cards. Not COCO benchmarks: the messy real-world data that breaks benchmark-tuned models.
RAG that actually works at scale. Tool use you can audit. Token budgets enforced. Eval harnesses that score reasoning chains, not just final answers. Productionised, not demoed.
Offline metrics segmented by cohort and edge case. Online metrics from shadow traffic and holdouts. Calibration checks. Drift detectors that fire before the business notices. Eval is the product.
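One common way to build a drift detector is the Population Stability Index (PSI), which compares the binned distribution of a feature in live traffic against a reference sample. The sketch below is an assumption-laden illustration, not a specific product's detector: the `psi` function, the equal-width binning, and the 0.2 alert threshold (a commonly quoted rule of thumb) are all choices made for the example.

```python
import math


def psi(expected, actual, bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a reference sample and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp out-of-range values into the first or last bin.
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic data for illustration only.
reference = [i / 100 for i in range(100)]       # roughly uniform on [0, 1)
live = [0.5 + i / 200 for i in range(100)]      # shifted into [0.5, 1)
drifted = psi(reference, live) > 0.2            # assumed alert threshold
```

In a running system the same comparison would be scheduled per feature and per cohort, with the threshold tuned to how much shift the model has historically tolerated.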
Insurance engagement: computer-vision claims pipeline. Document classification accuracy rose from a 76% baseline to 98.4% with calibrated abstain; the manual review queue was cut by 71%.
Client identity protected.