The last 10% of productionizing an ML model is 90% of the work. The notebook has been committed, the accuracy is good, and the data scientist has moved on. Now the pipeline needs to run every night at 03:00, alert on failure, backfill on demand, and hand off cleanly when the on-call rotation changes.
Pick a single orchestration story
Mixing Airflow, cron, and ad-hoc scripts guarantees that eventually a critical job will fail silently in the seam between them. Collapse to one orchestrator early, even if it is slightly more heavy-handed than each piece needs on its own.
Contracts over correctness
The research team and the platform team do not need to agree on the best model — they need to agree on the shape of the data that crosses the boundary. We write a small schema contract (Pydantic, zod, or similar) and make both sides validate against it. Every model change goes through that gate.
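As a rough illustration of such a gate, here is a minimal sketch using only the Python standard library. It stands in for Pydantic or zod, which would express the same checks with less code. The record fields (user_id, score, model_version), the 0 to 1 score range, and the names ContractError and validate_record are all hypothetical placeholders, not part of any real contract.

```python
from dataclasses import dataclass


class ContractError(ValueError):
    """Raised when a record does not match the agreed schema."""


# Hypothetical contract: these field names and types are placeholders
# for whatever the research and platform teams actually agree on.
EXPECTED_TYPES = {"user_id": str, "score": float, "model_version": str}


@dataclass(frozen=True)
class PredictionRecord:
    user_id: str
    score: float
    model_version: str


def validate_record(raw: dict) -> PredictionRecord:
    """Return a PredictionRecord, or raise ContractError describing the mismatch."""
    # Reject records with missing or unexpected fields.
    missing = EXPECTED_TYPES.keys() - raw.keys()
    extra = raw.keys() - EXPECTED_TYPES.keys()
    if missing or extra:
        raise ContractError(f"missing={sorted(missing)} unexpected={sorted(extra)}")

    # Strict type check: an int score such as 1 is rejected, not coerced.
    for name, expected in EXPECTED_TYPES.items():
        if not isinstance(raw[name], expected):
            raise ContractError(
                f"{name}: expected {expected.__name__}, got {type(raw[name]).__name__}"
            )

    # Assumed value constraint: scores are probabilities in [0, 1].
    if not 0.0 <= raw["score"] <= 1.0:
        raise ContractError(f"score out of range: {raw['score']}")

    return PredictionRecord(**raw)
```

The producing side would call validate_record before writing a batch and the consuming side would call it again after reading, so a model change that alters the output shape fails loudly at the boundary instead of downstream.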