From DevOps to MLOps: Airflow’s Hybrid Executor and KServe on Kubernetes
Modern MLOps looks a lot like DevOps—pipelines, containers, CI/CD—but with new wrinkles: GPUs, data freshness, and model serving. If you already run Airflow on Kubernetes for data engineering, there’s a timely path to productionizing ML: combine Airflow 2.10’s Hybrid Executor and Dataset improvements with KServe’s CRD-based model serving on your Kubernetes cluster. KServe entered CNCF as an incubating project in September 2025, and its v0.15 release added LLM‑oriented autoscaling, caching, and a gateway integration—making it a strong fit for both classic and generative ML serving. (cncf.io)
What changed recently—and why you should care
- Airflow 2.10 introduced the Hybrid Executor and major Dataset UX enhancements, letting you mix executors per task (e.g., run training on Kubernetes while lightweight bookkeeping stays local) and reason about data-triggered runs more clearly. (airflow.apache.org)
- Dynamic Task Mapping became easier in 2.9 with custom names for mapped tasks, which is great for hyperparameter sweeps—you’ll see lr=0.01 instead of “map index 5” in the UI. (airflow.apache.org)
- The official Airflow Helm chart requires a Kubernetes 1.30+ cluster—relevant if you’re upgrading clusters or building new ones for GPU nodes. (airflow.apache.org)
- KServe v0.15 shipped LLM-friendly features (KEDA-based autoscaling on LLM metrics, KV cache integration, multi-node inference) and an Envoy AI Gateway integration—useful if you’re serving embeddings or chat models behind consistent APIs. (kserve.github.io)
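To make the Hybrid Executor concrete: in Airflow 2.10+ you list multiple executors in `airflow.cfg` (the first is the default, the rest are selectable per task or per DAG). A minimal sketch, assuming a KubernetesExecutor-by-default setup with a local fallback:

```ini
# airflow.cfg — Airflow 2.10+ multiple-executor configuration.
# The first executor listed is the default; the others can be
# selected per task via the operator's `executor` parameter.
[core]
executor = KubernetesExecutor,LocalExecutor
```

With this in place, a lightweight task can opt out of pod scheduling with `@task(executor="LocalExecutor")`, while everything else launches on Kubernetes.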
Reference architecture
- Orchestration: Airflow 2.10+ on Kubernetes via the official Helm chart. Select the Hybrid Executor to route some tasks to Kubernetes and others to a local or Celery backend. (airflow.apache.org)
- Training: Run Python tasks on Kubernetes using the KubernetesExecutor (each task in its own pod) or launch bespoke containers with KubernetesPodOperator when you need custom images/GPUs. (airflow.apache.org)
- Serving: Deploy the trained model as a KServe InferenceService CRD. For MLflow-packaged models, you can use KServe’s MLflow runtime; for LLMs, use the Hugging Face or vLLM runtimes and KEDA autoscaling. (mlflow.org)
- Triggers: Use Airflow Datasets and conditional expressions to kick off retraining when data updates, not just on cron. (airflow.apache.org)
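As a sketch of data-aware triggering with a conditional expression (the second `labels` Dataset is hypothetical, added here to illustrate the AND condition introduced in Airflow 2.9):

```python
from datetime import datetime

from airflow.datasets import Dataset
from airflow.decorators import dag, task

feats = Dataset("s3://my-ml-bucket/feature-table.parquet")
labels = Dataset("s3://my-ml-bucket/labels.parquet")  # hypothetical second asset

# Retrain only once BOTH upstream assets have been updated since the
# last run; `&` (AND) and `|` (OR) compose Datasets in Airflow 2.9+.
@dag(schedule=(feats & labels), start_date=datetime(2024, 1, 1), catchup=False)
def retrain():
    @task
    def fit():
        ...  # training logic

    fit()

retrain()
```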
A minimal training-to-serving DAG
Below is a compact pattern that:
- listens for a feature table update (Dataset),
- fans out training jobs (dynamic mapping),
- picks a winner,
- deploys to KServe.
from datetime import datetime

from airflow.datasets import Dataset
from airflow.decorators import dag, task
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

FEATS = Dataset("s3://my-ml-bucket/feature-table.parquet")


@dag(
    dag_id="train_and_deploy_kserve",
    start_date=datetime(2024, 1, 1),
    schedule=[FEATS],  # data-aware scheduling: run when the feature table updates
    catchup=False,
    default_args={"executor": "KubernetesExecutor"},  # route tasks to K8s by default (2.10+)
)
def pipeline():
    @task(map_index_template="{{ trial_name }}")
    def train(hparams: dict) -> dict:
        from airflow.operators.python import get_current_context

        # Label this mapped task instance in the UI (Airflow 2.9+ named mapping)
        get_current_context()["trial_name"] = f"lr={hparams['lr']} depth={hparams['depth']}"
        # Your training code (runs in its own pod under KubernetesExecutor);
        # return metrics plus the model artifact URI
        return {
            "metric": 0.9,
            "model_uri": f"s3://my-ml-bucket/models/{hparams['lr']}-{hparams['depth']}",
        }

    # Fan-out: hyperparameter sweep with readable map labels in the UI
    trials = train.expand(
        hparams=[
            {"lr": 0.01, "depth": 6},
            {"lr": 0.02, "depth": 8},
            {"lr": 0.05, "depth": 10},
        ]
    )

    @task
    def select_best(results: list[dict]) -> str:
        best = max(results, key=lambda r: r["metric"])
        return best["model_uri"]

    best_uri = select_best(trials)

    # Deploy the winner via a tiny kubectl container. `arguments` is a
    # templated field, so the winning model URI is pulled from XCom at render time.
    deploy = KubernetesPodOperator(
        task_id="deploy_to_kserve",
        name="kserve-apply",
        image="bitnami/kubectl:1.30",
        cmds=["/bin/sh", "-c"],
        arguments=[
            r"""
cat <<'EOF' | kubectl apply -n ml -f -
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: churn-model
spec:
  predictor:
    model:
      modelFormat:
        name: mlflow
      storageUri: "{{ ti.xcom_pull(task_ids='select_best') }}"
EOF
"""
        ],
        get_logs=True,
    )
    best_uri >> deploy  # explicit dependency; xcom_pull alone does not create one


pipeline()
- With KubernetesExecutor, each task instance runs in its own pod; the PodOperator task uses a purpose-built image (here, kubectl) to apply a KServe InferenceService. (airflow.apache.org)
- If you package models with MLflow, KServe’s MLflow runtime can load the model directly from the storageUri. (mlflow.org)
- For LLM serving, swap the predictor runtime to Hugging Face or vLLM and consider enabling KEDA autoscaling based on token throughput or concurrency. (kserve.github.io)
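Swapping the predictor runtime for an LLM might look like the sketch below. The model name and args follow the shape of KServe's Hugging Face runtime, but the specific model id, flags, and resource amounts are illustrative — verify them against your KServe version's runtime docs:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: chat-model
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface   # or a vLLM-backed runtime, per your install
      args:
        - --model_name=llama3
        - --model_id=meta-llama/Meta-Llama-3-8B-Instruct
      resources:
        limits:
          nvidia.com/gpu: "1"
```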
KubernetesExecutor vs. KubernetesPodOperator in ML pipelines
- KubernetesExecutor: best when your Airflow worker image already contains your training dependencies and you want Airflow to spin up an ephemeral pod per task. It’s simple and scales with your cluster. (airflow.apache.org)
- KubernetesPodOperator: best when each task needs its own container image (e.g., CUDA-enabled PyTorch, custom toolchains). It does not require the KubernetesExecutor and gives you per-task control over images, resources, and env. (airflow.apache.org)
A practical rule: start with KubernetesExecutor for Python-heavy TaskFlow jobs; switch to KubernetesPodOperator for GPU jobs or language-specific containers.
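A per-task-image GPU step following that rule could be sketched as below. The image name, node label, and resource amounts are hypothetical; `container_resources` takes a Kubernetes `V1ResourceRequirements` in recent cncf-kubernetes provider versions:

```python
from kubernetes.client import models as k8s
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

# GPU training in a bespoke CUDA image, independent of the Airflow worker image
gpu_train = KubernetesPodOperator(
    task_id="gpu_train",
    name="gpu-train",
    image="my-registry/pytorch-train:cuda12",   # hypothetical custom image
    cmds=["python", "train.py"],
    container_resources=k8s.V1ResourceRequirements(
        limits={"nvidia.com/gpu": "1"},
        requests={"cpu": "4", "memory": "16Gi"},
    ),
    node_selector={"gpu": "true"},  # assumes a labeled GPU nodepool
    get_logs=True,
)
```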
Operations checklist
- Cluster and chart versions: the Airflow Helm chart targets Kubernetes 1.30+. Ensure your kubectl image and cluster APIs match. (airflow.apache.org)
- Scheduling by data, not time: use Dataset conditions (AND/OR) when retraining should wait for multiple upstream data assets. (airflow.apache.org)
- Readable fan-outs: use map_index_template so hyperparameter runs are obvious in the UI. It reduces on-call guesswork. (airflow.apache.org)
- Serving for LLMs: when deploying generative models, use KServe’s v0.15 features like KEDA autoscaling and caching to keep latency predictable under bursty token streams. (kserve.github.io)
- Choose executors per task: with the Hybrid Executor, keep lightweight tasks local/Celery while heavy training and deployment steps run on Kubernetes. This avoids overloading your K8s control plane with tiny pods. (airflow.apache.org)
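For the LLM-serving item, a minimal sketch of opting an InferenceService into KEDA-based autoscaling via KServe's autoscaler-class annotation — the annotation value and replica bounds shown here should be checked against your KServe v0.15 install, and metric-specific KEDA triggers are configured separately:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: chat-model
  annotations:
    serving.kserve.io/autoscalerClass: keda   # v0.15+
spec:
  predictor:
    minReplicas: 1
    maxReplicas: 8
    model:
      modelFormat:
        name: huggingface
```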
Takeaway
If your team already knows DevOps on Kubernetes, you’re close to MLOps. Airflow 2.10’s Hybrid Executor and dataset-centric scheduling give you clean control over when and where ML work runs; KServe turns a trained artifact URI into a versioned, autoscaled endpoint. Start with the minimal pattern above, then layer in GPU nodepools, per‑task images, and KEDA policies as your workload grows. (airflow.apache.org)