Hybrid Executors, Datasets, and KEDA: a practical path from DevOps to MLOps with Airflow on Kubernetes
Moving ML from notebooks to production feels a bit like taking a garage band on tour. You need the right stage (Kubernetes), a reliable manager (Airflow), and a crew that can scale up for the big shows and dial down for the coffee-shop gigs (autoscaling). Recently, Airflow gained a set of features that make this “tour” a lot easier: multi-executor support, richer data-aware scheduling with DatasetAlias, and better efficiency via deferrable operators. Combined with KEDA-driven autoscaling on Kubernetes, these unlock a clean, cost-aware pattern for MLOps.
Below is a field-tested blueprint you can adapt, plus minimal code snippets to get you moving.
What changed recently—and why it matters
- Run multiple Airflow executors at once. Since Airflow 2.10, you can configure more than one executor and pick which one a specific DAG or task runs on. That means you can send heavy training to KubernetesExecutor (one Pod per task) while keeping quick Python glue on Celery or Local for snappy startup and lower overhead. The feature is marked experimental in 2.10, but it’s officially documented and supports per-task selection via an executor parameter. (airflow.apache.org)
- Data-aware scheduling grows up. Airflow’s “Datasets” let downstream DAGs trigger when upstream tasks publish a data event (not just on time). Recent docs add conditional expressions and DatasetAlias, so you can decouple producers from the exact storage URI and still trigger consumers—the alias resolves to the real dataset when produced. For ML, that means “new model available” can fire deployment pipelines even if the artifact URI changes per training run. A minimal sketch of a conditional dataset schedule follows this list. (airflow.apache.org)
- Fewer idle workers thanks to deferrable operators. In 2.10, deferrable operators can complete directly from the triggerer process without consuming a worker slot the whole time—great for long waits (sensors, external jobs) that are common in ML orchestration. (airflow.apache.org)
- Elastic workers on Kubernetes with KEDA. The official Airflow Helm chart includes an experimental KEDA integration that scales Celery workers based on queued/running tasks, using a SQL trigger against the Airflow metadata DB. It’s a simple lever to keep costs in check when pipelines spike. (airflow.apache.org)
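To make the conditional expressions concrete: a consumer DAG can wait on several upstream datasets and run only when all of them (or any one, using |) receive new events. A minimal sketch, with illustrative URIs and DAG id:
import pendulum
from airflow import DAG
from airflow.datasets import Dataset
from airflow.decorators import task
features = Dataset("s3://bucket/features/train.parquet")
labels = Dataset("s3://bucket/labels/train.parquet")
with DAG(
    dag_id="retrain_when_inputs_refresh",
    start_date=pendulum.datetime(2025, 1, 1, tz="UTC"),
    # run only after BOTH datasets receive new events since the last run
    schedule=(features & labels),
    catchup=False,
):
    @task
    def kick_off_retraining():
        print("Feature and label datasets both updated; retraining...")
    kick_off_retraining()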
Taken together, these unlock a pragmatic pattern: keep orchestration overhead small for tiny steps, burst to Kubernetes for model training, trigger downstream steps via datasets, and autoscale the fleet.
Reference design: one pipeline, two executors
Picture a typical supervised-learning pipeline:
- Prepare features (lightweight Python, frequent).
- Train model (heavy container with GPU).
- Evaluate and register model (light).
- Trigger deployment whenever a “model-ready” signal appears.
We’ll configure two executors—Celery and Kubernetes—and show how to route tasks accordingly.
1) Configure Airflow for multi-executor
In airflow.cfg (or via env vars), list executors in a comma-separated string. The first is your default, and you can assign others per task/DAG.
[core]
executor = CeleryExecutor,KubernetesExecutor
You can also define custom aliases for executors to make DAG code cleaner. Then, set the executor at the DAG or task level via the executor parameter. (airflow.apache.org)
2) A minimal DAG that splits work across executors
The snippet below shows:
- Lightweight feature prep on Celery (default).
- A training task on KubernetesExecutor with a custom image and Pod spec (including resource limits).
- Emitting a dataset event via DatasetAlias to trigger downstream deployment without hard-coding the artifact path.
from datetime import timedelta
import pendulum
from airflow import DAG
from airflow.decorators import task
from airflow.datasets import Dataset, DatasetAlias
# Kubernetes Python client types for pod overrides
from kubernetes.client import models as k8s

# Default to Celery for small tasks; we’ll override on the heavy one
default_args = {
    "owner": "mlops",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
    "executor": "CeleryExecutor",  # all tasks unless overridden
}

with DAG(
    dag_id="train_and_register_model",
    start_date=pendulum.datetime(2025, 1, 1, tz="UTC"),
    schedule=None,
    catchup=False,
    default_args=default_args,
    tags=["mlops"],
):
    @task
    def prepare_features():
        # quick, lightweight logic
        return {"train_path": "s3://bucket/features/train.parquet"}

    # Request GPU/CPU/memory via pod_override (KubernetesExecutor)
    pod_override = k8s.V1Pod(
        spec=k8s.V1PodSpec(
            containers=[
                k8s.V1Container(
                    name="base",
                    image="us-docker.pkg.dev/myproj/ml/train:cuda-12.2-py310",
                    resources=k8s.V1ResourceRequirements(
                        limits={
                            "cpu": "4",
                            "memory": "16Gi",
                            # GPUs are requested via limits; requests must match if you set them.
                            # The device plugin must be installed on the cluster nodes.
                            "nvidia.com/gpu": "1",
                        }
                    ),
                )
            ]
        )
    )

    @task(
        executor="KubernetesExecutor",
        executor_config={"pod_override": pod_override},
        outlets=[DatasetAlias("model-ready:fraud-detector")],
    )
    def train_model(inputs: dict, *, outlet_events):
        # Training code would live in the container image; here we just simulate a path
        model_uri = "s3://bucket/models/fraud/v123/model.pt"
        # Emit a dataset event via alias (decouples consumers from the exact URI)
        outlet_events[DatasetAlias("model-ready:fraud-detector")].add(
            Dataset(model_uri),
            extra={"metric": "auc", "value": 0.93},
        )
        return {"model_uri": model_uri}

    @task
    def evaluate_model(artifact: dict):
        # quick post-processing or registry calls
        return {"approved": True, "model_uri": artifact["model_uri"]}

    artifacts = prepare_features()
    trained = train_model(artifacts)
    evaluate_model(trained)
- The per-task executor selection and executor_config with pod_override are the key pieces that let you keep Python-only tasks fast (Celery) and run heavyweight training in a dedicated Kubernetes Pod. (airflow.apache.org)
- Requesting GPUs in Kubernetes is done via resources.limits["nvidia.com/gpu"], and Kubernetes treats GPU limits as requests by default (and requires them to match if you specify both). Make sure the device plugin is installed in your cluster. (kubernetes.io) A sketch with explicit requests and limits follows below.
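If you want to be explicit about requests as well (for GPUs they must equal the limits), the pod_override can carry both. A minimal sketch; the node selector label is hypothetical and depends on how your GPU node pool is labeled:
from kubernetes.client import models as k8s
gpu_pod_override = k8s.V1Pod(
    spec=k8s.V1PodSpec(
        node_selector={"gpu-node-pool": "true"},  # hypothetical label for GPU nodes
        containers=[
            k8s.V1Container(
                name="base",
                resources=k8s.V1ResourceRequirements(
                    # requests and limits must match for the GPU resource
                    requests={"cpu": "4", "memory": "16Gi", "nvidia.com/gpu": "1"},
                    limits={"cpu": "4", "memory": "16Gi", "nvidia.com/gpu": "1"},
                ),
            )
        ],
    )
)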
3) Decoupled deployment via DatasetAlias
Now create a separate DAG that deploys when a “model-ready” alias is emitted, regardless of where the file actually lives.
from airflow import DAG
from airflow.decorators import task
from airflow.datasets import DatasetAlias
import pendulum

with DAG(
    dag_id="deploy_model",
    start_date=pendulum.datetime(2025, 1, 1, tz="UTC"),
    schedule=DatasetAlias("model-ready:fraud-detector"),
    catchup=False,
):
    @task
    def deploy():
        # Pull last dataset event details via inlet_events if needed,
        # e.g., which model_uri was produced, and its metrics
        print("Deploying the latest approved model...")

    deploy()
This schedule activates once the alias resolves to an actual dataset event (for example, s3://bucket/models/...), letting you evolve storage layouts over time without breaking triggers. The docs show how to emit via outlet_events and how alias-based schedules reconcile after resolution. (airflow.apache.org)
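If the deploy task needs the exact model URI or its metrics, it can read the latest event behind the alias. A minimal sketch, assuming the training task also stores model_uri in the event’s extra dict, and following the inlet_events accessor described in the Airflow 2.10 dataset docs:
from airflow.datasets import DatasetAlias
from airflow.decorators import task
@task(inlets=[DatasetAlias("model-ready:fraud-detector")])
def deploy(*, inlet_events):
    # Events resolved for the alias, oldest first; take the most recent one
    events = inlet_events[DatasetAlias("model-ready:fraud-detector")]
    last = events[-1]
    # extra carries whatever the producer attached (metrics, and per our
    # assumption, the model_uri as well)
    print("Deploying model from event extra:", last.extra)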
Autoscale the fleet with KEDA (experimental)
If you run Celery workers for light tasks, enable KEDA to scale them based on the number of queued or running tasks divided by your configured worker concurrency. With the official chart, it’s a few lines:
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
kubectl create namespace keda
helm install keda kedacore/keda --namespace keda
kubectl create namespace airflow
helm repo add apache-airflow https://airflow.apache.org
helm install airflow apache-airflow/airflow \
--namespace airflow \
--set executor=CeleryExecutor \
--set workers.keda.enabled=true
Under the hood, the chart creates a KEDA ScaledObject and HPA that look at the Airflow metadata DB to decide the worker count. Make sure your database tier can handle that query when load spikes. (airflow.apache.org)
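To build intuition for what that trigger computes, the scaling math is roughly “ceil of tasks needing a slot divided by worker concurrency.” A small illustrative sketch of the arithmetic (not the chart’s actual SQL):
import math
def desired_celery_workers(running: int, queued: int, worker_concurrency: int) -> int:
    # Enough workers so every running or queued task gets a slot
    return math.ceil((running + queued) / worker_concurrency)
# Example: 10 running + 70 queued tasks at concurrency 16 -> 5 workers
print(desired_celery_workers(running=10, queued=70, worker_concurrency=16))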
A few operational knobs and gotchas
- Choose the right executor as a default. In the config, the first executor is the environment’s default; any task or DAG without an explicit executor setting uses it. A common pattern is Celery as default and Kubernetes for heavy tasks. You can also give custom executors an alias for readability in code. (airflow.apache.org)
- Keep pods predictable with pod_override. With the KubernetesExecutor you can override image, volumes, env, sidecars, and resources per task using a full V1Pod spec—handy for training images that differ from your Airflow base image. (airflow.apache.org)
- Deferrable operators to cut costs. For steps that “wait” (file-arrival sensors, external job completion), deferrable operators can now finish from the triggerer so you don’t pin worker slots or Pods for hours. That typically saves real money in cloud clusters; a minimal sensor sketch follows this list. (airflow.apache.org)
- GPUs require a device plugin and must be requested via limits. In Kubernetes, specify GPUs under resources.limits using the vendor key (for NVIDIA: nvidia.com/gpu). Kubernetes treats the GPU limit as the request if you don’t set one; if you set both, they must match. Managed services like GKE have extra conveniences, but the rule is the same. (kubernetes.io)
- Data-aware scheduling is powerful—don’t forget observability. Airflow 2.10 improved the dataset UI, including graph overlays that show which dataset fired a run. When pipelines get complex, those visuals help you debug “why did this run?” quickly. (airflow.apache.org)
- It’s new; test before you bet the farm. Multi-executor support shipped in 2.10 and is still flagged as experimental. Roll out incrementally and monitor. (airflow.apache.org)
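Here is the sensor sketch mentioned above. It assumes a recent Amazon provider where S3KeySensor accepts deferrable=True; the bucket, key template, and connection id are illustrative:
import pendulum
from airflow import DAG
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor
with DAG(
    dag_id="wait_then_train",
    start_date=pendulum.datetime(2025, 1, 1, tz="UTC"),
    schedule=None,
    catchup=False,
):
    # Hands the wait off to the triggerer instead of pinning a worker slot
    wait_for_features = S3KeySensor(
        task_id="wait_for_features",
        bucket_name="bucket",
        bucket_key="features/{{ ds }}/train.parquet",
        aws_conn_id="aws_default",
        deferrable=True,
    )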
Why this pattern works for MLOps
- You keep orchestration nimble. The little tasks stay on a quick-start executor; big jobs get their own pods with just the right image and resources.
- You decouple triggers from storage details. With DatasetAlias, deployment workflows listen to a “model-ready” signal—not a hard-coded S3 key pattern—so teams can change artifact paths without rewiring DAGs. (airflow.apache.org)
- You scale to fit the music. KEDA expands workers when the queue swells and shrinks when it’s quiet, much like adding session musicians for a stadium show and going back to a trio at the club. (airflow.apache.org)
- You avoid paying for waiting. Deferrable operators stop burning compute while they wait—ideal for cloud budgets and ML pipelines that juggle external jobs and data-ready events. (airflow.apache.org)
Next steps
- Stand up a staging Airflow on Kubernetes with Celery + Kubernetes executors.
- Move a single training task to KubernetesExecutor via executor="KubernetesExecutor" and a pod_override that sets resources and image.
- Emit a dataset alias from training and switch your deployment DAG to listen to that alias.
- Enable KEDA for Celery if you see queue bursts (and confirm DB capacity).
- Add deferrable sensors to long waits.
If your team has lived mostly in DevOps land, this is a friendly on-ramp to MLOps: clear ownership per task, cost-aware scaling, and data-driven triggers—without adopting a whole new platform. And just like a good band, the parts sound better together than they do alone.