Hybrid Executors, Datasets, and KEDA: a practical path from DevOps to MLOps with Airflow on Kubernetes

Moving ML from notebooks to production feels a bit like taking a garage band on tour. You need the right stage (Kubernetes), a reliable manager (Airflow), and a crew that can scale up for the big shows and dial down for the coffee-shop gigs (autoscaling). Recently, Airflow gained a set of features that make this “tour” a lot easier: multi-executor support, richer data-aware scheduling with DatasetAlias, and better efficiency via deferrable operators. Combined with KEDA-driven autoscaling on Kubernetes, these unlock a clean, cost-aware pattern for MLOps.

Below is a field-tested blueprint you can adapt, plus minimal code snippets to get you moving.

What changed recently—and why it matters

Airflow 2.10 added experimental multi-executor support, introduced DatasetAlias for more flexible data-aware scheduling, and kept improving deferrable operators. Taken together, these unlock a pragmatic pattern: keep orchestration overhead small for tiny steps, burst to Kubernetes for model training, trigger downstream steps via datasets, and autoscale the worker fleet.

Reference design: one pipeline, two executors

Picture a typical supervised-learning pipeline: lightweight feature preparation, a GPU-hungry training step, a quick evaluation, and a deployment that should fire as soon as a model is ready. The light steps don't justify dedicated pods, while training would hog a shared worker.

We’ll configure two executors—Celery and Kubernetes—and show how to route tasks accordingly.

1) Configure Airflow for multi-executor

In airflow.cfg (or via env vars), list executors in a comma-separated string. The first is your default, and you can assign others per task/DAG.

[core]
executor = CeleryExecutor,KubernetesExecutor

You can also define custom aliases for executors to make DAG code cleaner. Then, set the executor at the DAG or task level via the executor parameter. (airflow.apache.org)
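For reference, the same setting can be supplied as the AIRFLOW__CORE__EXECUTOR environment variable. And per-task routing is not limited to TaskFlow: classic operators accept the same executor parameter. Before the full pipeline in the next section, here is the smallest possible illustration of routing (the DAG id and commands are placeholders, and it assumes both executors from the config above are available):

import pendulum
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="executor_routing_demo",  # hypothetical DAG, for illustration only
    start_date=pendulum.datetime(2025, 1, 1, tz="UTC"),
    schedule=None,
    catchup=False,
):
    # Runs on the default executor (the first entry in [core] executor).
    light = BashOperator(task_id="light_step", bash_command="echo preprocessing")

    # Explicitly routed to the KubernetesExecutor via the per-task executor param.
    heavy = BashOperator(
        task_id="heavy_step",
        bash_command="echo training",
        executor="KubernetesExecutor",
    )

    light >> heavy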

2) A minimal DAG that splits work across executors

The snippet below shows three things: a Celery default applied through default_args, a per-task override onto the KubernetesExecutor with a pod_override that requests a GPU, and a DatasetAlias outlet emitted from the training task to signal that a model is ready.

from datetime import timedelta
import pendulum

from airflow import DAG
from airflow.decorators import task
from airflow.datasets import Dataset, DatasetAlias

# Kubernetes Python client types for pod overrides
from kubernetes.client import models as k8s

# Default to Celery for small tasks; we’ll override on the heavy one
default_args = {
    "owner": "mlops",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
    "executor": "CeleryExecutor",  # all tasks unless overridden
}

with DAG(
    dag_id="train_and_register_model",
    start_date=pendulum.datetime(2025, 1, 1, tz="UTC"),
    schedule=None,
    catchup=False,
    default_args=default_args,
    tags=["mlops"],
):
    @task
    def prepare_features():
        # quick, lightweight logic
        return {"train_path": "s3://bucket/features/train.parquet"}

    # Request GPU/CPU/memory via pod_override (KubernetesExecutor)
    pod_override = k8s.V1Pod(
        spec=k8s.V1PodSpec(
            containers=[
                k8s.V1Container(
                    name="base",
                    image="us-docker.pkg.dev/myproj/ml/train:cuda-12.2-py310",
                    resources=k8s.V1ResourceRequirements(
                        limits={
                            "cpu": "4",
                            "memory": "16Gi",
                            # GPUs are requested via limits; requests must match if you set them
                            # The device plugin must be installed on the cluster nodes.
                            "nvidia.com/gpu": "1",
                        }
                    ),
                )
            ]
        )
    )

    @task(
        executor="KubernetesExecutor",
        executor_config={"pod_override": pod_override},
        outlets=[DatasetAlias("model-ready:fraud-detector")],
    )
    def train_model(inputs: dict, *, outlet_events):
        # Training code would live in the container image; here we just simulate a path
        model_uri = "s3://bucket/models/fraud/v123/model.pt"
        # Emit a dataset event via alias (decouples consumers from the exact URI)
        outlet_events[DatasetAlias("model-ready:fraud-detector")].add(
            Dataset(model_uri),
            extra={"metric": "auc", "value": 0.93},
        )
        return {"model_uri": model_uri}

    @task
    def evaluate_model(artifact: dict):
        # quick post-processing or registry calls
        return {"approved": True, "model_uri": artifact["model_uri"]}

    artifacts = prepare_features()
    trained = train_model(artifacts)
    evaluate_model(trained)
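One practical note on the pod_override above: requesting nvidia.com/gpu only works if the NVIDIA device plugin is installed, and most clusters also keep GPU nodes behind a node label and a taint. The sketch below extends the override with a node selector and toleration; the node-pool label is an assumption you would replace with your cluster's own values:

from kubernetes.client import models as k8s

# Sketch: steer the training pod onto a (hypothetical) GPU node pool and
# tolerate the taint commonly applied to GPU nodes.
gpu_pod_override = k8s.V1Pod(
    spec=k8s.V1PodSpec(
        node_selector={"node-pool": "gpu-pool"},  # assumption: replace with your label
        tolerations=[
            k8s.V1Toleration(key="nvidia.com/gpu", operator="Exists", effect="NoSchedule")
        ],
        containers=[
            k8s.V1Container(
                name="base",  # must stay "base" so KubernetesExecutor merges it into the worker pod
                resources=k8s.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
            )
        ],
    )
)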

3) Decoupled deployment via DatasetAlias

Now create a separate DAG that deploys whenever an event is emitted through the “model-ready” alias, regardless of where the model file actually lives.

from airflow import DAG
from airflow.decorators import task
from airflow.datasets import DatasetAlias

import pendulum

with DAG(
    dag_id="deploy_model",
    start_date=pendulum.datetime(2025, 1, 1, tz="UTC"),
    schedule=DatasetAlias("model-ready:fraud-detector"),
    catchup=False,
):
    @task
    def deploy():
        # Pull last dataset event details via inlet_events if needed
        # e.g., which model_uri was produced, and its metrics
        print("Deploying the latest approved model...")

    deploy()

This schedule only becomes active once the alias has resolved to at least one concrete dataset event (for example, one pointing at s3://bucket/models/...); from then on, the consumer DAG runs whenever a new event is emitted, letting you evolve storage layouts over time without breaking triggers. The docs show how to emit events via outlet_events and how alias-based schedules are reconciled after the alias resolves. (airflow.apache.org)
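If the deployment step needs details from the producing event, such as the metrics attached via extra, it can declare the alias as an inlet and read past events through the inlet_events accessor. A minimal sketch of an alternative deploy task, assuming a recent Airflow version where inlet_events accepts DatasetAlias keys:

from airflow.datasets import DatasetAlias
from airflow.decorators import task

@task(inlets=[DatasetAlias("model-ready:fraud-detector")])
def deploy(*, inlet_events):
    # Events emitted through the alias, oldest first; [-1] is the most recent.
    # This raises IndexError if nothing has been emitted yet.
    last = inlet_events[DatasetAlias("model-ready:fraud-detector")][-1]
    # `extra` carries whatever the producer attached ({"metric": "auc", ...} above).
    # Putting the model URI into `extra` as well keeps the consumer fully
    # decoupled from the storage layout.
    print("Deploying with event extra:", last.extra)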

Autoscale the fleet with KEDA (experimental)

If you run Celery workers for light tasks, enable KEDA to scale them based on the number of queued or running tasks divided by your configured worker concurrency. With the official chart, it’s a few lines:

helm repo add kedacore https://kedacore.github.io/charts
helm repo update
kubectl create namespace keda
helm install keda kedacore/keda --namespace keda

kubectl create namespace airflow
helm repo add apache-airflow https://airflow.apache.org
helm install airflow apache-airflow/airflow \
  --namespace airflow \
  --set executor=CeleryExecutor \
  --set workers.keda.enabled=true

Under the hood, the chart creates a KEDA ScaledObject (KEDA in turn manages the HPA) whose trigger runs a SQL query against the Airflow metadata database to compute the desired worker count. Make sure your database tier can handle that query when load spikes. (airflow.apache.org)
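A few chart values are worth tuning once you see real load; the flags below exist in the official chart (check your chart version's values reference), but the numbers are illustrative. If you use the multi-executor setup from earlier, newer chart versions accept a comma-separated executor value; if yours does not, you can set AIRFLOW__CORE__EXECUTOR through the chart's env settings instead.

helm upgrade airflow apache-airflow/airflow \
  --namespace airflow \
  --set executor=CeleryExecutor \
  --set workers.keda.enabled=true \
  --set workers.keda.maxReplicaCount=20 \
  --set workers.keda.pollingInterval=10 \
  --set workers.keda.cooldownPeriod=120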

A few operational knobs and gotchas

Why this pattern works for MLOps

Next steps

If your team has lived mostly in DevOps land, this is a friendly on-ramp to MLOps: clear ownership per task, cost-aware scaling, and data-driven triggers—without adopting a whole new platform. And just like a good band, the parts sound better together than they do alone.