From DevOps to MLOps: Running GPU-Accelerated ML Pipelines with Airflow on Kubernetes

Orchestrating ML pipelines is like conducting a jazz ensemble: you want tight coordination, room for improvisation, and the ability to scale up when the soloist (your GPU job) needs to roar. For teams moving from DevOps-style CI/CD into production MLOps, Airflow + Kubernetes is a natural pair — but GPUs change the rhythm. This article walks through practical patterns, pitfalls, and code examples for running GPU-accelerated ML tasks with Airflow on Kubernetes.

Why Airflow + Kubernetes for ML pipelines?

Airflow gives you DAG-based orchestration and robust retry/recovery semantics. Kubernetes provides isolation, autoscaling, and cluster-level resource management. Put them together and you can run each pipeline step in its own right-sized pod, scale GPU capacity with demand, and recover from failed tasks without babysitting infrastructure.

Airflow’s Kubernetes integration supports two common modes: the Kubernetes Executor (each task runs in its own pod) and the KubernetesPodOperator (explicit pod creation from a task). Both are first-class options when you’re running Airflow on—or connecting it to—Kubernetes. (airflow.apache.org)

The GPU reality: device plugins, limits, and constraints

Kubernetes exposes GPUs through a device plugin framework. That means GPUs are a schedulable resource (for example nvidia.com/gpu) that pods can request. Important: GPUs are normally specified in the pod spec’s limits (Kubernetes treats GPUs as an allocatable device), which has some implications for how you declare and think about resource reservations. (kubernetes.io)
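
As a minimal sketch of what that looks like at the Kubernetes API level (using the official Python kubernetes client; the namespace, labels, and image are placeholders), the GPU count sits under limits and the scheduler will only place the pod on a node whose device plugin advertises that resource:

from kubernetes.client import models as k8s

# A bare pod spec requesting one whole GPU. The GPU appears under `limits`
# only: extended resources like nvidia.com/gpu cannot be overcommitted, and
# Kubernetes uses the limit as the request.
gpu_pod = k8s.V1Pod(
    metadata=k8s.V1ObjectMeta(name="gpu-example", namespace="ml-jobs"),
    spec=k8s.V1PodSpec(
        restart_policy="Never",
        node_selector={"accelerator": "nvidia-gpu"},  # match your GPU node labels
        containers=[
            k8s.V1Container(
                name="trainer",
                image="myorg/ml-train:latest",  # placeholder image
                command=["python", "train.py"],
                resources=k8s.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1", "cpu": "4", "memory": "16Gi"}
                ),
            )
        ],
    ),
)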

Two practical consequences: first, GPUs cannot be overcommitted or requested fractionally; a container gets whole GPUs (or MIG slices), and if you set both requests and limits for nvidia.com/gpu they must be equal. Second, containers do not share a GPU by default, so utilization suffers unless you adopt a sharing mechanism such as MIG or time-slicing.

NVIDIA MIG and GPU Operator — better utilization, some operational cost

Multi-Instance GPU (MIG) lets certain NVIDIA GPUs be partitioned into multiple isolated instances (up to seven on some architectures), each with guaranteed memory and compute slices. In a Kubernetes cluster, the NVIDIA GPU Operator + MIG Manager coordinates driver installation, device plugin registration, and MIG configuration so pods can see and request those slices as discrete resources. This can drastically improve utilization for mixed workloads (not every job needs a full GPU). But reconfiguring MIG geometry may require careful sequencing and occasionally a node reboot. (docs.nvidia.com)

Think of MIG as dividing a piano into multiple smaller instruments: each performer gets a playable keyboard, but re-tuning (rebooting) the room sometimes needs a pause in the concert.
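
As a hedged sketch of how this surfaces in Airflow (assuming the GPU Operator's mixed MIG strategy, in which each slice is advertised as its own extended resource; with the single strategy slices still appear as plain nvidia.com/gpu), a small inference task could request a 1g.5gb slice instead of a whole GPU:

from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator
from kubernetes.client import models as k8s

# Hypothetical light inference step that fits comfortably on a MIG slice.
# The exact resource name (nvidia.com/mig-1g.5gb here) depends on the MIG
# geometry and sharing strategy configured on your nodes.
mig_inference = KubernetesPodOperator(
    task_id="mig_inference",
    name="mig-inference-pod",
    namespace="ml-jobs",
    image="myorg/ml-infer:latest",  # placeholder image
    cmds=["python", "infer.py"],
    container_resources=k8s.V1ResourceRequirements(
        limits={"nvidia.com/mig-1g.5gb": "1", "cpu": "2", "memory": "8Gi"}
    ),
    get_logs=True,
)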

Practical patterns: KubernetesPodOperator and GPU requests

If you want task-level control (image, environment, exact pod spec), KubernetesPodOperator (KPO) is the usual choice. Below is a minimal example that requests a GPU and pins the pod to GPU-capable nodes via a node selector or affinity. This pattern is useful for single-step training tasks, inference jobs, or GPU-based validation.

Python DAG snippet (simplified):

from airflow import DAG
from datetime import datetime
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator
from kubernetes.client import models as k8s

with DAG("gpu_training_dag", start_date=datetime(2025, 1, 1), schedule_interval=None) as dag:

    train = KubernetesPodOperator(
        task_id="train_model",
        name="train-model-pod",
        namespace="ml-jobs",
        image="myorg/ml-train:latest",
        cmds=["python", "train.py"],
        # GPUs must be declared under limits; the device plugin exposes
        # nvidia.com/gpu as an extended resource that cannot be overcommitted.
        container_resources=k8s.V1ResourceRequirements(
            limits={"nvidia.com/gpu": "1", "cpu": "4", "memory": "16Gi"}
        ),
        node_selector={"accelerator": "nvidia-gpu"},  # label on your GPU node pool
        get_logs=True,
        is_delete_operator_pod=True,
    )

Notes:

If your GPU nodes are tainted (a common way to keep CPU-only pods off expensive hardware), a node selector alone will not get the pod scheduled; add matching tolerations as well.

KubernetesPodOperator is flexible: you can pass complete pod specs (using Kubernetes client V1Pod objects) if your job needs special volumes, init containers, or sidecars. (airflow.apache.org)
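
For instance, a variant of the training task that mounts a shared dataset from a PersistentVolumeClaim might look like the following sketch (the claim name and mount path are placeholders):

from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator
from kubernetes.client import models as k8s

# Mount an existing PVC with training data into the pod at /data.
dataset_volume = k8s.V1Volume(
    name="training-data",
    persistent_volume_claim=k8s.V1PersistentVolumeClaimVolumeSource(
        claim_name="training-data-pvc"  # placeholder claim
    ),
)
dataset_mount = k8s.V1VolumeMount(name="training-data", mount_path="/data", read_only=True)

train_with_data = KubernetesPodOperator(
    task_id="train_with_data",
    name="train-with-data-pod",
    namespace="ml-jobs",
    image="myorg/ml-train:latest",
    cmds=["python", "train.py", "--data-dir", "/data"],
    volumes=[dataset_volume],
    volume_mounts=[dataset_mount],
    container_resources=k8s.V1ResourceRequirements(
        limits={"nvidia.com/gpu": "1", "cpu": "4", "memory": "16Gi"}
    ),
    node_selector={"accelerator": "nvidia-gpu"},
    get_logs=True,
)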

KubernetesExecutor vs KubernetesPodOperator: when to use which

For mixed pipelines (some CPU-heavy ETL, some GPU-heavy training), a common approach is to let non-GPU tasks run under the KubernetesExecutor while using KubernetesPodOperator for the GPU-heavy steps so you can tailor the pod spec precisely.
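
A rough sketch of that split (task and image names are placeholders): a CPU-bound feature-extraction step stays a plain Python task and, under the KubernetesExecutor, shapes its worker pod through executor_config, while the GPU-heavy training step uses the KubernetesPodOperator shown earlier:

from airflow.decorators import task
from kubernetes.client import models as k8s

# CPU-only ETL step: the KubernetesExecutor runs it in its own worker pod,
# and the pod_override requests CPU/memory without touching any GPU nodes.
@task(
    executor_config={
        "pod_override": k8s.V1Pod(
            spec=k8s.V1PodSpec(
                containers=[
                    k8s.V1Container(
                        name="base",  # the executor's task container is named "base"
                        resources=k8s.V1ResourceRequirements(
                            requests={"cpu": "2", "memory": "4Gi"},
                            limits={"cpu": "2", "memory": "4Gi"},
                        ),
                    )
                ]
            )
        )
    }
)
def extract_features():
    ...  # plain Python ETL logic, no GPU required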

Scheduling, utilization and fairness

GPUs are expensive, and naive scheduling often leads to fragmentation (many jobs each requiring a whole GPU but leaving unused memory/compute). Strategies to improve utilization include partitioning large GPUs with MIG so small jobs get right-sized slices, enabling time-slicing in the device plugin for bursty or low-intensity workloads, keeping GPUs in a dedicated, tainted node pool so only GPU tasks land there, and using Airflow pools and priority weights to queue GPU tasks rather than letting them camp on hardware (see the sketch at the end of this section).

Kubernetes’ device plugin model and the available sharing options have improved, but there remain limits and trade-offs; don't expect plug-and-play efficiency. (kubernetes.io)
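
One queueing sketch, assuming a pool named gpu-pool has been created in the Airflow UI or CLI with a slot count matching the number of GPUs (or MIG slices) you actually own; this is the training task from the earlier DAG, now routed through that pool:

from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator
from kubernetes.client import models as k8s

# Route GPU tasks through a pool so Airflow never runs more of them
# than the cluster has GPUs, and let priority_weight decide who goes first.
train = KubernetesPodOperator(
    task_id="train_model",
    name="train-model-pod",
    namespace="ml-jobs",
    image="myorg/ml-train:latest",
    cmds=["python", "train.py"],
    container_resources=k8s.V1ResourceRequirements(
        limits={"nvidia.com/gpu": "1", "cpu": "4", "memory": "16Gi"}
    ),
    pool="gpu-pool",        # hypothetical pool sized to your GPU capacity
    priority_weight=10,     # favor training over lower-priority GPU tasks
    get_logs=True,
)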

Images, drivers, and runtime expectations

GPU workloads need the right mix of driver, container runtime, and user image: the node needs a compatible NVIDIA driver (the GPU Operator can manage it for you), the container runtime needs the NVIDIA container toolkit so devices and driver libraries are mounted into pods, and your image needs the CUDA/cuDNN user-space libraries that match your ML framework, but not the driver itself, which always comes from the host.
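
A cheap guard, assuming a PyTorch-based training image, is to fail fast at the top of train.py if the container cannot actually see a GPU:

import sys

import torch  # assumes the image ships a CUDA-enabled PyTorch build

def assert_gpu_available() -> None:
    # Fails fast if the driver, container toolkit, or device plugin wiring is
    # broken, instead of silently training on CPU for hours.
    if not torch.cuda.is_available():
        sys.exit("No CUDA device visible inside the container - check drivers and the device plugin.")
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")

if __name__ == "__main__":
    assert_gpu_available()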

Observability and safety

Treat GPU pods like fragile soloists: watch their temperature, memory use, and runtime. Export GPU metrics (the DCGM exporter scraped by Prometheus) and tie alerts to runaway memory or persistently low utilization. Use Pod Security admission (the successor to the removed PodSecurityPolicy) and node isolation to prevent noisy neighbors from affecting critical workloads. The GPU Operator and the DCGM exporter are useful components in this visibility stack. (github.com)
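
For in-task visibility, here is a minimal sketch using the nvidia-ml-py (pynvml) bindings (assuming the package is installed in the training image) that logs per-GPU memory and utilization alongside your training metrics:

import pynvml  # from the nvidia-ml-py package

def log_gpu_stats() -> None:
    # Query NVML directly for memory use and utilization of every visible GPU.
    pynvml.nvmlInit()
    try:
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            print(
                f"gpu={i} mem_used={mem.used / 2**30:.1f}GiB "
                f"mem_total={mem.total / 2**30:.1f}GiB util={util.gpu}%"
            )
    finally:
        pynvml.nvmlShutdown()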

Example pitfalls (and how they sound in the wild)

A few that come up repeatedly: pods stuck in Pending because the scheduler reports insufficient nvidia.com/gpu (no free GPU, or the device plugin is not running on that node); CUDA errors at container start because the host driver is too old for the CUDA runtime baked into the image; GPU tasks landing on CPU-only nodes because selectors, affinity, or tolerations were omitted; and whole GPUs idling under small jobs that could have run on a MIG slice. These are common operational notes; the remedies are tooling (GPU Operator, correct images), policy (affinity/taints), and orchestration choices (KPO vs executor). A cheap mitigation is a smoke-test task at the head of the DAG, sketched below.
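
A hedged sketch of that smoke test: a tiny KubernetesPodOperator that just runs nvidia-smi on a GPU node, so broken drivers or device-plugin wiring fail the DAG in seconds rather than mid-training (the image tag and labels are examples):

from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator
from kubernetes.client import models as k8s

# Fails fast if GPU scheduling, drivers, or the container toolkit are misconfigured.
gpu_smoke_test = KubernetesPodOperator(
    task_id="gpu_smoke_test",
    name="gpu-smoke-test-pod",
    namespace="ml-jobs",
    image="nvidia/cuda:12.2.0-base-ubuntu22.04",  # example CUDA base image
    cmds=["nvidia-smi"],
    container_resources=k8s.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
    node_selector={"accelerator": "nvidia-gpu"},
    get_logs=True,
)

# Wire it ahead of training in the DAG: gpu_smoke_test >> train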

Putting it together — the trade-offs

Using Airflow + Kubernetes for GPU workloads gives you control and repeatability, but it raises operational responsibilities: driver and device-plugin lifecycle on the nodes, CUDA compatibility across images, a deliberate sharing and scheduling policy (whole GPUs, MIG, or time-slicing), cost controls for expensive accelerators, and enough observability to prove the hardware is actually being used.

Kubernetes and vendor tools (for example NVIDIA’s GPU Operator and MIG) have matured significantly and can be production-ready for many teams — but expect an ops phase: tuning scheduling, selecting the right sharing model, and instrumenting observability. (kubernetes.io)

Final note (analogy)

If DevOps is the rhythm section keeping systems steady, MLOps with GPU orchestration is the horn section that needs its solo moments: you want tight tempo and room for dynamic expression. Airflow + Kubernetes gives you the conductor’s stand and the stage lights — but you still have to tune the instruments (drivers, images, scheduling) and choose when to let a GPU take the spotlight.

References