From DevOps to MLOps: Running GPU-Accelerated ML Pipelines with Airflow on Kubernetes

Orchestrating ML pipelines is like conducting a jazz ensemble: you want tight coordination, room for improvisation, and the ability to scale up when the soloist (your GPU job) needs to roar. For teams moving from DevOps-style CI/CD into production MLOps, Airflow + Kubernetes is a natural pair — but GPUs change the rhythm. This article walks through practical patterns, pitfalls, and code examples for running GPU-accelerated ML tasks with Airflow on Kubernetes.

Why Airflow + Kubernetes for ML pipelines?

Airflow gives you DAG-based orchestration and robust retry/recovery semantics. Kubernetes provides isolation, autoscaling, and cluster-level resource management. Put them together and you can run each pipeline step in its own right-sized pod, scale GPU capacity with demand, and recover from failed tasks without babysitting infrastructure.

Airflow’s Kubernetes integration supports two common modes: the Kubernetes Executor (each task runs in its own pod) and the KubernetesPodOperator (explicit pod creation from a task). Both are first-class options when you’re running Airflow on—or connecting it to—Kubernetes. (airflow.apache.org)

The GPU reality: device plugins, limits, and constraints

Kubernetes exposes GPUs through a device plugin framework. That means GPUs are a schedulable resource (for example nvidia.com/gpu) that pods can request. Important: GPUs are normally specified in the pod spec’s limits (Kubernetes treats GPUs as an allocatable device), which has some implications for how you declare and think about resource reservations. (kubernetes.io)
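
As a minimal sketch of what that looks like at the Kubernetes API level (using the official Python kubernetes client; the namespace, labels, and image are placeholders), the GPU count sits under limits and the scheduler will only place the pod on a node whose device plugin advertises that resource:

from kubernetes.client import models as k8s

# A bare pod spec requesting one whole GPU. The GPU appears under `limits`
# only: extended resources like nvidia.com/gpu cannot be overcommitted, and
# Kubernetes uses the limit as the request.
gpu_pod = k8s.V1Pod(
    metadata=k8s.V1ObjectMeta(name="gpu-example", namespace="ml-jobs"),
    spec=k8s.V1PodSpec(
        restart_policy="Never",
        node_selector={"accelerator": "nvidia-gpu"},  # match your GPU node labels
        containers=[
            k8s.V1Container(
                name="trainer",
                image="myorg/ml-train:latest",  # placeholder image
                command=["python", "train.py"],
                resources=k8s.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1", "cpu": "4", "memory": "16Gi"}
                ),
            )
        ],
    ),
)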

Two practical consequences: first, GPUs cannot be overcommitted or requested fractionally; a container gets whole GPUs (or MIG slices), and if you set both requests and limits for nvidia.com/gpu they must be equal. Second, containers do not share a GPU by default, so utilization suffers unless you adopt a sharing mechanism such as MIG or time-slicing.

NVIDIA MIG and GPU Operator — better utilization, some operational cost

Multi-Instance GPU (MIG) lets certain NVIDIA GPUs be partitioned into multiple isolated instances (up to seven on some architectures), each with guaranteed memory and compute slices. In a Kubernetes cluster, the NVIDIA GPU Operator + MIG Manager coordinates driver installation, device plugin registration, and MIG configuration so pods can see and request those slices as discrete resources. This can drastically improve utilization for mixed workloads (not every job needs a full GPU). But reconfiguring MIG geometry may require careful sequencing and occasionally a node reboot. (docs.nvidia.com)

Think of MIG as dividing a piano into multiple smaller instruments: each performer gets a playable keyboard, but re-tuning (rebooting) the room sometimes needs a pause in the concert.
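
As a hedged sketch of how this surfaces in Airflow (assuming the GPU Operator's mixed MIG strategy, in which each slice is advertised as its own extended resource; with the single strategy slices still appear as plain nvidia.com/gpu), a small inference task could request a 1g.5gb slice instead of a whole GPU:

from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator
from kubernetes.client import models as k8s

# Hypothetical light inference step that fits comfortably on a MIG slice.
# The exact resource name (nvidia.com/mig-1g.5gb here) depends on the MIG
# geometry and sharing strategy configured on your nodes.
mig_inference = KubernetesPodOperator(
    task_id="mig_inference",
    name="mig-inference-pod",
    namespace="ml-jobs",
    image="myorg/ml-infer:latest",  # placeholder image
    cmds=["python", "infer.py"],
    container_resources=k8s.V1ResourceRequirements(
        limits={"nvidia.com/mig-1g.5gb": "1", "cpu": "2", "memory": "8Gi"}
    ),
    get_logs=True,
)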

Practical patterns: KubernetesPodOperator and GPU requests

If you want task-level control (image, environment, exact pod spec), KubernetesPodOperator (KPO) is the usual choice. Below is a minimal example that requests a GPU and pins the pod to GPU-capable nodes via a node selector or affinity. This pattern is useful for single-step training tasks, inference jobs, or GPU-based validation.

Python DAG snippet (simplified):

from airflow import DAG
from datetime import datetime
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator
from kubernetes.client import models as k8s

with DAG("gpu_training_dag", start_date=datetime(2025, 1, 1), schedule_interval=None) as dag:

    train = KubernetesPodOperator(
        task_id="train_model",
        name="train-model-pod",
        namespace="ml-jobs",
        image="myorg/ml-train:latest",
        cmds=["python", "train.py"],
        # GPUs must be declared under limits; the device plugin exposes
        # nvidia.com/gpu as an extended resource that cannot be overcommitted.
        container_resources=k8s.V1ResourceRequirements(
            limits={"nvidia.com/gpu": "1", "cpu": "4", "memory": "16Gi"}
        ),
        node_selector={"accelerator": "nvidia-gpu"},  # label on your GPU node pool
        get_logs=True,
        is_delete_operator_pod=True,
    )

Notes:

If your GPU nodes are tainted (a common way to keep CPU-only pods off expensive hardware), a node selector alone will not get the pod scheduled; add matching tolerations as well.

KubernetesPodOperator is flexible: you can pass complete pod specs (using Kubernetes client V1Pod objects) if your job needs special volumes, init containers, or sidecars. (airflow.apache.org)
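
For instance, a variant of the training task that mounts a shared dataset from a PersistentVolumeClaim might look like the following sketch (the claim name and mount path are placeholders):

from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator
from kubernetes.client import models as k8s

# Mount an existing PVC with training data into the pod at /data.
dataset_volume = k8s.V1Volume(
    name="training-data",
    persistent_volume_claim=k8s.V1PersistentVolumeClaimVolumeSource(
        claim_name="training-data-pvc"  # placeholder claim
    ),
)
dataset_mount = k8s.V1VolumeMount(name="training-data", mount_path="/data", read_only=True)

train_with_data = KubernetesPodOperator(
    task_id="train_with_data",
    name="train-with-data-pod",
    namespace="ml-jobs",
    image="myorg/ml-train:latest",
    cmds=["python", "train.py", "--data-dir", "/data"],
    volumes=[dataset_volume],
    volume_mounts=[dataset_mount],
    container_resources=k8s.V1ResourceRequirements(
        limits={"nvidia.com/gpu": "1", "cpu": "4", "memory": "16Gi"}
    ),
    node_selector={"accelerator": "nvidia-gpu"},
    get_logs=True,
)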

KubernetesExecutor vs KubernetesPodOperator: when to use which

For mixed pipelines (some CPU-heavy ETL, some GPU-heavy training), a common approach is to let non-GPU tasks run under the KubernetesExecutor while using KubernetesPodOperator for the GPU-heavy steps so you can tailor the pod spec precisely.
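
A rough sketch of that split (task and image names are placeholders): a CPU-bound feature-extraction step stays a plain Python task and, under the KubernetesExecutor, shapes its worker pod through executor_config, while the GPU-heavy training step uses the KubernetesPodOperator shown earlier:

from airflow.decorators import task
from kubernetes.client import models as k8s

# CPU-only ETL step: the KubernetesExecutor runs it in its own worker pod,
# and the pod_override requests CPU/memory without touching any GPU nodes.
@task(
    executor_config={
        "pod_override": k8s.V1Pod(
            spec=k8s.V1PodSpec(
                containers=[
                    k8s.V1Container(
                        name="base",  # the executor's task container is named "base"
                        resources=k8s.V1ResourceRequirements(
                            requests={"cpu": "2", "memory": "4Gi"},
                            limits={"cpu": "2", "memory": "4Gi"},
                        ),
                    )
                ]
            )
        )
    }
)
def extract_features():
    ...  # plain Python ETL logic, no GPU required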

Scheduling, utilization and fairness

GPUs are expensive, and naive scheduling often leads to fragmentation (many jobs each requiring a whole GPU but leaving unused memory/compute). Strategies to improve utilization include partitioning large GPUs with MIG so small jobs get right-sized slices, enabling time-slicing in the device plugin for bursty or low-intensity workloads, keeping GPUs in a dedicated, tainted node pool so only GPU tasks land there, and using Airflow pools and priority weights to queue GPU tasks rather than letting them camp on hardware (see the sketch at the end of this section).

Kubernetes’ device plugin model and the available sharing options have improved, but there remain limits and trade-offs; don't expect plug-and-play efficiency. (kubernetes.io)
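
One queueing sketch, assuming a pool named gpu-pool has been created in the Airflow UI or CLI with a slot count matching the number of GPUs (or MIG slices) you actually own; this is the training task from the earlier DAG, now routed through that pool:

from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator
from kubernetes.client import models as k8s

# Route GPU tasks through a pool so Airflow never runs more of them
# than the cluster has GPUs, and let priority_weight decide who goes first.
train = KubernetesPodOperator(
    task_id="train_model",
    name="train-model-pod",
    namespace="ml-jobs",
    image="myorg/ml-train:latest",
    cmds=["python", "train.py"],
    container_resources=k8s.V1ResourceRequirements(
        limits={"nvidia.com/gpu": "1", "cpu": "4", "memory": "16Gi"}
    ),
    pool="gpu-pool",        # hypothetical pool sized to your GPU capacity
    priority_weight=10,     # favor training over lower-priority GPU tasks
    get_logs=True,
)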

Images, drivers, and runtime expectations

GPU workloads need the right mix of driver, container runtime, and user image: the node needs a compatible NVIDIA driver (the GPU Operator can manage it for you), the container runtime needs the NVIDIA container toolkit so devices and driver libraries are mounted into pods, and your image needs the CUDA/cuDNN user-space libraries that match your ML framework, but not the driver itself, which always comes from the host.
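
A cheap guard, assuming a PyTorch-based training image, is to fail fast at the top of train.py if the container cannot actually see a GPU:

import sys

import torch  # assumes the image ships a CUDA-enabled PyTorch build

def assert_gpu_available() -> None:
    # Fails fast if the driver, container toolkit, or device plugin wiring is
    # broken, instead of silently training on CPU for hours.
    if not torch.cuda.is_available():
        sys.exit("No CUDA device visible inside the container - check drivers and the device plugin.")
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")

if __name__ == "__main__":
    assert_gpu_available()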

Observability and safety

Treat GPU pods like fragile soloists: watch their temperature, memory use, and runtime. Export GPU metrics (the DCGM exporter scraped by Prometheus) and tie alerts to runaway memory or persistently low utilization. Use Pod Security admission (the successor to the removed PodSecurityPolicy) and node isolation to prevent noisy neighbors from affecting critical workloads. The GPU Operator and the DCGM exporter are useful components in this visibility stack. (github.com)
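
For in-task visibility, here is a minimal sketch using the nvidia-ml-py (pynvml) bindings (assuming the package is installed in the training image) that logs per-GPU memory and utilization alongside your training metrics:

import pynvml  # from the nvidia-ml-py package

def log_gpu_stats() -> None:
    # Query NVML directly for memory use and utilization of every visible GPU.
    pynvml.nvmlInit()
    try:
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            print(
                f"gpu={i} mem_used={mem.used / 2**30:.1f}GiB "
                f"mem_total={mem.total / 2**30:.1f}GiB util={util.gpu}%"
            )
    finally:
        pynvml.nvmlShutdown()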

Example pitfalls (and how they sound in the wild)

A few that come up repeatedly: pods stuck in Pending because the scheduler reports insufficient nvidia.com/gpu (no free GPU, or the device plugin is not running on that node); CUDA errors at container start because the host driver is too old for the CUDA runtime baked into the image; GPU tasks landing on CPU-only nodes because selectors, affinity, or tolerations were omitted; and whole GPUs idling under small jobs that could have run on a MIG slice. These are common operational notes; the remedies are tooling (GPU Operator, correct images), policy (affinity/taints), and orchestration choices (KPO vs executor). A cheap mitigation is a smoke-test task at the head of the DAG, sketched below.
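
A hedged sketch of that smoke test: a tiny KubernetesPodOperator that just runs nvidia-smi on a GPU node, so broken drivers or device-plugin wiring fail the DAG in seconds rather than mid-training (the image tag and labels are examples):

from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator
from kubernetes.client import models as k8s

# Fails fast if GPU scheduling, drivers, or the container toolkit are misconfigured.
gpu_smoke_test = KubernetesPodOperator(
    task_id="gpu_smoke_test",
    name="gpu-smoke-test-pod",
    namespace="ml-jobs",
    image="nvidia/cuda:12.2.0-base-ubuntu22.04",  # example CUDA base image
    cmds=["nvidia-smi"],
    container_resources=k8s.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
    node_selector={"accelerator": "nvidia-gpu"},
    get_logs=True,
)

# Wire it ahead of training in the DAG: gpu_smoke_test >> train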

Putting it together — the trade-offs

Using Airflow + Kubernetes for GPU workloads gives you control and repeatability, but it raises operational responsibilities: driver and device-plugin lifecycle on the nodes, CUDA compatibility across images, a deliberate sharing and scheduling policy (whole GPUs, MIG, or time-slicing), cost controls for expensive accelerators, and enough observability to prove the hardware is actually being used.

Kubernetes and vendor tools (for example NVIDIA’s GPU Operator and MIG) have matured significantly and can be production-ready for many teams — but expect an ops phase: tuning scheduling, selecting the right sharing model, and instrumenting observability. (kubernetes.io)

Final note (analogy)

If DevOps is the rhythm section keeping systems steady, MLOps with GPU orchestration is the horn section that needs its solo moments: you want tight tempo and room for dynamic expression. Airflow + Kubernetes gives you the conductor’s stand and the stage lights — but you still have to tune the instruments (drivers, images, scheduling) and choose when to let a GPU take the spotlight.

References