Bringing GPUs into Airflow + Kubernetes ML Pipelines: practical patterns for efficient training and inference

Machine learning pipelines increasingly need more than CPU cycles — they need GPUs. If you’re running Airflow on Kubernetes, that’s good news: Kubernetes gives you the hardware and scheduling primitives, and Airflow gives you the orchestration. But combining them for efficient, secure, and cost-conscious ML workloads demands a few modern patterns: GPU-aware scheduling, node pools and taints, fractional GPU sharing (where supported), and observability. This article walks through recent developments and practical patterns you can adopt today to run GPU-backed training and inference as Airflow tasks on Kubernetes.

Why GPU scheduling matters for Airflow tasks

What’s changed recently (short summary)

Architecture and patterns you should consider

Below are the practical patterns and configuration details I recommend when combining Airflow + Kubernetes for GPU ML workloads.

1) Pick how Airflow will launch GPU tasks: KubernetesExecutor vs KubernetesPodOperator

Example: a simple KubernetesPodOperator task that requests a GPU

from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator
from kubernetes.client import models as k8s

with DAG("gpu_training_dag", start_date=datetime(2025, 1, 1), schedule=None) as dag:
    train = KubernetesPodOperator(
        task_id="train_model",
        name="train-model",
        namespace="ml-jobs",
        image="myrepo/ml-train:latest",
        cmds=["python", "train.py"],
        # Recent cncf.kubernetes provider versions take a V1ResourceRequirements
        # object via container_resources rather than a plain resources dict.
        container_resources=k8s.V1ResourceRequirements(
            requests={"nvidia.com/gpu": "1", "cpu": "4", "memory": "16Gi"},
            limits={"nvidia.com/gpu": "1"},
        ),
        get_logs=True,
    )

This requests a whole GPU. If you have fractional GPU slices available (see pattern 3 below), the resource requests will look different depending on your device plugin and scheduler.

2) Use dedicated GPU node pools + taints/tolerations + nodeSelectors
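
GPU nodes are expensive, so keep everything else off them: taint the GPU node pool so only GPU tasks tolerate it, and pin those tasks to the pool with a nodeSelector. A minimal sketch, assuming a hypothetical taint gpu=true:NoSchedule and node label accelerator: nvidia on the GPU pool (substitute whatever keys and labels your cluster actually uses):

from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator
from kubernetes.client import models as k8s

with DAG("gpu_pool_dag", start_date=datetime(2025, 1, 1), schedule=None) as dag:
    train = KubernetesPodOperator(
        task_id="train_on_gpu_pool",
        name="train-on-gpu-pool",
        namespace="ml-jobs",
        image="myrepo/ml-train:latest",
        cmds=["python", "train.py"],
        # Tolerate the GPU pool's taint and pin to its label so this task lands
        # on GPU nodes, while pods without the toleration stay off them.
        tolerations=[
            k8s.V1Toleration(key="gpu", operator="Equal", value="true", effect="NoSchedule"),
        ],
        node_selector={"accelerator": "nvidia"},
        container_resources=k8s.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
        get_logs=True,
    )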

3) Consider GPU partitioning (MIG) and fractional GPU sharing
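
If MIG is enabled on your GPU nodes and the NVIDIA device plugin exposes MIG profiles as named resources (the "mixed" strategy), a task can request a slice instead of a whole device. A minimal sketch, assuming an A100-style 1g.5gb profile; the exact resource name depends on your hardware and the MIG profiles you configure:

from kubernetes.client import models as k8s

# Request one MIG slice rather than a full GPU; swap this object into the
# container_resources of the KubernetesPodOperator examples above.
mig_slice = k8s.V1ResourceRequirements(
    limits={"nvidia.com/mig-1g.5gb": "1"},  # example A100 profile name; adjust to yours
)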

4) Autoscaling and cost-efficiency: cluster autoscalers and GPU-aware provisioning

5) Validate drivers and images at pod startup
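
A cheap safeguard is a guard clause at the top of the training or inference entrypoint that fails fast if the container cannot see a usable GPU. A sketch, assuming a PyTorch image (the same idea works with nvidia-smi alone or any other framework):

import subprocess
import sys

import torch


def assert_gpu_available() -> None:
    """Fail fast if no CUDA device is visible inside the container."""
    if not torch.cuda.is_available():
        # Dump driver/device info into the task log to speed up debugging,
        # then exit non-zero so Airflow marks the task as failed.
        subprocess.run(["nvidia-smi"], check=False)
        sys.exit("No CUDA device visible inside the container")
    print(f"Found {torch.cuda.device_count()} GPU(s): {torch.cuda.get_device_name(0)}")


if __name__ == "__main__":
    assert_gpu_available()
    # ... continue with training or serving ...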

6) Observability: measure per-job GPU utilization
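
If the cluster runs Prometheus with the NVIDIA DCGM exporter, you can correlate each Airflow task run with how busy it actually kept its GPU. A rough sketch using the exporter's DCGM_FI_DEV_GPU_UTIL metric; the Prometheus address and the pod label name are assumptions and may differ in your setup:

import requests

PROMETHEUS_URL = "http://prometheus.monitoring:9090"  # assumed in-cluster address


def gpu_utilization_for_pod(pod_name: str) -> list:
    """Return 5-minute average GPU utilization samples (percent) for a task pod."""
    query = f'avg_over_time(DCGM_FI_DEV_GPU_UTIL{{pod="{pod_name}"}}[5m])'
    resp = requests.get(
        f"{PROMETHEUS_URL}/api/v1/query", params={"query": query}, timeout=10
    )
    resp.raise_for_status()
    return resp.json()["data"]["result"]


# Example: check whether the training pod actually saturated its GPU.
print(gpu_utilization_for_pod("train-model-abc123"))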

Security and multi-tenant isolation

Operational checklist (quick)

When to avoid GPUs for Airflow tasks

Getting started (actionable next steps)

  1. Inventory your current workloads: identify which Airflow tasks truly need GPU acceleration (not just “might be faster”).
  2. Create a GPU node pool and taint it. Test a simple KubernetesPodOperator job that requests 1 GPU and verify logs and device availability.
  3. If you have MIG-capable hardware (for example, A100 or H100 GPUs), enable MIG on those nodes and run a few small jobs to validate fractional allocation and observability.
  4. Add Prometheus + NVIDIA DCGM exporter to your cluster and start tracking GPU utilization against Airflow task runs.
  5. Iterate: if GPUs sit mostly idle, evaluate batching, MIG, or a GPU-sharing scheduler.

Conclusion

Running ML training and inference as Airflow tasks on Kubernetes is now a practical and production-ready approach — but getting efficient, secure, and cost-effective GPU usage requires a few architectural decisions. Use node pools and taints to isolate GPUs, prefer KubernetesPodOperator or KubernetesExecutor depending on control needs, enable MIG or scheduler-based sharing where available, and instrument GPU usage through monitoring. Recent improvements in Airflow’s Kubernetes integration and the maturing GPU ecosystem make this a great moment to modernize your MLOps pipelines. Start small, measure utilization, and evolve toward fractional GPU sharing and autoscaled GPU pools as your needs grow. (airflow.apache.org)

Further reading and references
