Bringing GPUs into Airflow + Kubernetes ML Pipelines: practical patterns for efficient training and inference
Machine learning pipelines increasingly need more than CPU cycles — they need GPUs. If you’re running Airflow on Kubernetes, that’s good news: Kubernetes gives you the hardware and scheduling primitives, and Airflow gives you the orchestration. But combining them for efficient, secure, and cost-conscious ML workloads demands a few modern patterns: GPU-aware scheduling, node pools and taints, fractional GPU sharing (where supported), and observability. This article walks through recent developments and practical patterns you can adopt today to run GPU-backed training and inference as Airflow tasks on Kubernetes.
Why GPU scheduling matters for Airflow tasks
- Kubernetes historically treats GPUs as integer resources: you request whole devices (nvidia.com/gpu: 1) and the runtime assigns a whole physical GPU to the pod. For many ML tasks — short training runs, small inference batches, or multiple notebooks — this all-or-nothing allocation wastes capacity and money. Cloud providers and docs point this out as a key operational concern. (cloud.google.com)
- The good news: there’s a fast-moving ecosystem around GPU partitioning (MIG), fractional sharing systems, and scheduler-level optimizers that reduce waste. Several clouds and open-source projects now offer ways to slice GPUs or schedule sharing-friendly workloads. (medium.com)
What’s changed recently (short summary)
- Airflow on Kubernetes keeps evolving: the KubernetesExecutor and KubernetesPodOperator remain the canonical ways to run tasks as K8s pods, and the community has been improving multi-namespace security and Helm chart ergonomics to make Kubernetes deployments safer and more flexible. If you deploy Airflow using the official Helm chart, keep it up to date. (airflow.apache.org)
- On the GPU side, NVIDIA MIG support, specialized GPU schedulers (e.g., KAI), and cloud vendor guidance on GPU best practices have matured, enabling multi-tenant GPU usage and better utilization patterns. These capabilities are what make running many small Airflow-driven training jobs feasible without dedicating an entire GPU per job. (medium.com)
Architecture and patterns you should consider
Below are the practical patterns and configuration details I recommend when combining Airflow + Kubernetes for GPU ML workloads.
1) Pick how Airflow will launch GPU tasks: KubernetesExecutor vs KubernetesPodOperator
- KubernetesExecutor launches worker pods dynamically for tasks; it integrates tightly with the scheduler and is useful when you want Airflow itself to scale task pods automatically. KubernetesExecutor is still a solid choice for many deployments. (airflow.apache.org)
- KubernetesPodOperator gives you explicit control inside a DAG: you describe the pod spec (image, resources, volumes) and Airflow creates the pod. Use it when tasks require custom containers, GPUs, or specific node selectors.
- Recommendation: use KubernetesExecutor for broad scalability; use KubernetesPodOperator for specialized GPU workloads where you need exact pod specs.
Example: a simple KubernetesPodOperator task that requests a GPU
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator
from kubernetes.client import models as k8s

with DAG("gpu_training_dag", start_date=datetime(2025, 1, 1), schedule=None) as dag:
    train = KubernetesPodOperator(
        task_id="train_model",
        name="train-model",
        namespace="ml-jobs",
        image="myrepo/ml-train:latest",
        cmds=["python", "train.py"],
        # Request a whole GPU; CPU/memory requests keep scheduling sane on GPU nodes.
        container_resources=k8s.V1ResourceRequirements(
            requests={"nvidia.com/gpu": "1", "cpu": "4", "memory": "16Gi"},
            limits={"nvidia.com/gpu": "1"},
        ),
        get_logs=True,
    )
This requests a whole GPU. If you have fractional GPU slices available (see next section), the resource requests will look different depending on your device plugin and scheduler.
2) Use dedicated GPU node pools + taints/tolerations + nodeSelectors
- Put GPU hardware into its own node pool (or instance group) and taint those nodes (e.g., gpu=true:NoSchedule). Then let GPU pods use tolerations and nodeSelectors to land on GPU nodes. This isolates GPU capacity from normal workloads and simplifies autoscaling. Cloud provider best practices (AKS, GKE) explicitly recommend this. (learn.microsoft.com)
- Example pod spec fields: nodeSelector: {"cloud.google.com/gke-nodepool": "gpu-pool"} and tolerations for the gpu taint.
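In an Airflow DAG the same pinning can be set on the operator itself through its node_selector and tolerations arguments. A minimal sketch, assuming the gpu-pool label and the gpu=true:NoSchedule taint from above (substitute your cluster's values), placed inside the same DAG context as the earlier example:
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator
from kubernetes.client import models as k8s

train_on_gpu_pool = KubernetesPodOperator(
    task_id="train_on_gpu_pool",
    name="train-on-gpu-pool",
    namespace="ml-jobs",
    image="myrepo/ml-train:latest",
    cmds=["python", "train.py"],
    container_resources=k8s.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
    # Assumed node pool label and taint key/value -- adjust to your cluster.
    node_selector={"cloud.google.com/gke-nodepool": "gpu-pool"},
    tolerations=[
        k8s.V1Toleration(key="gpu", operator="Equal", value="true", effect="NoSchedule"),
    ],
    get_logs=True,
)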
3) Consider GPU partitioning (MIG) and fractional GPU sharing
- If your cluster uses hardware that supports MIG (e.g., NVIDIA A100 or H100), you can split a physical GPU into multiple isolated instances. That allows multiple pods to share a single card safely and reduces waste for small jobs. Cloud providers and device plugins increasingly expose MIG instances to Kubernetes scheduling. (medium.com)
- If MIG isn’t available, investigate scheduler-side solutions (e.g., KAI, other community projects) that implement GPU-sharing semantics at the scheduling level. These can increase utilization but add operational complexity. (medium.com)
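As a sketch of what a fractional request can look like when MIG slices are exposed through the NVIDIA device plugin's "mixed" strategy (the mig-1g.5gb profile name and the image below are assumptions; with the "single" strategy, and on some managed platforms, the slice is still requested as nvidia.com/gpu):
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator
from kubernetes.client import models as k8s

small_inference = KubernetesPodOperator(
    task_id="small_inference",
    name="small-inference",
    namespace="ml-jobs",
    image="myrepo/ml-infer:latest",  # assumed image name
    cmds=["python", "infer.py"],
    # Request one 1g.5gb MIG slice instead of a whole physical GPU.
    container_resources=k8s.V1ResourceRequirements(limits={"nvidia.com/mig-1g.5gb": "1"}),
    get_logs=True,
)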
4) Autoscaling and cost-efficiency: cluster autoscalers and GPU-aware provisioning
- Use a cluster autoscaler or instance/node-pool autoscaling that respects GPU node pools. For bursty training workloads launched as Airflow tasks, autoscaling prevents idle GPU waste while keeping queues short.
- Newer autoscalers and schedulers (including projects like Karpenter on AWS or cloud autoscaling features) can provision GPU nodes on demand; pair them with Airflow’s concurrency limits and queueing to avoid thundering herd starts.
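To keep Airflow from launching more GPU pods than a freshly provisioned node pool can absorb, an Airflow pool sized to your GPU capacity plus a DAG-level concurrency cap is a simple throttle. A minimal sketch, assuming a pool named gpu_pool with 4 slots (pool name, slot count, and image are illustrative):
# Create the pool once, e.g.: airflow pools set gpu_pool 4 "GPU task slots"
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

with DAG(
    "gpu_batch_training",
    start_date=datetime(2025, 1, 1),
    schedule=None,
    max_active_tasks=4,  # cap concurrently running tasks in this DAG
) as dag:
    for shard in range(16):
        KubernetesPodOperator(
            task_id=f"train_shard_{shard}",
            name=f"train-shard-{shard}",
            namespace="ml-jobs",
            image="myrepo/ml-train:latest",
            cmds=["python", "train.py", "--shard", str(shard)],
            pool="gpu_pool",  # tasks queue for a slot instead of all starting at once
            get_logs=True,
        )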
5) Validate drivers and images at pod startup
- GPU pods require correct CUDA drivers and device plugin support. Use init containers to validate /dev/nvidia* presence or run a small startup check to fail fast if the node isn’t properly configured. Cloud docs call this out as part of GPU best practices. (learn.microsoft.com)
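A minimal fail-fast sketch using the operator's init_containers argument; the CUDA base image tag is an assumption, and any image that ships nvidia-smi works:
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator
from kubernetes.client import models as k8s

# Init container that aborts the pod early if the node's driver or device plugin is broken.
gpu_check = k8s.V1Container(
    name="gpu-driver-check",
    image="nvidia/cuda:12.4.1-base-ubuntu22.04",  # assumed tag; any nvidia-smi image works
    command=["sh", "-c", "nvidia-smi || (echo 'GPU not available on this node' && exit 1)"],
    # The init container must request the GPU so the device plugin mounts the device into it.
    resources=k8s.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
)

train_checked = KubernetesPodOperator(
    task_id="train_model_checked",
    name="train-model-checked",
    namespace="ml-jobs",
    image="myrepo/ml-train:latest",
    cmds=["python", "train.py"],
    init_containers=[gpu_check],
    container_resources=k8s.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
    get_logs=True,
)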
6) Observability: measure per-job GPU utilization
- Track GPU metrics with the NVIDIA DCGM exporter + Prometheus and correlate them with Airflow tasks. Monitoring shows whether jobs underutilize GPUs and whether you should move to MIG/fractional approaches.
- Airflow metrics (task duration, queued time) combined with GPU metrics let you make data-driven decisions about node sizes, the number of GPUs, and autoscaling policies.
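As an illustrative sketch (the Prometheus URL and the Hostname label are assumptions tied to a typical dcgm-exporter install), you can pull per-node GPU utilization for an Airflow task's run window and compare it with what the task actually needed:
import requests

# Assumed in-cluster Prometheus endpoint; adjust to your monitoring stack.
PROM_URL = "http://prometheus.monitoring.svc:9090/api/v1/query_range"

def gpu_utilization_for_task(node: str, start_ts: float, end_ts: float) -> list:
    """Average DCGM GPU utilization on a node over a task's start/end window."""
    resp = requests.get(
        PROM_URL,
        params={
            "query": f'avg(DCGM_FI_DEV_GPU_UTIL{{Hostname="{node}"}})',
            "start": start_ts,
            "end": end_ts,
            "step": "30s",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["data"]["result"]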
Security and multi-tenant isolation
- Use Kubernetes namespaces, RBAC, and network policies to isolate ML teams. Airflow deployments in multi-tenant environments should use features that limit scheduler privileges — the Airflow community has been improving multi-namespace support to remove the need for cluster-wide roles in some setups. If you run Airflow in a cluster shared by many teams, review the recent improvements and the Helm chart options for namespace isolation. (airflowsummit.org)
Operational checklist (quick)
- Use a separate node pool for GPU nodes and taint them.
- Choose KubernetesPodOperator for GPU tasks that require custom images or exact pod specs; choose KubernetesExecutor for broad scalability.
- If your hardware supports MIG, enable it and configure the device plugin so pods can request fractional GPU instances.
- Add init checks in pods to validate drivers and device availability.
- Configure autoscaling for GPU node pools and set conservative batch sizes/concurrency in Airflow DAGs.
- Collect GPU metrics (DCGM) and correlate with Airflow task metrics before changing infrastructure.
When to avoid GPUs for Airflow tasks
- Don’t run tiny CPU-only steps (data ETL, lightweight preprocessing) on GPU nodes — this wastes expensive capacity.
- If jobs are short and frequent but low-CPU, consider batching them or using CPU nodes; GPUs are best for compute-heavy training or large-batch inference.
Getting started (actionable next steps)
- Inventory your current workloads: identify which Airflow tasks truly need GPU acceleration (not just “might be faster”).
- Create a GPU node pool and taint it. Test a simple KubernetesPodOperator job that requests 1 GPU and verify logs and device availability.
- If you have A100/H100 or other MIG-capable hardware, enable MIG on the nodes and run a few small jobs to validate fractional allocation and observability.
- Add Prometheus + NVIDIA DCGM exporter to your cluster and start tracking GPU utilization against Airflow task runs.
- Iterate: if GPUs sit mostly idle, evaluate batching, MIG, or a GPU-sharing scheduler.
Conclusion
Running ML training and inference as Airflow tasks on Kubernetes is now a practical and production-ready approach — but getting efficient, secure, and cost-effective GPU usage requires a few architectural decisions. Use node pools and taints to isolate GPUs, prefer KubernetesPodOperator or KubernetesExecutor depending on control needs, enable MIG or scheduler-based sharing where available, and instrument GPU usage through monitoring. Recent improvements in Airflow’s Kubernetes integration and the maturing GPU ecosystem make this a great moment to modernize your MLOps pipelines. Start small, measure utilization, and evolve toward fractional GPU sharing and autoscaled GPU pools as your needs grow. (airflow.apache.org)
Further reading and references
- Airflow KubernetesExecutor and providers docs (Kubernetes integration). (airflow.apache.org)
- Airflow Helm chart and announcements. (airflow.apache.org)
- Azure AKS GPU best practices. (learn.microsoft.com)
- GKE guidance on GPU sharing strategies (MIG and timesharing). (cloud.google.com)
- Articles and community posts on GPU sharing and MIG adoption. (medium.com)
If you’d like, I can:
- Draft a sample Airflow DAG with GPU tasks for your environment (GKE, AKS, or EKS), or
- Produce a checklist/Helm values snippet to deploy Airflow with secure multi-namespace settings and GPU node pools. Which would help you more right now?