A GitOps Blueprint to Unite DevOps and MLOps for LLM and ML Services

If your software delivery rhythm feels like two bands playing different tempos—one for app code (DevOps) and one for models (MLOps)—you’re not alone. The good news: a handful of recent building blocks make it practical to conduct both with one baton. In the last year, KServe added OpenAI‑compatible endpoints for LLMs, Evidently shipped a ready‑to‑use GitHub Action to gate AI quality in CI, and model registries like MLflow matured aliasing and promotion flows. Put them together and you can ship AI the same way you ship microservices: declarative, test‑gated, and versioned from Git. (kserve.github.io)

This article gives you a practical, vendor‑neutral blueprint to unify DevOps and MLOps using:

- Evidently’s GitHub Action to gate model and LLM quality in CI
- MLflow’s Model Registry, with aliases for gated promotion and fast rollback
- Argo CD to deliver serving manifests the GitOps way
- KServe to serve both classic ML and LLMs behind standard, OpenAI‑compatible endpoints
- optionally, Vertex AI Pipelines with Cloud Build for managed continuous training

Why now? Because the interfaces and tooling finally line up. KServe speaks familiar OpenAI endpoints for LLM workloads, which means your application code—and even SDKs—can point at your own cluster the same way they would at a hosted LLM API. That removes a huge integration wrinkle. Meanwhile, GitOps tools like Argo CD handle the same “desired state in Git, reconciled to clusters” pattern you already use for microservices. (kserve.github.io)

The target architecture at a glance

App and model changes both start in Git. CI runs Evidently quality gates on every PR; models that pass are registered in MLflow and tagged with an alias such as candidate. Argo CD watches a manifests repo and reconciles KServe InferenceService resources into the cluster, where LLMs are exposed through OpenAI‑compatible endpoints and classic predictive models through the V1/V2 protocols. One Git history, one reconciliation loop, one serving surface.

Under the hood, KServe also supports the Open Inference Protocol (V2) for classic predictive models and can front different runtimes (Hugging Face, vLLM, Triton, etc.). If you need many models per cluster with high cache‑efficiency, ModelMesh adds the “router + on‑demand loading” layer. (github.com)

Step‑by‑step blueprint

1) Treat model quality like code quality (gates in CI)

Stop merging prompt or model changes without testing their behavior. Evidently’s new GitHub Action runs evaluation suites on every PR or commit and fails the build if your thresholds are missed. It wraps the Evidently CLI and can store artifacts locally or in Evidently Cloud for trending. Think of it as unit tests for model behavior, from classification metrics to “LLM‑as‑a‑judge” checks. (github.com)

Example GitHub workflow snippet to run a drift or LLM quality suite:

name: ci-ai-quality
on:
  pull_request:
  push:
    branches: [main]

jobs:
  eval:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      statuses: write
    steps:
      - uses: actions/checkout@v4

      # Run Evidently report/tests; fail CI if tests fail
      - name: Run AI quality checks
        uses: evidentlyai/evidently-report-action@v1
        with:
          config_path: "evidently_config.json"
          input_path: "data/current.csv"
          reference_path: "data/reference.csv"
          output: "reports/run-${{ github.run_id }}"  # per-run output folder (run_id assumed)
          test_summary: "true"
          upload_artifacts: "true"

This one small gate builds the habit: no PR merges unless the model’s measured quality is at least as good as yesterday. (github.com)
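
If you want to tune thresholds locally before wiring them into CI, the same gate can be expressed with Evidently’s Python API. A minimal sketch, assuming the classic TestSuite interface (Evidently 0.4.x) and a dataset whose columns follow the default target/prediction naming; the specific tests and thresholds are illustrative:

import sys

import pandas as pd
from evidently.test_suite import TestSuite
from evidently.tests import TestAccuracyScore, TestShareOfDriftedColumns

# Reference = data/predictions you already accepted; current = what this PR produces
reference = pd.read_csv("data/reference.csv")
current = pd.read_csv("data/current.csv")

suite = TestSuite(tests=[
    TestShareOfDriftedColumns(lt=0.3),  # fail if more than 30% of columns drift
    TestAccuracyScore(gte=0.85),        # fail if accuracy drops below 0.85
])
suite.run(reference_data=reference, current_data=current)
suite.save_html("local-eval.html")  # inspect the same kind of report the CI artifact holds

# Non-zero exit makes a wrapping script (or CI step) fail when any test fails
results = suite.as_dict()
sys.exit(0 if all(t["status"] == "SUCCESS" for t in results["tests"]) else 1)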

2) Register models and promote by alias, not by guesswork

When tests pass, register the model and use MLflow aliases like candidate and champion. Your serving layer can point at an alias and you “promote” by moving the pointer—no YAML rewrites needed if your serving runtime loads by model URI. Aliases are ideal for gated promotions and fast rollbacks. (mlflow.org)

A tiny Python sketch:

import mlflow
from mlflow import MlflowClient

mlflow.set_tracking_uri("http://mlflow.yourdomain")
client = MlflowClient()

name = "demand-forecaster"

# Register the model artifact logged by the training run
registered = mlflow.register_model(
    model_uri="runs:/<run_id>/model",
    name=name,
)

# Point the "candidate" alias at the new version for staging checks
client.set_registered_model_alias(name, "candidate", registered.version)

If staging goes well, repoint the champion alias to the new version and your production traffic follows. (mlflow.org)
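
The promotion itself is just another pointer move, and rollback is the same call with an older version number. A small sketch continuing the example above (the models:/<name>@<alias> URI lets consumers always load whatever is currently promoted):

import mlflow
from mlflow import MlflowClient

mlflow.set_tracking_uri("http://mlflow.yourdomain")
client = MlflowClient()

name = "demand-forecaster"

# Promote: whatever currently holds the "candidate" alias becomes "champion"
candidate = client.get_model_version_by_alias(name, "candidate")
client.set_registered_model_alias(name, "champion", candidate.version)

# Roll back later, if needed, by repointing "champion" at a previous version:
# client.set_registered_model_alias(name, "champion", "<previous_version>")

# Consumers load by alias, so they follow the pointer automatically
model = mlflow.pyfunc.load_model(f"models:/{name}@champion")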

3) Deliver like you do everything else: GitOps with Argo CD

Keep your KServe InferenceService manifests under version control. Argo CD watches that repo and keeps clusters reconciled. Rollbacks become “git revert,” drift is visible, and promotion is a PR and a human approval, not a kubectl command at midnight. (argo-cd.readthedocs.io)

Minimal Argo CD Application:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: llm-qa
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/yourorg/ai-manifests
    targetRevision: main
    path: kserve/llm-qa
  destination:
    server: https://kubernetes.default.svc
    namespace: ml-inference
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

4) Serve via a standard API: KServe with OpenAI‑compatible endpoints

Here’s a bare‑bones KServe service for an LLM using the Hugging Face runtime; it exposes familiar OpenAI endpoints like /v1/chat/completions, served under the /openai prefix as in the client example below:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: huggingface-llama3
  namespace: ml-inference
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      args:
        - --model_name=llama3
        - --model_id=meta-llama/meta-llama-3-8b-instruct
      resources:
        limits:
          cpu: "6"
          memory: "24Gi"
          nvidia.com/gpu: "1"
        requests:
          cpu: "6"
          memory: "24Gi"
          nvidia.com/gpu: "1"

Once deployed, your app can use the normal OpenAI SDK—just change base_url:

from openai import OpenAI

# Placeholder host: the external URL of the InferenceService
# (see `kubectl get inferenceservice huggingface-llama3 -n ml-inference`)
SERVICE_HOSTNAME = "huggingface-llama3.ml-inference.example.com"

client = OpenAI(base_url=f"http://{SERVICE_HOSTNAME}/openai/v1", api_key="empty")
resp = client.chat.completions.create(
  model="llama3",
  messages=[{"role":"user","content":"Quick sanity check: 2+2?"}],
)
print(resp.choices[0].message.content)

KServe’s data plane supports these OpenAI‑style endpoints alongside V1/V2 predictive protocols, so the same platform can serve both classic ML and LLMs. (kserve.github.io)
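
For the classic‑ML side, a V2 (Open Inference Protocol) call is just as plain. A minimal sketch against a hypothetical predictive InferenceService named demand-forecaster; the host, input name, and tensor values are placeholders:

import requests

# Placeholder: external host of the predictive InferenceService
HOST = "http://demand-forecaster.ml-inference.example.com"

# Open Inference Protocol V2: POST /v2/models/<model-name>/infer
payload = {
    "inputs": [
        {
            "name": "input-0",
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [[6.8, 2.8, 4.8, 1.4]],
        }
    ]
}

resp = requests.post(f"{HOST}/v2/models/demand-forecaster/infer", json=payload)
resp.raise_for_status()
print(resp.json()["outputs"])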

Tip: For high‑throughput LLM serving, consider the vLLM backend; KServe supports that path too. (kserve.github.io)

5) Optional: managed CT with Vertex AI Pipelines + Cloud Build

Prefer not to host your own training orchestrator? Vertex AI Pipelines provide multi‑step training workflows (preprocess → train → eval → deploy) and can be triggered by Cloud Build for CI/CD and continuous training. You can still log to MLflow and feed the same GitOps loop on the serving side. (cloud.google.com)
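
If you take that route, the Cloud Build trigger can be a single step that submits the compiled pipeline. A sketch of that submission script, assuming the google-cloud-aiplatform SDK, a compiled pipeline spec in pipeline.json, and placeholder project, region, and bucket values:

from google.cloud import aiplatform

# Placeholders: your project, region, and artifact bucket
aiplatform.init(
    project="your-gcp-project",
    location="us-central1",
    staging_bucket="gs://your-pipeline-artifacts",
)

job = aiplatform.PipelineJob(
    display_name="demand-forecaster-ct",
    template_path="pipeline.json",              # compiled KFP/Vertex pipeline spec
    pipeline_root="gs://your-pipeline-artifacts/runs",
    parameter_values={"train_data_uri": "gs://your-bucket/train.csv"},
)

# Fire and forget from Cloud Build; the pipeline handles preprocess -> train -> eval -> deploy
job.submit()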

Putting it together: a practical workflow

1. A PR changes a prompt, feature code, or training config.
2. CI runs the Evidently quality gate; if thresholds are not met, the merge is blocked.
3. On merge, the pipeline retrains or re‑evaluates, registers the model in MLflow, and tags it candidate.
4. Promotion is another reviewed PR: repoint the champion alias and/or update the KServe manifest.
5. Argo CD syncs the change to the cluster; the application keeps calling the same OpenAI‑style (or V2) endpoint.

Progressive delivery and scale

KServe supports canary traffic splitting on an InferenceService, so you can route a small percentage of requests to a new model version, watch the metrics, and only then promote it to 100%. If you run many models per cluster, ModelMesh (mentioned above) adds the routing and on‑demand loading that keep you from paying for idle replicas of rarely used models.

What about foundation models and vendor runtimes?

If you’re packaging vendor‑optimized inference services, treat them the same way: Helm charts in Git, Argo CD for reconciliation. NVIDIA’s NIM microservices ship official Helm charts and an Operator for Kubernetes. You can manage those charts declaratively with Argo CD, and in some scenarios even front them through KServe for a uniform API surface. (docs.nvidia.com)

Guardrails: the boring but important bits

Pin action and chart versions, keep CI permissions minimal (the workflow above needs only contents: read and statuses: write), set resource requests and limits on inference pods, require human approval on promotion PRs, and rehearse the rollback path (git revert plus an Argo CD sync) before you need it under pressure.

A minimal starter repo layout

The paths referenced in the examples above fit in one small repository (exact names are up to you):

.github/workflows/ci-ai-quality.yml    # Evidently quality gate
evidently_config.json                  # metrics, tests, and thresholds
data/reference.csv                     # reference dataset for the checks
data/current.csv                       # current dataset produced per run
kserve/llm-qa/inferenceservice.yaml    # KServe InferenceService manifest
argocd/llm-qa-application.yaml         # Argo CD Application pointing at kserve/llm-qa

With a couple hundred lines of YAML and a small test dataset, you’ll have a pipeline where:

- every PR runs the Evidently quality gate and fails fast if model behavior regresses,
- models that pass are registered in MLflow and tagged candidate,
- promotion is a reviewed Git change that repoints the champion alias or updates a manifest,
- Argo CD keeps the KServe InferenceService in sync with what Git declares, and
- applications call one standard endpoint, whether the workload is an LLM or a classic model.

That’s Unified DevOps + MLOps in practice—not a buzzword, just the same disciplined software delivery you already use, applied to AI.

Further reading and docs used here:

- KServe docs: OpenAI‑compatible endpoints, Hugging Face and vLLM runtimes, Open Inference Protocol (kserve.github.io)
- Evidently CLI and GitHub Action (github.com/evidentlyai)
- MLflow Model Registry aliases (mlflow.org)
- Argo CD declarative GitOps documentation (argo-cd.readthedocs.io)
- Vertex AI Pipelines and Cloud Build (cloud.google.com)
- NVIDIA NIM Helm charts and Operator (docs.nvidia.com)

With these pieces in place, your delivery soundtrack goes from chaotic jam session to a steady groove—one pipeline, one Git history, and clear gates from idea to inference.