Top 7 Kubernetes Best Practices Every DevOps Engineer Must Know in 2026

The Reality of Running Kubernetes in Production

There's a gap between learning Kubernetes and running it in production that nobody warns you about. The tutorials teach you how to deploy pods. The documentation explains the API objects. But what actually keeps production clusters stable — at 3am, during a traffic spike, three days before a major release — is a set of operational disciplines that only become obvious after you've been burned by their absence.

I've been running Kubernetes in production since 2018, across environments ranging from 5-node clusters to 500-node multi-region deployments. The seven practices in this guide aren't theoretical — each one is something I've either implemented from scratch or had to retrofit onto a cluster that was in trouble because it was missing.

One important note on scope: this guide focuses on operational and deployment practices, not security. Security hardening — RBAC, network policies, pod security standards, supply chain security — is a topic that deserves its own comprehensive treatment. The practices here are about reliability, scalability, and operational excellence.

Let's get into it.

Practice 1: GitOps — Every Change Through Version Control

The single highest-leverage practice for operational Kubernetes is GitOps: the principle that the desired state of your cluster is defined in a Git repository, and an automated operator continuously reconciles the actual state to match it. Manual kubectl apply commands against production should be a thing of the past.

The benefits compound. When every change goes through Git, you have a complete audit trail of who changed what and when. Rollbacks are git reverts. Disaster recovery is a git clone and a cluster bootstrap. New environments are branches or Kustomize overlays.
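As a rough sketch of the overlay approach, a production overlay might look like the following. The directory layout, the shared base path, the image name, the tag, and the replica count are all assumptions made for illustration.

```yaml
# apps/payments-api/overlays/production/kustomization.yaml
# (hypothetical layout; assumes a shared base at apps/payments-api/base)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: team-payments-prod   # every resource in this overlay lands here
resources:
- ../../base
images:
- name: myorg/payments-api      # assumed image name
  newTag: "1.4.2"               # assumed tag; pinned rather than floating
replicas:
- name: payments-api
  count: 6                      # assumed production-only replica count
```

A staging overlay would sit next to it and differ only in the namespace, tag, and replica count, so a new environment becomes a small, reviewable diff.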

ArgoCD: Declarative GitOps for Kubernetes

ArgoCD is the most widely deployed GitOps operator for Kubernetes. It watches a Git repository and continuously syncs cluster state to match the manifests in the repo. When there's drift — someone manually edited a ConfigMap, a controller mutated a Deployment spec — ArgoCD flags it and (optionally) auto-remediates.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-api
  namespace: argocd
spec:
  project: production
  source:
    repoURL: https://github.com/myorg/k8s-manifests
    targetRevision: HEAD
    path: apps/payments-api/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: team-payments-prod
  syncPolicy:
    automated:
      prune: true         # Delete resources removed from git
      selfHeal: true      # Auto-remediate configuration drift
    syncOptions:
    - CreateNamespace=true
    - PrunePropagationPolicy=foreground
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m

Key ArgoCD operational patterns:

Use Projects to scope which repos and clusters each ArgoCD Application can target
Enable automated sync with self-heal for production to prevent drift accumulation
Use Sync Waves and Sync Hooks to control deployment ordering when applications have dependencies
Configure health checks for custom CRDs so ArgoCD accurately reports application health
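The first pattern can be sketched with an AppProject. This one assumes the production project should only pull from the manifests repo used above and only deploy into payments namespaces on the local cluster; the description and the namespace glob are illustrative choices.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: production
  namespace: argocd
spec:
  description: Production workloads (illustrative scope)
  # Applications in this project may only use this repository
  sourceRepos:
  - https://github.com/myorg/k8s-manifests
  # ...and may only deploy to matching namespaces on this cluster
  destinations:
  - server: https://kubernetes.default.svc
    namespace: "team-payments-*"
  # Among cluster-scoped kinds, permit only Namespace objects
  clusterResourceWhitelist:
  - group: ""
    kind: Namespace
```

An Application that points at a different repo or namespace is rejected by ArgoCD before anything is synced, which is the guardrail you want in a shared instance.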

Flux: The GitOps Toolkit Alternative

Flux v2 takes a more modular approach — it's a toolkit of controllers (Source, Kustomize, Helm, Notification) that you compose. This gives you more flexibility but requires more assembly. Flux shines in multi-tenant environments and when you need fine-grained control over how different parts of your cluster are managed.
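For comparison, here is a minimal sketch of the same payments-api deployment expressed with Flux's Source and Kustomize controllers. The flux-system namespace, the main branch, and the intervals are assumptions; adjust them to your setup.

```yaml
# Source controller: where the manifests live
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: k8s-manifests
  namespace: flux-system
spec:
  interval: 1m                  # how often to poll the repo (assumed)
  url: https://github.com/myorg/k8s-manifests
  ref:
    branch: main                # assumed branch name
---
# Kustomize controller: what to apply from that source
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: payments-api
  namespace: flux-system
spec:
  interval: 10m                 # reconcile and correct drift on this cadence
  sourceRef:
    kind: GitRepository
    name: k8s-manifests
  path: ./apps/payments-api/overlays/production
  prune: true                   # delete resources removed from git
  targetNamespace: team-payments-prod
```

The split into two objects is the modularity in practice: several Kustomizations can share one GitRepository, and each can be owned by a different team.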

GitOps Tool Comparison

Feature	ArgoCD	Flux v2	Jenkins X
Architecture	Monolithic controller + UI	Modular toolkit	Opinionated CI/CD platform
UI/Dashboard	Rich, built-in	CLI-first, Weave GitOps UI addon	Built-in
Multi-cluster	Yes, single ArgoCD manages multiple	Yes, per-cluster or hub-spoke	Limited
Drift detection	Yes, with auto-heal option	Yes	Yes
Helm support	Native Helm app type	HelmRelease controller	Built-in (Jenkins pipeline)
CNCF graduated	Yes (2022)	Yes (2022)	No (CDF project)
Learning curve	Moderate	Moderate (more components)	Steep
Best for	Teams wanting visibility + control	Multi-tenant, modular setups	Jenkins-native teams

My recommendation for most teams: start with ArgoCD. The UI alone pays for the learning curve — being able to see every Kubernetes resource's sync status, drift state, and health in a single dashboard prevents hours of debugging. Migrate to Flux if you need the additional modularity at scale.

Practice 2: Namespace + RBAC Hierarchy Design

Kubernetes namespaces are a weak isolation boundary — they're not security zones — but they are an essential organizational boundary. Designing your namespace hierarchy correctly from the start prevents a sprawl problem that's painful to fix after the fact.

The principle I follow is: one namespace per team per environment, with a consistent naming convention. For a company with teams named payments, inventory, and notifications running production, staging, and dev environments:

team-payments-prod
team-payments-staging
team-payments-dev
team-inventory-prod
team-inventory-staging
team-inventory-dev
team-notifications-prod
team-notifications-staging
team-notifications-dev
infra-monitoring     # Platform team namespaces
infra-logging
infra-ingress

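The prefix convention (team-, infra-) makes namespaces easy to select in automation. Kubernetes RBAC itself cannot match namespaces by prefix or wildcard, so generate one RoleBinding per namespace from a template (for example with Kustomize or Helm) keyed on the prefix. That keeps access control consistent without hand-maintaining each binding.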

RBAC Hierarchy

A practical three-tier RBAC model for engineering organizations:

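Platform Engineer: Admin rights in the platform namespaces (infra-*), plus a ClusterRole covering the cluster-scoped resources the platform team owns (CRDs, ClusterRoles, StorageClasses). Does not modify team namespaces directly. Avoid handing out the built-in cluster-admin role for day-to-day work: bound cluster-wide, it cannot be limited to a subset of namespaces.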

Team Lead / Senior Engineer: Admin within their team's namespaces. Can create and modify Deployments, Services, ConfigMaps, Secrets. Cannot create ResourceQuotas (platform team owns those).

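Developer: Read access to most resources in their team's namespaces, plus create and update on Deployments and access to logs. Cannot delete namespaced resources or read Secrets directly. Note that the built-in edit ClusterRole is broader than this tier: it allows deleting most namespaced objects and reading Secrets, so enforcing these restrictions strictly requires a custom role.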

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-payments-developers
  namespace: team-payments-prod
subjects:
- kind: Group
  name: "payments-engineers"  # Maps to SSO group
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: edit
  apiGroup: rbac.authorization.k8s.io

Operational insight: Bind RBAC to SSO groups, not individual users. When someone joins or leaves a team, you update the SSO group — not dozens of RoleBindings across all namespaces. This also enables automated deprovisioning when employees leave the company.
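The Developer tier described above is narrower than the built-in edit role, so it needs a custom role. A minimal sketch follows, assuming a hypothetical role name team-developer and an illustrative resource list; extend it to the kinds your teams actually use.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: team-developer          # hypothetical name
rules:
# Read-only view of common workload resources (Secrets deliberately absent)
- apiGroups: ["", "apps", "batch", "autoscaling"]
  resources:
  - pods
  - services
  - configmaps
  - deployments
  - replicasets
  - statefulsets
  - jobs
  - cronjobs
  - horizontalpodautoscalers
  verbs: ["get", "list", "watch"]
# Log access for debugging
- apiGroups: [""]
  resources: ["pods/log"]
  verbs: ["get"]
# Create and update Deployments, but no delete verb anywhere
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["create", "update", "patch"]
```

To use it, point the roleRef of the per-namespace RoleBinding at team-developer instead of edit. Because it is bound through a RoleBinding, the permissions still apply only inside that namespace.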

Practice 3: Liveness, Readiness, and Startup Probes Done Right

Probes are one of the most commonly misconfigured Kubernetes features. Incorrectly configured probes cause cascading pod restarts, traffic sent to unhealthy pods, and startup failures that look like application bugs. Getting them right is not optional for production workloads.

The three probe types serve distinct purposes:

Startup Probe: Delays liveness and readiness checks until the application has finished initializing. Without this, a slow-starting application will be killed by liveness checks before it finishes loading. Set failureThreshold × periodSeconds high enough to cover your worst-case cold start time.

Readiness Probe: Controls when traffic is sent to the pod. When readiness fails, the pod is removed from Service endpoints and traffic stops. Use this to signal "I'm temporarily unable to handle requests" (cache warming, connection pool saturation). Be careful about folding downstream dependencies into this check; Mistake 2 below explains why.

Liveness Probe: Controls when the pod is restarted. When liveness fails, Kubernetes kills and restarts the pod. This should only fail when the application is truly stuck in an unrecoverable state.

spec:
  containers:
  - name: payments-api
    startupProbe:
      httpGet:
        path: /health/startup
        port: 8080
      failureThreshold: 30    # 30 * 10s = 5 minutes for startup
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /health/ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
      failureThreshold: 3     # Unhealthy after 15s
      successThreshold: 1
    livenessProbe:
      httpGet:
        path: /health/live
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 15
      failureThreshold: 5     # Only restart after 75s of failure
      timeoutSeconds: 5

Three common probe mistakes to avoid:

Mistake 1: Liveness and readiness pointing at the same endpoint. If your readiness probe fails (correct behavior when the pod is overloaded), and your liveness probe is the same endpoint, Kubernetes will restart pods that are merely overloaded. This turns a traffic problem into a crash loop.

Mistake 2: Readiness probes that check external dependencies. If your readiness probe fails when your database is down, and your database goes down, all your pods become unready simultaneously. You now have a cascading failure instead of degraded operation. Readiness should reflect "can I handle requests right now" — not "are all my dependencies healthy."

Mistake 3: No startup probe for slow-starting applications. JVM applications, Python applications with large model loading, and services that pre-warm caches all need startup probes. Without them, liveness probes trigger during initialization and create a restart loop that prevents the pod from ever becoming ready.

Practice 4: Resource Requests and Limits — A Scientific Approach

Setting resource requests and limits is not guesswork. There's a repeatable, data-driven process that produces correct values without the cargo-culting that most teams default to.

The process has three steps: observe, analyze, and set with margin.

Step 1: Observe. Run VPA in recommendation mode for 14 days. Use Prometheus to capture p50, p95, and p99 CPU and memory usage. Don't rely on averages — peak usage is what matters for limits, and typical usage is what matters for requests.
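Recommendation mode corresponds to a VerticalPodAutoscaler with updateMode set to "Off". A minimal sketch, assuming the VPA components are already installed in the cluster (they are not part of core Kubernetes) and that the Deployment is named payments-api:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payments-api-vpa        # assumed name
  namespace: team-payments-prod
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-api
  updatePolicy:
    updateMode: "Off"           # compute recommendations only; never evict or resize pods
```

The recommendations then appear in the object's status, which you can read with kubectl describe vpa payments-api-vpa.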

# Prometheus queries for resource sizing analysis

# p95 CPU usage over 14 days (per container)
quantile_over_time(0.95,
  rate(container_cpu_usage_seconds_total{
    container="payments-api",
    namespace="team-payments-prod"
  }[5m])[14d:5m]
)

# p99 memory usage over 14 days
quantile_over_time(0.99,
  container_memory_working_set_bytes{
    container="payments-api",
    namespace="team-payments-prod"
  }[14d:5m]
)

Step 2: Analyze. Compare VPA recommendations against your Prometheus data. If they diverge significantly, investigate why — VPA may not have observed a traffic spike that Prometheus captured.

Step 3: Set with margin. Use this formula:

CPU request = p50 CPU usage × 1.2
CPU limit = p99 CPU usage × 1.5 (or omit CPU limits entirely — CPU throttling is often worse than allowing burst)
Memory request = p95 memory usage × 1.1
Memory limit = p99 memory usage × 1.25 (never omit memory limits)

The asymmetry between CPU and memory limits is intentional. CPU is compressible: a pod that exceeds its CPU limit gets throttled but keeps running. Memory is not compressible: a container that exceeds its memory limit gets OOMKilled. When a service keeps getting OOMKilled, the first response is usually to raise the memory limit so it stays up, then to check whether a leak is driving the growth, since a leak will eventually exhaust any limit.
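As a worked example of the formula, assume the measurements came out as p50 CPU = 200m, p99 CPU = 600m, p95 memory = 400Mi, and p99 memory = 480Mi. These numbers are made up for illustration, not taken from a real service.

```yaml
# Container resources derived from the assumed measurements above
resources:
  requests:
    cpu: 240m        # p50 200m x 1.2
    memory: 440Mi    # p95 400Mi x 1.1
  limits:
    cpu: 900m        # p99 600m x 1.5; drop this line to allow unthrottled burst
    memory: 600Mi    # p99 480Mi x 1.25; always keep a memory limit
```

Note how far apart the CPU request and limit end up: the request reflects typical load for scheduling, while the limit only caps the rare peak.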

Practice 5: PodDisruptionBudget and Topology Spread Constraints

High availability in Kubernetes doesn't happen automatically just because you have multiple replicas. You need two additional mechanisms: PodDisruptionBudgets to protect against voluntary disruptions, and Topology Spread Constraints to ensure replicas are actually distributed across failure domains.

PodDisruptionBudgets

A PDB defines the maximum number of pods that can be simultaneously unavailable during voluntary disruptions (node drains, cluster upgrades, maintenance). Without PDBs, a node drain can take down all replicas of a service if they happen to land on the same node.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payments-api-pdb
  namespace: team-payments-prod
spec:
  minAvailable: 2  # At least 2 pods must remain available
  selector:
    matchLabels:
      app: payments-api
---
# Alternative: percentage-based
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: notifications-api-pdb
spec:
  maxUnavailable: "25%"  # At most 25% can be down at once
  selector:
    matchLabels:
      app: notifications-api

For critical tier-1 services, use minAvailable: 2 with at least 3 replicas. For less critical services, maxUnavailable: 25% with 4+ replicas gives the cluster maintenance flexibility without complete service disruption.

Topology Spread Constraints

PDBs protect against disruption but don't control where pods are scheduled. Topology Spread Constraints ensure your replicas are distributed across zones, nodes, or other topology keys:

spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: payments-api
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: payments-api

The first constraint (zone) uses DoNotSchedule — if we can't spread across zones, don't schedule at all. The second constraint (hostname) uses ScheduleAnyway — prefer to spread across nodes, but don't block scheduling if you can't. This combination gives you zone-level HA with best-effort node distribution.

Practice 6: HPA + KEDA — Layered Autoscaling

Horizontal autoscaling in 2026 means layering HPA and KEDA. HPA handles CPU/memory-based scaling for synchronous request-driven workloads. KEDA handles event-driven scaling — queue depth, Kafka consumer lag, cron schedules, custom metrics — and enables scale-to-zero.

The key insight for production autoscaling is that scale-up speed and scale-down conservatism are not symmetric. You want to scale up fast (respond quickly to traffic spikes) and scale down slowly (avoid thrashing and premature decommissioning that causes latency spikes as new pods warm up).

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-gateway-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-gateway
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
      - type: Pods
        value: 4
        periodSeconds: 15
      - type: Percent
        value: 100
        periodSeconds: 15
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Pods
        value: 2
        periodSeconds: 60

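For event-driven workloads, use a KEDA ScaledObject instead of a hand-written HPA. KEDA creates and manages an HPA for the target behind the scenes, so don't point a separate HPA at the same workload: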

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: notification-worker-scaler
spec:
  scaleTargetRef:
    name: notification-worker
  minReplicaCount: 0
  maxReplicaCount: 30
  pollingInterval: 10
  cooldownPeriod: 120
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 180
  triggers:
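  - type: rabbitmq
    metadata:
      host: amqp://rabbitmq.infra-messaging:5672
      queueName: notifications
      mode: QueueLength       # Scale on backlog size (replaces the deprecated queueLength field)
      value: "10"             # Target messages per replica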
  - type: cron
    metadata:
      timezone: America/New_York
      start: "0 8 * * 1-5"   # Business hours baseline
      end: "0 20 * * 1-5"
      desiredReplicas: "3"

The cron trigger holds the service at a minimum of 3 replicas from 08:00 to 20:00 on weekdays, and the RabbitMQ trigger scales it further on top of that baseline. Without the cron trigger, the first wave of morning traffic would hit a worker that is scaled to zero and wait through cold starts while the autoscaler catches up. If business hours begin at 08:00, set the start a little earlier so the pods are warm in time.

Autoscaling anti-pattern to avoid: Setting minReplicas: 1 for production services. With a single replica, any pod restart or eviction means zero serving pods until the replacement is ready, which is a complete service outage. Production tier-1 services should have minReplicas: 2 at minimum, and 3 if they are paired with the minAvailable: 2 PodDisruptionBudget from Practice 5, because a PDB that requires every replica to stay up blocks node drains.

Practice 7: Observability Stack — Prometheus, Grafana, and OpenTelemetry

You cannot debug what you cannot observe. A production Kubernetes cluster without comprehensive metrics, logs, and traces is flying blind. The good news is that the CNCF observability stack has matured to the point where you can stand up a production-grade observability platform in a day.

The Core Stack

Prometheus + Alertmanager: Metrics collection, storage, and alerting. Use the kube-prometheus-stack Helm chart — it deploys Prometheus, Alertmanager, Grafana, and all the necessary ServiceMonitors and recording rules in one chart.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-prometheus-stack \
  prometheus-community/kube-prometheus-stack \
  --namespace infra-monitoring \
  --create-namespace \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=100Gi \
  --set alertmanager.config.global.slack_api_url="$SLACK_WEBHOOK_URL"

Grafana: Dashboards and visualization. The kube-prometheus-stack includes pre-built dashboards for cluster health, namespace resource usage, pod lifecycle, and network metrics. Add custom dashboards for your application-specific metrics.

OpenTelemetry Collector: The vendor-neutral standard for trace, metric, and log collection. Deploy the OpenTelemetry Operator and a DaemonSet collector. For supported runtimes, the operator can inject auto-instrumentation into pods without code changes. Injection is opt-in: a pod is instrumented only when it carries an annotation that selects an Instrumentation resource such as this one:

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: auto-instrumentation
  namespace: team-payments-prod
spec:
  exporter:
    endpoint: http://otel-collector.infra-monitoring:4317
  propagators:
    - tracecontext
    - baggage
    - b3
  sampler:
    type: parentbased_traceidratio
    argument: "0.1"  # 10% sampling rate in production
  # Pin these images to specific version tags in production; :latest is shown only for brevity
  java:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest
  python:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:latest
  nodejs:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:latest
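Defining the Instrumentation resource does not instrument anything by itself; each workload opts in through a pod annotation. A sketch of the relevant fragment, assuming payments-api is a Java service (use the inject-python or inject-nodejs annotation for those runtimes):

```yaml
# Fragment of the payments-api Deployment; only the annotation is new
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
  namespace: team-payments-prod
spec:
  template:
    metadata:
      annotations:
        # "true" selects the Instrumentation resource in the pod's own namespace
        instrumentation.opentelemetry.io/inject-java: "true"
```

The annotation belongs on the pod template, not on the Deployment's own metadata, because the operator's webhook acts when pods are created.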

The Four Golden Signals as Kubernetes Alerts

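Google's SRE book defines four golden signals: latency, traffic, errors, and saturation. Every production service should have alerts on all four. The rules below cover errors, latency, and saturation; a traffic alert depends on each service's normal request volume, so its threshold has to be set per service: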

groups:
- name: golden-signals
  rules:
  - alert: HighErrorRate
    expr: |
      sum(rate(http_requests_total{status=~"5.."}[5m]))
      /
      sum(rate(http_requests_total[5m])) > 0.01
    for: 5m
    annotations:
      summary: "Error rate above 1% for 5 minutes"

  - alert: HighLatency
    expr: |
      histogram_quantile(0.99,
        sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)
      ) > 1.0
    for: 5m
    annotations:
      summary: "p99 latency above 1s"

  - alert: PodCPUSaturation
    expr: |
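      # Aggregate both sides to shared labels; cAdvisor and kube-state-metrics
      # label sets differ, so a bare division would match nothing
      sum by (namespace, pod, container) (
        rate(container_cpu_usage_seconds_total{container!=""}[5m])
      )
      /
      sum by (namespace, pod, container) (
        kube_pod_container_resource_limits{resource="cpu"}
      ) > 0.9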
    for: 10m
    annotations:
      summary: "Pod CPU usage above 90% of limit"
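Traffic is the one golden signal without a rule in this group. One possible sketch compares the current request rate with the same time on the previous day. It assumes http_requests_total carries a service label, and the 50% threshold is an arbitrary starting point rather than a recommendation.

```yaml
groups:
- name: golden-signals-traffic
  rules:
  - alert: TrafficDrop
    expr: |
      sum(rate(http_requests_total[5m])) by (service)
      <
      0.5 * sum(rate(http_requests_total[5m] offset 1d)) by (service)
    for: 10m
    annotations:
      summary: "Request rate below 50% of the same time yesterday"
```

A relative comparison avoids hardcoding a requests-per-second number per service, at the cost of misfiring around holidays or other days with an unusual traffic shape.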

Deployment Strategies: Rolling Updates, Canary, and Blue-Green

Kubernetes provides rolling updates natively, but production deployments often require more sophisticated strategies. Understanding when to use each approach is a core DevOps competency.

Rolling Update (default): Kubernetes gradually replaces old pods with new ones. Zero-downtime for stateless services. The risk: if the new version has a bug, it's deployed to 100% of traffic before you might notice. Good for low-risk, frequently-deployed services where fast rollback is sufficient.

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 2         # Max pods above desired count during update
    maxUnavailable: 0   # Never reduce below desired count (true zero-downtime)

Canary Deployment: Route a small percentage of traffic (5-10%) to the new version and monitor metrics before expanding. Requires either Istio/Linkerd for traffic splitting or Argo Rollouts for more sophisticated control. Ideal for high-risk changes where you want production validation before full rollout.

Blue-Green Deployment: Maintain two complete deployment sets (blue=current, green=new). Switch traffic at the load balancer level. Provides instant rollback — just switch back to blue. The cost: requires 2x the compute while both environments are running. Suitable for major version changes where rollback speed is critical.

Argo Rollouts provides a Kubernetes-native implementation of both canary and blue-green with automated analysis (checking Prometheus metrics to decide whether to proceed or rollback):

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: payments-api
spec:
  replicas: 10
  strategy:
    canary:
      steps:
      - setWeight: 10
      - pause: {duration: 5m}
      - analysis:
          templates:
          - templateName: success-rate
      - setWeight: 50
      - pause: {duration: 5m}
      - setWeight: 100
      canaryMetadata:
        labels:
          deployment: canary
      stableMetadata:
        labels:
          deployment: stable
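The Rollout references an AnalysisTemplate named success-rate without showing it. A minimal sketch of what that template could contain follows. The Prometheus address, the 99% threshold, and the assumption that request metrics carry the canary's deployment label (which depends on your scrape relabeling) are all illustrative.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  metrics:
  - name: success-rate
    interval: 1m                # take one measurement per minute
    count: 5                    # five measurements in total
    successCondition: result[0] >= 0.99
    failureLimit: 2             # more than two failed measurements aborts the rollout
    provider:
      prometheus:
        address: http://prometheus.infra-monitoring:9090   # assumed service address
        query: |
          sum(rate(http_requests_total{status!~"5..",deployment="canary"}[5m]))
          /
          sum(rate(http_requests_total{deployment="canary"}[5m]))
```

When the analysis fails, Argo Rollouts aborts the update and shifts traffic back to the stable version without waiting for a human.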

Helm Chart Best Practices for Production

Helm remains the dominant Kubernetes package manager, and the quality of your Helm charts directly impacts your deployment reliability. These practices separate production-grade charts from the ones that work in demos but break in the real world.

Always define resource requests and limits in values.yaml. Don't hardcode them in templates. This makes it easy to override per-environment without modifying the chart.

Use helm.sh/chart and app.kubernetes.io standard labels. These let kubectl selectors, dashboards, and GitOps operators identify which chart, release, and component each resource belongs to.

Parameterize the image tag, never hardcode latest. latest in production is a production incident waiting to happen — you can't roll back to a specific version, and any image push changes what's deployed on the next pod restart.

Include a NOTES.txt with actionable post-install information. This template is displayed after helm install and should tell the operator exactly how to verify the deployment succeeded and how to access the service.

Use Helm Secrets (or Sealed Secrets / External Secrets) for sensitive values. Never store plaintext secrets in your Helm values files, even in private repositories. Git history is forever.
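A few of these practices can be sketched in a single values file. The key names below are hypothetical (they must match whatever the chart's templates read, for example .Values.image.tag), and the image name, tag, and sizes are made-up placeholders.

```yaml
# values.yaml: chart defaults, overridden per environment with -f values-production.yaml
image:
  repository: myorg/payments-api   # assumed image name
  tag: "1.4.2"                     # explicit version, never "latest"
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    memory: 512Mi                  # memory limit always set; CPU limit left out on purpose
# No secrets here: reference an existing Secret by name instead
existingSecret: payments-api-credentials   # hypothetical key and Secret name
```

Keeping only a Secret's name in values lets Sealed Secrets or External Secrets own the sensitive material while the chart stays safe to commit.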

Multi-Cluster Management Strategy

Most organizations outgrow a single Kubernetes cluster. Regulatory requirements, geographic distribution, blast radius isolation, or simply scaling constraints push you toward multiple clusters. The key is establishing a management layer that treats the cluster fleet as a coherent system rather than a collection of independent clusters.

The primary patterns for multi-cluster management in 2026:

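ArgoCD Hub-Spoke: A central ArgoCD instance manages multiple spoke clusters. Application definitions live in a central Git repository. Each spoke cluster is registered with the hub (argocd cluster add), which stores its credentials and applies manifests through the spoke's API server; no ArgoCD installation is required on the spokes. This pattern works well for 2-20 clusters with relatively homogeneous workloads.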

Cluster API + Fleet Management: Use Cluster API to provision clusters declaratively, then Rancher Fleet or ArgoCD ApplicationSets to deploy workloads across them. Better for large fleets (50+ clusters) or environments where cluster lifecycle management is as important as workload management.

Crossplane: For teams that want to manage both Kubernetes clusters and cloud infrastructure (databases, message queues, DNS) through the Kubernetes API, Crossplane provides a unified control plane. The learning curve is steep but the operational model is elegant at scale.
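To make the ApplicationSet option concrete, here is a sketch that stamps out one payments-api Application per registered production cluster. It assumes the cluster secrets in ArgoCD are labeled env: production and that the production project permits those destinations; both are assumptions about your setup.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: payments-api
  namespace: argocd
spec:
  generators:
  # One set of parameters per cluster registered in ArgoCD with this label
  - clusters:
      selector:
        matchLabels:
          env: production          # assumed label on the cluster secrets
  template:
    metadata:
      name: 'payments-api-{{name}}'   # {{name}} is the registered cluster name
    spec:
      project: production
      source:
        repoURL: https://github.com/myorg/k8s-manifests
        targetRevision: HEAD
        path: apps/payments-api/overlays/production
      destination:
        server: '{{server}}'          # API server URL of the matched cluster
        namespace: team-payments-prod
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```

Registering a new cluster with the matching label is then enough for the workload to appear on it, with no per-cluster Application to write by hand.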

Putting It All Together: The Production Kubernetes Maturity Model

These seven practices don't need to be implemented simultaneously. Here's a phased approach that builds operational maturity progressively without overwhelming your team:

Phase 1 — Foundation (Month 1-2): GitOps with ArgoCD, Namespace + RBAC hierarchy, Probes on all workloads. These are the non-negotiable basics. Without GitOps, every other practice is harder to implement consistently.

Phase 2 — Reliability (Month 3-4): Data-driven resource requests/limits, PodDisruptionBudgets, Topology Spread Constraints. These practices prevent the most common categories of production incidents.

Phase 3 — Scalability (Month 5-6): HPA + KEDA autoscaling, Observability stack. By this point, you have the baseline stability to safely add dynamic scaling without the autoscaler masking underlying reliability problems.

Phase 4 — Maturity (Ongoing): Advanced deployment strategies (canary/blue-green), multi-cluster management, continuous reliability improvements guided by observability data.

Key Takeaways

GitOps is the foundation everything else builds on — before adding more tooling, get every change going through ArgoCD or Flux. The audit trail, drift detection, and rollback capabilities pay dividends from day one.
Probe misconfiguration is one of the most common avoidable causes of production incidents. Audit every production deployment for liveness/readiness/startup probe correctness. Specifically, verify that liveness and readiness probe endpoints are different and that readiness probes don't check external dependencies.
Resource requests drive scheduler decisions; limits drive throttling and OOM behavior. Use VPA plus 14 days of Prometheus data to set them from measurements. The exercise of collecting p50/p95/p99 data pays for itself the first time it prevents an OOMKill incident.
PodDisruptionBudgets + Topology Spread = actual high availability — multiple replicas without PDBs and spread constraints is a false sense of security. A single node drain can take down all replicas that happened to land on the same node.
Scale up fast, scale down slow — the asymmetric scaling behavior in HPA is not an accident. Premature scale-down causes latency spikes as pods warm up; the cost of keeping a few extra pods running is almost always less than the cost of customer-facing latency degradation.
KEDA's scale-to-zero eliminates idle compute waste for event-driven workloads — any batch processor, queue consumer, or scheduled job that doesn't need to run continuously should be configured with KEDA and scale-to-zero. The operational simplicity is worth more than the cost savings.
The four golden signals (latency, traffic, errors, saturation) should be alerted on for every production service — if you can only instrument one thing in your observability stack, instrument these four metrics per service with actionable alerts. Everything else is refinement.


The Practical CTO
