I have spent the better part of the last four years helping teams secure Kubernetes clusters — and if there is one consistent observation I can offer, it is this: most Kubernetes security problems are not caused by sophisticated attackers exploiting unknown vulnerabilities. They are caused by well-intentioned engineers who did not fully understand the security implications of the configuration decisions they made while trying to get something running quickly.
A pod running as root with a hostPath volume mount and a wildcard RBAC ClusterRoleBinding did not get that way through malice. It got that way because someone needed to debug something in production at 11pm, and the easiest path to "it works" involved removing constraints. The constraint never came back.
This article covers the full Kubernetes security landscape in 2026 — from architectural threat modeling through specific tooling choices — with the goal of helping teams understand not just what to configure, but why it matters and what the failure modes look like when it is not configured.
1. Why Kubernetes Security Is Hard: The Attack Surface Problem
Kubernetes is a distributed system with an unusually large and complex attack surface. Understanding the dimensions of that surface is the prerequisite for prioritizing which controls matter most.
The Control Plane
The Kubernetes API server is the single point of truth for all cluster state. An attacker who can make authenticated API calls with sufficient permissions can do almost anything: create privileged pods, extract secrets, modify RBAC bindings, escalate to cluster-admin, or exfiltrate data from any namespace. The API server is accessible over the network — often over the public internet in managed cloud Kubernetes services — which makes authentication, authorization, and network exposure the first security concerns.
The Node Attack Surface
Each Kubernetes node runs a kubelet process that accepts commands from the API server and executes workloads. A compromised node gives an attacker access to all workloads running on that node, the node's filesystem and network interfaces, credentials used by pods running on the node, and potentially the kubelet API itself. Node hardening — OS configuration, kubelet configuration, node network policies — is a distinct security concern from workload security.
The Workload Attack Surface
Each container in a Kubernetes cluster is a potential entry point. A vulnerability in an application running in a container can give an attacker a foothold inside the cluster network. From that foothold, the attacker can attempt lateral movement (reaching other services in the cluster network), credential theft (from environment variables, mounted secrets, or cloud metadata APIs), and privilege escalation (from container to node, from node to cluster admin).
The Supply Chain
Container images are built from base images and application dependencies, each of which may contain vulnerabilities. The build pipeline itself — the CI system that builds and pushes images — is a high-value target: compromising the build pipeline can introduce backdoors into images that pass standard vulnerability scans.
Attack surface summary: Control plane (API server, etcd, controller manager), data plane (nodes, kubelets), workload layer (container vulnerabilities, insecure configurations), network layer (inter-service communication, external exposure), supply chain (images, build pipelines), and identity layer (service accounts, RBAC, cloud IAM). Each layer requires distinct controls.
2. CIS Kubernetes Benchmark: The Compliance Foundation
The Center for Internet Security (CIS) Kubernetes Benchmark is the de facto standard configuration baseline for Kubernetes clusters. The current version covers the API server, controller manager, scheduler, etcd, and kubelet configuration, as well as worker node OS-level settings. Each control maps to a specific configuration parameter and explains the security rationale.
In my experience, running a CIS benchmark assessment against a cluster that has not been explicitly hardened typically reveals pass rates between 40% and 65% for managed cloud Kubernetes services (EKS, GKE, AKS), and lower for self-managed clusters. The failing controls are rarely exotic — they are consistently basic settings like:
- API server anonymous authentication not disabled
- kubelet read-only port exposed
- etcd not requiring client certificate authentication
- Audit logging not configured or storing insufficient detail
- kubelet authorization mode left at AlwaysAllow instead of Webhook
The benchmark defines two profile levels: Level 1, meant to be achievable without significant operational impact, and Level 2, which offers higher security at higher operational cost. (Separately, each recommendation is marked Automated or Manual, formerly Scored or Not Scored, depending on whether it can be checked automatically.) For most production clusters, Level 1 compliance should be a baseline requirement, not an aspirational goal.
Tooling for CIS Assessment
kube-bench, from Aqua Security, is the standard open-source tool for automated CIS benchmark assessment. It runs as a Job in the cluster and produces a report against the CIS benchmark for the detected Kubernetes version. Most managed cloud providers also offer native compliance dashboards: AWS Security Hub has Kubernetes CIS benchmark controls, GKE has Security Posture dashboards, and Microsoft Defender for Containers covers AKS.
3. RBAC: Implementing Least Privilege That Actually Holds
Kubernetes RBAC is powerful but its default configuration is permissive in ways that surprise engineers who are not thinking about it adversarially. Understanding the default bindings is the starting point for any RBAC security review.
The Default Service Account Problem
Every pod that does not specify a service account is automatically assigned the default service account for its namespace, and unless configured otherwise, that account's token is automatically mounted into the pod. This means that nearly every pod in a cluster, including pods running third-party workloads, batch jobs, and sidecars, carries an API credential that can be used to query the Kubernetes API.
In older clusters, the mounted token was a long-lived, non-expiring credential backed by an auto-generated Secret. Since Kubernetes 1.22, auto-mounted tokens are projected tokens with a bounded lifetime, and since 1.24 Kubernetes no longer auto-generates Secret-based tokens for service accounts. Either way, every pod still receives an API credential unless mounting is explicitly disabled.
The correct configuration is to set automountServiceAccountToken: false on each namespace's default ServiceAccount (there is no namespace-level field; the setting lives on the ServiceAccount and can be overridden per pod), and to enable it explicitly only for dedicated service accounts that genuinely need API access. This single change removes a credential from the vast majority of pods in most clusters.
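A minimal sketch of that configuration, assuming a namespace called your-namespace and a hypothetical workload service account named config-reader. Because a pod-level automountServiceAccountToken value takes precedence over the ServiceAccount's, individual pods can still opt in or out.

```yaml
# Disable token auto-mounting for pods that fall back to the namespace's default ServiceAccount.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: your-namespace
automountServiceAccountToken: false
---
# Hypothetical dedicated ServiceAccount for a workload that really needs API access.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: config-reader
  namespace: your-namespace
automountServiceAccountToken: true
```

Every new namespace gets its own default ServiceAccount, so this needs to be applied (or automated) per namespace.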
RBAC Least Privilege in Practice
The gap between documented least privilege principles and actual RBAC configurations in most clusters I have assessed is significant. The common failure modes:
Wildcard permissions: verbs: ["*"] or resources: ["*"] in Role or ClusterRole definitions. Usually added during initial setup "to get things working" and never narrowed.
ClusterRole where Role would suffice: A service account that needs to read ConfigMaps in its own namespace does not need a ClusterRole that grants that permission across all namespaces. Namespace-scoped Roles are always preferable to ClusterRoles when the access is logically confined to one namespace.
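One service account shared across workloads: RBAC binds permissions to service accounts, not to individual pods, so a service account used by multiple pods with different privilege requirements must carry the union of their permissions, and every pod inherits the access of the most permissive one. Using distinct service accounts per workload type keeps each grant narrow and is easier to audit.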
cluster-admin bindings outside the bootstrap: cluster-admin is the Kubernetes equivalent of root. Auditing all ClusterRoleBindings to cluster-admin and removing non-essential ones is one of the highest-impact RBAC hardening steps available.
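For the ClusterRole-versus-Role case, a namespace-scoped grant that lets a hypothetical service account named config-reader read ConfigMaps only in its own namespace could look like this sketch (all names are illustrative):

```yaml
# Namespace-scoped Role: read-only access to ConfigMaps in your-namespace only.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: configmap-reader
  namespace: your-namespace
rules:
  - apiGroups: [""]            # "" is the core API group, where ConfigMaps live
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]
---
# Bind the Role to one dedicated ServiceAccount rather than to a shared or default one.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: config-reader-configmaps
  namespace: your-namespace
subjects:
  - kind: ServiceAccount
    name: config-reader
    namespace: your-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: configmap-reader
```

If a suitable ClusterRole already exists, a RoleBinding that references it still grants access only within the RoleBinding's own namespace, which is often the simplest way to reuse a role definition without widening its scope.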
RBAC Auditing Tools
kubectl-who-can (from Aqua) and rbac-lookup (from FairwindsOps) are the tools I use most frequently for RBAC auditing. rbac-audit provides a more comprehensive view of what permissions are granted across the cluster. For continuous monitoring, Polaris (Fairwinds) includes RBAC checks in its configuration validation suite.
4. Pod Security Standards: The Replacement for PodSecurityPolicy
PodSecurityPolicy was deprecated in Kubernetes 1.21 and removed in 1.25. Its replacement, Pod Security Standards, offers three built-in security profiles with different restriction levels:
- Privileged: No restrictions. Appropriate for trusted system-level workloads only.
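- Baseline: Prevents known privilege escalation vectors while allowing most legitimate workloads. Blocks privileged containers, host namespace sharing, hostPath volumes, host ports, and adding capabilities beyond a default set.
- Restricted: Enforces current pod-hardening best practices. Requires running as a non-root user, disallows privilege escalation, drops all capabilities (only NET_BIND_SERVICE may be added back), requires a RuntimeDefault or Localhost seccomp profile, and restricts volumes to a safe list of types. It does not require a read-only root filesystem; that has to be enforced separately, for example with an admission policy.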
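Pod Security Standards are enforced at the namespace level by the built-in Pod Security Admission controller, configured through labels such as pod-security.kubernetes.io/enforce: restricted. The enforcement modes are enforce (reject non-compliant pods), audit (allow, but record the violation in the audit log), and warn (allow, but return a warning to the client, such as kubectl).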
My recommended starting approach for existing clusters: apply warn and audit at the baseline level to all namespaces. Run for two weeks to identify non-compliant workloads. Fix violations. Move to enforce at baseline. Identify namespaces suitable for restricted and migrate those. This graduated approach prevents the mass-enforcement-breaks-everything scenario that caused PodSecurityPolicy adoption to stall in many organizations.
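The first stage of that rollout might look like the following namespace manifest, a sketch assuming a namespace named your-namespace. The -version labels pin which Kubernetes release's definition of each profile is applied; latest is used here for brevity.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: your-namespace
  labels:
    # Stage 1: report Baseline violations without rejecting anything.
    pod-security.kubernetes.io/warn: baseline
    pod-security.kubernetes.io/warn-version: latest
    pod-security.kubernetes.io/audit: baseline
    pod-security.kubernetes.io/audit-version: latest
    # Stage 2 (after violations are fixed): uncomment to start rejecting non-compliant pods.
    # pod-security.kubernetes.io/enforce: baseline
    # pod-security.kubernetes.io/enforce-version: latest
```

Before switching on enforce, a server-side dry run such as kubectl label --dry-run=server --overwrite ns your-namespace pod-security.kubernetes.io/enforce=baseline lists the existing pods that would violate the profile, without changing anything.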
Critical workload security requirements that baseline enforcement catches:
— Privileged containers (direct root access to the node)
— hostPath volume mounts (arbitrary host filesystem access)
— hostNetwork: true (bypasses CNI network policies)
— hostPID: true (can see and signal all processes on the node)
— CAP_SYS_ADMIN and other dangerous capabilities
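For comparison, a pod that passes the Restricted profile needs a security context along these lines. This is a sketch: the image name and UID are placeholders, and readOnlyRootFilesystem is included as extra hardening rather than because Restricted requires it.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: restricted-example
  namespace: your-namespace
spec:
  securityContext:
    runAsNonRoot: true              # Restricted: containers must not run as UID 0
    runAsUser: 10001                # placeholder non-root UID
    seccompProfile:
      type: RuntimeDefault          # Restricted: RuntimeDefault or Localhost required
  containers:
    - name: app
      image: registry.example.com/app:1.0.0   # placeholder image
      securityContext:
        allowPrivilegeEscalation: false       # Restricted: must be false
        capabilities:
          drop: ["ALL"]                       # Restricted: drop every capability
        readOnlyRootFilesystem: true          # optional extra hardening, not a Restricted requirement
```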
5. Network Policies: Securing East-West Traffic
By default, all pods in a Kubernetes cluster can communicate with all other pods across all namespaces. This is the most permissive possible network configuration and should be the starting point for a network hardening effort, not the final state.
Network Policies are Kubernetes objects that define allowed ingress and egress traffic for pods based on pod labels, namespace selectors, and IP blocks. They are enforced by the CNI plugin — not all CNI plugins support Network Policies, so verifying CNI support is a prerequisite. Calico, Cilium, and Weave Net all support Network Policies; Flannel without an additional policy engine does not.
Default Deny: The Right Starting Point
The recommended starting configuration is a default-deny policy applied to every namespace:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: your-namespace
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
```
With this policy in place, no pod can receive or send traffic unless a permissive NetworkPolicy explicitly allows it. Then, add specific allow policies for the traffic flows that are actually required: frontend-to-backend, backend-to-database, pods-to-DNS (kube-dns), and monitoring agent egress.
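Two of those flows might be allowed as in the sketch below. It assumes the cluster DNS pods carry the k8s-app: kube-dns label in kube-system (common for CoreDNS, but worth checking in your cluster), relies on the kubernetes.io/metadata.name label that Kubernetes sets on every namespace, and uses hypothetical app: frontend and app: backend labels with port 8080. Because the default-deny policy blocks egress as well as ingress, the frontend-to-backend flow needs both an ingress rule on the backend and an egress rule on the frontend.

```yaml
# Allow every pod in the namespace to resolve DNS via the cluster DNS pods.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: your-namespace
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:              # same list item as namespaceSelector, so both must match
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
---
# Backend side: accept traffic from frontend pods on TCP 8080.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: your-namespace
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
---
# Frontend side: permit egress to backend pods on TCP 8080.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-egress-to-backend
  namespace: your-namespace
spec:
  podSelector:
    matchLabels:
      app: frontend
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: backend
      ports:
        - protocol: TCP
          port: 8080
```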
Cilium Network Policies: Beyond Basic L4
The standard Kubernetes NetworkPolicy API operates at L3/L4 (IP and port). Cilium extends this with L7 policies that can enforce HTTP methods, DNS names, and gRPC service methods. In 2026, Cilium has become the dominant CNI for security-focused Kubernetes deployments, and its eBPF-based enforcement provides performance advantages over iptables-based implementations at scale.
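As an illustration, a CiliumNetworkPolicy like the following sketch would let hypothetical frontend pods send only GET requests under /api/ to the backend, with other HTTP methods and paths rejected by Cilium's L7 proxy. Labels, port, and path are assumptions; because L7 rules route traffic through a proxy, they carry some per-request overhead that plain L3/L4 rules do not.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: backend-allow-get-api
  namespace: your-namespace
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: "GET"
                path: "/api/.*"     # path is matched as a regular expression
```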
Cilium's network visibility feature — which provides per-connection flow logs that show allowed and denied traffic — is particularly valuable during policy development. Being able to see exactly which connections are being made helps teams write precise Network Policies rather than overly broad ones.
6. Secrets Management: Kubernetes Secrets Are Not Enough
Kubernetes Secrets are base64-encoded, not encrypted, by default. Without additional configuration, they are stored unencrypted in etcd and are readable by anyone who can read Secrets in the given namespace, create pods that mount them, or read etcd (or its backups) directly. This is a frequently misunderstood security property that has led to real credential exposures.
Enabling Encryption at Rest
The first step is enabling encryption at rest for the Secrets resource in the kube-apiserver configuration using the EncryptionConfiguration API. This encrypts Secrets in etcd using AES-CBC or AES-GCM. The encryption key itself must be protected — ideally using a KMS provider (AWS KMS, GCP KMS, Azure Key Vault) so the encryption key is not stored in the same etcd as the encrypted Secrets.
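A minimal EncryptionConfiguration using the KMS v2 provider might look like the sketch below; the plugin name and socket path are placeholders for whichever KMS plugin you run. The file is passed to kube-apiserver with --encryption-provider-config.

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      # First provider encrypts new writes: envelope encryption via an external KMS plugin.
      - kms:
          apiVersion: v2
          name: example-kms-plugin                          # placeholder plugin name
          endpoint: unix:///var/run/kms-plugin/socket.sock  # placeholder socket path
          timeout: 3s
      # Fallback so Secrets stored before encryption was enabled can still be read.
      - identity: {}
```

Existing Secrets are only encrypted once they are rewritten, for example with kubectl get secrets --all-namespaces -o json | kubectl replace -f -. On managed services, the provider's own envelope-encryption setting takes the place of this file.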
HashiCorp Vault Integration
For organizations with more demanding secrets management requirements — secret rotation, dynamic credentials, audit logging of secret access, fine-grained access control — HashiCorp Vault remains the standard solution in 2026. The Vault Agent Injector pattern allows Vault secrets to be injected into pods without the application knowing about Vault: a sidecar container authenticates to Vault, retrieves the secret, and writes it to a shared in-memory volume that the application reads as a file.
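In practice the injector is driven by annotations on the pod template. The sketch below assumes the Vault Agent Injector is installed, Vault's Kubernetes auth method has a role named app bound to the app service account, and a secret exists at the hypothetical path database/creds/app; the agent would then render it to /vault/secrets/db-creds inside the pod. Because the agent authenticates with the pod's service account token, this is one of the workloads that needs token mounting enabled.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
  namespace: your-namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
      annotations:
        # Ask the injector's mutating webhook to add the Vault Agent sidecar.
        vault.hashicorp.com/agent-inject: "true"
        # Vault Kubernetes-auth role the agent logs in with (hypothetical name).
        vault.hashicorp.com/role: "app"
        # Render the secret at this Vault path to /vault/secrets/db-creds.
        vault.hashicorp.com/agent-inject-secret-db-creds: "database/creds/app"
    spec:
      serviceAccountName: app          # must match the service account bound to the Vault role
      automountServiceAccountToken: true
      containers:
        - name: app
          image: registry.example.com/app:1.0.0   # placeholder image
```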
The Vault CSI Provider is an alternative that uses the Kubernetes Secrets Store CSI Driver to mount Vault secrets as volumes without sidecar injection. This is cleaner architecturally but requires the CSI driver to be installed cluster-wide.
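A comparable sketch with the CSI provider, assuming the Secrets Store CSI Driver and the Vault CSI provider are installed; the Vault address, role, and secret path are placeholders. The SecretProviderClass describes what to fetch, and the pod mounts it as a read-only volume, so the password would appear as /mnt/secrets-store/db-password.

```yaml
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: vault-db-creds
  namespace: your-namespace
spec:
  provider: vault
  parameters:
    vaultAddress: "https://vault.example.com:8200"   # placeholder Vault address
    roleName: "app"                                   # Vault Kubernetes-auth role (hypothetical)
    objects: |
      - objectName: "db-password"
        secretPath: "secret/data/app/db"
        secretKey: "password"
---
apiVersion: v1
kind: Pod
metadata:
  name: app
  namespace: your-namespace
spec:
  serviceAccountName: app
  containers:
    - name: app
      image: registry.example.com/app:1.0.0   # placeholder image
      volumeMounts:
        - name: secrets-store
          mountPath: /mnt/secrets-store
          readOnly: true
  volumes:
    - name: secrets-store
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: vault-db-creds
```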
Sealed Secrets: GitOps-Friendly Encryption
Sealed Secrets, from Bitnami, provides a different solution for a different problem: how to store Secrets in Git without exposing sensitive values. A SealedSecret is a Kubernetes custom resource containing encrypted secret data that can only be decrypted by the Sealed Secrets controller running in the cluster. This allows Secrets to be stored in GitOps repositories without credential exposure — the encrypted form is safe to commit.
I use Sealed Secrets in environments where GitOps is the deployment model and where the overhead of a full Vault deployment is not justified. For environments with dynamic credentials, rotation requirements, or cross-cluster secret sharing, Vault remains the more capable option.
7. Supply Chain Security: Cosign and SLSA
Supply chain attacks against Kubernetes environments have become more sophisticated. The XZ Utils backdoor in 2024, the SolarWinds incident earlier, and multiple compromised container images on Docker Hub have established that the container supply chain is a credible attack vector. Defending it requires controls at image build, distribution, and deployment.
Cosign and Image Signing
Cosign, from Sigstore, is the standard tool for signing container images with cryptographic signatures. The signing model uses keyless signing (via OIDC identity from CI/CD systems like GitHub Actions) or traditional keypair signing, and stores signatures in the container registry alongside the image.
The enforcement point is the admission controller: a Kyverno policy, or OPA Gatekeeper paired with an external-data provider such as Ratify, that verifies image signatures at admission time and rejects any image without a valid signature from a trusted key or identity. This prevents unsigned images, whether modified by an attacker or simply built outside the approved pipeline, from running in the cluster.
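With Kyverno, for instance, keyless verification of images signed from GitHub Actions could be expressed roughly as below. The registry, GitHub organization, and subject pattern are placeholders; the subject must match the workflow identity embedded in the Fulcio certificate. Newer Kyverno releases move the failure action into each rule, so check field names against the version you run.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures
spec:
  validationFailureAction: Enforce    # reject, rather than only report, unsigned images
  webhookTimeoutSeconds: 30
  rules:
    - name: require-cosign-keyless-signature
      match:
        any:
          - resources:
              kinds:
                - Pod
      verifyImages:
        - imageReferences:
            - "registry.example.com/*"       # placeholder registry
          attestors:
            - entries:
                - keyless:
                    # Identity of the CI workflow that signed the image (placeholder org).
                    subject: "https://github.com/example-org/*"
                    issuer: "https://token.actions.githubusercontent.com"
                    rekor:
                      url: https://rekor.sigstore.dev
```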
The Sigstore ecosystem also includes Rekor (a tamper-evident transparency log of signatures) and Fulcio (a certificate authority for short-lived signing certificates). Together, these provide a complete public record of what was signed, when, and by what identity — useful for post-incident investigation and compliance audit.
SLSA Framework
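SLSA (Supply-chain Levels for Software Artifacts, pronounced "salsa") is a framework for incrementally improving supply chain security. Its original (v0.1) specification defined four levels with progressively stronger guarantees about build provenance; SLSA v1.0 later regrouped the build requirements into Build levels 1 through 3 and set the former level 4 aside, so current tooling usually speaks of "Build L2" or "Build L3". The original four levels are:
- SLSA 1: Build process is fully scripted and generates provenance
- SLSA 2: Source is version-controlled and builds run on a hosted build service that generates authenticated provenance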
- SLSA 3: Source provenance and non-falsifiable build provenance; build environment is hardened
- SLSA 4: Two-person review required; hermetic, reproducible builds
For most organizations, SLSA 2 is achievable with current tooling and represents a meaningful security improvement: provenance documents generated by the CI system (GitHub Actions, Tekton) and attached to images provide a verifiable record of what source commit produced what image, using what build environment. This makes it possible to answer "was this image built from our official pipeline?" — a question that is surprisingly difficult to answer in many organizations today.
8. Runtime Security: Falco and Tetragon
Static configuration and admission control address what can run in the cluster. Runtime security addresses what is actually happening at runtime — detecting anomalous behavior that indicates an attack is in progress even if the workload itself passed all pre-deployment checks.
Falco
Falco is the CNCF-graduated runtime security tool and remains the most widely deployed option in 2026. It monitors system calls made by containers and generates alerts when behavior matches defined rules. The default Falco rule set covers the most common attack patterns:
- Shell execution inside a container (a common post-exploitation step)
- File writes to sensitive paths (e.g., /etc/passwd, /proc)
- New outbound connections to unexpected destinations
- Privilege escalation attempts
- Container drift (executing binaries that were not part of the original image)
Falco requires a kernel module or eBPF probe to observe system calls. The eBPF deployment mode is preferred in managed cloud environments where loading kernel modules is restricted or inadvisable. Falco's alert output integrates with SIEM platforms, Slack, PagerDuty, and most incident response toolchains via Falcosidekick.
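Custom rules use the same YAML format as the default rule set. The sketch below flags shells started in containers in a hypothetical prod namespace; it reuses the spawned_process and container macros and the shell_binaries list from Falco's default rules, so it must be loaded after them.

```yaml
- rule: Shell Spawned in Prod Container
  desc: Detect a shell process starting inside any container in the prod namespace
  condition: >
    spawned_process
    and container
    and proc.name in (shell_binaries)
    and k8s.ns.name = "prod"
  output: >
    Shell started in prod container
    (user=%user.name pod=%k8s.pod.name ns=%k8s.ns.name
    container=%container.name image=%container.image.repository cmdline=%proc.cmdline)
  priority: WARNING
  tags: [container, shell, mitre_execution]
```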
Tetragon
Tetragon, from Isovalent (the company behind Cilium), is an eBPF-based security observability and enforcement tool that operates at a lower level than Falco. Where Falco observes system calls after they are made, Tetragon can enforce policies by blocking system calls in-kernel — preventing the action rather than only alerting after the fact.
In 2026, Tetragon has gained significant adoption in environments that are already running Cilium, because the two tools share the eBPF infrastructure and operate well together. Tetragon's TracingPolicy custom resources allow defining very fine-grained behavioral policies — for example, blocking a specific process from making network connections, or preventing any process from writing to a specific directory — with enforcement rather than detection.
The tradeoff is operational complexity: Tetragon's enforcement model can cause application breakage if policies are not carefully tuned. Starting in observe mode and transitioning to enforce mode after baseline characterization is the correct operational pattern.
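A sketch of that observe-first pattern, modeled on the shape of Tetragon's file-monitoring examples: the policy hooks the kernel's security_file_permission function and reports write attempts (mask value 2, MAY_WRITE) to anything under /etc/. As written it only emits events; uncommenting the Sigkill action switches it to enforcement, which kills the offending process. The hook and argument layout assume a reasonably recent kernel, so test on your node images first.

```yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: monitor-writes-to-etc
spec:
  kprobes:
    - call: "security_file_permission"
      syscall: false
      args:
        - index: 0
          type: "file"        # struct file *, used to resolve the path
        - index: 1
          type: "int"         # permission mask
      selectors:
        - matchArgs:
            - index: 0
              operator: "Prefix"
              values:
                - "/etc/"
            - index: 1
              operator: "Equal"
              values:
                - "2"         # MAY_WRITE
          # Observe mode: no action, so matching events are only reported.
          # Enforce mode (after baselining): uncomment to kill the writing process.
          # matchActions:
          #   - action: Sigkill
```

A cluster-wide TracingPolicy like this one applies to every process on each node, including host processes such as package managers, which is exactly why enforcement should follow a period of observation.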
9. Admission Control: OPA Gatekeeper and Kyverno
Admission controllers intercept API server requests before objects are persisted to etcd. This is the enforcement point for cluster-wide policy: ensuring that all pods, deployments, and other resources meet defined standards before they are admitted to the cluster.
OPA Gatekeeper
OPA Gatekeeper uses the Open Policy Agent engine with Rego policies to enforce admission control. Policies are defined as ConstraintTemplates (defining the policy logic in Rego) and Constraints (applying the template with specific parameters to specific resource types or namespaces).
Gatekeeper's strength is flexibility — Rego is a full policy language capable of expressing complex logic, cross-resource relationships, and external data lookups. Its weakness is the learning curve: Rego is unfamiliar to most Kubernetes practitioners, and complex Rego policies can be difficult to test and debug.
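To make the two-part model concrete, here is a sketch modeled on the required-labels example from the Gatekeeper documentation: the ConstraintTemplate holds the Rego logic, and the Constraint applies it to namespaces with a hypothetical requirement for an owner label.

```yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        # Schema for the parameters a Constraint may pass in.
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        violation[{"msg": msg, "details": {"missing_labels": missing}}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("you must provide labels: %v", [missing])
        }
---
# Constraint: apply the template to Namespaces, requiring an "owner" label.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-owner
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["owner"]
```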
Kyverno
Kyverno is a Kubernetes-native policy engine that uses YAML policies rather than a separate policy language. For teams that are already comfortable with Kubernetes YAML and do not need Rego's full expressiveness, Kyverno's policies are significantly easier to write and review. Kyverno also supports policy generation (automatically creating objects when other objects are created, for example creating a default NetworkPolicy when a namespace is created), which Gatekeeper does not offer, and mutation (modifying objects to add required labels or defaults) in the same policy format, whereas Gatekeeper handles mutation through separate CRDs (Assign, AssignMetadata, ModifySet).
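For example, the namespace-creation case could be written as the following ClusterPolicy sketch, which generates a default-deny NetworkPolicy in every newly created namespace. With synchronize: true, Kyverno recreates the NetworkPolicy if someone deletes it. Depending on the Kyverno version, its background controller may need additional RBAC permissions before it can create NetworkPolicies.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-deny-networkpolicy
spec:
  rules:
    - name: default-deny
      match:
        any:
          - resources:
              kinds:
                - Namespace
      generate:
        apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        name: default-deny-all
        # Place the generated object in the namespace that was just created.
        namespace: "{{request.object.metadata.name}}"
        synchronize: true
        data:
          spec:
            podSelector: {}
            policyTypes:
              - Ingress
              - Egress
```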
In 2026, Kyverno has largely overtaken Gatekeeper for new deployments, particularly in organizations that do not already have OPA expertise. The Kyverno Policy Library provides a large collection of pre-written policies covering Pod Security Standards, RBAC controls, image signing verification, and resource constraints that teams can adopt and customize.
| Category | Primary Tool(s) | Enforcement Point | Key Strength |
|---|---|---|---|
| CIS Benchmark Assessment | kube-bench | Reporting (periodic) | Comprehensive baseline coverage |
| RBAC Analysis | rbac-lookup, kubectl-who-can | Audit (on-demand) | Exposes implicit over-permissions |
| Network Policy | Cilium, Calico | Runtime enforcement | L4/L7 traffic control, eBPF performance |
| Secrets Management | Vault, Sealed Secrets | Storage + injection | Dynamic credentials, rotation, audit trail |
| Supply Chain | Cosign, SLSA tooling | Build + admission | Cryptographic provenance verification |
| Runtime Detection | Falco, Tetragon | Runtime (continuous) | Behavioral anomaly detection, eBPF enforcement |
| Admission Control | Kyverno, OPA Gatekeeper | API admission (pre-persistence) | Policy-as-code, multi-domain coverage |
10. Multi-Tenant Isolation: When One Cluster Runs Many Teams
Multi-tenancy in Kubernetes — running workloads from multiple teams, business units, or external customers in a shared cluster — introduces isolation requirements that the default Kubernetes model does not fully address. Namespaces provide some isolation (separate RBAC, separate Network Policies, separate resource quotas) but do not provide the hard isolation of separate clusters.
Namespace-Level Isolation
For soft multi-tenancy (multiple internal teams, mutually trusting but requiring separation), namespace-level isolation is usually sufficient when combined with:
- RBAC bindings scoped to namespace (Roles, not ClusterRoles)
- Default-deny NetworkPolicy with explicit allow for inter-namespace traffic only where required
- ResourceQuota on each namespace to prevent resource exhaustion by one tenant
- LimitRange to enforce minimum and maximum resource requests/limits per pod
- Pod Security Standards at the Restricted or Baseline level per namespace
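The quota and limit pieces of that list might look like this sketch for a hypothetical tenant namespace team-a; the numbers are placeholders to be sized per tenant. Once a ResourceQuota constrains CPU and memory, pods that omit requests or limits are rejected, so the LimitRange defaults keep such pods admissible.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "50"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: team-a-limits
  namespace: team-a
spec:
  limits:
    - type: Container
      defaultRequest:       # applied when a container sets no request
        cpu: 100m
        memory: 128Mi
      default:              # applied when a container sets no limit
        cpu: 500m
        memory: 512Mi
      min:
        cpu: 50m
        memory: 64Mi
      max:
        cpu: "2"
        memory: 2Gi
```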
Hierarchical Namespaces
The Hierarchical Namespace Controller (HNC) from Google addresses a common operational pain point: propagating RBAC, NetworkPolicies, and other objects from a parent namespace to child namespaces. This enables a team to have a parent namespace with shared policies and multiple child namespaces (dev, staging, prod) that inherit those policies without manual duplication. HNC significantly reduces the operational overhead of maintaining consistent policy across many namespaces.
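As a sketch, assuming HNC is installed: a SubnamespaceAnchor created in a parent namespace asks HNC to create a child namespace. HNC propagates Roles and RoleBindings by default; other kinds, such as NetworkPolicies, are propagated only after they are enabled in the cluster-wide HNCConfiguration (named config). team-a and team-a-dev are hypothetical names.

```yaml
# Create child namespace team-a-dev under parent namespace team-a.
apiVersion: hnc.x-k8s.io/v1alpha2
kind: SubnamespaceAnchor
metadata:
  name: team-a-dev
  namespace: team-a
---
# Cluster-wide singleton: also propagate NetworkPolicies from parents to children.
apiVersion: hnc.x-k8s.io/v1alpha2
kind: HNCConfiguration
metadata:
  name: config
spec:
  resources:
    - group: networking.k8s.io
      resource: networkpolicies
      mode: Propagate
```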
Hard Multi-Tenancy
For hard multi-tenancy, where tenant workloads come from different organizations or where a compromise of one tenant's workload must not affect other tenants, Kubernetes namespaces are generally insufficient. The standard architectural recommendation is separate clusters per tenant (or per trust boundary), potentially managed through a fleet management layer like Cluster API, Anthos Config Management (Google), or Azure Kubernetes Fleet Manager (Microsoft).
Virtual cluster technology (vCluster, from Loft Labs) provides an intermediate option: a virtual Kubernetes API server and control plane that shares the physical nodes of a host cluster but provides each tenant with the appearance of an independent cluster. vCluster is appropriate for use cases where the cost of separate physical clusters is prohibitive but stronger isolation than namespaces is required.
11. Vulnerability Scanning Automation: The Continuous Posture Problem
Container vulnerability scanning at build time is necessary but not sufficient. New CVEs are disclosed daily, and an image that was clean at build time may be vulnerable twenty-four hours later. Continuous scanning in the registry and in running clusters is required to maintain current posture visibility.
Registry-Level Continuous Scanning
Container registries with integrated scanning — Amazon ECR with Inspector, Google Artifact Registry, Harbor with Trivy integration — continuously re-scan stored images as new vulnerability data becomes available. This means that a vulnerability disclosed today triggers an alert for images built three months ago that contain the affected package.
The operational requirement is a process for acting on registry scan findings: routing high-severity new findings to the team that owns the affected image, with an SLA for remediation. Many organizations have the scanning configured but lack the routing and SLA, resulting in findings that accumulate without action.
In-Cluster Scanning
In-cluster scanning tools — Trivy Operator, Starboard (deprecated in favor of Trivy Operator), and Snyk Controller — run inside the cluster and continuously scan the images of running pods. The advantage over registry scanning alone is that in-cluster tools correlate findings with workload metadata: they know which deployment, in which namespace, owned by which team, is running the vulnerable image. This makes routing findings to the correct owner more automatic.
Recommended scanning architecture:
1. CI scan (Trivy in pipeline): block images with fixable critical/high vulnerabilities before push
2. Registry scan (ECR Inspector, Artifact Registry scanning, or Harbor): continuous re-scan of stored images as new CVEs are disclosed
3. In-cluster scan (Trivy Operator): correlate running workload vulnerabilities with the owning team for action routing
4. Admission control (Kyverno image scan check): optionally block images with active critical findings from deploying to production namespaces
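Step 1 can be sketched as a GitHub Actions workflow using the aquasecurity/trivy-action action. The registry and image name are placeholders, registry login is omitted, and the action reference is shown as a placeholder that should be pinned to a specific release or commit. exit-code: "1" fails the job when findings match, and ignore-unfixed: true limits the gate to vulnerabilities that have a fix available.

```yaml
name: build-and-scan
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t registry.example.com/app:${{ github.sha }} .
      - name: Scan image with Trivy (fail on fixable critical/high)
        uses: aquasecurity/trivy-action@<pinned-commit-sha>   # placeholder: pin a real release or SHA
        with:
          image-ref: registry.example.com/app:${{ github.sha }}
          severity: CRITICAL,HIGH
          ignore-unfixed: true
          exit-code: "1"
      # Runs only if the scan step succeeded, so vulnerable images are never pushed.
      - name: Push image
        run: docker push registry.example.com/app:${{ github.sha }}
```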
12. Kubernetes Audit Logging: The Forensics Layer
Kubernetes audit logging captures a record of every API server request — who made it, what was requested, and whether it was allowed or denied. Without audit logging, post-incident investigation of a Kubernetes compromise is severely limited. With it, you can reconstruct exactly what an attacker (or compromised service account) did inside the cluster.
Audit log configuration requires decisions about policy — which API calls to log at what verbosity level. Logging all requests at maximum verbosity produces volumes that are expensive to store and query. The policy I use as a baseline:
- Log all write operations (create, update, patch, delete) for all resources at the Request level (logging request metadata and body, not response)
- Log all access to Secrets at the Metadata level (request metadata only — not the secret content, which avoids storing credentials in logs)
- Log all access by non-system service accounts to the cluster-admin ClusterRole at the RequestResponse level
- Drop read-only operations on non-sensitive resources to avoid log volume dominated by routine controller activity
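A sketch of that baseline as an audit Policy file (passed to kube-apiserver with --audit-policy-file; managed services generally own this policy themselves). Rules are evaluated in order and the first match wins, so the Secrets rule sits first to guarantee secret bodies are never logged. Audit rules match on users, groups, verbs, and resources rather than on which role authorized a request, so the third bullet is approximated here by logging service-account requests against RBAC objects at RequestResponse.

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
# Skip the RequestReceived stage to avoid a duplicate event per request.
omitStages:
  - RequestReceived
rules:
  # 1. Secrets: metadata only, for every verb. Must come first (first match wins).
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets"]
  # 2. Service accounts touching RBAC objects: full request and response.
  - level: RequestResponse
    userGroups: ["system:serviceaccounts"]
    resources:
      - group: "rbac.authorization.k8s.io"
        resources: ["clusterroles", "clusterrolebindings", "roles", "rolebindings"]
  # 3. All other writes: request metadata and body.
  - level: Request
    verbs: ["create", "update", "patch", "delete", "deletecollection"]
  # 4. Drop routine reads of everything not matched above.
  - level: None
    verbs: ["get", "list", "watch"]
  # 5. Anything else: metadata only.
  - level: Metadata
```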
Audit logs should be forwarded to a SIEM or log management platform, not just stored locally on the control plane nodes, where they are accessible to anyone who compromises the node and do not survive node replacement. Amazon CloudWatch Logs (via EKS control plane logging), Google Cloud Logging (for GKE), and Azure Monitor (for AKS) provide managed audit log collection that is separate from the cluster's own storage.
13. Key Takeaways
Seven things I believe are true about Kubernetes security in 2026:
1. The default Kubernetes configuration is not production-ready from a security perspective. Default RBAC bindings, default service account token mounting, unrestricted pod specs, and no network policies represent a starting point that requires significant hardening before hosting sensitive workloads.
2. RBAC audit is the highest-ROI single activity for an existing cluster. Running rbac-lookup and auditing cluster-admin bindings in an untouched cluster almost always reveals significant over-permission that can be removed without functional impact.
3. Runtime detection cannot be substituted for pre-runtime controls — and vice versa. Falco cannot catch a vulnerability that was present in the image before the container started. Trivy cannot catch anomalous process execution that happens at runtime. Both layers are necessary.
4. Network Policy default-deny is the most impactful network control available, and it is free. It does require understanding your application's traffic patterns, which is an investment — but the security benefit of isolating compromised workloads is significant.
5. Supply chain security is now baseline, not advanced. Image signing with Cosign, verified at admission, is a straightforward control that eliminates an entire class of supply chain attack vector. The Sigstore tooling has matured to the point where adoption friction is low.
6. Multi-tenancy on shared clusters requires more than namespaces. ResourceQuotas, LimitRanges, RBAC scoping, NetworkPolicies, and Pod Security Standards must all be configured correctly to achieve meaningful tenant isolation. Any one of them missing undermines the others.
7. Audit logging and runtime visibility are your forensics capability. When — not if — an incident occurs, the difference between a two-hour investigation and a two-week investigation is the quality of your audit log and runtime event data. Configure both before you need them.
Conclusion
Kubernetes security in 2026 is not a configuration you reach once and maintain passively. It is a continuous practice: running CIS benchmark checks after every cluster version upgrade, auditing RBAC as teams change, reviewing Network Policies as application architectures evolve, updating admission policies as new workload types are introduced, and monitoring runtime behavior for the signals that indicate something has gone wrong.
The tooling ecosystem has matured significantly. What was experimental three years ago — Falco eBPF probes, Cilium L7 policies, Cosign keyless signing, Kyverno policy generation — is now production-grade and widely deployed. The gap between organizations with strong Kubernetes security posture and those without is no longer a tooling gap. It is an operational discipline gap: consistently applying known controls, reviewing them as circumstances change, and treating Kubernetes security as an ongoing engineering concern rather than a one-time setup task.