Kubernetes v1.36 “Haru”: The Biggest Release for Resource Management in Years

Kubernetes v1.36, codenamed Haru, just dropped, and it’s one of the most significant releases in recent memory. After years of incremental improvements to resource management and security isolation, three major features have reached critical maturity milestones: In-Place Vertical Scaling for Pod-Level Resources graduates to Beta, User Namespaces finally hit GA, and Memory QoS gets a substantial upgrade with tiered protection. Let’s break down what each means for your clusters and how to start using them.

In-Place Vertical Scaling for Pod-Level Resources (Beta)

Kubernetes v1.34 introduced Pod-level resources as a Beta feature, letting you define an aggregate resource budget that containers share. In v1.35, in-place vertical scaling became GA for container-level resources. Now in v1.36, these two capabilities converge: you can resize the shared resource pool of a running Pod without restarting it.

Why does this matter? Consider a Pod running a main application alongside a sidecar for logging and metrics. Previously, if you needed to increase CPU during a traffic spike, you’d have to recalculate per-container limits, update the spec, and deal with a Pod restart. With Pod-level resources, you define one aggregate pool — and now you can scale that pool on the fly.

How It Works

Containers that don’t define individual limits automatically inherit from the Pod-level budget. When you resize the Pod-level limit, the Kubelet consults each container’s resizePolicy to decide whether a restart is needed:

apiVersion: v1
kind: Pod
metadata:
  name: shared-pool-app
spec:
  resources:
    limits:
      cpu: "2"
      memory: "4Gi"
  containers:
  - name: main-app
    image: my-app:v1
    resizePolicy:
      - resourceName: "cpu"
        restartPolicy: "NotRequired"
  - name: sidecar
    image: logger:v1
    resizePolicy:
      - resourceName: "cpu"
        restartPolicy: "NotRequired"

To double the CPU capacity without a restart:

kubectl patch pod shared-pool-app --subresource resize \
  --patch '{"spec":{"resources":{"limits":{"cpu":"4"}}}}'

The Kubelet performs a feasibility check against the node’s allocatable capacity before admitting the resize. If the node is overcommitted, the Pod gets a PodResizePending condition with a reason of Deferred or Infeasible, so you get immediate feedback rather than a silent failure.
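As an illustrative sketch (the field values and message text here are representative, not copied from a real cluster), a deferred resize surfaces in the Pod status roughly like this:

```yaml
status:
  conditions:
  - type: PodResizePending
    status: "True"
    reason: Deferred
    message: "Node has insufficient cpu for the requested resize"  # hypothetical message text
```

Watching for this condition in your tooling is cheaper than polling container cgroups to see whether a resize actually landed.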

The resize sequence matters for safety: when increasing, the Pod-level cgroup expands first, then individual container cgroups grow into the new space. When decreasing, containers are throttled first, then the aggregate boundary shrinks. This prevents resource overshoot in either direction.

Requirements: cgroup v2, containerd v2.0+ or CRI-O, and Linux nodes. Enable the InPlacePodLevelResourcesVerticalScaling feature gate (on by default in v1.36).
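Since the gate is on by default in v1.36 you normally won’t need to touch it, but if you want to pin it explicitly (or disable it during a staged rollout), a minimal kubelet configuration sketch using the gate name above would look like:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  InPlacePodLevelResourcesVerticalScaling: true  # set to false to opt out per node
```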

User Namespaces Are Finally GA

This one has been a decade in the making — the original KEP was opened 10 years ago. User Namespaces in Kubernetes are now Generally Available, and they fundamentally change the security model for container workloads.

The core problem is straightforward: a process running as UID 0 (root) inside a container is also root from the kernel’s perspective on the host. If an attacker escapes the container through a kernel vulnerability or misconfigured mount, they’re root on the host. Traditional security measures (seccomp, AppArmor, SELinux) mitigate this but don’t change the process’s underlying identity.

User Namespaces solve this by giving the container its own UID/GID mapping. Root inside the container maps to an unprivileged user on the host. But the real power goes further — when hostUsers: false is set, capabilities like CAP_NET_ADMIN become namespaced. This means you can run workloads that need specific privileges (like network configuration) without granting any host-level power. It’s a pattern that was previously impossible without running fully privileged containers.

Using User Namespaces

The API is refreshingly simple. Just set hostUsers: false in your Pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: isolated-workload
spec:
  hostUsers: false
  containers:
  - name: app
    image: fedora:42
    securityContext:
      runAsUser: 0

No changes to your container images, no complex configuration. The key enabler that made this performant was ID-mapped mounts, introduced in Linux 5.12. Instead of recursively chowning every file in a volume (expensive for large volumes), the kernel remaps UIDs/GIDs at mount time — an O(1) operation. Files appear owned by UID 0 inside the container while their on-disk ownership stays unchanged.
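One quick way to confirm the remapping took effect (assuming the isolated-workload Pod above is running in your cluster) is to read the user namespace’s uid_map from inside the container:

```shell
# Columns: UID inside the namespace, UID on the host, range length.
# With hostUsers: false, UID 0 inside should map to an unprivileged host UID,
# not to host UID 0.
kubectl exec isolated-workload -- cat /proc/self/uid_map
```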

If you’re running workloads that handle untrusted input, process network traffic, or run third-party code, User Namespaces should be on your adoption roadmap. The security boundary they provide is fundamentally stronger than any userspace sandbox.
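To sketch the namespaced-capability pattern described above: the manifest below is a hypothetical example (not taken from the release notes) that combines hostUsers: false with CAP_NET_ADMIN, so the capability applies only within the Pod’s own user namespace rather than granting host-level network control:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: netconfig-tool
spec:
  hostUsers: false          # capability below is scoped to the Pod's user namespace
  containers:
  - name: tool
    image: fedora:42
    securityContext:
      capabilities:
        add: ["NET_ADMIN"]  # can configure the Pod's network, not the host's
```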

Memory QoS Gets Tiered Protection

Memory management in Kubernetes has always been a balancing act. You set requests and limits, but the kernel’s behavior under memory pressure has been hard to control. The Memory QoS feature, which has been evolving since v1.22, gets a major update in v1.36 with tiered memory protection.

The key change is the introduction of memoryReservationPolicy, which separates throttling from reservation. In earlier versions, enabling Memory QoS immediately set memory.min for every container with a memory request — a hard reservation the kernel would never reclaim, even under extreme pressure. This could cause problems on nodes with many Burstable Pods where total requests approached physical memory.

v1.36 introduces a smarter approach with TieredReservation mode:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  MemoryQoS: true
memoryReservationPolicy: TieredReservation
memoryThrottlingFactor: 0.9

With tiered protection, the QoS class of your Pod determines how aggressively the kernel protects its memory:

  • Guaranteed Pods get hard protection via memory.min — the kernel will invoke the OOM killer on other processes before reclaiming this memory.
  • Burstable Pods get soft protection via memory.low — the kernel avoids reclaiming under normal pressure but can reclaim under extreme pressure to prevent system-wide OOM.
  • BestEffort Pods get no reservation — their memory is fully reclaimable.

This is a much more nuanced approach. Consider a node with 8 GiB of RAM where Burstable Pod requests total 7 GiB. Under the old model, that 7 GiB was locked as memory.min, leaving almost nothing for system daemons. With tiered reservation, those Burstable requests map to memory.low — still protected under normal conditions but reclaimable when the system is truly desperate.

The feature also adds two observability metrics on the kubelet /metrics endpoint: kubelet_memory_qos_node_memory_min_bytes and kubelet_memory_qos_node_memory_low_bytes. If the first metric is creeping toward your node’s physical memory, you know hard reservation is getting tight — useful for capacity planning before you start seeing OOM kills.
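If you want to eyeball these gauges before wiring up scraping, one option is to proxy the kubelet’s metrics endpoint through the API server (the node name below is a placeholder you’d substitute):

```shell
# Fetch kubelet metrics via the API server proxy and filter the Memory QoS gauges.
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics" \
  | grep '^kubelet_memory_qos_node_memory'
```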

Other Notable v1.36 Features

Beyond these three headline features, v1.36 includes several other improvements worth knowing about:

  • Fine-Grained Kubelet API Authorization graduates to GA — giving cluster administrators more precise control over which kubelet endpoints workloads can access.
  • SELinux Volume Label Changes goes GA — making volume setup faster for most workloads. If you run with SELinux in enforcing mode, plan ahead: v1.37 is expected to enable this by default.
  • Mutable Pod Resources for Suspended Jobs (Beta) — allowing you to modify resource requests/limits on suspended Job Pods before they start.
  • Staleness Mitigation and Observability for Controllers — new tooling to detect and mitigate stale controller state.
  • Gateway API v1.5 — moving several features to Stable status, continuing the path toward replacing Ingress as the standard routing API.

Getting Started

The safest approach is to test these features on a non-production cluster first. For User Namespaces and In-Place Pod-Level Resource Scaling, make sure you’re running cgroup v2 with a supported container runtime (containerd v2.0+ recommended). For Memory QoS, start with memoryReservationPolicy: None to enable throttling only, observe your workload behavior, then opt into TieredReservation when you’re confident in your node capacity headroom.
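Following the configuration shape shown earlier, that staged rollout might start from something like this sketch:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  MemoryQoS: true
memoryReservationPolicy: None   # throttling only; no memory.min/memory.low reservations yet
memoryThrottlingFactor: 0.9
```

Once you’ve observed workload behavior under throttling alone, switching the policy to TieredReservation is a one-line change.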

All three features represent years of upstream work and signal Kubernetes’ continued evolution from a container orchestrator into a mature platform for running workloads with fine-grained resource control and defense-in-depth security. The release notes and feature documentation at kubernetes.io have full details, including migration guides and known limitations.
