Quick Facts
- Category: Education & Careers
- Published: 2026-05-03 15:27:12
Kubernetes v1.36 has promoted the ability to modify container resource requests and limits in the pod template of a suspended Job to beta. First introduced as an alpha feature in v1.35, this enhancement empowers queue controllers and cluster administrators to fine-tune CPU, memory, GPU, and extended resource specifications on a Job while it remains suspended—either before it starts or when it resumes running.
Why Mutable Pod Resources for Suspended Jobs?
Batch processing and machine learning workloads often face resource requirements that are not precisely known at Job creation time. The optimal allocation depends on current cluster capacity, queue priorities, and the availability of specialized hardware such as GPUs. Prior to this feature, resource requirements in a Job’s pod template were immutable once set. If a queue controller like Kueue determined that a suspended Job should run with different resources, the only option was to delete and recreate the Job—losing any associated metadata, status, or history.
This feature also lets a specific Job instance created by a CronJob progress slowly with reduced resources, rather than failing to run entirely when the cluster is heavily loaded. For example, a machine learning training Job initially requests 4 GPUs:
apiVersion: batch/v1
kind: Job
metadata:
  name: training-job-example-abcd123
  labels:
    app.kubernetes.io/name: trainer
spec:
  suspend: true
  template:
    metadata:
      annotations:
        kubernetes.io/description: "ML training, ID abcd123"
    spec:
      containers:
      - name: trainer
        image: example-registry.example.com/training:2026-04-23T150405.678
        resources:
          requests:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
          limits:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
      restartPolicy: Never
A queue controller managing cluster resources might determine that only 2 GPUs are available. With this feature, the controller can update the Job’s resource requests before resuming it:
apiVersion: batch/v1
kind: Job
metadata:
  name: training-job-example-abcd123
  labels:
    app.kubernetes.io/name: trainer
spec:
  suspend: true
  template:
    metadata:
      annotations:
        kubernetes.io/description: "ML training, ID abcd123"
    spec:
      containers:
      - name: trainer
        image: example-registry.example.com/training:2026-04-23T150405.678
        resources:
          requests:
            cpu: "4"
            memory: "16Gi"
            example-hardware-vendor.com/gpu: "2"
          limits:
            cpu: "4"
            memory: "16Gi"
            example-hardware-vendor.com/gpu: "2"
      restartPolicy: Never
Once the resources are updated, the controller resumes the Job by setting spec.suspend to false, and the new Pods are created with the adjusted resource specifications.
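The two steps above can be sketched with kubectl (a minimal sketch, assuming the Job name and resource names from the example; kubectl patch accepts both YAML and JSON patch bodies):

```shell
# Step 1: reduce the resources on the still-suspended Job.
# A strategic merge patch matches the container by its "name" key.
kubectl patch job training-job-example-abcd123 --type=strategic -p '
spec:
  template:
    spec:
      containers:
      - name: trainer
        resources:
          requests:
            cpu: "4"
            memory: "16Gi"
            example-hardware-vendor.com/gpu: "2"
          limits:
            cpu: "4"
            memory: "16Gi"
            example-hardware-vendor.com/gpu: "2"'

# Step 2: resume the Job; new Pods use the adjusted resources.
kubectl patch job training-job-example-abcd123 --type=merge \
  -p '{"spec":{"suspend":false}}'
```

Performing the resource patch first and the unsuspend second matters: once the Job leaves the suspended state, the resource fields become immutable again.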
How the Feature Works
The Kubernetes API server relaxes the immutability constraint on pod template resource fields specifically for suspended Jobs. No new API types have been introduced; the existing Job and pod template structures accommodate the change through relaxed validation logic. When a Job is not suspended (i.e., in active state), the resources remain immutable as before—the feature only applies while the Job is suspended.
This design ensures backward compatibility and avoids introducing complexity for workloads that do not need dynamic resource adjustment. Queue controllers and administrators can now adjust resources on a per-Job basis without deleting the Job object, preserving labels, annotations, and history.
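To illustrate the relaxed validation, the following sketch (reusing the hypothetical names from the example above) shows an update that succeeds while the Job is suspended but that the API server would reject once the Job is active:

```shell
# While spec.suspend is true, this patch is accepted.
# After the Job is resumed, the same patch fails API validation,
# because pod template resources are only mutable for suspended Jobs.
kubectl patch job training-job-example-abcd123 --type=strategic -p '
spec:
  template:
    spec:
      containers:
      - name: trainer
        resources:
          requests:
            cpu: "2"'
```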
Implementation Details
- Only for suspended Jobs: immutability is relaxed exclusively for Jobs with spec.suspend: true.
- All resource types supported: CPU, memory, GPU, and extended resources are adjustable.
- Standard Kubernetes API: updates are performed using the regular PATCH or PUT endpoints on the Job resource.
- No new CRDs or controllers needed: existing queue controllers (e.g., Kueue) can leverage this feature with minimal changes.
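For instance, the raw API request behind such an update might look like the following sketch (the namespace and Job name are assumptions carried over from the earlier example; this uses a strategic merge patch body):

```http
PATCH /apis/batch/v1/namespaces/default/jobs/training-job-example-abcd123
Content-Type: application/strategic-merge-patch+json

{"spec":{"template":{"spec":{"containers":[{"name":"trainer",
  "resources":{"requests":{"cpu":"4","memory":"16Gi"},
               "limits":{"cpu":"4","memory":"16Gi"}}}]}}}}
```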
Benefits for Cluster Administrators and Queue Controllers
This feature brings several concrete advantages:
- No Job deletion required: Avoid losing metadata, status, or history when resource demands change.
- Improved cluster utilization: Dynamically resize Jobs to fit available capacity rather than failing or queuing indefinitely.
- Graceful degradation: CronJob instances can be scaled down under heavy load rather than skipped entirely.
- Simpler automation: Queue controllers can update resource requests in-place without complex workarounds.
For example, consider a CronJob that runs nightly batch processing. If the cluster is unusually busy, the controller can reduce the resource requests of the suspended Job instance, allowing it to run with lower concurrency. When the workload completes successfully, the CronJob history remains intact.
With this beta release, Kubernetes v1.36 solidifies a critical capability for batch and ML workloads, making cluster operations more flexible and resilient.