Quick Facts
- Category: Education & Careers
- Published: 2026-05-03 15:27:12
Kubernetes v1.36 has promoted the ability to modify container resource requests and limits in the pod template of a suspended Job to beta. First introduced as an alpha feature in v1.35, this enhancement empowers queue controllers and cluster administrators to fine-tune CPU, memory, GPU, and extended resource specifications on a Job while it remains suspended—either before it starts or when it resumes running.
Why Mutable Pod Resources for Suspended Jobs?
Batch processing and machine learning workloads often face resource requirements that are not precisely known at Job creation time. The optimal allocation depends on current cluster capacity, queue priorities, and the availability of specialized hardware such as GPUs. Prior to this feature, resource requirements in a Job’s pod template were immutable once set. If a queue controller like Kueue determined that a suspended Job should run with different resources, the only option was to delete and recreate the Job—losing any associated metadata, status, or history.
This feature also lets a specific Job instance created by a CronJob progress slowly with reduced resources, rather than failing to run entirely when the cluster is heavily loaded. For example, a machine learning training Job initially requests 4 GPUs:
apiVersion: batch/v1
kind: Job
metadata:
  name: training-job-example-abcd123
  labels:
    app.kubernetes.io/name: trainer
spec:
  suspend: true
  template:
    metadata:
      annotations:
        kubernetes.io/description: "ML training, ID abcd123"
    spec:
      containers:
      - name: trainer
        image: example-registry.example.com/training:2026-04-23T150405.678
        resources:
          requests:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
          limits:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
      restartPolicy: Never
A queue controller managing cluster resources might determine that only 2 GPUs are available. With this feature, the controller can update the Job’s resource requests before resuming it:
apiVersion: batch/v1
kind: Job
metadata:
  name: training-job-example-abcd123
  labels:
    app.kubernetes.io/name: trainer
spec:
  suspend: true
  template:
    metadata:
      annotations:
        kubernetes.io/description: "ML training, ID abcd123"
    spec:
      containers:
      - name: trainer
        image: example-registry.example.com/training:2026-04-23T150405.678
        resources:
          requests:
            cpu: "4"
            memory: "16Gi"
            example-hardware-vendor.com/gpu: "2"
          limits:
            cpu: "4"
            memory: "16Gi"
            example-hardware-vendor.com/gpu: "2"
      restartPolicy: Never
Once the resources are updated, the controller resumes the Job by setting spec.suspend to false, and the new Pods are created with the adjusted resource specifications.
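The two steps above can be sketched with kubectl (a minimal sketch, assuming the Job name and resource names from the example; kubectl patch accepts both YAML and JSON patch bodies):

```shell
# Step 1: reduce the resources on the still-suspended Job.
# A strategic merge patch matches the container by its "name" key.
kubectl patch job training-job-example-abcd123 --type=strategic -p '
spec:
  template:
    spec:
      containers:
      - name: trainer
        resources:
          requests:
            cpu: "4"
            memory: "16Gi"
            example-hardware-vendor.com/gpu: "2"
          limits:
            cpu: "4"
            memory: "16Gi"
            example-hardware-vendor.com/gpu: "2"'

# Step 2: resume the Job; new Pods use the adjusted resources.
kubectl patch job training-job-example-abcd123 --type=merge \
  -p '{"spec":{"suspend":false}}'
```

Performing the resource patch first and the unsuspend second matters: once the Job leaves the suspended state, the resource fields become immutable again.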
How the Feature Works
The Kubernetes API server relaxes the immutability constraint on pod template resource fields specifically for suspended Jobs. No new API types have been introduced; the existing Job and pod template structures accommodate the change through relaxed validation logic. When a Job is not suspended (i.e., in active state), the resources remain immutable as before—the feature only applies while the Job is suspended.
This design ensures backward compatibility and avoids introducing complexity for workloads that do not need dynamic resource adjustment. Queue controllers and administrators can now adjust resources on a per-Job basis without deleting the Job object, preserving labels, annotations, and history.
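To illustrate the relaxed validation, the following sketch (reusing the hypothetical names from the example above) shows an update that succeeds while the Job is suspended but that the API server would reject once the Job is active:

```shell
# While spec.suspend is true, this patch is accepted.
# After the Job is resumed, the same patch fails API validation,
# because pod template resources are only mutable for suspended Jobs.
kubectl patch job training-job-example-abcd123 --type=strategic -p '
spec:
  template:
    spec:
      containers:
      - name: trainer
        resources:
          requests:
            cpu: "2"'
```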
Implementation Details
- Only for suspended Jobs: immutability is relaxed exclusively for Jobs with spec.suspend: true.
- All resource types supported: CPU, memory, GPU, and extended resources are adjustable.
- Standard Kubernetes API: updates are performed using the regular PATCH or PUT endpoints on the Job resource.
- No new CRDs or controllers needed: existing queue controllers (e.g., Kueue) can leverage this feature with minimal changes.
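For instance, the raw API request behind such an update might look like the following sketch (the namespace and Job name are assumptions carried over from the earlier example; this uses a strategic merge patch body):

```http
PATCH /apis/batch/v1/namespaces/default/jobs/training-job-example-abcd123
Content-Type: application/strategic-merge-patch+json

{"spec":{"template":{"spec":{"containers":[{"name":"trainer",
  "resources":{"requests":{"cpu":"4","memory":"16Gi"},
               "limits":{"cpu":"4","memory":"16Gi"}}}]}}}}
```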
Benefits for Cluster Administrators and Queue Controllers
This feature brings several concrete advantages:
- No Job deletion required: Avoid losing metadata, status, or history when resource demands change.
- Improved cluster utilization: Dynamically resize Jobs to fit available capacity rather than failing or queuing indefinitely.
- Graceful degradation: CronJob instances can be scaled down under heavy load rather than skipped entirely.
- Simpler automation: Queue controllers can update resource requests in-place without complex workarounds.
For example, consider a CronJob that runs nightly batch processing. If the cluster is unusually busy, the controller can reduce the resource requests of the suspended Job instance, allowing it to run with lower concurrency. When the workload completes successfully, the CronJob history remains intact.
With this beta release, Kubernetes v1.36 solidifies a critical capability for batch and ML workloads, making cluster operations more flexible and resilient.