Even mature cloud-native teams report challenges with Kubernetes implementation, particularly around complexity, scaling, and security. Resource utilization is another common pain point. According to the 2025 Kubernetes Benchmark Report, the average CPU utilization is just 10%, and memory utilization is only 23%.
This represents a significant opportunity for optimization.
This guide distills years of platform engineering experience into actionable enterprise Kubernetes best practices for implementations that are resilient to failures, secure by design, and optimized for resource utilization.
Whether you’re operating on AWS EKS, Azure AKS, Google GKE, or on-premises infrastructure, these principles will help you build enterprise-grade Kubernetes platforms that scale efficiently while controlling cloud costs.
Get the guide – Enterprise Kubernetes Best Practices
Resilience engineering in Kubernetes
Resilience—maintaining service availability despite infrastructure failures—requires deliberate design decisions in your Kubernetes architecture.
Here are four best practices for boosting your implementation’s resilience.
Multi-zone pod distribution with topology spread constraints
The foundation of resilient Kubernetes workloads is proper pod distribution across infrastructure failure domains. Topology spread constraints provide declarative control over how pods distribute across your cluster.
Here’s an example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: resilient-application
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: resilient-application
        # Provide fallback for when zones have issues
        - maxSkew: 2
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: resilient-application

This configuration ensures your applications distribute evenly across availability zones while providing a fallback for node-level distribution. Our research shows that properly implemented topology constraints can reduce mean time to recovery (MTTR) by up to 43% during zone failures.
DevOps Pro Tip: For workloads with 2-3 replicas, use pod anti-affinity instead of topology spread constraints to ensure strict separation with minimal configuration.
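For small deployments, a required anti-affinity rule is enough to guarantee that replicas never share a zone. A minimal sketch, reusing the app: resilient-application label from the example above:

```yaml
# Pod template spec fragment: hard anti-affinity across zones.
# With 2-3 replicas this forces each pod into a different zone;
# scheduling fails if not enough zones are available.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: resilient-application
        topologyKey: topology.kubernetes.io/zone
```

Use preferredDuringSchedulingIgnoredDuringExecution instead if you would rather co-locate pods than leave them unscheduled when zones are constrained.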
Comprehensive health probes
Kubernetes offers three types of health probes, each serving a distinct purpose in your resilience strategy:
- Liveness probes – detect broken application states and trigger container restarts
- Readiness probes – control traffic routing so that only pods ready to serve requests receive them
- Startup probes – allow applications with lengthy initialization to avoid premature restarts
Here’s an example:
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 15
  timeoutSeconds: 5
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
startupProbe:
  httpGet:
    path: /started
    port: 8080
  failureThreshold: 30
  periodSeconds: 10

Properly configured health probes can prevent many customer-impacting outages. Implementing them is not merely a best practice—it’s a critical component of site reliability engineering (SRE).
Pod disruption budgets: controlled maintenance
Pod disruption budgets (PDBs) protect application availability during voluntary disruptions like node upgrades or cluster scaling:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb
spec:
  minAvailable: 2  # or use maxUnavailable
  selector:
    matchLabels:
      app: critical-service

PDBs control how many pods of a workload can be down simultaneously during voluntary disruptions, ensuring service continuity. They work by:
- Defining the minimum number of pods that must remain available (minAvailable)
- Or defining the maximum number of pods that can be unavailable (maxUnavailable)
- Blocking voluntary disruptions when budget constraints are violated
PDB configuration guidelines:
- Align with replica count: Your PDB must be aligned with your deployment’s replica count. For example:
  - A replica count of 3 with maxUnavailable: 1 allows one pod to be disrupted, keeping two of three pods (about 67%) available
  - A replica count of 3 with minAvailable: 2 produces the identical outcome
  - A replica count of 1 with maxUnavailable: 1 permits the single pod to be disrupted, potentially causing a service outage
- Common misconfiguration: Setting maxUnavailable: 1 with a replica count of 1 allows the single pod to be evicted during node drains, causing service downtime. To prevent this, use minAvailable: 1 for single-replica workloads.
- Percentage-based configurations: You can also use percentages:
spec:
  minAvailable: "50%"  # or maxUnavailable: "50%"

This approach automatically adjusts as replica counts change.
- Multiple workloads: For applications with multiple components, ensure each deployment has its own PDB.
A survey revealed that 83% of organizations experienced Kubernetes-related outages, many of which were tied to upgrades due to improper planning or configuration.
GitOps-based disaster recovery
Modern disaster recovery leverages Infrastructure as Code (IaC) and GitOps patterns to enable rapid, consistent recovery:
- Store all infrastructure and application configurations in Git repositories
- Use declarative IaC tools like Terraform to manage cluster and cloud resources
- Implement automated recovery pipelines that can rebuild environments in minutes
- Test DR procedures regularly with chaos engineering practices
Organizations leveraging GitOps-based disaster recovery strategies see measurable improvements: 60% of users report faster repair times and 53% cite easier rollbacks, with automation reducing configuration errors.
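Tooling varies, but as one sketch, an Argo CD Application that continuously reconciles a cluster from Git looks roughly like this (the repository URL, path, and namespace are placeholders):

```yaml
# Argo CD Application: the cluster state is rebuilt from Git on demand,
# which is what makes GitOps-based recovery fast and repeatable.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: platform-recovery
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/manifests.git  # placeholder repo
    targetRevision: main
    path: production  # placeholder path
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true     # delete resources removed from Git
      selfHeal: true  # revert out-of-band changes automatically
```

With prune and selfHeal enabled, pointing Argo CD at a fresh cluster is sufficient to restore the declared state, which is the core of the recovery pipelines described above.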
Get the one-pager: Resilience Engineering in Kubernetes
Resource optimization strategies
According to the 2025 Kubernetes Benchmark Report, the average CPU utilization was just 10%, while memory utilization averaged only 23%. This means nearly 90% of CPU and 77% of memory resources are wasted in typical Kubernetes deployments.
Here are five key areas you should improve to maximize your usage of compute resources.
Rightsizing workloads
The foundation of resource optimization is appropriate workload sizing:
resources:
  requests:
    cpu: 100m
    memory: 512Mi
  limits:
    memory: 512Mi

Here are a few best practices for production workloads:
- Set memory requests equal to limits to guarantee Quality of Service (QoS)
- Configure CPU requests based on 95th percentile usage in production
- Consider leaving CPU limits undefined to allow burst capacity
- Implement automated right-sizing with Vertical Pod Autoscaler
The report shows that automated memory request adjustments are particularly critical, as 5.7% of containers exceed their requested memory at some point in a 24-hour window, leading to instability and performance issues.
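A minimal Vertical Pod Autoscaler sketch, assuming the VPA components are installed in the cluster (the deployment name mirrors the examples in this guide):

```yaml
# VPA: observes actual usage and adjusts requests automatically.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-deployment  # placeholder target
  updatePolicy:
    updateMode: "Auto"  # use "Off" to only surface recommendations
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 50m
          memory: 128Mi
```

Starting with updateMode: "Off" and reviewing the recommendations before enabling automatic updates is a common low-risk rollout path.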
Horizontal Pod Autoscaler for dynamic workloads
Horizontal Pod Autoscaler (HPA) enables automatic scaling based on metrics:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-deployment
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
    scaleUp:
      stabilizationWindowSeconds: 60

Here are four resource optimization best practices for HPA:
- Use custom metrics (requests-per-second, queue depth) over CPU when possible
- Configure appropriate scale-up/down behaviors to avoid thrashing
- Set minReplicas based on minimum acceptable availability, not minimum load
- Implement cluster autoscaler in conjunction with HPA for node-level scaling
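As a sketch of the custom-metrics approach, a Pods-type metric target can replace the CPU target in the HPA above. The http_requests_per_second metric name is an assumption and requires a metrics adapter (such as the Prometheus adapter) to be installed:

```yaml
# spec.metrics fragment for autoscaling/v2 HorizontalPodAutoscaler:
# scale so that each pod averages ~100 requests per second.
metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second  # assumed metric exposed via an adapter
      target:
        type: AverageValue
        averageValue: "100"
```

Request-based targets track user-facing load directly, so they avoid the lag and noise of scaling on CPU for I/O-bound services.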
Cross-AZ traffic optimization with topology-aware routing
Topology-aware routing keeps service traffic within the same availability zone where possible, minimizing cross-zone networking costs in Kubernetes. Here’s how to configure it:
apiVersion: v1
kind: Service
metadata:
  name: app-service
  annotations:
    service.kubernetes.io/topology-aware-hints: "auto"
spec:
  selector:
    app: app-service  # illustrative label; match your pods' labels

This configuration enables topology-aware routing with the following behavior:
- Prefers endpoints in the same availability zone as the originating node.
- Falls back to cluster-wide (cross-zone) routing when zone-local endpoints are insufficient or the feature’s safeguards disable hints.
Depending on workload and cluster configuration, topology-aware routing can significantly reduce cross-AZ data transfer costs, potentially by 20-50% in well-optimized multi-zone Kubernetes deployments.
Ensure your cluster has topology-aware hints enabled and that nodes carry topology labels (e.g., topology.kubernetes.io/zone). Monitor traffic patterns with tools like Grafana or AWS CloudWatch to quantify the actual savings.
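Note that on Kubernetes 1.27 and later, the hints annotation has been superseded; the same behavior is enabled with the topology-mode annotation:

```yaml
# Service metadata fragment for Kubernetes 1.27+:
metadata:
  annotations:
    service.kubernetes.io/topology-mode: Auto  # replaces topology-aware-hints
```

Check which annotation your cluster version honors before rolling this out fleet-wide.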
Get the one-pager: Resource Optimization Strategies
Node selection and bin-packing
Advanced resource optimization requires strategic node selection:
- Consolidate workloads onto fewer, larger nodes rather than many small nodes
- Implement cluster autoscaler with bin-packing strategy
- Consider Spot/Preemptible Instances for stateless, fault-tolerant workloads
- Use node taints and tolerations judiciously, avoiding excessive node specialization
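For example, Spot node pools are commonly tainted so that only workloads that explicitly tolerate interruption are scheduled onto them. The lifecycle=spot taint key below is a common convention, not a Kubernetes standard:

```yaml
# Taint applied to the Spot node pool, e.g.:
#   kubectl taint nodes <node-name> lifecycle=spot:NoSchedule
# Pod spec fragment for stateless, fault-tolerant workloads:
tolerations:
  - key: lifecycle
    operator: Equal
    value: spot
    effect: NoSchedule
```

Stateful or latency-critical workloads simply omit the toleration and are kept off Spot capacity automatically.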
Best practice example
Different GPU instance types carry different Spot Instance prices, which opens up additional savings. Consider the AWS G5 family, where the GPU count varies with instance size.
Specifically, the g5.xlarge, g5.4xlarge, and g5.16xlarge instances (NVIDIA A10G Tensor Core GPU) are each equipped with one GPU, while the g5.24xlarge has four GPUs and the g5.48xlarge provides eight.
With the g5.16xlarge, users get a single GPU alongside substantial additional compute resources (CPU). Workloads that don’t require the GPU can consume those extra resources, making the computation more cost-effective.
On a price-per-GPU basis, the larger multi-GPU instances offer better value than several smaller single-GPU instances, and there is little per-GPU price difference between the 4-GPU (g5.24xlarge) and 8-GPU (g5.48xlarge) sizes. For tasks that use many GPUs, the larger instances are therefore the cheaper option.
Cost-effective compute selection
Our 2025 Kubernetes Benchmark Report shows that being flexible with compute options can yield significant savings:
- Consider both Arm and x86 architectures – Azure offers up to 65% savings with Arm CPUs.
- Leverage Spot Instances – organizations using a mix of On-Demand and Spot Instances realized 59% average savings, while Spot-only clusters achieved 77% savings.
- Be aware of regional price differences – for GPU workloads, selecting the optimal region and availability zone can reduce costs by 2-7x compared to average Spot prices.
What is the best region/AZ to run your AI workload?
When running GPU-heavy workloads, where you choose to run them can make a huge difference in cost.
The report analyzed prices for AWS p4d.24xlarge instances (equipped with 8 NVIDIA A100 GPUs) from January 2024 to February 2025, revealing significant cost variations across regions and AZs. Some regions and AZs are up to six times cheaper than the average.
If you can adjust your AI or GPU-intensive workloads to the most cost-efficient regions and AZs—rather than defaulting to higher-cost zones like us-east-1a—you could achieve massive savings:
🔹 2x-7x savings compared to the average Spot Instance price globally
🔹 3x-10x savings compared to the average On-Demand Instance price
GPU pricing and availability fluctuate frequently, so flexibility in choosing regions can be a powerful way to optimize costs and ensure you’re making the most of your cloud resources.
Security hardening for enterprise Kubernetes
Kubernetes security requires a defense-in-depth approach spanning multiple control points.
Here are several key areas teams should focus on when it comes to security:
Network policy implementation
Network policies provide Kubernetes-native microsegmentation:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress

Best practices for network security:
- Start with default-deny policies and explicitly allow required traffic
- Implement namespace isolation for multi-tenant clusters
- Monitor policy violations before enforcing in production
- Consider a service mesh for advanced traffic management and encryption
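With a default deny in place, traffic is then opened selectively. A sketch allowing only a hypothetical frontend to reach the backend on port 8080 (the app labels are illustrative):

```yaml
# Allow-list policy layered on top of default-deny-all.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
spec:
  podSelector:
    matchLabels:
      app: backend  # illustrative label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend  # illustrative label
      ports:
        - protocol: TCP
          port: 8080
```

Each additional flow gets its own narrowly scoped policy, keeping the default-deny posture intact.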
External secrets management
Decoupling secrets from application code and Kubernetes manifests is critical:
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: database-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: SecretStore
  target:
    name: database-credentials-k8s
  data:
    - secretKey: username
      remoteRef:
        key: database/credentials
        property: username
    - secretKey: password
      remoteRef:
        key: database/credentials
        property: password

Integrate with cloud provider secrets services (AWS Secrets Manager, Azure Key Vault, Google Secret Manager) or dedicated solutions like HashiCorp Vault.
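The vault-backend store referenced above must be defined separately. A sketch for HashiCorp Vault with Kubernetes auth, where the server address, role, and service account names are placeholders:

```yaml
# SecretStore backing the ExternalSecret's secretStoreRef.
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: vault-backend
spec:
  provider:
    vault:
      server: https://vault.example.com  # placeholder Vault address
      path: secret                       # KV mount path
      version: v2                        # KV secrets engine version
      auth:
        kubernetes:
          mountPath: kubernetes
          role: external-secrets          # placeholder Vault role
          serviceAccountRef:
            name: external-secrets-sa     # placeholder service account
```

Cloud-provider backends (AWS Secrets Manager, Azure Key Vault, Google Secret Manager) follow the same pattern with a different provider block.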
Comprehensive container security
Container security requires a multi-layered approach:
- Static image scanning: Implement vulnerability scanning in CI/CD pipelines
- Container runtime security: Deploy runtime monitoring to detect behavioral anomalies
- CIS benchmark compliance: Regularly audit against industry security standards
Example Jenkins pipeline stage for container scanning:
stage('Security Scan') {
    steps {
        sh 'trivy image --severity HIGH,CRITICAL --exit-code 1 ${IMAGE_NAME}:${IMAGE_TAG}'
    }
}

Organizations with comprehensive container security programs experience significantly fewer Kubernetes security incidents. Studies show that advanced DevSecOps practices can reduce incident rates by up to 50% compared to organizations with inadequate security measures.
Research indicates that 89% of organizations face Kubernetes incidents annually, yet robust security approaches, including vulnerability management and runtime protection, substantially lower these risks.
Get the one-pager: Security Hardening for Kubernetes
Policy enforcement with OPA Gatekeeper
Enforce security and compliance requirements with policy-as-code:
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8srequiredprobes
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredProbes
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredprobes

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not container.livenessProbe
          msg := sprintf("Container %v must specify a livenessProbe", [container.name])
        }

Key policies to implement:
- Required resource requests/limits
- Restricted image repositories
- Enforcement of health probes
- Container security context requirements
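The ConstraintTemplate above only defines the policy; it takes effect once a matching constraint is created. For example, one that applies it to all Pods:

```yaml
# Constraint instance of the K8sRequiredProbes template.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredProbes
metadata:
  name: must-define-liveness-probe
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
```

Gatekeeper also supports enforcementAction: dryrun on the constraint, which logs violations without blocking admission, which is useful while rolling a policy out.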
Comprehensive audit logging
Enable and configure audit logging in your cloud provider’s Kubernetes service:
# AWS EKS example
aws eks update-cluster-config \
  --region us-west-2 \
  --name production-cluster \
  --logging '{"clusterLogging":[{"types":["audit"],"enabled":true}]}'

Audit log best practices:
- Export logs to a centralized SIEM solution
- Create alerts for suspicious administrative activities
- Implement appropriate retention policies based on compliance requirements
- Regularly review access patterns to detect potential security issues
Implementation roadmap
Implementing these best practices calls for a phased approach:
Phase 1: Assessment and Foundation (Weeks 1-4)
- Assess current Kubernetes implementation against best practices
- Implement resource requests/limits for all workloads
- Deploy basic health probes for critical services
- Establish baseline monitoring and observability
- Measure current CPU and memory utilization
Phase 2: Resilience Engineering (Weeks 5-8)
- Implement topology spread constraints for critical workloads
- Configure pod disruption budgets
- Deploy advanced health probes with all three probe types
- Test failover scenarios with controlled chaos engineering
Phase 3: Security Hardening (Weeks 9-12)
- Implement network policies in monitoring mode
- Deploy external secrets management
- Set up container image scanning in CI/CD pipelines
- Configure OPA Gatekeeper with critical policies
Phase 4: Resource Optimization (Weeks 13-16)
- Implement automated right-sizing for workloads
- Configure horizontal pod autoscalers with custom metrics
- Optimize node configurations and bin-packing
- Implement service topology for traffic optimization
- Consider Spot Instances for appropriate workloads
Phase 5: GitOps and Automation (Weeks 17-20)
- Deploy GitOps workflows for application deployment
- Implement infrastructure as code for all components
- Create automated disaster recovery procedures
- Develop continuous compliance monitoring
- Implement agentic autoscaling for dynamic resource management
Get the one-pager: Implementation Roadmap
Start implementing Kubernetes best practices today
Building resilient, secure, and cost-optimized Kubernetes infrastructures requires a deliberate approach spanning multiple disciplines—from platform engineering to site reliability engineering to security. By implementing the best practices outlined in this guide, organizations can achieve:
- Up to 99.99% availability for critical services through multi-zone pod distribution and comprehensive health probes
- 40-60% reduction in cloud infrastructure costs by right-sizing workloads and implementing Horizontal Pod Autoscaler (HPA)
- Up to 50% fewer security incidents by establishing comprehensive container security and policy enforcement
- Up to 70% reduction in downtime by using Pod Disruption Budgets (PDBs) for controlled maintenance and GitOps for disaster recovery
With average CPU utilization at just 10% and memory at 23%, most organizations have significant room for improvement. Organizations can dramatically reduce waste by applying these best practices—particularly around right-sizing, bin-packing, and strategically using Spot Instances—while maintaining or improving application performance and reliability.
The journey to Kubernetes excellence is continuous. Start with the highest-impact items—topology spread constraints, resource rightsizing, and network policies—and progressively implement the remaining best practices based on your organization’s priorities and resources.