ImagePullBackOff means Kubernetes tried to pull a container image, failed, and is now backing off before retrying. The kubelet cycles through ErrImagePull (the first failure) and ImagePullBackOff (the waiting state between retries). Retries follow exponential backoff, starting at a few seconds and capping at 5 minutes. Permanent problems like a typo in the image name or a missing pull secret will not self-heal.
Key Takeaways
- ImagePullBackOff is a retry state, not a new error – ErrImagePull fires first; ImagePullBackOff is the waiting state between retries, and the wait between attempts is capped at 5 minutes
- kubectl describe pod is your first tool – the Events section gives you the exact error string that maps directly to a root cause
- Eight distinct causes, one definitive fix each – wrong image name, nonexistent tag, missing imagePullSecret, expired credentials, Docker Hub rate limit, unreachable registry, imagePullPolicy: Never, architecture mismatch
- Permanent problems don’t self-heal – if a pod has been in ImagePullBackOff for 20+ minutes, act; don’t wait
- Delete the pod to bypass the backoff timer – once the fix is applied, kubectl delete pod or kubectl rollout restart forces an immediate retry
- ECR tokens expire after 12 hours – in production, let the kubelet use node-level cloud credentials (for example the EKS node IAM role) instead of static pull secrets to avoid unexpected credential-expiry failures
What ImagePullBackOff and ErrImagePull Mean
These two error states are part of the same state machine, not two separate problems. Understanding the difference matters when you are triaging a broken deployment at 2 a.m.
ErrImagePull is the status reported right after a pull attempt fails: the kubelet tried to contact the registry, something went wrong, and it reported the failure. This status is brief. Within seconds, Kubernetes moves the pod to ImagePullBackOff, the waiting state before the next attempt, and the pod alternates between the two states on every retry.
Retries follow exponential backoff, starting at a few seconds and capping at 300 seconds (5 minutes). Kubernetes keeps retrying indefinitely at that ceiling. If you see a pod stuck in ImagePullBackOff for 20 minutes with no sign of recovery, the underlying problem is almost certainly permanent: a typo, a missing secret, or a tag that was never pushed.
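To see why 20 minutes is a useful threshold, the function below prints a capped exponential schedule. It is a sketch that assumes the kubelet's usual defaults of a 10-second initial delay that doubles after each failure and is capped at 300 seconds; those values are kubelet internals and may differ between versions, and backoff_schedule is a hypothetical helper, not a Kubernetes tool.

```shell
#!/bin/sh
# Print a capped exponential backoff schedule, one delay (in seconds) per line.
# Assumed defaults: 10s initial delay, doubled after each failure, capped at 300s.
# Usage: backoff_schedule <initial_seconds> <cap_seconds> <attempts>
backoff_schedule() {
  local delay="$1" cap="$2" attempts="$3" i=0
  while [ "$i" -lt "$attempts" ]; do
    # Clamp to the cap before printing.
    if [ "$delay" -gt "$cap" ]; then
      delay="$cap"
    fi
    echo "$delay"
    delay=$((delay * 2))
    i=$((i + 1))
  done
}

# The first seven waits with the assumed defaults.
backoff_schedule 10 300 7
```

With these assumed values the first five waits (10, 20, 40, 80 and 160 seconds) add up to 310 seconds, so the cap is reached after roughly five minutes; from then on every retry waits 300 seconds. A pod still failing after 20 minutes has spent most of that time retrying at the ceiling.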
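The availability impact depends on whether healthy pods already exist. A new Deployment, or one using the Recreate strategy, has nothing to fall back on and stays unavailable until the pull succeeds. During a RollingUpdate, pods from the previous ReplicaSet keep serving traffic while the new pods fail to pull, so with the default settings the rollout stalls, at worst with reduced capacity, rather than taking the service down.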
Transient issues, like a brief registry outage or a DNS hiccup, can self-heal once the registry comes back. Permanent issues will not. Knowing which category you are dealing with tells you whether to wait or act.
Init containers can trigger the same errors. When they do, kubectl get pods shows Init:ErrImagePull or Init:ImagePullBackOff in the STATUS column, and the kubectl describe pod events reference the init container image, not the main container. Check the init container name in the events before assuming the main application image is the one that is missing.
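One way to see which container is stuck is to print the waiting reason for every init and app container from the pod status. The sketch below wraps a single kubectl jsonpath query over the standard initContainerStatuses and containerStatuses fields; waiting_containers is a hypothetical helper name, and it assumes kubectl already points at the right cluster.

```shell
#!/bin/sh
# List each init and app container with its current waiting reason, if any.
# .state.waiting.reason is empty for containers that are not waiting.
# Usage: waiting_containers <pod> <namespace>
waiting_containers() {
  local pod="$1" ns="$2"
  kubectl get pod "$pod" -n "$ns" -o jsonpath='{range .status.initContainerStatuses[*]}init/{.name}{"\t"}{.state.waiting.reason}{"\n"}{end}{range .status.containerStatuses[*]}{.name}{"\t"}{.state.waiting.reason}{"\n"}{end}'
}
```

A line such as init/fetch-config followed by ImagePullBackOff points at the init container image rather than the application image.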
If the image pulls successfully but the container crashes immediately after, that is a different problem entirely. See our guide on CrashLoopBackOff for that failure mode.
How to Diagnose ImagePullBackOff
Work through the five steps below to isolate the root cause systematically before touching any configuration.
Step 1: Confirm the pod status.
kubectl get pods -n <namespace>
Look for ImagePullBackOff or ErrImagePull in the STATUS column.
Step 2: Inspect the failing pod events.
kubectl describe pod <name> -n <namespace>
Scroll to the Events section at the bottom. The error message there maps directly to a root cause. More detail on reading this output is in the section below.
Step 3: Check the namespace-level event stream.
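kubectl get events -n <namespace> --field-selector involvedObject.name=<pod>
This lists the pod's events straight from the Events API. When multiple pods are failing and you want a time-ordered view across the whole namespace, drop the field selector and add --sort-by=.lastTimestamp.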
Step 4: Verify the exact image reference in the pod spec.
kubectl get pod <name> -n <namespace> -o jsonpath='{.spec.containers[*].image}'
Copy the output and verify it character by character against your registry. Typos in image names and tags account for a large share of these failures. If the pod has init containers, check {.spec.initContainers[*].image} as well.
Step 5: Check which imagePullSecrets are attached to the pod.
kubectl get pod <name> -n <namespace> -o jsonpath='{.spec.imagePullSecrets[*].name}'
If this returns empty and the image lives in a private registry, that is your problem.
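The five steps can be bundled into one function for repeated use. This is a convenience sketch, not part of kubectl: diagnose_pull is a hypothetical name, it assumes kubectl already points at the right cluster, it trims the describe output to the Events section, and it adds the init container images to step 4.

```shell
#!/bin/sh
# Run the five diagnosis steps for one pod and print each result.
# Usage: diagnose_pull <pod> <namespace>
diagnose_pull() {
  local pod="$1" ns="$2"
  echo "== 1. Pod status"
  kubectl get pod "$pod" -n "$ns"
  echo "== 2. Pod events"
  # Print from the "Events:" line to the end of the describe output.
  kubectl describe pod "$pod" -n "$ns" | sed -n '/^Events:/,$p'
  echo "== 3. Event stream for this pod"
  kubectl get events -n "$ns" --field-selector "involvedObject.name=$pod"
  echo "== 4. Image references (init containers, then containers)"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.spec.initContainers[*].image}{"\n"}{.spec.containers[*].image}{"\n"}'
  echo "== 5. Attached imagePullSecrets"
  kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.imagePullSecrets[*].name}{"\n"}'
}
```

Call it as diagnose_pull my-pod default and read the sections in order; an empty line under step 5 for a private image is the missing-secret case.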
kubectl describe pod
The Events section at the bottom of kubectl describe pod <name> -n <namespace> tells you the exact error message from the container runtime. This is where the error patterns live: manifest not found, unauthorized, toomanyrequests, i/o timeout.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m default-scheduler Successfully assigned default/my-pod to node-01
Normal Pulling 4m (x3 over 5m) kubelet Pulling image "myrepo/myapp:v2"
Warning Failed 4m (x3 over 5m) kubelet Failed to pull image "myrepo/myapp:v2": rpc error: code = Unknown desc = Error response from daemon: manifest for myrepo/myapp:v2 not found
Warning Failed 4m (x3 over 5m) kubelet Error: ErrImagePull
Normal BackOff 3m (x6 over 4m) kubelet Back-off pulling image "myrepo/myapp:v2"
Warning Failed 3m (x6 over 4m) kubelet Error: ImagePullBackOff
Events appear in chronological order – earliest at the top, most recent at the bottom. Start at the top to follow the failure sequence, or jump to the bottom for the latest state. Here, manifest for myrepo/myapp:v2 not found tells you the tag v2 does not exist in the registry. The kubelet has already tried the pull three times and is now backing off, with the wait between attempts growing toward the 5-minute cap. No amount of waiting will fix this. The tag needs to exist before the pod can proceed.
kubectl get events
When multiple pods are failing, pull events filtered to a specific pod name to avoid noise from the rest of the namespace:
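kubectl get events -n <namespace> --field-selector involvedObject.name=<pod>
Events are kept for one hour by default (the API server's --event-ttl setting). kubectl describe pod reads from the same Events API, so an expired event disappears from both views; while the pod keeps retrying, though, the kubelet keeps recording fresh Failed and BackOff events, so the current error stays visible. Use both.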
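When several pods are failing, a namespace-wide summary of pull failures can show whether they share one cause. The sketch below filters Warning events with reason Failed (type and reason are supported field selectors for events), sorts them by time, and keeps only the pull-failure messages. pull_failures is a hypothetical helper.

```shell
#!/bin/sh
# Show recent image pull failures in a namespace, oldest first.
# Usage: pull_failures <namespace>
pull_failures() {
  local ns="$1"
  kubectl get events -n "$ns" \
    --field-selector type=Warning,reason=Failed \
    --sort-by=.lastTimestamp \
    -o custom-columns=POD:.involvedObject.name,MESSAGE:.message \
    | grep 'Failed to pull image'
}
```

If every line names the same image and the same error string, you are most likely looking at one root cause rather than several.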
Forcing a Retry
Once you fix the root cause, the pod will not automatically retry immediately. The backoff timer may still have minutes remaining. To force a fresh pull attempt, delete the pod:
kubectl delete pod <name> -n <namespace>
If the pod is not managed by a controller, nothing re-creates it, so apply its manifest again after deleting it. For deployments, use a rolling restart instead of deleting the pod directly. This preserves availability if you have multiple replicas running:
kubectl rollout restart deployment/<name> -n <namespace>
Kubernetes will create a new pod and attempt the pull from scratch, bypassing the existing backoff state entirely.
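After a fix, it helps to confirm that the new pods actually pulled the image. In the sketch below, kubectl rollout status waits for the restarted rollout and exits non-zero if it does not complete within --timeout, which makes the check usable in scripts. restart_and_watch is a hypothetical helper, and the five-minute default timeout is an illustrative choice, not a Kubernetes default.

```shell
#!/bin/sh
# Restart a Deployment and block until the new pods are ready or the timeout expires.
# Usage: restart_and_watch <deployment> <namespace> [timeout]
restart_and_watch() {
  local deploy="$1" ns="$2" timeout="${3:-5m}"
  kubectl rollout restart "deployment/$deploy" -n "$ns" || return 1
  # rollout status exits non-zero if the rollout does not finish in time.
  kubectl rollout status "deployment/$deploy" -n "$ns" --timeout="$timeout"
}
```

If it times out, run kubectl describe pod on one of the new pods; a fresh pull error in Events means the fix did not take.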
Common Causes and Fixes
The error message in the Events section maps directly to a cause. Use the table below to match the message to the fix.
| Cause | Error Message | Fix |
|---|---|---|
| Typo in image name | repository not found | Correct the image field in the pod spec |
| Nonexistent tag | manifest not found | Push the tag to the registry or reference an existing tag |
| Missing imagePullSecret | unauthorized / 401 Unauthorized | Create the secret and add it to spec.imagePullSecrets |
| Expired credentials | 401 Unauthorized on an image that used to pull | Recreate the secret with fresh credentials |
| Docker Hub rate limit | toomanyrequests | Authenticate pulls or use a pull-through cache |
| Registry unreachable | i/o timeout | Check node egress rules, DNS, firewalls, and proxy settings |
| imagePullPolicy: Never (image not cached) | ErrImageNeverPull | Change the policy or pre-load the image onto the node |
| Architecture mismatch | no matching manifest for linux/arm64 | Use a multi-arch image or add a nodeSelector |
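The same matching can be scripted. The sketch below maps substrings of an event message to the causes in the table, checking the more specific "not found" messages before the generic one. The patterns are assumptions based on the messages quoted in this guide; exact wording varies by container runtime and registry, so treat an unknown result as a prompt to read the full message. classify_pull_error is a hypothetical helper.

```shell
#!/bin/sh
# Map an image pull event message to a likely cause from the table.
# Patterns are checked top to bottom; the first match wins, so the
# specific "no matching manifest" and "repository not found" cases
# come before the generic "not found" case.
# Usage: classify_pull_error "<event message>"
classify_pull_error() {
  case "$1" in
    *ErrImageNeverPull*) echo "imagePullPolicy: Never, image not on node" ;;
    *"no matching manifest"*) echo "architecture mismatch" ;;
    *toomanyrequests*) echo "Docker Hub rate limit" ;;
    *"repository not found"*|*"repository does not exist"*) echo "wrong image name" ;;
    *"not found"*|*"manifest unknown"*) echo "nonexistent tag" ;;
    *[Uu]nauthorized*) echo "missing or expired credentials" ;;
    *x509*) echo "untrusted registry certificate" ;;
    *"i/o timeout"*|*"no such host"*) echo "registry unreachable" ;;
    *) echo "unknown: read the full event message" ;;
  esac
}
```

For example, classify_pull_error 'manifest for myrepo/myapp:v2 not found' prints nonexistent tag.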
Wrong Image Name or Tag
This is the most common cause. The pod spec references an image or tag that does not exist in the registry.
The error message tells you which case you are in:
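- repository not found: the image name itself is wrong. Check the registry path, organization name, and capitalization. Docker Hub and some other registries report a nonexistent repository as an authorization failure instead (pull access denied, repository does not exist or may require authorization), so rule out a typo before you start debugging credentials.
- manifest not found: the image name is correct but the tag does not exist. Either push the correct tag or update the pod spec to reference an existing one.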
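Avoid using latest in production. Pinning to a specific digest (myrepo/myapp@sha256:abc123, where a real digest is 64 hex characters) makes the reference immutable: a re-pushed or moved tag can no longer change what the pod pulls. The digest still has to exist in the registry, so pinning does not protect against an image that was deleted or never pushed.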
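A quick way to enforce digest pinning, for example in CI before a manifest is applied, is to check that each image reference ends in @sha256: followed by 64 lowercase hex characters. is_digest_pinned is a hypothetical helper, and the regular expression is an assumption that only recognizes sha256 digests.

```shell
#!/bin/sh
# Succeed (exit 0) only if the image reference is pinned to a sha256 digest.
# Usage: is_digest_pinned <image-reference>
is_digest_pinned() {
  printf '%s\n' "$1" | grep -Eq '@sha256:[0-9a-f]{64}$'
}
```

The shortened placeholder myrepo/myapp@sha256:abc123 fails this check. To look up the digest behind a tag, docker buildx imagetools inspect <image> prints it.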
Missing imagePullSecret
Private registries require authentication. Kubernetes stores registry credentials in a Secret of type kubernetes.io/dockerconfigjson and references it via spec.imagePullSecrets in the pod spec.
Create the secret manually:
kubectl create secret docker-registry regcred \
  --docker-server=<registry-server> \
  --docker-username=<username> \
  --docker-password=<password> \
  --docker-email=<email> \
  -n <namespace>
The raw YAML form of the same secret looks like this:
apiVersion: v1
kind: Secret
metadata:
  name: regcred
  namespace: default
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: <base64-encoded-.docker/config.json>
Reference the secret in your pod spec:
spec:
  imagePullSecrets:
    - name: regcred
  containers:
    - name: myapp
      image: myrepo/myapp:1.0.0
If you want every pod in a namespace to use the same pull credentials without adding the reference to each manifest individually, patch the default ServiceAccount:
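kubectl patch serviceaccount default \
  -n <namespace> \
  -p '{"imagePullSecrets": [{"name": "regcred"}]}'
Pods that use the default ServiceAccount in that namespace and don't list their own imagePullSecrets inherit the pull credentials automatically. The ServiceAccount admission controller adds them when a pod is created, so existing pods only pick them up after they are recreated.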
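To check what a pull secret actually contains, or to build the .dockerconfigjson value for the raw YAML form by hand, the sketch below may help. It assumes the standard Docker config layout, where auths maps a registry host to an auth field holding base64 of username:password; make_dockerconfigjson and show_pull_secret are hypothetical helpers. The decoded output contains credentials, so keep it out of tickets and logs.

```shell
#!/bin/sh
# Print a base64-encoded .dockerconfigjson value for one registry.
# Only the "auth" field (base64 of "username:password") is written.
# Usage: make_dockerconfigjson <server> <username> <password>
make_dockerconfigjson() {
  local server="$1" auth
  # tr strips the line wrapping some base64 implementations add.
  auth=$(printf '%s:%s' "$2" "$3" | base64 | tr -d '\n')
  printf '{"auths":{"%s":{"auth":"%s"}}}' "$server" "$auth" | base64 | tr -d '\n'
}

# Decode an existing pull secret (prints credentials; handle with care).
# Usage: show_pull_secret <secret-name> <namespace>
show_pull_secret() {
  kubectl get secret "$1" -n "$2" -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d
  echo
}
```

The registry host in the decoded JSON must match the host in the image reference; a secret for docker.io does not help a pull from a private registry.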
Private Registry Authentication (ECR / Artifact Registry / ACR)
Managed cloud registries each have their own credential model. The commands below create the correct secret for each.
Amazon ECR
kubectl create secret docker-registry regcred \
  --docker-server=<account>.dkr.ecr.<region>.amazonaws.com \
  --docker-username=AWS \
  --docker-password="$(aws ecr get-login-password --region <region>)" \
  --docker-email=<email> \
  -n <namespace>
Note: shell history records only the unexpanded $(...) text, but the expanded token is passed to kubectl as a command-line argument, so other users on the same machine can see it in the process list (for example with ps) while the command runs. kubectl create secret docker-registry cannot read the password from stdin, so in production prefer node-level credentials over static tokens.
ECR tokens expire after 12 hours, so a secret created this way stops working half a day later. For production clusters, avoid static pull secrets for your cloud provider's own registry. Image pulls are performed by the kubelet with node-level credentials, so IRSA and pod-level workload identity don't normally apply to them; instead, grant the EKS node IAM role read access to ECR (the kubelet's ECR credential provider then fetches and refreshes tokens on its own), grant the GKE node service account read access to Artifact Registry, or attach ACR to the AKS cluster. See your cloud provider's documentation for setup. For credential-based secrets that can't be replaced this way, automate rotation, for example with the External Secrets Operator.
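Where static ECR secrets can't be avoided, a common stopgap is to regenerate the secret on a schedule shorter than 12 hours, for example from a CronJob or CI job that has AWS credentials. The sketch below pipes kubectl's client-side dry run into kubectl apply so the same command both creates and updates the secret. refresh_ecr_secret is a hypothetical helper; it assumes the AWS CLI is configured with permission to call ecr:GetAuthorizationToken, and the token still appears as a kubectl argument while it runs.

```shell
#!/bin/sh
# Create or update an ECR pull secret with a fresh 12-hour token.
# Usage: refresh_ecr_secret <account-id> <region> <namespace> [secret-name]
refresh_ecr_secret() {
  local account="$1" region="$2" ns="$3" name="${4:-regcred}" token
  token=$(aws ecr get-login-password --region "$region") || return 1
  # --dry-run=client -o yaml renders the Secret locally; apply creates or updates it.
  kubectl create secret docker-registry "$name" \
    --docker-server="$account.dkr.ecr.$region.amazonaws.com" \
    --docker-username=AWS \
    --docker-password="$token" \
    -n "$ns" --dry-run=client -o yaml \
    | kubectl apply -f -
}
```

The kubelet reads the secret at pull time, so running pods are unaffected; the refreshed token matters for the next pull.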
Google Artifact Registry
kubectl create secret docker-registry regcred \
  --docker-server=<region>-docker.pkg.dev/<project-id>/<repo> \
  --docker-username=_json_key \
  --docker-password="$(cat keyfile.json)" \
  --docker-email=<email> \
  -n <namespace>
Note: gcr.io still works for legacy projects, but Google Artifact Registry (<region>-docker.pkg.dev) is the current Google Cloud container registry standard. New projects should use Artifact Registry.
Azure Container Registry (ACR)
kubectl create secret docker-registry regcred \
  --docker-server=<registry>.azurecr.io \
  --docker-username=<service-principal-id> \
  --docker-password=<secret> \
  --docker-email=<email> \
  -n <namespace>
Credentials for all three registries can expire or be rotated: ECR tokens after 12 hours, service principal secrets and service account keys according to their configured lifetime or your organization's key policy. Add credential rotation to your operations runbook. A pod that ran fine yesterday and fails today with a 401 usually has an expired secret, not a new configuration problem.
Docker Hub Rate Limit
Docker Hub enforces pull rate limits based on authentication status. As of June 2025, the limits are:
- Unauthenticated: 100 pulls per 6 hours per IP address
- Personal free account: 200 pulls per 6 hours
- Pro, Team, Business: unlimited
Source: docs.docker.com/docker-hub/usage/pulls/
If your nodes pull unauthenticated through a single NAT gateway, they all share its public IP and therefore one quota. A moderate-scale deployment can exhaust 100 pulls in minutes during a rollout or node replacement event. The error looks like:
toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating.
You have three fixes:
- Authenticate all pulls via an imagePullSecret with Docker Hub credentials. Free accounts get double the rate, paid accounts get unlimited.
- Use a pull-through cache: Harbor, Amazon ECR pull-through cache rules, or JFrog Artifactory can proxy Docker Hub pulls and cache images locally. Nodes hit your cache; the cache fetches each image from Docker Hub once and serves later pulls itself.
- Upgrade the Docker Hub plan for unlimited pulls. This is the simplest option if your team already relies heavily on Docker Hub.
A pull-through cache also reduces egress costs and improves pull latency, especially during large scale-out events when many nodes pull the same image simultaneously.
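To see how much of the quota an IP has left, Docker documents a check that requests an anonymous token for the ratelimitpreview/test repository and reads the ratelimit-limit and ratelimit-remaining headers from a HEAD request. The sketch below follows that approach; run it from a node or from behind the same NAT gateway, since the limit is per IP. It assumes curl and jq are installed, and both function names are hypothetical.

```shell
#!/bin/sh
# Extract Docker Hub rate-limit headers from an HTTP header dump on stdin.
ratelimit_from_headers() {
  tr -d '\r' | awk -F': ' 'tolower($1) == "ratelimit-limit" || tolower($1) == "ratelimit-remaining" { print tolower($1) "=" $2 }'
}

# Query the anonymous pull quota for the current IP (requires curl and jq).
# A HEAD request is used because Docker's documentation says it is not counted as a pull.
dockerhub_ratelimit() {
  local token
  token=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" | jq -r .token) || return 1
  curl -s --head -H "Authorization: Bearer $token" \
    https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest \
    | ratelimit_from_headers
}
```

A result such as ratelimit-remaining=76;w=21600 means 76 pulls are left in a window of 21600 seconds, which is the six hours mentioned above.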
Registry Unreachable
An i/o timeout error means the node could not reach the registry at all. Unlike an authentication failure, the connection itself never completes. Image pulls are made by the kubelet and container runtime on the node, using the node's network, DNS, and proxy settings, so the usual causes are node-level: egress firewall rules or cloud security groups blocking outbound port 443, a missing NAT or proxy route for private nodes, or a DNS resolution failure, which usually shows up as no such host.
Start with a connectivity test. Spin up a temporary pod and run a DNS lookup against the registry host:
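kubectl run test --rm -it --image=alpine --restart=Never -- nslookup registry.hub.docker.com
This pod uses cluster DNS and the pod network, and it has to pull alpine itself, so treat it as an approximation of the node's view. If the lookup returns no such host, the problem is DNS. Check your cluster's CoreDNS configuration and whether the node can reach its DNS server. If the lookup succeeds but the pull still times out, the issue is network egress from the node. Check your cloud provider's security groups, firewall rules, and routes, plus any proxy settings for the container runtime. Nodes need outbound TCP 443 to any registry host they pull from.
NetworkPolicies matter here only if the test pod itself is blocked:
kubectl describe networkpolicy -n <namespace>
Standard NetworkPolicies restrict pod traffic, not the kubelet's pulls, so a policy that denies external egress can make the test pod fail while real pulls still work. Host-level policies that some CNIs add (for example Calico host endpoints or Cilium's host firewall) are the exception, since they can block node egress.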
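Because pulls happen on the node, the most direct test runs on the node itself, over SSH or a node shell. The sketch below resolves the registry host with the node's resolver and then requests the registry API base path /v2/; a reachable registry normally answers 200 or 401 (401 just means credentials are required). check_registry is a hypothetical helper, and it assumes getent and curl are available on the node.

```shell
#!/bin/sh
# Check DNS, then network/TLS reachability of a registry's /v2/ endpoint.
# Usage: check_registry <registry-host>   e.g. check_registry registry-1.docker.io
check_registry() {
  local host="$1" code
  if ! getent hosts "$host" >/dev/null; then
    echo "DNS: cannot resolve $host"
    return 1
  fi
  # -w prints the status code; curl exits non-zero on timeouts and TLS errors such as x509.
  if ! code=$(curl -sS -o /dev/null -m 10 -w '%{http_code}' "https://$host/v2/"); then
    echo "Network/TLS: no response from https://$host/v2/"
    return 2
  fi
  case "$code" in
    200|401) echo "OK: $host answered /v2/ with HTTP $code" ;;
    *) echo "Unexpected HTTP $code from $host"; return 3 ;;
  esac
}
```

A DNS failure here points at the node's resolver; a network/TLS failure points at firewalls, routes, proxies, or an untrusted certificate.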
For self-hosted or on-premises registries, TLS certificate errors also appear as pull failures. The event reads: x509: certificate signed by unknown authority. The fix depends on your setup: add the registry's CA certificate to the node's system trust store, or give containerd a per-registry CA in its hosts directory (for example /etc/containerd/certs.d/<registry>/ca.crt, or a ca entry in hosts.toml in that directory; containerd reads it only when its registry config_path points there). Skipping TLS verification for a registry is only appropriate for development environments.
Architecture Mismatch
If your cluster runs ARM nodes (AWS Graviton, Ampere Altra) and the image only has an amd64 build, you will see:
no matching manifest for linux/arm64 in the manifest list entries
Before changing anything, inspect the image manifest to confirm which platforms it supports:
docker manifest inspect <image>
If the manifest lists only linux/amd64, you have two options. The correct fix is to build and push a multi-arch image using docker buildx with --platform linux/amd64,linux/arm64. If you need a faster workaround, constrain the pod to run on amd64 nodes only using a nodeSelector:
spec:
  nodeSelector:
    kubernetes.io/arch: amd64
The nodeSelector approach works but limits scheduling options. In a mixed-arch cluster at scale, pinning workloads to amd64 reduces the pool of eligible nodes and can leave ARM capacity underutilized. Multi-arch images are the correct long-term fix.
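The check and the long-term fix can be scripted together. The sketch below greps docker manifest inspect output for a given architecture and builds a two-platform image with docker buildx. It assumes a buildx builder that can target both platforms (for example one using QEMU emulation) and push access to the registry; image_supports_arch and build_multiarch are hypothetical helper names. A single-platform image without a manifest list has no architecture field in this output, so the check reports it as unsupported.

```shell
#!/bin/sh
# Succeed if the image's manifest list includes the given architecture (e.g. arm64).
# Usage: image_supports_arch <image> <arch>
image_supports_arch() {
  docker manifest inspect "$1" | grep -Eq "\"architecture\": *\"$2\""
}

# Build and push one image tag for both amd64 and arm64 from the current directory.
# Usage: build_multiarch <image:tag>
build_multiarch() {
  docker buildx build --platform linux/amd64,linux/arm64 -t "$1" --push .
}
```

Running image_supports_arch <image> arm64 again after build_multiarch is a cheap way to confirm the new tag carries an arm64 entry before rolling it out.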
imagePullPolicy: Never
ErrImageNeverPull is a distinct error from ImagePullBackOff. It means the pod’s imagePullPolicy is set to Never, which tells the kubelet not to pull the image under any circumstances. If the image is not already cached on the node, the pod fails immediately with this error instead of attempting a pull at all.
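This usually happens when a developer sets imagePullPolicy: Never for local testing (to use a locally loaded image) and that manifest reaches a cluster node where the image has never been pulled. The fix is to change the policy to IfNotPresent (pull only if not already on the node) or Always (pull on every pod start). Alternatively, pre-load the image onto the target node before the pod gets scheduled there, either with crictl pull (which still needs registry access from the node) or by importing an exported tarball with ctr -n k8s.io images import. The -n k8s.io flag matters: containerd keeps Kubernetes images in the k8s.io namespace, and an image imported into the default namespace is invisible to the kubelet.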
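For a one-off pre-load on a node you can SSH into, the sketch below exports the image from a machine that has it, copies the tarball to the node, and imports it into containerd's k8s.io namespace. preload_image is a hypothetical helper; it assumes Docker on the local machine, SSH and sudo access to the node, and containerd as the node's runtime.

```shell
#!/bin/sh
# Copy a locally available image into one node's containerd image store.
# Usage: preload_image <image:tag> <ssh-host>
preload_image() {
  local image="$1" node="$2" tar=/tmp/preload-image.tar
  docker save -o "$tar" "$image" || return 1
  scp "$tar" "$node:$tar" || return 1
  # -n k8s.io: the containerd namespace the kubelet reads images from.
  ssh "$node" "sudo ctr -n k8s.io images import $tar && sudo crictl images"
}
```

Pre-loading only helps on the nodes you load. For anything beyond a single test node, switching the policy to IfNotPresent is usually simpler.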
Conclusion
ImagePullBackOff always has a specific cause. The Events section of kubectl describe pod gives you the error string. That string maps to one of the problems in the table above. Fix the underlying cause, then delete the pod or trigger a rollout restart to force an immediate pull attempt rather than waiting out the backoff timer.
If you run multiple clusters, debugging ImagePullBackOff pod by pod doesn't scale. Cast AI's Kvisor surfaces startup failure patterns across all namespaces in a single view, so you can spot clusters with recurring pull errors before they affect availability.
Once your pods are pulling successfully, connect your cluster to Cast AI. Kvisor also scans your container images for vulnerabilities, using registry credentials you configure once in the Cast AI console.
Frequently Asked Questions
What is ImagePullBackOff in Kubernetes?
ImagePullBackOff is a pod status in Kubernetes indicating the kubelet failed to pull the container image and is waiting before retrying. It follows ErrImagePull, which is the initial failure state. Retries follow exponential backoff, starting at a few seconds and capping at 5 minutes. The pod stays in this loop until the underlying problem is resolved or the pod is deleted.
How do I fix ImagePullBackOff?
Run kubectl describe pod <name> -n <namespace> and read the error message in the Events section. Match it to a cause: manifest not found means a wrong tag, unauthorized means a missing or expired imagePullSecret, toomanyrequests means a Docker Hub rate limit. Fix the root cause, then delete the pod to force an immediate retry rather than waiting for the backoff timer.
What is an imagePullSecret?
An imagePullSecret is a Kubernetes Secret of type kubernetes.io/dockerconfigjson that stores container registry credentials. You reference it in a pod spec via spec.imagePullSecrets to allow the kubelet to authenticate against a private registry when pulling images. You can also attach it to a ServiceAccount to apply it automatically to all pods in a namespace without modifying each manifest individually.
What is the Docker Hub pull rate limit?
Docker Hub limits unauthenticated pulls to 100 per 6 hours per IP address (as of June 2025). In a Kubernetes cluster where nodes share a single NAT gateway IP, all unauthenticated pulls count against one quota. A single rolling deployment or node replacement event can exhaust the limit quickly. Authenticate pulls with an imagePullSecret, use a pull-through cache like Harbor or Amazon ECR pull-through cache rules, or upgrade to a Docker Hub paid plan to remove the limit.



