ImagePullBackOff - ECR Private Image
In this section, we will learn how to troubleshoot the pod ImagePullBackOff error for an ECR private image. First, let's verify that the deployment has been created so we can start troubleshooting the scenario.
NAME READY UP-TO-DATE AVAILABLE AGE
ui-private 0/1 1 0 4m25s
If you get the same output, you are ready to start troubleshooting.
Your task in this troubleshooting section is to find out why the deployment ui-private is stuck in the 0/1 ready state and to fix it, so that the deployment has one pod ready and running.
Let's start the troubleshooting
Step 1: Check pod status
First, we need to verify the status of our pods.
NAME READY STATUS RESTARTS AGE
ui-private-7655bf59b9-jprrj 0/1 ImagePullBackOff 0 4m42s
Step 2: Describe the pod
You can see that the pod status is ImagePullBackOff. Let's describe the pod to see its events.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m15s default-scheduler Successfully assigned default/ui-private-7655bf59b9-jprrj to ip-10-42-33-232.us-west-2.compute.internal
Normal Pulling 3m53s (x4 over 5m15s) kubelet Pulling image "1234567890.dkr.ecr.us-west-2.amazonaws.com/retail-sample-app-ui:0.4.0"
Warning Failed 3m53s (x4 over 5m14s) kubelet Failed to pull image "1234567890.dkr.ecr.us-west-2.amazonaws.com/retail-sample-app-ui:0.4.0": failed to pull and unpack image "1234567890.dkr.ecr.us-west-2.amazonaws.com/retail-sample-app-ui:0.4.0": failed to resolve reference "1234567890.dkr.ecr.us-west-2.amazonaws.com/retail-sample-app-ui:0.4.0": unexpected status from HEAD request to https://1234567890.dkr.ecr.us-west-2.amazonaws.com/v2/retail-sample-app-ui/manifests/0.4.0: 403 Forbidden
Warning Failed 3m53s (x4 over 5m14s) kubelet Error: ErrImagePull
Warning Failed 3m27s (x6 over 5m14s) kubelet Error: ImagePullBackOff
Normal BackOff 4s (x21 over 5m14s) kubelet Back-off pulling image "1234567890.dkr.ecr.us-west-2.amazonaws.com/retail-sample-app-ui:0.4.0"
From the pod events, we can see the 'Failed to pull image' warning, caused by a 403 Forbidden response. This indicates that the kubelet was denied access while trying to pull the image used in the deployment. Let's get the URI of the image used in the deployment.
"1234567890.dkr.ecr.us-west-2.amazonaws.com/retail-sample-app-ui:0.4.0"
Step 3: Check the image reference
The image URI shows that the image is referenced from the same account as our EKS cluster. Let's check the ECR repository to see whether such an image exists.
{
"imageDetails": [
{
"registryId": "1234567890",
"repositoryName": "retail-sample-app-ui",
"imageDigest": "sha256:b338785abbf5a5d7e0f6ebeb8b8fc66e2ef08c05b2b48e5dfe89d03710eec2c1",
"imageTags": [
"0.4.0"
],
"imageSizeInBytes": 268443135,
"imagePushedAt": "2024-10-11T14:03:01.207000+00:00",
"imageManifestMediaType": "application/vnd.docker.distribution.manifest.v2+json",
"artifactMediaType": "application/vnd.docker.container.image.v1+json"
}
]
}
The image path in the deployment, account_id.dkr.ecr.us-west-2.amazonaws.com/retail-sample-app-ui:0.4.0, has a valid registryId (the account number), a valid repositoryName ("retail-sample-app-ui") and a valid imageTag ("0.4.0"). This confirms that the image path is correct and is not a wrong reference.
Alternatively, you can check from the ECR console: open the retail-sample-app-ui repository and look for the image tag 0.4.0.
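The comparison between the image URI and the image details can also be scripted. The following is a minimal sketch, assuming the JSON shown above has been loaded into a dictionary; the function names parse_ecr_uri and image_exists are hypothetical helpers introduced for illustration, and only the fields used in the comparison are copied into the example.

```python
import re

# Pattern for a tagged private ECR image URI:
# <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
ECR_URI = re.compile(
    r"^(?P<registry_id>\d+)\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[^:@]+):(?P<tag>[^:@]+)$"
)


def parse_ecr_uri(uri):
    """Split an ECR image URI into registry ID, region, repository and tag."""
    match = ECR_URI.match(uri)
    if match is None:
        raise ValueError(f"not a tagged private ECR image URI: {uri}")
    return match.groupdict()


def image_exists(uri, describe_output):
    """Return True if describe_output lists an image matching the URI."""
    ref = parse_ecr_uri(uri)
    for detail in describe_output.get("imageDetails", []):
        if (detail.get("registryId") == ref["registry_id"]
                and detail.get("repositoryName") == ref["repository"]
                and ref["tag"] in detail.get("imageTags", [])):
            return True
    return False


# Values copied from the outputs shown above (unused fields omitted).
uri = "1234567890.dkr.ecr.us-west-2.amazonaws.com/retail-sample-app-ui:0.4.0"
describe_output = {
    "imageDetails": [
        {
            "registryId": "1234567890",
            "repositoryName": "retail-sample-app-ui",
            "imageTags": ["0.4.0"],
        }
    ]
}

print(image_exists(uri, describe_output))
```

If the function returns False, the reference in the deployment is likely wrong (a "not found" case) rather than a permissions problem.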

Step 4: Check kubelet permissions
Now that we have confirmed that the image URI is correct, let's check whether the kubelet has the permissions required to pull images from ECR.
Get the IAM role attached to the worker nodes in the cluster's managed node group and list the IAM policies attached to that role.
{
"AttachedPolicies": [
{
"PolicyName": "AmazonSSMManagedInstanceCore",
"PolicyArn": "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
},
{
"PolicyName": "AmazonEC2ContainerRegistryReadOnly",
"PolicyArn": "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
},
{
"PolicyName": "AmazonEKSWorkerNodePolicy",
"PolicyArn": "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
},
{
"PolicyName": "AmazonSSMPatchAssociation",
"PolicyArn": "arn:aws:iam::aws:policy/AmazonSSMPatchAssociation"
}
]
}
The AWS managed policy "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly" is attached to the worker node role, and this policy should provide enough permissions to pull an image from an ECR private repository.
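The same check can be expressed in a few lines. This is a minimal sketch, assuming the attached-policies JSON shown above has been loaded into a dictionary; has_ecr_read_only is a hypothetical helper name. It only looks for this one managed policy, so a role that gets equivalent permissions from a custom or inline policy would not be detected.

```python
# Managed policy that grants read-only (pull) access to ECR.
ECR_READ_ONLY_ARN = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"


def has_ecr_read_only(attached):
    """Return True if the attached policies include the ECR read-only policy."""
    return any(
        policy.get("PolicyArn") == ECR_READ_ONLY_ARN
        for policy in attached.get("AttachedPolicies", [])
    )


# Policy list copied from the output shown above.
attached = {
    "AttachedPolicies": [
        {"PolicyName": "AmazonSSMManagedInstanceCore",
         "PolicyArn": "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"},
        {"PolicyName": "AmazonEC2ContainerRegistryReadOnly",
         "PolicyArn": "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"},
        {"PolicyName": "AmazonEKSWorkerNodePolicy",
         "PolicyArn": "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"},
        {"PolicyName": "AmazonSSMPatchAssociation",
         "PolicyArn": "arn:aws:iam::aws:policy/AmazonSSMPatchAssociation"},
    ]
}

print(has_ecr_read_only(attached))
```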
Step 5: Check ECR repo permissions
Permissions for an ECR repository can be managed at both the identity level and the resource level. Identity-level permissions are granted in IAM, and resource-level permissions are granted in the repository policy. Since we have confirmed that the identity-based permissions are in place, let's check the policy of the ECR repository.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "new policy",
"Effect": "Deny",
"Principal": {
"AWS": "arn:aws:iam::1234567890:role/EksNodeGroupRole"
},
"Action": [
"ecr:UploadLayerPart",
"ecr:SetRepositoryPolicy",
"ecr:PutImage",
"ecr:ListImages",
"ecr:InitiateLayerUpload",
"ecr:GetRepositoryPolicy",
"ecr:GetDownloadUrlForLayer",
"ecr:DescribeRepositories",
"ecr:DeleteRepositoryPolicy",
"ecr:DeleteRepository",
"ecr:CompleteLayerUpload",
"ecr:BatchGetImage",
"ecr:BatchDeleteImage",
"ecr:BatchCheckLayerAvailability"
]
}
]
}
The ECR repository policy has its Effect set to Deny and its Principal set to the EKS managed node role, which prevents the kubelet from pulling images from this repository. Let's change the Effect to Allow and see whether the kubelet is able to pull the image.
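An explicit Deny in a resource policy overrides an Allow granted by an identity policy, which is why the managed policy on the node role is not enough here. The sketch below shows one way to spot such statements, assuming the repository policy has been loaded into a dictionary. The helper names are hypothetical, the action list in the example is abbreviated, and matching is by exact string only (wildcards such as "ecr:*" are not expanded), so treat it as an illustration rather than a full policy evaluator.

```python
# Repository-level actions needed to pull an image.
PULL_ACTIONS = {
    "ecr:BatchCheckLayerAvailability",
    "ecr:GetDownloadUrlForLayer",
    "ecr:BatchGetImage",
}


def as_list(value):
    """Policy fields may hold a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def blocking_statements(policy, principal_arn):
    """Return Deny statements that name principal_arn and cover a pull action."""
    blocked = []
    for stmt in as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Deny":
            continue
        principal = stmt.get("Principal", {})
        if principal == "*":
            principals = ["*"]
        else:
            principals = as_list(principal.get("AWS", []))
        actions = set(as_list(stmt.get("Action", [])))
        applies = principal_arn in principals or "*" in principals
        if applies and actions & PULL_ACTIONS:
            blocked.append(stmt)
    return blocked


# Abbreviated copy of the repository policy shown above.
repo_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "new policy",
            "Effect": "Deny",
            "Principal": {"AWS": "arn:aws:iam::1234567890:role/EksNodeGroupRole"},
            "Action": [
                "ecr:PutImage",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage",
                "ecr:BatchCheckLayerAvailability",
            ],
        }
    ],
}

node_role = "arn:aws:iam::1234567890:role/EksNodeGroupRole"
for stmt in blocking_statements(repo_policy, node_role):
    print("Pull blocked by statement:", stmt["Sid"])
```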
We will use the JSON file below to modify the ECR repository permissions.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "new policy",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::1234567890:role/EksNodeGroupRole"
},
"Action": [
"ecr:UploadLayerPart",
"ecr:SetRepositoryPolicy",
"ecr:PutImage",
"ecr:ListImages",
"ecr:InitiateLayerUpload",
"ecr:GetRepositoryPolicy",
"ecr:GetDownloadUrlForLayer",
"ecr:DescribeRepositories",
"ecr:DeleteRepositoryPolicy",
"ecr:DeleteRepository",
"ecr:CompleteLayerUpload",
"ecr:BatchGetImage",
"ecr:BatchDeleteImage",
"ecr:BatchCheckLayerAvailability"
]
}
]
}
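The only difference between the two policies is the Effect field. As a rough sketch, the updated document could be derived from the original like this. The function name allow_instead_of_deny is hypothetical, the action list is abbreviated, and flipping every Deny is only reasonable here because the policy contains a single statement.

```python
import copy
import json


def allow_instead_of_deny(policy):
    """Return a copy of the policy with every Deny statement switched to Allow."""
    updated = copy.deepcopy(policy)  # leave the original policy untouched
    for stmt in updated["Statement"]:
        if stmt.get("Effect") == "Deny":
            stmt["Effect"] = "Allow"
    return updated


# Abbreviated copy of the original repository policy.
original = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "new policy",
            "Effect": "Deny",
            "Principal": {"AWS": "arn:aws:iam::1234567890:role/EksNodeGroupRole"},
            "Action": [
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage",
                "ecr:BatchCheckLayerAvailability",
            ],
        }
    ],
}

updated = allow_instead_of_deny(original)
print(json.dumps(updated, indent=2))
```

The printed JSON can be saved to a file and used as the new repository policy text.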
Step 6: Restart the deployment and verify the pod status
Now, restart the deployment and check whether the pods are running.
NAME READY STATUS RESTARTS AGE
ui-private-7655bf59b9-s9pvb 1/1 Running 0 65m
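If you want to check the result programmatically, the pod listing can be parsed as plain text. This is a small sketch that assumes the table format shown above (whitespace-separated columns with a header row); ready_pods is a hypothetical helper name.

```python
def ready_pods(table):
    """Return names of pods that are Running with all containers ready."""
    lines = table.strip().splitlines()
    header = lines[0].split()
    names = []
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        ready, total = row["READY"].split("/")
        if row["STATUS"] == "Running" and ready == total:
            names.append(row["NAME"])
    return names


# Pod listings copied from the outputs shown in this section.
before_fix = """
NAME READY STATUS RESTARTS AGE
ui-private-7655bf59b9-jprrj 0/1 ImagePullBackOff 0 4m42s
"""
after_fix = """
NAME READY STATUS RESTARTS AGE
ui-private-7655bf59b9-s9pvb 1/1 Running 0 65m
"""

print(ready_pods(before_fix))
print(ready_pods(after_fix))
```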
Wrapping it up
The general workflow for troubleshooting a pod with ImagePullBackOff on a private image is:
- Check the pod events for a clue about the cause of the issue, such as "not found", "access denied" or "timeout".
- For "not found", ensure that the image exists at the path referenced in the private ECR repository.
- For "access denied", check the permissions on the worker node role and the ECR repository policy.
- For "timeout", ensure that the worker node is configured to reach the ECR endpoint.
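The first step of the workflow above, reading the event message for a clue, can be sketched as a simple classifier. The keyword lists are assumptions chosen for illustration, not an exhaustive list of messages the kubelet can emit, and classify_pull_failure is a hypothetical helper name.

```python
import re

# Assumed keyword patterns for each failure category, checked in order.
PATTERNS = [
    ("not found", re.compile(r"\b(404|not found)\b")),
    ("access denied", re.compile(r"\b(401|403|forbidden|unauthorized|access denied)\b")),
    ("timeout", re.compile(r"\b(timeout|timed out|deadline exceeded)\b")),
]


def classify_pull_failure(message):
    """Map a 'Failed to pull image' event message to a troubleshooting category."""
    text = message.lower()
    for category, pattern in PATTERNS:
        if pattern.search(text):
            return category
    return "unknown"


# Tail of the event message shown in Step 2.
message = (
    "unexpected status from HEAD request to "
    "https://1234567890.dkr.ecr.us-west-2.amazonaws.com/v2/"
    "retail-sample-app-ui/manifests/0.4.0: 403 Forbidden"
)

print(classify_pull_failure(message))
```

For the scenario in this section, the message maps to "access denied", which points to the worker node role and the ECR repository policy as the places to check.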