
ImagePullBackOff - ECR Private Image

In this section we will learn how to troubleshoot the pod ImagePullBackOff error for an ECR private image. First, let's verify that the deployment is created, so we can start troubleshooting the scenario.

~$kubectl get deploy ui-private -n default
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
ui-private   0/1     1            0           4m25s
info

If you get the same output, it means you are ready to start the troubleshooting.

Your task in this troubleshooting section is to find why the deployment ui-private is stuck in the 0/1 ready state and to fix it, so that the deployment has one pod ready and running.

Let's start troubleshooting.

Step 1: Check pod status

First, we need to verify the status of our pods.

~$kubectl get pods -l app=app-private
NAME                          READY   STATUS             RESTARTS   AGE
ui-private-7655bf59b9-jprrj   0/1     ImagePullBackOff   0          4m42s

Step 2: Describe the pod

You can see that the pod status is showing as ImagePullBackOff. Let's describe the pod to see the events.

~$POD=`kubectl get pods -l app=app-private -o jsonpath='{.items[*].metadata.name}'`
~$kubectl describe pod $POD | awk '/Events:/,/^$/'
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  5m15s                  default-scheduler  Successfully assigned default/ui-private-7655bf59b9-jprrj to ip-10-42-33-232.us-west-2.compute.internal
  Normal   Pulling    3m53s (x4 over 5m15s)  kubelet            Pulling image "1234567890.dkr.ecr.us-west-2.amazonaws.com/retail-sample-app-ui:0.4.0"
  Warning  Failed     3m53s (x4 over 5m14s)  kubelet            Failed to pull image "1234567890.dkr.ecr.us-west-2.amazonaws.com/retail-sample-app-ui:0.4.0": failed to pull and unpack image "1234567890.dkr.ecr.us-west-2.amazonaws.com/retail-sample-app-ui:0.4.0": failed to resolve reference "1234567890.dkr.ecr.us-west-2.amazonaws.com/retail-sample-app-ui:0.4.0": unexpected status from HEAD request to https://1234567890.dkr.ecr.us-west-2.amazonaws.com/v2/retail-sample-app-ui/manifests/0.4.0: 403 Forbidden
  Warning  Failed     3m53s (x4 over 5m14s)  kubelet            Error: ErrImagePull
  Warning  Failed     3m27s (x6 over 5m14s)  kubelet            Error: ImagePullBackOff
  Normal   BackOff    4s (x21 over 5m14s)    kubelet            Back-off pulling image "1234567890.dkr.ecr.us-west-2.amazonaws.com/retail-sample-app-ui:0.4.0"
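The `awk '/Events:/,/^$/'` filter used above is a range pattern: it prints every line from the first match of "Events:" through the first blank line, trimming the rest of the describe output. A minimal sketch with sample text (the describe output here is abbreviated for illustration only):

```shell
# Abbreviated 'kubectl describe pod' output; in a live cluster this text
# would come from the command itself.
describe_output='Name:    ui-private-7655bf59b9-jprrj
Status:  Pending
Events:
  Type     Reason   Age   From     Message
  Warning  Failed   1m    kubelet  Error: ImagePullBackOff

Tolerations: node.kubernetes.io/not-ready'

# /Events:/,/^$/ is an awk range pattern: print from the line matching
# "Events:" up to (and including) the first blank line.
events=$(printf '%s\n' "$describe_output" | awk '/Events:/,/^$/')
printf '%s\n' "$events"
```

Everything after the blank line (here, the Tolerations section) is dropped, leaving only the events table.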

From the pod events, we can see the 'Failed to pull image' warning with the cause shown as 403 Forbidden. This indicates that the kubelet was denied access while trying to pull the image used in the deployment. Let's get the URI of the image used in the deployment.

~$kubectl get deploy ui-private -o jsonpath='{.spec.template.spec.containers[*].image}'
"1234567890.dkr.ecr.us-west-2.amazonaws.com/retail-sample-app-ui:0.4.0"
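Before checking each part against ECR, it can help to break the URI into its components. A small sketch, using the URI from the output above (the parsing relies only on the standard `<account-id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>` shape):

```shell
# Image URI copied from the deployment spec above.
IMAGE="1234567890.dkr.ecr.us-west-2.amazonaws.com/retail-sample-app-ui:0.4.0"

# An ECR image URI has the shape:
#   <account-id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
registry="${IMAGE%%/*}"           # everything before the first '/'
account_id="${registry%%.*}"      # first dotted field of the registry host
region=$(printf '%s' "$registry" | cut -d. -f4)
repo_and_tag="${IMAGE#*/}"        # everything after the first '/'
repository="${repo_and_tag%%:*}"  # repository name before the ':'
tag="${repo_and_tag##*:}"         # tag after the last ':'

echo "account=$account_id region=$region repository=$repository tag=$tag"
```

Each piece can then be checked individually: the account should match the account the cluster runs in, and the repository and tag should exist in ECR.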

Step 3: Check the image reference

From the image URI, the image is referenced from the same account that our EKS cluster runs in. Let's check the ECR repository to see whether such an image exists.

~$aws ecr describe-images --repository-name retail-sample-app-ui --image-ids imageTag=0.4.0
{
    "imageDetails": [
        {
            "registryId": "1234567890",
            "repositoryName": "retail-sample-app-ui",
            "imageDigest": "sha256:b338785abbf5a5d7e0f6ebeb8b8fc66e2ef08c05b2b48e5dfe89d03710eec2c1",
            "imageTags": [
                "0.4.0"
            ],
            "imageSizeInBytes": 268443135,
            "imagePushedAt": "2024-10-11T14:03:01.207000+00:00",
            "imageManifestMediaType": "application/vnd.docker.distribution.manifest.v2+json",
            "artifactMediaType": "application/vnd.docker.container.image.v1+json"
        }
    ]
}

The image path in the deployment, i.e. account_id.dkr.ecr.us-west-2.amazonaws.com/retail-sample-app-ui:0.4.0, has a valid registryId (the account number), a valid repositoryName ("retail-sample-app-ui"), and a valid imageTag ("0.4.0"). This confirms that the image path is correct and not a wrong reference.

info

Alternatively, you can also check from the ECR console. Click the button below to open the ECR Console. Then click on retail-sample-app-ui repository and the image tag 0.4.0.

Open ECR Console Tab

Step 4: Check kubelet permissions

As we confirmed that the image URI is correct, let's check the permissions of the kubelet and see whether the permissions required to pull images from ECR exist.

Get the IAM role attached to worker nodes in the managed node group of the cluster and list the IAM policies attached to the role.

~$ROLE_NAME=`aws eks describe-nodegroup --cluster-name eks-workshop --nodegroup-name default --query 'nodegroup.nodeRole' --output text | cut -d'/' -f2`
~$aws iam list-attached-role-policies --role-name $ROLE_NAME
{
    "AttachedPolicies": [
        {
            "PolicyName": "AmazonSSMManagedInstanceCore",
            "PolicyArn": "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
        },
        {
            "PolicyName": "AmazonEC2ContainerRegistryReadOnly",
            "PolicyArn": "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
        },
        {
            "PolicyName": "AmazonEKSWorkerNodePolicy",
            "PolicyArn": "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
        },
        {
            "PolicyName": "AmazonSSMPatchAssociation",
            "PolicyArn": "arn:aws:iam::aws:policy/AmazonSSMPatchAssociation"
        }
    ]
}

The AWS managed policy "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly" is attached to the worker node role, and this policy should provide sufficient permissions to pull an image from a private ECR repository.

Step 5: Check ECR repo permissions

Permissions to an ECR repository can be managed at both the identity level and the resource level. Identity-level permissions are granted in IAM, while resource-level permissions are attached to the repository itself. Since we confirmed that the identity-based permissions are in place, let's check the policy on the ECR repository.

~$aws ecr get-repository-policy --repository-name retail-sample-app-ui --query policyText --output text | jq .
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "new policy",
      "Effect": "Deny",
      "Principal": {
        "AWS": "arn:aws:iam::1234567890:role/EksNodeGroupRole"
      },
      "Action": [
        "ecr:UploadLayerPart",
        "ecr:SetRepositoryPolicy",
        "ecr:PutImage",
        "ecr:ListImages",
        "ecr:InitiateLayerUpload",
        "ecr:GetRepositoryPolicy",
        "ecr:GetDownloadUrlForLayer",
        "ecr:DescribeRepositories",
        "ecr:DeleteRepositoryPolicy",
        "ecr:DeleteRepository",
        "ecr:CompleteLayerUpload",
        "ecr:BatchGetImage",
        "ecr:BatchDeleteImage",
        "ecr:BatchCheckLayerAvailability"
      ]
    }
  ]
}

The ECR repository policy has Effect set to Deny with the Principal set to the EKS managed node role, which is blocking the kubelet from pulling images from this repository. Let's change the Effect to Allow and see if the kubelet is able to pull the image.
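In IAM policy evaluation, an explicit Deny overrides every Allow, so this repository policy blocks the pull even though the node role carries AmazonEC2ContainerRegistryReadOnly. A minimal sketch of spotting such a Deny, using an abbreviated copy of the policy from this scenario (the action list is shortened for illustration):

```shell
# Abbreviated repository policy from this scenario, as returned by
# 'aws ecr get-repository-policy'; reproduced here only for illustration.
POLICY='{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "new policy",
      "Effect": "Deny",
      "Principal": { "AWS": "arn:aws:iam::1234567890:role/EksNodeGroupRole" },
      "Action": [ "ecr:GetDownloadUrlForLayer", "ecr:BatchGetImage" ]
    }
  ]
}'

# An explicit Deny in any applicable policy overrides all Allow statements,
# so a Deny on the pull actions here is enough to cause 403 Forbidden.
if printf '%s' "$POLICY" | grep -q '"Effect": "Deny"'; then
  echo "explicit Deny found - image pulls from this repository will fail"
fi
```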

note

We will use the JSON file below to modify the ECR repository permissions.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "new policy",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::1234567890:role/EksNodeGroupRole"
      },
      "Action": [
        "ecr:UploadLayerPart",
        "ecr:SetRepositoryPolicy",
        "ecr:PutImage",
        "ecr:ListImages",
        "ecr:InitiateLayerUpload",
        "ecr:GetRepositoryPolicy",
        "ecr:GetDownloadUrlForLayer",
        "ecr:DescribeRepositories",
        "ecr:DeleteRepositoryPolicy",
        "ecr:DeleteRepository",
        "ecr:CompleteLayerUpload",
        "ecr:BatchGetImage",
        "ecr:BatchDeleteImage",
        "ecr:BatchCheckLayerAvailability"
      ]
    }
  ]
}
~$export ROLE_ARN=`aws eks describe-nodegroup --cluster-name ${EKS_CLUSTER_NAME} --nodegroup-name default --query 'nodegroup.nodeRole'`
~$echo '{"Version":"2012-10-17","Statement":[{"Sid":"new policy","Effect":"Allow","Principal":{"AWS":'${ROLE_ARN}'},"Action":["ecr:BatchCheckLayerAvailability","ecr:BatchDeleteImage","ecr:BatchGetImage","ecr:CompleteLayerUpload","ecr:DeleteRepository","ecr:DeleteRepositoryPolicy","ecr:DescribeRepositories","ecr:GetDownloadUrlForLayer","ecr:GetRepositoryPolicy","ecr:InitiateLayerUpload","ecr:ListImages","ecr:PutImage","ecr:SetRepositoryPolicy","ecr:UploadLayerPart"]}]}' > ~/ecr-policy.json
~$aws ecr set-repository-policy --repository-name retail-sample-app-ui --policy-text file://~/ecr-policy.json
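A detail worth noting in the one-liner above: `--query 'nodegroup.nodeRole'` is used without `--output text`, so the ARN comes back still wrapped in double quotes, which is exactly what the JSON template needs. A local sketch with a hard-coded sample ARN (in the real command the value comes from `aws eks describe-nodegroup`):

```shell
# Simulate what the aws command returns WITHOUT --output text:
# the ARN still wrapped in JSON double quotes.
ROLE_ARN='"arn:aws:iam::1234567890:role/EksNodeGroupRole"'

# Splicing the still-quoted value into the template yields valid JSON;
# with --output text the quotes would be stripped and the document would
# be malformed. The action list is shortened here for illustration.
policy='{"Version":"2012-10-17","Statement":[{"Sid":"new policy","Effect":"Allow","Principal":{"AWS":'${ROLE_ARN}'},"Action":["ecr:BatchGetImage","ecr:GetDownloadUrlForLayer"]}]}'

printf '%s\n' "$policy"
```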

Step 6: Restart the deployment and verify the pod status

Now, restart the deployment and check if the pods are running.

~$kubectl rollout restart deploy ui-private
~$kubectl get pods -l app=app-private
NAME                          READY   STATUS    RESTARTS   AGE
ui-private-7655bf59b9-s9pvb   1/1     Running   0          65m

Wrapping it up

The general troubleshooting workflow for a pod in ImagePullBackOff on a private image includes:

  • Check the pod events for a clue about the cause of the issue, such as "not found", "access denied" or "timeout".
  • If "not found", ensure that the image exists in the path referenced in the private ECR repositories.
  • For "access denied", check the permissions on worker node role and the ECR repository policy.
  • For timeout on ECR, ensure that the worker node is configured to reach the ECR endpoint.
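The workflow above can be sketched as a small helper that classifies the kubelet's pull-error message into one of the common causes (the patterns below are illustrative, not an exhaustive list):

```shell
# Map a kubelet image-pull error message to a likely root cause.
classify_pull_error() {
  case "$1" in
    *"403 Forbidden"*|*"401 Unauthorized"*|*"access denied"*)
      echo "access-denied: check the node role and the ECR repository policy" ;;
    *"not found"*|*"manifest unknown"*|*"404"*)
      echo "not-found: verify the repository name and image tag in ECR" ;;
    *"timeout"*|*"no such host"*)
      echo "network: verify the worker node can reach the ECR endpoint" ;;
    *)
      echo "unknown: inspect the full pod events" ;;
  esac
}

# The message from this scenario falls into the access-denied bucket.
classify_pull_error "unexpected status from HEAD request: 403 Forbidden"
```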

Additional Resources