Checking CoreDNS pods
In EKS clusters, CoreDNS pods handle DNS resolution. Let's verify that these pods are running correctly.
Step 1 - Check pod status
First, check CoreDNS pods in the kube-system namespace:
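One way to list them, assuming the standard k8s-app=kube-dns label that EKS applies to the CoreDNS pods, is:

$ kubectl get pods -n kube-system -l k8s-app=kube-dns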
NAME READY STATUS RESTARTS AGE
coredns-6fdb8f5699-dq7xw 0/1 Pending 0 42s
coredns-6fdb8f5699-z57jw 0/1 Pending 0 42s
We can see that the CoreDNS pods are not running, which explains the DNS resolution issues in the cluster.
The pods are stuck in the Pending state, indicating the scheduler has not assigned them to any node.
Step 2 - Check pod events
Let's investigate further by describing the pods and checking the events in their output:
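For example, describing the pods through the same label selector (assumed here; you can also describe each pod by name) surfaces the scheduler events shown below:

$ kubectl describe pod -n kube-system -l k8s-app=kube-dns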
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 29s default-scheduler 0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
The warning message indicates a mismatch between node labels and the CoreDNS pod node selector/affinity.
Step 3 - Check node selection
Let's examine the CoreDNS pod node selector:
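One way to read it straight from the Deployment spec, assuming the addon-managed Deployment keeps its default name coredns and that jq is available for pretty-printing:

$ kubectl get deployment coredns -n kube-system -o json | jq '.spec.template.spec.nodeSelector'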
{
"workshop-default": "no"
}
Now, check the worker node labels:
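Again assuming jq is available, the labels on every worker node can be listed with:

$ kubectl get nodes -o json | jq '.items[].metadata.labels'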
{
"alpha.eksctl.io/cluster-name": "eks-workshop",
"alpha.eksctl.io/nodegroup-name": "default",
"beta.kubernetes.io/arch": "amd64",
"beta.kubernetes.io/instance-type": "m5.large",
"beta.kubernetes.io/os": "linux",
"eks.amazonaws.com/capacityType": "ON_DEMAND",
"eks.amazonaws.com/nodegroup": "default",
"eks.amazonaws.com/nodegroup-image": "ami-07fdc65a0c344a252",
"eks.amazonaws.com/sourceLaunchTemplateId": "lt-0f7c7c3c9cb770aaa",
"eks.amazonaws.com/sourceLaunchTemplateVersion": "1",
"failure-domain.beta.kubernetes.io/region": "us-west-2",
"failure-domain.beta.kubernetes.io/zone": "us-west-2a",
"k8s.io/cloud-provider-aws": "b2c4991f4c3acb5b142be2a5d455731a",
"kubernetes.io/arch": "amd64",
"kubernetes.io/hostname": "ip-10-42-100-65.us-west-2.compute.internal",
"kubernetes.io/os": "linux",
"node.kubernetes.io/instance-type": "m5.large",
"topology.k8s.aws/zone-id": "usw2-az1",
"topology.kubernetes.io/region": "us-west-2",
"topology.kubernetes.io/zone": "us-west-2a",
"workshop-default": "yes"
}
The CoreDNS pods require nodes with the label workshop-default: no, but the worker nodes are labeled workshop-default: yes, so no node satisfies the selector.
The nodeSelector field is only one of several options in a Pod's YAML manifest that influence scheduling; others include node affinity, pod affinity/anti-affinity, and pod topology spread constraints. See the Kubernetes documentation for more details.
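If you want to explore these fields without leaving the terminal, kubectl explain documents each of them, for example:

$ kubectl explain pod.spec.nodeSelector
$ kubectl explain pod.spec.affinity
$ kubectl explain pod.spec.topologySpreadConstraints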
Root Cause
In production environments, teams often configure CoreDNS with a node selector so that these pods run on nodes dedicated to cluster system components. However, if the selector doesn't match any node's labels, the pods remain stuck in the Pending state.
In this case, the CoreDNS addon was configured with a node selector that doesn't match any existing nodes, preventing the pods from running.
Resolution
To fix this, we'll update the CoreDNS addon to use its default configuration, removing the nodeSelector requirements:
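A sketch of that update with the AWS CLI, assuming the cluster name eks-workshop seen in the node labels above; OVERWRITE lets the addon's defaults replace the custom values, and the empty configuration clears the nodeSelector:

$ aws eks update-addon --cluster-name eks-workshop --addon-name coredns \
    --resolve-conflicts OVERWRITE --configuration-values '{}'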
{
"update": {
"id": "b3e7d81c-112a-33ea-bb28-1b1052bc3969",
"status": "InProgress",
"type": "AddonUpdate",
"params": [
{
"type": "ResolveConflicts",
"value": "OVERWRITE"
},
{
"type": "ConfigurationValues",
"value": "{}"
}
],
"createdAt": "20XX-XX-09T16:25:15.885000-05:00",
"errors": []
}
}
Then verify that CoreDNS pods are now running:
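The same pod listing as before works here (the new pods may take a minute or two to appear after the addon update completes):

$ kubectl get pods -n kube-system -l k8s-app=kube-dns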
NAME READY STATUS RESTARTS AGE
coredns-7f6dd6865f-7qcjr 1/1 Running 0 100s
coredns-7f6dd6865f-kxw2x 1/1 Running 0 100s
Finally, check CoreDNS logs to ensure the application is running without errors:
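For example, pulling the logs from both pods through the label selector (assumed, as before):

$ kubectl logs -n kube-system -l k8s-app=kube-dns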
.:53
[INFO] plugin/reload: Running configuration SHA512 = 8a7d59126e7f114ab49c6d2613be93d8ef7d408af8ee61a710210843dc409f03133727e38f64469d9bb180f396c84ebf48a42bde3b3769730865ca9df5eb281c
CoreDNS-1.11.1
linux/amd64, go1.21.5, e9c721d80
.:53
[INFO] plugin/reload: Running configuration SHA512 = 8a7d59126e7f114ab49c6d2613be93d8ef7d408af8ee61a710210843dc409f03133727e38f64469d9bb180f396c84ebf48a42bde3b3769730865ca9df5eb281c
CoreDNS-1.11.1
linux/amd64, go1.21.5, e9c721d80
The logs show no errors, indicating that CoreDNS started cleanly and is ready to serve DNS requests.
Next Steps
We've resolved the CoreDNS pod scheduling issue and verified the application is running properly. Let's proceed to the next lab for additional DNS resolution troubleshooting steps.