---
title: "Troubleshooting"
linkTitle: "Troubleshooting"
weight: 90
description: >
  Troubleshoot Karpenter problems
---

## Installation

### Missing Service Linked Role

Unless your AWS account has already onboarded to EC2 Spot, you will need to create the service linked role to avoid `ServiceLinkedRoleCreationNotPermitted`.

```
AuthFailure.ServiceLinkedRoleCreationNotPermitted: The provided credentials do not have permission to create the service-linked role for EC2 Spot Instances
```

This can be resolved by creating the [Service Linked Role](https://docs.aws.amazon.com/batch/latest/userguide/spot_fleet_IAM_role.html).

```
aws iam create-service-linked-role --aws-service-name spot.amazonaws.com
```

### Karpenter Role names exceeding 64-character limit

If you use a tool such as AWS CDK to generate your Kubernetes cluster name, when you add Karpenter to your cluster you could end up with a cluster name that is too long to incorporate into your KarpenterNodeRole name (which is limited to 64 characters).

Node role names for Karpenter are created in the form `KarpenterNodeRole-${Cluster_Name}` in the [Create the KarpenterNode IAM Role]({{}}) section of the getting started guide. If a long cluster name causes the Karpenter node role name to exceed 64 characters, creating that object will fail.

Keep in mind that `KarpenterNodeRole-` is just a recommendation from the getting started guide. Instead of using the eksctl role, you can shorten the name to anything you like, as long as it has the right permissions.

### Unknown field in Provisioner spec

If you are upgrading from an older version of Karpenter, there may have been changes in the CRD between versions. Attempting to utilize newer functionality which is surfaced in newer versions of the CRD may result in the following error message:

```
error: error validating "STDIN": error validating data: ValidationError(Provisioner.spec): unknown field "" in sh.karpenter.v1alpha5.Provisioner.spec; if you choose to ignore these errors, turn validation off with --validate=false
```

If you see this error, you can solve the problem by following the [Custom Resource Definition Upgrade Guidance](../upgrade-guide/#custom-resource-definition-crd-upgrades).

Info on whether there has been a change to the CRD between versions of Karpenter can be found in the [Release Notes](../upgrade-guide/#released-upgrade-notes).

### Unable to schedule pod due to insufficient node group instances

v0.16.0 changed the default replicas from 1 to 2. Karpenter won't launch capacity to run itself (you will see a log message related to the `karpenter.sh/provisioner-name DoesNotExist` requirement), so it can't provision capacity for the second Karpenter pod. To solve this, either reduce the replicas back from 2 to 1, or ensure there is enough capacity that isn't being managed by Karpenter (these are instances with the name `karpenter.sh/provisioner-name/`) to run both pods. To do so on AWS, increase the `minimum` and `desired` parameters on the node group autoscaling group to launch at least 2 instances.
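If your cluster uses an EKS managed node group, one way to do this is with the AWS CLI (a minimal sketch; the cluster and node group names are placeholders you must replace with your own):

```bash
# Raise the node group floor so both Karpenter replicas can schedule.
# CLUSTER_NAME and NODEGROUP are placeholders for your own values.
aws eks update-nodegroup-config \
  --cluster-name "${CLUSTER_NAME}" \
  --nodegroup-name "${NODEGROUP}" \
  --scaling-config minSize=2,desiredSize=2
```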
### Helm Error When Pulling the Chart

If Helm is showing an error when trying to install Karpenter helm charts:

- Ensure you are using a newer Helm version; Helm has supported OCI artifacts since v3.8.0.
- Helm does not have a `helm repo add` concept in OCI, so you no longer need that step to install Karpenter.
- Verify that the image you are trying to pull actually exists in [gallery.ecr.aws/karpenter](https://gallery.ecr.aws/karpenter/karpenter).
- Sometimes Helm generates a generic error; you can add the `--debug` switch to any of the helm commands in this doc for more verbose error messages.
- If you are getting a 403 forbidden error, you can try `docker logout public.ecr.aws` as explained [here](https://docs.aws.amazon.com/AmazonECR/latest/public/public-troubleshooting.html).
- If you are receiving this error: `Error: failed to download "oci://public.ecr.aws/karpenter/karpenter" at version "0.17.0"`, then you need to prepend a `v` to the version number: `v0.17.0`. Before Karpenter moved to OCI helm charts (pre-v0.17.0), both `v0.16.0` and `0.16.0` would work, but OCI charts require an exact version match.

### Helm Error when upgrading from older karpenter version

Upgrading from an older Karpenter version that did not include the `awsnodetemplates.karpenter.k8s.aws` labels and annotations requires manually annotating the CRDs before issuing `helm upgrade`.

- In the case of `invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"` run:

```shell
kubectl label crd awsnodetemplates.karpenter.k8s.aws provisioners.karpenter.sh app.kubernetes.io/managed-by=Helm --overwrite
```

- In the case of `annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "karpenter"` run:

```shell
kubectl annotate crd awsnodetemplates.karpenter.k8s.aws provisioners.karpenter.sh meta.helm.sh/release-name=karpenter-crd --overwrite
kubectl annotate crd awsnodetemplates.karpenter.k8s.aws provisioners.karpenter.sh meta.helm.sh/release-namespace=karpenter --overwrite
```

## Uninstallation

### Unable to delete nodes after uninstalling Karpenter

Karpenter adds a [finalizer](https://github.com/aws/karpenter/pull/466) to nodes that it provisions to support graceful node termination. If Karpenter is uninstalled, these finalizers will cause the API Server to block deletion until the finalizers are removed.

You can fix this by patching the node objects:

- `kubectl edit node ` and remove the line that says `karpenter.sh/termination` in the finalizers field.
- Run the following script that gets all nodes with the finalizer and removes all the finalizers from those nodes.
  - NOTE: this will remove ALL finalizers from nodes with the karpenter finalizer.

```bash
kubectl get nodes -ojsonpath='{range .items[*].metadata}{@.name}:{@.finalizers}{"\n"}' | grep "karpenter.sh/termination" | cut -d ':' -f 1 | xargs kubectl patch node --type='json' -p='[{"op": "remove", "path": "/metadata/finalizers"}]'
```
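If you would rather remove only the Karpenter finalizer and leave any other finalizers in place, a loop like the following should work (a sketch that assumes `jq` is installed; review the selected nodes before running it):

```bash
# Remove only the karpenter.sh/termination finalizer; other finalizers are left intact.
# Selects nodes carrying the karpenter.sh/provisioner-name label; assumes jq is available.
for node in $(kubectl get nodes -l karpenter.sh/provisioner-name -o name); do
  kubectl get "${node}" -o json \
    | jq '.metadata.finalizers = ((.metadata.finalizers // []) - ["karpenter.sh/termination"])' \
    | kubectl replace -f -
done
```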
## Webhooks

### Failed calling webhook "validation.webhook.provisioners.karpenter.sh"

If you are not able to create a provisioner due to `Internal error occurred: failed calling webhook "validation.webhook.provisioners.karpenter.sh":`

Webhooks were renamed in v0.19.0. There's a bug in ArgoCD's upgrade workflow where webhooks are leaked. This results in Provisioners failing to be validated, since the validation server no longer corresponds to the webhook definition. Delete the stale webhooks.

```
kubectl delete mutatingwebhookconfigurations defaulting.webhook.provisioners.karpenter.sh
kubectl delete validatingwebhookconfiguration validation.webhook.provisioners.karpenter.sh
```

### Failed calling webhook "defaulting.webhook.karpenter.sh"

If you are not able to create a provisioner due to `Error from server (InternalError): error when creating "provisioner.yaml": Internal error occurred: failed calling webhook "defaulting.webhook.karpenter.sh": Post "https://karpenter-webhook.karpenter.svc:443/default-resource?timeout=10s": context deadline exceeded`

Verify that the Karpenter pod is running (you should see 2/2 containers with a "Ready" status):

```text
kubectl get po -A -l app.kubernetes.io/name=karpenter

NAME                       READY   STATUS    RESTARTS   AGE
karpenter-7b46fb5c-gcr9z   2/2     Running   0          17h
```

Verify that the Karpenter service has endpoints assigned to it:

```text
kubectl get ep -A -l app.kubernetes.io/name=karpenter

NAMESPACE   NAME        ENDPOINTS                               AGE
karpenter   karpenter   192.168.39.88:8443,192.168.39.88:8080   16d
```

Verify that your security groups are not blocking you from reaching your webhook. This is especially relevant if you have used `terraform-eks-module` version `>=18`, since that version changed its security approach and is now much more restrictive.

## Provisioning

### DaemonSets can result in deployment failures

For Karpenter versions 0.5.3 and earlier, DaemonSets were not properly considered when provisioning nodes. This sometimes caused nodes to be deployed that could not meet the needs of the requested DaemonSets and workloads. This issue no longer occurs after Karpenter version 0.5.3 (see [PR #1155](https://github.com/aws/karpenter/pull/1155)).

If you are using a pre-0.5.3 version of Karpenter, one workaround is to set your provisioner to only use larger instance types that you know will be big enough for the DaemonSet and the workload. For more information, see [Issue #1084](https://github.com/aws/karpenter/issues/1084). Examples of this behavior are included in [Issue #1180](https://github.com/aws/karpenter/issues/1180).

### Unspecified resource requests cause scheduling/bin-pack failures

Not using the Kubernetes [LimitRanges](https://kubernetes.io/docs/concepts/policy/limit-range/) feature to enforce minimum resource request sizes will allow pods with very low or non-existent resource requests to be scheduled. This can cause issues as Karpenter bin-packs pods based on the resource requests.

If the resource requests do not reflect the actual resource usage of the pod, Karpenter will place too many of these pods onto the same node, resulting in the pods getting CPU throttled or terminated due to the OOM killer. This behavior is not unique to Karpenter and can also occur with the standard `kube-scheduler` with pods that don't have accurate resource requests.

To prevent this, you can set LimitRanges on pod deployments on a per-namespace basis. See the Karpenter [Best Practices Guide](https://aws.github.io/aws-eks-best-practices/karpenter/#use-limitranges-to-configure-defaults-for-resource-requests-and-limits) for further information on the use of LimitRanges.

### Missing subnetSelector and securityGroupSelector tags causes provisioning failures

Starting with Karpenter v0.5.5, if you are using a Karpenter-generated launch template, provisioners require that [subnetSelector]({{}}) and [securityGroupSelector]({{}}) tags be set to match your cluster.
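If your subnets and security groups are missing the discovery tag referenced by those selectors, you can add it to the resources you want Karpenter to use (a minimal sketch; the subnet and security group IDs are placeholders for your own):

```bash
# Tag an existing subnet and security group so Karpenter's selectors can discover them.
# The subnet and security group IDs below are placeholders -- substitute your own.
aws ec2 create-tags \
  --resources subnet-xxxxxxxx sg-xxxxxxxx \
  --tags Key=karpenter.sh/discovery,Value=${CLUSTER_NAME}
```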
The [Provisioner]({{}}) section in the Karpenter Getting Started Guide uses the following example:

```text
kind: AWSNodeTemplate
spec:
  subnetSelector:
    karpenter.sh/discovery: ${CLUSTER_NAME}
  securityGroupSelector:
    karpenter.sh/discovery: ${CLUSTER_NAME}
```

To check your subnet and security group selectors, type the following:

```bash
aws ec2 describe-subnets --filters Name=tag:karpenter.sh/discovery,Values=${CLUSTER_NAME}
```

*Returns subnets matching the selector*

```bash
aws ec2 describe-security-groups --filters Name=tag:karpenter.sh/discovery,Values=${CLUSTER_NAME}
```

*Returns security groups matching the selector*

Provisioners created without those tags and run in more recent Karpenter versions will fail with this message when you try to run the provisioner:

```text
field(s): spec.provider.securityGroupSelector, spec.provider.subnetSelector
```

### Pods using Security Groups for Pods stuck in "ContainerCreating" state for up to 30 minutes before transitioning to "Running"

When leveraging [Security Groups for Pods](https://docs.aws.amazon.com/eks/latest/userguide/security-groups-for-pods.html), Karpenter will launch nodes as expected but pods will be stuck in "ContainerCreating" state for up to 30 minutes before transitioning to "Running". This is related to an interaction between Karpenter and the [amazon-vpc-resource-controller](https://github.com/aws/amazon-vpc-resource-controller-k8s) when a pod requests `vpc.amazonaws.com/pod-eni` resources. More info can be found in [issue #1252](https://github.com/aws/karpenter/issues/1252).

To work around this problem, add the `vpc.amazonaws.com/has-trunk-attached: "false"` label in your Karpenter Provisioner spec and ensure instance-type requirements include [instance-types which support ENI trunking](https://github.com/aws/amazon-vpc-resource-controller-k8s/blob/master/pkg/aws/vpc/limits.go).

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  labels:
    vpc.amazonaws.com/has-trunk-attached: "false"
  ttlSecondsAfterEmpty: 30
```

### CNI is unable to allocate IPs to pods

_Note: This troubleshooting guidance is specific to the VPC CNI that is shipped by default with EKS clusters. If you are using a custom CNI, some of this guidance may not apply to your cluster._

Whenever a new pod is assigned to a node, the CNI will assign an IP address to that pod (assuming it isn't using host networking), allowing it to communicate with other pods on the cluster. It's possible for this IP allocation and assignment process to fail for a number of reasons. If this process fails, you may see an error similar to the one below.

```bash
time=2023-06-12T19:18:15Z type=Warning reason=FailedCreatePodSandBox from=kubelet message=Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "0f46f3f1289eed7afab81b6945c49336ef556861fe5bb09a902a00772848b7cc": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container
```

#### `maxPods` is greater than the node's supported pod density

By default, the number of pods on a node is limited by both the number of networking interfaces (ENIs) that may be attached to an instance type and the number of IP addresses that can be assigned to each ENI. See [IP addresses per network interface per instance type](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html#AvailableIpPerENI) for more detailed information on these instance types' limits.
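To see how many pods a given instance type can support with your CNI configuration, the `max-pods-calculator.sh` script from the [amazon-eks-ami](https://github.com/awslabs/amazon-eks-ami) repository can help (a sketch; the script's location within the repository and the CNI version shown here may differ for your cluster):

```bash
# Download the helper script (the path may vary between amazon-eks-ami releases)
curl -O https://raw.githubusercontent.com/awslabs/amazon-eks-ami/master/files/max-pods-calculator.sh
chmod +x max-pods-calculator.sh

# Example: maximum pods for an m5.large with a hypothetical VPC CNI version
./max-pods-calculator.sh --instance-type m5.large --cni-version 1.12.0-eksbuild.1
```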
If the max-pods (configured through your Provisioner [`kubeletConfiguration`]({{}})) is greater than the number of supported IPs for a given instance type, the CNI will fail to assign an IP to the pod and your pod will be left in a `ContainerCreating` state.

##### Solutions

To avoid this discrepancy between `maxPods` and the supported pod density of the EC2 instance based on ENIs and allocatable IPs, you can perform one of the following actions on your cluster:

1. Enable [Prefix Delegation](https://www.eksworkshop.com/docs/networking/prefix/) to increase the number of allocatable IPs for the ENIs on each instance type
2. Reduce your `maxPods` value to be under the maximum pod density for the instance types assigned to your Provisioner
3. Remove the `maxPods` value from your [`kubeletConfiguration`]({{}}) if you no longer need it and instead rely on the defaulted values from Karpenter and EKS AMIs

For more information on pod density, view the [Pod Density Conceptual Documentation]({{}}).

#### IP exhaustion in a subnet

When a node is launched by Karpenter, it is assigned to a subnet within your VPC based on the [`subnetSelector`]({{}}) value in your [`AWSNodeTemplate`]({{}}). When a subnet becomes IP address constrained, EC2 may think that it can successfully launch an instance in the subnet; however, when the CNI tries to assign IPs to the pods, there are none remaining. In this case, your pod will stay in a `ContainerCreating` state until an IP address is freed in the subnet and the CNI can assign one to the pod.

##### Solutions

1. Use `topologySpreadConstraints` on `topology.kubernetes.io/zone` to spread your pods and nodes more evenly across zones (see the sketch at the end of this section)
2. Increase the IP address space (CIDR) for the subnets selected by your `AWSNodeTemplate`
3. Use [custom networking](https://www.eksworkshop.com/docs/networking/custom-networking/) to assign separate IP address spaces to your pods and your nodes
4. [Run your EKS cluster on IPv6](https://aws.github.io/aws-eks-best-practices/networking/ipv6/) (Note: IPv6 clusters have some known limitations which should be well-understood before choosing to use one)

For more troubleshooting information on why your pod may have a `FailedCreateSandbox` error, view the [EKS CreatePodSandbox Knowledge Center Post](https://repost.aws/knowledge-center/eks-failed-create-pod-sandbox).
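As an illustration of the first solution above, a pod template can ask the scheduler to spread replicas across zones with a stanza like this (a minimal sketch; the `app: myapp` label is a placeholder for your own workload's labels):

```yaml
# Fragment of a pod template: spread pods for a hypothetical "myapp" workload
# evenly across availability zones.
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: myapp
```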
## Deprovisioning

### Nodes not deprovisioned

There are a few cases where requesting to deprovision a Karpenter node will fail or will never be attempted. These cases are outlined below in detail.

#### Initialization

Karpenter determines the nodes that it can begin to consider for deprovisioning by looking at the `karpenter.sh/initialized` node label. If this node label is not set on a Node, Karpenter will not consider it for any automatic deprovisioning. For more details on what may be preventing nodes from being initialized, see [Nodes not initialized]({{}}).

#### Disruption budgets

Karpenter respects Pod Disruption Budgets (PDBs) by using a backoff retry eviction strategy. Pods will never be forcibly deleted, so pods that fail to shut down will prevent a node from deprovisioning.

Kubernetes PDBs let you specify how much of a Deployment, ReplicationController, ReplicaSet, or StatefulSet must be protected from disruptions when pod eviction requests are made. PDBs can be used to strike a balance by protecting the application's availability while still allowing a cluster administrator to manage the cluster.

Here is an example where the pods matching the label `myapp` will block node termination if evicting the pod would reduce the number of available pods below 4.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 4
  selector:
    matchLabels:
      app: myapp
```

You can set `minAvailable` or `maxUnavailable` as integers or as a percentage.

Review what [disruptions are](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/), and [how to configure them](https://kubernetes.io/docs/tasks/run-application/configure-pdb/).

#### `karpenter.sh/do-not-evict` Annotation

If a pod exists with the annotation `karpenter.sh/do-not-evict: true` on a node, and a request is made to delete the node, Karpenter will not drain any pods from that node or otherwise try to delete the node.

Nodes that have pods with a `do-not-evict` annotation are not considered for consolidation, though their unused capacity is considered for the purposes of running pods from other nodes which can be consolidated.

If you want to terminate a node with a `do-not-evict` pod, you can simply remove the annotation and the deprovisioning process will continue.

#### Scheduling Constraints (Consolidation Only)

Consolidation will be unable to consolidate a node if, as a result of its scheduling simulation, it determines that the pods on a node cannot run on other nodes due to inter-pod affinity/anti-affinity, topology spread constraints, or some other scheduling restriction that couldn't be fulfilled.

## Node Launch/Readiness

### Node not created

In some circumstances, the Karpenter controller can fail to start up a node. For example, providing the wrong block storage device name in a custom launch template can result in a failure to start the node and an error similar to:

```bash
2022-01-19T18:22:23.366Z ERROR controller.provisioning Could not launch node, launching instances, with fleet error(s), InvalidBlockDeviceMapping: Invalid device name /dev/xvda; ...
```

You can see errors like this by viewing Karpenter controller logs:

```bash
kubectl get pods -A | grep karpenter
```

```bash
karpenter     karpenter-XXXX   2/2     Running   2     21d
```

```bash
kubectl logs karpenter-XXXX -c controller -n karpenter | less
```

### Nodes not initialized

Karpenter uses node initialization to understand when to begin using the real node capacity and allocatable details for scheduling. It also utilizes initialization to determine when it can begin consolidating nodes managed by Karpenter.

Karpenter determines node initialization using three factors:

1. Node readiness
2. Expected resources are registered
3. Provisioner startup taints are removed

#### Node Readiness

Karpenter checks the `Ready` condition type and expects it to be `True`. To see troubleshooting around what might be preventing nodes from becoming ready, see [Node NotReady]({{}}).

#### Expected resources are registered

Karpenter pulls instance type information, including all expected resources that should register to your node. It then expects all these resources to properly register to a non-zero quantity in node `.status.allocatable`.

Common resources that don't register and leave nodes in a non-initialized state:

1. `nvidia.com/gpu` (or any gpu-based resource): A GPU instance type that supports the `nvidia.com/gpu` resource is launched but the daemon/daemonset to register the resource on the node doesn't exist
2. `vpc.amazonaws.com/pod-eni`: An instance type is launched, but the `ENABLE_POD_ENI` value is set to `false` in the `vpc-cni` plugin. Karpenter will expect that the `vpc.amazonaws.com/pod-eni` resource will be registered, but it never will.
#### Provisioner startup taints are removed

Karpenter expects all startup taints specified in `.spec.startupTaints` of the provisioner to be completely removed from node `.spec.taints` before it will consider the node initialized.

### Node NotReady

There are cases where the node starts, but fails to join the cluster and is marked "Node NotReady". Reasons that a node can fail to join the cluster include:

- Permissions
- Security Groups
- Networking

The easiest way to start debugging is to connect to the instance and get the Kubelet logs. For an AL2 based node:

```bash
# List the nodes managed by Karpenter
kubectl get node -l karpenter.sh/provisioner-name
# Extract the instance ID (replace with a node name from the above listing)
INSTANCE_ID=$(kubectl get node -ojson | jq -r ".spec.providerID" | cut -d \/ -f5)
# Connect to the instance
aws ssm start-session --target $INSTANCE_ID
# Check Kubelet logs
sudo journalctl -u kubelet
```

For Bottlerocket, you'll need to get access to the root filesystem:

```bash
# List the nodes managed by Karpenter
kubectl get node -l karpenter.sh/provisioner-name
# Extract the instance ID (replace with a node name from the above listing)
INSTANCE_ID=$(kubectl get node -ojson | jq -r ".spec.providerID" | cut -d \/ -f5)
# Connect to the instance
aws ssm start-session --target $INSTANCE_ID
# Enter the admin container
enter-admin-container
# Run sheltie
sudo sheltie
# Check Kubelet logs
journalctl -u kubelet
```

Here are examples of errors from Node NotReady issues that you might see from `journalctl`:

* The runtime network not being ready can reflect a problem with IAM role permissions:

  ```
  KubeletNotReady runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
  ```

  See [Amazon EKS node IAM role](https://docs.aws.amazon.com/eks/latest/userguide/create-node-role.html) for details. If you’re using `eksctl`, the VPC CNI pods may be given permissions through IRSA instead. Verify that this setup is working as intended.

  You can also look at the logs for your CNI plugin from the `aws-node` pod:

  ```bash
  kubectl get pods -n kube-system | grep aws-node
  ```

  ```
  aws-node-?????   1/1   Running   2   20d
  ```

  ```bash
  kubectl logs aws-node-????? -n kube-system
  ```

* Not being able to register the node with the Kubernetes API server indicates an error condition like the following:

  ```
  Attempting to register node" node="ip-192-168-67-130.ec2.internal"
  Unable to register node with API server" err="Unauthorized" node="ip-192-168-67-130.ec2.internal"
  Error getting node" err="node \"ip-192-168-67-130.ec2.internal\" not found
  Failed to contact API server when waiting for CSINode publishing: Unauthorized
  ```

  Check the `aws-auth` ConfigMap to verify that the correct node role is there. For example:

  ```bash
  $ kubectl get configmaps -n kube-system aws-auth -o yaml
  ```

  ```yaml
  apiVersion: v1
  data:
    mapRoles: |
      - groups:
        - system:bootstrappers
        - system:nodes
        rolearn: arn:aws:iam::973227887653:role/eksctl-johnw-karpenter-demo-NodeInstanceRole-72CV61KQNOYS
        username: system:node:{{EC2PrivateDNSName}}
      - groups:
        - system:bootstrappers
        - system:nodes
        rolearn: arn:aws:iam::973227887653:role/KarpenterNodeRole-johnw-karpenter-demo
        username: system:node:{{EC2PrivateDNSName}}
    mapUsers: |
      []
  kind: ConfigMap
  ...
  ```
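If the Karpenter node role is missing from `aws-auth`, one way to add it is with `eksctl` (a sketch; the role name below follows the getting started guide's `KarpenterNodeRole-${CLUSTER_NAME}` convention and may differ in your account):

```bash
# Map the Karpenter node role into aws-auth so nodes can join the cluster.
# CLUSTER_NAME and AWS_ACCOUNT_ID are placeholders; adjust the role name if yours differs.
eksctl create iamidentitymapping \
  --cluster "${CLUSTER_NAME}" \
  --arn "arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}" \
  --group system:bootstrappers \
  --group system:nodes \
  --username "system:node:{{EC2PrivateDNSName}}"
```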
If you are not able to resolve the Node NotReady issue on your own, run the [EKS Logs Collector](https://github.com/awslabs/amazon-eks-ami/blob/master/log-collector-script/linux/README.md) (if it’s an EKS optimized AMI) and look in the following places in the log:

* Your UserData (in `/var_log/cloud-init-output.log` and `/var_log/cloud-init.log`)
* Your kubelets (`/kubelet/kubelet.log`)
* Your networking pod logs (`/var_log/aws-node`)

Reach out to the Karpenter team on [Slack](https://kubernetes.slack.com/archives/C02SFFZSA2K) or [GitHub](https://github.com/aws/karpenter/) if you are still stuck.

### Nodes stuck in pending and not running the kubelet due to outdated CNI

If an EC2 instance is launched but gets stuck in pending and ultimately never runs the kubelet, you may see a message like this in your `/var/log/user-data.log`:

> No entry for c6i.xlarge in /etc/eks/eni-max-pods.txt

This means that your CNI plugin is out of date. You can find instructions on how to update your plugin [here](https://docs.aws.amazon.com/eks/latest/userguide/managing-vpc-cni.html).

### Node terminates before ready on failed encrypted EBS volume

If you are using a custom launch template and an encrypted EBS volume, the IAM principal launching the node may not have sufficient permissions to use the KMS customer managed key (CMK) for the EC2 EBS root volume. This issue also applies to [Block Device Mappings]({{}}) specified in the Provisioner. In either case, this results in the node terminating almost immediately upon creation.

Keep in mind that it is possible that EBS Encryption can be enabled without your knowledge. EBS encryption could have been enabled by an account administrator or by default on a per-region basis. See [Encryption by default](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSEncryption.html#encryption-by-default) for details.

To correct the problem if it occurs, you can use the approach that AWS EBS uses, which avoids adding particular roles to the KMS policy. Below is an example of a policy applied to the KMS key:

```json
[
  {
    "Sid": "Allow access through EBS for all principals in the account that are authorized to use EBS",
    "Effect": "Allow",
    "Principal": {
      "AWS": "*"
    },
    "Action": [
      "kms:Encrypt",
      "kms:Decrypt",
      "kms:ReEncrypt*",
      "kms:GenerateDataKey*",
      "kms:CreateGrant",
      "kms:DescribeKey"
    ],
    "Resource": "*",
    "Condition": {
      "StringEquals": {
        "kms:ViaService": "ec2.${AWS_REGION}.amazonaws.com",
        "kms:CallerAccount": "${AWS_ACCOUNT_ID}"
      }
    }
  },
  {
    "Sid": "Allow direct access to key metadata to the account",
    "Effect": "Allow",
    "Principal": {
      "AWS": "arn:aws:iam::${AWS_ACCOUNT_ID}:root"
    },
    "Action": [
      "kms:Describe",
      "kms:Get*",
      "kms:List*",
      "kms:RevokeGrant"
    ],
    "Resource": "*"
  }
]
```

### Node is not deleted, even though `ttlSecondsUntilExpired` is set or the node is empty

This typically occurs when the node has not been considered fully initialized for some reason. If you look at the logs, you may see something related to an `Inflight check failed for node...` that gives more information about why the node is not considered initialized.

### Log message of `inflight check failed for node, Expected resource "vpc.amazonaws.com/pod-eni" didn't register on the node` is reported

This error indicates that the `vpc.amazonaws.com/pod-eni` resource was never reported on the node.
If you've enabled Pod ENI for Karpenter nodes via the `aws.enablePodENI` setting, you will need to make the corresponding change to the VPC CNI to enable [security groups for pods](https://docs.aws.amazon.com/eks/latest/userguide/security-groups-for-pods.html), which will cause the resource to be registered.

## Pricing

### Stale pricing data on isolated subnet

The following pricing-related error occurs if you are running Karpenter in an isolated private subnet (no Internet egress via IGW or NAT gateways):

```text
ERROR controller.aws.pricing updating on-demand pricing, RequestError: send request failed caused by: Post "https://api.pricing.us-east-1.amazonaws.com/": dial tcp 52.94.231.236:443: i/o timeout; RequestError: send request failed caused by: Post "https://api.pricing.us-east-1.amazonaws.com/": dial tcp 52.94.231.236:443: i/o timeout, using existing pricing data from 2022-08-17T00:19:52Z {"commit": "4b5f953"}
```

This network timeout occurs because there is no VPC endpoint available for the [Price List Query API](https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/using-pelong.html). To work around this issue, Karpenter ships updated on-demand pricing data as part of the Karpenter binary; however, this means that pricing data will only be updated on Karpenter version upgrades. To disable pricing lookups and avoid the error messages, set the `AWS_ISOLATED_VPC` environment variable (or the `--aws-isolated-vpc` option) to `true`. See [Environment Variables / CLI Flags]({{}}) for details.
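One quick way to try this out is to set the environment variable directly on the Karpenter deployment (a sketch that assumes Karpenter runs as the `karpenter` deployment in the `karpenter` namespace; for a durable configuration, prefer setting it through your Helm values so the change survives upgrades):

```bash
# Set AWS_ISOLATED_VPC on the controller to disable pricing lookups.
# Assumes the deployment and namespace are both named "karpenter"; note that a later
# helm upgrade may revert this change, so also capture it in your Helm values.
kubectl set env deployment/karpenter -n karpenter AWS_ISOLATED_VPC=true
```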