neurons firing from a keyboard

thoughts about devops, technology, and faster business from a random guy from dallas.

Troubleshooting Kubernetes Namespaces That Won't Delete

Reading Time: Approximately 7 minutes.

Usually, deleting Kubernetes namespaces is easy:

kubectl delete ns delete-me

Sometimes, however, deleting them takes way longer than expected…

kubectl delete ns delete-me
# still deleting, two months later...

This quick “way longer than I actually ever thought possible” post shows you a few troubleshooting tricks for dealing with this.

Forget everything you know about the word “all”

kubectl delete --all -n delete-me

is a lie.

While the kubectl delete help text suggests that “--all” means “all”:

$: kubectl delete --help | grep -A 3 -B 3 -- '--all=false'
  kubectl delete pods --all

        Delete all resources, in the namespace of the specified resource types.

    -A, --all-namespaces=false:

It turns out that “all” has, in fact, meant two different things throughout the history of Kubernetes, neither of which means what you think “all” actually means.

“all” v0: “all” == “Initialized”

In 2017, the Kubernetes maintainers introduced the concept of Initializers. This allows admission controllers to add routines that execute when they generate, or “initialize”, new objects. Since there is almost no documentation on this feature gate anymore, here’s the original pull request proposing the feature.

Back then, --all did not include “uninitialized” objects, or objects that were either created by controllers without initializers or objects that were marked as uninitialized in their metadata.

A pull request was created that introduced --include-uninitialized to fix this problem.

If you search for troubleshooting tips to fix hung namespaces, you’ll likely see a reference to this flag towards the top of your results. Which is great, except…

$: kubectl get --help | grep uninitialized ; echo $?
1

It doesn’t exist!


As it happens, “Initializers” were “finalized” from the Internet in two steps:

  • The first act was feature-gating Initializers as an alpha feature and disabling it by default, because it depended on a cluster plugin that wasn’t installed on most clusters at that time. (Interestingly, this meant that any solutions suggesting --include-uninitialized were incorrect for most people!)
  • The final act was, unceremoniously, erasing the feature in favor of webhook admission, which does everything Initializers do and more.

“all” v1: “all” is, actually, a construct

At around the same time as “Initializers” were being introduced, Custom Resources gained the ability to be put into “categories”.

Categories allow users to get multiple resources in a cluster or namespace with a single type.

For example, if you have two resources in your cluster, like, say, a Pod and a Deployment, whose categories both include, say, all, you could do this:

kubectl get all

or, more importantly to us here:

kubectl delete all

and get or delete both of these resources. For get, the output looks like this:

NAME                                READY   STATUS    RESTARTS   AGE
pod/external-dns-84ffbcc88d-84zj6   1/1     Running   0          44h

NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/external-dns   1/1     1            1           3d

But what you won’t get are resources that aren’t a part of the all category, which in a brand-spanking-new cluster is MOST OF THEM:

kind create cluster --name why-are-you-like-this-kubectl
comm \
    <(kubectl api-resources --categories all --no-headers | sort) \
    <(kubectl api-resources --no-headers | sort) -1
apiservices                                      false   APIService
bindings                                     v1                                     true    Binding
certificatesigningrequests        csr                 false   CertificateSigningRequest
clusterrolebindings                           false   ClusterRoleBinding
clusterroles                                  false   ClusterRole
componentstatuses                 cs         v1                                     false   ComponentStatus
configmaps                        cm         v1                                     true    ConfigMap
controllerrevisions                          apps/v1                                true    ControllerRevision
cronjobs                          cj         batch/v1                               true    CronJob
csidrivers                                               false   CSIDriver
csinodes                                                 false   CSINode
csistoragecapacities                                     true    CSIStorageCapacity
customresourcedefinitions         crd,crds                false   CustomResourceDefinition
daemonsets                        ds         apps/v1                                true    DaemonSet
deployments                       deploy     apps/v1                                true    Deployment
endpoints                         ep         v1                                     true    Endpoints
endpointslices                                         true    EndpointSlice
events                            ev                       true    Event
events                            ev         v1                                     true    Event
flowschemas                           false   FlowSchema
horizontalpodautoscalers          hpa        autoscaling/v2                         true    HorizontalPodAutoscaler
ingressclasses                                        false   IngressClass
ingresses                         ing                   true    Ingress
jobs                                         batch/v1                               true    Job
leases                                              true    Lease
limitranges                       limits     v1                                     true    LimitRange
localsubjectaccessreviews                          true    LocalSubjectAccessReview
mutatingwebhookconfigurations              false   MutatingWebhookConfiguration
namespaces                        ns         v1                                     false   Namespace
networkpolicies                   netpol                   true    NetworkPolicy
nodes                             no         v1                                     false   Node
persistentvolumeclaims            pvc        v1                                     true    PersistentVolumeClaim
persistentvolumes                 pv         v1                                     false   PersistentVolume
poddisruptionbudgets              pdb        policy/v1                              true    PodDisruptionBudget
pods                              po         v1                                     true    Pod
podtemplates                                 v1                                     true    PodTemplate
priorityclasses                   pc                   false   PriorityClass
prioritylevelconfigurations           false   PriorityLevelConfiguration
replicasets                       rs         apps/v1                                true    ReplicaSet
replicationcontrollers            rc         v1                                     true    ReplicationController
resourcequotas                    quota      v1                                     true    ResourceQuota
rolebindings                                  true    RoleBinding
roles                                         true    Role
runtimeclasses                                              false   RuntimeClass
secrets                                      v1                                     true    Secret
selfsubjectaccessreviews                           false   SelfSubjectAccessReview
selfsubjectrulesreviews                            false   SelfSubjectRulesReview
serviceaccounts                   sa         v1                                     true    ServiceAccount
services                          svc        v1                                     true    Service
statefulsets                      sts        apps/v1                                true    StatefulSet
storageclasses                    sc                      false   StorageClass
subjectaccessreviews                               false   SubjectAccessReview
tokenreviews                                      false   TokenReview
validatingwebhookconfigurations            false   ValidatingWebhookConfiguration
volumeattachments                                        false   VolumeAttachment
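The comparison above matches whole lines, so differences in column padding between the two listings can make every row look unique. Here is a minimal sketch that compares resource names only, which should leave just the resources outside the all category. The function name resources_outside_all is made up for this post and is not a kubectl feature.

```shell
# List API resources that are NOT in the "all" category, comparing names only.
# resources_outside_all is a hypothetical helper name, not part of kubectl.
resources_outside_all() {
    in_all=$(mktemp) || return 1
    everything=$(mktemp) || return 1
    # -o name prints one resource name per line (for example "deployments.apps"),
    # so column padding cannot affect the comparison.
    kubectl api-resources --categories all -o name | LC_ALL=C sort > "$in_all"
    kubectl api-resources -o name | LC_ALL=C sort > "$everything"
    # -1 hides names found only in the "all" list and -3 hides names found in
    # both, leaving the names that appear only in the full list.
    LC_ALL=C comm -13 "$in_all" "$everything"
    rm -f "$in_all" "$everything"
}
```

Run resources_outside_all against the kind cluster from above to see what kubectl get all quietly skips.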

This is actually a huge issue

Let’s go back to why I started writing this:

kubectl delete ns delete-me

When a namespace is deleted, a termination request is submitted for every resource within it. Two things happen when these requests are submitted:

  • The object’s deletionTimestamp is set to the time of the request, and
  • Kubernetes waits for the object’s finalizers to be empty before finally purging the object from etcd and moving on with life.

Finalizers are a list of keys in an object’s metadata that controllers watch for when objects get deleted. This allows controllers to perform clean-up duties that must happen before the object goes poof, after which each controller removes its own key from the list.

On a namespace, they look like this (most other objects keep theirs under .metadata.finalizers instead):

kubectl get ns delete-me -o jsonpath='{.spec.finalizers}'

An object’s list of finalizers must be empty before Kubernetes will proceed with deleting the object.

This applies to all objects.
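To check whether a specific object is waiting on its finalizers, you can print its deletionTimestamp next to its finalizer list. This is a rough sketch. The helper name deletion_state and its argument order are made up for illustration.

```shell
# Print an object's deletionTimestamp and finalizers, separated by a tab.
# deletion_state is a hypothetical helper name.
#   $1 = namespace
#   $2 = object reference such as "pod/my-pod"
deletion_state() {
    kubectl get "$2" -n "$1" \
        -o jsonpath='{.metadata.deletionTimestamp}{"\t"}{.metadata.finalizers}{"\n"}'
}
```

An object with a timestamp and a non-empty finalizer list has been asked to go away and is still waiting on a controller.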

Unfortunately, because all != all in Kubernetes-land, your namespace can hold many objects that you won’t see and whose finalizers never get cleared. In pure Kubernetes form, you will rarely see or know why this is happening, but the usual reasons are:

  • The Pods for the controller that acknowledges that finalizer were deleted before it could be acknowledged
  • There’s a bug in the controller preventing the finalizer from being cleared
  • An error occurred while the finalizer was acknowledged that prevents the controller from removing it

Because everything under a namespace must be gone before Kubernetes can delete the namespace itself, your namespace gets stuck in limbo forever and ever, waiting for things that won’t happen.
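One place where Kubernetes does tend to leave a clue is the namespace’s own status. Assuming your cluster version populates them, a terminating namespace carries conditions such as NamespaceContentRemaining and NamespaceFinalizersRemaining that describe what is still blocking it. A small sketch for printing them follows. The helper name namespace_conditions is made up for this post.

```shell
# Print each status condition on a namespace as "type=status: message".
# namespace_conditions is a hypothetical helper name.
#   $1 = namespace name
namespace_conditions() {
    kubectl get namespace "$1" \
        -o jsonpath='{range .status.conditions[*]}{.type}={.status}: {.message}{"\n"}{end}'
}
```

For example, namespace_conditions delete-me prints one line per condition, or nothing at all if the namespace has no conditions set.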

SIGH. We’re FINALLY ready to talk about troubleshooting this situation.

Troubleshooting stuck namespace deletions

Delete actually all resources in the namespace

Use kubectl api-resources and kubectl delete to wipe out all resources in the namespace.

# --all is required: kubectl delete refuses a bare resource type with no names.
kubectl api-resources --namespaced \
  --verbs list,delete \
  -o name | xargs -n 1 kubectl delete --all -n [NAMESPACE]

⚠️ Make sure that you include --namespaced! This is really important. If you don’t include it, you’ll delete cluster-scope resources, like that fancy Istio service mesh you just spent 35 straight hours configuring!!!
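If you’d rather look before you leap, here is a sketch of the same idea wrapped in a function that previews by default. The name delete_everything_in and its arguments are made up for this post. It leans on kubectl delete’s --dry-run and --wait flags, and it only deletes for real when you pass none as the second argument.

```shell
# Delete (or preview deleting) every namespaced object in one namespace.
# delete_everything_in is a hypothetical helper name.
#   $1 = namespace (required)
#   $2 = dry-run mode: "client" (default, preview only), "server", or "none"
delete_everything_in() {
    ns=$1
    dry_run=${2:-client}
    # Refuse to run without an explicit namespace.
    [ -n "$ns" ] || {
        echo "usage: delete_everything_in NAMESPACE [client|server|none]" >&2
        return 2
    }
    # Only visit namespaced resource types that support both list and delete.
    kubectl api-resources --namespaced --verbs=list,delete -o name |
    while read -r resource; do
        # --wait=false returns immediately instead of blocking on finalizers.
        kubectl delete "$resource" --all -n "$ns" \
            --dry-run="$dry_run" --wait=false
    done
}
```

Start with delete_everything_in delete-me to see what would go, then rerun it as delete_everything_in delete-me none once the list looks right.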

Remove all finalizers from all resources in the namespace

When the above inevitably hangs, you can use the same tactic above with kubectl patch to remove every object’s finalizers and try to kick the deletion along:

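# kubectl patch needs object names, so list each type's objects first.
# A merge patch is used so objects without finalizers don't produce errors.
kubectl api-resources --namespaced \
  --verbs list,patch \
  -o name | xargs -n 1 kubectl get -n [NAMESPACE] -o name | \
  xargs -n 1 kubectl patch \
    --type merge \
    --patch='{ "metadata": { "finalizers": null } }' \
    -n [NAMESPACE]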

You can, then, bulk-run kubectl get to make sure the resources were deleted. You should get an empty response if so.

kubectl api-resources --namespaced \
  --verbs list \
  -o name | xargs -n 1 kubectl get \
    -n [NAMESPACE]
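If that check comes back with stragglers, it helps to know which of them still carry finalizers. Here is a rough sketch that prints only those objects. The helper name objects_with_finalizers is made up for this post.

```shell
# Print "Kind/name<TAB>finalizers" for every object in a namespace that
# still has at least one finalizer.
# objects_with_finalizers is a hypothetical helper name.
#   $1 = namespace
objects_with_finalizers() {
    ns=$1
    kubectl api-resources --namespaced --verbs=list -o name |
    while read -r resource; do
        # One line per object; the second field is empty when the object
        # has no finalizers. Errors for unreachable types are discarded.
        kubectl get "$resource" -n "$ns" \
            -o jsonpath='{range .items[*]}{.kind}/{.metadata.name}{"\t"}{.metadata.finalizers}{"\n"}{end}' \
            2>/dev/null
    done | awk -F '\t' '$2 != ""'
}
```

Anything this prints is a candidate for the patch above, or a hint about which controller never finished its clean-up.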

Delete unhealthy API Services

Some resources might be hanging on an API service that is no longer reachable. You’ll usually be able to see this as a Kubernetes event when you run kubectl describe against it.

This will happen if you delete the Pod running the API Service’s controller before you delete the API service.

You can find these unhealthy API services by running this:

kubectl get apiservices | grep False

Delete any that show up. Any stuck resources should get deleted shortly after.
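Grepping for False works, but it depends on how kubectl formats the table. As an alternative sketch, you can read each API service’s Available condition directly. The helper name unavailable_apiservices is made up for this post.

```shell
# Print the name of every APIService whose Available condition is not "True".
# unavailable_apiservices is a hypothetical helper name.
unavailable_apiservices() {
    kubectl get apiservices \
        -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Available")].status}{"\n"}{end}' |
    # Field 2 is the condition status; an empty value (no condition reported)
    # is treated as unavailable too.
    awk -F '\t' '$2 != "True" { print $1 }'
}
```

Each name it prints can be inspected with kubectl describe apiservice before you decide to delete it.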

Delete the namespace in etcd with etcdctl

While I absolutely, 100% do not recommend doing this, I’m including it for completeness.

Since every object is persisted to etcd (or whichever database backend your cluster is using, which is etcd 99.9% of the time), you can manually drop it with etcdctl if you have access to the control plane (i.e. not EKS, AKS, GKE, etc.)

# /registry is kube-apiserver's default --etcd-prefix; adjust if yours differs.
ETCDCTL_API=3 etcdctl \
    --endpoints=https://[ETCD_HOST]:[ETCD_PORT] \
    --cert=/path/to/etcd/cert \
    --key=/path/to/etcd/key \
    --cacert=/path/to/etcd/cacert \
    del /registry/namespaces/delete-me

Take a look at the --etcd-servers flag provided to kube-apiserver to get ETCD_HOST and ETCD_PORT. Since etcd is a distributed database, you can use any of the servers.
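Before deleting anything from etcd, it is worth confirming that the key you are about to remove is the one you think it is. The sketch below lists matching keys without touching them. It assumes the default /registry prefix, and the helper name etcd_namespace_keys and the ETCD_ENDPOINT, ETCD_CERT_FILE, ETCD_KEY_FILE, and ETCD_CA_FILE variables are all made up for this post.

```shell
# List (without deleting) the etcd keys for a namespace.
# etcd_namespace_keys and the ETCD_* variables are hypothetical names;
# set the variables to match your own cluster before calling it.
#   $1 = namespace name
etcd_namespace_keys() {
    ETCDCTL_API=3 etcdctl \
        --endpoints="$ETCD_ENDPOINT" \
        --cert="$ETCD_CERT_FILE" \
        --key="$ETCD_KEY_FILE" \
        --cacert="$ETCD_CA_FILE" \
        get "/registry/namespaces/$1" --prefix --keys-only
}
```

Because --prefix matches every key that starts with the given path, a namespace called delete-me-too would show up next to delete-me, which is exactly the sort of thing you want to notice before running a delete.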