Debug Why Azure Nodes Have Not Joined
If your control plane API endpoint has become available, but the nodes have not joined the hosted cluster, check the following:
Verify machines were created
HC_NAMESPACE="clusters"
HC_NAME="cluster-name"
CONTROL_PLANE_NAMESPACE="${HC_NAMESPACE}-${HC_NAME}"
oc get machines.cluster.x-k8s.io -n $CONTROL_PLANE_NAMESPACE
oc get azuremachines.infrastructure.cluster.x-k8s.io -n $CONTROL_PLANE_NAMESPACE
If machines don't exist, check that a machinedeployment and machineset have been created:
oc get machinedeployment -n $CONTROL_PLANE_NAMESPACE
oc get machineset -n $CONTROL_PLANE_NAMESPACE
If no machinedeployment was created, look at the logs of the HyperShift operator (set NUMBER_OF_LINES to the number of log lines you want to see):
oc logs -l app=operator -n hypershift --tail=$NUMBER_OF_LINES
If the machines exist but have not been provisioned, check the log of the cluster API provider:
oc logs deployment/capi-provider -c manager -n $CONTROL_PLANE_NAMESPACE
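To spot machines that are stuck, it can help to list only those whose phase is not Running. The following is a minimal sketch, not part of the official tooling: the function name pending_machines is hypothetical, and it assumes the Cluster API Machine objects report their state in .status.phase and that oc is already logged in to the management cluster.

```shell
# Hypothetical helper: print every CAPI machine whose phase is not "Running".
# Usage: pending_machines <control-plane-namespace>
pending_machines() {
  ns="$1"
  # Emit "<name><TAB><phase>" per machine, then keep only non-Running rows.
  oc get machines.cluster.x-k8s.io -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}' |
    awk -F'\t' 'NF && $2 != "Running" { print $1 ": " ($2 == "" ? "<no phase>" : $2) }'
}
```

Calling pending_machines "$CONTROL_PLANE_NAMESPACE" prints one line per machine that has not reached Running. No output means every machine reports Running, so the problem is more likely on the node itself.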
Create a bastion to SSH to a node
If the machines look like they have been provisioned correctly, you can directly access the virtual machines related to your nodes through a bastion.
Prerequisites
- Download the az CLI
- Add the following extensions to the CLI:
az extension update --name bastion
az extension add -n ssh
- Extract the private key for the cluster. If you created the cluster with the --generate-ssh flag, an SSH key for your cluster was placed in the same namespace as the hosted cluster (default clusters). If you specified your own key and know how to access it, you can skip this step.
mkdir -p /tmp/ssh
oc get secret -n clusters ${HC_NAME}-ssh-key -o jsonpath='{ .data.id_rsa }' | base64 -d > /tmp/ssh/id_rsa
- Set the permissions on the key
chmod 400 /tmp/ssh/id_rsa
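The key extraction and permission steps can be wrapped into one function that also confirms a key was actually written. This is a sketch under stated assumptions: the function name extract_ssh_key is hypothetical, and it assumes the secret is named ${HC_NAME}-ssh-key with an id_rsa entry, as in the command above.

```shell
# Hypothetical helper: write the hosted cluster's private key to a file.
# Usage: extract_ssh_key <hosted-cluster-namespace> <hosted-cluster-name> <destination-file>
extract_ssh_key() {
  hc_namespace="$1"
  hc_name="$2"
  dest="$3"
  # Make sure the target directory exists before redirecting into it.
  mkdir -p "$(dirname "$dest")" || return 1
  # Remove a previous read-only copy so the redirect below can succeed.
  rm -f "$dest"
  oc get secret -n "$hc_namespace" "${hc_name}-ssh-key" \
    -o jsonpath='{ .data.id_rsa }' | base64 -d > "$dest" || return 1
  chmod 400 "$dest" || return 1
  # Succeed only if the key file is non-empty.
  [ -s "$dest" ]
}
```

For example, extract_ssh_key clusters "$HC_NAME" /tmp/ssh/id_rsa returns a non-zero status if the secret is missing or empty, which is easier to notice than a silently empty key file.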
Create a bastion machine
- Log into the Azure Portal and go to the resource group where your virtual machine was created
- Click on the Connect button, then Connect to Bastion
- Accept the defaults and click Deploy Bastion
- Once the bastion and its related resources are created, you will need to modify some of its configuration
- Click on the bastion resource
- Click Configuration
- Set the Tier to Standard and check Native client support
- Wait for the new settings to take effect
- Log into the bastion from a terminal with the following commands
az login --scope https://management.core.windows.net//.default
az network bastion ssh --name <bastion-name> --resource-group <resource-group-name> --target-resource-id <vm-id> --auth-type ssh-key --username core --ssh-key <path-to-your-rsa-secret>
- You should now be able to access various directories and logs to debug why the machine did not join
- For example, review the journal logs and look for a repeating error near the bottom, which should indicate why the kubelet has not been able to join the cluster:
sudo journalctl
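Scanning the full journal by eye can be slow, so one option is to count which messages repeat most often. The sketch below is illustrative only: the function name repeated_messages is hypothetical, it assumes the kubelet runs as the systemd unit kubelet, and it assumes log lines start with a klog-style prefix (severity letter, date, time, process ID), which it strips so that otherwise identical messages group together.

```shell
# Hypothetical helper: read log messages on stdin and print the most
# frequently repeated ones as "<count>x <message>".
# Usage: ... | repeated_messages [limit]   (limit defaults to 5)
repeated_messages() {
  limit="${1:-5}"
  # Drop the assumed klog prefix, e.g. "E0102 10:00:00.000001    1234 ".
  sed -E 's/^[IWEF][0-9]{4} [0-9:.]+ +[0-9]+ //' |
    sort | uniq -c | sort -rn | head -n "$limit" |
    awk '{ count = $1; sub(/^ *[0-9]+ /, ""); print count "x " $0 }'
}
```

On the node, this could be used as: sudo journalctl -u kubelet -o cat --no-pager -n 500 | repeated_messages 5. The message with the highest count is a reasonable first candidate for why the kubelet has not joined the cluster.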