NodePool lifecycle
NodePools represent homogeneous groups of Nodes with common lifecycle management and a common upgrade cadence.
Upgrades and data propagation
There are three main areas that will trigger rolling upgrades across the Nodes when they are changed:
- OCP version, dictated by spec.release.
- Machine configuration via spec.config, a knob for machineconfiguration.openshift.io.
- Platform-specific changes via .spec.platform. Some fields might be immutable whereas others might allow changes, e.g. the AWS instance type.
Some cluster configuration changes (e.g. proxy, certificates) may also trigger a rolling upgrade if the change needs to be propagated to the Nodes.
NodePools support two types of rolling upgrades: Replace and InPlace, specified via UpgradeType.
Important
You must specify the UpgradeType when you create the NodePool; you cannot switch it afterwards. Modifying the field after the fact may cause Nodes to become unmanaged.
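For reference, a minimal sketch of where this field is set, assuming the hypershift.openshift.io/v1beta1 NodePool API where the upgrade type lives under spec.management.upgradeType (verify against your installed CRD):
spec:
  management:
    upgradeType: InPlace  # or Replace; chosen at creation time and not switchable afterwards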
Replace Upgrades
This creates new instances running the new version while removing the old Nodes in a rolling fashion. This is usually a good choice in cloud environments where this level of immutability is cost effective.
InPlace Upgrades
This updates the operating system of the existing instances in place. This is usually a good choice for environments where infrastructure constraints are higher, e.g. bare metal.
When using InPlace upgrades, platform-specific changes only affect newly created Nodes.
Data propagation
There are some fields which will only propagate in place, regardless of the upgrade strategy that is set. .spec.nodeLabels and .spec.taints will be propagated only to new upcoming Machines.
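As a sketch, these fields sit directly under the NodePool spec; the label and taint values below are placeholders:
spec:
  nodeLabels:
    environment: example
  taints:
  - key: dedicated
    value: example
    effect: NoSchedule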
Examples of triggering upgrades
Upgrading to a new OCP version
These upgrades can be triggered by changing spec.release.image in the NodePool. Note that you should only upgrade NodePools to the version that the Hosted Control Plane is currently running.
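For example, a release bump would look like the following sketch, where the image pullspec is a placeholder that should match the version your Hosted Control Plane is running:
spec:
  release:
    image: quay.io/openshift-release-dev/ocp-release:4.y.z-x86_64  # placeholder pullspec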
Adding a new MachineConfig
You can create a MachineConfig inside a ConfigMap in the management cluster as follows:
apiVersion: v1
kind: ConfigMap
metadata:
  name: ${CONFIGMAP_NAME}
  namespace: clusters
data:
  config: |
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker
      name: ${MACHINECONFIG_NAME}
    spec:
      config:
        ignition:
          version: 3.2.0
        storage:
          files:
          - contents:
              source: data:...
            mode: 420
            overwrite: true
            path: ${PATH}
Once that ConfigMap is applied to the management cluster, you can reference it from the NodePool via:
spec:
  config:
  - name: ${CONFIGMAP_NAME}
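Since spec.config is one of the areas that triggers rolling upgrades, adding or changing the referenced configuration rolls the Nodes according to the NodePool's UpgradeType.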
Scale Down
Scaling a NodePool down will remove Nodes from the hosted cluster.
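For example, assuming the replica count is managed directly on the NodePool rather than by an autoscaler, scaling down is a matter of lowering spec.replicas:
spec:
  replicas: 2  # lower this value to remove Nodes; 0 removes all Nodes in the NodePool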
Scaling To Zero
Nodes can become stuck when all Nodes are removed from a cluster (scaling NodePools down to zero) because they cannot be drained successfully from the cluster.
Several conditions can prevent Node(s) from being drained successfully:
- The hosted cluster contains PodDisruptionBudgets that require at least one replica to remain available.
- The hosted cluster contains pods that use `PersistentVolumes`.
Prevention
To prevent Nodes from becoming stuck when scaling down, set .spec.nodeDrainTimeout and .spec.nodeVolumeDetachTimeout in the NodePool CR to a value greater than 0s.
This forces Nodes to be removed once the timeout specified in the field has been reached, regardless of whether the node can be drained or the volumes can be detached successfully.
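As a sketch, both timeouts sit directly under the NodePool spec and take duration values; the 90s used below is an arbitrary example:
spec:
  nodeDrainTimeout: 90s         # maximum time to wait for the Node to drain before removing it
  nodeVolumeDetachTimeout: 90s  # maximum time to wait for volumes to detach before removing the Node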
Note
See the Hypershift API reference page for more details.