# Multiple worker node groups

## Introduction

**Problem:** Currently, users of eks-anywhere can only create one worker node group by specifying its configuration details in the cluster spec. This limits a user's ability to create worker nodes with different configurations and to formulate an application deployment strategy based on those configurations.

It also creates a problem when tainting nodes. Unless eks-anywhere clusters support multiple worker node groups, worker nodes cannot be tainted with the `NoExecute` and `NoSchedule` effects: since all nodes in a node group share the same configuration, setting a taint with either effect would make every worker node unusable for eks-anywhere deployments, as the general approach for eks-anywhere is to not add tolerations to deployments.

### Tenets

***Simple:*** Specifying different node group configurations in the cluster spec should be simple and readable.

### Goals and Objectives

As a Kubernetes administrator I want to:

* Add multiple worker node group configurations to the `workerNodeGroupConfigurations` array in the cluster spec.
* Have the ability to point each worker node group configuration to a different machine config.
* Have the ability to point multiple worker node groups to the same machine config.
* Specify separate node counts and taint information for each node group.

### Statement of Scope

**In scope**

* Providing users the ability to add multiple worker node group configurations to the cluster spec and bootstrap Kubernetes clusters with multiple worker node groups.

## Overview of Solution

With this feature, a user can create a cluster config file with multiple worker node group configurations. When the eks-anywhere CLI runs, it reads these configurations and adds them to the CAPI specification file, appending the configuration details for each worker node group one after another. Examples of both files appear in the next section.

## Solution Details

With this feature, the worker-node-specific parts of the cluster spec file will look like the following.

```
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: Cluster
........
workerNodeGroupConfigurations:
- count: 3
  machineGroupRef:
    kind: VSphereMachineConfig
    name: eksa-test-1
  taints:
  - key: Key2
    value: value2
    effect: PreferNoSchedule
- count: 3
  machineGroupRef:
    kind: VSphereMachineConfig
    name: eksa-test-2
  taints:
  - key: Key3
    value: value3
    effect: PreferNoSchedule
status: {}
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: VSphereMachineConfig
metadata:
  creationTimestamp: null
  name: eksa-test-1
spec:
  ...
status: {}
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: VSphereMachineConfig
metadata:
  creationTimestamp: null
  name: eksa-test-2
spec:
  ...
status: {}
---
```
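For illustration, the Go API types behind this part of the spec could look like the minimal sketch below. The field names mirror the YAML above, but the definitions are simplified assumptions; the real types live in eks-anywhere's `v1alpha1` API package and carry additional fields.

```go
package v1alpha1

import corev1 "k8s.io/api/core/v1"

// WorkerNodeGroupConfiguration mirrors one entry of the
// workerNodeGroupConfigurations array shown above (sketch only).
type WorkerNodeGroupConfiguration struct {
	// Count is the number of worker nodes in this group.
	Count int `json:"count"`
	// MachineGroupRef points at the machine config for this group;
	// several groups may reference the same machine config.
	MachineGroupRef *Ref `json:"machineGroupRef"`
	// Taints are applied to every node in this group.
	Taints []corev1.Taint `json:"taints,omitempty"`
}

// Ref identifies a machine config by kind and name.
type Ref struct {
	Kind string `json:"kind"`
	Name string `json:"name"`
}
```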
Once the CLI processes this file, the generated CAPI spec should contain worker-node-specific configurations like the following.

```
---
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
kind: KubeadmConfigTemplate
metadata:
  name: eksa-test-1-md-0
  namespace: eksa-system
spec:
  template:
    spec:
      joinConfiguration:
        pause:
          imageRepository: public.ecr.aws/eks-distro/kubernetes/pause
          imageTag: v1.20.7-eks-1-20-8
        bottlerocketBootstrap:
          imageRepository: public.ecr.aws/l0g8r8j6/bottlerocket-bootstrap
          imageTag: v1-20-8-eks-a-v0.0.0-dev-build.579
        nodeRegistration:
          criSocket: /var/run/containerd/containerd.sock
          kubeletExtraArgs:
            cloud-provider: external
            read-only-port: "0"
            anonymous-auth: "false"
            tls-cipher-suites: Something
          name: '{{ ds.meta_data.hostname }}'
      ...
---
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineDeployment
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: eksa-test
  name: eksa-test-1-md-0
  namespace: eksa-system
spec:
  clusterName: eksa-test
  replicas: 3
  selector:
    matchLabels: {}
  template:
    metadata:
      labels:
        cluster.x-k8s.io/cluster-name: eksa-test
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
          kind: KubeadmConfigTemplate
          name: eksa-test-1-md-0
      clusterName: eksa-test
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
        kind: VSphereMachineTemplate
        name: eksa-test-worker-node-template-1638469395669
      version: v1.20.7-eks-1-20-8
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: VSphereMachineTemplate
metadata:
  name: eksa-test-worker-node-template-1638469395669
  namespace: eksa-system
spec:
  template:
    spec:
      ...
---
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
kind: KubeadmConfigTemplate
metadata:
  name: eksa-test-2-md-0
  namespace: eksa-system
spec:
  template:
    spec:
      joinConfiguration:
        pause:
          imageRepository: public.ecr.aws/eks-distro/kubernetes/pause
          imageTag: v1.20.7-eks-1-20-8
        bottlerocketBootstrap:
          imageRepository: public.ecr.aws/l0g8r8j6/bottlerocket-bootstrap
          imageTag: v1-20-8-eks-a-v0.0.0-dev-build.579
        nodeRegistration:
          criSocket: /var/run/containerd/containerd.sock
          kubeletExtraArgs:
            cloud-provider: external
            read-only-port: "0"
            anonymous-auth: "false"
            tls-cipher-suites: Something
          name: '{{ ds.meta_data.hostname }}'
      ...
---
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineDeployment
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: eksa-test
  name: eksa-test-2-md-0
  namespace: eksa-system
spec:
  clusterName: eksa-test
  replicas: 3
  selector:
    matchLabels: {}
  template:
    metadata:
      labels:
        cluster.x-k8s.io/cluster-name: eksa-test
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
          kind: KubeadmConfigTemplate
          name: eksa-test-2-md-0
      clusterName: eksa-test
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
        kind: VSphereMachineTemplate
        name: test
      version: v1.20.7-eks-1-20-8
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: VSphereMachineTemplate
metadata:
  name: test
  namespace: eksa-system
spec:
  template:
    spec:
      ...
---
```

For each worker node group, the CAPI spec will continue to contain objects of the following three kinds:

* KubeadmConfigTemplate
* MachineDeployment
* VSphereMachineTemplate

For each group, we will append these three objects to the CAPI spec. Right now, the CLI assumes there is only one group and treats the worker node group configuration array as a collection of exactly one element; as a result, the controller simply references the first element of this array in various places in the code. We therefore need to perform the same operations in loops, which includes CAPI spec creation, cluster spec validation, and so on.

Once a CAPI spec is created with this approach, the workload cluster will be created with multiple worker node groups. We will create a struct with these three CAPI object types and use an array of that struct to store the worker node group configurations, then generate the CAPI spec from that array. The definitions of these object types can be found in the CAPI and CAPV code bases. We also need to ensure that at least one worker node group carries no `NoExecute` or `NoSchedule` taint; this validation will be done at the preflight validation stage. The sketch below illustrates the struct, the generation loop, and the taint validation.
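This is a minimal sketch only, assuming the CAPI/CAPV v1alpha3 Go packages matching the apiVersions above and the `WorkerNodeGroupConfiguration` type sketched earlier (imported here under an assumed path). The builders set only names and replica counts; everything else shown in the YAML is elided.

```go
package providers

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1alpha3"
	bootstrapv1 "sigs.k8s.io/cluster-api/bootstrap/kubeadm/api/v1alpha3"
	infrav1 "sigs.k8s.io/cluster-api-provider-vsphere/api/v1alpha3"

	// Assumed import path for the cluster spec API types sketched earlier.
	anywherev1 "github.com/aws/eks-anywhere/pkg/api/v1alpha1"
)

// workerGroup bundles the three CAPI objects emitted per worker node group.
type workerGroup struct {
	KubeadmConfigTemplate *bootstrapv1.KubeadmConfigTemplate
	MachineDeployment     *clusterv1.MachineDeployment
	MachineTemplate       *infrav1.VSphereMachineTemplate
}

// buildWorkerGroups loops over every worker node group configuration instead
// of assuming a single element, producing one workerGroup per configuration.
func buildWorkerGroups(clusterName string, groups []anywherev1.WorkerNodeGroupConfiguration) []workerGroup {
	out := make([]workerGroup, 0, len(groups))
	for _, g := range groups {
		name := fmt.Sprintf("%s-md-0", g.MachineGroupRef.Name) // e.g. eksa-test-1-md-0
		replicas := int32(g.Count)
		wg := workerGroup{
			KubeadmConfigTemplate: &bootstrapv1.KubeadmConfigTemplate{},
			MachineDeployment:     &clusterv1.MachineDeployment{},
			MachineTemplate:       &infrav1.VSphereMachineTemplate{},
		}
		wg.KubeadmConfigTemplate.Name = name
		wg.MachineDeployment.Name = name
		wg.MachineDeployment.Spec.ClusterName = clusterName
		wg.MachineDeployment.Spec.Replicas = &replicas
		// Timestamped template name, following the example above.
		wg.MachineTemplate.Name = fmt.Sprintf("%s-worker-node-template-%d", clusterName, time.Now().UnixMilli())
		// The real implementation also fills in the full specs shown in the
		// YAML above (join configuration, infrastructureRef, and so on).
		out = append(out, wg)
	}
	return out
}

// validateWorkerGroupTaints is the preflight check described above: at least
// one group must remain schedulable, i.e. free of NoExecute/NoSchedule taints.
func validateWorkerGroupTaints(groups []anywherev1.WorkerNodeGroupConfiguration) error {
	for _, g := range groups {
		schedulable := true
		for _, t := range g.Taints {
			if t.Effect == corev1.TaintEffectNoExecute || t.Effect == corev1.TaintEffectNoSchedule {
				schedulable = false
				break
			}
		}
		if schedulable {
			return nil
		}
	}
	return fmt.Errorf("at least one worker node group must not have NoExecute or NoSchedule taints")
}
```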
To delete a worker node group, we will perform the following steps.

* We will add a name field to the cluster spec so that a user can specify a name for each group. Since we also want to support upgrades of existing clusters, we will assign a default name to the first node group. The default name will be `-md-0`, since this is how node groups are named in single-node-group eks-anywhere clusters under the existing implementation.
* While upgrading a cluster, we will first apply the new CAPI spec to create or modify worker node groups as specified by the user.
* Then we will delete the machine deployments of the extra node groups; a minimal sketch of this pruning step appears at the end of this document.

The examples in this design use the vSphere provider, but the same strategy applies to the other providers as well.

## Testing

To verify that the implementation of this feature is correct, we need to add unit tests for each provider that validate the correctness of the generated CAPI specs. We also need to add e2e tests for each provider covering the following scenarios:

* Cluster creation with one worker node group
* Cluster creation with multiple worker node groups
* Adding and removing worker node groups during cluster upgrade

## Conclusion

The current implementation of the eks-anywhere CLI uses an array of worker node group configurations but assumes the array has only one element. With this design, we extend the current implementation to handle multiple elements in that array. This will help us achieve our goal of supporting multiple worker node groups.
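As referenced in the deletion flow above, the pruning step could look like the following minimal sketch, assuming a controller-runtime client and the label and namespace used in the CAPI example; `pruneExtraNodeGroups` and the `desired` set are illustrative names, not existing eks-anywhere functions.

```go
package providers

import (
	"context"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1alpha3"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// pruneExtraNodeGroups deletes MachineDeployments that belong to the cluster
// but are no longer present in the user's cluster spec. It runs after the new
// CAPI spec has been applied, so desired holds the node group names that
// should survive the upgrade (e.g. "eksa-test-1-md-0").
func pruneExtraNodeGroups(ctx context.Context, c client.Client, clusterName string, desired map[string]bool) error {
	mdList := &clusterv1.MachineDeploymentList{}
	if err := c.List(ctx, mdList,
		client.InNamespace("eksa-system"),
		client.MatchingLabels{"cluster.x-k8s.io/cluster-name": clusterName},
	); err != nil {
		return err
	}
	for i := range mdList.Items {
		if md := &mdList.Items[i]; !desired[md.Name] {
			if err := c.Delete(ctx, md); err != nil {
				return err
			}
		}
	}
	return nil
}
```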