Operators
As we saw in the previous section, stateful sets are extremely useful for managing persistent data. But managing stateful applications comes with a certain amount of legwork, and this is down to how Kubernetes has been designed. For a stateless application, Kubernetes is great because it handles almost everything automatically. If you want to scale your application up or down, add new resources while the application is running, or perform rolling updates, you have minimal work on your hands since Kubernetes can do all this for you. If a pod goes down, Kubernetes brings it back up. If a job fails to execute, Kubernetes retries it. All of this relies on the fact that these resources are ephemeral and that one instance is indistinguishable from the next.
However, this is not the case with stateful applications. You can’t take down one database pod, start a second database pod, and expect all the data to remain. Stateful sets exist for exactly this reason: they give each pod a stable identity and its own persistent storage. But this also means that Kubernetes can no longer manage the application for you; much of the work falls to you. Each database or logging system has its own way of doing things, so there is no one-size-fits-all solution, and you need to keep an eye on things across the entire application lifecycle. This can require at least one dedicated person for a small application, and an entire DevOps team for larger ones. At this point, it may seem as if the whole purpose of using Kubernetes has been lost, but this is where operators come in.
What is an operator
An operator is meant to replace the DevOps team that would otherwise be doing all the tasks related to managing the application. It is a set of automation logic built specifically for one particular application, and it knows how to handle clusters built around that application only. So, while the same Kubernetes processes can handle both an Apache and an Nginx server, the same operator cannot manage both Prometheus and MongoDB. However, an operator is reusable when managing the same application across different clusters. If you have 10 clusters running the same database and you need to run a certain operation on all of them, then instead of performing that operation on each cluster manually, you can get an operator to do it for you. Deploying resources, creating replicas, and recovering from failures are all tasks you would otherwise handle manually when using stateful sets, and these are exactly the things that get scripted into operators so that they can be handled automatically.
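To make "application-specific but reusable" concrete, here is a minimal Python sketch. It assumes a hypothetical database that runs one primary and several secondaries; the `Member` class, the `ensure_primary` function and the failover rule itself are illustrative assumptions, not the behaviour of any real operator. Plain Kubernetes would only restart a failed pod, whereas this rule also encodes the database-specific knowledge that a healthy secondary must be promoted. Because the rule is written once, the same function can be run against all 10 clusters.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Member:
    """One database pod in a hypothetical primary/secondary cluster (illustrative only)."""
    name: str
    healthy: bool = True
    role: str = "secondary"  # either "primary" or "secondary"


def ensure_primary(members: List[Member]) -> Optional[str]:
    """Hypothetical application-specific rule: every cluster needs one healthy primary.

    If the current primary has failed, demote it and promote the first healthy
    member instead. Returns the primary's name, or None if no member is healthy.
    """
    primary = next((m for m in members if m.role == "primary"), None)
    if primary is not None and primary.healthy:
        return primary.name  # already in the desired state, nothing to do
    if primary is not None:
        primary.role = "secondary"  # demote the failed primary
    candidate = next((m for m in members if m.healthy), None)
    if candidate is None:
        return None  # no healthy member to promote; a human has to step in
    candidate.role = "primary"
    return candidate.name


# The same logic is reused across 10 simulated clusters of the same application.
clusters: Dict[str, List[Member]] = {
    f"db-cluster-{i}": [
        Member(f"db-{i}-0", role="primary"),
        Member(f"db-{i}-1"),
        Member(f"db-{i}-2"),
    ]
    for i in range(10)
}
clusters["db-cluster-3"][0].healthy = False  # simulate a failed primary in one cluster

primaries = {name: ensure_primary(members) for name, members in clusters.items()}
print(primaries["db-cluster-3"])  # db-3-1
print(primaries["db-cluster-0"])  # db-0-0
```

Only the cluster with the failed primary is changed; every other cluster is already in the desired state, so the rule leaves it alone.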
So how does all this work? Kubernetes relies on control loops that continuously check for things such as changed configurations and the number of replicas that should be running, and you can think of an operator as a custom control loop. Just like a built-in Kubernetes controller, it runs a repeated cycle: observe the cluster, compare the current state with the desired state, then take action to move the cluster to the desired state. It keeps an eye on the resources it is responsible for and if, for example, a pod goes down, it gets a new pod up and running.
Operators also make use of CRDs (Custom Resource Definitions), which we covered briefly in the ArgoCD section. A CRD lets you define your own resource type, which you can then create and manage just like ordinary Kubernetes resources such as deployments and services. These custom resources, together with stateful sets and other resources, are combined with application-specific business logic to form the operator. For example, if a Prometheus resource goes down, how should it be brought back up? If a MongoDB pod stops running, what process should be followed? Each application has its own way of being managed, so each application ends up with its own operator containing its application-specific business logic.
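The observe, compare, act cycle can be sketched in a few lines of Python. This is a simulation under stated assumptions: `DatabaseClusterResource` stands in for an instance of a hypothetical custom resource whose `replicas` field holds the desired state, and `FakeCluster` replaces the real cluster with a set of pod names. A real operator would read and write these objects through the Kubernetes API server and would usually be triggered by watch events rather than called by hand. Pods get stable ordinal names, and surplus pods are removed highest ordinal first, mirroring how stateful sets scale down.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class DatabaseClusterResource:
    """Stand-in for a custom resource of a hypothetical kind; `replicas` is the desired state."""
    name: str
    replicas: int


@dataclass
class FakeCluster:
    """Simulated cluster state: the names of the pods that are currently running."""
    pods: Set[str] = field(default_factory=set)


def reconcile(resource: DatabaseClusterResource, cluster: FakeCluster) -> List[str]:
    """One pass of the control loop: observe, compare, act. Returns the actions taken."""
    prefix = resource.name + "-"
    # Observe: which ordinals of this resource are running right now?
    running = {
        int(pod[len(prefix):])
        for pod in cluster.pods
        if pod.startswith(prefix) and pod[len(prefix):].isdigit()
    }
    desired = set(range(resource.replicas))
    actions: List[str] = []
    # Act on the difference: start missing pods, lowest ordinal first...
    for ordinal in sorted(desired - running):
        pod = f"{prefix}{ordinal}"
        cluster.pods.add(pod)
        actions.append(f"create {pod}")
    # ...and remove surplus pods, highest ordinal first.
    for ordinal in sorted(running - desired, reverse=True):
        pod = f"{prefix}{ordinal}"
        cluster.pods.discard(pod)
        actions.append(f"delete {pod}")
    return actions


cluster = FakeCluster()
db = DatabaseClusterResource(name="orders-db", replicas=3)
print(reconcile(db, cluster))  # ['create orders-db-0', 'create orders-db-1', 'create orders-db-2']
cluster.pods.discard("orders-db-1")  # a pod goes down
print(reconcile(db, cluster))  # ['create orders-db-1']
print(reconcile(db, cluster))  # [] (already in the desired state)
db.replicas = 1  # the user edits the custom resource
print(reconcile(db, cluster))  # ['delete orders-db-2', 'delete orders-db-1']
```

Because each pass only acts on the difference between the desired and observed state, running it repeatedly is harmless. In a real operator, the application-specific business logic (how to bring a Prometheus or MongoDB member back, for example) would live inside this function.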
As a result, an operator needs to be written by experts in the application it manages. Writing a Prometheus operator that can be reused across any Prometheus cluster, and trusted to manage it fully, requires someone who thoroughly understands how Prometheus works: how its resources should be installed, updated, run, synchronized, and so on. For widely used applications, this means the heavy lifting has usually already been done for you. Typically, you would only have to create an operator yourself if you are running a brand new application that the community has not built one for. OperatorHub.io is a centralized hub that hosts a wide range of operators. Additionally, many operators are hosted in GitHub repositories, and a quick search for the application of your choice should turn up some results. You can also create your own operators using the Operator SDK, although under most circumstances that should not be necessary.