Extending the Kubernetes API using Operators
This article is aimed at developers who are already familiar with Kubernetes and interested in extending the capabilities of a Kubernetes cluster.
Overview
Kubernetes defines a set of primitives that are the building blocks for running containerised applications. These include Pods (for running containers), Persistent Volumes (for defining storage), and Services (for exposing applications on a network). Each of these primitives is a different Kind (analogous to a type in a programming language).
The full set of primitives enable you to build, run, and operate sophisticated applications, but they are very generic abstractions. Running real world applications and services (like message queues and databases) gets complicated quickly.
Each object on your Kubernetes cluster is defined with a YAML configuration file, and they all start like these:
apiVersion: v1
kind: Pod
---
apiVersion: v1
kind: PersistentVolume
---
apiVersion: v1
kind: Service
The API for the Kubernetes primitives is the only one without an explicit group name (explained in the next section), which is why configuration files for those Kinds start with apiVersion: v1. You can extend the Kubernetes API on a cluster by defining a new Kind, along with the operator code that provides the behaviour needed to run objects of the new Kind on the cluster.
New Kinds
You add a new Kind to a cluster by creating and installing a Custom Resource Definition (CRD). The new Kind belongs to a versioned API group that you also define. Often, several related Kinds are defined as part of the same API group. For example, cert-manager defines an API for managing X.509 certificates in a cluster. It defines several Kinds, including Certificate and Issuer. These both belong to the cert-manager.io API group, currently at version v1. The configuration file for a Kubernetes object always starts with the specification of its Group Version Kind (GVK). For example, a cert-manager Issuer configuration always starts like this:
apiVersion: cert-manager.io/v1
kind: Issuer
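The apiVersion field is just the group and version joined by a slash, with core-API Kinds omitting the group. A minimal standard-library sketch of how a GVK's apiVersion splits apart (the function name splitAPIVersion is illustrative, not a Kubernetes API):

```go
package main

import (
	"fmt"
	"strings"
)

// splitAPIVersion separates an apiVersion value into its group and
// version parts. Core-API Kinds such as Pod have no group, so the
// whole string is the version.
func splitAPIVersion(apiVersion string) (group, version string) {
	if g, v, ok := strings.Cut(apiVersion, "/"); ok {
		return g, v
	}
	return "", apiVersion
}

func main() {
	g, v := splitAPIVersion("cert-manager.io/v1")
	fmt.Printf("group=%s version=%s\n", g, v) // group=cert-manager.io version=v1

	g, v = splitAPIVersion("v1")
	fmt.Printf("group=%q version=%s\n", g, v) // group="" version=v1
}
```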
Controllers
The CRD describes the configuration needed to define the desired state of an object. That state is usually expressed as Pods and other Kubernetes primitives that will run on the cluster (although in theory a Kind could actually be a cluster representation of an external resource). As well as a CRD you need a Controller which can read the definition of the object, and then make the changes needed to bring about the desired state.
The Reconcile function is the heart of the controller. The Kubernetes control plane calls this function for each object of the Kind registered for that controller. The Reconcile function compares the state of the cluster with the desired object state, and makes any changes needed to bring that state about. The function's return value is the interval to wait before calling it for that object again.
The control plane continues to call the Reconcile function for the lifetime of the object. The reconciliation loop is at the heart of the Kubernetes state-driven model: the user specifies a desired state, and the control plane compares this to the current state and makes adjustments accordingly. In the case of custom Kinds, the control plane delegates this work to the controller for that Kind.
This reconciliation loop enables the controller to request a change and then wait to see if the desired state is now reached. It also gives the controller the opportunity to repair the state if it changes due to other events.
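The compare-and-converge idea can be sketched in plain Go with no Kubernetes machinery at all; the names desired, observed, and reconcile below are illustrative, not part of any Kubernetes API:

```go
package main

import "fmt"

// reconcile compares desired state with observed state and returns the
// actions needed to converge. Real controllers perform the same
// comparison against live cluster objects on every pass of the loop.
func reconcile(desired, observed map[string]string) []string {
	var actions []string
	for name, want := range desired {
		got, exists := observed[name]
		switch {
		case !exists:
			actions = append(actions, "create "+name)
		case got != want:
			actions = append(actions, "update "+name)
		}
	}
	for name := range observed {
		if _, wanted := desired[name]; !wanted {
			actions = append(actions, "delete "+name)
		}
	}
	return actions
}

func main() {
	desired := map[string]string{"db-pod": "running"}
	observed := map[string]string{} // nothing created yet
	fmt.Println(reconcile(desired, observed)) // [create db-pod]
}
```

Because the loop runs repeatedly, any drift (a deleted Pod, a changed field) shows up as a fresh difference on the next pass and is corrected.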
Why Extend the Kubernetes API?
Writing your own Operators and CRDs to extend the Kubernetes API is not a trivial exercise, so what new capability does it unlock? There are two answers to that question, and the first answer is “nothing” — you are still running software on your cluster using the standard Kubernetes objects — Pods, Services, Deployments, etc.
The second answer is “quite a lot”; you can build new abstractions on top of the standard Kubernetes primitives that hide complexity from developers and administrators. This simplifies running complex workloads (such as databases or message queues) and reduces the scope for errors.
For example, running a stateful application like a database in production is more complex than just instantiating a Pod running a database container. You need persistent storage, defined networking, secrets, and failover. Defining an API for this enables you to embody a set of rules into your CRD and Operator for how a database should be set up and managed. Anyone can instantiate the new database Kind you have defined, setting the parameters allowed by the CRD to produce consistent setups.
You could argue that a package manager like Helm does something similar, in that it simplifies the setup of a complex package and provides the end-user a set of values they can alter to meet their needs. For example, Helm packages often include parameters for things like CPU and memory requirements. A Helm package simplifies setup, but it does not define new Kinds or provide a reconciliation loop to operate them.
This article describes a simple CRD that defines the Postgresql Kind for running a Postgres database. For simplicity it runs the database as a single Pod with no persistent storage; it is not intended for serious use. This example is meant to be simple enough to understand easily, while still showing how to create, delete, and read the state of other Kubernetes objects.
Even a simple example like this abstracts away some of the details you would normally have to set up to run a database in a Pod. It has two parameters, the default user and that user's password, which you would otherwise supply through separate ConfigMaps or Secrets.
The CRD is part of the API group database.db.example.com/v1, and defines the Kind Postgresql. You can imagine extending the database.db.example.com/v1 API group to simplify operating a production database. For example, adding a Backup Kind would let you create objects for full or incremental backups, each with its own schedule for running and a destination URL for the data.
Now your database backup schedules can be defined declaratively as Kubernetes objects and the control plane/operator reconciliation loop runs them for you. Extending the Kubernetes API enables you to add functionality in a way that is very simple for cluster users or administrators to consume. This is the kind of scenario that Operators and CRDs enable that would be hard to achieve using Kubernetes primitives or a package manager.
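A manifest for such a hypothetical Backup Kind might look like the following; the Kind and every spec field here are invented for illustration and do not exist in the example repository:

```yaml
apiVersion: database.db.example.com/v1
kind: Backup
metadata:
  name: nightly-full
spec:
  database: postgresql-sample    # the Postgresql object to back up
  type: full                     # or "incremental"
  schedule: "0 2 * * *"          # cron-style run schedule
  destination: s3://backups/pg/  # where the backup data is written
```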
The Example Code
The Postgresql example is at https://github.com/pkpivot/pg-simple-operator on GitHub. You can clone or fork the repository and run it on your own Kubernetes cluster to see it working. The repository contains the code to extend a cluster with a database v1 API group, consisting of a single new Kind, Postgresql. The example is intended as a simple introduction to the world of Kubernetes Operators, showing you how to create the Kubernetes objects that represent a custom Kind, as well as how to clean up when those custom objects are deleted.
This example does not show you how to run a production-ready Postgresql server on a Kubernetes cluster; that would be too complex for a short article like this. The remainder of this article explains how to run the example, and how the code works.
Prerequisites
Before you start you will need access to a Kubernetes cluster. Everything in this article was developed using Minikube. You also need access to an image repository. If your cluster can access the internet you can use a free account on Docker Hub.
You also need a Go development environment. Kubernetes is written in Go, and although you can write controller code in other languages, Go is the best supported. All the Kubernetes APIs are available through Go modules. You can install Go on macOS with Homebrew:
brew install go
Finally, you will need to install kubebuilder. At the moment there is no support for kubebuilder on Windows, but it will run on the Windows Subsystem for Linux (WSL). You need to install Go in the WSL before you can install kubebuilder.
Running the API Extension
In this article you can follow a worked example of extending the Kubernetes API with a database v1 group, consisting of a single new Kind, Postgresql. This example shows you how to get started with extending the Kubernetes API with your own CRDs.
A Postgresql manifest looks like this:
apiVersion: database.db.example.com/v1
kind: Postgresql
metadata:
name: postgresql-sample
spec:
password: "Password123!"
defaultUser: "postgres"
If you apply this file to a cluster running the Controller defined here, you will get a Pod running a database server, configured with a password and default user.
As stated earlier, this won’t create a database server you could actually rely on. For example, you would want to be able to specify the compute and persistent storage for the database; otherwise a Pod crash could wipe out all your data. You would probably also use a StatefulSet rather than a Pod so that you could define and configure a server cluster with failover capability.
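To give a flavour of where a production-grade CRD could go, a richer spec might expose those choices as fields. Everything below beyond the user and password fields is hypothetical, sketched here for illustration only:

```yaml
apiVersion: database.db.example.com/v1
kind: Postgresql
metadata:
  name: postgresql-sample
spec:
  defaultUser: "postgres"
  password: "Password123!"
  # Hypothetical production-oriented fields:
  replicas: 3         # size of a failover cluster (StatefulSet-backed)
  storageSize: 50Gi   # persistent volume claim size
  resources:
    cpu: "2"
    memory: 4Gi
```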
To build and run the example you will follow these steps:
- Generate the project for a new API group version.
- Create a template Custom Resource Definition.
- Define the CRD specification and status as Go types.
- Use kubebuilder to generate the Kubernetes manifest specification for the CRD from the Go type definitions.
- Define the controller Reconcile() function.
- Generate a CRD manifest from the Go definitions and install it into your cluster.
- Run the controller outside the cluster.
- Install the controller into the cluster.
Generate Project and CRD template
In this first step you name and version your API group. Kubernetes requires that every API group belongs to a domain, which ensures that all APIs have a unique fully qualified name. For example, the cert-manager APIs (mentioned in the New Kinds section earlier) are part of the cert-manager.io domain. When you generate the project with kubebuilder, one of the parameters is the domain. When providing an API you intend to make publicly available, you should use a domain name that you control, to avoid potential name clashes with other third-party APIs. However, kubebuilder only checks that a domain name is well-formed, not its ownership, so for this article example.com is fine.
The other important parameter is the repo name. This is used to name the Go module for the project, to avoid name clashes with other Go projects. In the command below I've used the repo containing the original code for this example; you can create your own GitHub repo and use that name instead.
To create the project, create a new directory and run:
kubebuilder init --domain example.com --repo github.com/pkpivot/pg-simple-operator
To generate the CRD and controller templates, run the command below and respond y to the Create resource and Create Controller prompts:
kubebuilder create api --group database --version v1 --kind Postgresql
At this point you have a Go project with a module initialized with the dependencies needed to work with the Kubernetes API. The main points of interest:
- main.go contains the code that will connect your controller (PostgresqlReconciler) to the Kubernetes control plane.
- Makefile includes targets for building and running the controller, testing it, and generating the Kubernetes manifest for your CRD.
- Under api/v1, the postgresql_types.go file contains the template where you will define your CRD using Go types.
- Under controllers, the postgresql_controller.go file defines the PostgresqlReconciler type together with a template Reconcile method.
- Under config/crd are kustomize templates to generate your CRD manifest (run make manifests).
- Under config/samples is a template manifest for running an object of your new Kind on the cluster.
In the next section, you add the code defining your new CRD.
Create the CRD
CRDs are specified to a Kubernetes cluster by a YAML manifest. However, kubebuilder enables us to use Go type definitions as a specification from which it generates the manifest. This is easier than creating the manifest by hand, and also ensures that the Go and Kubernetes definitions of the CRD stay aligned.
To generate the CRDs and Go templates:
- Run the following command and respond y to the Create resource and Create Controller prompts:
kubebuilder create api --group database --version v1 --kind Postgresql
This generates several new directories and files in your project, including the api/v1 and controllers directories. The Go type definition templates for the Postgresql Kind are in the api/v1 directory, and the controller template is in the controllers directory. Now you need to fill in the details.
- Open api/v1/postgresql_types.go. You are going to define two fields for your Postgresql object, DefaultUser and Password.
- Find the type definition for PostgresqlSpec and change it to:
type PostgresqlSpec struct {
DefaultUser string `json:"defaultUser"`
Password string `json:"password"`
}
- We also want to define a status for our object, so add the following new type definition and constants:
type PgPhase string
const (
PgUp PgPhase = "up"
PgPending PgPhase = "pending"
PgFailed PgPhase = "Failed"
)
- Then change the PostgresqlStatus type definition to:
type PostgresqlStatus struct {
Phase PgPhase `json:"pgPhase,omitempty"`
Active corev1.ObjectReference `json:"active,omitempty"`
}
Once you've finished, your postgresql_types.go file should look like the one in the original example repository at https://github.com/pkpivot/pg-simple-operator.
You have now defined the Go type that represents the custom Kind Postgresql. To create the manifests for the CRD and install them into your cluster, run:
make manifests
make install
If you later change the definition of the Go type, rerun those commands to keep the CRD in step with the Go code. At this point you have defined the custom Kind Postgresql and made it available on your cluster. You can even create a Postgresql object on the cluster, but no other Kubernetes objects will be created to do the actual work of the object. For that you need the controller code described in the next section.
Add code to the Reconcile Function
Kubebuilder has created the outline of a controller in controllers/postgresql_controller.go. This file defines a controllers package and the PostgresqlReconciler type:
type PostgresqlReconciler struct {
client.Client
Scheme *runtime.Scheme
}
This struct includes an anonymous Client object (from the Kubernetes package sigs.k8s.io/controller-runtime/pkg/client). All public Client functions are available as part of PostgresqlReconciler; Go uses aggregation rather than inheritance to reuse functionality from other types. The Reconcile function looks like this at the moment:
func (r *PostgresqlReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
_ = log.FromContext(ctx)
// TODO(user): your logic here
return ctrl.Result{}, nil
}
To create a fully functioning controller, copy controllers/postgresql_controller.go from the sample repository https://github.com/pkpivot/pg-simple-operator over the template. A description of the main parts of the code follows. The Reconcile() function contains the basic logic flow, although some of the code has been broken out into helper functions to make it easier to follow.
This is what happens in the Reconcile() function. Remember, this function will be called for each Postgresql object on the cluster in turn. The very last statement in the function returns a value to the Kubernetes control plane specifying a 5 second wait before invoking Reconcile() for the same object again:
return ctrl.Result{RequeueAfter: time.Second * 5}, nil
The rest of the logic is as follows:
- After fetching a logger from the context, try to retrieve a Pod with the same name as the Postgresql object and in the same namespace. If one doesn't exist, create it and return to the control plane, requeuing this object for the next reconciliation pass in 5 seconds.
- The second time through Reconcile(), the expected Pod should be found. Now we check the Pod's lifecycle phase, so that we can update the status of the Postgresql object. Once the Pod is running, the Postgresql object status is set to up. You can see the status of a Postgresql object name by running kubectl describe postgresql name.
- The next thing Reconcile() does is register a finalizer for the Postgresql object. This is done by a helper method that only registers a finalizer if one doesn't already exist. Finalizers are explained below.
- Next it checks whether the Postgresql object is actually in a deleting state, i.e. someone has run the kubectl delete command against it. If it is being deleted, we use another helper method to delete all the resources associated with it, in this case just the Pod.
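The steps above can be compressed into a standard-library-only sketch. The pg type, the in-memory pods map, and the phase strings are simplifications of what the real controller does through the Kubernetes client; none of these names come from the repository:

```go
package main

import "fmt"

// pg models just enough of a Postgresql object for the sketch.
type pg struct {
	Name      string
	Deleting  bool   // set once kubectl delete has been issued
	Finalizer bool   // blocks removal until cleanup is done
	Phase     string // "pending" or "up"
}

// reconcileOnce mirrors one pass of Reconcile(): handle deletion,
// ensure the Pod exists, then update the object's status.
func reconcileOnce(obj *pg, pods map[string]bool) string {
	if obj.Deleting {
		delete(pods, obj.Name) // clean up the Pod we own
		obj.Finalizer = false  // release the finalizer
		return "cleaned up"
	}
	if !pods[obj.Name] {
		pods[obj.Name] = true // create the missing Pod
		obj.Phase = "pending"
		return "created pod"
	}
	obj.Finalizer = true // make sure the finalizer is registered
	obj.Phase = "up"
	return "status up"
}

func main() {
	pods := map[string]bool{}
	obj := &pg{Name: "postgresql-sample"}
	fmt.Println(reconcileOnce(obj, pods)) // first pass: created pod
	fmt.Println(reconcileOnce(obj, pods)) // second pass: status up
	obj.Deleting = true
	fmt.Println(reconcileOnce(obj, pods)) // deletion pass: cleaned up
}
```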
You can examine the code in Reconcile() and the other helper methods in detail to see exactly how they work. Most of the code calls functions on the Kubernetes API (provided by the aggregated client object, as explained above).
RBAC Markers
In postgresql_controller.go, between the type definition of PostgresqlReconciler and the Reconcile() function, are four lines of comments in this form:
//+kubebuilder:rbac...
When you deploy the controller to a cluster, these markers generate the Role-Based Access Control (RBAC) permissions applied to your controller. The first three lines are those generated by kubebuilder. One extra line has been added to enable the controller to list, create, and delete Pods. When you run the controller outside the cluster (in the section Running the Operator) the controller has your user permissions, but when you deploy it onto the cluster it only has the permissions defined by the ClusterRole pg-simple-operator-manager-role.
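The extra marker granting Pod access could look like the line below. This is a sketch of the kubebuilder marker syntax rather than a copy of the repository's exact line, and the verb list should match what your controller actually does:

```go
//+kubebuilder:rbac:groups="",resources=pods,verbs=get;list;watch;create;delete
```

The empty groups="" denotes the core API group, which is where Pods live; running make manifests regenerates the ClusterRole from these markers.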
Finalizers
What is the finalizer for? When you register a finalizer against an object, the Kubernetes control plane doesn't remove the object until the finalizer has been removed. In the case of this controller, when you give the command to delete a Postgresql object, the object is put into a deleting state. When the deleting state is detected in the Reconcile() function, the deleteExternalResources() function checks that the finalizer is present, requests Pod deletion, and then removes the finalizer.
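While the finalizer is registered, the object's metadata shows something like the fragment below. The finalizer string itself is chosen by the controller author, so the one shown here is illustrative:

```yaml
metadata:
  name: postgresql-sample
  finalizers:
    - database.db.example.com/finalizer
  # After kubectl delete, deletionTimestamp is set but the object
  # remains visible until the controller removes the finalizer:
  deletionTimestamp: "2024-01-01T00:00:00Z"
```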
Running the Operator
The Go code for the operator is a runnable application with a main() function (in main.go). To run the application:
- Open a terminal.
- Change directory to the root of your project folder.
- To start the controller:
make run
- Alternatively:
go run main.go
The application will run using the credentials and connection details in your current kubeconfig context. The code that retrieves the context and connects your controller to the Kubernetes control plane is in the application's main() function. Leave the application running in the terminal so that we can try out the functionality:
- Open another terminal.
- Install the CRDs into your cluster.
make manifests
make install
- Now you can query to see if there are any Postgresql instances running in the current namespace:
kubectl get postgresql
- You should see the message No resources found in default namespace. This shows you that the CRD is installed, because the cluster now recognises postgresql as a type of resource; otherwise you'd see an error message instead.
- Now you can create a manifest for a Postgresql instance. When you added the CRD earlier, kubebuilder created a template. Open config/samples/database_v1_postgresql.yaml and edit it to look like this:
apiVersion: database.db.example.com/v1
kind: Postgresql
metadata:
  name: postgresql-sample
spec:
  defaultUser: "pgowner"
  password: "password123"
- Then run:
kubectl apply -f config/samples/database_v1_postgresql.yaml
- Log messages from the Reconcile() function in the controller start scrolling up the terminal running the controller application. The first message or two will show that the Pod created by the controller is in the Pending phase.
- To see the Pod that has been created to run the database server we asked for:
kubectl get pod
- You can also see information about the Postgresql resource that was created:
kubectl describe postgresql postgresql-sample
- At the bottom of the information from this command you should see:
Pg Phase: up
(It will only be up once the Pod is Running.)
- You can also demonstrate the declarative nature of Kubernetes. By applying the manifest in step 5, we told Kubernetes that the state we want maintained is a running Pod that hosts the database we have declared. Give the commands:
kubectl delete pod postgresql-sample
kubectl get pod
- The get pod command will show that a new Pod has been created. The reconciliation loop for the Postgresql object postgresql-sample runs every 5 seconds, and the first time it runs after the Pod has been deleted it creates a new Pod, because it can't find the one it expects.
- Delete the Postgresql resource:
kubectl delete postgresql postgresql-sample
- The deletion code that is part of the reconciliation loop for Postgresql resources will remove the associated Pod and then the Kubernetes control plane removes the Postgresql resource once the finalizer is released.
You’ve now seen the controller run, and seen how it creates and deletes resources associated with the Postgresql kind we’ve added to the cluster. Running the controller locally (that is, not on the cluster) like this makes it much easier to develop and debug your code. To put your operator into production though, you need to deploy it on the cluster, which you’ll do in the next section.
Deploying the Controller
You deploy the controller in the same way that you deploy anything to Kubernetes: build an image and then run it in a Pod on the cluster. The Makefile created by kubebuilder has targets to simplify this for you. However, the docker-build target assumes you have docker installed; if you are using an alternative tool for building and managing container images you can either change the docker-build and docker-push targets in the Makefile, or run the commands yourself.
To use the make targets to build the controller image and push it to your image repository:
export IMG=your-image-repo/pg-simple-operator:latest
make docker-build docker-push
To deploy the controller to your cluster:
make deploy
When you deploy the controller, the resources are created in a new namespace, pg-simple-operator-system. To see the resources created:
kubectl get all -n pg-simple-operator-system
Conclusion
The extensible Kubernetes API enables administrators to add new facilities to a cluster, simplifying the deployment and management of complex applications. I hope this article has given you a starting point to explore building your own operators.