Pivert's Blog

Migrate MariaDB Databases to a Galera Cluster on a Kubernetes 3-node Server with OpenEBS ZFSPV StorageClass


Reading Time: 14 minutes

Introduction

High availability is paramount for stateful applications like MariaDB in Kubernetes (K8S). While our primary focus is MariaDB, these principles apply to various high-availability clusters such as MySQL, Kafka, Elasticsearch, Redis, or PostgreSQL with an Operator, to name a few. They also apply to workloads that do not require strong persistence, such as cache proxies.
In Kubernetes, local volumes offer the best cost-performance balance, but until recently, dynamic provisioning was a challenge. This is where the OpenEBS ZFS CSI Driver comes to the rescue, bridging that gap while leveraging ZFS’s advanced capabilities.

High Availability with Galera for MariaDB

MariaDB, a popular open-source database system, relies on high-availability clusters such as Galera to ensure data resilience and continuous service. Kubernetes is an ideal platform for achieving this level of availability.

Challenges of Local Volumes in Kubernetes

Local volumes offer low latency and high performance, ideal for stateful applications. However, dynamic provisioning was historically complex, hindering the agility required for maintaining high availability.

Leveraging ZFS for High Availability

The OpenEBS ZFS CSI Driver is instrumental in enabling high availability for MariaDB in Kubernetes. By utilizing ZFS as the underlying storage, you gain numerous advantages:

  1. Dynamic Provisioning: ZFS-based storage, coupled with the OpenEBS ZFS CSI Driver, enables dynamic provisioning of local volumes, simplifying scaling and reducing manual intervention.
  2. Low Latency and Performance: ZFS on local drives guarantees high performance and low-latency storage, vital for demanding applications like MariaDB.
  3. Simplified Management: The driver abstracts the complexities of local volumes, streamlining storage management for MariaDB in Kubernetes.
  4. Cost Efficiency: Utilizing local drives with ZFS and OpenEBS provides high availability without the cost overhead of remote storage solutions.
  5. Advanced Features: ZFS brings additional advantages, including data compression, encryption, dynamic volume resizing, and the ability to create snapshots of volumes, which can be used for creating clones and backups. This simplifies data management and protection.

Summary

Highly available stateful applications like MariaDB rely on resilient and high-performance storage. Kubernetes, in combination with the Galera cluster and OpenEBS ZFS CSI Driver, offers a robust solution. Leveraging ZFS for underlying storage ensures continuous MariaDB availability, low latency, high performance, simplified storage management, and advanced features like compression, encryption, dynamic resizing, and snapshots for cloning and backup creation. This powerful combination sets the stage for a reliable and efficient data platform in Kubernetes, ensuring your MariaDB clusters are always accessible and protected.

Implementation

Prerequisites

  • A working K8S cluster with at least 3 worker nodes. In this example we will use a cluster of 6 RKE2 nodes running K8S v1.28.2+rke2r1:
      • 3 masters
      • 3 workers
  • Local block devices

Infrastructure

The infrastructure we will use is based on 3 Proxmox VE (pve) physical servers. Each of these servers runs:

  • 1 Master node in a VM
  • 1 Worker node in a VM
  • Local LVM and an SSD-backed VolumeGroup with 512 GB of free storage we can pass through to each worker node

To ease the setup, we will use ansible ad-hoc commands with this inventory file:

[pve]
pve1
pve2
pve3

[masters]
master1 rke2_type=server
master2 rke2_type=server
master3 rke2_type=server

[workers]
worker1 rke2_type=agent
worker2 rke2_type=agent
worker3 rke2_type=agent

[k8s_cluster:children]
masters
workers

Optional: use Seashell toolbox

I’ll use the pivert/seashell docker container remotely, since it contains all the required tools, including helm, ansible, ssh, kubectl, jq, yq, rg, k9s and many more. Before running the pivert/seashell startup script, make sure that:

  • You have docker installed (if not, the seashell script will offer to install it)
  • You have ssh-agent running locally and have run ssh-add to unlock the key you use to SSH into your VMs and PVE hosts.
  • You have a valid ~/.kube/config to access your K8S cluster (you do not need a local kubectl)
  • You have a valid /etc/ansible/hosts file such as the one shown above (you do not need a local ansible)
  • You use a dark console theme such as Solarized Dark, and preferably patched fonts such as Hack Nerd Font
  • You are preferably in a newly created folder, which will be mounted under /workdir in the container.
    You should have exactly what is shown below.

This boils down to:

curl -O https://gitlab.com/pivert/seashell/-/raw/main/seashell && chmod a+x seashell
./seashell

Example

Then check that you have admin rights.
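
For example, a minimal sanity check with kubectl (any equivalent check is fine):

kubectl auth can-i '*' '*' --all-namespaces   # should answer "yes" with cluster-admin rights
kubectl get nodes                             # all nodes should be listed and Ready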

Provide local fast storage to the Workers

Create the new volumes on the PVE hosts

On each host on which a worker is running, we must provision or provide a big and fast block device. Since I cannot dedicate a full disk, I’ll provide a new LV from an existing patriot-vg VolumeGroup.

Check that we have the space

ansible pve -a vgs

OK, we have 819.25 GB free in the patriot-vg VolumeGroup on each PVE host.

Create the new LogicalVolumes

lvcreate -L 512G -n openebs-local-zfs /dev/patriot-vg
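
To run it on all three PVE hosts at once, you can wrap it in an Ansible ad-hoc command (a sketch, assuming the VolumeGroup is named patriot-vg on every host):

ansible pve -m shell -a 'lvcreate -L 512G -n openebs-local-zfs patriot-vg'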

Attach each LogicalVolume to the local Worker VM

In Proxmox, there are many ways to achieve this, and if you already use LVM with thin provisioning you can already do that from the Web interface.

But in my case, I want to use another Disk/LV that is not included in ProxmoxVE config, so I have to do it «manually».

To do this on Proxmox, we must first identify the numerical vmid, then pass the volume to the VM with the qm command.

You should be able to match both the disks (LVs) and the Proxmox vmids with the commands below. Make sure there’s exactly one match per node.
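
For instance (a sketch; adjust the grep pattern to your VM names and VolumeGroup):

ansible pve -m shell -a 'qm list | grep worker'
ansible pve -m shell -a 'find /dev/disk/by-id/ | grep zfs'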

We can now attach the disks. First run the command with the leading echo to make sure it does what you want (take care of the escapes, so they are not interpreted by the local shell).

ansible pve -mshell -a 'echo qm set $(qm list | grep "worker" | awk "{print \$1}") -scsi2 $(find /dev/disk/by-id/ | grep zfs)'

If satisfied, remove the leading echo. Of course, make sure scsi2 does not already exist in your worker’s Proxmox configuration.
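
To verify, you can list the SCSI devices already attached to each worker VM, following the same pattern as above (a sketch; qm config prints the VM configuration):

ansible pve -m shell -a 'qm config $(qm list | grep "worker" | awk "{print \$1}") | grep scsi'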

In the VMs you should end up with this

From that screen, also ensure that:

  • All other storages use local disks (localblock is a thinpool)
  • You use the VirtIO SCSI driver for performance

Setup ZFS in the worker nodes

Step 1: Prepare the ZFS

Before you set up the operator you need to create a local ZFS zpool in each worker VM.

Execute the following commands using Ansible on all worker nodes:

  1. Install ZFS utilities:
   ansible workers -m apt -a "name=zfsutils-linux state=present"
  2. Check the available block devices:
   ansible workers -m shell -a 'lsblk'

OK. The new 512 GB disks are at /dev/sdb on each worker.

  3. Create a ZFS pool named zfspv-pool using a suitable block device (e.g., /dev/sdb):
   ansible workers -m shell -a 'zpool create zfspv-pool /dev/sdb'
  4. Verify the ZFS pool status:
   ansible workers -m shell -a 'zpool status'

Deploy the OpenEBS ZFS Operator

The ZFS Operator manages ZFS local storage on worker nodes in your Kubernetes cluster.

Check the existing kubelet directory:

   ansible workers -m shell -a 'ls -l /var/lib/kubelet'

The ZFS Operator will look for that folder, so if it is elsewhere on your K8S cluster, you’ll have to modify the zfs-operator.yaml file.
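
For example, if your kubelet directory were /var/lib/k0s/kubelet (a hypothetical path, adjust to your distribution), you could patch the manifest after downloading it in the next step:

sed -i 's|/var/lib/kubelet|/var/lib/k0s/kubelet|g' zfs-operator.yaml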

Apply the ZFS Operator manifest:

curl -LO https://openebs.github.io/charts/zfs-operator.yaml
kubectl apply -f ./zfs-operator.yaml

Now that the OpenEBS ZFS Operator is installed, we must create a StorageClass that matches the zfspv-pool ZFS pool created above.

Copy and paste the StorageClass definition below into openebs-zfspv-sc.yaml:

---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-zfspv
allowVolumeExpansion: true
parameters:
  recordsize: "128k"
  compression: "off"
  dedup: "off"
  fstype: "zfs"
  poolname: "zfspv-pool"
provisioner: zfs.csi.openebs.io
allowedTopologies:
  - matchLabelExpressions:
      - key: kubernetes.io/hostname
        values:
          - worker1
          - worker2
          - worker3

Make sure the poolname is correct. Note that I explicitly locked it to the worker nodes on which the ZFS pool is available.

Then apply and check.
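
For example, using the seashell shortcuts already used in this post (ka applies a manifest, kg = kubectl get), or the plain kubectl equivalents:

ka openebs-zfspv-sc.yaml     # kubectl apply -f openebs-zfspv-sc.yaml
kg sc openebs-zfspv          # kubectl get storageclass openebs-zfspv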

You can also check the operator pods. pivert/seashell provides shortcuts for this (press <ctrl>+j to expand an alias into the full command).
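
Without the shortcuts, the plain kubectl check boils down to something like:

kubectl get pods -n kube-system | grep openebs-zfs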

Add the VolumeSnapshotClass

Create a snapshotclass.yaml file with the following content:

kind: VolumeSnapshotClass
apiVersion: snapshot.storage.k8s.io/v1
metadata:
  name: zfspv-snapclass
  annotations:
    snapshot.storage.kubernetes.io/is-default-class: "true"
driver: zfs.csi.openebs.io
deletionPolicy: Delete

Then run ka snapshotclass.yaml to apply it. This VolumeSnapshotClass will allow us to easily take ZFS snapshots of the volumes, as described in the troubleshooting section.

Install Galera Cluster using Helm

Next, you need to deploy the Galera Cluster on your Kubernetes environment. Helm is a package manager for Kubernetes that simplifies the deployment process; we will use the Bitnami mariadb-galera chart.

Check the possible values

helm show values oci://registry-1.docker.io/bitnamicharts/mariadb-galera > mariadb-galera-values.yaml

Here is my mariadb-galera-custom-values.yaml file

---
global:
  storageClass: "openebs-zfspv"
service:
  type: LoadBalancer
  ports:
    mysql: 3306
  loadBalancerIP: "192.168.66.195"
persistence:
  size: 12Gi
metrics:
  enabled: true
  service:
    type: LoadBalancer
    port: 9104
    annotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "9104"
    loadBalancerIP: "192.168.66.195"


At a minimum, make sure the proper StorageClass is configured.

Also check the MariaDB Galera Helm chart documentation for more options.

Then decide how you want to expose the service.
In my case, I have KubeVIP in place with an IP range, so I prefer to use a LoadBalancer service and give it a fixed IP address.
Since my Prometheus is not (yet) in K8S, I also enabled metrics and exposed them via the KubeVIP LoadBalancer.

Install Galera Cluster using Helm and your values file by executing the following command:

k create ns galera
kubens galera
helm install galera oci://registry-1.docker.io/bitnamicharts/mariadb-galera --values ./mariadb-galera-custom-values.yaml

This command deploys Galera Cluster with MariaDB as its core database engine.

From pivert/seashell, you can use

k9s -n galera -c 'xray sts'

to monitor the deployment. Feel free to move to a starting pod, and press l (for logs) or d (for describe). If you type :helm you can see the chart and the help message.

Checks

Here are a bunch of commands and checks you can run to make sure everything is OK, or to debug if something went wrong.

OpenEBS checks

  • kubectl logs -f openebs-zfs-controller-0 -n kube-system -c openebs-zfs-plugin
  • kubectl logs -f openebs-zfs-node-[xxxx] -n kube-system -c openebs-zfs-plugin
  • kubectl get pods -n kube-system
  • kubectl get zv -A -o yaml

To get the DB password (from the galera namespace)

export PASSWORD=$(kubectl get secret galera-mariadb-galera -o jsonpath="{.data.mariadb-root-password}" | base64 --decode)

Other checks, using the kubectl aliases from Ahmet (included in and explained on the pivert/seashell page). Almost all are self-explanatory; if an alias ends with n, the first argument is the namespace:

helm show readme oci://registry-1.docker.io/bitnamicharts/mariadb-galera
helm show all oci://registry-1.docker.io/bitnamicharts/mariadb-galera
kgsts
kgpoowide
kgsvc
kg pv
kg pvc
export PASSWORD=$(kubectl get secret galera-mariadb-galera -o jsonpath="{.data.mariadb-root-password}" | base64 --decode)
k scale statefulset galera-mariadb-galera --replicas 2
mariadb -uroot --password=${PASSWORD} -h 192.168.66.195 <<< "SHOW GLOBAL STATUS WHERE variable_name REGEXP 'wsrep_(cluster|apply|provider|incoming|received|local_state_comment|desync)'"
curl 192.168.66.195:9104/metrics
kgsvcojson galera-mariadb-galera | jq -r '.spec.loadBalancerIP'
export DBHOST=$(kgsvcojson galera-mariadb-galera | jq -r '.spec.loadBalancerIP')

Example

Always make sure wsrep_local_state_comment is Synced, and check the cluster size and cluster weight.
Also observe the incoming addresses (wsrep_incoming_addresses).
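
For a focused check on just these variables, a variant of the command shown above:

mariadb -uroot --password=${PASSWORD} -h ${DBHOST} <<< "SHOW GLOBAL STATUS WHERE variable_name IN ('wsrep_cluster_size','wsrep_cluster_weight','wsrep_local_state_comment','wsrep_incoming_addresses')"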

Stress test

Either using k9s or manually, start killing pods, reduce replicas, delete PVs, … and check how the Operator deals with the situation.

Example (you might have to open the image in full screen)

During these operations, it’s always nice to check the DB logs (pod logs), and to verify that the ZFS PV is completely deleted (Delete reclaim policy) with zfs list on the concerned worker node.
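
For example, after deleting a PVC:

ansible workers -a 'zfs list'   # the corresponding zfspv-pool/pvc-... dataset should be gone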

Also, I highly encourage you to learn k9s; you can easily do all of the above from its UI. The few commands you will need are :ns :sts :pods :pvc :pv, and the other shortcuts are displayed at the top of the window. To start it directly with the xray explorer, run k9s -n galera -c 'xray sts' from pivert/seashell.

For instance, press Enter then x to get to the secret decoder

To delete the MariaDB Galera installation

If you need to reinstall or just delete the installation, you will need these commands:

kubens galera                                  # ensure you're in the galera namespace (or the one you choose)
k scale sts galera-mariadb-galera --replicas 0
kgpow                                          # ensure all pods are killed
helm uninstall galera
krm pvc data-galera-mariadb-galera-0           # delete all PVCs
krm pvc data-galera-mariadb-galera-1
krm pvc data-galera-mariadb-galera-2
# krm ns galera                                # Delete the namespace if you do not want to reinstall

As you can see, when uninstalling the Helm chart the PVCs are not deleted. This can be useful if you must debug, or re-install to force a bootstrap, for instance. Let’s talk about that in the «Troubleshoot & fix» section.

Migrate the databases

Be careful not to migrate the mysql database or the root user: that would break the Galera cluster.

Dump data from the source server

I lost the root password, so I do it from one of the source VMs, called galera1 😉, using OS/socket authentication (so no password).
Use xz to get excellent compression.

mysql -uroot <<< 'show databases'                                             # List databases you would like to export
mysqldump -u root --databases db1 db2 db3 | xz > databases.sql.xz             # Dump the DB you want to export

Also export the users and privileges. I used a script from this post that did a great job.

mysql -u<user> -p<password> -h<host> -e"select concat('show grants for ','\'',user,'\'@\'',host,'\'') from mysql.user" > user_list_with_header.txt
sed '1d' user_list_with_header.txt > ./user.txt
while read user; do  mysql -u<user> -p<password> -h<host> -e"$user" > user_grant.txt; sed '1d' user_grant.txt >> user_privileges.txt; echo "flush privileges" >> user_privileges.txt; done < user.txt
awk '{print $0";"}'  user_privileges.txt >user_privileges_final.sql
rm user.txt user_list_with_header.txt user_grant.txt user_privileges.txt

Adapt the above to your needs, and edit the user_privileges_final.sql file to keep only the users you want to keep.

Import the data into the new Galera cluster

Install the mariadb client since it’s not in seashell by default

sudo apt install mariadb-client-core -y
scp galera1:/root/databases.sql.xz ./
export DBHOST=$(kgsvcojson galera-mariadb-galera | jq -r '.spec.loadBalancerIP')
export PASSWORD=$(kubectl get secret galera-mariadb-galera -o jsonpath="{.data.mariadb-root-password}" | base64 --decode)
xz -d -c ./databases.sql.xz | mariadb -uroot --password=${PASSWORD} -h ${DBHOST}
mariadb -uroot --password=${PASSWORD} -h ${DBHOST} <<< 'show databases'

As another option, you can do the extract and load in a single command through ssh. This is useful if you need frequent refreshes.

ssh -l root galera1 'mysqldump -u root --databases db1 db2 db3' | mariadb -uroot --password=${PASSWORD} -h ${DBHOST}
mariadb -uroot --password=${PASSWORD} -h ${DBHOST} <<< 'show databases'

Import the users

Connect to the DB on the `mysql` database and copy/paste the user privileges from the modified user_privileges_final.sql file above.

mariadb -uroot --password=${PASSWORD} -h ${DBHOST} mysql
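
Or, to load the edited file non-interactively (assuming it sits in the current directory):

mariadb -uroot --password=${PASSWORD} -h ${DBHOST} mysql < user_privileges_final.sql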

By following these steps, you can migrate your databases from MariaDB to the Galera Cluster running on your Kubernetes environment with OpenEBS ZFSPV storage.

Backup

Here is an example of staging backups on CephFS (using the CephFS CSI driver). It goes like this:

  1. Create a galera-backup-pvc PVC on a highly available CephFS
  2. Create a galera-backup-cronjob

Example definitions

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: galera-backup-pvc
  namespace: galera
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: csi-cephfs-sc
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: galera-backup-cronjob
  namespace: galera
spec:
  schedule: "0 6 * * *" # Runs daily at 6 AM
  jobTemplate:
    spec:
      template:
        spec:
          initContainers:
            - name: setup-permissions
              image: busybox
              command:
                - sh
                - "-c"
                - "chown -R 1001:1001 /backup"
              volumeMounts:
                - name: backup-data
                  mountPath: /backup
          containers:
            - name: galera-backup
              image: docker.io/bitnami/mariadb-galera:11.1.2
              command:
                - /bin/sh
                - -c
              args:
                - |
                  /opt/bitnami/mariadb/bin/mariadb-dump -h galera-mariadb-galera -uroot --password=${MARIADB_ROOT_PASS} --all-databases | gzip > /backup/MariaDB-Galera-AllDB_$(date +%Y%m%d-%Hh%M).sql.gz
              env:
                - name: MARIADB_ROOT_PASS
                  valueFrom:
                    secretKeyRef:
                      key: mariadb-root-password
                      name: galera-mariadb-galera
              volumeMounts:
                - name: backup-data
                  mountPath: /backup
          restartPolicy: Never
          volumes:
            - name: backup-data
              persistentVolumeClaim:
                claimName: galera-backup-pvc
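
To test the backup without waiting for the schedule, you can trigger a one-off Job from the CronJob (the job name galera-backup-manual is arbitrary):

kubectl create job --from=cronjob/galera-backup-cronjob galera-backup-manual -n galera
kubectl logs -n galera -l job-name=galera-backup-manual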

After a few days, you can check the backups directly on the CephFS.

Get the subvolumePath from the PV description with the CLI or k9s.
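
A sketch with kubectl and jsonpath, assuming the ceph-csi driver stores the path under volumeAttributes and the PVC name from the definition above:

PV=$(kubectl get pvc galera-backup-pvc -n galera -o jsonpath='{.spec.volumeName}')
kubectl get pv "$PV" -o jsonpath='{.spec.csi.volumeAttributes.subvolumePath}'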

Troubleshoot & fix

There are too many cases to cover here; check the Galera documentation and the MariaDB documentation.

Nodes in CrashLoopBackOff


The 3 nodes can end up in a CrashLoopBackOff state, for instance if you reboot all the underlying Kubernetes nodes too fast or at the same time.
We will cover this case here, as it introduces the general steps to troubleshoot the Galera Helm chart. Check bootstrapping a node other than 0.

1. Check

You can start from k9s -n galera -c 'xray sts' or any other method.

  • All Galera pods are in CrashLoopBackOff
  • Check the PV & PVCs are bound
  • Check the logs to find out which node has safe_to_bootstrap: 1
    You can use the method from bootstrapping a node other than 0, but most of the time you will easily find it by filtering on safe_to_bootstrap in the logs with k9s or klo (see the sketch after this list).
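
A minimal sketch with plain kubectl, checking both the current and the previous (crashed) container logs of each pod:

for i in 0 1 2; do
  echo "--- galera-mariadb-galera-$i"
  kubectl logs galera-mariadb-galera-$i 2>/dev/null | grep -i safe_to_bootstrap
  kubectl logs galera-mariadb-galera-$i --previous 2>/dev/null | grep -i safe_to_bootstrap
done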

You have now identified the node that is safe to bootstrap from.

2. Snapshot the PVs

Make sure you have a fresh backup of the DB. You can also take a snapshot of the local PVs before uninstalling the Galera Helm chart. Make sure you have added the VolumeSnapshotClass (kg volumesnapshotclass) as described above, or do it now.

For each worker/node, create a VolumeSnapshot definition such as the example data-galera-mariadb-galera-0-snapshot.yaml here:

---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: data-galera-mariadb-galera-0-snap
spec:
  volumeSnapshotClassName: zfspv-snapclass
  source:
    persistentVolumeClaimName: data-galera-mariadb-galera-0

Create a snapshot of each of the 3 nodes’ PVCs with the above template.

ka data-galera-mariadb-galera-0-snapshot.yaml
ka data-galera-mariadb-galera-1-snapshot.yaml
ka data-galera-mariadb-galera-2-snapshot.yaml

Or, if you prefer to do it all at once, create all 3 `VolumeSnapshot` resources in a single file and apply it with a bash loop like:

for x in {0..2}; do cat <<EOF >> data-galera-mariadb-galera-VolumeSnapshots.yaml
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: data-galera-mariadb-galera-$x-snap
spec:
  volumeSnapshotClassName: zfspv-snapclass
  source:
    persistentVolumeClaimName: data-galera-mariadb-galera-$x
EOF
done
ka data-galera-mariadb-galera-VolumeSnapshots.yaml

Then check, for instance, with:

kgn openebs zfssnap
ansible workers -a 'zfs list -t all'  # You should see PVCs with ...@snapshot-... and the current diff size close to 0
kubectl get volumesnapshot.snapshot
kgoyaml volumesnapshot.snapshot data-galera-mariadb-galera-0-snap
kgoyaml volumesnapshot.snapshot data-galera-mariadb-galera-0-snap | yq '.status.readyToUse'

Make sure the volumesnapshots are in READYTOUSE state.

3. Re-install the Galera Helm chart

Store the passwords (Important)

This step is really important: you won’t be able to get the passwords after you uninstall galera, and you must re-use the same passwords because they are also stored in the DB.

export MARIADB_ROOT_PASSWORD=$(kubectl get secret --namespace "galera" galera-mariadb-galera -o jsonpath="{.data.mariadb-root-password}" | base64 -d)
export MARIADB_GALERA_MARIABACKUP_PASSWORD=$(kubectl get secret --namespace "galera" galera-mariadb-galera -o jsonpath="{.data.mariadb-galera-mariabackup-password}" | base64 -d)

Re-install Galera helm chart and force Bootstrap from a healthy node

Make sure you saved the passwords!!! (previous step)
In this case, node 0 is the one I need to bootstrap from:

helm uninstall galera
helm install galera oci://registry-1.docker.io/bitnamicharts/mariadb-galera  \
  -f mariadb-galera-custom-values.yaml \
  --version 10.0.1 \
  --set rootUser.password=$MARIADB_ROOT_PASSWORD \
  --set galera.mariabackup.password=$MARIADB_GALERA_MARIABACKUP_PASSWORD \
  --set db.name=galera \
  --set galera.bootstrap.forceBootstrap=true \
  --set galera.bootstrap.bootstrapFromNode=0 \
  --set podManagementPolicy=Parallel

The installation will reuse the previous PVCs and data, but force the bootstrap from node 0.

Check the logs and database state. Some commands:

klo -l 'app.kubernetes.io/name=mariadb-galera'
mariadb -uroot --password=${MARIADB_ROOT_PASSWORD} -h ${DBHOST} <<< "SHOW GLOBAL STATUS WHERE variable_name REGEXP 'wsrep_(cluster|apply|provider|incoming|received|local_state_comment|desync)'"

Remove the bootstrap options (important)

When the cluster is back in sync, remove the forceBootstrap and bootstrapFromNode options:

helm upgrade galera oci://registry-1.docker.io/bitnamicharts/mariadb-galera \
  -f mariadb-galera-custom-values.yaml \
  --set rootUser.password=$MARIADB_ROOT_PASSWORD \
  --set galera.mariabackup.password=$MARIADB_GALERA_MARIABACKUP_PASSWORD \
  --set db.name=galera \
  --set podManagementPolicy=Parallel

4. Remove the snapshots

After some days, feel free to remove the snapshots.
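
For instance, still from the galera namespace (krm = kubectl delete, as used above):

krm volumesnapshot data-galera-mariadb-galera-0-snap
krm volumesnapshot data-galera-mariadb-galera-1-snap
krm volumesnapshot data-galera-mariadb-galera-2-snap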

One node in CrashLoopBackOff

You can restart a single Pod and/or scale the StatefulSet down to the healthy Pod. When scaling down, StatefulSets remove pods in reverse ordinal order (highest ordinal first), so make sure the healthy Pod has the lowest ordinal, or delete the others accordingly.

If scaling the cluster down and back up does not heal it, you might have to force the bootstrap.

First, make sure the cluster is not already in force-bootstrap mode: check the current Helm values to see whether the following parameters are set:

galera:
  bootstrap:
    bootstrapFromNode: 0
    forceBootstrap: true
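
You can dump the values currently applied to the release with:

helm get values galera -n galera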

I had a case where those settings were left over from a previous forceBootstrap: https://github.com/bitnami/containers/issues/54037
To fix it, remove them from the values and run helm upgrade as described above. The cluster should then heal by itself.


Conclusion

Migrating databases from one server to another can be a complex task, but with the right tools and careful planning it can be accomplished efficiently. In this article, we provided new local block storage to each worker, prepared the ZFS Operator, installed the Galera Cluster using Helm, and outlined the general steps to migrate the data from a MariaDB server to the new Galera Cluster on a 3-node Kubernetes cluster. This process provides a highly available and scalable database solution for your containerized applications.
