Tuesday, June 2, 2020

InnoDB Cluster on OCI using Kubernetes and StatefulSets

In this demo we are setting up InnoDB Cluster on Kubernetes; we will use a StatefulSet and NFS as storage. This demo was created on Oracle Cloud (OCI), but vanilla Kubernetes and NFS were used so it should work for any cloud or on-prem.
More information is available on my GitHub page here

Set up an NFS server to act as your persistent volume storage.

Set up an NFS server for your persistent volumes, howto here. If you are using a public cloud provider you can most likely use dynamic storage options for handling PVs.
In the examples below I have an NFS server on IP 10.0.0.50. This NFS server exposes the following folders (see the /etc/exports sketch after the list):
  • /var/nfs/pv0
  • /var/nfs/pv1
  • /var/nfs/pv2
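For reference, a minimal sketch of what /etc/exports on the NFS server could look like (the client subnet 10.0.0.0/24 is an assumption, adjust it to your network):

# /etc/exports on 10.0.0.50 (example only)
/var/nfs/pv0 10.0.0.0/24(rw,sync,no_root_squash)
/var/nfs/pv1 10.0.0.0/24(rw,sync,no_root_squash)
/var/nfs/pv2 10.0.0.0/24(rw,sync,no_root_squash)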

Kubernetes configuration

You can look at the Kubernetes configuration in the yamls folder.
First we create three persistent volumes (pv0-pv2) for our InnoDB Cluster nodes. We specify that each volume can only be accessed by one node (ReadWriteOnce) and that our NFS server is used for storage. More information on PVs here. A minimal sketch of one such PV is shown below.
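As a reference only, this is roughly what one of the persistent volumes could look like (names and the storage size are assumptions, the real definitions are in yamls/02-mysql-pv.yaml):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: mysql-pv0
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 10.0.0.50
    path: /var/nfs/pv0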
After we have created the persistent volumes we will create the StatefulSet. StatefulSets are the Kubernetes way to manage stateful applications. First we create services for our cluster nodes to expose them on the network. Next we configure the StatefulSet itself: we want three replicas (three InnoDB Cluster nodes) that are started in parallel. We also use the simplified way of defining a volume claim template (volumeClaimTemplates) that will claim the three previously created volumes. A rough sketch of the StatefulSet is shown below.
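A rough sketch of what the StatefulSet part of 02-mysql-innodb-cluster-manual.yaml could look like (the image name, storage size and how the root password is provided are assumptions, see the yamls folder for the real file):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql-innodb-cluster
spec:
  serviceName: mysql-innodb-cluster
  replicas: 3
  podManagementPolicy: Parallel          # start all three pods in parallel
  selector:
    matchLabels:
      app: mysql-innodb-cluster
  template:
    metadata:
      labels:
        app: mysql-innodb-cluster
    spec:
      containers:
      - name: mysql
        image: mysql/mysql-server:8.0    # assumption, use the image from the real yaml
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: "_MySQL2020_"           # assumption on how the root password is set
        ports:
        - containerPort: 3306
        volumeMounts:
        - name: mysql-persistent-storage
          mountPath: /var/lib/mysql
  volumeClaimTemplates:                  # one PVC per pod, bound to the PVs created above
  - metadata:
      name: mysql-persistent-storage
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi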

Create PV:

kubectl create -f yamls/02-mysql-pv.yaml
kubectl get pv (short for kubectl get persistentvolumes)
               (should be in STATUS Available)

Create unique namespace for the cluster:

Namespaces in k8s are like compartments in OCI, so let's create a unique namespace for our cluster.
kubectl create namespace mysql-cluster
kubectl get namespaces
Set default namespace for coming commands to mysql-cluster:
kubectl config set-context --current --namespace=mysql-cluster

Start the cluster nodes using StatefulSets

kubectl create  -f  yamls/02-mysql-innodb-cluster-manual.yaml

Look at:

kubectl get pv,pvc
watch kubectl get all -o wide
(or kubectl get all -o wide -n mysql-cluster)
(or kubectl get all -o wide --all-namespaces)
If there are problems, look at the logs (mysqld is started directly and the error log is sent to stderr in our docker image):
kubectl logs mysql-innodb-cluster-0
or look at previously failed pods by running:
kubectl logs -p mysql-innodb-cluster-1
Look at configuration for the pod:
kubectl describe pod mysql-innodb-cluster-1

Log in to MySQL:

kubectl exec -it  mysql-innodb-cluster-0 -- mysql -uroot -p_MySQL2020_
kubectl exec -it  mysql-innodb-cluster-0 -- mysqlsh -uroot -p_MySQL2020_ -S/var/run/mysqld/mysqlx.sock

Create InnoDB Cluster

Create admin user for InnoDB Cluster on all nodes:

kubectl exec -it  mysql-innodb-cluster-0 -- mysql -uroot -p_MySQL2020_ -e"SET SQL_LOG_BIN=0; CREATE USER 'idcAdmin'@'%' IDENTIFIED BY 'idcAdmin'; GRANT ALL ON *.* TO 'idcAdmin'@'%' WITH GRANT OPTION";
kubectl exec -it  mysql-innodb-cluster-1 -- mysql -uroot -p_MySQL2020_ -e"SET SQL_LOG_BIN=0; CREATE USER 'idcAdmin'@'%' IDENTIFIED BY 'idcAdmin'; GRANT ALL ON *.* TO 'idcAdmin'@'%' WITH GRANT OPTION";
kubectl exec -it  mysql-innodb-cluster-2 -- mysql -uroot -p_MySQL2020_ -e"SET SQL_LOG_BIN=0; CREATE USER 'idcAdmin'@'%' IDENTIFIED BY 'idcAdmin'; GRANT ALL ON *.* TO 'idcAdmin'@'%' WITH GRANT OPTION";

Create your cluster using MySQL Shell from one of the pods (mysql-innodb-cluster-0):

Log in to MySQL Shell:
kubectl exec -it  mysql-innodb-cluster-0 -- mysqlsh -uidcAdmin -pidcAdmin -S/var/run/mysqld/mysqlx.sock
and then configure the instances:
dba.configureInstance('idcAdmin@mysql-innodb-cluster-0:3306',{password:'idcAdmin',interactive:false,restart:true});
dba.configureInstance('idcAdmin@mysql-innodb-cluster-1:3306',{password:'idcAdmin',interactive:false,restart:true});
dba.configureInstance('idcAdmin@mysql-innodb-cluster-2:3306',{password:'idcAdmin',interactive:false,restart:true});
You will get an error when running "dba.configureInstance": ERROR: Remote restart of MySQL server failed: MySQL Error 3707 (HY000): Restart server failed (mysqld is not managed by supervisor process).
This is due to a limitation with running the RESTART command in MySQL inside our docker container, we are working on solving this. Please restart MySQL manually to enable the new settings; the easiest way is to scale down and then scale up again:
kubectl scale statefulset --replicas=0 mysql-innodb-cluster
Look at: watch kubectl get all -o wide during the scale up/down.
kubectl scale statefulset --replicas=3 mysql-innodb-cluster
Log in to MySQL Shell again:
kubectl exec -it  mysql-innodb-cluster-0 -- mysqlsh -uidcAdmin -pidcAdmin -S/var/run/mysqld/mysqlx.sock
and run:
cluster=dba.createCluster("mycluster",{exitStateAction:'OFFLINE_MODE',autoRejoinTries:'20',consistency:'BEFORE_ON_PRIMARY_FAILOVER'});
cluster.status()
cluster.addInstance('idcAdmin@mysql-innodb-cluster-1:3306',{password:'idcAdmin',recoveryMethod:'clone'});
cluster.addInstance('idcAdmin@mysql-innodb-cluster-2:3306',{password:'idcAdmin',recoveryMethod:'clone'});
cluster.status()
Done, you should now have a running InnoDB Cluster using StatefulSets on Kubernetes.

Simulate a failure

Look at the cluster status; log in to MySQL Shell:
kubectl exec -it  mysql-innodb-cluster-1 -- mysqlsh -uidcAdmin -pidcAdmin -S/var/run/mysqld/mysqlx.sock
And look at cluster status:
cluster=dba.getCluster()
cluster.status()

Also look at the pods: watch kubectl get all -o wide -n mysql-cluster
Kill the pod that is the primary (RW) (most likely mysql-innodb-cluster-0):
kubectl delete pod mysql-innodb-cluster-0
You should now see that the deleted pod is restarted and that the "old" primary (RW) rejoins after the restart as a secondary (RO).

If you want to remove everything

kubectl delete -f yamls/02-mysql-innodb-cluster-manual.yaml
kubectl delete pvc mysql-persistent-storage-mysql-innodb-cluster-0
kubectl delete pvc mysql-persistent-storage-mysql-innodb-cluster-1
kubectl delete pvc mysql-persistent-storage-mysql-innodb-cluster-2
kubectl delete -f yamls/02-mysql-pv.yaml
Make sure all is deleted:
kubectl get pv,pvc
kubectl get all -o wide
Remember to also empty out the datadir on NFS between tests:
sudo rm -fr /var/nfs/pv[0,1,2]/*
ls /var/nfs/pv[0,1,2]/


Monday, June 1, 2020

Running MySQL on Kubernetes - deployment using persistent volumes

In this demo we are setting up one MySQL Server using k8s; we will use a deployment and NFS as storage. This demo was created on Oracle Cloud (OCI), but standard Kubernetes and NFS were used so the setup should work for any cloud deployment or on-prem.
More information is available on my GitHub page here

Persistent volumes

Set up an NFS server for your persistent volumes, howto here. If you are using a public cloud provider you can most likely use dynamic storage options for PVs.
In the examples below I have an NFS server on IP 10.0.0.50. The NFS server exposes the folder:
  • /var/nfs/pv099

Kubernetes configuration

You can look at the Kubernetes configuration in the yamls folder.
First we create a persistent volume and a persistent volume claim. We specify that this volume can only be accessed by one node (ReadWriteOnce) and that our NFS server is used for storage. More information on PVs here.
After we have created the persistent volume we will create the MySQL deployment. First we create a service to expose our application on the network. Next we create the MySQL deployment using the resources created earlier; a rough sketch of the service and deployment is shown below.
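A rough sketch of what the service and deployment could look like (the image name, claim name and how the root password is provided are assumptions, the real definitions are in the yamls folder):

apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  ports:
  - port: 3306
  selector:
    app: mysql
  clusterIP: None
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  strategy:
    type: Recreate                       # only one pod may use the ReadWriteOnce volume at a time
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql/mysql-server:8.0    # assumption, use the image from the real yaml
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: "_MySQL2020_"           # assumption
        ports:
        - containerPort: 3306
        volumeMounts:
        - name: mysql-persistent-storage
          mountPath: /var/lib/mysql
      volumes:
      - name: mysql-persistent-storage
        persistentVolumeClaim:
          claimName: mysql-pv-claim      # assumption, must match the PVC in 01-mysql-pv.yaml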
  1. Create a persistent volume (PV):
kubectl create -f yamls/01-mysql-pv.yaml
  2. Start MySQL using a deployment (one MySQL Server using the NFS PV):
kubectl create -f yamls/01-mysql-deployment.yaml
Done!

If you want to remove everything

kubectl delete -f yamls/01-mysql-deployment.yaml 
kubectl delete -f yamls/01-mysql-pv.yaml
Make sure everything is deleted:
kubectl get pv,pvc
kubectl get all -o wide
Remember to also empty out the datadir on NFS between tests:
sudo rm -fr /var/nfs/pv099/*
ls /var/nfs/pv099/

Tuesday, May 19, 2020

Automate MySQL Partitioning using events and procedures


The purpose of this post is to show how you can automate the creation and pruning of partitioned tables. If you want to read about partitioning I recommend reading our manual.

In short, partitioning makes it possible to spread individual tables across a file system according to the partitioning rules you specify.

Reasons for partitioning your tables might be:
- Insert performance: smaller index trees give stable insert throughput.
- Select performance: only read data from selected partitions (aka partition pruning).
- Delete performance: dropping a partition is much quicker than doing range deletes of old data.

Partitioning definitions are part of the CREATE/ALTER TABLE statements; in the samples below we will use RANGE partitioning on a DATETIME column in our table. This design is appropriate for log/transaction tables where you want to remove old data.

Let's first create a simple table called logs:

       
CREATE TABLE logs (
    id INT AUTO_INCREMENT NOT NULL,
    message VARCHAR(500) NOT NULL,
    code TINYINT UNSIGNED NOT NULL,
    created DATETIME DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id,created)
); 


Next, we want to partition this table using the column created; we will use range partitioning and partition the data per day:

       
ALTER TABLE logs PARTITION BY RANGE COLUMNS(created) (
    PARTITION p20200217 VALUES LESS THAN ('2020-02-17'),
    PARTITION p20200218 VALUES LESS THAN ('2020-02-18'),
    PARTITION p20200219 VALUES LESS THAN ('2020-02-19'),
    PARTITION p20200220 VALUES LESS THAN ('2020-02-20'),
    PARTITION p20200221 VALUES LESS THAN ('2020-02-21')
);


After the initial setup of our partitioned table it's now time to think about how to make sure we always have new partitions to insert data into, and also how to remove old partitions. You can of course do this as a manual task using ALTER TABLE statements, but that will become a tedious and error-prone task over time.

To solve this problem we can use events and stored procedures in MySQL; we will also have a helper table to store some metadata about the tables we want to automate partitioning for.

Let's first create a table that will contain some information about the tables we want to automate partition handling for.

       
CREATE TABLE IF NOT EXISTS `MyPartTables` (
  `schema_name` VARCHAR(100) NOT NULL,
  `table_name` VARCHAR(100) NOT NULL,
  `no_of_future_partitions` INT NOT NULL,
  `max_partitions` INT NOT NULL,
  PRIMARY KEY (`schema_name`,`table_name`)
);


By looking at the column names it's quite clear what they mean; you simply insert one row per table you want to automate partition handling for. For our example table above this would look something like:

       
INSERT INTO MyPartTables VALUES ('ted','logs',2,20);


So, for our logs table in schema ted this means: create 2 partitions into the future and keep a maximum of 20 partitions before old ones start being removed.

Next we need some stored procedures to handle the creation of new partitions and the removal of old ones. Both of these stored procedures use INFORMATION_SCHEMA.PARTITIONS to look at the current state and then add/remove partitions as specified by the information in MyPartTables for the specific table.
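If you want to see the metadata the procedures work from, you can query INFORMATION_SCHEMA.PARTITIONS directly (adjust schema and table name as needed):

-- List the current partitions for the logs table in schema ted
SELECT PARTITION_NAME, PARTITION_DESCRIPTION, TABLE_ROWS
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = 'ted'
  AND TABLE_NAME = 'logs'
ORDER BY PARTITION_DESCRIPTION;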

First let's have a look at AddPartitions. This procedure fetches all rows from MyPartTables and then adds any needed partitions. It is currently hard-coded to work only with RANGE partitioning per day, for tables that look like our test table logs above.

       
CREATE PROCEDURE `AddPartitions`()
BEGIN
  DECLARE done   BOOL DEFAULT FALSE;
  DECLARE maxp   INT DEFAULT 0;
  DECLARE pdate  INT DEFAULT 0;
  DECLARE pname  VARCHAR(20);    
  DECLARE v_table_name  VARCHAR(100);
  DECLARE v_schema_name  VARCHAR(100);
  DECLARE v_no_future_part INT;

  DECLARE list CURSOR FOR SELECT `table_name`,`schema_name`,`no_of_future_partitions` FROM MyPartTables;

  DECLARE CONTINUE HANDLER FOR
  SQLSTATE '02000'
  SET done = TRUE;

  OPEN list;
  tloop: LOOP
    -- Fetch first, then check the NOT FOUND flag so the last row is not processed twice
    FETCH list INTO v_table_name,v_schema_name,v_no_future_part;
    IF done THEN
      CLOSE list;
      LEAVE tloop;
    END IF;
    -- DEBUG SELECT v_table_name,v_schema_name,v_no_future_part;

    SET pdate=TO_DAYS(DATE(NOW() + INTERVAL (v_no_future_part) DAY));
    SELECT TO_DAYS(TRIM(BOTH "'" FROM MAX(PARTITION_DESCRIPTION))) INTO maxp
           FROM INFORMATION_SCHEMA.PARTITIONS
           WHERE TABLE_NAME=v_table_name
           AND TABLE_SCHEMA=v_schema_name
           AND PARTITION_DESCRIPTION!="MAXVALUE";
    -- DEBUG SELECT pdate, maxp;

    WHILE pdate > maxp DO
       SET maxp = maxp + 1;
       SET pname = CONCAT('p', DATE_FORMAT(FROM_DAYS(maxp), '%Y%m%d'));
       SET @qry = CONCAT('ALTER TABLE ',v_schema_name,'.',v_table_name, ' ADD PARTITION (PARTITION ',pname,' VALUES LESS THAN ("',DATE_FORMAT(FROM_DAYS(maxp),'%Y-%m-%d'),'"))');
       -- DEBUG SELECT @qry;
       PREPARE stmt FROM @qry;
       EXECUTE stmt;
       DEALLOCATE PREPARE stmt;
    END WHILE;
  END LOOP;
END ;;

Next let's look at the procedure for removing old partitions. It is structured in the same way as the AddPartitions procedure above, but instead of adding partitions it removes any partitions above the limit given in MyPartTables.

       
CREATE PROCEDURE RemovePartitions ()
BEGIN
  DECLARE done BOOL DEFAULT FALSE;
  DECLARE v_table_name  VARCHAR(100);
  DECLARE v_schema_name  VARCHAR(100);
  DECLARE v_max_partitions INT;
  DECLARE v_no_of_partitions_to_remove INT;

  DECLARE cur CURSOR FOR SELECT `table_name`,`schema_name`,`max_partitions` FROM MyPartTables;

  DECLARE CONTINUE HANDLER FOR
  SQLSTATE '02000'
  SET done = TRUE;

  OPEN cur;

  tloop: LOOP
    FETCH cur INTO v_table_name,v_schema_name,v_max_partitions;
    IF done THEN
      CLOSE cur;
      LEAVE tloop;
    END IF;
    -- DEBUG SELECT v_table_name,v_schema_name,v_max_partitions;
    SET @qry = CONCAT('SELECT COUNT(DISTINCT(PARTITION_DESCRIPTION)) INTO @v_no_of_partitions ',
                     'FROM INFORMATION_SCHEMA.PARTITIONS ',
                     'WHERE TABLE_NAME="',v_table_name, '" ',
                     'AND TABLE_SCHEMA = "',v_schema_name, '" ');
    -- DEBUG SELECT @qry;
    PREPARE stmt FROM @qry;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
    -- DEBUG SELECT @v_no_of_partitions, v_max_partitions;

    IF @v_no_of_partitions > v_max_partitions THEN
      SET v_no_of_partitions_to_remove = @v_no_of_partitions - v_max_partitions;
      SET @qry = CONCAT('SELECT GROUP_CONCAT(DISTINCT(PARTITION_NAME)) INTO @v_list_of_partitions_to_remove FROM ( ',
                            'SELECT PARTITION_NAME ',
                            'FROM INFORMATION_SCHEMA.PARTITIONS ',
                            'WHERE TABLE_NAME = "',v_table_name, '" ',
                            'AND TABLE_SCHEMA = "',v_schema_name, '" ',
                            'AND PARTITION_DESCRIPTION!="MAXVALUE" ',
                            'ORDER BY PARTITION_DESCRIPTION LIMIT ',v_no_of_partitions_to_remove,
                            ' ) as slabb');
      -- DEBUG SELECT @qry;
      -- DEBUG SELECT @v_partitions_to_remove;
      PREPARE stmt FROM @qry;
      EXECUTE stmt;
      DEALLOCATE PREPARE stmt;

      SET @qry = CONCAT('ALTER TABLE ',v_schema_name,'.',v_table_name, ' DROP PARTITION ', @v_list_of_partitions_to_remove);
      -- DEBUG SELECT @qry;
      PREPARE stmt FROM @qry;
      EXECUTE stmt;
      DEALLOCATE PREPARE stmt;
    END IF;
  END LOOP;
END ;;


Now we only need to create two events that will run our add and remove partition procedures above. Since we are dealing with daily partitions it's enough to run the procedures once per day:

       
CREATE EVENT AddPartitions ON SCHEDULE EVERY 1 DAY
DO
BEGIN
  CALL AddPartitions();
END ;;

CREATE EVENT RemovePartitions ON SCHEDULE EVERY 1 DAY
DO
BEGIN
  CALL RemovePartitions();
END ;;


If you want to try this out there is a complete running example here.

This is just an example of how you can automate the handling of your partitioned tables in MySQL. The code in the stored procedures is hard-coded to only work with RANGE and daily partitioned tables, but you can adapt it to work with other kinds of partitioning setups.

Having a metadata table like MyPartTables makes it easy to handle all of your partitioned tables.

If the result from the GROUP_CONCAT operation is large you need to increase the group_concat_max_len parameter; I recommend increasing group_concat_max_len to 50000.
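For example (SET PERSIST requires MySQL 8.0; SET GLOBAL or setting it in my.cnf works as well):

-- Raise the limit and persist it across restarts
SET PERSIST group_concat_max_len = 50000;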