Recovering from quorum loss
Quorum loss in MySQL happens when a majority of the cluster's nodes (the quorum), which is required to make decisions and maintain consistency, is no longer available. This can be caused by network issues, node failures, or other disruptions. When it occurs, the cluster may become unavailable or enter a read-only state.
Although the charm cannot automatically recover from quorum loss, you can take the following steps to manually recover the cluster.
Warning
Recover from quorum loss with caution, as the procedure can affect availability and cause data loss.
Ensure the cluster is in no-quorum state
A quorum loss will typically look like this in the juju status output:
Model    Controller  Cloud/Region  Version  SLA          Timestamp
mymodel  localhost   default       3.6.8    unsupported  17:52:19Z

App    Version                  Status   Scale  Charm      Channel   Rev  Address        Exposed  Message
mysql  8.0.42-0ubuntu0.22.04.2  waiting      3  mysql-k8s  8.0/edge  279  10.152.183.61  no       waiting for units to settle down

Unit      Workload     Agent  Address     Ports  Message
mysql/0*  maintenance  idle   10.1.2.48          offline
mysql/1   maintenance  idle   10.1.0.195         offline
mysql/2   active       idle   10.1.1.81
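When several units are listed, the unit to run the recovery from can be picked by filtering the Workload column. The following is a minimal sketch, assuming the default tabular juju status output shown above; active_units is a hypothetical helper name and is not part of Juju or the charm.

```shell
# Hypothetical helper: print the units of an application whose workload
# status is "active", reading tabular `juju status` output from stdin.
active_units() {
  awk -v prefix="$1/" '
    # Unit rows start with "<app>/<n>"; the leader carries a trailing "*".
    index($1, prefix) == 1 && $2 == "active" {
      sub(/\*$/, "", $1)
      print $1
    }
  '
}

# Usage (requires a Juju client connected to the model):
#   juju status mysql | active_units mysql
```

With the status output above, this would print mysql/2, the only unit that is still active.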
From an active unit, check the cluster status with:
juju run mysql/2 get-cluster-status
This outputs the current status of the cluster:
Running operation 17 with 1 task
  - task 18 on unit-mysql-2

Waiting for task 18...
status:
  clustername: cluster-3eab807dee6797402ecfc52b5a84d15b
  clusterrole: primary
  defaultreplicaset:
    name: default
    primary: mysql-0.mysql-endpoints.m3.svc.cluster.local.:3306
    ssl: required
    status: no_quorum
    statustext: cluster has no quorum as visible from 'mysql-2.mysql-endpoints.m3.svc.cluster.local.:3306'
      and cannot process write transactions. 2 members are not active.
    topology:
      mysql-0:
        address: mysql-0.mysql-endpoints.m3.svc.cluster.local.:3306
        instanceerrors: '[''note: group_replication is stopped.'']'
        memberrole: primary
        memberstate: offline
        mode: n/a
        role: ha
        status: unreachable
        version: 8.0.42
      mysql-1:
        address: mysql-1.mysql-endpoints.m3.svc.cluster.local.:3306
        instanceerrors: '[''note: group_replication is stopped.'']'
        memberrole: secondary
        memberstate: offline
        mode: n/a
        role: ha
        status: unreachable
        version: 8.0.42
      mysql-2:
        address: mysql-2.mysql-endpoints.m3.svc.cluster.local.:3306
        memberrole: secondary
        mode: r/o
        replicationlagfromimmediatesource: ""
        replicationlagfromoriginalsource: ""
        role: ha
        status: online
        version: 8.0.42
    topologymode: single-primary
  domainname: cluster-set-3eab807dee6797402ecfc52b5a84d15b
  groupinformationsourcemember: mysql-2.mysql-endpoints.m3.svc.cluster.local.:3306
success: "True"
The output shows that the cluster is in a no-quorum state: the status field of the default replica set reads no_quorum.
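The same check can be scripted before proceeding with a forced recovery. This is a rough sketch, assuming the action result is printed on standard output in the format shown above; is_no_quorum is a hypothetical helper name, not a charm or Juju command.

```shell
# Hypothetical helper: succeed (exit 0) only if the action output read from
# stdin contains a "status: no_quorum" line. Member-level lines such as
# "status: unreachable" or "status: online" do not match.
is_no_quorum() {
  grep -Eq '^[[:space:]]*status:[[:space:]]*no_quorum[[:space:]]*$'
}

# Usage (requires a Juju client connected to the model):
#   if juju run mysql/2 get-cluster-status | is_no_quorum; then
#     echo "cluster has lost quorum"
#   fi
```

Matching the whole line keeps the check from being triggered by the statustext message, which mentions "no quorum" in free text.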
Recover the cluster from the active unit
Using the available active unit, run the action:
juju run mysql/2 promote-to-primary scope=unit force=true
The unit becomes the new primary. Other offline units, if reachable, rejoin automatically on subsequent update-status events.
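Because rejoining depends on later update-status events, it can take a while before the cluster status changes. The following sketch polls get-cluster-status until the output no longer reports no_quorum. The helper name wait_for_quorum, the default number of attempts, and the default delay are all assumptions chosen for illustration. The check only confirms that no_quorum is gone, not that every unit has rejoined.

```shell
# Hypothetical helper: poll get-cluster-status on a unit until the output no
# longer contains "status: no_quorum", or give up after a number of attempts.
# Arguments: unit name, maximum attempts (default 10), delay in seconds (default 30).
wait_for_quorum() {
  unit="$1"
  attempts="${2:-10}"
  delay="${3:-30}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # A failed action or empty output is treated as "not recovered yet".
    if output=$(juju run "$unit" get-cluster-status 2>/dev/null) &&
       [ -n "$output" ] &&
       ! printf '%s\n' "$output" |
         grep -Eq '^[[:space:]]*status:[[:space:]]*no_quorum[[:space:]]*$'; then
      return 0
    fi
    # Wait between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage (requires a Juju client connected to the model):
#   wait_for_quorum mysql/2 && juju run mysql/2 get-cluster-status
```

Once the function returns successfully, running get-cluster-status again shows which members have rejoined.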