Mysql Failover With MHA (Master High Availability)

A post from dbpandit

MySQL Automatic Failover with MHA (Master High Availability)

Few questions and answers before to begin :
Q 1) What is MySQL Failover?
Q 2) What is MHA and How does it do MySQL Failover?
Q 3) Is there any alternative available?
Q 4) How does MHA works?

Ans 1) As most of us are dependent on Master - Slave architecture for our relational MySQL database where all writes/changes goes to a single machine called as master database and every other db (slave) replicates all changes from it. In such a scenario, We must think about solutions which help us in placing new master asap in case existing master fails. Manual process can take about 1-2 hours.

Ans 2) MHA stands for Master High Availability and it's nothing but a set of perl scripts which automates failover process and completes within few seconds (20-30).

Ans 3) Yes, There are alternatives available like mysqlfailover utility. This utility is quite cool though has its own dependencies like enabling GTID based replication on every server, which requires complete downtime.

Ans 4) Working is defined as below :

  • MHA has two parts. MHA Node and MHA Manager.
  • MHA Node is installed at all MySQL Servers including both Master and all of its direct slaves and MHA Manager can be installed on any slave or a separate server having ssh access to all of the database servers except master db server as in case of failure or host unavailability, This won't function.
  • MHA Manager keeps pinging existing master for checking its availability, By Default, duration is 3 secs. MySQL Master health can be checked by pinging it from multiple machines to ensure that ping failure isn't happening because of any network issue.
  • If it is found that Master is down, MHA immediately comes into picture and start process of promoting candidate master to master or choose any latest slave to promote as master.
  • In order to ensure that our candidate master remains latest slave, We can also setup semi sync replication on it.
  • It also switch all slaves to the new master by executing "change master to" command on each of them.
  • At the same time, It also execute masteripfailover script which immediately switch master's secondary ip to new master and disable it on the old one.
  • This process completes within 20-30 secs.

Major benefits of MySQL MHA :

  • Automatic failover
  • Short downtime
  • Take care of MySQL-Replication consistency
  • No change required at application level

After passing though above mentioned details. Let's have a look at its installation and configuration part :

Installation of MHA Node and Manager :

MHA Node installation is required on all db servers - Master and its direct slaves :

wget http://mysql-master-ha.googlecode.com/files/mha4mysql-node_0.54-0_all.deb  
apt-get install libdbd-mysql-perl  
dpkg -i mha4mysql-node_0.54-0_all.deb  

Install MHA Manager only on slave server from where failover will be controlled.

wget http://mysql-master-ha.googlecode.com/files/mha4mysql-manager_0.55-0_all.deb  
apt-get install libdbd-mysql-perl  
apt-get install libconfig-tiny-perl  
apt-get install liblog-dispatch-perl  
apt-get install libswitch-perl  
apt-get install libparallel-forkmanager-perl  
dpkg -i mha4mysql-manager_0.55-0_all.deb  

We also need to have a configuration file on the same node as mha manager at the following location : /etc/mha/app1.cnf after creating /etc/mha folder.

[server default]
# mysql user and password
user=mhauser  
password=password  
ssh_user=root

# Replication user credentials
#repl_user=mhauser
repl_password=password  
repl_user=mhauser  
# working directory on the manager
manager_workdir=/var/log/masterha/app1  
manager_log=/var/log/masterha/mha.log  
# working directory on MySQL servers
remote_workdir=/var/log/masterha/app1

# Below script is called by mha manager at different times, First for checking cluster status during manager 
master_ip_failover_script= /etc/mha/master_ip_failover 

[server1]
hostname=1.1.1.1 -- This must be a secondary ip i.e. assigned to Master database server.

[server2]
hostname=2.2.2.2  
candidate_master=1

[server3]
hostname=3.3.3.3  
no_master=1

# Below command helps us in checking MySQL status from other host as well before to declare MySQL unavailability.
secondary_check_script = masterha_secondary_check -s 2.2.2.2 -s r3.3.3.3

log_level=debug  

masteripfailover script is called by mha manager whenever it detects Master failure. In case of AWS, To switch secondary IP to new master. Also, Grant all privileges to mysql user : mhauser and allow its access from around the required hosts :

mysql> grant all on *.* to 'mhauser'@'%'  identified by 'password';  
mysql> flush privileges;  

Now, Create a masteripfailover script as /etc/mha/masteripfailover and paste below content into it. Don't forget to make it executable :

#!/usr/bin/env perl

use strict;  
use warnings FATAL => 'all';

use Getopt::Long;

use Net::Ping;  
use Switch;

my ($command, $ssh_user, $orig_master_host, $orig_master_ip, $orig_master_port, $new_master_host, $new_master_ip, $new_master_port, $new_master_user, $new_master_password);


GetOptions(  
  'command=s'             => \$command,
  'ssh_user=s'            => \$ssh_user,
  'orig_master_host=s'    => \$orig_master_host,
  'orig_master_ip=s'      => \$orig_master_ip,
  'orig_master_port=i'    => \$orig_master_port,
  'new_master_host=s'     => \$new_master_host,
  'new_master_ip=s'       => \$new_master_ip,
  'new_master_port=i'     => \$new_master_port,
  'new_master_user=s'     => \$new_master_user,
  'new_master_password=s' => \$new_master_password,
);


my $vip = '4.4.4.4';  # Virtual IP  
my $master_srv = '1.1.1.1';  
my $new_master_srv = '2.2.2.2';  
my $timeout = 5;  
#my $key = "1";
my $key = "0";  
#my $ssh_start_vip = "sudo /sbin/ifconfig eth0:$key $vip";
my $ssh_start_vip = "sudo ifup eth0:$key";  
#my $ssh_stop_vip = "sudo /sbin/ifconfig eth0:$key down";
my $ssh_stop_vip = "sudo ifdown eth0:$key";  
my $INSTANCE_ID = `ssh $ssh_user\@$new_master_srv curl --silent http://169.254.169.254/latest/meta-data/instance-id`;  
my $ENI_ID = `ssh $ssh_user\@$new_master_srv aws ec2 describe-instances --instance-ids $INSTANCE_ID | grep NetworkInterfaceId -m 1 | awk '{print \$2;}' | tr -d ',' | tr -d '\n'`;  
print $ENI_ID;  
exit &main();

sub main {

print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

if ( $command eq "stop" || $command eq "stopssh" ) {

    # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
    # If you manage master ip address at global catalog database,
    # invalidate orig_master_ip here.
    my $exit_code = 1;
    eval {
        print "Disabling the VIP on old master if the server is still UP: $orig_master_host \n";
        my $p=Net::Ping->new('icmp');
        &stop_vip() if $p->ping($master_srv, $timeout);
        $p->close();
        $exit_code = 0;
    };
    if ($@) {
        warn "Got Error: $@\n";
        exit $exit_code;
    }
    exit $exit_code;
}
elsif ( $command eq "start" ) {

    # all arguments are passed.
    # If you manage master ip address at global catalog database,
    # activate new_master_ip here.
    # You can also grant write access (create user, set read_only=0, etc) here.
my $exit_code = 10;  
    eval {
        print "Enabling the VIP - $vip on the new master - $new_master_host \n";
        &start_vip();
        $exit_code = 0;
    };
    if ($@) {
        warn $@;
        exit $exit_code;
    }
    exit $exit_code;
}
elsif ( $command eq "status" ) {  
    print "Checking the Status of the script.. OK \n";
    #`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
    exit 0;
}
else {  
    &usage();
    exit 1;
}
}

# A simple system call that enable the VIP on the new master
sub start_vip() {  
#    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
print "aws ec2 assign-private-ip-addresses --network-interface-id $ENI_ID --private-ip-addresses $vip --allow-reassignment;";  
`aws ec2 assign-private-ip-addresses --network-interface-id "$ENI_ID" --private-ip-addresses $vip --allow-reassignment`;
#`ifconfig eth0:$key $vip`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {  
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {  
print  
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

In masteripfailover script, We must also update server ips as required. Here, Function startvip & stopvip are the once required most attention.

Also, To support IP Failover in AWS Environment, We must have to set few more things like assigning a secondary ip to the existing NIC of our existing master by using AWS Console by following few steps shown below :

After completing above steps, We also have to make changes at OS level at both Existing and New master db server in order to make them knowing the secondary IP by adding a file on them at following location : /etc/network/interfaces.d/eth1.cfg with below mentioned content :

# The primary network interface
auto eth0:0  
iface eth0:0 inet static  
name Ethernet alias LAN card  
address 4.4.4.4  
netmask 255.255.252.0  
broadcast 10.0.43.255  
network 10.0.43.0  

After adding and changing address as per requirement in the above eth1.cfg file, Follow below mentioned steps to enable it :

ifup eth0:0 is the command here for adding secondary ip. We can disable it with ifdown eth0:0 Note : After setting this file on both existing and new master, Execute ifup eth0:0 on both of them. This will show you secondary ip after doing ifconfig.

Enabling SSH across all servers to each other. We can set it up with key based authentication by simply generating ssh-keygen at all hosts and put content of idrsa.pub of every host to authorizedkeys of every other host in the cluster.
Also, If we are using any one of our slave for running MySQL MHA, That node must have its own idrsa.pub available in its authorizedkeys file. This can also be called as self ssh and can be achieved by below statement :

cat .ssh/id_rsa.pub >> .ssh/authorized_keys  

So, if have 3 node cluster then all 3 must be able to access each other by ssh
A -> B B -> A B -> C
A -> C C -> A C -> B

Also, Make sure to have ssh access of admin user only like root.

It's suggested to verify ssh connectivity with below script as well. This will be available on mha manager's node.

masterha_check_ssh --conf=/etc/mha/app1.cnf  

You should have a message like : "All SSH connection tests passed successfully" after executing above script.
Then, Check replication around all servers by running below mentioned script :

masterha_check_repl --conf=/etc/mha/app1.cnf  

Please be sure that you have the “MySQL Replication Health is OK.” before to move forward. In case you are having any MariaDB Slave, You will see error like "MySQL Replication Health is not ok". This might be because MHA doesn't support MariaDB. For solutions around please follow : https://code.google.com/p/mysql-master-ha/issues/detail?id=70

If everything goes fine, Start mha manager with below mentioned command :

nohup masterha_manager --conf=/etc/mha/app1.cnf < /dev/null > /var/log/masterha/app1/app1.log 2>&1 &  

For logs of mha manager, Refer /var/log/masterha/mha.log

Also, After initiating mha manager process, check its status with :

masterha_check_status --conf=/etc/mha/app1.cnf  

After completing above processes, We are good to check MySQL Failover.

  • Shutdown MYSQL Master Server by executing /etc/init.d/mysql stop
  • Checkout logs at mha manager node - /var/log/masterha/mha.log

After failover, You will find below mentioned report in mha.log

----- Failover Report -----

app1: MySQL Master failover 1.1.1.1 to 2.2.2.2 succeeded

Master 1.1.1.1 is down!

Check MHA Manager logs at dawslmkttestdb03:/var/log/masterha/mha.log for details.

Started automated(non-interactive) failover.  
Invalidated master IP address on 1.1.1.1.  
The latest slave 2.2.2.2(2.2.2.2:3306) has all relay logs for recovery.  
Selected 2.2.2.2 as a new master.  
2.2.2.2: OK: Applying all logs succeeded.  
2.2.2.2: OK: Activated master IP address.  
3.3.3.3: This host has the latest relay log events.  
Generating relay diff files from the latest slave succeeded.  
3.3.3.3: OK: Applying all logs succeeded. Slave started, replicating from 2.2.2.2.  
2.2.2.2: Resetting slave info succeeded.  
Master failover to 2.2.2.2(2.2.2.2:3306) completed successfully.  

Notes :

MySQL MHA Manager gets shutdown after every failover and requires deletion of following file for starting it again rm /var/log/masterha/app1/app1.failover.complete.

Ref Links # http://www.arborisoft.com/how-to-configure-mysql-masterslave-replication-with-mha-automatic-failover/

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/MultipleIP.html#MultipleIPReqs

https://code.google.com/p/mysql-master-ha/issues/detail?id=70

https://code.google.com/p/mysql-master-ha/wiki/Parameters#masteripfailoverscript - masterip_failover script parameter