再谈MysqlMHA

关于 mysql 数据库的高可用以及 mysql 的 proxy 中间件的选型一直是个很活跃的技术话题。以高可用为例，解决方案有 mysqlndb 集群， mmm,mha,drbd 等多种选择。 mysql 的 proxy 中间件则有 mysql-proxy,atlas,cobar,mycat,tddl 等，可谓五花八门。以上对应
关于mysql数据库的高可用以及mysql的proxy中间件的选型一直是个很活跃的技术话题。以高可用为例，解决方案有mysql ndb集群，mmm , mha, drbd等多种选择。mysql 的proxy中间件则有mysql-proxy, atlas , cobar, mycat, tddl等，可谓五花八门。
以上对应的解决方案都有各种的优缺点，生产环境的选型的原则无非几个字：“简单，符合要求且高效！”一句话：适合自己的，才是最好的！
本文将介绍采用在mysql 1主3从的环境下，采用mha架构实现mysql master的ha和自动failover。进而结合atlas数据库代理完成mysql数据库的整体读写请求分离，同时在atlas这层实现ha和failover，避免单点故障。数据库架构设计拓扑如下：
650) this.width=650; src=http://www.68idc.cn/help/uploads/allimg/151213/0k1451512-0.jpg title=图片1.jpg alt=wkiom1rprp2j2_9saakyuifj7i4157.jpg>
一：环境介绍
atlas 主：192.168.1.12/24
atlas 从：192.168.1.81/24
atlas vip:  192.168.1.230/24
atlas write:
master：192.168.1.225/24
slave1：192.168.1.226/24
slave2：192.168.1.227/24
管理节点：192.168.1.12/24
mha vip:  192.168.1.231/24
atlas  read:
slave2：192.168.1.227/24
slave3：192.168.1.228/24
由于文章篇幅关系，mha和atlas将分开两篇文章来介绍！
其实早在两年前就尝试过mha, 由于种种原因，当时只研究了皮毛，本文也算是对前文的一个补充！前文传送门：http://ylw6006.blog.51cto.com/470441/890360
二：配置数据库复制, 这个比较简单，配置步骤省略(如有问题请百度一下！)。
master
650) this.width=650; src=http://www.68idc.cn/help/uploads/allimg/151213/0k1455e7-1.jpg title=图片2.png alt=wkiol1rpr33q8kadaahydyxmwuc190.jpg>
slave:(226,227,228均为此状态)
650) this.width=650; src=http://www.68idc.cn/help/uploads/allimg/151213/0k1453s0-2.jpg title=图片3.png alt=wkiol1rpr6lsbxipaawhcpqa56k294.jpg>
二：安装及配置mha
1: 配置主机间的ssh密钥信任，此处以225服务器为例，其他服务器执行同样的操作
# ssh-keygen# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.1.12# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.1.225# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.1.226# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.1.227
2：安装mha，这里为了方便，直接采用rpm包形式进行安装
rpm包下载地址，需要翻墙：https://code.google.com/p/mysql-master-ha/downloads/detail?name=mha4mysql-manager-0.53-0.el6.noarch.rpm https://code.google.com/p/mysql-master-ha/downloads/detail?name=mha4mysql-node-0.52-0.noarch.rpm 管理节点：# yum -y localinstall \mha4mysql-node-0.52-0.noarch.rpm  \mha4mysql-manager-0.53-0.el6.noarch.rpm 数据库节点：# yum -y localinstall mha4mysql-node-0.52-0.noarch.rpm
3: 配置mha主配置文件
管理节点：# mkdir -p /usr/local/mha# mkdir -p /etc/mha# cat /etc/mha/mha.conf[server default]user=rootpassword=123456manager_workdir=/usr/local/mhamanager_log=/usr/local/mha/manager.logremote_workdir=/usr/local/mhassh_user=rootrepl_user=replicationrepl_password=123456ping_interval=1secondary_check_script= masterha_secondary_check -s 192.168.1.226 -s 192.168.1.227master_ip_failover_script=/usr/local/scripts/master_ip_failover[server1]hostname=192.168.1.225ssh_port=22master_binlog_dir=/mydatacandidate_master=1[server2]hostname=192.168.1.226ssh_port=22master_binlog_dir=/mydatacandidate_master=1[server3]hostname=192.168.1.227ssh_port=22master_binlog_dir=/mydatano_master=1
4：准备failover脚本
# cat /usr/local/scripts/master_ip_failover
#!/usr/bin/env perluse strict;use warnings fatal => 'all';use getopt::long;my ($command, $ssh_user, $orig_master_host, $orig_master_ip,$orig_master_port, $new_master_host, $new_master_ip, $new_master_port);my $vip = '192.168.1.231'; # virtual ipmy $gateway = '192.168.1.1';#gateway ipmy $interface = 'eth0';my $key = 1;my $ssh_start_vip = /sbin/ifconfig $interface:$key $vip;/sbin/arping -i $interface -c 3 -s $vip $gateway >/dev/null 2>&1;my $ssh_stop_vip = /sbin/ifconfig $interface:$key down;getoptions('command=s' => \$command,'ssh_user=s' => \$ssh_user,'orig_master_host=s' => \$orig_master_host,'orig_master_ip=s' => \$orig_master_ip,'orig_master_port=i' => \$orig_master_port,'new_master_host=s' => \$new_master_host,'new_master_ip=s' => \$new_master_ip,'new_master_port=i' => \$new_master_port,);exit &main();sub main {print \n\nin script test====$ssh_stop_vip==$ssh_start_vip===\n\n;if ( $command eq stop || $command eq stopssh ) {# $orig_master_host, $orig_master_ip, $orig_master_port are passed.# if you manage master ip address at global catalog database,# invalidate orig_master_ip here.my $exit_code = 1;eval {print disabling the vip on old master: $orig_master_host \n;&stop_vip();$exit_code = 0;};if ($@) {warn got error: $@\n;exit $exit_code;}exit $exit_code;}elsif ( $command eq start ) {# all arguments are passed.# if you manage master ip address at global catalog database,# activate new_master_ip here.# you can also grant write access (create user, set read_only=0, etc) here.my $exit_code = 10;eval {print enabling the vip - $vip on the new master - $new_master_host \n;&start_vip();$exit_code = 0;};if ($@) {warn $@;exit $exit_code;}exit $exit_code;}elsif ( $command eq status ) {print checking the status of the script.. ok \n;`ssh $ssh_user\@$orig_master_host \ $ssh_start_vip \`;exit 0;}else {&usage();exit 1;}}# a simple system call that enable the vip on the new mastersub start_vip() {`ssh $ssh_user\@$new_master_host \ $ssh_start_vip \`;}# a simple system call that disable the vip on the old_mastersub stop_vip() {`ssh $ssh_user\@$orig_master_host \ $ssh_stop_vip \`;}sub usage {printusage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n;}
# chmod +x /usr/local/scripts/master_ip_failover
5：进行ssh检查
# masterha_check_ssh  --conf=/etc/mha/mha.conf
650) this.width=650; src=http://www.68idc.cn/help/uploads/allimg/151213/0k1452958-3.jpg title=图片4.png alt=wkiol1rpsrntud9paaiz-mrutka102.jpg>
# cp -rvp /usr/lib/perl5/vendor_perl/mha  /usr/local/lib64/perl5/
（mha的数据库节点和管理节点均需要执行此步骤）
# masterha_check_ssh  --conf=/etc/mha/mha.conf
650) this.width=650; src=http://www.68idc.cn/help/uploads/allimg/151213/0k1454121-4.jpg title=图片5.png alt=wkiom1rpsorzcsvkaaa8p7tbqwq456.jpg>
6：进行同步检查
# masterha_check_repl --conf=/etc/mha/mha.conf
tue oct 28 10:58:35 2014 - [warning] global configuration file /etc/masterha_default.cnf not found. skipping.tue oct 28 10:58:35 2014 - [info] reading application default configurations from /etc/mha/mha.conf..tue oct 28 10:58:35 2014 - [info] reading server configurations from /etc/mha/mha.conf..tue oct 28 10:58:35 2014 - [info] mha::mastermonitor version 0.53.tue oct 28 10:58:36 2014 - [info] dead servers:tue oct 28 10:58:36 2014 - [info] alive servers:tue oct 28 10:58:36 2014 - [info]   192.168.1.225(192.168.1.225:3306)tue oct 28 10:58:36 2014 - [info]   192.168.1.226(192.168.1.226:3306)tue oct 28 10:58:36 2014 - [info]   192.168.1.227(192.168.1.227:3306)tue oct 28 10:58:36 2014 - [info] alive slaves:tue oct 28 10:58:36 2014 - [info]   192.168.1.226(192.168.1.226:3306)  version=5.6.17-log (oldest major version between slaves) log-bin:enabledtue oct 28 10:58:36 2014 - [info]     replicating from 192.168.1.225(192.168.1.225:3306)tue oct 28 10:58:36 2014 - [info]     primary candidate for the new master (candidate_master is set)tue oct 28 10:58:36 2014 - [info]   192.168.1.227(192.168.1.227:3306)  version=5.6.17-log (oldest major version between slaves) log-bin:enabledtue oct 28 10:58:36 2014 - [info]     replicating from 192.168.1.225(192.168.1.225:3306)tue oct 28 10:58:36 2014 - [info]     not candidate for the new master (no_master is set)tue oct 28 10:58:36 2014 - [info] current alive master: 192.168.1.225(192.168.1.225:3306)tue oct 28 10:58:36 2014 - [info] checking slave configurations..tue oct 28 10:58:36 2014 - [warning]  relay_log_purge=0 is not set on slave 192.168.1.227(192.168.1.227:3306).tue oct 28 10:58:36 2014 - [info] checking replication filtering settings..tue oct 28 10:58:36 2014 - [info]  binlog_do_db= , binlog_ignore_db=tue oct 28 10:58:36 2014 - [info]  replication filtering check ok.tue oct 28 10:58:36 2014 - [info] starting ssh connection tests..^ctue oct 28 10:58:41 2014 - [info] got terminate signal. exit.[root@test_redis scripts]# masterha_check_repl --conf=/etc/mha/mha.conftue oct 28 10:58:51 2014 - [warning] global configuration file /etc/masterha_default.cnf not found. skipping.tue oct 28 10:58:51 2014 - [info] reading application default configurations from /etc/mha/mha.conf..tue oct 28 10:58:51 2014 - [info] reading server configurations from /etc/mha/mha.conf..tue oct 28 10:58:51 2014 - [info] mha::mastermonitor version 0.53.tue oct 28 10:58:52 2014 - [info] dead servers:tue oct 28 10:58:52 2014 - [info] alive servers:tue oct 28 10:58:52 2014 - [info]   192.168.1.225(192.168.1.225:3306)tue oct 28 10:58:52 2014 - [info]   192.168.1.226(192.168.1.226:3306)tue oct 28 10:58:52 2014 - [info]   192.168.1.227(192.168.1.227:3306)tue oct 28 10:58:52 2014 - [info] alive slaves:tue oct 28 10:58:52 2014 - [info]   192.168.1.226(192.168.1.226:3306)  version=5.6.17-log (oldest major version between slaves) log-bin:enabledtue oct 28 10:58:52 2014 - [info]     replicating from 192.168.1.225(192.168.1.225:3306)tue oct 28 10:58:52 2014 - [info]     primary candidate for the new master (candidate_master is set)tue oct 28 10:58:52 2014 - [info]   192.168.1.227(192.168.1.227:3306)  version=5.6.17-log (oldest major version between slaves) log-bin:enabledtue oct 28 10:58:52 2014 - [info]     replicating from 192.168.1.225(192.168.1.225:3306)tue oct 28 10:58:52 2014 - [info]     not candidate for the new master (no_master is set)tue oct 28 10:58:52 2014 - [info] current alive master: 192.168.1.225(192.168.1.225:3306)tue oct 28 10:58:52 2014 - [info] checking slave configurations..tue oct 28 10:58:52 2014 - [info] checking replication filtering settings..tue oct 28 10:58:52 2014 - [info]  binlog_do_db= , binlog_ignore_db=tue oct 28 10:58:52 2014 - [info]  replication filtering check ok.tue oct 28 10:58:52 2014 - [info] starting ssh connection tests..tue oct 28 10:58:55 2014 - [info] all ssh connection tests passed successfully.tue oct 28 10:58:55 2014 - [info] checking mha node version..tue oct 28 10:58:55 2014 - [info]  version check ok.tue oct 28 10:58:55 2014 - [info] checking ssh publickey authentication settings on the current master..tue oct 28 10:58:56 2014 - [info] healthcheck: ssh to 192.168.1.225 is reachable.tue oct 28 10:58:56 2014 - [info] master mha node version is 0.52.tue oct 28 10:58:56 2014 - [info] checking recovery script configurations on the current master..tue oct 28 10:58:56 2014 - [info]   executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/mydata --output_file=/usr/local/mha/save_binary_logs_test --manager_version=0.53 --start_file=mysql-bin.000013tue oct 28 10:58:56 2014 - [info]   connecting to root@192.168.1.225(192.168.1.225)..creating /usr/local/mha if not exists..    ok.checking output directory is accessible or not..ok.binlog found at /mydata, up to mysql-bin.000013tue oct 28 10:58:56 2014 - [info] master setting check done.tue oct 28 10:58:56 2014 - [info] checking ssh publickey authentication and checking recovery script configurations on all alive slave servers..tue oct 28 10:58:56 2014 - [info]   executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=192.168.1.226 --slave_ip=192.168.1.226 --slave_port=3306 --workdir=/usr/local/mha --target_version=5.6.17-log --manager_version=0.53 --relay_log_info=/mydata/relay-log.info  --relay_dir=/mydata/  --slave_pass=xxxtue oct 28 10:58:56 2014 - [info]   connecting to root@192.168.1.226(192.168.1.226:22)..checking slave recovery environment settings..opening /mydata/relay-log.info ... ok.relay log found at /mydata, up to mysql-relay-bin.000006temporary relay log file is /mydata/mysql-relay-bin.000006testing mysql connection and privileges.. done.testing mysqlbinlog output.. done.cleaning up test file(s).. done.tue oct 28 10:58:57 2014 - [info]   executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=192.168.1.227 --slave_ip=192.168.1.227 --slave_port=3306 --workdir=/usr/local/mha --target_version=5.6.17-log --manager_version=0.53 --relay_log_info=/mydata/relay-log.info  --relay_dir=/mydata/  --slave_pass=xxxtue oct 28 10:58:57 2014 - [info]   connecting to root@192.168.1.227(192.168.1.227:22)..checking slave recovery environment settings..opening /mydata/relay-log.info ... ok.relay log found at /mydata, up to mysql-relay-bin.000016temporary relay log file is /mydata/mysql-relay-bin.000016testing mysql connection and privileges.. done.testing mysqlbinlog output.. done.cleaning up test file(s).. done.tue oct 28 10:58:58 2014 - [info] slaves settings check done.tue oct 28 10:58:58 2014 - [info]192.168.1.225 (current master)+--192.168.1.226+--192.168.1.227 tue oct 28 10:58:58 2014 - [info] checking replication health on 192.168.1.226..tue oct 28 10:58:58 2014 - [info]  ok.tue oct 28 10:58:58 2014 - [info] checking replication health on 192.168.1.227..tue oct 28 10:58:58 2014 - [info]  ok.tue oct 28 10:58:58 2014 - [info] checking master_ip_failover_script status:tue oct 28 10:58:58 2014 - [info]   /usr/local/scripts/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.225 --orig_master_ip=192.168.1.225 --orig_master_port=3306  in script test====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 192.168.1.231;/sbin/arping -i eth0 -c 3 -s 192.168.1.231 192.168.1.1 >/dev/null 2>&1=== checking the status of the script.. oktue oct 28 10:59:02 2014 - [info]  ok.tue oct 28 10:59:02 2014 - [warning] shutdown_script is not defined.tue oct 28 10:59:02 2014 - [info] got exit code 0 (not master dead). mysql replication health is ok.
7: 管理节点启动manager进程
# nohup masterha_manager --conf=/etc/mha/mha.conf > /tmp/mha_manager.log  &1 & # masterha_check_status --conf=/etc/mha/mha.confmha (pid:4228) is running(0:ping_ok), master:192.168.1.225
三：测试mhafailover
1：master节点(关闭mysql进程)
650) this.width=650; src=http://www.68idc.cn/help/uploads/allimg/151213/0k14551h-5.jpg title=图片6.png alt=wkiol1rpsacs9tgpaagst84sj3a742.jpg>
2：226服务器(自动获取vip，且转换为master)
650) this.width=650; src=http://www.68idc.cn/help/uploads/allimg/151213/0k145lo-6.jpg title=图片7.png alt=wkiom1rpswbhoo4yaaoulwd7yai295.jpg>650) this.width=650; src=http://www.68idc.cn/help/uploads/allimg/151213/0k14514s-7.jpg title=图片8.png alt=wkiol1rpscfgssmeaakcp5m1lj0299.jpg>
3：227服务器(自动从新的master上进行复制)
650) this.width=650; src=http://www.68idc.cn/help/uploads/allimg/151213/0k14531z-8.jpg title=图片9.png alt=wkiom1rpsyqx_c0zaas_urv_ehe101.jpg>
4：管理节点查看到的日志(fairover之后，mha manager进程会自动关闭)
# tail -f /usr/local/mha/manager.log
tue oct 28 11:31:36 2014 - [info] healthcheck: ssh to 192.168.1.225 is reachable.monitoring server 192.168.1.226 is reachable, master is not reachable from 192.168.1.226. ok.monitoring server 192.168.1.227 is reachable, master is not reachable from 192.168.1.227. ok.tue oct 28 11:31:37 2014 - [info] master is not reachable from all other monitoring servers. failover should start.tue oct 28 11:31:37 2014 - [warning] got error on mysql connect: 2013 (lost connection to mysql server at 'reading initial communication packet', system error: 111)tue oct 28 11:31:37 2014 - [warning] connection failed 1 time(s)..tue oct 28 11:31:38 2014 - [warning] got error on mysql connect: 2013 (lost connection to mysql server at 'reading initial communication packet', system error: 111)tue oct 28 11:31:38 2014 - [warning] connection failed 2 time(s)..tue oct 28 11:31:39 2014 - [warning] got error on mysql connect: 2013 (lost connection to mysql server at 'reading initial communication packet', system error: 111)tue oct 28 11:31:39 2014 - [warning] connection failed 3 time(s)..tue oct 28 11:31:39 2014 - [warning] master is not reachable from health checker!tue oct 28 11:31:39 2014 - [warning] master 192.168.1.225(192.168.1.225:3306) is not reachable!tue oct 28 11:31:39 2014 - [warning] ssh is reachable.tue oct 28 11:31:39 2014 - [info] connecting to a master server failed. reading configuration file /etc/masterha_default.cnf and /etc/mha/mha.conf again, and trying to connect to all servers to check server status..tue oct 28 11:31:39 2014 - [warning] global configuration file /etc/masterha_default.cnf not found. skipping.tue oct 28 11:31:39 2014 - [info] reading application default configurations from /etc/mha/mha.conf..tue oct 28 11:31:39 2014 - [info] reading server configurations from /etc/mha/mha.conf..tue oct 28 11:31:39 2014 - [info] dead servers:tue oct 28 11:31:39 2014 - [info]   192.168.1.225(192.168.1.225:3306)tue oct 28 11:31:39 2014 - [info] alive servers:tue oct 28 11:31:39 2014 - [info]   192.168.1.226(192.168.1.226:3306)tue oct 28 11:31:39 2014 - [info]   192.168.1.227(192.168.1.227:3306)tue oct 28 11:31:39 2014 - [info] alive slaves:tue oct 28 11:31:39 2014 - [info]   192.168.1.226(192.168.1.226:3306)  version=5.6.17-log (oldest major version between slaves) log-bin:enabledtue oct 28 11:31:39 2014 - [info]     replicating from 192.168.1.225(192.168.1.225:3306)tue oct 28 11:31:39 2014 - [info]     primary candidate for the new master (candidate_master is set)tue oct 28 11:31:39 2014 - [info]   192.168.1.227(192.168.1.227:3306)  version=5.6.17-log (oldest major version between slaves) log-bin:enabledtue oct 28 11:31:39 2014 - [info]     replicating from 192.168.1.225(192.168.1.225:3306)tue oct 28 11:31:39 2014 - [info]     primary candidate for the new master (candidate_master is set)tue oct 28 11:31:39 2014 - [info] checking slave configurations..tue oct 28 11:31:39 2014 - [info] checking replication filtering settings..tue oct 28 11:31:39 2014 - [info]  replication filtering check ok.tue oct 28 11:31:39 2014 - [info] master is down!tue oct 28 11:31:39 2014 - [info] terminating monitoring script.tue oct 28 11:31:39 2014 - [info] got exit code 20 (master dead).tue oct 28 11:31:39 2014 - [info] mha::masterfailover version 0.53.tue oct 28 11:31:39 2014 - [info] starting master failover.tue oct 28 11:31:39 2014 - [info]tue oct 28 11:31:39 2014 - [info] * phase 1: configuration check phase..tue oct 28 11:31:39 2014 - [info]tue oct 28 11:31:39 2014 - [info] dead servers:tue oct 28 11:31:39 2014 - [info]   192.168.1.225(192.168.1.225:3306)tue oct 28 11:31:39 2014 - [info] checking master reachability via mysql(double check)..tue oct 28 11:31:39 2014 - [info]  ok.tue oct 28 11:31:39 2014 - [info] alive servers:tue oct 28 11:31:39 2014 - [info]   192.168.1.226(192.168.1.226:3306)tue oct 28 11:31:39 2014 - [info]   192.168.1.227(192.168.1.227:3306)tue oct 28 11:31:39 2014 - [info] alive slaves:tue oct 28 11:31:39 2014 - [info]   192.168.1.226(192.168.1.226:3306)  version=5.6.17-log (oldest major version between slaves) log-bin:enabledtue oct 28 11:31:39 2014 - [info]     replicating from 192.168.1.225(192.168.1.225:3306)tue oct 28 11:31:39 2014 - [info]     primary candidate for the new master (candidate_master is set)tue oct 28 11:31:39 2014 - [info]   192.168.1.227(192.168.1.227:3306)  version=5.6.17-log (oldest major version between slaves) log-bin:enabledtue oct 28 11:31:39 2014 - [info]     replicating from 192.168.1.225(192.168.1.225:3306)tue oct 28 11:31:39 2014 - [info]     primary candidate for the new master (candidate_master is set)tue oct 28 11:31:39 2014 - [info] ** phase 1: configuration check phase completed.tue oct 28 11:31:39 2014 - [info]tue oct 28 11:31:39 2014 - [info] * phase 2: dead master shutdown phase..tue oct 28 11:31:39 2014 - [info]tue oct 28 11:31:39 2014 - [info] forcing shutdown so that applications never connect to the current master..tue oct 28 11:31:39 2014 - [info] executing master ip deactivatation script:tue oct 28 11:31:39 2014 - [info]   /usr/local/scripts/master_ip_failover --orig_master_host=192.168.1.225 --orig_master_ip=192.168.1.225 --orig_master_port=3306 --command=stopssh --ssh_user=root  in script test====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 192.168.1.231;/sbin/arping -i eth0 -c 3 -s 192.168.1.231 192.168.1.1 >/dev/null 2>&1=== disabling the vip on old master: 192.168.1.225tue oct 28 11:31:39 2014 - [info]  done.tue oct 28 11:31:39 2014 - [warning] shutdown_script is not set. skipping explicit shutting down of the dead master.tue oct 28 11:31:39 2014 - [info] * phase 2: dead master shutdown phase completed.tue oct 28 11:31:39 2014 - [info]tue oct 28 11:31:39 2014 - [info] * phase 3: master recovery phase..tue oct 28 11:31:39 2014 - [info]tue oct 28 11:31:39 2014 - [info] * phase 3.1: getting latest slaves phase..tue oct 28 11:31:39 2014 - [info]tue oct 28 11:31:39 2014 - [info] the latest binary log file/position on all slaves is mysql-bin.000016:120tue oct 28 11:31:39 2014 - [info] latest slaves (slaves that received relay log files to the latest):tue oct 28 11:31:39 2014 - [info]   192.168.1.226(192.168.1.226:3306)  version=5.6.17-log (oldest major version between slaves) log-bin:enabledtue oct 28 11:31:39 2014 - [info]     replicating from 192.168.1.225(192.168.1.225:3306)tue oct 28 11:31:39 2014 - [info]     primary candidate for the new master (candidate_master is set)tue oct 28 11:31:39 2014 - [info]   192.168.1.227(192.168.1.227:3306)  version=5.6.17-log (oldest major version between slaves) log-bin:enabledtue oct 28 11:31:39 2014 - [info]     replicating from 192.168.1.225(192.168.1.225:3306)tue oct 28 11:31:39 2014 - [info]     primary candidate for the new master (candidate_master is set)tue oct 28 11:31:39 2014 - [info] the oldest binary log file/position on all slaves is mysql-bin.000016:120tue oct 28 11:31:39 2014 - [info] oldest slaves:tue oct 28 11:31:39 2014 - [info]   192.168.1.226(192.168.1.226:3306)  version=5.6.17-log (oldest major version between slaves) log-bin:enabledtue oct 28 11:31:39 2014 - [info]     replicating from 192.168.1.225(192.168.1.225:3306)tue oct 28 11:31:39 2014 - [info]     primary candidate for the new master (candidate_master is set)tue oct 28 11:31:39 2014 - [info]   192.168.1.227(192.168.1.227:3306)  version=5.6.17-log (oldest major version between slaves) log-bin:enabledtue oct 28 11:31:39 2014 - [info]     replicating from 192.168.1.225(192.168.1.225:3306)tue oct 28 11:31:39 2014 - [info]     primary candidate for the new master (candidate_master is set)tue oct 28 11:31:39 2014 - [info]tue oct 28 11:31:39 2014 - [info] * phase 3.2: saving dead master's binlog phase..tue oct 28 11:31:39 2014 - [info]tue oct 28 11:31:40 2014 - [info] fetching dead master's binary logs..tue oct 28 11:31:40 2014 - [info] executing command on the dead master 192.168.1.225(192.168.1.225:3306): save_binary_logs --command=save --start_file=mysql-bin.000016  --start_pos=120 --binlog_dir=/mydata --output_file=/usr/local/mha/saved_master_binlog_from_192.168.1.225_3306_20141028113139.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53creating /usr/local/mha if not exists..    ok.concat binary/relay logs from mysql-bin.000016 pos 120 to mysql-bin.000016 eof into /usr/local/mha/saved_master_binlog_from_192.168.1.225_3306_20141028113139.binlog ..dumping binlog format description event, from position 0 to 120.. ok.dumping effective binlog data from /mydata/mysql-bin.000016 position 120 to tail(143).. ok.concat succeeded.tue oct 28 11:31:41 2014 - [info] scp from root@192.168.1.225:/usr/local/mha/saved_master_binlog_from_192.168.1.225_3306_20141028113139.binlog to local:/usr/local/mha/saved_master_binlog_from_192.168.1.225_3306_20141028113139.binlog succeeded.tue oct 28 11:31:41 2014 - [info] healthcheck: ssh to 192.168.1.226 is reachable.tue oct 28 11:31:42 2014 - [info] healthcheck: ssh to 192.168.1.227 is reachable.tue oct 28 11:31:42 2014 - [info]tue oct 28 11:31:42 2014 - [info] * phase 3.3: determining new master phase..tue oct 28 11:31:42 2014 - [info]tue oct 28 11:31:42 2014 - [info] finding the latest slave that has all relay logs for recovering other slaves..tue oct 28 11:31:42 2014 - [info] all slaves received relay logs to the same position. no need to resync each other.tue oct 28 11:31:42 2014 - [info] searching new master from slaves..tue oct 28 11:31:42 2014 - [info]  candidate masters from the configuration file:tue oct 28 11:31:42 2014 - [info]   192.168.1.226(192.168.1.226:3306)  version=5.6.17-log (oldest major version between slaves) log-bin:enabledtue oct 28 11:31:42 2014 - [info]     replicating from 192.168.1.225(192.168.1.225:3306)tue oct 28 11:31:42 2014 - [info]     primary candidate for the new master (candidate_master is set)tue oct 28 11:31:42 2014 - [info]   192.168.1.227(192.168.1.227:3306)  version=5.6.17-log (oldest major version between slaves) log-bin:enabledtue oct 28 11:31:42 2014 - [info]     replicating from 192.168.1.225(192.168.1.225:3306)tue oct 28 11:31:42 2014 - [info]     primary candidate for the new master (candidate_master is set)tue oct 28 11:31:42 2014 - [info]  non-candidate masters:tue oct 28 11:31:42 2014 - [info]  searching from candidate_master slaves which have received the latest relay log events..tue oct 28 11:31:42 2014 - [info] new master is 192.168.1.226(192.168.1.226:3306)tue oct 28 11:31:42 2014 - [info] starting master failover..tue oct 28 11:31:42 2014 - [info]from:192.168.1.225 (current master)+--192.168.1.226+--192.168.1.227 to:192.168.1.226 (new master)+--192.168.1.227tue oct 28 11:31:42 2014 - [info]tue oct 28 11:31:42 2014 - [info] * phase 3.3: new master diff log generation phase..tue oct 28 11:31:42 2014 - [info]tue oct 28 11:31:42 2014 - [info]  this server has all relay logs. no need to generate diff files from the latest slave.tue oct 28 11:31:42 2014 - [info] sending binlog..tue oct 28 11:31:43 2014 - [info] scp from local:/usr/local/mha/saved_master_binlog_from_192.168.1.225_3306_20141028113139.binlog to root@192.168.1.226:/usr/local/mha/saved_master_binlog_from_192.168.1.225_3306_20141028113139.binlog succeeded.tue oct 28 11:31:43 2014 - [info]tue oct 28 11:31:43 2014 - [info] * phase 3.4: master log apply phase..tue oct 28 11:31:43 2014 - [info]tue oct 28 11:31:43 2014 - [info] *notice: if any error happens from this phase, manual recovery is needed.tue oct 28 11:31:43 2014 - [info] starting recovery on 192.168.1.226(192.168.1.226:3306)..tue oct 28 11:31:43 2014 - [info]  generating diffs succeeded.tue oct 28 11:31:43 2014 - [info] waiting until all relay logs are applied.tue oct 28 11:31:43 2014 - [info]  done.tue oct 28 11:31:43 2014 - [info] getting slave status..tue oct 28 11:31:43 2014 - [info] this slave(192.168.1.226)'s exec_master_log_pos equals to read_master_log_pos(mysql-bin.000016:120). no need to recover from exec_master_log_pos.tue oct 28 11:31:43 2014 - [info] connecting to the target slave host 192.168.1.226, running recover script..tue oct 28 11:31:43 2014 - [info] executing command: apply_diff_relay_logs --command=apply --slave_user=root --slave_host=192.168.1.226 --slave_ip=192.168.1.226  --slave_port=3306 --apply_files=/usr/local/mha/saved_master_binlog_from_192.168.1.225_3306_20141028113139.binlog --workdir=/usr/local/mha --target_version=5.6.17-log --timestamp=20141028113139 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxxtue oct 28 11:31:43 2014 - [info]applying differential binary/relay log files /usr/local/mha/saved_master_binlog_from_192.168.1.225_3306_20141028113139.binlog on 192.168.1.226:3306. this may take long time...applying log files succeeded.tue oct 28 11:31:43 2014 - [info]  all relay logs were successfully applied.tue oct 28 11:31:43 2014 - [info] getting new master's binlog name and position..tue oct 28 11:31:43 2014 - [info]  mysql-bin.000010:120tue oct 28 11:31:43 2014 - [info]  all other slaves should start replication from here. statement should be: change master to master_host='192.168.1.226', master_port=3306, master_log_file='mysql-bin.000010', master_log_pos=120, master_user='replication', master_password='xxx';tue oct 28 11:31:43 2014 - [info] executing master ip activate script:tue oct 28 11:31:43 2014 - [info]   /usr/local/scripts/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.1.225 --orig_master_ip=192.168.1.225 --orig_master_port=3306 --new_master_host=192.168.1.226 --new_master_ip=192.168.1.226 --new_master_port=3306  in script test====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 192.168.1.231;/sbin/arping -i eth0 -c 3 -s 192.168.1.231 192.168.1.1 >/dev/null 2>&1=== enabling the vip - 192.168.1.231 on the new master - 192.168.1.226tue oct 28 11:31:46 2014 - [info]  ok.tue oct 28 11:31:46 2014 - [info] setting read_only=0 on 192.168.1.226(192.168.1.226:3306)..tue oct 28 11:31:46 2014 - [info]  ok.tue oct 28 11:31:46 2014 - [info] ** finished master recovery successfully.tue oct 28 11:31:46 2014 - [info] * phase 3: master recovery phase completed.tue oct 28 11:31:46 2014 - [info]tue oct 28 11:31:46 2014 - [info] * phase 4: slaves recovery phase..tue oct 28 11:31:46 2014 - [info]tue oct 28 11:31:46 2014 - [info] * phase 4.1: starting parallel slave diff log generation phase..tue oct 28 11:31:46 2014 - [info]tue oct 28 11:31:46 2014 - [info] -- slave diff file generation on host 192.168.1.227(192.168.1.227:3306) started, pid: 2761. check tmp log /usr/local/mha/192.168.1.227_3306_20141028113139.log if it takes time..tue oct 28 11:31:46 2014 - [info]tue oct 28 11:31:46 2014 - [info] log messages from 192.168.1.227 ...tue oct 28 11:31:46 2014 - [info]tue oct 28 11:31:46 2014 - [info]  this server has all relay logs. no need to generate diff files from the latest slave.tue oct 28 11:31:46 2014 - [info] end of log messages from 192.168.1.227.tue oct 28 11:31:46 2014 - [info] -- 192.168.1.227(192.168.1.227:3306) has the latest relay log events.tue oct 28 11:31:46 2014 - [info] generating relay diff files from the latest slave succeeded.tue oct 28 11:31:46 2014 - [info]tue oct 28 11:31:46 2014 - [info] * phase 4.2: starting parallel slave log apply phase..tue oct 28 11:31:46 2014 - [info]tue oct 28 11:31:46 2014 - [info] -- slave recovery on host 192.168.1.227(192.168.1.227:3306) started, pid: 2763. check tmp log /usr/local/mha/192.168.1.227_3306_20141028113139.log if it takes time..tue oct 28 11:31:48 2014 - [info]tue oct 28 11:31:48 2014 - [info] log messages from 192.168.1.227 ...tue oct 28 11:31:48 2014 - [info]tue oct 28 11:31:46 2014 - [info] sending binlog..tue oct 28 11:31:47 2014 - [info] scp from local:/usr/local/mha/saved_master_binlog_from_192.168.1.225_3306_20141028113139.binlog to root@192.168.1.227:/usr/local/mha/saved_master_binlog_from_192.168.1.225_3306_20141028113139.binlog succeeded.tue oct 28 11:31:47 2014 - [info] starting recovery on 192.168.1.227(192.168.1.227:3306)..tue oct 28 11:31:47 2014 - [info]  generating diffs succeeded.tue oct 28 11:31:47 2014 - [info] waiting until all relay logs are applied.tue oct 28 11:31:47 2014 - [info]  done.tue oct 28 11:31:47 2014 - [info] getting slave status..tue oct 28 11:31:47 2014 - [info] this slave(192.168.1.227)'s exec_master_log_pos equals to read_master_log_pos(mysql-bin.000016:120). no need to recover from exec_master_log_pos.tue oct 28 11:31:47 2014 - [info] connecting to the target slave host 192.168.1.227, running recover script..tue oct 28 11:31:47 2014 - [info] executing command: apply_diff_relay_logs --command=apply --slave_user=root --slave_host=192.168.1.227 --slave_ip=192.168.1.227  --slave_port=3306 --apply_files=/usr/local/mha/saved_master_binlog_from_192.168.1.225_3306_20141028113139.binlog --workdir=/usr/local/mha --target_version=5.6.17-log --timestamp=20141028113139 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxxtue oct 28 11:31:47 2014 - [info]applying differential binary/relay log files /usr/local/mha/saved_master_binlog_from_192.168.1.225_3306_20141028113139.binlog on 192.168.1.227:3306. this may take long time...applying log files succeeded.tue oct 28 11:31:47 2014 - [info]  all relay logs were successfully applied.tue oct 28 11:31:47 2014 - [info]  resetting slave 192.168.1.227(192.168.1.227:3306) and starting replication from the new master 192.168.1.226(192.168.1.226:3306)..tue oct 28 11:31:48 2014 - [info]  executed change master.tue oct 28 11:31:48 2014 - [info]  slave started.tue oct 28 11:31:48 2014 - [info] end of log messages from 192.168.1.227.tue oct 28 11:31:48 2014 - [info] -- slave recovery on host 192.168.1.227(192.168.1.227:3306) succeeded.tue oct 28 11:31:48 2014 - [info] all new slave servers recovered successfully.tue oct 28 11:31:48 2014 - [info]tue oct 28 11:31:48 2014 - [info] * phase 5: new master cleanup phease..tue oct 28 11:31:48 2014 - [info]tue oct 28 11:31:48 2014 - [info] resetting slave info on the new master..tue oct 28 11:31:48 2014 - [info]  192.168.1.226: resetting slave info succeeded.tue oct 28 11:31:48 2014 - [info] master failover to 192.168.1.226(192.168.1.226:3306) completed successfully.tue oct 28 11:31:48 2014 - [info] ----- failover report ----- mha: mysql master failover 192.168.1.225 to 192.168.1.226 succeeded master 192.168.1.225 is down! check mha manager logs at test_redis:/usr/local/mha/manager.log for details. started automated(non-interactive) failover.invalidated master ip address on 192.168.1.225.the latest slave 192.168.1.226(192.168.1.226:3306) has all relay logs for recovery.selected 192.168.1.226 as a new master.192.168.1.226: ok: applying all logs succeeded.192.168.1.226: ok: activated master ip address.192.168.1.227: this host has the latest relay log events.generating relay diff files from&nb

再谈MysqlMHA

推荐信息