While testing manual failover and online master switching with MHA, I ran into two rather odd problems. Both tests failed whenever the hosts were specified by IP address, with the errors "detected dead master xxx does not match with specified dead master" and "xxx is not alive". Below are descriptions of the two errors and the fix.
1. MHA configuration file
[root@vdbsrv4 ~]# more /etc/masterha/app1.cnf
[server default]
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/manager.log
user=mha
password=xxx
ssh_user=root
repl_user=repl
repl_password=repl
ping_interval=1
shutdown_script=
master_ip_online_change_script=
report_script=
#master_ip_failover_script=/usr/bin/master_ip_failover
master_ip_failover_script=/tmp/master_ip_failover
[server1]
hostname=vdbsrv1
master_binlog_dir=/data/mysqldata
[server2]
hostname=vdbsrv2
master_binlog_dir=/data/mysqldata
[server3]
hostname=vdbsrv3
master_binlog_dir=/data/mysqldata/
#candidate_master=1
2. Error output during manual failover
[root@vdbsrv4 ~]# masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=192.168.1.6 \
> --dead_master_port=3306 --new_master_host=192.168.1.8 --new_master_port=3306 --ignore_last_failover
--dead_master_ip= is not set. using 192.168.1.6.
wed apr 21 09:08:30 2015 - [warning] global configuration file /etc/masterha_default.cnf not found. skipping.
wed apr 21 09:08:30 2015 - [info] reading application default configuration from /etc/masterha/app1.cnf..
wed apr 21 09:08:30 2015 - [info] reading server configuration from /etc/masterha/app1.cnf..
wed apr 21 09:08:30 2015 - [info] mha::masterfailover version 0.56.
wed apr 21 09:08:30 2015 - [info] starting master failover.
wed apr 21 09:08:30 2015 - [info]
wed apr 21 09:08:30 2015 - [info] * phase 1: configuration check phase..
wed apr 21 09:08:30 2015 - [info]
wed apr 21 09:08:31 2015 - [info] gtid failover mode = 0
wed apr 21 09:08:31 2015 - [error][/usr/lib/perl5/site_perl/5.8.8/mha/masterfailover.pm, ln2083] detected dead master vdbsrv1(192.168.1.6:3306)
does not match with specified dead master 192.168.1.6(192.168.1.6:3306)!
wed apr 21 09:08:31 2015 - [error][/usr/lib/perl5/site_perl/5.8.8/mha/masterfailover.pm, ln2151]
got error: at /usr/bin/masterha_master_switch line 53
3. Error output during online switching
[root@vdbsrv4 ~]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.1.8 \
> --orig_master_is_new_slave --running_updates_limit=10000
tue apr 21 11:50:14 2015 - [info] mha::masterrotate version 0.56.
tue apr 21 11:50:14 2015 - [info] starting online master switch..
tue apr 21 11:50:14 2015 - [info]
tue apr 21 11:50:14 2015 - [info] * phase 1: configuration check phase..
tue apr 21 11:50:14 2015 - [info]
tue apr 21 11:50:14 2015 - [warning] global configuration file /etc/masterha_default.cnf not found. skipping.
tue apr 21 11:50:14 2015 - [info] reading application default configuration from /etc/masterha/app1.cnf..
tue apr 21 11:50:14 2015 - [info] reading server configuration from /etc/masterha/app1.cnf..
tue apr 21 11:50:14 2015 - [info] gtid failover mode = 0
tue apr 21 11:50:14 2015 - [info] current alive master: vdbsrv1(192.168.1.6:3306)
tue apr 21 11:50:14 2015 - [info] alive slaves:
tue apr 21 11:50:14 2015 - [info] vdbsrv2(192.168.1.7:3306) version=5.6.22-log (oldest major version between slaves) log-bin:enabled
tue apr 21 11:50:14 2015 - [info] replicating from 192.168.1.6(192.168.1.6:3306)
tue apr 21 11:50:14 2015 - [info] vdbsrv3(192.168.1.8:3306) version=5.6.22-log (oldest major version between slaves) log-bin:enabled
tue apr 21 11:50:14 2015 - [info] replicating from 192.168.1.6(192.168.1.6:3306)
it is better to execute flush no_write_to_binlog tables on the master before switching. is it ok to execute on vdbsrv1(192.168.1.6:3306)? (yes/no): yes
tue apr 21 11:50:41 2015 - [info] executing flush no_write_to_binlog tables. this may take long time..
tue apr 21 11:50:41 2015 - [info] ok.
tue apr 21 11:50:41 2015 - [info] checking mha is not monitoring or doing failover..
tue apr 21 11:50:41 2015 - [info] checking replication health on vdbsrv2..
tue apr 21 11:50:41 2015 - [info] ok.
tue apr 21 11:50:41 2015 - [info] checking replication health on vdbsrv3..
tue apr 21 11:50:41 2015 - [info] ok.
tue apr 21 11:50:41 2015 - [error][/usr/lib/perl5/site_perl/5.8.8/mha/masterrotate.pm, ln228] 192.168.1.8 is not alive!
tue apr 21 11:50:41 2015 - [error][/usr/lib/perl5/site_perl/5.8.8/mha/masterrotate.pm, ln613] failed to get new master!
tue apr 21 11:50:41 2015 - [error][/usr/lib/perl5/site_perl/5.8.8/mha/masterrotate.pm, ln652] got error: at /usr/bin/masterha_master_switch line 53
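4. Solution
Replacing the IP addresses with hostnames resolved both problems, so the successful runs are not shown again here.
According to the official documentation, the parameter is --dead_master_host=(hostname): it expects a hostname, not an IP address. The error in section 2 shows the mismatch directly: the configuration file identifies the master as vdbsrv1, while the command line supplied 192.168.1.6.
The documentation also says of --dead_master_ip and --dead_master_port: "If these parameters are not set, --dead_master_ip will be the result of gethostbyname(dead_master_host), and --dead_master_port will be 3306."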
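As a rough guard against this mistake, a small check can confirm that a host value appears verbatim as a hostname= entry in the MHA configuration file before it is passed to masterha_master_switch. The following is only a sketch: the function name host_in_mha_conf is hypothetical and not part of MHA, and the mapping of 192.168.1.6 to vdbsrv1 and 192.168.1.8 to vdbsrv3 is taken from the logs above.

```shell
# Hypothetical helper (not part of MHA): return 0 only if the value in $1
# appears exactly as a "hostname=" entry in the MHA config file given in $2.
host_in_mha_conf() {
    awk -F= -v want="$1" '
        {
            key = $1; val = $2
            gsub(/[ \t]/, "", key)      # tolerate spaces around the key
            gsub(/[ \t\r]/, "", val)    # tolerate spaces around the value
        }
        key == "hostname" && val == want { found = 1 }
        END { exit found ? 0 : 1 }
    ' "$2"
}

# Example usage (commented out; requires an MHA manager node):
#
#   conf=/etc/masterha/app1.cnf
#   host_in_mha_conf vdbsrv1 "$conf" && host_in_mha_conf vdbsrv3 "$conf" || exit 1
#
#   # Manual failover, with hostnames instead of IP addresses
#   masterha_master_switch --master_state=dead --conf="$conf" \
#       --dead_master_host=vdbsrv1 --dead_master_port=3306 \
#       --new_master_host=vdbsrv3 --new_master_port=3306 --ignore_last_failover
#
#   # Online switch, with a hostname instead of an IP address
#   masterha_master_switch --conf="$conf" --master_state=alive \
#       --new_master_host=vdbsrv3 --orig_master_is_new_slave \
#       --running_updates_limit=10000
```

With the configuration from section 1, the check would succeed for vdbsrv1 and fail for 192.168.1.6, which mirrors the mismatch reported in the failover log.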