keepalived
Since MySQL Cluster is used in our production environment, high availability and load balancing are required; here keepalived + haproxy is used to provide them.
keepalived's main job is fault isolation of the real servers and failover between the load balancers themselves. It can check at layers 3, 4 and 5 and is built on the VRRPv2 (Virtual Router Redundancy Protocol) stack.
layer3: keepalived periodically sends an ICMP packet (the same ping we use every day) to each server in the pool. If a server's IP address does not respond, keepalived declares that server failed and removes it from the pool; a typical case is a server that has been shut down abnormally. In layer-3 mode, whether the server's IP address is alive is the criterion for whether the server is healthy.
layer4: the state of a TCP port decides whether the server is healthy. For example, a web server normally listens on port 80; if keepalived detects that port 80 is not up, it removes the server from the pool.
layer5: this costs more network bandwidth. keepalived checks, according to user-defined criteria, whether the server's application is behaving correctly; if the result does not match what the user configured, the server is removed from the pool. (A shell sketch of these three check levels follows below.)
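As a rough illustration only (a shell analogy of the three check levels, not how keepalived works internally; the addresses and the dave account are taken from the example later in this post):

    # layer 3: is the machine's IP reachable at all? (ICMP echo)
    ping -c 1 10.1.6.203 >/dev/null && echo "layer 3 ok"

    # layer 4: is the service's TCP port open?
    nc -z -w 2 10.1.6.203 3306 && echo "layer 4 ok"

    # layer 5: does the application itself answer correctly?
    # (application-level probe, similar to the mysqlchk script used later in this post)
    mysql -h 10.1.6.203 -udave -p -e "select 1" >/dev/null && echo "layer 5 ok"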
software design
After keepalived starts, the following processes exist:
8352 ?  Ss   0:00 /usr/sbin/keepalived
8353 ?  S    0:00  \_ /usr/sbin/keepalived
8356 ?  S    0:01  \_ /usr/sbin/keepalived
Parent process: memory management, child process management, and so on
Child process: the VRRP child process
Child process: the healthchecking child process
Example
Two mysqlcluster machines: 10.1.6.203 (master) and 10.1.6.205 (backup)
vip: 10.1.6.173
Goal: connections to 10.1.6.173 port 3366 are forwarded by haproxy, round-robin, to 10.1.6.203:3306 and 10.1.6.205:3306
For building the mysqlcluster itself, see the earlier blog post; here keepalived is installed on both machines.
root@10.1.6.203:~# apt-get install keepalived
root@10.1.6.203:~# cat /etc/keepalived/keepalived.conf
vrrp_script chk_haproxy {
    script "killall -0 haproxy"     # verify the pid exists
    interval 2                      # check every 2 seconds
    weight -2                       # subtract 2 from the priority if the check fails
}

vrrp_instance vi_1 {
    interface eth1                  # interface to monitor
    state master
    virtual_router_id 51            # assign one id for this route
    priority 101                    # 101 on master, 100 on backup
    nopreempt
    debug
    virtual_ipaddress {
        10.1.6.173
    }
    track_script {                  # note the space before the brace
        chk_haproxy
    }
    notify_master /etc/keepalived/scripts/start_haproxy.sh     # script run when switching to the master state
    notify_fault /etc/keepalived/scripts/stop_keepalived.sh    # script run when entering the fault state
    notify_stop /etc/keepalived/scripts/stop_haproxy.sh        # script run just before keepalived stops
}
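The probe that chk_haproxy runs is just signal 0: "killall -0" does not kill anything, it only reports via its exit status whether a haproxy process exists. A quick way to see this from the shell:

    # exit status 0 while haproxy is running, non-zero once it is stopped
    killall -0 haproxy; echo $?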
The vrrpd configuration consists of three classes:
VRRP synchronization group, VRRP instance, VRRP script. Here a VRRP instance and a VRRP script are used.
Notes on the configuration options:
state: specifies the initial state of the instance. After the configuration is loaded, this is the state the node starts in, but it is not final: the actual master is still chosen by election based on priority. For example, if this node is set to master here but its priority is lower than the other node's, then when it advertises its own priority the other node will see that its own is higher and will preempt it to become master.
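To watch this election happen, the VRRP advertisements (which carry the sender's priority) can be captured on the monitored interface; a small sketch, assuming eth1 as in the configuration above:

    # VRRP advertisements are multicast to 224.0.0.18 using IP protocol 112;
    # the current master sends one about every second
    tcpdump -i eth1 -n 'ip proto 112'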
interface: the interface the instance is bound to, because the virtual IP must be added on an existing interface.
priority 101: the priority of this node; the node with the higher priority becomes master.
debug: debug level.
nopreempt: do not preempt.

vrrp_script chk_haproxy {
    script "killall -0 haproxy"     # verify the pid exists
    interval 2                      # check every 2 seconds (how often the script runs)
    weight -2                       # priority change driven by the script result: 2 means priority +2, -2 means priority -2
}

With weight -2, the master's effective priority drops from 101 to 99 as soon as the haproxy check fails, which is below the backup's 100, so the backup takes over the vip.
Then it is referenced inside the instance (vrrp_instance), much like calling a function in a script: define it first, then reference it by name:
track_script {
    chk_haproxy
}
Note: a vrrp script (vrrp_script) and a vrrp instance (vrrp_instance) sit at the same level in the configuration.
root@10.1.6.203:scripts# cat start_haproxy.sh
#!/bin/bash
sleep 5
get=`ip addr | grep 10.1.6.173 | wc -l`
echo $get >> /etc/keepalived/scripts/start_ha.log
if [ $get -eq 1 ]
then
    echo "`date +%c` success to get vip" >> /etc/keepalived/scripts/start_ha.log
    /usr/local/sbin/haproxy -f /etc/haproxy/haproxy.cfg
else
    echo "`date +%c` can not get vip" >> /etc/keepalived/scripts/start_ha.log
fi

root@10.1.6.203:scripts# cat stop_keepalived.sh
#!/bin/bash
pid=`pidof keepalived`
if [ "$pid" == "" ]
then
    echo "`date +%c` no keepalived process id" >> /etc/keepalived/scripts/stop_keep.log
else
    echo "`date +%c` will stop keepalived" >> /etc/keepalived/scripts/stop_keep.log
    /etc/init.d/keepalived stop
fi

root@10.1.6.203:scripts# cat stop_haproxy.sh
#!/bin/bash
pid=`pidof haproxy`
echo "`date +%c` stop haproxy" >> /etc/keepalived/scripts/stop_ha.log
kill -9 $pid
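keepalived executes these notify scripts directly, so they have to be executable; a quick sanity check (paths as used above):

    chmod +x /etc/keepalived/scripts/*.sh
    bash -n /etc/keepalived/scripts/start_haproxy.sh    # syntax check only, does not run the script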
Configure 10.1.6.205 in the same way:
root@10.1.6.205:~# cat /etc/keepalived/keepalived.conf
vrrp_script chk_haproxy {
    script "killall -0 haproxy"     # verify the pid exists
    interval 2                      # check every 2 seconds
    weight 2                        # add 2 points of prio if ok
}

vrrp_instance vi_1 {
    interface eth1                  # interface to monitor
    state backup
    virtual_router_id 51            # assign one id for this route
    priority 100                    # 101 on master, 100 on backup
    virtual_ipaddress {
        10.1.6.173
    }
    track_script {
        chk_haproxy
    }
    notify_master /etc/keepalived/scripts/start_haproxy.sh
    notify_fault /etc/keepalived/scripts/stop_keepalived.sh
    notify_stop /etc/keepalived/scripts/stop_haproxy.sh
}
haproxy
Next, a short introduction to haproxy.
haproxy is a proxy for TCP (layer 4) and HTTP (layer 7) applications and can also act as a load balancer. It supports tens of thousands of concurrent connections and, through port mapping, keeps the backend servers from being exposed directly to the network. It also ships with a status page for monitoring server state.
Install haproxy
wget -O /tmp/haproxy-1.4.22.tar.gz http://haproxy.1wt.eu/download/1.4/src/haproxy-1.4.22.tar.gz
tar xvfz /tmp/haproxy-1.4.22.tar.gz -C /tmp/
cd /tmp/haproxy-1.4.22
make TARGET=linux26
make install
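After make install the binary should land in /usr/local/sbin (the path that start_haproxy.sh above already assumes); a quick check that the build worked:

    /usr/local/sbin/haproxy -v    # prints the haproxy version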
haproxy needs to health-check each mysqlcluster server.
1. Configure haproxy.cfg on both hosts
root@10.1.6.203:scripts# cat /etc/haproxy/haproxy.cfg
global
    maxconn 51200                       # default maximum number of connections
    #uid 99
    #gid 99
    daemon                              # run haproxy in the background
    #quiet
    nbproc 1                            # number of processes (more can be configured for performance)
    pidfile /etc/haproxy/haproxy.pid    # pid file path; the user starting the process must be able to access it

defaults
    mode tcp                            # processing mode (http = layer 7, tcp = layer 4)
    option redispatch                   # if the server bound to a serverid goes down, redirect to another healthy server
    option abortonclose                 # under high load, drop connections that have been queued the longest
    timeout connect 5000s               # connect timeout
    timeout client 50000s               # client timeout
    timeout server 50000s               # server timeout
    log 127.0.0.1 local0                # error logging
    balance roundrobin                  # default load balancing method: round robin

listen proxy
    bind 10.1.6.173:3366                # listening address and port
    mode tcp                            # layer 4 tcp mode
    option httpchk                      # health check
    # server definitions: "check port 9222 inter 12000" is the health-check port and interval,
    # "rise 3" marks the server up after 3 successful checks, "fall 3" marks it down after 3 failures,
    # "weight" is the server weight
    server db1 10.1.6.203:3306 weight 1 check port 9222 inter 12000 rise 3 fall 3
    server db2 10.1.6.205:3306 weight 1 check port 9222 inter 12000 rise 3 fall 3

listen haproxy_stats
    mode http
    bind 10.1.6.173:8888
    option httplog
    stats refresh 5s
    stats uri /status                   # status url; returns 200 when available, 503 otherwise
    stats realm haproxy manager
    stats auth admin:p@a1szs24          # stats account and password

root@10.1.6.205:~$ cat /etc/haproxy/haproxy.cfg
global
    maxconn 51200
    #uid 99
    #gid 99
    daemon
    #quiet
    nbproc 1
    pidfile /etc/haproxy/haproxy.pid

defaults
    mode tcp
    option redispatch
    option abortonclose
    timeout connect 5000s
    timeout client 50000s
    timeout server 50000s
    log 127.0.0.1 local0
    balance roundrobin

listen proxy
    bind 10.1.6.173:3366
    mode tcp
    option httpchk
    server db1 10.1.6.203:3306 weight 1 check port 9222 inter 12000 rise 3 fall 3
    server db2 10.1.6.205:3306 weight 1 check port 9222 inter 12000 rise 3 fall 3

listen haproxy_stats
    mode http
    bind 10.1.6.173:8888
    option httplog
    stats refresh 5s
    stats uri /status
    stats realm haproxy manager
    stats auth admin:p@a1szs24
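Before starting haproxy, the configuration can be checked for syntax errors. Note that the frontend binds the vip 10.1.6.173, which is exactly why start_haproxy.sh only launches haproxy after confirming the node currently holds the vip:

    /usr/local/sbin/haproxy -c -f /etc/haproxy/haproxy.cfg    # configuration check mode, does not start the proxy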
2. Install xinetd
root@10.1.6.203:~# apt-get install xinetd
3. On each node, add the xinetd service definition and the mysqlchk port number
root@10.1.6.203:~# vim /etc/xinetd.d/mysqlchk
# default: on
# description: mysqlchk
service mysqlchk                        # the service name must also be declared in /etc/services (below)
{
    flags           = REUSE
    socket_type     = stream
    port            = 9222
    wait            = no
    user            = nobody
    server          = /opt/mysqlchk
    log_on_failure  += USERID
    disable         = no
    per_source      = UNLIMITED
    bind            = 10.1.6.173
}

root@10.1.6.203:~# vim /etc/services
mysqlchk        9222/tcp                # mysqlchk
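After adding the service definition, restart xinetd and confirm that port 9222 is listening (init-script path as on the Debian/Ubuntu systems used in this post):

    root@10.1.6.203:~# /etc/init.d/xinetd restart
    root@10.1.6.203:~# netstat -tnlp | grep 9222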
4. Write the mysqlchk monitoring script
root@10.1.6.203:~# ls -l /opt/mysqlchk
-rwxr--r-- 1 nobody root 1994 2013-09-17 11:27 /opt/mysqlchk
root@10.1.6.203:~# cat /opt/mysqlchk
#!/bin/bash
#
# This script checks if a mysql server is healthy running on localhost. It will
# return:
# "HTTP/1.x 200 OK\r" (if mysql is running smoothly)
# - OR -
# "HTTP/1.x 500 Internal Server Error\r" (else)
#
# The purpose of this script is to make haproxy capable of monitoring mysql properly
#
MYSQL_HOST="localhost"
MYSQL_SOCKET="/var/run/mysqld/mysqld.sock"
MYSQL_USERNAME="mysqlchkusr"        # this account/password must be created in mysql
MYSQL_PASSWORD="secret"
MYSQL_OPTS="-N -q -A"
TMP_FILE="/dev/shm/mysqlchk.$$.out"
ERR_FILE="/dev/shm/mysqlchk.$$.err"
FORCE_FAIL="/dev/shm/proxyoff"
MYSQL_BIN="/opt/mysqlcluster/mysql-cluster-gpl-7.2.6-linux2.6-x86_64/bin/mysql"
CHECK_QUERY="select 1"

preflight_check()
{
    for I in "$TMP_FILE" "$ERR_FILE"; do
        if [ -f "$I" ]; then
            if [ ! -w "$I" ]; then
                echo -e "HTTP/1.1 503 Service Unavailable\r\n"
                echo -e "Content-Type: text/plain\r\n"
                echo -e "\r\n"
                echo -e "Cannot write to $I\r\n"
                echo -e "\r\n"
                exit 1
            fi
        fi
    done
}

return_ok()
{
    echo -e "HTTP/1.1 200 OK\r\n"
    echo -e "Content-Type: text/html\r\n"
    echo -e "Content-Length: 43\r\n"
    echo -e "\r\n"
    echo -e "MySQL is running.\r\n"
    echo -e "\r\n"
    rm $ERR_FILE $TMP_FILE
    exit 0
}

return_fail()
{
    echo -e "HTTP/1.1 503 Service Unavailable\r\n"
    echo -e "Content-Type: text/html\r\n"
    echo -e "Content-Length: 42\r\n"
    echo -e "\r\n"
    echo -e "MySQL is *down*.\r\n"
    sed -e 's/\n$/\r\n/' $ERR_FILE
    echo -e "\r\n"
    rm $ERR_FILE $TMP_FILE
    exit 1
}

preflight_check

if [ -f "$FORCE_FAIL" ]; then
    echo "$FORCE_FAIL found" > $ERR_FILE
    return_fail;
fi

$MYSQL_BIN $MYSQL_OPTS --host=$MYSQL_HOST --socket=$MYSQL_SOCKET --user=$MYSQL_USERNAME --password=$MYSQL_PASSWORD -e "$CHECK_QUERY" > $TMP_FILE 2> $ERR_FILE
if [ $? -ne 0 ]; then
    return_fail;
fi

return_ok;
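The mysqlchkusr account referenced in the script still has to exist in MySQL, and the check can be exercised both directly and through xinetd. A sketch, assuming the user/password values from the script and that port 9222 answers on the address haproxy checks:

    # create the health-check account (values as assumed in /opt/mysqlchk)
    mysql -uroot -p -e "GRANT USAGE ON *.* TO 'mysqlchkusr'@'localhost' IDENTIFIED BY 'secret'; FLUSH PRIVILEGES;"

    # run the check directly ...
    /opt/mysqlchk

    # ... and through xinetd, the way haproxy sees it
    echo | nc 10.1.6.203 9222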
Test
Start keepalived on both nodes (the master node acquires the vip and automatically brings up haproxy), then start xinetd.
root@10.1.6.203:~# ip add
1: lo: mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
2: eth0: mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
    link/ether 00:26:b9:36:0f:81 brd ff:ff:ff:ff:ff:ff
    inet 211.151.105.186/26 brd 211.151.105.191 scope global eth0
3: eth1: mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:26:b9:36:0f:83 brd ff:ff:ff:ff:ff:ff
    inet 10.1.6.203/24 brd 10.1.6.255 scope global eth1
    inet 10.1.6.173/32 scope global eth1
4: eth2: mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:26:b9:36:0f:85 brd ff:ff:ff:ff:ff:ff
5: eth3: mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:26:b9:36:0f:87 brd ff:ff:ff:ff:ff:ff
root@10.1.6.203:~# netstat -tunlp | grep ha
tcp   0   0 10.1.6.173:3366   0.0.0.0:*   LISTEN   1042/haproxy
tcp   0   0 10.1.6.203:8888   0.0.0.0:*   LISTEN   1042/haproxy
udp   0   0 0.0.0.0:56562     0.0.0.0:*            1042/haproxy
root@10.1.6.203:~# netstat -tunlp | grep xine
tcp   0   0 10.1.6.203:9222   0.0.0.0:*   LISTEN   30897/xinetd
root@10.1.6.203:~# ps -ef | grep haproxy
root   1042   1   0 Sep17 ?   00:00:00 /usr/local/sbin/haproxy -f /etc/haproxy/haproxy.cfg
Test:
Access the cluster database through the vip 10.1.6.173 on port 3366 (note that the account dave needs grants for three IPs: 10.1.6.203, 10.1.6.205 and 10.1.6.173; an example grant is sketched below).
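For example, the grants for dave might look like the following (a sketch only: the password placeholder and the privilege scope are assumptions, adjust to your own setup):

    mysql -uroot -p <<'SQL'
    GRANT ALL PRIVILEGES ON dave.* TO 'dave'@'10.1.6.203' IDENTIFIED BY '<password>';
    GRANT ALL PRIVILEGES ON dave.* TO 'dave'@'10.1.6.205' IDENTIFIED BY '<password>';
    GRANT ALL PRIVILEGES ON dave.* TO 'dave'@'10.1.6.173' IDENTIFIED BY '<password>';
    FLUSH PRIVILEGES;
    SQL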
root@10.1.6.203:mgm# mysql -udave -p -h 10.1.6.173 -P 3366
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1344316
Server version: 5.5.22-ndb-7.2.6-gpl-log MySQL Cluster Community Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| dave               |
| test               |
+--------------------+
3 rows in set (0.01 sec)

mysql>
Kill keepalived, haproxy and the database by hand, one at a time: the vip 10.1.6.173 automatically floats over to the backup 10.1.6.205, and access through the vip is not affected.
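A minimal failover drill along those lines (a sketch; the priorities refer to the keepalived configuration above):

    # on the master 10.1.6.203: kill haproxy, chk_haproxy fails and the priority drops 101 -> 99
    killall haproxy
    ip addr show eth1 | grep 10.1.6.173      # the vip should leave this node shortly

    # on the backup 10.1.6.205: it now wins the election (priority 100) and start_haproxy.sh runs
    ip addr show eth1 | grep 10.1.6.173      # the vip should appear here
    mysql -udave -p -h 10.1.6.173 -P 3366 -e "select 1"    # service still reachable through the vip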
View the status of each node through the vip on the haproxy stats page:
http://10.1.6.173:8888/status