GTID Replication and Troubleshooting

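First, what is a GTID?
A GTID (global transaction identifier) is an identifier assigned to a committed transaction, and it is globally unique.
A GTID consists of a UUID and a transaction ID (TID), written uuid:tid. The UUID uniquely identifies a MySQL instance (its server_uuid). The TID is a sequence number for transactions committed on that instance and increases monotonically with each commit. From a GTID you can tell which instance a transaction was originally committed on, which also makes failover much easier.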
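To make the uuid:tid structure concrete, here is a small Python sketch (not part of the original workflow; the function names are hypothetical). It parses a GTID set in the text form MySQL prints for gtid_executed and counts how many transactions each instance contributed. It assumes the usual `uuid:start-end` syntax, with optional extra `:start-end` intervals per UUID.

```python
from __future__ import annotations

import uuid


def parse_gtid_set(text: str) -> dict[str, list[tuple[int, int]]]:
    """Parse 'uuid:1-12, uuid2:1-11' into {uuid: [(start, end), ...]}."""
    result: dict[str, list[tuple[int, int]]] = {}
    for part in text.split(","):
        part = part.strip()  # MySQL may separate parts with ",\n"
        if not part:
            continue
        source, *intervals = part.split(":")
        source = str(uuid.UUID(source))  # validates and normalizes the server UUID
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A single number such as '7' is an interval of length one
            result.setdefault(source, []).append((int(start), int(end or start)))
    return result


def transactions_per_source(text: str) -> dict[str, int]:
    """Count committed transactions per originating instance."""
    return {
        source: sum(end - start + 1 for start, end in intervals)
        for source, intervals in parse_gtid_set(text).items()
    }


executed = (
    "5031589f-3551-11e7-89a0-00505693235d:1-12, "
    "806ede0c-357e-11e7-9719-00505693235d:1-11, "
    "a38c33ee-34b7-11e7-ae1d-005056931959:1-24"
)
for source, count in transactions_per_source(executed).items():
    print(source, count)
```

Each UUID tells you where a transaction originated, and because TIDs increase monotonically, the interval end is simply the latest transaction from that instance.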
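Next, let's see how to quickly add a slave in GTID mode.
Before GTID replication existed, MySQL replication was based on binary log file names and positions, so we had to run a CHANGE MASTER statement like this: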
change master to master_host='',master_port=3306,master_user='repl',master_password='*****',master_log_file='mysqlbinlog.000003',master_log_pos=99721204;
With GTIDs, we can run this CHANGE MASTER statement instead:
change master to master_host='****', master_user='repl', master_password='******', master_port=3306, master_auto_position=1;
As you can see, the binary log approach requires MASTER_LOG_FILE and MASTER_LOG_POS, whereas GTID replication needs neither. We only have to specify the master, which makes setting up replication much simpler.
Next, let's look at how to set up master/slave replication in GTID mode. First, there are two global variables we need to know about:
root@perconatest09:23:44>show global variables like 'gtid_%'\G
*************************** 1. row ***************************
Variable_name: gtid_executed
        Value: 5031589f-3551-11e7-89a0-00505693235d:1-12, 806ede0c-357e-11e7-9719-00505693235d:1-11, a38c33ee-34b7-11e7-ae1d-005056931959:1-24
*************************** 2. row ***************************
Variable_name: gtid_executed_compression_period
        Value: 1000
*************************** 3. row ***************************
Variable_name: gtid_mode
        Value: ON
*************************** 4. row ***************************
Variable_name: gtid_owned
        Value:
*************************** 5. row ***************************
Variable_name: gtid_purged
        Value: 5031589f-3551-11e7-89a0-00505693235d:1-12, 806ede0c-357e-11e7-9719-00505693235d:1-11, a38c33ee-34b7-11e7-ae1d-005056931959:1-12
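The two we mainly need to look at are gtid_executed and gtid_purged:
gtid_executed: the set of GTIDs of all transactions executed on this instance, i.e. the transactions committed to the binary log (including those whose binary logs have since been purged). This variable is read-only and cannot be set directly.
gtid_purged: the set of GTIDs of transactions whose binary logs have been purged. It is always a subset of gtid_executed. It can be set manually, which is convenient for administration.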
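The relationship between the two variables can be sketched in Python. The helpers below are hypothetical and are not a MySQL API. gtid_purged should be a subset of gtid_executed, and, roughly speaking, the difference between them is what the binary logs still contain. The sketch expands every interval into individual GTIDs. That is fine for small examples but would be wasteful for sets with millions of transactions. The input values are taken from the output above.

```python
from __future__ import annotations


def parse_gtid_set(text: str) -> set[tuple[str, int]]:
    """Expand 'uuid:1-12, uuid2:1-11' into individual (uuid, tid) pairs."""
    gtids: set[tuple[str, int]] = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        source, *intervals = part.split(":")
        for interval in intervals:
            start, _, end = interval.partition("-")
            for tid in range(int(start), int(end or start) + 1):
                gtids.add((source.lower(), tid))
    return gtids


def format_gtid_set(gtids: set[tuple[str, int]]) -> str:
    """Collapse individual GTIDs back into MySQL's 'uuid:start-end' text form."""
    by_source: dict[str, list[int]] = {}
    for source, tid in gtids:
        by_source.setdefault(source, []).append(tid)
    parts = []
    for source in sorted(by_source):
        tids = sorted(by_source[source])
        ranges = []
        start = prev = tids[0]
        for tid in tids[1:]:
            if tid != prev + 1:  # gap: close the current interval
                ranges.append((start, prev))
                start = tid
            prev = tid
        ranges.append((start, prev))
        text = ":".join(f"{a}-{b}" if a != b else str(a) for a, b in ranges)
        parts.append(f"{source}:{text}")
    return ", ".join(parts)


executed = parse_gtid_set(
    "5031589f-3551-11e7-89a0-00505693235d:1-12, "
    "806ede0c-357e-11e7-9719-00505693235d:1-11, "
    "a38c33ee-34b7-11e7-ae1d-005056931959:1-24"
)
purged = parse_gtid_set(
    "5031589f-3551-11e7-89a0-00505693235d:1-12, "
    "806ede0c-357e-11e7-9719-00505693235d:1-11, "
    "a38c33ee-34b7-11e7-ae1d-005056931959:1-12"
)
assert purged <= executed  # every purged GTID was also executed
print(format_gtid_set(executed - purged))  # a38c33ee-34b7-11e7-ae1d-005056931959:13-24
```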
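With these two variables understood, let's look at how to add a slave that uses GTID replication:
(1) Take a full backup of the master, and record the master's gtid_executed as of the moment of the backup.
(2) Restore the backup on the slave, and set the slave's gtid_purged to the master's gtid_executed obtained in step (1).
(3) Run the CHANGE MASTER statement.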
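The gtid_purged part of step (2) and step (3) boil down to a few SQL statements on the slave. The Python sketch below is a hypothetical helper, and its host, user and password are placeholders. It only generates those statements from the master's gtid_executed recorded in step (1). It assumes the data has already been restored and that the restore did not set gtid_purged itself. RESET MASTER is included because on MySQL 5.6/5.7 gtid_purged can only be set while gtid_executed is empty.

```python
from __future__ import annotations


def sql_quote(value: str) -> str:
    """Quote a value as a MySQL string literal (backslashes and single quotes escaped)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def replica_setup_sql(master_gtid_executed: str, host: str, user: str,
                      password: str, port: int = 3306) -> list[str]:
    """Statements to run on a freshly restored slave (steps 2 and 3)."""
    return [
        # gtid_purged may only be set while gtid_executed is empty (MySQL 5.6/5.7)
        "RESET MASTER;",
        # Treat everything the master had executed at backup time as already applied
        f"SET @@GLOBAL.gtid_purged = {sql_quote(master_gtid_executed)};",
        # With auto-positioning, no binlog file name or position is needed
        f"CHANGE MASTER TO MASTER_HOST = {sql_quote(host)}, MASTER_PORT = {int(port)}, "
        f"MASTER_USER = {sql_quote(user)}, MASTER_PASSWORD = {sql_quote(password)}, "
        "MASTER_AUTO_POSITION = 1;",
        "START SLAVE;",
    ]


for statement in replica_setup_sql(
    "a38c33ee-34b7-11e7-ae1d-005056931959:1-24",  # value recorded in step (1)
    host="192.0.2.10",  # placeholder master address
    user="repl",
    password="secret",  # placeholder password
):
    print(statement)
```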
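mysqldump is enough to back up the master. We then restore the backup onto a new machine that will act as the slave. Before running it, check the variables on the master: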
root@perconatest09:23:58>show global variables like 'gtid_e%'\G
*************************** 1. row ***************************
Variable_name: gtid_executed
        Value: 5031589f-3551-11e7-89a0-00505693235d:1-12, 806ede0c-357e-11e7-9719-00505693235d:1-11, a38c33ee-34b7-11e7-ae1d-005056931959:1-24
2 rows in set (0.01 sec)

root@perconatest09:41:33>show global variables like 'gtid_p%'\G
*************************** 1. row ***************************
Variable_name: gtid_purged
        Value: 5031589f-3551-11e7-89a0-00505693235d:1-12, 806ede0c-357e-11e7-9719-00505693235d:1-11, a38c33ee-34b7-11e7-ae1d-005056931959:1-12
1 row in set (0.01 sec)
Then take the backup on the master:
mysqldump --all-databases --single-transaction --triggers --routines --host=127.0.0.1 --port=18675 --user=root -p > /home/sa/backup.sql
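If you script the backup, building the argument list explicitly helps avoid mistakes such as a missing space before -p. The Python sketch below is an illustration and not part of the original procedure. It adds two standard mysqldump options. --set-gtid-purged=ON makes mysqldump fail instead of silently omitting the GTID_PURGED statement when GTIDs are not enabled. --result-file writes the dump to a file. The helper name is hypothetical. It only prints the command; running it through subprocess would prompt for the password.

```python
from __future__ import annotations

import shlex


def mysqldump_argv(host: str, port: int, user: str, result_file: str) -> list[str]:
    """Argument list for the full backup taken on the master."""
    return [
        "mysqldump",
        "--all-databases",
        "--single-transaction",  # consistent InnoDB snapshot without locking tables
        "--triggers",
        "--routines",
        "--set-gtid-purged=ON",  # require GTIDs and write SET @@GLOBAL.GTID_PURGED into the dump
        f"--host={host}",
        f"--port={port}",
        f"--user={user}",
        "-p",  # prompt for the password instead of putting it on the command line
        f"--result-file={result_file}",
    ]


print(shlex.join(mysqldump_argv("127.0.0.1", 18675, "root", "/home/sa/backup.sql")))
```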
Let's look at the beginning of the backup file:
[root@localhost sa]# head -30 backup.sql
It contains the following statement:
set @@global.gtid_purged='5031589f-3551-11e7-89a0-00505693235d:1-12, 806ede0c-357e-11e7-9719-00505693235d:1-11, a38c33ee-34b7-11e7-ae1d-005056931959:1-24';
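In other words, the restore sets gtid_purged automatically, and the value is exactly the master's gtid_executed, so after restoring the slave there is generally nothing more to set. On MySQL 5.6/5.7 this SET only succeeds if the slave's gtid_executed is empty, so restore into a fresh instance or run reset master on it first.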
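To check this without reading the dump by eye, the Python sketch below (hypothetical helper names) extracts the GTID_PURGED value from the dump text and compares it with the master's gtid_executed. The regular expression is an assumption based on the statement shown above. It also tolerates the `/*!80000 '+'*/` prefix that newer mysqldump versions may emit. Before comparing, it normalizes whitespace and ordering.

```python
from __future__ import annotations

import re

# Matches e.g. SET @@GLOBAL.GTID_PURGED='uuid:1-24'; (case-insensitive),
# optionally with a /*!80000 '+'*/ version comment before the quoted value.
PURGED_RE = re.compile(
    r"SET\s+@@GLOBAL\.GTID_PURGED\s*=\s*(?:/\*!\d+\s*'\+'\s*\*/\s*)?'([^']*)'",
    re.IGNORECASE,
)


def gtid_purged_from_dump(dump_text: str) -> str | None:
    """Return the GTID set the dump will assign to gtid_purged, if any."""
    match = PURGED_RE.search(dump_text)
    return match.group(1) if match else None


def normalize(gtid_set: str) -> list[str]:
    """Drop whitespace and newlines, then sort the per-UUID parts for comparison."""
    return sorted("".join(part.split()).lower() for part in gtid_set.split(",") if part.strip())


# Sample excerpt in the form shown above (the value may span several lines)
dump_head = """SET @@global.gtid_purged='5031589f-3551-11e7-89a0-00505693235d:1-12,
806ede0c-357e-11e7-9719-00505693235d:1-11,
a38c33ee-34b7-11e7-ae1d-005056931959:1-24';
"""
master_executed = (
    "5031589f-3551-11e7-89a0-00505693235d:1-12, "
    "806ede0c-357e-11e7-9719-00505693235d:1-11, "
    "a38c33ee-34b7-11e7-ae1d-005056931959:1-24"
)
purged = gtid_purged_from_dump(dump_head)
print(purged is not None and normalize(purged) == normalize(master_executed))  # True
```

In practice you would read only the first few dozen lines of backup.sql, as with head -30, rather than loading the whole file.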
Log in to the slave and restore the data:
source backup.sql;
There is no need to set gtid_purged again. If you are unsure, you can confirm it:
show global variables like 'gtid_executed'; show global variables like 'gtid_purged';
Then simply configure and start replication:
change master to master_host='***', master_user='root', master_password='*****', master_port=3306, master_auto_position=1;
start slave;
Replace the * placeholders with your master's connection details.
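If GTID replication runs into an error, how do we recover?
Suppose the master's binary logs have already been purged, for example by PURGE BINARY LOGS or by operations such as RESET MASTER. The slave then reports an error like this: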
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
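The condition behind this error can be modelled in a few lines. The Python sketch below is a simplification based on the error text, not MySQL's actual implementation. With MASTER_AUTO_POSITION = 1, the slave sends its gtid_executed to the master. If the master's gtid_purged contains any GTID the slave has not applied, the master can no longer stream it. The helper names and example values are made up.

```python
from __future__ import annotations


def parse_gtid_set(text: str) -> set[tuple[str, int]]:
    """Expand 'uuid:1-12, uuid2:1-11' into individual (uuid, tid) pairs."""
    gtids: set[tuple[str, int]] = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        source, *intervals = part.split(":")
        for interval in intervals:
            start, _, end = interval.partition("-")
            gtids.update((source.lower(), tid) for tid in range(int(start), int(end or start) + 1))
    return gtids


def purged_but_required(master_purged: str, slave_executed: str) -> set[tuple[str, int]]:
    """GTIDs the slave has not applied that the master's binlogs no longer contain."""
    return parse_gtid_set(master_purged) - parse_gtid_set(slave_executed)


# Made-up scenario: the master purged binlogs up to transaction 24,
# but the slave had only applied transactions 1-20 from that server.
master_purged = "a38c33ee-34b7-11e7-ae1d-005056931959:1-24"
slave_executed = "a38c33ee-34b7-11e7-ae1d-005056931959:1-20"
missing = purged_but_required(master_purged, slave_executed)
if missing:
    print(f"master cannot serve {len(missing)} required GTIDs -> error 1236")
else:
    print("all required GTIDs are still in the master's binlogs")
```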
The slave cannot find the binary logs it needs, so replication stops. Here is how to handle it:
(1) On the master, run the following:
root@perconatest09:41:38>show global variables like 'gtid_executed';
+---------------+---------------------------------------------------------------------------------------------------------------------------------+
| Variable_name | Value                                                                                                                           |
+---------------+---------------------------------------------------------------------------------------------------------------------------------+
| gtid_executed | 5031589f-3551-11e7-89a0-00505693235d:1-12, 806ede0c-357e-11e7-9719-00505693235d:1-11, a38c33ee-34b7-11e7-ae1d-005056931959:1-24 |
+---------------+---------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
(2) On the slave, set gtid_purged to the master's gtid_executed:
root@(none)03:04:49>set global gtid_purged='5031589f-3551-11e7-89a0-00505693235d:1-12,806ede0c-357e-11e7-9719-00505693235d:1-11,a38c33ee-34b7-11e7-ae1d-005056931959:1-24';
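Note that before setting gtid_purged you must make sure the slave's gtid_executed is empty. This is a requirement on MySQL 5.6/5.7, where gtid_purged can only be set while gtid_executed is empty. Otherwise, do the following: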
root@(none)03:04:49>reset master;
root@(none)03:04:49>set global gtid_purged='5031589f-3551-11e7-89a0-00505693235d:1-12,806ede0c-357e-11e7-9719-00505693235d:1-11,a38c33ee-34b7-11e7-ae1d-005056931959:1-24';
root@(none)03:04:49>start slave;
root@(none)03:04:49>show slave status\G
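You can also check the \G output of show slave status programmatically. The Python sketch below uses hypothetical helper names, and its sample text is abbreviated and invented for illustration. It parses the `Field: value` lines and reports whether both replication threads are running. It also joins multi-line GTID sets, which MySQL continues after a trailing comma. Slave_IO_Running, Slave_SQL_Running, Last_IO_Errno and Executed_Gtid_Set are real fields in MySQL 5.6/5.7 output.

```python
from __future__ import annotations


def parse_vertical(output: str) -> dict[str, str]:
    """Parse the 'Field: value' lines of a \\G result into a dict."""
    fields: dict[str, str] = {}
    last_key: str | None = None
    for line in output.splitlines():
        if line.startswith("*") or not line.strip():
            continue  # skip the '*** 1. row ***' banner and blank lines
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key.replace("_", "").isalpha():
            fields[key] = value.strip()
            last_key = key
        elif last_key is not None and fields[last_key].endswith(","):
            fields[last_key] += line.strip()  # continuation of a multi-line GTID set
    return fields


def replication_healthy(status: dict[str, str]) -> bool:
    """Both threads running and no IO error recorded."""
    return (
        status.get("Slave_IO_Running") == "Yes"
        and status.get("Slave_SQL_Running") == "Yes"
        and status.get("Last_IO_Errno", "0") == "0"
    )


# Abbreviated, invented sample of `show slave status\G` output
sample_status = """*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
                Last_IO_Errno: 0
            Executed_Gtid_Set: 5031589f-3551-11e7-89a0-00505693235d:1-12,
806ede0c-357e-11e7-9719-00505693235d:1-11,
a38c33ee-34b7-11e7-ae1d-005056931959:1-24
                Auto_Position: 1
1 row in set (0.00 sec)
"""
status = parse_vertical(sample_status)
print(replication_healthy(status), status["Executed_Gtid_Set"])
```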
This completes the repair, but it is still best to verify master/slave data consistency with a checksum.
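It helps to know exactly which transactions the repair skipped, because that is where the slave's data can diverge and where a checksum should focus. Setting the slave's gtid_purged to the master's gtid_executed marks every GTID in master_executed - slave_executed as applied without actually running it. Here is a Python sketch under the same simplified set model. The helpers are hypothetical and the slave state is made up.

```python
from __future__ import annotations


def parse_gtid_set(text: str) -> set[tuple[str, int]]:
    """Expand 'uuid:1-12, uuid2:1-11' into individual (uuid, tid) pairs."""
    gtids: set[tuple[str, int]] = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        source, *intervals = part.split(":")
        for interval in intervals:
            start, _, end = interval.partition("-")
            gtids.update((source.lower(), tid) for tid in range(int(start), int(end or start) + 1))
    return gtids


def format_gtid_set(gtids: set[tuple[str, int]]) -> str:
    """Collapse individual GTIDs back into MySQL's 'uuid:start-end' text form."""
    by_source: dict[str, list[int]] = {}
    for source, tid in gtids:
        by_source.setdefault(source, []).append(tid)
    parts = []
    for source in sorted(by_source):
        tids = sorted(by_source[source])
        ranges = []
        start = prev = tids[0]
        for tid in tids[1:]:
            if tid != prev + 1:  # gap: close the current interval
                ranges.append((start, prev))
                start = tid
            prev = tid
        ranges.append((start, prev))
        text = ":".join(f"{a}-{b}" if a != b else str(a) for a, b in ranges)
        parts.append(f"{source}:{text}")
    return ", ".join(parts)


master_executed = parse_gtid_set(
    "5031589f-3551-11e7-89a0-00505693235d:1-12, "
    "806ede0c-357e-11e7-9719-00505693235d:1-11, "
    "a38c33ee-34b7-11e7-ae1d-005056931959:1-24"
)
# Hypothetical slave state at the moment replication broke
slave_executed = parse_gtid_set(
    "5031589f-3551-11e7-89a0-00505693235d:1-12, "
    "806ede0c-357e-11e7-9719-00505693235d:1-11, "
    "a38c33ee-34b7-11e7-ae1d-005056931959:1-20"
)
skipped = master_executed - slave_executed  # marked as applied but never executed
print(format_gtid_set(skipped))  # a38c33ee-34b7-11e7-ae1d-005056931959:21-24
```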
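Of course, the method above does not guarantee full data consistency. Any transactions the slave never received are simply skipped, so we still need to verify the data with pt-table-checksum and pt-table-sync. That is not necessarily the most efficient approach. The most reliable way is the one described earlier: take a full backup, restore it, and then point the slave at the master.
That covers GTID replication and how to handle its common problems.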
