MySQL binlog 物理回溯最佳实战
Gus8036
8年前
<h2>0x01 前言</h2> <p>在平时的迁移或者其他维护场景中,我们需要利用备份的物理 binlog 做回溯。本篇文章根据真实案例进行讲解,如何优雅地利用 binlog 进行物理回溯。</p> <h2>0x02 测试</h2> <p>测试环境 IP 如下:</p> <ul> <li>192.168.1.101(主)</li> <li>192.168.1.102(从)</li> <li>192.168.1.103(从)</li> </ul> <p>测试步骤如下:</p> <p>1、192.168.1.101 和 192.168.1.102 建立主从,192.168.1.101 创建 sbtest 库,然后使用 sysbench 插入 100 万测试数据,相关命令如下:</p> <pre> sysbench --test=oltp --oltp-table-size=1000000 --oltp-read-only=off --init-rng=on \ --num-threads=16 --max-requests=0 --oltp-dist-type=uniform --max-time=1800 \ --mysql-user=root --mysql-socket=/tmp/mysql.sock \ --mysql-password=xxxx --db-driver=mysql --mysql-table-engine=innodb \ --oltp-test-mode=complex prepare</pre> <p>2、在 192.168.1.102 导出数据,然后拷贝到 192.168.1.103,在 192.168.1.103 导入数据。备份文件的 MASTER_LOG_FILE 和 MASTER_LOG_POS 信息如下:</p> <pre> -- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000104', \ -- MASTER_LOG_POS=3661463;</pre> <p>3、在 192.168.1.101 使用如下脚本再次导入 10 万数据。</p> <pre> #!/bin/bash for i in $(seq 1 100000) do mysql -uroot -p'xxxx' --socket=/tmp/mysql.sock -e \ "INSERT INTO sbtest.sbtest(k, c, pad) VALUES(0, '1', \ 'qqqqqqqqqqwwwwwwwwwweeeeeeeeeerrrrrrrrrrtttttttttt');"</pre> <p>4、192.168.1.101 模拟 binlog 被清除。先备份 binlog,然后再 PURGE。</p> <pre> mysql> SELECT COUNT(*) FROM sbtest; +----------+ | COUNT(*) | +----------+ | 1100000 | +----------+ 1 row in set (0.19 sec) mysql> PURGE BINARY LOGS TO 'mysql-bin.000106'; Query OK, 0 rows affected (0.03 sec)</pre> <p>5、192.168.1.102 停掉同步。</p> <p>6、192.168.1.101 修改 mysql-bin.index 文件,把备份的 binlog 文件拷贝到 binlog 目录,然后手动执行 FLUSH LOGS。</p> <pre> pwd /data/mysql/binlog cp -v /data/backup/mysql-bin.00010{4,5} . cat mysql-bin.index /data/mysql/binlog/mysql-bin.000104 /data/mysql/binlog/mysql-bin.000105 /data/mysql/binlog/mysql-bin.000106 /data/mysql/binlog/mysql-bin.000107 # 注意修改权限 chown mysql:mysql -R mysql-bin.*</pre> <p>手动 FLUSH LOGS,可以看到前后 BINARY LOGS 列表发生变化。</p> <pre> mysql> SHOW BINARY LOGS; +------------------+-----------+ | Log_name | File_size | +------------------+-----------+ | mysql-bin.000106 | 10485866 | | mysql-bin.000107 | 504130 | +------------------+-----------+ 2 rows in set (0.00 sec) mysql> FLUSH LOGS; Query OK, 0 rows affected (0.01 sec) mysql> SHOW BINARY LOGS; +------------------+-----------+ | Log_name | File_size | +------------------+-----------+ | mysql-bin.000104 | 10486051 | | mysql-bin.000105 | 10485866 | | mysql-bin.000106 | 10485866 | | mysql-bin.000107 | 504173 | | mysql-bin.000108 | 107 | +------------------+-----------+ 6 rows in set (0.00 sec)</pre> <p>7、192.168.1.103 利用备份文件的 MASTER_LOG_FILE 和 MASTER_LOG_POS 信息和 192.168.1.101 建立主从关系。</p> <p>8、192.168.1.103 和 192.168.1.101 校验数据。192.168.1.101 的增量数据已经完全同步到 192.168.1.103。</p> <p>9、192.168.1.101 修改 mysql-bin.index 文件,将之前手动加入的 binlog 删除,然后再 FLUSH LOGS。</p> <p>10、192.168.1.102 打开同步,可以看到同步正常,再确认 192.168.1.103,同步也正常。</p> <p style="text-align: center;"><img src="https://simg.open-open.com/show/b294e7e32358263c60e5d54ba3da36fe.jpg"></p> <p>© cosgood1969/OIl_2/Pinterest</p> <p>我们接下来测试下旧 DB 已有从库不停掉同步的情况。</p> <p>测试环境 IP 如下:</p> <ul> <li>192.168.1.101(主)</li> <li>192.168.1.102(从)</li> <li>192.168.1.103(从)</li> </ul> <p>测试步骤如下:</p> <p>1、192.168.1.101 和 192.168.1.102 建立主从,192.168.1.101 创建 sbtest 库,然后使用 sysbench 插入 100 万测试数据,相关命令如下:</p> <pre> sysbench --test=oltp --oltp-table-size=1000000 --oltp-read-only=off --init-rng=on \ --num-threads=16 --max-requests=0 --oltp-dist-type=uniform --max-time=1800 \ --mysql-user=root --mysql-socket=/tmp/mysql.sock \ --mysql-password=xxxx --db-driver=mysql --mysql-table-engine=innodb \ --oltp-test-mode=complex prepare</pre> <p>2、在 192.168.1.102 导出数据,然后拷贝到 192.168.1.103,在 192.168.1.103 导入数据。备份文件的 MASTER_LOG_FILE 和 MASTER_LOG_POS 信息如下:</p> <pre> -- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000115', \ -- MASTER_LOG_POS=6102367;</pre> <p>3、在 192.168.1.101 使用如下脚本再次导入 10 万数据。</p> <pre> #!/bin/bash for i in $(seq 1 100000) do mysql -uroot -p'xxxx' --socket=/tmp/mysql.sock -e \ "INSERT INTO sbtest.sbtest(k, c, pad) VALUES(0, '1', \ 'qqqqqqqqqqwwwwwwwwwweeeeeeeeeerrrrrrrrrrtttttttttt');"</pre> <p>4、192.168.1.101 模拟 binlog 被清除。先备份 binlog,然后再 PURGE。</p> <pre> mysql> SELECT COUNT(*) FROM sbtest; +----------+ | COUNT(*) | +----------+ | 1100000 | +----------+ 1 row in set (0.19 sec) mysql> PURGE BINARY LOGS TO 'mysql-bin.000118'; Query OK, 0 rows affected (0.03 sec)</pre> <p>5、192.168.1.102 <strong>不停同步。</strong></p> <p>6、192.168.1.101 修改 mysql-bin.index 文件,把备份的 binlog 文件拷贝到 binlog 目录,然后手动执行 FLUSH LOGS。</p> <pre> pwd /data/mysql/binlog cp -v /data/backup/mysql-bin.00011{5,6,7} . cat mysql-bin.index /data/mysql/binlog/mysql-bin.000115 /data/mysql/binlog/mysql-bin.000116 /data/mysql/binlog/mysql-bin.000117 /data/mysql/binlog/mysql-bin.000118 # 注意修改权限 chown mysql:mysql -R mysql-bin.*</pre> <p>手动 FLUSH LOGS,可以看到前后 BINARY LOGS 列表发生变化。</p> <pre> mysql> SHOW BINARY LOGS; +------------------+-----------+ | Log_name | File_size | +------------------+-----------+ | mysql-bin.000118 | 2945005 | +------------------+-----------+ 1 row in set (0.00 sec) mysql> FLUSH LOGS; Query OK, 0 rows affected (0.00 sec) mysql> SHOW BINARY LOGS; +------------------+-----------+ | Log_name | File_size | +------------------+-----------+ | mysql-bin.000115 | 10486080 | | mysql-bin.000116 | 10485866 | | mysql-bin.000117 | 10485866 | | mysql-bin.000118 | 2945048 | | mysql-bin.000119 | 107 | +------------------+-----------+ 5 rows in set (0.00 sec)</pre> <p>7、观察 192.168.1.102,可以看到此时同步已经出错。</p> <pre> Master_Log_File: mysql-bin.000119 Read_Master_Log_Pos: 107 Relay_Log_File: relaylog.000326 Relay_Log_Pos: 253 Relay_Master_Log_File: mysql-bin.000116 Exec_Master_Log_Pos: 107 Last_Errno: 1062 Last_Error: Error 'Duplicate entry '1015491' for key 'PRIMARY'' on query. \ Default database: ''. Query: 'INSERT INTO sbtest.sbtest(k, c, pad) VALUES \ (0, '1', 'qqqqqqqqqqwwwwwwwwwweeeeeeeeeerrrrrrrrrrtttttttttt')'</pre> <p>再对比下出错之前的从库状态:</p> <pre> Master_Log_File: mysql-bin.000118 Read_Master_Log_Pos: 2945005 Relay_Log_File: relaylog.000324 Relay_Log_Pos: 2945151 Relay_Master_Log_File: mysql-bin.000118 Exec_Master_Log_Pos: 2945005</pre> <p>可以看到,192.168.1.102 从 mysql-bin.000116:107 的位置重现同步,就会导致主键冲突的问题。</p> <p>8、192.168.1.103 利用备份文件的 MASTER_LOG_FILE 和 MASTER_LOG_POS 信息和 192.168.1.101 建立主从关系。</p> <pre> CHANGE MASTER TO MASTER_HOST='192.168.1.101', MASTER_PORT=3307, MASTER_USER='slave', \ MASTER_PASSWORD='xxxx', MASTER_LOG_FILE='mysql-bin.000115', MASTER_LOG_POS=6102367;</pre> <p>9、192.168.1.103 和 192.168.1.101 校验数据。192.168.1.101 的增量数据已经完全同步到 192.168.1.103。</p> <p>10、192.168.1.101 修改 mysql-bin.index 文件,将之前手动加入的 binlog 删除,然后再 FLUSH LOGS。</p> <p>11、再次观察 192.168.1.102</p> <pre> Master_Log_File: mysql-bin.000120 Read_Master_Log_Pos: 4 Relay_Log_File: relaylog.000326 Relay_Log_Pos: 253 Relay_Master_Log_File: mysql-bin.000116 Exec_Master_Log_Pos: 107 Last_IO_Errno: 1236 Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: \ 'could not find next log; the first event 'mysql-bin.000107' at 504130, \ the last event read from '/data/mysql/binlog/mysql-bin.000119' at 150, \ the last byte read from '/data/mysql/binlog/mysql-bin.000119' at 150.' Last_SQL_Errno: 1062 Last_SQL_Error: Error 'Duplicate entry '1015491' for key 'PRIMARY'' on query. \ Default database: ''. Query: 'INSERT INTO sbtest.sbtest(k, c, pad) VALUES \ (0, '1', 'qqqqqqqqqqwwwwwwwwwweeeeeeeeeerrrrrrrrrrtttttttttt')'</pre> <p>此时 IO 线程和 SQL 线程均异常。</p> <h2>0x03 结论</h2> <p>结论如下:</p> <ul> <li>拷贝回去的 binlog,需要将属性改为 mysql</li> <li>mysql-bin.index 这个文件不管加一行减一行,在触发 FLUSH LOGS 的时候原有的复制会被中断。如果旧 Master 有从库,恢复其他从库数据的时候,需要暂时将已有的从库同步停掉(执行 STOP SLAVE),就能避免中断</li> <li>任何线上操作,都要在完备的测试前提下再操作</li> <li>迁移过程中,重要的一点是做好数据校验,不管是用 pt,还是手动写脚本,这个过程不能缺失</li> </ul> <p> </p> <p> </p> <p>来自:https://dbarobin.com/2017/03/15/best-practices-of-recovering-via-binlog/</p> <p> </p>