MySQL binlog 物理回溯最佳实战

Gus8036 8年前
   <h2>0x01 前言</h2>    <p>在平时的迁移或者其他维护场景中,我们需要利用备份的物理 binlog 做回溯。本篇文章根据真实案例进行讲解,如何优雅地利用 binlog 进行物理回溯。</p>    <h2>0x02 测试</h2>    <p>测试环境 IP 如下:</p>    <ul>     <li>192.168.1.101(主)</li>     <li>192.168.1.102(从)</li>     <li>192.168.1.103(从)</li>    </ul>    <p>测试步骤如下:</p>    <p>1、192.168.1.101 和 192.168.1.102 建立主从,192.168.1.101 创建 sbtest 库,然后使用 sysbench 插入 100 万测试数据,相关命令如下:</p>    <pre>  sysbench --test=oltp --oltp-table-size=1000000 --oltp-read-only=off --init-rng=on \  --num-threads=16 --max-requests=0 --oltp-dist-type=uniform --max-time=1800 \  --mysql-user=root --mysql-socket=/tmp/mysql.sock \  --mysql-password=xxxx --db-driver=mysql --mysql-table-engine=innodb \  --oltp-test-mode=complex prepare</pre>    <p>2、在 192.168.1.102 导出数据,然后拷贝到 192.168.1.103,在 192.168.1.103 导入数据。备份文件的 MASTER_LOG_FILE 和 MASTER_LOG_POS 信息如下:</p>    <pre>  -- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000104', \  -- MASTER_LOG_POS=3661463;</pre>    <p>3、在 192.168.1.101 使用如下脚本再次导入 10 万数据。</p>    <pre>  #!/bin/bash    for i in $(seq 1 100000)  do      mysql -uroot -p'xxxx' --socket=/tmp/mysql.sock -e \      "INSERT INTO sbtest.sbtest(k, c, pad) VALUES(0, '1', \      'qqqqqqqqqqwwwwwwwwwweeeeeeeeeerrrrrrrrrrtttttttttt');"</pre>    <p>4、192.168.1.101 模拟 binlog 被清除。先备份 binlog,然后再 PURGE。</p>    <pre>  mysql> SELECT COUNT(*) FROM sbtest;  +----------+  | COUNT(*) |  +----------+  |  1100000 |  +----------+  1 row in set (0.19 sec)    mysql> PURGE BINARY LOGS TO 'mysql-bin.000106';  Query OK, 0 rows affected (0.03 sec)</pre>    <p>5、192.168.1.102 停掉同步。</p>    <p>6、192.168.1.101 修改 mysql-bin.index 文件,把备份的 binlog 文件拷贝到 binlog 目录,然后手动执行 FLUSH LOGS。</p>    <pre>  pwd  /data/mysql/binlog  cp -v /data/backup/mysql-bin.00010{4,5} .    cat mysql-bin.index  /data/mysql/binlog/mysql-bin.000104  /data/mysql/binlog/mysql-bin.000105  /data/mysql/binlog/mysql-bin.000106  /data/mysql/binlog/mysql-bin.000107    # 注意修改权限  chown mysql:mysql -R mysql-bin.*</pre>    <p>手动 FLUSH LOGS,可以看到前后 BINARY LOGS 列表发生变化。</p>    <pre>  mysql> SHOW BINARY LOGS;  +------------------+-----------+  | Log_name         | File_size |  +------------------+-----------+  | mysql-bin.000106 |  10485866 |  | mysql-bin.000107 |    504130 |  +------------------+-----------+  2 rows in set (0.00 sec)    mysql> FLUSH LOGS;  Query OK, 0 rows affected (0.01 sec)    mysql> SHOW BINARY LOGS;  +------------------+-----------+  | Log_name         | File_size |  +------------------+-----------+  | mysql-bin.000104 |  10486051 |  | mysql-bin.000105 |  10485866 |  | mysql-bin.000106 |  10485866 |  | mysql-bin.000107 |    504173 |  | mysql-bin.000108 |       107 |  +------------------+-----------+  6 rows in set (0.00 sec)</pre>    <p>7、192.168.1.103 利用备份文件的 MASTER_LOG_FILE 和 MASTER_LOG_POS 信息和 192.168.1.101 建立主从关系。</p>    <p>8、192.168.1.103 和 192.168.1.101 校验数据。192.168.1.101 的增量数据已经完全同步到 192.168.1.103。</p>    <p>9、192.168.1.101 修改 mysql-bin.index 文件,将之前手动加入的 binlog 删除,然后再 FLUSH LOGS。</p>    <p>10、192.168.1.102 打开同步,可以看到同步正常,再确认 192.168.1.103,同步也正常。</p>    <p style="text-align: center;"><img src="https://simg.open-open.com/show/b294e7e32358263c60e5d54ba3da36fe.jpg"></p>    <p>© cosgood1969/OIl_2/Pinterest</p>    <p>我们接下来测试下旧 DB 已有从库不停掉同步的情况。</p>    <p>测试环境 IP 如下:</p>    <ul>     <li>192.168.1.101(主)</li>     <li>192.168.1.102(从)</li>     <li>192.168.1.103(从)</li>    </ul>    <p>测试步骤如下:</p>    <p>1、192.168.1.101 和 192.168.1.102 建立主从,192.168.1.101 创建 sbtest 库,然后使用 sysbench 插入 100 万测试数据,相关命令如下:</p>    <pre>  sysbench --test=oltp --oltp-table-size=1000000 --oltp-read-only=off --init-rng=on \  --num-threads=16 --max-requests=0 --oltp-dist-type=uniform --max-time=1800 \  --mysql-user=root --mysql-socket=/tmp/mysql.sock \  --mysql-password=xxxx --db-driver=mysql --mysql-table-engine=innodb \  --oltp-test-mode=complex prepare</pre>    <p>2、在 192.168.1.102 导出数据,然后拷贝到 192.168.1.103,在 192.168.1.103 导入数据。备份文件的 MASTER_LOG_FILE 和 MASTER_LOG_POS 信息如下:</p>    <pre>  -- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000115', \  -- MASTER_LOG_POS=6102367;</pre>    <p>3、在 192.168.1.101 使用如下脚本再次导入 10 万数据。</p>    <pre>  #!/bin/bash    for i in $(seq 1 100000)  do      mysql -uroot -p'xxxx' --socket=/tmp/mysql.sock -e \      "INSERT INTO sbtest.sbtest(k, c, pad) VALUES(0, '1', \      'qqqqqqqqqqwwwwwwwwwweeeeeeeeeerrrrrrrrrrtttttttttt');"</pre>    <p>4、192.168.1.101 模拟 binlog 被清除。先备份 binlog,然后再 PURGE。</p>    <pre>  mysql> SELECT COUNT(*) FROM sbtest;  +----------+  | COUNT(*) |  +----------+  |  1100000 |  +----------+  1 row in set (0.19 sec)    mysql> PURGE BINARY LOGS TO 'mysql-bin.000118';  Query OK, 0 rows affected (0.03 sec)</pre>    <p>5、192.168.1.102 <strong>不停同步。</strong></p>    <p>6、192.168.1.101 修改 mysql-bin.index 文件,把备份的 binlog 文件拷贝到 binlog 目录,然后手动执行 FLUSH LOGS。</p>    <pre>  pwd  /data/mysql/binlog  cp -v /data/backup/mysql-bin.00011{5,6,7} .    cat mysql-bin.index  /data/mysql/binlog/mysql-bin.000115  /data/mysql/binlog/mysql-bin.000116  /data/mysql/binlog/mysql-bin.000117  /data/mysql/binlog/mysql-bin.000118    # 注意修改权限  chown mysql:mysql -R mysql-bin.*</pre>    <p>手动 FLUSH LOGS,可以看到前后 BINARY LOGS 列表发生变化。</p>    <pre>  mysql> SHOW BINARY LOGS;  +------------------+-----------+  | Log_name         | File_size |  +------------------+-----------+  | mysql-bin.000118 |   2945005 |  +------------------+-----------+  1 row in set (0.00 sec)    mysql> FLUSH LOGS;  Query OK, 0 rows affected (0.00 sec)    mysql> SHOW BINARY LOGS;  +------------------+-----------+  | Log_name         | File_size |  +------------------+-----------+  | mysql-bin.000115 |  10486080 |  | mysql-bin.000116 |  10485866 |  | mysql-bin.000117 |  10485866 |  | mysql-bin.000118 |   2945048 |  | mysql-bin.000119 |       107 |  +------------------+-----------+  5 rows in set (0.00 sec)</pre>    <p>7、观察 192.168.1.102,可以看到此时同步已经出错。</p>    <pre>  Master_Log_File: mysql-bin.000119  Read_Master_Log_Pos: 107  Relay_Log_File: relaylog.000326  Relay_Log_Pos: 253  Relay_Master_Log_File: mysql-bin.000116  Exec_Master_Log_Pos: 107    Last_Errno: 1062  Last_Error: Error 'Duplicate entry '1015491' for key 'PRIMARY'' on query. \  Default database: ''. Query: 'INSERT INTO sbtest.sbtest(k, c, pad) VALUES \  (0, '1', 'qqqqqqqqqqwwwwwwwwwweeeeeeeeeerrrrrrrrrrtttttttttt')'</pre>    <p>再对比下出错之前的从库状态:</p>    <pre>  Master_Log_File: mysql-bin.000118  Read_Master_Log_Pos: 2945005  Relay_Log_File: relaylog.000324  Relay_Log_Pos: 2945151  Relay_Master_Log_File: mysql-bin.000118  Exec_Master_Log_Pos: 2945005</pre>    <p>可以看到,192.168.1.102 从 mysql-bin.000116:107 的位置重现同步,就会导致主键冲突的问题。</p>    <p>8、192.168.1.103 利用备份文件的 MASTER_LOG_FILE 和 MASTER_LOG_POS 信息和 192.168.1.101 建立主从关系。</p>    <pre>  CHANGE MASTER TO MASTER_HOST='192.168.1.101', MASTER_PORT=3307, MASTER_USER='slave', \  MASTER_PASSWORD='xxxx', MASTER_LOG_FILE='mysql-bin.000115', MASTER_LOG_POS=6102367;</pre>    <p>9、192.168.1.103 和 192.168.1.101 校验数据。192.168.1.101 的增量数据已经完全同步到 192.168.1.103。</p>    <p>10、192.168.1.101 修改 mysql-bin.index 文件,将之前手动加入的 binlog 删除,然后再 FLUSH LOGS。</p>    <p>11、再次观察 192.168.1.102</p>    <pre>  Master_Log_File: mysql-bin.000120  Read_Master_Log_Pos: 4  Relay_Log_File: relaylog.000326  Relay_Log_Pos: 253  Relay_Master_Log_File: mysql-bin.000116  Exec_Master_Log_Pos: 107  Last_IO_Errno: 1236  Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: \  'could not find next log; the first event 'mysql-bin.000107' at 504130, \  the last event read from '/data/mysql/binlog/mysql-bin.000119' at 150, \  the last byte read from '/data/mysql/binlog/mysql-bin.000119' at 150.'  Last_SQL_Errno: 1062  Last_SQL_Error: Error 'Duplicate entry '1015491' for key 'PRIMARY'' on query. \  Default database: ''. Query: 'INSERT INTO sbtest.sbtest(k, c, pad) VALUES \  (0, '1', 'qqqqqqqqqqwwwwwwwwwweeeeeeeeeerrrrrrrrrrtttttttttt')'</pre>    <p>此时 IO 线程和 SQL 线程均异常。</p>    <h2>0x03 结论</h2>    <p>结论如下:</p>    <ul>     <li>拷贝回去的 binlog,需要将属性改为 mysql</li>     <li>mysql-bin.index 这个文件不管加一行减一行,在触发 FLUSH LOGS 的时候原有的复制会被中断。如果旧 Master 有从库,恢复其他从库数据的时候,需要暂时将已有的从库同步停掉(执行 STOP SLAVE),就能避免中断</li>     <li>任何线上操作,都要在完备的测试前提下再操作</li>     <li>迁移过程中,重要的一点是做好数据校验,不管是用 pt,还是手动写脚本,这个过程不能缺失</li>    </ul>    <p> </p>    <p> </p>    <p>来自:https://dbarobin.com/2017/03/15/best-practices-of-recovering-via-binlog/</p>    <p> </p>