This article uses masterha_master_switch as the backdrop and walks through the scenarios you may run into.
host_1(host_1:3306) (current master)
 +--host_2(host_2:3306 slave[candidate master])
 +--host_3(host_3:3306 etl)
Adding no_check_delay=0 to the configuration file is enough to make the error go away.
### Simulated scenario: GTID state of the three DB servers

* master host_2

      dba:lc> show master status;
      +---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
      | File                | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                                                                        |
      +---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
      | host_2.000002       |     2885 |              |                  | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-16,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446362  |
      +---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
      1 row in set (0.00 sec)

* slave (candidate master) host_1

      Retrieved_Gtid_Set: ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:446353
      Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-16,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446353
      Auto_Position: 1

* etl (other slave) host_3

      Retrieved_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:4-16,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:446353-446356
      Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-16,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446356
      Auto_Position: 1

### Failover log

    masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host=host_2 --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error

    Thu Nov 9 10:43:49 2017 - [info] MHA::MasterFailover version 0.56.
    Thu Nov 9 10:43:49 2017 - [info] Starting master failover.
    Thu Nov 9 10:43:49 2017 - [info]
    Thu Nov 9 10:43:49 2017 - [info] * Phase 1: Configuration Check Phase..
    Thu Nov 9 10:43:49 2017 - [info]
    Thu Nov 9 10:43:50 2017 - [info] HealthCheck: SSH to host_2 is reachable.
    Thu Nov 9 10:43:50 2017 - [info] Binlog server host_2 is reachable.
    Thu Nov 9 10:43:50 2017 - [info] HealthCheck: SSH to host_1 is reachable.
    Thu Nov 9 10:43:50 2017 - [info] Binlog server host_1 is reachable.
    Thu Nov 9 10:43:50 2017 - [info] HealthCheck: SSH to host_3 is reachable.
    Thu Nov 9 10:43:50 2017 - [info] Binlog server host_3 is reachable.
    Thu Nov 9 10:43:51 2017 - [warning] SQL Thread is stopped(no error) on host_1( host_1:3306)
    Thu Nov 9 10:43:51 2017 - [warning] SQL Thread is stopped(no error) on host_3( host_3:3306)
    Thu Nov 9 10:43:51 2017 - [info] GTID failover mode = 1
    Thu Nov 9 10:43:51 2017 - [info] Dead Servers:
    Thu Nov 9 10:43:51 2017 - [info] host_2( host_2:3306)
    Thu Nov 9 10:43:51 2017 - [info] Checking master reachability via MySQL(double check)...
    Thu Nov 9 10:43:51 2017 - [info] ok.
    Thu Nov 9 10:43:51 2017 - [info] Alive Servers:
    Thu Nov 9 10:43:51 2017 - [info] host_1( host_1:3306)
    Thu Nov 9 10:43:51 2017 - [info] host_3( host_3:3306)
    Thu Nov 9 10:43:51 2017 - [info] Alive Slaves:
    Thu Nov 9 10:43:51 2017 - [info] host_1( host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
    Thu Nov 9 10:43:51 2017 - [info] GTID ON
    Thu Nov 9 10:43:51 2017 - [info] Replicating from host_2( host_2:3306)
    Thu Nov 9 10:43:51 2017 - [info] Primary candidate for the new Master (candidate_master is set)
    Thu Nov 9 10:43:51 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
    Thu Nov 9 10:43:51 2017 - [info] GTID ON
    Thu Nov 9 10:43:51 2017 - [info] Replicating from host_2( host_2:3306)
    Thu Nov 9 10:43:51 2017 - [info] Not candidate for the new Master (no_master is set)
    Thu Nov 9 10:43:51 2017 - [info] Starting SQL thread on host_1( host_1:3306) ..
    Thu Nov 9 10:43:51 2017 - [info] done.
    Thu Nov 9 10:43:51 2017 - [info] Starting SQL thread on host_3( host_3:3306) ..
    Thu Nov 9 10:43:51 2017 - [info] done.
    Thu Nov 9 10:43:51 2017 - [info] Starting GTID based failover.
    Thu Nov 9 10:43:51 2017 - [info]
    Thu Nov 9 10:43:51 2017 - [info] ** Phase 1: Configuration Check Phase completed.
    Thu Nov 9 10:43:51 2017 - [info]
    Thu Nov 9 10:43:51 2017 - [info] * Phase 2: Dead Master Shutdown Phase..
    Thu Nov 9 10:43:51 2017 - [info]
    Thu Nov 9 10:43:51 2017 - [info] HealthCheck: SSH to host_2 is reachable.
    Thu Nov 9 10:43:51 2017 - [info] Forcing shutdown so that applications never connect to the current master..
    Thu Nov 9 10:43:51 2017 - [info] Executing master IP deactivation script:
    Thu Nov 9 10:43:51 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --orig_master_host= host_2 --orig_master_ip= host_2 --orig_master_port=3306 --command=stopssh --ssh_user=root
    Thu Nov 9 10:43:53 2017 - [info] done.
    Thu Nov 9 10:43:53 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
    Thu Nov 9 10:43:53 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.
    Thu Nov 9 10:43:53 2017 - [info]
    Thu Nov 9 10:43:53 2017 - [info] * Phase 3: Master Recovery Phase..
    Thu Nov 9 10:43:53 2017 - [info]
    Thu Nov 9 10:43:53 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..
    Thu Nov 9 10:43:53 2017 - [info]
    Thu Nov 9 10:43:53 2017 - [info] The latest binary log file/position on all slaves is host_2.000002:1115
    Thu Nov 9 10:43:53 2017 - [info] Retrieved Gtid Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:4-16,
    Thu Nov 9 10:43:53 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):
    Thu Nov 9 10:43:53 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
    Thu Nov 9 10:43:53 2017 - [info] GTID ON
    Thu Nov 9 10:43:53 2017 - [info] Replicating from host_2( host_2:3306)
    Thu Nov 9 10:43:53 2017 - [info] Not candidate for the new Master (no_master is set)
    Thu Nov 9 10:43:53 2017 - [info] The oldest binary log file/position on all slaves is host_2.000002:230
    Thu Nov 9 10:43:53 2017 - [info] Retrieved Gtid Set: ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:446353
    Thu Nov 9 10:43:53 2017 - [info] Oldest slaves:
    Thu Nov 9 10:43:53 2017 - [info] host_1( host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
    Thu Nov 9 10:43:53 2017 - [info] GTID ON
    Thu Nov 9 10:43:53 2017 - [info] Replicating from host_2( host_2:3306)
    Thu Nov 9 10:43:53 2017 - [info] Primary candidate for the new Master (candidate_master is set)
    Thu Nov 9 10:43:53 2017 - [info]
    Thu Nov 9 10:43:53 2017 - [info] * Phase 3.3: Determining New Master Phase..
    Thu Nov 9 10:43:53 2017 - [info]
    Thu Nov 9 10:43:53 2017 - [info] Searching new master from slaves..
    Thu Nov 9 10:43:53 2017 - [info] Candidate masters from the configuration file:
    Thu Nov 9 10:43:53 2017 - [info] host_1( host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
    Thu Nov 9 10:43:53 2017 - [info] GTID ON
    Thu Nov 9 10:43:53 2017 - [info] Replicating from host_2( host_2:3306)
    Thu Nov 9 10:43:53 2017 - [info] Primary candidate for the new Master (candidate_master is set)
    Thu Nov 9 10:43:53 2017 - [info] Non-candidate masters:
    Thu Nov 9 10:43:53 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
    Thu Nov 9 10:43:53 2017 - [info] GTID ON
    Thu Nov 9 10:43:53 2017 - [info] Replicating from host_2( host_2:3306)
    Thu Nov 9 10:43:53 2017 - [info] Not candidate for the new Master (no_master is set)
    Thu Nov 9 10:43:53 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events..
    Thu Nov 9 10:43:53 2017 - [info] Not found.
    Thu Nov 9 10:43:53 2017 - [info] Searching from all candidate_master slaves..
    Thu Nov 9 10:43:53 2017 - [info] New master is host_1( host_1:3306)
    Thu Nov 9 10:43:53 2017 - [info] Starting master failover..
    Thu Nov 9 10:43:53 2017 - [info]
    Thu Nov 9 10:43:53 2017 - [info]
    Thu Nov 9 10:43:53 2017 - [info] * Phase 3.3: New Master Recovery Phase..
    Thu Nov 9 10:43:53 2017 - [info]
    Thu Nov 9 10:43:53 2017 - [info] Waiting all logs to be applied..
    Thu Nov 9 10:43:53 2017 - [info] done.
    Thu Nov 9 10:43:53 2017 - [info] Replicating from the latest slave host_3( host_3:3306) and waiting to apply..
    Thu Nov 9 10:43:53 2017 - [info] Waiting all logs to be applied on the latest slave..
    Thu Nov 9 10:43:53 2017 - [info] Resetting slave host_1( host_1:3306) and starting replication from the new master host_3( host_3:3306)..
    Thu Nov 9 10:43:53 2017 - [info] Executed CHANGE MASTER.
    Thu Nov 9 10:43:54 2017 - [info] Slave started.
    Thu Nov 9 10:43:54 2017 - [info] Waiting to execute all relay logs on host_1( host_1:3306)..
    Thu Nov 9 10:43:54 2017 - [info] master_pos_wait( host_3.000049:18041) completed on host_1( host_1:3306). Executed 0 events.
    Thu Nov 9 10:43:54 2017 - [info] done.
    Thu Nov 9 10:43:54 2017 - [info] done.
    Thu Nov 9 10:43:54 2017 - [info] -- Saving binlog from host host_2 started, pid: 150294
    Thu Nov 9 10:43:54 2017 - [info] -- Saving binlog from host host_1 started, pid: 150295
    Thu Nov 9 10:43:54 2017 - [info] -- Saving binlog from host host_3 started, pid: 150297
    Thu Nov 9 10:43:54 2017 - [info]
    Thu Nov 9 10:43:54 2017 - [info] Log messages from host_1 ...
    Thu Nov 9 10:43:54 2017 - [info]
    Thu Nov 9 10:43:54 2017 - [info] Fetching binary logs from binlog server host_1..
    Thu Nov 9 10:43:54 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_2.000002 --start_pos=1115 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog2_20171109104349.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin
    Thu Nov 9 10:43:54 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt?
    Thu Nov 9 10:43:54 2017 - [info] End of log messages from host_1.
    Thu Nov 9 10:43:54 2017 - [warning] Got error from host_1.
    Thu Nov 9 10:43:54 2017 - [info]
    Thu Nov 9 10:43:54 2017 - [info] Log messages from host_3 ...
    Thu Nov 9 10:43:54 2017 - [info]
    Thu Nov 9 10:43:54 2017 - [info] Fetching binary logs from binlog server host_3..
    Thu Nov 9 10:43:54 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_2.000002 --start_pos=1115 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog3_20171109104349.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin
    Thu Nov 9 10:43:54 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt?
    Thu Nov 9 10:43:54 2017 - [info] End of log messages from host_3.
    Thu Nov 9 10:43:54 2017 - [warning] Got error from host_3.
    Thu Nov 9 10:43:55 2017 - [info]
    Thu Nov 9 10:43:55 2017 - [info] Log messages from host_2 ...
    Thu Nov 9 10:43:55 2017 - [info]
    Thu Nov 9 10:43:54 2017 - [info] Fetching binary logs from binlog server host_2..
    Thu Nov 9 10:43:54 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_2.000002 --start_pos=1115 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog1_20171109104349.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin
    Thu Nov 9 10:43:55 2017 - [info] scp from root@ host_2:/var/log/masterha/mha_test/saved_binlog_binlog1_20171109104349.binlog to local:/var/log/masterha/mha_test/saved_binlog_ host_2_binlog1_20171109104349.binlog succeeded.
    Thu Nov 9 10:43:55 2017 - [info] End of log messages from host_2.
    Thu Nov 9 10:43:55 2017 - [info] Saved mysqlbinlog size from host_2 is 6047 bytes.
    Thu Nov 9 10:43:55 2017 - [info] Applying differential binlog /var/log/masterha/mha_test/saved_binlog_ host_2_binlog1_20171109104349.binlog ..
    Thu Nov 9 10:43:55 2017 - [info] Differential log apply from binlog server succeeded.
    Thu Nov 9 10:43:55 2017 - [info] Getting new master's binlog name and position..
    Thu Nov 9 10:43:55 2017 - [info] host_1.000053:3624
    Thu Nov 9 10:43:55 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST=' host_1', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
    Thu Nov 9 10:43:55 2017 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: host_1.000053, 3624, 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-16,
    Thu Nov 9 10:43:55 2017 - [info] Executing master IP activate script:
    Thu Nov 9 10:43:55 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --command=start --ssh_user=root --orig_master_host= host_2 --orig_master_ip= host_2 --orig_master_port=3306 --new_master_host= host_1 --new_master_ip= host_1 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba'
    Thu Nov 9 10:43:57 2017 - [info] OK.
    Thu Nov 9 10:43:57 2017 - [info] Setting read_only=0 on host_1( host_1:3306)..
    Thu Nov 9 10:43:57 2017 - [info] ok.
    Thu Nov 9 10:43:57 2017 - [info] ** Finished master recovery successfully.
    Thu Nov 9 10:43:57 2017 - [info] * Phase 3: Master Recovery Phase completed.
    Thu Nov 9 10:43:57 2017 - [info]
    Thu Nov 9 10:43:57 2017 - [info] * Phase 4: Slaves Recovery Phase..
    Thu Nov 9 10:43:57 2017 - [info]
    Thu Nov 9 10:43:57 2017 - [info]
    Thu Nov 9 10:43:57 2017 - [info] * Phase 4.1: Starting Slaves in parallel..
    Thu Nov 9 10:43:57 2017 - [info]
    Thu Nov 9 10:43:57 2017 - [info] -- Slave recovery on host host_3( host_3:3306) started, pid: 155162. Check tmp log /var/log/masterha/mha_test/ host_3_3306_20171109104349.log if it takes time..
    Thu Nov 9 10:43:58 2017 - [info]
    Thu Nov 9 10:43:58 2017 - [info] Log messages from host_3 ...
    Thu Nov 9 10:43:58 2017 - [info]
    Thu Nov 9 10:43:57 2017 - [info] Resetting slave host_3( host_3:3306) and starting replication from the new master host_1( host_1:3306)..
    Thu Nov 9 10:43:57 2017 - [info] Executed CHANGE MASTER.
    Thu Nov 9 10:43:58 2017 - [info] Slave started.
    Thu Nov 9 10:43:58 2017 - [info] gtid_wait(0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-16,
    Thu Nov 9 10:43:58 2017 - [info] End of log messages from host_3.
    Thu Nov 9 10:43:58 2017 - [info] -- Slave on host host_3( host_3:3306) started.
    Thu Nov 9 10:43:58 2017 - [info] All new slave servers recovered successfully.
    Thu Nov 9 10:43:58 2017 - [info]
    Thu Nov 9 10:43:58 2017 - [info] * Phase 5: New master cleanup phase..
    Thu Nov 9 10:43:58 2017 - [info]
    Thu Nov 9 10:43:58 2017 - [info] Resetting slave info on the new master..
    Thu Nov 9 10:43:58 2017 - [info] host_1: Resetting slave info succeeded.
    Thu Nov 9 10:43:58 2017 - [info] Master failover to host_1( host_1:3306) completed successfully.
    Thu Nov 9 10:43:58 2017 - [info]
    Thu Nov 9 10:43:58 2017 - [info] Sending mail..
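
In the scenario above, the candidate master host_1 has executed fewer transactions than the etl slave host_3, and host_3 in turn trails the dead master host_2. The Python sketch below computes those gaps from the Executed_Gtid_Set values reported in the status output. The helper names `parse_gtid_set` and `missing_gtids` and the constants `UUID_A`, `UUID_B`, and `EXECUTED` are illustrative assumptions, not part of MHA or MySQL, and expanding every transaction number into a set is only reasonable at this small scale.

```python
UUID_A = "0923e916-3c36-11e6-82a5-ecf4bbf1f518"
UUID_B = "ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c"

# Executed_Gtid_Set values copied from the status output of this scenario.
EXECUTED = {
    "host_2": f"{UUID_A}:1-16,{UUID_B}:1-446362",  # dead master
    "host_1": f"{UUID_A}:1-16,{UUID_B}:1-446353",  # candidate master
    "host_3": f"{UUID_A}:1-16,{UUID_B}:1-446356",  # etl slave
}


def parse_gtid_set(text):
    """Parse 'uuid:1-16,uuid2:1-5:7' into {uuid: set of transaction numbers}."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        numbers = result.setdefault(uuid, set())
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A bare number such as '7' is an interval of length one.
            numbers.update(range(int(start), int(end or start) + 1))
    return result


def missing_gtids(behind, ahead):
    """Return the transactions present in `ahead` but absent from `behind`."""
    gap = {}
    for uuid, numbers in ahead.items():
        diff = sorted(numbers - behind.get(uuid, set()))
        if diff:
            gap[uuid] = diff
    return gap


if __name__ == "__main__":
    sets = {host: parse_gtid_set(text) for host, text in EXECUTED.items()}
    print("host_1 lacks vs host_3:", missing_gtids(sets["host_1"], sets["host_3"]))
    print("host_3 lacks vs host_2:", missing_gtids(sets["host_3"], sets["host_2"]))
```

The two gaps (446354 through 446356, then 446357 through 446362) line up with the order of events in the log: host_1 first replicates from the latest slave host_3, and only then is the differential binlog saved from host_2 applied.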
### Simulated scenario: GTID state of the three DB servers

* master host_1

      dba:lc> show master status;
      +---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
      | File                | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                                                                        |
      +---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
      | host_1.000053       |     5229 |              |                  | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-21,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446362  |
      +---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
      1 row in set (0.00 sec)

* slave (candidate master) host_2

      Retrieved_Gtid_Set:
      Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-16,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446362
      Auto_Position: 1

* etl (other slave) host_3

      Retrieved_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:17-21,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:446357-446362
      Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-21,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446362
      Auto_Position: 1

### Failover log

    masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host=host_1 --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error

    Thu Nov 9 10:59:14 2017 - [info] MHA::MasterFailover version 0.56.
    Thu Nov 9 10:59:14 2017 - [info] Starting master failover.
    Thu Nov 9 10:59:14 2017 - [info]
    Thu Nov 9 10:59:14 2017 - [info] * Phase 1: Configuration Check Phase..
    Thu Nov 9 10:59:14 2017 - [info]
    Thu Nov 9 10:59:15 2017 - [info] HealthCheck: SSH to host_2 is reachable.
    Thu Nov 9 10:59:15 2017 - [info] Binlog server host_2 is reachable.
    Thu Nov 9 10:59:15 2017 - [info] HealthCheck: SSH to host_1 is reachable.
    Thu Nov 9 10:59:15 2017 - [info] Binlog server host_1 is reachable.
    Thu Nov 9 10:59:15 2017 - [info] HealthCheck: SSH to host_3 is reachable.
    Thu Nov 9 10:59:16 2017 - [info] Binlog server host_3 is reachable.
    Thu Nov 9 10:59:16 2017 - [warning] SQL Thread is stopped(no error) on host_2( host_2:3306)
    Thu Nov 9 10:59:16 2017 - [info] GTID failover mode = 1
    Thu Nov 9 10:59:16 2017 - [info] Dead Servers:
    Thu Nov 9 10:59:16 2017 - [info] host_1( host_1:3306)
    Thu Nov 9 10:59:16 2017 - [info] Checking master reachability via MySQL(double check)...
    Thu Nov 9 10:59:16 2017 - [info] ok.
    Thu Nov 9 10:59:16 2017 - [info] Alive Servers:
    Thu Nov 9 10:59:16 2017 - [info] host_2( host_2:3306)
    Thu Nov 9 10:59:16 2017 - [info] host_3( host_3:3306)
    Thu Nov 9 10:59:16 2017 - [info] Alive Slaves:
    Thu Nov 9 10:59:16 2017 - [info] host_2( host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
    Thu Nov 9 10:59:16 2017 - [info] GTID ON
    Thu Nov 9 10:59:16 2017 - [info] Replicating from host_1( host_1:3306)
    Thu Nov 9 10:59:16 2017 - [info] Primary candidate for the new Master (candidate_master is set)
    Thu Nov 9 10:59:16 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
    Thu Nov 9 10:59:16 2017 - [info] GTID ON
    Thu Nov 9 10:59:16 2017 - [info] Replicating from host_1( host_1:3306)
    Thu Nov 9 10:59:16 2017 - [info] Not candidate for the new Master (no_master is set)
    Thu Nov 9 10:59:16 2017 - [info] Starting SQL thread on host_2( host_2:3306) ..
    Thu Nov 9 10:59:16 2017 - [info] done.
    Thu Nov 9 10:59:16 2017 - [info] Starting GTID based failover.
    Thu Nov 9 10:59:16 2017 - [info]
    Thu Nov 9 10:59:16 2017 - [info] ** Phase 1: Configuration Check Phase completed.
    Thu Nov 9 10:59:16 2017 - [info]
    Thu Nov 9 10:59:16 2017 - [info] * Phase 2: Dead Master Shutdown Phase..
    Thu Nov 9 10:59:16 2017 - [info]
    Thu Nov 9 10:59:16 2017 - [info] HealthCheck: SSH to host_1 is reachable.
    Thu Nov 9 10:59:16 2017 - [info] Forcing shutdown so that applications never connect to the current master..
    Thu Nov 9 10:59:16 2017 - [info] Executing master IP deactivation script:
    Thu Nov 9 10:59:16 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --orig_master_host= host_1 --orig_master_ip= host_1 --orig_master_port=3306 --command=stopssh --ssh_user=root
    Thu Nov 9 10:59:20 2017 - [info] done.
    Thu Nov 9 10:59:20 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
    Thu Nov 9 10:59:20 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.
    Thu Nov 9 10:59:20 2017 - [info]
    Thu Nov 9 10:59:20 2017 - [info] * Phase 3: Master Recovery Phase..
    Thu Nov 9 10:59:20 2017 - [info]
    Thu Nov 9 10:59:20 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..
    Thu Nov 9 10:59:20 2017 - [info]
    Thu Nov 9 10:59:20 2017 - [info] The latest binary log file/position on all slaves is host_1.000053:5229
    Thu Nov 9 10:59:20 2017 - [info] Retrieved Gtid Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:17-21,
    Thu Nov 9 10:59:20 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):
    Thu Nov 9 10:59:20 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
    Thu Nov 9 10:59:20 2017 - [info] GTID ON
    Thu Nov 9 10:59:20 2017 - [info] Replicating from host_1( host_1:3306)
    Thu Nov 9 10:59:20 2017 - [info] Not candidate for the new Master (no_master is set)
    Thu Nov 9 10:59:20 2017 - [info] The oldest binary log file/position on all slaves is host_1.000053:3624
    Thu Nov 9 10:59:20 2017 - [info] Oldest slaves:
    Thu Nov 9 10:59:20 2017 - [info] host_2( host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
    Thu Nov 9 10:59:20 2017 - [info] GTID ON
    Thu Nov 9 10:59:20 2017 - [info] Replicating from host_1( host_1:3306)
    Thu Nov 9 10:59:20 2017 - [info] Primary candidate for the new Master (candidate_master is set)
    Thu Nov 9 10:59:20 2017 - [info]
    Thu Nov 9 10:59:20 2017 - [info] * Phase 3.3: Determining New Master Phase..
    Thu Nov 9 10:59:20 2017 - [info]
    Thu Nov 9 10:59:20 2017 - [info] Searching new master from slaves..
    Thu Nov 9 10:59:20 2017 - [info] Candidate masters from the configuration file:
    Thu Nov 9 10:59:20 2017 - [info] host_2( host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
    Thu Nov 9 10:59:20 2017 - [info] GTID ON
    Thu Nov 9 10:59:20 2017 - [info] Replicating from host_1( host_1:3306)
    Thu Nov 9 10:59:20 2017 - [info] Primary candidate for the new Master (candidate_master is set)
    Thu Nov 9 10:59:20 2017 - [info] Non-candidate masters:
    Thu Nov 9 10:59:20 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
    Thu Nov 9 10:59:20 2017 - [info] GTID ON
    Thu Nov 9 10:59:20 2017 - [info] Replicating from host_1( host_1:3306)
    Thu Nov 9 10:59:20 2017 - [info] Not candidate for the new Master (no_master is set)
    Thu Nov 9 10:59:20 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events..
    Thu Nov 9 10:59:20 2017 - [info] Not found.
    Thu Nov 9 10:59:20 2017 - [info] Searching from all candidate_master slaves..
    Thu Nov 9 10:59:20 2017 - [info] New master is host_2( host_2:3306)
    Thu Nov 9 10:59:20 2017 - [info] Starting master failover..
    Thu Nov 9 10:59:20 2017 - [info]
    Thu Nov 9 10:59:20 2017 - [info]
    Thu Nov 9 10:59:20 2017 - [info] * Phase 3.3: New Master Recovery Phase..
    Thu Nov 9 10:59:20 2017 - [info]
    Thu Nov 9 10:59:20 2017 - [info] Waiting all logs to be applied..
    Thu Nov 9 10:59:20 2017 - [info] done.
    Thu Nov 9 10:59:20 2017 - [info] Replicating from the latest slave host_3( host_3:3306) and waiting to apply..
    Thu Nov 9 10:59:20 2017 - [info] Waiting all logs to be applied on the latest slave..
    Thu Nov 9 10:59:20 2017 - [info] Resetting slave host_2( host_2:3306) and starting replication from the new master host_3( host_3:3306)..
    Thu Nov 9 10:59:20 2017 - [info] Executed CHANGE MASTER.
    Thu Nov 9 10:59:21 2017 - [info] Slave started.
    Thu Nov 9 10:59:21 2017 - [info] Waiting to execute all relay logs on host_2( host_2:3306)..
    Thu Nov 9 10:59:21 2017 - [info] master_pos_wait( host_3.000049:22035) completed on host_2( host_2:3306). Executed 0 events.
    Thu Nov 9 10:59:21 2017 - [info] done.
    Thu Nov 9 10:59:21 2017 - [info] done.
    Thu Nov 9 10:59:21 2017 - [info] -- Saving binlog from host host_2 started, pid: 184482
    Thu Nov 9 10:59:21 2017 - [info] -- Saving binlog from host host_1 started, pid: 184483
    Thu Nov 9 10:59:21 2017 - [info] -- Saving binlog from host host_3 started, pid: 184487
    Thu Nov 9 10:59:21 2017 - [info]
    Thu Nov 9 10:59:21 2017 - [info] Log messages from host_2 ...
    Thu Nov 9 10:59:21 2017 - [info]
    Thu Nov 9 10:59:21 2017 - [info] Fetching binary logs from binlog server host_2..
    Thu Nov 9 10:59:21 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_1.000053 --start_pos=5229 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog1_20171109105914.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin
    Thu Nov 9 10:59:21 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt?
    Thu Nov 9 10:59:21 2017 - [info] End of log messages from host_2.
    Thu Nov 9 10:59:21 2017 - [warning] Got error from host_2.
    Thu Nov 9 10:59:21 2017 - [info]
    Thu Nov 9 10:59:21 2017 - [info] Log messages from host_3 ...
    Thu Nov 9 10:59:21 2017 - [info]
    Thu Nov 9 10:59:21 2017 - [info] Fetching binary logs from binlog server host_3..
    Thu Nov 9 10:59:21 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_1.000053 --start_pos=5229 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog3_20171109105914.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin
    Thu Nov 9 10:59:21 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt?
    Thu Nov 9 10:59:21 2017 - [info] End of log messages from host_3.
    Thu Nov 9 10:59:21 2017 - [warning] Got error from host_3.
    Thu Nov 9 10:59:22 2017 - [info]
    Thu Nov 9 10:59:22 2017 - [info] Log messages from host_1 ...
    Thu Nov 9 10:59:22 2017 - [info]
    Thu Nov 9 10:59:21 2017 - [info] Fetching binary logs from binlog server host_1..
    Thu Nov 9 10:59:21 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_1.000053 --start_pos=5229 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog2_20171109105914.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin
    Thu Nov 9 10:59:22 2017 - [info] scp from root@ host_1:/var/log/masterha/mha_test/saved_binlog_binlog2_20171109105914.binlog to local:/var/log/masterha/mha_test/saved_binlog_ host_1_binlog2_20171109105914.binlog succeeded.
    Thu Nov 9 10:59:22 2017 - [info] End of log messages from host_1.
    Thu Nov 9 10:59:22 2017 - [info] Saved mysqlbinlog size from host_1 is 800 bytes.
    Thu Nov 9 10:59:22 2017 - [info] Applying differential binlog /var/log/masterha/mha_test/saved_binlog_ host_1_binlog2_20171109105914.binlog ..
    Thu Nov 9 10:59:22 2017 - [info] Differential log apply from binlog server succeeded.
    Thu Nov 9 10:59:22 2017 - [info] Getting new master's binlog name and position..
    Thu Nov 9 10:59:22 2017 - [info] host_2.000003:1680
    Thu Nov 9 10:59:22 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST=' host_2', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
    Thu Nov 9 10:59:22 2017 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: host_2.000003, 1680, 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-21,
    Thu Nov 9 10:59:22 2017 - [info] Executing master IP activate script:
    Thu Nov 9 10:59:22 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --command=start --ssh_user=root --orig_master_host= host_1 --orig_master_ip= host_1 --orig_master_port=3306 --new_master_host= host_2 --new_master_ip= host_2 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba'
    Thu Nov 9 10:59:24 2017 - [info] OK.
    Thu Nov 9 10:59:24 2017 - [info] Setting read_only=0 on host_2( host_2:3306)..
    Thu Nov 9 10:59:24 2017 - [info] ok.
    Thu Nov 9 10:59:24 2017 - [info] ** Finished master recovery successfully.
    Thu Nov 9 10:59:24 2017 - [info] * Phase 3: Master Recovery Phase completed.
    Thu Nov 9 10:59:24 2017 - [info]
    Thu Nov 9 10:59:24 2017 - [info] * Phase 4: Slaves Recovery Phase..
    Thu Nov 9 10:59:24 2017 - [info]
    Thu Nov 9 10:59:24 2017 - [info]
    Thu Nov 9 10:59:24 2017 - [info] * Phase 4.1: Starting Slaves in parallel..
    Thu Nov 9 10:59:24 2017 - [info]
    Thu Nov 9 10:59:24 2017 - [info] -- Slave recovery on host host_3( host_3:3306) started, pid: 189393. Check tmp log /var/log/masterha/mha_test/ host_3_3306_20171109105914.log if it takes time..
    Thu Nov 9 10:59:25 2017 - [info]
    Thu Nov 9 10:59:25 2017 - [info] Log messages from host_3 ...
    Thu Nov 9 10:59:25 2017 - [info]
    Thu Nov 9 10:59:24 2017 - [info] Resetting slave host_3( host_3:3306) and starting replication from the new master host_2( host_2:3306)..
    Thu Nov 9 10:59:24 2017 - [info] Executed CHANGE MASTER.
    Thu Nov 9 10:59:25 2017 - [info] Slave started.
    Thu Nov 9 10:59:25 2017 - [info] gtid_wait(0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-21,
    Thu Nov 9 10:59:25 2017 - [info] End of log messages from host_3.
    Thu Nov 9 10:59:25 2017 - [info] -- Slave on host host_3( host_3:3306) started.
    Thu Nov 9 10:59:25 2017 - [info] All new slave servers recovered successfully.
    Thu Nov 9 10:59:25 2017 - [info]
    Thu Nov 9 10:59:25 2017 - [info] * Phase 5: New master cleanup phase..
    Thu Nov 9 10:59:25 2017 - [info]
    Thu Nov 9 10:59:25 2017 - [info] Resetting slave info on the new master..
    Thu Nov 9 10:59:25 2017 - [info] host_2: Resetting slave info succeeded.
    Thu Nov 9 10:59:25 2017 - [info] Master failover to host_2( host_2:3306) completed successfully.
    Thu Nov 9 10:59:25 2017 - [info]
    Thu Nov 9 10:59:25 2017 - [info] Sending mail..
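
A failover log of this length is easier to follow once it is reduced to its phase markers, warnings, errors, and the chosen master. The Python sketch below does that by splitting on the timestamp prefix, so it works whether each entry sits on its own line or several entries are run together. The names `ENTRY`, `split_entries`, and `key_entries` are illustrative assumptions and not part of MHA; the timestamp pattern is inferred only from the log shown above.

```python
import re

# Timestamp prefix as it appears in the log above, e.g. "Thu Nov 9 10:59:20 2017".
_TS = r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) \w{3} +\d+ \d\d:\d\d:\d\d \d{4}"

# One entry = timestamp, " - ", "[level]", then text up to the next timestamp.
ENTRY = re.compile(
    rf"(?P<ts>{_TS}) - \[(?P<level>\w+)\](?P<msg>.*?)(?={_TS} - \[|\Z)",
    re.DOTALL,
)


def split_entries(log_text):
    """Return (timestamp, level, message) tuples in log order."""
    return [
        (m.group("ts"), m.group("level"), m.group("msg").strip())
        for m in ENTRY.finditer(log_text)
    ]


def key_entries(log_text):
    """Keep phase markers, warnings, errors, and the new-master decision."""
    keep = []
    for _ts, level, msg in split_entries(log_text):
        if not msg:
            continue  # bare "[info]" separator entries
        if level in ("warning", "error") or "Phase" in msg or msg.startswith("New master is"):
            keep.append((level, msg))
    return keep


if __name__ == "__main__":
    sample = (
        "Thu Nov 9 10:59:20 2017 - [info] * Phase 3.3: Determining New Master Phase.."
        "Thu Nov 9 10:59:20 2017 - [info]"
        "Thu Nov 9 10:59:20 2017 - [info] New master is host_2( host_2:3306)"
        "Thu Nov 9 10:59:21 2017 - [warning] Got error from host_2."
    )
    for level, msg in key_entries(sample):
        print(f"{level:8} {msg}")
```

Run against the full log, the warnings and errors that remain are the ones worth checking first: the stopped SQL thread, the unset shutdown_script, and the failed binlog saves on the hosts that are not the dead master.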
    masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host=host_1 --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error

    Tue Nov 7 17:11:29 2017 - [info] MHA::MasterFailover version 0.56.
    Tue Nov 7 17:11:29 2017 - [info] Starting master failover.
    Tue Nov 7 17:11:29 2017 - [info]
    Tue Nov 7 17:11:29 2017 - [info] * Phase 1: Configuration Check Phase..
    Tue Nov 7 17:11:29 2017 - [info]
    Tue Nov 7 17:11:29 2017 - [info] HealthCheck: SSH to host_2 is reachable.
    Tue Nov 7 17:11:29 2017 - [info] Binlog server host_2 is reachable.
    Tue Nov 7 17:11:29 2017 - [info] HealthCheck: SSH to host_1 is reachable.
    Tue Nov 7 17:11:30 2017 - [info] Binlog server host_1 is reachable.
    Tue Nov 7 17:11:30 2017 - [info] HealthCheck: SSH to host_3 is reachable.
    Tue Nov 7 17:11:30 2017 - [info] Binlog server host_3 is reachable.
    Tue Nov 7 17:11:30 2017 - [warning] SQL Thread is stopped(no error) on host_2( host_2:3306)
    Tue Nov 7 17:11:30 2017 - [warning] SQL Thread is stopped(no error) on host_3( host_3:3306)
    Tue Nov 7 17:11:30 2017 - [info] GTID failover mode = 1
    Tue Nov 7 17:11:30 2017 - [info] Dead Servers:
    Tue Nov 7 17:11:30 2017 - [info] host_1( host_1:3306)
    Tue Nov 7 17:11:30 2017 - [info] Checking master reachability via MySQL(double check)...
    Tue Nov 7 17:11:30 2017 - [info] ok.
    Tue Nov 7 17:11:30 2017 - [info] Alive Servers:
    Tue Nov 7 17:11:30 2017 - [info] host_2( host_2:3306)
    Tue Nov 7 17:11:30 2017 - [info] host_3( host_3:3306)
    Tue Nov 7 17:11:30 2017 - [info] Alive Slaves:
    Tue Nov 7 17:11:30 2017 - [info] host_2( host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
    Tue Nov 7 17:11:30 2017 - [info] GTID ON
    Tue Nov 7 17:11:30 2017 - [info] Replicating from host_1( host_1:3306)
    Tue Nov 7 17:11:30 2017 - [info] Primary candidate for the new Master (candidate_master is set)
    Tue Nov 7 17:11:30 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
    Tue Nov 7 17:11:30 2017 - [info] GTID ON
    Tue Nov 7 17:11:30 2017 - [info] Replicating from host_1( host_1:3306)
    Tue Nov 7 17:11:30 2017 - [info] Not candidate for the new Master (no_master is set)
    Tue Nov 7 17:11:30 2017 - [info] Starting SQL thread on host_2( host_2:3306) ..
    Tue Nov 7 17:11:30 2017 - [info] done.
    Tue Nov 7 17:11:30 2017 - [info] Starting SQL thread on host_3( host_3:3306) ..
    Tue Nov 7 17:11:30 2017 - [info] done.
    Tue Nov 7 17:11:30 2017 - [info] Starting GTID based failover.
    Tue Nov 7 17:11:30 2017 - [info]
    Tue Nov 7 17:11:30 2017 - [info] ** Phase 1: Configuration Check Phase completed.
    Tue Nov 7 17:11:30 2017 - [info]
    Tue Nov 7 17:11:30 2017 - [info] * Phase 2: Dead Master Shutdown Phase..
    Tue Nov 7 17:11:30 2017 - [info]
    Tue Nov 7 17:11:30 2017 - [info] HealthCheck: SSH to host_1 is reachable.
    Tue Nov 7 17:11:31 2017 - [info] Forcing shutdown so that applications never connect to the current master..
    Tue Nov 7 17:11:31 2017 - [info] Executing master IP deactivation script:
    Tue Nov 7 17:11:31 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --orig_master_host= host_1 --orig_master_ip= host_1 --orig_master_port=3306 --command=stopssh --ssh_user=root
    Tue Nov 7 17:11:33 2017 - [info] done.
    Tue Nov 7 17:11:33 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
    Tue Nov 7 17:11:33 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.
    Tue Nov 7 17:11:33 2017 - [info]
    Tue Nov 7 17:11:33 2017 - [info] * Phase 3: Master Recovery Phase..
    Tue Nov 7 17:11:33 2017 - [info]
    Tue Nov 7 17:11:33 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..
    Tue Nov 7 17:11:33 2017 - [info]
    Tue Nov 7 17:11:33 2017 - [info] The latest binary log file/position on all slaves is host_1.000051:13508
    Tue Nov 7 17:11:33 2017 - [info] Retrieved Gtid Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:3-8
    Tue Nov 7 17:11:33 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):
    Tue Nov 7 17:11:33 2017 - [info] host_2( host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
    Tue Nov 7 17:11:33 2017 - [info] GTID ON
    Tue Nov 7 17:11:33 2017 - [info] Replicating from host_1( host_1:3306)
    Tue Nov 7 17:11:33 2017 - [info] Primary candidate for the new Master (candidate_master is set)
    Tue Nov 7 17:11:33 2017 - [info] The oldest binary log file/position on all slaves is host_1.000051:11918
    Tue Nov 7 17:11:33 2017 - [info] Retrieved Gtid Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:2-3,
    Tue Nov 7 17:11:33 2017 - [info] Oldest slaves:
    Tue Nov 7 17:11:33 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
    Tue Nov 7 17:11:33 2017 - [info] GTID ON
    Tue Nov 7 17:11:33 2017 - [info] Replicating from host_1( host_1:3306)
    Tue Nov 7 17:11:33 2017 - [info] Not candidate for the new Master (no_master is set)
    Tue Nov 7 17:11:33 2017 - [info]
    Tue Nov 7 17:11:33 2017 - [info] * Phase 3.3: Determining New Master Phase..
    Tue Nov 7 17:11:33 2017 - [info]
    Tue Nov 7 17:11:33 2017 - [info] Searching new master from slaves..
    Tue Nov 7 17:11:33 2017 - [info] Candidate masters from the configuration file:
    Tue Nov 7 17:11:33 2017 - [info] host_2( host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
    Tue Nov 7 17:11:33 2017 - [info] GTID ON
    Tue Nov 7 17:11:33 2017 - [info] Replicating from host_1( host_1:3306)
    Tue Nov 7 17:11:33 2017 - [info] Primary candidate for the new Master (candidate_master is set)
    Tue Nov 7 17:11:33 2017 - [info] Non-candidate masters:
    Tue Nov 7 17:11:33 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
    Tue Nov 7 17:11:33 2017 - [info] GTID ON
    Tue Nov 7 17:11:33 2017 - [info] Replicating from host_1( host_1:3306)
    Tue Nov 7 17:11:33 2017 - [info] Not candidate for the new Master (no_master is set)
    Tue Nov 7 17:11:33 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events..
    Tue Nov 7 17:11:33 2017 - [info] New master is host_2( host_2:3306)
    Tue Nov 7 17:11:33 2017 - [info] Starting master failover..
    Tue Nov 7 17:11:33 2017 - [info]
    Tue Nov 7 17:11:33 2017 - [info]
    Tue Nov 7 17:11:33 2017 - [info] * Phase 3.3: New Master Recovery Phase..
    Tue Nov 7 17:11:33 2017 - [info]
    Tue Nov 7 17:11:33 2017 - [info] Waiting all logs to be applied..
    Tue Nov 7 17:11:33 2017 - [info] done.
    Tue Nov 7 17:11:33 2017 - [info] -- Saving binlog from host host_2 started, pid: 54677
    Tue Nov 7 17:11:33 2017 - [info] -- Saving binlog from host host_1 started, pid: 54681
    Tue Nov 7 17:11:33 2017 - [info] -- Saving binlog from host host_3 started, pid: 54683
    Tue Nov 7 17:11:33 2017 - [info]
    Tue Nov 7 17:11:33 2017 - [info] Log messages from host_3 ...
    Tue Nov 7 17:11:33 2017 - [info]
    Tue Nov 7 17:11:33 2017 - [info] Fetching binary logs from binlog server host_3..
    Tue Nov 7 17:11:33 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_1.000051 --start_pos=13508 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog3_20171107171129.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin
    Tue Nov 7 17:11:33 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660]
Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt?Tue Nov 7 17:11:33 2017 - [info] End of log messages from host_3.Tue Nov 7 17:11:33 2017 - [warning] Got error from host_3.Tue Nov 7 17:11:33 2017 - [info]Tue Nov 7 17:11:33 2017 - [info] Log messages from host_2 ...Tue Nov 7 17:11:33 2017 - [info]Tue Nov 7 17:11:33 2017 - [info] Fetching binary logs from binlog server host_2..Tue Nov 7 17:11:33 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_1.000051 --start_pos=13508 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog1_20171107171129.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.binTue Nov 7 17:11:33 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt?Tue Nov 7 17:11:33 2017 - [info] End of log messages from host_2.Tue Nov 7 17:11:33 2017 - [warning] Got error from host_2.Tue Nov 7 17:11:33 2017 - [info]Tue Nov 7 17:11:33 2017 - [info] Log messages from host_1 ...Tue Nov 7 17:11:33 2017 - [info]Tue Nov 7 17:11:33 2017 - [info] Fetching binary logs from binlog server host_1..Tue Nov 7 17:11:33 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_1.000051 --start_pos=13508 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog2_20171107171129.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.binTue Nov 7 17:11:33 2017 - [info] scp from root@ host_1:/var/log/masterha/mha_test/saved_binlog_binlog2_20171107171129.binlog to local:/var/log/masterha/mha_test/saved_binlog_ host_1_binlog2_20171107171129.binlog succeeded.Tue Nov 7 17:11:33 2017 - 
[info] End of log messages from host_1.Tue Nov 7 17:11:33 2017 - [info] Saved mysqlbinlog size from host_1 is 8578 bytes.Tue Nov 7 17:11:33 2017 - [info] Applying differential binlog /var/log/masterha/mha_test/saved_binlog_ host_1_binlog2_20171107171129.binlog ..Tue Nov 7 17:11:33 2017 - [info] Differential log apply from binlog server succeeded.Tue Nov 7 17:11:33 2017 - [info] Getting new master's binlog name and position..Tue Nov 7 17:11:33 2017 - [info] host_2.000001:5048Tue Nov 7 17:11:33 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST=' host_2', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';Tue Nov 7 17:11:33 2017 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: host_2.000001, 5048, 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-16,Tue Nov 7 17:11:33 2017 - [info] Executing master IP activate script:Tue Nov 7 17:11:33 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --command=start --ssh_user=root --orig_master_host= host_1 --orig_master_ip= host_1 --orig_master_port=3306 --new_master_host= host_2 --new_master_ip= host_2 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba'Tue Nov 7 17:11:36 2017 - [info] OK.Tue Nov 7 17:11:36 2017 - [info] Setting read_only=0 on host_2( host_2:3306)..Tue Nov 7 17:11:36 2017 - [info] ok.Tue Nov 7 17:11:36 2017 - [info] ** Finished master recovery successfully.Tue Nov 7 17:11:36 2017 - [info] * Phase 3: Master Recovery Phase completed.Tue Nov 7 17:11:36 2017 - [info]Tue Nov 7 17:11:36 2017 - [info] * Phase 4: Slaves Recovery Phase..Tue Nov 7 17:11:36 2017 - [info]Tue Nov 7 17:11:36 2017 - [info]Tue Nov 7 17:11:36 2017 - [info] * Phase 4.1: Starting Slaves in parallel..Tue Nov 7 17:11:36 2017 - [info]Tue Nov 7 17:11:36 2017 - [info] -- Slave recovery on host host_3( host_3:3306) started, pid: 58422. 
Check tmp log /var/log/masterha/mha_test/ host_3_3306_20171107171129.log if it takes time..Tue Nov 7 17:11:37 2017 - [info]Tue Nov 7 17:11:37 2017 - [info] Log messages from host_3 ...Tue Nov 7 17:11:37 2017 - [info]Tue Nov 7 17:11:36 2017 - [info] Resetting slave host_3( host_3:3306) and starting replication from the new master host_2( host_2:3306)..Tue Nov 7 17:11:36 2017 - [info] Executed CHANGE MASTER.Tue Nov 7 17:11:37 2017 - [info] Slave started.Tue Nov 7 17:11:37 2017 - [info] gtid_wait(0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-16,Tue Nov 7 17:11:37 2017 - [info] End of log messages from host_3.Tue Nov 7 17:11:37 2017 - [info] -- Slave on host host_3( host_3:3306) started.Tue Nov 7 17:11:37 2017 - [info] All new slave servers recovered successfully.Tue Nov 7 17:11:37 2017 - [info]Tue Nov 7 17:11:37 2017 - [info] * Phase 5: New master cleanup phase..Tue Nov 7 17:11:37 2017 - [info]Tue Nov 7 17:11:37 2017 - [info] Resetting slave info on the new master..Tue Nov 7 17:11:37 2017 - [info] host_2: Resetting slave info succeeded.Tue Nov 7 17:11:37 2017 - [info] Master failover to host_2( host_2:3306) completed successfully.Tue Nov 7 17:11:37 2017 - [info]Tue Nov 7 17:11:37 2017 - [info] Sending mail..
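The "Determining New Master" phase in the log above comes down to a priority search over the alive slaves. The sketch below is a simplified reading of that order, inferred from the log messages rather than copied from MHA's source. The `Slave` class, its fields, and `pick_new_master` are hypothetical names, the last two passes are an assumption about what happens when no `candidate_master` host exists, and real MHA applies further checks (for example that `log-bin` is enabled) that are left out here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slave:
    """Hypothetical model of one alive slave as the log describes it."""
    name: str
    candidate_master: bool = False  # "candidate_master is set" in the log
    no_master: bool = False         # "no_master is set" in the log
    is_latest: bool = False         # received the latest relay log events


def pick_new_master(slaves: List[Slave]) -> Optional[Slave]:
    """Pick a new master using the search order suggested by the log."""
    # Hosts flagged no_master are never eligible.
    eligible = [s for s in slaves if not s.no_master]

    # Pass 1: "Searching from candidate_master slaves which have
    # received the latest relay log events.."
    for s in eligible:
        if s.candidate_master and s.is_latest:
            return s

    # Pass 2: "Searching from all candidate_master slaves.."
    for s in eligible:
        if s.candidate_master:
            return s

    # Passes 3 and 4 (assumption): fall back to any eligible latest
    # slave, then to any eligible slave at all.
    for s in eligible:
        if s.is_latest:
            return s
    return eligible[0] if eligible else None
```

With host_2 flagged `candidate_master` and host_3 flagged `no_master`, the function returns host_2 whether or not host_2 holds the latest relay log events, which is consistent with the "New master is host_2" line in the log.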
masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host=host_2 --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error
Tue Nov 7 15:56:11 2017 - [info] MHA::MasterFailover version 0.56.
Tue Nov 7 15:56:11 2017 - [info] Starting master failover.
Tue Nov 7 15:56:11 2017 - [info]
Tue Nov 7 15:56:11 2017 - [info] * Phase 1: Configuration Check Phase..
Tue Nov 7 15:56:11 2017 - [info]
Tue Nov 7 15:56:11 2017 - [info] HealthCheck: SSH to host_2 is reachable.
Tue Nov 7 15:56:12 2017 - [info] Binlog server host_2 is reachable.
Tue Nov 7 15:56:12 2017 - [info] HealthCheck: SSH to host_1 is reachable.
Tue Nov 7 15:56:12 2017 - [info] Binlog server host_1 is reachable.
Tue Nov 7 15:56:12 2017 - [info] HealthCheck: SSH to host_3 is reachable.
Tue Nov 7 15:56:13 2017 - [info] Binlog server host_3 is reachable.
Tue Nov 7 15:56:13 2017 - [warning] SQL Thread is stopped(no error) on host_1(host_1:3306)
Tue Nov 7 15:56:13 2017 - [warning] SQL Thread is stopped(no error) on host_3(host_3:3306)
Tue Nov 7 15:56:13 2017 - [info] GTID failover mode = 1
Tue Nov 7 15:56:13 2017 - [info] Dead Servers:
Tue Nov 7 15:56:13 2017 - [info] host_2(host_2:3306)
Tue Nov 7 15:56:13 2017 - [info] Checking master reachability via MySQL(double check)...
Tue Nov 7 15:56:13 2017 - [info] ok.
Tue Nov 7 15:56:13 2017 - [info] Alive Servers:
Tue Nov 7 15:56:13 2017 - [info] host_1(host_1:3306)
Tue Nov 7 15:56:13 2017 - [info] host_3(host_3:3306)
Tue Nov 7 15:56:13 2017 - [info] Alive Slaves:
Tue Nov 7 15:56:13 2017 - [info] host_1(host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Tue Nov 7 15:56:13 2017 - [info] GTID ON
Tue Nov 7 15:56:13 2017 - [info] Replicating from host_2(host_2:3306)
Tue Nov 7 15:56:13 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Nov 7 15:56:13 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Tue Nov 7 15:56:13 2017 - [info] GTID ON
Tue Nov 7 15:56:13 2017 - [info] Replicating from host_2(host_2:3306)
Tue Nov 7 15:56:13 2017 - [info] Not candidate for the new Master (no_master is set)
Tue Nov 7 15:56:13 2017 - [info] Starting SQL thread on host_1(host_1:3306) ..
Tue Nov 7 15:56:13 2017 - [info] done.
Tue Nov 7 15:56:13 2017 - [info] Starting SQL thread on host_3(host_3:3306) ..
Tue Nov 7 15:56:13 2017 - [info] done.
Tue Nov 7 15:56:13 2017 - [info] Starting GTID based failover.
Tue Nov 7 15:56:13 2017 - [info]
Tue Nov 7 15:56:13 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Nov 7 15:56:13 2017 - [info]
Tue Nov 7 15:56:13 2017 - [info] * Phase 2: Dead Master Shutdown Phase..
Tue Nov 7 15:56:13 2017 - [info]
Tue Nov 7 15:56:13 2017 - [info] HealthCheck: SSH to host_2 is reachable.
Tue Nov 7 15:56:13 2017 - [info] Forcing shutdown so that applications never connect to the current master..
Tue Nov 7 15:56:13 2017 - [info] Executing master IP deactivation script:
Tue Nov 7 15:56:13 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --orig_master_host=host_2 --orig_master_ip=host_2 --orig_master_port=3306 --command=stopssh --ssh_user=root
Tue Nov 7 15:56:16 2017 - [info] done.
Tue Nov 7 15:56:16 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Tue Nov 7 15:56:16 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Tue Nov 7 15:56:16 2017 - [info]
Tue Nov 7 15:56:16 2017 - [info] * Phase 3: Master Recovery Phase..
Tue Nov 7 15:56:16 2017 - [info]
Tue Nov 7 15:56:16 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Tue Nov 7 15:56:16 2017 - [info]
Tue Nov 7 15:56:16 2017 - [info] The latest binary log file/position on all slaves is host_2.000049:11291
Tue Nov 7 15:56:16 2017 - [info] Retrieved Gtid Set: ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:3-446352
Tue Nov 7 15:56:16 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):
Tue Nov 7 15:56:16 2017 - [info] host_1(host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Tue Nov 7 15:56:16 2017 - [info] GTID ON
Tue Nov 7 15:56:16 2017 - [info] Replicating from host_2(host_2:3306)
Tue Nov 7 15:56:16 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Nov 7 15:56:16 2017 - [info] The oldest binary log file/position on all slaves is host_2.000049:10703
Tue Nov 7 15:56:16 2017 - [info] Retrieved Gtid Set: ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:3-446350
Tue Nov 7 15:56:16 2017 - [info] Oldest slaves:
Tue Nov 7 15:56:16 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Tue Nov 7 15:56:16 2017 - [info] GTID ON
Tue Nov 7 15:56:16 2017 - [info] Replicating from host_2(host_2:3306)
Tue Nov 7 15:56:16 2017 - [info] Not candidate for the new Master (no_master is set)
Tue Nov 7 15:56:16 2017 - [info]
Tue Nov 7 15:56:16 2017 - [info] * Phase 3.3: Determining New Master Phase..
Tue Nov 7 15:56:16 2017 - [info]
Tue Nov 7 15:56:16 2017 - [info] Searching new master from slaves..
Tue Nov 7 15:56:16 2017 - [info] Candidate masters from the configuration file:
Tue Nov 7 15:56:16 2017 - [info] host_1(host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Tue Nov 7 15:56:16 2017 - [info] GTID ON
Tue Nov 7 15:56:16 2017 - [info] Replicating from host_2(host_2:3306)
Tue Nov 7 15:56:16 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Nov 7 15:56:16 2017 - [info] Non-candidate masters:
Tue Nov 7 15:56:16 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Tue Nov 7 15:56:16 2017 - [info] GTID ON
Tue Nov 7 15:56:16 2017 - [info] Replicating from host_2(host_2:3306)
Tue Nov 7 15:56:16 2017 - [info] Not candidate for the new Master (no_master is set)
Tue Nov 7 15:56:16 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Tue Nov 7 15:56:16 2017 - [info] New master is host_1(host_1:3306)
Tue Nov 7 15:56:16 2017 - [info] Starting master failover..
Tue Nov 7 15:56:16 2017 - [info]
Tue Nov 7 15:56:16 2017 - [info]
Tue Nov 7 15:56:16 2017 - [info] * Phase 3.3: New Master Recovery Phase..
Tue Nov 7 15:56:16 2017 - [info]
Tue Nov 7 15:56:16 2017 - [info] Waiting all logs to be applied..
Tue Nov 7 15:56:16 2017 - [info] done.
Tue Nov 7 15:56:16 2017 - [info] -- Saving binlog from host host_2 started, pid: 79759
Tue Nov 7 15:56:16 2017 - [info] -- Saving binlog from host host_1 started, pid: 79768
Tue Nov 7 15:56:16 2017 - [info] -- Saving binlog from host host_3 started, pid: 79770
Tue Nov 7 15:56:17 2017 - [info]
Tue Nov 7 15:56:17 2017 - [info] Log messages from host_1 ...
Tue Nov 7 15:56:17 2017 - [info]
Tue Nov 7 15:56:16 2017 - [info] Fetching binary logs from binlog server host_1..
Tue Nov 7 15:56:16 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file=host_2.000049 --start_pos=11291 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog2_20171107155611.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin
Tue Nov 7 15:56:17 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt?
Tue Nov 7 15:56:17 2017 - [info] End of log messages from host_1.
Tue Nov 7 15:56:17 2017 - [warning] Got error from host_1.
Tue Nov 7 15:56:17 2017 - [info]
Tue Nov 7 15:56:17 2017 - [info] Log messages from host_3 ...
Tue Nov 7 15:56:17 2017 - [info]
Tue Nov 7 15:56:16 2017 - [info] Fetching binary logs from binlog server host_3..
Tue Nov 7 15:56:16 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file=host_2.000049 --start_pos=11291 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog3_20171107155611.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin
Tue Nov 7 15:56:17 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt?
Tue Nov 7 15:56:17 2017 - [info] End of log messages from host_3.
Tue Nov 7 15:56:17 2017 - [warning] Got error from host_3.
Tue Nov 7 15:56:17 2017 - [info]
Tue Nov 7 15:56:17 2017 - [info] Log messages from host_2 ...
Tue Nov 7 15:56:17 2017 - [info]
Tue Nov 7 15:56:16 2017 - [info] Fetching binary logs from binlog server host_2..
Tue Nov 7 15:56:16 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file=host_2.000049 --start_pos=11291 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog1_20171107155611.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin
Tue Nov 7 15:56:17 2017 - [info] scp from root@host_2:/var/log/masterha/mha_test/saved_binlog_binlog1_20171107155611.binlog to local:/var/log/masterha/mha_test/saved_binlog_host_2_binlog1_20171107155611.binlog succeeded.
Tue Nov 7 15:56:17 2017 - [info] End of log messages from host_2.
Tue Nov 7 15:56:17 2017 - [info] Saved mysqlbinlog size from host_2 is 768 bytes.
Tue Nov 7 15:56:17 2017 - [info] Applying differential binlog /var/log/masterha/mha_test/saved_binlog_host_2_binlog1_20171107155611.binlog ..
Tue Nov 7 15:56:17 2017 - [info] Differential log apply from binlog server succeeded.
Tue Nov 7 15:56:17 2017 - [info] Getting new master's binlog name and position..
Tue Nov 7 15:56:17 2017 - [info] host_1.000051:11449
Tue Nov 7 15:56:17 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host_1', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Tue Nov 7 15:56:17 2017 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: host_1.000051, 11449, 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1,
Tue Nov 7 15:56:17 2017 - [info] Executing master IP activate script:
Tue Nov 7 15:56:17 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --command=start --ssh_user=root --orig_master_host=host_2 --orig_master_ip=host_2 --orig_master_port=3306 --new_master_host=host_1 --new_master_ip=host_1 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba'
Tue Nov 7 15:56:20 2017 - [info] OK.
Tue Nov 7 15:56:20 2017 - [info] Setting read_only=0 on host_1(host_1:3306)..
Tue Nov 7 15:56:20 2017 - [info] ok.
Tue Nov 7 15:56:20 2017 - [info] ** Finished master recovery successfully.
Tue Nov 7 15:56:20 2017 - [info] * Phase 3: Master Recovery Phase completed.
Tue Nov 7 15:56:20 2017 - [info]
Tue Nov 7 15:56:20 2017 - [info] * Phase 4: Slaves Recovery Phase..
Tue Nov 7 15:56:20 2017 - [info]
Tue Nov 7 15:56:20 2017 - [info]
Tue Nov 7 15:56:20 2017 - [info] * Phase 4.1: Starting Slaves in parallel..
Tue Nov 7 15:56:20 2017 - [info]
Tue Nov 7 15:56:20 2017 - [info] -- Slave recovery on host host_3(host_3:3306) started, pid: 85941. Check tmp log /var/log/masterha/mha_test/host_3_3306_20171107155611.log if it takes time..
Tue Nov 7 15:56:21 2017 - [info]
Tue Nov 7 15:56:21 2017 - [info] Log messages from host_3 ...
Tue Nov 7 15:56:21 2017 - [info]
Tue Nov 7 15:56:20 2017 - [info] Resetting slave host_3(host_3:3306) and starting replication from the new master host_1(host_1:3306)..
Tue Nov 7 15:56:20 2017 - [info] Executed CHANGE MASTER.
Tue Nov 7 15:56:21 2017 - [info] Slave started.
Tue Nov 7 15:56:21 2017 - [info] gtid_wait(0923e916-3c36-11e6-82a5-ecf4bbf1f518:1,
Tue Nov 7 15:56:21 2017 - [info] End of log messages from host_3.
Tue Nov 7 15:56:21 2017 - [info] -- Slave on host host_3(host_3:3306) started.
Tue Nov 7 15:56:21 2017 - [info] All new slave servers recovered successfully.
Tue Nov 7 15:56:21 2017 - [info]
Tue Nov 7 15:56:21 2017 - [info] * Phase 5: New master cleanup phase..
Tue Nov 7 15:56:21 2017 - [info]
Tue Nov 7 15:56:21 2017 - [info] Resetting slave info on the new master..
Tue Nov 7 15:56:21 2017 - [info] host_1: Resetting slave info succeeded.
Tue Nov 7 15:56:21 2017 - [info] Master failover to host_1(host_1:3306) completed successfully.
Tue Nov 7 15:56:21 2017 - [info]
Tue Nov 7 15:56:21 2017 - [info] Sending mail..
No impact; the failover proceeds normally.
dba:(none)> show processlist;
+----+------+----------------------+------+---------+------+------------------------------+------------------------------------------------------------------------------------------------------+
| Id | User | Host                 | db   | Command | Time | State                        | Info                                                                                                 |
+----+------+----------------------+------+---------+------+------------------------------+------------------------------------------------------------------------------------------------------+
| 63 | dba  | localhost            | NULL | Query   |    0 | starting                     | show processlist                                                                                     |
| 65 | dba  | xx:11164             | NULL | Sleep   |  121 |                              | NULL                                                                                                 |
| 83 | dba  | new master:49022     | NULL | Query   |  176 | Waiting for global read lock | BINLOG 'GpAKWhNYUy1LMAAAAGYHAAAAAG0AAAAAAAEAAmxjAAh0X2NoYXJfMgACAw8CLAECGpAKWh5YUy1LJwAAAI0HAAAAAG |
+----+------+----------------------+------+---------+------+------------------------------+------------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)
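In the process list above, session 83 is running a `BINLOG` statement and has spent 176 seconds in the state `Waiting for global read lock`. A minimal sketch of spotting such sessions, assuming the rows of `SHOW PROCESSLIST` have already been fetched as dictionaries keyed by column name; the function name and the 60-second default threshold are hypothetical choices for illustration.

```python
from typing import Dict, List, Optional

# State string exactly as it appears in the process list above.
GLOBAL_READ_LOCK_STATE = "Waiting for global read lock"


def blocked_by_global_read_lock(
    rows: List[Dict[str, Optional[object]]],
    min_seconds: int = 60,
) -> List[Dict[str, Optional[object]]]:
    """Return sessions stuck on the global read lock for at least min_seconds."""
    blocked = []
    for row in rows:
        waited = int(row.get("Time") or 0)  # Time may be NULL/None
        if row.get("State") == GLOBAL_READ_LOCK_STATE and waited >= min_seconds:
            blocked.append(row)
    return blocked
```

Fed the three rows shown above, only the row with `Id` 83 is returned.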
The last part of the dead master's binlog never reached the slave or the etl, and on top of that the slave's logs lag behind the etl's (this is the strictest case).
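Who is behind whom in this scenario can be read off the `Executed_Gtid_Set` values by subtracting one set from another. The sketch below does that in plain Python, assuming the usual `uuid:first-last[:first-last...]` text form with sets separated by commas; `parse_gtid_set` and `missing_gtids` are hypothetical helper names, not part of MHA or MySQL. On a live server, MySQL's built-in `GTID_SUBTRACT()` function does the same job.

```python
from typing import Dict, List, Tuple

# Inclusive (first, last) transaction numbers for one server UUID.
Intervals = List[Tuple[int, int]]


def parse_gtid_set(text: str) -> Dict[str, Intervals]:
    """Parse 'uuid:1-5:7,uuid2:3' into {uuid: [(1, 5), (7, 7)], uuid2: [(3, 3)]}."""
    result: Dict[str, Intervals] = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *ranges = part.split(":")
        for r in ranges:
            first, _, last = r.partition("-")
            # A single number such as '7' means the interval 7-7.
            result.setdefault(uuid, []).append((int(first), int(last or first)))
    for intervals in result.values():
        intervals.sort()
    return result


def missing_gtids(ahead: str, behind: str) -> Dict[str, Intervals]:
    """Return the transactions contained in `ahead` but absent from `behind`."""
    have = parse_gtid_set(behind)
    missing: Dict[str, Intervals] = {}
    for uuid, intervals in parse_gtid_set(ahead).items():
        for first, last in intervals:
            cursor = first  # next transaction number not yet accounted for
            for h_first, h_last in have.get(uuid, []):
                if h_last < cursor or h_first > last:
                    continue  # no overlap with the remaining range
                if h_first > cursor:
                    missing.setdefault(uuid, []).append((cursor, h_first - 1))
                cursor = max(cursor, h_last + 1)
            if cursor <= last:
                missing.setdefault(uuid, []).append((cursor, last))
    return missing
```

For example, with an etl at `0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-25` and a slave at `0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-21`, `missing_gtids(etl, slave)` reports transactions 22 through 25 for that UUID, which is what the candidate master has to fetch from the etl before it can take over.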
masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host=host_1 --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error
Tue Nov 7 15:56:17 2017 - [info] Log messages from host_1 ...
Tue Nov 7 15:56:17 2017 - [info]
Tue Nov 7 15:56:16 2017 - [info] Fetching binary logs from binlog server host_1..
Tue Nov 7 15:56:16 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file=host_2.000049 --start_pos=11291 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog2_20171107155611.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin
Tue Nov 7 15:56:17 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt?
Tue Nov 7 15:56:17 2017 - [info] End of log messages from host_1.
Tue Nov 7 15:56:17 2017 - [warning] Got error from host_1.
Tue Nov 7 15:56:17 2017 - [info]
Tue Nov 7 15:56:17 2017 - [info] Log messages from host_3 ...
Tue Nov 7 15:56:17 2017 - [info]
Tue Nov 7 15:56:16 2017 - [info] Fetching binary logs from binlog server host_3..
Tue Nov 7 15:56:16 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file=host_2.000049 --start_pos=11291 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog3_20171107155611.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin
Tue Nov 7 15:56:17 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt?
Tue Nov 7 15:56:17 2017 - [info] End of log messages from host_3.
Tue Nov 7 15:56:17 2017 - [warning] Got error from host_3.
Tue Nov 7 15:56:17 2017 - [info]
Tue Nov 7 15:56:17 2017 - [info] Log messages from host_2 ...
Tue Nov 7 15:56:17 2017 - [info]
Tue Nov 7 15:56:16 2017 - [info] Fetching binary logs from binlog server host_2..
Tue Nov 7 15:56:16 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file=host_2.000049 --start_pos=11291 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog1_20171107155611.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin
Tue Nov 7 15:56:17 2017 - [info] scp from root@host_2:/var/log/masterha/mha_test/saved_binlog_binlog1_20171107155611.binlog to local:/var/log/masterha/mha_test/saved_binlog_host_2_binlog1_20171107155611.binlog succeeded.
Tue Nov 7 15:56:17 2017 - [info] End of log messages from host_2.
Tue Nov 7 15:56:17 2017 - [info] Saved mysqlbinlog size from host_2 is 768 bytes.
Tue Nov 7 15:56:17 2017 - [info] Applying differential binlog /var/log/masterha/mha_test/saved_binlog_host_2_binlog1_20171107155611.binlog ..
Tue Nov 7 15:56:17 2017 - [info] Differential log apply from binlog server succeeded.
masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host=host_2 --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error
Thu Nov 9 11:20:04 2017 - [info] -- Saving binlog from host host_2 started, pid: 117389
Thu Nov 9 11:20:05 2017 - [info]
Thu Nov 9 11:20:05 2017 - [info] Log messages from host_2 ...
Thu Nov 9 11:20:05 2017 - [info]
Thu Nov 9 11:20:04 2017 - [info] Fetching binary logs from binlog server host_2..
Thu Nov 9 11:20:04 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file=host_2.000004 --start_pos=1115 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog1_20171109111957.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin
Thu Nov 9 11:20:05 2017 - [info] scp from root@host_2:/var/log/masterha/mha_test/saved_binlog_binlog1_20171109111957.binlog to local:/var/log/masterha/mha_test/saved_binlog_host_2_binlog1_20171109111957.binlog succeeded.
Thu Nov 9 11:20:05 2017 - [info] End of log messages from host_2.
Thu Nov 9 11:20:05 2017 - [info] Saved mysqlbinlog size from host_2 is 4444 bytes.
Thu Nov 9 11:20:05 2017 - [info] Applying differential binlog /var/log/masterha/mha_test/saved_binlog_host_2_binlog1_20171109111957.binlog ..
Thu Nov 9 11:20:05 2017 - [info] Differential log apply from binlog server succeeded.
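The log excerpts above follow one pattern: every binlog server is asked to save the missing events, the servers that cannot do so log `Got error from <host>`, and the one that can logs `Saved mysqlbinlog size from <host> is N bytes` before the differential binlog is applied. The failover carries on past the errors, which is consistent with the `--ignore_binlog_server_error` flag on the command line. A small sketch that condenses such a log into one outcome per host, assuming one log entry per line; `binlog_save_outcomes` is a hypothetical helper and the two regular expressions are modelled only on the messages shown in this article.

```python
import re
from typing import Dict, Iterable, Optional

# Patterns modelled on the MHA 0.56 messages quoted in this article.
ERROR_RE = re.compile(r"\[warning\] Got error from (\S+)\.$")
SAVED_RE = re.compile(r"\[info\] Saved mysqlbinlog size from (\S+) is (\d+) bytes\.$")


def binlog_save_outcomes(lines: Iterable[str]) -> Dict[str, Optional[int]]:
    """Map each host to the saved binlog size in bytes, or None on error."""
    outcomes: Dict[str, Optional[int]] = {}
    for line in lines:
        line = line.strip()
        match = ERROR_RE.search(line)
        if match:
            outcomes[match.group(1)] = None
            continue
        match = SAVED_RE.search(line)
        if match:
            outcomes[match.group(1)] = int(match.group(2))
    return outcomes
```

A host mapped to `None` failed to supply the events; a host mapped to a number is the one whose differential binlog was fetched and applied.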
### GTID state of the three servers

* master host_1

dba:lc> show master status;
+---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
| File                | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                                                                        |
+---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
| host_1.000055       |     6016 |              |                  | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-31,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446369  |
+---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

* slave host_2

Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-21,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446369
Auto_Position: 1

* etl host_3

Retrieved_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:22-25,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:446366-446369
Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-25,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446369
Auto_Position: 1

### Failover log

masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host=host_1 --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error
Thu Nov 9 15:00:09 2017 - [info] MHA::MasterFailover version 0.56.
Thu Nov 9 15:00:09 2017 - [info] Starting master failover.
Thu Nov 9 15:00:09 2017 - [info]
Thu Nov 9 15:00:09 2017 - [info] * Phase 1: Configuration Check Phase..
Thu Nov 9 15:00:09 2017 - [info]
Thu Nov 9 15:00:09 2017 - [info] HealthCheck: SSH to host_2 is reachable.
Thu Nov 9 15:00:09 2017 - [info] Binlog server host_2 is reachable.
Thu Nov 9 15:00:10 2017 - [warning] SQL Thread is stopped(no error) on host_2(host_2:3306)
Thu Nov 9 15:00:10 2017 - [warning] SQL Thread is stopped(no error) on host_3(host_3:3306)
Thu Nov 9 15:00:10 2017 - [info] GTID failover mode = 1
Thu Nov 9 15:00:10 2017 - [info] Dead Servers:
Thu Nov 9 15:00:10 2017 - [info] host_1(host_1:3306)
Thu Nov 9 15:00:10 2017 - [info] Checking master reachability via MySQL(double check)...
Thu Nov 9 15:00:10 2017 - [info] ok.
Thu Nov 9 15:00:10 2017 - [info] Alive Servers:
Thu Nov 9 15:00:10 2017 - [info] host_2(host_2:3306)
Thu Nov 9 15:00:10 2017 - [info] host_3(host_3:3306)
Thu Nov 9 15:00:10 2017 - [info] Alive Slaves:
Thu Nov 9 15:00:10 2017 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Nov 9 15:00:10 2017 - [info] GTID ON
Thu Nov 9 15:00:10 2017 - [info] Replicating from host_1(host_1:3306)
Thu Nov 9 15:00:10 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Nov 9 15:00:10 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Nov 9 15:00:10 2017 - [info] GTID ON
Thu Nov 9 15:00:10 2017 - [info] Replicating from host_1(host_1:3306)
Thu Nov 9 15:00:10 2017 - [info] Not candidate for the new Master (no_master is set)
Thu Nov 9 15:00:10 2017 - [info] Starting SQL thread on host_2(host_2:3306) ..
Thu Nov 9 15:00:10 2017 - [info] done.
Thu Nov 9 15:00:10 2017 - [info] Starting SQL thread on host_3(host_3:3306) ..
Thu Nov 9 15:00:10 2017 - [info] done.
Thu Nov 9 15:00:10 2017 - [info] Starting GTID based failover.
Thu Nov 9 15:00:10 2017 - [info]
Thu Nov 9 15:00:10 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Thu Nov 9 15:00:10 2017 - [info]
Thu Nov 9 15:00:10 2017 - [info] * Phase 2: Dead Master Shutdown Phase..
Thu Nov 9 15:00:10 2017 - [info]
Thu Nov 9 15:00:10 2017 - [info] HealthCheck: SSH to host_1 is reachable.
Thu Nov 9 15:00:10 2017 - [info] Forcing shutdown so that applications never connect to the current master..
Thu Nov 9 15:00:10 2017 - [info] Executing master IP deactivation script:
Thu Nov 9 15:00:10 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --command=stopssh --ssh_user=root
Thu Nov 9 15:00:17 2017 - [info] done.
Thu Nov 9 15:00:17 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Thu Nov 9 15:00:17 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Thu Nov 9 15:00:17 2017 - [info]
Thu Nov 9 15:00:17 2017 - [info] * Phase 3: Master Recovery Phase..
Thu Nov 9 15:00:17 2017 - [info]
Thu Nov 9 15:00:17 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Thu Nov 9 15:00:17 2017 - [info]
Thu Nov 9 15:00:17 2017 - [info] The latest binary log file/position on all slaves is host_1.000055:4090
Thu Nov 9 15:00:17 2017 - [info] Retrieved Gtid Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:22-25,
Thu Nov 9 15:00:17 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):
Thu Nov 9 15:00:17 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Nov 9 15:00:17 2017 - [info] GTID ON
Thu Nov 9 15:00:17 2017 - [info] Replicating from host_1(host_1:3306)
Thu Nov 9 15:00:17 2017 - [info] Not candidate for the new Master (no_master is set)
Thu Nov 9 15:00:17 2017 - [info] The oldest binary log file/position on all slaves is host_1.000055:2806
Thu Nov 9 15:00:17 2017 - [info] Oldest slaves:
Thu Nov 9 15:00:17 2017 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Nov 9 15:00:17 2017 - [info] GTID ON
Thu Nov 9 15:00:17 2017 - [info] Replicating from host_1(host_1:3306)
Thu Nov 9 15:00:17 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Nov 9 15:00:17 2017 - [info]
Thu Nov 9 15:00:17 2017 - [info] * Phase 3.3: Determining New Master Phase..
Thu Nov 9 15:00:17 2017 - [info]
Thu Nov 9 15:00:17 2017 - [info] Searching new master from slaves..
Thu Nov 9 15:00:17 2017 - [info] Candidate masters from the configuration file:
Thu Nov 9 15:00:17 2017 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Nov 9 15:00:17 2017 - [info] GTID ON
Thu Nov 9 15:00:17 2017 - [info] Replicating from host_1(host_1:3306)
Thu Nov 9 15:00:17 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Nov 9 15:00:17 2017 - [info] Non-candidate masters:
Thu Nov 9 15:00:17 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Nov 9 15:00:17 2017 - [info] GTID ON
Thu Nov 9 15:00:17 2017 - [info] Replicating from host_1(host_1:3306)
Thu Nov 9 15:00:17 2017 - [info] Not candidate for the new Master (no_master is set)
Thu Nov 9 15:00:17 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Thu Nov 9 15:00:17 2017 - [info] Not found.
Thu Nov 9 15:00:17 2017 - [info] Searching from all candidate_master slaves..
Thu Nov 9 15:00:17 2017 - [info] New master is host_2(host_2:3306)
Thu Nov 9 15:00:17 2017 - [info] Starting master failover..
Thu Nov 9 15:00:17 2017 - [info]
Thu Nov 9 15:00:17 2017 - [info]
Thu Nov 9 15:00:17 2017 - [info] * Phase 3.3: New Master Recovery Phase..
Thu Nov 9 15:00:17 2017 - [info]
Thu Nov 9 15:00:17 2017 - [info] Waiting all logs to be applied..
Thu Nov 9 15:00:17 2017 - [info] done.
Thu Nov 9 15:00:17 2017 - [info] Replicating from the latest slave host_3(host_3:3306) and waiting to apply..
Thu Nov 9 15:00:17 2017 - [info] Waiting all logs to be applied on the latest slave..
Thu Nov 9 15:00:17 2017 - [info] Resetting slave host_2(host_2:3306) and starting replication from the new master host_3(host_3:3306)..
Thu Nov 9 15:00:17 2017 - [info] Executed CHANGE MASTER.
Thu Nov 9 15:00:18 2017 - [info] Slave started.
Thu Nov 9 15:00:18 2017 - [info] Waiting to execute all relay logs on host_2(host_2:3306)..
Thu Nov 9 15:00:18 2017 - [info] master_pos_wait(host_3.000049:25843) completed
on host_2( host_2:3306). Executed 0 events.Thu Nov 9 15:00:18 2017 - [info] done.Thu Nov 9 15:00:18 2017 - [info] done.Thu Nov 9 15:00:18 2017 - [info] -- Saving binlog from host host_2 started, pid: 175683Thu Nov 9 15:00:18 2017 - [info]Thu Nov 9 15:00:18 2017 - [info] Log messages from host_2 ...Thu Nov 9 15:00:18 2017 - [info]Thu Nov 9 15:00:18 2017 - [info] Fetching binary logs from binlog server host_2..Thu Nov 9 15:00:18 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_1.000055 --start_pos=4090 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog1_20171109150009.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.binThu Nov 9 15:00:18 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt?Thu Nov 9 15:00:18 2017 - [info] End of log messages from host_2.Thu Nov 9 15:00:18 2017 - [warning] Got error from host_2.Thu Nov 9 15:00:18 2017 - [info] Getting new master's binlog name and position..Thu Nov 9 15:00:18 2017 - [info] host_2.000005:1390Thu Nov 9 15:00:18 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST=' host_2', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';Thu Nov 9 15:00:18 2017 - [info] Master Recovery succeeded. 
File:Pos:Exec_Gtid_Set: host_2.000005, 1390, 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-25,Thu Nov 9 15:00:18 2017 - [info] Executing master IP activate script:Thu Nov 9 15:00:18 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --command=start --ssh_user=root --orig_master_host= host_1 --orig_master_ip= host_1 --orig_master_port=3306 --new_master_host= host_2 --new_master_ip= host_2 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba'Thu Nov 9 15:00:22 2017 - [info] OK.Thu Nov 9 15:00:22 2017 - [info] Setting read_only=0 on host_2( host_2:3306)..Thu Nov 9 15:00:22 2017 - [info] ok.Thu Nov 9 15:00:22 2017 - [info] ** Finished master recovery successfully.Thu Nov 9 15:00:22 2017 - [info] * Phase 3: Master Recovery Phase completed.Thu Nov 9 15:00:22 2017 - [info]Thu Nov 9 15:00:22 2017 - [info] * Phase 4: Slaves Recovery Phase..Thu Nov 9 15:00:22 2017 - [info]Thu Nov 9 15:00:22 2017 - [info]Thu Nov 9 15:00:22 2017 - [info] * Phase 4.1: Starting Slaves in parallel..Thu Nov 9 15:00:22 2017 - [info]Thu Nov 9 15:00:22 2017 - [info] -- Slave recovery on host host_3( host_3:3306) started, pid: 180681. 
Check tmp log /var/log/masterha/mha_test/ host_3_3306_20171109150009.log if it takes time..Thu Nov 9 15:00:23 2017 - [info]Thu Nov 9 15:00:23 2017 - [info] Log messages from host_3 ...Thu Nov 9 15:00:23 2017 - [info]Thu Nov 9 15:00:22 2017 - [info] Resetting slave host_3( host_3:3306) and starting replication from the new master host_2( host_2:3306)..Thu Nov 9 15:00:22 2017 - [info] Executed CHANGE MASTER.Thu Nov 9 15:00:23 2017 - [info] Slave started.Thu Nov 9 15:00:23 2017 - [info] gtid_wait(0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-25,Thu Nov 9 15:00:23 2017 - [info] End of log messages from host_3.Thu Nov 9 15:00:23 2017 - [info] -- Slave on host host_3( host_3:3306) started.Thu Nov 9 15:00:23 2017 - [info] All new slave servers recovered successfully.Thu Nov 9 15:00:23 2017 - [info]Thu Nov 9 15:00:23 2017 - [info] * Phase 5: New master cleanup phase..Thu Nov 9 15:00:23 2017 - [info]Thu Nov 9 15:00:23 2017 - [info] Resetting slave info on the new master..Thu Nov 9 15:00:23 2017 - [info] host_2: Resetting slave info succeeded.Thu Nov 9 15:00:23 2017 - [info] Master failover to host_2( host_2:3306) completed successfully.Thu Nov 9 15:00:23 2017 - [info]Thu Nov 9 15:00:23 2017 - [info] Sending mail..
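The "Determining New Master" messages in the log above suggest a two-step search: first among `candidate_master` slaves that already hold the latest relay log events, then among all `candidate_master` slaves. The sketch below only mirrors those two log messages. It is not MHA's actual code, and the `Slave` class and `pick_new_master` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Slave:
    name: str
    candidate_master: bool = False  # candidate_master is set in the MHA config
    no_master: bool = False         # no_master is set in the MHA config
    is_latest: bool = False         # listed under "Latest slaves" in the log


def pick_new_master(slaves: List[Slave]) -> Tuple[Optional[Slave], bool]:
    """Return (new_master, needs_catch_up), following the two searches in the log."""
    candidates = [s for s in slaves if s.candidate_master and not s.no_master]
    # "Searching from candidate_master slaves which have received
    #  the latest relay log events.."
    for s in candidates:
        if s.is_latest:
            return s, False
    # "Searching from all candidate_master slaves.."
    if candidates:
        return candidates[0], True
    # Any further fallback is outside what these logs show.
    return None, False


# The topology from the log: host_2 is the candidate but lags behind,
# host_3 is the latest slave but has no_master set.
slaves = [
    Slave("host_2", candidate_master=True),
    Slave("host_3", no_master=True, is_latest=True),
]
new_master, needs_catch_up = pick_new_master(slaves)
print(new_master.name, needs_catch_up)  # prints: host_2 True
```

This matches what the log shows: host_2 is chosen even though it is the oldest slave, and it then replicates from the latest slave host_3 before taking over.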
Conclusion: the master is not configured as a binlog server, so any transactions the master had not yet shipped to a slave are lost.
Fortunately, the slave and the etl host can CHANGE MASTER to each other, so even though the slave (candidate master) was behind, the etl host's logs were ultimately used to fill in what the slave was missing.

### GTID state of the 3 DBs

* master host_2

dba:lc> show master status;
+---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
| host_2.000005 | 5785 | | | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-31,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446378 |
+---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

* slave host_1

Retrieved_Gtid_Set:
Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-31,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446369
Auto_Position: 1

* etl host_3

Retrieved_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:26-31,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:446370-446372
Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-31,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446372
Auto_Position: 1

### Failover log

Thu Nov 9 16:22:41 2017 - [info] MHA::MasterFailover version 0.56.
Thu Nov 9 16:22:41 2017 - [info] Starting master failover.
Thu Nov 9 16:22:41 2017 - [info]
Thu Nov 9 16:22:41 2017 - [info] * Phase 1: Configuration Check Phase..
Thu Nov 9 16:22:41 2017 - [info]
Thu Nov 9 16:22:41 2017 - [warning] SQL Thread is stopped(no error) on host_1( host_1:3306)
Thu Nov 9 16:22:41 2017 - [warning] SQL Thread is stopped(no error) on host_3( host_3:3306)
Thu Nov 9 16:22:41 2017 - [info] GTID failover mode = 1
Thu Nov 9 16:22:41 2017 - [info] Dead Servers:
Thu Nov 9 16:22:41 2017 - [info] host_2( host_2:3306)
Thu Nov 9 16:22:41 2017 - [info] Checking master reachability via MySQL(double check)...
Thu Nov 9 16:22:41 2017 - [info] ok.
Thu Nov 9 16:22:41 2017 - 
[info] Alive Servers:Thu Nov 9 16:22:41 2017 - [info] host_1( host_1:3306)Thu Nov 9 16:22:41 2017 - [info] host_3( host_3:3306)Thu Nov 9 16:22:41 2017 - [info] Alive Slaves:Thu Nov 9 16:22:41 2017 - [info] host_1( host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabledThu Nov 9 16:22:41 2017 - [info] GTID ONThu Nov 9 16:22:41 2017 - [info] Replicating from host_2( host_2:3306)Thu Nov 9 16:22:41 2017 - [info] Primary candidate for the new Master (candidate_master is set)Thu Nov 9 16:22:41 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabledThu Nov 9 16:22:41 2017 - [info] GTID ONThu Nov 9 16:22:41 2017 - [info] Replicating from host_2( host_2:3306)Thu Nov 9 16:22:41 2017 - [info] Not candidate for the new Master (no_master is set)Thu Nov 9 16:22:41 2017 - [info] Starting SQL thread on host_1( host_1:3306) ..Thu Nov 9 16:22:41 2017 - [info] done.Thu Nov 9 16:22:41 2017 - [info] Starting SQL thread on host_3( host_3:3306) ..Thu Nov 9 16:22:41 2017 - [info] done.Thu Nov 9 16:22:41 2017 - [info] Starting GTID based failover.Thu Nov 9 16:22:41 2017 - [info]Thu Nov 9 16:22:41 2017 - [info] ** Phase 1: Configuration Check Phase completed.Thu Nov 9 16:22:41 2017 - [info]Thu Nov 9 16:22:41 2017 - [info] * Phase 2: Dead Master Shutdown Phase..Thu Nov 9 16:22:41 2017 - [info]Thu Nov 9 16:22:42 2017 - [info] HealthCheck: SSH to host_2 is reachable.Thu Nov 9 16:22:42 2017 - [info] Forcing shutdown so that applications never connect to the current master..Thu Nov 9 16:22:42 2017 - [info] Executing master IP deactivation script:Thu Nov 9 16:22:42 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --orig_master_host= host_2 --orig_master_ip= host_2 --orig_master_port=3306 --command=stopssh --ssh_user=root=================== swift vip : tgw_vip from host_2 is deleted ==============================--2017-11-09 16:22:42-- 
http://tgw_server/cgi-bin/fun_logic/bin/public_api/op_rs.cgi正在连接 tgw_server:80... 已连接。已发出 HTTP 请求,正在等待回应... 200 OK长度:未指定 [text/html]正在保存至: “STDOUT” 0K 9.79M=0s2017-11-09 16:22:44 (9.79 MB/s) - 已写入标准输出 [38]Thu Nov 9 16:22:44 2017 - [info] done.Thu Nov 9 16:22:44 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.Thu Nov 9 16:22:44 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.Thu Nov 9 16:22:44 2017 - [info]Thu Nov 9 16:22:44 2017 - [info] * Phase 3: Master Recovery Phase..Thu Nov 9 16:22:44 2017 - [info]Thu Nov 9 16:22:44 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..Thu Nov 9 16:22:44 2017 - [info]Thu Nov 9 16:22:44 2017 - [info] The latest binary log file/position on all slaves is host_2.000005:4015Thu Nov 9 16:22:44 2017 - [info] Retrieved Gtid Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:26-31,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:446370-446372Thu Nov 9 16:22:44 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):Thu Nov 9 16:22:44 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabledThu Nov 9 16:22:44 2017 - [info] GTID ONThu Nov 9 16:22:44 2017 - [info] Replicating from host_2( host_2:3306)Thu Nov 9 16:22:44 2017 - [info] Not candidate for the new Master (no_master is set)Thu Nov 9 16:22:44 2017 - [info] The oldest binary log file/position on all slaves is host_2.000005:3130Thu Nov 9 16:22:44 2017 - [info] Oldest slaves:Thu Nov 9 16:22:44 2017 - [info] host_1( host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabledThu Nov 9 16:22:44 2017 - [info] GTID ONThu Nov 9 16:22:44 2017 - [info] Replicating from host_2( host_2:3306)Thu Nov 9 16:22:44 2017 - [info] Primary candidate for the new Master (candidate_master is set)Thu Nov 9 16:22:44 2017 - [info]Thu Nov 9 16:22:44 2017 - [info] * Phase 3.3: Determining New Master Phase..Thu Nov 9 16:22:44 2017 - [info]Thu Nov 9 16:22:44 2017 - 
[info] Searching new master from slaves..Thu Nov 9 16:22:44 2017 - [info] Candidate masters from the configuration file:Thu Nov 9 16:22:44 2017 - [info] host_1( host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabledThu Nov 9 16:22:44 2017 - [info] GTID ONThu Nov 9 16:22:44 2017 - [info] Replicating from host_2( host_2:3306)Thu Nov 9 16:22:44 2017 - [info] Primary candidate for the new Master (candidate_master is set)Thu Nov 9 16:22:44 2017 - [info] Non-candidate masters:Thu Nov 9 16:22:44 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabledThu Nov 9 16:22:44 2017 - [info] GTID ONThu Nov 9 16:22:44 2017 - [info] Replicating from host_2( host_2:3306)Thu Nov 9 16:22:44 2017 - [info] Not candidate for the new Master (no_master is set)Thu Nov 9 16:22:44 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events..Thu Nov 9 16:22:44 2017 - [info] Not found.Thu Nov 9 16:22:44 2017 - [info] Searching from all candidate_master slaves..Thu Nov 9 16:22:44 2017 - [info] New master is host_1( host_1:3306)Thu Nov 9 16:22:44 2017 - [info] Starting master failover..Thu Nov 9 16:22:44 2017 - [info]From: host_2( host_2:3306) (current master) +-- host_1( host_1:3306) +-- host_3( host_3:3306)To: host_1( host_1:3306) (new master) +-- host_3( host_3:3306)Thu Nov 9 16:22:44 2017 - [info]Thu Nov 9 16:22:44 2017 - [info] * Phase 3.3: New Master Recovery Phase..Thu Nov 9 16:22:44 2017 - [info]Thu Nov 9 16:22:44 2017 - [info] Waiting all logs to be applied..Thu Nov 9 16:22:44 2017 - [info] done.Thu Nov 9 16:22:44 2017 - [info] Replicating from the latest slave host_3( host_3:3306) and waiting to apply..Thu Nov 9 16:22:44 2017 - [info] Waiting all logs to be applied on the latest slave..Thu Nov 9 16:22:44 2017 - [info] Resetting slave host_1( host_1:3306) and starting replication from the new master host_3( host_3:3306)..Thu Nov 9 16:22:44 2017 - [info] Executed 
CHANGE MASTER.Thu Nov 9 16:22:45 2017 - [info] Slave started.Thu Nov 9 16:22:45 2017 - [info] Waiting to execute all relay logs on host_1( host_1:3306)..Thu Nov 9 16:22:45 2017 - [info] master_pos_wait( host_3.000049:28663) completed on host_1( host_1:3306). Executed 0 events.Thu Nov 9 16:22:45 2017 - [info] done.Thu Nov 9 16:22:45 2017 - [info] done.Thu Nov 9 16:22:45 2017 - [info] Getting new master's binlog name and position..Thu Nov 9 16:22:45 2017 - [info] host_1.000056:1170Thu Nov 9 16:22:45 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST=' host_1', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';Thu Nov 9 16:22:45 2017 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: host_1.000056, 1170, 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-31,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446372Thu Nov 9 16:22:45 2017 - [info] Executing master IP activate script:Thu Nov 9 16:22:45 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --command=start --ssh_user=root --orig_master_host= host_2 --orig_master_ip= host_2 --orig_master_port=3306 --new_master_host= host_1 --new_master_ip= host_1 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba'Unknown option: new_master_userUnknown option: new_master_password=================== swift vip : tgw_vip to host_1 is added ==============================Thu Nov 9 16:22:47 2017 - [info] OK.Thu Nov 9 16:22:47 2017 - [info] Setting read_only=0 on host_1( host_1:3306)..Thu Nov 9 16:22:47 2017 - [info] ok.Thu Nov 9 16:22:47 2017 - [info] ** Finished master recovery successfully.Thu Nov 9 16:22:47 2017 - [info] * Phase 3: Master Recovery Phase completed.Thu Nov 9 16:22:47 2017 - [info]Thu Nov 9 16:22:47 2017 - [info] * Phase 4: Slaves Recovery Phase..Thu Nov 9 16:22:47 2017 - [info]Thu Nov 9 16:22:47 2017 - [info]Thu Nov 9 16:22:47 2017 - [info] * Phase 4.1: Starting Slaves in 
parallel..Thu Nov 9 16:22:47 2017 - [info]Thu Nov 9 16:22:47 2017 - [info] -- Slave recovery on host host_3( host_3:3306) started, pid: 112317. Check tmp log /var/log/masterha/mha_test/ host_3_3306_20171109162241.log if it takes time..Thu Nov 9 16:22:48 2017 - [info]Thu Nov 9 16:22:48 2017 - [info] Log messages from host_3 ...Thu Nov 9 16:22:48 2017 - [info]Thu Nov 9 16:22:47 2017 - [info] Resetting slave host_3( host_3:3306) and starting replication from the new master host_1( host_1:3306)..Thu Nov 9 16:22:47 2017 - [info] Executed CHANGE MASTER.Thu Nov 9 16:22:48 2017 - [info] Slave started.Thu Nov 9 16:22:48 2017 - [info] gtid_wait(0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-31,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446372) completed on host_3( host_3:3306). Executed 0 events.Thu Nov 9 16:22:48 2017 - [info] End of log messages from host_3.Thu Nov 9 16:22:48 2017 - [info] -- Slave on host host_3( host_3:3306) started.Thu Nov 9 16:22:48 2017 - [info] All new slave servers recovered successfully.Thu Nov 9 16:22:48 2017 - [info]Thu Nov 9 16:22:48 2017 - [info] * Phase 5: New master cleanup phase..Thu Nov 9 16:22:48 2017 - [info]Thu Nov 9 16:22:48 2017 - [info] Resetting slave info on the new master..Thu Nov 9 16:22:49 2017 - [info] host_1: Resetting slave info succeeded.Thu Nov 9 16:22:49 2017 - [info] Master failover to host_1( host_1:3306) completed successfully.Thu Nov 9 16:22:49 2017 - [info]----- Failover Report -----bak_mha_test: MySQL Master failover host_2( host_2:3306) to host_1( host_1:3306) succeededMaster host_2( host_2:3306) is down!Check MHA Manager logs at tjtx135-2-217.58os.org:/var/log/masterha/mha_test/mha_test.log for details.Started automated(non-interactive) failover.Invalidated master IP address on host_2( host_2:3306)Selected host_1( host_1:3306) as a new master. host_1( host_1:3306): OK: Applying all logs succeeded. host_1( host_1:3306): OK: Activated master IP address. 
host_3( host_3:3306): OK: Slave started, replicating from host_1( host_1:3306) host_1( host_1:3306): Resetting slave info succeeded.Master failover to host_1( host_1:3306) completed successfully.
Conclusion: the master is not configured as a binlog server, so any transactions the master had not yet shipped to a slave are lost.
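To make the loss concrete, the sketch below subtracts the new master's `Exec_Gtid_Set` reported in the log (`...:1-31,...:1-446372`) from the old master's `Executed_Gtid_Set` shown in the GTID state listing (`...:1-31,...:1-446378`). The helper names `parse_gtid_set`, `subtract_intervals`, and `gtid_subtract` are hypothetical; MySQL offers the built-in `GTID_SUBTRACT()` function for the same comparison on the server side.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:8,uuid2:3' into {uuid: [(start, end), ...]} (inclusive)."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *ranges = part.split(":")
        intervals = result.setdefault(uuid, [])
        for r in ranges:
            lo, _, hi = r.partition("-")
            intervals.append((int(lo), int(hi or lo)))
    return {u: sorted(iv) for u, iv in result.items()}


def subtract_intervals(a, b):
    """Return the parts of intervals `a` that are not covered by intervals `b`."""
    out = []
    for lo, hi in a:
        cur = lo
        for blo, bhi in sorted(b):
            if bhi < cur or blo > hi:
                continue
            if blo > cur:
                out.append((cur, blo - 1))
            cur = max(cur, bhi + 1)
            if cur > hi:
                break
        if cur <= hi:
            out.append((cur, hi))
    return out


def gtid_subtract(left, right):
    """GTIDs present in `left` but absent from `right`, grouped by server UUID."""
    a, b = parse_gtid_set(left), parse_gtid_set(right)
    missing = {}
    for uuid, intervals in a.items():
        rest = subtract_intervals(intervals, b.get(uuid, []))
        if rest:
            missing[uuid] = rest
    return missing


old_master = ("0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-31,"
              "ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446378")
new_master = ("0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-31,"
              "ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446372")
print(gtid_subtract(old_master, new_master))
# {'ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c': [(446373, 446378)]}
```

In this run the six transactions 446373 through 446378 existed only on the dead master, which is exactly the gap a binlog server on the master would have been needed to close.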
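Fortunately, the slave and the etl host can CHANGE MASTER to each other, so even though the slave (candidate master) was behind, the etl host's logs were ultimately used to fill in what the slave was missing.

The typical case for this scenario: the failover had already reached the final CHANGE MASTER stage, so the master/slave topology has already changed and MHA cannot simply be run through again.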
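That said, a failure at this step means the new master's logs have already been fully caught up. Because replication runs in GTID mode, you can manually CHANGE MASTER the slaves to the newest master yourself, then activate the new IP and set read_only=0 on the new master.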
Thu Nov 9 16:49:39 2017 - [info] MHA::MasterFailover version 0.56.
Thu Nov 9 16:49:39 2017 - [info] Starting master failover.
Thu Nov 9 16:49:39 2017 - [info]
Thu Nov 9 16:49:39 2017 - [info] * Phase 1: Configuration Check Phase..
Thu Nov 9 16:49:39 2017 - [info]
Thu Nov 9 16:49:39 2017 - [info] GTID failover mode = 1
Thu Nov 9 16:49:39 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln169] Detected dead master host_1( host_1:3306) does not match with specified dead master host_2( host_2:3306)!
Thu Nov 9 16:49:39 2017 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/bin/masterha_master_switch line 53
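Since MHA refuses to run again once the topology has changed, the remaining steps have to be done by hand. The sketch below only builds the ordered SQL statements. It executes nothing, and `manual_recovery_plan` is a hypothetical helper. The CHANGE MASTER statement copies the one MHA prints in its log. The sketch assumes the new master should end up writable (read_only=0, as in MHA's own "Setting read_only=0" step) and that activating the VIP is handled separately by the site-specific master_ip_failover script.

```python
def manual_recovery_plan(new_master, slaves, repl_user, repl_password, port=3306):
    """Return ordered (host, sql) steps for finishing a GTID failover by hand."""
    change_master = (
        f"CHANGE MASTER TO MASTER_HOST='{new_master}', MASTER_PORT={port}, "
        f"MASTER_AUTO_POSITION=1, MASTER_USER='{repl_user}', "
        f"MASTER_PASSWORD='{repl_password}';"
    )
    steps = []
    for slave in slaves:
        # Replication threads must be stopped before repointing the slave.
        steps.append((slave, "STOP SLAVE;"))
        steps.append((slave, change_master))
        steps.append((slave, "START SLAVE;"))
    # Open the new master for writes once the slaves follow it.
    steps.append((new_master, "SET GLOBAL read_only=0;"))
    return steps


for host, sql in manual_recovery_plan("host_1", ["host_3"], "repl", "xxx"):
    print(f"{host}> {sql}")
```

With MASTER_AUTO_POSITION=1 the slave negotiates its own starting point from its executed GTID set, which is why no binlog file name or position has to be worked out by hand.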
1. The final failover command:

   masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host=host_2 --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error

2. For the binlog server, configuring the master is the recommended setup:

   [binlog1]
   $master_ip

   If only a slave is configured as the binlog server, or no binlog server is configured at all, the transactions that had not yet been shipped from the master are lost.
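A quick pre-flight check for the binlog server recommendation can be sketched as follows. It assumes the MHA configuration parses as an INI file and that each `[binlogN]` section names its host with a `hostname` key. The sample configuration and the function names are illustrative and not taken from the original setup.

```python
import configparser

# Hypothetical config: the only binlog server is the candidate slave.
SAMPLE_CNF = """
[server default]
master_binlog_dir=/data/mysql.bin

[server1]
hostname=host_2

[server2]
hostname=host_1
candidate_master=1

[server3]
hostname=host_3
no_master=1

[binlog1]
hostname=host_1
"""


def binlog_server_hosts(cnf_text):
    """Collect the hostname of every [binlogN] section."""
    parser = configparser.ConfigParser()
    parser.read_string(cnf_text)
    return [
        parser[section]["hostname"]
        for section in parser.sections()
        if section.startswith("binlog") and "hostname" in parser[section]
    ]


def master_is_binlog_server(cnf_text, master_host):
    """True when the current master is listed among the binlog servers."""
    return master_host in binlog_server_hosts(cnf_text)


if not master_is_binlog_server(SAMPLE_CNF, "host_2"):
    print("warning: master host_2 is not a binlog server; "
          "unshipped transactions cannot be recovered on failover")
```

Even with the master listed, the check only helps when the master stays reachable over SSH during the failover.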
Same conclusion as 1.1.
### GTID state of the 3 DBs

* master host_2

dba:lc> show master status;
+---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
| host_2.000008 | 5445 | | | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446392 |
+---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

* slave host_1

Retrieved_Gtid_Set:
Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446385
Auto_Position: 1

* etl host_3

Retrieved_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:46-50,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:446386-446388
Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446388
Auto_Position: 1

### Simulating the failure

* Isolate the master's network so that it behaves as if the machine were down

master> iptables -A INPUT -p tcp -s other_host --dport 22 -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP

### Failover log

masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host=host_2 --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error
Fri Nov 10 11:12:38 2017 - [info] MHA::MasterFailover version 0.56.
Fri Nov 10 11:12:38 2017 - [info] Starting master failover.
Fri Nov 10 11:12:38 2017 - [info]
Fri Nov 10 11:12:38 2017 - [info] * Phase 1: Configuration Check Phase..
Fri Nov 10 11:12:38 2017 - [info]
Fri Nov 10 11:13:28 2017 - [warning] HealthCheck: Got timeout on checking SSH connection to host_2! 
at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.Fri Nov 10 11:13:28 2017 - [warning] Failed to SSH to binlog server host_2Fri Nov 10 11:13:29 2017 - [info] HealthCheck: SSH to host_1 is reachable.Fri Nov 10 11:13:29 2017 - [info] Binlog server host_1 is reachable.Fri Nov 10 11:13:29 2017 - [info] HealthCheck: SSH to host_3 is reachable.Fri Nov 10 11:13:29 2017 - [info] Binlog server host_3 is reachable.Fri Nov 10 11:13:29 2017 - [warning] SQL Thread is stopped(no error) on host_1( host_1:3306)Fri Nov 10 11:13:29 2017 - [warning] SQL Thread is stopped(no error) on host_3( host_3:3306)Fri Nov 10 11:13:29 2017 - [info] GTID failover mode = 1Fri Nov 10 11:13:29 2017 - [info] Dead Servers:Fri Nov 10 11:13:29 2017 - [info] host_2( host_2:3306)Fri Nov 10 11:13:29 2017 - [info] Checking master reachability via MySQL(double check)...Fri Nov 10 11:13:30 2017 - [info] ok.Fri Nov 10 11:13:30 2017 - [info] Alive Servers:Fri Nov 10 11:13:30 2017 - [info] host_1( host_1:3306)Fri Nov 10 11:13:30 2017 - [info] host_3( host_3:3306)Fri Nov 10 11:13:30 2017 - [info] Alive Slaves:Fri Nov 10 11:13:30 2017 - [info] host_1( host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabledFri Nov 10 11:13:30 2017 - [info] GTID ONFri Nov 10 11:13:30 2017 - [info] Replicating from host_2( host_2:3306)Fri Nov 10 11:13:30 2017 - [info] Primary candidate for the new Master (candidate_master is set)Fri Nov 10 11:13:30 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabledFri Nov 10 11:13:30 2017 - [info] GTID ONFri Nov 10 11:13:30 2017 - [info] Replicating from host_2( host_2:3306)Fri Nov 10 11:13:30 2017 - [info] Not candidate for the new Master (no_master is set)Fri Nov 10 11:13:30 2017 - [info] Starting SQL thread on host_1( host_1:3306) ..Fri Nov 10 11:13:30 2017 - [info] done.Fri Nov 10 11:13:30 2017 - [info] Starting SQL thread on host_3( host_3:3306) ..Fri Nov 10 11:13:30 2017 - [info] done.Fri Nov 10 
11:13:30 2017 - [info] Starting GTID based failover.Fri Nov 10 11:13:30 2017 - [info]Fri Nov 10 11:13:30 2017 - [info] ** Phase 1: Configuration Check Phase completed.Fri Nov 10 11:13:30 2017 - [info]Fri Nov 10 11:13:30 2017 - [info] * Phase 2: Dead Master Shutdown Phase..Fri Nov 10 11:13:30 2017 - [info]Fri Nov 10 11:14:20 2017 - [warning] HealthCheck: Got timeout on checking SSH connection to host_2! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.Fri Nov 10 11:14:20 2017 - [info] Forcing shutdown so that applications never connect to the current master..Fri Nov 10 11:14:20 2017 - [info] Executing master IP deactivation script:Fri Nov 10 11:14:20 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --orig_master_host= host_2 --orig_master_ip= host_2 --orig_master_port=3306 --command=stopssh: connect to host host_2 port 22: Connection timed out=================== swift vip : tgw_vip from host_2 is deleted ==============================--2017-11-10 11:14:27-- http://tgw_server/cgi-bin/fun_logic/bin/public_api/op_rs.cgi正在连接 tgw_server:80... 已连接。已发出 HTTP 请求,正在等待回应... 200 OK长度:未指定 [text/html]正在保存至: “STDOUT” 0K 11.4M=0s2017-11-10 11:16:27 (11.4 MB/s) - 已写入标准输出 [38]Fri Nov 10 11:16:27 2017 - [info] done.Fri Nov 10 11:16:27 2017 - [warning] shutdown_script is not set. 
Skipping explicit shutting down of the dead master.Fri Nov 10 11:16:27 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.Fri Nov 10 11:16:27 2017 - [info]Fri Nov 10 11:16:27 2017 - [info] * Phase 3: Master Recovery Phase..Fri Nov 10 11:16:27 2017 - [info]Fri Nov 10 11:16:27 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..Fri Nov 10 11:16:27 2017 - [info]Fri Nov 10 11:16:27 2017 - [info] The latest binary log file/position on all slaves is host_2.000008:4265Fri Nov 10 11:16:27 2017 - [info] Retrieved Gtid Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:46-50,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:446386-446388Fri Nov 10 11:16:27 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):Fri Nov 10 11:16:27 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabledFri Nov 10 11:16:27 2017 - [info] GTID ONFri Nov 10 11:16:27 2017 - [info] Replicating from host_2( host_2:3306)Fri Nov 10 11:16:27 2017 - [info] Not candidate for the new Master (no_master is set)Fri Nov 10 11:16:27 2017 - [info] The oldest binary log file/position on all slaves is host_2.000008:3380Fri Nov 10 11:16:27 2017 - [info] Oldest slaves:Fri Nov 10 11:16:27 2017 - [info] host_1( host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabledFri Nov 10 11:16:27 2017 - [info] GTID ONFri Nov 10 11:16:27 2017 - [info] Replicating from host_2( host_2:3306)Fri Nov 10 11:16:27 2017 - [info] Primary candidate for the new Master (candidate_master is set)Fri Nov 10 11:16:27 2017 - [info]Fri Nov 10 11:16:27 2017 - [info] * Phase 3.3: Determining New Master Phase..Fri Nov 10 11:16:27 2017 - [info]Fri Nov 10 11:16:27 2017 - [info] Searching new master from slaves..Fri Nov 10 11:16:27 2017 - [info] Candidate masters from the configuration file:Fri Nov 10 11:16:27 2017 - [info] host_1( host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabledFri Nov 10 11:16:27 2017 - [info] GTID 
ONFri Nov 10 11:16:27 2017 - [info] Replicating from host_2( host_2:3306)Fri Nov 10 11:16:27 2017 - [info] Primary candidate for the new Master (candidate_master is set)Fri Nov 10 11:16:27 2017 - [info] Non-candidate masters:Fri Nov 10 11:16:27 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabledFri Nov 10 11:16:27 2017 - [info] GTID ONFri Nov 10 11:16:27 2017 - [info] Replicating from host_2( host_2:3306)Fri Nov 10 11:16:27 2017 - [info] Not candidate for the new Master (no_master is set)Fri Nov 10 11:16:27 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events..Fri Nov 10 11:16:27 2017 - [info] Not found.Fri Nov 10 11:16:27 2017 - [info] Searching from all candidate_master slaves..Fri Nov 10 11:16:27 2017 - [info] New master is host_1( host_1:3306)Fri Nov 10 11:16:27 2017 - [info] Starting master failover..Fri Nov 10 11:16:27 2017 - [info]From: host_2( host_2:3306) (current master) +-- host_1( host_1:3306) +-- host_3( host_3:3306)To: host_1( host_1:3306) (new master) +-- host_3( host_3:3306)Fri Nov 10 11:16:27 2017 - [info]Fri Nov 10 11:16:27 2017 - [info] * Phase 3.3: New Master Recovery Phase..Fri Nov 10 11:16:27 2017 - [info]Fri Nov 10 11:16:27 2017 - [info] Waiting all logs to be applied..Fri Nov 10 11:16:27 2017 - [info] done.Fri Nov 10 11:16:27 2017 - [info] Replicating from the latest slave host_3( host_3:3306) and waiting to apply..Fri Nov 10 11:16:27 2017 - [info] Waiting all logs to be applied on the latest slave..Fri Nov 10 11:16:27 2017 - [info] Resetting slave host_1( host_1:3306) and starting replication from the new master host_3( host_3:3306)..Fri Nov 10 11:16:27 2017 - [info] Executed CHANGE MASTER.Fri Nov 10 11:16:28 2017 - [info] Slave started.Fri Nov 10 11:16:28 2017 - [info] Waiting to execute all relay logs on host_1( host_1:3306)..Fri Nov 10 11:16:28 2017 - [info] master_pos_wait( host_3.000049:40136) completed on host_1( host_1:3306). 
Executed 0 events.Fri Nov 10 11:16:28 2017 - [info] done.Fri Nov 10 11:16:28 2017 - [info] done.Fri Nov 10 11:16:28 2017 - [info] -- Saving binlog from host host_2 started, pid: 43038Fri Nov 10 11:16:28 2017 - [info] -- Saving binlog from host host_1 started, pid: 43039Fri Nov 10 11:16:28 2017 - [info] -- Saving binlog from host host_3 started, pid: 43041Fri Nov 10 11:16:28 2017 - [info]Fri Nov 10 11:16:28 2017 - [info] Log messages from host_2 ...Fri Nov 10 11:16:28 2017 - [info] End of log messages from host_2.Fri Nov 10 11:16:28 2017 - [warning] SSH is not reachable on host_2. SkippingFri Nov 10 11:16:28 2017 - [info]Fri Nov 10 11:16:28 2017 - [info] Log messages from host_1 ...Fri Nov 10 11:16:28 2017 - [info]Fri Nov 10 11:16:28 2017 - [info] Fetching binary logs from binlog server host_1..Fri Nov 10 11:16:28 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_2.000008 --start_pos=4265 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog2_20171110111238.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.binFailed to save binary log: Binlog not found from /data/mysql.bin! If you got this error at MHA Manager, please set "master_binlog_dir=/path/to/binlog_directory_of_the_master" correctly in the MHA Manager's configuration file and try again. at /usr/bin/save_binary_logs line 123 eval {...} called at /usr/bin/save_binary_logs line 70 main::main() called at /usr/bin/save_binary_logs line 66Fri Nov 10 11:16:28 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. 
Maybe disks on binary logs are not accessible or binary log itself is corrupt?Fri Nov 10 11:16:28 2017 - [info] End of log messages from host_1.Fri Nov 10 11:16:28 2017 - [warning] Got error from host_1.Fri Nov 10 11:16:28 2017 - [info]Fri Nov 10 11:16:28 2017 - [info] Log messages from host_3 ...Fri Nov 10 11:16:28 2017 - [info]Fri Nov 10 11:16:28 2017 - [info] Fetching binary logs from binlog server host_3..Fri Nov 10 11:16:28 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_2.000008 --start_pos=4265 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog3_20171110111238.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.binFailed to save binary log: Binlog not found from /data/mysql.bin! If you got this error at MHA Manager, please set "master_binlog_dir=/path/to/binlog_directory_of_the_master" correctly in the MHA Manager's configuration file and try again. at /usr/bin/save_binary_logs line 123 eval {...} called at /usr/bin/save_binary_logs line 70 main::main() called at /usr/bin/save_binary_logs line 66Fri Nov 10 11:16:28 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt?Fri Nov 10 11:16:28 2017 - [info] End of log messages from host_3.Fri Nov 10 11:16:28 2017 - [warning] Got error from host_3.Fri Nov 10 11:16:28 2017 - [info] Getting new master's binlog name and position..Fri Nov 10 11:16:28 2017 - [info] host_1.000058:4059Fri Nov 10 11:16:28 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST=' host_1', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';Fri Nov 10 11:16:28 2017 - [info] Master Recovery succeeded. 
File:Pos:Exec_Gtid_Set: host_1.000058, 4059, 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446388Fri Nov 10 11:16:28 2017 - [info] Executing master IP activate script:Fri Nov 10 11:16:28 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --command=start --ssh_user=root --orig_master_host= host_2 --orig_master_ip= host_2 --orig_master_port=3306 --new_master_host= host_1 --new_master_ip= host_1 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba'Unknown option: new_master_userUnknown option: new_master_password=================== swift vip : tgw_vip to host_1 is added ==============================Fri Nov 10 11:16:30 2017 - [info] OK.Fri Nov 10 11:16:30 2017 - [info] ** Finished master recovery successfully.Fri Nov 10 11:16:30 2017 - [info] * Phase 3: Master Recovery Phase completed.Fri Nov 10 11:16:30 2017 - [info]Fri Nov 10 11:16:30 2017 - [info] * Phase 4: Slaves Recovery Phase..Fri Nov 10 11:16:30 2017 - [info]Fri Nov 10 11:16:30 2017 - [info]Fri Nov 10 11:16:30 2017 - [info] * Phase 4.1: Starting Slaves in parallel..Fri Nov 10 11:16:30 2017 - [info]Fri Nov 10 11:16:30 2017 - [info] -- Slave recovery on host host_3( host_3:3306) started, pid: 46878. Check tmp log /var/log/masterha/mha_test/ host_3_3306_20171110111238.log if it takes time..Fri Nov 10 11:16:31 2017 - [info]Fri Nov 10 11:16:31 2017 - [info] Log messages from host_3 ...Fri Nov 10 11:16:31 2017 - [info]Fri Nov 10 11:16:30 2017 - [info] Resetting slave host_3( host_3:3306) and starting replication from the new master host_1( host_1:3306)..Fri Nov 10 11:16:30 2017 - [info] Executed CHANGE MASTER.Fri Nov 10 11:16:31 2017 - [info] Slave started.Fri Nov 10 11:16:31 2017 - [info] gtid_wait(0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446388) completed on host_3( host_3:3306). 
Executed 0 events.
Fri Nov 10 11:16:31 2017 - [info] End of log messages from host_3.
Fri Nov 10 11:16:31 2017 - [info] -- Slave on host host_3( host_3:3306) started.
Fri Nov 10 11:16:31 2017 - [info] All new slave servers recovered successfully.
Fri Nov 10 11:16:31 2017 - [info]
Fri Nov 10 11:16:31 2017 - [info] * Phase 5: New master cleanup phase..
Fri Nov 10 11:16:31 2017 - [info]
Fri Nov 10 11:16:31 2017 - [info] Resetting slave info on the new master..
Fri Nov 10 11:16:31 2017 - [info] host_1: Resetting slave info succeeded.
Fri Nov 10 11:16:31 2017 - [info] Master failover to host_1( host_1:3306) completed successfully.
Fri Nov 10 11:16:31 2017 - [info]

----- Failover Report -----

bak_mha_test: MySQL Master failover host_2( host_2:3306) to host_1( host_1:3306) succeeded

Master host_2( host_2:3306) is down!

Check MHA Manager logs at tjtx135-2-217.58os.org:/var/log/masterha/mha_test/mha_test.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on host_2( host_2:3306)
Selected host_1( host_1:3306) as a new master.
host_1( host_1:3306): OK: Applying all logs succeeded.
host_1( host_1:3306): OK: Activated master IP address.
host_3( host_3:3306): OK: Slave started, replicating from host_1( host_1:3306)
host_1( host_1:3306): Resetting slave info succeeded.
Master failover to host_1( host_1:3306) completed successfully.
Fri Nov 10 11:16:31 2017 - [info] Sending mail..

### The last step is important

If the dead master later comes back to life, this step must be run on it:

dead_master> /usr/local/realserver/RS_TUNL0/etc/setup_rs.sh -c

See the tgw chapter of http://gitlab.corp.anjuke.com/_dba/architecture/blob/master/personal/Keithlan/other/share/tools/always_used_command.md for a detailed description.
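Conclusion: the master is dead and its final log events were never delivered to any other server, so the transactions the master had not yet sent are lost.
Fortunately, the slave and the etl can CHANGE MASTER to each other, so even though the slave (candidate master) is behind, the etl's logs are eventually used to fill in what the slave is missing.

Test omitted; it is essentially the same as 2.2.1.
Conclusion: all of the master's logs were delivered to the etl, so no data from the master is lost.
Test omitted; it is essentially the same as 2.2.1.
Conclusion: the master is dead and its final log events were never delivered to any other server, so the transactions the master had not yet sent are lost.
Fortunately, the slave and the etl can CHANGE MASTER to each other, so even though the slave (candidate master) is behind, the etl's logs are eventually used to fill in what the slave is missing.

Test omitted; it is essentially the same as 2.2.1.
Conclusion: all of the master's logs were delivered to the slave, so no data from the master is lost.
Same conclusion as 1.4.
Same conclusion as 1.4.
The last part of the dead master's logs was delivered to neither the slave nor the etl, and the slave is also behind the etl (this is the harshest case).
This is exactly the case tested in 2.2.1; see 2.2.1 for the detailed failover log.
Conclusion: three binlog servers are configured, but because the master server is down, logs cannot be fetched from the binlog server on the master, so the transactions the master had not yet delivered are lost.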
### GTID state of the 3 DBs

* master host_1

dba:lc> show master status;

| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
|---|---|---|---|---|
| host_1.000058 | 8517 | | | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-60,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446392 |

1 row in set (0.00 sec)

* slave host_2

Retrieved_Gtid_Set:
Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446392
Auto_Position: 1

* etl host_3

Retrieved_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:51-55,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:446389-446392
Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-55,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446392
Auto_Position: 1

### Simulating the failure

master> iptables -A INPUT -p tcp -s other_host --dport 22 -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP

### Failover

masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host=host_1 --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error

Fri Nov 10 14:15:51 2017 - [info] MHA::MasterFailover version 0.56.
Fri Nov 10 14:15:51 2017 - [info] Starting master failover.
Fri Nov 10 14:15:51 2017 - [info]
Fri Nov 10 14:15:51 2017 - [info] * Phase 1: Configuration Check Phase..
Fri Nov 10 14:15:51 2017 - [info]
Fri Nov 10 14:16:41 2017 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Fri Nov 10 14:16:41 2017 - [warning] Failed to SSH to binlog server host_1
Fri Nov 10 14:16:41 2017 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln239] Binlog Server is defined but there is no alive server.
Fri Nov 10 14:16:41 2017 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/share/perl5/vendor_perl/MHA/MasterFailover.pm line 2082
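Conclusion: the binlog server list must contain at least one server that is still alive. If only the master is configured as a binlog server and the master goes down, no binlog server is left and MHA will not fail over.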
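The rule above can be checked before an outage ever happens. The following is a minimal sketch, not part of MHA: it assumes the binlog servers are declared in `[binlogN]` sections with a `hostname` key (the section names are inferred from file names such as `saved_binlog_binlog2_...` in the logs), and the sample configuration and the helper names `binlog_hosts` and `surviving_binlog_hosts` are hypothetical.

```python
import configparser

# Hypothetical MHA-style configuration in which only the master is a binlog server.
SAMPLE_CONF = """
[server1]
hostname = host_1
candidate_master = 1

[server2]
hostname = host_2
candidate_master = 1

[server3]
hostname = host_3
no_master = 1

[binlog1]
hostname = host_1
"""


def binlog_hosts(conf_text):
    """Return the hostnames of all [binlogN] sections in the config text."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return [
        parser[name]["hostname"]
        for name in parser.sections()
        if name.startswith("binlog")
    ]


def surviving_binlog_hosts(conf_text, dead_master):
    """Binlog servers that remain if the whole dead_master host is unreachable."""
    return [host for host in binlog_hosts(conf_text) if host != dead_master]


if __name__ == "__main__":
    # Only the master is listed: nothing survives, matching the aborted failover above.
    print(surviving_binlog_hosts(SAMPLE_CONF, "host_1"))  # []

    # Listing the slaves as well leaves two live binlog servers.
    extended = SAMPLE_CONF + "[binlog2]\nhostname = host_2\n[binlog3]\nhostname = host_3\n"
    print(surviving_binlog_hosts(extended, "host_1"))  # ['host_2', 'host_3']
```

An empty result for the current master means a host-level outage would leave MHA with no live binlog server, which is the situation that produced "Binlog Server is defined but there is no alive server".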
### GTID state of the 3 DBs

* master host_1

dba:lc> show master status;

| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
|---|---|---|---|---|
| host_1.000058 | 8517 | | | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-60,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446392 |

1 row in set (0.00 sec)

* slave host_2

Retrieved_Gtid_Set:
Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446392
Auto_Position: 1

* etl host_3

Retrieved_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:51-55,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:446389-446392
Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-55,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446392
Auto_Position: 1

### Simulating the failure

master> iptables -A INPUT -p tcp -s other_host --dport 22 -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP

### Failover

Fri Nov 10 14:29:50 2017 - [info] MHA::MasterFailover version 0.56.
Fri Nov 10 14:29:50 2017 - [info] Starting master failover.
Fri Nov 10 14:29:50 2017 - [info]
Fri Nov 10 14:29:50 2017 - [info] * Phase 1: Configuration Check Phase..
Fri Nov 10 14:29:50 2017 - [info]
Fri Nov 10 14:29:50 2017 - [info] HealthCheck: SSH to host_2 is reachable.
Fri Nov 10 14:29:50 2017 - [info] Binlog server host_2 is reachable.
Fri Nov 10 14:29:50 2017 - [info] HealthCheck: SSH to host_3 is reachable.
Fri Nov 10 14:29:50 2017 - [info] Binlog server host_3 is reachable.
Fri Nov 10 14:29:50 2017 - [warning] SQL Thread is stopped(no error) on host_2( host_2:3306)
Fri Nov 10 14:29:50 2017 - [warning] SQL Thread is stopped(no
error) on host_3( host_3:3306)Fri Nov 10 14:29:50 2017 - [info] GTID failover mode = 1Fri Nov 10 14:29:50 2017 - [info] Dead Servers:Fri Nov 10 14:29:50 2017 - [info] host_1( host_1:3306)Fri Nov 10 14:29:50 2017 - [info] Checking master reachability via MySQL(double check)...Fri Nov 10 14:29:51 2017 - [info] ok.Fri Nov 10 14:29:51 2017 - [info] Alive Servers:Fri Nov 10 14:29:51 2017 - [info] host_2( host_2:3306)Fri Nov 10 14:29:51 2017 - [info] host_3( host_3:3306)Fri Nov 10 14:29:51 2017 - [info] Alive Slaves:Fri Nov 10 14:29:51 2017 - [info] host_2( host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabledFri Nov 10 14:29:51 2017 - [info] GTID ONFri Nov 10 14:29:51 2017 - [info] Replicating from host_1( host_1:3306)Fri Nov 10 14:29:51 2017 - [info] Primary candidate for the new Master (candidate_master is set)Fri Nov 10 14:29:51 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabledFri Nov 10 14:29:51 2017 - [info] GTID ONFri Nov 10 14:29:51 2017 - [info] Replicating from host_1( host_1:3306)Fri Nov 10 14:29:51 2017 - [info] Not candidate for the new Master (no_master is set)Fri Nov 10 14:29:51 2017 - [info] Starting SQL thread on host_2( host_2:3306) ..Fri Nov 10 14:29:51 2017 - [info] done.Fri Nov 10 14:29:51 2017 - [info] Starting SQL thread on host_3( host_3:3306) ..Fri Nov 10 14:29:52 2017 - [info] done.Fri Nov 10 14:29:52 2017 - [info] Starting GTID based failover.Fri Nov 10 14:29:52 2017 - [info]Fri Nov 10 14:29:52 2017 - [info] ** Phase 1: Configuration Check Phase completed.Fri Nov 10 14:29:52 2017 - [info]Fri Nov 10 14:29:52 2017 - [info] * Phase 2: Dead Master Shutdown Phase..Fri Nov 10 14:29:52 2017 - [info]Fri Nov 10 14:30:42 2017 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! 
at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.Fri Nov 10 14:30:42 2017 - [info] Forcing shutdown so that applications never connect to the current master..Fri Nov 10 14:30:42 2017 - [info] Executing master IP deactivation script:Fri Nov 10 14:30:42 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --orig_master_host= host_1 --orig_master_ip= host_1 --orig_master_port=3306 --command=stopssh: connect to host host_1 port 22: Connection timed out=================== swift vip : tgw_vip from host_1 is deleted ==============================--2017-11-10 14:30:49-- http://tgw_server/cgi-bin/fun_logic/bin/public_api/op_rs.cgi正在连接 tgw_server:80... 已连接。已发出 HTTP 请求,正在等待回应... 200 OK长度:未指定 [text/html]正在保存至: “STDOUT” 0K 12.1M=0s2017-11-10 14:32:47 (12.1 MB/s) - 已写入标准输出 [38]Fri Nov 10 14:32:47 2017 - [info] done.Fri Nov 10 14:32:47 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.Fri Nov 10 14:32:47 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.Fri Nov 10 14:32:47 2017 - [info]Fri Nov 10 14:32:47 2017 - [info] * Phase 3: Master Recovery Phase..Fri Nov 10 14:32:47 2017 - [info]Fri Nov 10 14:32:47 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..Fri Nov 10 14:32:47 2017 - [info]Fri Nov 10 14:32:47 2017 - [info] The latest binary log file/position on all slaves is host_1.000058:6912Fri Nov 10 14:32:47 2017 - [info] Retrieved Gtid Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:51-55,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:446389-446392Fri Nov 10 14:32:47 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):Fri Nov 10 14:32:47 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabledFri Nov 10 14:32:47 2017 - [info] GTID ONFri Nov 10 14:32:47 2017 - [info] Replicating from host_1( host_1:3306)Fri Nov 10 14:32:47 2017 - [info] Not candidate for the new Master (no_master is set)Fri Nov 10 14:32:47 
2017 - [info] The oldest binary log file/position on all slaves is host_1.000058:5307Fri Nov 10 14:32:47 2017 - [info] Oldest slaves:Fri Nov 10 14:32:47 2017 - [info] host_2( host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabledFri Nov 10 14:32:47 2017 - [info] GTID ONFri Nov 10 14:32:47 2017 - [info] Replicating from host_1( host_1:3306)Fri Nov 10 14:32:47 2017 - [info] Primary candidate for the new Master (candidate_master is set)Fri Nov 10 14:32:47 2017 - [info]Fri Nov 10 14:32:47 2017 - [info] * Phase 3.3: Determining New Master Phase..Fri Nov 10 14:32:47 2017 - [info]Fri Nov 10 14:32:47 2017 - [info] Searching new master from slaves..Fri Nov 10 14:32:47 2017 - [info] Candidate masters from the configuration file:Fri Nov 10 14:32:47 2017 - [info] host_2( host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabledFri Nov 10 14:32:47 2017 - [info] GTID ONFri Nov 10 14:32:47 2017 - [info] Replicating from host_1( host_1:3306)Fri Nov 10 14:32:47 2017 - [info] Primary candidate for the new Master (candidate_master is set)Fri Nov 10 14:32:47 2017 - [info] Non-candidate masters:Fri Nov 10 14:32:47 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabledFri Nov 10 14:32:47 2017 - [info] GTID ONFri Nov 10 14:32:47 2017 - [info] Replicating from host_1( host_1:3306)Fri Nov 10 14:32:47 2017 - [info] Not candidate for the new Master (no_master is set)Fri Nov 10 14:32:47 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events..Fri Nov 10 14:32:47 2017 - [info] Not found.Fri Nov 10 14:32:47 2017 - [info] Searching from all candidate_master slaves..Fri Nov 10 14:32:47 2017 - [info] New master is host_2( host_2:3306)Fri Nov 10 14:32:47 2017 - [info] Starting master failover..Fri Nov 10 14:32:47 2017 - [info]From: host_1( host_1:3306) (current master) +-- host_2( host_2:3306) +-- host_3( host_3:3306)To: host_2( host_2:3306) 
(new master) +-- host_3( host_3:3306)Fri Nov 10 14:32:47 2017 - [info]Fri Nov 10 14:32:47 2017 - [info] * Phase 3.3: New Master Recovery Phase..Fri Nov 10 14:32:47 2017 - [info]Fri Nov 10 14:32:47 2017 - [info] Waiting all logs to be applied..Fri Nov 10 14:32:47 2017 - [info] done.Fri Nov 10 14:32:47 2017 - [info] Replicating from the latest slave host_3( host_3:3306) and waiting to apply..Fri Nov 10 14:32:47 2017 - [info] Waiting all logs to be applied on the latest slave..Fri Nov 10 14:32:47 2017 - [info] Resetting slave host_2( host_2:3306) and starting replication from the new master host_3( host_3:3306)..Fri Nov 10 14:32:47 2017 - [info] Executed CHANGE MASTER.Fri Nov 10 14:32:48 2017 - [info] Slave started.Fri Nov 10 14:32:48 2017 - [info] Waiting to execute all relay logs on host_2( host_2:3306)..Fri Nov 10 14:32:48 2017 - [info] master_pos_wait( host_3.000049:42954) completed on host_2( host_2:3306). Executed 0 events.Fri Nov 10 14:32:48 2017 - [info] done.Fri Nov 10 14:32:48 2017 - [info] done.Fri Nov 10 14:32:48 2017 - [info] -- Saving binlog from host host_2 started, pid: 76664Fri Nov 10 14:32:48 2017 - [info] -- Saving binlog from host host_3 started, pid: 76665Fri Nov 10 14:32:48 2017 - [info]Fri Nov 10 14:32:48 2017 - [info] Log messages from host_2 ...Fri Nov 10 14:32:48 2017 - [info]Fri Nov 10 14:32:48 2017 - [info] Fetching binary logs from binlog server host_2..Fri Nov 10 14:32:48 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_1.000058 --start_pos=6912 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog1_20171110142950.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.binFailed to save binary log: Binlog not found from /data/mysql.bin! 
If you got this error at MHA Manager, please set "master_binlog_dir=/path/to/binlog_directory_of_the_master" correctly in the MHA Manager's configuration file and try again. at /usr/bin/save_binary_logs line 123 eval {...} called at /usr/bin/save_binary_logs line 70 main::main() called at /usr/bin/save_binary_logs line 66Fri Nov 10 14:32:48 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt?Fri Nov 10 14:32:48 2017 - [info] End of log messages from host_2.Fri Nov 10 14:32:48 2017 - [warning] Got error from host_2.Fri Nov 10 14:32:48 2017 - [info]Fri Nov 10 14:32:48 2017 - [info] Log messages from host_3 ...Fri Nov 10 14:32:48 2017 - [info]Fri Nov 10 14:32:48 2017 - [info] Fetching binary logs from binlog server host_3..Fri Nov 10 14:32:48 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_1.000058 --start_pos=6912 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog3_20171110142950.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.binFailed to save binary log: Binlog not found from /data/mysql.bin! If you got this error at MHA Manager, please set "master_binlog_dir=/path/to/binlog_directory_of_the_master" correctly in the MHA Manager's configuration file and try again. at /usr/bin/save_binary_logs line 123 eval {...} called at /usr/bin/save_binary_logs line 70 main::main() called at /usr/bin/save_binary_logs line 66Fri Nov 10 14:32:48 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. 
Maybe disks on binary logs are not accessible or binary log itself is corrupt?Fri Nov 10 14:32:48 2017 - [info] End of log messages from host_3.Fri Nov 10 14:32:48 2017 - [warning] Got error from host_3.Fri Nov 10 14:32:48 2017 - [info] Getting new master's binlog name and position..Fri Nov 10 14:32:48 2017 - [info] host_2.000008:6895Fri Nov 10 14:32:48 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST=' host_2', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';Fri Nov 10 14:32:48 2017 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: host_2.000008, 6895, 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-55,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446392Fri Nov 10 14:32:48 2017 - [info] Executing master IP activate script:Fri Nov 10 14:32:48 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --command=start --ssh_user=root --orig_master_host= host_1 --orig_master_ip= host_1 --orig_master_port=3306 --new_master_host= host_2 --new_master_ip= host_2 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba'Unknown option: new_master_userUnknown option: new_master_password=================== swift vip : tgw_vip to host_2 is added ==============================Fri Nov 10 14:32:51 2017 - [info] OK.Fri Nov 10 14:32:51 2017 - [info] ** Finished master recovery successfully.Fri Nov 10 14:32:51 2017 - [info] * Phase 3: Master Recovery Phase completed.Fri Nov 10 14:32:51 2017 - [info]Fri Nov 10 14:32:51 2017 - [info] * Phase 4: Slaves Recovery Phase..Fri Nov 10 14:32:51 2017 - [info]Fri Nov 10 14:32:51 2017 - [info]Fri Nov 10 14:32:51 2017 - [info] * Phase 4.1: Starting Slaves in parallel..Fri Nov 10 14:32:51 2017 - [info]Fri Nov 10 14:32:51 2017 - [info] -- Slave recovery on host host_3( host_3:3306) started, pid: 80398. 
Check tmp log /var/log/masterha/mha_test/ host_3_3306_20171110142950.log if it takes time..Fri Nov 10 14:32:52 2017 - [info]Fri Nov 10 14:32:52 2017 - [info] Log messages from host_3 ...Fri Nov 10 14:32:52 2017 - [info]Fri Nov 10 14:32:51 2017 - [info] Resetting slave host_3( host_3:3306) and starting replication from the new master host_2( host_2:3306)..Fri Nov 10 14:32:51 2017 - [info] Executed CHANGE MASTER.Fri Nov 10 14:32:52 2017 - [info] Slave started.Fri Nov 10 14:32:52 2017 - [info] gtid_wait(0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-55,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446392) completed on host_3( host_3:3306). Executed 0 events.Fri Nov 10 14:32:52 2017 - [info] End of log messages from host_3.Fri Nov 10 14:32:52 2017 - [info] -- Slave on host host_3( host_3:3306) started.Fri Nov 10 14:32:52 2017 - [info] All new slave servers recovered successfully.Fri Nov 10 14:32:52 2017 - [info]Fri Nov 10 14:32:52 2017 - [info] * Phase 5: New master cleanup phase..Fri Nov 10 14:32:52 2017 - [info]Fri Nov 10 14:32:52 2017 - [info] Resetting slave info on the new master..Fri Nov 10 14:32:52 2017 - [info] host_2: Resetting slave info succeeded.Fri Nov 10 14:32:52 2017 - [info] Master failover to host_2( host_2:3306) completed successfully.Fri Nov 10 14:32:52 2017 - [info]----- Failover Report -----bak_mha_test: MySQL Master failover host_1( host_1:3306) to host_2( host_2:3306) succeededMaster host_1( host_1:3306) is down!Check MHA Manager logs at tjtx135-2-217.58os.org:/var/log/masterha/mha_test/mha_test.log for details.Started automated(non-interactive) failover.Invalidated master IP address on host_1( host_1:3306)Selected host_2( host_2:3306) as a new master. host_2( host_2:3306): OK: Applying all logs succeeded. host_2( host_2:3306): OK: Activated master IP address. 
host_3( host_3:3306): OK: Slave started, replicating from host_2( host_2:3306) host_2( host_2:3306): Resetting slave info succeeded.Master failover to host_2( host_2:3306) completed successfully.Fri Nov 10 14:32:52 2017 - [info] Sending mail..
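Conclusion: configuring several slaves as binlog servers is the correct approach. Because the master is down, the binlog events it never delivered are lost, and nothing can be done about that. Fortunately, the remaining slaves automatically fill each other in with the logs that do exist.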
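The amount of data recovered and lost in this test can be read directly off the GTID sets recorded before the failover. The sketch below is only an illustration of that arithmetic: the helper names `parse_gtid_set`, `subtract` and `format_gtid_set` are made up for this example, and the three input strings are the Executed_Gtid_Set values shown above.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:3' into {uuid: [(lo, hi), ...]} (inclusive ranges)."""
    result = {}
    for part in text.split(","):
        uuid, *intervals = part.strip().split(":")
        for interval in intervals:
            lo, _, hi = interval.partition("-")
            result.setdefault(uuid, []).append((int(lo), int(hi or lo)))
    return result


def subtract(a, b):
    """Return the GTIDs that are in a but not in b."""
    missing = {}
    for uuid, intervals in a.items():
        remaining = list(intervals)
        for b_lo, b_hi in b.get(uuid, []):
            next_remaining = []
            for lo, hi in remaining:
                if b_hi < lo or b_lo > hi:  # no overlap, keep the range as is
                    next_remaining.append((lo, hi))
                    continue
                if lo < b_lo:  # part to the left of b survives
                    next_remaining.append((lo, b_lo - 1))
                if hi > b_hi:  # part to the right of b survives
                    next_remaining.append((b_hi + 1, hi))
            remaining = next_remaining
        if remaining:
            missing[uuid] = sorted(remaining)
    return missing


def format_gtid_set(gtids):
    """Render {uuid: [(lo, hi), ...]} back into GTID set notation."""
    return ",".join(
        uuid + ":" + ":".join(str(lo) if lo == hi else f"{lo}-{hi}" for lo, hi in ivs)
        for uuid, ivs in sorted(gtids.items())
    )


MASTER = "0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-60,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446392"
SLAVE = "0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446392"
ETL = "0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-55,ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446392"

if __name__ == "__main__":
    master, slave, etl = (parse_gtid_set(s) for s in (MASTER, SLAVE, ETL))
    # Transactions the new master (slave) can still get from the etl:
    print(format_gtid_set(subtract(etl, slave)))
    # -> 0923e916-3c36-11e6-82a5-ecf4bbf1f518:51-55
    # Transactions that existed only on the dead master and are lost:
    print(format_gtid_set(subtract(master, etl)))
    # -> 0923e916-3c36-11e6-82a5-ecf4bbf1f518:56-60
```

This agrees with the log: the new master ends up at `0923e916-...:1-55`, so transactions 51-55 were filled in from host_3 while 56-60 never left host_1. On a live server the same subtraction is available through MySQL's built-in `GTID_SUBTRACT()` function.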
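The failover succeeds. Because the master is down, the binlog events it never delivered are lost; fortunately, the remaining slaves automatically fill in the logs that do exist.
Same conclusion as 1.6.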
MHA + GTID mode: the key configuration and usage are as follows.

1. command

masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host=host_1 --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error

2. binlog server

In the configuration file, list the master, the slave and the etl all as binlog servers. Taking both a MySQL-process failure and a whole-DB-server failure into account, this is the recommended configuration.

3. tgw cleanup

If the dead master can still be brought back up, the following must be run on it:

/usr/local/realserver/RS_TUNL0/etc/setup_rs.sh -c

For the reason, see the TGW chapter of http://gitlab.corp.anjuke.com/_dba/architecture/blob/master/personal/Keithlan/other/share/tools/always_used_command.md
* Phase 1: Configuration Check Phase..
    * HealthCheck: is SSH to each of the N DBs reachable?
    * Binlog server: is each of the N DBs reachable?
    * GTID failover mode = ?
    * Dead Servers is ?
    * Primary candidate for the new Master (candidate_master is set) ?
* Phase 2: Dead Master Shutdown Phase..
    * Executing master IP deactivation script: TGW-vip delete operation
    * shutdown_script: ?
* Phase 3: Master Recovery Phase..
    * Phase 3.1: Getting Latest Slaves Phase..
        * Latest slaves, file position ?
        * Oldest slaves, file position ?
    * Phase 3.3: Determining New Master Phase..
        * Choose which slave becomes the new master
    * Phase 3.3: New Master Recovery Phase..
        * Replicating from the latest slave and waiting to apply.. -- the new master runs CHANGE MASTER to the latest slave
        * Waiting all logs to be applied on the latest slave.. -- bring the new master's logs in line with the latest slave
        * Saving binlog from binlog server... -- using the configured binlog servers, generate the diff log between the latest slave and the dead master
        * Applying differential binlog -- apply these differential logs to the new master so that it executes everything it was missing
        * Getting new master's binlog name and position.. -- get the new master's current binlog file and position
        * All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST, MASTER_PORT=3306, MASTER_AUTO_POSITION=1
        * Executing master IP activate script: -- TGW-vip activate operation, and set read_only=0
* Phase 4: Slaves Recovery Phase.. (run in parallel)
    * Resetting slave and starting replication from the new master
* Phase 5: New master cleanup phase..
    * Resetting slave info on the new master.
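The "Determining New Master" step can be sketched as a tiered search. This is a simplified model reconstructed from the log messages ("Searching from candidate_master slaves which have received the latest relay log events.. Not found. Searching from all candidate_master slaves.."), not MHA's actual code. The `Slave` class and `pick_new_master` function are hypothetical, and the sketch ignores other checks MHA may apply.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    host: str
    candidate_master: bool = False  # candidate_master is set in the config
    no_master: bool = False         # no_master is set in the config
    is_latest: bool = False         # has received the latest relay log events


def pick_new_master(slaves):
    """Return the first slave from the best non-empty tier, or None."""
    eligible = [s for s in slaves if not s.no_master]
    tiers = [
        [s for s in eligible if s.candidate_master and s.is_latest],
        [s for s in eligible if s.candidate_master],
        [s for s in eligible if s.is_latest],
        eligible,
    ]
    for tier in tiers:
        if tier:
            return tier[0]
    return None


if __name__ == "__main__":
    # The situation from the failover above: host_3 has the latest logs
    # but is marked no_master, so the lagging candidate host_2 is chosen.
    slaves = [
        Slave("host_2", candidate_master=True),
        Slave("host_3", no_master=True, is_latest=True),
    ]
    print(pick_new_master(slaves).host)  # host_2
```

Because the chosen master may be behind, the recovery phase that follows first replicates it from the latest slave, which is why host_2 briefly runs CHANGE MASTER to host_3 in the log.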
Reposted from: http://fqvpa.baihongyu.com/