mysql master HA solution: one master, one candidate master, and multiple slaves configuration

环境：

三台percona server：210, 211, 227。

210为M——master，211为M2——candidate master ，227为S——slave。

整体架构为：

M(读写) — M2(只读)
|
S(只读)

实际配置上，M2与S是两个平行的slave（从机）。

唯一区别：在master HA的配置中，配置M2为candidate master。当master宕机时，优先将M2升级为新master。

一、配master

可以参考主从配置的blog：http://bananalighter.com/mysql-master-slaver-replication/

dump出数据供slave建库使用。

mysqldump -uroot -p test > test.dmp

1	mysqldump -uroot -p test > test.dmp

二、配candidate master和slave(注意只读)

1.candidate master（211机器，已经在主从模式中配置过my.cnf，不赘述）

mysql>change master to master_host='172.19.17.210', master_user='rep',master_password='rep',master_log_file='mysql-bin.000001',master_log_pos=0;
Query OK, 0 rows affected, 2 warnings (0.01 sec)
mysql>set global read_only=1;
Query OK, 0 rows affected (0.00 sec)
mysql> show variables like '%read_only%';
+------------------+-------+
| Variable_name | Value |
+------------------+-------+
| innodb_read_only | OFF |
| read_only | ON |
| tx_read_only | OFF |
+------------------+-------+
3 rows in set (0.00 sec)

mysql>change master to master_host='172.19.17.210', master_user='rep',master_password='rep',master_log_file='mysql-bin.000001',master_log_pos=0;

Query OK, 0 rows affected, 2 warnings (0.01 sec)

mysql>set global read_only=1;

Query OK, 0 rows affected (0.00 sec)

mysql> show variables like '%read_only%';

+------------------+-------+

| Variable_name | Value |

+------------------+-------+

| innodb_read_only | OFF |

| read_only | ON |

| tx_read_only | OFF |

+------------------+-------+

3 rows in set (0.00 sec)

2.slave

（1）创建数据库

mysql> create database test;
Query OK, 1 row affected (0.00 sec)

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| test               |
+--------------------+
4 rows in set (0.00 sec)

mysql> create database test;

Query OK, 1 row affected (0.00 sec)

mysql> show databases;

+--------------------+

| Database |

+--------------------+

| information_schema |

| mysql |

| performance_schema |

| test |

+--------------------+

4 rows in set (0.00 sec)

（2）导入前述步骤导出的数据库数据。

use test;
source /app/test.dmp

1 2	use test; source /app/test.dmp

（3）配置my.cnf

[root@testdb3 ~]# vim /etc/my.cnf 
#添加以下内容
explicit_defaults_for_timestamp=true
log_bin = mysql-bin
server_id = 2
relay_log = /var/lib/mysql/mysql-relay-bin
log_slave_updates = 1
read_only = 1
innodb_buffer_pool_size = 128M
join_buffer_size = 128M
sort_buffer_size = 2M
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES

[root@testdb3 ~]# vim /etc/my.cnf

#添加以下内容

explicit_defaults_for_timestamp=true

log_bin = mysql-bin

server_id = 2

relay_log = /var/lib/mysql/mysql-relay-bin

log_slave_updates = 1

read_only = 1

innodb_buffer_pool_size = 128M

join_buffer_size = 128M

sort_buffer_size = 2M

sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES

重启数据库

[root@testdb3 ~]# service mysql restart
Shutting down MySQL (Percona Server)..[  OK  ]
Starting MySQL (Percona Server).....[  OK  ]

[root@testdb3 ~]# service mysql restart

Shutting down MySQL (Percona Server)..[ OK ]

Starting MySQL (Percona Server).....[ OK ]

（4）新建同步用户，配权限及master

grant replication slave,replication client on *.* to rep@'172.19.17.%' identified by 'rep';

grant all on test.* to rep@'172.19.17.%';

commit;

change master to master_host='172.19.17.210', master_user='rep',master_password='rep',master_log_file='mysql-bin.000024',master_log_pos=0;

start slave;

grant replication slave,replication client on *.* to rep@'172.19.17.%' identified by 'rep';

grant all on test.* to rep@'172.19.17.%';

commit;

change master to master_host='172.19.17.210', master_user='rep',master_password='rep',master_log_file='mysql-bin.000024',master_log_pos=0;

start slave;

（5）在这里有个问题，困扰我一下午。

show slave status\G报错：

Last_Error: Error ‘Duplicate entry ‘7’ for key ‘PRIMARY” on query. Default database: ‘test’. Query: ‘insert into test (id,value) values (7,77)’

我理解，这是由于slave同步到了不够新的relay log，有些重做的sql数据其实已经dump出来了，就不能再插一次。思路是清一清master的log。

不过下文的哥们通过让slave跳过这些sql绕开了这些问题。

4、当检查slave数据库服务器状体出现如下错误时，
Last_Error: Error 'Duplicate entry '1' for key 'PRIMARY'' on query. Default database: 'dnslog'. Query: 'insert into logop (logtime,tbname, sqlstr) VALUES('2011-10-25 07:00:00','logcount04','C/x??}?
根据提示可以知道是由于重复插入数据导致错误。先stop slave然后使用SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1;再执行start slave后最后再通过show slave status\G;查看slave从库状态。
5、如果还是不能解决问题则需要修复表，先stop slave然后repair table logop;再执行start slave后最后通过show slave status\G;查看slave从库状态。

4、当检查slave数据库服务器状体出现如下错误时，

Last_Error: Error 'Duplicate entry '1' for key 'PRIMARY'' on query. Default database: 'dnslog'. Query: 'insert into logop (logtime,tbname, sqlstr) VALUES('2011-10-25 07:00:00','logcount04','C/x??}?

根据提示可以知道是由于重复插入数据导致错误。先stop slave然后使用SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1;再执行start slave后最后再通过show slave status\G;查看slave从库状态。

5、如果还是不能解决问题则需要修复表，先stop slave然后repair table logop;再执行start slave后最后通过show slave status\G;查看slave从库状态。

参考：http://blog.sina.com.cn/s/blog_502c8cc401010p4b.html

注：只读的概念参见文章：http://bananalighter.com/mysql-variables-read-only/

（6）三个服务器的同步测试

在master插入一条数据，两个从机同步更新，不赘述。

三、配置master HA

（1）编辑master HA 配置文件/etc/masterha/test_MMS.cnf

[root@testdb1 masterha]# vim /etc/masterha/test_MMS.cnf 

[server default]
# mysql user and password
user=root
password=root
# working directory on the manager
manager_workdir=/masterha/test_MMS
# manager log file
manager_log=/masterha/test_MMS/test_MMS.log
# working directory on MySQL servers
remote_workdir=/masterha/test_MMS

# accounts
ssh_user=root
repl_user=rep
repl_password=rep
ping_interval=1

[server1]
hostname=testdb1
[server2]
hostname=testdb2
candidate_master=1
[server3]
hostname=testdb3
no_master=1

[root@testdb1 masterha]# vim /etc/masterha/test_MMS.cnf

[server default]

# mysql user and password

user=root

password=root

# working directory on the manager

manager_workdir=/masterha/test_MMS

# manager log file

manager_log=/masterha/test_MMS/test_MMS.log

# working directory on MySQL servers

remote_workdir=/masterha/test_MMS

# accounts

ssh_user=root

repl_user=rep

repl_password=rep

ping_interval=1

[server1]

hostname=testdb1

[server2]

hostname=testdb2

candidate_master=1

[server3]

hostname=testdb3

no_master=1

注：

candidate_master
You might use different kinds of machines between slaves, and want to promote the most reliable machine to the new master (i.e. promoting a RAID1+0 slave rather than RAID0 slaves).
By setting candidate_master to 1, the server is prioritized to the new master, as long as it meets conditions to be the new master (i.e. binary log is enabled, it does not delay replication significantly, etc). So candidate_master=1 does not mean that the specified host always becomes new master when the current master crashes, but is helpful to set priority.
If you set candidate_master=1 on multiple servers, priority is decided by sort order by block name ([server_xxx]). [server_1] will have higher priority than [server_2].
no_master
By setting no_master=1 on the target server, the server never becomes the new master. This is useful if you have some servers that should not become the new master. For example, you may want to set no_master=1 when you run slaves on unreliable (RAID0) machine, or when you run a slave at a remote data center. Note that if none of the slaves can be new master, MHA aborts and does not start monitoring/failover.

（2）配各个机器的hosts文件和ssh免密登陆

（3）检查并运行masterHA

[root@testdb1 masterha]# masterha_check_ssh --conf=/etc/masterha/test_MMS.cnf 
Thu Oct 30 17:04:57 2014 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
/etc/masterha/test_MMS.cnf:
 at /usr/share/perl5/vendor_perl/MHA/SSHCheck.pm line 148
[root@testdb1 masterha]# masterha_manager --conf=/etc/masterha/test_MMS.cnf
Thu Oct 30 17:44:58 2014 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Thu Oct 30 17:44:58 2014 - [info] Reading application default configurations from /etc/masterha/test_MMS.cnf..
Thu Oct 30 17:44:58 2014 - [info] Reading server configurations from /etc/masterha/test_MMS.cnf..

[root@testdb1 masterha]# masterha_check_ssh --conf=/etc/masterha/test_MMS.cnf

Thu Oct 30 17:04:57 2014 - [info] Reading default configuratoins from /etc/masterha_default.cnf..

/etc/masterha/test_MMS.cnf:

at /usr/share/perl5/vendor_perl/MHA/SSHCheck.pm line 148

[root@testdb1 masterha]# masterha_manager --conf=/etc/masterha/test_MMS.cnf

Thu Oct 30 17:44:58 2014 - [info] Reading default configuratoins from /etc/masterha_default.cnf..

Thu Oct 30 17:44:58 2014 - [info] Reading application default configurations from /etc/masterha/test_MMS.cnf..

Thu Oct 30 17:44:58 2014 - [info] Reading server configurations from /etc/masterha/test_MMS.cnf..

各种mysql同步的错误和状态解释： http://www.jb51.net/article/27221.htm

卧槽，MHA就没生效啊！啊啊啊！尼玛

反复检查了好多遍，费了两天时间，终于知道为啥了。

Mon Nov  3 14:30:54 2014 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln386] Error happend on checking configurations. Can't exec "/script/masterha/master_ip_failover": No such file or directory at /usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm line 68.
Mon Nov  3 14:30:54 2014 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln482] Error happened on monitoring servers.
Mon Nov  3 14:30:54 2014 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

Mon Nov 3 14:30:54 2014 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln386] Error happend on checking configurations. Can't exec "/script/masterha/master_ip_failover": No such file or directory at /usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm line 68.

Mon Nov 3 14:30:54 2014 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln482] Error happened on monitoring servers.

Mon Nov 3 14:30:54 2014 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

原因是在masterha的默认配置文件中添加了master_ip_failover_script的配置（抄的官网），但是并没有编写对应的脚本。所以调用出错。太脑残了。

注释掉相关配置就好啦。

[root@testdb1 /]# vim /etc/masterha_default.cnf 

[server default]
user=root
password=root
ssh_user=root
master_binlog_dir= /var/lib/mysql
remote_workdir=/data/log/masterha
secondary_check_script= masterha_secondary_check -s remote_host1 -s remote_host2
ping_interval=3
#master_ip_failover_script=/script/masterha/master_ip_failover
#shutdown_script= /script/masterha/power_manager
report_script= /script/masterha/send_master_failover_mail

[root@testdb1 /]# vim /etc/masterha_default.cnf

[server default]

user=root

password=root

ssh_user=root

master_binlog_dir= /var/lib/mysql

remote_workdir=/data/log/masterha

secondary_check_script= masterha_secondary_check -s remote_host1 -s remote_host2

ping_interval=3

#master_ip_failover_script=/script/masterha/master_ip_failover

#shutdown_script= /script/masterha/power_manager

report_script= /script/masterha/send_master_failover_mail

以下是测试，还特么的有错：

 Testing mysqlbinlog output..mysqlbinlog: File '/var/lib/mysql/mysql-relay-bin.062756' not found (Errcode: 2 - No such file or directory)
mysqlbinlog failed with rc 1:0!
 at /usr/bin/apply_diff_relay_logs line 380
 main::check() called at /usr/bin/apply_diff_relay_logs line 486
 eval {...} called at /usr/bin/apply_diff_relay_logs line 466
 main::main() called at /usr/bin/apply_diff_relay_logs line 112
Tue Nov 4 10:08:11 2014 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln195] Slaves settings check failed!
Tue Nov 4 10:08:11 2014 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln375] Slave configuration failed.
Tue Nov 4 10:08:11 2014 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln386] Error happend on checking configurations. at /usr/bin/masterha_check_repl line 48
Tue Nov 4 10:08:11 2014 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln482] Error happened on monitoring servers.
Tue Nov 4 10:08:11 2014 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

Testing mysqlbinlog output..mysqlbinlog: File '/var/lib/mysql/mysql-relay-bin.062756' not found (Errcode: 2 - No such file or directory)

mysqlbinlog failed with rc 1:0!

at /usr/bin/apply_diff_relay_logs line 380

main::check() called at /usr/bin/apply_diff_relay_logs line 486

eval {...} called at /usr/bin/apply_diff_relay_logs line 466

main::main() called at /usr/bin/apply_diff_relay_logs line 112

Tue Nov 4 10:08:11 2014 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln195] Slaves settings check failed!

Tue Nov 4 10:08:11 2014 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln375] Slave configuration failed.

Tue Nov 4 10:08:11 2014 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln386] Error happend on checking configurations. at /usr/bin/masterha_check_repl line 48

Tue Nov 4 10:08:11 2014 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln482] Error happened on monitoring servers.

Tue Nov 4 10:08:11 2014 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

检查slave机的mysql日志：

2014-11-04 10:26:48 7540 [Note] Slave: received end packet from server, apparent master shutdown: 
2014-11-04 10:26:48 7540 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000004' at position 345
2014-11-04 10:26:48 7540 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2014-11-04 10:26:48 7540 [Note] Slave: received end packet from server, apparent master shutdown: 
2014-11-04 10:26:48 7540 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000004' at position 345
2014-11-04 10:26:48 7540 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2014-11-04 10:26:48 7540 [Note] Slave: received end packet from server, apparent master shutdown: 
2014-11-04 10:26:48 7540 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000004' at position 345
2014-11-04 10:26:48 7540 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2014-11-04 10:26:48 7540 [Note] Slave: received end packet from server, apparent master shutdown: 
2014-11-04 10:26:48 7540 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry,

2014-11-04 10:26:48 7540 [Note] Slave: received end packet from server, apparent master shutdown:

2014-11-04 10:26:48 7540 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000004' at position 345

2014-11-04 10:26:48 7540 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.

2014-11-04 10:26:48 7540 [Note] Slave: received end packet from server, apparent master shutdown:

2014-11-04 10:26:48 7540 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000004' at position 345

2014-11-04 10:26:48 7540 [Note] Slave: received end packet from server, apparent master shutdown:

2014-11-04 10:26:48 7540 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000004' at position 345

2014-11-04 10:26:48 7540 [Note] Slave: received end packet from server, apparent master shutdown:

2014-11-04 10:26:48 7540 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry,

被这个问题刷屏了。搜索发现大部分人报告的问题是server_id相同导致的master端识别slave混乱。但我的是不同的。

再深度搜索，终于被一文章救命了：http://www.2cto.com/database/201405/305269.html

还是master对slave的识别混乱问题，不过这次搞相同了的是server_uuid。

查看如下。

mysql> SHOW GLOBAL VARIABLES LIKE 'server_uuid';
+---------------+--------------------------------------+
| Variable_name | Value                                |
+---------------+--------------------------------------+
| server_uuid   | 7b05a533-10b8-11e4-b539-005056935dca |
+---------------+--------------------------------------+
1 row in set (0.03 sec)

mysql> SHOW GLOBAL VARIABLES LIKE 'server_uuid';

+---------------+--------------------------------------+

| Variable_name | Value |

+---------------+--------------------------------------+

| server_uuid | 7b05a533-10b8-11e4-b539-005056935dca |

+---------------+--------------------------------------+

1 row in set (0.03 sec)

修复手段：修改/mysql_data_dir/auto.cnf

[root@testdb3 ~]# vim /app/mysql/auto.cnf 

[auto]
server-uuid=7b05a533-10b8-11e4-b539-005056935dcb

[root@testdb3 ~]# vim /app/mysql/auto.cnf

[auto]

server-uuid=7b05a533-10b8-11e4-b539-005056935dcb

重启数据库，重启slave。

[root@testdb1 ~]# masterha_manager --conf=/etc/masterha/test_MMS.cnf  &
[2] 13174
[root@testdb1 ~]# Tue Nov  4 13:21:16 2014 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Tue Nov  4 13:21:16 2014 - [info] Reading application default configurations from /etc/masterha/test_MMS.cnf..
Tue Nov  4 13:21:16 2014 - [info] Reading server configurations from /etc/masterha/test_MMS.cnf..

[root@testdb1 ~]# masterha_check_status --conf=/etc/masterha/test_MMS.cnf 
test_MMS (pid:13174) is running(0:PING_OK), master:testdb1

[root@testdb1 ~]# masterha_manager --conf=/etc/masterha/test_MMS.cnf &

[2] 13174

[root@testdb1 ~]# Tue Nov 4 13:21:16 2014 - [info] Reading default configuratoins from /etc/masterha_default.cnf..

Tue Nov 4 13:21:16 2014 - [info] Reading application default configurations from /etc/masterha/test_MMS.cnf..

Tue Nov 4 13:21:16 2014 - [info] Reading server configurations from /etc/masterha/test_MMS.cnf..

[root@testdb1 ~]# masterha_check_status --conf=/etc/masterha/test_MMS.cnf

test_MMS (pid:13174) is running(0:PING_OK), master:testdb1

done！！！

终于ok了，从部署好至此过了3天。神坑爹。

四、failover测试

场景——M跪，M2起，S挂到M2

以下是各种输入输出：

M（210）：

[root@testdb1 ~]# service mysql stop
Shutting down MySQL (Percona Server).....[  OK  ]

1 2	[root@testdb1 ~]# service mysql stop Shutting down MySQL (Percona Server).....[ OK ]

M2（211）：

mysql> show slave status\G
Empty set (0.00 sec)

mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000005 |      120 | test         |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

mysql> show slave status\G

Empty set (0.00 sec)

mysql> show master status;

+------------------+----------+--------------+------------------+-------------------+

+------------------+----------+--------------+------------------+-------------------+

+------------------+----------+--------------+------------------+-------------------+

1 row in set (0.00 sec)

S（227）：

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: *.211
                  Master_User: rep
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000005
          Read_Master_Log_Pos: 120
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 283
        Relay_Master_Log_File: mysql-bin.000005
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: test
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 120
              Relay_Log_Space: 456
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 211
                  Master_UUID: 7b05a533-10b8-11e4-b539-005056935dca
             Master_Info_File: /app/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
1 row in set (0.00 sec)

mysql> show slave status\G

*************************** 1. row ***************************

Slave_IO_State: Waiting for master to send event

Master_Host: *.211

Master_User: rep

Master_Port: 3306

Connect_Retry: 60

Master_Log_File: mysql-bin.000005

Read_Master_Log_Pos: 120

Relay_Log_File: mysql-relay-bin.000002

Relay_Log_Pos: 283

Relay_Master_Log_File: mysql-bin.000005

Slave_IO_Running: Yes

Slave_SQL_Running: Yes

Replicate_Do_DB: test

Replicate_Ignore_DB:

Replicate_Do_Table:

Replicate_Ignore_Table:

Replicate_Wild_Do_Table:

Replicate_Wild_Ignore_Table:

Last_Errno: 0

Last_Error:

Skip_Counter: 0

Exec_Master_Log_Pos: 120

Relay_Log_Space: 456

Until_Condition: None

Until_Log_File:

Until_Log_Pos: 0

Master_SSL_Allowed: No

Master_SSL_CA_File:

Master_SSL_CA_Path:

Master_SSL_Cert:

Master_SSL_Cipher:

Master_SSL_Key:

Seconds_Behind_Master: 0

Master_SSL_Verify_Server_Cert: No

Last_IO_Errno: 0

Last_IO_Error:

Last_SQL_Errno: 0

Last_SQL_Error:

Replicate_Ignore_Server_Ids:

Master_Server_Id: 211

Master_UUID: 7b05a533-10b8-11e4-b539-005056935dca

Master_Info_File: /app/mysql/master.info

SQL_Delay: 0

SQL_Remaining_Delay: NULL

Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it

Master_Retry_Count: 86400

Master_Bind:

Last_IO_Error_Timestamp:

Last_SQL_Error_Timestamp:

Master_SSL_Crl:

Master_SSL_Crlpath:

Retrieved_Gtid_Set:

Executed_Gtid_Set:

Auto_Position: 0

1 row in set (0.00 sec)

明显master切到了211。

五、复原

将M先设为M2 的slave

mysql> reset master;
Query OK, 0 rows affected (0.18 sec)

mysql> reset slave;
Query OK, 0 rows affected (0.05 sec)

mysql> change master to master_host='172.19.17.211', master_user='rep',master_password='rep',master_log_file='mysql-bin.000005',master_log_pos=120;

start slave;

Query OK, 0 rows affected, 2 warnings (0.26 sec)


mysql> show slave status\G
*************************** 1. row ***************************
 Slave_IO_State: Waiting for master to send event
 Master_Host: *.211
 Master_User: rep
 Master_Port: 3306
 Connect_Retry: 60
 Master_Log_File: mysql-bin.000005
 Read_Master_Log_Pos: 120
 Relay_Log_File: mysqld-relay-bin.000002
 Relay_Log_Pos: 283
 Relay_Master_Log_File: mysql-bin.000005
 Slave_IO_Running: Yes
 Slave_SQL_Running: Yes
 Replicate_Do_DB: test
 Replicate_Ignore_DB: 
 Replicate_Do_Table: 
 Replicate_Ignore_Table: 
 Replicate_Wild_Do_Table: 
 Replicate_Wild_Ignore_Table: 
 Last_Errno: 0
 Last_Error: 
 Skip_Counter: 0
 Exec_Master_Log_Pos: 120
 Relay_Log_Space: 457
 Until_Condition: None
 Until_Log_File: 
 Until_Log_Pos: 0
 Master_SSL_Allowed: No
 Master_SSL_CA_File: 
 Master_SSL_CA_Path: 
 Master_SSL_Cert: 
 Master_SSL_Cipher: 
 Master_SSL_Key: 
 Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
 Last_IO_Errno: 0
 Last_IO_Error: 
 Last_SQL_Errno: 0
 Last_SQL_Error: 
 Replicate_Ignore_Server_Ids: 
 Master_Server_Id: 211
 Master_UUID: 7b05a533-10b8-11e4-b539-005056935dca
 Master_Info_File: /app/mysql/master.info
 SQL_Delay: 0
 SQL_Remaining_Delay: NULL
 Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
 Master_Retry_Count: 86400
 Master_Bind: 
 Last_IO_Error_Timestamp: 
 Last_SQL_Error_Timestamp: 
 Master_SSL_Crl: 
 Master_SSL_Crlpath: 
 Retrieved_Gtid_Set: 
 Executed_Gtid_Set: 
 Auto_Position: 0
1 row in set (0.00 sec)

mysql> quit
Bye

mysql> reset master;

Query OK, 0 rows affected (0.18 sec)

mysql> reset slave;

Query OK, 0 rows affected (0.05 sec)

mysql> change master to master_host='172.19.17.211', master_user='rep',master_password='rep',master_log_file='mysql-bin.000005',master_log_pos=120;

start slave;

Query OK, 0 rows affected, 2 warnings (0.26 sec)

mysql> show slave status\G

*************************** 1. row ***************************

Slave_IO_State: Waiting for master to send event

Master_Host: *.211

Master_User: rep

Master_Port: 3306

Connect_Retry: 60

Master_Log_File: mysql-bin.000005

Read_Master_Log_Pos: 120

Relay_Log_File: mysqld-relay-bin.000002

Relay_Log_Pos: 283

Relay_Master_Log_File: mysql-bin.000005

Slave_IO_Running: Yes

Slave_SQL_Running: Yes

Replicate_Do_DB: test

Replicate_Ignore_DB:

Replicate_Do_Table:

Replicate_Ignore_Table:

Replicate_Wild_Do_Table:

Replicate_Wild_Ignore_Table:

Last_Errno: 0

Last_Error:

Skip_Counter: 0

Exec_Master_Log_Pos: 120

Relay_Log_Space: 457

Until_Condition: None

Until_Log_File:

Until_Log_Pos: 0

Master_SSL_Allowed: No

Master_SSL_CA_File:

Master_SSL_CA_Path:

Master_SSL_Cert:

Master_SSL_Cipher:

Master_SSL_Key:

Seconds_Behind_Master: 0

Master_SSL_Verify_Server_Cert: No

Last_IO_Errno: 0

Last_IO_Error:

Last_SQL_Errno: 0

Last_SQL_Error:

Replicate_Ignore_Server_Ids:

Master_Server_Id: 211

Master_UUID: 7b05a533-10b8-11e4-b539-005056935dca

Master_Info_File: /app/mysql/master.info

SQL_Delay: 0

SQL_Remaining_Delay: NULL

Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it

Master_Retry_Count: 86400

Master_Bind:

Last_IO_Error_Timestamp:

Last_SQL_Error_Timestamp:

Master_SSL_Crl:

Master_SSL_Crlpath:

Retrieved_Gtid_Set:

Executed_Gtid_Set:

Auto_Position: 0

1 row in set (0.00 sec)

mysql> quit

Bye

使用masterHA自带的切换工具masterha_master_switch

[root@testdb1 ~]#  masterha_master_switch --master_state=alive --global_conf=/etc/masterha_default.cnf --conf=/etc/masterha/test_MMS.cnf
Tue Nov  4 14:17:06 2014 - [info] MHA::MasterRotate version 0.55.
Tue Nov  4 14:17:06 2014 - [info] Starting online master switch..
Tue Nov  4 14:17:06 2014 - [info] 
Tue Nov  4 14:17:06 2014 - [info] * Phase 1: Configuration Check Phase..
Tue Nov  4 14:17:06 2014 - [info] 
Tue Nov  4 14:17:06 2014 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Tue Nov  4 14:17:06 2014 - [info] Reading application default configurations from /etc/masterha/test_MMS.cnf..
Tue Nov  4 14:17:06 2014 - [info] Reading server configurations from /etc/masterha/test_MMS.cnf..
Tue Nov  4 14:17:06 2014 - [info] Current Alive Master: testdb2(*.211:3306)
Tue Nov  4 14:17:06 2014 - [info] Alive Slaves:
Tue Nov  4 14:17:06 2014 - [info]   testdb1(*.210:3306)  Version=5.6.19-67.0-log (oldest major version between slaves) log-bin:enabled
Tue Nov  4 14:17:06 2014 - [info]     Replicating from *.211(*.211:3306)
Tue Nov  4 14:17:06 2014 - [info]   testdb3(*.227:3306)  Version=5.6.19-67.0-log (oldest major version between slaves) log-bin:enabled
Tue Nov  4 14:17:06 2014 - [info]     Replicating from *.211(*.211:3306)
Tue Nov  4 14:17:06 2014 - [info]     Not candidate for the new Master (no_master is set)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on testdb2(*.211:3306)? (YES/no): yes
Tue Nov  4 14:17:22 2014 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Tue Nov  4 14:17:22 2014 - [info]  ok.
Tue Nov  4 14:17:22 2014 - [info] Checking MHA is not monitoring or doing failover..
Tue Nov  4 14:17:22 2014 - [info] Checking replication health on testdb1..
Tue Nov  4 14:17:22 2014 - [info]  ok.
Tue Nov  4 14:17:22 2014 - [info] Checking replication health on testdb3..
Tue Nov  4 14:17:22 2014 - [info]  ok.
Tue Nov  4 14:17:22 2014 - [info] Searching new master from slaves..
Tue Nov  4 14:17:22 2014 - [info]  Candidate masters from the configuration file:
Tue Nov  4 14:17:22 2014 - [info]   testdb2(*.211:3306)  Version=5.6.19-67.0-log log-bin:enabled
Tue Nov  4 14:17:22 2014 - [info]  Non-candidate masters:
Tue Nov  4 14:17:22 2014 - [info]   testdb3(*.227:3306)  Version=5.6.19-67.0-log (oldest major version between slaves) log-bin:enabled
Tue Nov  4 14:17:22 2014 - [info]     Replicating from *.211(*.211:3306)
Tue Nov  4 14:17:22 2014 - [info]     Not candidate for the new Master (no_master is set)
Tue Nov  4 14:17:22 2014 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Tue Nov  4 14:17:22 2014 - [info]   Not found.
Tue Nov  4 14:17:22 2014 - [info]  Searching from all candidate_master slaves..
Tue Nov  4 14:17:22 2014 - [info]   Not found.
Tue Nov  4 14:17:22 2014 - [info]  Searching from all slaves which have received the latest relay log events..
Tue Nov  4 14:17:22 2014 - [info] 
From:
testdb2 (current master)
 +--testdb1
 +--testdb3

To:
testdb1 (new master)
 +--testdb3

Starting master switch from testdb2(*.211:3306) to testdb1(*.210:3306)? (yes/NO): yes
Tue Nov  4 14:17:29 2014 - [info] Checking whether testdb1(*.210:3306) is ok for the new master..
Tue Nov  4 14:17:29 2014 - [info]  ok.
Tue Nov  4 14:17:29 2014 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Nov  4 14:17:29 2014 - [info] 
Tue Nov  4 14:17:29 2014 - [info] * Phase 2: Rejecting updates Phase..
Tue Nov  4 14:17:29 2014 - [info] 
master_ip_online_change_script is not defined. If you do not disable writes on the current master manually, applications keep writing on the current master. Is it ok to proceed? (yes/NO): yes
Tue Nov  4 14:17:42 2014 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Tue Nov  4 14:17:42 2014 - [info] Executing FLUSH TABLES WITH READ LOCK..
Tue Nov  4 14:17:42 2014 - [info]  ok.
Tue Nov  4 14:17:42 2014 - [info] Orig master binlog:pos is mysql-bin.000005:120.
Tue Nov  4 14:17:42 2014 - [info]  Waiting to execute all relay logs on testdb1(*.210:3306)..
Tue Nov  4 14:17:42 2014 - [info]  master_pos_wait(mysql-bin.000005:120) completed on testdb1(*.210:3306). Executed 0 events.
Tue Nov  4 14:17:42 2014 - [info]   done.
Tue Nov  4 14:17:42 2014 - [info] Getting new master's binlog name and position..
Tue Nov  4 14:17:42 2014 - [info]  mysql-bin.000001:120
Tue Nov  4 14:17:42 2014 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='testdb1 or *.210', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=120, MASTER_USER='rep', MASTER_PASSWORD='xxx';
Tue Nov  4 14:17:42 2014 - [info] 
Tue Nov  4 14:17:42 2014 - [info] * Switching slaves in parallel..
Tue Nov  4 14:17:42 2014 - [info] 
Tue Nov  4 14:17:42 2014 - [info] -- Slave switch on host testdb3(*.227:3306) started, pid: 21833
Tue Nov  4 14:17:42 2014 - [info] 
Tue Nov  4 14:17:43 2014 - [info] Log messages from testdb3 ...
Tue Nov  4 14:17:43 2014 - [info] 
Tue Nov  4 14:17:42 2014 - [info]  Waiting to execute all relay logs on testdb3(*.227:3306)..
Tue Nov  4 14:17:42 2014 - [info]  master_pos_wait(mysql-bin.000005:120) completed on testdb3(*.227:3306). Executed 0 events.
Tue Nov  4 14:17:42 2014 - [info]   done.
Tue Nov  4 14:17:42 2014 - [info]  Resetting slave testdb3(*.227:3306) and starting replication from the new master testdb1(*.210:3306)..
Tue Nov  4 14:17:43 2014 - [info]  Executed CHANGE MASTER.
Tue Nov  4 14:17:43 2014 - [info]  Slave started.
Tue Nov  4 14:17:43 2014 - [info] End of log messages from testdb3 ...
Tue Nov  4 14:17:43 2014 - [info] 
Tue Nov  4 14:17:43 2014 - [info] -- Slave switch on host testdb3(*.227:3306) succeeded.
Tue Nov  4 14:17:43 2014 - [info] Unlocking all tables on the orig master:
Tue Nov  4 14:17:43 2014 - [info] Executing UNLOCK TABLES..
Tue Nov  4 14:17:43 2014 - [info]  ok.
Tue Nov  4 14:17:43 2014 - [info] All new slave servers switched successfully.
Tue Nov  4 14:17:43 2014 - [info] 
Tue Nov  4 14:17:43 2014 - [info] * Phase 5: New master cleanup phase..
Tue Nov  4 14:17:43 2014 - [info] 
Tue Nov  4 14:17:43 2014 - [info]  testdb1: Resetting slave info succeeded.
Tue Nov  4 14:17:43 2014 - [info] Switching master to testdb1(*.210:3306) completed successfully.

[root@testdb1 ~]# masterha_master_switch --master_state=alive --global_conf=/etc/masterha_default.cnf --conf=/etc/masterha/test_MMS.cnf

Tue Nov 4 14:17:06 2014 - [info] MHA::MasterRotate version 0.55.

Tue Nov 4 14:17:06 2014 - [info] Starting online master switch..

Tue Nov 4 14:17:06 2014 - [info]

Tue Nov 4 14:17:06 2014 - [info] * Phase 1: Configuration Check Phase..

Tue Nov 4 14:17:06 2014 - [info]

Tue Nov 4 14:17:06 2014 - [info] Reading default configuratoins from /etc/masterha_default.cnf..

Tue Nov 4 14:17:06 2014 - [info] Reading application default configurations from /etc/masterha/test_MMS.cnf..

Tue Nov 4 14:17:06 2014 - [info] Reading server configurations from /etc/masterha/test_MMS.cnf..

Tue Nov 4 14:17:06 2014 - [info] Current Alive Master: testdb2(*.211:3306)

Tue Nov 4 14:17:06 2014 - [info] Alive Slaves:

Tue Nov 4 14:17:06 2014 - [info] testdb1(*.210:3306) Version=5.6.19-67.0-log (oldest major version between slaves) log-bin:enabled

Tue Nov 4 14:17:06 2014 - [info] Replicating from *.211(*.211:3306)

Tue Nov 4 14:17:06 2014 - [info] testdb3(*.227:3306) Version=5.6.19-67.0-log (oldest major version between slaves) log-bin:enabled

Tue Nov 4 14:17:06 2014 - [info] Replicating from *.211(*.211:3306)

Tue Nov 4 14:17:06 2014 - [info] Not candidate for the new Master (no_master is set)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on testdb2(*.211:3306)? (YES/no): yes

Tue Nov 4 14:17:22 2014 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..

Tue Nov 4 14:17:22 2014 - [info] ok.

Tue Nov 4 14:17:22 2014 - [info] Checking MHA is not monitoring or doing failover..

Tue Nov 4 14:17:22 2014 - [info] Checking replication health on testdb1..

Tue Nov 4 14:17:22 2014 - [info] ok.

Tue Nov 4 14:17:22 2014 - [info] Checking replication health on testdb3..

Tue Nov 4 14:17:22 2014 - [info] ok.

Tue Nov 4 14:17:22 2014 - [info] Searching new master from slaves..

Tue Nov 4 14:17:22 2014 - [info] Candidate masters from the configuration file:

Tue Nov 4 14:17:22 2014 - [info] testdb2(*.211:3306) Version=5.6.19-67.0-log log-bin:enabled

Tue Nov 4 14:17:22 2014 - [info] Non-candidate masters:

Tue Nov 4 14:17:22 2014 - [info] testdb3(*.227:3306) Version=5.6.19-67.0-log (oldest major version between slaves) log-bin:enabled

Tue Nov 4 14:17:22 2014 - [info] Replicating from *.211(*.211:3306)

Tue Nov 4 14:17:22 2014 - [info] Not candidate for the new Master (no_master is set)

Tue Nov 4 14:17:22 2014 - [info] Searching from candidate_master slaves which have received the latest relay log events..

Tue Nov 4 14:17:22 2014 - [info] Not found.

Tue Nov 4 14:17:22 2014 - [info] Searching from all candidate_master slaves..

Tue Nov 4 14:17:22 2014 - [info] Not found.

Tue Nov 4 14:17:22 2014 - [info] Searching from all slaves which have received the latest relay log events..

Tue Nov 4 14:17:22 2014 - [info]

From:

testdb2 (current master)

+--testdb1

+--testdb3

To:

testdb1 (new master)

+--testdb3

Starting master switch from testdb2(*.211:3306) to testdb1(*.210:3306)? (yes/NO): yes

Tue Nov 4 14:17:29 2014 - [info] Checking whether testdb1(*.210:3306) is ok for the new master..

Tue Nov 4 14:17:29 2014 - [info] ok.

Tue Nov 4 14:17:29 2014 - [info] ** Phase 1: Configuration Check Phase completed.

Tue Nov 4 14:17:29 2014 - [info]

Tue Nov 4 14:17:29 2014 - [info] * Phase 2: Rejecting updates Phase..

Tue Nov 4 14:17:29 2014 - [info]

master_ip_online_change_script is not defined. If you do not disable writes on the current master manually, applications keep writing on the current master. Is it ok to proceed? (yes/NO): yes

Tue Nov 4 14:17:42 2014 - [info] Locking all tables on the orig master to reject updates from everybody (including root):

Tue Nov 4 14:17:42 2014 - [info] Executing FLUSH TABLES WITH READ LOCK..

Tue Nov 4 14:17:42 2014 - [info] ok.

Tue Nov 4 14:17:42 2014 - [info] Orig master binlog:pos is mysql-bin.000005:120.

Tue Nov 4 14:17:42 2014 - [info] Waiting to execute all relay logs on testdb1(*.210:3306)..

Tue Nov 4 14:17:42 2014 - [info] master_pos_wait(mysql-bin.000005:120) completed on testdb1(*.210:3306). Executed 0 events.

Tue Nov 4 14:17:42 2014 - [info] done.

Tue Nov 4 14:17:42 2014 - [info] Getting new master's binlog name and position..

Tue Nov 4 14:17:42 2014 - [info] mysql-bin.000001:120

Tue Nov 4 14:17:42 2014 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='testdb1 or *.210', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=120, MASTER_USER='rep', MASTER_PASSWORD='xxx';

Tue Nov 4 14:17:42 2014 - [info]

Tue Nov 4 14:17:42 2014 - [info] * Switching slaves in parallel..

Tue Nov 4 14:17:42 2014 - [info]

Tue Nov 4 14:17:42 2014 - [info] -- Slave switch on host testdb3(*.227:3306) started, pid: 21833

Tue Nov 4 14:17:42 2014 - [info]

Tue Nov 4 14:17:43 2014 - [info] Log messages from testdb3 ...

Tue Nov 4 14:17:43 2014 - [info]

Tue Nov 4 14:17:42 2014 - [info] Waiting to execute all relay logs on testdb3(*.227:3306)..

Tue Nov 4 14:17:42 2014 - [info] master_pos_wait(mysql-bin.000005:120) completed on testdb3(*.227:3306). Executed 0 events.

Tue Nov 4 14:17:42 2014 - [info] done.

Tue Nov 4 14:17:42 2014 - [info] Resetting slave testdb3(*.227:3306) and starting replication from the new master testdb1(*.210:3306)..

Tue Nov 4 14:17:43 2014 - [info] Executed CHANGE MASTER.

Tue Nov 4 14:17:43 2014 - [info] Slave started.

Tue Nov 4 14:17:43 2014 - [info] End of log messages from testdb3 ...

Tue Nov 4 14:17:43 2014 - [info]

Tue Nov 4 14:17:43 2014 - [info] -- Slave switch on host testdb3(*.227:3306) succeeded.

Tue Nov 4 14:17:43 2014 - [info] Unlocking all tables on the orig master:

Tue Nov 4 14:17:43 2014 - [info] Executing UNLOCK TABLES..

Tue Nov 4 14:17:43 2014 - [info] ok.

Tue Nov 4 14:17:43 2014 - [info] All new slave servers switched successfully.

Tue Nov 4 14:17:43 2014 - [info]

Tue Nov 4 14:17:43 2014 - [info] * Phase 5: New master cleanup phase..

Tue Nov 4 14:17:43 2014 - [info]

Tue Nov 4 14:17:43 2014 - [info] testdb1: Resetting slave info succeeded.

Tue Nov 4 14:17:43 2014 - [info] Switching master to testdb1(*.210:3306) completed successfully.

注：这里

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on testdb2(*.211:3306)? (YES/no): yes

1	It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on testdb2(*.211:3306)? (YES/no): yes

对于生产系统还是要先把数据刷到盘里的，这里测试就不费劲了。

mysql master HA solution: one master, one candidate master, and multiple slaves configuration

Comments (4)

Leave a Comment 取消回复