Monthly Archive: October 2014

mysql master HA solution: one master, one candidate master, and multiple slaves configuration

Environment:

Three Percona Server hosts: 210, 211, and 227.

210 is M (the master), 211 is M2 (the candidate master), and 227 is S (a slave).

The overall architecture:

M (read-write)  --  M2 (read-only)
|
S (read-only)

In the actual configuration, M2 and S are two parallel slaves.

The only difference: in the Master HA configuration, M2 is marked as the candidate master, so when the master goes down, M2 is promoted to the new master first.

 

I. Configure the master

See the earlier master-slave replication post: http://bananalighter.com/mysql-master-slaver-replication/

Dump the data so the slaves can load it:

mysqldump -uroot -p test > test.dmp

II. Configure the candidate master and the slave (note: read-only)

1. Candidate master (host 211; its my.cnf was already set up in the master-slave post, so that part is omitted)

mysql>change master to master_host='172.19.17.210', master_user='rep',master_password='rep',master_log_file='mysql-bin.000001',master_log_pos=0;
Query OK, 0 rows affected, 2 warnings (0.01 sec)
mysql>set global read_only=1;
Query OK, 0 rows affected (0.00 sec)
mysql> show variables like '%read_only%';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| innodb_read_only | OFF   |
| read_only        | ON    |
| tx_read_only     | OFF   |
+------------------+-------+
3 rows in set (0.00 sec)

 

2. Slave

(1) Create the database

mysql> create database test;
Query OK, 1 row affected (0.00 sec)

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| test               |
+--------------------+
4 rows in set (0.00 sec)

(2) Import the data dumped in the earlier step.

use test;
source /app/test.dmp

 

(3) Configure my.cnf

[root@testdb3 ~]# vim /etc/my.cnf
# append the following
explicit_defaults_for_timestamp=true
log_bin = mysql-bin
server_id = 2
relay_log = /var/lib/mysql/mysql-relay-bin
log_slave_updates = 1
read_only = 1
innodb_buffer_pool_size = 128M
join_buffer_size = 128M
sort_buffer_size = 2M
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
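Before restarting, a quick sanity check that the options above actually made it into the file can save a round trip. This is a small sketch of my own, not part of the original procedure; the path /etc/my.cnf is the one used above.

```shell
#!/bin/bash
# Sketch: verify the slave's my.cnf contains the replication options added above.
check_cnf() {
  local cnf="$1" missing=0
  for opt in log_bin server_id relay_log log_slave_updates read_only; do
    grep -Eq "^[[:space:]]*${opt}[[:space:]]*=" "$cnf" 2>/dev/null \
      || { echo "missing: $opt"; missing=1; }
  done
  [ "$missing" -eq 0 ] && echo "my.cnf looks complete"
}
check_cnf /etc/my.cnf
```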

Restart the database:

[root@testdb3 ~]# service mysql restart
Shutting down MySQL (Percona Server)..[  OK  ]
Starting MySQL (Percona Server).....[  OK  ]

(4) Create the replication user, grant privileges, and point the slave at the master

grant replication slave,replication client on *.* to rep@'172.19.17.%' identified by 'rep';

grant all on test.* to rep@'172.19.17.%';

commit;

change master to master_host='172.19.17.210', master_user='rep',master_password='rep',master_log_file='mysql-bin.000024',master_log_pos=0;

start slave;
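After start slave, it is worth confirming that both replication threads came up instead of eyeballing the full SHOW SLAVE STATUS output. A sketch, with connection parameters as placeholders for this setup:

```shell
#!/bin/bash
# Sketch: report slave health from the output of SHOW SLAVE STATUS\G.
slave_ok() {
  # expects the \G-style output of SHOW SLAVE STATUS as $1
  grep -q 'Slave_IO_Running: Yes' <<<"$1" && grep -q 'Slave_SQL_Running: Yes' <<<"$1"
}
# placeholder credentials/host from this setup
status=$(mysql -urep -prep -h172.19.17.227 -e 'show slave status\G' 2>/dev/null || true)
if slave_ok "$status"; then
  echo "replication healthy"
else
  echo "replication broken or server unreachable"
fi
```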

(5) A problem here cost me a whole afternoon.

show slave status\G reported:

Last_Error: Error 'Duplicate entry '7' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'insert into test (id,value) values (7,77)'

My understanding: the slave replayed relay log events older than the dump, so rows already present in the dump got inserted a second time. My plan was to clean up the master's logs.

The post quoted below, however, works around the problem by making the slave skip the offending statements:

4. When checking the slave status shows an error like:
Last_Error: Error 'Duplicate entry '1' for key 'PRIMARY'' on query. Default database: 'dnslog'. Query: 'insert into logop (logtime,tbname, sqlstr) VALUES('2011-10-25 07:00:00','logcount04','C/x??}?
the hint tells you a duplicate insert caused the failure. First STOP SLAVE, then SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; then START SLAVE, and finally check the slave's state again with show slave status\G.
5. If that still does not solve it, the table needs repairing: STOP SLAVE, then repair table logop; then START SLAVE, and check show slave status\G again.
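That skip-counter dance can be wrapped in a tiny helper. This sketch is my own, not from the quoted post, and the commented-out connection arguments are placeholders; note that skipping events trades consistency for uptime, so only do it when you know what the skipped statement was.

```shell
#!/bin/bash
# Sketch: build the statement that skips N bad events on a slave.
skip_sql() {
  local n="${1:-1}"
  printf 'STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER=%d; START SLAVE;' "$n"
}
echo "Would execute: $(skip_sql 1)"
# To actually run it against the slave (placeholder credentials):
# mysql -urep -prep -h172.19.17.227 -e "$(skip_sql 1)"
```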

Reference: http://blog.sina.com.cn/s/blog_502c8cc401010p4b.html

Note: for the read-only concept, see http://bananalighter.com/mysql-variables-read-only/

(6) Replication test across the three servers

Insert a row on the master; both slaves pick it up. Details omitted.

III. Configure Master HA

(1) Edit the Master HA configuration file /etc/masterha/test_MMS.cnf

[root@testdb1 masterha]# vim /etc/masterha/test_MMS.cnf 

[server default]
# mysql user and password
user=root
password=root
# working directory on the manager
manager_workdir=/masterha/test_MMS
# manager log file
manager_log=/masterha/test_MMS/test_MMS.log
# working directory on MySQL servers
remote_workdir=/masterha/test_MMS

# accounts
ssh_user=root
repl_user=rep
repl_password=rep
ping_interval=1

[server1]
hostname=testdb1
[server2]
hostname=testdb2
candidate_master=1
[server3]
hostname=testdb3
no_master=1

Note (from the MHA documentation):

candidate_master
You might use different kinds of machines between slaves, and want to promote the most reliable machine to the new master (i.e. promoting a RAID1+0 slave rather than RAID0 slaves).
By setting candidate_master to 1, the server is prioritized to the new master, as long as it meets conditions to be the new master (i.e. binary log is enabled, it does not delay replication significantly, etc). So candidate_master=1 does not mean that the specified host always becomes new master when the current master crashes, but is helpful to set priority.
If you set candidate_master=1 on multiple servers, priority is decided by sort order by block name ([server_xxx]). [server_1] will have higher priority than [server_2].
no_master
By setting no_master=1 on the target server, the server never becomes the new master. This is useful if you have some servers that should not become the new master. For example, you may want to set no_master=1 when you run slaves on unreliable (RAID0) machine, or when you run a slave at a remote data center. Note that if none of the slaves can be new master, MHA aborts and does not start monitoring/failover.

(2) Configure the hosts file and passwordless SSH login on every machine

(3) Check and run Master HA

[root@testdb1 masterha]# masterha_check_ssh --conf=/etc/masterha/test_MMS.cnf 
Thu Oct 30 17:04:57 2014 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
/etc/masterha/test_MMS.cnf:
 at /usr/share/perl5/vendor_perl/MHA/SSHCheck.pm line 148
[root@testdb1 masterha]# masterha_manager --conf=/etc/masterha/test_MMS.cnf
Thu Oct 30 17:44:58 2014 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Thu Oct 30 17:44:58 2014 - [info] Reading application default configurations from /etc/masterha/test_MMS.cnf..
Thu Oct 30 17:44:58 2014 - [info] Reading server configurations from /etc/masterha/test_MMS.cnf..

Explanations of the various MySQL replication errors and states: http://www.jb51.net/article/27221.htm

Damn it, MHA just would not take effect! Argh!

I checked everything over and over, burned two days on it, and finally figured out why:

Mon Nov  3 14:30:54 2014 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln386] Error happend on checking configurations. Can't exec "/script/masterha/master_ip_failover": No such file or directory at /usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm line 68.
Mon Nov  3 14:30:54 2014 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln482] Error happened on monitoring servers.
Mon Nov  3 14:30:54 2014 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

The cause: I had added a master_ip_failover_script setting to the masterha default config file (copied from the official docs) without ever writing the corresponding script, so the call failed. Completely brain-dead of me.

Commenting out the related settings fixed it.

[root@testdb1 /]# vim /etc/masterha_default.cnf 

[server default]
user=root
password=root
ssh_user=root
master_binlog_dir= /var/lib/mysql
remote_workdir=/data/log/masterha
secondary_check_script= masterha_secondary_check -s remote_host1 -s remote_host2
ping_interval=3
#master_ip_failover_script=/script/masterha/master_ip_failover
#shutdown_script= /script/masterha/power_manager
report_script= /script/masterha/send_master_failover_mail

The check below still failed, damn it:

 Testing mysqlbinlog output..mysqlbinlog: File '/var/lib/mysql/mysql-relay-bin.062756' not found (Errcode: 2 - No such file or directory)
mysqlbinlog failed with rc 1:0!
 at /usr/bin/apply_diff_relay_logs line 380
 main::check() called at /usr/bin/apply_diff_relay_logs line 486
 eval {...} called at /usr/bin/apply_diff_relay_logs line 466
 main::main() called at /usr/bin/apply_diff_relay_logs line 112
Tue Nov 4 10:08:11 2014 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln195] Slaves settings check failed!
Tue Nov 4 10:08:11 2014 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln375] Slave configuration failed.
Tue Nov 4 10:08:11 2014 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln386] Error happend on checking configurations. at /usr/bin/masterha_check_repl line 48
Tue Nov 4 10:08:11 2014 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln482] Error happened on monitoring servers.
Tue Nov 4 10:08:11 2014 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

Check the MySQL log on the slave:

2014-11-04 10:26:48 7540 [Note] Slave: received end packet from server, apparent master shutdown: 
2014-11-04 10:26:48 7540 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000004' at position 345
2014-11-04 10:26:48 7540 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
(...the same three lines repeat over and over...)

My log was flooded with this. Most people reporting this problem online traced it to identical server_id values making the master confuse its slaves, but my server_ids were all different.

Digging deeper, this article finally saved me: http://www.2cto.com/database/201405/305269.html

It is still the master confusing its slaves, but this time the duplicated value is server_uuid.

Check it as follows.

mysql> SHOW GLOBAL VARIABLES LIKE 'server_uuid';
+---------------+--------------------------------------+
| Variable_name | Value                                |
+---------------+--------------------------------------+
| server_uuid   | 7b05a533-10b8-11e4-b539-005056935dca |
+---------------+--------------------------------------+
1 row in set (0.03 sec)

The fix: edit auto.cnf in the MySQL data directory (/mysql_data_dir/auto.cnf):

[root@testdb3 ~]# vim /app/mysql/auto.cnf 

[auto]
server-uuid=7b05a533-10b8-11e4-b539-005056935dcb

Restart the database, then restart the slave.
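To avoid hitting this again, the server_uuid of every host can be compared in one pass. A sketch, using the IPs and credentials of this setup as placeholders:

```shell
#!/bin/bash
# Sketch: collect server_uuid from each host and flag duplicates.
dup_check() {
  # prints DUPLICATE if any line in $1 repeats, OK otherwise
  if sort <<<"$1" | uniq -d | grep -q .; then echo DUPLICATE; else echo OK; fi
}
uuids=$(for h in 172.19.17.210 172.19.17.211 172.19.17.227; do
  mysql -urep -prep -h"$h" -Nse "SELECT @@server_uuid" 2>/dev/null
done)
dup_check "$uuids"
```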

[root@testdb1 ~]# masterha_manager --conf=/etc/masterha/test_MMS.cnf  &
[2] 13174
[root@testdb1 ~]# Tue Nov  4 13:21:16 2014 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Tue Nov  4 13:21:16 2014 - [info] Reading application default configurations from /etc/masterha/test_MMS.cnf..
Tue Nov  4 13:21:16 2014 - [info] Reading server configurations from /etc/masterha/test_MMS.cnf..

[root@testdb1 ~]# masterha_check_status --conf=/etc/masterha/test_MMS.cnf 
test_MMS (pid:13174) is running(0:PING_OK), master:testdb1

done!!!

Finally OK — three days since the initial deployment. What a pit.

IV. Failover test

Scenario: M dies, M2 comes up, S re-attaches to M2.

The input/output on each host:

M(210):

[root@testdb1 ~]# service mysql stop
Shutting down MySQL (Percona Server).....[  OK  ]

M2(211):

mysql> show slave status\G
Empty set (0.00 sec)

mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000005 |      120 | test         |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

S(227):

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: *.211
                  Master_User: rep
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000005
          Read_Master_Log_Pos: 120
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 283
        Relay_Master_Log_File: mysql-bin.000005
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: test
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 120
              Relay_Log_Space: 456
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 211
                  Master_UUID: 7b05a533-10b8-11e4-b539-005056935dca
             Master_Info_File: /app/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
1 row in set (0.00 sec)

Clearly the master has switched to 211.

V. Recovery

First make M a slave of M2:

mysql> reset master;
Query OK, 0 rows affected (0.18 sec)

mysql> reset slave;
Query OK, 0 rows affected (0.05 sec)

mysql> change master to master_host='172.19.17.211', master_user='rep',master_password='rep',master_log_file='mysql-bin.000005',master_log_pos=120;
Query OK, 0 rows affected, 2 warnings (0.26 sec)

mysql> start slave;


mysql> show slave status\G
*************************** 1. row ***************************
 Slave_IO_State: Waiting for master to send event
 Master_Host: *.211
 Master_User: rep
 Master_Port: 3306
 Connect_Retry: 60
 Master_Log_File: mysql-bin.000005
 Read_Master_Log_Pos: 120
 Relay_Log_File: mysqld-relay-bin.000002
 Relay_Log_Pos: 283
 Relay_Master_Log_File: mysql-bin.000005
 Slave_IO_Running: Yes
 Slave_SQL_Running: Yes
 Replicate_Do_DB: test
 Replicate_Ignore_DB: 
 Replicate_Do_Table: 
 Replicate_Ignore_Table: 
 Replicate_Wild_Do_Table: 
 Replicate_Wild_Ignore_Table: 
 Last_Errno: 0
 Last_Error: 
 Skip_Counter: 0
 Exec_Master_Log_Pos: 120
 Relay_Log_Space: 457
 Until_Condition: None
 Until_Log_File: 
 Until_Log_Pos: 0
 Master_SSL_Allowed: No
 Master_SSL_CA_File: 
 Master_SSL_CA_Path: 
 Master_SSL_Cert: 
 Master_SSL_Cipher: 
 Master_SSL_Key: 
 Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
 Last_IO_Errno: 0
 Last_IO_Error: 
 Last_SQL_Errno: 0
 Last_SQL_Error: 
 Replicate_Ignore_Server_Ids: 
 Master_Server_Id: 211
 Master_UUID: 7b05a533-10b8-11e4-b539-005056935dca
 Master_Info_File: /app/mysql/master.info
 SQL_Delay: 0
 SQL_Remaining_Delay: NULL
 Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
 Master_Retry_Count: 86400
 Master_Bind: 
 Last_IO_Error_Timestamp: 
 Last_SQL_Error_Timestamp: 
 Master_SSL_Crl: 
 Master_SSL_Crlpath: 
 Retrieved_Gtid_Set: 
 Executed_Gtid_Set: 
 Auto_Position: 0
1 row in set (0.00 sec)

mysql> quit
Bye

Use masterha_master_switch, the switchover tool that ships with Master HA:

[root@testdb1 ~]#  masterha_master_switch --master_state=alive --global_conf=/etc/masterha_default.cnf --conf=/etc/masterha/test_MMS.cnf
Tue Nov  4 14:17:06 2014 - [info] MHA::MasterRotate version 0.55.
Tue Nov  4 14:17:06 2014 - [info] Starting online master switch..
Tue Nov  4 14:17:06 2014 - [info] 
Tue Nov  4 14:17:06 2014 - [info] * Phase 1: Configuration Check Phase..
Tue Nov  4 14:17:06 2014 - [info] 
Tue Nov  4 14:17:06 2014 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Tue Nov  4 14:17:06 2014 - [info] Reading application default configurations from /etc/masterha/test_MMS.cnf..
Tue Nov  4 14:17:06 2014 - [info] Reading server configurations from /etc/masterha/test_MMS.cnf..
Tue Nov  4 14:17:06 2014 - [info] Current Alive Master: testdb2(*.211:3306)
Tue Nov  4 14:17:06 2014 - [info] Alive Slaves:
Tue Nov  4 14:17:06 2014 - [info]   testdb1(*.210:3306)  Version=5.6.19-67.0-log (oldest major version between slaves) log-bin:enabled
Tue Nov  4 14:17:06 2014 - [info]     Replicating from *.211(*.211:3306)
Tue Nov  4 14:17:06 2014 - [info]   testdb3(*.227:3306)  Version=5.6.19-67.0-log (oldest major version between slaves) log-bin:enabled
Tue Nov  4 14:17:06 2014 - [info]     Replicating from *.211(*.211:3306)
Tue Nov  4 14:17:06 2014 - [info]     Not candidate for the new Master (no_master is set)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on testdb2(*.211:3306)? (YES/no): yes
Tue Nov  4 14:17:22 2014 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Tue Nov  4 14:17:22 2014 - [info]  ok.
Tue Nov  4 14:17:22 2014 - [info] Checking MHA is not monitoring or doing failover..
Tue Nov  4 14:17:22 2014 - [info] Checking replication health on testdb1..
Tue Nov  4 14:17:22 2014 - [info]  ok.
Tue Nov  4 14:17:22 2014 - [info] Checking replication health on testdb3..
Tue Nov  4 14:17:22 2014 - [info]  ok.
Tue Nov  4 14:17:22 2014 - [info] Searching new master from slaves..
Tue Nov  4 14:17:22 2014 - [info]  Candidate masters from the configuration file:
Tue Nov  4 14:17:22 2014 - [info]   testdb2(*.211:3306)  Version=5.6.19-67.0-log log-bin:enabled
Tue Nov  4 14:17:22 2014 - [info]  Non-candidate masters:
Tue Nov  4 14:17:22 2014 - [info]   testdb3(*.227:3306)  Version=5.6.19-67.0-log (oldest major version between slaves) log-bin:enabled
Tue Nov  4 14:17:22 2014 - [info]     Replicating from *.211(*.211:3306)
Tue Nov  4 14:17:22 2014 - [info]     Not candidate for the new Master (no_master is set)
Tue Nov  4 14:17:22 2014 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Tue Nov  4 14:17:22 2014 - [info]   Not found.
Tue Nov  4 14:17:22 2014 - [info]  Searching from all candidate_master slaves..
Tue Nov  4 14:17:22 2014 - [info]   Not found.
Tue Nov  4 14:17:22 2014 - [info]  Searching from all slaves which have received the latest relay log events..
Tue Nov  4 14:17:22 2014 - [info] 
From:
testdb2 (current master)
 +--testdb1
 +--testdb3

To:
testdb1 (new master)
 +--testdb3

Starting master switch from testdb2(*.211:3306) to testdb1(*.210:3306)? (yes/NO): yes
Tue Nov  4 14:17:29 2014 - [info] Checking whether testdb1(*.210:3306) is ok for the new master..
Tue Nov  4 14:17:29 2014 - [info]  ok.
Tue Nov  4 14:17:29 2014 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Nov  4 14:17:29 2014 - [info] 
Tue Nov  4 14:17:29 2014 - [info] * Phase 2: Rejecting updates Phase..
Tue Nov  4 14:17:29 2014 - [info] 
master_ip_online_change_script is not defined. If you do not disable writes on the current master manually, applications keep writing on the current master. Is it ok to proceed? (yes/NO): yes
Tue Nov  4 14:17:42 2014 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Tue Nov  4 14:17:42 2014 - [info] Executing FLUSH TABLES WITH READ LOCK..
Tue Nov  4 14:17:42 2014 - [info]  ok.
Tue Nov  4 14:17:42 2014 - [info] Orig master binlog:pos is mysql-bin.000005:120.
Tue Nov  4 14:17:42 2014 - [info]  Waiting to execute all relay logs on testdb1(*.210:3306)..
Tue Nov  4 14:17:42 2014 - [info]  master_pos_wait(mysql-bin.000005:120) completed on testdb1(*.210:3306). Executed 0 events.
Tue Nov  4 14:17:42 2014 - [info]   done.
Tue Nov  4 14:17:42 2014 - [info] Getting new master's binlog name and position..
Tue Nov  4 14:17:42 2014 - [info]  mysql-bin.000001:120
Tue Nov  4 14:17:42 2014 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='testdb1 or *.210', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=120, MASTER_USER='rep', MASTER_PASSWORD='xxx';
Tue Nov  4 14:17:42 2014 - [info] 
Tue Nov  4 14:17:42 2014 - [info] * Switching slaves in parallel..
Tue Nov  4 14:17:42 2014 - [info] 
Tue Nov  4 14:17:42 2014 - [info] -- Slave switch on host testdb3(*.227:3306) started, pid: 21833
Tue Nov  4 14:17:42 2014 - [info] 
Tue Nov  4 14:17:43 2014 - [info] Log messages from testdb3 ...
Tue Nov  4 14:17:43 2014 - [info] 
Tue Nov  4 14:17:42 2014 - [info]  Waiting to execute all relay logs on testdb3(*.227:3306)..
Tue Nov  4 14:17:42 2014 - [info]  master_pos_wait(mysql-bin.000005:120) completed on testdb3(*.227:3306). Executed 0 events.
Tue Nov  4 14:17:42 2014 - [info]   done.
Tue Nov  4 14:17:42 2014 - [info]  Resetting slave testdb3(*.227:3306) and starting replication from the new master testdb1(*.210:3306)..
Tue Nov  4 14:17:43 2014 - [info]  Executed CHANGE MASTER.
Tue Nov  4 14:17:43 2014 - [info]  Slave started.
Tue Nov  4 14:17:43 2014 - [info] End of log messages from testdb3 ...
Tue Nov  4 14:17:43 2014 - [info] 
Tue Nov  4 14:17:43 2014 - [info] -- Slave switch on host testdb3(*.227:3306) succeeded.
Tue Nov  4 14:17:43 2014 - [info] Unlocking all tables on the orig master:
Tue Nov  4 14:17:43 2014 - [info] Executing UNLOCK TABLES..
Tue Nov  4 14:17:43 2014 - [info]  ok.
Tue Nov  4 14:17:43 2014 - [info] All new slave servers switched successfully.
Tue Nov  4 14:17:43 2014 - [info] 
Tue Nov  4 14:17:43 2014 - [info] * Phase 5: New master cleanup phase..
Tue Nov  4 14:17:43 2014 - [info] 
Tue Nov  4 14:17:43 2014 - [info]  testdb1: Resetting slave info succeeded.
Tue Nov  4 14:17:43 2014 - [info] Switching master to testdb1(*.210:3306) completed successfully.

Note: at this prompt,

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on testdb2(*.211:3306)? (YES/no): yes

on a production system you really should flush the data to disk first; for this test I did not bother.

 

The meaning of MySQL's read-only configuration

[Repost] from: http://blog.csdn.net/cindy9902/article/details/5886355

  --read_only        Make all non-temporary tables read-only, with the
                     exception for replication (slave) threads and users with
                     the SUPER privilege

SUPER privilege :

The SUPER privilege enables an account to use CHANGE MASTER TO , KILL or mysqladmin kill to kill threads belonging to other accounts (you can always kill your own threads), PURGE BINARY LOGS , configuration changes using SET GLOBAL to modify global system variables, the mysqladmin debug command, enabling or disabling logging, performing updates even if the read_only system variable is enabled, starting and stopping replication on slave servers, specification of any account in the DEFINER attribute of stored programs and views, and enables you to connect (once) even if the connection limit controlled by the max_connections system variable is reached.

To create or alter stored routines if binary logging is enabled, you may also need the SUPER privilege, as described in Section 18.6, “Binary Logging of Stored Programs” .

 

The read-only option makes all non-temporary tables read-only, with two exceptions:

1. Replication threads are exempt, so the slave can keep replicating normally.

2. Users with the SUPER privilege can ignore the option.

The SUPER privilege allows: 1. CHANGE MASTER TO, and KILLing threads belonging to other users.

2. PURGE BINARY LOGS to delete binary logs, and SET GLOBAL to set variables dynamically.

3. Running mysqladmin debug, toggling logging on and off, and performing update/insert operations while read-only is on.

4. START SLAVE and STOP SLAVE.

5. Connecting to the server even when the connection count has reached max_connections.
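The exemption can be probed from the shell: while read_only is on, a write from a non-SUPER account should be refused. A sketch — the connection arguments are placeholders, and the rep user from the earlier posts is assumed to have no SUPER privilege:

```shell
#!/bin/bash
# Sketch: probe whether a given account can write while read_only is on.
probe_write() {
  # $1: mysql connection arguments for the account to test
  if mysql $1 -e "INSERT INTO test.test (id,value) VALUES (999,999)" 2>/dev/null; then
    echo "write allowed"
  else
    echo "write refused or server unreachable"
  fi
}
probe_write "-urep -prep -h172.19.17.211"   # non-SUPER user: expect a refusal
```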

Exporting data from Oracle with expdp

I. Find the path behind an Oracle directory object

Query dba_directories for the directory objects and their paths.

SQL> desc dba_directories;
SQL> select DIRECTORY_NAME from dba_directories;

DIRECTORY_NAME
------------------------------
DMPDIR
DMP_GEJ
XMLDIR
DATA_PUMP_DIR
ORACLE_OCM_CONFIG_DIR

SQL> select DIRECTORY_PATH from dba_directories where DIRECTORY_NAME='DMPDIR';

DIRECTORY_PATH
--------------------------------------------------------------------------------
/exp

So the dump directory is /exp.

II. Write the dump script

$ cd /exp
$ vi expdp_20141030.sh

# script contents: one expdp command
# (credentials, schemas to export, directory object, dump file name, log file name)
expdp system/password@dbname \
  schemas=hehe,heihei,haha \
  directory=DMPDIR \
  dumpfile=expdp_20141030.dmp \
  logfile=expdp_20141030.log
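The same parameters can also live in a parameter file, which keeps the command line short and avoids shell quoting issues. expdp's parfile option is standard; the schema names and file names below are just the ones from this post, and the temp path is for illustration:

```shell
#!/bin/bash
# Sketch: drive expdp with a parameter file instead of inline arguments.
par="$(mktemp)"   # a temp path for this sketch; e.g. /exp/expdp_20141030.par in real use
cat > "$par" <<'EOF'
schemas=hehe,heihei,haha
directory=DMPDIR
dumpfile=expdp_20141030.dmp
logfile=expdp_20141030.log
EOF
echo "parameter file ready: $par"
# expdp system/password parfile="$par"
```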

 

III. Run it and check the output

 

 

 

[Translation] MySQL master HA: other solutions

Source: https://code.google.com/p/mysql-master-ha/wiki/Other_HA_Solutions#Other_Solutions_and_Issues

The various MySQL master HA solutions:

Doing everything manually

MySQL replication is asynchronous or semi-synchronous. When the master crashes, some slaves have likely not received the latest relay log events yet, so the slaves may end up in different states. Resolving the consistency problems by hand is not complicated, but until they are resolved replication cannot be restarted, and restarting it manually often takes an hour or more.

A single master with a single slave

With one master and only one slave, one slave cannot be behind another. When the master crashes, just have the application send all traffic to the slave, which becomes the new master; failover is simple.

M(RW)
|                          --(master crash)-->             M(RW), promoted from S
S(R)

But this design has serious problems.

First, read traffic cannot be scaled out. Running heavyweight jobs on the single slave, such as backups, analytical queries, or batch scripts, can cause performance problems, and when that slave goes down the master has to absorb all of its traffic.

Second, availability: once the master dies, the surviving host becomes a single point of failure. Building a new slave requires taking an online backup, restoring it onto new hardware, and starting replication immediately, which usually takes hours. Some critical applications cannot tolerate the database being a single point of failure for that long, and since the online backup adds extra I/O load, backing up during peak hours is risky too.

Third, the lack of scalability. For example, to serve reads from a remote data center you need at least two slaves, one in the local data center and one in the remote one; a single slave cannot form that topology.

So a single slave falls short in many scenarios.

One master, one candidate master, and multiple slaves

The architecture in the title is extremely common. When the current master dies, the candidate master has the highest priority to become the new master. In some cases the candidate master is configured as a read-only master, e.g. in multi-master setups.

 

M(RW)-----M2(R)                                          M(RW), promoted from M2
|                                                        |
+----+----+                --(master crash)-->           +-x--+--x-+
S(R)  S2(R)                                              S(?)  S(?)
(From which position should S restart replication?)

But this alone is not always enough as a disaster-recovery plan for the master. When the master dies, the remaining slaves may not have received all of the relay log events, so the consistency problem between slaves still has to be solved.

What if you can accept neither the inconsistency nor a long outage? Promote the candidate master and discard all of the slaves, then build new slaves from an online backup of the new master. This has the same problems as the single-master/single-slave case above: the discarded slaves can be used neither for scaling reads nor for redundancy, and until the slave pool is rebuilt there is a single point of failure.

This architecture is used very widely, but few people fully understand the pitfalls: when the master dies, its slaves become mutually inconsistent, and naively pointing a slave at the new master and starting replication will fail. To guarantee consistency, all of the original slaves must be thrown away.

There is one more architecture: two masters, one of them read-only, each with at least one slave.

M(RW)--M2(R)
|          |
S(R)    S2(R)

When the current master dies, at least one slave can continue replicating. But this design is rarely used; its biggest drawback is complexity. Three-tier replication (master -> master -> slave, M->M2->S2) uses this layout, and managing three tiers is hard: for example, if the middle server (M2) dies, the third-tier slave (S2) cannot continue replicating, and in many cases both M2 and S2 have to be reconfigured. So the architecture needs at least four machines (one standby to step in when something breaks).

(to be continued)

 

Install Oracle 11g client

I. Create the groups and the user

[root@template ~]# groupadd oinstall
[root@template ~]# groupadd dba
[root@template ~]# useradd -g oinstall -G dba oracle
[root@template ~]# passwd oracle

II. Create the installation directory

[root@template ~]# mkdir /oracle
[root@template ~]# lvcreate -L 20G -n lv_oracle vg0
[root@template ~]# mkfs.ext4 /dev/vg0/lv_oracle 
[root@template ~]# vim /etc/fstab 
Add one line (the mount point must be /oracle):
/dev/mapper/vg0-lv_oracle /oracle                 ext4    defaults        1 1

[root@template ~]# mount /dev/vg0/lv_oracle /oracle
[root@template ~]# mkdir -p /oracle/product/11.2/db_1
[root@template ~]# chown -R oracle:oinstall /oracle

III. Install

Just run runInstaller from the unpacked media.

Notes on the Oracle installation media:

http://bananalighter.com/oracle-installer-7-zips/

IV. Environment variables

[root@template ~]# vim /home/oracle/.bash_profile 
Add:
# User specific environment and startup programs
export ORACLE_BASE=/oracle
export ORACLE_HOME=$ORACLE_BASE/product/11.2/db_1

PATH=$PATH:$HOME/bin:$ORACLE_HOME/bin

export PATH

V. TNS

mkdir -p $ORACLE_HOME/network/admin
cd $ORACLE_HOME/network/admin
vi tnsnames.ora    # add the following and save
 
ODS18 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = ip)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = haha)
    )
  )
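A quick way to confirm the new alias is tnsping ODS18 (tnsping ships with the client). Failing that, a cheap offline check that the entry exists at all — this sketch of mine only greps the file and does not validate the TNS syntax:

```shell
#!/bin/bash
# Sketch: check that an alias is defined in tnsnames.ora (offline, grep only).
has_alias() {
  # $1: path to tnsnames.ora, $2: alias name
  if grep -qE "^[[:space:]]*$2[[:space:]]*=" "$1" 2>/dev/null; then
    echo found
  else
    echo missing
  fi
}
has_alias "$ORACLE_HOME/network/admin/tnsnames.ora" ODS18
```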

VI. Lessons learned the hard way

Do not get clever and pick a custom install like I did below (screenshot omitted); it produces all kinds of strange errors.

Do the plain packaged install honestly, or install just the following three packages yourself:

oracle-instantclient-basic-10.2.0.5-1.x86_64.rpm
oracle-instantclient-sqlplus-10.2.0.5-1.x86_64.rpm
oracle-instantclient-devel-10.2.0.5-1.x86_64.rpm

Reference: http://blog.sina.com.cn/s/blog_4ed7040c01011xv6.html

 

The 7 zips of the Oracle installation media

Oracle 11g ships on several installation disks:

File name                                       Contents
p102025301120——Linux-x86-64_1of7.zip            database install media
p102025301120——Linux-x86-64_2of7.zip            database install media
p102025301120——Linux-x86-64_3of7.zip            grid install media
p102025301120——Linux-x86-64_4of7.zip            client install media
p102025301120——Linux-x86-64_5of7.zip            gateways install media
p102025301120——Linux-x86-64_6of7.zip            examples
p102025301120——Linux-x86-64_7of7.zip            deinstall
Reference: http://blog.itpub.net/28716724/viewspace-756073/

MySQL replication-related privileges

I. What the privileges mean

When configuring MySQL replication there is this step:

grant replication slave,replication client on *.* to rep@'192.168.46.%' identified by '***';

The two privileges mean the following.

REPLICATION CLIENT:

enables the use of SHOW MASTER STATUS and SHOW SLAVE STATUS.

This privilege is for inspecting replication state.

REPLICATION SLAVE:

The REPLICATION SLAVE privilege should be granted to accounts that are used by slave servers to connect to the current server as their master. Without this privilege, the slave cannot request updates that have been made to databases on the master server.

This privilege lets a slave connect to the master and replicate from it.

Generally we create a dedicated replication account on the master. It must have the REPLICATION SLAVE privilege and nothing it does not need, so its responsibility stays narrow. Assuming the account is repl with password repl, the command is:
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'192.168.0.%' IDENTIFIED BY 'repl';
A word on 192.168.0.%: it specifies where the repl user may connect from. Here % is a wildcard, so any server in 192.168.0.0-192.168.0.255 can log in to the master as repl. Without the wildcard, a server whose address is not covered by the grant cannot use this account to replicate from the master.

II. Practical tips

High Performance MySQL sets the privileges up slightly differently: the author suggests creating the repl account on both the master and the slave, and granting it both REPLICATION SLAVE and REPLICATION CLIENT:
mysql> GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO repl@'192.168.0.%' IDENTIFIED BY 'repl';
The benefits: the same account can monitor and manage replication without caring which host is master and which is slave, and because repl is configured identically on both, swapping master and slave requires no account changes.
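To apply the grant identically on both hosts, the statement can be generated from a small helper. A sketch of mine; user, host pattern and password are the placeholders from the book's example:

```shell
#!/bin/bash
# Sketch: build the replication grant recommended by High Performance MySQL.
repl_grant() {
  # $1: user, $2: host pattern, $3: password
  printf "GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO '%s'@'%s' IDENTIFIED BY '%s';" \
    "$1" "$2" "$3"
}
repl_grant repl '192.168.0.%' repl
# run the same statement on both master and slave, e.g.:
# repl_grant repl '192.168.0.%' repl | mysql -uroot -p
```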

Reference: http://blog.csdn.net/bluishglc/article/details/5744303

 

[Translation] Cloudera Manager for CDH: free vs. paid edition feature comparison

Translated from: http://www.cloudera.com/content/cloudera/en/products-and-services/cloudera-enterprise/cloudera-manager/cloudera-manager-features.html

The original is a table of features against the free and paid editions; the features under "Advanced Management Features (Enabled by Subscription)" are paid-only.

Deployment, Configuration & Management

- Automated Deployment & Hadoop Readiness Checks: install the complete CDH stack in minutes and ensure optimal settings.
- Service Management: configure and manage all CDH services, including Impala and Search, from a central interface.
- Security Management: configure and manage security across the cluster, including Kerberos authentication and role-based (administrator and read-only) administration.
- Resource Management: allocate cluster resources by workload or by user/group/application to eliminate contention and ensure Quality-of-Service (QoS).
- High Availability: easily configure and manage High Availability for services such as HDFS, MapReduce, Oozie, YARN, and HBase.
- Client Configuration Management: centrally configure all client access to the cluster.
- Node Templating: easily deploy and expand heterogeneous clusters by creating templates for node roles.
- Comprehensive Workflows: perform end-to-end tasks such as start/stop/restart of clusters, services and roles, add/delete hosts, decommission nodes, etc.
- Multi-Cluster Management: manage multiple CDH clusters from a single instance of Cloudera Manager.

Monitor

- Service, Host & Activity Monitoring: get a consolidated, real-time view of the state of all services, hosts and activities running in the cluster.
- Events & Alerts: create, aggregate and receive alerts on relevant Hadoop events pertaining to system health, log messages, user actions and activities; set thresholds and create custom alerts for metrics collected by CM.

Diagnose

- Global Time Control: correlate all views along a configurable timeline to simplify diagnosis.
- Proactive Health Checks: monitor dozens of service performance metrics and get alerted when you approach critical thresholds.
- Heatmaps: visualize health status and metrics across the cluster to quickly identify problem nodes and take action.
- Customizable Charts: report and visualize key time-series metrics about services, roles and hosts.
- Intelligent Log Management: gather, view and search Hadoop logs collected from across the cluster.

Integrate

- Comprehensive API: easily integrate Cloudera Manager with your existing enterprise-wide management and monitoring tools.
- 3rd Party Application Management: deploy, manage and monitor services for 3rd party applications running on the cluster (e.g. data integration tools, math/machine learning applications, non-CDH services, etc.).

Advanced Management Features (Enabled by Subscription)

- Operational Report & Quota Management: visualize current and historical disk usage; set user- and group-based quotas; track MapReduce, Impala, YARN and HBase usage.
- Configuration History & Rollbacks: maintain a trail of all actions and a complete record of configuration changes, including the ability to roll back to previous states.
- Rolling Updates: stage service updates and restarts to portions of the cluster sequentially to minimize downtime when upgrading or updating your cluster.
- AD Kerberos Integration: integrate directly with Active Directory to get started easily with Kerberos.
- Kerberos Wizard: easily configure Kerberos and trigger automated workflows to secure clusters.
- Hadoop SSL Related Configs: simplify configs and eliminate the need for safety valves.
- LDAP/SAML Integration: integrate user credentials with Active Directory and enable single sign-on (SSO) capabilities.
- SNMP Support: send Hadoop-specific events and alerts to global monitoring tools as SNMP traps (see note 1 at the end).
- Scheduled Diagnostics: take a snapshot of the cluster state and automatically send it to Cloudera support to assist with optimization and issue resolution.
- Automated Backup & Disaster Recovery: centrally configure and manage snapshotting and replication workflows for HDFS, Hive and HBase.
 

注:

1.snmp traps:SNMP是指简单网络管理协议,trap是它规定的一种通信方式,用于被管理的设备主动向充当管理者的设备报告自己的异常信息。