欢迎投稿

今日深度:

故障案例:slave延迟很大,故障案例slave延迟

故障案例:slave延迟很大,故障案例slave延迟


案例1:lvm机型,从库创建完成后,主库qps 2w,从库6k多。从某个时间点开始延迟在缓慢增加,一直涨到7w多秒才发现去处理;从库io的util很高

故障原因:查看配置发现这个从库开启了log_slave_updates,一直在产生binlog,当把这个参数禁用或者设置sync_binlog=0以后,util立马就降下来了,同步延迟也就慢慢变小了直到为0,此前已经发现多次使用lvm逻辑卷管理或者SSD的机器的udb只要开启了这个sync_binlog=1,好像磁盘util都很高,也没有深究为什么。


案例2:slave延迟一直在增大,从库和主库的qps很低,io都很低,从库cpu 100%

故障原因:因为从库执行SQL是单线程的,所以只能利用一个CPU的资源,当cpu使用率到100%,整个库都卡住了,不管什么操作都很慢;发现从库上只执行一个简单的主键更新操作,所以很奇怪为什么主键更新还这么慢

     ID: 6933
   USER: system user
   HOST: 
     DB: gangao
COMMAND: Connect
   TIME: 457156
  STATE: Updating
   INFO: UPDATE`STK_DAILYQUOTEFA`SET`ID`=0x3631323538393739303330,`SECUCODE`=0x3237353233,`TRADINGDAY`='2003-03-18 00:00:00',`TRADINGSTATE`=0x31,`PREVCLOSINGPRICEBA`=0x312E363238,`OPENINGPRICEBA`=0x312E36323633,`HIGHESTPRICEBA`=0x312E36333332,`LOWESTPRICEBA`=0x312E36303734,`CLOSINGPRICEBA`=0x312E363136,`ENTRYTIME`='2015-07-24 15:41:13',`UPDATETIME`='2015-07-24 15:41:13',`GROUNDTIME`='2015-07-24 15:41:13',`UPDATEID`=0x3631323538393739303330,`RESOURCEID`=0x43616C63,`RECORDID`=NULL WHERE`ID`=0x3631323538393739303330


后来发现这里的ID是0x3631323538393739303330十六机制表示的字符串


普通10进制int值时的查询计划

mysql> explain select * from gangao.STK_DAILYQUOTEFA where id = 60737669922;
+----+-------------+------------------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table            | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+------------------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | STK_DAILYQUOTEFA | const | PRIMARY       | PRIMARY | 8       | const |    1 |       |
+----+-------------+------------------+-------+---------------+---------+---------+-------+------+-------+
1 row in set (2.84 sec)

注意这里转过去其实是字符,而问题就是出在这里
mysql> select hex('60737669922');
+------------------------+
| hex('60737669922')     |
+------------------------+
| 3630373337363639393232 |
+------------------------+
1 row in set (0.00 sec)

mysql> select 0x3630373337363639393232;
+--------------------------+
| 0x3630373337363639393232 |
+--------------------------+
| 60737669922              |
+--------------------------+
1 row in set (0.00 sec)

这里再解析执行计划,发现找不到该记录
mysql> explain select * from gangao.STK_DAILYQUOTEFA where id = 0x3630373337363639393232;
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                               |
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
|  1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Impossible WHERE noticed after reading const tables |
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
1 row in set (3.10 sec)

实际上记录是存在的
mysql> select * from gangao.STK_DAILYQUOTEFA where id = 60737669922;
+-------------+----------+---------------------+--------------+--------------------+----------------+----------------+---------------+----------------+---------------------+---------------------+---------------------+-------------+------------+----------+
| ID          | SECUCODE | TRADINGDAY          | TRADINGSTATE | PREVCLOSINGPRICEBA | OPENINGPRICEBA | HIGHESTPRICEBA | LOWESTPRICEBA | CLOSINGPRICEBA | ENTRYTIME           | UPDATETIME          | GROUNDTIME          | UPDATEID    | RESOURCEID | RECORDID |
+-------------+----------+---------------------+--------------+--------------------+----------------+----------------+---------------+----------------+---------------------+---------------------+---------------------+-------------+------------+----------+
| 60737669922 |    20041 | 2009-06-05 00:00:00 | 1            |            33.9315 |        33.8687 |        35.8897 |       33.7851 |        35.2277 | 2015-07-13 15:40:01 | 2015-07-13 15:40:01 | 2015-07-13 15:40:01 | 60737669922 | Calc       | NULL     |
+-------------+----------+---------------------+--------------+--------------------+----------------+----------------+---------------+----------------+---------------------+---------------------+---------------------+-------------+------------+----------+
1 row in set (1.94 sec)
但是转成这16进制表示的字符形式就出不来结果了

mysql> select * from gangao.STK_DAILYQUOTEFA where id = 0x3630373337363639393232;
Empty set (0.40 sec)


而这个数值真正的16进制数值应该是

mysql> select hex(60737669922);
+------------------+
| hex(60737669922) |
+------------------+
| E243F4B22        |
+------------------+
1 row in set (0.00 sec)

这里需要加上0才行

mysql> select 0xE243F4B22+0;
+---------------+
| 0xE243F4B22+0 |
+---------------+
|   60737669922 |
+---------------+
1 row in set (0.00 sec)

而这时再解析

mysql> explain select * from gangao.STK_DAILYQUOTEFA where id = 0xE243F4B22+0;
+----+-------------+------------------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table            | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+------------------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | STK_DAILYQUOTEFA | const | PRIMARY       | PRIMARY | 8       | const |    1 |       |
+----+-------------+------------------+-------+---------------+---------+---------+-------+------+-------+
1 row in set (2.83 sec)

mysql> select * from gangao.STK_DAILYQUOTEFA where id = 0xE243F4B22+0;
+-------------+----------+---------------------+--------------+--------------------+----------------+----------------+---------------+----------------+---------------------+---------------------+---------------------+-------------+------------+----------+
| ID          | SECUCODE | TRADINGDAY          | TRADINGSTATE | PREVCLOSINGPRICEBA | OPENINGPRICEBA | HIGHESTPRICEBA | LOWESTPRICEBA | CLOSINGPRICEBA | ENTRYTIME           | UPDATETIME          | GROUNDTIME          | UPDATEID    | RESOURCEID | RECORDID |
+-------------+----------+---------------------+--------------+--------------------+----------------+----------------+---------------+----------------+---------------------+---------------------+---------------------+-------------+------------+----------+
| 60737669922 |    20041 | 2009-06-05 00:00:00 | 1            |            33.9315 |        33.8687 |        35.8897 |       33.7851 |        35.2277 | 2015-07-13 15:40:01 | 2015-07-13 15:40:01 | 2015-07-13 15:40:01 | 60737669922 | Calc       | NULL     |
+-------------+----------+---------------------+--------------+--------------------+----------------+----------------+---------------+----------------+---------------------+---------------------+---------------------+-------------+------------+----------+
1 row in set (4.25 sec)

所以问题就出在客户将10进制整数123456转成字符串'123456'的16进制形式处理了,导致了索引根本没用上,查询缓慢,甚至一直在做无用功,根本不存在这些值。


案例3:主库上只有write操作,主库的io比较低,从库io很高,主库不存在峰值。

原因分析:后来对比主从的配置,发现主库的innodb_flush_log_at_trx_commit设置为2,而从库上innodb_flush_log_at_trx_commit是1,后来将从库也调为2或者0就没问题,这个参数具体的含义不多说,下图是没调整前,主从的io对比




版权声明:本文为博主原创文章,未经博主允许不得转载。

www.htsjk.Com true http://www.htsjk.com/shujukunews/9705.html NewsArticle 故障案例:slave延迟很大,故障案例slave延迟 案例1:lvm机型,从库创建完成后,主库qps 2w,从库6k多。从某个时间点开始延迟在缓慢增加,一直涨到7w多秒才发现去处理;从库io的util很高...
评论暂时关闭