案例如下:
在使用show innodb status检查引擎状态时,发现了死锁问题:
*** (1) transaction:
transaction 0 677833455, active 0 sec, process no 11393, os thread id 278546 starting index read
mysql tables in use 1, locked 1
lock wait 3 lock struct(s), heap size 320
mysql thread id 83, query id 162348740 dcnet03 dcnet searching rows for update
update tsk_task set status_id=1064,update_time=now () where status_id=1061 and mon_time*** (1) waiting for this lock to be granted:
record locks space id 0 page no 849384 n bits 208 index `primary` of table `dcnet_db/tsk_task` trx id 0 677833455 lock_mode x locks rec but not gap waiting
record lock, heap no 92 physical record: n_fields 11; compact format; info bits 0
0: len 8; hex 800000000097629c; asc b ;; 1: len 6; hex 00002866eaee; asc (f ;; 2: len 7; hex 00000d40040110; asc @ ;; 3: len 8; hex 80000000000050b2; asc p ;; 4: len 8; hex 800000000000502a; asc p*;; 5: len 8; hex 8000000000005426; asc t&;; 6: len 8; hex 800012412c66d29c; asc a,f ;; 7: len 23; hex 75706c6f6164666972652e636f6d2f6 8616e642e706870; asc xxx.com/;; 8: len 8; hex 800000000000042b; asc +;; 9: len 4; hex 474bfa2b; asc gk +;; 10: len 8; hex 8000000000004e24; asc n$;;
*** (2) transaction:
transaction 0 677833454, active 0 sec, process no 11397, os thread id 344086 updating or deleting, thread declared inside innodb 499
mysql tables in use 1, locked 1
3 lock struct(s), heap size 320, undo log entries 1
mysql thread id 84, query id 162348739 dcnet03 dcnet updating
update tsk_task set status_id=1067,update_time=now () where id in (9921180)
*** (2) holds the lock(s):
record locks space id 0 page no 849384 n bits 208 index `primary` of table `dcnet_db/tsk_task` trx id 0 677833454 lock_mode x locks rec but not gap
record lock, heap no 92 physical record: n_fields 11; compact format; info bits 0
0: len 8; hex 800000000097629c; asc b ;; 1: len 6; hex 00002866eaee; asc (f ;; 2: len 7; hex 00000d40040110; asc @ ;; 3: len 8; hex 80000000000050b2; asc p ;; 4: len 8; hex 800000000000502a; asc p*;; 5: len 8; hex 8000000000005426; asc t&;; 6: len 8; hex 800012412c66d29c; asc a,f ;; 7: len 23; hex 75706c6f6164666972652e636f6d2f6 8616e642e706870; asc uploadfire.com/hand.php;; 8: len 8; hex 800000000000042b; asc +;; 9: len 4; hex 474bfa2b; asc gk +;; 10: len 8; hex 8000000000004e24; asc n$;;
*** (2) waiting for this lock to be granted:
record locks space id 0 page no 843102 n bits 600 index `key_tsktask_montime2` of table `dcnet_db/tsk_task` trx id 0 677833454 lock_mode x locks rec but not gap waiting
record lock, heap no 395 physical record: n_fields 3; compact format; info bits 0
0: len 8; hex 8000000000000425; asc %;; 1: len 8; hex 800012412c66d29c; asc a,f ;; 2: len 8; hex 800000000097629c; asc b ;;
*** we roll back transaction (1)
此死锁问题涉及tsk_task表,该表用于保存系统监测任务,以下是相关字段及索引:
id:主键;
mon_time:监测时间;
status_id:任务状态;
索引:key_tsktask_montime2 (status_id, mon_time)。
分析,涉及的两条语句应该不会涉及相同的tsk_task记录,那为什么会造成死锁呢?
查询mysql官网文档,发现这跟mysql的索引机制有关。mysql的innodb引擎是行级锁,我原来的理解是直接对记录进行锁定,实际上并不是这样的。
要点如下:
不是对记录进行锁定,而是对索引进行锁定;
在update、delete操作时,mysql不仅锁定where条件扫描过的所有索引记录,而且会锁定相邻的键值,即所谓的next-key locking;
如语句update tsk_task set update_time = now() where id > 10000会锁定所有主键大于等于1000的所有记录,在该语句完成之前,你就不能对主键等于10000的记录进行操作;
当非簇索引(non-cluster index)记录被锁定时,相关的簇索引(cluster index)记录也需要被锁定才能完成相应的操作。
再分析一下发生问题的两条sql语句,就不难找到问题所在了:
当“update tsk_task set status_id=1064,update_time=now () where status_id=1061 and mon_time
假设“update tsk_task set status_id=1067,update_time=now () where id in (9921180)”几乎同时执行时,本语句首先锁定簇索引(主键),由于需要更新status_id的值,所以还需要锁定key_tsktask_montime2的某些索引记录。
这样第一条语句锁定了key_tsktask_montime2的记录,等待主键索引,而第二条语句则锁定了主键索引记录,而等待key_tsktask_montime2的记录,在此情况下,死锁就产生了。
笔者通过拆分第一条语句解决死锁问题:
先查出符合条件的id:select id from tsk_task where status_id=1061 and mon_time
至此,死锁问题彻底解决