您好,欢迎访问一九零五行业门户网

mysql中 innodb表的count()优化

作/译者:叶金荣(imysql#imysql.com),来源: http://imysql.com,欢迎转载。 起因:在innodb表上做count(*)统计实在是太慢了,因此想办法看能不能再快点。 现象:先来看几个测试案例,如下 一、 sbtest 表上的测试 show create table sbtest\g***********
作/译者:叶金荣(imysql#imysql.com>),来源: http://imysql.com,欢迎转载。
起因:在innodb表上做count(*)统计实在是太慢了,因此想办法看能不能再快点。
现象:先来看几个测试案例,如下
一、 sbtest 表上的测试
show create table sbtest\g*************************** 1. row ***************************table: sbtestcreate table: create table `sbtest` (`aid` bigint(20) unsigned not null auto_increment,`id` int(10) unsigned not null default '0',`k` int(10) unsigned not null default '0',`c` char(120) not null default '',`pad` char(60) not null default '',primary key (`aid`),key `k` (`k`),key `id` (`id`)) engine=innodb auto_increment=1000001 default charset=latin1show index from sbtest;+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+| table | non_unique | key_name | seq_in_index | column_name | collation | cardinality | sub_part | packed | null | index_type | comment |+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+| sbtest | 0 | primary | 1 | aid | a | 1000099 | null | null | | btree | || sbtest | 1 | k | 1 | k | a | 18 | null | null | | btree | || sbtest | 1 | id | 1 | id | a | 1000099 | null | null | | btree | |+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
填充了 100万条 记录。
1、 直接 count(*)
explain select count(*) from sbtest;+----+-------------+--------+-------+---------------+---------+---------+------+---------+-------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | extra |+----+-------------+--------+-------+---------------+---------+---------+------+---------+-------------+| 1 | simple | sbtest | index | null | primary | 8 | null | 1000099 | using index |+----+-------------+--------+-------+---------------+---------+---------+------+---------+-------------+select count(*) from sbtest;+----------+| count(*) |+----------+| 1000000 |+----------+1 row in set (1.42 sec)
可以看到,如果不加任何条件,那么优化器优先采用 primary key 来进行扫描。
2、count(*) 使用 primary key 字段做条件
explain select count(*) from sbtest where aid>=0;+----+-------------+--------+-------+---------------+---------+---------+------+--------+--------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | extra |+----+-------------+--------+-------+---------------+---------+---------+------+--------+--------------------------+| 1 | simple | sbtest | range | primary | primary | 8 | null | 485600 | using where; using index |+----+-------------+--------+-------+---------------+---------+---------+------+--------+--------------------------+select count(*) from sbtest where aid>=0;+----------+| count(*) |+----------+| 1000000 |+----------+1 row in set (1.39 sec)
可以看到,尽管优化器认为只需要扫描 485600 条记录(其实是索引),比刚才少多了,但其实仍然要做全表(索引)扫描。因此耗时和第一种相当。
3、 count(*) 使用 secondary index 字段做条件
explain select count(*) from sbtest where id>=0;+----+-------------+--------+-------+---------------+------+---------+------+--------+--------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | extra |+----+-------------+--------+-------+---------------+------+---------+------+--------+--------------------------+| 1 | simple | sbtest | range | id | id | 4 | null | 500049 | using where; using index |+----+-------------+--------+-------+---------------+------+---------+------+--------+--------------------------+select count(*) from sbtest where id>=0;+----------+| count(*) |+----------+| 1000000 |+----------+1 row in set (0.43 sec)
可以看到,采用这种方式查询会非常快。
有人也许会问了,会不会是因为 id 字段的长度比 aid 字段的长度来的小,导致它扫描起来比较快呢?先不着急下结论,咱们来看看下面的测试例子。
二、 sbtest1 表上的测试
show create table sbtest1\g*************************** 1. row ***************************table: sbtest1create table: create table `sbtest1` (`aid` int(10) unsigned not null auto_increment,`id` bigint(20) unsigned not null default '0',`k` int(10) unsigned not null default '0',`c` char(120) not null default '',`pad` char(60) not null default '',primary key (`aid`),key `k` (`k`),key `id` (`id`)) engine=innodb auto_increment=1000001 default charset=latin1show index from sbtest1;+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+| table | non_unique | key_name | seq_in_index | column_name | collation | cardinality | sub_part | packed | null | index_type | comment |+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+| sbtest1 | 0 | primary | 1 | aid | a | 1000099 | null | null | | btree | || sbtest1 | 1 | k | 1 | k | a | 18 | null | null | | btree | || sbtest1 | 1 | id | 1 | id | a | 1000099 | null | null | | btree | |+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
这个表里,把 aid 和 id 的字段长度调换了一下,也填充了 1000万条 记录。
1、 直接 count(*)
explain select count(*) from sbtest1;+----+-------------+---------+-------+---------------+---------+---------+------+---------+-------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | extra |+----+-------------+---------+-------+---------------+---------+---------+------+---------+-------------+| 1 | simple | sbtest1 | index | null | primary | 4 | null | 1000099 | using index |+----+-------------+---------+-------+---------------+---------+---------+------+---------+-------------+select count(*) from sbtest1;+----------+| count(*) |+----------+| 1000000 |+----------+1 row in set (1.42 sec)
可以看到,如果不加任何条件,那么优化器优先采用 primary key 来进行扫描。
2、count(*) 使用 primary key 字段做条件
explain select count(*) from sbtest1 where aid>=0;+----+-------------+---------+-------+---------------+---------+---------+------+--------+--------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | extra |+----+-------------+---------+-------+---------------+---------+---------+------+--------+--------------------------+| 1 | simple | sbtest1 | range | primary | primary | 4 | null | 316200 | using where; using index |+----+-------------+---------+-------+---------------+---------+---------+------+--------+--------------------------+1 row in set (0.00 sec)select count(*) from sbtest1 where aid>=0;+----------+| count(*) |+----------+| 1000000 |+----------+1 row in set (1.42 sec)
可以看到,尽管优化器认为只需要扫描 485600 条记录(其实是索引),比刚才少多了,但其实仍然要做全表(索引)扫描。因此耗时和第一种相当。
3、 count(*) 使用 secondary index 字段做条件
explain select count(*) from sbtest1 where id>=0;+----+-------------+---------+-------+---------------+------+---------+------+--------+--------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | extra |+----+-------------+---------+-------+---------------+------+---------+------+--------+--------------------------+| 1 | simple | sbtest1 | range | id | id | 8 | null | 500049 | using where; using index |+----+-------------+---------+-------+---------------+------+---------+------+--------+--------------------------+1 row in set (0.00 sec)select count(*) from sbtest1 where id>=0;+----------+| count(*) |+----------+| 1000000 |+----------+1 row in set (0.45 sec)
可以看到,采用这种方式查询会非常快。
上面的所有测试,均在 mysql 5.1.24 环境下通过,并且每次查询前都重启了 mysqld。
可以看到,把 aid 和 id 的长度调换之后,采用 secondary index 查询仍然是要比用 primary key 查询来的快很多。看来主要不是字段长度引起的索引扫描快慢,而是采用 primary key 以及 secondary index 引起的区别。那么,为什么用 secondary index 扫描反而比 primary key 扫描来的要快呢?我们就需要了解innodb的? clustered index?和 secondary index?之间的区别了。
innodb 的 clustered index 是把 primary key 以及 row data 保存在一起的,而 secondary index 则是单独存放,然后有个指针指向 primary key。因此,需要进行 count(*) 统计表记录总数时,利用 secondary index 扫描起来,显然更快。而primary key则主要在扫描索引,同时要返回结果记录时的作用较大,例如:
select * from sbtest where aid = xxx;
那既然是使用 secondary index 会比 primary key 更快,为何优化器却优先选择 primary key 来扫描呢, heikki tuuri?的回答是:
in the example table, the secondary index is inserted into in a perfect order! that isvery unusual. normally the secondary index would be fragmented, causing random disk i/o,and the scan would be slower than in the primary index.i am changing this to a feature request: keep 'clustering ratio' statistics on a secondaryindex and do the scan there if the order is almost the same as in the primary index. idoubt this feature will ever be implemented, though.
详情请看: 这个 bug,以及这个文章: innodb row counting using indexes。
最后感谢 老杨的帮助。
技术相关:?
mysql优化
innodb
相关文章 关于mysql explain 中的id mysql优化一般步聚(教程) [存储引擎基础知识]innodb与myisam的六大区别 mysql技术内幕:innodb存储-3.6 innodb存储引擎文件 mysql之handler_read_*
原文地址:mysql中 innodb表的count()优化, 感谢原作者分享。
其它类似信息

推荐信息