mysql中 innodb表的count()优化

作/译者：叶金荣（imysql#imysql.com），来源： http://imysql.com，欢迎转载。起因：在innodb表上做count(*)统计实在是太慢了，因此想办法看能不能再快点。现象：先来看几个测试案例，如下一、 sbtest 表上的测试 show create table sbtest\g***********
作/译者：叶金荣（imysql#imysql.com>），来源： http://imysql.com，欢迎转载。
起因：在innodb表上做count(*)统计实在是太慢了，因此想办法看能不能再快点。
现象：先来看几个测试案例，如下
一、 sbtest 表上的测试
show create table sbtest\g*************************** 1. row ***************************table: sbtestcreate table: create table `sbtest` (`aid` bigint(20) unsigned not null auto_increment,`id` int(10) unsigned not null default '0',`k` int(10) unsigned not null default '0',`c` char(120) not null default '',`pad` char(60) not null default '',primary key (`aid`),key `k` (`k`),key `id` (`id`)) engine=innodb auto_increment=1000001 default charset=latin1show index from sbtest;+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+| table | non_unique | key_name | seq_in_index | column_name | collation | cardinality | sub_part | packed | null | index_type | comment |+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+| sbtest | 0 | primary | 1 | aid | a | 1000099 | null | null | | btree | || sbtest | 1 | k | 1 | k | a | 18 | null | null | | btree | || sbtest | 1 | id | 1 | id | a | 1000099 | null | null | | btree | |+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
填充了 100万条记录。
1、直接 count(*)
explain select count(*) from sbtest;+----+-------------+--------+-------+---------------+---------+---------+------+---------+-------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | extra |+----+-------------+--------+-------+---------------+---------+---------+------+---------+-------------+| 1 | simple | sbtest | index | null | primary | 8 | null | 1000099 | using index |+----+-------------+--------+-------+---------------+---------+---------+------+---------+-------------+select count(*) from sbtest;+----------+| count(*) |+----------+| 1000000 |+----------+1 row in set (1.42 sec)
可以看到，如果不加任何条件，那么优化器优先采用 primary key 来进行扫描。
2、count(*) 使用 primary key 字段做条件
explain select count(*) from sbtest where aid>=0;+----+-------------+--------+-------+---------------+---------+---------+------+--------+--------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | extra |+----+-------------+--------+-------+---------------+---------+---------+------+--------+--------------------------+| 1 | simple | sbtest | range | primary | primary | 8 | null | 485600 | using where; using index |+----+-------------+--------+-------+---------------+---------+---------+------+--------+--------------------------+select count(*) from sbtest where aid>=0;+----------+| count(*) |+----------+| 1000000 |+----------+1 row in set (1.39 sec)
可以看到，尽管优化器认为只需要扫描 485600 条记录(其实是索引)，比刚才少多了，但其实仍然要做全表(索引)扫描。因此耗时和第一种相当。
3、 count(*) 使用 secondary index 字段做条件
explain select count(*) from sbtest where id>=0;+----+-------------+--------+-------+---------------+------+---------+------+--------+--------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | extra |+----+-------------+--------+-------+---------------+------+---------+------+--------+--------------------------+| 1 | simple | sbtest | range | id | id | 4 | null | 500049 | using where; using index |+----+-------------+--------+-------+---------------+------+---------+------+--------+--------------------------+select count(*) from sbtest where id>=0;+----------+| count(*) |+----------+| 1000000 |+----------+1 row in set (0.43 sec)
可以看到，采用这种方式查询会非常快。
有人也许会问了，会不会是因为 id 字段的长度比 aid 字段的长度来的小，导致它扫描起来比较快呢？先不着急下结论，咱们来看看下面的测试例子。
二、 sbtest1 表上的测试
show create table sbtest1\g*************************** 1. row ***************************table: sbtest1create table: create table `sbtest1` (`aid` int(10) unsigned not null auto_increment,`id` bigint(20) unsigned not null default '0',`k` int(10) unsigned not null default '0',`c` char(120) not null default '',`pad` char(60) not null default '',primary key (`aid`),key `k` (`k`),key `id` (`id`)) engine=innodb auto_increment=1000001 default charset=latin1show index from sbtest1;+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+| table | non_unique | key_name | seq_in_index | column_name | collation | cardinality | sub_part | packed | null | index_type | comment |+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+| sbtest1 | 0 | primary | 1 | aid | a | 1000099 | null | null | | btree | || sbtest1 | 1 | k | 1 | k | a | 18 | null | null | | btree | || sbtest1 | 1 | id | 1 | id | a | 1000099 | null | null | | btree | |+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
这个表里，把 aid 和 id 的字段长度调换了一下，也填充了 1000万条记录。
1、直接 count(*)
explain select count(*) from sbtest1;+----+-------------+---------+-------+---------------+---------+---------+------+---------+-------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | extra |+----+-------------+---------+-------+---------------+---------+---------+------+---------+-------------+| 1 | simple | sbtest1 | index | null | primary | 4 | null | 1000099 | using index |+----+-------------+---------+-------+---------------+---------+---------+------+---------+-------------+select count(*) from sbtest1;+----------+| count(*) |+----------+| 1000000 |+----------+1 row in set (1.42 sec)
可以看到，如果不加任何条件，那么优化器优先采用 primary key 来进行扫描。
2、count(*) 使用 primary key 字段做条件
explain select count(*) from sbtest1 where aid>=0;+----+-------------+---------+-------+---------------+---------+---------+------+--------+--------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | extra |+----+-------------+---------+-------+---------------+---------+---------+------+--------+--------------------------+| 1 | simple | sbtest1 | range | primary | primary | 4 | null | 316200 | using where; using index |+----+-------------+---------+-------+---------------+---------+---------+------+--------+--------------------------+1 row in set (0.00 sec)select count(*) from sbtest1 where aid>=0;+----------+| count(*) |+----------+| 1000000 |+----------+1 row in set (1.42 sec)
可以看到，尽管优化器认为只需要扫描 485600 条记录(其实是索引)，比刚才少多了，但其实仍然要做全表(索引)扫描。因此耗时和第一种相当。
3、 count(*) 使用 secondary index 字段做条件
explain select count(*) from sbtest1 where id>=0;+----+-------------+---------+-------+---------------+------+---------+------+--------+--------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | extra |+----+-------------+---------+-------+---------------+------+---------+------+--------+--------------------------+| 1 | simple | sbtest1 | range | id | id | 8 | null | 500049 | using where; using index |+----+-------------+---------+-------+---------------+------+---------+------+--------+--------------------------+1 row in set (0.00 sec)select count(*) from sbtest1 where id>=0;+----------+| count(*) |+----------+| 1000000 |+----------+1 row in set (0.45 sec)
可以看到，采用这种方式查询会非常快。
上面的所有测试，均在 mysql 5.1.24 环境下通过，并且每次查询前都重启了 mysqld。
可以看到，把 aid 和 id 的长度调换之后，采用 secondary index 查询仍然是要比用 primary key 查询来的快很多。看来主要不是字段长度引起的索引扫描快慢，而是采用 primary key 以及 secondary index 引起的区别。那么，为什么用 secondary index 扫描反而比 primary key 扫描来的要快呢？我们就需要了解innodb的? clustered index?和 secondary index?之间的区别了。
innodb 的 clustered index 是把 primary key 以及 row data 保存在一起的，而 secondary index 则是单独存放，然后有个指针指向 primary key。因此，需要进行 count(*) 统计表记录总数时，利用 secondary index 扫描起来，显然更快。而primary key则主要在扫描索引，同时要返回结果记录时的作用较大，例如：
select * from sbtest where aid = xxx;
那既然是使用 secondary index 会比 primary key 更快，为何优化器却优先选择 primary key 来扫描呢， heikki tuuri?的回答是：
in the example table, the secondary index is inserted into in a perfect order! that isvery unusual. normally the secondary index would be fragmented, causing random disk i/o,and the scan would be slower than in the primary index.i am changing this to a feature request: keep 'clustering ratio' statistics on a secondaryindex and do the scan there if the order is almost the same as in the primary index. idoubt this feature will ever be implemented, though.
详情请看：这个 bug，以及这个文章： innodb row counting using indexes。
最后感谢老杨的帮助。
技术相关:?
mysql优化
innodb
相关文章关于mysql explain 中的id mysql优化一般步聚(教程) [存储引擎基础知识]innodb与myisam的六大区别 mysql技术内幕：innodb存储-3.6 innodb存储引擎文件 mysql之handler_read_*
原文地址：mysql中 innodb表的count()优化, 感谢原作者分享。

mysql中 innodb表的count()优化

推荐信息