子查询的一点补充,之前小鱼写过一篇关于in和exists性能的分析 http://www.dbaxiaoyu.com/archives/2012 其实这个都是子查询,而在最新的oracle 11g中,in和exists基本不太可能产生变化,因为11g的cbo不仅可以unnest展开子查询为表连接,还新增了null-aware
子查询的一点补充,之前小鱼写过一篇关于in和exists性能的分析 http://www.dbaxiaoyu.com/archives/2012
其实这个都是子查询,而在最新的oracle 11g中,in和exists基本不太可能产生变化,因为11g的cbo不仅可以unnest展开子查询为表连接,还新增了null-aware anti join的算法,由于in对null敏感。
而在oracle 11g之前,如果关联列上面没有not null的约束,那么此时not in的写法就无法对子查询进行展开,一般我们会看见形如下面的filter执行计划:
c:\users\administrator>sqlplus / as sysdba
sql*plus: release 10.2.0.4.0 - production on tue may 13 10:14:42 2014
copyright (c) 1982, 2007, oracle. all rights reserved.
connected to:
oracle database 10g enterprise edition release 10.2.0.4.0 - 64bit production
with the partitioning, olap, data mining and real application testing options
sql> set autotrace traceonly;
sql> set linesize 140;
sql> select * from table02 where object_id not in (select object_id from table01);
execution plan
----------------------------------------------------------
plan hash value: 206984988
------------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
------------------------------------------------------------------------------
| 0 | select statement | | 52376 | 9053k| 3430 (1)| 00:00:42 |
|* 1 | filter | | | | | |
| 2 | table access full| table02 | 52408 | 9058k| 154 (2)| 00:00:02 |
|* 3 | table access full| table01 | 50979 | 647k| 2 (0)| 00:00:01 |
------------------------------------------------------------------------------
predicate information (identified by operation id):
---------------------------------------------------
1 - filter( not exists (select /*+ */ 0 from table01 table01
where lnnvl(object_id
:b1)))
3 - filter(lnnvl(object_id
:b1))
note
-----
- dynamic sampling used for this statement
statistics
----------------------------------------------------------
14 recursive calls
0 db block gets
17188464 consistent gets
0 physical reads
0 redo size
1403 bytes sent via sql*net to client
492 bytes received via sql*net from client
2 sql*net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
而这个执行成本往往非常高,而如果我们添加一个not null的约束,或者改写下sql或者添加not null约束来取消这个特别消耗成本的filter
1)改写成minus写法:
sql> select * from table02 a minus
2 select * from table02 where object_id in (select object_id from table01);
execution plan
----------------------------------------------------------
plan hash value: 1546480765
--------------------------------------------------------------------------------
--------
| id | operation | name | rows | bytes |tempspc| cost (%cpu)| ti
me |
--------------------------------------------------------------------------------
--------
| 0 | select statement | | 52408 | 18m| | 4674 (54)| 00
:00:57 |
| 1 | minus | | | | | |
|
| 2 | sort unique | | 52408 | 9058k| 21m| 2189 (1)| 00
:00:27 |
| 3 | table access full | table02 | 52408 | 9058k| | 154 (2)| 00
:00:02 |
| 4 | sort unique | | 52409 | 9724k| 19m| 2484 (1)| 00
:00:30 |
|* 5 | hash join | | 52409 | 9724k| | 308 (2)| 00
:00:04 |
| 6 | table access full| table01 | 53662 | 681k| | 153 (1)| 00
:00:02 |
| 7 | table access full| table02 | 52408 | 9058k| | 154 (2)| 00
:00:02 |
--------------------------------------------------------------------------------
--------
predicate information (identified by operation id):
---------------------------------------------------
5 - access(object_id=object_id)
note
-----
- dynamic sampling used for this statement
statistics
----------------------------------------------------------
13 recursive calls
0 db block gets
2296 consistent gets
0 physical reads
0 redo size
1403 bytes sent via sql*net to client
492 bytes received via sql*net from client
2 sql*net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
1 rows processed
这里逻辑读降了好多,虽然cost感觉好像比上述的filter执行成本还要大,但是sql的相应时间确明显比filter好太多了。
2 给子表和主表增加not null的约束:
sql> alter table table01 modify object_id not null;
table altered.
sql> alter table table02 modify object_id not null;
table altered.
sql> select * from table02 where object_id not in (select object_id from table01);
execution plan
----------------------------------------------------------
plan hash value: 35610947
--------------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
--------------------------------------------------------------------------------
| 0 | select statement | | 1 | 190 | 308 (2)| 00:00:04 |
|* 1 | hash join right anti| | 1 | 190 | 308 (2)| 00:00:04 |
| 2 | table access full | table01 | 53662 | 681k| 153 (1)| 00:00:02 |
| 3 | table access full | table02 | 52408 | 9058k| 154 (2)| 00:00:02 |
--------------------------------------------------------------------------------
predicate information (identified by operation id):
---------------------------------------------------
1 - access(object_id=object_id)
note
-----
- dynamic sampling used for this statement
statistics
----------------------------------------------------------
265 recursive calls
0 db block gets
1557 consistent gets
0 physical reads
0 redo size
1403 bytes sent via sql*net to client
492 bytes received via sql*net from client
2 sql*net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
1 rows processed
注意这里需要对子表和主表都添加not null约束,不然在10g的cbo下,oracle还是会选择性能较差的filter。
我们看看各个版本优化器对于in和exists处理的变化(table01和table02的object_id上都有not null约束)
sql> select /*+ optimizer_features_enable('8.1.7')*/* from table02 b where exists (select 1 from table01 a where a.object_id=b.object_id);
50075 rows selected.
execution plan
----------------------------------------------------------
plan hash value: 206984988
--------------------------------------------------------------
| id | operation | name | rows | bytes | cost |
--------------------------------------------------------------
| 0 | select statement | | 2806 | 485k| 67 |
|* 1 | filter | | | | |
| 2 | table access full| table02 | 2806 | 485k| 67 |
|* 3 | table access full| table01 | 561 | 7293 | 67 |
--------------------------------------------------------------
predicate information (identified by operation id):
---------------------------------------------------
1 - filter( exists (select 0 from table01 a where
a.object_id=:b1))
3 - filter(a.object_id=:b1)
note
-----
- cpu costing is off (consider enabling it)
statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
17191469 consistent gets
0 physical reads
0 redo size
2569714 bytes sent via sql*net to client
37210 bytes received via sql*net from client
3340 sql*net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
50075 rows processed
sql> select /*+ optimizer_features_enable('8.1.7')*/* from table02 b where not
exists (select 1 from table01 a where a.object_id=b.object_id);
execution plan
----------------------------------------------------------
plan hash value: 206984988
--------------------------------------------------------------
| id | operation | name | rows | bytes | cost |
--------------------------------------------------------------
| 0 | select statement | | 2806 | 485k| 67 |
|* 1 | filter | | | | |
| 2 | table access full| table02 | 2806 | 485k| 67 |
|* 3 | table access full| table01 | 561 | 7293 | 67 |
--------------------------------------------------------------
predicate information (identified by operation id):
---------------------------------------------------
1 - filter( not exists (select 0 from table01 a where
a.object_id=:b1))
3 - filter(a.object_id=:b1)
note
-----
- cpu costing is off (consider enabling it)
statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
17191469 consistent gets
0 physical reads
0 redo size
2569714 bytes sent via sql*net to client
37210 bytes received via sql*net from client
3340 sql*net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
sql> select /*+ optimizer_features_enable('8.1.7')*/* from table02 b where objec
t_id in (select object_id from table01 a);
50075 rows selected.
execution plan
----------------------------------------------------------
plan hash value: 2067593584
-------------------------------------------------------------------------
| id | operation | name | rows | bytes |tempspc| cost |
-------------------------------------------------------------------------
| 0 | select statement | | 31m| 5705m| | 469 |
|* 1 | hash join | | 31m| 5705m| | 469 |
| 2 | view | vw_nso_1 | 56115 | 712k| | 251 |
| 3 | sort unique | | 56115 | 712k| 2216k| 251 |
| 4 | table access full| table01 | 56115 | 712k| | 67 |
| 5 | table access full | table02 | 56115 | 9699k| | 67 |
-------------------------------------------------------------------------
predicate information (identified by operation id):
---------------------------------------------------
1 - access(object_id=$nso_col_1)
note
-----
- cpu costing is off (consider enabling it)
statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
4684 consistent gets
0 physical reads
0 redo size
2569714 bytes sent via sql*net to client
37210 bytes received via sql*net from client
3340 sql*net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
50075 rows processed
sql> select /*+ optimizer_features_enable('8.1.7')*/* from table02 b where objec
t_id not in (select object_id from table01 a);
execution plan
----------------------------------------------------------
plan hash value: 206984988
--------------------------------------------------------------
| id | operation | name | rows | bytes | cost |
--------------------------------------------------------------
| 0 | select statement | | 2806 | 485k| 67 |
|* 1 | filter | | | | |
| 2 | table access full| table02 | 2806 | 485k| 67 |
|* 3 | table access full| table01 | 561 | 7293 | 67 |
--------------------------------------------------------------
predicate information (identified by operation id):
---------------------------------------------------
1 - filter( not exists (select 0 from table01 a where
object_id=:b1))
3 - filter(object_id=:b1)
note
-----
- cpu costing is off (consider enabling it)
statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
4684 consistent gets
0 physical reads
0 redo size
2569714 bytes sent via sql*net to client
37210 bytes received via sql*net from client
3340 sql*net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
这里看出在8i的优化器模式下,in的子查询被展开为表连接了,其余的not in、exists、not exists的子查询并不被选择展开为表连接,而是采用一种filter的关联方式,虽然这里的执行成本初看来filter的cost更小,但是sq的相应时间消耗资源的比例确实天壤之别,很多情况我们并不能以cost值去衡量这个sql性能。
sql> select /*+ optimizer_features_enable('9.2.0')*/* from table02 b where exis
ts (select 1 from table01 a where a.object_id=b.object_id);
50075 rows selected.
execution plan
----------------------------------------------------------
plan hash value: 268410134
-----------------------------------------------------------------------
| id | operation | name | rows | bytes |tempspc| cost |
-----------------------------------------------------------------------
| 0 | select statement | | 50075 | 5183k| | 236 |
|* 1 | hash join semi | | 50075 | 5183k| 5136k| 236 |
| 2 | table access full | table02 | 50076 | 4547k| | 68 |
| 3 | view | vw_sq_1 | 50075 | 635k| | 68 |
| 4 | table access full| table01 | 50075 | 244k| | 68 |
-----------------------------------------------------------------------
predicate information (identified by operation id):
---------------------------------------------------
1 - access(object_id=b.object_id)
note
-----
- cpu costing is off (consider enabling it)
statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4684 consistent gets
0 physical reads
0 redo size
2569714 bytes sent via sql*net to client
37210 bytes received via sql*net from client
3340 sql*net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
50075 rows processed
sql> select /*+ optimizer_features_enable('9.2.0')*/* from table02 b where not e
xists (select 1 from table01 a where a.object_id=b.object_id);
execution plan
----------------------------------------------------------
plan hash value: 2991049530
----------------------------------------------------------------------
| id | operation | name | rows | bytes |tempspc| cost |
----------------------------------------------------------------------
| 0 | select statement | | 5629 | 1044k| | 324 |
|* 1 | hash join anti | | 5629 | 1044k| 10m| 324 |
| 2 | table access full| table02 | 58373 | 9m| | 68 |
| 3 | table access full| table01 | 52744 | 669k| | 68 |
----------------------------------------------------------------------
predicate information (identified by operation id):
---------------------------------------------------
1 - access(a.object_id=b.object_id)
note
-----
- cpu costing is off (consider enabling it)
- dynamic sampling used for this statement
statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
4684 consistent gets
0 physical reads
0 redo size
2569714 bytes sent via sql*net to client
37210 bytes received via sql*net from client
3340 sql*net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
sql> select /*+ optimizer_features_enable('9.2.0')*/* from table02 b where objec
t_id in (select object_id from table01 a);
execution plan
----------------------------------------------------------
plan hash value: 1361234999
------------------------------------------------------------------------
| id | operation | name | rows | bytes |tempspc| cost |
------------------------------------------------------------------------
| 0 | select statement | | 50075 | 5183k| | 236 |
|* 1 | hash join semi | | 50075 | 5183k| 5136k| 236 |
| 2 | table access full | table02 | 50076 | 4547k| | 68 |
| 3 | view | vw_nso_1 | 50075 | 635k| | 68 |
| 4 | table access full| table01 | 50075 | 244k| | 68 |
------------------------------------------------------------------------
predicate information (identified by operation id):
---------------------------------------------------
1 - access(object_id=$nso_col_1)
note
-----
- cpu costing is off (consider enabling it)
statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
4684 consistent gets
0 physical reads
0 redo size
2569714 bytes sent via sql*net to client
37210 bytes received via sql*net from client
3340 sql*net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
50075 rows processed
sql> select /*+ optimizer_features_enable('9.2.0')*/* from table02 b where objec
t_id not in (select object_id from table01 a);
execution plan
----------------------------------------------------------
plan hash value: 2991049530
----------------------------------------------------------------------
| id | operation | name | rows | bytes |tempspc| cost |
----------------------------------------------------------------------
| 0 | select statement | | 5629 | 1044k| | 324 |
|* 1 | hash join anti | | 5629 | 1044k| 10m| 324 |
| 2 | table access full| table02 | 58373 | 9m| | 68 |
| 3 | table access full| table01 | 52744 | 669k| | 68 |
----------------------------------------------------------------------
predicate information (identified by operation id):
---------------------------------------------------
1 - access(object_id=object_id)
note
-----
- cpu costing is off (consider enabling it)
- dynamic sampling used for this statement
statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
4684 consistent gets
0 physical reads
0 redo size
2569714 bytes sent via sql*net to client
37210 bytes received via sql*net from client
3340 sql*net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
看出在9i优化器环境下,in和exists都被展开为表连接,此时cbo走的hash join的连接方式。
由于主表和子表的object_id上有not null的约束,所以这里not in和not exists执行计划也都相同,对于关联数据较多的sql,hash join往往比nested loop高效很多。
oracle 10g的优化器对于in、exists、not in和not exists区别并不大,到了11g的优化器,新增了null aware anti join算法,此时并不需要表中有not null约束,也能走hash join的连接方式。
关于in、exists、not in和not exists一直是很多朋友纠结的问题,小鱼这里简单总结下:
在oracle 8i下,in是可以展开为表连接的,而not in、exists、not exists会选择filter执行计划,如果被驱动表没有高效索引,驱动表数据返回较多,这个执行计划往往存在很严重的性能问题
在oracle 9i到oracle 10g下,in和exists没有多大性能的区别,而not in和not exists则可能有所区别,主要看关联列是否有not null约束,如果没有也只能走filter的执行计划,而有则会选择hash join和filter的中优秀的执行方式
在oracle 11g下,由于新增了null-aware anti join的算法,in和exists基本没有区别了,既可以走hash join也可以走filter。
从此in、exists、not in、not exists的经典问题可能并不绝对了,虽然优化器有诸多的缺陷,但是cbo确实在不断的改进自己,这个是值得庆幸的!
而现在我们来看看返回结果上有什么区别:
sql> select * from t01;
id name
---------- ----------
1 xiaoyu
2 xiaobai
3
sql> select * from t02;
id name
---------- ----------
10 xiaoyu
20 xiaotian
sql> select * from t01 where t01.name in (select name from t02);
id name
---------- ----------
1 xiaoyu
sql> select * from t01 where exists (select 1 from t02 where t01.name=t02.name);
id name
---------- ----------
1 xiaoyu
来看看not in和not exists:
sql> select * from t01 where t01.name not in (select name from t02);
id name
---------- ----------
2 xiaobai
sql> select * from t01 where not exists (select 1 from t02 where t01.name=t02.na
me);
id name
---------- ----------
3
2 xiaobai
看出这里的子查询中in和exists返回结果没有区别,not in的只返回一行数据,而not exists确返回了两行数据,其实我们应该是希望返回两行数据的,那么如果我们再t02表上面添加一个name null的rows来看看
sql> insert into t02 values(30,null);
1 row created.
sql> commit;
commit complete.
sql> select * from t01 where name in (select name from t02);
id name
---------- ----------
1 xiaoyu
sql> select * from t01 where exists (select 1 from t02 where t01.name=t02.name);
id name
---------- ----------
1 xiaoyu
sql> select * from t01 where name not in (select name from t02);
no rows selected
sql> select * from t01 where not exists (select 1 from t02 where t01.name=t02.na
me);
id name
---------- ----------
3
2 xiaobai
这里看出in和exists对于null处理没有变化,但是not in和not exists就不同了,not exists对于子表的null会直接略掉,也就是认为满足这个not exists的条件,而not in对于子表的null是敏感的,换句话说只要子表有null值,则not in不返回任何结果集。
关于in和exists补充就到此为止了,话说最近手头正有个子查询不展开的案例,该走hash join的走的是filter,整理完后会与大家分享!
原文地址:子查询in、exists、not in、not exists一点补充, 感谢原作者分享。