
Left Join

A developer had a statement that had been running for more than two hours without producing a result, and asked me why it was taking so long.
The statement looked like this: select * from tgt1 a left join tgt2 b on a.id=b.id and a.id>=6 order by a.id; This is a classic misunderstanding: the intent was to filter table a first and then perform the left join. Let's look at what a left join really does.
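Before walking through the demo, here is a minimal sketch of the two forms being compared, using the test tables created below; the WHERE form is what the developer presumably intended, while the extra ON condition in the original form does not reduce the rows of a at all:

-- what was written: the ON condition never filters a, so every row of a comes back
select * from tgt1 a left join tgt2 b on a.id = b.id and a.id >= 6 order by a.id;

-- what was presumably intended: filter a in WHERE, then left join
select * from tgt1 a left join tgt2 b on a.id = b.id where a.id >= 6 order by a.id;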

[gpadmin@mdw ~]$ psql bigdatagp
psql (8.2.15)
type help for help.

bigdatagp=# drop table tgt1;
drop table
bigdatagp=# drop table tgt2;
drop table
bigdatagp=# explain select t1.telnumber,t2.ua,t2.url,t1.apply_name,t2.apply_name from gpbase.tb_csv_gn_ip_session t1 ,gpbase.tb_csv_gn_http_session_hw t2 where t1.
bigdatagp=# \q

bigdatagp=# create table tgt1(id int, name varchar(20));
notice: table doesn't have 'distributed by' clause -- using column named 'id' as the greenplum database data distribution key for this table.
hint: the 'distributed by' clause determines the distribution of data. make sure column(s) chosen are the optimal data distribution key to minimize skew.
create table
bigdatagp=# create table tgt2(id int, name varchar(20));
notice: table doesn't have 'distributed by' clause -- using column named 'id' as the greenplum database data distribution key for this table.
hint: the 'distributed by' clause determines the distribution of data. make sure column(s) chosen are the optimal data distribution key to minimize skew.
create table
bigdatagp=# insert into tgt1 select generate_series(1,3),('a','b');
error: column name is of type character varying but expression is of type record
hint: you will need to rewrite or cast the expression.
bigdatagp=# insert into tgt1 select generate_series(1,5),generate_series(1,5)||'a';
insert 0 5
bigdatagp=# insert into tgt2 select generate_series(1,2),generate_series(1,2)||'a';
insert 0 2
bigdatagp=# select * from tgt1;
 id | name
----+------
  2 | 2a
  4 | 4a
  1 | 1a
  3 | 3a
  5 | 5a
(5 rows)

bigdatagp=# select * from tgt1 order by id;
 id | name
----+------
  1 | 1a
  2 | 2a
  3 | 3a
  4 | 4a
  5 | 5a
(5 rows)

bigdatagp=# select * from tgt2 order by id;
 id | name
----+------
  1 | 1a
  2 | 2a
(2 rows)

bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id;
 id | name | id | name
----+------+----+------
  3 | 3a   |    |
  5 | 5a   |    |
  1 | 1a   |  1 | 1a
  2 | 2a   |  2 | 2a
  4 | 4a   |    |
(5 rows)

bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id order by a.id;
 id | name | id | name
----+------+----+------
  1 | 1a   |  1 | 1a
  2 | 2a   |  2 | 2a
  3 | 3a   |    |
  4 | 4a   |    |
  5 | 5a   |    |
(5 rows)

bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id where id>=3 order by a.id;
error: column reference id is ambiguous
line 1: ...* from tgt1 a left join tgt2 b on a.id=b.id where id>=3 orde...
                                                             ^
bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id where a.id>=3 order by a.id;
 id | name | id | name
----+------+----+------
  3 | 3a   |    |
  4 | 4a   |    |
  5 | 5a   |    |
(3 rows)

bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id and a.id>=3 order by a.id;
 id | name | id | name
----+------+----+------
  1 | 1a   |    |
  2 | 2a   |    |
  3 | 3a   |    |
  4 | 4a   |    |
  5 | 5a   |    |
(5 rows)

bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id where a.id>=6 order by a.id;
 id | name | id | name
----+------+----+------
(0 rows)

bigdatagp=# select * from tgt1 a left join tgt2 b on a.id=b.id and a.id>=6 order by a.id;
 id | name | id | name
----+------+----+------
  1 | 1a   |    |
  2 | 2a   |    |
  3 | 3a   |    |
  4 | 4a   |    |
  5 | 5a   |    |
(5 rows)

bigdatagp=# explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id where a.id>=3 order by a.id;
                                                              query plan
---------------------------------------------------------------------------------------------------------------------------------------------------
 gather motion 64:1 (slice1; segments: 64) (cost=7.18..7.19 rows=1 width=14)
   merge key: ?column5?
   rows out: 3 rows at destination with 21 ms to end, start offset by 559 ms.
   ->  sort (cost=7.18..7.19 rows=1 width=14)
         sort key: a.id
         rows out: avg 1.0 rows x 3 workers. max 1 rows (seg52) with 5.452 ms to first row, 5.454 ms to end, start offset by 564 ms.
         executor memory: 63k bytes avg, 74k bytes max (seg2). work_mem used: 63k bytes avg, 74k bytes max (seg2). workfile: (0 spilling, 0 reused)
         ->  hash left join (cost=2.04..7.15 rows=1 width=14)
               hash cond: a.id = b.id
               rows out: avg 1.0 rows x 3 workers. max 1 rows (seg52) with 4.190 ms to first row, 4.598 ms to end, start offset by 565 ms.
               ->  seq scan on tgt1 a (cost=0.00..5.06 rows=1 width=7)
                     filter: id >= 3
                     rows out: avg 1.0 rows x 3 workers. max 1 rows (seg52) with 0.156 ms to first row, 0.158 ms to end, start offset by 565 ms.
               ->  hash (cost=2.02..2.02 rows=1 width=7)
                     rows in: (no row requested) 0 rows (seg0) with 0 ms to end.
                     ->  seq scan on tgt2 b (cost=0.00..2.02 rows=1 width=7)
                           rows out: (no row requested) 0 rows (seg0) with 0 ms to end.
 slice statistics:
   (slice0) executor memory: 332k bytes.
   (slice1) executor memory: 446k bytes avg x 64 workers, 4329k bytes max (seg52). work_mem: 74k bytes max.
 statement statistics:
   memory used: 128000k bytes
   total runtime: 580.630 ms
(24 rows)

bigdatagp=# explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id and a.id>=3 order by a.id;
                                                                query plan
---------------------------------------------------------------------------------------------------------------------------------------------------------
 gather motion 64:1 (slice1; segments: 64) (cost=7.23..7.24 rows=1 width=14)
   merge key: ?column5?
   rows out: 5 rows at destination with 24 ms to end, start offset by 701 ms.
   ->  sort (cost=7.23..7.24 rows=1 width=14)
         sort key: a.id
         rows out: avg 1.0 rows x 5 workers. max 1 rows (seg42) with 6.292 ms to first row, 6.294 ms to end, start offset by 715 ms.
         executor memory: 70k bytes avg, 74k bytes max (seg0). work_mem used: 70k bytes avg, 74k bytes max (seg0). workfile: (0 spilling, 0 reused)
         ->  hash left join (cost=2.04..7.17 rows=1 width=14)
               hash cond: a.id = b.id
               join filter: a.id >= 3
               rows out: avg 1.0 rows x 5 workers. max 1 rows (seg42) with 4.422 ms to first row, 5.055 ms to end, start offset by 717 ms.
               executor memory: 1k bytes avg, 1k bytes max (seg42). work_mem used: 1k bytes avg, 1k bytes max (seg42). workfile: (0 spilling, 0 reused)
               (seg42) hash chain length 1.0 avg, 1 max, using 1 of 262151 buckets.
               ->  seq scan on tgt1 a (cost=0.00..5.05 rows=1 width=7)
                     rows out: avg 1.0 rows x 5 workers. max 1 rows (seg42) with 0.179 ms to first row, 0.180 ms to end, start offset by 717 ms.
               ->  hash (cost=2.02..2.02 rows=1 width=7)
                     rows in: avg 1.0 rows x 2 workers. max 1 rows (seg42) with 0.194 ms to end, start offset by 721 ms.
                     ->  seq scan on tgt2 b (cost=0.00..2.02 rows=1 width=7)
                           rows out: avg 1.0 rows x 2 workers. max 1 rows (seg42) with 0.143 ms to first row, 0.145 ms to end, start offset by 721 ms.
 slice statistics:
   (slice0) executor memory: 332k bytes.
   (slice1) executor memory: 581k bytes avg x 64 workers, 4353k bytes max (seg42). work_mem: 74k bytes max.
 statement statistics:
   memory used: 128000k bytes
   total runtime: 725.316 ms
(27 rows)

bigdatagp=# explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id where a.id>=6 order by a.id;
                                                  query plan
--------------------------------------------------------------------------------------------------------------
 gather motion 64:1 (slice1; segments: 64) (cost=7.17..7.18 rows=1 width=14)
   merge key: ?column5?
   rows out: (no row requested) 0 rows at destination with 6.536 ms to end, start offset by 1.097 ms.
   ->  sort (cost=7.17..7.18 rows=1 width=14)
         sort key: a.id
         rows out: (no row requested) 0 rows (seg0) with 0 ms to end.
         executor memory: 33k bytes avg, 33k bytes max (seg0). work_mem used: 33k bytes avg, 33k bytes max (seg0). workfile: (0 spilling, 0 reused)
         ->  hash left join (cost=2.04..7.15 rows=1 width=14)
               hash cond: a.id = b.id
               rows out: (no row requested) 0 rows (seg0) with 0 ms to end.
               ->  seq scan on tgt1 a (cost=0.00..5.06 rows=1 width=7)
                     filter: id >= 6
                     rows out: (no row requested) 0 rows (seg0) with 0 ms to end.
               ->  hash (cost=2.02..2.02 rows=1 width=7)
                     rows in: (no row requested) 0 rows (seg0) with 0 ms to end.
                     ->  seq scan on tgt2 b (cost=0.00..2.02 rows=1 width=7)
                           rows out: (no row requested) 0 rows (seg0) with 0 ms to end.
 slice statistics:
   (slice0) executor memory: 332k bytes.
   (slice1) executor memory: 225k bytes avg x 64 workers, 225k bytes max (seg0). work_mem: 33k bytes max.
 statement statistics:
   memory used: 128000k bytes
   total runtime: 8.615 ms
(24 rows)

bigdatagp=# explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id and a.id>=6 order by a.id;
                                                                query plan
--------------------------------------------------------------------------------------------------------------------------------------------------------
 gather motion 64:1 (slice1; segments: 64) (cost=7.23..7.24 rows=1 width=14)
   merge key: ?column5?
   rows out: 5 rows at destination with 115 ms to end, start offset by 1.195 ms.
   ->  sort (cost=7.23..7.24 rows=1 width=14)
         sort key: a.id
         rows out: avg 1.0 rows x 5 workers. max 1 rows (seg42) with 6.979 ms to first row, 6.980 ms to end, start offset by 12 ms.
         executor memory: 72k bytes avg, 74k bytes max (seg0). work_mem used: 72k bytes avg, 74k bytes max (seg0). workfile: (0 spilling, 0 reused)
         ->  hash left join (cost=2.04..7.17 rows=1 width=14)
               hash cond: a.id = b.id
               join filter: a.id >= 6
               rows out: avg 1.0 rows x 5 workers. max 1 rows (seg42) with 5.570 ms to first row, 6.157 ms to end, start offset by 12 ms.
               executor memory: 1k bytes avg, 1k bytes max (seg42). work_mem used: 1k bytes avg, 1k bytes max (seg42). workfile: (0 spilling, 0 reused)
               (seg42) hash chain length 1.0 avg, 1 max, using 1 of 262151 buckets.
               ->  seq scan on tgt1 a (cost=0.00..5.05 rows=1 width=7)
                     rows out: avg 1.0 rows x 5 workers. max 1 rows (seg42) with 0.050 ms to first row, 0.051 ms to end, start offset by 12 ms.
               ->  hash (cost=2.02..2.02 rows=1 width=7)
                     rows in: avg 1.0 rows x 2 workers. max 1 rows (seg42) with 0.153 ms to end, start offset by 18 ms.
                     ->  seq scan on tgt2 b (cost=0.00..2.02 rows=1 width=7)
                           rows out: avg 1.0 rows x 2 workers. max 1 rows (seg42) with 0.133 ms to first row, 0.135 ms to end, start offset by 18 ms.
 slice statistics:
   (slice0) executor memory: 332k bytes.
   (slice1) executor memory: 583k bytes avg x 64 workers, 4353k bytes max (seg42). work_mem: 74k bytes max.
 statement statistics:
   memory used: 128000k bytes
   total runtime: 116.997 ms
(27 rows)

bigdatagp=# explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id where id=6 order by a.id;
error: column reference id is ambiguous
line 1: ...* from tgt1 a left join tgt2 b on a.id=b.id where id=6 order...
                                                             ^
bigdatagp=# explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id where a.id=6 order by a.id;
                                              query plan
-----------------------------------------------------------------------------------------------------
 gather motion 1:1 (slice1; segments: 1) (cost=7.17..7.18 rows=4 width=14)
   merge key: ?column5?
   rows out: (no row requested) 0 rows at destination with 3.212 ms to end, start offset by 339 ms.
   ->  sort (cost=7.17..7.18 rows=1 width=14)
         sort key: a.id
         rows out: (no row requested) 0 rows with 0 ms to end.
         executor memory: 58k bytes. work_mem used: 58k bytes. workfile: (0 spilling, 0 reused)
         ->  hash left join (cost=2.04..7.14 rows=1 width=14)
               hash cond: a.id = b.id
               rows out: (no row requested) 0 rows with 0 ms to end.
               ->  seq scan on tgt1 a (cost=0.00..5.06 rows=1 width=7)
                     filter: id = 6
                     rows out: (no row requested) 0 rows with 0 ms to end.
               ->  hash (cost=2.02..2.02 rows=1 width=7)
                     rows in: (no row requested) 0 rows with 0 ms to end.
                     ->  seq scan on tgt2 b (cost=0.00..2.02 rows=1 width=7)
                           filter: id = 6
                           rows out: (no row requested) 0 rows with 0 ms to end.
 slice statistics:
   (slice0) executor memory: 252k bytes.
   (slice1) executor memory: 251k bytes (seg3). work_mem: 58k bytes max.
 statement statistics:
   memory used: 128000k bytes
   total runtime: 342.067 ms
(25 rows)

bigdatagp=# explain analyze select * from tgt1 a left join tgt2 b on a.id=b.id and a.id=6 order by a.id;
                                                                query plan
--------------------------------------------------------------------------------------------------------------------------------------------------------
 gather motion 64:1 (slice1; segments: 64) (cost=7.23..7.24 rows=1 width=14)
   merge key: ?column5?
   rows out: 5 rows at destination with 435 ms to end, start offset by 1.130 ms.
   ->  sort (cost=7.23..7.24 rows=1 width=14)
         sort key: a.id
         rows out: avg 1.0 rows x 5 workers. max 1 rows (seg42) with 5.156 ms to first row, 5.158 ms to end, start offset by 7.597 ms.
         executor memory: 58k bytes avg, 58k bytes max (seg0). work_mem used: 58k bytes avg, 58k bytes max (seg0). workfile: (0 spilling, 0 reused)
         ->  hash left join (cost=2.04..7.17 rows=1 width=14)
               hash cond: a.id = b.id
               join filter: a.id = 6
               rows out: avg 1.0 rows x 5 workers. max 1 rows (seg42) with 4.155 ms to first row, 4.813 ms to end, start offset by 7.930 ms.
               executor memory: 1k bytes avg, 1k bytes max (seg42). work_mem used: 1k bytes avg, 1k bytes max (seg42). workfile: (0 spilling, 0 reused)
               (seg42) hash chain length 1.0 avg, 1 max, using 1 of 262151 buckets.
               ->  seq scan on tgt1 a (cost=0.00..5.05 rows=1 width=7)
                     rows out: avg 1.0 rows x 5 workers. max 1 rows (seg42) with 0.126 ms to first row, 0.127 ms to end, start offset by 7.941 ms.
               ->  hash (cost=2.02..2.02 rows=1 width=7)
                     rows in: avg 1.0 rows x 2 workers. max 1 rows (seg42) with 0.103 ms to end, start offset by 12 ms.
                     ->  seq scan on tgt2 b (cost=0.00..2.02 rows=1 width=7)
                           rows out: avg 1.0 rows x 2 workers. max 1 rows (seg42) with 0.074 ms to first row, 0.076 ms to end, start offset by 12 ms.
 slice statistics:
   (slice0) executor memory: 332k bytes.
   (slice1) executor memory: 569k bytes avg x 64 workers, 4337k bytes max (seg42). work_mem: 58k bytes max.
 statement statistics:
   memory used: 128000k bytes
   total runtime: 436.384 ms
(27 rows)
The plans above also show why: a predicate on a written in WHERE is pushed down as a filter on the seq scan of tgt1, whereas the same predicate written in ON only appears as a join filter, so every row of tgt1 is still scanned and returned. Therefore, to filter table a the condition must be written in the WHERE clause; to filter table b, write the condition in a subquery over b. The ON clause of a left join only controls which rows of b are matched (and thus displayed); it never removes rows from a.
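As a minimal sketch of that rule, using the same tgt1/tgt2 test tables (the >= 6 predicate is just an example value):

-- to filter table a: put the condition in WHERE
select *
from tgt1 a
left join tgt2 b on a.id = b.id
where a.id >= 6
order by a.id;

-- to filter table b: pre-filter b in a subquery, then left join
select *
from tgt1 a
left join (select * from tgt2 where id >= 6) b on a.id = b.id
order by a.id;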
-eof-