PostgreSQL 對(duì)IN,EXISTS,ANY/ALL,JOIN的sql優(yōu)化方案
測(cè)試環(huán)境:
postgres=# select version(); version --------------------------------------------------------------------------------------------------------- PostgreSQL 11.9 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit (1 row) postgres=#
數(shù)據(jù)準(zhǔn)備:
$ pgbench -i -s 10
postgres=# \d List of relations Schema | Name | Type | Owner --------+------------------+-------+---------- public | pgbench_accounts | table | postgres public | pgbench_branches | table | postgres public | pgbench_history | table | postgres public | pgbench_tellers | table | postgres (4 rows) postgres=# select * from pgbench_accounts limit 1; aid | bid | abalance | filler -----+-----+----------+-------------------------------------------------------------------------------------- 1 | 1 | 0 | (1 row) postgres=# select * from pgbench_branches limit 1; bid | bbalance | filler -----+----------+-------- 1 | 0 | (1 row) postgres=# select * from pgbench_history limit 1; tid | bid | aid | delta | mtime | filler -----+-----+-----+-------+-------+-------- (0 rows) postgres=# select * from pgbench_tellers limit 1; tid | bid | tbalance | filler -----+-----+----------+-------- 1 | 1 | 0 | (1 row) postgres=# select * from pgbench_branches; bid | bbalance | filler -----+----------+-------- 1 | 0 | 2 | 0 | 3 | 0 | 4 | 0 | 5 | 0 | 6 | 0 | 7 | 0 | 8 | 0 | 9 | 0 | 10 | 0 | (10 rows) postgres=# update pgbench_branches set bbalance=4500000 where bid in (4,7); UPDATE 2 postgres=#
IN語(yǔ)句
查詢要求:找出那些余額(balance)大于0的每個(gè)分支(branch)在表在pgbench_accounts中有多少個(gè)賬戶
1.使用IN子句
SELECT count( aid ),bid FROM pgbench_accounts WHERE bid IN ( SELECT bid FROM pgbench_branches WHERE bbalance > 0 ) GROUP BY bid;
2.使用ANY子句
SELECT count( aid ),bid FROM pgbench_accounts WHERE bid = ANY ( SELECT bid FROM pgbench_branches WHERE bbalance > 0 ) GROUP BY bid;
3.使用EXISTS子句
SELECT count( aid ),bid FROM pgbench_accounts WHERE EXISTS ( SELECT bid FROM pgbench_branches WHERE bbalance > 0 AND pgbench_accounts.bid = pgbench_branches.bid ) GROUP BY bid;
4.使用INNER JOIN
SELECT count( aid ),a.bid FROM pgbench_accounts a JOIN pgbench_branches b ON a.bid = b.bid WHERE b.bbalance > 0 GROUP BY a.bid;
在完成這個(gè)查詢要求的時(shí)候,有人可能會(huì)假設(shè)exists和inner join性能可能會(huì)更好,因?yàn)樗麄兛梢允褂脙杀磉B接的邏輯和優(yōu)化。而IN和ANY子句需要使用子查詢。
然而,PostgreSQL(10版本之后)已經(jīng)智能的足以對(duì)上面四種寫法產(chǎn)生相同的執(zhí)行計(jì)劃!
所有上面的寫法都會(huì)產(chǎn)生相同的執(zhí)行計(jì)劃:
QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------ Finalize GroupAggregate (cost=23327.73..23330.26 rows=10 width=12) (actual time=97.199..99.014 rows=2 loops=1) Group Key: a.bid -> Gather Merge (cost=23327.73..23330.06 rows=20 width=12) (actual time=97.191..99.006 rows=6 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=22327.70..22327.73 rows=10 width=12) (actual time=93.762..93.766 rows=2 loops=3) Sort Key: a.bid Sort Method: quicksort Memory: 25kB Worker 0: Sort Method: quicksort Memory: 25kB Worker 1: Sort Method: quicksort Memory: 25kB -> Partial HashAggregate (cost=22327.44..22327.54 rows=10 width=12) (actual time=93.723..93.727 rows=2 loops=3) Group Key: a.bid -> Hash Join (cost=1.14..22119.10 rows=41667 width=8) (actual time=24.024..83.263 rows=66667 loops=3) Hash Cond: (a.bid = b.bid) -> Parallel Seq Scan on pgbench_accounts a (cost=0.00..20560.67 rows=416667 width=8) (actual time=0.023..43.151 rows=333333 loops=3) -> Hash (cost=1.12..1.12 rows=1 width=4) (actual time=0.027..0.028 rows=2 loops=3) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on pgbench_branches b (cost=0.00..1.12 rows=1 width=4) (actual time=0.018..0.020 rows=2 loops=3) Filter: (bbalance > 0) Rows Removed by Filter: 8 Planning Time: 0.342 ms Execution Time: 99.164 ms (22 rows)
那么,我們是否可以得出這樣的結(jié)論:我們可以隨意地編寫查詢,而PostgreSQL的智能將會(huì)處理其余的問(wèn)題?!
等等!
如果我們考慮排除情況,事情會(huì)變得不同。
排除查詢
查詢要求:找出那些余額(balance)不大于0的每個(gè)分支(branch)在表在pgbench_accounts中有多少個(gè)賬戶
1.使用NOT IN
SELECT count( aid ),bid FROM pgbench_accounts WHERE bid NOT IN ( SELECT bid FROM pgbench_branches WHERE bbalance > 0 ) GROUP BY bid;
執(zhí)行計(jì)劃:
QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------- Finalize GroupAggregate (cost=23645.42..23647.95 rows=10 width=12) (actual time=128.606..130.502 rows=8 loops=1) Group Key: pgbench_accounts.bid -> Gather Merge (cost=23645.42..23647.75 rows=20 width=12) (actual time=128.598..130.490 rows=24 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=22645.39..22645.42 rows=10 width=12) (actual time=124.960..124.963 rows=8 loops=3) Sort Key: pgbench_accounts.bid Sort Method: quicksort Memory: 25kB Worker 0: Sort Method: quicksort Memory: 25kB Worker 1: Sort Method: quicksort Memory: 25kB -> Partial HashAggregate (cost=22645.13..22645.23 rows=10 width=12) (actual time=124.917..124.920 rows=8 loops=3) Group Key: pgbench_accounts.bid -> Parallel Seq Scan on pgbench_accounts (cost=1.13..21603.46 rows=208333 width=8) (actual time=0.078..83.134 rows=266667 loops=3) Filter: (NOT (hashed SubPlan 1)) Rows Removed by Filter: 66667 SubPlan 1 -> Seq Scan on pgbench_branches (cost=0.00..1.12 rows=1 width=4) (actual time=0.020..0.021 rows=2 loops=3) Filter: (bbalance > 0) Rows Removed by Filter: 8 Planning Time: 0.310 ms Execution Time: 130.620 ms (21 rows) postgres=#
2.使用<>ALL
SELECT count( aid ),bid FROM pgbench_accounts WHERE bid <> ALL ( SELECT bid FROM pgbench_branches WHERE bbalance > 0 ) GROUP BY bid;
執(zhí)行計(jì)劃:
QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------ Finalize GroupAggregate (cost=259581.79..259584.32 rows=10 width=12) (actual time=418.220..419.913 rows=8 loops=1) Group Key: pgbench_accounts.bid -> Gather Merge (cost=259581.79..259584.12 rows=20 width=12) (actual time=418.212..419.902 rows=24 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=258581.76..258581.79 rows=10 width=12) (actual time=413.906..413.909 rows=8 loops=3) Sort Key: pgbench_accounts.bid Sort Method: quicksort Memory: 25kB Worker 0: Sort Method: quicksort Memory: 25kB Worker 1: Sort Method: quicksort Memory: 25kB -> Partial HashAggregate (cost=258581.50..258581.60 rows=10 width=12) (actual time=413.872..413.875 rows=8 loops=3) Group Key: pgbench_accounts.bid -> Parallel Seq Scan on pgbench_accounts (cost=0.00..257539.83 rows=208333 width=8) (actual time=0.054..367.244 rows=266667 loops=3) Filter: (SubPlan 1) Rows Removed by Filter: 66667 SubPlan 1 -> Materialize (cost=0.00..1.13 rows=1 width=4) (actual time=0.000..0.001 rows=2 loops=1000000) -> Seq Scan on pgbench_branches (cost=0.00..1.12 rows=1 width=4) (actual time=0.001..0.001 rows=2 loops=337880) Filter: (bbalance > 0) Rows Removed by Filter: 8 Planning Time: 0.218 ms Execution Time: 420.035 ms (22 rows) postgres=#
3.使用NOT EXISTS
SELECT count( aid ),bid FROM pgbench_accounts WHERE NOT EXISTS ( SELECT bid FROM pgbench_branches WHERE bbalance > 0 AND pgbench_accounts.bid = pgbench_branches.bid ) GROUP BY bid;
執(zhí)行計(jì)劃:
QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------- Finalize GroupAggregate (cost=28327.72..28330.25 rows=10 width=12) (actual time=152.024..153.931 rows=8 loops=1) Group Key: pgbench_accounts.bid -> Gather Merge (cost=28327.72..28330.05 rows=20 width=12) (actual time=152.014..153.917 rows=24 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=27327.70..27327.72 rows=10 width=12) (actual time=147.782..147.786 rows=8 loops=3) Sort Key: pgbench_accounts.bid Sort Method: quicksort Memory: 25kB Worker 0: Sort Method: quicksort Memory: 25kB Worker 1: Sort Method: quicksort Memory: 25kB -> Partial HashAggregate (cost=27327.43..27327.53 rows=10 width=12) (actual time=147.732..147.737 rows=8 loops=3) Group Key: pgbench_accounts.bid -> Hash Anti Join (cost=1.14..25452.43 rows=375000 width=8) (actual time=0.134..101.884 rows=266667 loops=3) Hash Cond: (pgbench_accounts.bid = pgbench_branches.bid) -> Parallel Seq Scan on pgbench_accounts (cost=0.00..20560.67 rows=416667 width=8) (actual time=0.032..45.174 rows=333333 loops=3) -> Hash (cost=1.12..1.12 rows=1 width=4) (actual time=0.036..0.037 rows=2 loops=3) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on pgbench_branches (cost=0.00..1.12 rows=1 width=4) (actual time=0.025..0.027 rows=2 loops=3) Filter: (bbalance > 0) Rows Removed by Filter: 8 Planning Time: 0.322 ms Execution Time: 154.040 ms (22 rows) postgres=#
4.使用LEFT JOIN和IS NULL
SELECT count( aid ),a.bid FROM pgbench_accounts a LEFT JOIN pgbench_branches b ON a.bid = b.bid AND b.bbalance > 0 WHERE b.bid IS NULL GROUP BY a.bid;
執(zhí)行計(jì)劃:
QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------ Finalize GroupAggregate (cost=28327.72..28330.25 rows=10 width=12) (actual time=145.298..147.096 rows=8 loops=1) Group Key: a.bid -> Gather Merge (cost=28327.72..28330.05 rows=20 width=12) (actual time=145.288..147.083 rows=24 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=27327.70..27327.72 rows=10 width=12) (actual time=141.883..141.887 rows=8 loops=3) Sort Key: a.bid Sort Method: quicksort Memory: 25kB Worker 0: Sort Method: quicksort Memory: 25kB Worker 1: Sort Method: quicksort Memory: 25kB -> Partial HashAggregate (cost=27327.43..27327.53 rows=10 width=12) (actual time=141.842..141.847 rows=8 loops=3) Group Key: a.bid -> Hash Anti Join (cost=1.14..25452.43 rows=375000 width=8) (actual time=0.087..99.535 rows=266667 loops=3) Hash Cond: (a.bid = b.bid) -> Parallel Seq Scan on pgbench_accounts a (cost=0.00..20560.67 rows=416667 width=8) (actual time=0.025..44.337 rows=333333 loops=3) -> Hash (cost=1.12..1.12 rows=1 width=4) (actual time=0.026..0.027 rows=2 loops=3) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on pgbench_branches b (cost=0.00..1.12 rows=1 width=4) (actual time=0.019..0.020 rows=2 loops=3) Filter: (bbalance > 0) Rows Removed by Filter: 8 Planning Time: 0.231 ms Execution Time: 147.180 ms (22 rows) postgres=#
NOT IN 和 <> ALL生成執(zhí)行計(jì)劃都包含了一個(gè)子查詢。他們是各自獨(dú)立的。
而NOT EXISTS和LEFT JOIN生成了相同的執(zhí)行計(jì)劃。
這些hash連接(或hash anti join)是完成查詢要求的最靈活的方式。這也是推薦exists或join的原因。因此,推薦使用exists或join的經(jīng)驗(yàn)法則是有效的。
但是,我們繼續(xù)往下看! 即使有了子查詢執(zhí)行計(jì)劃,NOT IN子句的執(zhí)行時(shí)間也會(huì)更好?
是的。PostgreSQL做了出色的優(yōu)化,PostgreSQL將子查詢計(jì)劃進(jìn)行了hash處理。因此PostgreSQL對(duì)如何處理IN子句有了更好的理解,這是一種邏輯思維方式,因?yàn)楹芏嗳藘A向于使用IN子句。子查詢返回的行很少,但即使子查詢返回幾百行,也會(huì)發(fā)生同樣的情況。
但是,如果子查詢返回大量行(幾十萬(wàn)行)怎么辦?讓我們嘗試一個(gè)簡(jiǎn)單的測(cè)試:
CREATE TABLE t1 AS SELECT * FROM generate_series(0, 500000) id; CREATE TABLE t2 AS SELECT (random() * 4000000)::integer id FROM generate_series(0, 4000000); ANALYZE t1; ANALYZE t2; EXPLAIN SELECT id FROM t1 WHERE id NOT IN (SELECT id FROM t2);
執(zhí)行計(jì)劃:
QUERY PLAN -------------------------------------------------------------------------------- Gather (cost=1000.00..15195064853.01 rows=250000 width=4) Workers Planned: 1 -> Parallel Seq Scan on t1 (cost=0.00..15195038853.01 rows=147059 width=4) Filter: (NOT (SubPlan 1)) SubPlan 1 -> Materialize (cost=0.00..93326.01 rows=4000001 width=4) -> Seq Scan on t2 (cost=0.00..57700.01 rows=4000001 width=4) (7 rows) postgres=#
這里,執(zhí)行計(jì)劃將子查詢進(jìn)行了物化。代價(jià)評(píng)估變成了15195038853.01。(PostgreSQL的默認(rèn)設(shè)置,如果t2表的行低于100k,會(huì)將子查詢進(jìn)行hash)。這樣就會(huì)嚴(yán)重影響性能。因此,對(duì)于那種子查詢返回的行數(shù)很少的場(chǎng)景,IN子句可以起到很好的作用。
其它注意點(diǎn)
有的!在我們用不同的方式寫查詢的時(shí)候,可能有數(shù)據(jù)類型的轉(zhuǎn)換。
比如,語(yǔ)句:
EXPLAIN ANALYZE SELECT * FROM emp WHERE gen = ANY(ARRAY['M', 'F']);
就會(huì)發(fā)生隱式的類型轉(zhuǎn)換:
Seq Scan on emp (cost=0.00..1.04 rows=2 width=43) (actual time=0.023..0.026 rows=3 loops=1) Filter: ((gen)::text = ANY ('{M,F}'::text[]))
這里的(gen)::text就發(fā)生了類型轉(zhuǎn)換。如果在大表上,這種類型轉(zhuǎn)換的代價(jià)會(huì)很高,因此,PostgreSQL對(duì)IN子句做了更好的處理。
EXPLAIN ANALYZE SELECT * FROM emp WHERE gen IN ('M','F'); Seq Scan on emp (cost=0.00..1.04 rows=3 width=43) (actual time=0.030..0.034 rows=3 loops=1) Filter: (gen = ANY ('{M,F}'::bpchar[]))
將IN子句轉(zhuǎn)換成了ANY子句,沒(méi)有對(duì)gen列進(jìn)行類型轉(zhuǎn)換。而是將M\F轉(zhuǎn)成了bpchar(內(nèi)部等價(jià)于char)
總結(jié)
簡(jiǎn)單來(lái)說(shuō),exists和直接join表通常比較好。
很多情況下,PostgreSQL將IN子句換成被hash的子計(jì)劃。在一些特殊場(chǎng)景下,IN可以獲得更好的執(zhí)行計(jì)劃。
以上為個(gè)人經(jīng)驗(yàn),希望能給大家一個(gè)參考,也希望大家多多支持腳本之家。如有錯(cuò)誤或未考慮完全的地方,望不吝賜教。
相關(guān)文章
PostgreSQL創(chuàng)建新用戶所遇見的權(quán)限問(wèn)題以及解決辦法
這篇文章主要給大家介紹了關(guān)于PostgreSQL創(chuàng)建新用戶所遇見的權(quán)限問(wèn)題以及解決辦法, 在PostgreSQL中創(chuàng)建一個(gè)新用戶非常簡(jiǎn)單,但可能會(huì)遇到權(quán)限問(wèn)題,需要的朋友可以參考下2023-09-09PostgreSQL對(duì)GROUP BY子句使用常量的特殊限制詳解
這篇文章主要介紹了PostgreSQL對(duì)GROUP BY子句使用常量的特殊限制詳解,具有很好的參考價(jià)值,希望對(duì)大家有所幫助。一起跟隨小編過(guò)來(lái)看看吧2021-02-02基于PostgreSql 別名區(qū)分大小寫的問(wèn)題
這篇文章主要介紹了基于PostgreSql 別名區(qū)分大小寫的問(wèn)題,具有很好的參考價(jià)值,希望對(duì)大家有所幫助。一起跟隨小編過(guò)來(lái)看看吧2021-01-01Visual Studio Code(VS Code)查詢PostgreSQL拓展安裝教程圖解
這篇文章主要介紹了Visual Studio Code(VS Code)查詢PostgreSQL拓展安裝教程,本文通過(guò)圖文并茂的形式給大家介紹的非常詳細(xì),對(duì)大家的學(xué)習(xí)或工作具有一定的參考借鑒價(jià)值,需要的朋友可以參考下2021-01-01postgreSQL自動(dòng)生成隨機(jī)數(shù)值的實(shí)例
這篇文章主要介紹了postgreSQL自動(dòng)生成隨機(jī)數(shù)值的實(shí)例,具有很好的參考價(jià)值,希望對(duì)大家有所幫助。一起跟隨小編過(guò)來(lái)看看吧2021-01-01pgsql之create user與create role的區(qū)別介紹
這篇文章主要介紹了pgsql之create user與create role的區(qū)別介紹,具有很好的參考價(jià)值,希望對(duì)大家有所幫助。一起跟隨小編過(guò)來(lái)看看吧2021-01-01Ubuntu中卸載Postgresql出錯(cuò)的解決方法
這篇文章主要給大家介紹了關(guān)于在Ubuntu中卸載Postgresql出錯(cuò)的解決方法,文中通過(guò)示例代碼介紹的非常詳細(xì),對(duì)大家的學(xué)習(xí)或者工作具有一定的參考學(xué)習(xí)價(jià)值,需要的朋友們下面隨著小編來(lái)一起學(xué)習(xí)學(xué)習(xí)吧。2017-09-09Postgresql 查看SQL語(yǔ)句執(zhí)行效率的操作
這篇文章主要介紹了Postgresql 查看SQL語(yǔ)句執(zhí)行效率的操作,具有很好的參考價(jià)值,希望對(duì)大家有所幫助。一起跟隨小編過(guò)來(lái)看看吧2021-02-02PostgreSQL進(jìn)行重置密碼的方法小結(jié)
今天想測(cè)試一個(gè)PostgresSQL語(yǔ)法的 SQL,但是打開PostgresSQL之后沉默了,密碼是什么?日長(zhǎng)月久的,漸漸就忘記了,于是開始了尋找密碼的道路,所以本文介紹了Postgresql忘記密碼,如何重置密碼,需要的朋友可以參考下2024-05-05對(duì)PostgreSQL中的慢查詢進(jìn)行分析和優(yōu)化的操作指南
在數(shù)據(jù)庫(kù)的世界里,慢查詢就像是路上的絆腳石,讓數(shù)據(jù)處理的道路變得崎嶇不平,想象一下,你正在高速公路上飛馳,突然遇到一堆減速帶,那感覺肯定糟透了,本文介紹了怎樣對(duì)?PostgreSQL?中的慢查詢進(jìn)行分析和優(yōu)化,需要的朋友可以參考下2024-07-07