PostgreSQL利用遞歸優(yōu)化求稀疏列唯一值的方法
在數(shù)據(jù)庫(kù)中經(jīng)常會(huì)碰到一些表的列是稀疏列,只有很少的值,例如性別字段,一般就只有2種不同的值。
但是當(dāng)我們求這些稀疏列的唯一值時(shí),如果表的數(shù)據(jù)量很大,速度還是會(huì)很慢。
例如:
創(chuàng)建測(cè)試表
bill=# create table t_sex (sex char(1), otherinfo text); CREATE TABLE bill=# insert into t_sex select 'm', generate_series(1,10000000)||'this is test'; INSERT 0 10000000 bill=# insert into t_sex select 'w', generate_series(1,10000000)||'this is test'; INSERT 0 10000000
查詢:
可以看到下面的查詢速度很慢。
bill=# select count(distinct sex) from t_sex; count ------- 2 (1 row) Time: 8803.505 ms (00:08.804) bill=# select sex from t_sex t group by sex; sex ----- m w (2 rows) Time: 1026.464 ms (00:01.026)
那么我們對(duì)該字段加上索引又是什么情況呢?
速度依然沒(méi)有明顯
bill=# create index idx_sex_1 on t_sex(sex); CREATE INDEX bill=# select count(distinct sex) from t_sex; count ------- 2 (1 row) Time: 8502.460 ms (00:08.502) bill=# select sex from t_sex t group by sex; sex ----- m w (2 rows) Time: 572.353 ms
的變化,可以看到執(zhí)行計(jì)劃已經(jīng)使用Index Only Scan了。
bill=# explain select count(distinct sex) from t_sex;
QUERY PLAN
----------------------------------------------------------------------------------------------
Aggregate (cost=371996.44..371996.45 rows=1 width=8)
-> Index Only Scan using idx_sex_1 on t_sex (cost=0.44..321996.44 rows=20000000 width=2)
(2 rows)
同樣的SQL我們看看在Oracle中性能如何?
創(chuàng)建測(cè)試表:
SQL> create table t_sex (sex char(1), otherinfo varchar2(100)); Table created. SQL> insert into t_sex select 'm', rownum||'this is test' from dual connect by level <=10000000; 10000000 rows created. SQL> commit; Commit complete. SQL> insert into t_sex select 'w', rownum||'this is test' from dual connect by level <=10000000; 10000000 rows created. SQL> commit; Commit complete.
性能測(cè)試:
SQL> set lines 1000 pages 2000
SQL> set autotrace on
SQL> set timing on
SQL> select count(distinct sex) from t_sex;
COUNT(DISTINCTSEX)
------------------
2
Elapsed: 00:00:01.58
Execution Plan
----------------------------------------------------------
Plan hash value: 3915432945
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 20132 (1)| 00:00:01 |
| 1 | SORT GROUP BY | | 1 | 3 | | |
| 2 | TABLE ACCESS FULL| T_SEX | 14M| 42M| 20132 (1)| 00:00:01 |
----------------------------------------------------------------------------
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
74074 consistent gets
0 physical reads
0 redo size
552 bytes sent via SQL*Net to client
608 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> select sex from t_sex t group by sex;
SE
--
m
w
Elapsed: 00:00:01.08
Execution Plan
----------------------------------------------------------
Plan hash value: 3915432945
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14M| 42M| 20558 (3)| 00:00:01 |
| 1 | SORT GROUP BY | | 14M| 42M| 20558 (3)| 00:00:01 |
| 2 | TABLE ACCESS FULL| T_SEX | 14M| 42M| 20132 (1)| 00:00:01 |
----------------------------------------------------------------------------
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
74074 consistent gets
0 physical reads
0 redo size
589 bytes sent via SQL*Net to client
608 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
2 rows processed
可以看到Oracle的性能即使不加索引也明顯比PostgreSQL中要好。
那么我們?cè)赑ostgreSQL中是不是沒(méi)辦法繼續(xù)優(yōu)化了呢?這種情況我們利用pg中的遞歸語(yǔ)句結(jié)合索引可以大幅提升性能。
SQL改寫:
bill=# with recursive tmp as ( bill(# ( bill(# select min(t.sex) as sex from t_sex t where t.sex is not null bill(# ) bill(# union all bill(# ( bill(# select (select min(t.sex) from t_sex t where t.sex > s.sex and t.sex is not null) bill(# from tmp s where s.sex is not null bill(# ) bill(# ) bill-# select count(distinct sex) from tmp; count ------- 2 (1 row) Time: 2.711 ms
查看執(zhí)行計(jì)劃:
bill=# explain with recursive tmp as (
bill(# (
bill(# select min(t.sex) as sex from t_sex t where t.sex is not null
bill(# )
bill(# union all
bill(# (
bill(# select (select min(t.sex) from t_sex t where t.sex > s.sex and t.sex is not null)
bill(# from tmp s where s.sex is not null
bill(# )
bill(# )
bill-# select count(distinct sex) from tmp;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Aggregate (cost=53.62..53.63 rows=1 width=8)
CTE tmp
-> Recursive Union (cost=0.46..51.35 rows=101 width=32)
-> Result (cost=0.46..0.47 rows=1 width=32)
InitPlan 3 (returns $1)
-> Limit (cost=0.44..0.46 rows=1 width=2)
-> Index Only Scan using idx_sex_1 on t_sex t (cost=0.44..371996.44 rows=20000000 width=2)
Index Cond: (sex IS NOT NULL)
-> WorkTable Scan on tmp s (cost=0.00..4.89 rows=10 width=32)
Filter: (sex IS NOT NULL)
-> CTE Scan on tmp (cost=0.00..2.02 rows=101 width=32)
(11 rows)
Time: 1.371 ms
可以看到執(zhí)行時(shí)間從原先的8000ms降低到了2ms,提升了幾千倍!
甚至對(duì)比Oracle,性能也是提升了很多。
但是需要注意的是:這種寫法僅僅是針對(duì)稀疏列,換成數(shù)據(jù)分布廣泛的字段,顯然性能是下降的, 所以使用遞歸SQL不適合數(shù)據(jù)分布廣泛的字段的group by或者count(distinct)操作。
到此這篇關(guān)于PostgreSQL利用遞歸優(yōu)化求稀疏列唯一值的文章就介紹到這了,更多相關(guān)PostgreSQL遞歸優(yōu)化內(nèi)容請(qǐng)搜索腳本之家以前的文章或繼續(xù)瀏覽下面的相關(guān)文章希望大家以后多多支持腳本之家!
相關(guān)文章
Postgresql限制用戶登錄錯(cuò)誤次數(shù)的實(shí)例代碼
這篇文章主要介紹了Postgresql限制用戶登錄錯(cuò)誤次數(shù)的實(shí)例代碼,本文給大家介紹的非常詳細(xì),對(duì)大家的學(xué)習(xí)或工作具有一定的參考借鑒價(jià)值,需要的朋友可以參考下2021-03-03
postgreSQL中的row_number() 與distinct用法說(shuō)明
這篇文章主要介紹了postgreSQL中的row_number() 與distinct用法說(shuō)明,具有很好的參考價(jià)值,希望對(duì)大家有所幫助。一起跟隨小編過(guò)來(lái)看看吧2021-01-01
如何解決PostgreSQL執(zhí)行語(yǔ)句長(zhǎng)時(shí)間卡著不動(dòng)不報(bào)錯(cuò)也不執(zhí)行的問(wèn)題
某日開(kāi)發(fā)同事上報(bào)一sql性能問(wèn)題,一條查詢好似一直跑不出結(jié)果,查詢了n小時(shí),還未返回結(jié)果,這篇文章主要給大家介紹了關(guān)于如何解決PostgreSQL執(zhí)行語(yǔ)句長(zhǎng)時(shí)間卡著不動(dòng)不報(bào)錯(cuò)也不執(zhí)行問(wèn)題的相關(guān)資料,需要的朋友可以參考下2024-02-02
PostgreSQL時(shí)間線(timeline)和History File的用法
這篇文章主要介紹了PostgreSQL時(shí)間線(timeline)和History File的用法,具有很好的參考價(jià)值,希望對(duì)大家有所幫助。一起跟隨小編過(guò)來(lái)看看吧2020-12-12
基于PostgreSQL/openGauss?的分布式數(shù)據(jù)庫(kù)解決方案
ShardingSphere-Proxy?作為透明數(shù)據(jù)庫(kù)代理,用戶無(wú)需關(guān)心?Proxy?如何協(xié)調(diào)背后的數(shù)據(jù)庫(kù)。今天通過(guò)本文給大家介紹基于PostgreSQL/openGauss?的分布式數(shù)據(jù)庫(kù)解決方案,感興趣的朋友跟隨小編一起看看吧2021-12-12

