PostgreSQL limit的神奇作用詳解

更新時間：2022年09月16日 11:12:35 作者：foucus、

這篇文章主要介紹了PostgreSQL limit的神奇作用，文中通過示例代碼介紹的非常詳細，對大家的學習或者工作具有一定的參考學習價值，需要的朋友們下面隨著小編來一起學習吧

最近碰到這樣一個SQL引發(fā)的性能問題，SQL內容大致如下：

SELECT *
FROM t1
WHERE id = 999
AND (case $1
    WHEN 'true' THEN
    info = $2
    ELSE info = $3 end) limit 1;

開發(fā)反應這條SQL加上limit 1之后過了一段時間從原先的索引掃描變成了全表掃描，一個簡單的limit 1為何會產生這樣的影響，我只取一條數(shù)據(jù)不是應該更快了嗎？

下面我們就從這條SQL開始說起。

首先我們先看下這個表結構，比較簡單，info列上有個索引，如下所示：

bill=# \d t1
                            Table "public.t1"
  Column  |            Type             | Collation | Nullable | Default
----------+-----------------------------+-----------+----------+---------
 id       | integer                     |           |          |
 info     | text                        |           |          |
 crt_time | timestamp without time zone |           |          |
Indexes:
    "idx_t1" btree (info)

并且info列是沒有重復值的，這意味著無論where條件中傳入什么變量都肯定是能走索引掃描的。那為什么加上limit 1后會變成全表掃描呢？

我們先看看這條SQL之前正常的走索引的執(zhí)行計劃：

                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..3.18 rows=1 width=45) (actual time=0.027..0.027 rows=0 loops=1)
   ->  Index Scan using idx_t1 on t1  (cost=0.56..3.18 rows=1 width=45) (actual time=0.025..0.026 rows=0 loops=1)
         Index Cond: (info = 'bill'::text)
         Filter: (id = 999)
 Planning Time: 0.158 ms
 Execution Time: 0.057 ms
(6 rows)

而現(xiàn)在的執(zhí)行計劃卻是這樣的：

 Limit  (cost=0.00..0.35 rows=1 width=45) (actual time=487.564..487.564 rows=0 loops=1)
   ->  Seq Scan on t1  (cost=0.00..170895.98 rows=491791 width=45) (actual time=487.562..487.562 rows=0 loops=1)
         Filter: ((id = 999) AND CASE $1 WHEN 'true'::text THEN (info = $2) ELSE (info = $3) END)
         Rows Removed by Filter: 6000000
 Planning Time: 0.119 ms
 Execution Time: 487.595 ms
(6 rows)

奇怪的是下面的全表掃描加上limit后cost反而更低，但實際時間竟然長了這么多。而當我們將日志中獲取的綁定變量值帶入SQL中再去查看執(zhí)行計劃時，仍然是走索引掃描。既然如此，那比較容易想到的就是plan cache導致的執(zhí)行計劃錯誤了。

由于在PostgreSQL中執(zhí)行計劃緩存只是會話級別的，PostgreSQL在生成執(zhí)行計劃緩存前，會先走5次custom plan，然后記錄這5次總的custom plan的cost, 以及custom plan的次數(shù)，最后生成通用的generic plan。

以后，每次bind時，會根據(jù)緩存的執(zhí)行計劃以及給定的參數(shù)值計算一個COST，如果這個COST 小于前面存儲的custom plan cost的平均值，則使用當前緩存的執(zhí)行計劃。如果這個COST大于前面存儲的custom plan cost的平均值，則使用custom plan(即重新生成執(zhí)行計劃)，同時custom plan的次數(shù)加1，custom plan總成本也會累加進去。

既然如此，我們使用prepare語句再測試一次：

bill=# prepare p1 as select * from t1 where id = 999
bill-# and (case $1 when 'true' then info = $2 else info = $3 end)  limit 1;
PREPARE
bill=# explain analyze execute p1('true','bill','postgres');
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..3.18 rows=1 width=45) (actual time=0.831..0.831 rows=0 loops=1)
   ->  Index Scan using idx_t1 on t1  (cost=0.56..3.18 rows=1 width=45) (actual time=0.830..0.830 rows=0 loops=1)
         Index Cond: (info = 'bill'::text)
         Filter: (id = 999)
 Planning Time: 0.971 ms
 Execution Time: 0.889 ms
(6 rows)
bill=# explain analyze execute p1('true','bill','postgres');
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..3.18 rows=1 width=45) (actual time=0.038..0.039 rows=0 loops=1)
   ->  Index Scan using idx_t1 on t1  (cost=0.56..3.18 rows=1 width=45) (actual time=0.037..0.037 rows=0 loops=1)
         Index Cond: (info = 'bill'::text)
         Filter: (id = 999)
 Planning Time: 0.240 ms
 Execution Time: 0.088 ms
(6 rows)
bill=# explain analyze execute p1('true','bill','postgres');
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..3.18 rows=1 width=45) (actual time=0.036..0.036 rows=0 loops=1)
   ->  Index Scan using idx_t1 on t1  (cost=0.56..3.18 rows=1 width=45) (actual time=0.035..0.035 rows=0 loops=1)
         Index Cond: (info = 'bill'::text)
         Filter: (id = 999)
 Planning Time: 0.136 ms
 Execution Time: 0.076 ms
(6 rows)
bill=# explain analyze execute p1('true','bill','postgres');
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..3.18 rows=1 width=45) (actual time=0.051..0.051 rows=0 loops=1)
   ->  Index Scan using idx_t1 on t1  (cost=0.56..3.18 rows=1 width=45) (actual time=0.049..0.050 rows=0 loops=1)
         Index Cond: (info = 'bill'::text)
         Filter: (id = 999)
 Planning Time: 0.165 ms
 Execution Time: 0.091 ms
(6 rows)
bill=# explain analyze execute p1('true','bill','postgres');
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..3.18 rows=1 width=45) (actual time=0.027..0.027 rows=0 loops=1)
   ->  Index Scan using idx_t1 on t1  (cost=0.56..3.18 rows=1 width=45) (actual time=0.025..0.026 rows=0 loops=1)
         Index Cond: (info = 'bill'::text)
         Filter: (id = 999)
 Planning Time: 0.158 ms
 Execution Time: 0.057 ms
(6 rows)
bill=# explain analyze execute p1('true','bill','postgres');
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.35 rows=1 width=45) (actual time=487.564..487.564 rows=0 loops=1)
   ->  Seq Scan on t1  (cost=0.00..170895.98 rows=491791 width=45) (actual time=487.562..487.562 rows=0 loops=1)
         Filter: ((id = 999) AND CASE $1 WHEN 'true'::text THEN (info = $2) ELSE (info = $3) END)
         Rows Removed by Filter: 6000000
 Planning Time: 0.119 ms
 Execution Time: 487.595 ms
(6 rows)

果然在第6次時出現(xiàn)了我們想要的結果！

可以看到前5次索引掃描的cost都是3.18，而全表掃描的cost卻是0.35，所以自然優(yōu)化器選擇了全表掃描，可為什么cost變低了反而時間更久了呢？解答這個問題前我們先要來了解下limit子句的cost是如何計算的。

limit cost計算方法：

先從一個最簡單的例子看起：

我們只取1條記錄，cost很低，時間也很少。

bill=# explain analyze select * from t1 limit 1;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.02 rows=1 width=45) (actual time=0.105..0.106 rows=1 loops=1)
   ->  Seq Scan on t1  (cost=0.00..110921.49 rows=5997449 width=45) (actual time=0.103..0.103 rows=1 loops=1)
 Planning Time: 0.117 ms
 Execution Time: 0.133 ms
(4 rows)

加上where條件試試呢？

cost一下子變成3703.39了，似乎也很好理解，因為我們在進行l(wèi)imit前要使用where條件進行一次數(shù)據(jù)過濾，所以cost變得很高了。

bill=# explain analyze select * from t1 where id = 1000 limit 1;
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..3703.39 rows=1 width=45) (actual time=0.482..0.483 rows=1 loops=1)
   ->  Seq Scan on t1  (cost=0.00..125915.11 rows=34 width=45) (actual time=0.480..0.481 rows=1 loops=1)
         Filter: (id = 1000)
         Rows Removed by Filter: 1008
 Planning Time: 0.117 ms
 Execution Time: 0.523 ms
(6 rows)

但當我們換個條件時結果又不同了：

從where id=1000變成 id=999，cost竟然一下子又降低到0.13了，似乎找到了前面全表掃描的limit cost比索引掃描還低的原因了。

bill=# explain analyze select * from t1 where id = 999 limit 1;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.13 rows=1 width=45) (actual time=0.041..0.042 rows=1 loops=1)
   ->  Seq Scan on t1  (cost=0.00..125915.11 rows=983582 width=45) (actual time=0.040..0.040 rows=1 loops=1)
         Filter: (id = 999)
         Rows Removed by Filter: 107
 Planning Time: 0.114 ms
 Execution Time: 0.079 ms
(6 rows)

那么這個limit的cost究竟是如何計算的呢，為什么條件不同cost能差這么多呢？

下面給出limit cost計算方法：

limit_cost = ( N / B ) * A

N：表示limit取的數(shù)據(jù)，如limit 1則N=1；

B：表示估算得到的總記錄數(shù)；

A：表示估算的總成本。

例如上面cost=0.13的執(zhí)行計劃中，N = 1，B = 983582，A = 125915.11，那么limit cost便是：

(1/983582)*125915.11 = 0.128，即執(zhí)行計劃中顯示的0.13。

簡而言之就是如果通過where條件篩選得到的行數(shù)越多，那么limit cost就會越低。

知道了這些我們再回過頭去看那條SQL就清楚了，因為where id = 999這個條件的數(shù)據(jù)比較多，這也就導致了即使是全表掃描limit cost也很低，甚至比索引掃描還低。

SELECT *
FROM t1
WHERE id = 999
AND (case $1
    WHEN 'true' THEN
    info = $2
    ELSE info = $3 end) limit 1;

但是需要注意的是，我們即使使用explain analyze看到的執(zhí)行計劃中的cost也是一個估算值，并不是實際值，盡管這個和實際值差距不會很大，但如果cost本身就很小，那么還是會帶來一點誤解的。

例如前面的SQL我想要提高全表掃描的limit cost讓其大于索引掃描，這樣優(yōu)化器便會一直選擇索引掃描了，于是我將limit 1改成limit 100（即增加N的值），但是卻仍然沒有起作用：

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.56..5.58 rows=1 width=53) (actual time=0.049..0.051 rows=1 loops=1)
-> Index Scan using idx_scm_bind_scm_customer_id_index on scm_bind t (cost=0.56..5.58 rows=1 width=53) (actual time=0.049..0.050 rows=1 loops=1)
Index Cond: ((scm_customer_id)::text = 'wmGAgeDQAAXcpcw9QWkDOUQsIDI1xOqQ'::text)
Filter: ((bind_status)::text = '2'::text)
Planning Time: 0.160 ms
Execution Time: 0.072 ms
(6 rows)
Time: 0.470 ms
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..8.90 rows=100 width=53) (actual time=1047.859..16654.360 rows=1 loops=1)
-> Seq Scan on scm_bind t (cost=0.00..552392.00 rows=6208050 width=53) (actual time=1047.858..16654.357 rows=1 loops=1)
Filter: (((bind_status)::text = '2'::text) AND CASE $1 WHEN 'client'::text THEN ((scm_customer_id)::text = ($2)::text) ELSE ((scm_customer_id)::text = ($3)::text) END)
Rows Removed by Filter: 12169268
Planning Time: 0.147 ms
Execution Time: 16654.459 ms
(6 rows)
Time: 16654.924 ms (00:16.655)

下面的全表掃描是第6次傳入參數(shù)得到的，可以看到全表掃描的cost是8.9，而索引掃描是5.58，那應該不會選擇cost更高的8.9啊？

而當我們去跟蹤實際的cost就可以發(fā)現(xiàn)：

$1 = {magic = 195726186, raw_parse_tree = 0x15df470,
query_string = 0x16d65b8 "PREPARE p1(varchar,varchar,varchar) as\n select\n t.scm_sale_customer_id,\n t.scm_customer_id\n from\n scm_bind t\n where t.bind_status = '2'\n and (case $1 when 'client' then scm_customer_id ="..., commandTag = 0x95b5ba "SELECT", param_types = 0x16d66c8, num_params = 3, parserSetup = 0x0, parserSetupArg = 0x0, cursor_options = 256, fixed_result = true,
resultDesc = 0x16d66e8, context = 0x15df250, query_list = 0x16dbe80, relationOids = 0x16e6138, invalItems = 0x0, search_path = 0x16e6168, query_context = 0x16dbd70, rewriteRoleId = 10,
rewriteRowSecurity = true, dependsOnRLS = false, gplan = 0x16ff668, is_oneshot = false, is_complete = true, is_saved = true, is_valid = true, generation = 6, next_saved = 0x0,
generic_cost = 8.8979953447539888, total_custom_cost = 52.899999999999999, num_custom_plans = 5}

實際索引掃描的cost大約數(shù)10.58，和執(zhí)行計劃中顯示的還是有一定差距的。

讓我們言歸正傳，既然知道了為什么全表掃描的limit cost更低，我們再來解決下一個問題：為什么cost很低但實際執(zhí)行時間卻這么長？

讓我們再看看執(zhí)行計劃：

Limit  (cost=0.00..0.35 rows=1 width=45) (actual time=487.564..487.564 rows=0 loops=1)
   ->  Seq Scan on t1  (cost=0.00..170895.98 rows=491791 width=45) (actual time=487.562..487.562 rows=0 loops=1)
         Filter: ((id = 999) AND CASE $1 WHEN 'true'::text THEN (info = $2) ELSE (info = $3) END)
         Rows Removed by Filter: 6000000
 Planning Time: 0.119 ms
 Execution Time: 487.595 ms
(6 rows)

仔細觀察可以發(fā)現(xiàn)，原先應該作為索引的info列的過濾條件，竟然整個作為了filter條件去進行數(shù)據(jù)過濾了。

那么最后的問題就出現(xiàn)在這個where條件中的case when表達式了，因為在case when表達式進行過濾前，綁定變量還沒有傳入實際的值，而優(yōu)化器對于不確定的值自然無法選擇是否去走索引了，這里不得不吐槽一下這種寫法。。。

因此對于優(yōu)化器計算limit cost時，只知道where id = 999會得到大量的數(shù)據(jù)，而無法判斷后面的case when里面會得到多少數(shù)據(jù)，因此雖然后面的條件只會得到很少一部分數(shù)據(jù)，但是優(yōu)化器生成limit cost時估算得到的總記錄數(shù)B只是根據(jù)id = 999去判斷，導致估算的cost很低，但實際卻只得到很少的數(shù)據(jù)，要去表中過濾大量數(shù)據(jù)。

不得不感嘆這個“簡單”的SQL竟然包含著這么多知識。

到此這篇關于PostgreSQL limit的神奇作用詳解的文章就介紹到這了,更多相關PostgreSQL limit內容請搜索腳本之家以前的文章或繼續(xù)瀏覽下面的相關文章希望大家以后多多支持腳本之家！

您可能感興趣的文章:

Postgresql排序與limit組合場景性能極限優(yōu)化詳解

PostgreSQL自定義函數(shù)并且調用方式
這篇文章主要介紹了PostgreSQL如何自定義函數(shù)并且調用，本文通過示例代碼給大家介紹的非常詳細，對大家的學習或工作具有一定的參考借鑒價值,需要的朋友可以參考下
2023-06-06
pgsql 如何手動觸發(fā)歸檔
這篇文章主要介紹了pgsql手動觸發(fā)歸檔的操作，具有很好的參考價值，希望對大家有所幫助。一起跟隨小編過來看看吧
2021-01-01
PostgreSQL之分區(qū)表(partitioning)
通過合理的設計，可以將選擇一定的規(guī)則，將大表切分多個不重不漏的子表，這就是傳說中的partitioning。比如，我們可以按時間切分，每天一張子表，比如我們可以按照某其他字段分割，總之了就是化整為零，提高查詢的效能
2016-11-11
PostgreSQL?數(shù)組類型操作使用及特點詳解
這篇文章主要為大家介紹了PostgreSQL?數(shù)組類型操作使用及特點詳解，有需要的朋友可以借鑒參考下，希望能夠有所幫助，祝大家多多進步，早日升職加薪
2022-10-10
postgresql的jsonb數(shù)據(jù)查詢和修改的方法
這篇文章主要介紹了postgresql的jsonb數(shù)據(jù)查詢和修改的方法,本文給大家介紹的非常詳細，對大家的學習或工作具有一定的參考借鑒價值，需要的朋友可以參考下
2021-03-03
Postgresql數(shù)據(jù)庫中的json類型字段使用示例詳解
JSON的主要用于在服務器與web應用之間傳輸數(shù)據(jù)，這篇文章主要介紹了Postgresql數(shù)據(jù)庫中的json類型字段使用,本文結合實例代碼給大家介紹的非常詳細，對大家的學習或工作具有一定的參考借鑒價值，需要的朋友可以參考下
2023-02-02
PostgreSQL實時查看數(shù)據(jù)庫實例正在執(zhí)行的SQL語句實例詳解
在任何數(shù)據(jù)庫中,分析和優(yōu)化SQL的執(zhí)行,最重要的工作就是執(zhí)行計劃的解讀,而說到執(zhí)行計劃得先了解postgresql的查詢執(zhí)行過程,下面這篇文章主要給大家介紹了關于PostgreSQL實時查看數(shù)據(jù)庫實例正在執(zhí)行的SQL語句的相關資料,需要的朋友可以參考下
2023-01-01
Postgresql 動態(tài)統(tǒng)計某一列的某一值出現(xiàn)的次數(shù)實例
這篇文章主要介紹了Postgresql 動態(tài)統(tǒng)計某一列的某一值出現(xiàn)的次數(shù)實例，具有很好的參考價值，希望對大家有所幫助。一起跟隨小編過來看看吧
2021-01-01
PostgreSQL長事務與失效的索引查詢淺析介紹
pg中的長事務會影響表中垃圾回收，導致表的年齡增長無法freeze。能消耗事務的只有當執(zhí)行了一些DML或者DDL操作后才能算是我們通常說的長事務。否則只能算是我們常說的長連接，當然長連接也有很多弊端，例如占用內存、cpu等資源
2022-09-09
CentOS PostgreSQL 12 主從復制(主從切換)操作
這篇文章主要介紹了CentOS PostgreSQL 12 主從復制(主從切換)操作，具有很好的參考價值，希望對大家有所幫助。一起跟隨小編過來看看吧
2021-01-01

欧美bbbwbbbw肥妇,免费乱码人妻系列日韩,一级黄片

PostgreSQL limit的神奇作用詳解

相關文章

最新評論

大家感興趣的內容

最近更新的內容

常用在線小工具