PostgreSQL運(yùn)維案例之遞歸查詢(xún)死循環(huán)解決方案

更新時(shí)間：2024年02月03日 08:52:33 作者：Nickxyoung

PostgreSQL提供的遞歸語(yǔ)法是很棒的,例如可用來(lái)解決樹(shù)形查詢(xún)的問(wèn)題,解決Oracle用戶(hù)connect by的語(yǔ)法兼容性,下面這篇文章主要給大家介紹了關(guān)于PostgreSQL運(yùn)維案例之遞歸查詢(xún)死循環(huán)解決方案的相關(guān)資料,需要的朋友可以參考下

一、問(wèn)題背景

某日，開(kāi)發(fā)同事上報(bào)一sql性能問(wèn)題，一條查詢(xún)好似一直跑不出結(jié)果，查詢(xún)了n小時(shí)，還未返回結(jié)果。比較詭異的是同樣的sql，相同的數(shù)據(jù)量，相同的表大小，且在服務(wù)器硬件配置相同的情況下，在另外一套環(huán)境查詢(xún)非?？?，毫秒級(jí)。

第一時(shí)間排查了異常環(huán)境的查詢(xún)進(jìn)程stack，并抓取了一分鐘的strace。從結(jié)果得知進(jìn)程是正常執(zhí)行的，那么看起來(lái)就是查詢(xún)慢的問(wèn)題了。

最終發(fā)現(xiàn)是遞歸查詢(xún)出現(xiàn)了死循環(huán)，以下內(nèi)容均是在個(gè)人電腦進(jìn)行的模擬復(fù)現(xiàn)

sql語(yǔ)句如下：

with s as (select * from emp_info where empno='200' and emp_type>'5' and emp_status='Y')
  
  select 
  s.empno as "staffNo",
  s.emp_type as "empType",
  s.emp_tel_info as "empNum",
  a.cust_name as "Name",
  a.cust_position as "Postion",
  a.cust_addr as "Addr",
  a.cust_tel_info as "Mobile",
  (
    with  recursive r as (select f.ctid,f.region_code,f.parent_region_code,f.region_type,f.region_status 
	from region_tbl f where f.region_code=s.region_code union all select f.ctid, f.region_code,f.parent_region_code,
	f.region_type,f.region_status from region_tbl f,r where f.region_code=r.parent_region_code)
	
	select r.region_code as "FirstRegCode" 
	from r where r.region_type='5'
	and r.region_status='Y'
  ),
  (
    with  recursive r as (select f.ctid,f.region_code,f.parent_region_code,f.region_type,f.region_status 
	from region_tbl f where f.region_code=s.region_code union all select f.ctid, f.region_code,f.parent_region_code,
	f.region_type,f.region_status from region_tbl f,r where f.region_code=r.parent_region_code)
	
	select r.region_code as "SecondRegCode" 
	from r where r.region_type='4'
	and r.region_status='Y'
  ),
  (
    with  recursive r as (select f.ctid,f.region_code,f.parent_region_code,f.region_type,f.region_status 
	from region_tbl f where f.region_code=s.region_code union all select f.ctid, f.region_code,f.parent_region_code,
	f.region_type,f.region_status from region_tbl f,r where f.region_code=r.parent_region_code)
	
	select r.region_code as "ThirdRegCode" 
	from r where r.region_type='3'
	and r.region_status='Y'
	
  ),
  (
    with  recursive r as (select f.ctid,f.region_code,f.parent_region_code,f.region_type,f.region_status 
	from region_tbl f where f.region_code=s.region_code union all select f.ctid, f.region_code,f.parent_region_code,
	f.region_type,f.region_status from region_tbl f,r where f.region_code=r.parent_region_code)
	
	select r.region_code as "FurthRegCode" 
	from r where r.region_type='2'
	and r.region_status='Y'
	
  )
  
  from s left join cust_info a on s.empno=a.cust_id;

二、問(wèn)題分析

對(duì)比了兩個(gè)壞境的執(zhí)行計(jì)劃，代價(jià)預(yù)估及掃描算子、連接算子看起來(lái)都是一樣的。

執(zhí)行計(jì)劃如下：

                                                            QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=8.58..1944.99 rows=1 width=866)
   CTE s
     ->  Index Scan using emp_info_pkey on emp_info  (cost=0.28..8.30 rows=1 width=57)
           Index Cond: ((empno)::text = '200'::text)
           Filter: (((emp_type)::text > '5'::text) AND ((emp_status)::text = 'Y'::text))
   ->  CTE Scan on s  (cost=0.00..0.02 rows=1 width=256)
   ->  Index Scan using cust_info_pkey on cust_info a  (cost=0.28..8.29 rows=1 width=200)
         Index Cond: ((s.empno)::text = (cust_id)::text)
   SubPlan 3
     ->  CTE Scan on r r_1  (cost=479.57..482.09 rows=1 width=118)
           Filter: (((region_type)::text = '5'::text) AND ((region_status)::text = 'Y'::text))
           CTE r
             ->  Recursive Union  (cost=0.28..479.57 rows=101 width=19)
                   ->  Index Scan using region_tbl_pkey on region_tbl f  (cost=0.28..8.29 rows=1 width=19)
                         Index Cond: ((region_code)::text = (s.region_code)::text)
                   ->  Hash Join  (cost=0.33..46.93 rows=10 width=19)
                         Hash Cond: ((f_1.region_code)::text = (r.parent_region_code)::text)
                         ->  Seq Scan on region_tbl f_1  (cost=0.00..39.00 rows=2000 width=19)
                         ->  Hash  (cost=0.20..0.20 rows=10 width=118)
                               ->  WorkTable Scan on r  (cost=0.00..0.20 rows=10 width=118)
   SubPlan 5
     ->  CTE Scan on r r_3  (cost=479.57..482.09 rows=1 width=118)
           Filter: (((region_type)::text = '4'::text) AND ((region_status)::text = 'Y'::text))
           CTE r
             ->  Recursive Union  (cost=0.28..479.57 rows=101 width=19)
                   ->  Index Scan using region_tbl_pkey on region_tbl f_2  (cost=0.28..8.29 rows=1 width=19)
                         Index Cond: ((region_code)::text = (s.region_code)::text)
                   ->  Hash Join  (cost=0.33..46.93 rows=10 width=19)
                         Hash Cond: ((f_3.region_code)::text = (r_2.parent_region_code)::text)
                         ->  Seq Scan on region_tbl f_3  (cost=0.00..39.00 rows=2000 width=19)
                         ->  Hash  (cost=0.20..0.20 rows=10 width=118)
                               ->  WorkTable Scan on r r_2  (cost=0.00..0.20 rows=10 width=118)
   SubPlan 7
     ->  CTE Scan on r r_5  (cost=479.57..482.09 rows=1 width=118)
           Filter: (((region_type)::text = '3'::text) AND ((region_status)::text = 'Y'::text))
           CTE r
             ->  Recursive Union  (cost=0.28..479.57 rows=101 width=19)
                   ->  Index Scan using region_tbl_pkey on region_tbl f_4  (cost=0.28..8.29 rows=1 width=19)
                         Index Cond: ((region_code)::text = (s.region_code)::text)
                   ->  Hash Join  (cost=0.33..46.93 rows=10 width=19)
                         Hash Cond: ((f_5.region_code)::text = (r_4.parent_region_code)::text)
                         ->  Seq Scan on region_tbl f_5  (cost=0.00..39.00 rows=2000 width=19)
                         ->  Hash  (cost=0.20..0.20 rows=10 width=118)
                               ->  WorkTable Scan on r r_4  (cost=0.00..0.20 rows=10 width=118)
   SubPlan 9
     ->  CTE Scan on r r_7  (cost=479.57..482.09 rows=1 width=118)
           Filter: (((region_type)::text = '2'::text) AND ((region_status)::text = 'Y'::text))
           CTE r
             ->  Recursive Union  (cost=0.28..479.57 rows=101 width=19)
                   ->  Index Scan using region_tbl_pkey on region_tbl f_6  (cost=0.28..8.29 rows=1 width=19)
                         Index Cond: ((region_code)::text = (s.region_code)::text)
                   ->  Hash Join  (cost=0.33..46.93 rows=10 width=19)
                         Hash Cond: ((f_7.region_code)::text = (r_6.parent_region_code)::text)
                         ->  Seq Scan on region_tbl f_7  (cost=0.00..39.00 rows=2000 width=19)
                         ->  Hash  (cost=0.20..0.20 rows=10 width=118)
                               ->  WorkTable Scan on r r_6  (cost=0.00..0.20 rows=10 width=118)
(56 rows)

postgres=#

從執(zhí)行計(jì)劃來(lái)看，代價(jià)預(yù)估中沒(méi)有發(fā)現(xiàn)非常耗時(shí)的步驟。對(duì)正常的環(huán)境中explain analyze查看實(shí)際消耗，實(shí)際執(zhí)行300ms，最終返回了一條數(shù)據(jù)，和代價(jià)預(yù)估基本一致。逐步排查，最終將重心放在了遞歸查詢(xún)這部分。

遞歸部分sql：

with recursive r as (select f.ctid,f.region_code,f.parent_region_code,f.region_type,f.region_status
from region_tbl f where f.region_code=s.region_code union all select f.ctid, f.region_code,f.parent_region_code,
f.region_type,f.region_status from region_tbl f,r where f.region_code=r.parent_region_code)

分析sql邏輯，遞歸條件為f.region_code=r.parent_region_code，并且遞歸開(kāi)始的f.region_code字段值為s.region_code=‘1200’，這里的1200是通過(guò)對(duì)s表進(jìn)行查詢(xún)得到的，如下：

postgres=# select * from emp_info where empno='200' and emp_type>'5' and emp_status='Y';
 region_code | emp_type | emp_tel_info |             emp_name             | emp_status | empno
-------------+----------+--------------+----------------------------------+------------+-------
 1200        | 6        | 85192900896  | d7bcf68fc9d88d8b3f5ed6fa2713abcf | Y          | 200
(1 row)

改寫(xiě)遞歸部分的sql，查看實(shí)際執(zhí)行，并打印了元組的ctid，如下是limit 10的結(jié)果：

postgres=# with  recursive r as (select f.ctid,f.region_code,f.parent_region_code,f.region_type,f.region_status from region_tbl f where f.region_code='1200' union all select f.ctid, f.region_code,f.parent_region_code,f.region_type,f.region_status from region_tbl f,r where f.region_code=r.parent_region_code)select * from r limit 10;
  ctid   | region_code | parent_region_code | region_type | region_status
---------+-------------+--------------------+-------------+---------------
 (18,75) | 1200        | 1020               | 5           | Y
 (18,76) | 1020        | 1002               | 4           | Y
 (9,108) | 1002        | 120                | 3           | Y
 (18,79) | 120         | 12                 | 2           | N
 (18,81) | 12          | 1                  | 1           | N
 (0,110) | 1           | 3                  | 3           | N
 (0,108) | 3           | 4                  | 6           | N
 (0,109) | 4           | 3                  | 3           | N
 (0,108) | 3           | 4                  | 6           | N
 (0,109) | 4           | 3                  | 3           | N
(10 rows)

再看limit 15的結(jié)果：

postgres=# with  recursive r as (select f.ctid,f.region_code,f.parent_region_code,f.region_type,f.region_status from region_tbl f where f.region_code='1200' union all select f.ctid, f.region_code,f.parent_region_code,f.region_type,f.region_status from region_tbl f,r where f.region_code=r.parent_region_code)select * from r limit 15;
  ctid   | region_code | parent_region_code | region_type | region_status
---------+-------------+--------------------+-------------+---------------
 (18,75) | 1200        | 1020               | 5           | Y
 (18,76) | 1020        | 1002               | 4           | Y
 (9,108) | 1002        | 120                | 3           | Y
 (18,79) | 120         | 12                 | 2           | N
 (18,81) | 12          | 1                  | 1           | N
 (0,110) | 1           | 3                  | 3           | N
 (0,108) | 3           | 4                  | 6           | N
 (0,109) | 4           | 3                  | 3           | N
 (0,108) | 3           | 4                  | 6           | N
 (0,109) | 4           | 3                  | 3           | N
 (0,108) | 3           | 4                  | 6           | N
 (0,109) | 4           | 3                  | 3           | N
 (0,108) | 3           | 4                  | 6           | N
 (0,109) | 4           | 3                  | 3           | N
 (0,108) | 3           | 4                  | 6           | N
(15 rows)

之后還打印了limit 1000，limit 10000的結(jié)果。發(fā)現(xiàn)一個(gè)現(xiàn)象，ctid為(0,108) (0,109)這兩條數(shù)據(jù)一直在交替迭代，所以sql執(zhí)行慢是一直在交替掃描這兩條數(shù)據(jù)，這條sql在這個(gè)環(huán)境中是永遠(yuǎn)都跑不出結(jié)果的。

這兩條數(shù)據(jù)很有特點(diǎn)，目前的遞歸條件為f.region_code=r.parent_region_code，而這兩條數(shù)據(jù)的值剛好形成了一個(gè)閉環(huán)，導(dǎo)致遞歸陷入了死循環(huán)。

postgres=# select ctid,region_code,parent_region_code from region_tbl where region_code in ('3','4');
  ctid   | region_code | parent_region_code
---------+-------------+--------------------
 (0,108) | 3           | 4
 (0,109) | 4           | 3
(2 rows)

另外一個(gè)正常的環(huán)境中這兩條數(shù)據(jù)并沒(méi)有形成閉環(huán)，如下：

postgres=# select ctid,region_code,parent_region_code from region_tbl where region_code in ('3','4');
  ctid   | region_code | parent_region_code
---------+-------------+--------------------
 (0,245) | 3           | 0
 (0,246) | 4           | 0
(2 rows)

三、解決方案

1.已知是這兩條數(shù)據(jù)導(dǎo)致的問(wèn)題，那么可以參照正常環(huán)境修改數(shù)據(jù)值，或者從查詢(xún)條件中剔除這兩條數(shù)據(jù)。實(shí)際執(zhí)行340ms返回一條數(shù)據(jù)，如下：

 staffNo | empType |   empNum    |     Name     |            Postion             |        Addr        |   Mobile    | FirstRegCode | SecondRegCode | Thir
dRegCode | FurthRegCode
---------+---------+-------------+--------------+--------------------------------+--------------------+-------------+--------------+---------------+-----
---------+--------------
 200     | 6       | 85192900896 | 運(yùn)維yuanyuan |  Database administrator | 陜西省西安市高新區(qū) | 13512345678 | 1200         | 1020          | 1002
         |
(1 row)

Time: 339.986 ms

2.為什么要用到遞歸？遞歸條件是否可以修改？

sql中一些字段的查詢(xún)是通過(guò)遞歸完成的，例如SecondRegCode字段值為1020是遞歸輸出的第二條結(jié)果，即通過(guò)1200遞歸查詢(xún)出1020。如果不使用遞歸，那么只能查詢(xún)到1200（FirstRegCode字段值），這個(gè)字段的值是查不到的。也就是說(shuō)查詢(xún)某些字段是依賴(lài)遞歸的。

以當(dāng)前的sql邏輯，遞歸條件是無(wú)法修改的。

建議：

1）如果sql中繼續(xù)使用遞歸，那么對(duì)于region_code和parent_region_code字段關(guān)系一定要做明確的規(guī)則處理，比如建立檢查約束，明確region_code大于parent_region_code，這樣存入的數(shù)據(jù)就不會(huì)出現(xiàn)閉環(huán)。

2）sql作為結(jié)構(gòu)化查詢(xún)語(yǔ)言對(duì)比應(yīng)用語(yǔ)言，對(duì)復(fù)雜邏輯的處理存在很多局限性，不如應(yīng)用代碼靈活。因此是否可以考慮將遞歸處理從sql中拉出來(lái)，放到應(yīng)用代碼中處理

總結(jié)

到此這篇關(guān)于PostgreSQL運(yùn)維案例之遞歸查詢(xún)死循環(huán)的文章就介紹到這了,更多相關(guān)PostgreSQL遞歸查詢(xún)死循環(huán)內(nèi)容請(qǐng)搜索腳本之家以前的文章或繼續(xù)瀏覽下面的相關(guān)文章希望大家以后多多支持腳本之家！

您可能感興趣的文章: