
MySQL NOT IN, LEFT JOIN, IS NULL, NOT EXISTS: notes on efficiency

Updated: December 16, 2011, 12:03:56


Efficiency comparison: NOT IN, LEFT JOIN ... IS NULL, NOT EXISTS

Statement 1: select count(*) from A where A.a not in (select a from B)

Statement 2: select count(*) from A left join B on A.a = B.a where B.a is null

Statement 3: select count(*) from A where not exists (select a from B where A.a = B.a)
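To check that the three statements return the same count, here is a minimal sketch that runs them unchanged through Python's built-in sqlite3 module instead of MySQL or SQL Server. The tables A and B and their contents are hypothetical. B deliberately holds the value 4 twice, to show that the LEFT JOIN form still counts each row of A at most once after the IS NULL filter.

```python
import sqlite3

# Hypothetical sample tables; no NULLs are stored in either column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (a INTEGER);
    CREATE TABLE B (a INTEGER);
    INSERT INTO A (a) VALUES (1), (2), (3), (4), (5);
    INSERT INTO B (a) VALUES (2), (4), (4), (6);
""")

# The three statements from the text, unchanged.
STATEMENTS = {
    "not_in": "select count(*) from A where A.a not in (select a from B)",
    "left_join_is_null": "select count(*) from A left join B on A.a = B.a where B.a is null",
    "not_exists": "select count(*) from A where not exists (select a from B where A.a = B.a)",
}

counts = {name: conn.execute(sql).fetchone()[0] for name, sql in STATEMENTS.items()}
print(counts)  # → {'not_in': 3, 'left_join_is_null': 3, 'not_exists': 3}
```

This only shows that the results agree; SQLite's timings say nothing about how MySQL or SQL Server plan these queries.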

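I have known for a long time that the three statements above return the same result, but I never looked closely at how their efficiency compares. I had always assumed that statement 2 was the fastest. (Strictly speaking, they are equivalent only when neither A.a nor B.a contains NULL: if the subquery returns even one NULL, NOT IN matches no rows at all, while the other two forms simply ignore that NULL.)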
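The NULL caveat is easy to reproduce. This sketch again uses SQLite through Python's sqlite3 module with hypothetical tables A and B, where B.a contains one NULL. Adding `where a is not null` to the subquery is one common way to make NOT IN agree with the other two forms.

```python
import sqlite3

# Hypothetical tables: B.a contains one NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (a INTEGER);
    CREATE TABLE B (a INTEGER);
    INSERT INTO A (a) VALUES (1), (2), (3);
    INSERT INTO B (a) VALUES (2), (NULL);
""")

def count(sql):
    """Run a COUNT(*) query and return the single integer it produces."""
    return conn.execute(sql).fetchone()[0]

# For a = 1 and a = 3, "a NOT IN (2, NULL)" evaluates to NULL, so those rows are dropped.
not_in = count("select count(*) from A where A.a not in (select a from B)")
left_join = count("select count(*) from A left join B on A.a = B.a where B.a is null")
not_exists = count("select count(*) from A where not exists (select a from B where A.a = B.a)")
# Excluding NULLs inside the subquery makes NOT IN agree with the other two forms.
not_in_fixed = count("select count(*) from A where A.a not in (select a from B where a is not null)")

print(not_in, left_join, not_exists, not_in_fixed)  # → 0 2 2 2
```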
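Today at work I had to purge data from a database with tens of millions of rows, deleting more than 20 million of them, and the job relied heavily on the logic these three statements implement. I originally used statement 1: the run took 1 hour 32 minutes and the log file grew to 21 GB. The time was acceptable, but the disk space taken by the log was a problem, so I replaced every statement 1 with statement 2, expecting it to be even faster. To my surprise, after more than 40 minutes not even the first batch of 50,000 rows had been deleted, and SQL Server crashed instead. I then ran the statement on its own against a table of nearly 10 million rows: statement 1 took 4 seconds while statement 2 took 18 seconds, a large gap. Statement 3 performed close to statement 1.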

The second form is a serious anti-pattern and should be avoided wherever possible. The first and third forms are essentially the same.

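Assuming the buffer pool is large enough, form 2 has the following drawbacks compared with form 1:
(1) A LEFT JOIN is inherently more expensive (it needs more resources to handle the intermediate result set it produces).
(2) The intermediate result set of the LEFT JOIN is never smaller than table A.
(3) Form 2 must additionally filter the LEFT JOIN's intermediate result with the IS NULL condition, whereas form 1 does the filtering while the two sets are being joined; this step is pure extra overhead.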
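Points (2) and (3) can be made concrete by counting the rows the LEFT JOIN defines before and after the IS NULL filter. This is a sketch on hypothetical data in SQLite; it counts logical rows only, and whether a particular engine actually materializes that intermediate set depends on its optimizer.

```python
import sqlite3

# Hypothetical tables; 4 appears twice in B, so one row of A joins to two rows of B.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (a INTEGER);
    CREATE TABLE B (a INTEGER);
    INSERT INTO A (a) VALUES (1), (2), (3), (4), (5);
    INSERT INTO B (a) VALUES (2), (4), (4), (6);
""")

size_a = conn.execute("select count(*) from A").fetchone()[0]
# Logical size of the LEFT JOIN result before the IS NULL filter:
# every row of A appears at least once, so this can never be smaller than A.
intermediate = conn.execute(
    "select count(*) from A left join B on A.a = B.a").fetchone()[0]
# Rows that survive the extra IS NULL filtering step.
kept = conn.execute(
    "select count(*) from A left join B on A.a = B.a where B.a is null").fetchone()[0]

print(size_a, intermediate, kept)  # → 5 6 3
```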

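Taken together, these three points produce a noticeable difference when processing very large volumes of data (mainly in memory and CPU overhead). I suspect the buffer pool was already saturated during the original poster's test. In that case, the extra overhead of form 2 has to be served from virtual memory on disk, and because paging in SQL Server involves slow I/O, the gap becomes even more pronounced.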

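As for the oversized log file, that is normal, since a great many rows were deleted. Depending on what the database is used for, you could set the recovery model to SIMPLE, or truncate the log and shrink the file once the deletion has finished.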

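I had previously written a script for this database that deleted data unconditionally, that is, it removed every row from some large tables. The customer did not allow TRUNCATE TABLE, for fear of damaging the existing database structure, so I had to use DELETE. I ran into the oversized-log problem then as well, and solved it by deleting in batches: with SET ROWCOUNT @chunk in SQL Server 2000, and with DELETE TOP (@chunk) in SQL Server 2005. This not only cut the deletion time dramatically but also greatly reduced the log volume, which grew by only about 1 GB.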
This time, however, the purge needs a condition, i.e. DELETE A FROM A WHERE ... with a condition after it, and deleting in batches no longer helps.
Do you know why that might be?
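The batched-delete technique described above can be sketched in SQLite. SQLite has no SET ROWCOUNT or DELETE TOP, and its DELETE ... LIMIT clause requires a compile-time option, so this sketch (an assumption, not the author's script) limits each batch with `rowid IN (SELECT ... LIMIT ?)` and commits after every batch. The helper `delete_in_batches` and the tables are hypothetical; the condition is the NOT EXISTS form of statement 3.

```python
import sqlite3

def delete_in_batches(conn, chunk):
    """Delete rows of A that have no match in B, at most `chunk` rows per transaction.

    Returns the number of non-empty batches that were committed.
    """
    batches = 0
    while True:
        cur = conn.execute(
            """
            DELETE FROM A
            WHERE rowid IN (
                SELECT rowid FROM A
                WHERE NOT EXISTS (SELECT 1 FROM B WHERE B.a = A.a)
                LIMIT ?
            )
            """,
            (chunk,),
        )
        conn.commit()  # commit each batch so no single transaction covers the whole delete
        if cur.rowcount == 0:
            return batches
        batches += 1


# Hypothetical data: A holds 1..10 and B holds the even numbers, so the odd rows of A go.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (a INTEGER)")
conn.execute("CREATE TABLE B (a INTEGER)")
conn.executemany("INSERT INTO A (a) VALUES (?)", [(i,) for i in range(1, 11)])
conn.executemany("INSERT INTO B (a) VALUES (?)", [(i,) for i in range(2, 11, 2)])
conn.commit()

batches = delete_in_batches(conn, chunk=2)
remaining = [row[0] for row in conn.execute("SELECT a FROM A ORDER BY a")]
print(batches, remaining)  # → 3 [2, 4, 6, 8, 10]
```

Unlike an unconditional delete, each batch here has to evaluate the NOT EXISTS condition again to find its next rows, so that search is paid once per batch; how expensive that is likely depends on whether B.a is indexed.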

MySQL NOT IN vs. LEFT JOIN: notes on efficiency

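First, what the SQL below does: it finds the rows of set a that do not appear in set b. Here set a is the users (RUID) who posted under subject 12 between 2009-8-14 15:30:00 and 2009-8-17 16:00:00, and set b is the users who had already posted under subject 12 before 2009-8-14 15:30:00.
The NOT IN form: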

select add_tb.RUID 
from (select distinct RUID 
from UserMsg 
where SubjectID =12 
and CreateTime>='2009-8-14 15:30:00' 
and CreateTime<='2009-8-17 16:00:00' 
) add_tb 
where add_tb.RUID 
not in (select distinct RUID 
from UserMsg 
where SubjectID =12 
and CreateTime<'2009-8-14 15:30:00' 
) 

The LEFT JOIN ... IS NULL form, joining two DISTINCT derived tables:

select a.ruid,b.ruid 
from(select distinct RUID 
from UserMsg 
where SubjectID =12 
and CreateTime >= '2009-8-14 15:30:00' 
and CreateTime<='2009-8-17 16:00:00' 
) a left join ( 
select distinct RUID 
from UserMsg 
where SubjectID =12 and CreateTime< '2009-8-14 15:30:00' 
) b on a.ruid = b.ruid 
where b.ruid is null 

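The LEFT JOIN ... IS NULL form, joining UserMsg to itself directly (DISTINCT removes users who posted more than once in the window):
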
select distinct a.RUID 
from UserMsg a 
left join UserMsg b 
on a.ruid = b.ruid 
and b.subjectID =12 and b.createTime < '2009-8-14 15:30:00' 
where a.subjectID =12 
and a.createTime >= '2009-8-14 15:30:00' 
and a.createtime <='2009-8-17 16:00:00' 
and b.ruid is null; 

The NOT EXISTS form:

select distinct a.ruid 
from UserMsg a 
where a.subjectID =12 
and a.createTime >= '2009-8-14 15:30:00' 
and a.createTime <='2009-8-17 16:00:00' 
and not exists ( 
select distinct RUID 
from UserMsg 
where subjectID =12 and createTime < '2009-8-14 15:30:00' 
and ruid=a.ruid 
) 

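A LEFT JOIN variant without the SubjectID = 12 filter (note that it answers a broader question, since it considers messages under every subject):
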
select a.ruid,b.ruid 
from( select distinct RUID 
from UserMsg 
where CreateTime >= '2009-8-14 15:30:00' 
and CreateTime<='2009-8-17 16:00:00' 
) a left join UserMsg b 
on a.ruid = b.ruid 
and b.createTime < '2009-8-14 15:30:00' 
where b.ruid is null;
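To compare the forms above on a small data set, this sketch loads a hypothetical UserMsg table into SQLite and runs the NOT IN, LEFT JOIN and NOT EXISTS queries. Two adaptations are assumptions: the timestamps are zero-padded ISO strings (SQLite compares them as text, so '2009-8-9' would sort after '2009-8-14'), and the derived-table LEFT JOIN selects only a.RUID. RUID 3 sits exactly on the 15:30:00 boundary, and every query in this sketch uses >= for the window start. The last query, without the SubjectID filter, returns a different set.

```python
import sqlite3

# Window boundaries as zero-padded ISO strings, since SQLite compares them as text.
PARAMS = {"start": "2009-08-14 15:30:00", "end": "2009-08-17 16:00:00"}

# Hypothetical rows: (RUID, SubjectID, CreateTime).
ROWS = [
    (1, 12, "2009-08-10 09:00:00"),  # posted before the window...
    (1, 12, "2009-08-15 10:00:00"),  # ...so RUID 1 is not new
    (2, 12, "2009-08-15 11:00:00"),  # new, and appears twice in the window
    (2, 12, "2009-08-16 12:00:00"),
    (3, 12, "2009-08-14 15:30:00"),  # exactly on the start boundary
    (4, 7, "2009-08-10 09:00:00"),   # earlier post, but under another subject
    (4, 12, "2009-08-16 08:00:00"),
    (5, 12, "2009-08-18 00:00:00"),  # after the window
]

QUERIES = {
    "not_in": """
        SELECT add_tb.RUID
        FROM (SELECT DISTINCT RUID FROM UserMsg
              WHERE SubjectID = 12 AND CreateTime >= :start AND CreateTime <= :end) add_tb
        WHERE add_tb.RUID NOT IN (SELECT DISTINCT RUID FROM UserMsg
                                  WHERE SubjectID = 12 AND CreateTime < :start)""",
    "left_join_derived": """
        SELECT a.RUID
        FROM (SELECT DISTINCT RUID FROM UserMsg
              WHERE SubjectID = 12 AND CreateTime >= :start AND CreateTime <= :end) a
        LEFT JOIN (SELECT DISTINCT RUID FROM UserMsg
                   WHERE SubjectID = 12 AND CreateTime < :start) b ON a.RUID = b.RUID
        WHERE b.RUID IS NULL""",
    "left_join_self": """
        SELECT DISTINCT a.RUID
        FROM UserMsg a
        LEFT JOIN UserMsg b
          ON a.RUID = b.RUID AND b.SubjectID = 12 AND b.CreateTime < :start
        WHERE a.SubjectID = 12 AND a.CreateTime >= :start AND a.CreateTime <= :end
          AND b.RUID IS NULL""",
    "not_exists": """
        SELECT DISTINCT a.RUID
        FROM UserMsg a
        WHERE a.SubjectID = 12 AND a.CreateTime >= :start AND a.CreateTime <= :end
          AND NOT EXISTS (SELECT 1 FROM UserMsg
                          WHERE SubjectID = 12 AND CreateTime < :start AND RUID = a.RUID)""",
}

# The last query in the text: same shape, but without any SubjectID filter.
NO_SUBJECT_FILTER = """
    SELECT a.RUID
    FROM (SELECT DISTINCT RUID FROM UserMsg
          WHERE CreateTime >= :start AND CreateTime <= :end) a
    LEFT JOIN UserMsg b ON a.RUID = b.RUID AND b.CreateTime < :start
    WHERE b.RUID IS NULL"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserMsg (RUID INTEGER NOT NULL, SubjectID INTEGER, CreateTime TEXT)")
conn.executemany("INSERT INTO UserMsg VALUES (?, ?, ?)", ROWS)

results = {name: sorted(r[0] for r in conn.execute(sql, PARAMS)) for name, sql in QUERIES.items()}
no_subject = sorted(r[0] for r in conn.execute(NO_SUBJECT_FILTER, PARAMS))

print(results)     # every form returns [2, 3, 4]
print(no_subject)  # → [2, 3]
```

RUID 4 is the row that separates the last query from the others: its earlier post is under subject 7, which only the unfiltered variant treats as a previous post.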