A Time-for-Space Cache Replacement Algorithm in Python

Updated: 2016-02-19 09:58:01  Submitted by: mrr

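A cache is storage that can exchange data at high speed; it exchanges data with the CPU ahead of main memory, which is why it is fast. Caching means temporarily keeping some data somewhere, possibly in memory, possibly on disk. This article introduces a Python implementation of a cache replacement algorithm that trades time for space.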

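This is a by-product of crawling websites with Scrapy. While Scrapy is crawling, CPU time is not under much pressure (requesting pages too frequently tends to get the crawler banned), so I took the opportunity to spend some of that spare CPU time on freeing up memory.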

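How the algorithm works:

The data to be cached is expanded into its binary form, and the resulting bits are mapped onto the cache structure. To check whether an item has already been cached, you only need to look up the corresponding mapped positions; if every position matches, the item is already cached.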

# A binary number is just a binary tree.
# For example, the trees below can represent four values: 0, 1, 2, 3 (the two trees are independent).

  0       1
 / \     / \
0   1   0   1

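Cache operations therefore become binary-tree operations: adding or looking up an item only requires finding the nodes along the corresponding path in the tree.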
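To make the path idea concrete, here is a minimal sketch of a binary-tree cache for non-negative integers. The class name `BitTrieCache`, the list-based node layout, and the end-of-item flag are assumptions made for this illustration; the article does not show ByteCache's actual storage layout, and this toy version makes no attempt to match its memory footprint.

```python
class BitTrieCache:
    """Toy cache that stores non-negative integers as paths in a binary tree."""

    def __init__(self):
        # Each node is [child_for_bit_0, child_for_bit_1, is_end_of_item].
        self._root = [None, None, False]

    @staticmethod
    def _bits(value):
        # Least significant bit first; 0 is represented as the single bit 0.
        if value < 0:
            raise ValueError("only non-negative integers are supported")
        bits = [value & 0x1]
        value >>= 1
        while value:
            bits.append(value & 0x1)
            value >>= 1
        return bits

    def add(self, value):
        # Walk the path for this value, creating missing nodes on the way.
        node = self._root
        for bit in self._bits(value):
            if node[bit] is None:
                node[bit] = [None, None, False]
            node = node[bit]
        node[2] = True  # mark that an item ends at this node

    def __contains__(self, value):
        # The value is cached only if the whole path exists and ends here.
        node = self._root
        for bit in self._bits(value):
            node = node[bit]
            if node is None:
                return False
        return node[2]


cache = BitTrieCache()
for v in (0, 1, 3):
    cache.add(v)

print(3 in cache)  # True
print(2 in cache)  # False
```

The end-of-item flag is what lets values of different bit lengths share a prefix without being confused with each other: 0 and 2 both start with a 0 bit, but only the node for 0 is marked as an item.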

Key code of the algorithm:

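def _read_bit(self, data, position):
    # Return the bit of data at the given position (0 = least significant).
    return (data >> position) & 0x1

def _write_bit(self, data, position, value):
    # Set the bit at position when value is 1; a value of 0 leaves data unchanged.
    return data | (value << position)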
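As a quick check of what these two helpers do, the sketch below restates them as standalone functions (without `self`; the names `read_bit` and `write_bit` are just for this example) and exercises them on a small integer. Note that because the write is an OR, it can set a bit but never clear one.

```python
def read_bit(data, position):
    # Shift the wanted bit down to position 0 and mask everything else off.
    return (data >> position) & 0x1


def write_bit(data, position, value):
    # OR the value into place; existing 1 bits are never cleared.
    return data | (value << position)


flags = 0
flags = write_bit(flags, 0, 1)
flags = write_bit(flags, 3, 1)

print(bin(flags))                       # 0b1001
print(read_bit(flags, 3))               # 1
print(read_bit(flags, 1))               # 0
print(write_bit(flags, 3, 0) == flags)  # True: writing 0 does not clear the bit
```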

How does it perform in practice?

Compared with Python's built-in set, the test results are as follows (storing integers, variable-length strings, and fixed-length strings):

Please select test mode:4
Please enter test times:1000
====================================================================================================
TEST RESULT::
====================================================================================================
                set()                 bytecache
items           1000                  1000
add(s)          0.0                   0.0209999084473
read(s)         0.0                   0.0149998664856
hits            1000                  1000
missed          0                     0
size            32992                 56
add(s/item)     0.0                   2.09999084473e-05
read(s/item)    0.0                   2.09999084473e-05
====================================================================================================
size (set / bytecache): 589.142857143
add time (bytecache / set): N/A
read time (bytecache / set): N/A
====================================================================================================
...test fixed length & int data end...
====================================================================================================
TEST RESULT::
====================================================================================================
                set()                 bytecache
items           1000                  1000
add(s)          0.00100016593933      6.1740000248
read(s)         0.0                   7.21300005913
hits            999                   999
missed          0                     0
size            32992                 56
add(s/item)     1.00016593933e-06     0.0061740000248
read(s/item)    0.0                   0.0061740000248
====================================================================================================
size (set / bytecache): 589.142857143
add time (bytecache / set): 6172.97568534
read time (bytecache / set): N/A
====================================================================================================
...test mutative length & string data end...
====================================================================================================
TEST RESULT::
====================================================================================================
                set()                 bytecache
items           1000                  1000
add(s)          0.0                   0.513999938965
read(s)         0.0                   0.421000003815
hits            999                   999
missed          0                     0
size            32992                 56
add(s/item)     0.0                   0.000513999938965
read(s/item)    0.0                   0.000513999938965
====================================================================================================
size (set / bytecache): 589.142857143
add time (bytecache / set): N/A
read time (bytecache / set): N/A
====================================================================================================
...test Fixed length(64) & string data end...

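In these tests memory consumption stays well under control: the reported size of ByteCache remains at 56 bytes throughout. The memory used by set is not huge either, but it is still far larger than ByteCache's (about 589 times larger in this run).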
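A comparison of this kind can be reproduced with a small harness along the following lines. This is only a sketch, not the author's test script: the function name `benchmark` and the use of `time.perf_counter` and `sys.getsizeof` are assumptions. The many 0.0 readings in the output above suggest a coarse timer, which `time.perf_counter` avoids, and `sys.getsizeof` reports only the shallow size of the container object.

```python
import sys
import time


def benchmark(cache, items):
    """Time add and lookup for any container that has add() and supports `in`."""
    start = time.perf_counter()
    for item in items:
        cache.add(item)
    add_seconds = time.perf_counter() - start

    start = time.perf_counter()
    hits = sum(1 for item in items if item in cache)
    read_seconds = time.perf_counter() - start

    return {
        "items": len(items),
        "add(s)": add_seconds,
        "read(s)": read_seconds,
        "hits": hits,
        "missed": len(items) - hits,
        # Shallow size of the container object only, not of what it references.
        "size": sys.getsizeof(cache),
        "add(s/item)": add_seconds / len(items),
        "read(s/item)": read_seconds / len(items),
    }


# Demonstrated with the built-in set; pass any other cache object the same way.
result = benchmark(set(), list(range(1000)))
for name, value in result.items():
    print(name, value)
```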

However, the biggest problem with caching via ByteCache is that the time cost becomes alarming once it meets very large random data. Take the following test result for caching random-length strings:

Please select test mode:2
Please enter test times:2000
====================================================================================================
TEST RESULT::
====================================================================================================
                set()                 bytecache
items           2000                  2000
add(s)          0.00400018692017      31.3759999275
read(s)         0.0                   44.251999855
hits            1999                  1999
missed          0                     0
size            131296                56
add(s/item)     2.00009346008e-06     0.0156879999638
read(s/item)    0.0                   0.0156879999638
====================================================================================================
size (set / bytecache): 2344.57142857
add time (bytecache / set): 7843.63344856
read time (bytecache / set): N/A
====================================================================================================
...test mutative length & string data end...

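With 2000 items, adding took about 31 s and lookup about 44 s, while set stays close to 0. That works out to roughly 16 ms per add and 22 ms per lookup on average (31.4 s / 2000 and 44.3 s / 2000).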

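That said, as mentioned at the start, time pressure is low in Scrapy, so this cost is not too painful there. Moreover, in Scrapy the cache is generally used for hashed data, and an important property of such data is its fixed length. Fixed-length data performs reasonably well with this algorithm: at a length of 64, the average is only about 0.5 ms per item. In return, it can free a considerable amount of memory when caching large amounts of data.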
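As an illustration of feeding fixed-length keys to a cache, the sketch below hashes each URL before storing it. The article does not say which hash is used; SHA-256 is assumed here only because its hex digest is always 64 characters, matching the fixed-length(64) test. The helpers `fingerprint` and `is_new` are hypothetical names, and a plain set stands in for the cache.

```python
import hashlib


def fingerprint(url):
    """Return a fixed-length key: a SHA-256 hex digest is always 64 characters."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


# Stand-in for the cache; any container with add() and `in` support would do.
seen = set()


def is_new(url):
    """Return True the first time a URL is seen, False on every repeat."""
    key = fingerprint(url)
    if key in seen:
        return False
    seen.add(key)
    return True


print(len(fingerprint("http://example.com/")))  # 64
print(is_new("http://example.com/a"))           # True
print(is_new("http://example.com/a"))           # False
```

Hashing first means every key has the same length no matter how long the original URL is, which is the case the summary below recommends.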

If there is a better caching algorithm that can take the speed to the next level, I would be very glad to see it.

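Summary:

1. The goal of this method is to trade time for space; do not use it where time is critical.

2. It is well suited to large volumes of fixed-length data where each item is small.

3. Following from 2, it is strongly discouraged for large volumes of variable-length data where each item is large.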

