
Tuning and Handling the Redis Memory Fragmentation Ratio

Updated: 2024-09-30 10:22:18  Author: 小馬運維的一天

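A Redis cluster raised an alert because its memory fragmentation ratio exceeded 1.5. Analysis showed that the fragmentation has both an internal cause (the memory allocation mechanism of the operating system) and an external cause (the way Redis is operated on). Redis's built-in defragmentation mechanism can effectively lower the ratio, but it may affect performance. It is advisable to diagnose memory usage with the MEMORY command and to configure the parameters sensibly.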

1. Background

In production, a Redis Cluster triggered a memory fragmentation alert (fragmentation ratio > 1.5). The cluster has six nodes spread across three hosts in a three-master, three-replica layout, running Redis 6.2.X.

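2. How Redis memory fragmentation forms

Memory fragmentation has two main causes:

Internal cause: the memory allocation mechanism of the operating system.
External cause: the workload characteristics of Redis.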

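Internal cause:

A memory allocator hands out memory in fixed-size units rather than exactly the amount requested. On Linux, for example, the default page size is 4KB, and it becomes 2MB once huge pages are enabled.

Redis uses the jemalloc allocator, which allocates from a series of fixed size classes. For example, when Redis needs 20B to store a piece of data, jemalloc allocates 32B:

If the application then writes another 5B of data, no additional space needs to be requested.

If the application instead writes another 20B of data, additional space must be requested, and there is now a risk of fragmentation (of the 32B allocated earlier, the 12B left unused becomes fragmentation).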
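The rounding described above can be sketched in a few lines of Python. This is only an illustration: the `SIZE_CLASSES` list is a simplified, assumed subset of small size classes, not jemalloc's full table, and the helper names are hypothetical.

```python
import bisect

# Assumption: a simplified subset of small size classes (bytes), for illustration only.
SIZE_CLASSES = [8, 16, 32, 48, 64, 80, 96, 112, 128]

def allocated_size(request: int) -> int:
    """Return the smallest illustrated size class that can hold `request` bytes."""
    idx = bisect.bisect_left(SIZE_CLASSES, request)
    if idx == len(SIZE_CLASSES):
        raise ValueError("request is larger than the illustrated size classes")
    return SIZE_CLASSES[idx]

def unused_bytes(request: int) -> int:
    """Bytes inside the allocated block that the request itself does not use."""
    return allocated_size(request) - request

if __name__ == "__main__":
    # A 20B request lands in the 32B class, leaving 12B unused.
    print(allocated_size(20), unused_bytes(20))
    # Growing the value by 5B (25B total) still fits in the same 32B block.
    print(allocated_size(25))
```

Under these assumptions a 20B request occupies a 32B block, and the remaining 12B is usable only if the same value grows into it.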

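External cause:

Suppose a Redis instance holds key-value pairs of different sizes. Given how the allocator works, they may be placed in contiguous memory regions of different sizes.

In addition, the key-value pairs are subject to different operations: inserts, deletes, updates, and reads.

As shown in the figure:

In the figure above, the white areas are memory fragments. Key-value pairs of varying sizes, combined with update and delete operations, produce the fragmentation.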
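A toy model can make the external cause concrete. The sketch below is not how Redis or jemalloc lays out memory; it assumes a hypothetical 16-slot region and made-up key names, and only shows how deleting differently sized entries leaves holes that are free in total but not contiguous.

```python
def largest_free_run(slots):
    """Length of the longest run of consecutive free (None) slots."""
    best = run = 0
    for slot in slots:
        run = run + 1 if slot is None else 0
        best = max(best, run)
    return best

def simulate():
    """Fill a 16-slot region with keys of mixed sizes, then delete the small ones."""
    memory = [None] * 16
    layout = [("a", 2), ("b", 4), ("c", 2), ("d", 4), ("e", 4)]  # (key, size in slots)
    pos = 0
    for key, size in layout:
        memory[pos:pos + size] = [key] * size
        pos += size
    # Deleting "a" and "c" frees 4 slots, but as two separate holes of 2 slots each.
    return [None if slot in ("a", "c") else slot for slot in memory]

if __name__ == "__main__":
    mem = simulate()
    print("free slots:", mem.count(None))
    print("largest contiguous hole:", largest_free_run(mem))
```

In this model four slots are free, yet a new 4-slot value cannot be placed without moving existing data, which is the situation the white areas in the figure represent.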

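3. Cleaning up memory fragmentation

Before cleaning up, first determine whether fragmentation exists:

After connecting to Redis, run the INFO MEMORY command. The value of mem_fragmentation_ratio is the degree of memory fragmentation.

mem_fragmentation_ratio is the current memory fragmentation ratio of the Redis instance. It is calculated as:

mem_fragmentation_ratio = used_memory_rss / used_memory

used_memory_rss: the physical memory the operating system has actually allocated to Redis.
used_memory: the space Redis has actually requested in order to store data.

Two reference ranges apply to mem_fragmentation_ratio:

mem_fragmentation_ratio ∈ (1, 1.5]: within a reasonable range; no action is needed for now.
mem_fragmentation_ratio ∈ (1.5, +∞): the fragmentation ratio exceeds 50%, and measures should be taken to lower it.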
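As a minimal sketch, the ratio and the two ranges above can be computed from the text of an INFO MEMORY reply. The `SAMPLE` values are hypothetical, and the function names are chosen here for illustration; only the `used_memory` and `used_memory_rss` field names come from Redis.

```python
def parse_info(text: str) -> dict:
    """Parse `key:value` lines of INFO output, skipping blank lines and # headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value
    return fields

def fragmentation_ratio(info: dict) -> float:
    """mem_fragmentation_ratio = used_memory_rss / used_memory."""
    return int(info["used_memory_rss"]) / int(info["used_memory"])

def classify(ratio: float) -> str:
    """Map a ratio onto the two reference ranges from the text."""
    if ratio > 1.5:
        return "high"
    if ratio > 1:
        return "ok"
    return "outside-reference-ranges"

# Hypothetical INFO MEMORY excerpt, not taken from the cluster in this article.
SAMPLE = """# Memory
used_memory:1000000
used_memory_rss:1600000
"""

if __name__ == "__main__":
    ratio = fragmentation_ratio(parse_info(SAMPLE))
    print(ratio, classify(ratio))
```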

So how is fragmentation cleaned up? (Restarting the instance is generally not an option, since production rarely permits it.) Since version 4.0, Redis has provided a built-in memory defragmentation mechanism.

# Enable active defragmentation
activedefrag yes
# Defragmentation starts only when both active-defrag-ignore-bytes and active-defrag-threshold-lower are satisfied, and stops as soon as either one is no longer satisfied
# Minimum amount of fragmentation waste to start active defrag: begin once fragmented bytes reach 100MB
active-defrag-ignore-bytes 100mb
# Minimum percentage of fragmentation to start active defrag: begin once fragmentation reaches 10%
active-defrag-threshold-lower 10
# Once fragmentation reaches 100%, defragment with maximum effort
active-defrag-threshold-upper 100
# Minimal defrag effort as a CPU percentage: at least 5% of CPU time, so that cleanup can make progress
active-defrag-cycle-min 5
# Maximal defrag effort as a CPU percentage: at most 75% of CPU time, so that defrag does not block the Redis main thread
active-defrag-cycle-max 75

The parameters above can be adjusted to match host resources and the application scenario.
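How the five parameters interact can be sketched as a simplified model. The code below assumes that defrag starts only when both the byte and percentage thresholds are met, and that the CPU effort scales linearly between cycle-min and cycle-max as fragmentation moves from threshold-lower to threshold-upper. It is an approximation for reasoning about the settings, not Redis's actual implementation, and the function names are hypothetical.

```python
MB = 1024 * 1024

def should_defrag(frag_bytes, frag_pct, ignore_bytes=100 * MB, threshold_lower=10):
    """Both conditions must hold; if either fails, defrag does not run."""
    return frag_bytes >= ignore_bytes and frag_pct >= threshold_lower

def cpu_effort(frag_pct, lower=10, upper=100, cycle_min=5, cycle_max=75):
    """Assumed linear scaling of CPU effort, clamped to [cycle_min, cycle_max]."""
    effort = cycle_min + (frag_pct - lower) * (cycle_max - cycle_min) / (upper - lower)
    return max(cycle_min, min(cycle_max, effort))

if __name__ == "__main__":
    # 300MB wasted at 55% fragmentation: both thresholds are met.
    print(should_defrag(300 * MB, 55), cpu_effort(55))
    # 50MB wasted at 55% fragmentation: below active-defrag-ignore-bytes.
    print(should_defrag(50 * MB, 55))
```

Under this model, raising active-defrag-cycle-max speeds up cleanup at high fragmentation but leaves less CPU for serving commands, which is the performance trade-off to keep in mind when tuning.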

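Note also that while this automatic mechanism brings the benefit of cleaning up fragmentation, it necessarily comes at a cost, namely its impact on performance.

The built-in defragmentation mechanism can be used only if Redis was compiled with the jemalloc allocator, that is, with the build option MALLOC=jemalloc: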

make MALLOC=jemalloc

4. Extras

In production we can also use Redis's built-in commands to diagnose memory usage, so that problems can be addressed promptly.

MEMORY help lists the available subcommands:

127.0.0.1:6379> MEMORY help
1) MEMORY <subcommand> arg arg ... arg. Subcommands are:
2) DOCTOR - Return memory problems reports.
3) MALLOC-STATS -- Return internal statistics report from the memory allocator.
4) PURGE -- Attempt to purge dirty pages for reclamation by the allocator.
5) STATS -- Return information about the memory usage of the server.
6) USAGE <key> [SAMPLES <count>] -- Return memory in bytes used by <key> and its value. Nested values are sampled up to <count> times (default: 5).

MEMORY STATS

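Redis memory usage covers not only all the key-value data but also the metadata describing those key-value pairs, as well as the cost of many management features such as persistence and replication. MEMORY STATS gives a clearer picture of how Redis is using memory.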

127.0.0.1:6379> MEMORY STATS
 1) "peak.allocated"
 2) (integer) 13749176
 3) "total.allocated"
 4) (integer) 7308872
 5) "startup.allocated"
 6) (integer) 791424
 7) "replication.backlog"
 8) (integer) 0
 9) "clients.slaves"
10) (integer) 0
11) "clients.normal"
12) (integer) 2587994
13) "aof.buffer"
14) (integer) 0
15) "lua.caches"
16) (integer) 1912
17) "db.0"
18) 1) "overhead.hashtable.main"
    2) (integer) 285216
    3) "overhead.hashtable.expires"
    4) (integer) 1568
19) "overhead.total"
20) (integer) 3668114
21) "keys.count"
22) (integer) 5492
23) "keys.bytes-per-key"
24) (integer) 1186
25) "dataset.bytes"
26) (integer) 3640758
27) "dataset.percentage"
28) "55.861713409423828"
29) "peak.percentage"
30) "53.158615112304688"
31) "allocator.allocated"
32) (integer) 7323560
33) "allocator.active"
34) (integer) 8429568
35) "allocator.resident"
36) (integer) 12427264
37) "allocator-fragmentation.ratio"
38) "1.1510205268859863"
39) "allocator-fragmentation.bytes"
40) (integer) 1106008
41) "allocator-rss.ratio"
42) "1.4742468595504761"
43) "allocator-rss.bytes"
44) (integer) 3997696
45) "rss-overhead.ratio"
46) "0.65557020902633667"
47) "rss-overhead.bytes"
48) (integer) -4280320
49) "fragmentation"
50) "1.1209555864334106"
51) "fragmentation.bytes"
52) (integer) 879088

The 15 items below are explained in turn. All memory figures are in bytes.

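1. peak.allocated

The maximum amount of memory Redis has used since startup.

2. total.allocated

The total amount of memory currently in use.

3. startup.allocated

The memory Redis uses during startup initialization. Many readers wonder why their Redis instance already occupies tens of MB of memory right after startup, before doing anything.

This is because Redis stores more than key-value pairs. It has other memory costs such as shared objects, replication, persistence, and db metadata, which the items below describe in detail.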

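4. replication.backlog

The memory used by the replication backlog, 1MB by default (repl-backlog-size). The backlog only matters when a slave reconnects after a disconnection; replication itself does not depend on it.

5. clients.slaves

The read and write buffers of all slaves in replication, including the memory used by the output-buffer (the output buffer) and the querybuf (the input buffer). A brief introduction to replication:

In each event loop iteration, Redis first appends all changes made to the database to the slave's output-buffer, then sends them to the slave in one batch when the iteration ends.

Some data lag between master and slave is therefore unavoidable. If the connection between them breaks, a full synchronization on reconnect would guarantee consistency, but that is clearly inefficient.

The backlog is designed for exactly this: the master caches part of the incremental replication stream in the backlog, and if the slave's offset is still within the backlog on reconnect, only the data after that offset needs to be sent, avoiding the cost of a full synchronization.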
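The reconnect decision described above can be sketched as an offset check. This is a simplified model with hypothetical names: it looks only at offsets and the backlog size, and ignores the other checks a real master performs.

```python
def can_partial_resync(slave_offset: int, master_offset: int, backlog_size: int) -> bool:
    """True if the data the slave is missing is still held in the backlog.

    Assumption: the backlog keeps the most recent `backlog_size` bytes of the
    replication stream, ending at `master_offset`.
    """
    oldest_offset = max(0, master_offset - backlog_size)
    return oldest_offset <= slave_offset <= master_offset

if __name__ == "__main__":
    backlog = 1024 * 1024  # 1MB backlog
    # Slave is 500,000 bytes behind: still covered, incremental sync is possible.
    print(can_partial_resync(4_500_000, 5_000_000, backlog))
    # Slave is 2,000,000 bytes behind: already overwritten, full sync is needed.
    print(can_partial_resync(3_000_000, 5_000_000, backlog))
```

The sketch also shows why a larger backlog tolerates longer disconnections under heavy write traffic, at the price of a larger replication.backlog figure.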

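6. clients.normal

The read and write buffers of all clients other than slaves.

Sometimes a client does not read its replies promptly, so its output-buffer backs up and consumes too much memory. This can be limited with the client-output-buffer-limit configuration option: once the threshold is exceeded, Redis actively closes the connection to free the memory. The same applies to slaves.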
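client-output-buffer-limit takes a hard limit, a soft limit, and a soft-limit duration per client class. The following sketch models that policy under the assumption that a limit of 0 means "disabled"; the class and function names are hypothetical, and the example values (256MB hard, 64MB soft, 60 seconds) are illustrative settings for slave connections.

```python
from dataclasses import dataclass

MB = 1024 * 1024

@dataclass
class BufferLimit:
    hard_bytes: int    # disconnect as soon as the buffer reaches this size (0 = disabled)
    soft_bytes: int    # disconnect if the buffer stays at or above this size...
    soft_seconds: int  # ...for this many seconds (soft_bytes 0 = disabled)

def should_disconnect(limit: BufferLimit, buffer_bytes: int, seconds_over_soft: int) -> bool:
    """Apply the hard limit first, then the soft limit with its duration."""
    if limit.hard_bytes and buffer_bytes >= limit.hard_bytes:
        return True
    if (limit.soft_bytes and buffer_bytes >= limit.soft_bytes
            and seconds_over_soft >= limit.soft_seconds):
        return True
    return False

if __name__ == "__main__":
    slave_limit = BufferLimit(hard_bytes=256 * MB, soft_bytes=64 * MB, soft_seconds=60)
    print(should_disconnect(slave_limit, 300 * MB, 0))   # hard limit reached
    print(should_disconnect(slave_limit, 100 * MB, 10))  # over soft limit, but not long enough
    print(should_disconnect(slave_limit, 100 * MB, 60))  # over soft limit for 60 seconds
```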

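7. aof.buffer

The sum of the buffer used for AOF persistence and the buffer produced during an AOF rewrite. If appendonly is disabled, this item is always 0:

Redis does not persist each write immediately. Instead it buffers all writes within one event loop iteration and persists them to disk when the iteration ends.

The memory used to buffer incremental data during an AOF rewrite is used only while the rewrite is running.

This item therefore grows in proportion to write traffic.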

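8. db.0

The memory used by the metadata of each Redis db. Only db0 is in use here, so only db0 is printed; other dbs produce corresponding entries when they are used.

The db metadata consists of the following three parts:

a) A Redis db is a hash table, so first there is the memory used by the hash table itself (Redis uses chained hashing, and the table stores the head pointer of every chain);
b) Every key-value pair has a dictEntry recording the relationship, so the metadata includes the memory used by all dictEntry structures in the db;
c) Redis uses a redisObject to describe the data type of each value (string, list, hash, set, zset), so the space taken by redisObject is also counted as metadata.

overhead.hashtable.main:

The db metadata is the sum of the three parts above:

hashtable + dictEntry + redisObject

overhead.hashtable.expires:

Redis does not store a key's expiration time together with its value; it keeps a separate hash table for it. Since the expires table records key-expire information, no redisObject is needed to describe a value, so its metadata has one part fewer:

hashtable + dictEntry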
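The two formulas can be tried against the sample output with a rough sketch. The structure sizes below are assumptions for a 64-bit build (8-byte pointers, 24-byte dictEntry, 16-byte redisObject), and the bucket count is assumed to be the next power of two at or above the key count; none of these figures are stated in the article, and a table that is mid-rehash would differ.

```python
# Assumed sizes for a 64-bit build; not taken from the article.
POINTER_SIZE = 8
DICT_ENTRY_SIZE = 24
REDIS_OBJECT_SIZE = 16

def next_power_of_two(n: int) -> int:
    """Smallest power of two >= n, with an assumed minimum table size of 4."""
    size = 4
    while size < n:
        size *= 2
    return size

def main_overhead(keys: int) -> int:
    """hashtable + dictEntry + redisObject."""
    buckets = next_power_of_two(keys)
    return buckets * POINTER_SIZE + keys * (DICT_ENTRY_SIZE + REDIS_OBJECT_SIZE)

def expires_overhead(keys_with_ttl: int) -> int:
    """hashtable + dictEntry (no redisObject for the expire time)."""
    buckets = next_power_of_two(keys_with_ttl)
    return buckets * POINTER_SIZE + keys_with_ttl * DICT_ENTRY_SIZE

if __name__ == "__main__":
    # keys.count in the sample output is 5492.
    print(main_overhead(5492))
```

With these assumed sizes, 5492 keys give 8192 * 8 + 5492 * 40 = 285216 bytes, which agrees with the overhead.hashtable.main value in the sample output.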

9. overhead.total

The sum of items 3 through 8: startup.allocated + replication.backlog + clients.slaves + clients.normal + aof.buffer + dbX. In the sample output above, lua.caches is also part of this sum.
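As a quick check, the sum can be reproduced from the sample MEMORY STATS output. The dictionary below copies the relevant values from that output; the function name is hypothetical.

```python
# Values copied from the sample MEMORY STATS output.
SAMPLE_STATS = {
    "startup.allocated": 791424,
    "replication.backlog": 0,
    "clients.slaves": 0,
    "clients.normal": 2587994,
    "aof.buffer": 0,
    "lua.caches": 1912,
    "db.0": {
        "overhead.hashtable.main": 285216,
        "overhead.hashtable.expires": 1568,
    },
}

def overhead_total(stats: dict) -> int:
    """Add up the flat overhead fields plus the per-db hashtable overheads."""
    total = 0
    for key, value in stats.items():
        if key.startswith("db."):
            total += sum(value.values())
        else:
            total += value
    return total

if __name__ == "__main__":
    print(overhead_total(SAMPLE_STATS))  # 3668114, the overhead.total in the sample
```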

10. dataset.bytes

The memory used by all data, that is, total.allocated - overhead.total: current memory usage minus management overhead.

11. dataset.percentage

The share of memory taken by data. The denominator is not total.allocated directly; the startup initialization memory is subtracted first:

100 * dataset.bytes / (total.allocated - startup.allocated)
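Plugging the sample output into the two formulas above gives a small worked check. The constants are copied from the sample MEMORY STATS output; the function names are hypothetical.

```python
# Values copied from the sample MEMORY STATS output.
TOTAL_ALLOCATED = 7308872
STARTUP_ALLOCATED = 791424
OVERHEAD_TOTAL = 3668114

def dataset_bytes(total: int, overhead: int) -> int:
    """dataset.bytes = total.allocated - overhead.total."""
    return total - overhead

def dataset_percentage(total: int, startup: int, overhead: int) -> float:
    """100 * dataset.bytes / (total.allocated - startup.allocated)."""
    return 100 * dataset_bytes(total, overhead) / (total - startup)

if __name__ == "__main__":
    print(dataset_bytes(TOTAL_ALLOCATED, OVERHEAD_TOTAL))  # 3640758
    print(round(dataset_percentage(TOTAL_ALLOCATED, STARTUP_ALLOCATED, OVERHEAD_TOTAL), 2))
```

The results match the dataset.bytes value (3640758) and, to the precision Redis reports, the dataset.percentage value (about 55.86) in the sample.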

12. keys.count

The total number of keys currently stored in Redis.

13. keys.bytes-per-key

The average memory per key. Intuitively this would be dataset.bytes divided by keys.count, but Redis instead spreads the management overhead across the keys as well:

(total.allocated - startup.allocated) / keys.count

14. peak.percentage

The ratio of current memory usage to the historical peak.
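Items 13 and 14 can be checked against the sample output in the same way. The constants are copied from the sample; the function names are hypothetical, and integer division is assumed for bytes-per-key because the sample reports a whole number.

```python
# Values copied from the sample MEMORY STATS output.
PEAK_ALLOCATED = 13749176
TOTAL_ALLOCATED = 7308872
STARTUP_ALLOCATED = 791424
KEYS_COUNT = 5492

def bytes_per_key(total: int, startup: int, keys: int) -> int:
    """(total.allocated - startup.allocated) / keys.count, truncated to an integer."""
    return (total - startup) // keys if keys else 0

def peak_percentage(total: int, peak: int) -> float:
    """Current usage as a percentage of the historical peak."""
    return 100 * total / peak

if __name__ == "__main__":
    print(bytes_per_key(TOTAL_ALLOCATED, STARTUP_ALLOCATED, KEYS_COUNT))  # 1186
    print(round(peak_percentage(TOTAL_ALLOCATED, PEAK_ALLOCATED), 2))
```

Both results agree with the sample: 1186 bytes per key and a peak.percentage of about 53.16.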

15. fragmentation

The memory fragmentation ratio.

MEMORY USAGE

Usage: MEMORY USAGE <key> [SAMPLES <count>]

The command takes few arguments, and as the name suggests it estimates the memory used by a given key. SAMPLES is optional and defaults to 5. Taking a hash as an example:

127.0.0.1:6379> HGETALL 9527
1) "name"
2) "zhouxingxing"
3) "age"
4) "50"
5) "city"
6) "hongkong"
127.0.0.1:6379> MEMORY USAGE 9527
(integer) 101

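As with overhead.hashtable.main in the previous section, the hash's metadata memory is computed first, including the size of the hash table and the memory of all its dictEntry structures.

Unlike overhead.hashtable.main, the key and value in each dictEntry are both strings, so there is no extra redisObject cost.

To estimate the size of the actual data, Redis does not walk every element. It uses sampling: it takes up to SAMPLES key-value pairs, computes their average memory usage, and multiplies by the number of pairs.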
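The sampling estimate can be sketched as follows. This is an illustration of the idea only: the function name is hypothetical, and taking the first `samples` elements is an assumption standing in for whatever order Redis actually visits them in.

```python
def estimate_total_size(element_sizes, samples=5):
    """Estimate total size as (average of sampled sizes) * (number of elements)."""
    if not element_sizes:
        return 0
    sampled = element_sizes[:samples]  # assumption: sample the first `samples` elements
    average = sum(sampled) / len(sampled)
    return int(average * len(element_sizes))

if __name__ == "__main__":
    uniform = [10] * 100
    skewed = [10] * 5 + [1000] * 5
    print(estimate_total_size(uniform))             # uniform sizes: the estimate is exact
    print(estimate_total_size(skewed, samples=5))   # small sample misses the large values
    print(estimate_total_size(skewed, samples=10))  # sampling everything gives the true sum
```

With uniform element sizes a small sample is accurate; with skewed sizes the estimate can be far off unless more elements are sampled, which is the accuracy-versus-blocking trade-off behind the SAMPLES option.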

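Computing the exact memory usage would require visiting every element, which would block Redis when there are many elements, so choose the SAMPLES value sensibly.

Other data structures are handled in much the same way as hash and are not covered again here.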

MEMORY DOCTOR

This subcommand gives the Redis author's advice on memory usage. It produces different analyses for different running states.

First, the cases where nothing is wrong.

Running well:

Hi Sam, I can't find any memory issue in your instance. I can only account for what occurs on this base.

The Redis data volume is very small, so there is no advice yet:

Hi Sam, this instance is empty or is using very little memory, my issues detector can't be used in these conditions. Please, leave for your mission on Earth and fill it with some data. The new Sam and I will be back to our programming as soon as I finished rebooting.

Peak memory usage is more than 1.5 times the current usage. The fragmentation ratio may be high in this case and deserves attention:

Sam, I detected a few issues in this Redis instance memory implants:

* Peak memory: In the past this instance used more than 150% the memory that is currently using. The allocator is normally not able to release memory after a peak, so you can expect to see a big fragmentation ratio, however this is actually harmless and is only due to the memory peak, and if the Redis instance Resident Set Size (RSS) is currently bigger than expected, the memory will be used as soon as you fill the Redis instance with more data. If the memory peak was only occasional and you want to try to reclaim memory, please try the MEMORY PURGE command, otherwise the only other option is to shutdown and restart the instance.

I'm here to keep you safe, Sam. I want to help you.

The memory fragmentation ratio is too high, above 1.4, and deserves attention:

High fragmentation: This instance has a memory fragmentation greater than 1.4 (this means that the Resident Set Size of the Redis process is much larger than the sum of the logical allocations Redis performed). This problem is usually due either to a large peak memory (check if there is a peak memory entry above in the report) or may result from a workload that causes the allocator to fragment memory a lot. If the problem is a large peak memory, then there is no issue. Otherwise, make sure you are using the Jemalloc allocator and not the default libc malloc.

The average buffer memory per slave exceeds 10MB. The cause may be heavy write traffic on the master, insufficient network bandwidth for replication, or a slave that is slow to process data:

Big slave buffers: The slave output buffers in this instance are greater than 10MB for each slave (on average). This likely means that there is some slave instance that is struggling receiving data, either because it is too slow or because of networking issues. As a result, data piles on the master output buffers. Please try to identify what slave is not receiving data correctly and why. You can use the INFO output in order to check the slaves delays and the CLIENT LIST command to check the output buffers of each slave.

The average buffer memory of normal clients exceeds 200KB. The cause may be improper use of pipelines, or Pub/Sub clients that do not consume messages fast enough:

Big client buffers: The clients output buffers in this instance are greater than 200K per client (on average). This may result from different causes, like Pub/Sub clients subscribed to channels bot not receiving data fast enough, so that data piles on the Redis instance output buffer, or clients sending commands with large replies or very large sequences of commands in the same pipeline. Please use the CLIENT LIST command in order to investigate the issue if it causes problems in your instance, or to understand better why certain clients are using a big amount of memory.
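The four thresholds quoted in this section can be collected into a small checker. This is a sketch built only from the thresholds named above (150% peak, 1.4 fragmentation, 10MB per slave, 200KB per client); the function and label names are hypothetical, and the real MEMORY DOCTOR may apply further conditions.

```python
MB = 1024 * 1024
KB = 1024

def diagnose(peak_allocated, total_allocated, fragmentation,
             avg_slave_buffer, avg_client_buffer):
    """Return the list of issues whose threshold, as quoted in the text, is exceeded."""
    issues = []
    if peak_allocated > total_allocated * 1.5:
        issues.append("peak-memory")
    if fragmentation > 1.4:
        issues.append("high-fragmentation")
    if avg_slave_buffer > 10 * MB:
        issues.append("big-slave-buffers")
    if avg_client_buffer > 200 * KB:
        issues.append("big-client-buffers")
    return issues

if __name__ == "__main__":
    # peak.allocated, total.allocated and fragmentation from the sample MEMORY STATS output;
    # the buffer averages are hypothetical because the sample has no client counts.
    print(diagnose(13749176, 7308872, 1.1209555864334106, 0, 0))
```

For the sample instance only the peak-memory condition holds, since 13749176 is more than 1.5 times 7308872 while the fragmentation value of about 1.12 stays below 1.4.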