
A Detailed Analysis of a Redis Cluster Failure

 Updated: 2017-10-13 16:50:14   Author: 帶魚兄
This article walks through a Redis cluster failure in detail, from the logs to the root cause. It is shared here as a reference for anyone who runs into a similar problem.

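Symptoms:

The business layer reported that Redis queries were failing.

Cluster composition:

3 masters and 3 replicas (called slaves in the logs below), with 8 GB of data per node.

Machine layout:

All in the same rack: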

xx.x.xxx.199
xx.x.xxx.200
xx.x.xxx.201

redis-server process status:

Running the command ps -eo pid,lstart | grep $pid

showed that the processes had been running continuously for 3 months.

Node states in the cluster before the failure:

xx.x.xxx.200:8371(bedab2c537fe94f8c0363ac4ae97d56832316e65) master
xx.x.xxx.199:8373(792020fe66c00ae56e27cd7a048ba6bb2b67adb6) slave
xx.x.xxx.201:8375(5ab4f85306da6d633e4834b4d3327f45af02171b) master
xx.x.xxx.201:8372(826607654f5ec81c3756a4a21f357e644efe605a) slave
xx.x.xxx.199:8370(462cadcb41e635d460425430d318f2fe464665c5) master
xx.x.xxx.200:8374(1238085b578390f3c8efa30824fd9a4baba10ddf) slave
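
The listing above can be grouped by machine to see how masters and replicas are spread over the three hosts. The following is a minimal Python sketch that assumes the `host:port(node-id) role` layout used in this article (it is not raw `CLUSTER NODES` output); `parse_nodes` and `roles_by_host` are hypothetical helper names.

```python
import re
from collections import defaultdict

# Listing copied from the article: host:port(node-id) role
NODE_LISTING = """\
xx.x.xxx.200:8371(bedab2c537fe94f8c0363ac4ae97d56832316e65) master
xx.x.xxx.199:8373(792020fe66c00ae56e27cd7a048ba6bb2b67adb6) slave
xx.x.xxx.201:8375(5ab4f85306da6d633e4834b4d3327f45af02171b) master
xx.x.xxx.201:8372(826607654f5ec81c3756a4a21f357e644efe605a) slave
xx.x.xxx.199:8370(462cadcb41e635d460425430d318f2fe464665c5) master
xx.x.xxx.200:8374(1238085b578390f3c8efa30824fd9a4baba10ddf) slave
"""

LINE_RE = re.compile(
    r"^(?P<host>[^:\s]+):(?P<port>\d+)\((?P<node_id>[0-9a-f]+)\)\s+"
    r"(?P<role>master|slave)$"
)


def parse_nodes(text):
    """Parse the article's node listing into a list of dicts."""
    nodes = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            node = match.groupdict()
            node["port"] = int(node["port"])
            nodes.append(node)
    return nodes


def roles_by_host(nodes):
    """Map each host to a {port: role} dict."""
    layout = defaultdict(dict)
    for node in nodes:
        layout[node["host"]][node["port"]] = node["role"]
    return dict(layout)


nodes = parse_nodes(NODE_LISTING)
layout = roles_by_host(nodes)
for host in sorted(layout):
    print(host, layout[host])
```

With the data above, every machine hosts exactly one master and one replica.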

---------------------------------Log analysis--------------------------------------

Step 1:
Master 8371 loses its connection to replica 8373:
46590:M 09 Sep 18:57:51.379 # Connection with slave xx.x.xxx.199:8373 lost.

Step 2:
Masters 8370/8375 determine that 8371 is unreachable:
42645:M 09 Sep 18:57:50.117 * Marking node bedab2c537fe94f8c0363ac4ae97d56832316e65 as failing (quorum reached).
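
The "quorum reached" wording reflects the rule that a node is only marked as failing once a majority of masters consider it unreachable. Below is a simplified Python sketch of that majority check for this 3-master cluster. The function names are hypothetical, and the real implementation also expires old failure reports, which this sketch ignores.

```python
def majority(master_count):
    """Smallest number of masters that forms a strict majority."""
    return master_count // 2 + 1


def quorum_reached(reporting_masters, all_masters, suspect):
    """True if enough masters (other than the suspect) report it unreachable."""
    voters = {m for m in reporting_masters if m in all_masters and m != suspect}
    return len(voters) >= majority(len(all_masters))


# Masters are identified by port here, purely for readability.
MASTERS = {"8370", "8371", "8375"}

print(majority(len(MASTERS)))
print(quorum_reached({"8370", "8375"}, MASTERS, suspect="8371"))
print(quorum_reached({"8370"}, MASTERS, suspect="8371"))
```

With three masters the majority is two, so both remaining masters (8370 and 8375) have to agree before 8371 is marked as failing. The failover authorization in step 4 is likewise granted by a majority of masters.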

Step 3:
Replicas 8372/8373/8374 receive the FAIL message from master 8375 saying that 8371 is unreachable:
46986:S 09 Sep 18:57:50.120 * FAIL message received from 5ab4f85306da6d633e4834b4d3327f45af02171b about bedab2c537fe94f8c0363ac4ae97d56832316e65

Step 4:
Masters 8370/8375 authorize 8373 to fail over and be promoted to master:
42645:M 09 Sep 18:57:51.055 # Failover auth granted to 792020fe66c00ae56e27cd7a048ba6bb2b67adb6 for epoch 16

Step 5:
The former master 8371 changes its own configuration and becomes a replica of 8373:
46590:M 09 Sep 18:57:51.488 # Configuration change detected. Reconfiguring myself as a replica of 792020fe66c00ae56e27cd7a048ba6bb2b67adb6

Step 6:
Masters 8370/8375/8373 clear the FAIL state of 8371:
42645:M 09 Sep 18:57:51.522 * Clear FAIL state for node bedab2c537fe94f8c0363ac4ae97d56832316e65: master without slots is reachable again.

Step 7:
The new replica 8371 starts its first full resync from the new master 8373:
8373 log:
4255:M 09 Sep 18:57:51.906 * Full resync requested by slave xx.x.xxx.200:8371
4255:M 09 Sep 18:57:51.906 * Starting BGSAVE for SYNC with target: disk
4255:M 09 Sep 18:57:51.941 * Background saving started by pid 5230
8371 log:
46590:S 09 Sep 18:57:51.948 * Full resync from master: d7751c4ebf1e63d3baebea1ed409e0e7243a4423:440721826993

Step 8:
Masters 8370/8375 determine that 8373 (the new master) is unreachable:
42645:M 09 Sep 18:58:00.320 * Marking node 792020fe66c00ae56e27cd7a048ba6bb2b67adb6 as failing (quorum reached).

Step 9:
Masters 8370/8375 determine that 8373 (the new master) has recovered:
60295:M 09 Sep 18:58:18.181 * Clear FAIL state for node 792020fe66c00ae56e27cd7a048ba6bb2b67adb6: is reachable again and nobody is serving its slots after some time.

Step 10:
Master 8373 completes the BGSAVE required for the full resync:
5230:C 09 Sep 18:59:01.474 * DB saved on disk
5230:C 09 Sep 18:59:01.491 * RDB: 7112 MB of memory used by copy-on-write
4255:M 09 Sep 18:59:01.877 * Background saving terminated with success
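
How long the BGSAVE took can be read off the two 8373 log lines ("Background saving started" in step 7 and "terminated with success" here). Below is a minimal Python sketch of that arithmetic. It assumes the log prefix layout seen in this article (`pid:role day month time level message`) and takes the year 2016 from the article's own timeline; `parse_ts` is a hypothetical helper.

```python
from datetime import datetime

YEAR = 2016  # assumption: the log lines carry no year; taken from the timeline


def parse_ts(line):
    """Extract the timestamp from a Redis log line like '4255:M 09 Sep 18:57:51.941 * ...'."""
    fields = line.split()
    stamp = f"{YEAR} {fields[1]} {fields[2]} {fields[3]}"
    return datetime.strptime(stamp, "%Y %d %b %H:%M:%S.%f")


started = "4255:M 09 Sep 18:57:51.941 * Background saving started by pid 5230"
finished = "4255:M 09 Sep 18:59:01.877 * Background saving terminated with success"

bgsave_seconds = (parse_ts(finished) - parse_ts(started)).total_seconds()
print(f"BGSAVE took {bgsave_seconds:.3f} s")
```

The dump alone takes roughly 70 seconds, during which writes arriving at the master have to be buffered for the replica.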

Step 11:
Replica 8371 starts receiving data from master 8373:
46590:S 09 Sep 18:59:02.263 * MASTER <-> SLAVE sync: receiving 2657606930 bytes from master

Step 12:
Master 8373 finds that the output buffer for replica 8371 has exceeded its limit and schedules the connection to be closed:
4255:M 09 Sep 19:00:19.014 # Client id=14259015 addr=xx.x.xxx.200:21772 fd=844 name= age=148 idle=148 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=16349 oll=4103 omem=95944066 events=rw cmd=psync scheduled to be closed ASAP for overcoming of output buffer limits.
4255:M 09 Sep 19:00:19.015 # Connection with slave xx.x.xxx.200:8371 lost.
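
The client description in that log line carries the numbers needed to see which limit was hit. The Python sketch below parses it and compares `omem` with the `slave` limit listed later in this article (256mb hard, 64mb soft for 60 seconds). It assumes that this was the limit in effect on 8373; `parse_client_info` and `parse_size` are hypothetical helpers.

```python
# Client description copied from the 8373 log line above.
CLIENT_INFO = (
    "id=14259015 addr=xx.x.xxx.200:21772 fd=844 name= age=148 idle=148 "
    "flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=16349 "
    "oll=4103 omem=95944066 events=rw cmd=psync"
)


def parse_client_info(text):
    """Split 'key=value key=value ...' into a dict of strings."""
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def parse_size(text):
    """Convert redis.conf sizes such as '64mb' to bytes (kb/mb/gb are powers of 1024)."""
    text = text.strip().lower()
    for suffix, factor in (("kb", 1024), ("mb", 1024 ** 2), ("gb", 1024 ** 3)):
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * factor
    return int(text)


client = parse_client_info(CLIENT_INFO)
omem = int(client["omem"])
hard_limit = parse_size("256mb")  # assumption: slave limit from the config below
soft_limit = parse_size("64mb")
soft_seconds = 60

print("over hard limit:", omem >= hard_limit)
print("over soft limit:", omem >= soft_limit)
print("connection age (s):", int(client["age"]), "soft window (s):", soft_seconds)
```

Under these assumptions the buffer (about 91 MB) is above the soft limit but below the hard limit, and the connection is older than the 60-second window, which suggests the soft limit is what closed the connection.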

Step 13:
Replica 8371 fails to sync from master 8373 because the connection was dropped; the first full resync has failed:
46590:S 09 Sep 19:00:19.018 # I/O error trying to sync with MASTER: connection lost
46590:S 09 Sep 19:00:20.102 * Connecting to MASTER xx.x.xxx.199:8373
46590:S 09 Sep 19:00:20.102 * MASTER <-> SLAVE sync started

Step 14:
Replica 8371 tries to start syncing again, but the connection fails because master 8373 has reached its maximum number of clients:
46590:S 09 Sep 19:00:21.103 * Connecting to MASTER xx.x.xxx.199:8373
46590:S 09 Sep 19:00:21.103 * MASTER <-> SLAVE sync started
46590:S 09 Sep 19:00:21.104 * Non blocking connect for SYNC fired the event.
46590:S 09 Sep 19:00:21.104 # Error reply to PING from master: '-ERR max number of clients reached'

Step 15:
Replica 8371 reconnects to master 8373 and starts a second full resync:
8371 log:
46590:S 09 Sep 19:00:49.175 * Connecting to MASTER xx.x.xxx.199:8373
46590:S 09 Sep 19:00:49.175 * MASTER <-> SLAVE sync started
46590:S 09 Sep 19:00:49.175 * Non blocking connect for SYNC fired the event.
46590:S 09 Sep 19:00:49.176 * Master replied to PING, replication can continue...
46590:S 09 Sep 19:00:49.179 * Partial resynchronization not possible (no cached master)
46590:S 09 Sep 19:00:49.501 * Full resync from master: d7751c4ebf1e63d3baebea1ed409e0e7243a4423:440780763454
8373 log:
4255:M 09 Sep 19:00:49.176 * Slave xx.x.xxx.200:8371 asks for synchronization
4255:M 09 Sep 19:00:49.176 * Full resync requested by slave xx.x.xxx.200:8371
4255:M 09 Sep 19:00:49.176 * Starting BGSAVE for SYNC with target: disk
4255:M 09 Sep 19:00:49.498 * Background saving started by pid 18413
18413:C 09 Sep 19:01:52.466 * DB saved on disk
18413:C 09 Sep 19:01:52.620 * RDB: 2124 MB of memory used by copy-on-write
4255:M 09 Sep 19:01:53.186 * Background saving terminated with success

Step 16:
Replica 8371 receives the data successfully and starts loading it into memory:
46590:S 09 Sep 19:01:53.190 * MASTER <-> SLAVE sync: receiving 2637183250 bytes from master
46590:S 09 Sep 19:04:51.485 * MASTER <-> SLAVE sync: Flushing old data
46590:S 09 Sep 19:05:58.695 * MASTER <-> SLAVE sync: Loading DB in memory

Step 17:
The cluster returns to normal:
42645:M 09 Sep 19:05:58.786 * Clear FAIL state for node bedab2c537fe94f8c0363ac4ae97d56832316e65: slave is reachable again.

Step 18:
Replica 8371 finishes syncing successfully, which took about 7 minutes:
46590:S 09 Sep 19:08:19.303 * MASTER <-> SLAVE sync: Finished with success
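
The 7 minutes can be broken down by phase from the 8371 log lines in steps 15 to 18. The Python sketch below does that arithmetic. It assumes the log prefix layout seen in this article, takes the year 2016 from the article's timeline, and treats the gap between "receiving" and "Flushing old data" as the transfer time, which is an interpretation rather than something the log states. `parse_ts` and `phase_durations` are hypothetical helpers.

```python
from datetime import datetime

YEAR = 2016  # assumption: the log lines carry no year; taken from the timeline


def parse_ts(line):
    """Extract the timestamp from a Redis log line."""
    fields = line.split()
    stamp = f"{YEAR} {fields[1]} {fields[2]} {fields[3]}"
    return datetime.strptime(stamp, "%Y %d %b %H:%M:%S.%f")


# (phase name, log line) pairs copied from the 8371 log.
EVENTS = [
    ("connect", "46590:S 09 Sep 19:00:49.175 * Connecting to MASTER xx.x.xxx.199:8373"),
    ("receive", "46590:S 09 Sep 19:01:53.190 * MASTER <-> SLAVE sync: receiving 2637183250 bytes from master"),
    ("flush", "46590:S 09 Sep 19:04:51.485 * MASTER <-> SLAVE sync: Flushing old data"),
    ("load", "46590:S 09 Sep 19:05:58.695 * MASTER <-> SLAVE sync: Loading DB in memory"),
    ("done", "46590:S 09 Sep 19:08:19.303 * MASTER <-> SLAVE sync: Finished with success"),
]


def phase_durations(events):
    """Return seconds elapsed between each pair of consecutive events."""
    stamps = [(name, parse_ts(line)) for name, line in events]
    return {
        f"{a}->{b}": (tb - ta).total_seconds()
        for (a, ta), (b, tb) in zip(stamps, stamps[1:])
    }


durations = phase_durations(EVENTS)
total_seconds = sum(durations.values())
RDB_BYTES = 2637183250
transfer_mib_per_s = RDB_BYTES / durations["receive->flush"] / (1024 * 1024)

for phase, seconds in durations.items():
    print(f"{phase:18s} {seconds:8.3f} s")
print(f"total {total_seconds:.3f} s, transfer about {transfer_mib_per_s:.1f} MiB/s")
```

Read this way, no single phase dominates: waiting for the master's BGSAVE, transferring the dump, flushing the old data set, and loading the new one each take one to three minutes.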

Why 8371 became unreachable:

Since the machines are in the same rack, a network outage was unlikely. We therefore checked the slow log with SLOWLOG GET and found that a KEYS command had been executed and had taken 8.3 seconds. We then checked the cluster node timeout and found it was set to 5 s (cluster-node-timeout 5000).

Cause of the node being marked unreachable:

A client executed a single command that took 8.3 s:

2016/9/9 18:57:43 KEYS command starts executing
2016/9/9 18:57:50 8371 is judged unreachable (Redis log)
2016/9/9 18:57:51 KEYS command finishes
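
The timeline can be checked with a few lines of arithmetic. The Python sketch below uses only numbers from this article (start time, 8.3 s duration, the 18:57:50.117 log timestamp, and cluster-node-timeout 5000). It assumes the node answered no cluster pings while KEYS was running, which is a simplification; `exceeds_node_timeout` is a hypothetical helper.

```python
from datetime import datetime, timedelta

CLUSTER_NODE_TIMEOUT = timedelta(milliseconds=5000)  # cluster-node-timeout 5000
KEYS_STARTED = datetime(2016, 9, 9, 18, 57, 43)
KEYS_DURATION = timedelta(seconds=8.3)               # from SLOWLOG GET
MARKED_FAILING = datetime(2016, 9, 9, 18, 57, 50, 117000)  # step 2 log line


def exceeds_node_timeout(blocking_time, node_timeout):
    """True if a blocking command runs longer than the node timeout."""
    return blocking_time > node_timeout


keys_finished = KEYS_STARTED + KEYS_DURATION
timeout_elapsed_at = KEYS_STARTED + CLUSTER_NODE_TIMEOUT

print("KEYS outlasts node timeout:", exceeds_node_timeout(KEYS_DURATION, CLUSTER_NODE_TIMEOUT))
print("timeout elapsed at:", timeout_elapsed_at.time())
print("marked failing at: ", MARKED_FAILING.time())
print("KEYS finished at:  ", keys_finished.time())
```

The node is marked as failing after the 5-second timeout has elapsed but before KEYS returns, which is consistent with the blocking command being the trigger.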

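In summary, there were the following problems:

1. Because cluster-node-timeout was set fairly short, the slow KEYS query caused the cluster to judge node 8371 unreachable.

2. Because 8371 was unreachable, 8373 was promoted to master and master-replica synchronization started.

3. Because of the configured client-output-buffer-limit, the first full resync failed.

4. In addition, the PHP client's connection pool was faulty and reconnected to the server frantically, producing an effect similar to a SYN flood.

5. After the first full resync failed, the replica needed 30 seconds to reconnect to the master (the maximum of 10,000 connections had been exceeded).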
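
One common way to keep clients from hammering a server that is already struggling is to space out reconnect attempts with exponential backoff and random jitter. The sketch below illustrates the idea in Python; it is not the fix the author applied to the PHP client, and the function name and the `base` and `cap` values are assumptions chosen for illustration.

```python
import random


def backoff_delays(attempts, base=0.1, cap=10.0, rng=None):
    """Return one randomized wait time (seconds) per reconnect attempt.

    The upper bound doubles with each attempt up to `cap`; the actual delay
    is drawn uniformly between 0 and that bound so that many clients do not
    retry in lockstep.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0, ceiling))
    return delays


# Seeded generator only to make the demonstration repeatable.
demo_delays = backoff_delays(8, rng=random.Random(0))
for attempt, delay in enumerate(demo_delays, start=1):
    print(f"attempt {attempt}: wait {delay:.3f} s")
```

A client would sleep for the given delay before each reconnect attempt instead of retrying immediately, which caps the connection rate each client can generate during an outage.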

About the client-output-buffer-limit parameter:

# The syntax of every client-output-buffer-limit directive is the following: 
# 
# client-output-buffer-limit <class> <hard limit> <soft limit> <soft seconds> 
# 
# A client is immediately disconnected once the hard limit is reached, or if 
# the soft limit is reached and remains reached for the specified number of 
# seconds (continuously). 
# So for instance if the hard limit is 32 megabytes and the soft limit is 
# 16 megabytes / 10 seconds, the client will get disconnected immediately 
# if the size of the output buffers reach 32 megabytes, but will also get 
# disconnected if the client reaches 16 megabytes and continuously overcomes 
# the limit for 10 seconds. 
# 
# By default normal clients are not limited because they don't receive data 
# without asking (in a push way), but just after a request, so only 
# asynchronous clients may create a scenario where data is requested faster 
# than it can read. 
# 
# Instead there is a default limit for pubsub and slave clients, since 
# subscribers and slaves receive data in a push fashion. 
# 
# Both the hard or the soft limit can be disabled by setting them to zero. 
client-output-buffer-limit normal 0 0 0 
client-output-buffer-limit slave 256mb 64mb 60 
client-output-buffer-limit pubsub 32mb 8mb 60 
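
The hard/soft rule described in these comments can be expressed as a small state machine. The following Python sketch models the documented behavior (immediate close at the hard limit, close after the soft limit has been exceeded continuously for the given number of seconds, zero disables a limit). It is an illustration of the rule, not Redis's actual code, and the class and method names are hypothetical.

```python
MB = 1024 * 1024


class OutputBufferLimit:
    """Tracks one client's output buffer against a hard and a soft limit."""

    def __init__(self, hard_bytes, soft_bytes, soft_seconds):
        self.hard_bytes = hard_bytes      # 0 disables the hard limit
        self.soft_bytes = soft_bytes      # 0 disables the soft limit
        self.soft_seconds = soft_seconds
        self._soft_since = None           # when the soft limit was first exceeded

    def should_close(self, used_bytes, now):
        """Return True if the client should be disconnected at time `now` (seconds)."""
        if self.hard_bytes and used_bytes >= self.hard_bytes:
            return True
        if self.soft_bytes and used_bytes >= self.soft_bytes:
            if self._soft_since is None:
                self._soft_since = now
            return now - self._soft_since >= self.soft_seconds
        # Dropped back under the soft limit: the continuous window restarts.
        self._soft_since = None
        return False


# client-output-buffer-limit slave 256mb 64mb 60
slave_limit = OutputBufferLimit(256 * MB, 64 * MB, 60)
for second in (0, 30, 60):
    print(second, slave_limit.should_close(95944066, now=second))
```

Fed with the `omem` value from step 12 (95944066 bytes), the sketch keeps the connection open at first and closes it once the buffer has stayed above 64 MB for 60 seconds.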

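Measures taken:

1. Split the data so that each instance holds less than 4 GB; otherwise a master-replica switch takes a long time.

2. Adjust the client-output-buffer-limit parameter so that synchronization does not fail halfway through.

3. Adjust cluster-node-timeout; it must not be lower than 15 s.

4. Forbid any slow query that takes longer than cluster-node-timeout, because it will trigger a master-replica switch.

5. Fix the client's frantic, SYN-flood-like reconnection behavior.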
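
For measure 4, the usual replacement for KEYS is cursor-based iteration with SCAN, which returns keys in small batches so that no single call blocks the server for long. The sketch below is written in Python for illustration and is not part of the original fix. `iter_keys` expects any client object whose `scan(cursor, match, count)` method returns `(next_cursor, keys)`, as redis-py's does; `FakeClient` is a hypothetical in-memory stand-in so that the example runs without a server.

```python
from fnmatch import fnmatchcase


def iter_keys(client, pattern="*", count=1000):
    """Yield matching keys batch by batch using SCAN instead of KEYS."""
    cursor = 0
    while True:
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=count)
        yield from keys
        if cursor == 0:  # the server signals the end of iteration with cursor 0
            break


class FakeClient:
    """In-memory stand-in with a scan() shaped like redis-py's; demonstration only."""

    def __init__(self, keys):
        self._keys = list(keys)

    def scan(self, cursor=0, match=None, count=10):
        batch = self._keys[cursor:cursor + count]
        next_cursor = cursor + count
        if next_cursor >= len(self._keys):
            next_cursor = 0
        if match is not None:
            batch = [key for key in batch if fnmatchcase(key, match)]
        return next_cursor, batch


client = FakeClient(["user:1", "order:7", "user:2", "user:3", "order:9"])
user_keys = sorted(iter_keys(client, pattern="user:*", count=2))
print(user_keys)
```

In a cluster, SCAN only walks the keyspace of the node it is sent to, so the iteration has to be repeated against each master.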

