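An Analysis of Elasticsearch's ZenDiscovery and Master Election Mechanism
Preface
The previous article walked through the ElectMasterService source and covered most of how master election works: sorting the IDs of master-eligible nodes keeps the election consistent, and setting a minimum number of visible master-eligible nodes avoids split brain. Election by sorted node order guarantees only local consistency, however. If a node receives an incorrect cluster state, it will elect the wrong master, so additional measures are needed to keep the election consistent. This is the second point raised in the previous article: a node can become master only when enough nodes have elected it and it has also elected itself. That rule is implemented in ZenDiscovery, and this article follows the node discovery process to explain the master election mechanism further.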
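The two rules recalled above (sort the candidate IDs, and require a minimum number of visible candidates) can be sketched as follows. This is a minimal illustration under stated assumptions, not the real ElectMasterService: the class name ElectionSketch, the use of plain strings as node IDs, and the constructor parameter are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-in for the election rules described above.
// Node IDs are plain strings; this is not the real ElectMasterService API.
class ElectionSketch {
    // Assumed setting: the minimum number of visible master-eligible nodes.
    private final int minimumMasterNodes;

    ElectionSketch(int minimumMasterNodes) {
        this.minimumMasterNodes = minimumMasterNodes;
    }

    // Quorum check: refuse to elect when too few candidates are visible.
    boolean hasEnoughMasterNodes(Collection<String> candidateIds) {
        return candidateIds.size() >= minimumMasterNodes;
    }

    // Sort the candidate IDs and take the first one, so that every node
    // holding the same candidate set picks the same master.
    String electMaster(Collection<String> candidateIds) {
        if (candidateIds.isEmpty() || !hasEnoughMasterNodes(candidateIds)) {
            return null;
        }
        List<String> sorted = new ArrayList<>(candidateIds);
        Collections.sort(sorted);
        return sorted.get(0);
    }

    public static void main(String[] args) {
        ElectionSketch election = new ElectionSketch(2);
        // Two nodes that discovered the same candidates in a different order
        // still agree on the result.
        System.out.println(election.electMaster(Arrays.asList("node-c", "node-a", "node-b")));
        System.out.println(election.electMaster(Arrays.asList("node-b", "node-c", "node-a")));
    }
}
```

Because the result depends only on the set of candidates, two nodes agree whenever they see the same set. When their views differ, sorting alone is not enough, which is why the additional checks discussed in this article matter.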
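After a node starts, it first starts the join thread, which looks for the cluster's master node. If the cluster is already up and healthy, the node tries to connect to the master and join the cluster. Otherwise (the cluster is still starting) it elects a master: if the node itself is elected, it submits a cluster state update task that is sent to the other nodes in the cluster; if another node is the master, it tries to join that node's cluster.
The join code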
private void innerJoinCluster() {
    DiscoveryNode masterNode = null;
    final Thread currentThread = Thread.currentThread();
    // Block until a master node is found. When the cluster has just started, or the master
    // has been lost, this blocking helps keep the cluster consistent.
    while (masterNode == null && joinThreadControl.joinThreadActive(currentThread)) {
        masterNode = findMaster();
    }
    // The local node may itself be elected master (cluster startup, or an election in progress while joining).
    if (clusterService.localNode().equals(masterNode)) {
        // If the local node is the master, it must send a cluster state update to all other nodes.
        clusterService.submitStateUpdateTask("zen-disco-join (elected_as_master)", Priority.IMMEDIATE, new ProcessedClusterStateNonMasterUpdateTask() {
            @Override
            public ClusterState execute(ClusterState currentState) {
                // The election was wrong: the previous master is still healthy, so keep the existing state.
                if (currentState.nodes().masterNode() != null) {
                    return currentState;
                }
                DiscoveryNodes.Builder builder = new DiscoveryNodes.Builder(currentState.nodes()).masterNodeId(currentState.nodes().localNode().id());
                // update the fact that we are the master...
                ClusterBlocks clusterBlocks = ClusterBlocks.builder().blocks(currentState.blocks()).removeGlobalBlock(discoverySettings.getNoMasterBlock()).build();
                currentState = ClusterState.builder(currentState).nodes(builder).blocks(clusterBlocks).build();
                // eagerly run reroute to remove dead nodes from routing table
                RoutingAllocation.Result result = allocationService.reroute(currentState);
                return ClusterState.builder(currentState).routingResult(result).build();
            }

            @Override
            public void onFailure(String source, Throwable t) {
                logger.error("unexpected failure during [{}]", t, source);
                joinThreadControl.markThreadAsDoneAndStartNew(currentThread);
            }

            @Override
            public void clusterStateProcessed(String source, ClusterState oldState, ClusterState newState) {
                if (newState.nodes().localNodeMaster()) {
                    // we only starts nodesFD if we are master (it may be that we received a cluster state while pinging)
                    joinThreadControl.markThreadAsDone(currentThread);
                    nodesFD.updateNodesAndPing(newState); // start the nodes FD
                } else {
                    // if we're not a master it means another node published a cluster state while we were pinging
                    // make sure we go through another pinging round and actively join it
                    joinThreadControl.markThreadAsDoneAndStartNew(currentThread);
                }
                sendInitialStateEventIfNeeded();
                long count = clusterJoinsCounter.incrementAndGet();
                logger.trace("cluster joins counter set to [{}] (elected as master)", count);
            }
        });
    } else {
        // The node found is not the local node: try to connect to that master.
        final boolean success = joinElectedMaster(masterNode);
        // finalize join through the cluster state update thread
        final DiscoveryNode finalMasterNode = masterNode;
        clusterService.submitStateUpdateTask("finalize_join (" + masterNode + ")", new ClusterStateNonMasterUpdateTask() {
            @Override
            public ClusterState execute(ClusterState currentState) throws Exception {
                if (!success) {
                    // failed to join. Try again...
                    joinThreadControl.markThreadAsDoneAndStartNew(currentThread);
                    return currentState;
                }
                if (currentState.getNodes().masterNode() == null) {
                    // Post 1.3.0, the master should publish a new cluster state before acking our join request. we now should have
                    // a valid master.
                    logger.debug("no master node is set, despite of join request completing. retrying pings.");
                    joinThreadControl.markThreadAsDoneAndStartNew(currentThread);
                    return currentState;
                }
                if (!currentState.getNodes().masterNode().equals(finalMasterNode)) {
                    return joinThreadControl.stopRunningThreadAndRejoin(currentState, "master_switched_while_finalizing_join");
                }
                // Note: we do not have to start master fault detection here because it's set at {@link #handleNewClusterStateFromMaster }
                // when the first cluster state arrives.
                joinThreadControl.markThreadAsDone(currentThread);
                return currentState;
            }

            @Override
            public void onFailure(String source, @Nullable Throwable t) {
                logger.error("unexpected error while trying to finalize cluster join", t);
                joinThreadControl.markThreadAsDoneAndStartNew(currentThread);
            }
        });
    }
}
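That is the join process. ZenDiscovery starts a join thread at startup, and that thread calls this method. The thread is also restarted, and runs the same join method again, when a node leaves, when the master is lost, and in similar situations.
The findMaster method
This method embodies the master election mechanism. The code is as follows: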
private DiscoveryNode findMaster() {
    // Ping the nodes in the cluster.
    ZenPing.PingResponse[] fullPingResponses = pingService.pingAndWait(pingTimeout);
    if (fullPingResponses == null) {
        return null;
    }
    // Filter the ping responses, dropping client nodes and data-only nodes.
    List<ZenPing.PingResponse> pingResponses = Lists.newArrayList();
    for (ZenPing.PingResponse pingResponse : fullPingResponses) {
        DiscoveryNode node = pingResponse.node();
        if (masterElectionFilterClientNodes && (node.clientNode() || (!node.masterNode() && !node.dataNode()))) {
            // filter out the client node, which is a client node, or also one that is not data and not master (effectively, client)
        } else if (masterElectionFilterDataNodes && (!node.masterNode() && node.dataNode())) {
            // filter out data node that is not also master
        } else {
            pingResponses.add(pingResponse);
        }
    }
    final DiscoveryNode localNode = clusterService.localNode();
    List<DiscoveryNode> pingMasters = newArrayList();
    // Collect the master reported by each ping response, skipping it when it is the local node.
    // pingMasters ends up either empty (no master other than the local node was reported) or
    // holding the same node repeatedly. Different nodes mean something is wrong in the cluster,
    // but that is acceptable because an election follows.
    for (ZenPing.PingResponse pingResponse : pingResponses) {
        if (pingResponse.master() != null) {
            if (!localNode.equals(pingResponse.master())) {
                pingMasters.add(pingResponse.master());
            }
        }
    }
    // nodes discovered during pinging
    Set<DiscoveryNode> activeNodes = Sets.newHashSet();
    // nodes discovered who has previously been part of the cluster and do not ping for the very first time
    Set<DiscoveryNode> joinedOnceActiveNodes = Sets.newHashSet();
    Version minimumPingVersion = localNode.version();
    for (ZenPing.PingResponse pingResponse : pingResponses) {
        activeNodes.add(pingResponse.node());
        minimumPingVersion = Version.smallest(pingResponse.node().version(), minimumPingVersion);
        if (pingResponse.hasJoinedOnce() != null && pingResponse.hasJoinedOnce()) {
            joinedOnceActiveNodes.add(pingResponse.node());
        }
    }
    // If the local node is master-eligible, it also joins the candidates for the election.
    if (localNode.masterNode()) {
        activeNodes.add(localNode);
        long joinsCounter = clusterJoinsCounter.get();
        if (joinsCounter > 0) {
            logger.trace("adding local node to the list of active nodes who has previously joined the cluster (joins counter is [{}})", joinsCounter);
            joinedOnceActiveNodes.add(localNode);
        }
    }
    // pingMasters is empty: no other node reported a master, so the local node may be the master.
    if (pingMasters.isEmpty()) {
        if (electMaster.hasEnoughMasterNodes(activeNodes)) {
            // The required count is met, but that alone is not enough: the local node runs the
            // election once more, and only if that election again picks the local node does it
            // become master. This is the second principle of master election.
            DiscoveryNode master = electMaster.electMaster(joinedOnceActiveNodes);
            if (master != null) {
                return master;
            }
            return electMaster.electMaster(activeNodes);
        } else {
            // if we don't have enough master nodes, we bail, because there are not enough master to elect from
            logger.trace("not enough master nodes [{}]", activeNodes);
            return null;
        }
    } else {
        // pingMasters is not empty (its entries should all be the same node): the local node
        // was not elected master, so accept the earlier election.
        return electMaster.electMaster(pingMasters);
    }
}
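The final decision in findMaster can be reduced to a small model. The sketch below is a simplification under stated assumptions, not the real method: FindMasterSketch and Ping are hypothetical names, node IDs are plain strings, the local node is assumed to be master-eligible, and the role filtering and the preference for joinedOnceActiveNodes are left out.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical, simplified model of the decision made at the end of findMaster.
class FindMasterSketch {
    // One ping response: the responding node and the master it reports (null if none).
    static final class Ping {
        final String nodeId;
        final String reportedMaster;

        Ping(String nodeId, String reportedMaster) {
            this.nodeId = nodeId;
            this.reportedMaster = reportedMaster;
        }
    }

    static String findMaster(String localId, List<Ping> pings, int minimumMasterNodes) {
        TreeSet<String> activeNodes = new TreeSet<>();  // kept sorted by ID
        TreeSet<String> pingMasters = new TreeSet<>();
        activeNodes.add(localId);                       // assumption: local node is master-eligible
        for (Ping ping : pings) {
            activeNodes.add(ping.nodeId);
            // A master reported by others counts only if it is not the local node.
            if (ping.reportedMaster != null && !ping.reportedMaster.equals(localId)) {
                pingMasters.add(ping.reportedMaster);
            }
        }
        if (!pingMasters.isEmpty()) {
            return pingMasters.first();                 // accept the already elected master
        }
        if (activeNodes.size() < minimumMasterNodes) {
            return null;                                // too few candidates: no election
        }
        return activeNodes.first();                     // elect the smallest ID
    }

    public static void main(String[] args) {
        List<Ping> noMasterYet = Arrays.asList(new Ping("node-a", null), new Ping("node-c", null));
        System.out.println(findMaster("node-b", noMasterYet, 2));

        List<Ping> masterKnown = Arrays.asList(new Ping("node-a", "node-c"), new Ping("node-c", "node-c"));
        System.out.println(findMaster("node-b", masterKnown, 2));

        System.out.println(findMaster("node-b", Collections.<Ping>emptyList(), 2));
    }
}
```

The three calls cover the three outcomes: an election among the visible candidates, acceptance of a master that other nodes already report, and no result when too few candidates are visible, in which case the caller's loop would ping again.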
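The key parts of findMaster are annotated in its comments, so they are not analyzed further here. Besides findMaster, one more method reflects master election: handleMasterGone. Below is part of its code, the portion that submits the master-lost task: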
clusterService.submitStateUpdateTask("zen-disco-master_failed (" + masterNode + ")", Priority.IMMEDIATE, new ProcessedClusterStateNonMasterUpdateTask() {
    @Override
    public ClusterState execute(ClusterState currentState) {
        // Get all nodes in the current cluster state.
        DiscoveryNodes discoveryNodes = DiscoveryNodes.builder(currentState.nodes())
                // make sure the old master node, which has failed, is not part of the nodes we publish
                .remove(masterNode.id())
                .masterNodeId(null).build();
        // Rejoining simply repeats the findMaster process.
        if (rejoin) {
            return rejoin(ClusterState.builder(currentState).nodes(discoveryNodes).build(), "master left (reason = " + reason + ")");
        }
        // Not enough nodes for an election: go through findMaster again.
        if (!electMaster.hasEnoughMasterNodes(discoveryNodes)) {
            return rejoin(ClusterState.builder(currentState).nodes(discoveryNodes).build(), "not enough master nodes after master left (reason = " + reason + ")");
        }
        // In the current cluster state, if there are enough candidates, every node elects the
        // same master, because all nodes see the same cluster state.
        final DiscoveryNode electedMaster = electMaster.electMaster(discoveryNodes); // elect master
        final DiscoveryNode localNode = currentState.nodes().localNode();
        ....
    }
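The code above shows where master election is applied. Both findMaster and handleMasterGone keep the election consistent: the number of candidate nodes must reach a required minimum, otherwise the election is not considered successful and the node goes back to waiting. Even when the current node has been elected master by other nodes, it still runs the election once more itself. Enforcing the minimum count and sorting the candidates together keep the election consistent.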
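To see why the minimum count matters, consider a network partition. The sketch below is an illustrative model only: PartitionSketch and electOnSide are hypothetical names, node IDs are plain strings, and the three-node cluster split into one node and two nodes is an assumed scenario. The value 3 / 2 + 1 follows the commonly recommended majority setting for the minimum number of master-eligible nodes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of an election on each side of a network partition.
class PartitionSketch {
    // Returns the master elected by one side of the partition, or null when
    // that side sees fewer candidates than the required minimum.
    static String electOnSide(List<String> visibleCandidates, int minimumMasterNodes) {
        if (visibleCandidates.isEmpty() || visibleCandidates.size() < minimumMasterNodes) {
            return null;
        }
        return Collections.min(visibleCandidates); // smallest ID wins
    }

    public static void main(String[] args) {
        // Assumed scenario: three master-eligible nodes split into two groups.
        List<String> minority = Arrays.asList("node-a");
        List<String> majority = Arrays.asList("node-b", "node-c");

        // With a minimum of 1, each side elects its own master (split brain).
        System.out.println(electOnSide(minority, 1) + " / " + electOnSide(majority, 1));

        // With a majority minimum, only the larger side can elect a master.
        int quorum = 3 / 2 + 1;
        System.out.println(electOnSide(minority, quorum) + " / " + electOnSide(majority, quorum));
    }
}
```

With the majority requirement the isolated node gets no master and keeps waiting, which corresponds to the waiting state described above.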
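Discovering and joining a cluster is ZenDiscovery's main function. It has other responsibilities as well, such as handling a node leaving (handleLeaveRequest) and handling new cluster states sent by the master (handleNewClusterStateFromMaster). They are not covered one by one here; interested readers can consult the relevant source code.
Summary
This article used ZenDiscovery to analyze the remaining part of master election. ZenDiscovery bundles the functions a node needs to discover a cluster: its main job is to discover (elect) the cluster's master node and try to join the cluster. In addition, if the local node is the master, it handles nodes leaving and node loss; if it is not the master, it handles the state updates sent by the master.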