Apache?SkyWalking?修復TTL?timer?失效bug詳解

更新時間：2022年09月23日 16:53:40 作者：Geek_ymv

這篇文章主要為大家介紹了Apache?SkyWalking?修復TTL?timer?失效bug詳解，有需要的朋友可以借鑒參考下，希望能夠有所幫助，祝大家多多進步，早日升職加薪

正文

近期，Apache SkyWalking 修復了一個隱藏了近4年的Bug - TTL timer 可能失效問題，這個 bug 在 SkyWalking <=9.2.0 版本中存在。關于這個 bug 的詳細信息可以看郵件列表 lists.apache.org/thread/ztp4… 具體如下

首先說下這個 Bug 導致的現(xiàn)象：

過期的索引不能被刪除，所有的OAP節(jié)點都出現(xiàn)類似日志 The selected first getAddress is xxx.xxx.xx.xx:port. The remove stage is skipped.
對于以 no-init 模式啟動的 OAP 節(jié)點，重啟的時候會一直打印類似日志 table:xxx does not exist. OAP is running in 'no-init' mode, waiting... retry 3s later.

如果 SkyWalking OAP 出現(xiàn)上面的兩個問題，很可能就是這個 Bug 導致的。

下面我們先了解一下 SkyWalking OAP 集群方面的設計

SkyWalking OAP 角色

SkyWalking OAP 可選的角色有 Mixed、Receiver、Aggregator

Mixed 角色主要負責接收數(shù)據(jù)、L1聚合和L2聚合；
Receiver 角色負責接收數(shù)據(jù)和L1聚合；
Aggregator 角色負責L2聚合。

默認角色是 Mixed，可以通過修改 application.yml 進行配置

core:
  selector: ${SW_CORE:default}
  default:
    # Mixed: Receive agent data, Level 1 aggregate, Level 2 aggregate
    # Receiver: Receive agent data, Level 1 aggregate
    # Aggregator: Level 2 aggregate
    role: ${SW_CORE_ROLE:Mixed} # Mixed/Receiver/Aggregator
    restHost: ${SW_CORE_REST_HOST:0.0.0.0}
    restPort: ${SW_CORE_REST_PORT:12800}
# 省略部分配置...

L1聚合：為了減少內(nèi)存及網(wǎng)絡負載，對于接收到的 metrics 數(shù)據(jù)進行當前 OAP 節(jié)點內(nèi)的聚合，具體實現(xiàn)參考 MetricsAggregateWorker#onWork() 方法的實現(xiàn)；

L2聚合：又稱分布式聚合，OAP 節(jié)點將L1聚合后的數(shù)據(jù)，根據(jù)一定的路由規(guī)則，發(fā)送給集群中的其他OAP節(jié)點，進行二次聚合，并入庫。具體實現(xiàn)見 MetricsPersistentWorker 類。

SkyWalking OAP 集群

OAP 支持集群部署，目前支持的注冊中心有

zookeeper
kubernetes
consul
etcd
nacos

默認是 standalone，可以通過修改 application.yml 進行配置

cluster:
  selector: ${SW_CLUSTER:standalone}
  standalone:
  # Please check your ZooKeeper is 3.5+, However, it is also compatible with ZooKeeper 3.4.x. Replace the ZooKeeper 3.5+
  # library the oap-libs folder with your ZooKeeper 3.4.x library.
  zookeeper:
    namespace: ${SW_NAMESPACE:""}
    hostPort: ${SW_CLUSTER_ZK_HOST_PORT:localhost:2181}
    # Retry Policy
    baseSleepTimeMs: ${SW_CLUSTER_ZK_SLEEP_TIME:1000} # initial amount of time to wait between retries
    maxRetries: ${SW_CLUSTER_ZK_MAX_RETRIES:3} # max number of times to retry
    # Enable ACL
    enableACL: ${SW_ZK_ENABLE_ACL:false} # disable ACL in default
    schema: ${SW_ZK_SCHEMA:digest} # only support digest schema
    expression: ${SW_ZK_EXPRESSION:skywalking:skywalking}
    internalComHost: ${SW_CLUSTER_INTERNAL_COM_HOST:""}
    internalComPort: ${SW_CLUSTER_INTERNAL_COM_PORT:-1}
  kubernetes:
    namespace: ${SW_CLUSTER_K8S_NAMESPACE:default}
  # 省略部分配置...

OAP 啟動的時候，如果當前角色是 Mixed 或 Aggregator，則會將自己注冊到集群注冊中心，standalone 模式下也有一個內(nèi)存級集群管理器，參見 StandaloneManager 類的實現(xiàn) 。

Data TTL timer 配置

application.yml 中的配置

core:
  selector: ${SW_CORE:default}
  default:
    # Mixed: Receive agent data, Level 1 aggregate, Level 2 aggregate
    # Receiver: Receive agent data, Level 1 aggregate
    # Aggregator: Level 2 aggregate
    role: ${SW_CORE_ROLE:Mixed} # Mixed/Receiver/Aggregator
    restHost: ${SW_CORE_REST_HOST:0.0.0.0}
    restPort: ${SW_CORE_REST_PORT:12800}
    # 省略部分配置...
    # Set a timeout on metrics data. After the timeout has expired, the metrics data will automatically be deleted.
    enableDataKeeperExecutor: ${SW_CORE_ENABLE_DATA_KEEPER_EXECUTOR:true} # Turn it off then automatically metrics data delete will be close.
    dataKeeperExecutePeriod: ${SW_CORE_DATA_KEEPER_EXECUTE_PERIOD:5} # How often the data keeper executor runs periodically, unit is minute
    recordDataTTL: ${SW_CORE_RECORD_DATA_TTL:3} # Unit is day
    metricsDataTTL: ${SW_CORE_METRICS_DATA_TTL:7} # Unit is day
    # 省略部分配置...

enableDataKeeperExecutor 自動刪除過去數(shù)據(jù)的執(zhí)行器開關，默認是開啟的；
dataKeeperExecutePeriod 執(zhí)行周期，默認5分鐘；
recordDataTTL record 數(shù)據(jù)的 TTL(Time To Live)，單位：天；
metricsDataTTL metrics 數(shù)據(jù)的 TTL，單位：天。

DataTTLKeeperTimer 定時任務

DataTTLKeeperTimer 負責刪除過期的數(shù)據(jù)，SkyWalking OAP 在啟動的時候會根據(jù) enableDataKeeperExecutor 配置決定是否開啟 DataTTLKeeperTimer，也就是是否執(zhí)行 DataTTLKeeperTimer#start() 方法。 DataTTLKeeperTimer#start() 方法的執(zhí)行邏輯主要是通過 JDK 內(nèi)置的 Executors.newSingleThreadScheduledExecutor() 創(chuàng)建一個單線程的定時任務，執(zhí)行 DataTTLKeeperTimer#delete() 方法刪除過期的數(shù)據(jù)，執(zhí)行周期是dataKeeperExecutePeriod 配置值，默認5分鐘執(zhí)行一次。

Bug 產(chǎn)生的原因

DataTTLKeeperTimer#start() 方法會在所有 OAP 節(jié)點啟動一個定時任務，那如果所有節(jié)點都去執(zhí)行數(shù)據(jù)刪除操作可能會有問題，那么如何保證只有一個節(jié)點執(zhí)行呢？

如果讓我們設計的話，可能會引入一個分布式任務調(diào)度框架或者實現(xiàn)分布式鎖，這樣的話 SkyWalking 就要強依賴某個中間件了，SkyWalking 可能是考慮到了這些也沒有選擇這么實現(xiàn)。

那我們看下 SkyWalking 是如何解決這個問題的呢，我們前面提到 OAP 在啟動的時候，如果當前角色是 Mixed 或 Aggregator，則會將自己注冊到集群注冊中心，SkyWalking OAP 調(diào)用 clusterNodesQuery#queryRemoteNodes() 方法，從注冊中心獲取這些節(jié)點的注冊信息（host:port）集合，然后判斷集合中的第一個節(jié)點是否就是當前節(jié)點，如果是那么當前節(jié)點執(zhí)行過期數(shù)據(jù)刪除操作，如下圖所示

節(jié)點A和節(jié)點集合中的第一個元素相等，則節(jié)點A負責執(zhí)行過期數(shù)據(jù)刪除操作。

這就要求 queryRemoteNodes 返回的節(jié)點集合是有序的，為什么這么說呢，試想一下，如果每個 OAP 節(jié)點調(diào)用 queryRemoteNodes 方法返回的注冊信息順序不一致的話，就可能出現(xiàn)所有節(jié)點都不和集合中的第一個節(jié)點相等，這種情況下就沒有 OAP 節(jié)點能執(zhí)行過期數(shù)據(jù)刪除操作了，而 queryRemoteNodes 方法恰恰無法保證返回的注冊信息順序一致。

解決 Bug

我們既然知道了 bug 產(chǎn)生的原因，解決起來就比較簡單了，只需要對獲取到的節(jié)點集合調(diào)用 Collections.sort() 方法對 RemoteInstance（實現(xiàn)了java.lang.Comparable 接口）做排序，保證所有OAP節(jié)點做比較時都是一致的順序，代碼如下

相關代碼如下：

/**
 * TTL = Time To Live
 *
 * DataTTLKeeperTimer is an internal timer, it drives the {@link IHistoryDeleteDAO} to remove the expired data. TTL
 * configurations are provided in {@link CoreModuleConfig}, some storage implementations, such as ES6/ES7, provides an
 * override TTL, which could be more suitable for the implementation. No matter which TTL configurations are set, they
 * are all driven by this timer.
 */
@Slf4j
public enum DataTTLKeeperTimer {
    INSTANCE;
    private ModuleManager moduleManager;
    private ClusterNodesQuery clusterNodesQuery;
    private CoreModuleConfig moduleConfig;
    public void start(ModuleManager moduleManager, CoreModuleConfig moduleConfig) {
        this.moduleManager = moduleManager;
        this.clusterNodesQuery = moduleManager.find(ClusterModule.NAME).provider().getService(ClusterNodesQuery.class);
        this.moduleConfig = moduleConfig;
        // 創(chuàng)建定時任務
        Executors.newSingleThreadScheduledExecutor()
                 .scheduleAtFixedRate(
                     new RunnableWithExceptionProtection(
                         this::delete, // 刪除過期的數(shù)據(jù)
                         t -> log.error("Remove data in background failure.", t)
                     ), moduleConfig
                         .getDataKeeperExecutePeriod(), moduleConfig.getDataKeeperExecutePeriod(), TimeUnit.MINUTES);
    }
    /**
     * DataTTLKeeperTimer starts in every OAP node, but the deletion only work when it is as the first node in the OAP
     * node list from {@link ClusterNodesQuery}.
     */
    private void delete() {
        IModelManager modelGetter = moduleManager.find(CoreModule.NAME).provider().getService(IModelManager.class);
        List<Model> models = modelGetter.allModels();
        try {
            // 查詢服務節(jié)點
            List<RemoteInstance> remoteInstances = clusterNodesQuery.queryRemoteNodes();
            if (CollectionUtils.isNotEmpty(remoteInstances) && !remoteInstances.get(0).getAddress().isSelf()) {
                log.info(
                    "The selected first getAddress is {}. The remove stage is skipped.",
                    remoteInstances.get(0).toString()
                );
                return;
            }
            // 返回的第一個節(jié)點是自己，則執(zhí)行刪除操作
            log.info("Beginning to remove expired metrics from the storage.");
            models.forEach(this::execute);
        } finally {
            log.info("Beginning to inspect data boundaries.");
            this.inspect(models);
        }
    }
    private void execute(Model model) {
        try {
            if (!model.isTimeSeries()) {
                return;
            }
            if (log.isDebugEnabled()) {
                log.debug(
                    "Is record? {}. RecordDataTTL {}, MetricsDataTTL {}",
                    model.isRecord(),
                    moduleConfig.getRecordDataTTL(),
                    moduleConfig.getMetricsDataTTL());
            }
            // 獲取 IHistoryDeleteDAO 接口的具體實現(xiàn)
            moduleManager.find(StorageModule.NAME)
                         .provider()
                         .getService(IHistoryDeleteDAO.class)
                         .deleteHistory(model, Metrics.TIME_BUCKET,
                                        model.isRecord() ? moduleConfig.getRecordDataTTL() : moduleConfig.getMetricsDataTTL()
                         );
        } catch (IOException e) {
            log.warn("History of {} delete failure", model.getName());
            log.error(e.getMessage(), e);
        }
    }
    private void inspect(List<Model> models) {
        try {
            moduleManager.find(StorageModule.NAME)
                         .provider()
                         .getService(IHistoryDeleteDAO.class)
                         .inspect(models, Metrics.TIME_BUCKET);
        } catch (IOException e) {
            log.error(e.getMessage(), e);
        }
    }
}

更多技術(shù)細節(jié)大家可以參考下面的鏈接

相關鏈接