Analyzing the root cause of abnormal loadavg data
proc
- NAME:
proc - process information pseudo-filesystem
- DESCRIPTION:
The proc filesystem is a pseudo-filesystem which provides an interface to kernel data structures.
It is commonly mounted at /proc. Most of it is read-only, but some files allow kernel variables to
be changed.
The meaning of each file can be looked up with `man proc`; the text above is quoted from that man page.
loadavg
cat /proc/loadavg
/proc/loadavg
The first three fields in this file are load average figures giving the number of
jobs in the run queue (state R) or waiting for disk I/O (state D) averaged over 1, 5,
and 15 minutes. The fourth field consists of two numbers separated by a slash (/).
The first of these is the number of currently runnable kernel scheduling entities
(processes, threads). The value after the slash is the number of kernel scheduling
entities that currently exist on the system.
The fifth field is the PID of the process that was most recently created on the system.
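Here is a small user-space parser for one line of `/proc/loadavg` that shows the field layout in practice. It is a sketch of my own, not kernel or libc code. The struct and function names (`loadavg_fields`, `parse_loadavg`, `read_loadavg`) are hypothetical, and the parser assumes the five-field format described above.

```c
#include <stdio.h>

/* Hypothetical holder for the five fields of /proc/loadavg. */
struct loadavg_fields {
    double avg1, avg5, avg15; /* fields 1-3: 1-, 5- and 15-minute averages */
    long runnable;            /* field 4, before the slash */
    long total;               /* field 4, after the slash */
    int last_pid;             /* field 5: most recently created PID */
};

/* Parse one line in the format "a.bc d.ef g.hi R/T PID".
 * Returns 0 on success, -1 if the line does not have all five fields. */
int parse_loadavg(const char *line, struct loadavg_fields *out)
{
    int n = sscanf(line, "%lf %lf %lf %ld/%ld %d",
                   &out->avg1, &out->avg5, &out->avg15,
                   &out->runnable, &out->total, &out->last_pid);
    return n == 6 ? 0 : -1;
}

/* Read the live file and parse it; returns -1 on any failure. */
int read_loadavg(struct loadavg_fields *out)
{
    char buf[128];
    FILE *f = fopen("/proc/loadavg", "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(buf, sizeof buf, f);
    fclose(f);
    return ok != NULL ? parse_loadavg(buf, out) : -1;
}
```

The parsing works on a string so the logic can be tested on its own. `read_loadavg()` simply feeds it the contents of the live file.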
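1: Origin of the problem
While working on large-screen devices, I ran into a problem: the values in loadavg were extremely high. An 8-core phone typically shows 3+ or 4+, but the device I had on hand reported over 70. This puzzled me for a long time, and I recently set aside a solid block of time to study how the value is calculated.
The kernel file `fs/proc/loadavg.c` contains the following function, which is what finally produces the output: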
```c
static int loadavg_proc_show(struct seq_file *m, void *v)
{
	unsigned long avnrun[3];

	get_avenrun(avnrun, FIXED_1/200, 0);

	seq_printf(m, "%lu.%02lu %lu.%02lu %lu.%02lu %ld/%d %d\n",
		LOAD_INT(avnrun[0]), LOAD_FRAC(avnrun[0]),	/* 1-minute average */
		LOAD_INT(avnrun[1]), LOAD_FRAC(avnrun[1]),	/* 5-minute average */
		LOAD_INT(avnrun[2]), LOAD_FRAC(avnrun[2]),	/* 15-minute average */
		/* runnable entities come from nr_running(); nr_threads counts all existing ones */
		nr_running(), nr_threads,
		/* PID of the most recently created process */
		task_active_pid_ns(current)->last_pid);
	return 0;
}
```
From the code above, the function that actually fetches the load averages is `get_avenrun()`, so let's look at its implementation next.
```c
unsigned long avenrun[3];
EXPORT_SYMBOL(avenrun); /* should be removed */

/**
 * get_avenrun - get the load average array
 * @loads:	pointer to dest load array
 * @offset:	offset to add
 * @shift:	shift count to shift the result left
 *
 * These values are estimates at best, so no need for locking.
 */
void get_avenrun(unsigned long *loads, unsigned long offset, int shift)
{
	/* The data comes straight from the global avenrun[] array */
	loads[0] = (avenrun[0] + offset) << shift;
	loads[1] = (avenrun[1] + offset) << shift;
	loads[2] = (avenrun[2] + offset) << shift;
}
```
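`LOAD_INT`/`LOAD_FRAC` and the `FIXED_1/200` offset are easier to follow with concrete numbers. The sketch below reproduces the formatting step in user space. It assumes `FSHIFT = 11` (so `FIXED_1 = 2048`) and macro definitions matching the 4.9 kernel's. `format_load()` is a hypothetical helper, not a kernel function.

```c
#include <stdio.h>

/* Assumed to match the 4.9 kernel's definitions. */
#define FSHIFT       11                                     /* bits of fractional precision */
#define FIXED_1      (1UL << FSHIFT)                        /* 1.0 in fixed point (2048) */
#define LOAD_INT(x)  ((x) >> FSHIFT)                        /* integer part */
#define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1 - 1)) * 100)  /* two decimal digits */

/* Hypothetical helper: print a raw avenrun[] value the way
 * loadavg_proc_show() does after get_avenrun(..., FIXED_1/200, 0). */
void format_load(unsigned long avg, char *buf, size_t len)
{
    unsigned long v = avg + FIXED_1 / 200; /* +10, about 0.005: round instead of truncate */
    snprintf(buf, len, "%lu.%02lu", LOAD_INT(v), LOAD_FRAC(v));
}
```

In integer arithmetic, `FIXED_1/200` is 10, which is roughly 0.005. Because of this offset, a raw value just under 2 (`2*FIXED_1 - 1`, about 1.9995) prints as `2.00` rather than being truncated to `1.99`.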
2: Where the data comes from
Next, let's find where `avenrun[]` is assigned, starting with where its input data comes from.
Kernel version: 4.9. Source files: `kernel/sched/core.c` and `kernel/sched/loadavg.c`.
2.1: scheduler_tick
```c
/*
 * This function gets called by the timer code, with HZ frequency.
 * We call it with interrupts disabled.
 * (The comment says it plainly: it is driven by the timer, HZ times per second.)
 */
void scheduler_tick(void)
{
	int cpu = smp_processor_id();
	struct rq *rq = cpu_rq(cpu);
	struct task_struct *curr = rq->curr;

	sched_clock_tick();

	raw_spin_lock(&rq->lock);
	walt_set_window_start(rq);
	walt_update_task_ravg(rq->curr, rq, TASK_UPDATE, walt_ktime_clock(), 0);
	update_rq_clock(rq);
	curr->sched_class->task_tick(rq, curr, 0);
	cpu_load_update_active(rq);
	calc_global_load_tick(rq);	/* load-average bookkeeping is called from here */
	raw_spin_unlock(&rq->lock);

	perf_event_task_tick();

#ifdef CONFIG_SMP
	rq->idle_balance = idle_cpu(cpu);
	trigger_load_balance(rq);
#endif
	rq_last_tick_reset(rq);

	if (curr->sched_class == &fair_sched_class)
		check_for_migration(rq, curr);
}
```
2.2: calc_global_load_tick
```c
/*
 * Called from scheduler_tick() to periodically update this CPU's
 * active count.
 */
void calc_global_load_tick(struct rq *this_rq)
{
	long delta;

	/* Avoid repeated updates: return if this CPU's update is not due yet.
	 * The check is based on jiffies, explained at the end of this article. */
	if (time_before(jiffies, this_rq->calc_load_update))
		return;

	/* Compute how much this CPU's active count changed */
	delta = calc_load_fold_active(this_rq, 0);
	if (delta)
		/* Fold it into calc_load_tasks; atomic_long_add() is a kernel atomic operation */
		atomic_long_add(delta, &calc_load_tasks);

	/* Time of this CPU's next update. LOAD_FREQ is defined in include/linux/sched.h as
	 * #define LOAD_FREQ (5*HZ+1), i.e. 5-second intervals */
	this_rq->calc_load_update += LOAD_FREQ;
}
```
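`calc_global_load_tick()` runs on every tick, yet each CPU folds its count only once every 5 seconds. The `time_before()` check combined with `+= LOAD_FREQ` enforces that limit. Below is a user-space sketch of the gating, built on several assumptions. `HZ` is fixed at 250. `time_before_ul()` is a simplified version of the kernel's `time_before()`, since the real macro adds type checks. `struct fake_rq` and `fake_global_load_tick()` are hypothetical stand-ins that only count how often the fold would run.

```c
#include <stdbool.h>

#define HZ        250              /* assumption: CONFIG_HZ=250 */
#define LOAD_FREQ (5 * HZ + 1)     /* same formula as the kernel: 5 s intervals */

/* Simplified time_after()/time_before(): comparing the signed difference
 * keeps working when the unsigned jiffies counter wraps around. */
static bool time_after_ul(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

static bool time_before_ul(unsigned long a, unsigned long b)
{
    return time_after_ul(b, a);
}

/* Hypothetical stand-in for the one rq field the gate uses. */
struct fake_rq {
    unsigned long calc_load_update; /* jiffy at which the next fold is due */
    int folds;                      /* how many folds have run */
};

/* Called on every tick, like calc_global_load_tick(), but only
 * "folds" once per LOAD_FREQ. */
void fake_global_load_tick(struct fake_rq *rq, unsigned long jiffies)
{
    if (time_before_ul(jiffies, rq->calc_load_update))
        return;                     /* not due yet */
    rq->folds++;                    /* real code: calc_load_fold_active() + atomic add */
    rq->calc_load_update += LOAD_FREQ;
}
```

The comparison uses the signed difference, so a deadline just past the point where `jiffies` wraps around is still handled correctly.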
2.3: calc_load_fold_active
```c
long calc_load_fold_active(struct rq *this_rq, long adjust)
{
	long nr_active, delta = 0;

	/* Runnable tasks on this CPU's runqueue; adjust is passed as 0, so ignore it */
	nr_active = this_rq->nr_running - adjust;
	/* Plus tasks in uninterruptible sleep on this runqueue */
	nr_active += (long)this_rq->nr_uninterruptible;

	/* calc_load_active holds nr_running + nr_uninterruptible from the last fold;
	 * if the count changed, report only the difference */
	if (nr_active != this_rq->calc_load_active) {
		delta = nr_active - this_rq->calc_load_active;
		this_rq->calc_load_active = nr_active;
	}

	/* The caller adds the returned delta to calc_load_tasks */
	return delta;
}
```
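A two-CPU simulation shows how the per-CPU deltas add up. It is a sketch under these assumptions. `struct sim_rq` keeps only the three fields `calc_load_fold_active()` touches. `sim_fold_active()` copies its logic with `adjust = 0`. `sim_fold_all()` plays the role of every CPU calling `calc_global_load_tick()`. All three names are hypothetical.

```c
/* Hypothetical stand-in for the struct rq fields used by calc_load_fold_active(). */
struct sim_rq {
    unsigned int nr_running;
    unsigned long nr_uninterruptible; /* unsigned, like the kernel field */
    long calc_load_active;            /* value reported at the previous fold */
};

/* Same logic as calc_load_fold_active(this_rq, 0). */
long sim_fold_active(struct sim_rq *rq)
{
    long nr_active, delta = 0;

    nr_active = rq->nr_running;
    nr_active += (long)rq->nr_uninterruptible; /* may be "negative" on one CPU */
    if (nr_active != rq->calc_load_active) {
        delta = nr_active - rq->calc_load_active;
        rq->calc_load_active = nr_active;
    }
    return delta;
}

/* Every CPU folding its delta into the global calc_load_tasks. */
void sim_fold_all(struct sim_rq *rqs, int nr_cpus, long *calc_load_tasks)
{
    for (int i = 0; i < nr_cpus; i++)
        *calc_load_tasks += sim_fold_active(&rqs[i]);
}
```

Consider a task that blocks in uninterruptible sleep on CPU0 and is later woken on CPU1. CPU0's counter goes up, while CPU1's unsigned `nr_uninterruptible` wraps around and reads as -1 once cast to `long`. The global `calc_load_tasks` still comes out right, because only the total matters. The flip side, at least in this model, is that one CPU's value viewed alone can look absurd. Worse, a single missed decrement anywhere would leave the global total, and with it the load average, permanently inflated.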
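3: How the value is computed
With the data source covered, let's walk through how the value is computed.
The first half of this path goes through the low-level high-resolution timer code, which I'm not very familiar with, so I'll only introduce it briefly; interested readers can dig into it on their own. (The file is `tick-sched.c`; PlantUML does not accept `-` in class names, so that name could not be used directly as a class name.)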
3.1: tick_sched_timer
```c
/*
 * High resolution timer specific code
 */
/* Only built when the kernel enables high-resolution timers: CONFIG_HIGH_RES_TIMERS=y */
#ifdef CONFIG_HIGH_RES_TIMERS
/*
 * We rearm the timer until we get disabled by the idle code.
 * Called with interrupts disabled.
 */
/* tick_sched_timer() is the expiry callback of the high-resolution tick timer,
 * so it runs at the end of every timer period */
static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
{
	struct tick_sched *ts =
		container_of(timer, struct tick_sched, sched_timer);
	struct pt_regs *regs = get_irq_regs();
	ktime_t now = ktime_get();

	tick_sched_do_timer(now);
	...
	return HRTIMER_RESTART;
}
```
3.2: calc_global_load
I'll skip the timer functions in between; they are beyond the scope of this article, and I don't fully understand their logic either.
```c
/*
 * calc_load - update the avenrun load estimates 10 ticks after the
 * CPUs have updated calc_load_tasks.
 *
 * Called from the global timer code.
 */
void calc_global_load(unsigned long ticks)
{
	long active, delta;

	/* calc_load_update is the global deadline, which advances in step with the
	 * per-CPU this_rq->calc_load_update seen earlier. The extra 10 ticks make
	 * this run 10 ticks after the CPUs have folded their counts. */
	if (time_before(jiffies, calc_load_update + 10))
		return;

	/*
	 * Fold the 'old' idle-delta to include all NO_HZ cpus.
	 */
	/* Pick up task counts missed by CPUs that were idle in NO_HZ mode */
	delta = calc_load_fold_idle();
	if (delta)
		atomic_long_add(delta, &calc_load_tasks);

	active = atomic_long_read(&calc_load_tasks);	/* atomically read the global sum */
	active = active > 0 ? active * FIXED_1 : 0;	/* to fixed point; clamp negatives to 0 */

	avenrun[0] = calc_load(avenrun[0], EXP_1, active);	/* 1-minute load */
	avenrun[1] = calc_load(avenrun[1], EXP_5, active);	/* 5-minute load */
	avenrun[2] = calc_load(avenrun[2], EXP_15, active);	/* 15-minute load */

	calc_load_update += LOAD_FREQ;	/* schedule the next update */

	/*
	 * In case we idled for multiple LOAD_FREQ intervals, catch up in bulk.
	 */
	/* Having folded in the NO_HZ task counts, also catch up on the update
	 * periods missed while idle; otherwise the averages would be off */
	calc_global_nohz();
}
```
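NO_HZ mode appears here. It concerns how the kernel handles the timer tick while a CPU is idle, and it is covered separately at the end of this article. What follows is the actual load calculation.
3.3: The calculation rule: calc_load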
```c
/*
 * a1 = a0 * e + a * (1 - e)
 */
static unsigned long
calc_load(unsigned long load, unsigned long exp, unsigned long active)
{
	unsigned long newload;

	newload = load * exp + active * (FIXED_1 - exp);
	if (active >= load)
		newload += FIXED_1-1;

	return newload / FIXED_1;
}
```
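Here is a worked form of the comment in `calc_load()`. `e` is the per-update decay factor, applied once every `LOAD_FREQ` (about 5 s), and `a` is the active task count. The constants assume the 4.9 values `FIXED_1 = 2048`, `EXP_1 = 1884`, `EXP_5 = 2014` and `EXP_15 = 2037`, none of which appear in the code quoted in this article. In fixed point, `e` is stored as `EXP_n / FIXED_1`.

```latex
\begin{aligned}
a_1 &= a_0\,e + a\,(1 - e), \qquad e = \exp\!\left(-\tfrac{5\,\mathrm{s}}{T}\right), \quad T \in \{1, 5, 15\}\ \mathrm{min} \\[4pt]
\mathtt{EXP\_1}  &= \operatorname{round}\big(2048 \exp(-5/60)\big)  = \operatorname{round}(1884.25) = 1884 \\
\mathtt{EXP\_5}  &= \operatorname{round}\big(2048 \exp(-5/300)\big) = \operatorname{round}(2014.15) = 2014 \\
\mathtt{EXP\_15} &= \operatorname{round}\big(2048 \exp(-5/900)\big) = \operatorname{round}(2036.65) = 2037 \\[4pt]
a_n &= a + (a_0 - a)\,e^{n} \qquad (\text{constant } a \text{ for } n \text{ updates}) \\
T = 1\ \mathrm{min},\ n = 12: \quad a_{12} - a &= (a_0 - a)\exp(-1) \approx 0.37\,(a_0 - a)
\end{aligned}
```

So after one minute at a constant count, the 1-minute figure has closed about 63% of the gap to that count, while the 15-minute figure moves much more slowly.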
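The rule stated in the comment of `calc_load()` is not complicated: it is an exponentially weighted moving average computed in fixed point. Overall it matches what `man proc` says: the load average counts the tasks in `nr_running` and `nr_uninterruptible`. Both numbers come from `struct rq`, which `core.c` works with. `rq` is the per-CPU run queue and one of the scheduler's most important data structures.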
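Feeding a constant count through `calc_load()` shows why a device can sit at 70+ without being busy: the average only reflects the number it is given. The sketch below copies `calc_load()` and drives it the same way `calc_global_load()` does, including clamping a negative sum to 0. The constants `FSHIFT`, `EXP_1`, `EXP_5` and `EXP_15` are assumed to match the 4.9 kernel headers. `feed_constant_load()` is a hypothetical driver.

```c
/* Assumed to match the 4.9 kernel headers. */
#define FSHIFT  11
#define FIXED_1 (1UL << FSHIFT) /* 1.0 in fixed point */
#define EXP_1   1884            /* 1/exp(5sec/1min) as fixed-point */
#define EXP_5   2014            /* 1/exp(5sec/5min) */
#define EXP_15  2037            /* 1/exp(5sec/15min) */

/* Copy of the kernel's calc_load(): one fixed-point EMA step. */
unsigned long calc_load(unsigned long load, unsigned long exp, unsigned long active)
{
    unsigned long newload = load * exp + active * (FIXED_1 - exp);
    if (active >= load)
        newload += FIXED_1 - 1; /* round up while rising */
    return newload / FIXED_1;
}

/* Hypothetical driver: apply `updates` LOAD_FREQ periods with a constant
 * global task count, the way calc_global_load() does. */
void feed_constant_load(long nr_active, int updates, unsigned long avenrun[3])
{
    /* calc_global_load() clamps a negative global sum to 0 */
    unsigned long active = nr_active > 0 ? (unsigned long)nr_active * FIXED_1 : 0;

    for (int i = 0; i < updates; i++) {
        avenrun[0] = calc_load(avenrun[0], EXP_1, active);
        avenrun[1] = calc_load(avenrun[1], EXP_5, active);
        avenrun[2] = calc_load(avenrun[2], EXP_15, active);
    }
}
```

Start from 0 and feed in 70 active tasks. One minute (12 updates) brings the 1-minute average to about 44, and with enough updates all three figures settle at exactly 70.00. The 5- and 15-minute figures lag behind the 1-minute one. The round-up branch (`active >= load`) is what lets the fixed-point average actually reach the target. Without it, the average would stall just below.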
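Analyzing the problem
Back to the original question: why can our device report a load above 70 without grinding to a halt? The code above does not answer that directly, but with the logic understood, the rest is straightforward.
- 1: I printed the `nr_running` and `nr_uninterruptible` task counts. `nr_running` was normal; the problem was the `nr_uninterruptible` count.
- 2: If the culprit is the `nr_uninterruptible` count, does our device really have that many tasks waiting for I/O? If it did, the device would be extremely laggy. I captured a systrace, and everything looked normal.
- 3: At this point, the only option left was a search engine. Searching for `nr_uninterruptible` turned up a few clues.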
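Step 1 can also be cross-checked from user space without any kernel changes: count the threads currently in state `D` by scanning `/proc/<pid>/task/<tid>/stat`. This is a hypothetical helper, not the method used above. Its result is only a snapshot and is not guaranteed to match the kernel's `nr_uninterruptible` sum exactly. Still, suppose it reports only a handful of `D` threads while loadavg stays around 70. Then the counter itself becomes the suspect, which fits what step 2 found.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Return the state letter from a /proc/<pid>/stat line ("pid (comm) state ...").
 * comm may contain spaces or ')', so look for the LAST ')'. Returns 0 on failure. */
char stat_state(const char *line)
{
    const char *p = strrchr(line, ')');
    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return 0;
    return p[2];
}

/* Hypothetical helper: count threads currently in state 'D' by scanning
 * /proc/<pid>/task/<tid>/stat. Returns -1 if /proc cannot be opened. */
long count_d_state_threads(void)
{
    DIR *proc = opendir("/proc");
    if (proc == NULL)
        return -1;

    long count = 0;
    struct dirent *pe;
    while ((pe = readdir(proc)) != NULL) {
        if (pe->d_name[0] < '0' || pe->d_name[0] > '9')
            continue;                     /* not a PID directory */

        char path[640];
        snprintf(path, sizeof path, "/proc/%s/task", pe->d_name);
        DIR *tasks = opendir(path);
        if (tasks == NULL)
            continue;                     /* process exited in the meantime */

        struct dirent *te;
        while ((te = readdir(tasks)) != NULL) {
            if (te->d_name[0] < '0' || te->d_name[0] > '9')
                continue;
            snprintf(path, sizeof path, "/proc/%s/task/%s/stat", pe->d_name, te->d_name);
            FILE *f = fopen(path, "r");
            if (f == NULL)
                continue;
            char line[1024];
            if (fgets(line, sizeof line, f) != NULL && stat_state(line) == 'D')
                count++;
            fclose(f);
        }
        closedir(tasks);
    }
    closedir(proc);
    return count;
}
```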
Summary of findings
First, classic UNIX systems did not count uninterruptible tasks in the load average. When Linux added them, the reasoning was that if tasks waiting on I/O are not counted, the load average cannot truly reflect how loaded the system is.
Later, in articles by several Linux experts, I found that when an NFS filesystem runs into trouble, every thread accessing it is left in uninterruptible sleep and is therefore counted in `nr_uninterruptible`. This knowledge sits very close to the kernel. (PS: if any experts can recommend a good kernel book on this topic, please do.)
- Conclusion: because the `nr_uninterruptible` data is abnormal, the load average does not reflect the device's actual state.
Takeaways
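- 1: The HZ mentioned in the comment on `scheduler_tick()` is the frequency of the kernel's periodic timer tick, which is driven by a hardware timer interrupt. It is tied to kernel config options such as `CONFIG_HZ_250` and `CONFIG_HZ_1000`. For example, `CONFIG_HZ_1000=y` gives `CONFIG_HZ=1000`, meaning 1000 ticks per second, or one tick every 1 s / 1000 = 1 ms. (`CONFIG_HZ=250` is a common setting.)
- 2: `jiffies` is the counter of timer ticks (timer interrupts) so far; one jiffy lasts 1 s / HZ.
- 3: The `rq` struct is too long to paste in full. It is defined in `kernel/sched/sched.h`; take a look there if you're interested.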
```c
struct rq *rq = cpu_rq(cpu);

/*
 * This is the main, per-CPU runqueue data structure.
 *
 * Locking rule: those places that want to lock multiple runqueues
 * (such as the load balancing or the thread migration code), lock
 * acquire operations must be ordered by ascending &runqueue.
 */
struct rq {
	/* runqueue lock: */
	raw_spinlock_t lock;

	/*
	 * nr_running and cpu_load should be in the same cacheline because
	 * remote CPUs use both these fields when doing load calculation.
	 */
	unsigned int nr_running;	/* <-- used by the load average */
#ifdef CONFIG_NUMA_BALANCING
	unsigned int nr_numa_running;
	unsigned int nr_preferred_running;
#endif
	#define CPU_LOAD_IDX_MAX 5
	unsigned long cpu_load[CPU_LOAD_IDX_MAX];
	unsigned int misfit_task;
#ifdef CONFIG_NO_HZ_COMMON
#ifdef CONFIG_SMP
	unsigned long last_load_update_tick;
#endif /* CONFIG_SMP */
	unsigned long nohz_flags;
#endif /* CONFIG_NO_HZ_COMMON */
#ifdef CONFIG_NO_HZ_FULL
	unsigned long last_sched_tick;
#endif

#ifdef CONFIG_CPU_QUIET
	/* time-based average load */
	u64 nr_last_stamp;
	u64 nr_running_integral;
	seqcount_t ave_seqcnt;
#endif

	/* capture load from *all* tasks on this cpu: */
	struct load_weight load;
	unsigned long nr_load_updates;
	u64 nr_switches;

	struct cfs_rq cfs;
	struct rt_rq rt;
	struct dl_rq dl;

#ifdef CONFIG_FAIR_GROUP_SCHED
	/* list of leaf cfs_rq on this cpu: */
	struct list_head leaf_cfs_rq_list;
	struct list_head *tmp_alone_branch;
#endif /* CONFIG_FAIR_GROUP_SCHED */

	/*
	 * This is part of a global counter where only the total sum
	 * over all CPUs matters. A task can increase this counter on
	 * one CPU and if it got migrated afterwards it may decrease
	 * it on another CPU. Always updated under the runqueue lock:
	 */
	unsigned long nr_uninterruptible;	/* <-- used by the load average */

	struct task_struct *curr, *idle, *stop;
	unsigned long next_balance;
	struct mm_struct *prev_mm;

	unsigned int clock_skip_update;
	u64 clock;
	u64 clock_task;

	atomic_t nr_iowait;

#ifdef CONFIG_SMP
	struct root_domain *rd;
	struct sched_domain *sd;

	unsigned long cpu_capacity;
	unsigned long cpu_capacity_orig;

	struct callback_head *balance_callback;

	unsigned char idle_balance;
	/* For active balancing */
	int active_balance;
	int push_cpu;
	struct task_struct *push_task;
	struct cpu_stop_work active_balance_work;
	/* cpu of this runqueue: */
	int cpu;
	int online;
	...
};
```
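- 4: High-resolution timers (hrtimers) give the kernel timers with nanosecond resolution. Kernel config: `CONFIG_HIGH_RES_TIMERS=y`.
- 5: With NO_HZ, when a CPU goes idle the kernel stops its periodic tick instead of continuing to fire timer interrupts, which reduces power consumption. Kernel config: `CONFIG_NO_HZ=y` and `CONFIG_NO_HZ_IDLE=y`. Conversely, if a device is not power-sensitive and runs on external power, this mode can be turned off to improve performance.
- 6: Extracting the kernel config on Android (the file exists only if the kernel was built with `CONFIG_IKCONFIG_PROC=y`):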
adb pull /proc/config.gz .
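`/proc/config.gz` is gzip-compressed, so decompress it first (for example `gunzip -c config.gz > config.txt`). The sketch below then reads `CONFIG_HZ` out of the text and derives the quantities from items 1 and 2: the length of one jiffy and `LOAD_FREQ` in ticks. The function names are hypothetical. `LOAD_FREQ = 5*HZ + 1` is taken from the 4.9 definition quoted earlier.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the CONFIG_HZ value from kernel config text
 * (e.g. decompressed /proc/config.gz), or -1 if the line is absent. */
long config_hz(const char *config_text)
{
    const char *key = "CONFIG_HZ=";
    size_t key_len = strlen(key);
    const char *p = config_text;

    while ((p = strstr(p, key)) != NULL) {
        if (p == config_text || p[-1] == '\n')  /* only at the start of a line */
            return strtol(p + key_len, NULL, 10);
        p += key_len;
    }
    return -1;
}

/* Length of one jiffy in microseconds: 1 s / HZ. */
long jiffy_usec(long hz)
{
    return 1000000L / hz;
}

/* LOAD_FREQ in ticks, using the 4.9 definition (5*HZ+1): 5 s plus one tick. */
long load_freq_ticks(long hz)
{
    return 5 * hz + 1;
}
```

With `CONFIG_HZ=250`, one jiffy is 4 ms and `LOAD_FREQ` is 1251 ticks, that is, 5 s plus one tick.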