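Go Scheduler Study: Cooperation and Preemption Explained

Updated: 2023-04-04 08:23:47  Author: IguoChan

If a G runs for too long, how do the other Gs get scheduled? That question leads to the topic of this article: cooperation and preemption. Using a few examples, the article walks through how cooperative and preemptive scheduling work inside the Go scheduler.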

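0. Introduction

The previous post, "Golang Scheduler (4): Goroutine Scheduling", left one question unanswered: if a G runs for too long, how do the other Gs get scheduled? That question leads to the topic of this article: cooperation and preemption.

Go has had cooperation-based preemptive scheduling since v1.2. The basic idea is:

When the sysmon monitor thread finds that a goroutine has been running for too long, it politely sets a preemption mark on that goroutine;
When that goroutine calls a function, it checks whether the stack needs to grow, and this check also sees the preemption mark. If the mark is set, the goroutine gives up the CPU, and scheduling happens.

This scheme depends on the goroutine itself cooperating, so it cannot handle every case. For example, a goroutine stuck in an infinite loop that makes no function calls never reaches the check, so it runs forever and is never rescheduled, which is clearly unacceptable.

So in v1.14 Go finally introduced signal-based preemptive scheduling. The sections below cover both kinds of preemption.

1. Voluntarily yielding the CPU: the runtime.Gosched function

Before looking at the two kinds of preemption, let's first look at runtime.Gosched: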

// Gosched yields the processor, allowing other goroutines to run. It does not
// suspend the current goroutine, so execution resumes automatically.
func Gosched() {
   checkTimeouts()
   mcall(gosched_m)
}

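According to its doc comment, runtime.Gosched voluntarily gives up the current processor and lets other goroutines run. It does not suspend the calling goroutine; it only hands over the processor, and the goroutine relies on the scheduler to run it again later.

Gosched then uses mcall to switch to the g0 stack and run gosched_m: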

// Gosched continuation on g0.
func gosched_m(gp *g) {
   if trace.enabled {
      traceGoSched()
   }
   goschedImpl(gp)
}

gosched_m calls goschedImpl, which detaches gp from the current M and puts gp on the global run queue to wait for scheduling.

func goschedImpl(gp *g) {
   status := readgstatus(gp)
   if status&^_Gscan != _Grunning {
      dumpgstatus(gp)
      throw("bad g status")
   }
   casgstatus(gp, _Grunning, _Grunnable)
   dropg()            // detach gp from the current M (gp is the M's curg)
   lock(&sched.lock)
   globrunqput(gp)    // put gp on the global run queue to wait for scheduling
   unlock(&sched.lock)

   schedule()
}

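Although runtime.Gosched lets a goroutine give up the CPU voluntarily, it requires the programmer to decide where to yield, so it is not a user-friendly mechanism.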

2. Cooperation-based preemptive scheduling

2.1 Scenario

package main

import (
   "fmt"
   "runtime"
   "sync"
   "time"
)

var once = sync.Once{}

func f() {
   once.Do(func() {
      fmt.Println("I am go routine 1!")
   })
}

func main() {
   defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(1))

   go func() {
      for {
         f()
      }
   }()

   time.Sleep(10 * time.Millisecond)
   fmt.Println("I am main goroutine!")
}

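Consider the code above. We first set the number of Ps to 1, then start a goroutine that enters an infinite loop and keeps calling a function. Without preemptive scheduling, this goroutine would hold the P, and therefore the CPU, forever, and execution would never reach the line fmt.Println("I am main goroutine!"). Let's see how cooperative preemption avoids this problem.

2.2 Stack growth and the preemption mark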

$ go tool compile -N -l main.go
$ go tool objdump main.o >> main.i

The commands above produce the assembly for the code in 2.1. The assembly for function f is excerpted below:

TEXT "".f(SB) gofile../home/chenyiguo/smb_share/go_routine_test/main.go
main.go:12 0x151a 493b6610 CMPQ 0x10(R14), SP
main.go:12 0x151e 762b JBE 0x154b
main.go:12 0x1520 4883ec18 SUBQ $0x18, SP
main.go:12 0x1524 48896c2410 MOVQ BP, 0x10(SP)
main.go:12 0x1529 488d6c2410 LEAQ 0x10(SP), BP
main.go:13 0x152e 488d0500000000 LEAQ 0(IP), AX [3:7]R_PCREL:"".once
main.go:13 0x1535 488d1d00000000 LEAQ 0(IP), BX [3:7]R_PCREL:"".f.func1·f
main.go:13 0x153c e800000000 CALL 0x1541 [1:5]R_CALL:sync.(*Once).Do
main.go:16 0x1541 488b6c2410 MOVQ 0x10(SP), BP
main.go:16 0x1546 4883c418 ADDQ $0x18, SP
main.go:16 0x154a c3 RET
main.go:12 0x154b e800000000 CALL 0x1550 [1:5]R_CALL:runtime.morestack_noctxt
main.go:12 0x1550 ebc8 JMP "".f(SB)

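In the first instruction, CMPQ 0x10(R14), SP compares SP with 0x10(R14), which is the goroutine's stackguard0 (mind the operand order of CMP-family instructions in AT&T-style syntax). When SP is less than or equal to 0x10(R14), execution jumps to address 0x154b and calls runtime.morestack_noctxt, which enters the stack-growth path. If you look closely, you will find this check inserted into the prologue (the very start) of almost every function, unless the function is marked //go:nosplit (the toolchain may also omit the check for some small leaf functions).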
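
To make the comparison concrete, here is a minimal model of the prologue check in plain Go. It is only a sketch: takesSlowPath and the two addresses are invented for illustration, while the stackPreempt value follows the runtime's definition (uintptrMask & -1314), a sentinel larger than any real stack pointer.

```go
package main

import "fmt"

// stackPreempt mirrors the runtime's sentinel (uintptrMask & -1314):
// a value larger than any real stack pointer.
const stackPreempt = ^uintptr(0) - 1313

// takesSlowPath models the CMPQ/JBE pair in the prologue: the
// morestack path is taken when SP <= stackguard0.
// (Hypothetical helper, not a runtime function.)
func takesSlowPath(sp, stackguard0 uintptr) bool {
	return sp <= stackguard0
}

func main() {
	// Hypothetical addresses, chosen only for illustration.
	sp := uintptr(0x40000)
	guard := uintptr(0x3f000)

	fmt.Println(takesSlowPath(sp, guard))        // false: enough stack left
	fmt.Println(takesSlowPath(sp, stackPreempt)) // true: forced by the sentinel
}
```

Because the sentinel compares greater than any stack pointer, writing it into stackguard0 makes the very next prologue check fail, without adding a second comparison to every function.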

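Next, we focus on two questions to connect the whole chain:

How does stack growth lead to rescheduling and giving up the CPU?
When is the stack-growth mark (that is, the preemption mark) set?

2.3 How stack growth triggers rescheduling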

// morestack but not preserving ctxt.
TEXT runtime·morestack_noctxt(SB),NOSPLIT,$0
   MOVL   $0, DX
   JMP    runtime·morestack(SB)

TEXT runtime·morestack(SB),NOSPLIT,$0-0
   ...

   // Set g->sched to context in f.
   MOVQ   0(SP), AX // f's PC
   MOVQ   AX, (g_sched+gobuf_pc)(SI)
   LEAQ   8(SP), AX // f's SP
   MOVQ   AX, (g_sched+gobuf_sp)(SI)
   MOVQ   BP, (g_sched+gobuf_bp)(SI)
   MOVQ   DX, (g_sched+gobuf_ctxt)(SI)

   ...
   CALL   runtime·newstack(SB)
   CALL   runtime·abort(SB)  // crash if newstack returns
   RET

In the code above, runtime·morestack_noctxt jumps to runtime·morestack, which first saves the goroutine's PC and SP and then calls runtime.newstack:

func newstack() {
   ...

   gp := thisg.m.curg
   
   ...
   stackguard0 := atomic.Loaduintptr(&gp.stackguard0)

   ...
   preempt := stackguard0 == stackPreempt
   ...

   if preempt {
      if gp == thisg.m.g0 {
         throw("runtime: preempt g0")
      }
      if thisg.m.p == 0 && thisg.m.locks == 0 {
         throw("runtime: g is running but p is not")
      }

      if gp.preemptShrink {
         // We're at a synchronous safe point now, so
         // do the pending stack shrink.
         gp.preemptShrink = false
         shrinkstack(gp)
      }

      if gp.preemptStop {
         preemptPark(gp) // never returns
      }

      // Act like goroutine called runtime.Gosched.
      gopreempt_m(gp) // never return
   }

   ...
}

Simplifying runtime.newstack, it boils down to this: the stackguard0 field of the currently running goroutine is used to decide whether preemption should happen, and if so, gopreempt_m(gp) is called:

func gopreempt_m(gp *g) {
   if trace.enabled {
      traceGoPreempt()
   }
   goschedImpl(gp)
}

As you can see, gopreempt_m, like the gosched_m function described in the Gosched section, calls goschedImpl, which detaches gp from the current M and puts gp on the global run queue to wait for scheduling.
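
The effect of goschedImpl on the queue can be pictured with a toy model. This is a deliberately simplified sketch: toyG, toySched and yield are hypothetical names, and picking the next G straight from the head of the global queue is an assumption made for brevity (the real schedule function considers other sources of runnable goroutines as well).

```go
package main

import "fmt"

// toyG and toySched are hypothetical stand-ins for the runtime's g and
// sched structures; only the fields needed for the illustration exist.
type toyG struct {
	id     int
	status string
}

type toySched struct {
	global []*toyG // models the global run queue (FIFO)
}

// yield models goschedImpl: mark cur runnable, append it to the global
// queue, then pick the G at the head of the queue to run next.
func (s *toySched) yield(cur *toyG) *toyG {
	cur.status = "runnable"
	s.global = append(s.global, cur)

	next := s.global[0]
	s.global = s.global[1:]
	next.status = "running"
	return next
}

func main() {
	g1 := &toyG{id: 1, status: "running"}
	g2 := &toyG{id: 2, status: "runnable"}
	s := &toySched{global: []*toyG{g2}}

	next := s.yield(g1)
	fmt.Println("next:", next.id, "queued:", s.global[0].id) // next: 2 queued: 1
}
```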

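So now it is clear: once the stack-growth path is entered, the goroutine may give up execution and be rescheduled. When, then, is a goroutine pushed onto this path, that is, when is the mark set?

2.4 When the stack-growth mark is set

The runtime sets stackguard0 to stackPreempt in quite a few places, but the one that matches our scenario is in the loop of the background monitor thread sysmon: the retake function, which takes the P back from goroutines that are blocked in system calls or have been running for a long time: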

func sysmon() {
   ...

   for {
      ...
      // retake P's blocked in syscalls
      // and preempt long running G's
      if retake(now) != 0 {
         idle = 0
      } else {
         idle++
      }
      ...
   }
}

func retake(now int64) uint32 {
   ...
   for i := 0; i < len(allp); i++ {
      ...
      s := _p_.status
      sysretake := false
      if s == _Prunning || s == _Psyscall {
         // Preempt G if it's running for too long.
         t := int64(_p_.schedtick)
         if int64(pd.schedtick) != t {
            pd.schedtick = uint32(t)
            pd.schedwhen = now
         } else if pd.schedwhen+forcePreemptNS <= now { // forcePreemptNS=10ms
            preemptone(_p_) // the stack-growth (preemption) mark is set here
            // In case of syscall, preemptone() doesn't
            // work, because there is no M wired to P.
            sysretake = true
         }
      }
      ...
   }
   unlock(&allpLock)
   return uint32(n)
}

The stack-growth mark itself is set in the preemptone function:

func preemptone(_p_ *p) bool {
   mp := _p_.m.ptr()
   if mp == nil || mp == getg().m {
      return false
   }
   gp := mp.curg
   if gp == nil || gp == mp.g0 {
      return false
   }

   gp.preempt = true

   // Every call in a goroutine checks for stack overflow by
   // comparing the current stack pointer to gp->stackguard0.
   // Setting gp->stackguard0 to StackPreempt folds
   // preemption into the normal stack overflow check.
   gp.stackguard0 = stackPreempt // set the stack-growth (preemption) mark

   // Request an async preemption of this P.
   if preemptMSupported && debug.asyncpreemptoff == 0 {
      _p_.preempt = true
      preemptM(mp)
   }

   return true
}

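Putting this together, the logic of cooperative goroutine preemption is:

First, the background monitor thread sets the stack-growth mark on any goroutine that has been running too long (≥10ms);
Whenever the goroutine reaches a function prologue, it first checks stackguard0, which picks up the mark;
If the stack-growth path is taken, the runtime takes the processor away from the goroutine there, which achieves preemptive scheduling.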
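
The first step, deciding that a goroutine has run "too long", can be sketched from the retake excerpt as a small standalone function. The names tickRecord and shouldPreempt are hypothetical; the comparison itself and the 10ms value of forcePreemptNS are taken from the code shown earlier.

```go
package main

import "fmt"

// forcePreemptNS has the same value as the runtime constant: 10ms.
const forcePreemptNS = 10 * 1000 * 1000

// tickRecord is a hypothetical stand-in for the per-P bookkeeping
// (pd.schedtick, pd.schedwhen) that retake keeps between passes.
type tickRecord struct {
	schedtick uint32
	schedwhen int64
}

// shouldPreempt models the check in retake. A changed schedtick means a
// different G was scheduled since the last pass, so timing restarts.
// An unchanged schedtick means the same G is still running; it is due
// for preemption once forcePreemptNS has elapsed.
func shouldPreempt(pd *tickRecord, schedtick uint32, now int64) bool {
	if pd.schedtick != schedtick {
		pd.schedtick = schedtick
		pd.schedwhen = now
		return false
	}
	return pd.schedwhen+forcePreemptNS <= now
}

func main() {
	const ms = 1000 * 1000
	pd := &tickRecord{}

	fmt.Println(shouldPreempt(pd, 1, 0*ms))  // false: new tick, timing starts
	fmt.Println(shouldPreempt(pd, 1, 5*ms))  // false: same G, only 5ms so far
	fmt.Println(shouldPreempt(pd, 1, 10*ms)) // true: same G for 10ms
	fmt.Println(shouldPreempt(pd, 2, 12*ms)) // false: a new G was scheduled
}
```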

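3. Signal-based preemptive scheduling

From the conclusion above we can see that this trigger logic has a fatal flaw: the goroutine must reach a function prologue. That makes it impossible to take the processor away from a goroutine like the one below. Before Go 1.14, the following code never prints the last line, "I am main goroutine!":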

package main

import (
   "fmt"
   "runtime"
   "sync"
   "time"
)

var once = sync.Once{}

func main() {
   defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(1))

   go func() {
      for {
         once.Do(func() {
            fmt.Println("I am go routine 1!")
         })
      }
   }()

   time.Sleep(10 * time.Millisecond)
   fmt.Println("I am main goroutine!")
}

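The for loop in that goroutine is an infinite loop that contains no stack-growth check (presumably because the fast path of once.Do can be inlined, leaving no real function call inside the loop), so the goroutine never gives up execution.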

3.1 Sending the preemption signal

To solve this, Go introduced signal-based preemptive scheduling. Notice that the preemptone code in the previous section contains the following part:

if preemptMSupported && debug.asyncpreemptoff == 0 {
   _p_.preempt = true
   preemptM(mp)
}

Here the preemptM function sends the _SIGURG signal to the thread that needs to be preempted:

const sigPreempt = _SIGURG


func preemptM(mp *m) {
   // On Darwin, don't try to preempt threads during exec.
   // Issue #41702.
   if GOOS == "darwin" || GOOS == "ios" {
      execLock.rlock()
   }

   if atomic.Cas(&mp.signalPending, 0, 1) {
      if GOOS == "darwin" || GOOS == "ios" {
         atomic.Xadd(&pendingPreemptSignals, 1)
      }

      // If multiple threads are preempting the same M, it may send many
      // signals to the same M such that it hardly make progress, causing
      // live-lock problem. Apparently this could happen on darwin. See
      // issue #37741.
      // Only send a signal if there isn't already one pending.
      signalM(mp, sigPreempt)
   }

   if GOOS == "darwin" || GOOS == "ios" {
      execLock.runlock()
   }
}
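
The signalPending guard in preemptM can be modeled in isolation to show why repeated requests do not flood a thread with signals. This is a single-threaded sketch with made-up names (toyM, requestPreempt, acknowledge); incrementing sent stands in for the real signalM call, and acknowledge stands in for the point where the signal handler clears the flag.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// toyM is a hypothetical stand-in for the runtime's m. sent counts how
// many signals would have been delivered.
type toyM struct {
	signalPending uint32
	sent          int
}

// requestPreempt models preemptM: a signal is sent only if the
// compare-and-swap flips signalPending from 0 to 1.
func (m *toyM) requestPreempt() bool {
	if atomic.CompareAndSwapUint32(&m.signalPending, 0, 1) {
		m.sent++ // stands in for signalM(mp, sigPreempt)
		return true
	}
	return false
}

// acknowledge models the signal handler clearing the pending flag.
func (m *toyM) acknowledge() {
	atomic.StoreUint32(&m.signalPending, 0)
}

func main() {
	m := &toyM{}
	fmt.Println(m.requestPreempt())      // true: first request sends a signal
	fmt.Println(m.requestPreempt())      // false: one is already pending
	m.acknowledge()
	fmt.Println(m.requestPreempt())      // true: the pending flag was cleared
	fmt.Println("signals sent:", m.sent) // signals sent: 2
}
```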

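3.2 Injecting the preemption call

To follow this part, we have to go back to the very beginning: on the call path where the first thread, m0, runs mstart, the mstartm0 function is called, and it in turn calls initsig: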

func initsig(preinit bool) {
  ...

   for i := uint32(0); i < _NSIG; i++ {
      ...

      handlingSig[i] = 1
      setsig(i, abi.FuncPCABIInternal(sighandler))
   }
}

The code above registers the sighandler function:

func sighandler(sig uint32, info *siginfo, ctxt unsafe.Pointer, gp *g) {
   ...

   if sig == sigPreempt && debug.asyncpreemptoff == 0 {
      // Might be a preemption signal.
      doSigPreempt(gp, c)
      // Even if this was definitely a preemption signal, it
      // may have been coalesced with another signal, so we
      // still let it through to the application.
   }

   ...
}

When the sigPreempt signal is then received, it is handled by the doSigPreempt function as follows:

func doSigPreempt(gp *g, ctxt *sigctxt) {
   // Check if this G wants to be preempted and is safe to
   // preempt.
   if wantAsyncPreempt(gp) {
      if ok, newpc := isAsyncSafePoint(gp, ctxt.sigpc(), ctxt.sigsp(), ctxt.siglr()); ok {
         // Adjust the PC and inject a call to asyncPreempt.
         ctxt.pushCall(abi.FuncPCABI0(asyncPreempt), newpc) // inject the preemption call
      }
   }

   // Acknowledge the preemption.
   atomic.Xadd(&gp.m.preemptGen, 1)
   atomic.Store(&gp.m.signalPending, 0)

   if GOOS == "darwin" || GOOS == "ios" {
      atomic.Xadd(&pendingPreemptSignals, -1)
   }
}

Finally, the chain is doSigPreempt -> asyncPreempt -> asyncPreempt2:

func asyncPreempt2() {
   gp := getg()
   gp.asyncSafePoint = true
   if gp.preemptStop {
      mcall(preemptPark)
   } else {
      mcall(gopreempt_m)
   }
   gp.asyncSafePoint = false
}

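And with that we are back at the familiar gopreempt_m function, so we won't repeat it here.

To summarize signal-based preemptive scheduling:

M1 sends the _SIGURG signal to M2;
M2 receives the signal and handles it in the signal handler;
M2 modifies the interrupted execution context and resumes at the modified location;
The goroutine re-enters the scheduling loop, which then schedules other goroutines.

4. Summary

Overall, Go's scheduling strategy evolved step by step as requirements grew. Cooperative scheduling ensures that user goroutines that make function calls can be stopped normally; preemptive scheduling additionally avoids arbitrarily long garbage-collection delays caused by infinite loops.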
