Using pprof, Golang's performance profiling tool

Updated: 2023-05-25 11:46:06 Author: 洛小妍

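A service that runs stably in production may still see its cpu or mem utilization climb. When that happens, we need the pprof tool to analyze its performance. This article walks through how to use pprof in practice.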

pprof

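pprof is golang's performance profiling tool. It reads sampled performance data in the profile.proto format and generates reports from it, which are then used to find performance problems in a service.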

pprof profiles

A profile is a stored collection of samples. In golang, the sampled data comes in the following types:

cpu

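cpu sampling is the most commonly used sampling type. Once cpu sampling starts, a sample is taken every 10ms, recording the stack of the goroutines that are currently running. When sampling finishes, you can see which code consumes the most cpu.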

memory

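memory sampling is mainly used to analyze how a service allocates and reclaims heap memory. Stack memory is allocated and reclaimed directly while the program runs, so there is no need to collect stack usage.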

block

Collects blocking data. It is mainly used to locate goroutines that are blocked, for example on a channel.

mutex

Mainly used to collect lock contention data and to analyze the latency that lock contention adds to the system.
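Unlike the cpu and memory profiles, the block and mutex profiles are disabled by default in the Go runtime, so they stay empty until sampling is switched on. A minimal sketch of enabling both follows; the helper name enableContentionProfiles and the rates passed to it are assumptions for illustration, and recording every event as done here is usually too expensive for production.

```go
package main

import (
	"log"
	"runtime"
)

// enableContentionProfiles (hypothetical helper) switches on the block and
// mutex profiles and returns the previous mutex profile fraction.
//   - blockRateNs: aim to sample one blocking event per blockRateNs
//     nanoseconds spent blocked; 1 records every event, 0 disables.
//   - mutexFraction: report on average 1/mutexFraction of contention
//     events; 1 records every event, 0 disables.
func enableContentionProfiles(blockRateNs, mutexFraction int) int {
	runtime.SetBlockProfileRate(blockRateNs)
	return runtime.SetMutexProfileFraction(mutexFraction)
}

func main() {
	prev := enableContentionProfiles(1, 1)
	log.Printf("block and mutex profiling enabled (previous mutex fraction: %d)", prev)
	// From here on, a service that imports net/http/pprof serves this data
	// under /debug/pprof/block and /debug/pprof/mutex.
}
```

Call this once at startup, before the code you want to observe runs.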

生成 profile 文件

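Before any performance analysis can start, the corresponding sample file has to be generated. The pprof tool then analyzes that profile file to find the code that hurts performance. In golang, a profile file can be generated in the following three ways: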

Test functions

When running benchmarks, you can pass a flag to generate the corresponding profile file.

go test -cpuprofile {profile_name} -bench .

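HTTP access

For long-running services such as http or rpc servers, import net/http/pprof to register the debug routes and download the corresponding profile files from them. A service that does not speak http needs to open a separate debug port.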
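For a service that does not otherwise speak http, one way to provide the debug port is to put the pprof handlers on their own mux. The sketch below is an assumption about how that could look, not the article's code: newDebugMux, serveDebug, statusFor and the address localhost:6060 are hypothetical names and values. To stay self-terminating, main only checks in-process that the routes answer.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newDebugMux returns a mux that serves only the pprof routes.
// Note: importing net/http/pprof also registers the same routes on
// http.DefaultServeMux, so keep the default mux off public listeners.
func newDebugMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index) // also serves heap, goroutine, block, mutex
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// serveDebug blocks while serving the debug routes on addr.
func serveDebug(addr string) error {
	return http.ListenAndServe(addr, newDebugMux())
}

// statusFor sends one in-process GET request to h and returns the status code.
func statusFor(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	// A real service would run `go serveDebug("localhost:6060")` next to its
	// rpc server. Here we only confirm that the routes are wired up.
	fmt.Println(statusFor(newDebugMux(), "/debug/pprof/goroutine?debug=1"))
}
```

Binding the debug listener to localhost keeps the profiling endpoints off the public interface.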

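Generating a profile file directly

For tasks that do not run continuously, such as scheduled jobs, the profile file can be generated by calling the profiling functions from the code itself.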

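package main

import (
   "fmt"

   "github.com/pkg/profile" // a wrapper around runtime/pprof
)

func main() {
   // Start profiling now; Stop runs when main returns and writes the
   // profile into the current directory.
   defer profile.Start(profile.CPUProfile, profile.ProfilePath(".")).Stop()

   func() {
      fmt.Println(5 + 6)
   }()
}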
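Since github.com/pkg/profile wraps runtime/pprof, the same result can be had from the standard library alone. The following is a minimal sketch under that assumption; profileCPU, the output file name cpu.pprof and the busy loop are hypothetical and only stand in for a real one-off job.

```go
package main

import (
	"log"
	"os"
	"runtime/pprof"
)

// profileCPU (hypothetical helper) runs work while a cpu profile is being
// written to path.
func profileCPU(path string, work func()) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close() // runs last, after the profile has been flushed

	if err := pprof.StartCPUProfile(f); err != nil {
		return err
	}
	defer pprof.StopCPUProfile() // runs first: stops sampling and flushes to f

	work()
	return nil
}

func main() {
	err := profileCPU("cpu.pprof", func() {
		// Placeholder workload so the profile has something to sample.
		sum := 0
		for i := 0; i < 200_000_000; i++ {
			sum += i
		}
		log.Println("sum:", sum)
	})
	if err != nil {
		log.Fatal(err)
	}
}
```

The resulting file can be opened with go tool pprof in the same way as a profile downloaded over http.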

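Analyzing profile files

The concrete examples below show how to analyze metrics such as cpu and mem.

cpu analysis

The following example tries to find out which function uses the most cpu.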

package main
import (
   "log"
   "net/http"
   _ "net/http/pprof"
   "runtime"
   "time"
)
//go:noinline
func req(num int64) {
   A(num)
   B(num)
}
//go:noinline
func B(num int64) {
   for i := int64(0); i < num/2; i++ {
   }
}
//go:noinline
func A(num int64) {
   for i := int64(0); i < num; i++ {
   }
}
func main() {
   runtime.GOMAXPROCS(1)
   go func() {
      log.Println(http.ListenAndServe("localhost:3016", nil))
   }()
   for {
      time.Sleep(time.Second / 4)
      go func() {
         req(time.Now().Unix())
      }()
   }
}

Download and save the cpu profile, sampling for 19s:

curl "http://127.0.0.1:3016/debug/pprof/profile?seconds=19" --output cpu.pprof

Start the web UI:

go tool pprof -http=:8080 cpu.pprof

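On the far right you can see that this collection took 19.21s in total and that the sampled cpu time was 16.73s (87.10%). Samples only cover the time the program was actually running on a cpu, which is why total samples is less than Duration here. Use the VIEW menu to switch between the different views.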

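Top shows the share of time spent in each function. Flat: time spent in the function itself. Flat%: Flat as a share of the total. Sum%: the running total of Flat% for this row and the rows above it. Cum: time spent in the function and in the functions it calls. Cum%: Cum as a share of the total.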

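In this example the sampled cpu time is 16.73s. Function A itself takes 13.3s (79.49%), and A together with the functions it calls takes 13.85s (82.78%). Function req itself takes 0, while req together with the functions it calls takes 16.73s (100%).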

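Graph shows the share of time per function as a call graph. In this example the entry point is main.func2 (an anonymous function), labeled 0 of 16730ms: the anonymous function itself uses no cpu, and the total sampled cpu time is 16730ms. main.req is called by main.func2 and likewise uses no cpu itself. main.A (13850ms, of which 13300ms is in the function itself) and main.B (2880ms, of which 2810ms is in the function itself) are called by main.req (16730ms).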

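Graph makes it easy to see which functions use the most resources: the larger the box, the larger the share. It also shows the call relationships at a glance.

Flame Graph shows how long each function takes, along with the functions it calls. The wider a bar is along the horizontal axis, the more resources it uses. The taller the stack is along the vertical axis, the deeper the call hierarchy.

Source shows the corresponding source code. In this example, A and B, the functions called by req, together account for 100% of the cpu time. Function B itself takes 2.81s, of which the for loop statement takes 2.18s.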

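mem analysis

Below is a piece of code that allocates memory. Unlike cpu sampling, heap sampling is not driven by a fixed time interval. The heap profile is concerned with how memory is allocated and reclaimed, so it is driven by allocation events instead (by default roughly one sample per 512 KiB allocated, controlled by runtime.MemProfileRate).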

package main
import (
   "fmt"
   "log"
   "net/http"
   _ "net/http/pprof"
   "runtime"
   "time"
)
//go:noinline
func req() {
   buf = append(buf, allocate()...)
}
//go:noinline
func allocate() []byte {
   var b []byte
   for i := 0; i < 1024; i++ {
      temp := make([]byte, 1024)
      b = append(b, temp...)
   }
   return b
}
var buf []byte
func main() {
   runtime.GOMAXPROCS(1)
   go func() {
      log.Println(http.ListenAndServe("localhost:3016", nil))
   }()
   for {
      time.Sleep(time.Second)
      go func() {
         req()
         fmt.Println(len(buf))
      }()
   }
}

Download the heap profile.

curl "http://127.0.0.1:3016/debug/pprof/heap" --output mem.pprof

Open the file in the web UI.

go tool pprof -http=:8080 mem.pprof

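There are four sample types. alloc_objects: all objects allocated so far. alloc_space: all space allocated so far. inuse_objects: live objects. inuse_space: live space. Note that inuse_space is smaller than the memory the process holds from the operating system, because the latter also includes memory the gc has freed but not yet returned to the operating system and memory allocated through cgo.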
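The gap between live heap memory and what the process holds from the operating system can be observed with runtime.ReadMemStats. The sketch below is illustrative only: heapSummary and summarize are hypothetical names, HeapAlloc is used as a rough counterpart of inuse_space rather than an exact match, and memory allocated through cgo does not show up in these counters at all.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapSummary (hypothetical type) holds three counters from runtime.MemStats.
type heapSummary struct {
	HeapAlloc    uint64 // bytes of allocated heap objects
	IdleRetained uint64 // idle heap memory not yet returned to the OS
	Sys          uint64 // total bytes the Go runtime obtained from the OS
}

// summarize reads the current runtime memory statistics.
func summarize() heapSummary {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return heapSummary{
		HeapAlloc:    m.HeapAlloc,
		IdleRetained: m.HeapIdle - m.HeapReleased,
		Sys:          m.Sys,
	}
}

func main() {
	buf := make([]byte, 64<<20) // allocate 64 MiB so the numbers move
	before := summarize()
	fmt.Printf("while in use: alloc=%d retained=%d sys=%d\n",
		before.HeapAlloc, before.IdleRetained, before.Sys)
	runtime.KeepAlive(buf) // buf is live up to this point

	runtime.GC() // buf is unreachable now, so the gc can free it
	after := summarize()
	fmt.Printf("after gc:     alloc=%d retained=%d sys=%d\n",
		after.HeapAlloc, after.IdleRetained, after.Sys)
}
```

After the collection, HeapAlloc typically drops while Sys stays roughly where it was, which is the difference the paragraph above describes.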

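The alloc types show where space is being allocated frequently, and the inuse types help find memory that is never released. Switch to inuse_space and you can see that req and the function it calls, allocate, keep allocating space.

goroutine analysis

Below is an example of a goroutine leak.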

package main
import (
   "fmt"
   "log"
   "net/http"
   _ "net/http/pprof"
   "runtime"
   "time"
)
//go:noinline
func req() {
   c := make(chan struct{})
   tick := time.NewTicker(time.Millisecond)
   go func() {
      time.Sleep(time.Millisecond * 2)
      c <- struct{}{}
   }()
   select {
   case <-c:
      return
   case <-tick.C:
      return
   }
}
func main() {
   runtime.GOMAXPROCS(1)
   go func() {
      log.Println(http.ListenAndServe("localhost:3016", nil))
   }()
   for {
      time.Sleep(time.Second / 10)
      go func() {
         req()
      }()
      fmt.Println(runtime.NumGoroutine())
   }
}

Download the goroutine profile.

curl "http://127.0.0.1:3016/debug/pprof/goroutine" --output goroutine.pprof

Open the file in the web UI.

go tool pprof -http=:8080 goroutine.pprof

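Switch to the flame graph and you can see that main.req.func1 (an anonymous function) accounts for most of the goroutines. Switching to Source pinpoints the line of code: the goroutine is stuck writing to an unbuffered channel. Because the receiver has already returned, the goroutine waits forever to write and can never be released. Declaring a buffered channel is enough to fix the problem.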
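The fix described above can be sketched as follows. This is the article's req function with a buffer of one added to the channel. The tick.Stop call and the leakedAfter helper are additions for illustration and are not part of the original example.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// req is the leaking function from the example, fixed with a buffered channel.
func req() {
	c := make(chan struct{}, 1) // capacity 1: the send below can always complete
	tick := time.NewTicker(time.Millisecond)
	defer tick.Stop() // optional tidy-up, not in the original example

	go func() {
		time.Sleep(2 * time.Millisecond)
		c <- struct{}{} // no longer blocks once req has returned
	}()

	select {
	case <-c:
	case <-tick.C:
	}
}

// leakedAfter (hypothetical helper) calls req n times, waits for the sender
// goroutines to finish, and returns how many extra goroutines remain.
func leakedAfter(n int) int {
	before := runtime.NumGoroutine()
	for i := 0; i < n; i++ {
		req()
	}
	time.Sleep(100 * time.Millisecond)
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println("goroutines left behind:", leakedAfter(50))
}
```

With the unbuffered channel from the original example, the same measurement would grow by one goroutine per call, because every sender stays parked on the send.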