Go standard library net/http vs. fasthttp: a scenario-based comparison of server-side performance
1. Background
When newcomers start learning Go, right after writing the classic "hello, world" program they may be eager to try out Go's powerful standard library — for example, by writing a fully functional web server like the one below in just a few lines of code:
// from https://tip.golang.org/pkg/net/http/#example_ListenAndServe
package main

import (
	"io"
	"log"
	"net/http"
)

func main() {
	helloHandler := func(w http.ResponseWriter, req *http.Request) {
		io.WriteString(w, "Hello, world!\n")
	}

	http.HandleFunc("/hello", helloHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
Go's net/http package is a fairly balanced, general-purpose implementation that meets the needs of well over 90% of the scenarios most gophers face, and it has the following advantages:
- it is a standard library package, so no third-party dependencies are required;
- it conforms well to the HTTP specification;
- it delivers relatively high performance out of the box, without any tuning;
- it supports HTTP proxies;
- it supports HTTPS;
- it supports HTTP/2 seamlessly.
However, precisely because net/http is such a "balanced" general-purpose implementation, it may not be fast enough in domains with strict performance requirements, and it offers little room for tuning. That is when we turn our attention to third-party HTTP server frameworks.
Among those third-party frameworks, fasthttp — a framework that lives up to its name — is the one most often mentioned and adopted. Its official site claims roughly ten times the performance of net/http (based on go test benchmark results).
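For readers unfamiliar with what "go test benchmark" refers to: it is Go's built-in microbenchmarking facility. Purely as an illustration of that facility (this is not the benchmark code behind fasthttp's 10x claim, which lives in the fasthttp repository; the names helloHandler and BenchmarkHelloHandler are made up here), a handler-level microbenchmark for the net/http side could look like this, saved as a _test.go file and run with go test -bench=. -benchmem:

// Illustrative only: a go test microbenchmark of an http.HandlerFunc using
// httptest. It is not fasthttp's official benchmark suite.
package hello

import (
	"io"
	"net/http"
	"net/http/httptest"
	"testing"
)

func helloHandler(w http.ResponseWriter, req *http.Request) {
	io.WriteString(w, "Hello, world!\n")
}

func BenchmarkHelloHandler(b *testing.B) {
	req := httptest.NewRequest(http.MethodGet, "/hello", nil)
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		rec := httptest.NewRecorder()
		helloHandler(rec, req)
	}
}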
fasthttp applies many performance-optimization best practices, especially around reusing memory objects: it makes heavy use of sync.Pool to reduce the pressure on Go's GC.
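To make that reuse pattern concrete, here is a minimal sketch (illustrative only, not fasthttp's actual code; bufPool and handle are made-up names): a handler borrows a buffer from a sync.Pool instead of allocating a fresh one per request, and returns it when done.

// A simplified illustration of the sync.Pool reuse pattern fasthttp relies on.
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers instead of allocating a new one per request.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

func handle(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset()      // must reset before reuse
		bufPool.Put(buf) // return the buffer to the pool
	}()
	fmt.Fprintf(buf, "Hello, %s!", name)
	return buf.String()
}

func main() {
	fmt.Println(handle("Go"))
}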
那么在真實(shí)環(huán)境中,到底fasthttp能比net/http快多少呢?恰好手里有兩臺(tái)性能還不錯(cuò)的服務(wù)器可用,在本文中我們就在這個(gè)真實(shí)環(huán)境下看看他們的實(shí)際性能。
2. Performance testing
We implement two almost "zero business logic" test targets, one with net/http and one with fasthttp:
- nethttp:
// github.com/bigwhite/experiments/blob/master/http-benchmark/nethttp/main.go
package main

import (
	_ "expvar"
	"log"
	"net/http"
	_ "net/http/pprof"
	"runtime"
	"time"
)

func main() {
	go func() {
		for {
			log.Println("current number of goroutines:", runtime.NumGoroutine())
			time.Sleep(time.Second)
		}
	}()

	http.Handle("/", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("Hello, Go!"))
	}))

	log.Fatal(http.ListenAndServe(":8080", nil))
}
- fasthttp:
// github.com/bigwhite/experiments/blob/master/http-benchmark/fasthttp/main.go
package main

import (
	_ "expvar"
	"fmt"
	"log"
	"net/http"
	_ "net/http/pprof"
	"runtime"
	"time"

	"github.com/valyala/fasthttp"
)

type HelloGoHandler struct {
}

func fastHTTPHandler(ctx *fasthttp.RequestCtx) {
	fmt.Fprintln(ctx, "Hello, Go!")
}

func main() {
	go func() {
		http.ListenAndServe(":6060", nil)
	}()

	go func() {
		for {
			log.Println("current number of goroutines:", runtime.NumGoroutine())
			time.Sleep(time.Second)
		}
	}()

	s := &fasthttp.Server{
		Handler: fastHTTPHandler,
	}

	s.ListenAndServe(":8081")
}
The client that applies load to the targets is built on the hey HTTP load-testing tool. To make it easy to adjust the load level, we wrap hey in the shell script below (it only runs on Linux):
# github.com/bigwhite/experiments/blob/master/http-benchmark/client/http_client_load.sh
# ./http_client_load.sh 3 10000 10 GET http://10.10.195.181:8080

echo "$0 task_num count_per_hey conn_per_hey method url"
task_num=$1
count_per_hey=$2
conn_per_hey=$3
method=$4
url=$5

start=$(date +%s%N)
for((i=1; i<=$task_num; i++)); do
{
    tm=$(date +%T.%N)
    echo "$tm: task $i start"
    hey -n $count_per_hey -c $conn_per_hey -m $method $url > hey_$i.log
    tm=$(date +%T.%N)
    echo "$tm: task $i done"
} &
done
wait
end=$(date +%s%N)

count=$(( $task_num * $count_per_hey ))
runtime_ns=$(( $end - $start ))
runtime=`echo "scale=2; $runtime_ns / 1000000000" | bc`
echo "runtime: "$runtime
speed=`echo "scale=2; $count / $runtime" | bc`
echo "speed: "$speed
An example run of the script looks like this:
bash http_client_load.sh 8 1000000 200 GET http://10.10.195.134:8080
http_client_load.sh task_num count_per_hey conn_per_hey method url
16:58:09.146948690: task 1 start
16:58:09.147235080: task 2 start
16:58:09.147290430: task 3 start
16:58:09.147740230: task 4 start
16:58:09.147896010: task 5 start
16:58:09.148314900: task 6 start
16:58:09.148446030: task 7 start
16:58:09.148930840: task 8 start
16:58:45.001080740: task 3 done
16:58:45.241903500: task 8 done
16:58:45.261501940: task 1 done
16:58:50.032383770: task 4 done
16:58:50.985076450: task 7 done
16:58:51.269099430: task 5 done
16:58:52.008164010: task 6 done
16:58:52.166402430: task 2 done
runtime: 43.02
speed: 185960.01
As the arguments show, the script starts 8 tasks in parallel (each task runs one hey); each task opens 200 concurrent connections to http://10.10.195.134:8080 and sends 1,000,000 HTTP GET requests in total.
We use two servers, one hosting the test targets and one hosting the load-generation script:
- server hosting the test targets: 10.10.195.181 (bare metal, Intel x86-64 CPU, 40 cores, 128GB RAM, CentOS 7.6)
$ cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                40
On-line CPU(s) list:   0-39
Thread(s) per core:    2
Core(s) per socket:    10
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
Stepping:              4
CPU MHz:               800.000
CPU max MHz:           2201.0000
CPU min MHz:           800.0000
BogoMIPS:              4400.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              14080K
NUMA node0 CPU(s):     0-9,20-29
NUMA node1 CPU(s):     10-19,30-39
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_pt ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke spec_ctrl intel_stibp flush_l1d
- server hosting the load tool: 10.10.195.133 (bare metal, Kunpeng arm64 CPU, 96 cores, 80GB RAM, CentOS 7.9)
# cat /etc/redhat-release
CentOS Linux release 7.9.2009 (AltArch)

# lscpu
Architecture:          aarch64
Byte Order:            Little Endian
CPU(s):                96
On-line CPU(s) list:   0-95
Thread(s) per core:    1
Core(s) per socket:    48
Socket(s):             2
NUMA node(s):          4
Model:                 0
CPU max MHz:           2600.0000
CPU min MHz:           200.0000
BogoMIPS:              200.00
L1d cache:             64K
L1i cache:             64K
L2 cache:              512K
L3 cache:              49152K
NUMA node0 CPU(s):     0-23
NUMA node1 CPU(s):     24-47
NUMA node2 CPU(s):     48-71
NUMA node3 CPU(s):     72-95
Flags:                 fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
On the host running the test targets, I monitored resource usage (CPU load in particular) with dstat (dstat -tcdngym), watched memstats with expvarmon (memory usage stayed low since there is no business logic), and used go tool pprof to rank where the target programs spent their resources.
The table below was compiled from several test runs:
Figure: test data
3. A brief analysis of the results
Because of the specific scenario, the precision of the test tool and script, and the load-testing environment, the results above have their limitations, but they do reflect the real performance trend of the two targets. Under the same load, fasthttp does not deliver ten times the performance of net/http; in this particular scenario it does not even reach twice the performance: in the test cases where CPU usage on the target host approached 70%, fasthttp was only about 30% to 70% faster than net/http.
So why does fasthttp fall short of expectations? To answer that, we have to look at how net/http and fasthttp are each implemented. Let's start with a schematic of how net/http works:
Figure: schematic of how net/http works
As a server, the http package works in a very simple way: after it accepts a connection (conn), it hands the conn off to a worker goroutine, and that goroutine stays around until the conn's life cycle ends, i.e. until the connection is closed.
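Stripped of net/http's actual types and error handling, that "one goroutine per connection" model looks roughly like the sketch below (illustrative only; the port and the toy response are made up, and a real server does full HTTP parsing instead of scanning for a blank line):

// A stripped-down sketch of the "one goroutine per connection" model that
// net/http's server follows; this is not net/http's actual code.
package main

import (
	"bufio"
	"log"
	"net"
)

func main() {
	ln, err := net.Listen("tcp", ":8080")
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			continue
		}
		// One goroutine per connection; it lives until the connection closes.
		go func(c net.Conn) {
			defer c.Close()
			r := bufio.NewReader(c)
			for {
				// Read up to the blank line that ends the request headers
				// (a real server parses the full HTTP request here).
				line, err := r.ReadString('\n')
				if err != nil {
					return
				}
				if line != "\r\n" && line != "\n" {
					continue
				}
				c.Write([]byte("HTTP/1.1 200 OK\r\nContent-Length: 10\r\n\r\nHello, Go!"))
			}
		}(conn)
	}
}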
Below is a schematic of how fasthttp works:
Figure: schematic of how fasthttp works
fasthttp, by contrast, is built around a mechanism whose goal is to reuse goroutines as much as possible rather than create a new one each time. After fasthttp's Server accepts a conn, it tries to take a channel out of the ready slice of its workerpool; each such channel corresponds one-to-one to a worker goroutine. Once a channel has been taken, the accepted conn is written into it, and the worker goroutine at the other end handles all reads and writes on that conn. When it is finished with the conn, the worker goroutine does not exit; instead it puts its channel back into the workerpool's ready slice, waiting to be taken out again.
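The sketch below captures the shape of that mechanism. It is a simplification for illustration only, not fasthttp's real workerPool code: the pool, getCh, worker and Serve names are made up here, and idle-worker cleanup as well as the concurrency cap are omitted.

// A simplified channel-per-worker goroutine pool in the spirit of fasthttp's
// workerPool.
package main

import (
	"log"
	"net"
	"sync"
)

type pool struct {
	mu    sync.Mutex
	ready []chan net.Conn // each channel is owned by exactly one worker goroutine
	serve func(net.Conn)  // what a worker does with a connection
}

// getCh takes a worker's channel from the ready slice, starting a new worker
// only when no idle worker is available.
func (p *pool) getCh() chan net.Conn {
	p.mu.Lock()
	defer p.mu.Unlock()
	if n := len(p.ready); n > 0 {
		ch := p.ready[n-1]
		p.ready = p.ready[:n-1]
		return ch
	}
	ch := make(chan net.Conn, 1)
	go p.worker(ch) // a new worker; it will be reused for later connections
	return ch
}

func (p *pool) worker(ch chan net.Conn) {
	for c := range ch {
		p.serve(c) // handle everything on this conn until it is done
		c.Close()
		p.mu.Lock()
		p.ready = append(p.ready, ch) // put our channel back into the ready slice
		p.mu.Unlock()
	}
}

// Serve accepts connections and hands each one to a (possibly reused) worker.
func (p *pool) Serve(ln net.Listener) error {
	for {
		c, err := ln.Accept()
		if err != nil {
			return err
		}
		p.getCh() <- c
	}
}

func main() {
	ln, err := net.Listen("tcp", ":8082")
	if err != nil {
		log.Fatal(err)
	}
	p := &pool{serve: func(c net.Conn) {
		c.Write([]byte("HTTP/1.1 200 OK\r\nContent-Length: 10\r\n\r\nHello, Go!"))
	}}
	log.Fatal(p.Serve(ln))
}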
fasthttp's goroutine-reuse strategy is well intentioned, but in this test scenario its effect is limited, as the results show: under the same client concurrency and load, net/http uses roughly as many goroutines as fasthttp. This is a consequence of the test model: in our test, the hey in each task opens a fixed number of keep-alive (long-lived) connections to the target and then issues "saturating" requests on each of them. As a result, once a goroutine in fasthttp's workerpool picks up a conn, it can only go back into the pool after communication on that conn has finished, and the conn is not closed until the test ends. A scenario like this effectively makes fasthttp "degenerate" into net/http's model, and it inherits net/http's "flaw" as well: once the number of goroutines grows large, the overhead of the go runtime's own scheduling becomes non-negligible and can even exceed the share of resources spent on the business logic itself. Below are fasthttp's CPU profiles with 200, 8000 and 16000 long-lived connections:
200 long-lived connections:

(pprof) top -cum
Showing nodes accounting for 88.17s, 55.35% of 159.30s total
Dropped 150 nodes (cum <= 0.80s)
Showing top 10 nodes out of 60
      flat  flat%   sum%        cum   cum%
     0.46s  0.29%  0.29%    101.46s 63.69%  github.com/valyala/fasthttp.(*Server).serveConn
         0     0%  0.29%    101.46s 63.69%  github.com/valyala/fasthttp.(*workerPool).getCh.func1
         0     0%  0.29%    101.46s 63.69%  github.com/valyala/fasthttp.(*workerPool).workerFunc
     0.04s 0.025%  0.31%     89.46s 56.16%  internal/poll.ignoringEINTRIO (inline)
    87.38s 54.85% 55.17%     89.27s 56.04%  syscall.Syscall
     0.12s 0.075% 55.24%     60.39s 37.91%  bufio.(*Writer).Flush
         0     0% 55.24%     60.22s 37.80%  net.(*conn).Write
     0.08s  0.05% 55.29%     60.21s 37.80%  net.(*netFD).Write
     0.09s 0.056% 55.35%     60.12s 37.74%  internal/poll.(*FD).Write
         0     0% 55.35%     59.86s 37.58%  syscall.Write (inline)
(pprof)

8000 long-lived connections:

(pprof) top -cum
Showing nodes accounting for 108.51s, 54.46% of 199.23s total
Dropped 204 nodes (cum <= 1s)
Showing top 10 nodes out of 66
      flat  flat%   sum%        cum   cum%
         0     0%     0%    119.11s 59.79%  github.com/valyala/fasthttp.(*workerPool).getCh.func1
         0     0%     0%    119.11s 59.79%  github.com/valyala/fasthttp.(*workerPool).workerFunc
     0.69s  0.35%  0.35%    119.05s 59.76%  github.com/valyala/fasthttp.(*Server).serveConn
     0.04s  0.02%  0.37%    104.22s 52.31%  internal/poll.ignoringEINTRIO (inline)
   101.58s 50.99% 51.35%    103.95s 52.18%  syscall.Syscall
     0.10s  0.05% 51.40%     79.95s 40.13%  runtime.mcall
     0.06s  0.03% 51.43%     79.85s 40.08%  runtime.park_m
     0.23s  0.12% 51.55%     79.30s 39.80%  runtime.schedule
     5.67s  2.85% 54.39%     77.47s 38.88%  runtime.findrunnable
     0.14s  0.07% 54.46%     68.96s 34.61%  bufio.(*Writer).Flush

16000 long-lived connections:

(pprof) top -cum
Showing nodes accounting for 239.60s, 87.07% of 275.17s total
Dropped 190 nodes (cum <= 1.38s)
Showing top 10 nodes out of 46
      flat  flat%   sum%        cum   cum%
     0.04s 0.015% 0.015%    153.38s 55.74%  runtime.mcall
     0.01s 0.0036% 0.018%   153.34s 55.73%  runtime.park_m
     0.12s 0.044% 0.062%       153s 55.60%  runtime.schedule
     0.66s  0.24%   0.3%    152.66s 55.48%  runtime.findrunnable
     0.15s 0.055%  0.36%    127.53s 46.35%  runtime.netpoll
   127.04s 46.17% 46.52%    127.04s 46.17%  runtime.epollwait
         0     0% 46.52%       121s 43.97%  github.com/valyala/fasthttp.(*workerPool).getCh.func1
         0     0% 46.52%       121s 43.97%  github.com/valyala/fasthttp.(*workerPool).workerFunc
     0.41s  0.15% 46.67%    120.18s 43.67%  github.com/valyala/fasthttp.(*Server).serveConn
   111.17s 40.40% 87.07%    111.99s 40.70%  syscall.Syscall
(pprof)
Comparing these profiles, we can see that as the number of long-lived connections grows (i.e. as the number of goroutines in the workerpool grows), the share taken by go runtime scheduling rises steadily; at 16000 connections the runtime scheduling functions already occupy the top four spots.
4. Possible optimizations
The results above show that fasthttp's model is not well suited to scenarios where connections, once established, carry a continuous stream of "saturating" requests; it is better suited to short-lived connections, or to long-lived connections without sustained saturating load. Only in the latter kinds of scenarios can its goroutine-reuse model really pay off.
Yet even after "degenerating" into the net/http model, fasthttp still performs slightly better than net/http. Why? The gains come mainly from fasthttp's optimization tricks at the memory-allocation level, such as its heavy use of sync.Pool and its avoidance of conversions between []byte and string.
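As an illustration of the second trick, here is a hedged sketch of the kind of zero-copy []byte/string conversion such frameworks rely on. This is not fasthttp's code: the b2s and s2b names are made up, the version below requires Go 1.20+ (unsafe.String/unsafe.SliceData/unsafe.StringData), and it is only safe when the underlying bytes are never modified afterwards.

// Zero-copy conversions between []byte and string (read-only use only).
package main

import (
	"fmt"
	"unsafe"
)

// b2s converts a []byte to a string without copying the underlying bytes.
func b2s(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// s2b converts a string to a []byte without copying; the result must be
// treated as read-only.
func s2b(s string) []byte {
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	b := []byte("Hello, Go!")
	fmt.Println(b2s(b))
	fmt.Println(len(s2b("Hello, Go!")))
}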
那么,在持續(xù)“飽和”請(qǐng)求的場(chǎng)景下,如何讓fasthttp workerpool中g(shù)oroutine的數(shù)量不會(huì)因conn的增多而線性增長(zhǎng)呢?fasthttp官方?jīng)]有給出答案,但一條可以考慮的路徑是使用os的多路復(fù)用(linux上的實(shí)現(xiàn)為epoll),即go runtime netpoll使用的那套機(jī)制。在多路復(fù)用的機(jī)制下,這樣可以讓每個(gè)workerpool中的goroutine處理同時(shí)處理多個(gè)連接,這樣我們可以根據(jù)業(yè)務(wù)規(guī)模選擇workerpool池的大小,而不是像目前這樣幾乎是任意增長(zhǎng)goroutine的數(shù)量。當(dāng)然,在用戶層面引入epoll也可能會(huì)帶來系統(tǒng)調(diào)用占比的增多以及響應(yīng)延遲增大等問題。至于該路徑是否可行,還是要看具體實(shí)現(xiàn)和測(cè)試結(jié)果。
Note: the Concurrency field of fasthttp.Server can be used to limit the number of goroutines in the workerpool that handle requests concurrently, but because each goroutine handles only one connection, setting Concurrency too low may cause fasthttp to refuse subsequent connections. That is why fasthttp's default Concurrency is:
const DefaultConcurrency = 256 * 1024
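For completeness, here is a minimal sketch of setting that field explicitly; the value below is purely illustrative, and as noted above, too small a cap will cause connections beyond it to be rejected.

// Explicitly capping fasthttp's worker concurrency (illustrative value).
package main

import (
	"fmt"
	"log"

	"github.com/valyala/fasthttp"
)

func main() {
	s := &fasthttp.Server{
		Handler: func(ctx *fasthttp.RequestCtx) {
			fmt.Fprint(ctx, "Hello, Go!")
		},
		// Illustrative only; the default is fasthttp.DefaultConcurrency (256 * 1024).
		Concurrency: 16 * 1024,
	}
	log.Fatal(s.ListenAndServe(":8081"))
}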
This concludes our look at the server-side performance of the Go standard library's net/http versus fasthttp.