快捷導(dǎo)航

深入了解Golang的map增量擴(kuò)容

更新時(shí)間：2022年06月08日 14:12:55 作者：??談笑風(fēng)生間????

這篇文章主要介紹了深入了解Golang的map增量擴(kuò)容，擴(kuò)容的主要目的是為了縮短map容器的響應(yīng)時(shí)間。增量擴(kuò)容的本質(zhì)其實(shí)就是將總的擴(kuò)容時(shí)間分?jǐn)偟搅嗣恳淮蝖ash操作上，更多相關(guān)內(nèi)容需要的小伙伴可以參考一下

核心思想

以空間換時(shí)間，訪問(wèn)速度與填充因子有關(guān)

擴(kuò)容hash表的時(shí)候每次都增大2倍，hash表大小始終為2的整數(shù)倍，有(hash mod 2^B) == (hash & (2^B-1))，方便于簡(jiǎn)化運(yùn)算，避免取余操作

擴(kuò)容前后的 hash mod 容量大小是不等的，因此要重新計(jì)算每一項(xiàng)在hash表中的位置，擴(kuò)容后需要將old pair重新hash到新的hash表上（就是一個(gè)evacuate的過(guò)程）。這個(gè)過(guò)程不是一次性完成的，在每次insert、remove的時(shí)候會(huì)搬移1-2個(gè)pair。就是使用的是增量擴(kuò)容

每個(gè)舊桶的鍵值對(duì)都會(huì)分流到2個(gè)不同的新桶中

為什么要使用增量擴(kuò)容?

主要是縮短map容器的響應(yīng)時(shí)間。如果不用增量擴(kuò)容，當(dāng)一個(gè)map存儲(chǔ)很多元素后進(jìn)行擴(kuò)容，會(huì)阻塞很長(zhǎng)時(shí)間無(wú)法響應(yīng)請(qǐng)求。增量擴(kuò)容的本質(zhì)其實(shí)就是將總的擴(kuò)容時(shí)間分?jǐn)偟搅嗣恳淮蝖ash操作上

在搬數(shù)據(jù)的時(shí)候，并不會(huì)把舊的bucket從oldbucket中刪除，只是加上了一個(gè)已刪除的標(biāo)記

擴(kuò)容期間一部分?jǐn)?shù)據(jù)在oldbucket中，一部分在bucket中，會(huì)對(duì)hash表的insert，remove，lookup操作的處理邏輯產(chǎn)生影響，如耗時(shí)更長(zhǎng)等

只有當(dāng)oldbucket中所有bucket移動(dòng)到新表后，才會(huì)將oldbucket釋放掉

擴(kuò)容方式

如果grow的太頻繁，空間的利用率就會(huì)很低，如果很久才grow，會(huì)形成很多的overflow buckets，查找速率會(huì)下降

map的填充因子是6.5

即當(dāng)count / 2^B > 6.5時(shí)會(huì)觸發(fā)一次grow.翻倍擴(kuò)容

如果負(fù)載因子沒(méi)有超標(biāo)，但是使用的溢出桶較多，也會(huì)觸發(fā)擴(kuò)容。但是是等量擴(kuò)容

原因是原桶中有太多的鍵值對(duì)被刪除，等量擴(kuò)容可以使得剩余的鍵值對(duì)排列更加緊湊，節(jié)省空間

	// Maximum average load of a bucket that triggers growth is 6.5.
	// Represent as loadFactorNum/loadFactorDen, to allow integer math.
	loadFactorNum = 13
	loadFactorDen = 2

這個(gè)6.5來(lái)源于作者的一個(gè)測(cè)試程序，取了一個(gè)相對(duì)適中的值

// Picking loadFactor: too large and we have lots of overflow
// buckets, too small and we waste a lot of space. I wrote
// a simple program to check some stats for different loads:
// (64-bit, 8 byte keys and elems)
//  loadFactor    %overflow  bytes/entry     hitprobe    missprobe
//        4.00         2.13        20.77         3.00         4.00
//        4.50         4.05        17.30         3.25         4.50
//        5.00         6.85        14.77         3.50         5.00
//        5.50        10.55        12.94         3.75         5.50
//        6.00        15.27        11.67         4.00         6.00
//        6.50        20.90        10.79         4.25         6.50
//        7.00        27.14        10.15         4.50         7.00
//        7.50        34.03         9.73         4.75         7.50
//        8.00        41.10         9.40         5.00         8.00
//
// %overflow   = percentage of buckets which have an overflow bucket
// bytes/entry = overhead bytes used per key/elem pair
// hitprobe    = # of entries to check when looking up a present key
// missprobe   = # of entries to check when looking up an absent key
//
// Keep in mind this data is for maximally loaded tables, i.e. just
// before the table grows. Typical tables will be somewhat less loaded.

源碼分析

// 只是分配新的buckets，并將老的buckets掛到oldbuckets字段上
// 真正搬遷的動(dòng)作在growWork()中
func hashGrow(t *maptype, h *hmap) {
	// If we've hit the load factor, get bigger.
	// Otherwise, there are too many overflow buckets,
	// so keep the same number of buckets and "grow" laterally.
    // B+1 相當(dāng)于之前的2倍空間
	bigger := uint8(1)
    // 對(duì)應(yīng)條件2
	if !overLoadFactor(h.count+1, h.B) {
        // 進(jìn)行等量擴(kuò)容，B不變
		bigger = 0
		h.flags |= sameSizeGrow
	}
    // 將oldbuckets掛到buckets上
	oldbuckets := h.buckets
    // 申請(qǐng)新的buckets空間
	newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)

    // 對(duì)標(biāo)志位的處理
    // &^表示 按位置0
    // x=01010011
    // y=01010100
    // z=x&^y=00000011
    // 如果y的bit位為1，那么z相應(yīng)的bit位就為0
    // 否則z對(duì)應(yīng)的bit位就和x對(duì)應(yīng)的bit位相同
    // 
    // 其實(shí)就是將h.flags的iterator和oldItertor位置為0
    // 如果發(fā)現(xiàn)iterator位為1，那就把它轉(zhuǎn)接到 oldIterator 位
    // 使得 oldIterator 標(biāo)志位變成 1
    // bucket掛到了oldbuckets下，那么標(biāo)志位也一樣轉(zhuǎn)移過(guò)去
	flags := h.flags &^ (iterator | oldIterator)
	if h.flags&iterator != 0 {
		flags |= oldIterator
	}
    
    // // 可能有迭代器使用 buckets
    // iterator     = 1
    // 可能有迭代器使用 oldbuckets
    // oldIterator  = 2
    // 有協(xié)程正在向 map 中寫入 key
    // hashWriting  = 4
    // 等量擴(kuò)容（對(duì)應(yīng)條件 2）
    // sameSizeGrow = 8
    // 提交grow的動(dòng)作
	// commit the grow (atomic wrt gc)
	h.B += bigger
	h.flags = flags
	h.oldbuckets = oldbuckets
	h.buckets = newbuckets
    // 搬遷進(jìn)度為0
	h.nevacuate = 0
    // 溢出bucket數(shù)量為0
	h.noverflow = 0

	if h.extra != nil && h.extra.overflow != nil {
		// Promote current overflow buckets to the old generation.
		if h.extra.oldoverflow != nil {
			throw("oldoverflow is not nil")
		}
		h.extra.oldoverflow = h.extra.overflow
		h.extra.overflow = nil
	}
	if nextOverflow != nil {
		if h.extra == nil {
			h.extra = new(mapextra)
		}
		h.extra.nextOverflow = nextOverflow
	}

	// the actual copying of the hash table data is done incrementally
	// by growWork() and evacuate().
}

// growWork 真正執(zhí)行搬遷工作的函數(shù)
// 調(diào)用其的動(dòng)作在mapssign和mapdelete函數(shù)中，也就是插入、修改或刪除的時(shí)候都會(huì)嘗試進(jìn)行搬遷
func growWork(t *maptype, h *hmap, bucket uintptr) {
	// make sure we evacuate the oldbucket corresponding
	// to the bucket we're about to use
    // 確保搬遷的老bucket對(duì)應(yīng)的正在使用的新bucket
    // bucketmask 作用就是將key算出來(lái)的hash值與bucketmask相&，得到key應(yīng)該落入的bucket
    // 只有hash值低B位決策key落入那個(gè)bucket
	evacuate(t, h, bucket&h.oldbucketmask())

	// evacuate one more oldbucket to make progress on growing
    // 再搬遷一個(gè)bucket，加快搬遷進(jìn)度，這就是說(shuō)為什么可能每次操作會(huì)搬遷1-2個(gè)bucket
	if h.growing() {
		evacuate(t, h, h.nevacuate)
	}
}

// 返回?cái)U(kuò)容前的bucketmask
// 
// 所謂的bucketmask作用就是將 key 計(jì)算出來(lái)的哈希值與 bucketmask 相與
// 得到的結(jié)果就是 key 應(yīng)該落入的桶
// 比如 B = 5，那么 bucketmask 的低 5 位是 11111，其余位是 0
// hash 值與其相與的意思是，只有 hash 值的低 5 位決策 key 到底落入哪個(gè) bucket。
// oldbucketmask provides a mask that can be applied to calculate n % noldbuckets().
func (h *hmap) oldbucketmask() uintptr {
	return h.noldbuckets() - 1
}

// 檢查oldbuckets是否搬遷完
// growing reports whether h is growing. The growth may be to the same size or bigger.
func (h *hmap) growing() bool {
	return h.oldbuckets != nil
}

// evacDst is an evacuation destination.
type evacDst struct {
    // 標(biāo)識(shí)bucket移動(dòng)的目標(biāo)地址
	b *bmap          // current destination bucket
    // k-v的索引
	i int            // key/elem index into b
    // 指向k
	k unsafe.Pointer // pointer to current key storage
    // 指向v
	e unsafe.Pointer // pointer to current elem storage
}
// evacuate 核心搬遷函數(shù)
func evacuate(t *maptype, h *hmap, oldbucket uintptr) {
    // 定位老的bucket的地址
	b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
	// 結(jié)果是2^B，進(jìn)行計(jì)算 如 B = 5，結(jié)果為32
    newbit := h.noldbuckets()
    // 如果b沒(méi)被搬遷過(guò)
	if !evacuated(b) {
		// TODO: reuse overflow buckets instead of using new ones, if there
		// is no iterator using the old buckets.  (If !oldIterator.)

		// xy contains the x and y (low and high) evacuation destinations.
        // xy包含了兩個(gè)可能搬遷到的目的bucket地址
        // 默認(rèn)是等量擴(kuò)容的，用x來(lái)搬遷
		var xy [2]evacDst
		x := &xy[0]
		x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.bucketsize)))
		x.k = add(unsafe.Pointer(x.b), dataOffset)
		x.e = add(x.k, bucketCnt*uintptr(t.keysize))

        // 如果不是等量擴(kuò)容，前后的bucket序號(hào)有變
        // 使用y來(lái)搬遷
		if !h.sameSizeGrow() {
			// Only calculate y pointers if we're growing bigger.
			// Otherwise GC can see bad pointers.
			y := &xy[1]
            // y代表的bucket序號(hào)增加了2^B
			y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.bucketsize)))
			y.k = add(unsafe.Pointer(y.b), dataOffset)
			y.e = add(y.k, bucketCnt*uintptr(t.keysize))
		}

        // 遍歷所有的bucket，包括溢出bucket
        // b是老bucket的地址
		for ; b != nil; b = b.overflow(t) {
			k := add(unsafe.Pointer(b), dataOffset)
			e := add(k, bucketCnt*uintptr(t.keysize))
            // 遍歷bucket里所有的cell
			for i := 0; i < bucketCnt; i, k, e = i+1, add(k, uintptr(t.keysize)), add(e, uintptr(t.elemsize)) {
				// 當(dāng)前cell的tophash值
                top := b.tophash[i]
                // 如果cell為空，即沒(méi)有key
                // 說(shuō)明其被搬遷過(guò)，作標(biāo)記然后繼續(xù)下一個(gè)cell
				if isEmpty(top) {
					b.tophash[i] = evacuatedEmpty
					continue
				}

                // 一般不會(huì)出現(xiàn)這種情況
                // 未搬遷的cell只可能是empty或者正常的tophash
                // 不會(huì)小于minTopHash
				if top < minTopHash {
					throw("bad map state")
				}
                // 進(jìn)行一次拷貝避免相同內(nèi)存地址問(wèn)題
				k2 := k
                // key如果是指針就進(jìn)行解引用
				if t.indirectkey() {
					k2 = *((*unsafe.Pointer)(k2))
				}
                // 默認(rèn)值為0標(biāo)識(shí)默認(rèn)是使用x，進(jìn)行等量擴(kuò)容
				var useY uint8
                // 增量擴(kuò)容
				if !h.sameSizeGrow() {
					// Compute hash to make our evacuation decision (whether we need
					// to send this key/elem to bucket x or bucket y).
                    // 計(jì)算hash值，與第一次寫入一樣
					hash := t.hasher(k2, uintptr(h.hash0))

                    // 有協(xié)程在遍歷map 且 出現(xiàn)相同的key，計(jì)算出的hash值不同
                    // 這里只會(huì)有一種情況，也就是float64的時(shí)候
                    // 每次hash出來(lái)都會(huì)是不同的hash值，這就意味著無(wú)法通過(guò)get去獲取其key確切位置
                    // 因此采用取最低位位置來(lái)分辨
                    // 為下一個(gè)level重新計(jì)算一個(gè)隨機(jī)的tophash
                    // 這些key將會(huì)在多次增長(zhǎng)后均勻的分布在所有的存儲(chǔ)桶中
					if h.flags&iterator != 0 && !t.reflexivekey() && !t.key.equal(k2, k2) {
						// If key != key (NaNs), then the hash could be (and probably
						// will be) entirely different from the old hash. Moreover,
						// it isn't reproducible. Reproducibility is required in the
						// presence of iterators, as our evacuation decision must
						// match whatever decision the iterator made.
						// Fortunately, we have the freedom to send these keys either
						// way. Also, tophash is meaningless for these kinds of keys.
						// We let the low bit of tophash drive the evacuation decision.
						// We recompute a new random tophash for the next level so
						// these keys will get evenly distributed across all buckets
						// after multiple grows.
                        // 第B位 置1
                        // 如果tophash最低位是0就分配到x part 否則分配到y(tǒng) part
						useY = top & 1
						top = tophash(hash)
					} else {
                        // 對(duì)于正常的key
                        // 第B位 置0
						if hash&newbit != 0 {
                            // 使用y部分
							useY = 1
						}
					}
				}

				if evacuatedX+1 != evacuatedY || evacuatedX^1 != evacuatedY {
					throw("bad evacuatedN")
				}
                // 這里其實(shí)就是重新設(shè)置tophash值
                // 標(biāo)記老的cell的tophash值，表示搬到useT部分（可能是x也可能是y，看具體取值）
				b.tophash[i] = evacuatedX + useY // evacuatedX + 1 == evacuatedY
				// 選擇目標(biāo)bucket的內(nèi)存起始部分
                dst := &xy[useY]                 // evacuation destination
                
                // 如果i=8說(shuō)明要溢出了
				if dst.i == bucketCnt {
                    // 新建一個(gè)溢出bucket
					dst.b = h.newoverflow(t, dst.b)
                    // 從0開(kāi)始計(jì)數(shù)
					dst.i = 0
                    // 標(biāo)識(shí)key要移動(dòng)到的位置
					dst.k = add(unsafe.Pointer(dst.b), dataOffset)
                    // 標(biāo)識(shí)value要移動(dòng)到的位置
					dst.e = add(dst.k, bucketCnt*uintptr(t.keysize))
				}
                // 重新設(shè)置tophash
				dst.b.tophash[dst.i&(bucketCnt-1)] = top // mask dst.i as an optimization, to avoid a bounds check
				if t.indirectkey() {
                    // 將原key（指針類型）復(fù)制到新的位置
					*(*unsafe.Pointer)(dst.k) = k2 // copy pointer
				} else {
                    // 將原key（值類型）復(fù)制到新位置
					typedmemmove(t.key, dst.k, k) // copy elem
				}
                // 如果v是指針，操作同key
				if t.indirectelem() {
					*(*unsafe.Pointer)(dst.e) = *(*unsafe.Pointer)(e)
				} else {
					typedmemmove(t.elem, dst.e, e)
				}
                // 定位到下一個(gè)cell
				dst.i++
				// These updates might push these pointers past the end of the
				// key or elem arrays.  That's ok, as we have the overflow pointer
				// at the end of the bucket to protect against pointing past the
				// end of the bucket.
                // 兩個(gè)溢出指針在bucket末尾用于保證 遍歷到bucket末尾的指針
				dst.k = add(dst.k, uintptr(t.keysize))
				dst.e = add(dst.e, uintptr(t.elemsize))
			}
		}
        // 如果沒(méi)有協(xié)程在用老的bucket，就將老的bucket清除，幫助gc
		// Unlink the overflow buckets & clear key/elem to help GC.
		if h.flags&oldIterator == 0 && t.bucket.ptrdata != 0 {
			b := add(h.oldbuckets, oldbucket*uintptr(t.bucketsize))
			// Preserve b.tophash because the evacuation
			// state is maintained there.
			ptr := add(b, dataOffset)
			n := uintptr(t.bucketsize) - dataOffset
            // 只清除k-v部分，tophash用于標(biāo)識(shí)搬遷狀態(tài)
			memclrHasPointers(ptr, n)
		}
	}

    // 如果此次搬遷的bucket等于當(dāng)前搬遷進(jìn)度，更新搬遷進(jìn)度
	if oldbucket == h.nevacuate {
		advanceEvacuationMark(h, t, newbit)
	}
}

// 更新搬遷進(jìn)度
func advanceEvacuationMark(h *hmap, t *maptype, newbit uintptr) {
    // 進(jìn)度+1
	h.nevacuate++
    // 嘗試往后看1024個(gè)bucket，確保行為是O(1)的
	// Experiments suggest that 1024 is overkill by at least an order of magnitude.
	// Put it in there as a safeguard anyway, to ensure O(1) behavior.
	stop := h.nevacuate + 1024
	if stop > newbit {
		stop = newbit
	}
    // 尋找沒(méi)有搬遷過(guò)的bucket
	for h.nevacuate != stop && bucketEvacuated(t, h, h.nevacuate) {
		h.nevacuate++
	}

    // 現(xiàn)在h.nevacuate之前的bucket都被搬遷完畢了

    // 如果所有bucket搬遷完畢
	if h.nevacuate == newbit { // newbit == # of oldbuckets
        // 清除oldbuckets，釋放bucket array
		// Growing is all done. Free old main bucket array.
		h.oldbuckets = nil
        // 清除老的溢出bucket
        // [0]表示當(dāng)前溢出bucket
        // [1]表示老的溢出bucket
		// Can discard old overflow buckets as well.
		// If they are still referenced by an iterator,
		// then the iterator holds a pointers to the slice.
		if h.extra != nil {
			h.extra.oldoverflow = nil
		}
        // 清除正在擴(kuò)容的標(biāo)志位
		h.flags &^= sameSizeGrow
	}
}

源碼里提到 X, Y part，其實(shí)就是我們說(shuō)的如果是擴(kuò)容到原來(lái)的 2 倍，桶的數(shù)量是原來(lái)的 2 倍，前一半桶被稱為 X part，后一半桶被稱為 Y part。一個(gè) bucket 中的 key 會(huì)分裂落到 2 個(gè)桶中。一個(gè)位于 X part，一個(gè)位于 Y part。所以在搬遷一個(gè) cell 之前，需要知道這個(gè) cell 中的 key 是落到哪個(gè) Part。

其實(shí)很簡(jiǎn)單，重新計(jì)算 cell 中 key 的 hash，并向前“多看”一位，決定落入哪個(gè) Part

設(shè)置 key 在原始 buckets 的 tophash 為 evacuatedX 或是 evacuatedY，表示已經(jīng)搬遷到了新 map 的 x part 或是 y part。新 map 的 tophash 則正常取 key 哈希值的高 8 位。

對(duì)于增量擴(kuò)容來(lái)說(shuō)：某個(gè) key 在搬遷前后 bucket 序號(hào)可能和原來(lái)相等，也可能是相比原來(lái)加上 2^B（原來(lái)的 B 值），取決于 hash 值第 6 bit 位是 0 還是 1。

當(dāng)搬遷碰到 math.NaN() 的 key 時(shí)，只通過(guò) tophash 的最低位決定分配到 X part 還是 Y part（如果擴(kuò)容后是原來(lái) buckets 數(shù)量的 2 倍）。如果 tophash 的最低位是 0 ，分配到 X part；如果是 1 ，則分配到 Y part，已搬遷完的key的tophash值是一個(gè)狀態(tài)值，表示key的搬遷去向

到此這篇關(guān)于深入了解Golang的map增量擴(kuò)容的文章就介紹到這了,更多相關(guān) Go增量擴(kuò)容內(nèi)容請(qǐng)搜索腳本之家以前的文章或繼續(xù)瀏覽下面的相關(guān)文章希望大家以后多多支持腳本之家！

您可能感興趣的文章: