快捷導(dǎo)航

源碼剖析Golang中map擴(kuò)容底層的實(shí)現(xiàn)

更新時(shí)間：2023年03月06日 14:06:49 作者：勁仔Go

之前的文章詳細(xì)介紹過(guò)Go切片和map的基本使用，以及切片的擴(kuò)容機(jī)制。本文針對(duì)map的擴(kuò)容，會(huì)從源碼的角度全面的剖析一下map擴(kuò)容的底層實(shí)現(xiàn)，需要的可以參考一下

前言

之前的文章詳細(xì)介紹過(guò)Go切片和map的基本使用，以及切片的擴(kuò)容機(jī)制。本文針對(duì)map的擴(kuò)容，會(huì)從源碼的角度全面的剖析一下map擴(kuò)容的底層實(shí)現(xiàn)。

map底層結(jié)構(gòu)

主要包含兩個(gè)核心結(jié)構(gòu)體hmap和bmap

數(shù)據(jù)會(huì)先存儲(chǔ)在正常桶hmap.buckets指向的bmap數(shù)組中，一個(gè)bmap只能存儲(chǔ)8組鍵值對(duì)數(shù)據(jù)，超過(guò)則會(huì)將數(shù)據(jù)存儲(chǔ)到溢出桶hmap.extra.overflow指向的bmap數(shù)組中

那么，當(dāng)溢出桶也存儲(chǔ)不下了，會(huì)怎么辦呢，數(shù)據(jù)得存儲(chǔ)到哪去呢？答案，肯定是擴(kuò)容，那么擴(kuò)容怎么實(shí)現(xiàn)的呢？接著往下看

擴(kuò)容時(shí)機(jī)

在向 map 插入新 key 的時(shí)候，會(huì)進(jìn)行條件檢測(cè)，符合下面這 2 個(gè)條件，就會(huì)觸發(fā)擴(kuò)容

// If we hit the max load factor or we have too many overflow buckets,
// and we're not already in the middle of growing, start growing.
if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
  hashGrow(t, h)
  goto again // Growing the table invalidates everything, so try again
}

// growing reports whether h is growing. The growth may be to the same size or bigger.
func (h *hmap) growing() bool {
  return h.oldbuckets != nil
}

條件1：超過(guò)負(fù)載

map元素個(gè)數(shù) > 6.5 * 桶個(gè)數(shù)

// overLoadFactor reports whether count items placed in 1<<B buckets is over loadFactor.
func overLoadFactor(count int, B uint8) bool {
  return count > bucketCnt && uintptr(count) > loadFactorNum*(bucketShift(B)/loadFactorDen)
}

其中

bucketCnt = 8，一個(gè)桶可以裝的最大元素個(gè)數(shù)
loadFactor = 6.5，負(fù)載因子，平均每個(gè)桶的元素個(gè)數(shù)
bucketShift(B): 桶的個(gè)數(shù)

條件2：溢出桶太多

當(dāng)桶總數(shù) < 2 ^ 15 時(shí)，如果溢出桶總數(shù) >= 桶總數(shù)，則認(rèn)為溢出桶過(guò)多。

當(dāng)桶總數(shù) >= 2 ^ 15 時(shí)，直接與 2 ^ 15 比較，當(dāng)溢出桶總數(shù) >= 2 ^ 15 時(shí)，即認(rèn)為溢出桶太多了。

// tooManyOverflowBuckets reports whether noverflow buckets is too many for a map with 1<<B buckets.
// Note that most of these overflow buckets must be in sparse use;
// if use was dense, then we'd have already triggered regular map growth.
func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
  // If the threshold is too low, we do extraneous work.
  // If the threshold is too high, maps that grow and shrink can hold on to lots of unused memory.
  // "too many" means (approximately) as many overflow buckets as regular buckets.
  // See incrnoverflow for more details.
  if B > 15 {
    B = 15
  }
  // The compiler doesn't see here that B < 16; mask B to generate shorter shift code.
  return noverflow >= uint16(1)<<(B&15)
}

對(duì)于條件2，其實(shí)算是對(duì)條件1的補(bǔ)充。因?yàn)樵谪?fù)載因子比較小的情況下，有可能 map 的查找和插入效率也很低，而第 1 點(diǎn)識(shí)別不出來(lái)這種情況。

表面現(xiàn)象就是負(fù)載因子比較小，即 map 里元素總數(shù)少，但是桶數(shù)量多（真實(shí)分配的桶數(shù)量多，包括大量的溢出桶）。比如不斷的增刪，這樣會(huì)造成overflow的bucket數(shù)量增多，但負(fù)載因子又不高，達(dá)不到第 1 點(diǎn)的臨界值，就不能觸發(fā)擴(kuò)容來(lái)緩解這種情況。這樣會(huì)造成桶的使用率不高，值存儲(chǔ)得比較稀疏，查找插入效率會(huì)變得非常低，因此有了第 2 擴(kuò)容條件。

擴(kuò)容方式

雙倍擴(kuò)容

針對(duì)條件1，新建一個(gè)buckets數(shù)組，新的buckets大小是原來(lái)的2倍，然后舊buckets數(shù)據(jù)搬遷到新的buckets，該方法我們稱(chēng)之為雙倍擴(kuò)容

等量擴(kuò)容

針對(duì)條件2，并不擴(kuò)大容量，buckets數(shù)量維持不變，重新做一遍類(lèi)似雙倍擴(kuò)容的搬遷動(dòng)作，把松散的鍵值對(duì)重新排列一次，使得同一個(gè) bucket 中的 key 排列地更緊密，節(jié)省空間，提高 bucket 利用率，進(jìn)而保證更快的存取，該方法我們稱(chēng)之為等量擴(kuò)容

擴(kuò)容函數(shù)

上面說(shuō)的 hashGrow() 函數(shù)實(shí)際上并沒(méi)有真正地“搬遷”，它只是分配好了新的 buckets，并將老的 buckets 掛到了 oldbuckets 字段上。

真正搬遷 buckets 的動(dòng)作在 growWork() 函數(shù)中，而調(diào)用 growWork() 函數(shù)的動(dòng)作是在 mapassign 和 mapdelete 函數(shù)中。也就是插入或修改、刪除 key 的時(shí)候，都會(huì)嘗試進(jìn)行搬遷 buckets 的工作。先檢查 oldbuckets 是否搬遷完畢，具體來(lái)說(shuō)就是檢查 oldbuckets 是否為 nil

func hashGrow(t *maptype, h *hmap) {
  // If we've hit the load factor, get bigger.
  // Otherwise, there are too many overflow buckets,
  // so keep the same number of buckets and "grow" laterally.
  bigger := uint8(1)
  if !overLoadFactor(h.count+1, h.B) {
    bigger = 0
    h.flags |= sameSizeGrow
  }
  oldbuckets := h.buckets
  newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)

  flags := h.flags &^ (iterator | oldIterator)
  if h.flags&iterator != 0 {
    flags |= oldIterator
  }
  // commit the grow (atomic wrt gc)
  h.B += bigger
  h.flags = flags
  h.oldbuckets = oldbuckets
  h.buckets = newbuckets
  h.nevacuate = 0
  h.noverflow = 0

  if h.extra != nil && h.extra.overflow != nil {
    // Promote current overflow buckets to the old generation.
    if h.extra.oldoverflow != nil {
      throw("oldoverflow is not nil")
    }
    h.extra.oldoverflow = h.extra.overflow
    h.extra.overflow = nil
  }
  if nextOverflow != nil {
    if h.extra == nil {
      h.extra = new(mapextra)
    }
    h.extra.nextOverflow = nextOverflow
  }

  // the actual copying of the hash table data is done incrementally
  // by growWork() and evacuate().
}

由于 map 擴(kuò)容需要將原有的 key/value 重新搬遷到新的內(nèi)存地址，如果map存儲(chǔ)了數(shù)以億計(jì)的key-value，一次性搬遷將會(huì)造成比較大的延時(shí)，因此 Go map 的擴(kuò)容采取了一種稱(chēng)為“漸進(jìn)式”的方式，原有的 key 并不會(huì)一次性搬遷完畢，每次最多只會(huì)搬遷 2 個(gè) bucket。

func growWork(t *maptype, h *hmap, bucket uintptr) {
  // make sure we evacuate the oldbucket corresponding
  // to the bucket we're about to use
  evacuate(t, h, bucket&h.oldbucketmask())

  // evacuate one more oldbucket to make progress on growing
  if h.growing() {
    evacuate(t, h, h.nevacuate)
  }
}