Golang基礎(chǔ)學(xué)習(xí)之map的示例詳解

更新時間：2023年03月30日 08:44:38 作者：IguoChan

哈希表是常見的數(shù)據(jù)結(jié)構(gòu)，有的語言會將哈希稱作字典或者映射，在Go中，哈希就是常見的數(shù)據(jù)類型map，本文就來聊聊Golang中map的相關(guān)知識吧

0. 簡介

哈希表是常見的數(shù)據(jù)結(jié)構(gòu)，有的語言會將哈希稱作字典或者映射，在Go中，哈希就是常見的數(shù)據(jù)類型map。哈希表提供了鍵值之間的映射，其讀寫性能是O(1)。

1. 實現(xiàn)原理

1.1 底層結(jié)構(gòu)

hmap

在Go中，map的底層結(jié)構(gòu)是hmap，如下。實際上，map類型就是一個指向一個hmap結(jié)構(gòu)體的指針，所以其可以理解為是Go中的”引用“類型（有的文章認(rèn)為slice也是引用類型，說實話這我不敢茍同，因為切片的拷貝切片發(fā)生的操作并不一定會完全影響原切片，譬如append操作）。

// A header for a Go map.
type hmap struct {
   // Note: the format of the hmap is also encoded in cmd/compile/internal/reflectdata/reflect.go.
   // Make sure this stays in sync with the compiler's definition.
   count     int // # live cells == size of map.  Must be first (used by len() builtin)
   flags     uint8
   B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
   noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
   hash0     uint32 // hash seed

   buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
   oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
   nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

   extra *mapextra // optional fields
}

以上字段中，含義我們都可以按照注釋理解，我們需要著重關(guān)注buckets、oldbuckets和extra幾個字段。bucket就是我們常說的”桶“，一個桶中最多裝8個key-value對，我們也可以理解為8個槽。

bmap

以下runtime.bmap定義的bucket的結(jié)構(gòu)體，可以看到，其只是存儲了8個tophash值，即8個key的哈希的高 8 位，通過比較不同鍵的哈希的高 8 位可以減少訪問鍵值對次數(shù)以提高性能。

// A bucket for a Go map.
type bmap struct {
   // tophash generally contains the top byte of the hash value
   // for each key in this bucket. If tophash[0] < minTopHash,
   // tophash[0] is a bucket evacuation state instead.
   tophash [bucketCnt]uint8
   // Followed by bucketCnt keys and then bucketCnt elems.
   // NOTE: packing all the keys together and then all the elems together makes the
   // code a bit more complicated than alternating key/elem/key/elem/... but it allows
   // us to eliminate padding which would be needed for, e.g., map[int64]int8.
   // Followed by an overflow pointer.
}

因為哈希表中可能存儲不同類型的鍵值對，所以鍵值對的空間大小只能在實際編譯時進行推導(dǎo)，在編譯時，bmap結(jié)構(gòu)體會被以下結(jié)構(gòu)所替代，參考cmd/compile/internal/reflectdata.MapBucketType?？梢园l(fā)現(xiàn)，在內(nèi)存排列上，沒有采取key1/elem1/key2/elem2...的排列，而是將所有的key存放在一起，所有的value存放在一起，這是為了避免鍵值的類型間隔排列帶來的內(nèi)存對齊問題，反而更加浪費內(nèi)存。

type bmap struct {
   topbits  [8]uint8
   keys     [8]keytype
   elems    [8]elemtype
   overflow uintptr

需要注意的是，如果keys和elems沒有指針，map實現(xiàn)可以在旁邊保留一個溢出指針列表，以便可以將buckets標(biāo)記為沒有指針，這樣就可以避免在GC時掃描整個map。在這種情況下，overflow字段的類型是uintptr；否則，其類型就是unsafe.Pointer。而這個溢出的指針列表就是hmap中的extra字段，其類型定義如下。其實，extra字段就是為了優(yōu)化GC而設(shè)計的。

// mapextra holds fields that are not present on all maps.
type mapextra struct {
   // If both key and elem do not contain pointers and are inline, then we mark bucket
   // type as containing no pointers. This avoids scanning such maps.
   // However, bmap.overflow is a pointer. In order to keep overflow buckets
   // alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.
   // overflow and oldoverflow are only used if key and elem do not contain pointers.
   // overflow contains overflow buckets for hmap.buckets.
   // oldoverflow contains overflow buckets for hmap.oldbuckets.
   // The indirection allows to store a pointer to the slice in hiter.
   overflow    *[]*bmap
   oldoverflow *[]*bmap

   // nextOverflow holds a pointer to a free overflow bucket.
   nextOverflow *bmap
}

1.2 map創(chuàng)建

map在代碼中的創(chuàng)建有多種方式，其函數(shù)類似于make(map[KT]VT, hint intType)，hint并不能認(rèn)為是map的容量，只能說是給map創(chuàng)建傳入的一個提示大小，不填時默認(rèn)為0.

var map1 = map[int]int{
   1: 1,
}

func makeMapIntInt() map[int]int {
   return make(map[int]int)
}

func makeMapIntIntWithHint(hint int) map[int]int {
   return make(map[int]int, hint)
}

func main() {
   _ = map1

   map2 := make(map[int]int)
   _ = map2

   map3 := makeMapIntInt()
   _ = map3

   map4 := make(map[int]int, 9)
   _ = map4

   map5 := makeMapIntIntWithHint(9)
   _ = map5

   map6 := make(map[int]int, 53)
   _ = map6

   map7 := makeMapIntIntWithHint(53)
   _ = map7

如上，通過運行go tool compile -S main.go > main.i得到匯編代碼以及調(diào)試，可以總結(jié)如下：

當(dāng)創(chuàng)建的map被分配到棧上，且其hint小于等于bucketCnt = 8時（map2），會被采取如下優(yōu)化：

MOVD $""..autotmp_28-1200(SP), R16
MOVD $""..autotmp_28-1072(SP), R0
STP.P (ZR, ZR), 16(R16)
CMP R0, R16
BLE 44
PCDATA $1, ZR
CALL runtime.fastrand(SB)

當(dāng)創(chuàng)建的map被分配到堆上且其hint小于等于8時，不管是通過字面量初始化（map1）還是通過make函數(shù)初始化（map3），其都將調(diào)用makemap_small函數(shù)創(chuàng)建；

當(dāng)創(chuàng)建的map的hint大于8，且小于等于52（此時是hmap中B=3時的最大裝載量）時，其將調(diào)用 makemap函數(shù)完成初始化，且extra字段是nil，且會在堆上分配buckets；

當(dāng)hint大于52（即hmap.B ≥ 4）時，其將調(diào)用 makemap函數(shù)完成初始化，且extra字段不為nil，且會在堆上分配buckets；

func makemap64(t *maptype, hint int64, h *hmap) *hmap

// makemap_small implements Go map creation for make(map[k]v) and
// make(map[k]v, hint) when hint is known to be at most bucketCnt
// at compile time and the map needs to be allocated on the heap.
func makemap_small() *hmap

// makemap implements Go map creation for make(map[k]v, hint).
// If the compiler has determined that the map or the first bucket
// can be created on the stack, h and/or bucket may be non-nil.
// If h != nil, the map can be created directly in h.
// If h.buckets != nil, bucket pointed to can be used as the first bucket.
func makemap(t *maptype, hint int, h *hmap) *hmap

接下來，我們具體分析一下map創(chuàng)建時所做的事情，即分析makemap_small和makemap函數(shù)到底做了什么。

hint=0并新增一個元素 如上所述，當(dāng)調(diào)用make創(chuàng)建map時不傳入hint，調(diào)用的是makemap_small函數(shù)，其實這個函數(shù)做的事情特別簡單，就是在堆上創(chuàng)建了一個hmap對象，初始化了哈希種子。

func makemap_small() *hmap {
   h := new(hmap)
   h.hash0 = fastrand()
   return h
}

在寫操作的時候，會判斷這個hmap對象的buckets是否為空，如果是空的，那么就會創(chuàng)建一個bucket，如下圖片，可以很好地展現(xiàn)以下代碼創(chuàng)建的map對象的內(nèi)存結(jié)構(gòu)。

m := make(map[int]int)
m[1] = 1

hint=53 前面說過，當(dāng)hint大于52時，會調(diào)用makemap函數(shù)，并且生成溢出桶，下面，我們就借助這種情況，好好分析一下makemap函數(shù)。

func makemap(t *maptype, hint int, h *hmap) *hmap {
   mem, overflow := math.MulUintptr(uintptr(hint), t.bucket.size)
   if overflow || mem > maxAlloc {
      hint = 0
   }

   // initialize Hmap
   if h == nil {
      h = new(hmap)
   }
   h.hash0 = fastrand()

   // Find the size parameter B which will hold the requested # of elements.
   // For hint < 0 overLoadFactor returns false since hint < bucketCnt.
   B := uint8(0)
   for overLoadFactor(hint, B) {
      B++
   }
   h.B = B

   // allocate initial hash table
   // if B == 0, the buckets field is allocated lazily later (in mapassign)
   // If hint is large zeroing this memory could take a while.
   if h.B != 0 {
      var nextOverflow *bmap
      h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
      if nextOverflow != nil {
         h.extra = new(mapextra)
         h.extra.nextOverflow = nextOverflow
      }
   }

   return h
}

makemap函數(shù)會首先判斷設(shè)置的hint大小是不是超出了限制，比如超過了最大允許申請內(nèi)存，或者最大指針數(shù)，如果超出了的話，會將hint置為0，所以可以看出，map創(chuàng)建時的hint是個建議值；然后，會通過overLoadFactor函數(shù)判斷對于hint大小的map，根據(jù)6.5的裝載因子，大致需要多少個bucket，從而確定h.B這個參數(shù)；最后會根據(jù)h.B參數(shù)和運行時的表類型參數(shù)t確定需要為buckets申請多少內(nèi)存，以及是否需要申請溢出桶。以下代碼的hint=53，計算出來的h.B=4，所以需要24個桶，同時也會分配溢出桶。

m := make(map[int]int, 53)

值得注意的是，上面兩種不同的桶（可分為正常桶和溢出桶，可以看出2hmap.B指的是正常桶的數(shù)目，不包括溢出桶）在內(nèi)存中是連續(xù)存儲的，只是用不同的指針指向而已，其中，extra.nextOverflow指向的是下一個能用的溢出桶，而extra.overflow和extra.oldoverflow在map的key-value都是非指針類型時起作用，用于存儲指向溢出桶的指針，優(yōu)化GC。

1.3 寫操作

對于map而言，不管是修改key對應(yīng)的value還是設(shè)置value，對其都是寫操作，在運行時，大致會調(diào)用runtime.mapassign函數(shù)，不過，Go SDK包對一些特殊的key值做了優(yōu)化操作，比如：

key類型	插入函數(shù)	備注
uint32	runtime.mapassign_fast32
uint64	runtime.mapassign_fast64	int類型時也會用這個函數(shù)
string	runtime.mapassign_faststr

這里，我們還是分析基礎(chǔ)的runtime.mapassign函數(shù)，鑒于函數(shù)太長，我們分段解析函數(shù)。

if h == nil {
   panic(plainError("assignment to entry in nil map"))
}
...
if h.flags&hashWriting != 0 {
   throw("concurrent map writes")
}
hash := t.hasher(key, uintptr(h.hash0))

// Set hashWriting after calling t.hasher, since t.hasher may panic,
// in which case we have not actually done a write.
h.flags ^= hashWriting

以上，mapassign會做map的空值校驗和并發(fā)寫校驗，這里也說明，map是并發(fā)不安全的；并且在hash之后再置標(biāo)志位的行，代碼也做了解釋：即hasher函數(shù)可能panic，這種情況下并沒有在寫入（but，我并沒有十分理解，panic了也沒有recover，程序都崩潰了，還能咋地？再說，并發(fā)寫的時候，兩個協(xié)程同時執(zhí)行到取hash步驟，可能導(dǎo)致throw那一行無法觸發(fā)呀！）

again:
   bucket := hash & bucketMask(h.B)
   if h.growing() {
      growWork(t, h, bucket)
   }
   b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
   top := tophash(hash)

   var inserti *uint8
   var insertk unsafe.Pointer
   var elem unsafe.Pointer

以上代碼中，bucketMask函數(shù)會根據(jù)h.B的大小，返回不同的掩碼，說白了，就是根據(jù)bucket的數(shù)目生成掩碼，其實就是從最低位開始數(shù)B個1?？梢哉f，上述代碼中的bucket其實就是桶序號（從0開始）。這時候還要檢查一下是否在擴容，如果是的話，需要先執(zhí)行擴容操作。接著，會根據(jù)前面的桶序號生成指向這個桶的指針b。最后聲明三個指針，inserti表示目標(biāo)元素的在桶中的索引，insertk和 elem分別表示鍵值對的地址。

bucketloop:
   for {
      for i := uintptr(0); i < bucketCnt; i++ {
         if b.tophash[i] != top {
            if isEmpty(b.tophash[i]) && inserti == nil {
               inserti = &b.tophash[i]
               insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
               elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
            }
            if b.tophash[i] == emptyRest {
               break bucketloop
            }
            continue
         }
         k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
         if t.indirectkey() {
            k = *((*unsafe.Pointer)(k))
         }
         if !t.key.equal(key, k) {
            continue
         }
         // already have a mapping for key. Update it.
         if t.needkeyupdate() {
            typedmemmove(t.key, k, key)
         }
         elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
         goto done
      }
      ovf := b.overflow(t)
      if ovf == nil {
         break
      }
      b = ovf
   }

以上代碼，接下來就是在桶內(nèi)尋找空隙或者原有的key值進行插入或者修改，基本邏輯就是，循環(huán)遍歷這個桶的八個槽，通過tophash判斷，效率可能會高一些，如果未匹配且這個槽是空的狀態(tài)（可能是剛初始化的空，即tophash[i]值為0，也有可能是被刪除后的空，即tophash[i]的值為1），我們先講以上三個指針賦值到此槽對應(yīng)的位置；如果是后者，即是未被使用過的槽，那直接跳出循環(huán)，將此key-value插入到這個位置（因為不可能存在其他的槽插入過這個鍵值）。如果找到了，那么更新數(shù)據(jù)就好，這里不贅述。

值得注意的是，如果將整個桶都找遍了，還是沒有找到，那么會通過b.overflow(t)檢查是否有溢出桶在此桶后面，如果有的話，會繼續(xù)搜尋；如果沒有，則在后續(xù)判斷是否需要擴容，或者是否需要新建溢出桶。

// Did not find mapping for key. Allocate new cell & add entry.

// If we hit the max load factor or we have too many overflow buckets,
// and we're not already in the middle of growing, start growing.
if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
   hashGrow(t, h)
   goto again // Growing the table invalidates everything, so try again
}

if inserti == nil {
   // The current bucket and all the overflow buckets connected to it are full, allocate a new one.
   newb := h.newoverflow(t, b)
   inserti = &newb.tophash[0]
   insertk = add(unsafe.Pointer(newb), dataOffset)
   elem = add(insertk, bucketCnt*uintptr(t.keysize))
}

// store new key/elem at insert position
if t.indirectkey() {
   kmem := newobject(t.key)
   *(*unsafe.Pointer)(insertk) = kmem
   insertk = kmem
}
if t.indirectelem() {
   vmem := newobject(t.elem)
   *(*unsafe.Pointer)(elem) = vmem
}
typedmemmove(t.key, insertk, key)
*inserti = top
h.count++

以上代碼，都是在原先所有的桶中沒有找到的一些處理，首先是通過overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)來判斷map是否需要擴容，這里涉及到兩種擴容條件，分別是裝載因子過高和溢出桶過多，只要滿足一種，都將引起擴容，并且返回到again標(biāo)記處進行擴容處理，之后再進行一次主流程。擴容的機制在后面介紹。

如果不需要進行擴容，那么就需要在現(xiàn)有桶的鏈表后（這里需要提及的是，Go中的map使用拉鏈法解哈希沖突[相關(guān)知識可以參考文末補充內(nèi)容]）新增一個溢出桶，然后分配我們的數(shù)據(jù)未知，其思路也很簡單，如果預(yù)先申請了空余的溢出桶，那使用這個桶，如果沒有，那么申請一個桶，并且設(shè)置一些參數(shù)和標(biāo)志等。

done:
   if h.flags&hashWriting == 0 {
      throw("concurrent map writes")
   }
   h.flags &^= hashWriting
   if t.indirectelem() {
      elem = *((*unsafe.Pointer)(elem))
   }
   return elem

以上，最后一段就是標(biāo)志位的處理，并且返回找到的value的地址，在其他函數(shù)中對這段地址進行賦值操作等，此不贅述了。

1.4 讀操作

v := m[k]     // 如果存在對應(yīng) v，則返回 v；如果不存在，則返回 對應(yīng)零值
v, ok := m[k] // 如果存在對應(yīng) v，則返回 v, true；如果不存在，則返回 對應(yīng)零值, false

我們都知道，map讀取操作有以上兩種方式，那對應(yīng)的runtime函數(shù)也應(yīng)該有兩種方式，分別是mapaccess1和mapaccess2，前者只返回值，后者返回值和是否存在，其他沒有什么區(qū)別，同理，針對一些類型，Go SDK也做了對應(yīng)優(yōu)化：

key類型	讀取函數(shù)1	讀取函數(shù)2	備注
uint32	runtime.mapaccess1_fast32	runtime.mapaccess2_fast32
uint64	runtime.mapaccess1_fast64	runtime.mapaccess2_fast64	int類型時也會用這個函數(shù)
string	runtime.mapaccess1_faststr	runtime.mapaccess2_faststr

下面，我們以mapaccess1為例，分析一下map的讀操作。

if h == nil || h.count == 0 {
   if t.hashMightPanic() {
      t.hasher(key, 0) // see issue 23734
   }
   return unsafe.Pointer(&zeroVal[0])
}
if h.flags&hashWriting != 0 {
   throw("concurrent map read and map write")
}

以上代碼，當(dāng)表為空時，直接返回零值，當(dāng)有并發(fā)寫操作時，報panic。我們把中間一部分和擴容相關(guān)的內(nèi)容留待后續(xù)講解，直接看下面的代碼。

bucketloop:
   for ; b != nil; b = b.overflow(t) {
      for i := uintptr(0); i < bucketCnt; i++ {
         if b.tophash[i] != top {
            if b.tophash[i] == emptyRest {
               break bucketloop
            }
            continue
         }
         k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
         if t.indirectkey() {
            k = *((*unsafe.Pointer)(k))
         }
         if t.key.equal(key, k) {
            e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
            if t.indirectelem() {
               e = *((*unsafe.Pointer)(e))
            }
            return e
         }
      }
   }
   return unsafe.Pointer(&zeroVal[0])

和寫操作一樣，在確定了需要讀取的桶之后，有以上這個循環(huán)函數(shù)，我們先看內(nèi)循環(huán)，如果在槽i不匹配且該槽未被使用過，說明其后的槽也肯定沒有使用過，說明這個key不可能在表中，可以直接返回零值。而如果不滿足則一個一個找，本桶找完以后還會通過外循環(huán)去找溢出桶（如果有的話），找到了就返回；如果最后還沒找到，說明不存在，則返回零值。

1.5 for-range操作

在map的迭代操作中，其依托于以下結(jié)構(gòu)體，我們需要關(guān)注的是key、elem和startBucket、offset兩對參數(shù)需要關(guān)注一下。

// A hash iteration structure.
// If you modify hiter, also change cmd/compile/internal/reflectdata/reflect.go
// and reflect/value.go to match the layout of this structure.
type hiter struct {
   key         unsafe.Pointer // Must be in first position.  Write nil to indicate iteration end (see cmd/compile/internal/walk/range.go).
   elem        unsafe.Pointer // Must be in second position (see cmd/compile/internal/walk/range.go).
   t           *maptype
   h           *hmap
   buckets     unsafe.Pointer // bucket ptr at hash_iter initialization time
   bptr        *bmap          // current bucket
   overflow    *[]*bmap       // keeps overflow buckets of hmap.buckets alive
   oldoverflow *[]*bmap       // keeps overflow buckets of hmap.oldbuckets alive
   startBucket uintptr        // bucket iteration started at
   offset      uint8          // intra-bucket offset to start from during iteration (should be big enough to hold bucketCnt-1)
   wrapped     bool           // already wrapped around from end of bucket array to beginning
   B           uint8
   i           uint8
   bucket      uintptr
   checkBucket uintptr
}

1.5.1 注意遍歷時的閉包

可以看到，hiter作為for-range遍歷時的結(jié)構(gòu)體，key和elem作為指向key-value的指針，在整個遍歷期間，其只有一份，所以在如下的一些場景下，可能出現(xiàn)錯誤。

m := map[int]string{
   1: "hello",
   2: "world",
   3: "hello",
   4: "go",
}

wg := sync.WaitGroup{}
for k, v := range m {
   wg.Add(1)
   go func() {
      defer wg.Done()
      fmt.Println(k, v)
   }()
}
wg.Wait()

最后的打印如下，并不符合最初的設(shè)計。這是因為閉包持有的是捕獲變量的引用，而不是復(fù)制，而map的遍歷是始終只有一對指針在指向遍歷元素（其實所有的類型遍歷都是），導(dǎo)致最后打印的內(nèi)容并不是想要的。

4 go
4 go
4 go
4 go

1.5.2 map的遍歷是無序的

前面說過，map的遍歷圍繞著hiter這個結(jié)構(gòu)體展開，在結(jié)構(gòu)體中，startBucket字段表示開始遍歷的桶，offset表示在這個桐中的偏移量，在hiter的初始化函數(shù)runtime.mapiterinit中有如下代碼，可以看到，起始位置是隨機的。

// decide where to start
r := uintptr(fastrand())
if h.B > 31-bucketCntBits {
   r += uintptr(fastrand()) << 31
}
it.startBucket = r & bucketMask(h.B)
it.offset = uint8(r >> h.B & (bucketCnt - 1))

// iterator state
it.bucket = it.startBucket

這是因為，一旦map發(fā)生擴容，那么位置可能會變，而且如上所示，Go SDK加入了隨機值使得每次的遍歷都是隨機位置起始，也是為了不給程序員帶來困擾。

1.6 刪除操作

和讀寫操作一樣，map的刪除操作一般而言會調(diào)用runtime.mapdelete函數(shù)，同時也有幾個特殊類型的優(yōu)化操作，如下。和寫操作一樣，如果刪除過程中發(fā)現(xiàn)正在擴容中，那么則會進行一次數(shù)據(jù)遷移操作。

key類型	刪除函數(shù)	備注
uint32	runtime.mapdelete_fast32
uint64	runtime.mapdelete_fast64	int類型時也會用這個函數(shù)
string	runtime.mapdelete_faststr

刪除操作的整體和之前的讀操作比較類似，都是先找到位置，然后刪除，刪除之后，將tophash[i]的標(biāo)志位置為1；但是其中有個操作是，當(dāng)這個桶沒有后繼的溢出桶，且以1結(jié)束，則將這些1都置為0。這是因為，前面的讀寫操作都有如果查找到該位置標(biāo)志為0時則直接不再查找或者直接插入，這是因為，在map的實現(xiàn)設(shè)計中，如果一個桶的槽標(biāo)志為0，說明這個位置及之后的槽都沒有被占用，且肯定沒有后繼的溢出桶；所以刪除的時候這么設(shè)計，可以提高map的讀寫效率。

    // If the bucket now ends in a bunch of emptyOne states,
   // change those to emptyRest states.
   // It would be nice to make this a separate function, but
   // for loops are not currently inlineable.
   if i == bucketCnt-1 {
      if b.overflow(t) != nil && b.overflow(t).tophash[0] != emptyRest {
         goto notLast
      }
   } else {
      if b.tophash[i+1] != emptyRest {
         goto notLast
      }
   }
   for {
      b.tophash[i] = emptyRest
      if i == 0 {
         if b == bOrig {
            break // beginning of initial bucket, we're done.
         }
         // Find previous bucket, continue at its last entry.
         c := b
         for b = bOrig; b.overflow(t) != c; b = b.overflow(t) {
         }
         i = bucketCnt - 1
      } else {
         i--
      }
      if b.tophash[i] != emptyOne {
         break
      }
   }
notLast:
   h.count--

值得注意的是，在刪除操作中，我們并不會真正將這個桶對應(yīng)的內(nèi)存真正的釋放，而只是將tophash[i]標(biāo)記成了emptyOne。

1.7 擴容

在map中，只有在寫操作時，觸發(fā)以下兩種情況才會觸發(fā)擴容，擴容會帶來數(shù)據(jù)的遷移，而在寫操作和刪除操作時，都會判斷是否在數(shù)據(jù)遷移中，如果是，那都將進行一次數(shù)據(jù)遷移工作。

overLoadFactor(h.count+1, h.B)，判斷新增一個數(shù)據(jù)（h.count+1）導(dǎo)致裝載因子是否超過6.5；
tooManyOverflowBuckets(h.noverflow, h.B)，當(dāng)使用的溢出桶過多時，會進行一次擴容；不過此次擴容并不新增桶的個數(shù)，而是等量擴容sameSizeGrow，sameSizeGrow是一種特殊情況下發(fā)生的擴容，當(dāng)我們持續(xù)向哈希中插入數(shù)據(jù)并將它們?nèi)縿h除時，如果哈希表中的數(shù)據(jù)量沒有超過閾值，就會不斷積累溢出桶造成緩慢的內(nèi)存泄漏。

在判斷需要進行擴容操作之后，會調(diào)用runtime.hashGrow函數(shù)，這是開始擴容的入口，在這個函數(shù)中，其實相當(dāng)于做一些擴容前的準(zhǔn)備工作，首先會判斷是不是裝載因子過大，如果是的話，則bigger為1，如果不是則為0，即對應(yīng)了上面的分類，如果是裝載因子過大，則發(fā)生真實的擴容，即整個桶的大小翻倍（2B+1 = 2*2B）；如果不是的話，那桶的大小維持不變。接下來會通過runtime.makeBucketArray創(chuàng)建一組新桶和預(yù)創(chuàng)建的溢出桶，隨后將原有的桶數(shù)組設(shè)置到 oldbuckets 上并將新的空桶設(shè)置到 buckets 上h.buckets則指向新申請的桶。

func hashGrow(t *maptype, h *hmap) {
   // If we've hit the load factor, get bigger.
   // Otherwise, there are too many overflow buckets,
   // so keep the same number of buckets and "grow" laterally.
   bigger := uint8(1)
   if !overLoadFactor(h.count+1, h.B) {
      bigger = 0
      h.flags |= sameSizeGrow
   }
   oldbuckets := h.buckets
   newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)

   flags := h.flags &^ (iterator | oldIterator)
   if h.flags&iterator != 0 {
      flags |= oldIterator
   }
   // commit the grow (atomic wrt gc)
   h.B += bigger
   h.flags = flags
   h.oldbuckets = oldbuckets
   h.buckets = newbuckets
   h.nevacuate = 0
   h.noverflow = 0

   if h.extra != nil && h.extra.overflow != nil {
      // Promote current overflow buckets to the old generation.
      if h.extra.oldoverflow != nil {
         throw("oldoverflow is not nil")
      }
      h.extra.oldoverflow = h.extra.overflow
      h.extra.overflow = nil
   }
   if nextOverflow != nil {
      if h.extra == nil {
         h.extra = new(mapextra)
      }
      h.extra.nextOverflow = nextOverflow
   }

   // the actual copying of the hash table data is done incrementally
   // by growWork() and evacuate().
}

擴容真正的操作實際是在以下runtime.growWork中完成的，這里有一點需要注意，hmap有個參數(shù)是nevacuate，作為已經(jīng)擴容的bucket的計數(shù)，所有低于這個數(shù)的桶序號（即hash后的桶序號，注意，是舊桶的序號）都已經(jīng)被擴容，當(dāng)nevacuate等于舊桶數(shù)時，說明擴容結(jié)束了。

func growWork(t *maptype, h *hmap, bucket uintptr) {
   // make sure we evacuate the oldbucket corresponding
   // to the bucket we're about to use
   evacuate(t, h, bucket&h.oldbucketmask())

   // evacuate one more oldbucket to make progress on growing
   if h.growing() {
      evacuate(t, h, h.nevacuate)
   }
}

那是怎么保證這點的呢，在接下來看到的runtime.evacuate中，當(dāng)遷移結(jié)束，nevacuate等于桶序號時才會調(diào)用advanceEvacuationMark函數(shù)將計數(shù)+1，所以在runtime.growWork函數(shù)中做了兩次桶遷移，即第一次保證此次操作（寫操作或者刪除操作）的桶數(shù)據(jù)會遷移，如果這個桶序號和nevacuate不相等，則利用第二次的evacuate(t, h, h.nevacuate)保證這個計數(shù)會加一。這個過程中也不用擔(dān)心桶會被重復(fù)遷移，因為if !evacuated(b)判斷條件會判斷桶是否做過遷移了，只有沒有做過遷移的桶才會進行操作，這里判斷的標(biāo)志位還是占用的tophash[0]，有興趣可以看看代碼。

func evacuate(t *maptype, h *hmap, oldbucket uintptr) {
   b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
   newbit := h.noldbuckets()
   if !evacuated(b) {
      ...
   }

   if oldbucket == h.nevacuate {
      advanceEvacuationMark(h, t, newbit)
   }
}

接下來可以看看以上省略號中，即真正的遷移發(fā)生了什么，runtime.evacuate會將一個舊桶中的數(shù)據(jù)分流到兩個新桶，會創(chuàng)建兩個用于保存分配上下文的runtime.evacDst結(jié)構(gòu)體，這兩個結(jié)構(gòu)體分別指向了一個新桶，如果是等量擴容，那么第二個runtime.evacDst結(jié)構(gòu)體不會被創(chuàng)建。

// TODO: reuse overflow buckets instead of using new ones, if there
// is no iterator using the old buckets.  (If !oldIterator.)

// xy contains the x and y (low and high) evacuation destinations.
var xy [2]evacDst
x := &xy[0]
x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.bucketsize)))
x.k = add(unsafe.Pointer(x.b), dataOffset)
x.e = add(x.k, bucketCnt*uintptr(t.keysize))

if !h.sameSizeGrow() {
   // Only calculate y pointers if we're growing bigger.
   // Otherwise GC can see bad pointers.
   y := &xy[1]
   y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.bucketsize)))
   y.k = add(unsafe.Pointer(y.b), dataOffset)
   y.e = add(y.k, bucketCnt*uintptr(t.keysize))
}

接下來就是循環(huán)這個bucket以及其后的溢出桶，有些邏輯都是一些常規(guī)邏輯，就不一一分析了，對于等量擴容，因為只有一個runtime.evacDst對象，所以會直接通過指針復(fù)制或者typedmemmove的值復(fù)制來復(fù)制鍵值對到新的桶。

for ; b != nil; b = b.overflow(t) {
   k := add(unsafe.Pointer(b), dataOffset)
   e := add(k, bucketCnt*uintptr(t.keysize))
   for i := 0; i < bucketCnt; i, k, e = i+1, add(k, uintptr(t.keysize)), add(e, uintptr(t.elemsize)) {
      top := b.tophash[i]
      if isEmpty(top) {
         b.tophash[i] = evacuatedEmpty
         continue
      }
      if top < minTopHash {
         throw("bad map state")
      }
      k2 := k
      if t.indirectkey() {
         k2 = *((*unsafe.Pointer)(k2))
      }
      var useY uint8
      if !h.sameSizeGrow() {
         // Compute hash to make our evacuation decision (whether we need
         // to send this key/elem to bucket x or bucket y).
         hash := t.hasher(k2, uintptr(h.hash0))
         if h.flags&iterator != 0 && !t.reflexivekey() && !t.key.equal(k2, k2) {
            // If key != key (NaNs), then the hash could be (and probably
            // will be) entirely different from the old hash. Moreover,
            // it isn't reproducible. Reproducibility is required in the
            // presence of iterators, as our evacuation decision must
            // match whatever decision the iterator made.
            // Fortunately, we have the freedom to send these keys either
            // way. Also, tophash is meaningless for these kinds of keys.
            // We let the low bit of tophash drive the evacuation decision.
            // We recompute a new random tophash for the next level so
            // these keys will get evenly distributed across all buckets
            // after multiple grows.
            useY = top & 1
            top = tophash(hash)
         } else {
            if hash&newbit != 0 {
               useY = 1
            }
         }
      }

      if evacuatedX+1 != evacuatedY || evacuatedX^1 != evacuatedY {
         throw("bad evacuatedN")
      }

      b.tophash[i] = evacuatedX + useY // evacuatedX + 1 == evacuatedY
      dst := &xy[useY]                 // evacuation destination

      if dst.i == bucketCnt {
         dst.b = h.newoverflow(t, dst.b)
         dst.i = 0
         dst.k = add(unsafe.Pointer(dst.b), dataOffset)
         dst.e = add(dst.k, bucketCnt*uintptr(t.keysize))
      }
      dst.b.tophash[dst.i&(bucketCnt-1)] = top // mask dst.i as an optimization, to avoid a bounds check
      if t.indirectkey() {
         *(*unsafe.Pointer)(dst.k) = k2 // copy pointer
      } else {
         typedmemmove(t.key, dst.k, k) // copy elem
      }
      if t.indirectelem() {
         *(*unsafe.Pointer)(dst.e) = *(*unsafe.Pointer)(e)
      } else {
         typedmemmove(t.elem, dst.e, e)
      }
      dst.i++
      // These updates might push these pointers past the end of the
      // key or elem arrays.  That's ok, as we have the overflow pointer
      // at the end of the bucket to protect against pointing past the
      // end of the bucket.
      dst.k = add(dst.k, uintptr(t.keysize))
      dst.e = add(dst.e, uintptr(t.elemsize))
   }
}

如果是增量擴容，假設(shè)原來的B是2，那么就是四個桶，其mask就是0b11，hash & 0b11會有四種結(jié)果，最后分配到四個桶中，假設(shè)發(fā)生了增量擴容，此時用舊的桶數(shù)newbits（4）和hash相與，即hash & 0b100，即相當(dāng)于通過新的mask(0b111)的最高位來決定這個數(shù)據(jù)是分配到X桶還是Y桶，實現(xiàn)了分流（上述代碼中的hash&newbit）。當(dāng)然，if h.flags&iterator != 0 && !t.reflexivekey() && !t.key.equal(k2, k2)中對特殊情況做了處理，這里就不詳述了。

值得注意的是以下代碼，前面說過，只有當(dāng)舊桶編號（hash和舊mask相與）與nevacuate相等時，才會調(diào)用advanceEvacuationMark(h, t, newbit)進行計數(shù)+1，所以在runtime.growWork中會調(diào)用兩次evacuate函數(shù)，保證小于等于nevacuate的桶都被遷移了。

if oldbucket == h.nevacuate {
   advanceEvacuationMark(h, t, newbit)
}

另外，在讀表的時候，當(dāng)判斷舊桶還沒有被遷移的時候，會從舊桶中取出數(shù)據(jù)。

func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
   ...
   hash := t.hasher(key, uintptr(h.hash0))
   m := bucketMask(h.B)
   b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
   if c := h.oldbuckets; c != nil {
      if !h.sameSizeGrow() {
         // There used to be half as many buckets; mask down one more power of two.
         m >>= 1
      }
      oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
      if !evacuated(oldb) {
         b = oldb
      }
   }
   ...
}

從上面可以看出，map表數(shù)據(jù)的遷移是漸進式的，是在調(diào)用寫、刪除操作時增量進行的，不會造成瞬間性能的巨大抖動。其實這個和redis的rehash技術(shù)是類似的原理。

2. Map使用的一些注意事項

通過以上內(nèi)容，我們知道了map構(gòu)建的基本原理，所以我們在實際工作中，使用字典表時，需要有一些注意事項。

2.1 大數(shù)據(jù)量map不使用指針作為key-value

通過上面學(xué)習(xí)，我們知道，當(dāng)map的kv類型都不為指針時，那么GC就不會掃描整個表，具體實現(xiàn)是在GC過程中，檢查runtime._type.gcdata字段，這是個指針的bitmap，當(dāng)其為全零時，說明整個對象中無需掃描的下一級指針，從而節(jié)省時間，具體可參考深度剖析 Golang 的 GC 掃描對象實現(xiàn)。

// Needs to be in sync with ../cmd/link/internal/ld/decodesym.go:/^func.commonsize,
// ../cmd/compile/internal/reflectdata/reflect.go:/^func.dcommontype and
// ../reflect/type.go:/^type.rtype.
// ../internal/reflectlite/type.go:/^type.rtype.
type _type struct {
   size       uintptr
   ptrdata    uintptr // size of memory prefix holding all pointers
   hash       uint32
   tflag      tflag
   align      uint8
   fieldAlign uint8
   kind       uint8
   // function for comparing objects of this type
   // (ptr to object A, ptr to object B) -> ==?
   equal func(unsafe.Pointer, unsafe.Pointer) bool
   // gcdata stores the GC type data for the garbage collector.
   // If the KindGCProg bit is set in kind, gcdata is a GC program.
   // Otherwise it is a ptrmask bitmap. See mbitmap.go for details.
   gcdata    *byte
   str       nameOff
   ptrToThis typeOff
}

為驗證以上觀點，我們寫出如下的測試函數(shù)，測試在從10到100萬數(shù)據(jù)量的情形下，以整型和整型指針作為value類型的映射表在GC時的耗時差異。

func TestGCTimeWithoutPointer(t *testing.T) {
   for _, N := range Ns {
      runtime.GC()
      m1 := make(map[int]int)
      for i := 0; i < N; i++ {
         m1[i] = rand.Int()
      }

      start := time.Now()
      runtime.GC()
      delta := time.Since(start)
      t.Logf("GC without pointer spend %+v, when N = %d", delta, N)

      runtime.KeepAlive(m1)
   }
}

func TestGCTimeWithPointer(t *testing.T) {
   for _, N := range Ns {
      runtime.GC()
      m2 := make(map[int]*int)
      for i := 0; i < N; i++ {
         val := rand.Int()
         m2[i] = &val
      }

      start := time.Now()
      runtime.GC()
      delta := time.Since(start)
      t.Logf("GC with pointer spend %+v, when N = %d", delta, N)

      runtime.KeepAlive(m2)
   }
}

測試結(jié)果如下，可以發(fā)現(xiàn)，在沒有指針的情形下，不管表的大小是什么數(shù)量級，其GC時間幾乎無差異；而在有指針的情形下，其GC時間在100萬數(shù)量級的時候已經(jīng)達到了15ms，這將大大影響程序的性能。

=== RUN   TestGCTimeWithoutPointer
    map_test.go:63: GC without pointer spend 252.208μs, when N = 10
    map_test.go:63: GC without pointer spend 297.292μs, when N = 100
    map_test.go:63: GC without pointer spend 438.208μs, when N = 1000
    map_test.go:63: GC without pointer spend 377μs, when N = 10000
    map_test.go:63: GC without pointer spend 205.666μs, when N = 100000
    map_test.go:63: GC without pointer spend 380.584μs, when N = 1000000
--- PASS: TestGCTimeWithoutPointer (0.13s)
=== RUN   TestGCTimeWithPointer
    map_test.go:81: GC with pointer spend 570.041μs, when N = 10
    map_test.go:81: GC with pointer spend 325.708μs, when N = 100
    map_test.go:81: GC with pointer spend 287.542μs, when N = 1000
    map_test.go:81: GC with pointer spend 476.709μs, when N = 10000
    map_test.go:81: GC with pointer spend 1.714833ms, when N = 100000
    map_test.go:81: GC with pointer spend 15.756958ms, when N = 1000000
--- PASS: TestGCTimeWithPointer (0.18s)

值得注意的是，在正常桶后面跟著的溢出桶的地址會存放在hmap.extra.overflow中，避免被GC誤傷。

這一點也同樣適用于其他容器類型，比如切片、數(shù)組和通道。

2.1.1 引申1——使用字節(jié)數(shù)組代替字符串作為key

每個字符串的底層包括一個指針，用來指向其底層數(shù)組，如果一個映射值的key類型是字符串類型，且其有一個最大長度、且最大長度較小，可設(shè)置為N，則我們可以使用[N]byte來代替字符串作為鍵值，可以避免垃圾回收時掃描整個表。當(dāng)然，這是在數(shù)據(jù)量比較大的情形下考慮的優(yōu)化。

2.2 清空表操作

前面說過，map表有刪除操作，但是刪除后的表所占的內(nèi)存空間并不會釋放，除非保證后續(xù)會有很多新的條目放入到表中，否則我們使用以下方法清空映射表。

m = nil           // 后續(xù)不再使用
m = make(map[K]V) // 后續(xù)繼續(xù)使用

2.3 確定大小時盡量傳入hint

前面說過，傳入的hint可以讓Go SDK預(yù)測這個映射表中最大的條目數(shù)量，所以我們?nèi)绻阎淼拇笮?，盡量在創(chuàng)建表的時候傳入。

知識補充

HashMap拉鏈法簡介

1.拉鏈法用途

解決hash沖突（即put操作時計算key值問題）。

2.拉鏈法原理

把具有相同散列地址的關(guān)鍵字(同義詞)值放在同一個單鏈表中，稱為同義詞鏈表。

有m個散列地址就有m個鏈表，同時用指針數(shù)組A[0,1,2…m-1]存放各個鏈表的頭指針，凡是散列地址為i的記錄都以結(jié)點方式插入到以A[i]為指針的單鏈表中。A中各分量的初值為空指針。

3.拉鏈法原理解釋

HashMap是一個數(shù)組，數(shù)組中的每個元素是鏈表。put元素進去的時候，會通過計算key的hash值來獲取到一個index，根據(jù)index找到數(shù)組中的位置，進行元素插入。當(dāng)新來的元素映射到?jīng)_突的數(shù)組位置時，就會插入到鏈表的頭部。

HashMap采用拉鏈法將HashMap的key是轉(zhuǎn)化成了hashcode，但hashcode可能重復(fù)，所以采用求交集方式解決沖突。