
Implementing a weighted random sampling function without replacement in Python

Updated: 2019-07-24 11:26:13  Author: EVE

This article shows how to implement a weighted random sampling function without replacement in Python, step by step, with example code.

Requirements

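A lottery application has to draw K winners from all participating users (K = number of prizes). Each user's weight is the number of lottery codes they hold, and a user can win at most once.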

For example, suppose there are three users with weights A(1), B(1) and C(2). For a single draw, we want A to be picked with probability 25%, B with 25% and C with 50%.

Analysis

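The most intuitive approach is to put C into the list twice, e.g. ['A', 'B', 'C', 'C'], and pick one element with the standard-library function random.choice(['A', 'B', 'C', 'C']). C is then picked with probability 50%.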

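The problem with this approach is that it wastes memory when the weights are large, because the list needs one entry per unit of weight.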
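A minimal sketch of this list-expansion approach, assuming integer weights (the helper name `expand_by_weight` is hypothetical). The expanded list holds one entry per unit of weight, which is exactly where the memory cost comes from.

```python
import random

def expand_by_weight(users):
    """Repeat each name as many times as its (integer) weight."""
    expanded = []
    for name, weight in users:
        expanded.extend([name] * weight)
    return expanded

users = [('A', 1), ('B', 1), ('C', 2)]
expanded = expand_by_weight(users)  # ['A', 'B', 'C', 'C']
winner = random.choice(expanded)    # C is picked with probability 2/4

# The list size grows with the total weight, not with the number of users:
print(len(expand_by_weight([('A', 1_000_000), ('B', 1)])))  # 1000001
```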

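A more general approach is to add up all the weights (4 in this example), pick a random value from the interval [0, 4), and let A, B and C occupy sub-intervals of different sizes: [0, 1) is A, [1, 2) is B and [2, 4) is C.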

Either random.randint(0, 3) (which includes both endpoints) or int(random.random()*4) produces a random integer R from 0 to 3. The interval that R falls into determines which user is selected.
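Because R is a uniformly distributed integer in 0..total-1, each user's probability is just the number of R values inside its interval divided by the total. The sketch below (names such as `owner_of` are illustrative) enumerates every possible R for the example weights to make this exact.

```python
from collections import Counter
from fractions import Fraction

users = [('A', 1), ('B', 1), ('C', 2)]

# Build the half-open intervals [start, end) each user occupies: A=[0,1), B=[1,2), C=[2,4)
intervals = []
start = 0
for name, weight in users:
    intervals.append((start, start + weight, name))
    start += weight
total = start  # 4

def owner_of(r):
    """Return the user whose interval contains the integer r."""
    for lo, hi, name in intervals:
        if lo <= r < hi:
            return name

# Every R in 0..total-1 is equally likely, so counting them gives exact probabilities.
counts = Counter(owner_of(r) for r in range(total))
probs = {name: Fraction(c, total) for name, c in counts.items()}
print(probs)  # {'A': Fraction(1, 4), 'B': Fraction(1, 4), 'C': Fraction(1, 2)}
```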

Next, we need a way to find the interval that contains the random number.

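One way is to walk through the list in order while keeping a running sum S of the weights seen so far; as soon as S is greater than R, return the current element.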

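import random
from operator import itemgetter

users = [('A', 1), ('B', 1), ('C', 2)]

def weighted_choice_linear(users):
    # wrapped in a function so that `return` is valid
    total = sum(map(itemgetter(1), users))
    rnd = int(random.random() * total)  # 0~3
    s = 0  # running sum of the weights seen so far
    for u, w in users:
        s += w
        if s > rnd:
            return u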

However, this method is O(N), because in the worst case it has to walk through all the users.

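Another approach is to first build the list of cumulative weights in order and then binary-search it, which brings the lookup down to O(log N). (This does not count the O(N) work of building the cumulative list, which can be reused across draws.)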

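import bisect
import itertools
import random
from operator import itemgetter

users = [('A', 1), ('B', 1), ('C', 2)]

def weighted_choice_bisect(users):
    cum_weights = list(itertools.accumulate(map(itemgetter(1), users)))  # [1, 2, 4]
    total = cum_weights[-1]
    rnd = int(random.random() * total)  # 0~3
    hi = len(cum_weights) - 1
    # first index whose cumulative weight is greater than rnd
    index = bisect.bisect(cum_weights, rnd, 0, hi)
    return users[index][0]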
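Why `bisect.bisect` (an alias of `bisect_right`) and not `bisect_left`? Each interval is half-open, so an R that equals a cumulative weight belongs to the next interval. A quick check over every possible R for the example weights:

```python
import bisect

cum_weights = [1, 2, 4]  # A=[0,1), B=[1,2), C=[2,4)
names = ['A', 'B', 'C']

right = [names[bisect.bisect_right(cum_weights, r)] for r in range(4)]
left = [names[bisect.bisect_left(cum_weights, r)] for r in range(4)]

print(right)  # ['A', 'B', 'C', 'C']  matches the intended intervals
print(left)   # ['A', 'A', 'B', 'C']  off by one at the interval boundaries
```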

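The standard random module's choices function (available since Python 3.6) is implemented exactly this way. Its signature is random.choices(population, weights=None, *, cum_weights=None, k=1): population is the sequence to choose from, weights holds the relative weights, cum_weights optionally holds precomputed cumulative weights (specify at most one of the two), and k is the number of selections, made with replacement. The source, as found in Python 3.6/3.7 (newer versions differ in details), is: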

def choices(self, population, weights=None, *, cum_weights=None, k=1):
    """Return a k sized list of population elements chosen with replacement.
    If the relative weights or cumulative weights are not specified,
    the selections are made with equal probability.
    """
    random = self.random
    if cum_weights is None:
        if weights is None:
            _int = int
            total = len(population)
            return [population[_int(random() * total)] for i in range(k)]
        cum_weights = list(_itertools.accumulate(weights))
    elif weights is not None:
        raise TypeError('Cannot specify both weights and cumulative weights')
    if len(cum_weights) != len(population):
        raise ValueError('The number of weights does not match the population')
    bisect = _bisect.bisect
    total = cum_weights[-1]
    hi = len(cum_weights) - 1
    return [population[bisect(cum_weights, random() * total, 0, hi)]
            for i in range(k)]
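For reference, a usage sketch of the standard `random.choices`. Since `weights` is converted to cumulative weights internally (as the source above shows), passing `weights` or the equivalent precomputed `cum_weights` should produce identical draws under the same seed; the seed value here is arbitrary.

```python
import random

population = ['A', 'B', 'C']

rng1 = random.Random(2019)
by_weights = rng1.choices(population, weights=[1, 1, 2], k=10)

rng2 = random.Random(2019)
by_cum_weights = rng2.choices(population, cum_weights=[1, 2, 4], k=10)

print(by_weights == by_cum_weights)  # True: same seed, same cumulative weights
```

Precomputing `cum_weights` saves the accumulation step when drawing repeatedly from the same population.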

Going further

Python's random.choices samples with replacement. The standard function for sampling without replacement is random.sample(population, k), but it cannot take weights into account.
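The difference matters for prizes: `random.choices` may return the same element more than once, while `random.sample` never does. Asking for more draws than there are candidates makes the difference deterministic and easy to see:

```python
import random

population = ['A', 'B', 'C']

# With replacement: 4 draws from 3 users must repeat someone (pigeonhole principle).
draws = random.choices(population, weights=[1, 1, 2], k=4)
print(len(set(draws)) < len(draws))  # True

# Without replacement: the winners are distinct, but there is no weights parameter.
winners = random.sample(population, 2)
print(len(set(winners)) == 2)  # True

try:
    random.sample(population, 4)  # 4 distinct items cannot be drawn from 3
except ValueError as exc:
    print('ValueError:', exc)
```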

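The built-in random.sample can draw multiple elements without modifying the original list. Internally it uses two algorithms and picks whichever suits the situation, so it performs well in all cases. (Source: random.sample in CPython's Lib/random.py.)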

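The first is a partial shuffle that stops as soon as K elements have been picked. Its time complexity is O(N), but it has to copy the original sequence, which uses extra memory.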

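import random

def sample_by_partial_shuffle(population, k):
    result = [None] * k
    n = len(population)
    pool = list(population)  # copy, so the original sequence is not modified
    for i in range(k):
        j = int(random.random() * (n - i))
        result[i] = pool[j]
        pool[j] = pool[n - i - 1]  # move the last unselected element into the vacated slot
    return result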
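One way to convince yourself that the partial shuffle is uniform is to replace the random index with every possible value of j at each step and count the outcomes. In this sketch, `partial_shuffle_with` is a hypothetical helper that takes the j values explicitly instead of drawing them; for n=3 and k=2, each of the 3*2 ordered results should appear exactly once.

```python
from collections import Counter
from itertools import product

def partial_shuffle_with(population, js):
    """Run the partial shuffle with the given j values instead of random ones."""
    k = len(js)
    n = len(population)
    pool = list(population)
    result = [None] * k
    for i, j in enumerate(js):
        assert 0 <= j < n - i  # same range as int(random.random() * (n - i))
        result[i] = pool[j]
        pool[j] = pool[n - i - 1]
    return tuple(result)

population = ['A', 'B', 'C']
# Step 0 picks j from range(3) and step 1 from range(2): 6 equally likely paths.
outcomes = Counter(partial_shuffle_with(population, js)
                   for js in product(range(3), range(2)))
print(len(outcomes), set(outcomes.values()))  # 6 {1}
```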

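The second keeps a set of already-selected indices and draws repeatedly: if the drawn index is already in the set, it simply draws again, so no copy of the sequence is needed. random.sample uses this algorithm when k is small relative to n, because the chance of drawing an already-selected element is then low.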

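import random

def sample_by_rejection(population, k):
    n = len(population)
    result = [None] * k
    selected = set()
    selected_add = selected.add  # cache the bound method for faster access
    for i in range(k):
        j = int(random.random() * n)
        while j in selected:  # already picked: draw again
            j = int(random.random() * n)
        selected_add(j)
        result[i] = population[j]
    return result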
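To put "the chance of a repeat is low" into numbers: when i elements are already selected, a draw succeeds with probability (n-i)/n, so it takes n/(n-i) attempts on average. Summing over the k picks gives the expected total number of random draws. A small sketch (the function name is illustrative):

```python
def expected_attempts(n, k):
    """Expected number of random draws the set-based method needs to pick k of n."""
    return sum(n / (n - i) for i in range(k))

print(round(expected_attempts(100, 5), 3))  # 5.103: barely more than k
print(round(expected_attempts(10, 10), 3))  # 29.29: about 3x k when k == n
```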

A lottery needs weighted sampling without replacement, so we combine the implementations of random.choices and random.sample into a function called weighted_sample.

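Since the number of participants is usually much larger than the number of prizes, the second (set-based) method of random.sample is a good fit for the without-replacement part, although it can of course be optimized further.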
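One caveat when reusing the rejection idea with weights: the chance that a draw hits an already-selected user now depends on those users' total weight, not on how many there are. If the selected users hold weight S out of `total`, the next pick needs `total / (total - S)` attempts on average. This gets large once a heavy user has already won. An illustration with made-up weights:

```python
def expected_attempts_next(weights, selected):
    """Expected draws until an unselected index is hit, given already-selected indices."""
    total = sum(weights)
    remaining = total - sum(weights[j] for j in selected)
    return total / remaining

weights = [1000, 1, 1]  # hypothetical: one user holds almost all the lottery codes
print(expected_attempts_next(weights, set()))  # 1.0
print(expected_attempts_next(weights, {0}))    # 501.0: the heavy user keeps being re-drawn
```

With many users and no dominant weight this stays close to 1. For very skewed weights, one possible further optimization is to drop the winner's weight and rebuild the cumulative weights after each pick.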

The code is as follows:

import bisect
import itertools
import random

def weighted_sample(population, weights, k=1):
    """Like random.sample, but add weights.
    """
    n = len(population)
    if n == 0:
        return []
    if not 0 <= k <= n:
        raise ValueError("Sample larger than population or is negative")
    if len(weights) != n:
        raise ValueError('The number of weights does not match the population')

    cum_weights = list(itertools.accumulate(weights))
    total = cum_weights[-1]
    if total <= 0:  # guard against invalid weights: fall back to unweighted sampling
        return random.sample(population, k=k)
    if sum(1 for w in weights if w > 0) < k:
        # zero-weight items can never be drawn, so the loop below would never finish
        raise ValueError('Fewer than k items have a positive weight')
    hi = len(cum_weights) - 1

    selected = set()
    _bisect = bisect.bisect
    _random = random.random
    selected_add = selected.add
    result = [None] * k
    for i in range(k):
        j = _bisect(cum_weights, _random() * total, 0, hi)
        while j in selected:  # already drawn: reject and draw again
            j = _bisect(cum_weights, _random() * total, 0, hi)
        selected_add(j)
        result[i] = population[j]
    return result
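Note what "weighted" means once k > 1: weighted_sample picks one winner at a time, and the rejection loop makes each pick proportional to weight among the users not yet selected. The resulting probability of winning at all is therefore not proportional to the weights. The sketch below (function name illustrative) enumerates this successive-draw process exactly for the example A(1), B(1), C(2) with k=2.

```python
from fractions import Fraction

def win_probabilities(weights, k):
    """Exact P(each index is among the k winners) under successive weighted draws."""
    probs = [Fraction(0)] * len(weights)

    def walk(selected, p):
        if len(selected) == k:
            for j in selected:
                probs[j] += p
            return
        remaining = sum(w for j, w in enumerate(weights) if j not in selected)
        for j, w in enumerate(weights):
            if j not in selected and w > 0:
                walk(selected | {j}, p * Fraction(w, remaining))

    walk(frozenset(), Fraction(1))
    return probs

print(win_probabilities([1, 1, 2], k=2))
# [Fraction(7, 12), Fraction(7, 12), Fraction(5, 6)]
```

C holds half of the weight but wins with probability 5/6, not twice A's 7/12, because a user can win at most once.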
