
A Detailed Guide to Using dispy for Distributed Computing in Python

Updated: December 22, 2019 15:17:16 Author: 振裕

This article walks through how to use dispy for distributed computing in Python and is offered as a practical reference.

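dispy is a distributed and parallel computing framework implemented with asyncoro.

The framework is very compact. It has only four components, all of which can be found in its source folder: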

dispy.py (client) provides two ways of creating “clusters”: JobCluster when only one instance of dispy may run and SharedJobCluster when multiple instances may run (in separate processes). If JobCluster is used, the scheduler contained within dispy.py will distribute jobs on the server nodes; if SharedJobCluster is used, a separate scheduler (dispyscheduler) must be running.

dispynode.py executes jobs on behalf of dispy. dispynode must be running on each of the (server) nodes that form the cluster.

dispyscheduler.py is needed only when SharedJobCluster is used; this provides a scheduler that can be shared by multiple dispy users.

dispynetrelay.py is needed when nodes are located across different networks; this relays information about nodes on a network to the scheduler. If all the nodes are on same network, there is no need for dispynetrelay - the scheduler and nodes automatically discover each other.

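In most cases, dispy and dispynode are enough to get the job done.

Basic usage:

Server side:

Start dispynode on the server side. It listens for incoming computation jobs, runs them, and returns the results to the client.

Open the python_home/Scripts folder. After dispy is installed, the four components described above are there as .py files. You can also find the corresponding dispynode.py file in the dispy source folder. Then run: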

python dispynode.py -c 2 -i 192.168.138.128 -p 51348 -s secret --clean

python dispynode.py -c 2 -i 192.168.8.143 -p 51348 -s secret --clean

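Here 192.168.138.128 and 192.168.8.143 are the IP addresses of the compute nodes. Each command is run on the machine that owns that address, so from that machine's point of view it is its own local address. I started two nodes, each using 2 CPUs; one runs in a virtual machine and the other on the local machine.

-s secret sets the communication password. The client must supply the same password to connect to the nodes; the value itself can be anything.

--clean deletes the information left over from the previous start each time the service is launched. Without it, you may get an error saying the pid is already in use.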
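Since the two node commands differ only in the IP address, they can be generated from a list. The following is a minimal sketch: `build_dispynode_cmd` and its parameters are hypothetical names chosen for illustration and are not part of dispy. It only assembles the flags already shown above and does not start any process.

```python
import shlex


def build_dispynode_cmd(ip, cpus=2, port=51348, secret="secret", clean=True):
    """Return the argument list for one dispynode instance (illustrative helper)."""
    cmd = ["python", "dispynode.py",
           "-c", str(cpus),    # number of CPUs the node may use
           "-i", ip,           # IP address of the node itself
           "-p", str(port),    # port the node listens on
           "-s", secret]       # password shared with the client
    if clean:
        cmd.append("--clean")  # discard state left by a previous run
    return cmd


if __name__ == "__main__":
    for ip in ["192.168.138.128", "192.168.8.143"]:
        # Print the command line; each one is meant to be run on its own machine.
        print(shlex.join(build_dispynode_cmd(ip)))
```

Each printed line still has to be executed on the machine that owns the corresponding IP address.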

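Client side:

On the client, note that any module used by the function sent to the compute nodes cannot be imported at the top level of the .py file; it must be imported inside the function.

Custom modules take a little more work: the function from the custom module must first be bound to a name in the client file (and listed in depends) before it can be used inside the function that runs on the compute nodes.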
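One way to picture both rules is that the function travels to the node as source text and runs in a namespace that contains neither the client file's top-level imports nor its helpers. The sketch below imitates this with `exec` in an empty dictionary. It is a simplified stand-in for illustration, not dispy's actual implementation, and `run_on_fake_node` is a hypothetical name.

```python
# Relies on an import done at the top level of the client file.
USES_TOP_LEVEL_IMPORT = """
def compute(n):
    return socket.gethostname(), n
"""

# Same function, but the import is inside the function body.
USES_LOCAL_IMPORT = """
def compute(n):
    import socket
    return socket.gethostname(), n
"""

# compute() calls a helper, so the helper's source must be shipped as well.
USES_HELPER = """
def compute(values):
    return my_sum(values)
"""
HELPER = """
def my_sum(values):
    return sum(values)
"""


def run_on_fake_node(source, arg, depends=()):
    """Run compute() from `source` in a fresh namespace, as a remote node might."""
    namespace = {}
    for dep in depends:          # dependencies are defined first
        exec(dep, namespace)
    exec(source, namespace)      # then the compute function itself
    return namespace["compute"](arg)


if __name__ == "__main__":
    import socket  # visible in this file, but not in the fake node's namespace
    print(run_on_fake_node(USES_LOCAL_IMPORT, 1))
    try:
        run_on_fake_node(USES_TOP_LEVEL_IMPORT, 1)
    except NameError as exc:
        print("fails on the fake node:", exc)
    print(run_on_fake_node(USES_HELPER, [1, 2, 3], depends=[HELPER]))
```

The second call fails with a NameError because `socket` was never imported in the node-side namespace, and the third call works only because the helper is passed along, which is the role `depends` plays in the client code.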

# Modules imported at the top level are available only to this file on the client
import time
import socket
import numpy
import datetime

# A function from a custom module. It must be bound to a name in this module
# before it can be called inside the function that runs on the compute nodes;
# elsewhere in this module it can be called directly.
from my_package.my_model import get_time

# Bind the function itself. Note there are no parentheses; with them the
# function would be called immediately and `now` would hold its return value.
now = get_time.now

# The compute function; dispy sends this function and its arguments to the nodes.
# Here the arguments are packed into a single tuple.
def compute(args):
    n, array = args  # unpack the tuple
    # Modules needed by the computation are imported inside the function
    import time, socket
    time.sleep(3)
    host = socket.gethostname()
    # A helper defined in this file; it is listed in depends below
    total = my_sum(array)
    # `now` comes from another module; it was bound at the top level
    # and is also listed in depends
    now_time = now()
    return (host, n, total, now_time)

def my_sum(array):
    # Helper function; the modules it needs are also imported inside it
    import numpy as np
    return np.sum(array)

def loadData():
    # Helper function that generates test data: a list of 20 rows of 20 random values
    import numpy as np
    data = np.random.rand(20, 20)
    data = [line for line in data]
    return data


if __name__ == '__main__':
    import dispy, random
    # The two compute nodes
    nodes = ['192.168.8.143', '192.168.138.128']
    # Create the cluster and connect to the nodes; the shared password is 'secret'
    # depends lists the functions that compute() relies on
    cluster = dispy.JobCluster(compute, nodes=nodes,
                               secret='secret', depends=[my_sum, now])
    jobs = []

    datas = loadData()
    for n in range(len(datas)):
        # Submit a job
        job = cluster.submit((n, datas[n]))
        job.id = n
        jobs.append(job)
    # print(datetime.datetime.now())
    # cluster.wait() # block until all jobs have finished before continuing
    # print(datetime.datetime.now())
    for job in jobs:
        host, n, total, t = job()  # waits for the job and returns its result
        print('%s executed job %s at %s with %s total=%.2f t=%s'
              % (host, job.id, job.start_time, n, total, t))
        # other fields of 'job' that may be useful:
        # print job.stdout, job.stderr, job.exception,
        # job.ip_addr, job.start_time, job.end_time
    # Show cluster statistics (newer dispy releases may name this print_status())
    cluster.stats()
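Before involving real nodes, the submit-then-collect flow can be rehearsed on a single machine. The sketch below swaps dispy for the standard library's `concurrent.futures.ThreadPoolExecutor`, so it is a local stand-in rather than a dispy cluster. It also uses plain Python lists instead of NumPy rows, and the names `load_data` and `run_locally` are illustrative.

```python
import random
import socket
from concurrent.futures import ThreadPoolExecutor


def compute(args):
    n, row = args  # same tuple packing as in the dispy client above
    return socket.gethostname(), n, sum(row)


def load_data(rows=20, cols=20, seed=0):
    """Generate test data: `rows` lists of `cols` random floats."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(cols)] for _ in range(rows)]


def run_locally(data, workers=2):
    """Submit one task per row, then collect the results in submission order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(compute, (n, row)) for n, row in enumerate(data)]
        # result() blocks until that task has finished
        return [future.result() for future in futures]


if __name__ == "__main__":
    for host, n, total in run_locally(load_data()):
        print("%s executed job %s total=%.2f" % (host, n, total))
```

This checks only the packing, unpacking, and result handling. The import and depends rules described earlier still have to be verified against real dispynode instances.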
