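Handling data sharing between threads and processes in Python
When I wrote multithreaded or multiprocess code in the past, each worker usually just completed its own task, and the child threads or child processes had little to do with one another. When they did need to communicate, I used a queue or a database. Recently, however, while writing some multithreaded and multiprocess code, I found that there are a few things to watch out for when the workers need to share variables.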
Sharing data between threads
Sharing standard data types between threads
Consider the following code:
    # coding: utf-8
    import threading


    def test(name, data):
        print("in thread {} name is {}".format(threading.current_thread(), name))
        print("data is {} id(data) is {}".format(data, id(data)))


    if __name__ == '__main__':
        d = 5
        name = "楊彥星"
        for i in range(5):
            th = threading.Thread(target=test, args=(name, d))
            th.start()
Here I create a global int variable d with the value 5 and pass it as an argument to test in each of five threads. Do the five threads all hold the same d? In test I print id(data) to check, and I get the following result:
    in thread <Thread(Thread-1, started 6624)> name is 楊彥星
    data is 5 id(data) is 1763791776
    in thread <Thread(Thread-2, started 8108)> name is 楊彥星
    data is 5 id(data) is 1763791776
    in thread <Thread(Thread-3, started 3356)> name is 楊彥星
    data is 5 id(data) is 1763791776
    in thread <Thread(Thread-4, started 13728)> name is 楊彥星
    data is 5 id(data) is 1763791776
    in thread <Thread(Thread-5, started 3712)> name is 楊彥星
    data is 5 id(data) is 1763791776
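The result shows that data has the same id, 1763791776, in all five child threads, so the object created in the main thread is shared with the child threads rather than copied. A change made to a shared mutable object in one thread is therefore visible to all the other threads (an int itself is immutable, so rebinding data inside test would only change that thread's local name). Modifying shared data is thus not thread-safe, and it needs a lock.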
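The difference between rebinding a name and mutating a shared object can be sketched as follows. This is only an illustration: the helper names rebind, mutate, and run_demo are hypothetical and do not appear in the original code.

```python
import threading


def rebind(value):
    # Rebinding the parameter only changes this thread's local name;
    # the caller's int object is left untouched.
    value = value + 1


def mutate(shared, lock):
    # Appending mutates the one list object that every thread sees,
    # so the change is guarded by a lock.
    with lock:
        shared.append(threading.current_thread().name)


def run_demo(thread_count=5):
    number = 5
    items = []
    lock = threading.Lock()
    threads = []
    for _ in range(thread_count):
        threads.append(threading.Thread(target=rebind, args=(number,)))
        threads.append(threading.Thread(target=mutate, args=(items, lock)))
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return number, items


if __name__ == "__main__":
    number, items = run_demo()
    print(number)      # the int in the caller is unchanged
    print(len(items))  # one entry per mutating thread
```

The with lock: form releases the lock even if the guarded code raises, which is why it is used here instead of separate acquire() and release() calls.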
Sharing custom objects between threads
What if we define our own class and pass an instance of it to the child threads? What happens then?
    # coding: utf-8
    import threading


    class Data:
        def __init__(self, data=None):
            self.data = data

        def get(self):
            return self.data

        def set(self, data):
            self.data = data


    def test(name, data):
        print("in thread {} name is {}".format(threading.current_thread(), name))
        print("data is {} id(data) is {}".format(data.get(), id(data)))


    if __name__ == '__main__':
        d = Data(10)
        name = "楊彥星"
        print("in main thread id(data) is {}".format(id(d)))
        for i in range(5):
            th = threading.Thread(target=test, args=(name, d))
            th.start()
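Here I define a simple class, create an instance d of it in the main thread, and pass it to the child threads as an argument. The main thread and each child thread print the object's id. Here is the result: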
    in main thread id(data) is 2849240813864
    in thread <Thread(Thread-1, started 11648)> name is 楊彥星
    data is 10 id(data) is 2849240813864
    in thread <Thread(Thread-2, started 11016)> name is 楊彥星
    data is 10 id(data) is 2849240813864
    in thread <Thread(Thread-3, started 10416)> name is 楊彥星
    data is 10 id(data) is 2849240813864
    in thread <Thread(Thread-4, started 8668)> name is 楊彥星
    data is 10 id(data) is 2849240813864
    in thread <Thread(Thread-5, started 4420)> name is 楊彥星
    data is 10 id(data) is 2849240813864
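The object's id is the same in the main thread and in the child threads, which means they all use the same object.
Whether the data is a standard type or a more complex custom type, all threads share one and the same object. But is that also true for multiple processes?
Sharing data between processes
Sharing standard data types between processes
Using the same code as above, let's first see how an int variable behaves across child processes.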
    # coding: utf-8
    import threading
    import multiprocessing


    def test(name, data):
        print("in thread {} name is {}".format(threading.current_thread(), name))
        print("data is {} id(data) is {}".format(data, id(data)))


    if __name__ == '__main__':
        d = 10
        name = "楊彥星"
        print("in main thread id(data) is {}".format(id(d)))
        for i in range(5):
            pro = multiprocessing.Process(target=test, args=(name, d))
            pro.start()
The result is:
    in main thread id(data) is 1763791936
    in thread <_MainThread(MainThread, started 9364)> name is 楊彥星
    data is 10 id(data) is 1763791936
    in thread <_MainThread(MainThread, started 9464)> name is 楊彥星
    data is 10 id(data) is 1763791936
    in thread <_MainThread(MainThread, started 3964)> name is 楊彥星
    data is 10 id(data) is 1763791936
    in thread <_MainThread(MainThread, started 10480)> name is 楊彥星
    data is 10 id(data) is 1763791936
    in thread <_MainThread(MainThread, started 13608)> name is 楊彥星
    data is 10 id(data) is 1763791936
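The ids are the same, which at first glance suggests that the same variable is being used (although an id is only meaningful inside one process, so equal ids in different processes do not by themselves prove that anything is shared). But when I changed d from an int to a string, the ids were different again...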
    if __name__ == '__main__':
        d = 'yangyanxing'
        name = "楊彥星"
        print("in main thread id(data) is {}".format(id(d)))
        for i in range(5):
            pro = multiprocessing.Process(target=test, args=(name, d))
            pro.start()
This time the result is:
    in main thread id(data) is 2629633397040
    in thread <_MainThread(MainThread, started 9848)> name is 楊彥星
    data is yangyanxing id(data) is 1390942032880
    in thread <_MainThread(MainThread, started 988)> name is 楊彥星
    data is yangyanxing id(data) is 2198251377648
    in thread <_MainThread(MainThread, started 3728)> name is 楊彥星
    data is yangyanxing id(data) is 2708672287728
    in thread <_MainThread(MainThread, started 5288)> name is 楊彥星
    data is yangyanxing id(data) is 2376058999792
    in thread <_MainThread(MainThread, started 12508)> name is 楊彥星
    data is yangyanxing id(data) is 2261044040688
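So I also tried list, tuple, and dict, and their ids all differed between processes. Then I went back and tried lists, tuples, and dicts with threads, and there the ids were still the same.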
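There is an interesting detail here: for an int whose value is at most 256, the ids are the same across processes, while for values greater than 256 the ids differ. I did not look into the reason at the time. A likely explanation is that CPython keeps a preallocated cache of the integers from -5 to 256, so each process ends up using its own cached object, which happens to sit at the same address in every process in these runs; the objects are still not shared.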
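The effect can be reproduced inside a single process. The sketch below assumes CPython (the small-integer cache is an implementation detail) and assumes that the arguments reach the child process by pickling, which is the case for the spawn start method used on Windows. The helper name survives_round_trip is hypothetical.

```python
import pickle


def survives_round_trip(obj):
    """Return True if unpickling a pickled obj yields the very same object."""
    copy = pickle.loads(pickle.dumps(obj))
    return copy is obj


if __name__ == "__main__":
    # The ints are built at run time so that literal constants in this
    # file cannot influence the identity checks.
    small = int("256")
    large = int("257")
    print(survives_round_trip(small))      # value inside the small-int cache
    print(survives_round_trip(large))      # value outside the cache
    print(survives_round_trip([1, 2, 3]))  # an ordinary mutable object
```

On CPython the first check reports the same object and the other two report new objects, which mirrors the matching and differing ids seen in the child processes.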
Sharing custom objects between processes
    # coding: utf-8
    import threading
    import multiprocessing


    class Data:
        def __init__(self, data=None):
            self.data = data

        def get(self):
            return self.data

        def set(self, data):
            self.data = data


    def test(name, data):
        print("in thread {} name is {}".format(threading.current_thread(), name))
        print("data is {} id(data) is {}".format(data.get(), id(data)))


    if __name__ == '__main__':
        d = Data(10)
        name = "楊彥星"
        print("in main thread id(data) is {}".format(id(d)))
        for i in range(5):
            pro = multiprocessing.Process(target=test, args=(name, d))
            pro.start()
The result is:
    in main thread id(data) is 1927286591728
    in thread <_MainThread(MainThread, started 2408)> name is 楊彥星
    data is 10 id(data) is 1561177927752
    in thread <_MainThread(MainThread, started 5728)> name is 楊彥星
    data is 10 id(data) is 2235260514376
    in thread <_MainThread(MainThread, started 1476)> name is 楊彥星
    data is 10 id(data) is 2350586073040
    in thread <_MainThread(MainThread, started 996)> name is 楊彥星
    data is 10 id(data) is 2125002248088
    in thread <_MainThread(MainThread, started 10740)> name is 楊彥星
    data is 10 id(data) is 1512231669656
The ids are different, so these are different objects.
How to share data between processes
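We have seen that data is not shared between processes (the small int values whose ids matched are no real exception, since each process still works on its own copy). So what should we do when we want the main process and its child processes to share a data object?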
Before looking at that, let's first modify the earlier multithreaded code:
    # coding: utf-8
    import threading


    class Data:
        def __init__(self, data=None):
            self.data = data

        def get(self):
            return self.data

        def set(self, data):
            self.data = data


    def test(name, data, lock):
        # Hold the lock for the whole read-modify-write of the shared object.
        with lock:
            print("in thread {} name is {}".format(threading.current_thread(), name))
            print("data is {} id(data) is {}".format(data, id(data)))
            data.set(data.get() + 1)


    if __name__ == '__main__':
        d = Data(0)
        thlist = []
        name = "yang"
        lock = threading.Lock()
        for i in range(5):
            th = threading.Thread(target=test, args=(name, d, lock))
            th.start()
            thlist.append(th)
        # Wait for every thread before reading the final value.
        for i in thlist:
            i.join()
        print(d.get())
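The purpose of this code is as follows: using an object of our custom Data type, each of the five child threads adds 1 to its data value, and finally the main thread prints the object's data value.
The output is as follows: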
    in thread <Thread(Thread-1, started 3296)> name is yang
    data is <__main__.Data object at 0x000001A451139198> id(data) is 1805246501272
    in thread <Thread(Thread-2, started 9436)> name is yang
    data is <__main__.Data object at 0x000001A451139198> id(data) is 1805246501272
    in thread <Thread(Thread-3, started 760)> name is yang
    data is <__main__.Data object at 0x000001A451139198> id(data) is 1805246501272
    in thread <Thread(Thread-4, started 1952)> name is yang
    data is <__main__.Data object at 0x000001A451139198> id(data) is 1805246501272
    in thread <Thread(Thread-5, started 5988)> name is yang
    data is <__main__.Data object at 0x000001A451139198> id(data) is 1805246501272
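The main thread printed 5 at the end, as expected. But what if we move this to multiple processes? With multiple processes, each child process holds a different object, so each child operates on its own Data object, and the main process's Data object should not be affected. Let's look at the result: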
    # coding: utf-8
    import threading
    import multiprocessing


    class Data:
        def __init__(self, data=None):
            self.data = data

        def get(self):
            return self.data

        def set(self, data):
            self.data = data


    def test(name, data, lock):
        with lock:
            print("in thread {} name is {}".format(threading.current_thread(), name))
            print("data is {} id(data) is {}".format(data, id(data)))
            # This only changes the child process's own copy of the object.
            data.set(data.get() + 1)


    if __name__ == '__main__':
        d = Data(0)
        thlist = []
        name = "yang"
        lock = multiprocessing.Lock()
        for i in range(5):
            th = multiprocessing.Process(target=test, args=(name, d, lock))
            th.start()
            thlist.append(th)
        for i in thlist:
            i.join()
        print(d.get())
The output is:
    in thread <_MainThread(MainThread, started 7604)> name is yang
    data is <__mp_main__.Data object at 0x000001D110130EB8> id(data) is 1997429477048
    in thread <_MainThread(MainThread, started 12108)> name is yang
    data is <__mp_main__.Data object at 0x000002C4E88E0E80> id(data) is 3044738469504
    in thread <_MainThread(MainThread, started 3848)> name is yang
    data is <__mp_main__.Data object at 0x0000027827270EF0> id(data) is 2715076202224
    in thread <_MainThread(MainThread, started 12368)> name is yang
    data is <__mp_main__.Data object at 0x000002420EA80E80> id(data) is 2482736991872
    in thread <_MainThread(MainThread, started 4152)> name is yang
    data is <__mp_main__.Data object at 0x000001B1577F0E80> id(data) is 1861188783744
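The final output is 0, which shows that whatever the child processes do to the Data object passed in from the main process has no effect on the main process's object. What do we need to do so that child processes can operate on an object of the main process? We can use BaseManager from multiprocessing.managers: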
    # coding: utf-8
    import threading
    import multiprocessing
    from multiprocessing.managers import BaseManager


    class Data:
        def __init__(self, data=None):
            self.data = data

        def get(self):
            return self.data

        def set(self, data):
            self.data = data


    # Register at module level so every process that imports this module
    # knows the "mydata" type.
    BaseManager.register("mydata", Data)


    def test(name, data, lock):
        with lock:
            print("in thread {} name is {}".format(threading.current_thread(), name))
            print("data is {} id(data) is {}".format(data, id(data)))
            # data is a proxy; get() and set() run in the manager process.
            data.set(data.get() + 1)


    def getManager():
        m = BaseManager()
        m.start()
        return m


    if __name__ == '__main__':
        manager = getManager()
        d = manager.mydata(0)
        thlist = []
        name = "yang"
        lock = multiprocessing.Lock()
        for i in range(5):
            th = multiprocessing.Process(target=test, args=(name, d, lock))
            th.start()
            thlist.append(th)
        for i in thlist:
            i.join()
        print(d.get())
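After importing BaseManager with from multiprocessing.managers import BaseManager and defining the Data type, BaseManager.register("mydata", Data) registers the Data type with BaseManager under the name mydata. After that, an object can be created through a BaseManager instance using that name. Let's look at the output: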
    C:\Python35\python.exe F:/python/python3Test/multask.py
    in thread <_MainThread(MainThread, started 12244)> name is yang
    data is <__mp_main__.Data object at 0x000001FE1B7D9668> id(data) is 2222932504080
    in thread <_MainThread(MainThread, started 2860)> name is yang
    data is <__mp_main__.Data object at 0x000001FE1B7D9668> id(data) is 1897574510096
    in thread <_MainThread(MainThread, started 2748)> name is yang
    data is <__mp_main__.Data object at 0x000001FE1B7D9668> id(data) is 2053415775760
    in thread <_MainThread(MainThread, started 7812)> name is yang
    data is <__mp_main__.Data object at 0x000001FE1B7D9668> id(data) is 2766155820560
    in thread <_MainThread(MainThread, started 2384)> name is yang
    data is <__mp_main__.Data object at 0x000001FE1B7D9668> id(data) is 2501159890448
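We can see that although each child process uses a different object (the id differs every time), the value can still be "shared". Each child holds its own proxy, and every proxy forwards its method calls to the single Data object that lives in the manager process, which is why the printed representation shows the same address, 0x000001FE1B7D9668, in every child.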
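One consequence of working through a proxy is that only method calls are forwarded, not plain attribute access. The following is a minimal sketch of that behavior under the default automatically generated proxy; the names Counter and CounterManager are hypothetical and are not part of the original code.

```python
from multiprocessing.managers import BaseManager


class Counter:
    """Hypothetical class used only for this illustration."""

    def __init__(self, value=0):
        self.value = value

    def increment(self):
        self.value += 1

    def get(self):
        return self.value


class CounterManager(BaseManager):
    """Subclass so the registration does not touch BaseManager itself."""


CounterManager.register("Counter", Counter)


if __name__ == "__main__":
    # Entering the with block starts the manager's server process.
    with CounterManager() as manager:
        counter = manager.Counter(10)  # a proxy, not a Counter instance
        counter.increment()            # forwarded to the real object
        print(type(counter).__name__)
        print(counter.get())
        try:
            counter.value              # plain attributes are not forwarded
        except AttributeError as exc:
            print("no attribute access through the proxy:", exc)
```

This is why the Data class in the article exposes get and set methods: the child processes can only reach the shared state through methods of the registered type.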
Standard data types can also be shared through the Value object in the multiprocessing library. Here is a simple example:
    # coding: utf-8
    import threading
    import multiprocessing


    def test(name, data, lock):
        with lock:
            print("in thread {} name is {}".format(threading.current_thread(), name))
            print("data is {} id(data) is {}".format(data, id(data)))
            # data.value lives in shared memory, so the parent sees the change.
            data.value += 1


    if __name__ == '__main__':
        d = multiprocessing.Value("l", 10)
        # print(d)
        thlist = []
        name = "yang"
        lock = multiprocessing.Lock()
        for i in range(5):
            th = multiprocessing.Process(target=test, args=(name, d, lock))
            th.start()
            thlist.append(th)
        for i in thlist:
            i.join()
        print(d.value)
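Here d = multiprocessing.Value("l", 10) creates a numeric object whose type is Synchronized wrapper for c_long. When creating a multiprocessing.Value, the first argument is the type and the second is the initial value. The type can be either a ctypes type or a one-character type code of the kind used by the array module, such as "l" for a signed long.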
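A Value created this way also carries its own lock, available through get_lock(), so a separate multiprocessing.Lock is not strictly required. The sketch below shows that variant; the helper name add_many and the iteration counts are assumptions made for the example.

```python
import multiprocessing


def add_many(counter, times):
    """Add 1 to the shared counter `times` times."""
    for _ in range(times):
        # += is a read followed by a write, so hold the Value's own lock
        # around the whole update.
        with counter.get_lock():
            counter.value += 1


if __name__ == "__main__":
    counter = multiprocessing.Value("l", 0)
    workers = [
        multiprocessing.Process(target=add_many, args=(counter, 1000))
        for _ in range(4)
    ]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    print(counter.value)
```

Without the lock around the update, increments from different processes can overwrite one another and the final value may come out too low.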
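You can also use a class from the ctypes library to initialize a string. Note that c_char_p stores a pointer, and a pointer refers to a location in one particular process's address space, so this should only be relied on within the process that created the value:
    >>> import multiprocessing
    >>> from ctypes import c_char_p
    >>> s = multiprocessing.Value(c_char_p, b'\xd1\xee\xd1\xe5\xd0\xc7')
    >>> print(s.value.decode('gbk'))
    楊彥星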
You can also use a Manager object to create shared list, dict, and similar objects:
    # coding: utf-8
    import multiprocessing


    def func(mydict, mylist):
        # The child process changes the dict, and the main process sees it.
        mydict["index1"] = "aaaaaa"
        mydict["index2"] = "bbbbbb"
        # The child process changes the list, and the main process sees it.
        mylist.append(11)
        mylist.append(22)
        mylist.append(33)


    if __name__ == "__main__":
        # One manager (one server process) can serve both shared objects.
        manager = multiprocessing.Manager()
        # The main process and the child process share this dict.
        mydict = manager.dict()
        # The main process and the child process share this list.
        mylist = manager.list(range(5))
        p = multiprocessing.Process(target=func, args=(mydict, mylist))
        p.start()
        p.join()
        print(mylist)
        print(mydict)
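In fact, the sharing discussed here is only sharing of data values. With multiple processes, each process holds its own distinct object, so keeping state in sync requires a workaround like the ones above. This is fine for simple use in small personal projects, but for larger projects I recommend against sharing data this way, because it greatly increases the coupling between the parts of the program and makes the logic complex and hard to follow. I still recommend using a queue or a database as the channel for inter-process communication.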
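For comparison, the queue-based approach can be sketched as follows. This is a minimal illustration rather than a recommended design; the worker name square_worker and the sample data are hypothetical.

```python
import multiprocessing


def square_worker(numbers, results):
    """Put (n, n * n) on the results queue for every n in numbers."""
    for n in numbers:
        results.put((n, n * n))


if __name__ == "__main__":
    results = multiprocessing.Queue()
    chunks = [[1, 2], [3, 4], [5, 6]]
    workers = [
        multiprocessing.Process(target=square_worker, args=(chunk, results))
        for chunk in chunks
    ]
    for p in workers:
        p.start()
    # Collect all six results before joining, so that no child is left
    # blocked while flushing its data into the queue.
    collected = dict(results.get() for _ in range(6))
    for p in workers:
        p.join()
    print(sorted(collected.items()))
```

Here the child processes never touch the parent's state; they only send messages, and the parent alone decides how to combine the results.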