Python Multithreading and Multiprocessing, and the Differences Between Them

Updated: 2019-08-08 08:31:51  Author: alpha_panda

This article introduces Python multithreading and multiprocessing and the differences between them, with detailed example code. It may be a useful reference for study or work.

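Preface

I have always felt that concepts matter most when learning anything. Once the concepts and principles are understood, the details can be worked out in practice. The key is understanding, and concrete examples and hands-on experiments are a good way to build intuition for the concepts and principles. This article uses some concrete examples to briefly introduce Python's multithreading and multiprocessing; later articles will cover inter-process and inter-thread communication.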

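Python multithreading

Python has two standard modules for working with threads, thread and threading. In Python 3 the former was renamed to _thread and is no longer meant to be used directly; the latter is a higher-level wrapper around it. All the examples below use threading.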

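Creating threads

There are two ways to create a thread in Python:

1. Instantiate a threading.Thread object and pass in a function object (the initial function) as the thread's entry point;

2. Subclass threading.Thread and override its run method.

Approach 1: create a threading.Thread object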

import threading
import time
def tstart(arg):
 time.sleep(0.5)
 print("%s running...." % arg)
if __name__ == '__main__':
 t1 = threading.Thread(target=tstart, args=('This is thread 1',))
 t2 = threading.Thread(target=tstart, args=('This is thread 2',))
 t1.start()
 t2.start()
 print("This is main function")

Output:

This is main function
This is thread 2 running....
This is thread 1 running....

Approach 2: subclass threading.Thread and override run

import threading
import time
class CustomThread(threading.Thread):
 def __init__(self, thread_name):
  # step 1: call base __init__ function
  super(CustomThread, self).__init__(name=thread_name)
  self._tname = thread_name
 def run(self):
  # step 2: override the run method
  time.sleep(0.5)
  print("This is %s running...." % self._tname)
if __name__ == "__main__":
 t1 = CustomThread("thread 1")
 t2 = CustomThread("thread 2")
 t1.start()
 t2.start()
 print("This is main function")

The output is the same as for approach 1.

threading.Thread

Both approaches above use the threading.Thread class, either directly or indirectly:

threading.Thread(group=None, target=None, name=None, args=(), kwargs={})

The following example ties the two ways of creating a thread together:

import threading
import time
class CustomThread(threading.Thread):
 def __init__(self, thread_name, target = None):
  # step 1: call base __init__ function
  super(CustomThread, self).__init__(name=thread_name, target=target, args = (thread_name,))
  self._tname = thread_name
 def run(self):
  # step 2: override the run method
  # time.sleep(0.5)
  # print("This is %s running....@run" % self._tname)
  super(CustomThread, self).run()
def target(arg):
 time.sleep(0.5)
 print("This is %s running....@target" % arg)
if __name__ == "__main__":
 t1 = CustomThread("thread 1", target)
 t2 = CustomThread("thread 2", target)
 t1.start()
 t2.start()
 print("This is main function")

Output:

This is main function
This is thread 1 running....@target
This is thread 2 running....@target

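The code above shows that:

1. With either approach, the arguments supplied end up being passed to the threading.Thread class;

2. The target function passed to the thread is called inside the run method of the base class Thread, provided run has not been overridden (or, as here, the override calls the base class run).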
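Point 2 can be checked with a small experiment. The sketch below is only an illustration (the names who_runs_target and record are made up): calling run() directly just invokes the target in the calling thread, like an ordinary function call, while start() creates a new thread and that thread calls run().

```python
import threading


def who_runs_target():
    """Return the names of the threads that executed the target function."""
    seen = []

    def record():
        # Remember which thread is executing this function.
        seen.append(threading.current_thread().name)

    # run() called directly: the base-class run() invokes the target
    # in the calling thread; no new thread is created.
    direct = threading.Thread(target=record, name="worker-direct")
    direct.run()

    # start() creates a new thread, and that new thread calls run().
    started = threading.Thread(target=record, name="worker-started")
    started.start()
    started.join()
    return seen


if __name__ == "__main__":
    print(who_runs_target())
```

When this is run from the main thread, the first recorded name is MainThread and the second is worker-started.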

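For the other attributes and functions of the threading module, see the official documentation. The focus here is on the methods of threading.Thread objects.

Thread objects provide the following methods and attributes:

start(): starts the thread after it has been created; the thread then waits to be scheduled, and run() is executed in it;
run(): the entry point of the thread; it calls the user-supplied target function, or executes the overridden run method;
join([timeout]): blocks the calling thread until the thread whose join() was called has finished or the timeout has elapsed. It is usually called in the main thread to wait for other threads to finish;
name, getName() & setName(): the thread's name (the getter and setter methods are deprecated since Python 3.10 in favour of the name attribute);
ident: an integer thread identifier; it is None until the thread has been started;
isAlive(), is_alive(): True from the moment start() returns until run() finishes (isAlive() was removed in Python 3.9; use is_alive());
daemon, isDaemon() & setDaemon(): daemon-thread settings (the two methods are deprecated since Python 3.10 in favour of the daemon attribute).

These are what we use to manage a thread and query information about it through the thread object after creating it.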
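As a rough illustration of these attributes, the sketch below starts one worker and records what ident, is_alive() and join(timeout) report at each stage. The function name inspect_thread and the dictionary keys are made up for this example; time.sleep simply stands in for real work.

```python
import threading
import time


def inspect_thread(work_seconds=0.3):
    """Start one worker thread and collect a few facts about it along the way."""
    facts = {}
    t = threading.Thread(target=time.sleep, args=(work_seconds,), name="worker")

    facts["ident_before_start"] = t.ident          # None: not started yet
    facts["alive_before_start"] = t.is_alive()     # False

    t.start()
    facts["alive_after_start"] = t.is_alive()      # True while the worker sleeps

    # join() with a timeout returns None whether or not the thread finished,
    # so is_alive() must be checked afterwards to tell the two cases apart.
    t.join(timeout=0.01)
    facts["alive_after_short_join"] = t.is_alive() # still True: the join timed out

    t.join()                                       # wait until the worker is done
    facts["alive_after_join"] = t.is_alive()       # False
    facts["ident_is_int"] = isinstance(t.ident, int)
    facts["name"] = t.name
    return facts


if __name__ == "__main__":
    for key, value in inspect_thread().items():
        print(key, "=", value)
```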

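Running multiple threads

After several threads are created in the main thread, there is no cooperation or synchronization among them. Every thread other than the main thread starts executing at run and continues until it finishes.

join

Calling join blocks the main thread until the thread it created has finished.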

import threading
import time
def tstart(arg):
 print("%s running....at: %s" % (arg,time.time()))
 time.sleep(1)
 print("%s is finished! at: %s" % (arg,time.time()))
if __name__ == '__main__':
 t1 = threading.Thread(target=tstart, args=('This is thread 1',))
 t1.start()
 t1.join() # block the current thread until t1 has finished
 print("This is main function at：%s" % time.time())

Output:

This is thread 1 running....at: 1564906617.43
This is thread 1 is finished! at: 1564906618.43
This is main function at：1564906618.43

By default, the program does not exit when the main thread finishes. The process ends only after all of its non-daemon threads have finished.

With t1.join() removed from the program above, the output is:

This is thread 1 running....at: 1564906769.52
This is main function at：1564906769.52
This is thread 1 is finished! at: 1564906770.52

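A thread can be marked as a daemon thread. In that case, when the main thread finishes, the program exits right away and any daemon threads that are still running are stopped abruptly.

Daemon threads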

import threading
import time
def tstart(arg):
  print("%s running....at: %s" % (arg,time.time()))
  time.sleep(1)
  print("%s is finished! at: %s" % (arg,time.time()))
if __name__ == '__main__':
  t1 = threading.Thread(target=tstart, args=('This is thread 1',))
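  t1.daemon = True  # mark t1 as a daemon thread (setDaemon() is deprecated since Python 3.10)
  t1.start()
  # t1.join()  # would block the current thread until t1 has finished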
  print("This is main function at：%s" % time.time())

Output:

This is thread 1 running....at: 1564906847.85
This is main function at：1564906847.85

Python multiprocessing

Just as the threading module is used to create threads, Python provides the multiprocessing module for creating processes. First, the two ways to create a process.

The multiprocessing package mostly replicates the API of the threading module.　　—— python doc

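Creating processes

Creating a process works much like creating a thread:

1. Instantiate a multiprocessing.Process object and pass in a function object (the initial function) as the entry point of the new process;

2. Subclass multiprocessing.Process and override its run method.

Approach 1: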

from multiprocessing import Process 
import os, time
def pstart(name):
  # time.sleep(0.1)
  print("Process name: %s, pid: %s "%(name, os.getpid()))
if __name__ == "__main__": 
  subproc = Process(target=pstart, args=('subprocess',)) 
  subproc.start() 
  subproc.join()
  print("subprocess pid: %s"%subproc.pid)
  print("current process pid: %s" % os.getpid())

Output:

Process name: subprocess, pid: 4888 
subprocess pid: 4888
current process pid: 9912

Approach 2:

from multiprocessing import Process 
import os, time
class CustomProcess(Process):
  def __init__(self, p_name, target=None):
    # step 1: call base __init__ function()
    super(CustomProcess, self).__init__(name=p_name, target=target, args=(p_name,))

  def run(self):
    # step 2:
    # time.sleep(0.1)
    print("Custom Process name: %s, pid: %s "%(self.name, os.getpid()))
if __name__ == '__main__':
  p1 = CustomProcess("process_1")
  p1.start()
  p1.join()
  print("subprocess pid: %s"%p1.pid)
  print("current process pid: %s" % os.getpid())

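A question to think about: if, as with threads, there is a global variable share_data, is it a problem for different processes to access share_data at the same time?

Each process has its own address space, isolated from the others, so every process sees its own separate copy of share_data, each living in a different address space. Concurrent access therefore causes no conflict. Keep in mind that this also means a change made in one process is not visible in any other.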

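The subprocess module

Since we are on the topic of multiple processes, here is another way to create a process.

Python's subprocess module lets a program launch external programs while it is running.

For example, from a Python program we can open Notepad, open cmd, or shut the machine down at some point: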

>>> import subprocess
>>> subprocess.Popen(['cmd'])
<subprocess.Popen object at 0x0339F550>
>>> subprocess.Popen(['notepad'])
<subprocess.Popen object at 0x03262B70>
>>> subprocess.Popen(['shutdown', '-p'])

Or use ping to test network connectivity:

>>> res = subprocess.Popen(['ping', 'www.cnblogs.com'], stdout=subprocess.PIPE).communicate()[0]
>>> print res
Pinging www.cnblogs.com [101.37.113.127] with 32 bytes of data:
Reply from 101.37.113.127: bytes=32 time=1ms TTL=91
Reply from 101.37.113.127: bytes=32 time=1ms TTL=91
Reply from 101.37.113.127: bytes=32 time=1ms TTL=91
Reply from 101.37.113.127: bytes=32 time=1ms TTL=91

Ping statistics for 101.37.113.127:
  Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
  Minimum = 1ms, Maximum = 1ms, Average = 1ms
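On Python 3.7 and later, subprocess.run is usually a more convenient way to capture a command's output than Popen(...).communicate(). The sketch below is platform independent because it launches the current Python interpreter instead of ping or notepad; the helper name run_child_python is made up for this example.

```python
import subprocess
import sys


def run_child_python(code):
    """Run a short Python snippet in a separate interpreter and return its stdout."""
    completed = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,  # collect stdout and stderr (Python 3.7+)
        text=True,            # decode the output bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
        timeout=30,           # give up if the child runs for too long
    )
    return completed.stdout


if __name__ == "__main__":
    print(run_child_python("print('hello from the child process')"))
```

Passing the command as a list, as in the examples above, avoids going through a shell.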

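Comparing Python multithreading and multiprocessing

Let's look at two examples first.

Start two Python threads that each perform 100 million increments, and compare with a single thread performing 100 million increments: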

import threading
import time

def tstart(arg):
  var = 0
  # range() on Python 3; the original article used Python 2's xrange()
  for i in range(100000000):
    var += 1

if __name__ == '__main__':
  t1 = threading.Thread(target=tstart, args=('This is thread 1',))
  t2 = threading.Thread(target=tstart, args=('This is thread 2',))
  start_time = time.time()
  t1.start()
  t2.start()
  t1.join()
  t2.join()
  print("Two thread cost time: %s" % (time.time() - start_time))
  start_time = time.time()
  tstart("This is thread 0")
  print("Main thread cost time: %s" % (time.time() - start_time))

Output:

Two thread cost time: 20.6570000648
Main thread cost time: 2.52800011635

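In the example above, if only one of the two threads t1 and t2 is started, its running time is roughly the same as that of the main thread. The reason is explained later.

The same operation using two processes: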

from multiprocessing import Process
import time

def pstart(arg):
  var = 0
  # range() on Python 3; the original article used Python 2's xrange()
  for i in range(100000000):
    var += 1

if __name__ == '__main__':
  p1 = Process(target = pstart, args = ("1", ))
  p2 = Process(target = pstart, args = ("2", ))
  start_time = time.time()
  p1.start()
  p2.start()
  p1.join()
  p2.join()
  print("Two process cost time: %s" % (time.time() - start_time))
  start_time = time.time()
  pstart("0")
  print("Current process cost time: %s" % (time.time() - start_time))

Output:

Two process cost time: 2.91599988937
Current process cost time: 2.52400016785

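Comparison and analysis

Running the same computation in two parallel processes takes about as long as running it in a single process. The two-process run is slightly slower, possibly because creating and destroying processes involves system calls, which adds overhead.

For Python threads, however, running two threads concurrently takes far longer than running a single thread: about eight times as long here (20.66 s versus 2.53 s). If the two threads are run one after the other instead, that is: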

  t1.start()
  t1.join()
  t2.start()
  t2.join()
  # Two thread cost time: 5.12199997902
  # Main thread cost time: 2.54200005531

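As you can see, when the three threads (t1, t2 and the main thread) run one after another, each takes about the same time.

The underlying reason is that the two threads run concurrently rather than truly in parallel. The cause is the GIL.

The GIL

No discussion of Python multithreading is complete without the GIL (Global Interpreter Lock). It is a lock that CPython, currently the dominant Python interpreter, uses to keep its internal data safe. No matter how many threads a process has, only the thread holding the GIL can execute Python code, even on a multi-core processor. So for any one process, at most one thread is executing Python code at any given moment. For CPU-bound threads, multithreading not only fails to help but may even be slower. Python threads are a better fit for I/O-bound programs, because a thread releases the GIL while it is blocked waiting for I/O. For programs that really need to run in parallel, consider multiple processes.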
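To see the I/O-bound case in practice, here is a minimal sketch in which time.sleep stands in for a blocking I/O call (the names fake_io and timed_io_threads are made up for this illustration). A sleeping thread does not hold the GIL, so four threads that each wait 0.2 seconds should finish in roughly 0.2 seconds overall rather than 0.8 seconds.

```python
import threading
import time


def fake_io(seconds):
    # time.sleep() stands in for a blocking I/O call;
    # the GIL is released while the thread is waiting.
    time.sleep(seconds)


def timed_io_threads(n_threads, seconds):
    """Run n_threads waiting threads concurrently and return the elapsed wall time."""
    threads = [
        threading.Thread(target=fake_io, args=(seconds,))
        for _ in range(n_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = timed_io_threads(4, 0.2)
    print("4 waiting threads took %.2f s" % elapsed)
```

This is the opposite of the CPU-bound counting loop above, where the threads spend all their time competing for the GIL.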

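Contention for the lock among threads, CPU scheduling of threads, and switching between threads all cost time.

Differences between threads and processes

Here is a brief comparison of threads and processes: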

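A process is the basic unit of resource allocation; a thread is the basic unit of CPU execution and scheduling;
Communication and synchronization:
- Processes:
  - Communication: pipes, FIFOs, message queues, signals, shared memory, sockets, streams;
  - Synchronization: semaphores (P/V operations), monitors
- Threads:
  - Synchronization: mutexes, recursive locks, condition variables, semaphores
  - Communication: threads in the same process share the process's resources, so there is no data-passing mechanism between threads comparable to those between processes; communication between threads is mainly about synchronization.
What actually executes on the CPU is a thread. Threads are lighter than processes, and switching and scheduling them costs less;
Threads must take thread safety into account when accessing shared process data. Processes are isolated from one another and have their own memory, which makes them comparatively safe; they can exchange data only through the IPC (Inter-Process Communication) mechanisms listed above;
A system is made up of processes. Each process has a code segment, a data segment, heap space and stack space, plus the part shared with the operating system, and is in one of three states: waiting, ready or running;
A process can contain multiple threads. The threads share the process's resources (file descriptors, global variables, heap space and so on), while registers and stack space are private to each thread;
In an operating system, one process crashing does not affect other processes. If a thread inside a process crashes and the OS supports threads with a many-to-one model, the whole process crashes;
If the CPU and the system support both multithreading and multiprocessing, then while multiple processes run in parallel the threads inside each process can also run in parallel, and only then is the hardware used to its full potential.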
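As an illustration of the thread-safety point and of the mutex listed above, here is a small sketch that protects a shared counter with threading.Lock. The names SafeCounter and count_with_threads are made up for this example; it shows one common way to guard shared data, not the only one.

```python
import threading


class SafeCounter:
    """A counter whose increments are protected by a mutex."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may run the read-modify-write below.
        with self._lock:
            self.value += 1


def count_with_threads(n_threads=4, per_thread=10000):
    """Increment one shared counter from several threads and return the total."""
    counter = SafeCounter()

    def work():
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(count_with_threads())
```

With the lock in place, the total always equals n_threads * per_thread. Without it, the result of an unprotected self.value += 1 is not guaranteed, because the statement is a read followed by a write.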