Why Python Code Runs Faster Inside Functions

Updated: 2023-09-20 09:06:01 Author: edisonfish

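Python is usually described as an interpreted language that reads and executes code line by line. You may have wondered why Python code runs faster inside a function than at the top level of a script. This article answers that question.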

Translation

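To understand why Python code runs faster inside a function, we first need to understand how Python executes code.

Python is usually described as an interpreted language that reads and executes code line by line.

More precisely, when you run a Python program, the source code is first compiled to bytecode (an intermediate language closer to machine code), and the Python interpreter then executes that bytecode.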

import dis

def hello_world():
    print("Hello, World!")

# Disassemble the function to show its bytecode
dis.dis(hello_world)

# Result (CPython 3.10 or earlier; newer versions emit different opcodes)
  2           0 LOAD_GLOBAL              0 (print)
              2 LOAD_CONST               1 ('Hello, World!')
              4 CALL_FUNCTION            1
              6 POP_TOP
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE

As shown above, Python's dis module disassembles the function hello_world into bytecode.
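The compile step can also be observed directly. The following is a minimal sketch, assuming CPython; the source string, the "<example>" file name, and the variable names are made up for illustration. It uses the built-in compile() to turn source text into a code object, whose co_code attribute holds the raw bytecode, and then runs that code object with exec().

```python
# Illustrative source text; any valid Python statements would do.
source = 'message = "Hello, " + "World!"'

# compile() produces a code object without running anything yet.
code_obj = compile(source, "<example>", "exec")

print(type(code_obj).__name__)          # type name of the compiled object
print(type(code_obj.co_code).__name__)  # type name of the raw bytecode
print(len(code_obj.co_code), "bytes of bytecode")

# exec() hands the code object to the interpreter loop.
namespace = {}
exec(code_obj, namespace)
print(namespace["message"])
```

The number of bytecode bytes depends on the interpreter version, so it is printed rather than assumed.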

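Note that the Python interpreter is a virtual machine that executes bytecode. The default Python interpreter, CPython, is written in C.

Other Python interpreters exist as well, such as Jython (written in Java), IronPython (for .NET), and PyPy (written in RPython, a restricted subset of Python).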

Why Python Code Runs Faster Inside Functions

Let's write a simple example: define a function my_function that contains a for loop.

def my_function():
    for i in range(100000000):
        pass

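When this function is compiled, the bytecode might look like the following (this listing comes from an older CPython release; the exact opcodes and offsets vary between versions):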

  SETUP_LOOP              20 (to 23)
  LOAD_GLOBAL             0 (range)
  LOAD_CONST              3 (100000000)
  CALL_FUNCTION           1
  GET_ITER            
  FOR_ITER                6 (to 22)
  STORE_FAST              0 (i)
  JUMP_ABSOLUTE           13
  POP_BLOCK           
  LOAD_CONST              0 (None)
  RETURN_VALUE

The key instruction here is STORE_FAST, which stores the loop variable i.

Now let's move the for loop to the top level of a Python script (the global scope) and look at the bytecode again.

for i in range(100000000):
    pass

  SETUP_LOOP              20 (to 23)
  LOAD_NAME               0 (range)
  LOAD_CONST              3 (100000000)
  CALL_FUNCTION           1
  GET_ITER            
  FOR_ITER                6 (to 22)
  STORE_NAME              1 (i)
  JUMP_ABSOLUTE           13
  POP_BLOCK           
  LOAD_CONST              2 (None)
  RETURN_VALUE

As you can see, the key instruction is now STORE_NAME rather than STORE_FAST.
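Because the exact opcodes differ between CPython releases, it can help to reproduce the comparison on whatever interpreter is at hand. The following is a small sketch, assuming CPython and the standard dis.get_instructions() function; the helper store_ops, the constant LOOP_SRC, and the "<demo>" file name are hypothetical names introduced here. It collects the STORE_* instructions for the same loop compiled once as a function body and once as module-level code.

```python
import dis

# The loop as module-level source text.
LOOP_SRC = "for i in range(3):\n    pass\n"

# The same loop inside a function.
def my_function():
    for i in range(3):
        pass

def store_ops(code):
    """Return the names of all STORE_* instructions in a function or code object."""
    return sorted({ins.opname for ins in dis.get_instructions(code)
                   if ins.opname.startswith("STORE_")})

# Compile the source in "exec" mode, the way a script's top level is compiled.
module_code = compile(LOOP_SRC, "<demo>", "exec")

function_stores = store_ops(my_function)
module_stores = store_ops(module_code)
print("function:", function_stores)
print("module:  ", module_stores)
```

The function version should report a STORE_FAST variant and the module version STORE_NAME, matching the listings above even when the surrounding opcodes look different.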

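The STORE_FAST instruction is faster than STORE_NAME because, inside a function, local variables are stored in a fixed-length array rather than in a dictionary. The array is accessed directly by index, which makes reading and writing a variable very fast.

Essentially, the access is just a pointer lookup into an array plus an increment of the PyObject reference count, both of which are cheap operations.

Global variables, on the other hand, are stored in a dictionary. When a global variable is accessed, Python has to perform a hash table lookup, which involves computing a hash value and then retrieving the value associated with it.

Although this lookup is optimized, it is still slower than an index-based lookup.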
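The two storage schemes can be inspected from Python itself. The following is a minimal sketch, assuming CPython; the names local_slots and module_namespace are made up for illustration. A function's code object lists its local names in co_varnames, and the position of a name in that tuple is its slot index. Module-level code, by contrast, writes its names into an ordinary dict.

```python
def my_function():
    for i in range(3):
        pass

code = my_function.__code__

# Local names are fixed at compile time; each one gets a slot index.
local_slots = dict(enumerate(code.co_varnames))
print(local_slots)        # slot index -> local variable name
print(code.co_nlocals)    # number of local slots

# Module-level code stores names in a dict keyed by name.
# Here it runs in an explicit dict so the result is easy to inspect.
module_namespace = {}
exec("counter = 0", module_namespace)
print(type(module_namespace).__name__)
print(module_namespace["counter"])
```

The slot array itself is an interpreter-internal structure; co_varnames only shows the name-to-index mapping that the compiler decided on.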

Verifying with a Benchmark

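We now know that in Python, execution speed depends on where the code runs: inside a function or in the global scope.

Let's compare the two with a simple benchmark.

First, define a function that computes a factorial: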

def factorial(n):
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result

Then run the same code in the global scope:

n = 20
result = 1
for i in range(1, n + 1):
    result *= i

To benchmark these two pieces of code, we can use Python's timeit module, which provides a simple way to time small pieces of Python code.

import timeit

# factorial() as defined earlier, repeated so the script runs on its own
def factorial(n):
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result

# Function scope
def benchmark():
    start = timeit.default_timer()
    factorial(20)
    end = timeit.default_timer()
    print(end - start)

benchmark()
# Prints: 3.541994374245405e-06

# Global scope
start = timeit.default_timer()
n = 20
result = 1
for i in range(1, n + 1):
    result *= i
end = timeit.default_timer()
print(end - start)
# Prints: 5.375011824071407e-06

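As you can see, the function code runs faster than the global-scope code.

Note that these two pieces of code are best kept out of the same script and run separately.

This is because the benchmark() function adds some overhead to the measured time, and the global code is optimized internally. A single measurement of a few microseconds is also noisy, so repeated runs give a more reliable comparison.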
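A more repeatable measurement can be sketched as follows. This is only an illustration under stated assumptions: the names GLOBAL_SRC, FUNCTION_SRC, and best_time are hypothetical, the loop is a simple running sum rather than the factorial above, and the iteration counts are arbitrary. The source is compiled in "exec" mode and run with exec() because timeit places a statement string inside a generated function, which would turn the loop's names into locals and hide the effect being measured.

```python
import timeit

# The same loop, once as module-level code and once wrapped in a function.
GLOBAL_SRC = """
total = 0
for i in range(50000):
    total += i
"""

FUNCTION_SRC = """
def run():
    total = 0
    for i in range(50000):
        total += i

run()
"""

def best_time(source, repeat=5, number=10):
    """Best per-run time in seconds for executing source as module-level code."""
    code = compile(source, "<benchmark>", "exec")
    # Each call executes the compiled code in a fresh namespace dict.
    timer = timeit.Timer(lambda: exec(code, {}))
    return min(timer.repeat(repeat=repeat, number=number)) / number

global_time = best_time(GLOBAL_SRC)
function_time = best_time(FUNCTION_SRC)
print(f"global scope:   {global_time:.6f} s per run")
print(f"function scope: {function_time:.6f} s per run")
```

Taking the minimum over several repeats reduces the influence of other activity on the machine. The absolute numbers, and the size of the gap, depend on the interpreter version and hardware.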

Profiling with cProfile

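Python ships with a profiler in its standard library, the cProfile module.

Let's use it to analyze a new example: computing a sum of squares with local variables and with global variables.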

import cProfile

def sum_of_squares():
    total = 0
    for i in range(1, 10000000):
        total += i * i

i = None
total = 0
def sum_of_squares_g():
    global i
    global total
    for i in range(1, 10000000):
        total += i * i

def profile(func):
    pr = cProfile.Profile()
    pr.enable()
    func()
    pr.disable()
    pr.print_stats()

#
# Profile function code
#
print("Function scope:")
profile(sum_of_squares)

#
# Profile global scope code
#
print("Global scope:")
profile(sum_of_squares_g)

In the example above, sum_of_squares_g() can be treated as global-scope code, because it works on two global variables, i and total.
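Why a function that declares its variables global behaves like global-scope code can be seen in its bytecode. The following is a small sketch, assuming CPython; the shortened functions sum_local and sum_global and the helper opnames are hypothetical stand-ins for the example above. It lists which opcode families each variant uses.

```python
import dis

total = 0

def sum_local():
    total = 0                  # local variable, stored in a slot
    for i in range(10):
        total += i * i
    return total

def sum_global():
    global i, total            # both names now live in the module dict
    for i in range(10):
        total += i * i

def opnames(func):
    """Return the set of opcode names used by a function."""
    return {ins.opname for ins in dis.get_instructions(func)}

local_ops = opnames(sum_local)
global_ops = opnames(sum_global)

print("local variant: ", sorted(op for op in local_ops if "FAST" in op))
print("global variant:", sorted(op for op in global_ops if "GLOBAL" in op))
```

The local variant stores through STORE_FAST-style instructions, while the global variant has to use STORE_GLOBAL and LOAD_GLOBAL, which go through the module dictionary on every iteration.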

The profiling results show that the function-scope code takes less execution time than the global version.

Function scope:
         2 function calls in 0.903 seconds
   Ordered by: standard name
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   1       0.903    0.903    0.903    0.903 profiler.py:3(sum_of_squares)
   1       0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
Global scope:
         2 function calls in 1.358 seconds
   Ordered by: standard name
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   1       1.358    1.358    1.358    1.358 profiler.py:10(sum_of_squares_g)
   1       0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}