Using and Understanding PyTorch's Automatic Differentiation Module

Updated: 2024-09-23 15:15:54  Author: 小言從不摸魚

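The automatic differentiation module, Autograd, adds automatic differentiation to tensors and is an indispensable part of neural network training. Gradients are computed with the backward method and read from the grad attribute. This section explains how to use and understand this important PyTorch module; read on if you are interested.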

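The automatic differentiation (Autograd) module wraps tensors with support for automatic differentiation. It is an essential building block of neural network training: during backpropagation, Autograd differentiates the result of the forward computation with respect to the current parameters, and the resulting gradients are used to update the network's weights.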

Basic Gradient Computation

We use the backward method to compute gradients and the grad attribute to access them.

import torch

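1.1 Gradient of a single scalar

# y = x**2 + 20
def test01():
    # Define the tensor to differentiate with respect to.
    # Its dtype must be a floating-point (or complex) type.
    x = torch.tensor(10, requires_grad=True, dtype=torch.float64)
    # Apply some intermediate computation
    f = x ** 2 + 20
    # Run automatic differentiation
    f.backward()
    # Print the gradient of x.
    # backward() stores the computed gradient in the tensor's grad attribute.
    print(x.grad)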

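1.2 Gradient of a single vector

# y = mean(x**2 + 20)
def test02():
    # Define the tensor to differentiate with respect to
    x = torch.tensor([10, 20, 30, 40], requires_grad=True, dtype=torch.float64)
    # Apply some intermediate computation
    f1 = x ** 2 + 20
    # Note:
    # backward() without arguments only works on a scalar output,
    # but f1 holds the values [120., 420., 920., 1620.],
    # so it cannot be differentiated directly.
    # Reduce it to a scalar first.
    f2 = f1.mean()  # f2 = 1/4 * sum(x**2 + 20), so df2/dx = x / 2
    # Run automatic differentiation
    f2.backward()
    # Print the gradient of x
    print(x.grad)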
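Reducing with mean() is one option. As a minimal sketch of an alternative, backward() also accepts a gradient argument with the same shape as the output; the example below passes a vector of ones, which should be equivalent to calling f1.sum().backward(). The variable names simply mirror test02 and are for illustration only.

```python
import torch

x = torch.tensor([10, 20, 30, 40], requires_grad=True, dtype=torch.float64)
f1 = x ** 2 + 20

# f1 is not a scalar, so backward() needs an explicit upstream gradient
# with the same shape as f1. A vector of ones weights every element equally,
# which matches differentiating f1.sum().
f1.backward(gradient=torch.ones_like(f1))

# d(sum(x**2 + 20))/dx = 2 * x
print(x.grad)
```

Because no averaging is applied here, the gradient is 2 * x rather than the x / 2 obtained with mean().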

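1.3 Gradients of multiple scalars

# y = x1**2 + x2**2 + x1*x2
def test03():
    # Define the tensors whose gradients we want
    x1 = torch.tensor(10, requires_grad=True, dtype=torch.float64)
    x2 = torch.tensor(20, requires_grad=True, dtype=torch.float64)
    # Apply some intermediate computation
    y = x1**2 + x2**2 + x1*x2
    # Reduce the output to a scalar (y is already a scalar here, so sum() leaves it unchanged)
    y = y.sum()
    # Run automatic differentiation
    y.backward()
    # Print the gradients of both variables
    print(x1.grad, x2.grad)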

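1.4 Gradients of multiple vectors

def test04():
    # Define the tensors whose gradients we want
    x1 = torch.tensor([10, 20], requires_grad=True, dtype=torch.float64)
    x2 = torch.tensor([30, 40], requires_grad=True, dtype=torch.float64)
    # Apply some intermediate computation
    y = x1 ** 2 + x2 ** 2 + x1 * x2
    print(y)
    # Reduce the output to a scalar
    y = y.sum()
    # Run automatic differentiation
    y.backward()
    # Print the gradients of both variables
    print(x1.grad, x2.grad)

if __name__ == '__main__':
    # Run all four examples so the output matches section 1.5
    test01()
    test02()
    test03()
    test04()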

1.5 Output

tensor(20., dtype=torch.float64)
tensor([ 5., 10., 15., 20.], dtype=torch.float64)
tensor(40., dtype=torch.float64) tensor(50., dtype=torch.float64)
tensor([1300., 2800.], dtype=torch.float64, grad_fn=<AddBackward0>)
tensor([50., 80.], dtype=torch.float64) tensor([ 70., 100.], dtype=torch.float64)
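The last two lines of output can be checked by hand: for y = x1**2 + x2**2 + x1*x2, the partial derivatives are 2*x1 + x2 and 2*x2 + x1. The sketch below compares these formulas with what autograd returns. It uses torch.autograd.grad, which returns the gradients directly instead of storing them in grad; the names g1, g2, expected_g1 and expected_g2 are illustrative and do not appear in the examples above.

```python
import torch

x1 = torch.tensor([10, 20], requires_grad=True, dtype=torch.float64)
x2 = torch.tensor([30, 40], requires_grad=True, dtype=torch.float64)
y = (x1 ** 2 + x2 ** 2 + x1 * x2).sum()

# torch.autograd.grad returns a tuple of gradients, one per input,
# and leaves x1.grad and x2.grad untouched
g1, g2 = torch.autograd.grad(y, (x1, x2))

# Analytic partial derivatives of y = x1**2 + x2**2 + x1*x2
expected_g1 = 2 * x1.detach() + x2.detach()
expected_g2 = 2 * x2.detach() + x1.detach()

print(g1, g2)
print(torch.allclose(g1, expected_g1), torch.allclose(g2, expected_g2))
```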

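Controlling Gradient Computation

Several mechanisms let us skip gradient tracking for certain computations, even when the tensors involved have requires_grad=True.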

import torch

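2.1 Disabling gradient computation

def test01():
    x = torch.tensor(10, requires_grad=True, dtype=torch.float64)
    print(x.requires_grad)
    # Option 1: wrap a block of code in a context manager
    with torch.no_grad():
        y = x ** 2
    print(y.requires_grad)
    # Option 2: decorate a function
    @torch.no_grad()
    def my_func(x):
        return x ** 2
    print(my_func(x).requires_grad)
    # Option 3: switch gradient computation off globally
    torch.set_grad_enabled(False)
    y = x ** 2
    print(y.requires_grad)
    # Switch it back on; otherwise backward() in later code would fail
    torch.set_grad_enabled(True)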
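Calling torch.set_grad_enabled(False) as a plain function changes the mode globally until it is switched back. As a small sketch, the same call can also be used as a context manager, in which case the previous mode is restored on exit. The names inside and after are illustrative.

```python
import torch

x = torch.tensor(10, requires_grad=True, dtype=torch.float64)

# Used as a context manager, set_grad_enabled only affects the enclosed block
with torch.set_grad_enabled(False):
    y = x ** 2
    inside = torch.is_grad_enabled()

# After the block, the previous mode is restored
after = torch.is_grad_enabled()
z = x ** 2

print(inside, after)
print(y.requires_grad, z.requires_grad)
```

This form avoids accidentally leaving gradient tracking disabled for code that runs later.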

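2.2 Note: gradients accumulate

def test02():
    # Define the tensor to differentiate with respect to
    x = torch.tensor([10, 20, 30, 40], requires_grad=True, dtype=torch.float64)
    for _ in range(3):
        f1 = x ** 2 + 20
        f2 = f1.mean()
        # By default, a tensor's grad attribute accumulates gradients across backward() calls,
        # so we have to clear the previous gradient manually each time.
        # Note: grad does not exist before the first backward(), so check for None.
        if x.grad is not None:
            x.grad.zero_()
        f2.backward()
        print(x.grad)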
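To see why the zeroing step matters, here is a minimal sketch of the same loop without it. Each backward() call adds x / 2 to the existing grad, so the stored gradient grows on every iteration. The history list is only there for illustration.

```python
import torch

x = torch.tensor([10, 20, 30, 40], requires_grad=True, dtype=torch.float64)

history = []
for _ in range(3):
    f = (x ** 2 + 20).mean()
    # No zeroing here: each call adds x / 2 on top of the existing x.grad
    f.backward()
    # clone() keeps a snapshot, since x.grad is updated in place
    history.append(x.grad.clone())

for g in history:
    print(g)
```

The three snapshots are x / 2, x and 3 * x / 2 respectively, rather than x / 2 each time.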

2.3 Finding the optimum with gradient descent

def test03():
    # y = x**2
    x = torch.tensor(10, requires_grad=True, dtype=torch.float64)
    for _ in range(5000):
        # Forward pass
        f = x ** 2
        # Zero the gradient
        if x.grad is not None:
            x.grad.zero_()
        # Backward pass: compute the gradient
        f.backward()
        # Update the parameter without recording the update in the graph
        with torch.no_grad():
            x -= 0.001 * x.grad
        print('%.10f' % x.item())

if __name__ == '__main__':
    test01()
    test02()
    test03()
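The manual zero-and-update loop in test03 is what an optimizer does for us. As a sketch, assuming plain SGD with no momentum and the same learning rate of 0.001, torch.optim.SGD should perform the same update; the optimizer and loss names are illustrative and not part of the original example.

```python
import torch

x = torch.tensor(10, requires_grad=True, dtype=torch.float64)
optimizer = torch.optim.SGD([x], lr=0.001)

for _ in range(5000):
    # Clear the gradient left over from the previous step
    optimizer.zero_grad()
    # Forward pass: y = x**2
    loss = x ** 2
    # Backward pass: compute d(loss)/dx
    loss.backward()
    # Plain SGD update: x <- x - lr * x.grad
    optimizer.step()

print(x.item())
```

Each step multiplies x by 1 - 2 * 0.001 = 0.998, so x shrinks toward the minimum at 0.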

2.4 Output

True
False
False
False
tensor([ 5., 10., 15., 20.], dtype=torch.float64)
tensor([ 5., 10., 15., 20.], dtype=torch.float64)
tensor([ 5., 10., 15., 20.], dtype=torch.float64)
(The 5000 lines printed by test03, one value of x per iteration, are omitted here.)

Notes on Gradient Computation

Calling numpy() on a tensor that has requires_grad=True raises the following error:

Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.

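In this case, first detach the tensor with detach(), then call numpy().

Note: detach() returns a new tensor. The new tensor is a leaf node and shares its data with the original tensor, but it does not require gradients.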

import torch

3.1 Using detach

def test01():
    x = torch.tensor([10, 20], requires_grad=True, dtype=torch.float64)
    # Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.
    # print(x.numpy())  # wrong
    print(x.detach().numpy())  # correct

3.2 Tensors share memory before and after detach

def test02():
    x1 = torch.tensor([10, 20], requires_grad=True, dtype=torch.float64)
    # x2 is a leaf node
    x2 = x1.detach()
    # Both tensors point at the same underlying memory, so the two addresses are equal
    print(x1.data_ptr(), x2.data_ptr())
    # Modify x2 in place; the change is visible through x1 as well
    x2[0] = 100
    print(x1)
    print(x2)
    # x2 does not track gradients: False
    print(x2.requires_grad)

if __name__ == '__main__':
    test01()
    test02()
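Because a detached tensor does not require gradients, autograd treats it as a constant when it appears in a later computation. The sketch below illustrates this with a hypothetical function y = x**2 * c, where c is a detached copy of x; this example is an illustration and is not taken from the code above.

```python
import torch

x = torch.tensor(3.0, requires_grad=True, dtype=torch.float64)

# c has the same value as x but is cut off from the computation graph
c = x.detach()
y = x ** 2 * c
y.backward()

# With c held constant, dy/dx = 2 * x * c = 18.
# Without detach, y would be x**3 and the gradient would be 3 * x**2 = 27.
print(x.grad)
print(c.is_leaf, c.requires_grad)
```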