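PyTorch: Freezing Certain Layers During Training So They Are Not Updated (Gradient Updates)
First, recall that the parameters of a deep learning network are updated by computing gradients during backpropagation, which is how we arrive at good parameter values. Sometimes, however, we want to fix the parameters of certain layers so that they are not updated.
For example, during fine-tuning we may want to fix the parameters loaded from a pretrained model and update only the final classifier layer. How should this be done?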
Defining the network
import torch
import torch.nn as nn
import torch.optim as optim

# Define a simple network
class net(nn.Module):
    def __init__(self, num_class=10):
        super(net, self).__init__()
        self.fc1 = nn.Linear(8, 4)
        self.fc2 = nn.Linear(4, num_class)

    def forward(self, x):
        return self.fc2(self.fc1(x))
Case 1: no layers frozen
Code
model = net()

# Case 1: no parameters frozen
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-2)  # all parameters are passed in

# Model parameters before training
print("model.fc1.weight", model.fc1.weight)
print("model.fc2.weight", model.fc2.weight)

for epoch in range(10):
    x = torch.randn((3, 8))
    label = torch.randint(0, 10, [3]).long()
    output = model(x)

    loss = loss_fn(output, label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Model parameters after training
print("model.fc1.weight", model.fc1.weight)
print("model.fc2.weight", model.fc2.weight)
Result
(bbn) jyzhang@admin2-X10DAi:~/test$ python -u "/home/jyzhang/test/net.py"
model.fc1.weight Parameter containing:
tensor([[ 0.3362, -0.2676, -0.3497, -0.3009, -0.1013, -0.2316, -0.0189, 0.1430],
[-0.2486, 0.2900, -0.1818, -0.0942, 0.1445, 0.2410, -0.1407, -0.3176],
[-0.3198, 0.2039, -0.2249, 0.2819, -0.3136, -0.2794, -0.3011, -0.2270],
[ 0.3376, -0.0842, 0.2747, -0.0232, 0.0768, 0.3160, -0.1185, 0.2911]],
requires_grad=True)
model.fc2.weight Parameter containing:
tensor([[ 0.4277, 0.0945, 0.1768, 0.3773],
[-0.4595, -0.2447, 0.4701, 0.2873],
[ 0.3281, -0.1861, -0.2202, 0.4413],
[-0.1053, -0.1238, 0.0275, -0.0072],
[-0.4448, -0.2787, -0.0280, 0.4629],
[ 0.4063, -0.2091, 0.0706, 0.3216],
[-0.2287, -0.1352, -0.0502, 0.3434],
[-0.2946, -0.4074, 0.4926, -0.0832],
[-0.2608, 0.0165, 0.0501, -0.1673],
[ 0.2507, 0.3006, 0.0481, 0.2257]], requires_grad=True)
model.fc1.weight Parameter containing:
tensor([[ 0.3316, -0.2628, -0.3391, -0.2989, -0.0981, -0.2178, -0.0056, 0.1410],
[-0.2529, 0.2991, -0.1772, -0.0992, 0.1447, 0.2480, -0.1370, -0.3186],
[-0.3246, 0.2055, -0.2229, 0.2745, -0.3158, -0.2750, -0.2994, -0.2295],
[ 0.3366, -0.0877, 0.2693, -0.0182, 0.0807, 0.3117, -0.1184, 0.2946]],
requires_grad=True)
model.fc2.weight Parameter containing:
tensor([[ 0.4189, 0.0985, 0.1723, 0.3804],
[-0.4593, -0.2356, 0.4772, 0.2784],
[ 0.3269, -0.1874, -0.2173, 0.4407],
[-0.1061, -0.1248, 0.0309, -0.0062],
[-0.4322, -0.2868, -0.0319, 0.4647],
[ 0.4048, -0.2150, 0.0692, 0.3228],
[-0.2252, -0.1353, -0.0433, 0.3396],
[-0.2936, -0.4118, 0.4875, -0.0782],
[-0.2625, 0.0192, 0.0509, -0.1670],
[ 0.2474, 0.3056, 0.0418, 0.2265]], requires_grad=True)
Conclusion
When no layers are frozen, the parameters of every learnable layer in the model change as training proceeds.
Case 2: freezing the fc1 layer with method 1
Method 1
Pass all parameters to the optimizer
optimizer = optim.SGD(model.parameters(), lr=1e-2)  # all parameters are passed in
Set requires_grad to False for the parameters of the layer to be frozen
for name, param in model.named_parameters():
    if "fc1" in name:
        param.requires_grad = False
Code
# Case 2: freezing the fc1 layer with method 1
model = net()  # fresh model, so the initial weights differ from case 1
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-2)  # the optimizer receives all parameters

# Model parameters before training
print("model.fc1.weight", model.fc1.weight)
print("model.fc2.weight", model.fc2.weight)

# Freeze the parameters of the fc1 layer
for name, param in model.named_parameters():
    if "fc1" in name:
        param.requires_grad = False

for epoch in range(10):
    x = torch.randn((3, 8))
    label = torch.randint(0, 10, [3]).long()
    output = model(x)

    loss = loss_fn(output, label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Model parameters after training
print("model.fc1.weight", model.fc1.weight)
print("model.fc2.weight", model.fc2.weight)
Result
(bbn) jyzhang@admin2-X10DAi:~/test$ python -u "/home/jyzhang/test/net.py"
model.fc1.weight Parameter containing:
tensor([[ 0.3163, -0.1592, -0.2360, 0.1436, 0.1158, 0.0406, -0.0627, 0.0566],
[-0.1688, 0.3519, 0.2464, -0.2693, 0.1284, 0.0544, -0.0188, 0.2404],
[ 0.0738, 0.2013, 0.0868, 0.1396, -0.2885, 0.3431, -0.1109, 0.2549],
[ 0.1222, -0.1877, 0.3511, 0.1951, 0.2147, -0.0427, -0.3374, -0.0653]],
requires_grad=True)
model.fc2.weight Parameter containing:
tensor([[-0.1830, -0.3147, -0.1698, 0.3235],
[-0.1347, 0.3096, 0.4895, 0.1221],
[ 0.2735, -0.2238, 0.4713, -0.0683],
[-0.3150, -0.1905, 0.3645, 0.3766],
[-0.0340, 0.3212, 0.0650, 0.1380],
[-0.2500, 0.1128, -0.3338, -0.4151],
[ 0.0446, -0.4776, -0.3655, 0.0822],
[-0.1871, -0.0602, -0.4855, -0.3604],
[-0.3296, 0.0523, -0.3424, 0.2151],
[-0.2478, 0.1424, 0.4547, -0.1969]], requires_grad=True)
model.fc1.weight Parameter containing:
tensor([[ 0.3163, -0.1592, -0.2360, 0.1436, 0.1158, 0.0406, -0.0627, 0.0566],
[-0.1688, 0.3519, 0.2464, -0.2693, 0.1284, 0.0544, -0.0188, 0.2404],
[ 0.0738, 0.2013, 0.0868, 0.1396, -0.2885, 0.3431, -0.1109, 0.2549],
[ 0.1222, -0.1877, 0.3511, 0.1951, 0.2147, -0.0427, -0.3374, -0.0653]])
model.fc2.weight Parameter containing:
tensor([[-0.1821, -0.3155, -0.1637, 0.3213],
[-0.1353, 0.3130, 0.4807, 0.1245],
[ 0.2731, -0.2206, 0.4687, -0.0718],
[-0.3138, -0.1925, 0.3561, 0.3809],
[-0.0344, 0.3152, 0.0606, 0.1332],
[-0.2501, 0.1154, -0.3267, -0.4137],
[ 0.0400, -0.4723, -0.3586, 0.0808],
[-0.1823, -0.0667, -0.4854, -0.3543],
[-0.3285, 0.0547, -0.3388, 0.2166],
[-0.2497, 0.1410, 0.4551, -0.2008]], requires_grad=True)
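Conclusion
The experiment shows that once requires_grad=False is set, only the parameters of layers with requires_grad=True are updated, even though all of the model's parameters were passed to the optimizer. In the output above, fc1.weight is identical before and after training and is no longer printed with requires_grad=True, while fc2.weight has changed.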
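Comparing printed tensors by eye is error-prone, so the same check can be done programmatically. The following is a minimal sketch of method 1, not the author's script: the class is re-declared as `Net` so the block runs on its own, and the fixed seed and the `*_before` snapshot variables are assumptions added for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # assumption: fixed seed only so the run is repeatable


class Net(nn.Module):
    """Same two-layer network as in the article."""

    def __init__(self, num_class=10):
        super().__init__()
        self.fc1 = nn.Linear(8, 4)
        self.fc2 = nn.Linear(4, num_class)

    def forward(self, x):
        return self.fc2(self.fc1(x))


model = Net()
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-2)  # all parameters passed in

# Method 1: freeze fc1 by turning off gradient tracking on its parameters
for name, param in model.named_parameters():
    if "fc1" in name:
        param.requires_grad = False

# Snapshot the weights so they can be compared after training
fc1_before = model.fc1.weight.detach().clone()
fc2_before = model.fc2.weight.detach().clone()

for _ in range(10):
    x = torch.randn(3, 8)
    label = torch.randint(0, 10, (3,))
    loss = loss_fn(model(x), label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

fc1_unchanged = torch.equal(model.fc1.weight, fc1_before)
fc2_changed = not torch.equal(model.fc2.weight, fc2_before)
fc1_grad_is_none = model.fc1.weight.grad is None  # no gradient was ever computed

print("fc1 unchanged:", fc1_unchanged)
print("fc2 changed:", fc2_changed)
print("fc1 grad is None:", fc1_grad_is_none)
```

Because the frozen parameters never receive a gradient, their `.grad` stays `None` and the SGD step skips them, which is why passing them to the optimizer does no harm here.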
Case 3: freezing the fc1 layer with method 2
Method 2
Pass only the parameters of the unfrozen fc2 layer to the optimizer
optimizer = optim.SGD(model.fc2.parameters(), lr=1e-2)  # the optimizer receives only fc2's parameters
Note: there is no need to set requires_grad to False for the parameters of the layer being frozen.
Code
# Case 3: freezing the fc1 layer with method 2
model = net()  # fresh model, so the initial weights differ from the earlier cases
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.fc2.parameters(), lr=1e-2)  # the optimizer receives only fc2's parameters

# Model parameters before training
print("model.fc1.weight", model.fc1.weight)
print("model.fc2.weight", model.fc2.weight)

for epoch in range(10):
    x = torch.randn((3, 8))
    label = torch.randint(0, 3, [3]).long()
    output = model(x)

    loss = loss_fn(output, label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Model parameters after training
print("model.fc1.weight", model.fc1.weight)
print("model.fc2.weight", model.fc2.weight)
Result
model.fc1.weight Parameter containing:
tensor([[ 0.2519, -0.1772, -0.2229, 0.0711, -0.1681, 0.1233, -0.3217, -0.0412],
[ 0.2032, -0.2045, 0.2723, 0.3272, 0.1034, 0.1519, -0.0587, -0.3436],
[ 0.0470, 0.2379, 0.0590, 0.2400, 0.2280, 0.2045, -0.0229, -0.3484],
[-0.3023, -0.1195, 0.1792, -0.2173, -0.0492, 0.2640, -0.3511, -0.2845]],
requires_grad=True)
model.fc2.weight Parameter containing:
tensor([[-0.3263, -0.2938, -0.3516, -0.4578],
[-0.4549, -0.0060, 0.4696, -0.0174],
[-0.4841, 0.2861, 0.2658, 0.4483],
[-0.3093, 0.0977, -0.2735, 0.1033],
[-0.2421, 0.4489, -0.4649, 0.0110],
[-0.3671, 0.0182, -0.1027, -0.4441],
[ 0.0205, -0.0659, 0.4183, -0.2068],
[-0.1846, 0.1741, -0.2302, -0.1745],
[-0.3423, -0.2642, 0.2796, 0.4976],
[-0.0770, -0.3766, -0.0512, -0.2105]], requires_grad=True)
model.fc1.weight Parameter containing:
tensor([[ 0.2519, -0.1772, -0.2229, 0.0711, -0.1681, 0.1233, -0.3217, -0.0412],
[ 0.2032, -0.2045, 0.2723, 0.3272, 0.1034, 0.1519, -0.0587, -0.3436],
[ 0.0470, 0.2379, 0.0590, 0.2400, 0.2280, 0.2045, -0.0229, -0.3484],
[-0.3023, -0.1195, 0.1792, -0.2173, -0.0492, 0.2640, -0.3511, -0.2845]],
requires_grad=True)
model.fc2.weight Parameter containing:
tensor([[-0.3253, -0.2973, -0.3707, -0.4560],
[-0.4566, 0.0015, 0.4655, -0.0166],
[-0.4796, 0.2931, 0.2592, 0.4661],
[-0.3097, 0.0966, -0.2695, 0.1002],
[-0.2433, 0.4455, -0.4587, 0.0063],
[-0.3669, 0.0171, -0.0988, -0.4452],
[ 0.0198, -0.0679, 0.4203, -0.2088],
[-0.1854, 0.1717, -0.2241, -0.1781],
[-0.3429, -0.2653, 0.2822, 0.4938],
[-0.0773, -0.3765, -0.0464, -0.2127]], requires_grad=True)
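Conclusion
When only the parameters of the layers to be updated are passed to the optimizer, only those parameters are updated. Gradients are still computed for the parameters that were not passed in (fc1 is still printed with requires_grad=True), but those parameters are never updated.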
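One side effect of method 2 is worth seeing directly: fc1 still gets a gradient on every backward pass, and `optimizer.zero_grad()` does not clear it, because the optimizer only knows about the parameters it was given. The sketch below is an illustration under assumptions (the class name `Net`, the seed, and the helper variable names are not from the original script).

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # assumption: fixed seed only so the run is repeatable


class Net(nn.Module):
    """Same two-layer network as in the article."""

    def __init__(self, num_class=10):
        super().__init__()
        self.fc1 = nn.Linear(8, 4)
        self.fc2 = nn.Linear(4, num_class)

    def forward(self, x):
        return self.fc2(self.fc1(x))


model = Net()
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.fc2.parameters(), lr=1e-2)  # method 2: fc2 only

fc1_before = model.fc1.weight.detach().clone()

for _ in range(10):
    x = torch.randn(3, 8)
    label = torch.randint(0, 10, (3,))
    loss = loss_fn(model(x), label)
    optimizer.zero_grad()  # clears fc2's gradients only
    loss.backward()        # still computes (and accumulates) fc1's gradients
    optimizer.step()

fc1_unchanged = torch.equal(model.fc1.weight, fc1_before)
fc1_has_grad = model.fc1.weight.grad is not None

# optimizer.zero_grad() leaves fc1's accumulated gradient untouched
stale_grad = model.fc1.weight.grad.clone()
optimizer.zero_grad()
fc1_grad_survives = (
    model.fc1.weight.grad is not None
    and torch.equal(model.fc1.weight.grad, stale_grad)
)

# model.zero_grad() covers every parameter of the module, including fc1
model.zero_grad()
g = model.fc1.weight.grad
fc1_grad_cleared = g is None or bool((g == 0).all())

print(fc1_unchanged, fc1_has_grad, fc1_grad_survives, fc1_grad_cleared)
```

The weights of fc1 stay fixed, but the time and memory spent on its gradients are wasted, which is the motivation for also setting requires_grad to False.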
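Comparison and summary of method 1 and method 2
During training we may need to fix some of the model's parameters and update only the rest.
There are two ways to achieve this: set requires_grad=False on the parameters of the layers that should not be updated, or pass only the parameters to be updated when defining the optimizer.
The best approach is to pass the optimizer only the parameters with requires_grad=True; this uses a little less memory and is also more efficient.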
Best practice
Set requires_grad to False for the parameters that should not be updated, and do not pass those parameters to the optimizer.
Set requires_grad to False for the parameters that should not be updated
# Freeze the parameters of the fc1 layer
for name, param in model.named_parameters():
    if "fc1" in name:
        param.requires_grad = False
Do not pass the non-updated model parameters to the optimizer
# Define a filter that passes in only the model parameters with requires_grad=True
optimizer = optim.SGD(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-2)
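The two steps above can also be written with `nn.Module.requires_grad_`, which flips the flag on every parameter of a submodule at once. This is an alternative sketch rather than the article's code: the class name `Net` and the counting variables are assumptions, and a list is used instead of `filter` because `filter` returns a one-shot iterator that cannot be reused for counting.

```python
import torch.nn as nn
import torch.optim as optim


class Net(nn.Module):
    """Same two-layer network as in the article."""

    def __init__(self, num_class=10):
        super().__init__()
        self.fc1 = nn.Linear(8, 4)
        self.fc2 = nn.Linear(4, num_class)

    def forward(self, x):
        return self.fc2(self.fc1(x))


model = Net()

# Step 1: freeze fc1 (weight and bias) in a single call
model.fc1.requires_grad_(False)

# Step 2: hand the optimizer only the parameters that are still trainable
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = optim.SGD(trainable, lr=1e-2)

# Sanity check: count scalar parameters to confirm what will be trained
n_total = sum(p.numel() for p in model.parameters())
n_trainable = sum(p.numel() for p in trainable)
n_in_optimizer = sum(
    p.numel() for group in optimizer.param_groups for p in group["params"]
)

print(n_total, n_trainable, n_in_optimizer)  # prints: 86 50 50
```

Here fc1 has 8*4 + 4 = 36 parameters and fc2 has 4*10 + 10 = 50, so only the 50 parameters of fc2 reach the optimizer.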
Code
# Best practice
model = net()  # fresh model, so the initial weights differ from the earlier cases
loss_fn = nn.CrossEntropyLoss()

# Model parameters before training
print("model.fc1.weight", model.fc1.weight)
print("model.fc2.weight", model.fc2.weight)
print("model.fc1.weight.requires_grad:", model.fc1.weight.requires_grad)
print("model.fc2.weight.requires_grad:", model.fc2.weight.requires_grad)

# Freeze the parameters of the fc1 layer
for name, param in model.named_parameters():
    if "fc1" in name:
        param.requires_grad = False

# Define a filter that passes in only the model parameters with requires_grad=True
optimizer = optim.SGD(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-2)

for epoch in range(10):
    x = torch.randn((3, 8))
    label = torch.randint(0, 3, [3]).long()
    output = model(x)

    loss = loss_fn(output, label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Model parameters after training
print("model.fc1.weight", model.fc1.weight)
print("model.fc2.weight", model.fc2.weight)
print("model.fc1.weight.requires_grad:", model.fc1.weight.requires_grad)
print("model.fc2.weight.requires_grad:", model.fc2.weight.requires_grad)
Result
(bbn) jyzhang@admin2-X10DAi:~/test$ python -u "/home/jyzhang/test/net.py"
model.fc1.weight Parameter containing:
tensor([[-0.1193, 0.2354, 0.2520, 0.1187, 0.2699, -0.2301, 0.1622, -0.0478],
[-0.2862, -0.1716, 0.2865, 0.2615, -0.2205, -0.2046, -0.0983, -0.1564],
[-0.3143, -0.2248, 0.2198, 0.2338, 0.1184, -0.2033, -0.3418, 0.1434],
[ 0.3107, -0.0411, -0.3016, 0.1924, -0.1756, -0.2881, 0.0528, -0.0444]],
requires_grad=True)
model.fc2.weight Parameter containing:
tensor([[-0.2548, 0.2107, -0.1293, -0.2562],
[-0.1989, -0.2624, 0.2226, 0.4861],
[-0.1501, 0.2516, 0.4311, -0.1650],
[ 0.0334, -0.0963, -0.1731, 0.1706],
[ 0.2451, -0.2102, 0.0499, 0.0497],
[-0.1464, -0.2973, 0.3692, 0.0523],
[ 0.1192, 0.3575, -0.1911, 0.1457],
[-0.0990, 0.2059, 0.2072, -0.2013],
[-0.4397, 0.4036, -0.3402, -0.0417],
[ 0.0379, 0.0128, -0.3212, -0.0867]], requires_grad=True)
model.fc1.weight.requires_grad: True
model.fc2.weight.requires_grad: True
model.fc1.weight Parameter containing:
tensor([[-0.1193, 0.2354, 0.2520, 0.1187, 0.2699, -0.2301, 0.1622, -0.0478],
[-0.2862, -0.1716, 0.2865, 0.2615, -0.2205, -0.2046, -0.0983, -0.1564],
[-0.3143, -0.2248, 0.2198, 0.2338, 0.1184, -0.2033, -0.3418, 0.1434],
[ 0.3107, -0.0411, -0.3016, 0.1924, -0.1756, -0.2881, 0.0528, -0.0444]])
model.fc2.weight Parameter containing:
tensor([[-0.2637, 0.2073, -0.1293, -0.2422],
[-0.2027, -0.2641, 0.2152, 0.4897],
[-0.1543, 0.2504, 0.4188, -0.1576],
[ 0.0356, -0.0947, -0.1698, 0.1669],
[ 0.2474, -0.2081, 0.0536, 0.0456],
[-0.1445, -0.2962, 0.3708, 0.0500],
[ 0.1219, 0.3574, -0.1876, 0.1404],
[-0.0961, 0.2058, 0.2091, -0.2046],
[-0.4368, 0.4039, -0.3376, -0.0450],
[ 0.0398, 0.0143, -0.3181, -0.0897]], requires_grad=True)
model.fc1.weight.requires_grad: False
model.fc2.weight.requires_grad: True
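Conclusion
The best practice saves GPU memory and improves speed:
Saves GPU memory: the parameters that are not updated are not passed to the optimizer, so it tracks nothing for them, and with requires_grad=False no .grad tensors are allocated for them either
Improves speed: setting requires_grad to False on the parameters that are not updated saves the time needed to compute their gradients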
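The memory point can be made concrete by counting how many gradient elements exist after one training step. The sketch below compares method 2 alone with the best practice; the helper `grad_elements_after_one_step` and the class name `Net` are hypothetical names introduced for illustration, and element counts stand in for actual GPU memory, which is not measured here.

```python
import torch
import torch.nn as nn
import torch.optim as optim


class Net(nn.Module):
    """Same two-layer network as in the article."""

    def __init__(self, num_class=10):
        super().__init__()
        self.fc1 = nn.Linear(8, 4)
        self.fc2 = nn.Linear(4, num_class)

    def forward(self, x):
        return self.fc2(self.fc1(x))


def grad_elements_after_one_step(freeze_fc1):
    """Run one training step and count the gradient elements held in memory."""
    model = Net()
    if freeze_fc1:
        model.fc1.requires_grad_(False)  # best practice: also turn off gradients
    optimizer = optim.SGD(model.fc2.parameters(), lr=1e-2)  # fc2 only in both cases
    loss_fn = nn.CrossEntropyLoss()

    x = torch.randn(3, 8)
    label = torch.randint(0, 10, (3,))
    loss = loss_fn(model(x), label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Gradients still exist after step(); count every allocated element
    return sum(p.grad.numel() for p in model.parameters() if p.grad is not None)


optimizer_only = grad_elements_after_one_step(freeze_fc1=False)
best_practice = grad_elements_after_one_step(freeze_fc1=True)

print("method 2 only:", optimizer_only)  # 86: gradients for fc1 and fc2
print("best practice:", best_practice)   # 50: gradients for fc2 only
```

In this toy network the difference is only 36 elements, but when the frozen part is a large pretrained backbone the skipped gradients account for most of the model.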