PyTorch: implementing L2 and L1 regularization
1. Implementing L2 regularization with torch.optim optimizers
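torch.optim provides many optimizers, such as SGD, Adadelta, Adam, Adagrad and RMSprop. Each of them accepts a weight_decay parameter that sets the weight decay rate and plays the role of the coefficient λ in L2 regularization. Note that the optimizers in torch.optim only offer L2 regularization. The docstring describes weight_decay as: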
weight_decay (float, optional): weight decay (L2 penalty) (default: 0)
With a torch.optim optimizer, L2 regularization can be enabled like this:
optimizer = optim.Adam(model.parameters(),lr=learning_rate,weight_decay=0.01)
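The link between weight_decay and L2 regularization can be checked with plain arithmetic. For SGD (and for Adam's non-decoupled weight_decay), PyTorch adds weight_decay * w to each parameter's gradient before the update. That term is exactly the gradient of the penalty (λ/2)·‖w‖² with λ = weight_decay. The sketch below is plain Python with hypothetical helper names and made-up numbers (no PyTorch involved). It takes one SGD step the weight_decay way and one step on the penalized loss, whose penalty gradient is computed by finite differences, and the two updates agree.

```python
def sgd_step_weight_decay(w, grad, lr, weight_decay):
    """SGD step in the style of torch.optim.SGD: add weight_decay * w to the gradient."""
    return [wi - lr * (gi + weight_decay * wi) for wi, gi in zip(w, grad)]


def l2_penalty(w, lam):
    """(lam / 2) * ||w||^2, the penalty that weight_decay=lam corresponds to."""
    return 0.5 * lam * sum(wi * wi for wi in w)


def numeric_grad(f, w, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at w."""
    g = []
    for i in range(len(w)):
        w_plus, w_minus = list(w), list(w)
        w_plus[i] += eps
        w_minus[i] -= eps
        g.append((f(w_plus) - f(w_minus)) / (2 * eps))
    return g


def sgd_step_penalized_loss(w, grad, lr, lam):
    """SGD step on data_loss + (lam / 2) * ||w||^2; grad is the data-loss gradient."""
    penalty_grad = numeric_grad(lambda v: l2_penalty(v, lam), w)
    return [wi - lr * (gi + pg) for wi, gi, pg in zip(w, grad, penalty_grad)]


w = [1.0, -2.0, 0.5]       # current weights (made-up values)
grad = [0.1, 0.3, -0.2]    # gradient of the data loss only (made-up values)
lr, lam = 0.1, 0.01

a = sgd_step_weight_decay(w, grad, lr, lam)
b = sgd_step_penalized_loss(w, grad, lr, lam)
print(a)
print(b)   # matches a up to finite-difference rounding
```

For adaptive optimizers such as Adam, the added λ·w term is then rescaled by the adaptive denominator. That is why torch.optim.AdamW applies a decoupled decay directly to the weights instead.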
However, this approach has several issues:
(1) Regularization normally penalizes only the weights W and leaves the biases b alone. The weight_decay parameter of torch.optim optimizers, however, applies to every parameter you pass to the optimizer, weights and biases alike. Penalizing b usually brings little benefit and can contribute to underfitting, so in most cases only the weights need to be regularized. (The docstring only says "weight decay (L2 penalty)". In practice the decay is applied to every tensor in a parameter group, biases included, unless the biases are placed in a separate group with weight_decay=0.)
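(2) torch.optim optimizers only implement L2 regularization and cannot do L1 regularization. If you need L1 regularization, the penalty has to be added to the loss by hand, as in the custom method in Section 3.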
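(3) According to the regularization formula, adding a penalty makes the loss larger: the total loss is the data loss plus λ times the penalty on W, so the penalty term grows linearly with λ. For example, going from weight_decay=1 to weight_decay=100 multiplies the penalty term (not the whole loss) by 100. With the torch.optim approach, however, if you keep computing the loss with loss_fun = nn.CrossEntropyLoss(), the printed loss stays about the same as without regularization no matter how you change weight_decay. This is because the optimizer applies the decay directly to the gradients in its update step; the penalty on W is never added to the value returned by loss_fun.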
(4) Implementing regularization through the torch.optim optimizers works correctly. It is just easy to misunderstand. Personally I prefer TensorFlow's approach, where you only need tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES) and the implementation maps almost one-to-one onto the regularization formula.
(5) The source code is available in the author's GitHub project (see Section 4).
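For point (1), a common workaround (not part of the original post) is to use torch.optim parameter groups, where each group can carry its own weight_decay. The sketch below assumes a small hypothetical nn.Sequential model and treats every parameter whose name ends in "bias" as exempt from decay. Adapt the name test to your own model.

```python
import torch.nn as nn
import torch.optim as optim

# Hypothetical small model, used only for illustration.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))

decay, no_decay = [], []
for name, param in model.named_parameters():
    # Names look like "0.weight", "0.bias", "2.weight", "2.bias"
    if name.endswith("bias"):
        no_decay.append(param)
    else:
        decay.append(param)

optimizer = optim.Adam(
    [
        {"params": decay, "weight_decay": 0.01},    # weights: L2-style decay
        {"params": no_decay, "weight_decay": 0.0},  # biases: no decay
    ],
    lr=1e-3,
)

for group in optimizer.param_groups:
    print(len(group["params"]), group["weight_decay"])
# 2 0.01
# 2 0.0
```

Per-group options override the optimizer-wide defaults, so the same trick also works for SGD, RMSprop and the other torch.optim optimizers.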
To address these problems, I wrote my own regularization method, modeled on the way TensorFlow implements regularization.
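The point behind (3), that the penalty only shows up in the loss if you add it yourself, can be illustrated with a toy calculation. The numbers below are made up, and the penalty is assumed to be λ·‖W‖² (a hypothetical `penalized_loss` helper). Only the penalty term scales with λ while the data loss stays fixed, so raising λ from 1 to 100 multiplies the penalty, not the total loss, by 100.

```python
def penalized_loss(data_loss, weights, lam):
    """Total objective = data loss + lam * squared L2 norm of the weights."""
    penalty = sum(w * w for w in weights)
    return data_loss + lam * penalty


data_loss = 0.8                # e.g. a cross-entropy value (made up)
weights = [0.5, -1.0, 2.0]     # squared L2 norm = 0.25 + 1.0 + 4.0 = 5.25

for lam in (0.0, 1.0, 100.0):
    total = penalized_loss(data_loss, weights, lam)
    print(f"lambda={lam:g}: total loss={total:.2f}, penalty part={total - data_loss:.2f}")
```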
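2. How can you tell whether regularization is affecting the model?
Generally, the main purpose of regularization is to prevent overfitting. Overfitting itself is sometimes hard to diagnose, but checking whether regularization is affecting the model at all is easy. Below are the loss and accuracy logs from training runs without and with regularization:
2.1 Loss and accuracy without regularization
The optimizer is Adam with weight_decay=0.0, i.e. no regularization: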
optimizer = optim.Adam(model.parameters(),lr=learning_rate,weight_decay=0.0)
Loss and accuracy printed during training:
step/epoch:0/0,Train Loss: 2.418065, Acc: [0.15625]
step/epoch:10/0,Train Loss: 5.194936, Acc: [0.34375]
step/epoch:20/0,Train Loss: 0.973226, Acc: [0.8125]
step/epoch:30/0,Train Loss: 1.215165, Acc: [0.65625]
step/epoch:40/0,Train Loss: 1.808068, Acc: [0.65625]
step/epoch:50/0,Train Loss: 1.661446, Acc: [0.625]
step/epoch:60/0,Train Loss: 1.552345, Acc: [0.6875]
step/epoch:70/0,Train Loss: 1.052912, Acc: [0.71875]
step/epoch:80/0,Train Loss: 0.910738, Acc: [0.75]
step/epoch:90/0,Train Loss: 1.142454, Acc: [0.6875]
step/epoch:100/0,Train Loss: 0.546968, Acc: [0.84375]
step/epoch:110/0,Train Loss: 0.415631, Acc: [0.9375]
step/epoch:120/0,Train Loss: 0.533164, Acc: [0.78125]
step/epoch:130/0,Train Loss: 0.956079, Acc: [0.6875]
step/epoch:140/0,Train Loss: 0.711397, Acc: [0.8125]
2.2 Loss and accuracy with regularization
The optimizer is Adam with weight_decay=10.0, i.e. a regularization weight of lambda = 10.0:
optimizer = optim.Adam(model.parameters(),lr=learning_rate,weight_decay=10.0)
The loss and accuracy printed during training are now:
step/epoch:0/0,Train Loss: 2.467985, Acc: [0.09375]
step/epoch:10/0,Train Loss: 5.435320, Acc: [0.40625]
step/epoch:20/0,Train Loss: 1.395482, Acc: [0.625]
step/epoch:30/0,Train Loss: 1.128281, Acc: [0.6875]
step/epoch:40/0,Train Loss: 1.135289, Acc: [0.6875]
step/epoch:50/0,Train Loss: 1.455040, Acc: [0.5625]
step/epoch:60/0,Train Loss: 1.023273, Acc: [0.65625]
step/epoch:70/0,Train Loss: 0.855008, Acc: [0.65625]
step/epoch:80/0,Train Loss: 1.006449, Acc: [0.71875]
step/epoch:90/0,Train Loss: 0.939148, Acc: [0.625]
step/epoch:100/0,Train Loss: 0.851593, Acc: [0.6875]
step/epoch:110/0,Train Loss: 1.093970, Acc: [0.59375]
step/epoch:120/0,Train Loss: 1.699520, Acc: [0.625]
step/epoch:130/0,Train Loss: 0.861444, Acc: [0.75]
step/epoch:140/0,Train Loss: 0.927656, Acc: [0.625]
With weight_decay=10000.0:
step/epoch:0/0,Train Loss: 2.337354, Acc: [0.15625]
step/epoch:10/0,Train Loss: 2.222203, Acc: [0.125]
step/epoch:20/0,Train Loss: 2.184257, Acc: [0.3125]
step/epoch:30/0,Train Loss: 2.116977, Acc: [0.5]
step/epoch:40/0,Train Loss: 2.168895, Acc: [0.375]
step/epoch:50/0,Train Loss: 2.221143, Acc: [0.1875]
step/epoch:60/0,Train Loss: 2.189801, Acc: [0.25]
step/epoch:70/0,Train Loss: 2.209837, Acc: [0.125]
step/epoch:80/0,Train Loss: 2.202038, Acc: [0.34375]
step/epoch:90/0,Train Loss: 2.192546, Acc: [0.25]
step/epoch:100/0,Train Loss: 2.215488, Acc: [0.25]
step/epoch:110/0,Train Loss: 2.169323, Acc: [0.15625]
step/epoch:120/0,Train Loss: 2.166457, Acc: [0.3125]
step/epoch:130/0,Train Loss: 2.144773, Acc: [0.40625]
step/epoch:140/0,Train Loss: 2.173397, Acc: [0.28125]
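2.3 Discussion
Overall, comparing the training logs with and without regularization, once regularization is added the loss falls more slowly and the accuracy rises more slowly. Without regularization the loss and accuracy fluctuate more (higher variance), while the regularized runs look somewhat smoother.
The larger the regularization weight lambda, the stronger this effect. This reflects the penalty that regularization imposes on the model: by constraining the weights it smooths the model's behavior, which can help reduce overfitting. Too large a lambda, however, makes the model underfit: with weight_decay=10000.0 the loss barely moves from its initial value of about 2.3 and the accuracy stays low throughout.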
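3. A custom regularization method
To work around the two shortcomings of the torch.optim optimizers (only L2 regularization is available, and every parameter in the network is penalized), here is a method similar to TensorFlow's regularization.
3.1 A custom Regularization class
The logic is wrapped in a Regularization class. Every method is commented, so work through the code yourself and leave a comment if you have questions.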
import torch

# Check whether a GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# device='cuda'
print("-----device:{}".format(device))
print("-----Pytorch version:{}".format(torch.__version__))


class Regularization(torch.nn.Module):
    def __init__(self, model, weight_decay, p=2):
        '''
        :param model: the model
        :param weight_decay: regularization coefficient (lambda)
        :param p: order of the norm, 2 by default;
                  p=2 gives L2 regularization, p=1 gives L1 regularization
        '''
        super(Regularization, self).__init__()
        if weight_decay <= 0:
            # fail loudly instead of exit(0), which would report success
            raise ValueError("param weight_decay must be > 0")
        self.model = model
        self.weight_decay = weight_decay
        self.p = p
        self.weight_list = self.get_weight(model)
        self.weight_info(self.weight_list)

    def to(self, device):
        '''
        Set the device to run on
        :param device: cuda or cpu
        :return: self
        '''
        self.device = device
        super().to(device)
        return self

    def forward(self, model):
        self.weight_list = self.get_weight(model)  # fetch the latest weights
        reg_loss = self.regularization_loss(self.weight_list, self.weight_decay, p=self.p)
        return reg_loss

    def get_weight(self, model):
        '''
        Collect the model's weight parameters (biases are skipped)
        :param model:
        :return: list of (name, parameter) tuples
        '''
        weight_list = []
        for name, param in model.named_parameters():
            # every parameter whose name contains 'weight' is penalized;
            # note this also matches e.g. BatchNorm scale parameters
            if 'weight' in name:
                weight = (name, param)
                weight_list.append(weight)
        return weight_list

    def regularization_loss(self, weight_list, weight_decay, p=2):
        '''
        Compute the penalty: weight_decay * sum of the p-norms of the weights
        :param weight_list:
        :param weight_decay:
        :param p: order of the norm, 2 by default
        :return: penalty tensor
        '''
        reg_loss = 0
        for name, w in weight_list:
            # torch.norm returns the (unsquared) p-norm over all entries of w
            l2_reg = torch.norm(w, p=p)
            reg_loss = reg_loss + l2_reg

        reg_loss = weight_decay * reg_loss
        return reg_loss

    def weight_info(self, weight_list):
        '''
        Print the names of the regularized weights
        :param weight_list:
        :return:
        '''
        print("---------------regularization weight---------------")
        for name, w in weight_list:
            print(name)
        print("---------------------------------------------------")
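One detail worth knowing about regularization_loss: torch.norm(w, p=p) returns the plain p-norm over all entries of w, not its square. For p=1 this is the usual L1 penalty Σ|w|. For p=2, however, it is ‖w‖₂ rather than the squared norm used by textbook L2 regularization (and implied by weight_decay). The pure-Python sketch below uses a hypothetical `lp_norm` helper and a toy vector to show the values and gradients involved. If you want the classic squared penalty, a change along the lines of torch.norm(w, p=2) ** 2 is the assumption to test.

```python
def lp_norm(w, p):
    """(sum |w_i|^p)^(1/p): what torch.norm(w, p=p) computes over all entries of w."""
    return sum(abs(x) ** p for x in w) ** (1.0 / p)


w = [3.0, -4.0]

l1 = lp_norm(w, 1)           # L1 norm: 3 + 4
l2 = lp_norm(w, 2)           # unsquared L2 norm, as in regularization_loss
l2_sq = lp_norm(w, 2) ** 2   # squared L2 norm, the classic L2 penalty
print(l1, l2, l2_sq)         # 7.0 5.0 25.0

# The gradients differ too: d||w||_2/dw = w / ||w||_2 always has unit length,
# while d||w||_2^2/dw = 2w grows with the weights.
grad_l2 = [x / l2 for x in w]
grad_l2_sq = [2 * x for x in w]
print(grad_l2, grad_l2_sq)   # [0.6, -0.8] [6.0, -8.0]
```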
3.2 Using Regularization
Using it is simple: treat it like any ordinary PyTorch module. For example:
import torch
import torch.nn as nn
import torch.optim as optim

# Check whether a GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("-----device:{}".format(device))
print("-----Pytorch version:{}".format(torch.__version__))

weight_decay = 100.0  # regularization coefficient
model = my_net().to(device)  # my_net and learning_rate are defined elsewhere in the project

# Initialize the regularizer (Regularization is the class from 3.1)
if weight_decay > 0:
    reg_loss = Regularization(model, weight_decay, p=2).to(device)
else:
    print("no regularization")

criterion = nn.CrossEntropyLoss().to(device)  # CrossEntropyLoss = log_softmax + NLLLoss
optimizer = optim.Adam(model.parameters(), lr=learning_rate)  # no weight_decay argument needed

# train
batch_train_data = ...
batch_train_label = ...
out = model(batch_train_data)

# loss and regularization
loss = criterion(input=out, target=batch_train_label)
if weight_decay > 0:
    loss = loss + reg_loss(model)
total_loss = loss.item()  # Python float, for logging only

# backprop
optimizer.zero_grad()  # clear all accumulated gradients
loss.backward()  # backward on the tensor; the float from .item() has no graph
optimizer.step()
Loss and accuracy printed during training:
(1) weight_decay=0.0, no regularization:
step/epoch:0/0,Train Loss: 2.379627, Acc: [0.09375]
step/epoch:10/0,Train Loss: 1.473092, Acc: [0.6875]
step/epoch:20/0,Train Loss: 0.931847, Acc: [0.8125]
step/epoch:30/0,Train Loss: 0.625494, Acc: [0.875]
step/epoch:40/0,Train Loss: 2.241885, Acc: [0.53125]
step/epoch:50/0,Train Loss: 1.132131, Acc: [0.6875]
step/epoch:60/0,Train Loss: 0.493038, Acc: [0.8125]
step/epoch:70/0,Train Loss: 0.819410, Acc: [0.78125]
step/epoch:80/0,Train Loss: 0.996497, Acc: [0.71875]
step/epoch:90/0,Train Loss: 0.474205, Acc: [0.8125]
step/epoch:100/0,Train Loss: 0.744587, Acc: [0.8125]
step/epoch:110/0,Train Loss: 0.502217, Acc: [0.78125]
step/epoch:120/0,Train Loss: 0.531865, Acc: [0.8125]
step/epoch:130/0,Train Loss: 1.016807, Acc: [0.875]
step/epoch:140/0,Train Loss: 0.411701, Acc: [0.84375]
(2) weight_decay=10.0, with regularization:
---------------------------------------------------
step/epoch:0/0,Train Loss: 1563.402832, Acc: [0.09375]
step/epoch:10/0,Train Loss: 1530.002686, Acc: [0.53125]
step/epoch:20/0,Train Loss: 1495.115234, Acc: [0.71875]
step/epoch:30/0,Train Loss: 1461.114136, Acc: [0.78125]
step/epoch:40/0,Train Loss: 1427.868164, Acc: [0.6875]
step/epoch:50/0,Train Loss: 1395.430054, Acc: [0.6875]
step/epoch:60/0,Train Loss: 1363.358154, Acc: [0.5625]
step/epoch:70/0,Train Loss: 1331.439697, Acc: [0.75]
step/epoch:80/0,Train Loss: 1301.334106, Acc: [0.625]
step/epoch:90/0,Train Loss: 1271.505005, Acc: [0.6875]
step/epoch:100/0,Train Loss: 1242.488647, Acc: [0.75]
step/epoch:110/0,Train Loss: 1214.184204, Acc: [0.59375]
step/epoch:120/0,Train Loss: 1186.174561, Acc: [0.71875]
step/epoch:130/0,Train Loss: 1159.148438, Acc: [0.78125]
step/epoch:140/0,Train Loss: 1133.020020, Acc: [0.65625]
(3) weight_decay=10000.0, with regularization:
step/epoch:0/0,Train Loss: 1570211.500000, Acc: [0.09375]
step/epoch:10/0,Train Loss: 1522952.125000, Acc: [0.3125]
step/epoch:20/0,Train Loss: 1486256.125000, Acc: [0.125]
step/epoch:30/0,Train Loss: 1451671.500000, Acc: [0.25]
step/epoch:40/0,Train Loss: 1418959.750000, Acc: [0.15625]
step/epoch:50/0,Train Loss: 1387154.000000, Acc: [0.125]
step/epoch:60/0,Train Loss: 1355917.500000, Acc: [0.125]
step/epoch:70/0,Train Loss: 1325379.500000, Acc: [0.125]
step/epoch:80/0,Train Loss: 1295454.125000, Acc: [0.3125]
step/epoch:90/0,Train Loss: 1266115.375000, Acc: [0.15625]
step/epoch:100/0,Train Loss: 1237341.000000, Acc: [0.0625]
step/epoch:110/0,Train Loss: 1209186.500000, Acc: [0.125]
step/epoch:120/0,Train Loss: 1181584.250000, Acc: [0.125]
step/epoch:130/0,Train Loss: 1154600.125000, Acc: [0.1875]
step/epoch:140/0,Train Loss: 1128239.875000, Acc: [0.125]
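Compared with the L2 regularization built into the torch.optim optimizers, the Regularization class also has a regularizing effect, and, as in TensorFlow, the reported loss includes the regularization term. With weight_decay=10.0, for example, the penalty dominates the printed loss, which starts around 1563 and keeps decreasing as the weights shrink.
In addition, the parameter p selects the penalty type: p=2 gives L2 regularization and p=1 gives L1 regularization.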
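With p=1 the penalty becomes weight_decay·Σ|w|, whose gradient with respect to each weight is weight_decay·sign(w) (PyTorch uses 0 at exactly w=0). The standalone sketch below adds such an L1 term to a cross-entropy loss by hand. It is the same idea as Regularization(model, weight_decay, p=1), written inline, and it assumes a hypothetical nn.Linear model, random data, and a made-up coefficient l1_lambda.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(5, 3)          # hypothetical model
criterion = nn.CrossEntropyLoss()
l1_lambda = 1e-3                 # hypothetical L1 coefficient

x = torch.randn(8, 5)            # random inputs, illustration only
y = torch.randint(0, 3, (8,))    # random class labels

data_loss = criterion(model(x), y)
# L1 penalty on weight tensors only (the bias is skipped), as get_weight() does
l1_penalty = sum(p.abs().sum() for name, p in model.named_parameters() if "weight" in name)
loss = data_loss + l1_lambda * l1_penalty

model.zero_grad()
loss.backward()                  # weight gradients now include l1_lambda * sign(w)
print(f"data loss {data_loss.item():.4f}, L1 term {l1_lambda * l1_penalty.item():.6f}")
```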
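4. GitHub project source code download
《GitHub project source code》 (click to open)
The above is based on my personal experience, and I hope it serves as a useful reference. If there are mistakes or anything I have not fully considered, corrections are welcome.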