A Summary of Common Learning Rate Decay Schedules for Keras Neural Networks in Python
Preface
The cosine-annealing decay schedule described in the paper has been added. (Figure: cosine-annealing learning-rate curve.)
The learning rate is a very important part of deep learning, so it is well worth studying carefully!
Why adjust the learning rate
In deep learning, adjusting the learning rate properly matters a great deal.
A large learning rate has the following advantages:
1. It speeds up learning.
2. It helps the model jump out of local optima.
But it also has the following drawbacks:
1. It may prevent the model from converging.
2. Using only a large learning rate tends to leave the model imprecise.
A small learning rate has the following advantages:
1. It helps the model converge and refine its weights.
2. It improves model accuracy.
But it also has the following drawbacks:
1. It cannot escape local optima.
2. Convergence is slow.
Large and small learning rates therefore serve almost opposite purposes, so only by adjusting the learning rate appropriately during training can we get the best performance.
Summary of decay schedules
1. Step decay
In Keras, step decay is usually implemented with the ReduceLROnPlateau callback. Step decay means that the learning rate suddenly drops to, say, 1/2 or 1/10 of its previous value.
With ReduceLROnPlateau you specify a metric to monitor, such as the validation loss or the training loss; once that metric stops improving, the learning rate is suddenly reduced, for example to 1/2 or 1/10 of its current value.
The main parameters of ReduceLROnPlateau are:
1. factor: the ratio by which the learning rate is reduced once the monitored metric stops improving.
2. patience: the number of epochs the monitored metric may stop improving before the learning rate is reduced.
# Import ReduceLROnPlateau
from keras.callbacks import ReduceLROnPlateau

# Define ReduceLROnPlateau
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2, verbose=1)

# Use ReduceLROnPlateau
model.fit(X_train, Y_train, callbacks=[reduce_lr])
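ReduceLROnPlateau also accepts a few other useful arguments, notably min_lr (a floor for the learning rate) and cooldown (epochs to wait after a reduction before monitoring resumes). A minimal sketch, assuming the same model and the X_train/Y_train arrays from the snippet above, and that validation data is available so that val_loss can be monitored:

from keras.callbacks import ReduceLROnPlateau

# Halve the learning rate whenever val_loss has not improved for 2 epochs,
# wait 1 extra epoch after each reduction, and never go below 1e-6.
reduce_lr = ReduceLROnPlateau(monitor='val_loss',
                              factor=0.5,
                              patience=2,
                              cooldown=1,
                              min_lr=1e-6,
                              verbose=1)
model.fit(X_train, Y_train, validation_split=0.1, callbacks=[reduce_lr])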
2. Exponential decay
In Keras I did not find a callback that directly implements exponential decay particularly well, so I implemented one on top of the Callback class.
Exponential decay means the learning rate decreases as an exponential function of the epoch number.
The formula is:
learning_rate = learning_rate_base * decay_rate ^ global_epoch
where:
1. learning_rate is the current learning rate.
2. learning_rate_base is the base learning rate.
3. decay_rate is the decay factor.
The resulting curve is shown below. (Figure: exponential decay of the learning rate over epochs.)
The implementation, shown below, uses a Callback subclass and is invoked in the same way as an ordinary ReduceLROnPlateau:
import numpy as np
import matplotlib.pyplot as plt
import keras
from keras import backend as K
from keras.layers import Flatten, Conv2D, Dropout, Input, Dense, MaxPooling2D
from keras.models import Model

def exponent(global_epoch,
             learning_rate_base,
             decay_rate,
             min_learn_rate=0,
             ):
    learning_rate = learning_rate_base * pow(decay_rate, global_epoch)
    learning_rate = max(learning_rate, min_learn_rate)
    return learning_rate

class ExponentDecayScheduler(keras.callbacks.Callback):
    """
    Subclass of Callback that schedules the learning rate.
    """
    def __init__(self,
                 learning_rate_base,
                 decay_rate,
                 global_epoch_init=0,
                 min_learn_rate=0,
                 verbose=0):
        super(ExponentDecayScheduler, self).__init__()
        # Base learning rate
        self.learning_rate_base = learning_rate_base
        # Global epoch counter
        self.global_epoch = global_epoch_init
        self.decay_rate = decay_rate
        # Verbosity
        self.verbose = verbose
        # Lower bound for the learning rate
        self.min_learn_rate = min_learn_rate
        # learning_rates records the learning rate after each update, for plotting
        self.learning_rates = []

    # Update the epoch counter and record the current learning rate
    def on_epoch_end(self, epoch, logs=None):
        self.global_epoch = self.global_epoch + 1
        lr = K.get_value(self.model.optimizer.lr)
        self.learning_rates.append(lr)

    # Update the learning rate
    def on_epoch_begin(self, epoch, logs=None):
        lr = exponent(global_epoch=self.global_epoch,
                      learning_rate_base=self.learning_rate_base,
                      decay_rate=self.decay_rate,
                      min_learn_rate=self.min_learn_rate)
        K.set_value(self.model.optimizer.lr, lr)
        if self.verbose > 0:
            print('\nEpoch %05d: setting learning '
                  'rate to %s.' % (self.global_epoch + 1, lr))
# Load the MNIST handwritten-digit dataset
mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)

#-----------------------------#
#   Build the model
#-----------------------------#
inputs = Input([28, 28, 1])
x = Conv2D(32, kernel_size=5, padding='same', activation="relu")(inputs)
x = MaxPooling2D(pool_size=2, strides=2, padding='same')(x)
x = Conv2D(64, kernel_size=5, padding='same', activation="relu")(x)
x = MaxPooling2D(pool_size=2, strides=2, padding='same')(x)
x = Flatten()(x)
x = Dense(1024)(x)
x = Dense(256)(x)
out = Dense(10, activation='softmax')(x)
model = Model(inputs, out)

# Set the optimizer, loss and accuracy metric
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Training parameters
epochs = 10
init_epoch = 0
# Batch size
batch_size = 31
# Maximum learning rate
learning_rate_base = 1e-3
sample_count = len(x_train)

# Learning-rate scheduler
exponent_lr = ExponentDecayScheduler(learning_rate_base=learning_rate_base,
                                     global_epoch_init=init_epoch,
                                     decay_rate=0.9,
                                     min_learn_rate=1e-6
                                     )

# Train with fit
model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size,
          verbose=1, callbacks=[exponent_lr])

plt.plot(exponent_lr.learning_rates)
plt.xlabel('Step', fontsize=20)
plt.ylabel('lr', fontsize=20)
plt.axis([0, epochs, 0, learning_rate_base * 1.1])
plt.xticks(np.arange(0, epochs, 1))
plt.grid()
plt.title('lr decay with exponent', fontsize=20)
plt.show()
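As a side note, this per-epoch schedule can also be driven by Keras' built-in LearningRateScheduler callback, which calls a user-supplied function at the start of every epoch. A minimal sketch, reusing the exponent function and the hyperparameters defined above:

from keras.callbacks import LearningRateScheduler

# The schedule function receives the epoch index and returns the learning rate.
exponent_lr_builtin = LearningRateScheduler(
    lambda epoch: exponent(epoch, learning_rate_base,
                           decay_rate=0.9, min_learn_rate=1e-6),
    verbose=1)

model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size,
          verbose=1, callbacks=[exponent_lr_builtin])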
3. Cosine annealing decay
With cosine annealing the learning rate first rises and then falls, following the idea of annealing-style optimization. (Look up the annealing algorithm if you are not familiar with it.)
The rising (warmup) phase is linear; the falling phase follows a cosine curve.
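As a sketch of what the code below actually computes during the falling phase, with T_cur the current step and T_total the total number of training steps:

learning_rate = 0.5 * learning_rate_base * (1 + cos(pi * (T_cur - warmup_steps - hold_base_rate_steps) / (T_total - warmup_steps - hold_base_rate_steps)))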
(Figure: cosine decay with warmup — a linear warmup followed by a cosine decay of the learning rate.)
Cosine annealing decay has a few essential parameters:
1. learning_rate_base: the maximum learning rate.
2. warmup_learning_rate: the learning rate at the very start of warmup.
3. warmup_steps: the number of steps it takes to reach the peak learning rate.
The implementation, shown below, uses a Callback subclass and is invoked in the same way as an ordinary ReduceLROnPlateau:
import numpy as np
import matplotlib.pyplot as plt
import keras
from keras import backend as K
from keras.layers import Flatten, Conv2D, Dropout, Input, Dense, MaxPooling2D
from keras.models import Model

def cosine_decay_with_warmup(global_step,
                             learning_rate_base,
                             total_steps,
                             warmup_learning_rate=0.0,
                             warmup_steps=0,
                             hold_base_rate_steps=0,
                             min_learn_rate=0,
                             ):
    """
    Parameters:
    global_step: the T_cur defined above, i.e. the current step count.
    learning_rate_base: the preset maximum learning rate; once warmup has raised
        the learning rate to learning_rate_base, the decay phase begins.
    total_steps: the total number of training steps, equal to
        epochs * sample_count / batch_size (sample_count is the number of training
        samples, epochs is the number of training epochs).
    warmup_learning_rate: the initial learning rate of the linear warmup phase.
    warmup_steps: the total number of warmup steps.
    hold_base_rate_steps: optional; after warmup ends, the learning rate is held
        constant for this many steps before the decay starts.
    """
    if total_steps < warmup_steps:
        raise ValueError('total_steps must be larger or equal to '
                         'warmup_steps.')
    # Cosine annealing itself; the minimum learning rate is taken as 0,
    # which simplifies the expression
    learning_rate = 0.5 * learning_rate_base * (1 + np.cos(
        np.pi * (global_step - warmup_steps - hold_base_rate_steps) /
        float(total_steps - warmup_steps - hold_base_rate_steps)))
    # If hold_base_rate_steps > 0, keep the learning rate constant for a number
    # of steps after warmup ends
    if hold_base_rate_steps > 0:
        learning_rate = np.where(global_step > warmup_steps + hold_base_rate_steps,
                                 learning_rate, learning_rate_base)
    if warmup_steps > 0:
        if learning_rate_base < warmup_learning_rate:
            raise ValueError('learning_rate_base must be larger or equal to '
                             'warmup_learning_rate.')
        # Linear warmup
        slope = (learning_rate_base - warmup_learning_rate) / warmup_steps
        warmup_rate = slope * global_step + warmup_learning_rate
        # Use the linearly increasing warmup_rate while global_step is still in the
        # warmup phase, otherwise use the cosine-annealed learning_rate
        learning_rate = np.where(global_step < warmup_steps, warmup_rate,
                                 learning_rate)
    learning_rate = max(learning_rate, min_learn_rate)
    return learning_rate
class WarmUpCosineDecayScheduler(keras.callbacks.Callback):
    """
    Subclass of Callback that schedules the learning rate.
    """
    def __init__(self,
                 learning_rate_base,
                 total_steps,
                 global_step_init=0,
                 warmup_learning_rate=0.0,
                 warmup_steps=0,
                 hold_base_rate_steps=0,
                 min_learn_rate=0,
                 verbose=0):
        super(WarmUpCosineDecayScheduler, self).__init__()
        # Base (maximum) learning rate
        self.learning_rate_base = learning_rate_base
        # Total number of steps over all epochs: epochs * sample_count / batch_size
        self.total_steps = total_steps
        # Global step counter
        self.global_step = global_step_init
        # Warmup start learning rate
        self.warmup_learning_rate = warmup_learning_rate
        # Number of warmup steps: warmup_epoch * sample_count / batch_size
        self.warmup_steps = warmup_steps
        self.hold_base_rate_steps = hold_base_rate_steps
        # Verbosity
        self.verbose = verbose
        # Lower bound for the learning rate
        self.min_learn_rate = min_learn_rate
        # learning_rates records the learning rate after each update, for plotting
        self.learning_rates = []

    # Update global_step and record the current learning rate
    def on_batch_end(self, batch, logs=None):
        self.global_step = self.global_step + 1
        lr = K.get_value(self.model.optimizer.lr)
        self.learning_rates.append(lr)

    # Update the learning rate
    def on_batch_begin(self, batch, logs=None):
        lr = cosine_decay_with_warmup(global_step=self.global_step,
                                      learning_rate_base=self.learning_rate_base,
                                      total_steps=self.total_steps,
                                      warmup_learning_rate=self.warmup_learning_rate,
                                      warmup_steps=self.warmup_steps,
                                      hold_base_rate_steps=self.hold_base_rate_steps,
                                      min_learn_rate=self.min_learn_rate)
        K.set_value(self.model.optimizer.lr, lr)
        if self.verbose > 0:
            print('\nBatch %05d: setting learning '
                  'rate to %s.' % (self.global_step + 1, lr))
# Load the MNIST handwritten-digit dataset
mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)

#-----------------------------#
#   Build the model
#-----------------------------#
inputs = Input([28, 28, 1])
x = Conv2D(32, kernel_size=5, padding='same', activation="relu")(inputs)
x = MaxPooling2D(pool_size=2, strides=2, padding='same')(x)
x = Conv2D(64, kernel_size=5, padding='same', activation="relu")(x)
x = MaxPooling2D(pool_size=2, strides=2, padding='same')(x)
x = Flatten()(x)
x = Dense(1024)(x)
x = Dense(256)(x)
out = Dense(10, activation='softmax')(x)
model = Model(inputs, out)

# Set the optimizer, loss and accuracy metric
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Training parameters
epochs = 10
# Warmup epochs
warmup_epoch = 3
# Batch size
batch_size = 16
# Maximum learning rate
learning_rate_base = 1e-3
sample_count = len(x_train)
# Total number of training steps
total_steps = int(epochs * sample_count / batch_size)
# Number of warmup steps
warmup_steps = int(warmup_epoch * sample_count / batch_size)

# Learning-rate scheduler
warm_up_lr = WarmUpCosineDecayScheduler(learning_rate_base=learning_rate_base,
                                        total_steps=total_steps,
                                        warmup_learning_rate=1e-5,
                                        warmup_steps=warmup_steps,
                                        hold_base_rate_steps=5,
                                        min_learn_rate=1e-6
                                        )

# Train with fit
model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size,
          verbose=1, callbacks=[warm_up_lr])

plt.plot(warm_up_lr.learning_rates)
plt.xlabel('Step', fontsize=20)
plt.ylabel('lr', fontsize=20)
plt.axis([0, total_steps, 0, learning_rate_base * 1.1])
plt.grid()
plt.title('Cosine decay with warmup', fontsize=20)
plt.show()
4. Cosine annealing decay, updated version
In the paper, cosine annealing does not rise and fall only once, so I rewrote the code to support several rise-and-fall cycles. (Figure: cosine decay with warmup and multiple restarts.)
The implementation, shown below, uses a Callback subclass and is invoked in the same way as an ordinary ReduceLROnPlateau:
import numpy as np
import matplotlib.pyplot as plt
import keras
from keras import backend as K
from keras.layers import Flatten, Conv2D, Dropout, Input, Dense, MaxPooling2D
from keras.models import Model

def cosine_decay_with_warmup(global_step,
                             learning_rate_base,
                             total_steps,
                             warmup_learning_rate=0.0,
                             warmup_steps=0,
                             hold_base_rate_steps=0,
                             min_learn_rate=0,
                             ):
    """
    Parameters:
    global_step: the T_cur defined above, i.e. the current step count.
    learning_rate_base: the preset maximum learning rate; once warmup has raised
        the learning rate to learning_rate_base, the decay phase begins.
    total_steps: the total number of training steps, equal to
        epochs * sample_count / batch_size (sample_count is the number of training
        samples, epochs is the number of training epochs).
    warmup_learning_rate: the initial learning rate of the linear warmup phase.
    warmup_steps: the total number of warmup steps.
    hold_base_rate_steps: optional; after warmup ends, the learning rate is held
        constant for this many steps before the decay starts.
    """
    if total_steps < warmup_steps:
        raise ValueError('total_steps must be larger or equal to '
                         'warmup_steps.')
    # Cosine annealing itself; the minimum learning rate is taken as 0,
    # which simplifies the expression
    learning_rate = 0.5 * learning_rate_base * (1 + np.cos(
        np.pi * (global_step - warmup_steps - hold_base_rate_steps) /
        float(total_steps - warmup_steps - hold_base_rate_steps)))
    # If hold_base_rate_steps > 0, keep the learning rate constant for a number
    # of steps after warmup ends
    if hold_base_rate_steps > 0:
        learning_rate = np.where(global_step > warmup_steps + hold_base_rate_steps,
                                 learning_rate, learning_rate_base)
    if warmup_steps > 0:
        if learning_rate_base < warmup_learning_rate:
            raise ValueError('learning_rate_base must be larger or equal to '
                             'warmup_learning_rate.')
        # Linear warmup
        slope = (learning_rate_base - warmup_learning_rate) / warmup_steps
        warmup_rate = slope * global_step + warmup_learning_rate
        # Use the linearly increasing warmup_rate while global_step is still in the
        # warmup phase, otherwise use the cosine-annealed learning_rate
        learning_rate = np.where(global_step < warmup_steps, warmup_rate,
                                 learning_rate)
    learning_rate = max(learning_rate, min_learn_rate)
    return learning_rate
class WarmUpCosineDecayScheduler(keras.callbacks.Callback):
    """
    Subclass of Callback that schedules the learning rate.
    """
    def __init__(self,
                 learning_rate_base,
                 total_steps,
                 global_step_init=0,
                 warmup_learning_rate=0.0,
                 warmup_steps=0,
                 hold_base_rate_steps=0,
                 min_learn_rate=0,
                 # interval_epoch: the low points (restarts) between cosine-annealing
                 # cycles, expressed as fractions of the total number of steps
                 interval_epoch=[0.05, 0.15, 0.30, 0.50],
                 verbose=0):
        super(WarmUpCosineDecayScheduler, self).__init__()
        # Base (maximum) learning rate
        self.learning_rate_base = learning_rate_base
        # Warmup start learning rate
        self.warmup_learning_rate = warmup_learning_rate
        # Verbosity
        self.verbose = verbose
        # Lower bound for the learning rate
        self.min_learn_rate = min_learn_rate
        # learning_rates records the learning rate after each update, for plotting
        self.learning_rates = []
        self.interval_epoch = interval_epoch
        # Step counter within the current cycle
        self.global_step = global_step_init
        # Step counter across the whole training run
        self.global_step_for_interval = global_step_init
        # Total number of warmup steps
        self.warmup_steps_for_interval = warmup_steps
        # Total number of steps the peak learning rate is held
        self.hold_steps_for_interval = hold_base_rate_steps
        # Total number of training steps
        self.total_steps_for_interval = total_steps
        self.interval_index = 0
        # Length of each cycle, i.e. the gap between consecutive low points
        self.interval_reset = [self.interval_epoch[0]]
        for i in range(len(self.interval_epoch) - 1):
            self.interval_reset.append(self.interval_epoch[i + 1] - self.interval_epoch[i])
        self.interval_reset.append(1 - self.interval_epoch[-1])

    # Update the step counters and record the current learning rate
    def on_batch_end(self, batch, logs=None):
        self.global_step = self.global_step + 1
        self.global_step_for_interval = self.global_step_for_interval + 1
        lr = K.get_value(self.model.optimizer.lr)
        self.learning_rates.append(lr)

    # Update the learning rate
    def on_batch_begin(self, batch, logs=None):
        # Every time a low point is reached, reset the per-cycle parameters
        if self.global_step_for_interval in [0] + [int(i * self.total_steps_for_interval) for i in self.interval_epoch]:
            self.total_steps = self.total_steps_for_interval * self.interval_reset[self.interval_index]
            self.warmup_steps = self.warmup_steps_for_interval * self.interval_reset[self.interval_index]
            self.hold_base_rate_steps = self.hold_steps_for_interval * self.interval_reset[self.interval_index]
            self.global_step = 0
            self.interval_index += 1
        lr = cosine_decay_with_warmup(global_step=self.global_step,
                                      learning_rate_base=self.learning_rate_base,
                                      total_steps=self.total_steps,
                                      warmup_learning_rate=self.warmup_learning_rate,
                                      warmup_steps=self.warmup_steps,
                                      hold_base_rate_steps=self.hold_base_rate_steps,
                                      min_learn_rate=self.min_learn_rate)
        K.set_value(self.model.optimizer.lr, lr)
        if self.verbose > 0:
            print('\nBatch %05d: setting learning '
                  'rate to %s.' % (self.global_step + 1, lr))
# Load the MNIST handwritten-digit dataset
mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)

#-----------------------------#
#   Build the model
#-----------------------------#
inputs = Input([28, 28, 1])
x = Conv2D(32, kernel_size=5, padding='same', activation="relu")(inputs)
x = MaxPooling2D(pool_size=2, strides=2, padding='same')(x)
x = Conv2D(64, kernel_size=5, padding='same', activation="relu")(x)
x = MaxPooling2D(pool_size=2, strides=2, padding='same')(x)
x = Flatten()(x)
x = Dense(1024)(x)
x = Dense(256)(x)
out = Dense(10, activation='softmax')(x)
model = Model(inputs, out)

# Set the optimizer, loss and accuracy metric
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Training parameters
epochs = 10
# Warmup epochs
warmup_epoch = 2
# Batch size
batch_size = 256
# Maximum learning rate
learning_rate_base = 1e-3
sample_count = len(x_train)
# Total number of training steps
total_steps = int(epochs * sample_count / batch_size)
# Number of warmup steps
warmup_steps = int(warmup_epoch * sample_count / batch_size)

# Learning-rate scheduler
warm_up_lr = WarmUpCosineDecayScheduler(learning_rate_base=learning_rate_base,
                                        total_steps=total_steps,
                                        warmup_learning_rate=1e-5,
                                        warmup_steps=warmup_steps,
                                        hold_base_rate_steps=5,
                                        min_learn_rate=1e-6
                                        )

# Train with fit
model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size,
          verbose=1, callbacks=[warm_up_lr])

plt.plot(warm_up_lr.learning_rates)
plt.xlabel('Step', fontsize=20)
plt.ylabel('lr', fontsize=20)
plt.axis([0, total_steps, 0, learning_rate_base * 1.1])
plt.grid()
plt.title('Cosine decay with warmup', fontsize=20)
plt.show()
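To make the restart bookkeeping concrete, here is a small sketch (not part of the training script above) of how the default interval_epoch fractions translate into cycle lengths and restart steps, assuming the total_steps computed above:

# interval_epoch marks the restart points as fractions of total_steps;
# interval_reset holds the relative length of each cosine cycle.
interval_epoch = [0.05, 0.15, 0.30, 0.50]
interval_reset = [interval_epoch[0]]
for i in range(len(interval_epoch) - 1):
    interval_reset.append(interval_epoch[i + 1] - interval_epoch[i])
interval_reset.append(1 - interval_epoch[-1])

print(interval_reset)                                  # relative cycle lengths (~5%, 10%, 15%, 20%, 50% of training)
print([int(i * total_steps) for i in interval_epoch])  # steps at which a new cycle starts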
That concludes this summary of common learning-rate decay schedules for Keras neural networks in Python. For more material on learning-rate decay in Keras, see the other related articles on 腳本之家!