TensorFlow: resuming interrupted model training, and ways to load checkpoint models

Updated: 2020-05-26 10:55:20  Author: Sesen_s


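In deep learning, training a model usually takes a long time, and training can be interrupted for many reasons. The following describes how to resume training from the last checkpoint.

Method 1: load the model without specifying the iteration count (the latest checkpoint is used by default)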

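import tensorflow as tf

# ... build the model graph here first: Saver() needs the variables to exist ...

# Create the Saver; max_to_keep=1 keeps only the most recent checkpoint
saver = tf.train.Saver(max_to_keep=1)

# Start a session
with tf.Session() as sess:
    # saver.restore(sess, './log/' + "model_savemodel.cpkt-" + str(20000))
    sess.run(tf.global_variables_initializer())  # initialize first; restore() below overwrites the saved variables
    ckpt = tf.train.get_checkpoint_state('./log/')  # pass the directory containing the `checkpoint` file, not the file itself
    if ckpt and ckpt.model_checkpoint_path:
        saver.restore(sess, ckpt.model_checkpoint_path)  # model_checkpoint_path normally points to the latest checkpoint
        print("Model restored...")
    else:
        print('No Model')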
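`tf.train.get_checkpoint_state()` works by reading a small text file named `checkpoint` that `Saver` writes into the checkpoint directory; its lines look like `model_checkpoint_path: "..."` and `all_model_checkpoint_paths: "..."`. The stdlib-only sketch below illustrates that lookup. It is not TensorFlow's implementation, and `parse_checkpoint_text` / `read_checkpoint_state` are hypothetical helper names.

```python
import os
import re

# Matches lines such as:  model_checkpoint_path: "model_savemodel.cpkt-20000"
_ENTRY = re.compile(r'^\s*(\w+)\s*:\s*"(.*)"\s*$')


def parse_checkpoint_text(text):
    """Return (latest_path, all_paths) from the contents of a `checkpoint` file."""
    latest, all_paths = None, []
    for line in text.splitlines():
        m = _ENTRY.match(line)
        if not m:
            continue  # skip blank or unrecognized lines
        key, value = m.groups()
        if key == "model_checkpoint_path":
            latest = value
        elif key == "all_model_checkpoint_paths":
            all_paths.append(value)
    return latest, all_paths


def read_checkpoint_state(ckpt_dir):
    """Read <ckpt_dir>/checkpoint; return None if it does not exist."""
    path = os.path.join(ckpt_dir, "checkpoint")
    if not os.path.isfile(path):
        return None  # corresponds to the 'No Model' branch above
    with open(path, encoding="utf-8") as f:
        return parse_checkpoint_text(f.read())
```

With `max_to_keep=1` the file normally lists a single entry, so the latest path and the only listed path coincide.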

Method 2: specify the iteration count of the checkpoint to load

First check the log folder for the iteration count of the saved checkpoint; in this case it is 111000.

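import tensorflow as tf

# ... build the same model graph here first ...

# Create the Saver
saver = tf.train.Saver(max_to_keep=1)

# Start a session
with tf.Session() as sess:
    # Initialize BEFORE restoring: running the initializer after restore()
    # would overwrite the restored weights with fresh initial values
    sess.run(tf.global_variables_initializer())
    saver.restore(sess, './log/' + "model_savemodel.cpkt-" + str(111000))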
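Rather than reading the iteration count off the folder by hand, it can be recovered from the file names. A minimal sketch, assuming the V2 checkpoint format (each save leaves `<prefix>-<step>.index`, `<prefix>-<step>.meta` and `<prefix>-<step>.data-*` files) and the `model_savemodel.cpkt` prefix used above; the helper names are hypothetical.

```python
import os
import re


def steps_from_filenames(filenames, prefix="model_savemodel.cpkt"):
    """Return the sorted step numbers N for which '<prefix>-N.index' is present."""
    pattern = re.compile(re.escape(prefix) + r"-(\d+)\.index$")
    steps = []
    for name in filenames:
        m = pattern.match(name)
        if m:
            steps.append(int(m.group(1)))
    return sorted(steps)


def checkpoint_prefix(log_dir, step, prefix="model_savemodel.cpkt"):
    """Build the path passed to saver.restore(), e.g. './log/model_savemodel.cpkt-111000'."""
    return os.path.join(log_dir, "{}-{}".format(prefix, step))


def available_steps(log_dir, prefix="model_savemodel.cpkt"):
    """List the steps that can be restored from the files in log_dir."""
    return steps_from_filenames(os.listdir(log_dir), prefix)
```

For example, `checkpoint_prefix('./log', max(available_steps('./log')))` would build the same path as the hard-coded `str(111000)` above when 111000 is the newest step in the folder.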

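After the model is loaded, training continues from the variable values saved at the checkpoint. Can the number of remaining iterations then be reduced accordingly?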
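In principle yes, provided the loop counter resumes from the saved step instead of restarting at 0. When checkpoints are written with `saver.save(sess, path, global_step=step)`, the step is appended to the file name as `-<step>`, so it can be parsed back after restoring. A hedged sketch with hypothetical helper names, assuming that naming:

```python
import re


def step_from_checkpoint_path(path):
    """Extract N from a checkpoint prefix ending in '-N'; return 0 if there is none."""
    m = re.search(r"-(\d+)$", path)
    return int(m.group(1)) if m else 0


def remaining_steps(path, total_steps):
    """Iterations still to run when resuming from `path` toward `total_steps`."""
    return max(total_steps - step_from_checkpoint_path(path), 0)


def resume_range(path, total_steps):
    """Global step numbers to run after restoring, continuing the original count."""
    return range(step_from_checkpoint_path(path), total_steps)
```

With a total of 1000000 iterations and a checkpoint at 167000, `remaining_steps` gives 833000. An alternative is to keep the step in a variable, for example one created with `tf.train.get_or_create_global_step()`, so that it is saved and restored together with the weights.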

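Results of resumed training:

Training had reached 167000 iterations when the model was loaded and training resumed, with the iteration count set to 10000 (logging every d_step=1000 iterations). The originally configured total was 1000000 iterations, of which 167000 had already been completed. Note that the Iter counter in the log restarts at 0.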

Model restored...
Iter:0, D_loss:0.5139875411987305, G_loss:2.8023970127105713
Iter:1000, D_loss:0.4400891065597534, G_loss:2.781547784805298
Iter:2000, D_loss:0.5169454216957092, G_loss:2.58009934425354
Iter:3000, D_loss:0.4507023096084595, G_loss:2.584151268005371
Iter:4000, D_loss:0.5746167898178101, G_loss:2.5365757942199707
Iter:5000, D_loss:0.5288565158843994, G_loss:2.426676034927368
Iter:6000, D_loss:0.549595057964325, G_loss:2.820535659790039
Iter:7000, D_loss:0.32620012760162354, G_loss:2.540236473083496
Iter:8000, D_loss:0.4363398551940918, G_loss:2.5880446434020996
Iter:9000, D_loss:0.569464921951294, G_loss:2.5133447647094727
done！

The saved sample images are again numbered from the start, so they overwrite the earlier images.
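One way to avoid the overwrite is to name each sample by its global step (the restored step plus the current loop counter) rather than by the loop counter alone. The author's sampling code is not shown, so `sample_image_path`, the zero-padded naming and the `.png` extension are assumptions:

```python
import os


def sample_image_path(out_dir, start_step, local_iter, ext="png"):
    """Name a sample image by its global step so a resumed run never reuses a name.

    start_step: step restored from the checkpoint (0 for a fresh run).
    local_iter: loop counter of the current run, which restarts at 0 after resuming.
    """
    global_step = start_step + local_iter
    # Zero-padding keeps the files in training order when sorted by name
    return os.path.join(out_dir, "{:08d}.{}".format(global_step, ext))
```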

[Figure: the earlier sample images with the corresponding numbers]

If anyone has a better approach, please do share it.

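Supplementary: loading a trained TensorFlow model and its parameters (reading a checkpoint)

Checkpoint save path

The model_path directory holds checkpoints saved at several different iteration counts.

1. Load the most recently saved model

That is, model-9400 in the figure above.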

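import tensorflow as tf

model_path = "./model"  # hypothetical: the directory that holds the checkpoints

checkpoint_file = tf.train.latest_checkpoint(model_path)  # e.g. ".../model-9400"
# Rebuild the saved graph from the .meta file; this also returns a Saver
saver = tf.train.import_meta_graph("{}.meta".format(checkpoint_file))
graph = tf.get_default_graph()  # the imported ops now live in the default graph

sess = tf.Session()
# No global_variables_initializer() needed: restore() assigns every saved variable
saver.restore(sess, checkpoint_file)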
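`tf.train.latest_checkpoint()` returns `None` when no checkpoint is found, in which case the code above would try to open `None.meta`. A small guard can check the prefix first. This hypothetical sketch checks only the fixed-name `.meta` and `.index` files, since the number of `.data-*` shards varies:

```python
import os


def required_files(prefix):
    """Fixed-name files that import_meta_graph() and restore() need for a V2 checkpoint."""
    # .meta holds the graph; .index maps variable names to data in the .data-* shards
    return [prefix + ".meta", prefix + ".index"]


def missing_files(prefix, exists=os.path.exists):
    """Return the required files that are absent; an empty list means it is safe to load."""
    if prefix is None:
        # tf.train.latest_checkpoint() returns None when no checkpoint is found
        return ["<no checkpoint found>"]
    return [p for p in required_files(prefix) if not exists(p)]
```

A typical use is to call `missing_files(checkpoint_file)` right after `latest_checkpoint()` and only import and restore when it returns an empty list.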

2. Load the model from a specific iteration

For example, model-9200 in the figure above.

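import os

import tensorflow as tf

model_path = "./model"  # hypothetical: the directory that holds the checkpoints

# Pick a specific iteration instead of the latest one
checkpoint_file = os.path.join(model_path, 'model-9200')
saver = tf.train.import_meta_graph("{}.meta".format(checkpoint_file))
graph = tf.get_default_graph()  # the imported ops now live in the default graph

sess = tf.Session()
saver.restore(sess, checkpoint_file)  # assigns every variable saved in the checkpoint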

Retrieving tensors by name

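# Names of all nodes (operations) in the current graph, not only variables
tensor_name_list = [node.name for node in graph.as_graph_def().node]
# Print them to find the names you need
print(tensor_name_list)

# Get the input_x and input_y tensors: output 0 of the ops with those names
# (equivalent to graph.get_tensor_by_name("input_x:0"))
input_x = graph.get_operation_by_name("input_x").outputs[0]
input_y = graph.get_operation_by_name("input_y").outputs[0]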
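A tensor name in TensorFlow 1.x has the form `<op_name>:<output_index>`, which is why `get_operation_by_name("input_x").outputs[0]` and `get_tensor_by_name("input_x:0")` refer to the same tensor. The stdlib sketch below illustrates that rule, plus a filter for trimming the long printed name list by name scope; the helpers and any scope names used with them are hypothetical:

```python
def split_tensor_name(name):
    """Split 'op_name:index' (e.g. 'input_x:0') into ('input_x', 0)."""
    op_name, sep, index = name.rpartition(":")
    if not sep or not index.isdigit():
        raise ValueError("expected '<op_name>:<output_index>', got %r" % name)
    return op_name, int(index)


def tensor_name(op_name, index=0):
    """Build the name accepted by graph.get_tensor_by_name(), e.g. 'input_x:0'."""
    return "{}:{}".format(op_name, index)


def filter_names(names, scope):
    """Keep node names inside a name scope, e.g. scope='discriminator' (hypothetical)."""
    prefix = scope.rstrip("/") + "/"
    return [n for n in names if n == scope or n.startswith(prefix)]
```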
