A Collection of Commonly Used Code Snippets for PyTorch Experiments

Updated: 2020-11-19 09:32:56  Author: Gelthin

This article collects code snippets that are commonly used in PyTorch experiments, each illustrated with example code, as a reference for study or work.

1. Substantially speed up PyTorch training

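import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.backends.cudnn.benchmark = True

After adding this line, however, the results seem to change. In benchmark mode cuDNN times several convolution algorithms and picks the fastest, and the selected algorithm can differ between runs, which is a likely cause.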
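
If identical results across runs matter more than speed, one commonly used option is to turn benchmark mode off and request deterministic cuDNN kernels. The sketch below is only an illustration: configure_cudnn is a hypothetical helper name, and the torch import is guarded (an assumption made here) so the snippet also runs where PyTorch is not installed.

```python
try:
    import torch
except ImportError:  # assumption: let the sketch run without PyTorch
    torch = None

def configure_cudnn(reproducible):
    """Hypothetical helper: trade speed for run-to-run reproducibility."""
    settings = {
        "benchmark": not reproducible,   # auto-tuning is fast but may vary
        "deterministic": reproducible,   # ask cuDNN for deterministic kernels
    }
    if torch is not None:
        torch.backends.cudnn.benchmark = settings["benchmark"]
        torch.backends.cudnn.deterministic = settings["deterministic"]
    return settings

chosen = configure_cudnn(reproducible=True)
```

Calling configure_cudnn(False) restores the faster setting used in the snippet above.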

2. Rename an existing log file with a .bak suffix to avoid overwriting it

# from the co-teaching training code
import datetime
import os

txtfile = save_dir + "/" + model_str + "_%s.txt" % str(args.optimizer)
nowTime = datetime.datetime.now().strftime('%Y-%m-%d-%H:%M:%S')
if os.path.exists(txtfile):
  os.rename(txtfile, txtfile + ".bak-%s" % nowTime) # back up the old file instead of overwriting it
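
The same idea can be wrapped in a small function and tried out in a temporary directory. This is a minimal sketch rather than the co-teaching code itself: backup_existing and the file name model_sgd.txt are hypothetical, and the timestamp uses dashes instead of colons as an assumption to keep the name valid on more file systems.

```python
import datetime
import os
import tempfile

def backup_existing(path):
    """Rename path to path.bak-<timestamp> if it exists; return the new name."""
    if not os.path.exists(path):
        return None
    stamp = datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
    backup = "%s.bak-%s" % (path, stamp)
    os.rename(path, backup)
    return backup

# small demonstration in a temporary directory
workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "model_sgd.txt")  # hypothetical file name
with open(log_path, "w") as f:
    f.write("old results\n")
backup_path = backup_existing(log_path)
```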

3. The accuracy function returns a list; unpack the values directly at the call site instead of keeping the list

# from the co-teaching code; the MixMatch_pytorch code has the same function
import torch.nn.functional as F

def accuracy(logit, target, topk=(1,)):
  """Computes the precision@k for the specified values of k"""
  output = F.softmax(logit, dim=1) # not actually needed: softmax keeps the ranking, so logit.topk gives the same pred
  maxk = max(topk)
  batch_size = target.size(0)

  _, pred = output.topk(maxk, 1, True, True)
  pred = pred.t()
  correct = pred.eq(target.view(1, -1).expand_as(pred))

  res = []
  for k in topk:
    # reshape instead of view: correct[:k] may be non-contiguous after t()
    correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
    # caveat: when batches differ in size, the mean of the per-batch accuracies is not the accuracy over the whole dataset
    res.append(correct_k.mul_(100.0 / batch_size))
  return res

prec1, = accuracy(logit, labels, topk=(1,)) # the trailing comma unpacks the one-element list
prec1, prec5 = accuracy(logits, labels, topk=(1, 5))
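
The batch-size caveat noted in the function can be handled by weighting each batch by its size. The sketch below uses plain Python numbers; AccuracyMeter is a hypothetical helper, not part of the co-teaching or MixMatch code.

```python
class AccuracyMeter:
    """Hypothetical helper: dataset-level accuracy from per-batch percentages."""
    def __init__(self):
        self.correct = 0.0
        self.total = 0

    def update(self, batch_acc_percent, batch_size):
        # convert the percentage back to a count of correct samples
        self.correct += batch_acc_percent / 100.0 * batch_size
        self.total += batch_size

    @property
    def average(self):
        return 100.0 * self.correct / self.total if self.total else 0.0

batches = [(100.0, 10), (0.0, 30)]  # (accuracy in %, batch size)
meter = AccuracyMeter()
for acc, size in batches:
    meter.update(acc, size)
naive_mean = sum(acc for acc, _ in batches) / len(batches)
```

Here the naive mean of the two batch accuracies is 50%, while the size-weighted value is 25%. With the accuracy function above, a call such as meter.update(prec1.item(), labels.size(0)) would feed it one batch at a time.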

4. Use a logger file to record the metrics of every epoch

# from the Pytorch_MixMatch code
import matplotlib.pyplot as plt
import numpy as np

class Logger(object):
  '''Save training process to log file with simple plot function.'''
  def __init__(self, fpath, title=None, resume=False): 
    self.file = None
    self.resume = resume
    self.title = '' if title is None else title
    if fpath is not None:
      if resume: 
        self.file = open(fpath, 'r') 
        name = self.file.readline()
        self.names = name.rstrip().split('\t')
        self.numbers = {}
        for _, name in enumerate(self.names):
          self.numbers[name] = []

        for numbers in self.file:
          numbers = numbers.rstrip().split('\t')
          for i in range(0, len(numbers)):
            self.numbers[self.names[i]].append(numbers[i])
        self.file.close()
        self.file = open(fpath, 'a') 
      else:
        self.file = open(fpath, 'w')

  def set_names(self, names):
    if self.resume:
      return # names and numbers were already loaded from the existing file
    # initialize numbers as empty list
    self.numbers = {}
    self.names = names
    for _, name in enumerate(self.names):
      self.file.write(name)
      self.file.write('\t')
      self.numbers[name] = []
    self.file.write('\n')
    self.file.flush()


  def append(self, numbers):
    assert len(self.names) == len(numbers), 'Numbers do not match names'
    for index, num in enumerate(numbers):
      self.file.write("{0:.4f}".format(num))
      self.file.write('\t')
      self.numbers[self.names[index]].append(num)
    self.file.write('\n')
    self.file.flush()

  def plot(self, names=None):  
    names = self.names if names is None else names
    numbers = self.numbers
    for _, name in enumerate(names):
      x = np.arange(len(numbers[name]))
      plt.plot(x, np.asarray(numbers[name]))
    plt.legend([self.title + '(' + name + ')' for name in names])
    plt.grid(True)

  def close(self):
    if self.file is not None:
      self.file.close()
# usage
logger = Logger(new_folder+'/log_for_%s_WebVision1M.txt'%data_type, title=title)
logger.set_names(['epoch', 'val_acc', 'val_acc_ImageNet'])
for epoch in range(100):
  logger.append([epoch, val_acc, val_acc_ImageNet])
logger.close()
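
Because the log is plain tab-separated text, it can be read back later for analysis without the Logger class. The sketch below is an illustration under that assumption; read_log is a hypothetical helper, and the sample file imitates the layout that set_names and append produce (each field followed by a tab).

```python
import os
import tempfile

def read_log(path):
    """Parse a tab-separated log into {column name: [float, ...]}."""
    with open(path) as f:
        names = f.readline().rstrip().split("\t")
        columns = {name: [] for name in names}
        for line in f:
            values = line.rstrip().split("\t")
            for name, value in zip(names, values):
                columns[name].append(float(value))
    return columns

# write a tiny log in the same layout, then read it back
path = os.path.join(tempfile.mkdtemp(), "log.txt")
with open(path, "w") as f:
    f.write("epoch\tval_acc\t\n")
    f.write("0.0000\t71.2500\t\n")
    f.write("1.0000\t73.5000\t\n")
history = read_log(path)
```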

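5. Refactor code with the argparse command-line tool: use different arguments to adapt to different datasets, optimizers, and settings, and avoid keeping several highly redundant copies of the code

One pitfall of argparse is that a boolean value cannot be passed directly: flag=False is parsed as the string "False", and a non-empty string still counts as True.

The following declaration works around this; it comes from the ICML-19 SGC code on GitHub: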

parser.add_argument('--test', action='store_true', default=False, help='inductive training.')

When --test appears on the command line, args.test is True.

When --test does not appear, args.test is False.
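
Both behaviours can be checked in a few lines. In this sketch the --flag option is a hypothetical name used to reproduce the pitfall, and the argument lists are passed to parse_args directly instead of coming from a real command line.

```python
import argparse

parser = argparse.ArgumentParser()
# type=bool calls bool() on the raw string, and any non-empty string is truthy
parser.add_argument('--flag', type=bool, default=False)
# store_true takes no value: the presence of the option sets it to True
parser.add_argument('--test', action='store_true', default=False)

pitfall = parser.parse_args(['--flag', 'False'])
with_test = parser.parse_args(['--test'])
without_test = parser.parse_args([])
```

Here pitfall.flag ends up True even though the text "False" was passed, while --test behaves as described above.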

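6. Use a shell variable to select the GPU, which makes it easy to run programs one after another from a shell script and leave them running unattended. Alternatively, open several screen sessions to run multiple programs in parallel on the same card and make full use of its memory.

Run the following statements on the command line, or put them in a shell script (do not forget export):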

export CUDA_VISIBLE_DEVICES=1   # make only GPU 1 visible (numbering starts at 0), i.e. do not run on GPU 0
export CUDA_VISIBLE_DEVICES=0,1 # make GPUs 0 and 1 visible; a program still runs on the first visible GPU unless its code explicitly uses the second one

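As a rule of thumb, when several programs run in parallel, each one slows down even if GPU memory is plentiful, possibly because CPU and host memory are also limiting factors.

GPU memory usage is not the bottleneck here; look mainly at GPU utilization (that is, how busy the compute units are). If it reaches 99%, too many programs are running.

Use watch nvidia-smi to monitor whether each program is running normally.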
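
The variable can also be set per process from Python, which is convenient when one launcher script starts several jobs. This is a minimal sketch: run_on_gpu is a hypothetical helper, and the child process below is a stand-in for a training script that only reports the value it sees.

```python
import os
import subprocess
import sys

def run_on_gpu(gpu_ids, argv):
    """Run argv as a child process that only sees the given GPU ids."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_ids))
    return subprocess.run(argv, env=env, stdout=subprocess.PIPE,
                          universal_newlines=True)

# stand-in for a training script: report what the child process sees
child = [sys.executable, "-c",
         "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
result = run_on_gpu(1, child)
visible = result.stdout.strip()
```

Only the child's environment is changed, so the launcher itself and other jobs are unaffected.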

7. Use a Python timestamp when saving result files so that different runs can be told apart

See the co-training code I wrote a long time ago.
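
The co-training code itself is not shown here, so the following is only a sketch of the general idea. timestamped_path, the results directory, and the co_training prefix are all hypothetical names, and the time is fixed in the demonstration so that the result is predictable.

```python
import datetime
import os

def timestamped_path(directory, prefix, ext=".txt", now=None):
    """Build directory/prefix_<timestamp>ext; now defaults to the current time."""
    now = now or datetime.datetime.now()
    stamp = now.strftime("%Y-%m-%d-%H-%M-%S")
    return os.path.join(directory, "%s_%s%s" % (prefix, stamp, ext))

fixed = datetime.datetime(2020, 11, 19, 9, 32, 56)  # fixed time for the demo
example = timestamped_path("results", "co_training", now=fixed)
```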

8. Save all print output from the terminal during training to a log file (following DIEN):

mkdir dnn_save_path
mkdir dnn_best_model
CUDA_VISIBLE_DEVICES=0 /usr/bin/python2.7 script/train.py train DIEN >train_dein2.log 2>&1 &

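Piping the output through tee instead saves it to the file and prints it to the terminal at the same time (as written, only stdout goes through the pipe):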

python script/train.py train DIEN | tee train_dein2.log
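
The same effect can be approximated inside a Python script by writing to several streams at once. The sketch below is not part of DIEN: Tee is a hypothetical class, and two in-memory buffers stand in for the terminal and the log file so the example is self-contained.

```python
import io

class Tee:
    """Hypothetical helper: forward every write to several streams."""
    def __init__(self, *streams):
        self.streams = streams

    def write(self, text):
        for stream in self.streams:
            stream.write(text)
        return len(text)

    def flush(self):
        for stream in self.streams:
            stream.flush()

console = io.StringIO()   # stand-in for sys.stdout
log_file = io.StringIO()  # stand-in for open("train.log", "w")
tee = Tee(console, log_file)
print("epoch 0 loss 1.23", file=tee)
```

In a real script the two buffers would be replaced by sys.stdout and an open log file.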

9. git clone can be used to download code from GitHub, and it is faster (learned while downloading DIEN).

git clone https://github.com/mouna99/dien.git   # downloads the code repository from GitHub

10. (From DIEN) Command-line arguments do not have to be read with argparse; they can also be read directly from sys.argv, although then only positional arguments are available, not keyword arguments.

### run.sh ###
CUDA_VISIBLE_DEVICES=0 /usr/bin/python2.7 script/train.py train DIEN >train_dein2.log 2>&1 &
#############

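import random
import sys

import numpy
import tensorflow as tf

# train() and test() are defined elsewhere in the DIEN training script
if __name__ == '__main__':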
  if len(sys.argv) == 4:
    SEED = int(sys.argv[3]) # 0,1,2,3
  else:
    SEED = 3
  tf.set_random_seed(SEED)
  numpy.random.seed(SEED)
  random.seed(SEED)
  if sys.argv[1] == 'train':
    train(model_type=sys.argv[2], seed=SEED)
  elif sys.argv[1] == 'test':
    test(model_type=sys.argv[2], seed=SEED)
  else:
    print('do nothing...')
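
The positional handling can be isolated in a function that takes an argv-style list, which makes it easy to try without launching the script. This is a sketch under stated assumptions: parse_argv and its usage message are hypothetical, and the default seed of 3 is taken from the code above.

```python
def parse_argv(argv, default_seed=3):
    """Return (mode, model_type, seed) from a sys.argv-style list."""
    if len(argv) < 3:
        raise SystemExit("usage: %s {train|test} MODEL_TYPE [SEED]" % argv[0])
    mode, model_type = argv[1], argv[2]
    # the seed is optional and must come third, since there are no keywords
    seed = int(argv[3]) if len(argv) >= 4 else default_seed
    return mode, model_type, seed

# the same arguments that run.sh passes, written out as a list
parsed = parse_argv(["script/train.py", "train", "DIEN"])
parsed_with_seed = parse_argv(["script/train.py", "test", "DIEN", "0"])
```

In the script itself the call would be parse_argv(sys.argv).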

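11. A code pattern: time_point is a parameter, and there are two ways to handle it

One is to branch on it at the call site; the other is to pass it straight through:

# branch outside: suitable when the number of returned values differs
if time_point:
  A, B, C = f1(x, y, time_point=True)
else:
  A, B = f1(x, y, time_point=False)
# pass through: suitable when the number and types of returned values are the same
C, D = f2(x, y, time_point=time_point)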
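
To make the two patterns concrete, here is a runnable sketch. The bodies of f1 and f2 are invented purely for illustration (simple arithmetic on two numbers); only their calling conventions follow the text above.

```python
def f1(x, y, time_point=False):
    """Returns three values when time_point is set, otherwise two."""
    total, diff = x + y, x - y
    if time_point:
        return total, diff, x * y
    return total, diff

def f2(x, y, time_point=False):
    """Always returns two values; the flag only changes how they are computed."""
    scale = 2 if time_point else 1
    return scale * (x + y), scale * (x - y)

time_point = True
if time_point:
    A, B, C = f1(3, 2, time_point=True)
else:
    A, B = f1(3, 2, time_point=False)
D, E = f2(3, 2, time_point=time_point)
```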

12. Write a shell script to tune hyperparameters, from [NIPS-20 Grand]

mkdir -p cora
for num in $(seq 0 99); do
  python train_grand.py --hidden 32 --lr 0.01 --patience 200 --seed $num --dropnode_rate 0.5 > cora/"$num".txt
done
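
The same sweep can be prepared from Python when the launcher needs more logic than a shell loop. This sketch only builds the command lines and output paths and does not start any training; build_runs is a hypothetical helper, while the script name and flags are copied from the loop above.

```python
import os
import sys

def build_runs(seeds, out_dir="cora"):
    """Return (argv, output file) pairs, one per seed."""
    runs = []
    for seed in seeds:
        argv = [sys.executable, "train_grand.py",
                "--hidden", "32", "--lr", "0.01", "--patience", "200",
                "--seed", str(seed), "--dropnode_rate", "0.5"]
        runs.append((argv, os.path.join(out_dir, "%d.txt" % seed)))
    return runs

runs = build_runs(range(100))
```

Each pair could then be passed to subprocess.run with stdout redirected to the listed file.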

13. Results may differ slightly depending on whether CUDA is used.

CUDA has its own random seed parameter. When CUDA is not used, that seed has no effect, so the results can differ.

This was observed in experiments with NIPS-20 Grand (2020-11-18).
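
A common way to keep the seeds consistent is to set them all in one place. The sketch below is an assumption about how this might be organised, not the Grand code: set_seed is a hypothetical helper, and the torch import is guarded so the snippet also runs without PyTorch installed.

```python
import random

try:
    import torch
except ImportError:  # assumption: let the sketch run without PyTorch
    torch = None

def set_seed(seed, use_cuda=True):
    """Seed Python's RNG and, if available, PyTorch's CPU and CUDA RNGs."""
    random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)
        if use_cuda and torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)  # has an effect only with CUDA

set_seed(3)
first = random.random()
set_seed(3)
second = random.random()
```

Even with identical seeds, CPU and GPU runs use different random number generators and kernels, so small differences between the two can remain.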
