A Detailed Guide to Migrating to PyTorch 0.4.0

Updated: 2019-06-16 11:38:46  Author: 吃不飽吃不飽


Overview

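The PyTorch 0.4 release is a very large update, so code written for earlier versions needs some changes. The main change is the merging of Variable and Tensor. The release also adds Windows support, support for scalar (0-dimensional) tensors, bug fixes and performance improvements. Because of the Variable/Tensor merge, older code can break and has to be migrated, but the migration cost is actually small.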

Merging Tensor and Variable

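Strictly speaking, from the viewpoint of the old versions (0.1 to 0.3), a Tensor is now what a Variable used to be, with requires_grad=False by default. torch.Tensor and torch.autograd.Variable are now effectively the same class, with no essential difference between them. In other words, there is no longer a "plain" Tensor: every Tensor supports autograd, and wrapping a Tensor in a Variable no longer serves any purpose.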
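A quick way to see this, as a minimal sketch assuming a current PyTorch install: the legacy Variable constructor is still accepted for backward compatibility, but what it returns is an ordinary torch.Tensor.

```python
import torch
from torch.autograd import Variable

t = torch.ones(2)
v = Variable(t)                        # legacy wrapper: still accepted
print(type(v))                         # <class 'torch.Tensor'>
print(v.requires_grad)                 # False
w = Variable(t, requires_grad=True)    # the old keyword still works
print(type(w), w.requires_grad)        # <class 'torch.Tensor'> True

# The 0.4-style equivalent needs no wrapper at all
u = torch.ones(2, requires_grad=True)
print(type(u) is type(w))              # True
```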

Checking a Tensor's type

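To check a tensor's concrete type, use the built-in isinstance() or the x.type() method; the built-in type() no longer shows the specific tensor type.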

>>> import torch
>>> x = torch.DoubleTensor([1, 1, 1])
>>> print(type(x))  # was torch.DoubleTensor
<class 'torch.Tensor'>
>>> print(x.type())  # OK: 'torch.DoubleTensor'
torch.DoubleTensor
>>> print(isinstance(x, torch.DoubleTensor))  # OK: True
True

requires_grad is now an attribute of Tensor

>>> import torch
>>> x = torch.ones(1)
>>> x.requires_grad  # False by default
False
>>> y = torch.ones(1)
>>> z = x + y
>>> # naturally, z.requires_grad is False as well
>>> z.requires_grad
False
>>> # nothing requires grad, so backward() raises an error
>>> z.backward()
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
>>>
>>> # requires_grad can be passed as an argument when constructing a tensor
>>> w = torch.ones(1, requires_grad=True)
>>> w.requires_grad
True
>>> total = w + z
>>> total.requires_grad
True
>>> # now backward() works
>>> total.backward()
>>> w.grad
tensor([ 1.])
>>> # x, y and z do not require grad, so no gradient was computed for them
>>> z.grad == x.grad == y.grad == None
True

To make an existing Tensor require gradients, call its in-place method .requires_grad_() (note the trailing underscore).
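A minimal sketch of switching gradient tracking on for a tensor that was created without it; the tensor and the toy computation are only illustrations:

```python
import torch

x = torch.ones(3)
print(x.requires_grad)   # False
x.requires_grad_()       # in-place: flips the flag and returns x itself
print(x.requires_grad)   # True

y = (x * 2).sum()        # a toy scalar computation
y.backward()
print(x.grad)            # tensor([2., 2., 2.])
```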

Don't use .data casually

Previously, .data was used to get the Tensor out of a Variable, but now the two have been merged. .data now returns a new Tensor with requires_grad=False that shares memory with the original Tensor. This is unsafe, because:

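y = x.data  # x requires autograd
# y shares memory with x but is not tracked by autograd: an in-place change
# to y silently changes x as well, and autograd cannot detect it, so if x is
# needed in the backward pass, the computed gradients can be silently wrong.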

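The recommended alternative is x.detach(). It also returns a Tensor that shares memory with x and has requires_grad=False, but autograd still tracks in-place changes made through it: if x is needed for a backward pass, autograd raises an error instead of silently computing wrong gradients.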
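The difference can be reproduced with sigmoid, whose backward pass reuses its own output. A minimal sketch, assuming a current PyTorch version; the two function names are illustrative only:

```python
import torch

def grad_after_zeroing_via_data():
    a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
    out = a.sigmoid()          # sigmoid's backward reuses `out`
    out.data.zero_()           # in-place change that autograd cannot see
    out.sum().backward()       # runs without any complaint
    return a.grad              # computed from the zeroed output: wrong

def grad_after_zeroing_via_detach():
    a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
    out = a.sigmoid()
    out.detach().zero_()       # same storage, but autograd records the change
    try:
        out.sum().backward()   # autograd detects the modified saved output
    except RuntimeError as err:
        return str(err)
    return None

print(grad_after_zeroing_via_data())    # tensor([0., 0., 0.])
print(grad_after_zeroing_via_detach())  # the RuntimeError message
```

With .data the bug goes unnoticed; with .detach() it surfaces as an error at the point where the wrong gradient would have been computed.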

Scalar support

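This is a very important change. Previously, indexing into a 1-D Tensor returned a Python number, but indexing into a Variable returned a vector of size (1,). Reductions were just as inconsistent: tensor.sum() returned a Python number, while variable.sum() returned a vector of size (1,).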

A scalar is a 0-dimensional Tensor, so it cannot be created with the old constructors; use torch.tensor instead (note the lowercase t).
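A minimal sketch of the new scalar behavior, assuming a current PyTorch version; the values are arbitrary:

```python
import torch

s = torch.tensor(3.1416)            # create a scalar directly
print(s.dim(), s.size())            # 0 torch.Size([])
v = torch.tensor([3])               # compare: a vector of size 1
print(v.dim(), v.size())            # 1 torch.Size([1])

vector = torch.tensor([2, 3, 4, 5])
elem = vector[3]                    # indexing a vector now gives a 0-dim tensor
print(elem.dim(), elem.item())      # 0 5

total = torch.tensor([2, 3]).sum()  # reductions also return 0-dim tensors
print(total.dim(), total.item())    # 0 5
```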


Introducing scalars unifies these return types: indexing into a vector and reducing a tensor both yield a 0-dimensional Tensor.
Key points:
1. To get the value of a one-element tensor as a Python number, use .item().
2. To create a scalar, use torch.tensor(number).
3. torch.tensor(list) creates a tensor from a list.

Accumulating the loss

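Previously, the loss was usually accumulated (to monitor its magnitude) with total_loss += loss.data[0]. Why the odd .data[0]? Because loss was a Variable of size (1,): .data gave the wrapped Tensor and [0] gave the Python number. Now loss is a 0-dimensional tensor, so accumulate it with loss.item() instead.
This matters: if you add loss directly (total_loss += loss), total_loss becomes part of the autograd graph and keeps every iteration's graph alive, so memory use grows as training proceeds and can eventually run out. Since total_loss is only used for monitoring, there is no need to keep that graph.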
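The difference shows up in a toy loop. The parameter w, the random data and the squared-error loss below are hypothetical stand-ins for a real model and dataset:

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)         # toy parameter (hypothetical)
data = [torch.randn(3) for _ in range(5)]      # toy batches (hypothetical)

total_tensor = 0    # accumulates tensors: keeps every graph alive
total_float = 0.0   # accumulates Python floats: keeps nothing
for x in data:
    loss = ((w * x).sum() - 1.0) ** 2          # 0-dim tensor with a grad_fn
    total_tensor += loss                       # result is still part of the graph
    total_float += loss.item()                 # plain number, graph can be freed

print(type(total_float).__name__)              # float
print(total_tensor.grad_fn is not None)        # True: history is still attached
```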

Deprecation of volatile

The volatile flag no longer has any effect. It has been replaced by torch.no_grad(), torch.set_grad_enabled(grad_mode) and similar functions.

>>> x = torch.zeros(1, requires_grad=True)
>>> with torch.no_grad():
...   y = x * 2
>>> y.requires_grad
False
>>>
>>> is_train = False
>>> with torch.set_grad_enabled(is_train):
...   y = x * 2
>>> y.requires_grad
False
>>> torch.set_grad_enabled(True) # this can also be used as a function
>>> y = x * 2
>>> y.requires_grad
True
>>> torch.set_grad_enabled(False)
>>> y = x * 2
>>> y.requires_grad
False
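Because torch.set_grad_enabled takes a boolean, one code path can serve both training and evaluation. A minimal sketch; forward_pass and the nn.Linear model are hypothetical illustrations:

```python
import torch
import torch.nn as nn

def forward_pass(model, x, is_train):
    # Gradient tracking follows the phase flag; this replaces volatile=True at eval time
    with torch.set_grad_enabled(is_train):
        return model(x)

model = nn.Linear(3, 1)    # toy model (hypothetical)
x = torch.randn(5, 3)
train_out = forward_pass(model, x, is_train=True)
eval_out = forward_pass(model, x, is_train=False)
print(train_out.requires_grad, eval_out.requires_grad)  # True False
```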

dtypes, devices and NumPy-style creation functions

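dtype stands for data type; each dtype corresponds to one of the old tensor types (for example, torch.float32 corresponds to torch.FloatTensor and torch.int64 to torch.LongTensor).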

A tensor's dtype is available through its .dtype attribute.
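The correspondence can be checked directly. The dict below is a sketch of the mapping for CPU tensors (CUDA tensors report torch.cuda.* names instead); the comments list the shorter dtype aliases:

```python
import torch

# dtype -> legacy CPU tensor type name
DTYPE_TO_LEGACY = {
    torch.float32: "torch.FloatTensor",   # alias: torch.float
    torch.float64: "torch.DoubleTensor",  # alias: torch.double
    torch.float16: "torch.HalfTensor",    # alias: torch.half
    torch.uint8:   "torch.ByteTensor",
    torch.int8:    "torch.CharTensor",
    torch.int16:   "torch.ShortTensor",   # alias: torch.short
    torch.int32:   "torch.IntTensor",     # alias: torch.int
    torch.int64:   "torch.LongTensor",    # alias: torch.long
}

for dtype, legacy_name in DTYPE_TO_LEGACY.items():
    t = torch.zeros(2, dtype=dtype)
    print(f"{str(t.dtype):15} -> {t.type()}")
```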

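Devices changed as well: previously, device placement was expressed only through .cpu() or .cuda(); now a device is described by its own torch.device object, so we can write: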

>>> device = torch.device("cuda:1")
>>> x = torch.randn(3, 3, dtype=torch.float64, device=device)
>>> x
tensor([[-0.6344, 0.8562, -1.2758],
        [ 0.8414, 1.7962, 1.0589],
        [-0.1369, -1.0462, -0.4373]], dtype=torch.float64, device='cuda:1')
>>> x.requires_grad  # default is False
False
>>> x = torch.zeros(3, requires_grad=True)
>>> x.requires_grad
True
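A torch.device is only a description of where a tensor lives, so it can be built and inspected even on a machine without the GPU it names. A minimal sketch; the device indices are just examples:

```python
import torch

cpu = torch.device("cpu")
gpu1 = torch.device("cuda", 1)          # same as torch.device("cuda:1"); needs no GPU to build
print(cpu.type, cpu.index)              # cpu None
print(gpu1.type, gpu1.index)            # cuda 1
print(gpu1 == torch.device("cuda:1"))   # True

x = torch.zeros(2, device=cpu)
print(x.device)                         # cpu
```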

New ways to create Tensors

The main change is that creation functions accept dtype and device arguments directly.

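A sketch of several creation functions with explicit dtype and device. It falls back to the CPU when no GPU is present, so it runs anywhere; the shapes and fill values are arbitrary:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.zeros(2, 3, dtype=torch.int64, device=device)
b = torch.ones(4, dtype=torch.float16, device=device)
c = torch.full((2, 2), 7.0, dtype=torch.float64, device=device)
d = torch.randn(3, requires_grad=True, device=device)  # float32 unless the default dtype was changed

for t in (a, b, c, d):
    print(t.dtype, t.device, tuple(t.shape), t.requires_grad)
```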

Creating Tensors with torch.tensor

torch.tensor is the counterpart of numpy.array. It is used to:
1. create a Tensor from data in a Python list;
2. create a scalar.

# create a tensor from a list
>>> cuda = torch.device("cuda")
>>> torch.tensor([[1], [2], [3]], dtype=torch.half, device=cuda)
tensor([[ 1.],
        [ 2.],
        [ 3.]], dtype=torch.float16, device='cuda:0')

>>> torch.tensor(1)  # create a scalar
tensor(1)
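Like numpy.array, torch.tensor infers the dtype from the data unless dtype is given explicitly. A small CPU-only sketch (float32 is the floating-point default unless it has been changed with torch.set_default_dtype):

```python
import torch

ints = torch.tensor([[1], [2], [3]])             # Python ints -> torch.int64
floats = torch.tensor([1.0, 2.0])                # Python floats -> default dtype
halves = torch.tensor([1, 2], dtype=torch.half)  # explicit dtype overrides inference
scalar = torch.tensor(3.5)                       # a plain number -> 0-dim tensor

for t in (ints, floats, halves, scalar):
    print(t.dtype, tuple(t.shape))
```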

torch.*_like and tensor.new_*

The torch.*_like functions create a tensor with the same shape as the input and, by default, the same dtype and device as well:

>>> x = torch.randn(3, dtype=torch.float64)
>>> torch.zeros_like(x)
tensor([ 0., 0., 0.], dtype=torch.float64)
>>> torch.zeros_like(x, dtype=torch.int)
tensor([ 0, 0, 0], dtype=torch.int32)

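If you only want a Tensor with the same attributes (dtype and device) as an existing one, but with a different shape, use the tensor.new_* methods: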

>>> x = torch.randn(3, dtype=torch.float64)
>>> x.new_ones(2)  # same dtype and device as x
tensor([ 1., 1.], dtype=torch.float64)
>>> x.new_ones(4, dtype=torch.int)
tensor([ 1, 1, 1, 1], dtype=torch.int32)
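Both families cover other fill patterns too. A short sketch checking which attributes are inherited; ones_like, new_full and new_tensor are used here in addition to the functions above:

```python
import torch

x = torch.randn(2, 3, dtype=torch.float64)

ones = torch.ones_like(x)                      # shape (2, 3), float64, same device as x
ints = torch.zeros_like(x, dtype=torch.int32)  # same shape, dtype overridden
filled = x.new_full((4,), 0.5)                 # new shape (4,), float64, same device
data = x.new_tensor([1, 2])                    # copies list data, inherits float64

for t in (ones, ints, filled, data):
    print(t.dtype, tuple(t.shape))
```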

Writing device-agnostic code

This means not hard-coding whether the code runs on a GPU or a CPU; instead, choose the device once and move tensors and modules with .to():

import torch

# at the beginning of the script
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

...

# then whenever you get a new Tensor or Module
# this won't copy if they are already on the desired device
input = data.to(device)
model = MyModule(...).to(device)
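A runnable version of this pattern, with nn.Linear and a random batch standing in for the hypothetical MyModule and data:

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2).to(device)   # stand-in for MyModule (hypothetical)
batch = torch.randn(8, 4)            # stand-in for `data`, created on the CPU
inputs = batch.to(device)            # copies only if the device differs
out = model(inputs)
print(out.device, tuple(out.shape))

same = inputs.to(device)             # already on `device`: no copy is made
print(same is inputs)                # True: .to() returns self when nothing changes
```

Module.to() moves the parameters in place and returns the module itself, which is why `model = MyModule(...).to(device)` works as a one-liner.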

Migration comparison

The old way (0.3 and earlier)

import torch
from torch.autograd import Variable

model = MyRNN()
if use_cuda:
    model = model.cuda()

# train
total_loss = 0
for input, target in train_loader:
    input, target = Variable(input), Variable(target)
    hidden = Variable(torch.zeros(*h_shape))  # init hidden
    if use_cuda:
        input, target, hidden = input.cuda(), target.cuda(), hidden.cuda()
    ...  # get loss and optimize
    total_loss += loss.data[0]

# evaluate
for input, target in test_loader:
    input = Variable(input, volatile=True)
    if use_cuda:
        ...
    ...

The new way (0.4)

import torch

# torch.device object used throughout this script
device = torch.device("cuda" if use_cuda else "cpu")

model = MyRNN().to(device)

# train
total_loss = 0
for input, target in train_loader:
    input, target = input.to(device), target.to(device)
    hidden = input.new_zeros(*h_shape)  # has the same device & dtype as `input`
    ...  # get loss and optimize
    total_loss += loss.item()  # get Python number from 1-element Tensor

# evaluate
with torch.no_grad():  # operations inside don't track history
    for input, target in test_loader:
        ...
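The new-style loop can be made fully runnable with toy components. In this sketch, nn.Linear replaces the hypothetical MyRNN (so there is no hidden state), lists of random tensors replace train_loader and test_loader, and the MSE loss and SGD optimizer are likewise assumptions:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

# Hypothetical toy data standing in for train_loader / test_loader
train_loader = [(torch.randn(16, 4), torch.randn(16, 1)) for _ in range(10)]
test_loader = [(torch.randn(16, 4), torch.randn(16, 1)) for _ in range(3)]

model = nn.Linear(4, 1).to(device)   # stand-in for MyRNN (assumption)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# train
model.train()
total_loss = 0.0
for inputs, targets in train_loader:
    inputs, targets = inputs.to(device), targets.to(device)
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()
    total_loss += loss.item()        # Python float, no graph kept

# evaluate
model.eval()
test_loss = 0.0
with torch.no_grad():                # replaces volatile=True
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        output = model(inputs)
        test_loss += criterion(output, targets).item()

print(f"train loss {total_loss:.4f}, test loss {test_loss:.4f}")
```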
