
A Detailed Look at torch.utils.data.DataLoader in PyTorch, with Examples

Updated: 2022-09-27 09:46:31  Author: 進擊的程小白

torch.utils.data.DataLoader is mainly responsible for splitting a dataset into batches. This article goes through its constructor parameters one by one and then uses example code to show how batch_size and drop_last behave in practice.

1. dataset (type: Dataset)

The dataset to load from; this is where the raw data enters. PyTorch provides this abstraction itself as torch.utils.data.Dataset (TensorDataset, for instance, wraps tensors directly).

2. batch_size (type: int)

The number of samples in each batch (default: 1); choose it to suit your situation. PyTorch does not feed a model one sample at a time during training, which would be too inefficient; it feeds the data in batches. This parameter sets how many samples are passed to the network at each step, so a value of 1 means one sample at a time. The samples in a batch are chosen at random only when shuffle=True; otherwise batches are read in order. If the number of samples in the dataset is not an integer multiple of batch_size, the final batch contains whatever samples remain. To discard that incomplete final batch instead, set drop_last to True.

3. shuffle (type: bool)

Whether to reshuffle the data at the start of every epoch (default: False). Shuffling the input order reduces the correlation between consecutive samples, but if the order itself carries information (sequential data such as a time series), leave it set to False.
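A minimal sketch of the difference, assuming a toy dataset of the integers 0 to 9 (the tensor `data` and the seed value are illustrative and not part of the original article). Passing a seeded `torch.Generator` through the `generator` argument is one way to make the shuffled order repeatable.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset: the integers 0..9 (illustrative only).
data = torch.arange(10)
dataset = TensorDataset(data)

# shuffle=False (the default): samples come out in their original order.
ordered = DataLoader(dataset, batch_size=4, shuffle=False)
ordered_batches = [batch[0].tolist() for batch in ordered]
print(ordered_batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

# shuffle=True: a new random permutation is drawn each time the loader is iterated.
# The seeded generator makes that sequence of permutations reproducible.
g = torch.Generator()
g.manual_seed(0)
shuffled = DataLoader(dataset, batch_size=4, shuffle=True, generator=g)
epoch1 = [batch[0].tolist() for batch in shuffled]
epoch2 = [batch[0].tolist() for batch in shuffled]
print(epoch1)
print(epoch2)
```

Every epoch still visits each sample exactly once; only the order changes.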

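4. collate_fn (type: callable, i.e. a function or any other callable object)

Merges a list of samples into a single mini-batch (default: None, in which case PyTorch's built-in default collate function is used, which stacks the samples into batched tensors). Supply your own function when the samples cannot simply be stacked, for example when they are sequences of different lengths.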
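As a rough illustration, a custom collate_fn might pad variable-length sequences so they fit into one tensor. In this sketch the `sequences` list and the `pad_collate` function are hypothetical names chosen for the example; `pad_sequence` is the padding helper from `torch.nn.utils.rnn`.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

# Hypothetical dataset: a plain list of 1-D tensors with different lengths.
# A list works as a map-style dataset because it supports indexing and len().
sequences = [torch.tensor([1, 2, 3]), torch.tensor([4, 5]), torch.tensor([6])]

def pad_collate(batch):
    """Merge a list of variable-length samples into one padded batch."""
    lengths = torch.tensor([len(seq) for seq in batch])
    padded = pad_sequence(batch, batch_first=True, padding_value=0)
    return padded, lengths

loader = DataLoader(sequences, batch_size=3, collate_fn=pad_collate)
padded, lengths = next(iter(loader))
print(padded)   # rows shorter than the longest sequence are padded with 0
print(lengths)  # tensor([3, 2, 1])
```

Returning the original lengths alongside the padded tensor lets later code ignore the padding positions.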

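5. batch_sampler (type: Sampler or iterable)

Batch-level sampling (default: None). It works like sampler, except that each iteration yields a whole batch of indices (note: indices, not data). It is mutually exclusive with batch_size, shuffle, sampler, and drop_last. The reason is that a batch_sampler already decides which samples form each batch and in what order, so the options for fixed-size batching, shuffling, per-sample sampling, and dropping the last batch would have nothing left to control.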
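A small sketch of this, assuming a toy ten-sample dataset: PyTorch's own `BatchSampler` wraps a per-sample sampler and yields lists of indices, which is exactly the shape of object that batch_sampler expects. The variable names are illustrative.

```python
import torch
from torch.utils.data import BatchSampler, DataLoader, SequentialSampler, TensorDataset

dataset = TensorDataset(torch.arange(10))

# BatchSampler yields lists of indices, not data.
batch_sampler = BatchSampler(SequentialSampler(dataset), batch_size=4, drop_last=False)
index_batches = list(batch_sampler)
print(index_batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

# batch_size, shuffle, sampler and drop_last stay at their defaults,
# because the batch_sampler already determines all of that.
loader = DataLoader(dataset, batch_sampler=batch_sampler)
for (batch,) in loader:
    print(batch)

# Combining batch_sampler with one of the conflicting options is rejected.
try:
    DataLoader(dataset, batch_sampler=batch_sampler, shuffle=True)
    conflict_rejected = False
except ValueError as err:
    conflict_rejected = True
    print("rejected:", err)
```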

6. sampler (type: Sampler or iterable)

Sampling (default: None). Defines the strategy used to draw sample indices from the dataset. If a sampler is specified, shuffle must be left as False.
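One possible use, sketched under the assumption that only part of the dataset should feed this loader: `SubsetRandomSampler` draws, in random order, from a fixed list of indices. The `even_indices` split is a made-up example.

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler, TensorDataset

# The stored values equal their indices, which makes the result easy to read.
dataset = TensorDataset(torch.arange(10))

# Hypothetical split: only the even indices are used for this loader.
even_indices = [0, 2, 4, 6, 8]
sampler = SubsetRandomSampler(even_indices)

# shuffle is left at False; the sampler already randomizes the order.
loader = DataLoader(dataset, batch_size=2, sampler=sampler)
seen = [x for (batch,) in loader for x in batch.tolist()]
print(sorted(seen))  # [0, 2, 4, 6, 8]

# Passing a sampler together with shuffle=True is rejected at construction time.
try:
    DataLoader(dataset, sampler=sampler, shuffle=True)
    conflict_rejected = False
except ValueError:
    conflict_rejected = True
print(conflict_rejected)  # True
```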

7. num_workers (type: int)

The number of worker subprocesses used to load data (default: 0). A value of 0 means the data is loaded in the main process. The value must be greater than or equal to 0; a negative value raises a ValueError.

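8. pin_memory (type: bool)

Pinned memory (default: False). If True, the DataLoader copies tensors into pinned (page-locked) host memory before returning them, which speeds up the subsequent transfer to the GPU. The batch still has to be moved to the CUDA device explicitly.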
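A hedged sketch of how this flag is typically paired with an explicit device transfer. The dataset here is random toy data, and pinning is switched on only when CUDA is available, since it brings no benefit on a CPU-only machine.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(8, 3))  # toy data, illustrative only
use_cuda = torch.cuda.is_available()

# Pinning only pays off when batches are later moved to a GPU.
loader = DataLoader(dataset, batch_size=4, pin_memory=use_cuda)
device = torch.device("cuda" if use_cuda else "cpu")

for (batch,) in loader:
    # non_blocking=True allows the host-to-GPU copy to be asynchronous
    # when the source tensor lives in pinned memory.
    batch = batch.to(device, non_blocking=True)
    print(batch.shape, batch.device.type)
```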

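9. drop_last (type: bool)

Whether to drop the last incomplete batch (default: False). Once batch_size is set, the final batch may hold fewer samples than batch_size when the dataset size is not divisible by it. This flag decides whether that smaller batch is discarded (True) or returned (False).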
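The effect on the number of batches can be checked directly with `len(loader)`. This sketch assumes 11 samples and a batch size of 5: keeping the last batch gives ceil(11 / 5) = 3 batches, dropping it gives floor(11 / 5) = 2.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

num_samples = 11
batch_size = 5
dataset = TensorDataset(torch.arange(num_samples))

keep = DataLoader(dataset, batch_size=batch_size, drop_last=False)
drop = DataLoader(dataset, batch_size=batch_size, drop_last=True)

print(len(keep), math.ceil(num_samples / batch_size))  # 3 3
print(len(drop), num_samples // batch_size)            # 2 2

keep_sizes = [len(batch[0]) for batch in keep]
drop_sizes = [len(batch[0]) for batch in drop]
print(keep_sizes)  # [5, 5, 1]
print(drop_sizes)  # [5, 5]
```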

10. timeout (type: numeric)

Timeout (default: 0). The maximum time, in seconds, to wait for a batch from the worker processes; if no batch arrives within that time, an error is raised. The value must be greater than or equal to 0, and 0 means there is no timeout.

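11. worker_init_fn (type: callable, i.e. a function or any other callable object)

Worker initialization function (default: None). If it is not None, it is called once in each worker subprocess, with the worker id (an integer from 0 to num_workers - 1) as its argument, after the worker has been seeded and before it starts loading data.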
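A common use, sketched here as an assumption rather than something the original article covers, is to seed other random number generators inside each worker. The function name `seed_worker` and the toy dataset are illustrative.

```python
import random
import torch
from torch.utils.data import DataLoader, TensorDataset

def seed_worker(worker_id):
    # Inside a worker, torch.initial_seed() returns that worker's own seed
    # (base seed + worker id), so each worker gets a distinct Python RNG state.
    worker_seed = torch.initial_seed() % 2**32
    random.seed(worker_seed)

dataset = TensorDataset(torch.arange(8))  # toy data, illustrative only
loader = DataLoader(dataset, batch_size=4, num_workers=2, worker_init_fn=seed_worker)

if __name__ == "__main__":
    # Worker processes are started when iteration begins; the guard keeps
    # the script safe on platforms that start workers by re-importing it.
    for (batch,) in loader:
        print(batch)
```

With num_workers=0 no worker process exists, so the function is never called.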

An example of how batch_size works:

"""
    Mini-batch training: split the data into small batches and train on one batch at a time.
    DataLoader wraps the dataset and yields one batch per iteration.
"""
import torch
import torch.utils.data as Data

BATCH_SIZE = 5

x = torch.linspace(1, 11, 11)
y = torch.linspace(11, 1, 11)
print(x)
print(y)
# Wrap the two tensors in a dataset
torch_dataset = Data.TensorDataset(x, y)
loader = Data.DataLoader(
    # Draw batch_size samples from the dataset at each step
    dataset=torch_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
    # num_workers=2,
)

def show_batch():
    for epoch in range(3):
        for step, (batch_x, batch_y) in enumerate(loader):
            # training
            print("step:{}, batch_x:{}, batch_y:{}".format(step, batch_x, batch_y))

if __name__ == '__main__':
    show_batch()

Output (11 samples with a batch size of 5 give two full batches plus a final batch of one sample in each epoch):

tensor([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11.])
tensor([11., 10., 9., 8., 7., 6., 5., 4., 3., 2., 1.])
step:0, batch_x:tensor([ 3., 2., 8., 11., 1.]), batch_y:tensor([ 9., 10., 4., 1., 11.])
step:1, batch_x:tensor([ 5., 6., 7., 4., 10.]), batch_y:tensor([7., 6., 5., 8., 2.])
step:2, batch_x:tensor([9.]), batch_y:tensor([3.])
step:0, batch_x:tensor([ 9., 7., 10., 2., 4.]), batch_y:tensor([ 3., 5., 2., 10., 8.])
step:1, batch_x:tensor([ 5., 11., 3., 6., 8.]), batch_y:tensor([7., 1., 9., 6., 4.])
step:2, batch_x:tensor([1.]), batch_y:tensor([11.])
step:0, batch_x:tensor([10., 5., 7., 4., 2.]), batch_y:tensor([ 2., 7., 5., 8., 10.])
step:1, batch_x:tensor([3., 9., 1., 8., 6.]), batch_y:tensor([ 9., 3., 11., 4., 6.])
step:2, batch_x:tensor([11.]), batch_y:tensor([1.])


With drop_last=True:

"""
    Mini-batch training: split the data into small batches and train on one batch at a time.
    DataLoader wraps the dataset and yields one batch per iteration.
"""
import torch
import torch.utils.data as Data

BATCH_SIZE = 5

x = torch.linspace(1, 11, 11)
y = torch.linspace(11, 1, 11)
print(x)
print(y)
# Wrap the two tensors in a dataset
torch_dataset = Data.TensorDataset(x, y)
loader = Data.DataLoader(
    # Draw batch_size samples from the dataset at each step
    dataset=torch_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
    # num_workers=2,
    # Discard the final batch when it has fewer than batch_size samples
    drop_last=True,
)

def show_batch():
    for epoch in range(3):
        for step, (batch_x, batch_y) in enumerate(loader):
            # training
            print("step:{}, batch_x:{}, batch_y:{}".format(step, batch_x, batch_y))

if __name__ == '__main__':
    show_batch()

The corresponding output (the single leftover sample is dropped, so each epoch has only two batches):

tensor([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11.])
tensor([11., 10., 9., 8., 7., 6., 5., 4., 3., 2., 1.])
step:0, batch_x:tensor([ 9., 2., 7., 4., 11.]), batch_y:tensor([ 3., 10., 5., 8., 1.])
step:1, batch_x:tensor([ 3., 5., 10., 1., 8.]), batch_y:tensor([ 9., 7., 2., 11., 4.])
step:0, batch_x:tensor([ 5., 11., 6., 1., 2.]), batch_y:tensor([ 7., 1., 6., 11., 10.])
step:1, batch_x:tensor([ 3., 4., 10., 8., 9.]), batch_y:tensor([9., 8., 2., 4., 3.])
step:0, batch_x:tensor([10., 4., 9., 8., 7.]), batch_y:tensor([2., 8., 3., 4., 5.])
step:1, batch_x:tensor([ 6., 1., 11., 2., 5.]), batch_y:tensor([ 6., 11., 1., 10., 7.])

