Loading Datasets with PyTorch

Updated: 2022-05-18 11:36:20  Author: 日常搬磚xbw

This article walks through how to load datasets with PyTorch. I hope it serves as a useful reference; if you spot mistakes or cases I have overlooked, corrections are welcome.

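Loading datasets with PyTorch

There are generally three situations when loading a dataset with PyTorch.

Case 1

Loading one of the officially provided datasets, such as ImageNet, CIFAR10 or MNIST.

For these you just call torchvision.datasets.XXXX(). For example, to load MNIST: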

import torch
import torch.nn as nn
import torch.utils.data as Data
import torchvision
train_data = torchvision.datasets.MNIST(
    root='./mnist/',
    train=True,                                     # this is training data
    transform=torchvision.transforms.ToTensor(),    # Converts a PIL.Image or numpy.ndarray to
                                                    # torch.FloatTensor of shape (C x H x W) and normalize in the range [0.0, 1.0]
    download=True,
)

This downloads MNIST from the internet automatically (if it is not already under root) and loads it from the saved format.

Then create a DataLoader object and you are ready to train.

train_loader = Data.DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)
for epoch in range(EPOCH):
    for step, (b_x, b_y) in enumerate(train_loader):   # gives batch data, normalize x when iterate train_loader
        ...  # XXXX: the training step for this batch goes here
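To see what the loop receives without downloading anything, here is a minimal sketch that swaps MNIST for random tensors of the same shape. The sample count (100), the batch size (32) and the name fake_mnist are illustrative assumptions, not values from the article.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data with MNIST-like shapes (assumption: 100 random samples).
images = torch.rand(100, 1, 28, 28)        # (N, C, H, W), values in [0, 1)
labels = torch.randint(0, 10, (100,))      # one integer class label per image
fake_mnist = TensorDataset(images, labels)

BATCH_SIZE = 32  # illustrative value
loader = DataLoader(fake_mnist, batch_size=BATCH_SIZE, shuffle=True)

batch_shapes = []
for step, (b_x, b_y) in enumerate(loader):
    # b_x is a batch of images, b_y the matching batch of labels
    batch_shapes.append((tuple(b_x.shape), tuple(b_y.shape)))

print(batch_shapes[0])    # a full batch
print(batch_shapes[-1])   # the last batch is smaller: 100 = 3 * 32 + 4
```

Each iteration yields a (b_x, b_y) pair, and the final batch only holds the leftover samples unless drop_last=True is passed to the DataLoader.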

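Case 2

This one is used more often, and it targets image classification.

It applies to multi-class image problems where the images are stored in a fixed layout:

root path/class (label)/image

With this layout, the root path contains a number of folders, each folder holds the images of one class, and the folder name stands for the class. For example, with learn_pytorch as the root directory, every folder beneath it represents one class. The folders can be named anything; when the dataset is loaded, the names are mapped automatically to 0, 1, 2, 3, ...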
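The folder-to-label mapping can be sketched in plain Python. This is a simplified stand-in, assuming (as ImageFolder does, to my understanding) that class folders are sorted by name before being numbered; find_classes here and the folder names dog, cat and bird are made up for illustration.

```python
import os
import tempfile


def find_classes(root):
    """Map each sub-folder name to an integer label, in sorted order."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {name: idx for idx, name in enumerate(classes)}


with tempfile.TemporaryDirectory() as root:
    for name in ["dog", "cat", "bird"]:       # hypothetical class folders
        os.makedirs(os.path.join(root, name))
    class_to_idx = find_classes(root)

print(class_to_idx)   # {'bird': 0, 'cat': 1, 'dog': 2}
```

The labels follow the sorted folder names, not the order in which the folders were created.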


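Once the images are stored this way, they can be loaded directly with ImageFolder. ImageFolder is a subclass of Dataset defined specifically for reading images in this layout, and it is provided by torchvision for convenience.

The resulting object can then be passed to a DataLoader as its dataset, like this: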

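import torch.utils.data as Data
from torchvision import transforms
from torchvision.datasets import ImageFolder
BATCH_SIZE = 4   # same value as in the full example at the end of the article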
data_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5,0.5,0.5], std=[0.5, 0.5, 0.5])
])
dataset = ImageFolder("/home/xxx/learn_pytorch/",transform = data_transform)
train_loader = Data.DataLoader(dataset=dataset, batch_size=BATCH_SIZE, shuffle=True)

Its constructor takes two main arguments: the root directory and the transform to apply to the data. Because images are read as PIL images, ToTensor() is indispensable. transforms.Compose combines several operations into a single argument, which makes data augmentation very convenient. The example above converts to a tensor and then normalizes, without any augmentation. To augment, add operations such as cropping and flipping, for example:

transforms.RandomResizedCrop(224)   # formerly RandomSizedCrop; 224 is an example output size
transforms.RandomHorizontalFlip()
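The chaining done by transforms.Compose, and the effect of Normalize with mean 0.5 and std 0.5, can be sketched without torchvision. The Compose class below is a simplified stand-in rather than the library's implementation, and to_unit_range and normalize are hypothetical helpers acting on a single pixel value.

```python
class Compose:
    """Simplified stand-in for transforms.Compose: apply callables in order."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, x):
        for t in self.transforms:
            x = t(x)
        return x


def to_unit_range(pixel):
    # Plays the role of ToTensor's scaling: 0..255 -> 0.0..1.0
    return pixel / 255.0


def normalize(value, mean=0.5, std=0.5):
    # Plays the role of Normalize: (value - mean) / std
    return (value - mean) / std


pipeline = Compose([to_unit_range, normalize])
results = [pipeline(p) for p in (0, 255)]
print(results)   # [-1.0, 1.0]
```

With mean and std both 0.5, pixel values end up in the range [-1, 1] instead of [0, 1].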

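Another question is how to find out which label each folder name was mapped to. Check the class_to_idx attribute of the dataset object.

The dataset object produced by ImageFolder is indexed first by image number; within each item, element 0 is the image tensor and element 1 is the label.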


Next comes building the model and training.

The training process is the same as in case 1.


Case 3

This is the most general case. It applies when the task is not classification, or when the labels are not a simple mapping of folder names.

The idea is to define your own subclass of Dataset. The data processing and augmentation also have to be defined yourself, which is done with __call__().

The implementation goes as follows.

First

Define a subclass of Dataset whose job is to override two magic methods, __len__() and __getitem__().

__len__() is called when len(obj) is invoked. Override it so that it returns the size of the dataset.
__getitem__() makes the object indexable as obj[i], which also lets a for statement iterate over it as long as out-of-range indices raise IndexError. Override it so that each index returns one sample of the dataset.
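The two methods can be tried out on a toy class before involving any real data. This sketch is plain Python; SquaresDataset is a made-up example, and in real code the class would subclass torch.utils.data.Dataset.

```python
class SquaresDataset:
    """Toy dataset: sample i is the pair (i, i * i)."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        # Called by len(ds)
        return self.n

    def __getitem__(self, idx):
        # Called by ds[idx]; IndexError tells a for loop where to stop
        if not 0 <= idx < self.n:
            raise IndexError(idx)
        return idx, idx * idx


ds = SquaresDataset(4)
print(len(ds))     # 4
print(ds[2])       # (2, 4)
print(list(ds))    # [(0, 0), (1, 1), (2, 4), (3, 9)]
```

Note that the class defines no __iter__; the for-style iteration in list(ds) works purely through __getitem__.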

Now define such a subclass:

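import os

import pandas as pd
from skimage import io   # assumed: io.imread below is skimage's
from torch.utils.data import Dataset


class FaceLandmarksDataset(Dataset):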
    """Face Landmarks dataset."""
    def __init__(self, csv_file, root_dir, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.landmarks_frame = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform
    def __len__(self):
        return len(self.landmarks_frame)
    def __getitem__(self, idx):
        img_name = os.path.join(self.root_dir,
                                self.landmarks_frame.iloc[idx, 0])
        image = io.imread(img_name)
        landmarks = self.landmarks_frame.iloc[idx, 1:].to_numpy()  # as_matrix() was removed in pandas 1.0
        landmarks = landmarks.astype('float').reshape(-1, 2)
        sample = {'image': image, 'landmarks': landmarks}
        if self.transform:
            sample = self.transform(sample)
        return sample

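The constructor just sets up some attributes, such as reading the table that describes the whole dataset. __len__ returns the number of samples, and __getitem__ defines how a single sample is returned. The return value can be a list containing the training sample and its label, or a dict; how you use it later differs slightly depending on which you choose (it is just a matter of indexing by number or by key).

In addition, a Dataset usually accepts the operations to apply to the data. If you do not want augmentation, adding ToTensor is enough (data must be converted to tensors before training). If you do want augmentation, add your own classes (yes, ToTensor and the various augmentation functions are each a class from which an object is created) and chain them together with transforms.Compose. The transform above defaults to None, meaning no processing is applied and the raw sample is returned.

Then instantiate the class, and the object can be passed to a DataLoader as an argument.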

face_dataset = FaceLandmarksDataset(csv_file='faces/face_landmarks.csv',
                                    root_dir='faces/')

Look at this object for a moment: the arguments used to create it are the ones __init__ requires, and indexing or iterating over it calls __getitem__ automatically. For example, take the following loop:

for i in range(len(face_dataset)):
    sample = face_dataset[i]
    print(sample['image'])
    print(i,sample['image'].shape, sample['landmarks'].shape)


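Each iteration yields a dict.

Next we would define a DataLoader and iterate over it. That does not work yet, though, because the data has to be converted to tensors before it can be fed to the model for training.

So the next step is to decide what to pass as transform to the Dataset class above. It was initially None, meaning no processing, so the output is still an image array. At the very least ToTensor has to be implemented.

Implementing the ToTensor class mainly relies on the __call__() magic method.

__call__() is a bit special: it makes the object itself callable. You can put parentheses after the object and pass arguments, and __call__ is invoked automatically.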
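A tiny example makes the mechanism concrete. AddOffset is a made-up class for illustration: the constructor stores a parameter, and __call__ uses it each time the object is called.

```python
class AddOffset:
    """Callable object: stores a parameter, applies it when called."""

    def __init__(self, offset):
        self.offset = offset

    def __call__(self, sample):
        # Runs whenever the object is used like a function
        return sample + self.offset


add_ten = AddOffset(10)    # create the object (runs __init__)
result = add_ten(5)        # call the object (runs __call__)
print(result)              # 15
print(callable(add_ten))   # True
```

This is the same pattern the transform classes follow: configure once in __init__, apply per sample in __call__.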

The ToTensor class is implemented as follows. Note that numpy image arrays put the channel dimension last while torch tensors put it first, so the dimensions have to be swapped.

class ToTensor(object):
    """Convert ndarrays in sample to Tensors."""
    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']
        # swap color axis because
        # numpy image: H x W x C
        # torch image: C X H X W
        image = image.transpose((2, 0, 1))
        return {'image': torch.from_numpy(image),
                'landmarks': torch.from_numpy(landmarks)}
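The axis swap can be checked in isolation with numpy. The 224 x 160 image size and the marked pixel below are arbitrary choices for the illustration.

```python
import numpy as np

# A dummy image in numpy layout: height x width x channels
image_hwc = np.zeros((224, 160, 3), dtype=np.uint8)
image_hwc[10, 20, 2] = 255            # row 10, column 20, channel 2

# Move the channel axis to the front: (H, W, C) -> (C, H, W)
image_chw = image_hwc.transpose((2, 0, 1))

print(image_chw.shape)                # (3, 224, 160)
print(image_chw[2, 10, 20])           # 255: same pixel, channel index now first
```

The pixel data is unchanged; only the order in which the axes are indexed differs.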

To use it, create an object first; calling object(arguments) then invokes __call__ automatically.

Now look at the implementation of a few augmentation classes. What they all have in common is that __call__ takes sample as its argument, that is, one sample from the dataset.

import numpy as np
from skimage import transform   # assumed: transform.resize below is skimage's


class Rescale(object):
    """Rescale the image in a sample to a given size.
    Args:
        output_size (tuple or int): Desired output size. If tuple, output is
            matched to output_size. If int, smaller of image edges is matched
            to output_size keeping aspect ratio the same.
    """
    def __init__(self, output_size):
        assert isinstance(output_size, (int, tuple))
        self.output_size = output_size
    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']
        h, w = image.shape[:2]
        if isinstance(self.output_size, int):
            if h > w:
                new_h, new_w = self.output_size * h / w, self.output_size
            else:
                new_h, new_w = self.output_size, self.output_size * w / h
        else:
            new_h, new_w = self.output_size
        new_h, new_w = int(new_h), int(new_w)
        img = transform.resize(image, (new_h, new_w))
        # h and w are swapped for landmarks because for images,
        # x and y axes are axis 1 and 0 respectively
        landmarks = landmarks * [new_w / w, new_h / h]
        return {'image': img, 'landmarks': landmarks}


class RandomCrop(object):
    """Crop randomly the image in a sample.
    Args:
        output_size (tuple or int): Desired output size. If int, square crop
            is made.
    """
    def __init__(self, output_size):
        assert isinstance(output_size, (int, tuple))
        if isinstance(output_size, int):
            self.output_size = (output_size, output_size)
        else:
            assert len(output_size) == 2
            self.output_size = output_size
    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']
        h, w = image.shape[:2]
        new_h, new_w = self.output_size
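        # +1 because randint's upper bound is exclusive; this also avoids a
        # ValueError when the image is already exactly the crop size
        top = np.random.randint(0, h - new_h + 1)
        left = np.random.randint(0, w - new_w + 1)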
        image = image[top: top + new_h,
                      left: left + new_w]
        landmarks = landmarks - [left, top]
        return {'image': image, 'landmarks': landmarks}

These two are straightforward: the constructor takes its parameters when the object is created, and __call__ then lets the object be called directly.
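The landmark arithmetic in Rescale and RandomCrop is easier to follow with concrete numbers. The sketch below repeats just that arithmetic on made-up values (a 200 x 400 image, two landmarks, a fixed crop offset instead of a random one) and leaves the image itself out.

```python
import numpy as np

h, w = 200, 400                                          # original height, width
landmarks = np.array([[100.0, 50.0], [300.0, 150.0]])    # (x, y) pairs

# Rescale with an int output_size: the shorter edge becomes output_size
output_size = 100
if h > w:
    new_h, new_w = output_size * h / w, output_size
else:
    new_h, new_w = output_size, output_size * w / h
new_h, new_w = int(new_h), int(new_w)                    # 100, 200

# x scales with the width ratio, y with the height ratio
scaled = landmarks * [new_w / w, new_h / h]

# RandomCrop shifts landmarks by the crop origin (fixed here for clarity)
top, left = 10, 20
cropped = scaled - [left, top]

print(scaled)    # [[ 50.  25.] [150.  75.]]
print(cropped)   # [[ 30.  15.] [130.  65.]]
```

Landmarks are stored as (x, y) while image shapes are (height, width), which is why the width ratio comes first when scaling and left comes before top when shifting.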

They can then be used like this:

transformed_dataset = FaceLandmarksDataset(csv_file='faces/face_landmarks.csv',
                                           root_dir='faces/',
                                           transform=transforms.Compose([
                                               Rescale(256),
                                               RandomCrop(224),
                                               ToTensor()
                                           ]))
for i in range(len(transformed_dataset)):
    sample = transformed_dataset[i]
    print(i, sample['image'].size(), sample['landmarks'].size())
    if i == 3:
        break

To break this down: first an object of the Dataset subclass is created, with the transform argument set to the composition of the three classes defined above. Looking back at the class definition:

        self.transform = transform

This stores one object that combines the three classes.

        if self.transform:
            sample = self.transform(sample)

Calling that object then invokes __call__ on each of the three classes in turn and returns the processed sample.

At last we can iterate for training.


from torch.utils.data import DataLoader

dataloader = DataLoader(transformed_dataset, batch_size=4, shuffle=True, num_workers=4)

Create a DataLoader object; the rest works the same as in case 2, with two nested loops for training. The DataLoader has one subtlety. Each time you iterate over it, what comes back has the same form as the return value of the Dataset object, but its contents gain a leading dimension of size batch_size. In other words, on each iteration the DataLoader takes batch_size samples from the dataset and stacks them (the stacking happens inside the list or dict), so each iteration still yields a dict or list.
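The stacking behaviour can be observed with a toy dataset that returns dicts. ToyDictDataset and its tensor shapes are assumptions made up for this sketch; it relies on the DataLoader's default collate function and uses num_workers=0 so it runs anywhere.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDictDataset(Dataset):
    """Ten fake samples, each a dict like the face-landmarks samples."""

    def __len__(self):
        return 10

    def __getitem__(self, idx):
        return {
            'image': torch.zeros(3, 8, 8),                   # C x H x W
            'landmarks': torch.full((5, 2), float(idx)),     # 5 (x, y) points
        }


loader = DataLoader(ToyDictDataset(), batch_size=4, shuffle=False)
batch = next(iter(loader))

print(type(batch).__name__)        # dict
print(batch['image'].shape)        # torch.Size([4, 3, 8, 8])
print(batch['landmarks'].shape)    # torch.Size([4, 5, 2])
```

The batch is still a dict with the same keys, and every value has gained a leading dimension of size batch_size.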

PyTorch learning notes

This is a simple model I threw together as a test.

import os
import torch
import torch.nn as nn
import torch.utils.data as Data
import torchvision
import matplotlib.pyplot as plt
from torchvision import transforms
from torchvision.datasets import ImageFolder
# %matplotlib inline   # IPython magic; uncomment only inside a Jupyter notebook
# Hyperparameters
EPOCH = 20
BATCH_SIZE = 4
LR = 0.001
# Load the data
data_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5,0.5,0.5], std=[0.5, 0.5, 0.5])
])
dataset = ImageFolder("/home/xxx/learn_pytorch/",transform = data_transform)
print(dataset[0][0].size())
print(dataset.class_to_idx)
# Define the DataLoader
train_loader = Data.DataLoader(dataset=dataset, batch_size=BATCH_SIZE, shuffle=True)
# Define the model class as a subclass of nn.Module. Each layer is first
# defined as an attribute of the class; the forward() method then describes the
# forward pass and connects the layers, which completes the model.
class CNN(nn.Module):
    def __init__(self):
        super(CNN,self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(3,16,5,1,2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
        )
        self.conv2 = nn.Sequential(         
            nn.Conv2d(16, 32, 5, 1, 2),     
            nn.ReLU(),                      
            nn.MaxPool2d(2),                
        )
        self.conv3 = nn.Sequential(         
            nn.Conv2d(32, 64, 5, 1, 2),     
            nn.ReLU(),                      
            nn.MaxPool2d(2),                
        )
        self.conv4 = nn.Sequential(         
            nn.Conv2d(64, 128, 5, 1, 2),     
            nn.ReLU(),                      
            nn.MaxPool2d(2),                
        )
        self.out1 = nn.Sequential(
            nn.Linear(128*16*30, 1000),
            nn.ReLU(),
        )
        self.out2 = nn.Sequential(
            nn.Linear(1000, 100),
            nn.ReLU(),
        )
        self.out3 = nn.Sequential(
            nn.Linear(100, 4),
        )
    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = x.view(x.size(0), -1)           # flatten the output of conv4 to (batch_size, 128 * 16 * 30)
        x = self.out1(x)
        x = self.out2(x)
        output = self.out3(x)
        return output, x    # return x for visualization
# To train on a GPU, move the model and the tensors onto it with .cuda()
cnn = CNN().cuda()
print(cnn)
# Define the optimizer and the loss function
optimizer = torch.optim.Adam(cnn.parameters(), lr=LR)   # optimize all cnn parameters
loss_func = nn.CrossEntropyLoss()                       # the target label is not one-hotted
# Two nested loops do the training: the outer loop runs over epochs, the inner
# loop reads one batch of batch_size samples at a time and trains on it
for epoch in range(EPOCH):
    accy_count = 0
    for step,(b_x,b_y) in enumerate(train_loader):
        output = cnn(b_x.cuda())[0]
        loss = loss_func(output,b_y.cuda())     # calculate the loss
        optimizer.zero_grad()           # clear old gradients
        loss.backward()                 # compute gradients by backpropagation
        optimizer.step()                # update the parameters
        output_index = torch.max(output,1)[1].cpu().data.numpy()
        accy_count += float((output_index==b_y.data.numpy()).astype(int).sum())
    accuracy = accy_count / len(dataset)   # divide by the true sample count; the last batch may be smaller
    print("Epoch:",epoch," accuracy is: ",accuracy)
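The 128*16*30 input size of the first Linear layer depends on the image size, which the article does not state. As a rough check, the sketch below traces the spatial size through the four conv + pool blocks using the standard output-size formulas; the 256 x 480 input is an assumption, chosen only because it is one size consistent with 16 x 30, and the helper names are made up.

```python
def conv_out(size, kernel, stride, padding):
    # Output length of a convolution along one dimension
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel):
    # MaxPool2d(kernel) with its default stride (= kernel) and no padding
    return (size - kernel) // kernel + 1


def flattened_size(h, w, channels=128, blocks=4):
    """Feature count after `blocks` rounds of Conv2d(k=5, s=1, p=2) + MaxPool2d(2)."""
    for _ in range(blocks):
        h, w = conv_out(h, 5, 1, 2), conv_out(w, 5, 1, 2)   # size unchanged
        h, w = pool_out(h, 2), pool_out(w, 2)               # size halved
    return channels * h * w, (h, w)


features, spatial = flattened_size(256, 480)   # assumed input height, width
print(spatial)     # (16, 30)
print(features)    # 61440, i.e. 128 * 16 * 30
```

If the images have a different size, the first argument of nn.Linear in out1 has to change to match.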