Python Deep Learning: The albumentations Data Augmentation Library

Updated: 2021-09-30 09:39:55 Author: 算法菜鳥飛高高

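Here begins the introduction proper to albumentations. Readers who are comfortable with English are strongly encouraged to follow the tutorials on the official website step by step; what follows is mainly my own summary, written partly to help readers who find English documentation harder going.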

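Why data augmentation is necessary

Deep learning has flourished over the last decade thanks to growing computing power and the falling cost of acquiring data. A good deep model usually needs a large amount of labeled data so that it can learn the distribution of that data well, and labeling data is usually time-consuming and laborious.
Take the classic CV tasks as examples: classification requires a person to correctly identify the object category or species, object detection requires bounding boxes to be drawn by hand and their coordinates recorded, and semantic segmentation requires labels at the pixel level. Annotating images in specialized fields (medical imaging, remote sensing, and so on) depends on expert knowledge, which makes collecting labeled data harder still.

So is there any way to get as much performance as possible out of a model when the dataset is small?

1. Transfer learning, and domain adaptation in particular. This approach tries to reduce the difference in data distribution between the source domain and the target domain, so that a source domain with plenty of labeled data helps improve training.

2. Data augmentation of the existing data. The spatial invariance and pixel-level invariance that a deep model can learn on its own are limited. Translating, scaling, and rotating images, shifting their hue, and so on lets the model see many variants of the data and improves its discriminative power on test data.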

albumentations

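The above is only a rough sketch of why data augmentation is needed; a deeper understanding usually comes from experience and reflection during experiments.

Installing albumentations

Nothing much to say here: install it directly with the package manager.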

pip install albumentations

How the albumentations pipeline works

Import the required libraries

import albumentations as A
from PIL import Image
import numpy as np

Read in the data. This step needs another library, such as cv2 or PIL; out of habit I use PIL here.

image_path = './your/image/path'
image = np.array(Image.open(image_path))  # a 3-D array of shape [H, W, C]

Create the pipeline

transform = A.Compose([
	A.Resize(width=256, height=256),
	A.HorizontalFlip(p=0.5),
	A.RandomBrightnessContrast(p=0.2)
])

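A.Compose takes a list containing a sequence of augmentation objects. You can think of A.Compose as returning an assembly line: the first step, A.Resize, scales the image to 256 * 256; the second step mirrors the result horizontally with probability 0.5 (the parameter p is the probability of applying the operation); the third step, likewise, changes the brightness and contrast of the image produced by the first two steps with probability 0.2.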
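The assembly-line idea can be sketched in plain Python. This is only a toy model of the behaviour described above, not the library's implementation: Step, MiniCompose, and horizontal_flip are hypothetical names, and the image is a small list of rows rather than a real array.

```python
import random


class Step:
    """Hypothetical stand-in for one augmentation: a function plus a probability p."""

    def __init__(self, fn, p=1.0):
        self.fn = fn
        self.p = p


class MiniCompose:
    """Toy pipeline: run the steps in order, applying each one with probability p."""

    def __init__(self, steps, seed=None):
        self.steps = steps
        self.rng = random.Random(seed)

    def __call__(self, image):
        for step in self.steps:
            # rng.random() lies in [0, 1), so p=1.0 always fires and p=0.0 never does.
            if self.rng.random() < step.p:
                image = step.fn(image)
        # Return a dict keyed by 'image', mirroring the call style used in the article.
        return {"image": image}


def horizontal_flip(image):
    """Reverse every row of an image stored as a list of rows."""
    return [row[::-1] for row in image]


image = [[1, 2, 3],
         [4, 5, 6]]

always = MiniCompose([Step(horizontal_flip, p=1.0)])
never = MiniCompose([Step(horizontal_flip, p=0.0)])

print(always(image=image)["image"])  # [[3, 2, 1], [6, 5, 4]]
print(never(image=image)["image"])   # [[1, 2, 3], [4, 5, 6]]
```

Each step sees the output of the previous one, which is why the order of the list matters.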

transform is the sequence of operations we are going to apply to an image; the next step is to pass the image data into it.

Get the augmented image data

transformed = transform(image=image)
transformed_image = transformed['image']

Pass the image data to the image parameter of transform (evidently a callable object). It returns a dict-like result, and the value under the key image is the processed image data.

Displaying the result

[Figure: the image after augmentation]

Data augmentation for object detection

The brief walkthrough of the albumentations pipeline above is essentially the workflow for a classification task.
Of course, if albumentations could do only that, the transforms API in torchvision could replace it entirely, and it would fall short of the augmentation needs of most CV tasks.

Take object detection: one image usually comes with several bounding boxes. If the operation applied to the image is a spatial transform, the boxes drawn from the original bounding box data will no longer line up with the objects in the image.
So whenever the image is transformed, the bounding box data must be transformed too, keeping the two consistent.
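A small sketch of why the two must move together, using a horizontal flip on a toy image. The helper names (hflip_image, hflip_bbox, crop) are hypothetical, and the box is assumed to be in pixel [x_min, y_min, x_max, y_max] form with exclusive max edges.

```python
def hflip_image(image):
    """Mirror an image (list of rows) left to right."""
    return [row[::-1] for row in image]


def hflip_bbox(bbox, image_width):
    """Mirror a pixel [x_min, y_min, x_max, y_max] box left to right."""
    x_min, y_min, x_max, y_max = bbox
    return [image_width - x_max, y_min, image_width - x_min, y_max]


def crop(image, bbox):
    """Cut out the pixels covered by a box (max edges exclusive)."""
    x_min, y_min, x_max, y_max = bbox
    return [row[x_min:x_max] for row in image[y_min:y_max]]


# 4 x 6 toy image with an "object" (value 1) in columns 0..1 of rows 1..2.
image = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
bbox = [0, 1, 2, 3]

flipped_image = hflip_image(image)
flipped_bbox = hflip_bbox(bbox, image_width=6)

print(flipped_bbox)                       # [4, 1, 6, 3]
print(crop(flipped_image, flipped_bbox))  # [[1, 1], [1, 1]]  box still covers the object
print(crop(flipped_image, bbox))          # [[0, 0], [0, 0]]  stale box covers background
```

The last line is the failure mode described above: the image moved, the box did not.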

Drawing bounding boxes

Before turning to augmentation for object detection, here is a function for drawing bounding boxes. The code shown in the albumentations documentation uses cv2; I find the boxes it draws less attractive, so the code below uses matplotlib instead.

import matplotlib.pyplot as plt
import matplotlib.patches as patches
def visualize_bbox(img, bbox, class_name, color, ax):
	"""
	img: image data in (H, W, C) layout
	bbox: array or tensor, assumed to be [x_mid, y_mid, width, height] in pixels
	class_name: str, the class the box belongs to
	color: str, a matplotlib color name
	ax: the matplotlib Axes to draw on
	"""
	x_mid, y_mid, width, height = bbox
	x_min = int(x_mid - width / 2)
	y_min = int(y_mid - height / 2)
	# draw the detection box
	rect = patches.Rectangle((x_min, y_min), 
								width, 
								height, 
								linewidth=3,
								edgecolor=color,
								facecolor="none"
								)
	ax.imshow(img)
	ax.add_patch(rect)
	ax.text(x_min + 1, y_min - 3, class_name, fontsize=10, bbox={'facecolor':color, 'pad': 3, 'edgecolor':color})
def visualize(img, bboxes, category_ids, category_id_to_name, category_id_to_color):
	fig, ax = plt.subplots(1, figsize=(8, 8))
	ax.axis('off')
	for box, category in zip(bboxes, category_ids):
		class_name = category_id_to_name[category]
		color = category_id_to_color[category]
		visualize_bbox(img, box, class_name, color, ax)
	plt.show()

Applying spatial transforms to bounding boxes

Import the required libraries

import albumentations as A
from PIL import Image
import numpy as np
image_path = './your/image/path'
image = np.array(Image.open(image_path))

Build the pipeline

transform = A.Compose([
	A.Resize(width=256, height=256),
	A.HorizontalFlip(p=0.5),
	A.RandomBrightnessContrast(p=0.2)
], bbox_params = A.BboxParams(format='yolo'))

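Compared with the simplest pipeline (for classification), object detection needs an extra argument called bbox_params, which takes an object that configures how bounding boxes are handled. Depending on the albumentations version, class labels may also have to accompany the boxes, either as an extra last element of each box or through the label_fields argument of BboxParams.
format is the format of the bounding box data; albumentations supports 4 formats.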

1. pascal_voc: [x_min, y_min, x_max, y_max], not normalized; pixel values are used directly, e.g. [98, 345, 420, 462]

2. albumentations: [x_min, y_min, x_max, y_max]; unlike the previous format, the values are normalized, e.g. [0.153125, 0.71875, 0.65625, 0.9625]

3. coco: [x_min, y_min, width, height], not normalized

4. yolo: [x_center, y_center, width, height], normalized
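The relationship between the four formats can be written out as a few conversion helpers. This is a sketch with hypothetical function names, not library code. The image size of 640 x 480 is an assumption inferred from the two example boxes above (98 / 0.153125 = 640 and 345 / 0.71875 = 480).

```python
def pascal_voc_to_albumentations(bbox, width, height):
    """Pixel corners -> corners normalized by image width and height."""
    x_min, y_min, x_max, y_max = bbox
    return [x_min / width, y_min / height, x_max / width, y_max / height]


def pascal_voc_to_coco(bbox):
    """Pixel corners -> pixel top-left corner plus box width and height."""
    x_min, y_min, x_max, y_max = bbox
    return [x_min, y_min, x_max - x_min, y_max - y_min]


def pascal_voc_to_yolo(bbox, width, height):
    """Pixel corners -> normalized center plus normalized box width and height."""
    x_min, y_min, x_max, y_max = bbox
    return [
        (x_min + x_max) / 2 / width,
        (y_min + y_max) / 2 / height,
        (x_max - x_min) / width,
        (y_max - y_min) / height,
    ]


def yolo_to_pascal_voc(bbox, width, height):
    """Normalized center format -> pixel corners (floats)."""
    x_c, y_c, w, h = bbox
    return [
        (x_c - w / 2) * width,
        (y_c - h / 2) * height,
        (x_c + w / 2) * width,
        (y_c + h / 2) * height,
    ]


WIDTH, HEIGHT = 640, 480        # assumed image size, see the note above
voc = [98, 345, 420, 462]       # the pascal_voc example from the list

print(pascal_voc_to_albumentations(voc, WIDTH, HEIGHT))  # [0.153125, 0.71875, 0.65625, 0.9625]
print(pascal_voc_to_coco(voc))                           # [98, 345, 322, 117]
print(pascal_voc_to_yolo(voc, WIDTH, HEIGHT))            # [0.4046875, 0.840625, 0.503125, 0.24375]
```

The normalized formats survive a resize unchanged, while the pixel formats have to be rescaled along with the image.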

Pass in the image data and the bounding box data and transform them

label = np.array([
        [0.339, 0.6693333333333333, 0.402, 0.42133333333333334],
        [0.379, 0.5666666666666667, 0.158, 0.3813333333333333],
        [0.612, 0.7093333333333333, 0.084, 0.3466666666666667],
        [0.555, 0.7026666666666667, 0.078, 0.34933333333333333]
])  # normalized (x_center, y_center, width, height), matching format 'yolo'
category_ids = [12, 14, 14, 14]
category_id_to_name = {
    12: 'horse',
    14: 'people'
}
category_id_to_color = {
    12: 'yellow',
    14: 'red'
}
transformed = transform(image=image, bboxes=label)
transformed_image = transformed['image']
# Convert to an array: some versions return the boxes as a list of tuples
transformed_bboxes = np.array(transformed['bboxes'])
height, width, _ = transformed_image.shape
# yolo boxes are normalized, so scale them back to pixels before drawing
transformed_bboxes[:, [0, 2]] = transformed_bboxes[:, [0, 2]] * width
transformed_bboxes[:, [1, 3]] = transformed_bboxes[:, [1, 3]] * height
visualize(transformed_image, transformed_bboxes, category_ids, category_id_to_name, category_id_to_color)

format is not the only parameter of BboxParams. When we apply a random crop, for instance, only part of a bounding box may survive; when the surviving fraction falls below a threshold, we can drop the box. See the albumentations tutorials for the details.
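The idea of dropping boxes that are mostly cropped away can be sketched as follows. This is an illustration of the principle only: clip_to_crop and filter_bboxes are hypothetical helpers, the threshold is named min_visibility here for readability, and the exact rule the library applies should be checked in its documentation. Boxes and the crop window are pixel [x_min, y_min, x_max, y_max].

```python
def area(bbox):
    """Area of a pixel [x_min, y_min, x_max, y_max] box."""
    return (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])


def clip_to_crop(bbox, crop):
    """Intersect a box with the crop window; return None when they do not overlap."""
    x_min, y_min = max(bbox[0], crop[0]), max(bbox[1], crop[1])
    x_max, y_max = min(bbox[2], crop[2]), min(bbox[3], crop[3])
    if x_min >= x_max or y_min >= y_max:
        return None
    return [x_min, y_min, x_max, y_max]


def filter_bboxes(bboxes, crop, min_visibility):
    """Keep boxes whose visible fraction after cropping reaches min_visibility."""
    kept = []
    for bbox in bboxes:
        clipped = clip_to_crop(bbox, crop)
        if clipped is None:
            continue
        visibility = area(clipped) / area(bbox)
        if visibility >= min_visibility:
            # Express the surviving box in the coordinates of the cropped image.
            kept.append([clipped[0] - crop[0], clipped[1] - crop[1],
                         clipped[2] - crop[0], clipped[3] - crop[1]])
    return kept


crop = [50, 0, 150, 100]
bboxes = [
    [0, 10, 100, 60],   # half of it survives the crop
    [0, 0, 60, 100],    # only one sixth survives
    [60, 20, 120, 80],  # fully inside the crop
]

print(filter_bboxes(bboxes, crop, min_visibility=0.3))
# [[0, 10, 50, 60], [10, 20, 70, 80]]
```

When boxes are dropped, their labels have to be dropped with them, which is one reason to pass labels alongside the boxes rather than keeping them in a separate list.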

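Data augmentation for semantic segmentation

At the pixel level, data augmentation for object detection and semantic segmentation is no different from classification. For spatial transforms, segmentation has no bounding boxes to transform; the counterpart is the mask.
A mask is a pixel-level label that corresponds one-to-one with the pixels of the original image.
The albumentations tutorial uses a dataset from Kaggle; for ease of presentation we use the same dataset here.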

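Dataset URL

[Figure: directory structure of the extracted dataset]

After downloading and extracting the data you get the directory structure shown above; the train.csv file gives the names of the images and masks to use.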

image = np.array(Image.open(image_path))  # here: /train/images/0fea4b5049.png
mask = np.array(Image.open(mask_path))  # /train/masks/0fea4b5049.png

Here is a function for displaying the results

from matplotlib import pyplot as plt
def visualize(image, mask, original_image=None, original_mask=None):
	fontsize=8
	if original_image is None and original_mask is None:
		fg, ax = plt.subplots(2, 1, figsize=(8, 8))
		ax[0].axis('off')
		ax[0].imshow(image)
		ax[0].set_title('image', fontsize=fontsize)
		ax[1].axis('off')
		ax[1].imshow(mask)
		ax[1].set_title('mask', fontsize=fontsize)
	else:
		fg, ax = plt.subplots(2, 2, figsize=(8, 8))
		ax[0, 0].axis('off')
		ax[0, 0].imshow(original_image)
		ax[0, 0].set_title('Original Image', fontsize=fontsize)
		ax[0, 1].axis('off')
		ax[0, 1].imshow(original_mask)
		ax[0, 1].set_title('Original Mask', fontsize=fontsize)
		ax[1, 0].axis('off')
		ax[1, 0].imshow(image)
		ax[1, 0].set_title('Transformed Image', fontsize=fontsize)
		ax[1, 1].axis('off')
		ax[1, 1].imshow(mask)
		ax[1, 1].set_title('Transformed Mask', fontsize=fontsize)

The data augmentation pipeline

aug = A.PadIfNeeded(min_height=128, min_width=128, p=1)
augmented = aug(image=image, mask=mask)
augmented_img = augmented['image']
augmented_mask = augmented['mask']
visualize(augmented_img, augmented_mask, original_image=image, original_mask=mask)

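Compared with classification, the only addition is the mask argument: just pass the mask data in directly.

[Figure: original and padded image and mask]

The default padding mode here is reflection, and you can see that the transformed mask has gained some yellow regions on its right side.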
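To see why reflection padding can create new labeled regions, here is a pure-Python sketch. It assumes one common reflection variant (mirroring about the edge pixel without repeating it) and an even split of the padding between the two sides; whether the library uses exactly this variant and split is an assumption, and reflect_index and pad_if_needed are hypothetical names.

```python
def reflect_index(i, n):
    """Map any integer index onto 0..n-1 by mirroring about the edge pixels."""
    if n == 1:
        return 0
    period = 2 * (n - 1)
    i %= period  # Python's % is non-negative for a positive modulus
    return i if i < n else period - i


def pad_if_needed(grid, min_height, min_width):
    """Pad a list-of-rows grid up to at least (min_height, min_width) by reflection."""
    rows, cols = len(grid), len(grid[0])
    pad_top = max(min_height - rows, 0) // 2
    pad_left = max(min_width - cols, 0) // 2
    out_h, out_w = max(rows, min_height), max(cols, min_width)
    return [
        [grid[reflect_index(r - pad_top, rows)][reflect_index(c - pad_left, cols)]
         for c in range(out_w)]
        for r in range(out_h)
    ]


image = [[1, 2, 3, 4]]
mask = [[0, 0, 1, 1]]

print(pad_if_needed(image, min_height=1, min_width=8))  # [[3, 2, 1, 2, 3, 4, 3, 2]]
print(pad_if_needed(mask, min_height=1, min_width=8))   # [[1, 0, 0, 0, 1, 1, 1, 0]]
```

The padded mask now contains a labeled pixel on the left that was never annotated, which is the kind of extra region visible in the padded mask.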
For some segmentation tasks we do not want to add or remove any information, so we usually prefer non-destructive transformations such as HorizontalFlip, VerticalFlip, and RandomRotate90 (which randomly rotates by 0, 90, 180, or 270 degrees).

aug = A.RandomRotate90(p=1)
augmented = aug(image=image, mask=mask)
augmented_image = augmented['image']
augmented_mask = augmented['mask']
visualize(augmented_image, augmented_mask, original_image=image, original_mask=mask)
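Why a 90-degree rotation counts as non-destructive can be checked on a toy example: the same rotation is applied to the image and the mask, every pixel keeps its label, and nothing is added or lost. The helpers rot90 and rotate_pair are hypothetical and work on lists of rows rather than real arrays.

```python
def rot90(grid):
    """Rotate a list-of-rows grid by 90 degrees counter-clockwise."""
    return [list(column) for column in zip(*grid)][::-1]


def rotate_pair(image, mask, k):
    """Apply the same k quarter-turns to an image and its mask."""
    for _ in range(k % 4):
        image, mask = rot90(image), rot90(mask)
    return image, mask


def labeled_pixels(image, mask):
    """Sorted pixel values that carry the label 1."""
    return sorted(
        pixel
        for image_row, mask_row in zip(image, mask)
        for pixel, label in zip(image_row, mask_row)
        if label == 1
    )


image = [[10, 20, 30],
         [40, 50, 60]]
mask = [[0, 0, 1],
        [0, 1, 1]]

rot_image, rot_mask = rotate_pair(image, mask, k=1)

print(rot_image)                            # [[30, 60], [20, 50], [10, 40]]
print(rot_mask)                             # [[1, 1], [0, 1], [0, 0]]
print(labeled_pixels(rot_image, rot_mask))  # [30, 50, 60], same as before rotating
```

Four quarter-turns return both the image and the mask to their starting state, which padding or cropping cannot guarantee.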

Next, a pipeline that combines several transforms

original_height, original_width = image.shape[:2]
aug = A.Compose([
    A.OneOf([
        A.RandomSizedCrop(min_max_height=(50, 101), height=original_height, width=original_width, p=0.5),
        A.PadIfNeeded(min_height=original_height, min_width=original_width, p=0.5)
    ]),
    A.VerticalFlip(p=0.5),
    A.RandomRotate90(p=0.5),
    A.OneOf([
        A.ElasticTransform(p=0.5, alpha=120, sigma=120 * 0.05, alpha_affine=120 * 0.03),
        A.GridDistortion(p=0.5),
        A.OpticalDistortion(distort_limit=1, shift_limit=0.5, p=1)
    ], p=0.8)
])
augmented = aug(image=image, mask=mask)
image_medium = augmented['image']
mask_medium = augmented['mask']
visualize(image_medium, mask_medium, original_image=image, original_mask=mask)

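The new element here is A.OneOf. It takes a list of transform objects and randomly picks one of them to apply, according to their weights; OneOf itself also has a probability.

[Figure: how OneOf normalizes the probabilities of its transforms]

As you can see, OneOf normalizes the probabilities of the transforms in its list and redistributes them. The p of each transform here is therefore best read as a weight rather than a probability, so a value of 1 is fine, and in the version used here even values above 1 do no harm.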
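The normalization can be worked through for the second OneOf block in the pipeline above, whose members have p values 0.5, 0.5, and 1 and whose own p is 0.8. The sketch below is a plain-Python model of that behaviour, not the library's code; normalized_weights and one_of are hypothetical names.

```python
import random


def normalized_weights(ps):
    """Turn the per-transform p values into selection weights that sum to 1."""
    total = sum(ps)
    return [p / total for p in ps]


def one_of(names, ps, p_block, rng):
    """Return the name of the chosen transform, or None when the block is skipped."""
    if rng.random() >= p_block:
        return None
    return rng.choices(names, weights=normalized_weights(ps), k=1)[0]


names = ["ElasticTransform", "GridDistortion", "OpticalDistortion"]
ps = [0.5, 0.5, 1]

weights = normalized_weights(ps)
print(weights)  # [0.25, 0.25, 0.5]

# Chance that each transform actually runs on a given call: block p times weight,
# roughly 0.2, 0.2, and 0.4, leaving about 0.2 for "nothing applied".
overall = [0.8 * w for w in weights]

rng = random.Random(0)
picks = [one_of(names, ps, p_block=0.8, rng=rng) for _ in range(10)]
```

Only the ratios between the p values matter inside OneOf: doubling all three would give the same weights.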
