Python Image Processing: A Basic Tutorial on Image and Video Processing

Updated: April 11, 2023, 10:22:26 Author: AI technophile

This article introduces the basics of image and video processing with Python in detail and may serve as a useful reference for study or work.

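Image and Video Processing Basics

0. Introduction

Image processing is the use of algorithms and code to automatically process, manipulate, and analyze images on a computer. Video processing is a special case of image processing, because a video file or video stream consists of a sequence of consecutive images. Image and video processing are widely used in many fields, such as television, photography, robotics, remote sensing, medical diagnosis, and industrial inspection.
In this section, we focus on a few simple image and video processing problems that help us understand the basic concepts of images and video. Before analyzing an image or video, we need to load it into memory using a suitable data structure and be able to save the processed result back to a file. Being able to display (plot) images on screen in real time also lets us see immediately what an image processing algorithm does to an image.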

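1. Displaying the Color Channels of an RGB Image in 3D Space

1.1 Image Representation

An image can be abstracted as a function and visualized for further analysis and processing. A grayscale image can be regarded as a function f(x, y) of two variables, the pixel coordinates, that maps each pixel to its gray intensity level (an integer in [0, 255] or a floating-point number in [0, 1]), that is:

f : (x, y) -> [0, 255]   (or [0, 1] for floating-point images)

For an RGB image, there are three such functions, which can be written as:

f_R : (x, y) -> [0, 255],   f_G : (x, y) -> [0, 255],   f_B : (x, y) -> [0, 255]

Each function corresponds to one channel and its color values. We can plot these functions with the 3D plotting functions of the matplotlib library, drawing each RGB channel separately in 3D space with Python code.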
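
The function view above maps directly onto array indexing. As a minimal sketch, assuming a small synthetic image built with NumPy (the array contents and the helper names f, f_r, f_g, and f_b are made up for illustration), a grayscale image is a 2D array and an RGB image is a stack of three such arrays:

```python
import numpy as np

# Hypothetical 2x3 grayscale image: rows are y, columns are x
gray = np.array([[0, 128, 255],
                 [64, 32, 16]], dtype=np.uint8)


def f(x, y):
    """Gray level at column x, row y (NumPy indexes rows first)."""
    return int(gray[y, x])


# Hypothetical RGB image of the same size: shape is (height, width, 3)
rgb = np.zeros((2, 3, 3), dtype=np.uint8)
rgb[..., 0] = gray          # red channel
rgb[..., 1] = 255 - gray    # green channel
rgb[..., 2] = 7             # blue channel (constant)


def f_r(x, y):
    """Red value at column x, row y."""
    return int(rgb[y, x, 0])


def f_g(x, y):
    """Green value at column x, row y."""
    return int(rgb[y, x, 1])


def f_b(x, y):
    """Blue value at column x, row y."""
    return int(rgb[y, x, 2])


print(f(1, 0))                            # gray level at x=1, y=0
print(f_r(2, 1), f_g(2, 1), f_b(2, 1))    # channel values at x=2, y=1
```

Note that the array is indexed as [row, column], so the function argument order (x, y) is swapped when it reaches the array.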

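1.2 Plotting the Color Channels in 3D Space

(1) First, import all the required packages. To read the image, we need the imread() function from the io module of the scikit-image library. Because the image is loaded as an array, we need NumPy to work with the array data type. To display images we use matplotlib.pyplot, and for 3D plots we need the mplot3d module of mpl_toolkits along with a few other matplotlib modules: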

from skimage.io import imread
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
from matplotlib.ticker import LinearLocator, FormatStrFormatter

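(2) Use the plot_surface() function to plot the pixel values of a channel; this is the key function for drawing 3D surfaces:

Axes3D.plot_surface(X, Y, Z, *args, **kwargs)

Next, we implement the plot_3d() function. In the code below, the X and Y axes are the horizontal and vertical axes, and the Z axis shows the depth of the image. Note that x, y, and z must have the same shape; cmap is the colormap used to color the different pixel values: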

def plot_3d(x, y, z, cmap='Reds', title=''):
    fig = plt.figure(figsize=(15,15))
    ax = fig.add_subplot(projection='3d')  # fig.gca(projection='3d') no longer works in recent Matplotlib releases
    # Draw the surface
    surf = ax.plot_surface(x, y, z, cmap=cmap, linewidth=0, antialiased=False, rstride=2, cstride=2, alpha=0.5)
    ax.xaxis.set_major_locator(LinearLocator(10))
    ax.xaxis.set_major_formatter(FormatStrFormatter('%.02f'))
    ax.view_init(elev=10., azim=5)
    ax.set_title(title, size=20)
    plt.show()

(3) Read an RGB image from disk and load it into memory with the imread() function from the io module of scikit-image:

skimage.io.imread(fname, as_gray=False, plugin=None, flatten=None, **plugin_args)

Load the image from a file with imread():

im = imread('1.jpg')

(4) Then, use NumPy's arange() and meshgrid() functions to create a 2D grid of pixel coordinates (x, y):

y = np.arange(im.shape[0])
x = np.arange(im.shape[1])
x, y = np.meshgrid(x, y)
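
To see what meshgrid() produces, here is a small sketch on a hypothetical 2x3 image shape (the size is an assumption chosen for readability). Both returned arrays have the same shape as a single image channel, which is what plot_surface() needs:

```python
import numpy as np

height, width = 2, 3                 # hypothetical image size
y = np.arange(height)                # row indices
x = np.arange(width)                 # column indices
xx, yy = np.meshgrid(x, y)           # both arrays have shape (height, width)

print(xx)   # column index of every pixel
print(yy)   # row index of every pixel
```

Each position in xx and yy together gives the (x, y) coordinate of the pixel at that position.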

(5) Next, assign the red, green, and blue channels of the image to separate variables; these channels are displayed in 3D with the plot_3d() function:

z1 = im[...,0]
z2 = im[...,1]
z3 = im[...,2]

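(6) Visualize the image in 3D space by plotting the color channels of the RGB image with plot_3d(). The channel values serve as the depth axis, and the y values are subtracted from the image height, which flips the vertical axis so that the origin moves from the top-left corner toward the bottom-left. In the call below, the channel intensities are passed as the first argument, the column indices as the second, and the flipped row indices as the third. Visualize the red channel with plot_3d():

plot_3d(z1, x, im.shape[0]-y, cmap='Reds', title='3D plot for the Red Channel')

The 3D plot of the red channel is shown below:

As the plot shows, the color depth in each channel and the 3D plot visually resemble the original 2D image.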

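2. Reading and Writing Video Files with scikit-video

2.1 The scikit-video Library

In this section, we learn how to load a video from disk with the scikit-video library. The library uses FFmpeg for video I/O, so FFmpeg must be installed first, followed by the scikit-video module: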

pip3 install scikit-video

2.2 Reading and Writing Video Files

First, import all the required packages:

import skvideo.io
import numpy as np
import matplotlib.pyplot as plt

Next, use FFmpegReader() to read a video file from disk; a few randomly chosen frames of the video are displayed later. FFmpegReader() is used as follows:

skvideo.io.FFmpegReader(*args, **kwargs)

Read video frames with FFmpeg:

inputparameters = {}
outputparameters = {}
reader = skvideo.io.FFmpegReader('example.mp4',
                inputdict=inputparameters,
                outputdict=outputparameters)

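2.3 Extracting Video File Properties

Call the getShape() method of the object returned by FFmpegReader() to obtain video properties such as the number of frames, height, width, and number of channels:

# Read the video file properties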
num_frames, height, width, num_channels = reader.getShape()
print(num_frames, height, width, num_channels)

2.4 Reading and Saving Video

Use the nextFrame() method to read frames from the video. NumPy's random.choice() function selects four frames at random, which are then displayed:

plt.figure(figsize=(20,10))

frame_list = np.random.choice(num_frames, 4, replace=False)  # sample without replacement so four distinct frames are shown
i, j = 0, 1
for frame in reader.nextFrame():
    if i in frame_list:
        plt.subplot(2,2,j)
        plt.imshow(frame)
        plt.title("Frame {}".format(i), size=20)
        plt.axis('off')
        j += 1
    i += 1
plt.show()
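
The frame-selection logic can be tried without a video file. The sketch below assumes a synthetic video stored as a NumPy array of shape (num_frames, height, width, channels), the same layout that getShape() reports; the array contents, the sizes, and the seed are made up for illustration. Sampling with replace=False guarantees four distinct frame indices:

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded so the selection is reproducible
# Hypothetical clip: 30 frames of 4x6 RGB noise
video = rng.integers(0, 256, size=(30, 4, 6, 3), dtype=np.uint8)

num_frames, height, width, num_channels = video.shape
frame_list = np.sort(rng.choice(num_frames, 4, replace=False))

selected = [video[i] for i in frame_list]   # frames that would be displayed
print(frame_list, selected[0].shape)
```

Sorting the indices is optional; it only makes the selection easier to read when frames are visited in order.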

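A binary image is an image with only two distinct gray values (black and white). Binarization is often a key step in image processing applications; for example, morphological image processing algorithms usually expect a binary image as input. The simplest way to compute a binary image is thresholding: pixels above the threshold become white, and pixels below it become black.
The threshold_otsu() function in the filters module of scikit-image computes a threshold for a grayscale image, which can then be used to convert each video frame into a binary image (this function is covered in detail later).
In the code below, each frame is converted to grayscale and thresholded, and the resulting binary mask is copied into all three color channels to obtain a binary frame. The binarized video is saved with FFmpegWriter() by writing the binary frames in the same order in which the frames were read: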

from skimage.color import rgb2gray
from skimage.filters import threshold_otsu

writer = skvideo.io.FFmpegWriter("example_binary.mp4", outputdict={})
for frame in skvideo.io.vreader("example.mp4"):
    frame = rgb2gray(frame)
    thresh = threshold_otsu(frame)
    binary = np.zeros((frame.shape[0], frame.shape[1], 3), dtype=np.uint8)
    binary[...,0] = binary[...,1] = binary[...,2] = 255*(frame > thresh).astype(np.uint8)
    writer.writeFrame(binary)
writer.close()
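
To make the thresholding step concrete, here is a NumPy-only sketch of Otsu's method, which picks the threshold that maximizes the between-class variance of the histogram. This is an illustrative reimplementation, not the code of skimage.filters.threshold_otsu(); the function name otsu_threshold and the synthetic frame are assumptions:

```python
import numpy as np


def otsu_threshold(gray, nbins=256):
    """Illustrative Otsu threshold: maximize the between-class variance."""
    hist, edges = np.histogram(gray.ravel(), bins=nbins)
    hist = hist.astype(float)
    centers = (edges[:-1] + edges[1:]) / 2.0

    # Pixel counts of the dark class (bins 0..i) and the bright class (bins i..end)
    w_dark = np.cumsum(hist)
    w_bright = np.cumsum(hist[::-1])[::-1]
    # Mean intensity of each class (guarding against empty classes)
    m_dark = np.cumsum(hist * centers) / np.maximum(w_dark, 1e-12)
    m_bright = (np.cumsum((hist * centers)[::-1]) /
                np.maximum(np.cumsum(hist[::-1]), 1e-12))[::-1]

    # Between-class variance for a split between bin i and bin i+1
    variance = w_dark[:-1] * w_bright[1:] * (m_dark[:-1] - m_bright[1:]) ** 2
    return centers[np.argmax(variance)]


# Hypothetical grayscale frame in [0, 1]: dark left half, bright right half
frame = np.full((4, 8), 0.2)
frame[:, 4:] = 0.8

thresh = otsu_threshold(frame)
mask = frame > thresh
# Copy the mask into three channels, as the video-writing loop does
binary = np.repeat((255 * mask).astype(np.uint8)[..., np.newaxis], 3, axis=2)
print(thresh, binary.shape)
```

For this two-valued frame any threshold between the two gray levels separates the halves, so the dark half becomes 0 and the bright half becomes 255 in every channel.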

Finally, read the binarized video that was just saved and display a few random frames:

plt.figure(figsize=(20,10))

reader = skvideo.io.FFmpegReader("example_binary.mp4")
num_frames, height, width, num_channels = reader.getShape()
frame_list = np.random.choice(num_frames, 4, replace=False)  # sample without replacement so four distinct frames are shown
i, j = 0, 1
for frame in reader.nextFrame():
    if i in frame_list:
        plt.subplot(2,2,j)
        plt.imshow(frame)
        plt.title("Frame {}".format(i), size=20)
        plt.axis('off')
        j += 1
    i += 1
plt.show()

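3. Capturing Live Video from a Camera with OpenCV

In this section, we learn how to capture video and extract image frames with the OpenCV library, using a live video stream recorded by a camera (for example, a laptop's built-in webcam or a USB camera).

(1) First, import the required libraries: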

import cv2
import matplotlib.pyplot as plt

(2) To capture video with OpenCV, we need to create a VideoCapture object. Its argument can be either a device index (live video) or the name of a video file (local file):

vc = cv2.VideoCapture(0)
plt.ion()
if vc.isOpened(): # read the first frame
    is_capturing, frame = vc.read()    
    webcam_preview = plt.imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))    
else:
    is_capturing = False

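The device index is an integer that identifies the camera. If only one camera is connected to the computer, simply pass 0; if there are two cameras, pass 1 to select the second one, and so on.
The isOpened() method checks whether the VideoCapture object was initialized correctly and returns True if so. In that case, we can call read() to read the first frame and all subsequent frames.
read() is the most convenient way to capture data from the device. It returns a Boolean flag together with the captured video frame; if no frame was captured (the camera was disconnected, or the video file has no more frames), the flag is False. The Boolean variable is_capturing records whether a frame could be captured.

(3) Once the first frame has been read correctly, we can capture frame by frame in a while loop until the last frame of the video. At the end, be sure to call release() on the VideoCapture object to free the device. The following code shows how to capture the first ten frames of a live video stream. Note that OpenCV uses the BGR color format, so to display frames with true RGB colors they must be converted with cv2.cvtColor(frame, cv2.COLOR_BGR2RGB):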

frame_index = 1
while is_capturing:
    if frame_index > 10: break

    is_capturing, frame = vc.read()
    if not is_capturing:    # no frame returned (camera disconnected or stream ended)
        break
    image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    webcam_preview.set_data(image)
    plt.title('Frame {0:d} '.format(frame_index))
    plt.draw()
    frame_index += 1
    try:    # guard against a NotImplementedError raised by plt.pause
        plt.pause(2)
    except Exception:
        pass
vc.release()

If the camera connected to the computer works properly, running the code above shows the live images it captures. In addition, cv2.VideoCapture() can also read video files from disk, and correspondingly cv2.VideoWriter() can save video to a file on the local disk.
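
The BGR-to-RGB conversion only reorders the channels. As a sketch that needs neither OpenCV nor a camera, the same reordering can be written with NumPy slicing on a synthetic frame (the frame contents are made up; for 3-channel frames this should give the same channel order as cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)):

```python
import numpy as np

# Hypothetical 1x2 frame in OpenCV's BGR order: a pure blue and a pure red pixel
frame_bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reverse the last axis to turn (B, G, R) into (R, G, B)
frame_rgb = frame_bgr[..., ::-1]

print(frame_rgb[0, 0])   # the blue pixel, now written as (R, G, B)
print(frame_rgb[0, 1])   # the red pixel, now written as (R, G, B)
```

Skipping this step is the usual reason why a frame captured with OpenCV shows swapped red and blue tones in matplotlib.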

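4. Implementing the Gotham Image Filter

4.1 The Gotham Image Filter

In this section, we learn how to implement the Gotham image filter to enhance an image, which also helps in understanding how to manipulate image pixels and perform interpolation. The figure below shows the input image used to implement the filter:

First, import the required libraries. In this section, the image processing functions are implemented with the PIL library: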

from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

im = Image.open('images/Img_01_03.jpg')
print(np.max(im))

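Creating the Gotham filter requires a mid-tone contrast boost on the red channel of the input image. This is done with NumPy's interp() function, which we use to interpolate the channel values. First, let us look at how NumPy interpolation works in the one-dimensional case.

4.2 One-Dimensional Linear Interpolation

When an image is resized by a scale factor, some pixel interpolation is needed to fill in new pixel values between the existing pixels; NumPy's interp() function can perform this interpolation. According to the NumPy documentation, the basic usage of interp() for one-dimensional linear interpolation is:

numpy.interp(x, xp, fp, left=None, right=None, period=None)

interp() returns the one-dimensional piecewise linear interpolant of a function given at discrete data points (xp, fp), evaluated at x: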

x_p = np.linspace(0, 2*np.pi, 10) # 10 evenly spaced reference points in the interval [0, 2π]
y_p = np.cos(x_p)
x = np.linspace(0, 2*np.pi, 50) # 50 evenly spaced evaluation points in the interval [0, 2π]
y = np.cos(x)
y_interp = np.interp(x, x_p, y_p)

plt.figure(figsize=(20,10))
plt.plot(x_p, y_p, 'o', label='reference points')
plt.plot(x, y_interp, '-x', label='interpolated')
plt.plot(x, y, '--', label='true')
plt.legend(prop={'size': 16})
plt.show()

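Suppose we want to (linearly) interpolate the values of the cosine function on the interval [0, 2π], where initially only ten reference points are known. Starting from the function values at those points, interp() estimates the function at the remaining points by linear interpolation. In the resulting plot, the orange curve is the one estimated by interp(), and the green curve is the true cosine curve.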
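
The arithmetic behind a single interpolated value is simple: for x between two neighboring reference points, the result is a weighted average of their function values. A minimal sketch with made-up reference points (xp, fp, and the helper lerp are illustrative names) compares the hand computation with np.interp():

```python
import numpy as np

xp = np.array([0.0, 1.0, 2.0])      # hypothetical reference points
fp = np.array([10.0, 20.0, 40.0])   # function values at the reference points


def lerp(x, x0, x1, y0, y1):
    """Linear interpolation between (x0, y0) and (x1, y1)."""
    t = (x - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)


x = 1.25
by_hand = lerp(x, xp[1], xp[2], fp[1], fp[2])   # x lies between xp[1] and xp[2]
by_numpy = np.interp(x, xp, fp)
print(by_hand, by_numpy)
```

Here x is a quarter of the way from 1.0 to 2.0, so the result is a quarter of the way from 20 to 40.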

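4.3 Image Interpolation

The code above can be extended in a similar way to compute a channel interpolation for the R (red) channel with interp(). Because the red channel of an image is essentially a 2D array (matrix), the following steps are needed before and after applying the function to the channel:

Flatten the 2D array into a 1D array (with NumPy's ravel() function)
Apply the channel interpolation with interp()
Reshape the 1D array back into an image matrix (with NumPy's reshape() function)

(1) Use np.interp() with 11 reference points to stretch the histogram of the red channel: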

r, g, b = im.split() # split the image into its R, G, and B channels
r_old = np.linspace(0,255,11)   # reference points
r_new = [0., 12.75, 25.5, 51., 76.5, 127.5, 178.5, 204., 229.5, 242.25, 255.] # new values at the reference points

r1 = Image.fromarray((np.reshape(np.interp(np.array(r).ravel(), r_old, r_new),
                                 (im.height, im.width))).astype(np.uint8), mode='L')
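
The flatten, interpolate, reshape pattern can be checked on a tiny array without PIL. The sketch below reuses the r_old and r_new reference points from the code above; the helper name apply_curve and the sample channel are assumptions. With these points, dark values get darker and bright values get brighter while the mid-tone 127.5 maps to itself, which is what increases the contrast:

```python
import numpy as np

r_old = np.linspace(0, 255, 11)
r_new = [0., 12.75, 25.5, 51., 76.5, 127.5, 178.5, 204., 229.5, 242.25, 255.]


def apply_curve(channel, old, new):
    """Map a 2D uint8 channel through a piecewise linear tone curve."""
    flat = channel.ravel()                # 2D -> 1D
    mapped = np.interp(flat, old, new)    # interpolate every pixel value
    return mapped.reshape(channel.shape).astype(np.uint8)   # 1D -> 2D


# Hypothetical red channel with dark, mid, and bright values
channel = np.array([[0, 51, 102],
                    [153, 204, 255]], dtype=np.uint8)
curved = apply_curve(channel, r_old, r_new)
print(curved)
```

The end points 0 and 255 stay fixed, so the curve stretches the histogram without pushing values outside the 8-bit range.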

(2) Then, plot the images and the histograms of the red channel:

plt.figure(figsize=(20,15))
plt.subplot(221)
plt.imshow(im)
plt.title('original', size=20)
plt.axis('off')
plt.subplot(222)
im1 = Image.merge('RGB', (r1, g, b))
plt.imshow(im1)
plt.axis('off')
plt.title('with red channel interpolation', size=20)
plt.subplot(223)
plt.hist(np.array(r).ravel())
plt.subplot(224)
plt.hist(np.array(r1).ravel())
plt.show()

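The figure below shows the image before and after the interpolation transform:

(3) The following code makes the blacks slightly bluer: the blue values are increased by 7.65, and np.clip() ensures that the new values stay within the range 0 to 255: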

plt.figure(figsize=(20,10))
plt.subplot(121)
plt.imshow(im1)
plt.title('last image', size=20)
plt.axis('off')
b1 = Image.fromarray(np.clip(np.array(b) + 7.65, 0, 255).astype(np.uint8))
im1 = Image.merge('RGB', (r1, g, b1))
plt.subplot(122)
plt.imshow(im1)
plt.axis('off')
plt.title('with transformation', size=20)
plt.tight_layout()
plt.show()
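
The order of operations matters in the blue shift: adding 7.65 to a uint8 array produces floating-point values that can exceed 255, so they are clipped before being cast back to uint8. A small sketch with made-up blue values:

```python
import numpy as np

blue = np.array([0, 100, 250, 255], dtype=np.uint8)   # hypothetical blue values

shifted = blue + 7.65                 # float array; the last two values exceed 255
clipped = np.clip(shifted, 0, 255)    # limit to the valid 8-bit range
result = clipped.astype(np.uint8)     # the cast drops the fractional part

print(shifted)
print(result)
```

Casting to uint8 without clipping first would not saturate at 255, so the brightest pixels could end up with unintended values.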

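Here, matplotlib's subplot() function is used to display subplots by adding an axes to the current figure:

subplot(nrows, ncols, index, **kwargs)

In the code above, plt.subplot(121) creates a layout with one row and two columns of subplots, and the index argument specifies the position of the plot.

(4) Apply a slight sharpening to the image with the enhance() method of the Sharpness class in PIL's ImageEnhance module:

class PIL.ImageEnhance.Sharpness(image)

The Sharpness class adjusts the sharpness of an image: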

from PIL.ImageEnhance import Sharpness

plt.figure(figsize=(20,10))
plt.subplot(121)
plt.imshow(im1)
plt.title('last image', size=20)
plt.axis('off')
im2 = Sharpness(im1).enhance(3.0)
plt.subplot(122)
plt.imshow(im2)
plt.axis('off')
plt.title('with transformation', size=20)
plt.tight_layout()
plt.show()

(5) Reduce the tone values of the blue channel. This is again done with channel interpolation, this time on the blue channel of the RGB image:

blue_old = np.linspace(0,255,17) # pixel values at the reference points
blue_new = [0., 11.985, 30.09, 64.005, 81.09, 99.96, 107.1, 111.945, 121.125, 143.055, 147.9, 159.885, 171.105,
               186.915, 215.985, 235.875, 255.] # new pixel values at the reference points

b2 = Image.fromarray((np.reshape(np.interp(np.array(b1).ravel(), blue_old, blue_new),
                                 (im.height, im.width))).astype(np.uint8), mode='L')

The image and the histograms of the blue channel are plotted as follows:

plt.figure(figsize=(20,15))
plt.subplot(221)
plt.imshow(im2)
plt.title('last image', size=20)
plt.axis('off')
plt.subplot(222)
im3 = Image.merge('RGB', (r1, g, b2))
plt.imshow(im3)
plt.axis('off')
plt.title('with blue channel interpolation', size=20)
plt.subplot(223)
plt.hist(np.array(b1).ravel(), density=True)  # the 'normed' argument was removed from plt.hist; 'density' replaces it
plt.subplot(224)
plt.hist(np.array(b2).ravel(), density=True)
plt.show()
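
A normalized histogram (density=True, called normed in older Matplotlib releases) scales the bar heights so that the total area is 1, which makes the histograms of different channels comparable. The same normalization is available in np.histogram(), so it can be checked with NumPy alone; the sample values and the seed below are made up:

```python
import numpy as np

rng = np.random.default_rng(1)
values = rng.integers(0, 256, size=1000)   # hypothetical channel values

heights, edges = np.histogram(values, bins=10, density=True)
area = np.sum(heights * np.diff(edges))    # sum of bar height times bar width
print(area)
```

Because the area is normalized, the bar heights are densities rather than pixel counts.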

(6) Finally, display the final output image produced by the Gotham filter:

plt.figure(figsize=(20,15))
plt.imshow(im3)
plt.axis('off')
plt.show()

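Summary

Images and video are important communication media in the era of big data, and image and video processing is a key research area in today's artificial intelligence. In this section, we learned how to perform basic image and video processing with Python. We first visualized the three channels of an RGB image in 3D space, then saw how to capture video and extract image frames, and finally implemented a popular image filter, Gotham, to understand how to manipulate image pixels and perform interpolation.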
