Implementing Text Detection with Python + OpenCV

Updated: 2021-11-19 11:58:01  Author: AI浩

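This article shows how to use OpenCV and the EAST text detector to detect text in images, so that you can apply text detection in your own applications. Interested readers are welcome to follow along.

In the first part of today's tutorial, I'll discuss why detecting text in natural scene images can be so challenging. From there I'll briefly discuss the EAST text detector, why we use it, and what makes the algorithm so novel. I'll also provide a link to the original paper so you can read up on the details if you are so inclined.

Finally, I'll provide my Python + OpenCV text detection implementation so you can start applying text detection in your own applications.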

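Why is natural scene text detection so challenging?

Detecting text in constrained, controlled environments can typically be accomplished with heuristic-based approaches, such as exploiting gradient information or the fact that text is usually grouped into paragraphs and its characters appear on a straight line.

Natural scene text detection, however, is different and much more challenging. Due to the proliferation of cheap digital cameras, not to mention the fact that nearly every smartphone now has a camera, we need to be highly concerned with the conditions under which an image was captured, and with which assumptions we can and cannot make. A summarized version of the natural scene text detection challenges, as described in Celine Mancas-Thillou and Bernard Gosselin's excellent 2017 paper, Natural Scene Text Understanding, is listed below:

Image/sensor noise: Sensor noise from a handheld camera is typically higher than that of a traditional scanner. Additionally, low-priced cameras typically interpolate the pixels of raw sensors to produce real colors.
Viewing angles: Natural scene text can naturally be captured at viewing angles that are not parallel to the text, making the text harder to recognize.
Blurring: Images from uncontrolled environments tend to be blurry, especially if the end user is using a smartphone without some form of stabilization.
Lighting conditions: We cannot make any assumptions about the lighting conditions in natural scene images. It may be near dark, the camera flash may be on, or the sun may be shining brightly and saturating the entire image.
Resolution: Not all cameras are created equal; we may be dealing with cameras of sub-par resolution.
Non-paper objects: Most, but not all, paper is not reflective (at least in the context of paper you are trying to scan). Text in natural scenes may be reflective, including logos, signs, and so on.
Non-planar objects: Consider what happens when text is wrapped around a bottle: the text on the surface becomes distorted and deformed. While humans may still be able to easily "detect" and read the text, our algorithms will struggle. We need to be able to handle such use cases.
Unknown layout: We cannot use any a priori information to give our algorithms "clues" as to where the text resides.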

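The EAST deep learning text detector

With the release of OpenCV 3.4.2 and OpenCV 4, we can now use a deep learning-based text detector called EAST, which is based on Zhou et al.'s 2017 paper, EAST: An Efficient and Accurate Scene Text Detector.

We call the algorithm "EAST" because it's an: Efficient and Accurate Scene Text detection pipeline.

According to the authors, the EAST pipeline is capable of predicting words and lines of text at arbitrary orientations on 720p images, and it can run at 13 FPS. Perhaps most importantly, because the deep learning model is end-to-end, it sidesteps the computationally expensive sub-algorithms that other text detectors typically apply, including candidate aggregation and word partitioning.

To build and train such a deep learning model, the EAST method uses novel, carefully designed loss functions. For more details on EAST, including its architecture design and training methods, be sure to refer to the authors' publication.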

Project structure

$ tree --dirsfirst
.
├── images
│   ├── car_wash.png
│   ├── lebron_james.jpg
│   └── sign.jpg
├── frozen_east_text_detection.pb
├── text_detection.py
└── text_detection_video.py

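Notice that I've provided three sample pictures in the images/ directory. You may wish to add your own images collected with your smartphone or ones you find online. We'll be reviewing two .py files today:

text_detection.py: Detects text in static images.
text_detection_video.py: Detects text via your webcam or an input video file.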

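Implementation notes

The text detection implementation I am including today is based on OpenCV's official C++ example; however, I must admit that I had some trouble converting it to Python.

First, there are no Point2f and RotatedRect functions in Python, so I could not 100% mimic the C++ implementation. The C++ implementation can produce rotated bounding boxes, but unfortunately the one I am sharing with you today cannot.

Second, the NMSBoxes function does not return any values through the Python bindings (at least for my OpenCV 4 pre-release install), ultimately resulting in OpenCV throwing an error. The NMSBoxes function may work in OpenCV 3.4.2, but I wasn't able to exhaustively test it.

I got around this issue by using my own non-maxima suppression implementation in imutils, but again, I don't believe the two are 100% interchangeable, as it appears NMSBoxes accepts additional parameters.

Given all that, I've tried my best to provide you with the best OpenCV text detection implementation I could, using the working functions and resources I had. If you have any improvements to the method, please feel free to share them in the comments below.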
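For comparison, the C++ sample keeps the full rotated box. The following NumPy-only sketch reproduces that corner arithmetic as I understand it from the C++ sample; treat it as an unverified approximation rather than a port. The function name rotated_box_corners and its argument names are hypothetical, and it assumes geometry channels 0 to 3 hold the distances to the top, right, bottom, and left box edges (consistent with how h and w are computed in the decoding loop later in this post).

```python
import numpy as np


def rotated_box_corners(x, y, d_top, d_right, d_bottom, d_left, angle):
    """Return the four corners of one EAST prediction as a (4, 2) array.

    Sketch assumption: the corner arithmetic mirrors the Point2f vectors
    in OpenCV's C++ text detection sample, as I read it. Corners are
    ordered top-left, top-right, bottom-right, bottom-left when angle == 0.
    """
    # feature-map cell (x, y) maps back to input pixel (4x, 4y)
    offset_x, offset_y = x * 4.0, y * 4.0
    cos_a, sin_a = np.cos(angle), np.sin(angle)
    h = d_top + d_bottom
    w = d_right + d_left
    # bottom-right corner: same expression as endX/endY in the Python code
    br = np.array([offset_x + cos_a * d_right + sin_a * d_bottom,
                   offset_y - sin_a * d_right + cos_a * d_bottom])
    # step "up" along the box height and "left" along the box width
    tr = br + np.array([-sin_a * h, -cos_a * h])
    bl = br + np.array([-cos_a * w, sin_a * w])
    # the fourth corner completes the parallelogram (a rectangle here)
    tl = tr + bl - br
    return np.stack([tl, tr, br, bl])
```

With angle 0 the corners reduce to the axis-aligned startX, startY, endX, endY box. After scaling x by rW and y by rH, the array could be drawn with cv2.polylines instead of cv2.rectangle.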

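Implementing our text detector with OpenCV

Before we get started, I want to point out that you will need at least OpenCV 3.4.2 (or OpenCV 4) installed on your system to use OpenCV's EAST text detector. Next, make sure you have imutils installed/upgraded on your system as well:

pip install --upgrade imutils

At this point your system is configured, so open up text_detection.py and insert the following code: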

# import the necessary packages
from imutils.object_detection import non_max_suppression
import numpy as np
import argparse
import time
import cv2
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", type=str, required=True,
	help="path to input image")
ap.add_argument("-east", "--east", type=str, required=True,
	help="path to input EAST text detector")
ap.add_argument("-c", "--min-confidence", type=float, default=0.5,
	help="minimum probability required to inspect a region")
ap.add_argument("-w", "--width", type=int, default=320,
	help="resized image width (should be multiple of 32)")
ap.add_argument("-e", "--height", type=int, default=320,
	help="resized image height (should be multiple of 32)")
args = vars(ap.parse_args())

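First, we import our required packages and modules. Notably, we import NumPy, OpenCV, and my implementation of non_max_suppression from imutils.object_detection. We then proceed to parse five command line arguments:

--image: The path to our input image.

--east: The path to the pre-trained EAST scene text detector model file.

--min-confidence: The probability threshold for deciding that a region contains text. Optional, with default=0.5.

--width: The resized image width; must be a multiple of 32. Optional, with a default of 320.

--height: The resized image height; must be a multiple of 32. Optional, with a default of 320.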
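Because both --width and --height must be multiples of 32, it can help to derive them from the source image's aspect ratio instead of always using 320x320. The helper below is a hypothetical sketch (east_input_size is not part of the original scripts): it keeps the width fixed at a multiple of 32 and rounds the proportional height to the nearest multiple of 32.

```python
def east_input_size(orig_w, orig_h, target_w=320):
    """Pick an EAST input size whose sides are multiples of 32.

    Hypothetical helper: keeps the width at target_w (rounded down to a
    multiple of 32) and scales the height to roughly preserve the aspect
    ratio, rounding it to the nearest multiple of 32 (minimum 32).
    """
    new_w = max(32, (target_w // 32) * 32)
    new_h = orig_h * new_w / float(orig_w)
    new_h = max(32, int(round(new_h / 32.0)) * 32)
    return new_w, new_h
```

For a 1280x720 image, east_input_size(1280, 720) returns (320, 192), which you could pass as --width 320 --height 192.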

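Important: The EAST text detector requires that your input image dimensions be multiples of 32, so if you choose to adjust your --width and --height values, make sure they are multiples of 32! From there, let's load our image and resize it: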

# load the input image and grab the image dimensions
image = cv2.imread(args["image"])
orig = image.copy()
(H, W) = image.shape[:2]
# set the new width and height and then determine the ratio in change
# for both the width and height
(newW, newH) = (args["width"], args["height"])
rW = W / float(newW)
rH = H / float(newH)
# resize the image and grab the new image dimensions
image = cv2.resize(image, (newW, newH))
(H, W) = image.shape[:2]

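We load and copy our input image, then determine the ratio of the original image dimensions to the new image dimensions (based on the command line arguments provided for --width and --height). Then we resize the image, ignoring aspect ratio. To perform text detection using OpenCV and the EAST deep learning model, we need to extract the output feature maps of two layers: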

# define the two output layer names for the EAST detector model that
# we are interested -- the first is the output probabilities and the
# second can be used to derive the bounding box coordinates of text
layerNames = [
	"feature_fusion/Conv_7/Sigmoid",
	"feature_fusion/concat_3"]

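We construct a list of layerNames:

The first layer is our output sigmoid activation, which gives us the probability of a region containing text or not.

The second layer is the output feature map that represents the "geometry" of the image; we'll use this geometry to derive the bounding box coordinates of the text in the input image.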
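The decoding loop later indexes scores[0, 0, y] and geometry[0, 0..4, y] and multiplies cell indexes by 4, which implies the output shapes below for a single input image. This is a small sanity-check sketch built from that reading of the code; expected_east_shapes is a hypothetical helper, and the five geometry channels are taken to be four edge distances followed by one rotation angle.

```python
def expected_east_shapes(width, height):
    """Return the (scores, geometry) shapes implied by the decoding code.

    Assumes a single input image and the 4x down-sampling the decoding
    loop relies on: one score channel, and five geometry channels
    (four edge distances followed by one rotation angle).
    """
    if width % 32 or height % 32:
        raise ValueError("EAST input width and height must be multiples of 32")
    rows, cols = height // 4, width // 4
    return (1, 1, rows, cols), (1, 5, rows, cols)
```

After net.forward, comparing scores.shape and geometry.shape against these tuples is a cheap way to confirm the network ran at the size you expected.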

Let's load OpenCV's EAST text detector:

# load the pre-trained EAST text detector
print("[INFO] loading EAST text detector...")
net = cv2.dnn.readNet(args["east"])
# construct a blob from the image and then perform a forward pass of
# the model to obtain the two output layer sets
blob = cv2.dnn.blobFromImage(image, 1.0, (W, H),
	(123.68, 116.78, 103.94), swapRB=True, crop=False)
start = time.time()
net.setInput(blob)
(scores, geometry) = net.forward(layerNames)
end = time.time()
# show timing information on text prediction
print("[INFO] text detection took {:.6f} seconds".format(end - start))

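We load the neural network into memory using cv2.dnn.readNet, passing it the path to the EAST detector.

Then we prepare our image by converting it to a blob. To read more about this step, refer to Deep learning: How OpenCV's blobFromImage works. To predict text, we simply set the blob as input and call net.forward. These lines are surrounded by timestamps so that we can print the elapsed time. By supplying layerNames as a parameter to net.forward, we instruct OpenCV to return the two feature maps we are interested in:

The output geometry map, used to derive the bounding box coordinates of text in our input image
The scores map, containing the probability that a given region contains text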
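To make the blob step less opaque, here is a NumPy sketch of what I understand cv2.dnn.blobFromImage to do with the arguments used above (scalefactor 1.0, the (123.68, 116.78, 103.94) mean, swapRB=True, crop=False) when the image already has the target size: swap BGR to RGB, subtract the per-channel mean, scale, and reorder to (1, 3, H, W). It is an approximation for illustration only; blob_from_image_sketch is a hypothetical name, and resizing is deliberately left out.

```python
import numpy as np


def blob_from_image_sketch(image_bgr, mean_rgb=(123.68, 116.78, 103.94),
                           scalefactor=1.0):
    """Approximate cv2.dnn.blobFromImage(image, scalefactor, size, mean,
    swapRB=True, crop=False) for an image already at the target size.

    Sketch only: resizing is not handled, so image_bgr must already be
    newH x newW x 3.
    """
    # swapRB=True: reorder channels from BGR to RGB
    rgb = image_bgr[:, :, ::-1].astype(np.float32)
    # subtract the per-channel (R, G, B) mean, then apply the scale factor
    rgb = (rgb - np.array(mean_rgb, dtype=np.float32)) * scalefactor
    # HWC -> CHW, then add the batch axis: (1, 3, H, W)
    return rgb.transpose(2, 0, 1)[np.newaxis, ...]
```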

We'll need to loop over each of these values, one by one:

# grab the number of rows and columns from the scores volume, then
# initialize our set of bounding box rectangles and corresponding
# confidence scores
(numRows, numCols) = scores.shape[2:4]
rects = []
confidences = []
# loop over the number of rows
for y in range(0, numRows):
	# extract the scores (probabilities), followed by the geometrical
	# data used to derive potential bounding box coordinates that
	# surround text
	scoresData = scores[0, 0, y]
	xData0 = geometry[0, 0, y]
	xData1 = geometry[0, 1, y]
	xData2 = geometry[0, 2, y]
	xData3 = geometry[0, 3, y]
	anglesData = geometry[0, 4, y]

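We start off by grabbing the dimensions of the scores volume (numRows, numCols), then initializing two lists:

rects: Stores the bounding box (x, y)-coordinates for text regions
confidences: Stores the probability associated with each of the bounding boxes in rects

We'll later apply non-maxima suppression to these regions. We loop over the rows, and for the current row y we extract the scores and geometry data. Next, we loop over each of the column indexes of the currently selected row: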

	# loop over the number of columns
	for x in range(0, numCols):
		# if our score does not have sufficient probability, ignore it
		if scoresData[x] < args["min_confidence"]:
			continue
		# compute the offset factor as our resulting feature maps will
		# be 4x smaller than the input image
		(offsetX, offsetY) = (x * 4.0, y * 4.0)
		# extract the rotation angle for the prediction and then
		# compute the sin and cosine
		angle = anglesData[x]
		cos = np.cos(angle)
		sin = np.sin(angle)
		# use the geometry volume to derive the width and height of
		# the bounding box
		h = xData0[x] + xData2[x]
		w = xData1[x] + xData3[x]
		# compute both the starting and ending (x, y)-coordinates for
		# the text prediction bounding box
		endX = int(offsetX + (cos * xData1[x]) + (sin * xData2[x]))
		endY = int(offsetY - (sin * xData1[x]) + (cos * xData2[x]))
		startX = int(endX - w)
		startY = int(endY - h)
		# add the bounding box coordinates and probability score to
		# our respective lists
		rects.append((startX, startY, endX, endY))
		confidences.append(scoresData[x])

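For each row, we begin looping over the columns. We filter out weak text detections by ignoring areas that do not have a sufficiently high probability.

The EAST text detector naturally reduces volume size as the image passes through the network: our volume is actually 4x smaller than our input image, so we multiply by 4 to bring the coordinates back into the coordinate space of the resized input image (the rW and rH ratios later map them onto the original image).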
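A worked example may help with the 4x offset and the geometry arithmetic. The hypothetical decode_cell function below repeats the loop body for a single cell: with no rotation, the cell at column 10, row 5 maps to pixel (40, 20), and the four distances 4, 6, 8, and 2 produce an 8-pixel-wide, 12-pixel-tall box.

```python
import numpy as np


def decode_cell(x, y, distances, angle):
    """Decode one feature-map cell into a (startX, startY, endX, endY) box.

    Same arithmetic as the inner loop; distances holds the four geometry
    values for the cell in channel order 0..3.
    """
    d0, d1, d2, d3 = distances
    # each feature-map cell corresponds to a 4x4 patch of the input
    offset_x, offset_y = x * 4.0, y * 4.0
    cos, sin = np.cos(angle), np.sin(angle)
    h = d0 + d2
    w = d1 + d3
    end_x = int(offset_x + (cos * d1) + (sin * d2))
    end_y = int(offset_y - (sin * d1) + (cos * d2))
    return (int(end_x - w), int(end_y - h), end_x, end_y)


# cell (10, 5) with distances 4, 6, 8, 2 and no rotation
print(decode_cell(10, 5, (4.0, 6.0, 8.0, 2.0), 0.0))
```

This prints (38, 16, 46, 28): endX = 40 + 6, endY = 20 + 8, startX = 46 - 8, and startY = 28 - 12, all in the coordinates of the resized 320x320 input.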

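We then extract the rotation angle, compute its sine and cosine, use the geometry data to derive the box's width, height, and (x, y)-coordinates, and update our rects and confidences lists, respectively. We're almost finished! The final step is to apply non-maxima suppression to our bounding boxes to suppress weak, overlapping bounding boxes and then display the resulting text predictions: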

# apply non-maxima suppression to suppress weak, overlapping bounding
# boxes
boxes = non_max_suppression(np.array(rects), probs=confidences)
# loop over the bounding boxes
for (startX, startY, endX, endY) in boxes:
	# scale the bounding box coordinates based on the respective
	# ratios
	startX = int(startX * rW)
	startY = int(startY * rH)
	endX = int(endX * rW)
	endY = int(endY * rH)
	# draw the bounding box on the image
	cv2.rectangle(orig, (startX, startY), (endX, endY), (0, 255, 0), 2)
# show the output image
cv2.imshow("Text Detection", orig)
cv2.waitKey(0)

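As I mentioned in the previous section, I could not use the non-maxima suppression in my OpenCV 4 install (cv2.dnn.NMSBoxes), as the Python bindings did not return a value, ultimately causing OpenCV to error out. I wasn't able to fully test it in OpenCV 3.4.2, so it may work in v3.4.2.

Instead, I used the non-maxima suppression implementation provided in the imutils package. The results still look good; however, I wasn't able to compare my output against the NMSBoxes function to see whether they are identical. We then loop over our bounding boxes, scale the coordinates back to the original image dimensions, and draw the output on our original image. The original image is displayed until a key is pressed.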
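For readers curious what non-maxima suppression does under the hood, the following is a generic greedy, IoU-based sketch. It is neither the imutils implementation (whose overlap measure and default threshold may differ) nor cv2.dnn.NMSBoxes; nms_iou and its iou_threshold default of 0.3 are illustrative assumptions.

```python
import numpy as np


def nms_iou(boxes, scores, iou_threshold=0.3):
    """Greedy non-maxima suppression over (startX, startY, endX, endY) boxes.

    Generic IoU-based variant for illustration. Returns the kept boxes as
    an int array, highest score first.
    """
    boxes = np.asarray(boxes, dtype=np.float64)
    if len(boxes) == 0:
        return np.empty((0, 4), dtype=int)
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(scores)[::-1]  # indices, best score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # intersection of the best box with every remaining box
        xx1 = np.maximum(x1[i], x1[rest])
        yy1 = np.maximum(y1[i], y1[rest])
        xx2 = np.minimum(x2[i], x2[rest])
        yy2 = np.minimum(y2[i], y2[rest])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        union = np.maximum(areas[i] + areas[rest] - inter, 1e-9)
        iou = inter / union
        # drop boxes that overlap the kept box too much
        order = rest[iou <= iou_threshold]
    return boxes[keep].astype(int)
```

Given the rects and confidences lists from the decoding loop, nms_iou(rects, confidences) keeps the highest-scoring box in each cluster of overlapping boxes.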

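As a final implementation note, I'd like to mention that the two nested for loops we use to loop over the scores and geometry volumes are a good example of where you could leverage Cython to dramatically speed up your pipeline. I've demonstrated the power of Cython in Fast, optimized 'for' pixel loops with OpenCV and Python.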
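Cython is one option; another is to vectorize the loops with NumPy so that no per-cell Python code runs at all. The sketch below is a NumPy rewrite of my own (decode_predictions_vectorized is a hypothetical name, not part of the original code) that applies the same threshold and the same box arithmetic to every cell at once. Floating-point rounding can differ slightly from the loop version, so an occasional coordinate may be off by one pixel.

```python
import numpy as np


def decode_predictions_vectorized(scores, geometry, min_confidence=0.5):
    """NumPy-vectorized version of the nested decoding loops.

    Takes the same (1, 1, R, C) scores and (1, 5, R, C) geometry arrays
    and returns (rects, confidences) using array operations instead of
    Python loops.
    """
    probs = scores[0, 0]
    d0, d1, d2, d3, angles = geometry[0]
    # (row, col) indices of every cell that passes the threshold
    ys, xs = np.nonzero(probs >= min_confidence)
    offset_x, offset_y = xs * 4.0, ys * 4.0
    d0, d1, d2, d3 = d0[ys, xs], d1[ys, xs], d2[ys, xs], d3[ys, xs]
    cos, sin = np.cos(angles[ys, xs]), np.sin(angles[ys, xs])
    h = d0 + d2
    w = d1 + d3
    # truncate toward zero, like int() in the loop version
    end_x = np.trunc(offset_x + cos * d1 + sin * d2)
    end_y = np.trunc(offset_y - sin * d1 + cos * d2)
    start_x = np.trunc(end_x - w)
    start_y = np.trunc(end_y - h)
    rects = np.stack([start_x, start_y, end_x, end_y], axis=1).astype(int)
    return rects, probs[ys, xs]
```

It returns NumPy arrays rather than lists, which non_max_suppression(rects, probs=confidences) should accept as-is.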

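OpenCV text detection results

Are you ready to apply text detection to images?

Download frozen_east_text_detection.pb from:

oyyd/frozen_east_text_detection.pb (github.com)

From there, you can execute the following command in your terminal (note the two command line arguments):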

$ python text_detection.py --image images/lebron_james.jpg \
	--east frozen_east_text_detection.pb

Your results should look similar to the following image:

Three text regions are identified on LeBron James. Now let's try to detect the text on a business sign:

$ python text_detection.py --image images/car_wash.png \
	--east frozen_east_text_detection.pb

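Detecting text in video with OpenCV

Now that we've seen how to detect text in images, let's move on to detecting text in video with OpenCV. This explanation will be very brief; please refer to the previous section for details as needed. Open up text_detection_video.py and insert the following code: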

# import the necessary packages
from imutils.video import VideoStream
from imutils.video import FPS
from imutils.object_detection import non_max_suppression
import numpy as np
import argparse
import imutils
import time
import cv2

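We begin by importing our packages. We'll use VideoStream to access a webcam and FPS to benchmark the frames per second of this script. Everything else is the same as in the previous section.

For convenience, let's define a new function to decode our predictions; it will be reused for each frame and make our loop cleaner: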

def decode_predictions(scores, geometry):
	# grab the number of rows and columns from the scores volume, then
	# initialize our set of bounding box rectangles and corresponding
	# confidence scores
	(numRows, numCols) = scores.shape[2:4]
	rects = []
	confidences = []
	# loop over the number of rows
	for y in range(0, numRows):
		# extract the scores (probabilities), followed by the
		# geometrical data used to derive potential bounding box
		# coordinates that surround text
		scoresData = scores[0, 0, y]
		xData0 = geometry[0, 0, y]
		xData1 = geometry[0, 1, y]
		xData2 = geometry[0, 2, y]
		xData3 = geometry[0, 3, y]
		anglesData = geometry[0, 4, y]
		# loop over the number of columns
		for x in range(0, numCols):
			# if our score does not have sufficient probability,
			# ignore it
			if scoresData[x] < args["min_confidence"]:
				continue
			# compute the offset factor as our resulting feature
			# maps will be 4x smaller than the input image
			(offsetX, offsetY) = (x * 4.0, y * 4.0)
			# extract the rotation angle for the prediction and
			# then compute the sin and cosine
			angle = anglesData[x]
			cos = np.cos(angle)
			sin = np.sin(angle)
			# use the geometry volume to derive the width and height
			# of the bounding box
			h = xData0[x] + xData2[x]
			w = xData1[x] + xData3[x]
			# compute both the starting and ending (x, y)-coordinates
			# for the text prediction bounding box
			endX = int(offsetX + (cos * xData1[x]) + (sin * xData2[x]))
			endY = int(offsetY - (sin * xData1[x]) + (cos * xData2[x]))
			startX = int(endX - w)
			startY = int(endY - h)
			# add the bounding box coordinates and probability score
			# to our respective lists
			rects.append((startX, startY, endX, endY))
			confidences.append(scoresData[x])
	# return a tuple of the bounding boxes and associated confidences
	return (rects, confidences)

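Here we define the decode_predictions function.

This function extracts the bounding box coordinates of each text region and the probability of each text region detection. This dedicated function makes the code easier to read and manage later on in this script. Let's parse our command line arguments: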

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-east", "--east", type=str, required=True,
	help="path to input EAST text detector")
ap.add_argument("--video", type=str,
	help="path to optional input video file")
ap.add_argument("-c", "--min-confidence", type=float, default=0.5,
	help="minimum probability required to inspect a region")
ap.add_argument("-w", "--width", type=int, default=320,
	help="resized image width (should be multiple of 32)")
ap.add_argument("-e", "--height", type=int, default=320,
	help="resized image height (should be multiple of 32)")
args = vars(ap.parse_args())

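Our command line arguments are:

--east: The path to the pre-trained EAST scene text detector model file.

--video: The path to our input video. Optional; if a video path is provided, the webcam will not be used.

--min-confidence: The probability threshold for deciding that a region contains text. Optional, with default=0.5.

--width: The resized image width (must be a multiple of 32). Optional, with default=320.

--height: The resized image height (must be a multiple of 32). Optional, with default=320.

The primary change from the image-only script in the previous section (in terms of command line arguments) is that I've replaced the --image argument with --video. Important: The EAST text detector requires that your input image dimensions be multiples of 32, so if you choose to adjust your --width and --height values, make sure they are multiples of 32! Next, we'll perform important initializations that mimic the previous script: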

# initialize the original frame dimensions, new frame dimensions,
# and ratio between the dimensions
(W, H) = (None, None)
(newW, newH) = (args["width"], args["height"])
(rW, rH) = (None, None)
# define the two output layer names for the EAST detector model that
# we are interested -- the first is the output probabilities and the
# second can be used to derive the bounding box coordinates of text
layerNames = [
	"feature_fusion/Conv_7/Sigmoid",
	"feature_fusion/concat_3"]
# load the pre-trained EAST text detector
print("[INFO] loading EAST text detector...")
net = cv2.dnn.readNet(args["east"])

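The height/width and ratio initializations will allow us to properly scale our bounding boxes later on. We then define our output layer names and load our pre-trained EAST text detector. The following block sets up our video stream and frames per second counter: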

# if a video path was not supplied, grab the reference to the web cam
if not args.get("video", False):
	print("[INFO] starting video stream...")
	vs = VideoStream(src=0).start()
	time.sleep(1.0)
# otherwise, grab a reference to the video file
else:
	vs = cv2.VideoCapture(args["video"])
# start the FPS throughput estimator
fps = FPS().start()

Our video stream is set up for either a webcam or a video file.

Once the frames per second counter is initialized, we begin looping over incoming frames:

# loop over frames from the video stream
while True:
	# grab the current frame, then handle if we are using a
	# VideoStream or VideoCapture object
	frame = vs.read()
	frame = frame[1] if args.get("video", False) else frame
	# check to see if we have reached the end of the stream
	if frame is None:
		break
	# resize the frame, maintaining the aspect ratio
	frame = imutils.resize(frame, width=1000)
	orig = frame.copy()
	# if our frame dimensions are None, we still need to compute the
	# ratio of old frame dimensions to new frame dimensions
	if W is None or H is None:
		(H, W) = frame.shape[:2]
		rW = W / float(newW)
		rH = H / float(newH)
	# resize the frame, this time ignoring aspect ratio
	frame = cv2.resize(frame, (newW, newH))

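We loop over video/webcam frames. Each frame is first resized while maintaining its aspect ratio. From there, we grab its dimensions and compute the scaling ratios. We then resize the frame again (to dimensions that must be multiples of 32), this time ignoring aspect ratio, since we have stored the ratios for safekeeping. Inference and drawing of the text region bounding boxes take place in the following lines: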

	# construct a blob from the frame and then perform a forward pass
	# of the model to obtain the two output layer sets
	blob = cv2.dnn.blobFromImage(frame, 1.0, (newW, newH),
		(123.68, 116.78, 103.94), swapRB=True, crop=False)
	net.setInput(blob)
	(scores, geometry) = net.forward(layerNames)
	# decode the predictions, then  apply non-maxima suppression to
	# suppress weak, overlapping bounding boxes
	(rects, confidences) = decode_predictions(scores, geometry)
	boxes = non_max_suppression(np.array(rects), probs=confidences)
	# loop over the bounding boxes
	for (startX, startY, endX, endY) in boxes:
		# scale the bounding box coordinates based on the respective
		# ratios
		startX = int(startX * rW)
		startY = int(startY * rH)
		endX = int(endX * rW)
		endY = int(endY * rH)
		# draw the bounding box on the frame
		cv2.rectangle(orig, (startX, startY), (endX, endY), (0, 255, 0), 2)

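In this block, we:

Detect text regions using EAST by creating a blob and passing it through the network.

Decode the predictions and apply NMS. We use the decode_predictions function defined earlier in this script and my imutils non_max_suppression convenience function.

Loop over the bounding boxes and draw them on the frame. This involves scaling the boxes by the ratios gathered earlier.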
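The ratios map boxes from the fixed newW x newH network input back onto the resized frame. As a worked example under assumed sizes (a 1000x562 frame and the default 320x320 input), the hypothetical scale_box helper below reproduces the four int(...) lines from the loop.

```python
def scale_box(box, r_w, r_h):
    """Map a box from the newW x newH network input back to the frame.

    Mirrors the int(startX * rW) ... lines in the loop; r_w and r_h are
    the frame-to-network ratios computed on the first frame.
    """
    start_x, start_y, end_x, end_y = box
    return (int(start_x * r_w), int(start_y * r_h),
            int(end_x * r_w), int(end_y * r_h))


# a 1000x562 frame resized to 320x320 gives rW = 3.125, rH = 1.75625
print(scale_box((32, 64, 96, 128), 1000 / 320.0, 562 / 320.0))
```

This prints (100, 112, 300, 224): x values are multiplied by rW = 3.125 and y values by rH = 1.75625, then truncated by int().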

From there we'll close out the frame processing loop as well as the script itself:

	# update the FPS counter
	fps.update()
	# show the output frame
	cv2.imshow("Text Detection", orig)
	key = cv2.waitKey(1) & 0xFF
	# if the `q` key was pressed, break from the loop
	if key == ord("q"):
		break
# stop the timer and display FPS information
fps.stop()
print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))
# if we are using a webcam, release the pointer
if not args.get("video", False):
	vs.stop()
# otherwise, release the file pointer
else:
	vs.release()
# close all windows
cv2.destroyAllWindows()

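We update our fps counter on each iteration of the loop so that timings can be calculated and displayed when we break out of the loop. We display the output of EAST text detection and handle keypresses. If "q" is pressed to quit, we break out of the loop and proceed to clean up and release pointers.

Video text detection results

Open up a terminal and execute the following command (this will fire up your webcam, since we aren't supplying --video via the command line arguments):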

python text_detection_video.py --east frozen_east_text_detection.pb

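Summary

In today's blog post, we learned how to use OpenCV's new EAST text detector to automatically detect the presence of text in natural scene images.

The text detector is not only accurate, but also capable of running in near real-time at approximately 13 FPS on 720p images.

To provide an implementation of OpenCV's EAST text detector, I needed to convert OpenCV's C++ example; however, I ran into a number of challenges, such as:

Not being able to use OpenCV's NMSBoxes for non-maxima suppression, and instead having to use the implementation from imutils.
Not being able to compute a true rotated bounding box, due to the lack of Python bindings for RotatedRect.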
