
How to recognize text in photographed images with OpenCV

Updated: 2024-03-30 15:50:18 Author: 碧落&凡塵

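Some projects need to recognize text in images. This article shows how to use OpenCV together with pytesseract to recognize text in a photographed image, and walks through a complete code example for readers who need a reference.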

Code example:

import cv2 as cv
import numpy as np
import pytesseract
from PIL import Image

img = cv.imread('test.jpg')
rows, cols, _ = img.shape
img = cv.resize(img, (int(cols/2), int(rows/2)))
img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
nrows, ncols = img.shape
print(cols, ncols, rows, nrows)
gray_blurred = cv.GaussianBlur(img, (5, 5), 0)

flag = 200

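# Canny edge detection only needs to run once; its input does not change in the loop
edges = cv.Canny(gray_blurred, 50, 150, apertureSize=3)
lines = []
# Lower the Hough accumulator threshold step by step until exactly four lines are found
while len(lines) != 4:
    if flag < 80:
        raise Exception('No suitable edge-detection parameters found')
    lines = cv.HoughLines(edges, 1, np.pi / 180, flag)
    if lines is None:
        lines = []
    flag -= 5
# flag was decremented once more after the successful attempt
print('Hough threshold used:', flag + 5)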
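nlines = []
# Draw on a color copy so the red lines are visible and img stays clean for the warp
vis = cv.cvtColor(img, cv.COLOR_GRAY2BGR)
# Convert each (rho, theta) line back into two far-apart points on that line;
# these lines are used below to build the perspective transform
for rho, theta in lines[:, 0]:
    a = np.cos(theta)
    b = np.sin(theta)
    x0 = a * rho
    y0 = b * rho
    x1 = int(x0 + 1000 * (-b))
    y1 = int(y0 + 1000 * (a))
    x2 = int(x0 - 1000 * (-b))
    y2 = int(y0 - 1000 * (a))
    cv.line(vis, (x1, y1), (x2, y2), (0, 0, 255), 2)
    nlines.append([(x1, y1), (x2, y2)])
cv.imshow('lines', vis)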
points = []
for i in range(len(nlines) - 1):
    for j in range(i + 1, len(nlines)):
        line = nlines[i]
        x1, y1 = line[0]
        x2, y2 = line[1]
        line1 = nlines[j]
        x3, y3 = line1[0]
        x4, y4 = line1[1]
        try:
            u = ((x4-x3)*(y1-y3) - (y4-y3)*(x1-x3)) / ((y4-y3)*(x2-x1) - (x4-x3)*(y2-y1))
        except ZeroDivisionError:
            # Parallel lines never intersect
            continue
        x = x1 + u * (x2 - x1)
        y = y1 + u * (y2 - y1)
        if x > 0 and y > 0 and x < ncols and y < nrows:
            points.append((x, y))
# Path to the Tesseract executable; adjust it to match your own installation
pytesseract.pytesseract.tesseract_cmd = r'D:\Program Files\Tesseract-OCR\tesseract.exe'
center = (int(ncols/2), int(nrows/2))
pstmap = {}
for point in points:
    x, y = point
    cx, cy = center
    if x < cx and y < cy:
        pstmap['lt'] = point
    elif x > cx and y < cy:
        pstmap['rt'] = point
    elif x > cx and y > cy:
        pstmap['rb'] = point
    else:
        pstmap['lb'] = point

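# Stop with a clear message if any corner was not found
if len(pstmap) != 4:
    raise Exception('Could not locate all four corners of the document')
pst1 = np.float32([pstmap['lt'], pstmap['rt'], pstmap['rb'], pstmap['lb']])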
pst2 = np.float32([[0, 0], [ncols, 0], [ncols, nrows], [0, nrows]])
M = cv.getPerspectiveTransform(pst1, pst2)
dst = cv.warpPerspective(img, M, (ncols, nrows))

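x1, y1 = 0, 0
def mouse_callback(event, x, y, flags, param):
    global x1, y1
    if event == cv.EVENT_LBUTTONDOWN:
        x1, y1 = x, y
    elif event == cv.EVENT_LBUTTONUP:
        # Sort the corners so that dragging in any direction yields a valid crop
        left, right = sorted((x1, x))
        top, bottom = sorted((y1, y))
        if left == right or top == bottom:
            return  # a plain click selects nothing
        wimg = dst[top:bottom, left:right]
        # Binarize the crop, then invert black and white
        _, wimg = cv.threshold(wimg, 80, 255, cv.THRESH_BINARY)
        wimg = cv.bitwise_not(wimg)
        cv.imwrite('test_dst.jpg', wimg)
        image = Image.open('test_dst.jpg')
        # Print the coordinates of the selected region
        print(f"({left}, {top}) -> ({right}, {bottom})")
        print(pytesseract.image_to_string(image, lang='chi_sim'))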
cv.namedWindow('dst')
cv.setMouseCallback("dst", mouse_callback)
cv.imshow('img', img)
cv.imshow('dst', dst)
cv.waitKey(0)
cv.destroyAllWindows()

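Method:

1. Read the image. Photos from my phone are too large, so the image is scaled down first.

2. Apply a Gaussian blur to the image to make edge detection easier.

3. Try Hough thresholds from high to low until the edges of the content are detected.

4. Convert each (rho, theta) result of the Hough transform back into image coordinates to get the points that define the four edge lines.

5. Intersect the lines pairwise to find the four corner points.

6. Apply a perspective transform.

7. Add a mouse event handler to track the region selected with the mouse.

8. Once a region is selected, crop the image and binarize it; here I also inverted black and white for the text.

9. Use pytesseract to recognize the text in the cropped image.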
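Steps 4 to 6 can also be sketched more compactly. The block below is an illustration under stated assumptions, not the code used above: `intersect_hough_lines` and `order_corners` are hypothetical helper names. The first intersects two lines directly in their (rho, theta) form by solving a 2x2 linear system; the second orders four corners by coordinate sums and differences, which assumes the page is not rotated by anything close to 45 degrees.

```python
import numpy as np


def intersect_hough_lines(line_a, line_b):
    """Return the (x, y) intersection of two (rho, theta) lines, or None if parallel."""
    rho_a, theta_a = line_a
    rho_b, theta_b = line_b
    # Every point on a Hough line satisfies x*cos(theta) + y*sin(theta) = rho
    coeffs = np.array([[np.cos(theta_a), np.sin(theta_a)],
                       [np.cos(theta_b), np.sin(theta_b)]])
    if abs(np.linalg.det(coeffs)) < 1e-6:
        return None  # (nearly) parallel lines have no usable intersection
    x, y = np.linalg.solve(coeffs, np.array([rho_a, rho_b]))
    return float(x), float(y)


def order_corners(points):
    """Order four points as left-top, right-top, right-bottom, left-bottom."""
    pts = np.asarray(points, dtype=np.float32)
    sums = pts.sum(axis=1)         # x + y: smallest at left-top, largest at right-bottom
    diffs = pts[:, 1] - pts[:, 0]  # y - x: smallest at right-top, largest at left-bottom
    return np.array([pts[np.argmin(sums)], pts[np.argmin(diffs)],
                     pts[np.argmax(sums)], pts[np.argmax(diffs)]], dtype=np.float32)


# A vertical line x = 100 and a horizontal line y = 50 in (rho, theta) form
print(intersect_hough_lines((100.0, 0.0), (50.0, np.pi / 2)))
print(order_corners([(90, 80), (10, 12), (95, 8), (5, 85)]))
```

The array returned by `order_corners` has the same corner order as `pst1` in the listing above, so it could be passed to `cv.getPerspectiveTransform` in its place.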

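Notes:

1. The selected text region affects recognition accuracy. If the selection hugs the text too tightly, recognition may fail; my guess is that this interferes with feature extraction.

2. Image size affects edge detection. Without downscaling, a poorly chosen threshold easily produces many edge lines. To choose the threshold well, it helps to understand how the Hough transform works.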
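One way to test the guess in note 1 is to give the crop some breathing room before OCR. The following is only a sketch: `expand_selection` and `pad_crop` are hypothetical helpers, the 10-pixel margin is an arbitrary starting value, and the white fill assumes the crop has a white background after binarization and inversion.

```python
import numpy as np


def expand_selection(x1, y1, x2, y2, width, height, margin=10):
    """Grow a selection by `margin` pixels per side, clamped to the image bounds."""
    left, right = sorted((x1, x2))
    top, bottom = sorted((y1, y2))
    return (max(left - margin, 0), max(top - margin, 0),
            min(right + margin, width), min(bottom + margin, height))


def pad_crop(gray, margin=10, fill=255):
    """Surround a 2-D grayscale crop with a constant border of `margin` pixels."""
    return np.pad(gray, margin, mode='constant', constant_values=fill)


# A blank white page stands in for the warped image `dst`
page = np.full((100, 200), 255, dtype=np.uint8)
left, top, right, bottom = expand_selection(150, 60, 195, 95, width=200, height=100)
crop = pad_crop(page[top:bottom, left:right])
print((left, top, right, bottom), crop.shape)
```

Expanding the selection keeps real image context around the text, while padding adds a clean border even when the selection already touches the image edge; either one may be enough on its own.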

Recognition result (accuracy is much better with binarization):

Supplement: some common OpenCV binarization code examples

1. Global threshold binarization:

import cv2
img = cv2.imread('image.jpg', 0)
_, thresh = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
cv2.imshow('image', img)
cv2.imshow('threshold', thresh)
cv2.waitKey(0)
cv2.destroyAllWindows()

2. Adaptive threshold binarization:

import cv2
img = cv2.imread('image.jpg', 0)
thresh = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 11, 2)
cv2.imshow('image', img)
cv2.imshow('adaptive threshold', thresh)
cv2.waitKey(0)
cv2.destroyAllWindows()

3. Otsu binarization:

import cv2
img = cv2.imread('image.jpg', 0)
_, thresh = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imshow('image', img)
cv2.imshow('Otsu threshold', thresh)
cv2.waitKey(0)
cv2.destroyAllWindows()
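For intuition about what the Otsu variant does, the sketch below picks a threshold with plain NumPy by maximizing the between-class variance of the histogram. `otsu_threshold` is a hypothetical helper written for illustration, not OpenCV's implementation, and its result is not guaranteed to match OpenCV's value exactly on every image.

```python
import numpy as np


def otsu_threshold(gray):
    """Return the threshold that maximizes between-class variance for a uint8 image."""
    hist = np.bincount(gray.ravel(), minlength=256)
    count_bg = np.cumsum(hist)                  # pixels with value <= t
    count_fg = hist.sum() - count_bg            # pixels with value > t
    sum_bg = np.cumsum(hist * np.arange(256))   # intensity sum of pixels <= t
    sum_total = sum_bg[-1]
    # Only thresholds that leave both classes non-empty are candidates
    valid = (count_bg > 0) & (count_fg > 0)
    mean_bg = sum_bg[valid] / count_bg[valid]
    mean_fg = (sum_total - sum_bg[valid]) / count_fg[valid]
    variance = np.zeros(256)
    # Proportional to the between-class variance w_bg * w_fg * (mean_bg - mean_fg)^2
    variance[valid] = count_bg[valid] * count_fg[valid] * (mean_bg - mean_fg) ** 2
    return int(np.argmax(variance))


# Synthetic image: dark left half (50) and bright right half (200)
gray = np.full((20, 20), 50, dtype=np.uint8)
gray[:, 10:] = 200
t = otsu_threshold(gray)
binary = np.where(gray > t, 255, 0).astype(np.uint8)
print(t, np.unique(binary))
```

In the OpenCV example above, the first value returned by `cv2.threshold` (discarded there as `_`) is the threshold that Otsu's method selected, so it can be printed to see which value was used.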

These examples can be modified and tuned as needed to suit different image-processing tasks.