Recognizing Text in Images with Python and Tesseract OCR

Updated: 2025-11-10 08:30:05  Author: 閑人編程

Optical character recognition (OCR) converts text in images into editable text. This article explains in detail how to use Python together with Tesseract OCR to recognize text in images.

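1. Introduction

Optical Character Recognition (OCR) is a technology that converts text in images into editable text. As digitization advances, OCR plays an increasingly important role in document digitization, license plate recognition, business card management, and automated data entry.

Among the many OCR tools available, Tesseract OCR is widely used because it is open source, free, and reasonably accurate. It was originally developed at Hewlett-Packard Laboratories and was later developed under Google's sponsorship. Tesseract supports more than 100 languages and can be trained to recognize specific fonts and character sets.

This article describes in detail how to use Python with Tesseract OCR to recognize text in images, covering environment setup, basic usage, advanced features, and practical application examples.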

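2. Overview of Tesseract OCR

2.1 History of Tesseract OCR

Tesseract OCR was originally developed at Hewlett-Packard Laboratories between 1985 and 1994. HP released it as open source in 2005, and Google began sponsoring its development in 2006. After many years of development, Tesseract has become one of the most accurate open-source OCR engines.

2.2 Features of Tesseract OCR

Multilingual support: recognizes text in more than 100 languages
Open source and free: released under the Apache License 2.0
Cross-platform: runs on Windows, Linux, macOS, and other operating systems
Trainable: accepts user-supplied training data to improve accuracy in specific scenarios
Multiple output formats: supports plain text, hOCR, PDF, and other output formats

2.3 How Tesseract OCR works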

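In broad terms, Tesseract's recognition process runs through the following steps: image binarization, page layout analysis, text line and word segmentation, character recognition (an LSTM neural network since version 4), and output of the result in the requested format.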

3. Environment setup and installation

3.1 Installing the Tesseract OCR engine

Windows

Download the Tesseract installer:

Visit the GitHub releases page
Download a suitable Windows installer (for example: tesseract-ocr-w64-setup-5.3.3.20231005.exe)

Run the installer and tick "Additional language data" to install multilingual support

Add the Tesseract installation path (for example: C:\Program Files\Tesseract-OCR\) to the system PATH environment variable

macOS

# Install with Homebrew
brew install tesseract

# Install the language packs
brew install tesseract-lang

Linux (Ubuntu/Debian)

# Update the package list
sudo apt update

# Install Tesseract OCR
sudo apt install tesseract-ocr

# Install the Chinese language packs (Simplified and Traditional)
sudo apt install tesseract-ocr-chi-sim tesseract-ocr-chi-tra

3.2 Installing the Python libraries

# Install Pillow for image handling
pip install Pillow

# Install pytesseract to call Tesseract OCR
pip install pytesseract

# Install OpenCV for advanced image processing (optional)
pip install opencv-python

# Install numpy (OpenCV depends on it)
pip install numpy

3.3 Verifying the installation

After installation, run the following command to check that Tesseract is installed correctly:

tesseract --version
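
The same check can be run from Python, which is convenient at the top of a script. The following is a minimal sketch using only the standard library. The helper name tesseract_version is made up for this illustration, and the Windows path in the comment is the example installation path from section 3.1.

```python
import shutil
import subprocess


def tesseract_version(executable="tesseract"):
    """Return the first line of `<executable> --version`, or None if unavailable."""
    path = shutil.which(executable)
    if path is None:
        return None  # the executable is not on PATH
    try:
        completed = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    # Depending on the build, the banner may arrive on stdout or stderr.
    output = (completed.stdout or completed.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    version = tesseract_version()
    if version is None:
        print("tesseract was not found on PATH")
        # On Windows, pytesseract can instead be pointed at the executable:
        # pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
    else:
        print(version)
```

If the function returns None even though Tesseract is installed, the PATH step from section 3.1 is the first thing to re-check.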

4. Basic usage: simple text recognition

4.1 A basic OCR function

Let's start with the simplest OCR task and write a basic function that recognizes the text in an image.

import pytesseract
from PIL import Image
import os

def basic_ocr(image_path, language='eng'):
    """
    Basic OCR function: recognize the text in an image.

    Args:
        image_path (str): path to the image file
        language (str): recognition language, English ('eng') by default

    Returns:
        str: the recognized text
    """
    try:
        # Check that the image file exists
        if not os.path.exists(image_path):
            raise FileNotFoundError(f"Image file not found: {image_path}")

        # Open the image with PIL
        image = Image.open(image_path)

        # Run OCR with Tesseract
        text = pytesseract.image_to_string(image, lang=language)

        return text

    except Exception as e:
        print(f"Error during OCR: {str(e)}")
        return ""

# Usage example
if __name__ == "__main__":
    # Replace with the path to your image
    image_path = "sample_text.png"
    result = basic_ocr(image_path)
    print("Recognition result:")
    print(result)

4.2 Handling text in different languages

Tesseract supports many languages. Pass a language parameter to recognize text in a particular language.

def multi_language_ocr(image_path, languages):
    """
    Run OCR once per language.

    Args:
        image_path (str): path to the image file
        languages (list): language codes, e.g. ['eng', 'chi_sim']

    Returns:
        dict: recognition result for each language
    """
    results = {}

    for lang in languages:
        try:
            image = Image.open(image_path)
            text = pytesseract.image_to_string(image, lang=lang)
            results[lang] = text
        except Exception as e:
            print(f"Recognition failed for language {lang}: {str(e)}")
            results[lang] = ""

    return results

# Usage example
languages = ['eng', 'chi_sim', 'fra']  # English, Simplified Chinese, French
results = multi_language_ocr("multilingual_text.png", languages)

for lang, text in results.items():
    print(f"{lang} result:")
    print(text)
    print("-" * 50)
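
The loop above runs one pass per language. Tesseract also accepts several languages in a single pass when the codes are joined with '+', which is the 'eng+chi_sim' form used in later sections. The sketch below builds such a string. build_lang is a hypothetical helper name, not part of pytesseract.

```python
def build_lang(languages):
    """Join Tesseract language codes such as ['eng', 'chi_sim'] into 'eng+chi_sim'."""
    cleaned = []
    for code in languages:
        code = code.strip()
        if not code:
            continue  # skip empty entries
        if "+" in code or " " in code:
            raise ValueError(f"invalid language code: {code!r}")
        if code not in cleaned:  # drop duplicates, keep first-seen order
            cleaned.append(code)
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)
```

The result can be passed straight to the lang argument, for example pytesseract.image_to_string(image, lang=build_lang(['eng', 'chi_sim'])). Each code still needs its language pack installed.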

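5. Image preprocessing techniques

OCR accuracy depends heavily on the quality of the input image. This section introduces several common image preprocessing techniques.

5.1 Why preprocessing matters

An unprocessed image may suffer from the following problems:

Noise and artifacts
Uneven lighting
Skewed text
Low contrast
Complex backgrounds

All of these reduce OCR accuracy. Appropriate preprocessing can improve recognition results considerably.

5.2 Common preprocessing techniques

5.2.1 Grayscale conversion and binarization

Converting a color image to grayscale and then binarizing it simplifies the later processing steps.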

import cv2
import numpy as np
import pytesseract
from PIL import Image

def preprocess_image(image_path, output_path=None):
    """
    Image preprocessing: grayscale conversion, denoising, binarization.

    Args:
        image_path (str): path to the input image
        output_path (str): where to save the preprocessed image (optional)

    Returns:
        numpy.ndarray: the preprocessed image array
    """
    # Read the image (cv2.imread returns None if the file cannot be read)
    image = cv2.imread(image_path)
    if image is None:
        raise ValueError(f"Cannot read image: {image_path}")

    # Convert to grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Denoise with a Gaussian blur
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)

    # Binarize with Otsu's method (the threshold is chosen automatically)
    _, binary = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Optional: save the preprocessed image
    if output_path:
        cv2.imwrite(output_path, binary)

    return binary

# Run OCR on the preprocessed image
def ocr_with_preprocessing(image_path):
    """
    Run OCR on a preprocessed image.

    Args:
        image_path (str): path to the image

    Returns:
        str: the recognition result
    """
    # Preprocess the image
    processed_image = preprocess_image(image_path)

    # Convert the numpy array to a PIL image
    pil_image = Image.fromarray(processed_image)

    # Run OCR
    text = pytesseract.image_to_string(pil_image, lang='eng')

    return text
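
To make the automatic threshold less of a black box, here is a pure-Python sketch of what Otsu's method computes: the gray level that maximizes the between-class variance of the two pixel groups it separates. It is an illustration only. otsu_threshold is a hypothetical name, the input is a flat sequence of 0-255 values, and cv2.threshold should still be used on real images.

```python
def otsu_threshold(pixels):
    """Return the gray level t that best splits pixels into <= t and > t."""
    hist = [0] * 256
    for p in pixels:
        if not 0 <= p <= 255:
            raise ValueError(f"pixel value out of range: {p}")
        hist[p] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0      # number of pixels at or below the candidate threshold
    sum0 = 0    # sum of their gray levels
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue            # no pixels in the lower class yet
        w1 = total - w0
        if w1 == 0:
            break               # upper class is empty from here on
        sum0 += t * hist[t]
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2
        score = w0 * w1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t
```

For a cleanly bimodal image, such as dark text on a light page, any level between the two peaks separates them equally well, which is why Otsu's method suits scanned documents better than photos with gradual lighting changes.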

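5.2.2 Noise removal

Morphological operations can remove small noise specks.

def remove_noise(image):
    """
    Remove noise with morphological operations.

    Args:
        image (numpy.ndarray): input image

    Returns:
        numpy.ndarray: the denoised image
    """
    # Define the kernel (structuring element).
    # Note: a 1x1 kernel leaves the image unchanged. Use a larger kernel
    # such as (2, 2) for actual denoising, at the risk of erasing thin strokes.
    kernel = np.ones((1, 1), np.uint8)

    # Opening: erosion followed by dilation, removes small specks
    image = cv2.morphologyEx(image, cv2.MORPH_OPEN, kernel)

    # Closing: dilation followed by erosion, fills small holes
    image = cv2.morphologyEx(image, cv2.MORPH_CLOSE, kernel)

    return image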

5.2.3 Skew correction

Detect the skew angle of the text and correct it.

def correct_skew(image):
    """
    Detect and correct image skew.

    Args:
        image (numpy.ndarray): input image (8-bit, single channel)

    Returns:
        numpy.ndarray: the corrected image
        float: the skew angle in degrees
    """
    # Edge detection
    edges = cv2.Canny(image, 50, 150, apertureSize=3)

    # Hough line detection
    lines = cv2.HoughLines(edges, 1, np.pi/180, threshold=100)

    if lines is not None:
        angles = []
        for rho, theta in lines[:, 0]:
            # theta is the angle of the line's normal; a horizontal line has theta = 90 degrees
            angle = theta * 180 / np.pi - 90
            angles.append(angle)

        # Use the median angle, which is less sensitive to outliers than the mean
        median_angle = np.median(angles)

        # Rotate the image to correct the skew
        (h, w) = image.shape[:2]
        center = (w // 2, h // 2)
        M = cv2.getRotationMatrix2D(center, median_angle, 1.0)
        corrected = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_CUBIC, 
                                  borderMode=cv2.BORDER_REPLICATE)

        return corrected, median_angle

    return image, 0
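
One caveat with this approach: HoughLines also returns vertical strokes and page borders, whose theta values map to angles near -90 or +90 degrees and can drag the median away from the real text skew. A possible refinement, sketched here in pure Python, is to keep only near-horizontal lines before taking the median. estimate_skew and its max_abs_deg parameter are assumptions made for this illustration.

```python
import math
import statistics


def estimate_skew(thetas, max_abs_deg=45.0):
    """Estimate skew in degrees from Hough-line theta values given in radians.

    theta is the angle of the line's normal, so a horizontal text line has
    theta = pi/2 and maps to 0 degrees of skew.
    """
    angles = [math.degrees(theta) - 90.0 for theta in thetas]
    # Keep only near-horizontal lines; vertical strokes land near -90 or +90.
    near_horizontal = [a for a in angles if abs(a) <= max_abs_deg]
    if not near_horizontal:
        return 0.0
    return statistics.median(near_horizontal)
```

With the lines array from cv2.HoughLines, the input would be the second column, for example estimate_skew(lines[:, 0, 1]).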

5.2.4 Contrast enhancement

Increasing image contrast makes the text stand out more clearly.

def enhance_contrast(image):
    """
    Enhance image contrast.

    Args:
        image (numpy.ndarray): input image (grayscale)

    Returns:
        numpy.ndarray: the contrast-enhanced image
    """
    # Use CLAHE (Contrast Limited Adaptive Histogram Equalization)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(image)

    return enhanced

5.3 The complete preprocessing pipeline

def complete_preprocessing(image_path, output_path=None):
    """
    Complete image preprocessing pipeline.

    Args:
        image_path (str): path to the input image
        output_path (str): where to save the preprocessed image (optional)

    Returns:
        numpy.ndarray: the preprocessed image
    """
    # Read the image (cv2.imread returns None if the file cannot be read)
    image = cv2.imread(image_path)
    if image is None:
        raise ValueError(f"Cannot read image: {image_path}")

    # Convert to grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Denoise
    denoised = cv2.medianBlur(gray, 3)

    # Enhance contrast
    enhanced = enhance_contrast(denoised)

    # Binarize
    _, binary = cv2.threshold(enhanced, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Remove remaining noise
    cleaned = remove_noise(binary)

    # Correct skew
    corrected, angle = correct_skew(cleaned)

    print(f"Detected skew angle: {angle:.2f} degrees")

    # Optional: save the preprocessed image
    if output_path:
        cv2.imwrite(output_path, corrected)

    return corrected

6. Advanced features and configuration

6.1 Tesseract configuration parameters

Tesseract offers many configuration options, which are passed through the config parameter.

def advanced_ocr(image_path, config_options=None):
    """
    OCR with advanced configuration.

    Args:
        image_path (str): path to the image
        config_options (str): Tesseract configuration parameters

    Returns:
        dict: dictionary containing the text and the detailed data
    """
    if config_options is None:
        config_options = '--oem 3 --psm 6'

    image = Image.open(image_path)

    # Get the recognition result with detailed per-element data
    data = pytesseract.image_to_data(image, config=config_options, output_type=pytesseract.Output.DICT)

    # Extract the recognized text
    text = pytesseract.image_to_string(image, config=config_options)

    return {
        'text': text,
        'data': data
    }

# Common configuration parameters
"""
--psm N: page segmentation mode
    0 = orientation and script detection (OSD) only
    1 = automatic page segmentation with OSD
    3 = fully automatic page segmentation, but no OSD (default)
    6 = assume a single uniform block of text
    7 = treat the image as a single text line
    8 = treat the image as a single word
    13 = raw line: treat the image as a single text line

--oem N: OCR engine mode
    0 = legacy engine only
    1 = neural network LSTM engine only
    2 = legacy + LSTM engines
    3 = default, based on what is available
"""

6.2 Getting bounding box information

Retrieve the position of each recognized text element. The function below keeps the boxes whose confidence exceeds a threshold.

def get_bounding_boxes(image_path, output_image_path=None):
    """
    Get text bounding boxes and draw them on the image.

    Args:
        image_path (str): path to the input image
        output_image_path (str): where to save the annotated image (optional)

    Returns:
        list: list of bounding box records
    """
    # Read the image (cv2.imread returns None if the file cannot be read)
    image = cv2.imread(image_path)
    if image is None:
        raise ValueError(f"Cannot read image: {image_path}")
    rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    pil_image = Image.fromarray(rgb_image)

    # Get the detailed OCR data
    data = pytesseract.image_to_data(pil_image, output_type=pytesseract.Output.DICT)

    boxes = []

    # Iterate over all detected elements
    n_boxes = len(data['level'])
    for i in range(n_boxes):
        # Keep only results with reasonably high confidence
        # (layout-only entries such as blocks and lines have conf = -1)
        if int(data['conf'][i]) > 30:
            (x, y, w, h) = (data['left'][i], data['top'][i], data['width'][i], data['height'][i])
            text = data['text'][i]

            boxes.append({
                'text': text,
                'position': (x, y, w, h),
                'confidence': int(data['conf'][i])
            })

            # Draw the bounding box on the image
            if output_image_path:
                cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
                cv2.putText(image, text, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 
                           0.5, (0, 255, 0), 2)

    # Save the annotated image
    if output_image_path:
        cv2.imwrite(output_image_path, image)

    return boxes
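
The dictionary returned by image_to_data also carries block_num, par_num, and line_num for every entry, so word boxes can be merged into line boxes without a second OCR pass. The sketch below does this on the plain dictionary and needs no OCR call itself. words_to_lines is a hypothetical helper, and the default min_conf of 30 simply mirrors the threshold used above.

```python
def words_to_lines(data, min_conf=30):
    """Merge word-level entries of an image_to_data dict into text lines.

    Returns a list of {'text': str, 'box': (x, y, w, h)} in reading order.
    """
    lines = {}  # (page, block, paragraph, line) -> accumulated words and extent
    for i, raw in enumerate(data["text"]):
        word = str(raw).strip()
        if not word or float(data["conf"][i]) < min_conf:
            continue  # layout-only entries have empty text and conf = -1
        key = (data["page_num"][i], data["block_num"][i],
               data["par_num"][i], data["line_num"][i])
        left, top = data["left"][i], data["top"][i]
        right, bottom = left + data["width"][i], top + data["height"][i]
        entry = lines.setdefault(key, {"words": [], "ext": [left, top, right, bottom]})
        entry["words"].append(word)
        ext = entry["ext"]
        ext[0], ext[1] = min(ext[0], left), min(ext[1], top)
        ext[2], ext[3] = max(ext[2], right), max(ext[3], bottom)

    return [
        {
            "text": " ".join(e["words"]),
            "box": (e["ext"][0], e["ext"][1],
                    e["ext"][2] - e["ext"][0], e["ext"][3] - e["ext"][1]),
        }
        for e in lines.values()
    ]
```

Joining words with a single space suits space-delimited scripts. For Chinese text the separator would more likely be an empty string.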

6.3 Batch processing multiple images

import glob

def batch_ocr(image_folder, output_file="ocr_results.txt"):
    """
    Process all images in a folder.

    Args:
        image_folder (str): path to the image folder
        output_file (str): path of the results file
    """
    # Supported image formats
    image_extensions = ['*.png', '*.jpg', '*.jpeg', '*.bmp', '*.tiff']

    image_paths = []
    for extension in image_extensions:
        image_paths.extend(glob.glob(os.path.join(image_folder, extension)))

    results = []

    for image_path in image_paths:
        print(f"Processing image: {os.path.basename(image_path)}")

        try:
            # Preprocess the image
            processed_image = complete_preprocessing(image_path)
            pil_image = Image.fromarray(processed_image)

            # Run OCR
            text = pytesseract.image_to_string(pil_image, lang='eng+chi_sim')

            results.append({
                'file': os.path.basename(image_path),
                'text': text
            })

        except Exception as e:
            print(f"Error while processing {image_path}: {str(e)}")
            results.append({
                'file': os.path.basename(image_path),
                'text': f"Recognition failed: {str(e)}"
            })

    # Write the results to a file
    with open(output_file, 'w', encoding='utf-8') as f:
        for result in results:
            f.write(f"File: {result['file']}\n")
            f.write(f"Result:\n{result['text']}\n")
            f.write("=" * 50 + "\n")

    print(f"Batch processing finished, results saved to: {output_file}")
    return results
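
On case-sensitive file systems, glob patterns such as '*.png' miss files named SCAN.PNG, and the processing order follows whatever the file system returns. A small alternative for collecting the input files, sketched with pathlib, is shown below. collect_images and IMAGE_SUFFIXES are names chosen for this example, and the added '.tif' suffix is an assumption about what users may have on disk.

```python
from pathlib import Path

# Lower-case suffixes accepted as images (".tif" added as a common variant).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def collect_images(folder):
    """Return the image files directly inside `folder`, sorted by path."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )
```

The returned Path objects can be converted with str(path) before being handed to functions that expect string paths, such as cv2.imread.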

7. Performance tuning and improving accuracy

7.1 Choosing a suitable page segmentation mode (PSM)

Different page layouts call for different segmentation modes:

def optimize_psm(image_path):
    """
    Try different page segmentation modes so the results can be compared.

    Args:
        image_path (str): path to the image

    Returns:
        dict: recognition result for each PSM
    """
    image = Image.open(image_path)

    # The PSM values to try and their descriptions
    psm_modes = {
        0: "orientation and script detection (OSD) only",
        1: "automatic page segmentation with OSD",
        3: "fully automatic page segmentation, but no OSD (default)",
        6: "single uniform block of text",
        7: "single text line",
        8: "single word",
        13: "raw line"
    }

    results = {}

    for psm, description in psm_modes.items():
        try:
            config = f'--psm {psm}'
            text = pytesseract.image_to_string(image, config=config)
            results[psm] = {
                'description': description,
                'text': text
            }
        except Exception as e:
            results[psm] = {
                'description': description,
                'text': f"Recognition failed: {str(e)}"
            }

    return results
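
The function above returns every result but leaves the choice of the best one to the reader. One heuristic, and only a heuristic, is to compare the mean word confidence that image_to_data reports for each mode. The sketch below works on such dictionaries. mean_confidence and pick_best_psm are hypothetical names, and a high mean confidence does not guarantee that the text is correct.

```python
def mean_confidence(data):
    """Average confidence over entries that carry text (conf = -1 rows are skipped)."""
    confs = [
        float(conf)
        for conf, text in zip(data["conf"], data["text"])
        if str(text).strip() and float(conf) >= 0
    ]
    return sum(confs) / len(confs) if confs else 0.0


def pick_best_psm(data_by_psm):
    """Return (best_psm, scores) for a {psm: image_to_data dict} mapping."""
    if not data_by_psm:
        raise ValueError("no results to compare")
    scores = {psm: mean_confidence(data) for psm, data in data_by_psm.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

Each dictionary would come from a call like pytesseract.image_to_data(image, config=f'--psm {psm}', output_type=pytesseract.Output.DICT), the same call used in section 6.1.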

7.2 Language model tuning

Using a suitable language model and dictionary can improve recognition accuracy.

def optimize_language_model(image_path, text_type="general"):
    """
    Choose a configuration according to the type of text.

    Args:
        image_path (str): path to the image
        text_type (str): text type: "general", "document", "single_line",
            "single_word", or "code" (unknown values fall back to "general")

    Returns:
        str: the recognition result with the chosen configuration
    """
    image = Image.open(image_path)

    # Choose the configuration for the text type
    configs = {
        "general": "--oem 3 --psm 6",
        "document": "--oem 3 --psm 1",
        "single_line": "--oem 3 --psm 7",
        "single_word": "--oem 3 --psm 8",
        "code": "--oem 3 --psm 6 -c tessedit_char_whitelist=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz{}[]();:.,<>/*-+="
    }

    config = configs.get(text_type, configs["general"])

    text = pytesseract.image_to_string(image, config=config)

    return text
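
Hand-written config strings are easy to mistype. The following sketch assembles them from validated parts, using the same --oem, --psm, and tessedit_char_whitelist options that appear above. build_config is a hypothetical helper. The accepted ranges, 0 to 13 for PSM and 0 to 3 for OEM, are those of recent Tesseract releases and may differ on older builds.

```python
def build_config(psm=6, oem=3, whitelist=None):
    """Build a Tesseract config string such as '--oem 3 --psm 6'."""
    if psm not in range(0, 14):
        raise ValueError(f"psm must be between 0 and 13, got {psm}")
    if oem not in range(0, 4):
        raise ValueError(f"oem must be between 0 and 3, got {oem}")

    parts = [f"--oem {oem}", f"--psm {psm}"]
    if whitelist:
        # The config string is later split into command-line arguments, so
        # whitespace, quotes, or backslashes would break or alter the option.
        if any(ch.isspace() or ch in "\"'\\" for ch in whitelist):
            raise ValueError("whitelist must not contain whitespace, quotes, or backslashes")
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)
```

For example, build_config(psm=7, whitelist="0123456789") gives a configuration for a single line of digits, which can be passed as the config argument of pytesseract.image_to_string.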

7.3 Custom dictionaries

For domain-specific OCR applications, a custom dictionary can improve the recognition of technical terms.

def create_custom_dictionary(word_list, dictionary_path="custom_words.txt"):
    """
    Create a custom dictionary file (one word per line).

    Args:
        word_list (list): list of custom words
        dictionary_path (str): where to save the dictionary file
    """
    with open(dictionary_path, 'w', encoding='utf-8') as f:
        for word in word_list:
            f.write(f"{word}\n")

    print(f"Custom dictionary created: {dictionary_path}")

def ocr_with_custom_dictionary(image_path, dictionary_path, language='eng'):
    """
    Run OCR with a custom dictionary.

    Args:
        image_path (str): path to the image
        dictionary_path (str): path to the custom dictionary
        language (str): base language

    Returns:
        str: the recognition result
    """
    image = Image.open(image_path)

    # Configuration that loads the custom dictionary
    config = f'--oem 3 --psm 6 --user-words {dictionary_path}'

    text = pytesseract.image_to_string(image, lang=language, config=config)

    return text

8. Practical application examples

8.1 Document digitization

Convert scanned document images into editable text.

class DocumentOCR:
    """
    Document OCR processor.
    """

    def __init__(self, language='eng+chi_sim'):
        self.language = language

    def process_document(self, image_path, output_text_path=None, output_pdf_path=None):
        """
        Process one document image.

        Args:
            image_path (str): path to the document image
            output_text_path (str): text output path (optional)
            output_pdf_path (str): PDF output path (optional)

        Returns:
            dict: the processing result
        """
        try:
            # Preprocess the image
            processed_image = complete_preprocessing(image_path)
            pil_image = Image.fromarray(processed_image)

            # Get the text
            text = pytesseract.image_to_string(pil_image, lang=self.language)

            # Generate the searchable PDF as bytes
            pdf_data = pytesseract.image_to_pdf_or_hocr(pil_image, extension='pdf', lang=self.language)

            result = {
                'success': True,
                'text': text,
                'pdf_data': pdf_data
            }

            # Save the text result
            if output_text_path:
                with open(output_text_path, 'w', encoding='utf-8') as f:
                    f.write(text)
                print(f"Text saved to: {output_text_path}")

            # Save the PDF result
            if output_pdf_path:
                with open(output_pdf_path, 'wb') as f:
                    f.write(pdf_data)
                print(f"Searchable PDF saved to: {output_pdf_path}")

            return result

        except Exception as e:
            print(f"Document processing failed: {str(e)}")
            return {
                'success': False,
                'error': str(e)
            }

    def batch_process_documents(self, input_folder, output_folder):
        """
        Process a folder of documents.

        Args:
            input_folder (str): input folder path
            output_folder (str): output folder path
        """
        # Create the output folder
        os.makedirs(output_folder, exist_ok=True)

        # Collect all image files
        image_extensions = ['*.png', '*.jpg', '*.jpeg', '*.bmp', '*.tiff']
        image_paths = []
        for extension in image_extensions:
            image_paths.extend(glob.glob(os.path.join(input_folder, extension)))

        results = []

        for image_path in image_paths:
            filename = os.path.splitext(os.path.basename(image_path))[0]

            output_text_path = os.path.join(output_folder, f"{filename}.txt")
            output_pdf_path = os.path.join(output_folder, f"{filename}.pdf")

            print(f"Processing document: {os.path.basename(image_path)}")

            result = self.process_document(
                image_path, 
                output_text_path, 
                output_pdf_path
            )

            results.append({
                'file': os.path.basename(image_path),
                'result': result
            })

        return results

# Usage example
doc_ocr = DocumentOCR(language='eng+chi_sim')
results = doc_ocr.batch_process_documents("input_docs", "output_docs")

8.2 Business card information extraction

Extract contact information from a business card image.

class BusinessCardOCR:
    """
    Business card OCR processor.
    """

    def __init__(self):
        self.contact_patterns = {
            'phone': r'(\+?[0-9]{1,3}[-.\s]?)?(\(?[0-9]{1,4}\)?[-.\s]?)?[0-9]{1,4}[-.\s]?[0-9]{1,9}',
            # [A-Za-z] rather than [A-Z|a-z]: inside a character class "|" is a literal pipe
            'email': r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b',
            'website': r'((https?://)?(www\.)?[a-zA-Z0-9-]+\.[a-zA-Z]{2,}(/\S*)?)'
        }

    def extract_contact_info(self, image_path):
        """
        Extract contact information from a business card image.

        Args:
            image_path (str): path to the business card image

        Returns:
            dict: the extracted contact information
        """
        import re

        # Run OCR
        text = basic_ocr(image_path)

        contact_info = {
            'raw_text': text,
            'name': '',
            'company': '',
            'phone': [],
            'email': [],
            'website': []
        }

        # Extract phone numbers. finditer with group(0) keeps the whole match;
        # findall would return only the two optional prefix groups.
        contact_info['phone'] = [
            m.group(0).strip()
            for m in re.finditer(self.contact_patterns['phone'], text)
        ]

        # Extract e-mail addresses
        contact_info['email'] = re.findall(self.contact_patterns['email'], text)

        # Extract websites (the outermost group holds the full match)
        website_matches = re.findall(self.contact_patterns['website'], text)
        contact_info['website'] = [match[0] for match in website_matches if match[0]]

        # Naive name and company extraction (real applications may need more sophisticated NLP)
        lines = text.split('\n')
        non_empty_lines = [line.strip() for line in lines if line.strip()]

        if len(non_empty_lines) >= 2:
            contact_info['name'] = non_empty_lines[0]
            contact_info['company'] = non_empty_lines[1]

        return contact_info

# Usage example
card_ocr = BusinessCardOCR()
contact_info = card_ocr.extract_contact_info("business_card.jpg")
print("Extracted contact information:")
for key, value in contact_info.items():
    print(f"{key}: {value}")
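
The phone pattern in the class is very permissive and will also match short numbers such as a suite or postal code. A stricter variant, sketched below under the assumption that a usable phone number has at least seven digits, filters candidates by digit count. PHONE_RE, extract_phones, and min_digits are illustrative names, and the sample text is made up.

```python
import re

# A digit (optionally preceded by "+"), then digits and common separators,
# ending on a digit. Newlines are excluded so a match never spans two lines.
PHONE_RE = re.compile(r"\+?\d[\d ().-]{5,}\d")


def extract_phones(text, min_digits=7):
    """Return phone-like substrings of `text` containing at least `min_digits` digits."""
    phones = []
    for match in PHONE_RE.finditer(text):
        candidate = match.group(0)
        if sum(ch.isdigit() for ch in candidate) >= min_digits:
            phones.append(candidate)
    return phones


if __name__ == "__main__":
    sample = "Jane Doe\nAcme Ltd\nTel: +1 (555) 123-4567\njane@acme.example\nSuite 42"
    print(extract_phones(sample))
```

The digit threshold is a trade-off: raising it drops more false positives but can also reject short local numbers.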

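8.3 Table data extraction

Extract structured data from a table in an image.

def extract_table_data(image_path):
    """
    Extract data from a table in an image.

    Args:
        image_path (str): path to the image containing the table

    Returns:
        list: the table data (list of rows, each a list of cells)
    """
    # Preprocess the image, isolating the vertical and horizontal lines
    image = cv2.imread(image_path)
    if image is None:
        raise ValueError(f"Cannot read image: {image_path}")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Binarize (inverted: text and lines become white on black)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    # Detect horizontal lines
    horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (25, 1))
    horizontal_lines = cv2.morphologyEx(binary, cv2.MORPH_OPEN, horizontal_kernel)

    # Detect vertical lines
    vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 25))
    vertical_lines = cv2.morphologyEx(binary, cv2.MORPH_OPEN, vertical_kernel)

    # Combine the lines into a mask of the table grid
    table_mask = cv2.bitwise_or(horizontal_lines, vertical_lines)

    # Remove the grid lines so that only the text remains, then invert back
    # to dark text on a light background. (Running OCR on the line mask
    # itself would find no text.)
    text_only = cv2.subtract(binary, table_mask)
    pil_image = Image.fromarray(cv2.bitwise_not(text_only))

    # preserve_interword_spaces keeps the wide gaps between columns
    text = pytesseract.image_to_string(
        pil_image, config='--psm 6 -c preserve_interword_spaces=1')

    # Naive table parsing (real applications may need more elaborate logic)
    table_data = []
    lines = text.split('\n')

    for line in lines:
        if line.strip():
            # Assume table columns are separated by two or more spaces
            row = [cell.strip() for cell in line.split('  ') if cell.strip()]
            if row:
                table_data.append(row)

    return table_data

# Usage example
table_data = extract_table_data("table_image.png")
print("Extracted table data:")
for row in table_data:
    print(row)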
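
Splitting on runs of spaces depends on how the OCR engine renders column gaps. An alternative that relies on geometry instead is to group word boxes by their vertical position and sort each group from left to right. The sketch below assumes boxes in the shape returned by get_bounding_boxes in section 6.2. rows_from_boxes and y_tolerance are hypothetical names, and the function groups words rather than whole cells, so a multi-word cell comes back as several entries.

```python
def rows_from_boxes(boxes, y_tolerance=10):
    """Group word boxes into rows by vertical centre, then order each row by x.

    boxes: iterable of dicts with 'text' and 'position' = (x, y, w, h).
    """
    rows = []  # each row: {"y": centre of its first word, "cells": [(x, text), ...]}
    for box in sorted(boxes, key=lambda b: b["position"][1]):
        x, y, w, h = box["position"]
        centre = y + h / 2
        if rows and abs(centre - rows[-1]["y"]) <= y_tolerance:
            rows[-1]["cells"].append((x, box["text"]))
        else:
            rows.append({"y": centre, "cells": [(x, box["text"])]})
    # Sort the words of each row from left to right.
    return [[text for _, text in sorted(row["cells"])] for row in rows]
```

The tolerance should stay well below the row height of the table. Otherwise adjacent rows merge.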

9. Complete code

The following OCR utility class brings together the features introduced above:

import pytesseract
import cv2
import numpy as np
from PIL import Image
import os
import glob
import re

class AdvancedOCR:
    """
    Advanced OCR utility class.
    """

    def __init__(self, default_language='eng'):
        self.default_language = default_language

    def preprocess_image(self, image_path, output_path=None):
        """
        Preprocess an image.

        Args:
            image_path (str): path to the input image
            output_path (str): where to save the preprocessed image (optional)

        Returns:
            numpy.ndarray: the preprocessed image
        """
        # Read the image
        image = cv2.imread(image_path)
        if image is None:
            raise ValueError(f"Cannot read image: {image_path}")

        # Convert to grayscale
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

        # Denoise
        denoised = cv2.medianBlur(gray, 3)

        # Enhance contrast
        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
        enhanced = clahe.apply(denoised)

        # Binarize
        _, binary = cv2.threshold(enhanced, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

        # Morphological denoising.
        # Note: a 1x1 kernel leaves the image unchanged. Use a larger kernel
        # such as (2, 2) for actual denoising, at the risk of erasing thin strokes.
        kernel = np.ones((1, 1), np.uint8)
        cleaned = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
        cleaned = cv2.morphologyEx(cleaned, cv2.MORPH_CLOSE, kernel)

        # Optional: save the preprocessed image
        if output_path:
            cv2.imwrite(output_path, cleaned)

        return cleaned

    def correct_skew(self, image):
        """
        Correct image skew.

        Args:
            image (numpy.ndarray): input image

        Returns:
            numpy.ndarray: the corrected image
            float: the skew angle in degrees
        """
        # Edge detection
        edges = cv2.Canny(image, 50, 150, apertureSize=3)

        # Hough line detection
        lines = cv2.HoughLines(edges, 1, np.pi/180, threshold=100)

        if lines is not None:
            angles = []
            for rho, theta in lines[:, 0]:
                angle = theta * 180 / np.pi - 90
                angles.append(angle)

            # Take the median angle
            median_angle = np.median(angles)

            # Rotate the image to correct the skew
            (h, w) = image.shape[:2]
            center = (w // 2, h // 2)
            M = cv2.getRotationMatrix2D(center, median_angle, 1.0)
            corrected = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_CUBIC, 
                                      borderMode=cv2.BORDER_REPLICATE)

            return corrected, median_angle

        return image, 0

    def extract_text(self, image_path, language=None, psm=6, preprocess=True):
        """
        Extract the text from an image.

        Args:
            image_path (str): path to the image
            language (str): recognition language
            psm (int): page segmentation mode
            preprocess (bool): whether to preprocess the image

        Returns:
            str: the recognized text
        """
        if language is None:
            language = self.default_language

        try:
            if preprocess:
                # Preprocess the image
                processed_image = self.preprocess_image(image_path)
                pil_image = Image.fromarray(processed_image)
            else:
                # Use the original image directly
                pil_image = Image.open(image_path)

            # Configuration parameters
            config = f'--oem 3 --psm {psm}'

            # Run OCR
            text = pytesseract.image_to_string(pil_image, lang=language, config=config)

            return text.strip()

        except Exception as e:
            print(f"Text extraction failed: {str(e)}")
            return ""

    def extract_text_with_boxes(self, image_path, language=None, output_image_path=None, confidence_threshold=30):
        """
        Extract text together with bounding box information.

        Args:
            image_path (str): path to the image
            language (str): recognition language
            output_image_path (str): where to save the annotated image (optional)
            confidence_threshold (int): confidence threshold

        Returns:
            dict: dictionary containing the text and bounding box information
        """
        if language is None:
            language = self.default_language

        # Read the image
        image = cv2.imread(image_path)
        if image is None:
            raise ValueError(f"Cannot read image: {image_path}")
        rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        pil_image = Image.fromarray(rgb_image)

        # Get the detailed OCR data
        data = pytesseract.image_to_data(pil_image, lang=language, output_type=pytesseract.Output.DICT)

        # Extract text and bounding boxes
        text_boxes = []
        n_boxes = len(data['level'])

        for i in range(n_boxes):
            if int(data['conf'][i]) > confidence_threshold:
                (x, y, w, h) = (data['left'][i], data['top'][i], data['width'][i], data['height'][i])
                text = data['text'][i].strip()

                if text:  # keep non-empty text only
                    text_boxes.append({
                        'text': text,
                        'position': (x, y, w, h),
                        'confidence': int(data['conf'][i])
                    })

                    # Draw the bounding box on the image
                    if output_image_path:
                        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
                        cv2.putText(image, text, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 
                                   0.5, (0, 255, 0), 2)

        # Save the annotated image
        if output_image_path:
            cv2.imwrite(output_image_path, image)

        # Extract the full text
        full_text = pytesseract.image_to_string(pil_image, lang=language)

        return {
            'full_text': full_text.strip(),
            'text_boxes': text_boxes,
            'raw_data': data
        }

    def batch_process(self, input_folder, output_folder, language=None):
        """
        Process all images in a folder.

        Args:
            input_folder (str): input folder path
            output_folder (str): output folder path
            language (str): recognition language

        Returns:
            list: list of processing results
        """
        if language is None:
            language = self.default_language

        # Create the output folder
        os.makedirs(output_folder, exist_ok=True)

        # Collect all image files
        image_extensions = ['*.png', '*.jpg', '*.jpeg', '*.bmp', '*.tiff']
        image_paths = []
        for extension in image_extensions:
            image_paths.extend(glob.glob(os.path.join(input_folder, extension)))

        results = []

        for image_path in image_paths:
            filename = os.path.splitext(os.path.basename(image_path))[0]

            print(f"Processing image: {os.path.basename(image_path)}")

            try:
                # Extract the text
                text = self.extract_text(image_path, language=language)

                # Save the text result
                output_text_path = os.path.join(output_folder, f"{filename}.txt")
                with open(output_text_path, 'w', encoding='utf-8') as f:
                    f.write(text)

                # Save the image annotated with bounding boxes
                output_image_path = os.path.join(output_folder, f"{filename}_boxes.png")
                box_data = self.extract_text_with_boxes(
                    image_path, 
                    language=language, 
                    output_image_path=output_image_path
                )

                results.append({
                    'file': os.path.basename(image_path),
                    'text': text,
                    'boxes': box_data['text_boxes'],
                    'success': True
                })

            except Exception as e:
                print(f"Error while processing {image_path}: {str(e)}")
                results.append({
                    'file': os.path.basename(image_path),
                    'error': str(e),
                    'success': False
                })

        # Write the processing report
        report_path = os.path.join(output_folder, "processing_report.txt")
        with open(report_path, 'w', encoding='utf-8') as f:
            f.write("OCR processing report\n")
            f.write("=" * 50 + "\n")
            successful = sum(1 for r in results if r['success'])
            f.write(f"Successfully processed: {successful}/{len(results)} files\n\n")

            for result in results:
                f.write(f"File: {result['file']}\n")
                if result['success']:
                    f.write("Status: success\n")
                    f.write(f"Characters extracted: {len(result['text'])}\n")
                else:
                    f.write(f"Status: failed - {result['error']}\n")
                f.write("-" * 30 + "\n")

        print(f"Batch processing finished, report saved to: {report_path}")
        return results

# Usage example
if __name__ == "__main__":
    # Create an OCR instance
    ocr = AdvancedOCR(default_language='eng+chi_sim')

    # Process a single image
    result = ocr.extract_text("sample.png")
    print("Recognition result:")
    print(result)

    # Batch processing
    # results = ocr.batch_process("input_images", "output_results")

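10. Common problems and solutions

10.1 Low recognition accuracy

Causes:

Poor image quality
Unusual fonts
Complex background
Mismatched language model

Solutions:

Improve the image preprocessing pipeline
Try different PSM modes
Use the appropriate language pack
Train a custom language model

10.2 Slow processing

Causes:

Image resolution too high
Complex preprocessing
Several languages processed at once

Solutions:

Reduce the image resolution where appropriate
Simplify the preprocessing steps to what is actually needed
Use GPU acceleration (if supported)
Load only the language packs you need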
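
For the first of these solutions, the sketch below computes a target size that caps the longer side of an image while keeping its aspect ratio. scaled_size and the default limit of 2000 pixels are assumptions for illustration. Shrinking too aggressively makes small text unreadable, so the limit should be tuned against real samples.

```python
def scaled_size(width, height, max_side=2000):
    """Return (width, height) scaled down so the longer side is at most max_side."""
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough, never upscale
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

With Pillow, the result can be applied as image.resize(scaled_size(*image.size)), since Image.size is a (width, height) tuple.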

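10.3 High memory usage

Causes:

Many high-resolution images processed at the same time
Memory leaks

Solutions:

Process large files in batches
Release resources promptly once they are no longer needed
Use streaming processing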
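
Batching and streaming can be combined with a small generator that hands out fixed-size groups of paths, so only one group of images needs to be in memory at a time. This is a standard-library sketch. iter_batches is a hypothetical name.

```python
from itertools import islice


def iter_batches(items, batch_size):
    """Yield successive lists of at most `batch_size` items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Inside each batch, opening images with "with Image.open(path) as img:" lets Pillow release the file handle as soon as the block ends.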

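11. Summary and outlook

This article has described in detail how to recognize text in images with Python and Tesseract OCR. Starting from basic environment setup, it moved on to image preprocessing, advanced configuration, performance tuning, and practical application examples.

11.1 Key takeaways

Environment setup: installing and configuring Tesseract OCR on different operating systems
Basic usage: basic OCR calls and multilingual support
Image preprocessing: how the various preprocessing techniques affect recognition accuracy
Advanced features: using configuration parameters to improve recognition results
Practical applications: document digitization, business card information extraction, and similar tasks

11.2 Future directions

As artificial intelligence advances, OCR technology keeps improving as well:

Deep learning: OCR models based on deep learning perform better in complex scenes
End-to-end recognition: systems that go directly from images to structured data
Multimodal fusion: combining text, image, and layout information for a fuller understanding
Real-time processing: real-time OCR on mobile and edge computing devices

11.3 Suggestions for further study

Learn how to train Tesseract and create custom language models
Explore other OCR engines, such as Google Cloud Vision API and Amazon Textract
Study deep-learning OCR models such as CRNN and Attention-OCR
Learn about natural language processing and combine it with OCR results for deeper text understanding

With continued study and practice, you will be able to build smarter and more efficient OCR applications that meet real-world text recognition needs.

Note: the code examples in this article need to be adapted to your environment. Before using them, make sure all dependencies are installed correctly, and adjust file paths and parameter settings to your needs.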
