
Building an Image Annotation and OCR Tool in Python

Updated: 2025-03-06 15:45:50 Author: winfredzhang

This article introduces a tool written in Python that lets users draw rectangular annotations on an image, run OCR on the annotated regions, and save the results to an Excel file. Interested readers may want to take a look.

This article walks through the code of an image annotation and OCR (optical character recognition) tool. The tool lets users draw rectangular annotations on an image, run OCR on the annotated regions, and save the results to an Excel file. Users can also save and load annotations, clear them, and crop the image.

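Project overview

The main features of the image annotation and OCR tool are:

Load an image and display it in the interface.
Let the user draw rectangles on the image to mark regions of interest.
Run OCR on the annotated regions and display the recognized text.
Save the OCR results to an Excel file.
Save and load the user's annotation data (JSON format).
Provide cropping, clearing of annotations, and resetting of the image.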

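1. Loading and displaying images

The program first scans the specified folder for image files (supported extensions are .png, .jpg, .jpeg, .bmp, and .tiff) so that they can be shown in the image panel of the interface.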

self.image_files = [f for f in os.listdir(folder_path) 
                    if f.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', '.tiff'))]
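The list comprehension above accepts any directory entry with a matching suffix, including a subfolder that happens to be named like an image, and returns the names in whatever order the operating system reports them. As a minimal sketch, assuming a standalone helper (the name list_image_files and the constant IMAGE_EXTENSIONS are hypothetical and do not appear in the original tool), the same filter can be combined with a file check and a stable sort order:

```python
import os

# Same extensions as in the article; note that '.tif' is not in this tuple.
IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg', '.bmp', '.tiff')


def list_image_files(folder_path):
    """Return the sorted names of image files directly inside folder_path."""
    names = []
    for name in os.listdir(folder_path):
        full_path = os.path.join(folder_path, name)
        # Skip directories and match the extension case-insensitively.
        if os.path.isfile(full_path) and name.lower().endswith(IMAGE_EXTENSIONS):
            names.append(name)
    # Case-insensitive sort so the list box order is predictable.
    return sorted(names, key=str.lower)
```

The returned list could be assigned to self.image_files in place of the comprehension.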

A wx.ListBox widget shows the list of image files. When the user clicks a file, it is loaded and displayed:

file_path = os.path.join(self.folder_path, filename)
self.original_image = cv2.imdecode(np.fromfile(file_path, dtype=np.uint8), cv2.IMREAD_COLOR)
self.original_image = cv2.cvtColor(self.original_image, cv2.COLOR_BGR2RGB)

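This code reads the image with OpenCV and converts its color space for display in wxPython: OpenCV decodes images in BGR channel order, while wxPython expects RGB. Reading the bytes with np.fromfile and decoding them with cv2.imdecode, rather than calling cv2.imread, also works for paths that contain non-ASCII characters, which cv2.imread can fail to open on Windows. Note that cv2.imdecode returns None when the data cannot be decoded, so the result should be checked before cv2.cvtColor is called.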

2. Rectangle annotation

The user can draw rectangles on the image to mark regions of interest. The program draws and updates the rectangle in response to mouse events (button down, move, button up):

def on_mouse_down(self, event):
    """Start drawing a rectangle when the mouse button is pressed."""
    if self.displayed_image is not None:
        self.start_point = event.GetPosition()
        self.drawing = True

def on_mouse_move(self, event):
    """Redraw the panel while the mouse moves so the rectangle is updated."""
    if self.drawing and self.displayed_image is not None:
        self.image_panel.Refresh()

def on_mouse_up(self, event):
    """Finish the rectangle when the mouse button is released."""
    if self.drawing and self.displayed_image is not None:
        end_point = event.GetPosition()
        # Normalize the corners so that (x1, y1) is top-left and (x2, y2) is bottom-right
        x1 = min(self.start_point.x, end_point.x)
        y1 = min(self.start_point.y, end_point.y)
        x2 = max(self.start_point.x, end_point.x)
        y2 = max(self.start_point.y, end_point.y)
        # Store the rectangle in original-image coordinates, not display coordinates
        orig_rect = self.convert_to_original_coords((x1, y1, x2, y2))
        self.rectangles.append(orig_rect)
        self.drawing = False
        self.image_panel.Refresh()

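In on_mouse_down, the user clicks on the image to start drawing a rectangle; on_mouse_move refreshes the panel so the rectangle is updated in real time; and on_mouse_up completes the rectangle when the user releases the mouse button, converting it to original-image coordinates before storing it.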
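The article does not show the body of convert_to_original_coords. As a rough sketch only, assuming the image is displayed with a single uniform scale factor and an optional offset inside the panel (the function names normalize_rect and to_original_coords, and the scale and offset parameters, are hypothetical), the corner normalization and the coordinate mapping might look like this:

```python
def normalize_rect(p1, p2):
    """Return (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2 for two corner points."""
    (ax, ay), (bx, by) = p1, p2
    return (min(ax, bx), min(ay, by), max(ax, bx), max(ay, by))


def to_original_coords(rect, scale, offset=(0, 0)):
    """Map a rectangle from display coordinates back to original-image pixels.

    scale  -- displayed size divided by original size (assumed equal on both axes)
    offset -- top-left corner of the displayed image inside the panel
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    ox, oy = offset
    x1, y1, x2, y2 = rect
    # Undo the panel offset first, then undo the display scaling.
    return (
        int(round((x1 - ox) / scale)),
        int(round((y1 - oy) / scale)),
        int(round((x2 - ox) / scale)),
        int(round((y2 - oy) / scale)),
    )
```

For example, an image shown at half size (scale 0.5) maps the display rectangle (20, 30, 120, 80) to (40, 60, 240, 160) in the original image. Storing rectangles in original coordinates keeps them valid when the window is resized.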

3. OCR recognition

Once the annotations are in place, the user can click the "Recognize" button and the program runs OCR on the annotated regions. OCR is handled by the pytesseract library:

text = pytesseract.image_to_string(
    pil_image,
    lang='chi_sim',  # Simplified Chinese language data
    config='--psm 6 --oem 3'  # psm 6: treat the region as one uniform block of text; oem 3: default OCR engine
)

The recognized text is shown in a text box and can also be saved to an Excel file:

if ocr_results:
    # One row per annotated region, numbered from 1
    df = pd.DataFrame({'Region': range(1, len(ocr_results) + 1), 'Recognized text': ocr_results})
    output_path = os.path.join(self.folder_path, f'{self.current_filename}_ocr_results.xlsx')
    df.to_excel(output_path, index=False, engine='openpyxl')

The results are written to an Excel file with pandas (using the openpyxl engine), which makes them easy to review and process later.

4. Saving and loading annotations

The program also lets the user save the annotated regions in JSON format so that they can be loaded again later. This is done as follows:

annotations_data = {
    'filename': self.current_filename,
    'rectangles': self.rectangles
}
json_path = os.path.join(self.folder_path, f'{self.current_filename}_annotations.json')
with open(json_path, 'w', encoding='utf-8') as f:
    json.dump(annotations_data, f)

The annotation file is named after the image file so that the two can be matched. When loading annotations, the program reads the JSON file and restores the previous annotation state:

with open(json_path, 'r', encoding='utf-8') as f:
    annotations_data = json.load(f)
if annotations_data['filename'] == self.current_filename:
    self.rectangles = annotations_data['rectangles']
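One detail worth keeping in mind is that JSON has no tuple type, so rectangles saved as tuples come back as lists, and a missing annotation file raises an error if it is opened unconditionally. The following is a minimal sketch of a save/load round trip that handles both cases; the standalone functions save_annotations and load_annotations are hypothetical and only mirror the file naming used in the article:

```python
import json
import os


def save_annotations(folder, filename, rectangles):
    """Write the rectangles for one image to '<filename>_annotations.json'."""
    path = os.path.join(folder, f"{filename}_annotations.json")
    data = {"filename": filename, "rectangles": [list(r) for r in rectangles]}
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII filenames readable in the file.
        json.dump(data, f, ensure_ascii=False, indent=2)
    return path


def load_annotations(folder, filename):
    """Return the saved rectangles as tuples, or [] if none match this image."""
    path = os.path.join(folder, f"{filename}_annotations.json")
    if not os.path.exists(path):
        return []
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    if data.get("filename") != filename:
        return []
    # JSON arrays load as lists; convert back to tuples for consistency.
    return [tuple(r) for r in data.get("rectangles", [])]
```

Returning an empty list for a missing or mismatched file lets the caller assign the result directly without extra error handling.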

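5. Cropping and resetting the image

The crop feature lets the user crop the image to a selected region. After drawing a rectangle, the user clicks the "Crop" button and the program crops the image to the most recently drawn rectangle: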

cropped = self.current_image[y1:y2, x1:x2].copy()
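NumPy slicing does not reject out-of-range bounds: a negative start index counts from the end of the array, and an end index beyond the image is silently truncated, so a rectangle dragged past the image edge can yield an empty or unexpected crop. Whether the original tool already guards against this is not shown in the article. A minimal sketch of such a guard, using a hypothetical helper named clamp_rect, might be:

```python
def clamp_rect(rect, width, height):
    """Clip (x1, y1, x2, y2) to the image bounds; return None if nothing is left."""
    x1, y1, x2, y2 = rect
    # Limit every coordinate to the range [0, width] or [0, height].
    x1, x2 = max(0, min(x1, width)), max(0, min(x2, width))
    y1, y2 = max(0, min(y1, height)), max(0, min(y2, height))
    # A rectangle with no area cannot be cropped.
    if x2 <= x1 or y2 <= y1:
        return None
    return (x1, y1, x2, y2)
```

With an image array img of shape (height, width, 3), the clamped values can then be used as img[y1:y2, x1:x2].copy(), skipping the crop when clamp_rect returns None.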

To undo the changes, the user can click the "Reset image" button and the program restores the original image.

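6. UI components and layout

The interface is laid out with wxPython sizers. The main window has three parts: the file list, the image display area, and the button area. The file list is used to select an image, the image display area shows the image and accepts annotations, and the button area provides cropping, annotation, OCR, and the other operations.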

file_list_sizer = wx.BoxSizer(wx.VERTICAL)
self.file_listbox = wx.ListBox(panel, choices=self.image_files, style=wx.LB_SINGLE)
file_list_sizer.Add(self.file_listbox, 1, wx.EXPAND | wx.ALL, 10)

Managing the controls with BoxSizer keeps each part of the layout self-contained and lets the interface adapt when the window is resized.

Results