Building an Automated Image Engine with Python and Selenium

Updated: 2024-12-11 09:34:10  Author: 不愛運動的跑者

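This guide shows how to build an automated image engine with Python and the Selenium library. Given a set of parameters, the engine automatically takes screenshots of web pages and stores the resulting images in the cloud. It can also receive task instructions through a message queue, which makes it a good fit for workloads that need to capture web-page screenshots in batches.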

1. Prepare the environment

Make sure Python is installed, then install the required libraries:

pip install selenium oss2 kafka-python-ng
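Before going further, it can help to confirm that the three packages are importable from the interpreter you will run the engine with. The sketch below is a hypothetical helper (`missing_modules` and `REQUIRED_MODULES` are illustrative names, not part of the original article); it relies on the fact that the `kafka-python-ng` package is imported under the module name `kafka`, as the later code does.

```python
import importlib.util

# Module names to check. The kafka-python-ng package is imported as `kafka`.
REQUIRED_MODULES = ["selenium", "oss2", "kafka"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

`find_spec` only locates the module without importing it, so the check is cheap and has no side effects.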

2. Create a configuration file

Create a simple config.ini file to hold your OSS, Kafka, and engine settings:

[oss]
access_key_id = YOUR_OSS_ACCESS_KEY_ID
access_key_secret = YOUR_OSS_ACCESS_KEY_SECRET
bucket_name = YOUR_BUCKET_NAME
endpoint = http://oss-cn-hangzhou.aliyuncs.com

[kafka]
bootstrap_servers = localhost:9092
topic = your_topic_name
notify_topic = your_notify_topic
consumer_group = your_consumer_group

[engine]
driver_path = path/to/chromedriver
image_path = path/to/screenshots
param_path = path/to/params
site_base_path = https://example.com
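Because `configparser.ConfigParser.read()` silently skips files it cannot open, a typo in the file name or a missing option only surfaces later as an error deep inside the processing code. A small startup check along these lines could catch that early. This is a sketch under the assumption that every option in the sample config.ini above is required; `REQUIRED_KEYS` and `find_missing_keys` are hypothetical names.

```python
import configparser

# Options defined in the sample config.ini above (assumed to all be required).
REQUIRED_KEYS = {
    "oss": ["access_key_id", "access_key_secret", "bucket_name", "endpoint"],
    "kafka": ["bootstrap_servers", "topic", "notify_topic", "consumer_group"],
    "engine": ["driver_path", "image_path", "param_path", "site_base_path"],
}


def find_missing_keys(config, required=REQUIRED_KEYS):
    """Return 'section.key' strings for every required option absent from config."""
    missing = []
    for section, keys in required.items():
        for key in keys:
            # has_option() returns False (no exception) when the section is missing
            if not config.has_option(section, key):
                missing.append(f"{section}.{key}")
    return missing


if __name__ == "__main__":
    config = configparser.ConfigParser()
    # read() returns the list of files it actually parsed
    if not config.read("config.ini", encoding="utf-8"):
        raise SystemExit("config.ini not found")
    missing = find_missing_keys(config)
    if missing:
        raise SystemExit("Missing config options: " + ", ".join(missing))
```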

3. Set up logging

Add basic logging to the program to make debugging easier:

import logging
from logging.handlers import TimedRotatingFileHandler
import os

logger = logging.getLogger('image_engine')
logger.setLevel(logging.DEBUG)

log_file = 'logs/image_engine.log'
os.makedirs('logs', exist_ok=True)
handler = TimedRotatingFileHandler(log_file, when='midnight', backupCount=7, encoding='utf-8')
formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)
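One pitfall with module-level setup like the above is that running it twice in the same process (for example after a re-import) attaches a second handler, and every message is then written twice. A possible variation wraps the setup in a function that only adds handlers once and also echoes log lines to the console. The function name `setup_logger` and the extra console handler are assumptions for illustration, not part of the original code.

```python
import logging
import os
from logging.handlers import TimedRotatingFileHandler


def setup_logger(name="image_engine", log_dir="logs"):
    """Return a logger with a daily-rotated file handler and a console handler.

    Handlers are attached only on the first call, so repeated calls do not
    produce duplicate log lines.
    """
    logger = logging.getLogger(name)
    if logger.handlers:
        return logger

    logger.setLevel(logging.DEBUG)
    os.makedirs(log_dir, exist_ok=True)
    formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")

    # Rotate at midnight and keep 7 old files, as in the original setup
    file_handler = TimedRotatingFileHandler(
        os.path.join(log_dir, f"{name}.log"),
        when="midnight", backupCount=7, encoding="utf-8",
    )
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)

    # Also print to stderr, which is convenient when running in a terminal
    console_handler = logging.StreamHandler()
    console_handler.setFormatter(formatter)
    logger.addHandler(console_handler)
    return logger
```

Usage would be a single line, `logger = setup_logger()`, at the top of the script.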

4. Initialize Selenium WebDriver

Initialize the Chrome WebDriver and maximize the browser window:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Read the configuration file
import configparser
config = configparser.ConfigParser()
config.read('config.ini', encoding='utf-8')

service = Service(config.get('engine', 'driver_path'))
driver = webdriver.Chrome(service=service)
driver.maximize_window()
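If the engine runs on a server without a display, `maximize_window()` may not yield a predictable window size, and screenshot dimensions can vary between machines. One common alternative is to start Chrome headless with an explicit window size. The sketch below assumes a recent Chrome that accepts the `--headless=new` switch; `build_chrome_args` and `create_driver` are hypothetical helpers, and the 1920x1080 default is an arbitrary choice.

```python
def build_chrome_args(headless=True, width=1920, height=1080):
    """Return Chrome command-line switches for a fixed-size, optionally headless window."""
    args = [f"--window-size={width},{height}"]
    if headless:
        # Selects Chrome's newer headless mode (assumes a recent Chrome version)
        args.append("--headless=new")
    return args


def create_driver(driver_path, headless=True, width=1920, height=1080):
    """Start Chrome with the switches from build_chrome_args."""
    # Imported here so build_chrome_args can be used without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    for arg in build_chrome_args(headless, width, height):
        options.add_argument(arg)
    return webdriver.Chrome(service=Service(driver_path), options=options)
```

With this helper, the initialization above would become `driver = create_driver(config.get('engine', 'driver_path'))`, and the `maximize_window()` call would no longer be needed.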

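5. Image processing logic

Write a function that handles each Kafka message: it opens the target page, waits for it to load, saves a screenshot, uploads the screenshot to OSS, and sends a completion notification: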

from kafka import KafkaConsumer, KafkaProducer
import json
import time
from datetime import datetime
from urllib.parse import urlencode
import oss2

def process_task(msg):
    task_params = json.loads(msg.value)
    item_id = task_params['itemId']
    param_value = task_params['paramValue']
    
    logger.info(f"Start processing item [{item_id}] with parameter [{param_value}]")
    
    # Build the request URL; urlencode escapes special characters such as & or spaces
    query = urlencode({'param': param_value, 'id': item_id})
    url = f"{config.get('engine', 'site_base_path')}/view?{query}"
    
    try:
        driver.get(url)
        
        # Simple fixed wait for the page to load
        time.sleep(3)  # Adjust as needed, or replace with WebDriverWait
        
        # Build the screenshot path: <image_path>/images/<YYYY-MM-DD>/<itemId>_<paramValue>.png
        today = datetime.now().strftime('%Y-%m-%d')
        screenshot_dir = os.path.join(config.get('engine', 'image_path'), 'images', today)
        os.makedirs(screenshot_dir, exist_ok=True)
        fname = os.path.join(screenshot_dir, f"{item_id}_{param_value}.png")
        
        driver.save_screenshot(fname)
        logger.info(f"Saved screenshot to {fname}")
        
        # Upload to OSS; upload_to_oss returns the object key
        remote_path = upload_to_oss(fname)
        
        # Send the completion notification with the OSS object key
        notify_completion(item_id, param_value, remote_path)
        
        logger.info(f"Finished processing item [{item_id}] with parameter [{param_value}]")
    except Exception as e:
        logger.error(f"Error while processing item [{item_id}] with parameter [{param_value}]: {e}")

def upload_to_oss(file_path):
    """Upload a file to Alibaba Cloud OSS and return its object key."""
    auth = oss2.Auth(config.get('oss', 'access_key_id'), config.get('oss', 'access_key_secret'))
    bucket = oss2.Bucket(auth, config.get('oss', 'endpoint'), config.get('oss', 'bucket_name'))
    # Use the path relative to image_path as the key; OSS keys use '/' on every OS
    remote_path = os.path.relpath(file_path, config.get('engine', 'image_path')).replace(os.sep, '/')
    bucket.put_object_from_file(remote_path, file_path)
    return remote_path

def notify_completion(item_id, param_value, image_path):
    """Send a completion notification to the notify topic."""
    # Requires a module-level KafkaProducer named `producer` whose value_serializer turns dicts into bytes
    producer.send(config.get('kafka', 'notify_topic'), {
        'itemId': item_id,
        'paramValue': param_value,
        'imagePath': image_path
    })
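The fixed `time.sleep(3)` in `process_task` either wastes time on fast pages or is too short on slow ones. A minimal polling alternative is to check `document.readyState` until the browser reports `"complete"`. The helper below is a hand-rolled sketch (the name `wait_for_page_ready` and the 10-second default are assumptions); it only needs an object with an `execute_script` method, which a Selenium WebDriver provides. Note that `"complete"` means the document and its subresources have loaded; content rendered afterwards by JavaScript may still be missing, in which case waiting for a specific element is more reliable.

```python
import time


def wait_for_page_ready(driver, timeout=10.0, poll_interval=0.5):
    """Poll document.readyState until it is 'complete' or the timeout expires.

    Returns True if the page reported 'complete' in time, otherwise False.
    """
    deadline = time.monotonic() + timeout
    while True:
        if driver.execute_script("return document.readyState") == "complete":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

In `process_task`, the `time.sleep(3)` line could then become `if not wait_for_page_ready(driver): logger.warning("Page did not finish loading in time")`. Selenium's own `WebDriverWait(driver, 10).until(lambda d: d.execute_script("return document.readyState") == "complete")` expresses the same condition, but raises `TimeoutException` instead of returning `False`.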

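6. Start the Kafka consumer

Start the Kafka consumer, listen for messages, and call the handler for each one. The KafkaProducer that notify_completion relies on is created here as well:

if __name__ == "__main__":
    bootstrap_servers = config.get('kafka', 'bootstrap_servers').split(',')

    # Producer used by notify_completion; dict values are serialized to UTF-8 JSON
    producer = KafkaProducer(
        bootstrap_servers=bootstrap_servers,
        value_serializer=lambda v: json.dumps(v, ensure_ascii=False).encode('utf-8')
    )

    consumer = KafkaConsumer(
        config.get('kafka', 'topic'),
        bootstrap_servers=bootstrap_servers,
        group_id=config.get('kafka', 'consumer_group'),
        auto_offset_reset='latest',
        enable_auto_commit=True,
        value_deserializer=lambda m: m.decode('utf-8')
    )

    try:
        for msg in consumer:
            try:
                process_task(msg)
            except Exception as ex:
                logger.error(f"Error while consuming message: {ex}")
    finally:
        # Flush pending notifications and release resources on shutdown
        producer.flush()
        consumer.close()
        driver.quit()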
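To try the engine end to end you need to put a task on the input topic. Since the consumer decodes each message as UTF-8 and `process_task` parses it as JSON and reads `itemId` and `paramValue`, a test message must have exactly that shape. The sketch below is a hypothetical test producer (`build_task_message` and `send_task` are illustrative names); it sends raw bytes, so no `value_serializer` is needed.

```python
import json


def build_task_message(item_id, param_value):
    """Encode one screenshot task as the UTF-8 JSON bytes that process_task expects."""
    payload = {"itemId": item_id, "paramValue": param_value}
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


def send_task(bootstrap_servers, topic, item_id, param_value):
    """Publish a single task to the engine's input topic and wait for the broker ack."""
    # Imported here so build_task_message can be used without kafka-python-ng installed
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=bootstrap_servers)
    try:
        # send() is asynchronous; .get() blocks until the record is acknowledged
        producer.send(topic, build_task_message(item_id, param_value)).get(timeout=10)
    finally:
        producer.close()
```

With the sample config.ini, a call such as `send_task(["localhost:9092"], "your_topic_name", "1001", "demo")` (the item ID and parameter are made-up values) should make the running engine capture one screenshot.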

Summary

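With the simplified steps above, you can quickly set up an image engine based on Python and Selenium. The engine receives task instructions from Kafka, visits the specified site, takes a screenshot of the page, and uploads it to Alibaba Cloud OSS. This version strips out unnecessary complexity and focuses on the core functionality.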
