How to Implement Gesture-Based Volume Control with OpenCV

Updated: 2023-11-06 09:22:47  Author: 是Dream呀

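This article shows how to implement gesture-based volume control with OpenCV. The experiment uses the OpenCV and mediapipe libraries to recognize a hand gesture and maps the distance between two fingertips to the computer's volume.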

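I. Required Libraries and Features

This experiment uses the OpenCV and mediapipe libraries for hand gesture recognition and uses the distance between two fingertips to control the computer's volume.

Libraries imported:

cv2: the OpenCV library, used to read the camera video stream and process images.
mediapipe: the mediapipe library, used for hand landmark detection and gesture recognition.
ctypes and comtypes: used to interact with the operating system's audio interface.
pycaw: the pycaw library, used to control the computer's volume.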

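Features:

Initialize mediapipe and the volume control module, and read the volume range.
Open the camera and read the video stream.
Process each frame:
- Convert the image to RGB.
- Detect hand landmarks with mediapipe.
- If hand landmarks are detected:
  - Draw the finger landmarks and the connecting lines on the image.
  - Extract the landmark coordinates.
  - Compute the gesture distance from the thumb tip and index fingertip coordinates.
  - Convert the gesture distance to a volume level and set the computer's volume.
- Display the processed image.
Repeat these steps until the program is stopped manually or the camera is closed.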

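Notes:

Install the required libraries (opencv, mediapipe, pycaw) before running the code. pycaw drives the Windows Core Audio API, so the volume control part runs on Windows only.
An audio output device must be connected and accessible.
The code allows up to two hands (max_num_hands=2) and processes every detected hand in turn, so when two hands are visible the hand processed last determines the volume.
In mediapipe's 21-point hand model, landmark 4 is the thumb tip and landmark 8 is the index fingertip. These are the two points the code uses.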
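The landmark numbering mentioned in the notes can be written down as named constants. A minimal sketch follows. The enum name HandPoint and the helper pick are made up for illustration; the numeric values follow mediapipe's 21-point hand model, which mediapipe itself exposes through mp.solutions.hands.HandLandmark.

```python
from enum import IntEnum


class HandPoint(IntEnum):
    """Illustrative names for a few of the 21 hand landmark indices."""
    WRIST = 0
    THUMB_TIP = 4
    INDEX_FINGER_TIP = 8
    MIDDLE_FINGER_TIP = 12
    RING_FINGER_TIP = 16
    PINKY_TIP = 20


def pick(landmarks, point):
    """Return the landmark stored at the index of the given point."""
    return landmarks[int(point)]


# A dummy list of 21 entries stands in for hand_landmarks.landmark
dummy = [f"landmark-{i}" for i in range(21)]
print(pick(dummy, HandPoint.THUMB_TIP))
print(pick(dummy, HandPoint.INDEX_FINGER_TIP))
```

Using names instead of bare 4 and 8 makes it harder to confuse the fingertips with other landmarks.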

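The cv2.VideoCapture() argument

Calling cv2.VideoCapture(0) is not wrong in itself. On a Raspberry Pi, however, the camera may be registered under a different device index, in which case the argument has to be changed, for example to:

cap = cv2.VideoCapture(1)

When using a computer's camera:
On Windows, cv2.VideoCapture(0) may print the following warning when the program exits:

[ WARN:0] SourceReaderCB::~SourceReaderCB terminating async callback

Selecting the DirectShow backend avoids it:

cv2.VideoCapture(0,cv2.CAP_DSHOW)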
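Because the DirectShow backend exists only on Windows, one option is to choose the arguments by platform. The sketch below is one possible way to do that, not part of the original program. The helper name capture_args is hypothetical, and the value 700 in the demo is only a stand-in for cv2.CAP_DSHOW so that the snippet runs without OpenCV installed.

```python
import platform


def capture_args(device_index=0, dshow_flag=None, system=None):
    """Build the positional arguments for cv2.VideoCapture.

    dshow_flag is expected to be cv2.CAP_DSHOW; it is passed in so that
    this helper does not need OpenCV itself.
    """
    system = system or platform.system()
    if system == "Windows" and dshow_flag is not None:
        # Request the DirectShow backend on Windows only
        return (device_index, dshow_flag)
    # Elsewhere let OpenCV pick its default backend
    return (device_index,)


# Intended use (requires OpenCV):
#   cap = cv2.VideoCapture(*capture_args(0, dshow_flag=cv2.CAP_DSHOW))
print(capture_args(0, dshow_flag=700, system="Windows"))
print(capture_args(1, dshow_flag=700, system="Linux"))
```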

II. Importing the Required Modules

# Import OpenCV
import cv2
# Import mediapipe
import mediapipe as mp
# Import the modules for controlling the computer's volume
from ctypes import cast, POINTER
from comtypes import CLSCTX_ALL
from pycaw.pycaw import AudioUtilities, IAudioEndpointVolume
# Import the other dependencies
import time
import math
import numpy as np

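III. Initializing the HandControlVolume Class

class HandControlVolume:
    def __init__(self):
        """
        Initialize a HandControlVolume instance.
        Set up the mediapipe objects used for hand landmark detection and gesture recognition.
        Obtain the computer's volume interface and read the volume range.
        """
        # Initialize mediapipe
        self.mp_drawing = mp.solutions.drawing_utils
        self.mp_drawing_styles = mp.solutions.drawing_styles
        self.mp_hands = mp.solutions.hands
        # Obtain the volume interface and read the volume range
        devices = AudioUtilities.GetSpeakers()
        interface = devices.Activate(
            IAudioEndpointVolume._iid_, CLSCTX_ALL, None)
        self.volume = cast(interface, POINTER(IAudioEndpointVolume))
        # Unmute the output
        self.volume.SetMute(0, None)
        self.volume_range = self.volume.GetVolumeRange()

The constructor sets up the mediapipe helpers for hand landmark detection and drawing, obtains the volume interface of the default speakers, unmutes the output, and reads the volume range.
GetVolumeRange() returns the minimum level, the maximum level, and the step size, all in decibels.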
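To make the three values returned by GetVolumeRange() easier to tell apart, they can be unpacked into named fields. This is only an illustrative sketch: VolumeRange and describe are hypothetical names, and the sample tuple is made up, since the real values depend on the audio device.

```python
from typing import NamedTuple


class VolumeRange(NamedTuple):
    """Named view of the (min, max, step) tuple, all in decibels."""
    min_db: float
    max_db: float
    step_db: float


def describe(raw_range):
    """Format a raw (min, max, step) tuple as a readable string."""
    vr = VolumeRange(*raw_range)
    return f"{vr.min_db} dB to {vr.max_db} dB in steps of {vr.step_db} dB"


# Made-up sample values; real ones depend on the audio device
sample = (-64.0, 0.0, 0.5)
print(describe(sample))
```

In the class above, the same idea would amount to VolumeRange(*self.volume.GetVolumeRange()).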

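IV. Main Function

1. Computing the refresh rate

Record the current time as the starting point for the refresh-rate (FPS) calculation.
Open the video stream with OpenCV. Here it reads from the camera, using the default device ID 0.
Define the target size resize_w by resize_h. The capture resolution itself is not changed; each frame is resized to this size after it has been read.
Create the hands object in a with statement, which sets the detection and tracking parameters: minimum detection confidence, minimum tracking confidence, and the maximum number of hands.
Enter the loop, which runs while the video stream is open. cap.read() reads one frame; success indicates whether the read succeeded and image is the frame that was read.
If the read failed, print a message and continue with the next iteration; otherwise resize the frame to the target size. The check has to come first, because image is None after a failed read and cv2.resize would raise an error.

# Main function
    def recognize(self):
        # Start time for the FPS calculation
        fpsTime = time.time()
        # Open the video stream with OpenCV
        cap = cv2.VideoCapture(0)
        # Frame size used for processing and display
        resize_w = 640
        resize_h = 480
        # Initial values for the on-screen volume bar
        rect_height = 0
        rect_percent_text = 0
        with self.mp_hands.Hands(min_detection_confidence=0.7,
                                 min_tracking_confidence=0.5,
                                 max_num_hands=2) as hands:
            while cap.isOpened():
                success, image = cap.read()
                # Check the result first: image is None when the read fails
                if not success:
                    print("Empty frame.")
                    continue
                image = cv2.resize(image, (resize_w, resize_h))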
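With a bare continue, the loop keeps spinning if the camera stops delivering frames. One possible refinement, shown here as a sketch rather than as part of the original program, is to give up after a number of consecutive failed reads. The names read_frames, max_failures, and FakeCapture are hypothetical; FakeCapture only imitates the isOpened() and read() methods of cv2.VideoCapture so that the example runs without a camera.

```python
def read_frames(cap, max_failures=30):
    """Yield frames from cap, stopping after too many failed reads in a row."""
    failures = 0
    while cap.isOpened():
        success, frame = cap.read()
        if not success:
            failures += 1
            if failures >= max_failures:
                break
            continue
        # A successful read resets the failure counter
        failures = 0
        yield frame


class FakeCapture:
    """Stand-in for cv2.VideoCapture that replays a fixed list of results."""

    def __init__(self, results):
        self._results = list(results)

    def isOpened(self):
        return True

    def read(self):
        if self._results:
            return self._results.pop(0)
        # Once the list is used up, every read fails
        return (False, None)


cap = FakeCapture([(True, "f1"), (False, None), (True, "f2")])
print(list(read_frames(cap, max_failures=3)))
```

With a real camera, the main loop could then be written as for image in read_frames(cap): followed by the processing steps.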

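2. Improving performance

Convert the image from BGR to RGB, because the MediaPipe model expects RGB input.
Flip the image horizontally (mirror it) so that the display behaves like a mirror.
Set image.flags.writeable to False so that MediaPipe can take the image by reference instead of copying it.
Process the image with the MediaPipe model and keep the results.
Set image.flags.writeable back to True so that the image can be drawn on again.
Convert the image from RGB back to BGR for the drawing and display steps that follow.

Of these steps, only the writeable flag is a performance measure, since it avoids an unnecessary memory copy. The flag has to be set on the array that is actually passed to hands.process(), which is why it comes after cv2.cvtColor and cv2.flip: both return new, writeable arrays. The color conversion satisfies MediaPipe's input requirement, the mirroring makes the display more intuitive, and the conversion back to BGR is what OpenCV's drawing and display functions expect.

                # Convert to RGB
                image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
                # Mirror
                image = cv2.flip(image, 1)
                # Improve performance: mark the frame read-only so it is passed by reference
                image.flags.writeable = False
                # Run the mediapipe model
                results = hands.process(image)
                image.flags.writeable = True
                image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)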

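3. Checking whether a hand is present

Check whether results.multi_hand_landmarks is set, that is, whether a hand was detected. If so, continue with the code below.
Iterate over each hand_landmarks entry in results.multi_hand_landmarks, one per detected hand.
Use self.mp_drawing.draw_landmarks to draw the detected hand on the image, including the finger landmarks and the lines connecting them.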

# Check whether a hand was detected
                if results.multi_hand_landmarks:
                    # Iterate over each detected hand
                    for hand_landmarks in results.multi_hand_landmarks:
                        # Draw the finger landmarks on the frame
                        self.mp_drawing.draw_landmarks(
                            image,
                            hand_landmarks,
                            self.mp_hands.HAND_CONNECTIONS,
                            self.mp_drawing_styles.get_default_hand_landmarks_style(),
                            self.mp_drawing_styles.get_default_hand_connections_style())

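4. Parsing the fingers and storing their coordinates

This step extracts the landmark coordinates into landmark_list, derives the pixel coordinates of the thumb tip, the index fingertip, and their midpoint, draws these points and the line between the two fingertips, and computes the fingertip distance with the Pythagorean theorem.

Create an empty landmark_list to hold the landmark coordinates.
Iterate over the hand landmarks, store each landmark's id, x, y, and z in a list, and append that list to landmark_list.
If landmark_list is not empty, continue with the code below.
Take the thumb tip entry (index 4) from landmark_list and convert its normalized coordinates to pixel coordinates by multiplying by the frame width and height.
Take the index fingertip entry (index 8) from landmark_list and convert it to pixel coordinates in the same way.
Compute the midpoint between the thumb tip and the index fingertip.
Draw the thumb tip, the index fingertip, and the midpoint.
Draw the line between the thumb tip and the index fingertip.
Compute the distance between the two fingertips with the Pythagorean theorem and store it in line_len.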

 # Parse the fingers and store each landmark's coordinates
                        landmark_list = []
                        for landmark_id, finger_axis in enumerate(
                                hand_landmarks.landmark):
                            landmark_list.append([
                                landmark_id, finger_axis.x, finger_axis.y,
                                finger_axis.z
                            ])
                        if landmark_list:
                            # Thumb tip coordinates (landmark 4), converted to pixels
                            thumb_finger_tip = landmark_list[4]
                            thumb_finger_tip_x = math.ceil(thumb_finger_tip[1] * resize_w)
                            thumb_finger_tip_y = math.ceil(thumb_finger_tip[2] * resize_h)
                            # Index fingertip coordinates (landmark 8), converted to pixels
                            index_finger_tip = landmark_list[8]
                            index_finger_tip_x = math.ceil(index_finger_tip[1] * resize_w)
                            index_finger_tip_y = math.ceil(index_finger_tip[2] * resize_h)
                            # Midpoint between the two fingertips
                            finger_middle_point = (thumb_finger_tip_x + index_finger_tip_x) // 2, (
                                    thumb_finger_tip_y + index_finger_tip_y) // 2
                            # print(thumb_finger_tip_x)
                            thumb_finger_point = (thumb_finger_tip_x, thumb_finger_tip_y)
                            index_finger_point = (index_finger_tip_x, index_finger_tip_y)
                            # Draw the two fingertips and the midpoint
                            image = cv2.circle(image, thumb_finger_point, 10, (255, 0, 255), -1)
                            image = cv2.circle(image, index_finger_point, 10, (255, 0, 255), -1)
                            image = cv2.circle(image, finger_middle_point, 10, (255, 0, 255), -1)
                            # Draw the line between the two fingertips
                            image = cv2.line(image, thumb_finger_point, index_finger_point, (255, 0, 255), 5)
                            # Compute the length with the Pythagorean theorem
                            line_len = math.hypot((index_finger_tip_x - thumb_finger_tip_x),
                                                  (index_finger_tip_y - thumb_finger_tip_y))
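The coordinate conversion and the distance calculation above can be isolated into two small functions, which makes the arithmetic easy to check by hand. This is a sketch using only the standard library; the function names to_pixel and pinch_length are hypothetical, and the sample coordinates are made up.

```python
import math


def to_pixel(x_norm, y_norm, width, height):
    """Convert normalized landmark coordinates (0..1) to pixel coordinates."""
    return math.ceil(x_norm * width), math.ceil(y_norm * height)


def pinch_length(thumb, index, width, height):
    """Pixel distance between two normalized (x, y) points."""
    tx, ty = to_pixel(thumb[0], thumb[1], width, height)
    ix, iy = to_pixel(index[0], index[1], width, height)
    return math.hypot(ix - tx, iy - ty)


# Thumb tip at (160, 240) px and index fingertip at (320, 120) px on a 640x480 frame
length = pinch_length((0.25, 0.5), (0.5, 0.25), 640, 480)
print(length)  # 200.0, since hypot(160, -120) = 200
```

Note that the distance is measured in pixels of the displayed frame, so the same finger spread gives a larger value when the hand is closer to the camera.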

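5. Getting the computer's minimum and maximum volume

This step reads the minimum and maximum volume, maps the fingertip distance onto the volume range and onto the on-screen bar, and sets the computer's volume to the mapped value. In detail:

self.volume_range[0] and self.volume_range[1] are the computer's minimum and maximum volume levels in decibels.
np.interp linearly maps line_len from the input range 50 to 300 onto the range from the minimum to the maximum volume, giving the volume value vol. Distances below 50 or above 300 are clamped to the ends of the range.
np.interp maps line_len from the same input range onto 0 to 200, giving the bar height rect_height.
np.interp maps line_len from the same input range onto 0 to 100, giving the percentage shown next to the bar, rect_percent_text.
self.volume.SetMasterVolumeLevel sets the computer's volume to vol. This method takes a level in decibels, so the percentage shown on screen follows the finger distance linearly and will generally not match the percentage shown by the system volume slider.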

# Get the computer's minimum and maximum volume
                            min_volume = self.volume_range[0]
                            max_volume = self.volume_range[1]
                            # Map the fingertip distance to a volume level
                            vol = np.interp(line_len, [50, 300], [min_volume, max_volume])
                            # Map the fingertip distance to the on-screen bar
                            rect_height = np.interp(line_len, [50, 300], [0, 200])
                            rect_percent_text = np.interp(line_len, [50, 300], [0, 100])
                            # Set the computer's volume
                            self.volume.SetMasterVolumeLevel(vol, None)
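What np.interp does with two-point ranges like the ones above can be written out by hand. The following standard-library sketch is an equivalent for this special case only, not a replacement for np.interp in general; the function name map_range is hypothetical.

```python
def map_range(value, in_min, in_max, out_min, out_max):
    """Linearly map value from [in_min, in_max] to [out_min, out_max], clamping."""
    if value <= in_min:
        return out_min
    if value >= in_max:
        return out_max
    ratio = (value - in_min) / (in_max - in_min)
    return out_min + ratio * (out_max - out_min)


# Fingertip distances in pixels mapped to the 0..100 percentage
for length in (20, 50, 175, 300, 400):
    print(length, map_range(length, 50, 300, 0, 100))
```

A distance of 175 pixels lies halfway between 50 and 300 and therefore maps to 50 percent, while 20 and 400 are clamped to 0 and 100.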

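6. Displaying the bar

cv2.putText draws the percentage value of the bar on the image.
cv2.rectangle draws the outline of the bar and the filled portion.
cv2.putText draws the current refresh rate (FPS) on the image.
cv2.imshow displays the processed image.
cv2.waitKey waits for a key press; the program exits when ESC is pressed or the window is closed.
Finally, an instance of HandControlVolume is created and its recognize method is called to start gesture recognition.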

# Display the bar
                cv2.putText(image, str(math.ceil(rect_percent_text)) + "%", (10, 350),
                            cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 0), 3)
                image = cv2.rectangle(image, (30, 100), (70, 300), (255, 0, 0), 3)
                image = cv2.rectangle(image, (30, math.ceil(300 - rect_height)), (70, 300), (255, 0, 0), -1)
                # Display the refresh rate (FPS)
                cTime = time.time()
                fps_text = 1 / (cTime - fpsTime)
                fpsTime = cTime
                cv2.putText(image, "FPS: " + str(int(fps_text)), (10, 70),
                            cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 0), 3)
                # Show the frame
                cv2.imshow('MediaPipe Hands', image)
                # Exit on ESC or when the window is closed
                if cv2.waitKey(5) & 0xFF == 27 or cv2.getWindowProperty('MediaPipe Hands', cv2.WND_PROP_VISIBLE) < 1:
                    break
            cap.release()
# Start the program
control = HandControlVolume()
control.recognize()
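The FPS value above is computed from a single frame interval, so it can jump around from frame to frame, and the division fails if two timestamps happen to be identical. As one possible alternative, not the original author's method, the sketch below smooths the value with an exponential moving average and skips zero-length intervals. The class name FpsMeter and the smoothing parameter are hypothetical.

```python
import time


class FpsMeter:
    """Smoothed frames-per-second estimate based on frame timestamps."""

    def __init__(self, smoothing=0.9):
        self.smoothing = smoothing
        self.last_time = None
        self.fps = 0.0

    def tick(self, now):
        """Register a frame at time `now` (seconds) and return the smoothed FPS."""
        if self.last_time is not None:
            elapsed = now - self.last_time
            if elapsed > 0:  # skip zero-length intervals instead of dividing by zero
                instant = 1.0 / elapsed
                if self.fps == 0.0:
                    self.fps = instant
                else:
                    self.fps = (self.smoothing * self.fps
                                + (1 - self.smoothing) * instant)
        self.last_time = now
        return self.fps


meter = FpsMeter(smoothing=0.5)
for stamp in (0.0, 0.5, 0.5, 0.75):
    print(meter.tick(stamp))

# In the main loop this would be called once per frame:
#   fps_text = meter.tick(time.time())
```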

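V. Demo

The demo shows that the farther apart the index finger and thumb are on screen, the higher the volume, and the closer together they are, the lower the volume. This achieves gesture-based volume control.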

VI. Full Source Code

import cv2
import mediapipe as mp
from ctypes import cast, POINTER
from comtypes import CLSCTX_ALL
from pycaw.pycaw import AudioUtilities, IAudioEndpointVolume
import time
import math
import numpy as np
class HandControlVolume:
    def __init__(self):
        # Initialize mediapipe
        self.mp_drawing = mp.solutions.drawing_utils
        self.mp_drawing_styles = mp.solutions.drawing_styles
        self.mp_hands = mp.solutions.hands
        # Obtain the volume interface and read the volume range
        devices = AudioUtilities.GetSpeakers()
        interface = devices.Activate(
            IAudioEndpointVolume._iid_, CLSCTX_ALL, None)
        self.volume = cast(interface, POINTER(IAudioEndpointVolume))
        self.volume.SetMute(0, None)
        self.volume_range = self.volume.GetVolumeRange()
    # Main function
    def recognize(self):
        # Start time for the FPS calculation
        fpsTime = time.time()
        # Open the video stream with OpenCV
        cap = cv2.VideoCapture(0)
        # Frame size used for processing and display
        resize_w = 640
        resize_h = 480
        # Initial values for the on-screen volume bar
        rect_height = 0
        rect_percent_text = 0
        with self.mp_hands.Hands(min_detection_confidence=0.7,
                                 min_tracking_confidence=0.5,
                                 max_num_hands=2) as hands:
            while cap.isOpened():
                success, image = cap.read()
                # Check the result first: image is None when the read fails
                if not success:
                    print("Empty frame.")
                    continue
                image = cv2.resize(image, (resize_w, resize_h))
                # Convert to RGB
                image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
                # Mirror
                image = cv2.flip(image, 1)
                # Improve performance: mark the frame read-only so it is passed by reference
                image.flags.writeable = False
                # Run the mediapipe model
                results = hands.process(image)
                image.flags.writeable = True
                image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
                # Check whether a hand was detected
                if results.multi_hand_landmarks:
                    # Iterate over each detected hand
                    for hand_landmarks in results.multi_hand_landmarks:
                        # Draw the finger landmarks on the frame
                        self.mp_drawing.draw_landmarks(
                            image,
                            hand_landmarks,
                            self.mp_hands.HAND_CONNECTIONS,
                            self.mp_drawing_styles.get_default_hand_landmarks_style(),
                            self.mp_drawing_styles.get_default_hand_connections_style())
                        # Parse the fingers and store each landmark's coordinates
                        landmark_list = []
                        for landmark_id, finger_axis in enumerate(
                                hand_landmarks.landmark):
                            landmark_list.append([
                                landmark_id, finger_axis.x, finger_axis.y,
                                finger_axis.z
                            ])
                        if landmark_list:
                            # Thumb tip coordinates (landmark 4), converted to pixels
                            thumb_finger_tip = landmark_list[4]
                            thumb_finger_tip_x = math.ceil(thumb_finger_tip[1] * resize_w)
                            thumb_finger_tip_y = math.ceil(thumb_finger_tip[2] * resize_h)
                            # Index fingertip coordinates (landmark 8), converted to pixels
                            index_finger_tip = landmark_list[8]
                            index_finger_tip_x = math.ceil(index_finger_tip[1] * resize_w)
                            index_finger_tip_y = math.ceil(index_finger_tip[2] * resize_h)
                            # Midpoint between the two fingertips
                            finger_middle_point = (thumb_finger_tip_x + index_finger_tip_x) // 2, (
                                    thumb_finger_tip_y + index_finger_tip_y) // 2
                            # print(thumb_finger_tip_x)
                            thumb_finger_point = (thumb_finger_tip_x, thumb_finger_tip_y)
                            index_finger_point = (index_finger_tip_x, index_finger_tip_y)
                            # Draw the two fingertips and the midpoint
                            image = cv2.circle(image, thumb_finger_point, 10, (255, 0, 255), -1)
                            image = cv2.circle(image, index_finger_point, 10, (255, 0, 255), -1)
                            image = cv2.circle(image, finger_middle_point, 10, (255, 0, 255), -1)
                            # Draw the line between the two fingertips
                            image = cv2.line(image, thumb_finger_point, index_finger_point, (255, 0, 255), 5)
                            # Compute the length with the Pythagorean theorem
                            line_len = math.hypot((index_finger_tip_x - thumb_finger_tip_x),
                                                  (index_finger_tip_y - thumb_finger_tip_y))
                            # Get the computer's minimum and maximum volume
                            min_volume = self.volume_range[0]
                            max_volume = self.volume_range[1]
                            # Map the fingertip distance to a volume level
                            vol = np.interp(line_len, [50, 300], [min_volume, max_volume])
                            # Map the fingertip distance to the on-screen bar
                            rect_height = np.interp(line_len, [50, 300], [0, 200])
                            rect_percent_text = np.interp(line_len, [50, 300], [0, 100])
                            # Set the computer's volume
                            self.volume.SetMasterVolumeLevel(vol, None)
                # Display the bar
                cv2.putText(image, str(math.ceil(rect_percent_text)) + "%", (10, 350),
                            cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 0), 3)
                image = cv2.rectangle(image, (30, 100), (70, 300), (255, 0, 0), 3)
                image = cv2.rectangle(image, (30, math.ceil(300 - rect_height)), (70, 300), (255, 0, 0), -1)
                # Display the refresh rate (FPS)
                cTime = time.time()
                fps_text = 1 / (cTime - fpsTime)
                fpsTime = cTime
                cv2.putText(image, "FPS: " + str(int(fps_text)), (10, 70),
                            cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 0), 3)
                # Show the frame; the window name must match the one checked below
                cv2.imshow('MediaPipe Hands', image)
                # Exit on ESC or when the window is closed
                if cv2.waitKey(5) & 0xFF == 27 or cv2.getWindowProperty('MediaPipe Hands', cv2.WND_PROP_VISIBLE) < 1:
                    break
            cap.release()
# Start the program
control = HandControlVolume()
control.recognize()
