How to Convert Speech to Text in Python

Updated: 2024-01-26 08:56:58 Author: 無水先生

Speech recognition is the ability of computer software to recognize words and phrases in spoken language and convert them into human-readable text. In this tutorial, you will learn how to convert speech to text in Python using the SpeechRecognition library, with code included for reference.

1. Overview

Learn how to use the SpeechRecognition Python library to perform speech recognition and convert audio speech to text in Python.

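2. The Speech AI Library

2.1 A Capable Speech-to-Text Library

Speech recognition is the ability of computer software to recognize words and phrases in spoken language and convert them into human-readable text. In this tutorial, you will learn how to convert speech to text in Python using the SpeechRecognition library.

With it, we do not need to build any machine learning model from scratch: the library provides convenient wrappers for several well-known public speech recognition APIs (such as Google Cloud Speech API, IBM Speech To Text, and others).

Note that if you do not want to use an API and would rather run inference on a machine learning model directly, be sure to check out this tutorial, where I show you how to perform speech recognition in Python with current state-of-the-art machine learning models.

Also, if you want other ways to perform ASR, check out this comprehensive speech recognition tutorial.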


2.2 Installation

Let's start by installing the library with pip:

pip3 install SpeechRecognition pydub

Next, open a new Python file and import it:

import speech_recognition as sr

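A nice thing about this library is that it supports several recognition engines:

CMU Sphinx (offline)
Google Speech Recognition
Google Cloud Speech API
Wit.ai
Microsoft Bing Voice Recognition
Houndify API
IBM Speech to Text
Snowboy Hotword Detection (offline)

We will use Google Speech Recognition here, because it is simple and does not require you to supply an API key.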

2.3 Transcribing an Audio File

Make sure the current directory contains an audio file with English speech (if you want to follow along with me, get the audio file here):

filename = "16-122828-0002.wav"

This file was taken from the LibriSpeech dataset, but you can use any WAV audio file you like; just change the file name. Let's initialize our speech recognizer:

# initialize the recognizer
r = sr.Recognizer()

The code below loads the audio file and uses Google Speech Recognition to convert the speech to text:

# open the file
with sr.AudioFile(filename) as source:
    # listen for the data (load audio to memory)
    audio_data = r.record(source)
    # recognize (convert from speech to text)
    text = r.recognize_google(audio_data)
    print(text)

This takes a few seconds to complete, because it uploads the file to Google and fetches the output. Here is my result:

I believe you're just talking nonsense

The code above works for small or medium-sized audio files. In the next section, we will write code for large files.
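To decide whether a file counts as "large", it can help to check its duration first. The following is a minimal sketch that uses only the standard-library `wave` module; the helper names `wav_duration_seconds` and `needs_chunking`, and the 60-second cut-off, are assumptions made for illustration and are not part of SpeechRecognition or of any documented service limit.

```python
import wave

# Hypothetical cut-off in seconds; tune it for your own files.
LONG_AUDIO_SECONDS = 60


def wav_duration_seconds(path):
    """Return the length of a WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        # duration = number of frames / frames per second
        return wav_file.getnframes() / wav_file.getframerate()


def needs_chunking(path, limit=LONG_AUDIO_SECONDS):
    """Return True when the file is longer than `limit` seconds."""
    return wav_duration_seconds(path) > limit


# Example usage (assumes the file from this section is in the current directory):
# print(wav_duration_seconds("16-122828-0002.wav"))
```

A check like this only reads the WAV header, so it is cheap to run before choosing between the single-request code above and the chunked approach in the next section.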

2.4 Transcribing Large Audio Files

If you want to perform speech recognition on a long audio file, the function below handles that well:

# importing libraries 
import speech_recognition as sr 
import os 
from pydub import AudioSegment
from pydub.silence import split_on_silence

# create a speech recognition object
r = sr.Recognizer()

# a function to recognize speech in the audio file
# so that we don't repeat ourselves in other functions
def transcribe_audio(path):
    # use the audio file as the audio source
    with sr.AudioFile(path) as source:
        audio_listened = r.record(source)
        # try converting it to text
        text = r.recognize_google(audio_listened)
    return text

# a function that splits the audio file into chunks on silence
# and applies speech recognition
def get_large_audio_transcription_on_silence(path):
    """Splitting the large audio file into chunks
    and apply speech recognition on each of these chunks"""
    # open the audio file using pydub
    sound = AudioSegment.from_file(path)  
    # split audio sound where silence is 500 milliseconds or more and get chunks
    chunks = split_on_silence(sound,
        # experiment with this value for your target audio file
        min_silence_len = 500,
        # adjust this per requirement
        silence_thresh = sound.dBFS-14,
        # keep 500 ms of silence at each end of a chunk, adjustable as well
        keep_silence=500,
    )
    folder_name = "audio-chunks"
    # create a directory to store the audio chunks
    if not os.path.isdir(folder_name):
        os.mkdir(folder_name)
    whole_text = ""
    # process each chunk 
    for i, audio_chunk in enumerate(chunks, start=1):
        # export audio chunk and save it in
        # the `folder_name` directory.
        chunk_filename = os.path.join(folder_name, f"chunk{i}.wav")
        audio_chunk.export(chunk_filename, format="wav")
        # recognize the chunk
        try:
            text = transcribe_audio(chunk_filename)
        except sr.UnknownValueError as e:
            print("Error:", str(e))
        else:
            text = f"{text.capitalize()}. "
            print(chunk_filename, ":", text)
            whole_text += text
    # return the text for all chunks detected
    return whole_text

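Note: you need to install Pydub with pip for the code above to work.

The function above uses split_on_silence() from the pydub.silence module to split the audio data into chunks wherever there is silence. The min_silence_len parameter is the minimum length of silence, in milliseconds, that triggers a split.

silence_thresh is the threshold: anything quieter than this is treated as silence. I set it to the average dBFS minus 14. The keep_silence parameter is the amount of silence, in milliseconds, to leave at the beginning and end of each detected chunk.

These parameters will not suit every sound file, so experiment with them for your own long audio.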
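To build some intuition for how min_silence_len and silence_thresh interact, here is a toy sketch. It is not pydub's implementation: it assumes a made-up input of one loudness value (in dBFS) per millisecond, and the function name `find_silent_runs` is hypothetical.

```python
def find_silent_runs(levels_db, silence_thresh, min_silence_len):
    """Return (start, end) index pairs of runs quieter than the threshold.

    levels_db is a toy input: one loudness value in dBFS per millisecond.
    The end index is exclusive. Runs shorter than min_silence_len are ignored.
    """
    runs = []
    run_start = None
    for i, level in enumerate(levels_db):
        if level < silence_thresh:
            # we are inside a quiet stretch; remember where it began
            if run_start is None:
                run_start = i
        else:
            # a loud sample ends the quiet stretch; keep it if long enough
            if run_start is not None and i - run_start >= min_silence_len:
                runs.append((run_start, i))
            run_start = None
    # close a quiet stretch that reaches the end of the audio
    if run_start is not None and len(levels_db) - run_start >= min_silence_len:
        runs.append((run_start, len(levels_db)))
    return runs


# 5 loud samples, 3 quiet, 4 loud, 1 quiet, 2 loud
levels = [-20] * 5 + [-50] * 3 + [-20] * 4 + [-50] * 1 + [-20] * 2
average_db = -20
print(find_silent_runs(levels, silence_thresh=average_db - 14, min_silence_len=2))
# [(5, 8)]
```

The single quiet sample at index 12 is shorter than min_silence_len, so it does not count as a split point. Raising min_silence_len produces fewer, longer chunks, while a higher (less negative) silence_thresh treats more of the audio as silence.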

After that, we iterate over all the chunks, convert each piece of speech to text, and concatenate the results. Here is an example run:

path = "7601-291468-0006.wav"
print("\nFull text:", get_large_audio_transcription_on_silence(path))

Note: you can get the 7601-291468-0006.wav file here.

Output:

audio-chunks\chunk1.wav : His abode which you had fixed in a bowery or country seat. 
audio-chunks\chunk2.wav : At a short distance from the city. 
audio-chunks\chunk3.wav : Just at what is now called dutch street. 
audio-chunks\chunk4.wav : Sooner bounded with proofs of his ingenuity. 
audio-chunks\chunk5.wav : Patent smokejacks. 
audio-chunks\chunk6.wav : It required a horse to work some. 
audio-chunks\chunk7.wav : Dutch oven roasted meat without fire. 
audio-chunks\chunk8.wav : Carts that went before the horses. 
audio-chunks\chunk9.wav : Weather cox that turned against the wind and other wrongheaded contrivances. 
audio-chunks\chunk10.wav : So just understand can found it all beholders. 

Full text: His abode which you had fixed in a bowery or country seat. At a short distance from the city. Just at what is now called dutch street. Sooner bounded with proofs of his ingenuity. Patent smokejacks. It required a horse to work some. Dutch oven roasted meat without fire. Carts that went before the horses. Weather cox that turned against the wind and other wrongheaded contrivances. So just understand can found it all beholders.
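Notice that "dutch street" is lower-cased in the output above. One likely reason is that str.capitalize() lower-cases every character after the first, so any capitalized names in the recognizer's result are lost. A minimal alternative, assuming you only want to upper-case the first character (the helper name `sentence_case` is made up for this sketch):

```python
def sentence_case(text):
    """Upper-case only the first character and append '. ', leaving the rest untouched."""
    return f"{text[:1].upper()}{text[1:]}. "


sample = "just at what is now called Dutch street"
print(sample.capitalize())    # Just at what is now called dutch street
print(sentence_case(sample))  # Just at what is now called Dutch street.
```

In the transcription functions, this would replace the `text.capitalize()` call inside the `else` branch.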

So the function automatically creates a folder for us, saves the chunks of the original audio file there, and then runs speech recognition on each of them.
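The loop in the function above prints an error and moves on when a chunk cannot be recognized, so the final text gives no hint about which parts are missing. A small sketch of one way to keep track of failed chunks follows; `transcribe_chunks` is a hypothetical helper, and the transcription function and the exception types to tolerate are passed in as arguments so the logic can be tried without any audio.

```python
def transcribe_chunks(chunk_paths, transcribe, expected_errors=(Exception,)):
    """Transcribe each chunk and return (full_text, failed_paths).

    transcribe: a callable that takes a file path and returns text.
    expected_errors: exception types that mark a chunk as failed
    instead of stopping the whole run.
    """
    pieces = []
    failed = []
    for path in chunk_paths:
        try:
            pieces.append(transcribe(path))
        except expected_errors:
            failed.append(path)
    return " ".join(pieces), failed


# With this tutorial's code you might call it like this (assumption):
# text, failed = transcribe_chunks(
#     paths, transcribe_audio, (sr.UnknownValueError, sr.RequestError)
# )
```

sr.UnknownValueError is raised when the speech is unintelligible, and sr.RequestError when the request to the service fails, so the list of failed paths tells you which chunks are worth retrying or inspecting by ear.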

If you would rather split the audio file into fixed intervals, you can use the function below:

# a function that splits the audio file into fixed interval chunks
# and applies speech recognition
def get_large_audio_transcription_fixed_interval(path, minutes=5):
    """Splitting the large audio file into fixed interval chunks
    and apply speech recognition on each of these chunks"""
    # open the audio file using pydub
    sound = AudioSegment.from_file(path)  
    # split the audio file into chunks
    chunk_length_ms = int(1000 * 60 * minutes) # convert to milliseconds
    chunks = [sound[i:i + chunk_length_ms] for i in range(0, len(sound), chunk_length_ms)]
    folder_name = "audio-fixed-chunks"
    # create a directory to store the audio chunks
    if not os.path.isdir(folder_name):
        os.mkdir(folder_name)
    whole_text = ""
    # process each chunk 
    for i, audio_chunk in enumerate(chunks, start=1):
        # export audio chunk and save it in
        # the `folder_name` directory.
        chunk_filename = os.path.join(folder_name, f"chunk{i}.wav")
        audio_chunk.export(chunk_filename, format="wav")
        # recognize the chunk
        try:
            text = transcribe_audio(chunk_filename)
        except sr.UnknownValueError as e:
            print("Error:", str(e))
        else:
            text = f"{text.capitalize()}. "
            print(chunk_filename, ":", text)
            whole_text += text
    # return the text for all chunks detected
    return whole_text

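The function above splits the large audio file into 5-minute chunks. You can change the minutes parameter to suit your needs. Since my audio file is not that large, I tried splitting it into 10-second chunks: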

print("\nFull text:", get_large_audio_transcription_fixed_interval(path, minutes=1/6))

Output:

audio-fixed-chunks\chunk1.wav : His abode which you had fixed in a bowery or country seat at a short distance from the city just that one is now called.
audio-fixed-chunks\chunk2.wav : Dutch street soon abounded with proofs of his ingenuity patent smokejacks that required a horse to work some.
audio-fixed-chunks\chunk3.wav : Oven roasted meat without fire carts that went before the horses weather cox that turned against the wind and other wrong head.
audio-fixed-chunks\chunk4.wav : Contrivances that astonished and confound it all beholders.

Full text: His abode which you had fixed in a bowery or country seat at a short distance from the city just that one is now called. Dutch street soon abounded with proofs of his ingenuity patent smokejacks that required a horse to work some. Oven roasted meat without fire carts that went before the horses weather cox that turned against the wind and other wrong head. Contrivances that astonished and confound it all beholders.
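The fixed-interval split comes down to arithmetic on millisecond offsets, which is also why it can cut words in half (as with "wrong head" and "Contrivances" above). The sketch below reproduces only that arithmetic, without pydub; `fixed_interval_ranges` is a hypothetical helper name.

```python
def fixed_interval_ranges(total_ms, minutes=5):
    """Return (start_ms, end_ms) pairs covering an audio of total_ms milliseconds."""
    chunk_length_ms = int(1000 * 60 * minutes)  # convert minutes to milliseconds
    if chunk_length_ms <= 0:
        raise ValueError("minutes must give a chunk of at least 1 ms")
    return [
        # the last chunk is clipped to the end of the audio
        (start, min(start + chunk_length_ms, total_ms))
        for start in range(0, total_ms, chunk_length_ms)
    ]


# a 70-second audio split into half-minute chunks
print(fixed_interval_ranges(70_000, minutes=0.5))
# [(0, 30000), (30000, 60000), (60000, 70000)]
```

Because the boundaries ignore the content of the audio, splitting on silence usually gives cleaner sentences, while fixed intervals give predictable chunk sizes.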

2.5 Reading from the Microphone

This requires PyAudio to be installed on your machine. The installation steps depend on your operating system:

Windows

You can install it directly with pip:

$ pip3 install pyaudio

Linux

You need to install the dependencies first:

$ sudo apt-get install python-pyaudio python3-pyaudio
$ pip3 install pyaudio

macOS

You need to install portaudio first, then you can install PyAudio with pip:

$ brew install portaudio
$ pip3 install pyaudio
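After installing, you may want to confirm that Python can actually find PyAudio before touching the microphone. A minimal sketch using only the standard library (the helper name `is_installed` is made up for this example):

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


# PyAudio is imported under the module name "pyaudio"
print("PyAudio available:", is_installed("pyaudio"))
```

If this prints False, the install step for your operating system did not succeed in the interpreter you are running.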

Now let's use the microphone to convert our speech:

import speech_recognition as sr

# create the recognizer used below
r = sr.Recognizer()

with sr.Microphone() as source:
    # read the audio data from the default microphone
    audio_data = r.record(source, duration=5)
    print("Recognizing...")
    # convert speech to text
    text = r.recognize_google(audio_data)
    print(text)

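This listens to your microphone for 5 seconds and then tries to convert the speech to text!

It is very similar to the previous code, but here we use the Microphone() object to read audio from the default microphone. The duration parameter of record() stops the recording after 5 seconds, and the audio data is then uploaded to Google to get the output text.

You can also use the offset parameter of record() to start recording after offset seconds.

In addition, you can recognize other languages by passing the language parameter to recognize_google(). For example, to recognize Spanish speech, you can use: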

text = r.recognize_google(audio_data, language="es-ES")

See this StackOverflow answer for the supported languages.

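3. Conclusion

As you can see, converting speech to text with this library is straightforward. The library is widely used in real-world projects. Check out the official documentation.

If you also want to convert text to speech in Python, check out this tutorial.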
