How to Convert Speech to Text in Python
1. Overview
Learn how to use the SpeechRecognition Python library to perform speech recognition and convert audio speech to text in Python.
2. A Speech Recognition AI Library
2.1 A Powerful Speech-to-Text Library
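Speech recognition is the ability of computer software to identify words and phrases in spoken language and convert them into human-readable text. In this tutorial, you will learn how to convert speech to text in Python using the SpeechRecognition library.
This means we don't need to build any machine learning model from scratch: the library provides convenient wrappers around various well-known public speech recognition APIs (such as Google Cloud Speech API, IBM Speech To Text, and others).
Note that if you would rather not use an API and instead run inference directly on machine learning models, be sure to check out this tutorial, in which I show how to perform speech recognition in Python with current state-of-the-art machine learning models.
Also, if you want other ways to perform ASR, check out this comprehensive speech recognition tutorial.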
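2.2 Installation
Let's start by installing the libraries with pip:
```
pip3 install SpeechRecognition pydub
```
Now open a new Python file and import the library:
```python
import speech_recognition as sr
```
The nice thing about this library is that it supports several recognition engines: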
- CMU Sphinx (offline)
- Google Speech Recognition
- Google Cloud Speech API
- Wit.ai
- Microsoft Bing Voice Recognition
- Houndify API
- IBM Speech to Text
- Snowboy Hotword Detection (offline)
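In the library, most transcription engines in this list correspond to their own `recognize_*()` method on `Recognizer`, such as `recognize_sphinx()` or `recognize_google()`. If you want to switch engines by name at runtime, a minimal sketch is a lookup table plus `getattr`. The `ENGINE_METHODS` table and the `transcribe_with()` helper are hypothetical, and only a subset of engines is included: check each method name against the SpeechRecognition version you have installed before adding more. Snowboy is left out because it is a hotword detector rather than a transcription engine.

```python
from typing import Any, Dict

# Hypothetical table: short engine names -> Recognizer method names.
# Check these names against the SpeechRecognition version you have installed.
ENGINE_METHODS: Dict[str, str] = {
    "sphinx": "recognize_sphinx",              # CMU Sphinx (offline)
    "google": "recognize_google",              # Google Speech Recognition
    "google_cloud": "recognize_google_cloud",  # Google Cloud Speech API
    "wit": "recognize_wit",                    # Wit.ai
    "houndify": "recognize_houndify",          # Houndify API
    "ibm": "recognize_ibm",                    # IBM Speech to Text
}


def transcribe_with(recognizer: Any, audio: Any, engine: str = "google", **kwargs: Any) -> str:
    """Dispatch to the recognize_* method for `engine` (hypothetical helper)."""
    try:
        method_name = ENGINE_METHODS[engine]
    except KeyError:
        raise ValueError(f"unknown engine: {engine!r}") from None
    # Look the method up on the recognizer object and pass engine-specific
    # options (API keys, language, ...) straight through.
    return getattr(recognizer, method_name)(audio, **kwargs)
```

With the recognizer `r` and the `audio_data` from the next section, `transcribe_with(r, audio_data, "google")` would behave like `r.recognize_google(audio_data)`. Engine-specific options such as `language` are forwarded as keyword arguments.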
We'll use Google Speech Recognition here, because it's simple and doesn't require an API key.
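2.3 Transcribing Audio Files
Make sure you have an audio file containing English speech in the current directory (if you want to follow along with me, get the audio file here):
```python
filename = "16-122828-0002.wav"
```
The file was taken from the LibriSpeech dataset, but you can use any WAV audio file you like; just change the filename. Now let's initialize our speech recognizer:
```python
# initialize the recognizer
r = sr.Recognizer()
```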
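The code below loads the audio file and converts the speech to text using Google Speech Recognition:
```python
# open the file
with sr.AudioFile(filename) as source:
    # listen for the data (load audio to memory)
    audio_data = r.record(source)
    # recognize (convert from speech to text)
    text = r.recognize_google(audio_data)
    print(text)
```
This takes a few seconds to finish, because it uploads the file to Google and fetches the output. Here is my result:
```
I believe you're just talking nonsense
```
The code above works well for small and medium-sized audio files. In the next section, we'll write code for large files.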
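Whether a file counts as small or large mostly comes down to its duration. As a quick check before choosing an approach, the standard-library `wave` module can report a WAV file's basic properties. This is a minimal sketch that assumes an uncompressed PCM WAV file, because the `wave` module does not read compressed WAV variants. `describe_wav()` is a hypothetical helper name:

```python
import wave


def describe_wav(path: str) -> dict:
    """Return basic properties of a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()   # number of audio frames (one sample per channel)
        rate = wf.getframerate()   # frames per second
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }
```

For example, `print(describe_wav("16-122828-0002.wav"))` would print the channel count, sample width, sample rate and duration in seconds of the sample file used above.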
2.4 Transcribing Large Audio Files
If you want to run speech recognition on a long audio file, the function below handles it well:
```python
# importing libraries
import speech_recognition as sr
import os
from pydub import AudioSegment
from pydub.silence import split_on_silence

# create a speech recognition object
r = sr.Recognizer()

# a function to recognize speech in the audio file
# so that we don't repeat ourselves in other functions
def transcribe_audio(path):
    # use the audio file as the audio source
    with sr.AudioFile(path) as source:
        audio_listened = r.record(source)
        # try converting it to text
        text = r.recognize_google(audio_listened)
    return text

# a function that splits the audio file into chunks on silence
# and applies speech recognition
def get_large_audio_transcription_on_silence(path):
    """Splitting the large audio file into chunks
    and apply speech recognition on each of these chunks"""
    # open the audio file using pydub
    sound = AudioSegment.from_file(path)
    # split audio sound where silence is 500 milliseconds or more and get chunks
    chunks = split_on_silence(sound,
        # experiment with this value for your target audio file
        min_silence_len = 500,
        # adjust this per requirement
        silence_thresh = sound.dBFS-14,
        # keep 500 ms of silence at each end of a chunk, adjustable as well
        keep_silence=500,
    )
    folder_name = "audio-chunks"
    # create a directory to store the audio chunks
    if not os.path.isdir(folder_name):
        os.mkdir(folder_name)
    whole_text = ""
    # process each chunk
    for i, audio_chunk in enumerate(chunks, start=1):
        # export audio chunk and save it in
        # the `folder_name` directory.
        chunk_filename = os.path.join(folder_name, f"chunk{i}.wav")
        audio_chunk.export(chunk_filename, format="wav")
        # recognize the chunk
        try:
            text = transcribe_audio(chunk_filename)
        except sr.UnknownValueError as e:
            print("Error:", str(e))
        else:
            text = f"{text.capitalize()}. "
            print(chunk_filename, ":", text)
            whole_text += text
    # return the text for all chunks detected
    return whole_text
```
Note: you need to install pydub with pip for the code above to work.
The function above uses the split_on_silence() function from the pydub.silence module to split the audio data into chunks wherever there is silence. The min_silence_len parameter is the minimum length of silence (in milliseconds) used for a split. silence_thresh is the threshold: anything quieter than this is treated as silence, and I set it to the average dBFS minus 14. The keep_silence parameter is the amount of silence (in milliseconds) left at the beginning and end of each detected chunk.
These parameters won't suit every sound file, so experiment with them to match your own large audio files.
After that, we iterate over all the chunks, convert each one's speech to text, and concatenate the results. Here is an example run:
```python
path = "7601-291468-0006.wav"
print("\nFull text:", get_large_audio_transcription_on_silence(path))
```
Note: you can get the 7601-291468-0006.wav file here.
Output:
```
audio-chunks\chunk1.wav : His abode which you had fixed in a bowery or country seat.
audio-chunks\chunk2.wav : At a short distance from the city.
audio-chunks\chunk3.wav : Just at what is now called dutch street.
audio-chunks\chunk4.wav : Sooner bounded with proofs of his ingenuity.
audio-chunks\chunk5.wav : Patent smokejacks.
audio-chunks\chunk6.wav : It required a horse to work some.
audio-chunks\chunk7.wav : Dutch oven roasted meat without fire.
audio-chunks\chunk8.wav : Carts that went before the horses.
audio-chunks\chunk9.wav : Weather cox that turned against the wind and other wrongheaded contrivances.
audio-chunks\chunk10.wav : So just understand can found it all beholders.

Full text: His abode which you had fixed in a bowery or country seat. At a short distance from the city. Just at what is now called dutch street. Sooner bounded with proofs of his ingenuity. Patent smokejacks. It required a horse to work some. Dutch oven roasted meat without fire. Carts that went before the horses. Weather cox that turned against the wind and other wrongheaded contrivances. So just understand can found it all beholders.
```
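So the function automatically creates a folder for us, saves the chunks of the original audio file we specified into it, and then runs speech recognition on each of them.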
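To make `min_silence_len`, `silence_thresh` and `keep_silence` more concrete, here is a deliberately simplified, pure-Python illustration of silence-based splitting. It is not pydub's implementation. It assumes the audio has already been reduced to a list of loudness values in dBFS, one per millisecond. Any run of at least `min_silence_len` values below `silence_thresh` is treated as a split point, and each chunk is padded with up to `keep_silence` milliseconds of the neighboring silence. Both function names are hypothetical.

```python
from typing import List, Tuple


def find_silent_runs(levels_db: List[float], min_silence_len: int,
                     silence_thresh: float) -> List[Tuple[int, int]]:
    """Return (start, end) index ranges (end exclusive) where the level stays
    below silence_thresh for at least min_silence_len consecutive steps."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, level in enumerate(levels_db):
        if level < silence_thresh:
            if start is None:          # a new quiet stretch begins
                start = i
        else:
            if start is not None and i - start >= min_silence_len:
                runs.append((start, i))
            start = None               # loud sample: any quiet stretch ends here
    # a quiet stretch that lasts until the end of the audio
    if start is not None and len(levels_db) - start >= min_silence_len:
        runs.append((start, len(levels_db)))
    return runs


def split_ranges_on_silence(levels_db: List[float], min_silence_len: int,
                            silence_thresh: float,
                            keep_silence: int) -> List[Tuple[int, int]]:
    """Return (start, end) ranges of the non-silent chunks, each padded by up
    to keep_silence steps of the surrounding silence."""
    total = len(levels_db)
    chunks: List[Tuple[int, int]] = []
    cursor = 0  # end of the previous silent run
    for s_start, s_end in find_silent_runs(levels_db, min_silence_len, silence_thresh):
        if s_start > cursor:  # there is sound between the previous silence and this one
            chunks.append((max(0, cursor - keep_silence), min(total, s_start + keep_silence)))
        cursor = s_end
    if cursor < total:        # sound after the last silent run
        chunks.append((max(0, cursor - keep_silence), total))
    return chunks
```

With one value per millisecond, the article's settings would correspond to `min_silence_len=500` and `keep_silence=500`. In this sketch, padding from neighboring chunks can overlap when a silence is shorter than `2 * keep_silence`.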
If you would rather split the audio file into fixed intervals, you can use the following function:
```python
# a function that splits the audio file into fixed interval chunks
# and applies speech recognition
# (reuses os, sr, AudioSegment and transcribe_audio() from the previous block)
def get_large_audio_transcription_fixed_interval(path, minutes=5):
    """Splitting the large audio file into fixed interval chunks
    and apply speech recognition on each of these chunks"""
    # open the audio file using pydub
    sound = AudioSegment.from_file(path)
    # split the audio file into chunks
    chunk_length_ms = int(1000 * 60 * minutes)  # convert to milliseconds
    chunks = [sound[i:i + chunk_length_ms] for i in range(0, len(sound), chunk_length_ms)]
    folder_name = "audio-fixed-chunks"
    # create a directory to store the audio chunks
    if not os.path.isdir(folder_name):
        os.mkdir(folder_name)
    whole_text = ""
    # process each chunk
    for i, audio_chunk in enumerate(chunks, start=1):
        # export audio chunk and save it in
        # the `folder_name` directory.
        chunk_filename = os.path.join(folder_name, f"chunk{i}.wav")
        audio_chunk.export(chunk_filename, format="wav")
        # recognize the chunk
        try:
            text = transcribe_audio(chunk_filename)
        except sr.UnknownValueError as e:
            print("Error:", str(e))
        else:
            text = f"{text.capitalize()}. "
            print(chunk_filename, ":", text)
            whole_text += text
    # return the text for all chunks detected
    return whole_text
```
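The function above splits the large audio file into 5-minute chunks. You can change the minutes parameter to suit your needs. Since my audio file isn't that large, I tried splitting it into 10-second chunks:
```python
print("\nFull text:", get_large_audio_transcription_fixed_interval(path, minutes=1/6))
```
Output: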
```
audio-fixed-chunks\chunk1.wav : His abode which you had fixed in a bowery or country seat at a short distance from the city just that one is now called.
audio-fixed-chunks\chunk2.wav : Dutch street soon abounded with proofs of his ingenuity patent smokejacks that required a horse to work some.
audio-fixed-chunks\chunk3.wav : Oven roasted meat without fire carts that went before the horses weather cox that turned against the wind and other wrong head.
audio-fixed-chunks\chunk4.wav : Contrivances that astonished and confound it all beholders.

Full text: His abode which you had fixed in a bowery or country seat at a short distance from the city just that one is now called. Dutch street soon abounded with proofs of his ingenuity patent smokejacks that required a horse to work some. Oven roasted meat without fire carts that went before the horses weather cox that turned against the wind and other wrong head. Contrivances that astonished and confound it all beholders.
```
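The fixed-interval function relies on pydub measuring lengths and slice positions in milliseconds. That means `minutes=1/6` gives `int(1000 * 60 * (1/6)) = 10000` ms chunks, and the last chunk holds whatever is left over. The output above also shows the trade-off of fixed cuts: a boundary can fall in the middle of a phrase. For example, "wrongheaded contrivances" comes out as "wrong head." at the end of chunk 3 and "Contrivances" at the start of chunk 4. The range arithmetic can be checked without any audio using the sketch below. `fixed_interval_ranges()` is a hypothetical helper that mirrors the list comprehension in the function:

```python
from typing import List, Tuple


def fixed_interval_ranges(total_ms: int, minutes: float = 5) -> List[Tuple[int, int]]:
    """Return the (start_ms, end_ms) ranges that sound[i:i + chunk_length_ms]
    would cover for an audio segment of total_ms milliseconds."""
    chunk_length_ms = int(1000 * 60 * minutes)  # same conversion as the function above
    if chunk_length_ms <= 0:
        raise ValueError("minutes must give a chunk length of at least 1 ms")
    # the last range is clipped to the end of the audio, like a pydub slice
    return [(start, min(start + chunk_length_ms, total_ms))
            for start in range(0, total_ms, chunk_length_ms)]
```

For example, `fixed_interval_ranges(35000, minutes=1/6)` gives three full 10-second ranges followed by a final 5-second range.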
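2.5 Reading from the Microphone
This requires PyAudio to be installed on your machine. Here is how to install it on each operating system:
- Windows
You can pip install it directly:
```
$ pip3 install pyaudio
```
- Linux
You need to install the dependencies first:
```
$ sudo apt-get install portaudio19-dev python3-pyaudio
$ pip3 install pyaudio
```
- macOS
You need to install portaudio first, and then you can pip install it directly:
```
$ brew install portaudio
$ pip3 install pyaudio
```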
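`sr.Microphone` depends on PyAudio and raises an error when PyAudio cannot be imported. It can therefore help to confirm the installation before running the microphone example. Below is a minimal standard-library sketch. `module_available()` is a hypothetical helper, and it only checks that the top-level `pyaudio` module can be found, not that a working microphone is present:

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # PyAudio installs a top-level module called "pyaudio"
    print("PyAudio installed:", module_available("pyaudio"))
```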
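Now let's use the microphone to convert our speech:
```python
import speech_recognition as sr

# create the recognizer used below
r = sr.Recognizer()

with sr.Microphone() as source:
    # read the audio data from the default microphone
    audio_data = r.record(source, duration=5)
    print("Recognizing...")
    # convert speech to text
    text = r.recognize_google(audio_data)
    print(text)
```
This listens to your microphone for 5 seconds and then tries to convert the speech to text!
It is very similar to the previous code. Here, though, we use the Microphone() object to read audio from the default microphone. We pass the duration parameter to record() so it stops reading after 5 seconds, and then upload the audio data to Google to get the output text.
You can also use the offset parameter of record() to start recording after offset seconds.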
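`record()` skips the first `offset` seconds of the source and then reads at most `duration` seconds. How much raw audio such a window represents follows from the sample rate, the sample width (bytes per sample) and the channel count. Below is a library-free sketch of that arithmetic on a buffer of interleaved PCM bytes. `pcm_window()` is a hypothetical helper that illustrates the idea; it is not how `record()` is implemented internally:

```python
from typing import Optional


def pcm_window(raw: bytes, sample_rate: int, sample_width: int, channels: int,
               offset: float = 0.0, duration: Optional[float] = None) -> bytes:
    """Return the part of interleaved PCM data between `offset` and
    `offset + duration` seconds (hypothetical helper)."""
    frame_size = sample_width * channels            # bytes per frame: one sample per channel
    start = int(offset * sample_rate) * frame_size  # skip the first `offset` seconds
    if duration is None:
        return raw[start:]                          # no duration: keep everything after offset
    end = start + int(duration * sample_rate) * frame_size
    return raw[start:end]                           # the slice stops early if the data runs out
```

For 16 kHz, 16-bit mono audio, a 5-second window is 16000 × 2 × 1 × 5 = 160,000 bytes.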
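You can also recognize other languages by passing the language parameter to recognize_google(). For example, to recognize Spanish speech you can use:
```python
text = r.recognize_google(audio_data, language="es-ES")
```
See the supported languages in this StackOverflow answer.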
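3. Conclusion
As you can see, converting speech to text with this library is easy and straightforward. The library is widely used in practice. Check out the official documentation.
If you also want to convert text to speech in Python, check out this tutorial.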