Python audio processing: how to resample audio and extract pitch

Updated: 2024-08-02 09:54:06  Author: io_T_T

This article covers resampling and pitch extraction for audio processing in Python, step by step; read on if you are interested.

Data collection -> sample rate adjustment

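1. Resampling with torchaudio (CPU version)
- First import the required packages. Since torch is the tool of choice here, installing the torch environment is not covered. If you would rather not use torch, see the librosa option later in this article.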

import torch
import torchaudio
from torchaudio.transforms import Resample
from time import time  # only used for timing; not part of the main logic

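Load the audio file with torchaudio.load
Set the target sample rate and construct the Resample transform
Call the constructed transform on the waveform
Save the result with torchaudio.save

Wrapped up as one function (remember to run the imports first):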

def resample_by_cpu():
    file_path = input("please input your file path: ")
    start_time = time()  # timing only; can be removed
    y, sr = torchaudio.load(file_path)  # load the audio file; y has shape (channel, time)

    target_sample = 32000  # target sample rate
    resampler = Resample(orig_freq=sr, new_freq=target_sample)  # build the transform from the original and target sample rates
    resampled_music = resampler(y)  # apply the transform

    # save with torchaudio; MP3 support depends on the installed audio backend, so switch to ".wav" if this fails
    torchaudio.save("test.mp3", resampled_music, target_sample)
    print(f"cost :{time() - start_time}s")  # timing only; can be removed

The whole run takes roughly a few seconds.
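As a quick sanity check on the result, resampling should preserve duration, so the sample count scales by target rate over original rate. The following is a minimal sketch in plain Python; the helper name `expected_num_samples` is hypothetical, and rounding up the last partial sample is an assumption rather than a documented guarantee of any particular library:

```python
def expected_num_samples(num_samples: int, orig_sr: int, target_sr: int) -> int:
    """Approximate length of a signal after resampling (duration is preserved)."""
    # Integer ceiling of num_samples * target_sr / orig_sr
    return (num_samples * target_sr + orig_sr - 1) // orig_sr


# A 3-second clip recorded at 44.1 kHz, resampled to 32 kHz
n_in = 3 * 44100
n_out = expected_num_samples(n_in, 44100, 32000)
print(n_in, "->", n_out, "samples")
```

Comparing this number with the last dimension of the resampled tensor is a cheap way to catch a wrong `orig_freq` or `new_freq`.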

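2. Resampling with torchaudio (GPU version):

Building on the CPU version, using the GPU only requires choosing the device and moving both the waveform and the transform onto it, so the steps are not repeated here.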

def resample_use_cuda():

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # falls back to CPU when CUDA is unavailable
    start_time = time()
    file_path = input("please input your file path:")
    y, sr = torchaudio.load(file_path)

    y = y.to(device)  # move the waveform to the chosen device
    target_sample = 32000
    resampler = Resample(orig_freq=sr, new_freq=target_sample).to(device)  # the transform must be on the same device
    resampled_music = resampler(y)
    # move the result back to the CPU before saving, otherwise save raises an error
    torchaudio.save("test.mp3", resampled_music.to('cpu'), target_sample)
    print(f"cost :{time() - start_time}s")

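As for timing, a single file is slightly slower here because of the extra transfers to and from the GPU. The benefit of CUDA shows up in the processing that follows, where the data can stay on the GPU.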

3. Resampling with librosa

Steps:

Import two libraries: librosa, and soundfile for reading and writing audio files

import librosa
import soundfile as sf
from time import time  # only used for timing; not part of the main logic

Load the audio file
Set the target sample rate
Resample
Write the output

Wrapped up as one function:

def resample_by_lisa():
    file_path = input("please input your file path:")
    start_time = time()
    # sr=None keeps the file's native sample rate; by default librosa.load resamples to 22050 Hz and mixes down to mono
    y, sr = librosa.load(file_path, sr=None)
    target_sample_rate = 32000
    y_32k = librosa.resample(y=y, orig_sr=sr, target_sr=target_sample_rate)  # resample to the target rate
    # write with soundfile; MP3 output needs a recent libsndfile, so switch to ".wav" if this fails
    sf.write("test_lisa.mp3", data=y_32k, samplerate=target_sample_rate)
    print(f"cost :{time() - start_time}s")

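Summary:

Pros: simple and lightweight; librosa also offers many other audio-processing functions
Cons: cannot use CUDA, and saving depends on the soundfile library.
Time: also a few seconds, about the same as the torchaudio CPU version
Side note: the 32 kHz result seemed worse than torchaudio's in the author's test. The author puts this down to librosa being older and less focused on deep learning than torch. Another likely factor is that librosa.load resamples to 22050 Hz by default unless sr=None is passed, so the step to 32 kHz upsamples audio that has already lost its high frequencies. Try both and compare for yourself.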

Full code:

import torch
import torchaudio
from torchaudio.transforms import Resample
import librosa
import soundfile as sf
from time import time


def resample_by_cpu():
    file_path = input("please input your file path: ")
    start_time = time()
    y, sr = torchaudio.load(file_path)  # load the audio file; y has shape (channel, time)

    target_sample = 32000  # target sample rate
    resampler = Resample(orig_freq=sr, new_freq=target_sample)  # build the transform from the original and target sample rates
    resampled_music = resampler(y)  # apply the transform

    # MP3 support depends on the installed audio backend; switch to ".wav" if this fails
    torchaudio.save("test.mp3", resampled_music, target_sample)
    print(f"cost :{time() - start_time}s")


def resample_use_cuda():

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    start_time = time()
    file_path = input("please input your file path:")
    y, sr = torchaudio.load(file_path)

    y = y.to(device)
    target_sample = 32000
    resampler = Resample(orig_freq=sr, new_freq=target_sample).to(device)
    resampled_music = resampler(y)
    # move the result back to the CPU before saving
    torchaudio.save("test.mp3", resampled_music.to('cpu'), target_sample)
    print(f"cost :{time() - start_time}s")


def resample_by_lisa():
    file_path = input("please input your file path:")
    start_time = time()
    # sr=None keeps the native sample rate; the default would resample to 22050 Hz on load
    y, sr = librosa.load(file_path, sr=None)
    target_sample_rate = 32000
    y_32k = librosa.resample(y=y, orig_sr=sr, target_sr=target_sample_rate)  # resample to the target rate
    # MP3 output needs a recent libsndfile; switch to ".wav" if this fails
    sf.write("test_lisa.mp3", data=y_32k, samplerate=target_sample_rate)
    print(f"cost :{time() - start_time}s")


if __name__ == '__main__':
    resample_use_cuda()
    resample_by_cpu()
    resample_by_lisa()

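2.2 Extracting the pitch (fundamental frequency) feature

Pitch feature extraction with torchaudio

The function used here is torchaudio.transforms.PitchShift (defined in torchaudio.transforms._transforms). Note that PitchShift shifts the pitch of a waveform by n_steps semitones and returns a waveform. It does not estimate the fundamental frequency, so with n_steps=0 the output is essentially the input signal.

Here is the official example, which the code in this section follows: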

>>> waveform, sample_rate = torchaudio.load("test.wav", normalize=True)
>>> transform = transforms.PitchShift(sample_rate, 4)
>>> waveform_shift = transform(waveform)  # (channel, time)

Steps:

Import the dependencies

import torchaudio
import torchaudio.transforms as Tf
import matplotlib.pyplot as plt     # plotting dependency

Load the audio
Construct PitchShift
Apply the transform to the track and plot its output

Code:

def get_pitch_by_torch():
    file_path = input("file path:")
    y, sr = torchaudio.load(file_path)
    """specimen:
    >>> waveform, sample_rate = torchaudio.load("test.wav", normalize=True)
    >>> transform = transforms.PitchShift(sample_rate, 4)
    >>> waveform_shift = transform(waveform)  # (channel, time)
    """
    # n_steps=0 is a shift of zero semitones, so the output is a waveform, not an F0 curve
    pitch_tf = Tf.PitchShift(sample_rate=sr, n_steps=0)
    feature = pitch_tf(y)
    # plot the first channel of the output; this part is plotting only and can be copied as-is
    plt.figure(figsize=(16, 5))
    plt.plot(feature[0].detach().numpy(), label='PitchShift output')
    plt.xlabel('Sample')
    plt.ylabel('Amplitude')
    plt.title('PitchShift output (n_steps=0)')
    plt.legend()
    plt.show()

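Output plot for the whole track: [figure not preserved]

Narrow the plotted range to a slice of the output, here samples 5000 to 10000, and the detail of that part of the track becomes much clearer.

To do so, plot feature[0][5000:10000] in place of feature[0] in the plt.plot call.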

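Extracting the fundamental frequency with librosa

Steps:
- Import the packages
- Extract the F0 feature
- (Optional) plot the F0 feature

Main function: librosa.pyin; see the official example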

#Computing a fundamental frequency (F0) curve from an audio input
>>> y, sr = librosa.load(librosa.ex('trumpet'))
>>> f0, voiced_flag, voiced_probs = librosa.pyin(y,
... sr=sr,
... fmin=librosa.note_to_hz('C2'),
... fmax=librosa.note_to_hz('C7'))
>>> times = librosa.times_like(f0, sr=sr)

Code:

def get_pitch_by_librosa():

    file_path = input("please input your file path: ")
    y, sr = librosa.load(file_path)
    """librosa.pyin(y,sr=sr,fmin=librosa.note_to_hz('C2'),fmax=librosa.note_to_hz('C7'))"""
    # extract F0 with pyin; f0 is NaN on frames judged unvoiced
    f0, voiced_flag, voiced_probs = librosa.pyin(y, sr=sr, fmin=librosa.note_to_hz('C2'), fmax=librosa.note_to_hz('C7'))

    # plot the F0 curve (optional)
    plt.figure(figsize=(14, 5))
    librosa.display.waveshow(y, sr=sr, alpha=0.5)
    plt.plot(librosa.times_like(f0, sr=sr), f0, label='f0 (fundamental frequency)', color='r')
    plt.xlabel('Time (s)')
    plt.ylabel('Frequency (Hz)')
    plt.title('Pitch (fundamental frequency) Estimation')
    plt.legend()
    plt.show()

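Summary:

This is slightly more involved than the torchaudio version, and it returns two extra outputs, voiced_flag and voiced_probs. The plot also looks different from the torchaudio one. Both follow the official examples; the difference is expected, since PitchShift returns a waveform while pyin returns a frame-wise F0 estimate in Hz.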

Output: [figure not preserved]

Full code:

import torchaudio
import torchaudio.transforms as Tf
import matplotlib.pyplot as plt
import librosa
import librosa.display  # needed for waveshow on older librosa versions


def get_pitch_by_torch():
    file_path = input("file path:")
    y, sr = torchaudio.load(file_path)
    """specimen:
    >>> waveform, sample_rate = torchaudio.load("test.wav", normalize=True)
    >>> transform = transforms.PitchShift(sample_rate, 4)
    >>> waveform_shift = transform(waveform)  # (channel, time)
    """
    # n_steps=0 is a shift of zero semitones, so the output is a waveform, not an F0 curve
    pitch_tf = Tf.PitchShift(sample_rate=sr, n_steps=0)
    feature = pitch_tf(y)
    # plot a slice of the first channel of the output
    plt.figure(figsize=(16, 5))
    plt.plot(feature[0][5000:10000].detach().numpy(), label='PitchShift output')
    plt.xlabel('Sample')
    plt.ylabel('Amplitude')
    plt.title('PitchShift output (n_steps=0)')
    plt.legend()
    plt.show()


def get_pitch_by_librosa():

    file_path = input("please input your file path: ")
    y, sr = librosa.load(file_path)
    """librosa.pyin(y,sr=sr,fmin=librosa.note_to_hz('C2'),fmax=librosa.note_to_hz('C7'))"""
    # extract F0 with pyin; f0 is NaN on frames judged unvoiced
    f0, voiced_flag, voiced_probs = librosa.pyin(y, sr=sr, fmin=librosa.note_to_hz('C2'), fmax=librosa.note_to_hz('C7'))

    # plot the F0 curve (optional)
    plt.figure(figsize=(14, 5))
    librosa.display.waveshow(y, sr=sr, alpha=0.5)
    plt.plot(librosa.times_like(f0, sr=sr), f0, label='f0 (fundamental frequency)', color='r')
    plt.xlabel('Time (s)')
    plt.ylabel('Frequency (Hz)')
    plt.title('Pitch (fundamental frequency) Estimation')
    plt.legend()
    plt.show()


if __name__ == '__main__':
    # comment out either call to run only one of them
    get_pitch_by_torch()
    get_pitch_by_librosa()

The PPG and vec features are covered in the next chapter.
