
Example Code for Implementing the Shazam Audio Recognition Algorithm in Java

Updated: 2018-09-10 10:32:13 Author: llhhzz1989

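The Shazam algorithm uses the Fourier transform to convert a time-domain signal into a frequency-domain signal, derives an audio fingerprint from it, and finally identifies the audio by how closely fingerprints match. This article presents example Java code for the Shazam audio recognition algorithm as a reference for readers who need it.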


1. Obtaining audio with AudioSystem

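The Nyquist-Shannon sampling theorem tells us that, to capture every frequency a human can hear, the sampling rate must be at least twice the highest frequency in the human hearing range. Humans hear roughly from 20 Hz to 20,000 Hz, so the rate has to exceed 40,000 Hz, which is why audio is usually recorded at 44,100 Hz. This is also the sampling rate most commonly used with MPEG-1 audio. The value 44,100 originally came from Sony, because it allowed audio to be recorded on modified video equipment running at either 25 frames per second (PAL) or 30 frames per second (NTSC), while still covering the 20,000 Hz bandwidth of professional recording equipment. So when you choose a recording sample rate, 44,100 Hz is a sensible default.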

Define the audio format:

  public static float sampleRate = 44100;
  public static int sampleSizeInBits = 16;
  public static int channels = 2; // 2 channels = stereo
  public static boolean signed = true; // Indicates whether the data is signed or unsigned
  public static boolean bigEndian = true; // Indicates whether the audio data is stored in big-endian or little-endian order
  public AudioFormat getFormat() {
    return new AudioFormat(sampleRate, sampleSizeInBits, channels, signed,
        bigEndian);
  }
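To make the consequences of these settings concrete, the following minimal sketch computes how many bytes one frame and one second of audio occupy for this format. The class name `FormatMath` and the method `bytesPerSecond` are made up for illustration and are not part of the original project; only the `AudioFormat` calls come from the Java Sound API.

```java
import javax.sound.sampled.AudioFormat;

// Hypothetical helper, not part of the article's code.
final class FormatMath {
    // Bytes produced per second of PCM audio: frame size times frame rate.
    static int bytesPerSecond(AudioFormat format) {
        return (int) (format.getFrameSize() * format.getFrameRate());
    }

    public static void main(String[] args) {
        // Same settings as above: 44100 Hz, 16 bit, stereo, signed, big-endian.
        AudioFormat format = new AudioFormat(44100f, 16, 2, true, true);
        // One frame holds one 2-byte sample for each of the 2 channels.
        System.out.println("frame size   = " + format.getFrameSize() + " bytes");
        System.out.println("bytes/second = " + bytesPerSecond(format));
    }
}
```

With these settings every second of recording adds a little over 170 KB to the in-memory buffer, which is worth keeping in mind when the capture runs for a long time.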

Capture audio from the microphone and store it in out:

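  // Field that accumulates the captured bytes
  public static ByteArrayOutputStream out = new ByteArrayOutputStream();

    // Inside the capture method:
    try {
      AudioFormat format = smartAuto.getFormat(); // Fill AudioFormat with the settings
      DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
      startTime = new Date().getTime();
      System.out.println(startTime);
      // Ask the audio system for a microphone line that supports the format
      SmartAuto.line = (TargetDataLine) AudioSystem.getLine(info);
      SmartAuto.line.open(format);
      SmartAuto.line.start();
      // Helper from the author's project (not shown here)
      new FileAnalysis().getDataToOut("");
      // checkTime is also a project helper (not shown here)
      while (smartAuto.running) {
        checkTime(startTime);
      }
      // Release the microphone once recording has finished
      SmartAuto.line.stop();
      SmartAuto.line.close();
    } catch (Throwable e) {
      e.printStackTrace();
    }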

The data captured in out then has to go through a Fourier transform to convert it from a time-domain signal into a frequency-domain signal.
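Because the format above is 16-bit signed, big-endian stereo, each sample spans two bytes and each frame holds two samples. A minimal sketch of decoding the raw bytes into one mono sample per frame might look like the following. The class `PcmDecoder` and the method `toMono` are hypothetical names introduced here, and averaging the channels is an assumption rather than something the original code does.

```java
// Hypothetical helper, not part of the article's code.
final class PcmDecoder {
    // Converts 16-bit signed big-endian interleaved PCM bytes into
    // one averaged (mono) sample per frame. Trailing partial frames are ignored.
    static double[] toMono(byte[] audio, int channels) {
        int frameSize = 2 * channels;          // 2 bytes per sample
        int frames = audio.length / frameSize;
        double[] mono = new double[frames];
        for (int f = 0; f < frames; f++) {
            int sum = 0;
            for (int c = 0; c < channels; c++) {
                int i = f * frameSize + 2 * c;
                // The high byte keeps its sign; the low byte is masked to 0..255.
                sum += (short) ((audio[i] << 8) | (audio[i + 1] & 0xFF));
            }
            mono[f] = (double) sum / channels;
        }
        return mono;
    }
}
```

Decoding first means the transform sees real amplitudes in the range -32768 to 32767 rather than the individual high and low bytes of each sample.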

Fourier transform

  public Complex[] fft(Complex[] x) {
    int n = x.length;
    // Base case of the recursion: the transform of a single sample is the sample itself
    if (n == 1) {
      return x;
    }
    // If the number of samples is odd, fall back to the plain DFT
    if (n % 2 != 0) {
      return dft(x);
    }
    // Take the samples at even indices and run the FFT on them recursively
    Complex[] even = new Complex[n / 2];
    for (int k = 0; k < n / 2; k++) {
      even[k] = x[2 * k];
    }
    Complex[] evenValue = fft(even);
    // Take the samples at odd indices and run the FFT on them recursively.
    // A separate array is required: when n == 2 the call above returns `even`
    // itself, so reusing that array would overwrite evenValue.
    Complex[] odd = new Complex[n / 2];
    for (int k = 0; k < n / 2; k++) {
      odd[k] = x[2 * k + 1];
    }
    Complex[] oddValue = fft(odd);
    // Combine the even and odd halves
    Complex[] result = new Complex[n];
    for (int k = 0; k < n / 2; k++) {
      // Euler's formula: e^(-i*2pi*k/N) = cos(-2pi*k/N) + i*sin(-2pi*k/N)
      double p = -2 * k * Math.PI / n;
      Complex m = new Complex(Math.cos(p), Math.sin(p));
      result[k] = evenValue[k].add(m.multiply(oddValue[k]));
      // exp(-2i*(k+n/2)*PI/n) equals -exp(-2i*k*PI/n), because exp(-i*PI) = -1 (Euler's formula)
      result[k + n / 2] = evenValue[k].subtract(m.multiply(oddValue[k]));
    }
    return result;
  }
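The `fft` method falls back to `dft(x)` for odd lengths, but that method is not shown in the article. As a rough sketch of what such a fallback could look like, here is a direct O(n²) evaluation of the discrete Fourier transform definition. The nested `Complex` class is a minimal stand-in written for this example, and nothing here is taken from the author's actual `dft`.

```java
// Hypothetical sketch of a naive DFT, not the author's implementation.
final class NaiveDft {
    // Minimal stand-in for the article's Complex type (assumed API).
    static final class Complex {
        final double re, im;
        Complex(double re, double im) { this.re = re; this.im = im; }
        Complex add(Complex o) { return new Complex(re + o.re, im + o.im); }
        Complex multiply(Complex o) {
            return new Complex(re * o.re - im * o.im, re * o.im + im * o.re);
        }
    }

    // X[k] = sum over t of x[t] * e^(-2*pi*i*k*t/n), evaluated term by term.
    static Complex[] dft(Complex[] x) {
        int n = x.length;
        Complex[] result = new Complex[n];
        for (int k = 0; k < n; k++) {
            Complex sum = new Complex(0, 0);
            for (int t = 0; t < n; t++) {
                double p = -2 * Math.PI * k * t / n;
                Complex twiddle = new Complex(Math.cos(p), Math.sin(p));
                sum = sum.add(x[t].multiply(twiddle));
            }
            result[k] = sum;
        }
        return result;
    }

    public static void main(String[] args) {
        Complex[] x = { new Complex(1, 0), new Complex(2, 0), new Complex(3, 0) };
        for (Complex c : dft(x)) {
            System.out.printf("%.3f %+.3fi%n", c.re, c.im);
        }
    }
}
```

A direct evaluation like this is also a convenient reference for checking the recursive version on small power-of-two inputs, since both should agree up to floating-point error.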

Computing the frequency-domain values of out

  private void setFFTResult() {
    byte[] audio = SmartAuto.out.toByteArray();
    final int totalSize = audio.length;
    System.out.println("totalSize = " + totalSize);
    // Note: with 16-bit stereo audio, 4 bytes is a single frame, and each byte
    // is passed to the FFT as if it were a sample. A practical analysis would
    // decode the samples first and use a much larger chunk.
    int chunkSize = 4;
    int amountPossible = totalSize / chunkSize;
    // When turning into frequency domain we'll need complex numbers:
    SmartAuto.results = new Complex[amountPossible][];
    DftOperate dftOperate = new DftOperate();
    // For all the chunks:
    for (int times = 0; times < amountPossible; times++) {
      Complex[] complex = new Complex[chunkSize];
      for (int i = 0; i < chunkSize; i++) {
        // Put the time domain data into a complex number with imaginary part as 0:
        complex[i] = new Complex(audio[(times * chunkSize) + i], 0);
      }
      // Perform FFT analysis on the chunk:
      SmartAuto.results[times] = dftOperate.fft(complex);
    }
    // Printing the array itself would only show its identity hash, so report its size
    System.out.println("results.length = " + SmartAuto.results.length);
  }
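The frequency-domain values are the raw material for the audio fingerprint mentioned at the start of the article, which the code shown here does not build. One common ingredient is to find the strongest frequency bin within a range for each chunk. The sketch below illustrates that idea under stated assumptions: `PeakPicker`, `binFrequency`, and `strongestBin` are hypothetical names, and the spectrum is passed as separate real and imaginary arrays so that the example does not depend on the article's `Complex` class.

```java
// Hypothetical sketch, not part of the article's code.
final class PeakPicker {
    // Frequency in Hz represented by bin k of an n-point transform.
    static double binFrequency(int k, int n, double sampleRate) {
        return k * sampleRate / n;
    }

    // Index of the bin with the largest magnitude in the range [from, to).
    static int strongestBin(double[] re, double[] im, int from, int to) {
        int best = from;
        double bestMagnitude = -1;
        for (int k = from; k < to; k++) {
            double magnitude = Math.hypot(re[k], im[k]);
            if (magnitude > bestMagnitude) {
                bestMagnitude = magnitude;
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] re = { 0, 3, 0, 1 };
        double[] im = { 0, 4, 0, 0 };
        int peak = strongestBin(re, im, 0, re.length);
        System.out.println("peak bin = " + peak
                + ", frequency = " + binFrequency(peak, re.length, 44100) + " Hz");
    }
}
```

How the peaks are grouped into ranges and combined into a hash is a design decision of the matching stage and is left open here.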

Summary

That concludes the example code for implementing the Shazam audio recognition algorithm in Java. I hope you find it helpful.
