Implementing offline Chinese speech-to-text recognition in Java
The project needed a voice-control feature similar to XiaoAI (小愛同學), it had to work offline, and it could not cost the company a cent. The first step is turning audio into text. After researching the options, I chose Vosk. This is the official introduction of Vosk:
Vosk is a speech recognition toolkit. The best things in Vosk are:
- Supports 19+ languages and dialects - English, Indian English, German, French, Spanish, Portuguese, Chinese, Russian, Turkish, Vietnamese, Italian, Dutch, Catalan, Arabic, Greek, Farsi, Filipino, Ukrainian, Kazakh. More to come.
- Works offline, even on lightweight devices - Raspberry Pi, Android, iOS
- Installs with simple pip3 install vosk
- Portable per-language models are only 50Mb each, but there are much bigger server models available.
- Provides streaming API for the best user experience (unlike popular speech-recognition python packages)
- There are bindings for different programming languages, too - java/csharp/javascript etc.
- Allows quick reconfiguration of vocabulary for best accuracy.
- Supports speaker identification beside simple speech recognition.
The reasons for choosing it: it is open source, works offline, and can use third-party trained models. This article uses the official Chinese model; you can train your own if needed, but the cost is high. See the official site https://alphacephei.com/vosk/ and the official demo https://github.com/alphacep/vosk-api.
This implementation uses Spring Boot + Maven; the official demo uses Spring Boot + Gradle.
1. The pom.xml is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.5.4</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.example</groupId>
    <artifactId>voice</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>voice-ai</name>
    <description>Demo project for Spring Boot</description>
    <properties>
        <java.version>1.8</java.version>
    </properties>
    <repositories>
        <repository>
            <id>com.alphacephei</id>
            <name>vosk</name>
            <url>https://alphacephei.com/maven/</url>
        </repository>
    </repositories>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>net.java.dev.jna</groupId>
            <artifactId>jna</artifactId>
            <version>5.7.0</version>
        </dependency>
        <dependency>
            <groupId>com.alphacephei</groupId>
            <artifactId>vosk</artifactId>
            <version>0.3.30</version>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.8</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>
One thing to note: the vosk artifact is not available in the common Maven repositories, which is why the repository address above has to be declared explicitly so the dependency can be downloaded.
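The utility class in step 3 reads the model path from a configuration property named leenleda.vosk.model. The original post does not show the configuration file, so the following application.properties entry is only a minimal sketch; the directory name and path are placeholders for wherever the official Chinese model has been downloaded and unpacked:

# Directory of the unpacked Vosk Chinese model (placeholder path)
leenleda.vosk.model=G:/leenleda/application/voice-ai/model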
2. Project structure:
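The screenshot of the project layout from the original post is not reproduced here. Based on the classes and files shown in the following sections, the layout is roughly as sketched below; the package name and the main-class name are assumptions:

voice
├── pom.xml
└── src/main
    ├── java/com/example/voice
    │   ├── VoiceAiApplication.java   (Spring Boot main class, assumed name)
    │   ├── VoiceAiController.java
    │   └── VoiceUtil.java
    └── resources
        ├── application.properties    (contains leenleda.vosk.model)
        └── static
            ├── index.html
            └── HZRecorder.js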

3. The speech-recognition utility class
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.sun.media.sound.WaveFileReader;
import com.sun.media.sound.WaveFileWriter;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;
import org.springframework.util.Assert;
import org.springframework.util.StringUtils;
import org.vosk.LibVosk;
import org.vosk.LogLevel;
import org.vosk.Model;
import org.vosk.Recognizer;

import javax.sound.sampled.*;
import java.io.*;
import java.nio.file.Files;
import java.nio.file.Paths;

@Slf4j      // Lombok logger, provides the "log" field used below
@Component  // registered as a Spring bean so it can be autowired by the controller
public class VoiceUtil {

    @Value("${leenleda.vosk.model}")
    private String VOSKMODELPATH;

    public String getWord(String filePath) throws IOException, UnsupportedAudioFileException {
        Assert.isTrue(StringUtils.hasLength(VOSKMODELPATH), "Invalid Vosk model path!");
        byte[] bytes = Files.readAllBytes(Paths.get(filePath));
        // convert the audio to 16 kHz
        reSamplingAndSave(bytes, filePath);

        // log a few fields of the WAV header
        File f = new File(filePath);
        RandomAccessFile rdf = new RandomAccessFile(f, "r");
        log.info("RIFF chunk size (file size - 8): {}", toInt(read(rdf, 4, 4)));
        log.info("Audio format (1 = PCM): {}", toShort(read(rdf, 20, 2)));
        short track = toShort(read(rdf, 22, 2));
        log.info("Channels (1 = mono, 2 = stereo): {}", track);
        log.info("Sample rate (16000 = 16 kHz): {}", toInt(read(rdf, 24, 4)));
        log.info("Byte rate (bytes per second): {}", toInt(read(rdf, 28, 4)));
        log.info("Block align (bytes per sample frame): {}", toShort(read(rdf, 32, 2)));
        log.info("Bits per sample: {}", toShort(read(rdf, 34, 2)));
        rdf.close();

        LibVosk.setLogLevel(LogLevel.WARNINGS);
        try (Model model = new Model(VOSKMODELPATH);
             InputStream ais = AudioSystem.getAudioInputStream(new BufferedInputStream(new FileInputStream(filePath)));
             // the recognizer sample rate is the audio sample rate multiplied by the channel count
             Recognizer recognizer = new Recognizer(model, 16000 * track)) {
            int nbytes;
            byte[] b = new byte[4096];
            while ((nbytes = ais.read(b)) >= 0) {
                if (recognizer.acceptWaveForm(b, nbytes)) {
                    // intermediate result available via recognizer.getResult()
                } else {
                    // partial result available via recognizer.getPartialResult()
                }
            }
            String result = recognizer.getFinalResult();
            log.info("Recognition result: {}", result);
            if (StringUtils.hasLength(result)) {
                // Vosk returns JSON such as {"text" : "..."}; the Chinese output contains spaces between tokens
                JSONObject jsonObject = JSON.parseObject(result);
                return jsonObject.getString("text").replace(" ", "");
            }
            return "";
        }
    }

    public static int toInt(byte[] b) {
        return ((b[3] & 0xff) << 24) + ((b[2] & 0xff) << 16) + ((b[1] & 0xff) << 8) + (b[0] & 0xff);
    }

    public static short toShort(byte[] b) {
        return (short) (((b[1] & 0xff) << 8) + (b[0] & 0xff));
    }

    public static byte[] read(RandomAccessFile rdf, int pos, int length) throws IOException {
        rdf.seek(pos);
        byte[] result = new byte[length];
        for (int i = 0; i < length; i++) {
            result[i] = rdf.readByte();
        }
        return result;
    }

    public static void reSamplingAndSave(byte[] data, String path) throws IOException, UnsupportedAudioFileException {
        WaveFileReader reader = new WaveFileReader();
        AudioInputStream audioIn = reader.getAudioInputStream(new ByteArrayInputStream(data));
        AudioFormat srcFormat = audioIn.getFormat();
        int targetSampleRate = 16000;
        // for PCM the frame rate must match the sample rate, so both are set to 16000
        AudioFormat dstFormat = new AudioFormat(srcFormat.getEncoding(),
                targetSampleRate,
                srcFormat.getSampleSizeInBits(),
                srcFormat.getChannels(),
                srcFormat.getFrameSize(),
                targetSampleRate,
                srcFormat.isBigEndian());
        AudioInputStream convertedIn = AudioSystem.getAudioInputStream(dstFormat, audioIn);
        File file = new File(path);
        WaveFileWriter writer = new WaveFileWriter();
        writer.write(convertedIn, AudioFileFormat.Type.WAVE, file);
    }
}

A few points worth noting: in the official demo the sample rate is hard-coded to 16000, i.e. it assumes 16 kHz audio, so every incoming audio file is first converted to 16 kHz here. Also, the sample rate passed to the recognizer needs to be set to a multiple of the channel count.
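As an aside, the same WAV header fields that are logged above by hand can also be read through the standard javax.sound.sampled API. A minimal sketch (not part of the original project) that prints the format of a WAV file passed on the command line:

import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import java.io.File;

public class WavInfo {
    public static void main(String[] args) throws Exception {
        // inspect a WAV file's format without manual header parsing
        AudioFileFormat fileFormat = AudioSystem.getAudioFileFormat(new File(args[0]));
        AudioFormat format = fileFormat.getFormat();
        System.out.println("Channels:        " + format.getChannels());
        System.out.println("Sample rate:     " + format.getSampleRate());   // e.g. 16000.0 for 16 kHz
        System.out.println("Bits per sample: " + format.getSampleSizeInBits());
        System.out.println("Frame size:      " + format.getFrameSize() + " bytes");
    }
}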
4. Interaction with the front end
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

import javax.sound.sampled.UnsupportedAudioFileException;
import java.io.File;
import java.io.IOException;
import java.util.Date;

@RestController
public class VoiceAiController {

    @Autowired
    VoiceUtil voiceUtil;

    @PostMapping("/getWord")
    public String getWord(MultipartFile file) {
        String path = "G:\\leenleda\\application\\voice-ai\\" + new Date().getTime() + ".wav";
        File localFile = new File(path);
        try {
            file.transferTo(localFile); // save the uploaded file locally
            System.out.println(file.getOriginalFilename() + " uploaded successfully");
            // upload succeeded, start recognition
            String text = voiceUtil.getWord(path);
            localFile.delete();
            return text;
        } catch (IOException | UnsupportedAudioFileException e) {
            e.printStackTrace();
            localFile.delete();
            return "Upload failed";
        }
    }
}
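For a quick test without the browser page, the endpoint can also be called from plain Java, for example with Spring's RestTemplate. This is a hypothetical client sketch, assuming the application is running locally on port 8080 and test.wav is an existing recording:

import org.springframework.core.io.FileSystemResource;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.util.LinkedMultiValueMap;
import org.springframework.util.MultiValueMap;
import org.springframework.web.client.RestTemplate;

public class GetWordClient {
    public static void main(String[] args) {
        // build a multipart/form-data request with the WAV file under the "file" key
        MultiValueMap<String, Object> body = new LinkedMultiValueMap<>();
        body.add("file", new FileSystemResource("test.wav"));

        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.MULTIPART_FORM_DATA);

        // POST to the controller shown above and print the recognized text
        String text = new RestTemplate()
                .postForObject("http://localhost:8080/getWord", new HttpEntity<>(body, headers), String.class);
        System.out.println("Recognized text: " + text);
    }
}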
5. The front-end page
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Voice Conversion</title>
</head>
<body>
<div>
<audio controls autoplay></audio>
<input id="start" type="button" value="錄音" />
<input id="stop" type="button" value="停止" />
<input id="play" type="button" value="播放" />
<input id="upload" type="button" value="提交" />
<div id="text">
</div>
</div>
<script src="http://libs.baidu.com/jquery/2.1.4/jquery.min.js"></script>
<script type="text/javascript" src="HZRecorder.js"></script>
<script>
var recorder;
var audio = document.querySelector('audio');
$("#start").click(function () {
HZRecorder.get(function (rec) {
recorder = rec;
recorder.start();
});
})
$("#stop").click(function () {
recorder.stop();
})
$("#play").click(function () {
recorder.play(audio);
})
$("#upload").click(function () {
recorder.upload("/admin/getWord", function (state, e) {
switch (state) {
case 'uploading':
//var percentComplete = Math.round(e.loaded * 100 / e.total) + '%';
break;
case 'ok':
//alert(e.target.responseText);
// alert("上傳成功");
break;
case 'error':
alert("上傳失敗");
break;
case 'cancel':
alert("上傳被取消");
break;
}
});
})
</script>
</body>
</html>
The HZRecorder.js referenced by the page above:
(function (window) {
// compatibility shims
window.URL = window.URL || window.webkitURL;
navigator.getUserMedia = navigator.getUserMedia || navigator.webkitGetUserMedia || navigator.mozGetUserMedia || navigator.msGetUserMedia;
var HZRecorder = function (stream, config) {
config = config || {};
config.sampleBits = 16; // sample size in bits: 8 or 16
config.sampleRate = 16000; // output sample rate (16 kHz)
var context = new AudioContext();
var audioInput = context.createMediaStreamSource(stream);
var recorder = context.createScriptProcessor(4096, 1, 1);
var audioData = {
size: 0 // length of the recorded data
, buffer: [] // recording buffer
, inputSampleRate: context.sampleRate // input sample rate
, inputSampleBits: 16 // input sample size in bits: 8 or 16
, outputSampleRate: config.sampleRate // output sample rate
, oututSampleBits: config.sampleBits // output sample size in bits: 8 or 16
, input: function (data) {
this.buffer.push(new Float32Array(data));
this.size += data.length;
}
, compress: function () { // merge and downsample
// merge the buffered chunks
var data = new Float32Array(this.size);
var offset = 0;
for (var i = 0; i < this.buffer.length; i++) {
data.set(this.buffer[i], offset);
offset += this.buffer[i].length;
}
// downsample
var compression = parseInt(this.inputSampleRate / this.outputSampleRate);
var length = data.length / compression;
var result = new Float32Array(length);
var index = 0, j = 0;
while (index < length) {
result[index] = data[j];
j += compression;
index++;
}
return result;
}
, encodeWAV: function () {
var sampleRate = Math.min(this.inputSampleRate, this.outputSampleRate);
var sampleBits = Math.min(this.inputSampleBits, this.oututSampleBits);
var bytes = this.compress();
var dataLength = bytes.length * (sampleBits / 8);
var buffer = new ArrayBuffer(44 + dataLength);
var data = new DataView(buffer);
var channelCount = 1; // mono
var offset = 0;
var writeString = function (str) {
for (var i = 0; i < str.length; i++) {
data.setUint8(offset + i, str.charCodeAt(i));
}
}
// RIFF chunk descriptor
writeString('RIFF'); offset += 4;
// total byte count from the next address to the end of the file, i.e. file size - 8
data.setUint32(offset, 36 + dataLength, true); offset += 4;
// WAVE format identifier
writeString('WAVE'); offset += 4;
// "fmt " sub-chunk identifier
writeString('fmt '); offset += 4;
// size of the fmt sub-chunk, 0x10 = 16 for PCM
data.setUint32(offset, 16, true); offset += 4;
// audio format, 1 = PCM
data.setUint16(offset, 1, true); offset += 2;
// number of channels
data.setUint16(offset, channelCount, true); offset += 2;
// sample rate, samples per second per channel
data.setUint32(offset, sampleRate, true); offset += 4;
// byte rate (average bytes per second) = channels * sample rate * bits per sample / 8
data.setUint32(offset, channelCount * sampleRate * (sampleBits / 8), true); offset += 4;
// block align, bytes per sample frame = channels * bits per sample / 8
data.setUint16(offset, channelCount * (sampleBits / 8), true); offset += 2;
// bits per sample
data.setUint16(offset, sampleBits, true); offset += 2;
// "data" sub-chunk identifier
writeString('data'); offset += 4;
// size of the audio data, i.e. total size - 44
data.setUint32(offset, dataLength, true); offset += 4;
// write the sample data
if (sampleBits === 8) {
for (var i = 0; i < bytes.length; i++, offset++) {
var s = Math.max(-1, Math.min(1, bytes[i]));
var val = s < 0 ? s * 0x8000 : s * 0x7FFF;
val = parseInt(255 / (65535 / (val + 32768)));
data.setInt8(offset, val);
}
} else {
for (var i = 0; i < bytes.length; i++, offset += 2) {
var s = Math.max(-1, Math.min(1, bytes[i]));
data.setInt16(offset, s < 0 ? s * 0x8000 : s * 0x7FFF, true);
}
}
return new Blob([data], { type: 'audio/wav' });
}
};
// start recording
this.start = function () {
audioInput.connect(recorder);
recorder.connect(context.destination);
}
// stop recording
this.stop = function () {
recorder.disconnect();
}
// get the recorded audio as a WAV Blob
this.getBlob = function () {
this.stop();
return audioData.encodeWAV();
}
// playback
this.play = function (audio) {
audio.src = window.URL.createObjectURL(this.getBlob());
}
// upload
this.upload = function (url, callback) {
var fd = new FormData();
fd.append("file", this.getBlob());
var xhr = new XMLHttpRequest();
if (callback) {
xhr.upload.addEventListener("progress", function (e) {
callback('uploading', e);
}, false);
xhr.addEventListener("load", function (e) {
callback('ok', e);
}, false);
xhr.addEventListener("error", function (e) {
callback('error', e);
}, false);
xhr.addEventListener("abort", function (e) {
callback('cancel', e);
}, false);
}
xhr.open("POST", url);
xhr.send(fd);
xhr.onreadystatechange = function () {
if (xhr.readyState === 4 && xhr.status === 200) { // only append once the response is complete
console.log("Recognition result: " + xhr.responseText)
$("#text").append('<h2>' + xhr.responseText + '</h2>');
}
}
}
// audio capture callback
recorder.onaudioprocess = function (e) {
audioData.input(e.inputBuffer.getChannelData(0));
//record(e.inputBuffer.getChannelData(0));
}
};
// throw an error
HZRecorder.throwError = function (message) {
alert(message);
throw new function () { this.toString = function () { return message; } }
}
// whether recording is supported
HZRecorder.canRecording = (navigator.getUserMedia != null);
// obtain a recorder instance
HZRecorder.get = function (callback, config) {
if (callback) {
if (navigator.getUserMedia) {
navigator.getUserMedia(
{ audio: true } // audio only
, function (stream) {
var rec = new HZRecorder(stream, config);
callback(rec);
}
, function (error) {
switch (error.code || error.name) {
case 'PERMISSION_DENIED':
case 'PermissionDeniedError':
HZRecorder.throwError('The user denied access to the microphone.');
break;
case 'NOT_SUPPORTED_ERROR':
case 'NotSupportedError':
HZRecorder.throwError('The browser does not support the required hardware.');
break;
case 'MANDATORY_UNSATISFIED_ERROR':
case 'MandatoryUnsatisfiedError':
HZRecorder.throwError('The specified hardware device could not be found.');
break;
default:
HZRecorder.throwError('Unable to open the microphone. Error: ' + (error.code || error.name));
break;
}
});
} else {
HZRecorder.throwError('The current browser does not support recording.'); return;
}
}
}
window.HZRecorder = HZRecorder;
})(window);

6. Running result
(The screenshot of the running page from the original post is not reproduced here.)
This concludes the article on offline Chinese speech-to-text recognition in Java.