Converting Raw PCM Data to RIFF WAV
I'm attempting to convert raw audio data from one format to another for the purposes of voice recognition. Audio is received in 20ms chunks in the format: 48KHz, 16-bit stereo signed big-endian PCM. The recognizer expects an InputStream in RIFF (little-endian) WAVE audio, 16-bit, mono, 16,000Hz.

Audio data is received in a byte[] with length 3840. This byte[] array contains 20ms of audio in the first format described above. That means that 1 second of this audio is 3840 * 50, which is 192,000. So that's 192,000 bytes per second. This makes sense: a 48KHz sample rate, times 2 (96,000 bytes) because a byte is 8 bits and our audio is 16-bit, and times an additional 2 for stereo. So 48,000 * 2 * 2 = 192,000.
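The arithmetic above can be checked with a few lines of Java. This is only a sketch; the class and method names (PcmFrameMath, bytesPerSecond, bytesPerChunk) are made up for illustration and are not part of any library.

```java
final class PcmFrameMath {
    /** Bytes of interleaved PCM produced per second of audio. */
    static int bytesPerSecond(int sampleRateHz, int bitsPerSample, int channels) {
        return sampleRateHz * (bitsPerSample / 8) * channels;
    }

    /** Bytes in one chunk of the given duration in milliseconds. */
    static int bytesPerChunk(int sampleRateHz, int bitsPerSample, int channels, int chunkMillis) {
        return bytesPerSecond(sampleRateHz, bitsPerSample, channels) * chunkMillis / 1000;
    }

    public static void main(String[] args) {
        System.out.println(bytesPerSecond(48_000, 16, 2));    // source format: 192000
        System.out.println(bytesPerChunk(48_000, 16, 2, 20)); // one 20ms packet: 3840
        System.out.println(bytesPerSecond(16_000, 16, 1));    // target format: 32000
    }
}
```

Note that the target format carries one sixth of the bytes per second of the source, so 3 seconds of converted audio should come to 96,000 bytes.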
So I first call this method every time an audio packet is received:
    private void addToPacket(byte[] toAdd) {
        if(packet.length >= 576000 && !done) {
            System.out.println("Processing needs to occur...");
            getResult(convertAudio());
            packet = null; // reset the packet
            return;
        }
        byte[] newPacket = new byte[packet.length + 3840];
        // copy old packet onto new temp array
        System.arraycopy(packet, 0, newPacket, 0, packet.length);
        // copy toAdd packet onto new temp array
        System.arraycopy(toAdd, 0, newPacket, 3840, toAdd.length);
        // overwrite the old packet with the newly resized packet
        packet = newPacket;
    }
addToPacket just appends new packets onto one big byte[] until the byte[] contains 3 seconds of audio data (576,000 bytes, or 192,000 * 3). Three seconds of audio should be enough (just a guess) to detect whether the user said the bot's activation hot word, like "hey computer".
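A few details in addToPacket look like they would interfere with that plan: the new packet is copied to the fixed offset 3840 rather than to the end of the existing data, the packet that triggers processing is never stored, and setting packet to null means the next call dereferences null. A minimal sketch of the same accumulation using ByteArrayOutputStream follows; the class name PacketBuffer and its constants are hypothetical, chosen to match the sizes in the question.

```java
import java.io.ByteArrayOutputStream;

final class PacketBuffer {
    static final int BYTES_PER_WINDOW = 576_000; // 3 seconds at 192,000 bytes per second

    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream(BYTES_PER_WINDOW);

    /**
     * Appends one packet. Returns the buffered audio once at least
     * BYTES_PER_WINDOW bytes have accumulated, otherwise null.
     */
    byte[] add(byte[] packet) {
        buffer.write(packet, 0, packet.length); // always lands at the current end
        if (buffer.size() < BYTES_PER_WINDOW) {
            return null;
        }
        byte[] window = buffer.toByteArray();
        buffer.reset(); // start collecting the next window
        return window;
    }
}
```

With 3840-byte packets, the 150th call to add returns a 576,000-byte array that can be handed to the conversion step.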
Here's how I convert the sound data:
    private byte[] convertAudio() {
        // STEP 1 - DROP EVERY OTHER PACKET TO REMOVE STEREO FROM THE AUDIO
        byte[] mono = new byte[96000];
        for(int i = 0, j = 0; i % 2 == 0 && i < packet.length; i++, j++) {
            mono[j] = packet[i];
        }
        // STEP 2 - DROP EVERY 3RD PACKET TO CONVERT TO 16K HZ Audio
        byte[] resampled = new byte[32000];
        for(int i = 0, j = 0; i % 3 == 0 && i < mono.length; i++, j++) {
            resampled[j] = mono[i];
        }
        // STEP 3 - CONVERT TO LITTLE ENDIAN
        ByteBuffer buffer = ByteBuffer.allocate(resampled.length);
        buffer.order(ByteOrder.BIG_ENDIAN);
        for(byte b : resampled) {
            buffer.put(b);
        }
        buffer.order(ByteOrder.LITTLE_ENDIAN);
        buffer.rewind();
        for(int i = 0; i < resampled.length; i++) {
            resampled[i] = buffer.get(i);
        }
        return resampled;
    }
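Reading convertAudio closely suggests why the output is unusable. Each loop tests i % 2 == 0 (or i % 3 == 0) in its condition, so it exits as soon as i reaches 1 and copies a single byte. Even with that fixed, selecting individual bytes would split 16-bit samples apart, the arrays are sized for 1 second rather than 3, and ByteBuffer.order only affects multi-byte reads and writes such as getShort and putShort, so the single-byte put and get calls never swap anything. Below is a rough sketch of a manual conversion that works on whole samples. The class name PcmDownmix is hypothetical, and averaging three frames is only a crude stand-in for a proper anti-aliasing filter.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PcmDownmix {
    /** 48 kHz stereo big-endian 16-bit PCM to 16 kHz mono little-endian 16-bit PCM. */
    static byte[] toMono16kLittleEndian(byte[] stereo48kBigEndian) {
        final int bytesPerInputFrame = 4; // 2 channels x 2 bytes per sample
        final int decimation = 3;         // 48,000 / 16,000
        int inputFrames = stereo48kBigEndian.length / bytesPerInputFrame;
        int outputFrames = inputFrames / decimation;

        ByteBuffer in = ByteBuffer.wrap(stereo48kBigEndian).order(ByteOrder.BIG_ENDIAN);
        ByteBuffer out = ByteBuffer.allocate(outputFrames * 2).order(ByteOrder.LITTLE_ENDIAN);

        for (int f = 0; f < outputFrames; f++) {
            int sum = 0;
            for (int k = 0; k < decimation; k++) {
                int offset = (f * decimation + k) * bytesPerInputFrame;
                sum += in.getShort(offset);     // left sample, read as big-endian
                sum += in.getShort(offset + 2); // right sample, read as big-endian
            }
            // Average of 3 frames x 2 channels, written as a little-endian short
            out.putShort((short) (sum / (2 * decimation)));
        }
        return out.array();
    }
}
```

For a 576,000-byte input this returns 96,000 bytes, which matches 3 seconds at 32,000 bytes per second.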
And finally, attempt to recognize the speech:
    private void getResult(byte[] toProcess) {
        InputStream stream = new ByteArrayInputStream(toProcess);
        recognizer.startRecognition(stream);
        SpeechResult result;
        while ((result = recognizer.getResult()) != null) {
            System.out.format("Hypothesis: %s\n", result.getHypothesis());
        }
        recognizer.stopRecognition();
    }
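The stream passed to the recognizer here is headerless PCM, while the target was described as RIFF WAVE. Whether the recognizer actually needs the header is an assumption that has not been verified here. If it does, the canonical 44-byte PCM WAVE header can be prepended by hand. The sketch below uses a hypothetical class name, WavHeader, and covers only uncompressed PCM.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class WavHeader {
    /** Returns a RIFF WAVE byte array: a 44-byte PCM header followed by the samples. */
    static byte[] wrapPcm(byte[] pcm, int sampleRateHz, int channels, int bitsPerSample) {
        int blockAlign = channels * bitsPerSample / 8; // bytes per sample frame
        int byteRate = sampleRateHz * blockAlign;
        ByteBuffer wav = ByteBuffer.allocate(44 + pcm.length).order(ByteOrder.LITTLE_ENDIAN);
        wav.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        wav.putInt(36 + pcm.length);          // size of everything after this field
        wav.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        wav.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        wav.putInt(16);                       // fmt chunk size for PCM
        wav.putShort((short) 1);              // audio format 1 = uncompressed PCM
        wav.putShort((short) channels);
        wav.putInt(sampleRateHz);
        wav.putInt(byteRate);
        wav.putShort((short) blockAlign);
        wav.putShort((short) bitsPerSample);
        wav.put("data".getBytes(StandardCharsets.US_ASCII));
        wav.putInt(pcm.length);               // data chunk size in bytes
        wav.put(pcm);                         // samples must already be little-endian
        return wav.array();
    }
}
```

For the target format in the question the call would be WavHeader.wrapPcm(pcm, 16000, 1, 16).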
The problem I'm having is that CMUSphinx doesn't crash or provide any error messages; it just comes up with an empty hypothesis every 3 seconds. I'm not exactly sure why, but my guess is that I didn't convert the sound correctly. Any ideas? Any help would be greatly appreciated.
So, there's actually a much better, built-in solution for converting audio from a byte[].
Here's what I found works pretty well:
    // Specify the output format you want: 16 kHz, 16-bit, mono, signed, little-endian
    AudioFormat target = new AudioFormat(16000f, 16, 1, true, false);
    // Wrap the raw byte[] in the source format; the length argument is in sample frames, not bytes
    AudioFormat source = AudioReceiveHandler.OUTPUT_FORMAT;
    AudioInputStream is = AudioSystem.getAudioInputStream(target,
            new AudioInputStream(new ByteArrayInputStream(raw), source, raw.length / source.getFrameSize()));
    // Write a temporary WAV file somewhere on disk; it can then be opened as an InputStream for recognition
    try {
        AudioSystem.write(is, AudioFileFormat.Type.WAVE, new File("C:\\filename.wav"));
    } catch (Exception e) {
        e.printStackTrace(); // report conversion or I/O failures instead of swallowing them
    }
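If the temporary file is not wanted, the converted stream can probably be read straight into memory instead. The sketch below makes two assumptions: the source format is spelled out from the question's description (48 kHz, 16-bit, stereo, signed, big-endian) in place of AudioReceiveHandler.OUTPUT_FORMAT, and the JDK's default format converters support this sample-rate and channel conversion, which should be checked on the target runtime. The class name InMemoryResample is hypothetical, and the result is headerless PCM rather than a WAV file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

final class InMemoryResample {
    // Assumed source format, taken from the description in the question
    static final AudioFormat SOURCE = new AudioFormat(48000f, 16, 2, true, true);
    // Target format: 16 kHz, 16-bit, mono, signed, little-endian
    static final AudioFormat TARGET = new AudioFormat(16000f, 16, 1, true, false);

    /** Converts raw source PCM to target PCM without touching the file system. */
    static byte[] convert(byte[] raw) throws IOException {
        long frames = raw.length / SOURCE.getFrameSize(); // stream length is counted in frames
        try (AudioInputStream in =
                     new AudioInputStream(new ByteArrayInputStream(raw), SOURCE, frames);
             AudioInputStream converted = AudioSystem.getAudioInputStream(TARGET, in)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096]; // a multiple of the 2-byte target frame size
            int n;
            while ((n = converted.read(chunk)) > 0) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

The returned bytes can be wrapped in a ByteArrayInputStream and passed to the recognizer in the same way as the file-based version.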