
NodeJS: Capturing a stereo PCM wave stream into mono AudioBuffer

I'm recording audio from nodejs using node-microphone (which is just a javascript interface for arecord), and want to store the stream chunks in an AudioBuffer using web-audio-api (which is a nodejs implementation of the Web Audio API).

My audio source has two channels while my AudioBuffer has only one (on purpose).

This is my working configuration for recording audio with arecord through my USB sound card (I'm using a Raspberry Pi 3 running Raspbian buster):

arecord -D hw:1,0 -c 2 -f S16_LE -r 44100

Running this command with an output path and playing the resulting wav file with aplay works just fine. So node-microphone is able to record audio with these parameters, and in the end I get a nodejs readable stream flowing wave data.

But

I'm struggling to build the bridge from the stream chunks (Buffer instances) to the AudioBuffer. More precisely: I'm not sure of the format of the incoming data, not sure of the destination format, and not sure how I would do the conversion either way:

The stream chunks are Buffers, so they also are Uint8Arrays. Given my configuration, I guess they are binary representations of 16-bit signed integers (little endian, though I don't know exactly what that implies).
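To make that byte layout concrete, here is a minimal sketch (the sample values are made up for illustration) of how a Node Buffer stores 16-bit signed little-endian integers, and how Buffer.prototype.readInt16LE recovers them:

```javascript
// Two hypothetical 16-bit signed samples: 1000 and -1000
const samples = [1000, -1000]

// Write them into a Buffer in little-endian order,
// exactly as arecord -f S16_LE would emit them
const buffer = Buffer.alloc(samples.length * 2) // 2 bytes per sample
samples.forEach((sample, i) => buffer.writeInt16LE(sample, i * 2))

// The Buffer is also a Uint8Array of raw bytes:
// 1000 = 0x03E8 → stored as [0xE8, 0x03] (LSB first)
console.log(Array.from(buffer)) // [232, 3, 24, 252]

// readInt16LE reassembles the original samples
console.log(buffer.readInt16LE(0)) // 1000
console.log(buffer.readInt16LE(2)) // -1000
```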

The AudioBuffer holds multiple buffers (one per channel, so only one in my case) that I can access as Float32Arrays by calling AudioBuffer.prototype.getChannelData(). MDN also says:

The buffer contains data in the following format: non-interleaved IEEE754 32-bit linear PCM with a nominal range between -1 and +1, that is, a 32-bit floating point buffer, with each sample between -1.0 and 1.0.

The point is to find out what I have to extract from the incoming Buffers and how I should transform it so it's suitable for the Float32Array destination (and remains valid wave data), knowing that the audio source is stereo and the AudioBuffer isn't.

My best contender so far was the Buffer.prototype.readFloatLE() method, whose name looks like it would solve my problem, but this wasn't a success (just noise).
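A small sketch (with made-up sample values) illustrates why readFloatLE produces noise here: it reinterprets 4 bytes of int16 data as one IEEE754 32-bit float, which yields an unrelated value:

```javascript
// Four bytes holding two 16-bit signed samples (1000 and -1000)
const buffer = Buffer.alloc(4)
buffer.writeInt16LE(1000, 0)
buffer.writeInt16LE(-1000, 2)

// Misinterpreting those same 4 bytes as a single IEEE754 float
// produces a huge unrelated value — hence the noise
console.log(buffer.readFloatLE(0))

// Reading them back as 16-bit integers recovers the samples
console.log(buffer.readInt16LE(0), buffer.readInt16LE(2)) // 1000 -1000
```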

My first try (before doing research) was just to naively copy the buffer data to the Float32Array, interleaving indexes to handle the stereo/mono conversion. Obviously it mostly produced noise, but I could hear some of the sound I recorded (incredibly distorted but surely present), so I guess I should mention that.

This is a simplified version of my naive try (I'm aware it's not meant to work well; I just include it in my question as a base of discussion):

import { AudioBuffer } from 'web-audio-api'
import Microphone from 'node-microphone'

const rate = 44100
const channels = 2 // Number of source channels

const microphone = new Microphone({ // These parameters match the arecord command above
  channels,
  rate,
  device: 'hw:1,0',
  bitwidth: 16,
  endian: 'little',
  encoding: 'signed-integer'
})

const audioBuffer = new AudioBuffer(
  1, // 1 channel
  30 * rate, // 30 seconds buffer
  rate
)

const chunks = []
const data = audioBuffer.getChannelData(0) // This is the Float32Array
const stream = microphone.startRecording()

setTimeout(() => microphone.stopRecording(), 5000) // Recording for 5 seconds

stream.on('data', chunk => chunks.push(chunk))

stream.on('close', () => {
  chunks.reduce((offset, chunk) => {
    for (var index = 0; index < chunk.length; index += channels) {
      let value = 0

      for (var channel = 0; channel < channels; channel++) {
        value += chunk[index + channel]
      }

      data[(offset + index) / channels] = value / channels // Average value from the two channels
    }

    return offset + chunk.length // Since data comes as chunks, this offsets AudioBuffer's index
  }, 0)
})

I would be really grateful if you could help :)

So the input stereo signal is coming as 16-bit signed integers, interleaving the left and right channels, meaning that the corresponding buffers (of 8-bit unsigned integers) have this format for a single stereo sample:

[LEFT ] 8 bits (LSB)
[LEFT ] 8 bits (MSB)
[RIGHT] 8 bits (LSB)
[RIGHT] 8 bits (MSB)

Since arecord is configured with the little endian format, the Least Significant Byte (LSB) comes first, and the Most Significant Byte (MSB) comes next.
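That layout can be verified with a tiny sketch (the channel values are made up): write one stereo frame byte by byte as shown above, then read each channel back with readInt16LE:

```javascript
// One stereo frame: left = 1000 (0x03E8), right = -2 (0xFFFE),
// laid out LSB-first as described above
const frame = Buffer.from([
  0xE8, // [LEFT ] LSB
  0x03, // [LEFT ] MSB
  0xFE, // [RIGHT] LSB
  0xFF  // [RIGHT] MSB
])

console.log(frame.readInt16LE(0)) // left channel: 1000
console.log(frame.readInt16LE(2)) // right channel: -2
```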

The AudioBuffer single-channel buffer, represented by a Float32Array, expects values between -1 and 1 (one value per sample).

So to map values from the input Buffer to the destination Float32Array, I had to use the Buffer.prototype.readInt16LE(offset) method, incrementing the bytes offset parameter by 4 for each sample (2 left bytes + 2 right bytes = 4 bytes), and scaling input values from the range [-32768;+32767] (the 16-bit signed integer range) to the range [-1;+1]:

import { AudioBuffer } from 'web-audio-api'
import Microphone from 'node-microphone'

const rate = 44100
const channels = 2 // 2 input channels

const microphone = new Microphone({
  channels,
  rate,
  device: 'hw:1,0',
  bitwidth: 16,
  endian: 'little',
  encoding: 'signed-integer'
})

const audioBuffer = new AudioBuffer(
  1, // 1 channel
  30 * rate, // 30 seconds buffer
  rate
)

const chunks = []
const data = audioBuffer.getChannelData(0)
const stream = microphone.startRecording()

setTimeout(() => microphone.stopRecording(), 5000) // Recording for 5 seconds

stream.on('data', chunk => chunks.push(chunk))

stream.on('close', () => {
  chunks.reduce((offset, chunk) => {
    for (var index = 0; index < chunk.length; index += channels * 2) { // 2 bytes per channel sample
      let value = 0

      for (var channel = 0; channel < channels; channel++) {
        // Iterates through input channels and adds the values
        // of all the channel so we can compute the
        // average value later to reduce them into a mono signal

        // Multiplies the channel index by 2 because
        // there are 2 bytes per channel sample

        value += chunk.readInt16LE(index + channel * 2)
      }

      // Interpolates index according to the number of input channels
      // (also divides it by 2 because there are 2 bytes per channel sample)
      // and computes average value as well as the interpolation
      // from range [-32768;+32768] to range [-1;+1]
      data[(offset + index) / channels / 2] = value / channels / 32768
    }

    return offset + chunk.length
  }, 0)
})
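One caveat worth noting: the code above assumes every chunk length is a multiple of 4 bytes; arecord's chunk boundaries are not guaranteed to fall on a frame boundary, in which case readInt16LE near the end of a chunk would fail or misalign. As a sketch of one way to handle that (createMonoConverter is a hypothetical helper name, not part of any library), a converter can carry the partial frame over to the next chunk:

```javascript
// Converts interleaved 16-bit stereo chunks to mono float samples,
// carrying any incomplete frame across chunk boundaries
const FRAME_BYTES = 4 // 2 channels × 2 bytes per sample

function createMonoConverter () {
  let remainder = Buffer.alloc(0)

  return chunk => {
    const buffer = Buffer.concat([remainder, chunk])
    const frames = Math.floor(buffer.length / FRAME_BYTES)
    const mono = new Float32Array(frames)

    for (let frame = 0; frame < frames; frame++) {
      const left = buffer.readInt16LE(frame * FRAME_BYTES)
      const right = buffer.readInt16LE(frame * FRAME_BYTES + 2)
      mono[frame] = (left + right) / 2 / 32768 // average, scaled to [-1;+1]
    }

    // Keep the trailing partial frame (0 to 3 bytes) for the next chunk
    remainder = buffer.subarray(frames * FRAME_BYTES)
    return mono
  }
}
```

This also allows converting each chunk as it arrives instead of accumulating all chunks until the stream closes, e.g. `stream.on('data', chunk => { const mono = convert(chunk); /* copy into data at the running offset */ })`.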
