
Convert audio file to Linear PCM 16-bit

I am trying to send an audio file through a websocket, and I realised that in order to do so I need to convert the MP3 file to Linear PCM 16-bit, but I can't find a way to do it.

Here is what I want to do:

    let mp3File; // should hold the 16-bit PCM audio

    ws.on('message', async (msg) => {
        if (typeof msg === "string") {
            // text message
        } else if (recognizeStream) {
            recognizeStream.write(msg);
        }
        ws.send(mp3File); // <== stream back the audio file
    });

Some background: the stream is a phone call (via the Vonage API), so my websocket is connected to the call and hears the user's input. After some logic on my server, I want to play the user an MP3 file that is stored locally on the server, via ws.send().

-----------update--------

Now, if I send back the PCM data from the stream (the raw audio from the phone call), it works (the server echoes the call). So I want to convert the MP3 file to the same format so that I can send it via ws.send().

-----------update 2--------

I have now converted my audio file to the right format, which is: "Linear PCM 16-bit, with either a 8kHz or a 16kHz sample rate, and a 20ms frame size".

I am trying to send the file through the websocket, but I don't know how. The file is in the project folder; I looked for a way to send it via websocket but didn't find anything.

I am trying to do what is specified here: [image]

First, let's understand what this means:

Linear PCM 16-bit, with either a 8kHz or a 16kHz sample rate, and a 20ms frame size

They are talking about 2 things here:

  1. The format of the audio data, which is "Linear PCM 16-bit, with either a 8kHz or a 16kHz sample rate"
  2. How you send this audio data to them and how they send it to you: in chunks, each holding one 20ms frame of audio

Based on the audio format, choosing "16-bit Linear PCM with a sample rate of 16K" implies:

  • samplerate = 16000
  • samplewidth = 16 bits = 2 bytes

So 1 second of audio contains (16000 * 2) = 32000 bytes, which means a 20ms (0.02s) frame of audio is (32000 * 0.02) = 640 bytes.
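The same arithmetic can be written as a small helper. This is only a sketch; the function name and parameters are made up for illustration and are not part of any Nexmo/Vonage API. It assumes mono audio unless a channel count is passed.

```javascript
// Bytes in one frame of raw linear PCM.
// pcmFrameBytes and its parameters are illustrative names, not a library API.
function pcmFrameBytes(sampleRateHz, bitsPerSample, frameMs, channels = 1) {
  // bytes per second = samples per second * bytes per sample * channels
  const bytesPerSecond = sampleRateHz * (bitsPerSample / 8) * channels;
  return (bytesPerSecond * frameMs) / 1000;
}

console.log(pcmFrameBytes(16000, 16, 20)); // 640
console.log(pcmFrameBytes(8000, 16, 20));  // 320
```

These match the 640-byte (16K) and 320-byte (8K) message sizes mentioned in the example further down.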

There are 2 things needed:

  1. Convert the mp3 to wav. Install ffmpeg on your system and run this command:
    ffmpeg -i filename.mp3 -ar 16000 -sample_fmt s16 output.wav
    This converts your filename.mp3 to output.wav, which will be Linear PCM 16-bit at a 16K sample rate. If the source MP3 is stereo, also pass -ac 1 to downmix to mono, because the 640-byte frame size above assumes a single channel.

  2. In your code, when you send audio back, you need to stream it in chunks of 640 bytes, not the entire file data in one shot. There are 3 options:

    • Run a loop that writes all the audio to the websocket in chunks of 640 bytes. This has an issue: Nexmo will buffer only the first 20s of audio, and anything more than that will be discarded.
    • Start an async task that runs every 20ms and writes 640 bytes of data to the websocket.
    • Write when you get audio from Nexmo (this is the one I will show). Since Nexmo sends you 640 bytes every 20ms, you can just send back 640 bytes at the same time.
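For comparison, the second option (a task that fires every 20ms) could look roughly like the sketch below. splitIntoFrames, streamFrames and the send callback are hypothetical names introduced here; zero-padding the last frame to a full 640 bytes is my own assumption about what the receiving side prefers, and setInterval is not a precise clock, so the pacing is only approximate.

```javascript
// Split raw PCM into fixed-size frames. The last frame is zero-padded;
// zero bytes are silence in signed 16-bit PCM. (Padding is an assumption.)
function splitIntoFrames(pcm, frameBytes) {
  const frames = [];
  for (let offset = 0; offset < pcm.length; offset += frameBytes) {
    const frame = Buffer.alloc(frameBytes); // zero-filled
    pcm.copy(frame, 0, offset, Math.min(offset + frameBytes, pcm.length));
    frames.push(frame);
  }
  return frames;
}

// Send one frame per timer tick. `send` is supplied by the caller,
// for example (frame) => connection.sendBytes(frame).
function streamFrames(frames, send, intervalMs = 20) {
  let index = 0;
  const timer = setInterval(() => {
    if (index >= frames.length) {
      clearInterval(timer); // all audio sent, stop the timer
      return;
    }
    send(frames[index++]);
  }, intervalMs);
  return timer; // the caller can cancel early with clearInterval(timer)
}

// Usage (not executed here):
//   const frames = splitIntoFrames(pcmBuffer, 640);
//   streamFrames(frames, (frame) => connection.sendBytes(frame));
```

Here pcmBuffer stands for the audio payload with the WAV header already removed.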

I'm writing this example using the npm websocket package.

var fs = require('fs');
var binaryData = fs.readFileSync('output.wav');
var start = 44; // discard the wav header (44 bytes for a canonical PCM header)
var chunkSize = 640; // one 20ms frame of 16-bit audio at 16K

...

// connection is a websocket connection object
connection.on('message', function(message) {
  if (message.type === 'utf8') {
    // handle a text message here
  }
  else if (message.type === 'binary') {
    // print length of audio sent by nexmo. will be 640 for 16K and 320 for 8K
    console.log('Received Binary Message of ' + message.binaryData.length + ' bytes');

    // keep sending while there is audio left in the file
    if (start < binaryData.length) {
      // slice a chunk and send
      var toSend = binaryData.slice(start, start + chunkSize);
      start = start + chunkSize;
      connection.sendBytes(toSend);
      console.log('Sent Binary Message of ' + toSend.length + ' bytes');
    }
  } ...

});
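The example skips a fixed 44 bytes, which is the size of a minimal PCM WAV header. Some tools write extra chunks (for example metadata) before the audio, in which case the payload starts later. A hedged sketch of locating the payload by walking the RIFF chunks instead; findWavData is a name invented for this example:

```javascript
// Locate the audio payload in a RIFF/WAVE buffer by walking its chunks.
// Returns { offset, length } of the body of the "data" chunk.
function findWavData(wav) {
  if (wav.toString('ascii', 0, 4) !== 'RIFF' ||
      wav.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('not a RIFF/WAVE file');
  }
  let pos = 12; // the first chunk starts after the 12-byte RIFF header
  while (pos + 8 <= wav.length) {
    const id = wav.toString('ascii', pos, pos + 4);
    const size = wav.readUInt32LE(pos + 4);
    if (id === 'data') {
      // clamp in case the declared size runs past the end of the buffer
      return { offset: pos + 8, length: Math.min(size, wav.length - pos - 8) };
    }
    pos += 8 + size + (size % 2); // chunk bodies are padded to an even length
  }
  throw new Error('no data chunk found');
}

// Usage (not executed here):
//   var info = findWavData(binaryData);
//   var start = info.offset; // instead of the hard-coded 44
```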

Remember, there will be some delay between the point where you send the audio from your server to Nexmo and the point where you hear it on the other side. It can vary from half a second to even more depending on the location of Nexmo's datacentre, the location of the server where you run your code, network speed, etc. I have observed it to be close to 0.5 sec.
