Send audio data represented as a numpy array from Python to JavaScript

I have a TTS (text-to-speech) system that produces audio as a numpy array whose data type is np.float32. This system runs in the backend, and I want to transfer the data from the backend to the frontend so it can be played when a certain event happens.

The obvious solution is to write the audio data to disk as a wav file and then pass the path to the frontend to be played. This worked fine, but I don't want to do that for administrative reasons. I want to transfer only the audio data (the numpy array) to the frontend.

What I have done so far is the following:

backend

text = "Hello"
wav, sr = tts_model.synthesize(text)
data = {"snd": wav.tolist()}  # a dict (key: value), so it can be serialized to JSON
flask_response = app.response_class(response=flask.json.dumps(data),
                                    status=200,
                                    mimetype='application/json' )
# then return flask_response

frontend

// gets wav from backend
let arrayData = new Float32Array(wav);
let blob = new Blob([ arrayData ]);
let url = URL.createObjectURL(blob);
let snd = new Audio(url);
snd.play()

That is what I have done so far, but JavaScript throws the following error:

Uncaught (in promise) DOMException: Failed to load because no supported source was found.

This is the gist of what I'm trying to do. I'm sorry that you can't reproduce the error, since you don't have the TTS system, so this is an audio file generated by it which you can use to see what I'm doing wrong.

Other things I tried:

  • Changed the audio datatype to np.int8 and np.int16, to be cast in JavaScript by Int8Array() and Int16Array() respectively.
  • Tried different types when creating the blob, such as {"type": "application/text;charset=utf-8;"} and {"type": "audio/ogg; codecs=opus;"}.

I have been struggling with this issue for a long time, so any help is appreciated!

Your sample, as is, does not work out of the box (it does not play).

However, with:

  • StarWars3.wav: OK (retrieved from cs.uic.edu)
  • your sample encoded in PCM16 instead of PCM32: OK (check the wav metadata)
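
One way to check the wav metadata mentioned above is to read the "fmt " chunk of the file header directly. The following is a minimal sketch using only the Python standard library; the function name describe_wav and the returned dictionary keys are made up for illustration, and it assumes a plain RIFF/WAVE file.

```python
import struct

# Common format codes stored in the WAV "fmt " chunk.
FORMAT_NAMES = {1: "PCM (integer)", 3: "IEEE float", 0xFFFE: "extensible"}


def describe_wav(data: bytes) -> dict:
    """Return the format fields from the 'fmt ' chunk of a RIFF/WAVE byte string."""
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12  # skip the 12-byte RIFF header
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt ":
            # format tag, channels, sample rate, byte rate, block align, bits per sample
            fmt, channels, rate, _, _, bits = struct.unpack_from(
                "<HHIIHH", data, pos + 8
            )
            return {
                "format_code": fmt,
                "format_name": FORMAT_NAMES.get(fmt, "unknown"),
                "channels": channels,
                "sample_rate": rate,
                "bits_per_sample": bits,
            }
        # chunks are padded to an even number of bytes
        pos += 8 + size + (size & 1)
    raise ValueError("no 'fmt ' chunk found")
```

Calling it on the bytes of a file (for example the contents of sample_16.wav read in "rb" mode) should show a format code of 1 with 16 bits per sample for a PCM16 file, whereas a float32 file would typically report format code 3 with 32 bits per sample.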

Flask

from flask import Flask, render_template, json
import base64

app = Flask(__name__)

with open("sample_16.wav", "rb") as binary_file:
    # Read the whole file at once
    data = binary_file.read()
    wav_file = base64.b64encode(data).decode('UTF-8')

@app.route('/wav')
def hello_world():
    data = {"snd": wav_file}
    res = app.response_class(response=json.dumps(data),
        status=200,
        mimetype='application/json')
    return res

@app.route('/')
def stat():
    return render_template('index.html')

if __name__ == '__main__':
    app.run(debug = True)

js


  <audio controls></audio>
  <script>
    ;(async _ => {
      const res = await fetch('/wav')
      let {snd: b64buf} = await res.json()
      document.querySelector('audio').src="data:audio/wav;base64, "+b64buf;
    })()
  </script>

Original Poster Edit

So, what I ended up doing (using this solution) that solved my problem was to:

  • First, change the datatype from np.float32 to np.int16:
wav = (wav * np.iinfo(np.int16).max).astype(np.int16)
  • Write the numpy array into a temporary wav file using scipy.io.wavfile:
from scipy.io import wavfile
wavfile.write(".tmp.wav", sr, wav)
  • Read the bytes from the tmp file:
# read the bytes
with open(".tmp.wav", "rb") as fin:
    wav = fin.read()
  • Delete the temporary file:
import os
os.remove(".tmp.wav")
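
The steps above can also be done without touching the disk. The following is a minimal sketch that uses only the Python standard library; the helper names floats_to_wav_bytes and wav_data_uri are hypothetical, and it assumes mono audio with float samples nominally in [-1.0, 1.0] (for a numpy array, passing wav.tolist() should work).

```python
import base64
import io
import struct
import wave


def floats_to_wav_bytes(samples, sample_rate):
    """Encode mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV byte string."""
    # Clip first so out-of-range values cannot overflow the int16 range.
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)        # mono (assumption)
        w.setsampwidth(2)        # 2 bytes per sample = PCM16
        w.setframerate(sample_rate)
        w.writeframes(struct.pack("<%dh" % len(ints), *ints))
    return buf.getvalue()


def wav_data_uri(samples, sample_rate):
    """Return a data URI that can be used as the src of an audio element."""
    encoded = base64.b64encode(floats_to_wav_bytes(samples, sample_rate))
    return "data:audio/wav;base64," + encoded.decode("ascii")
```

Clipping before the integer conversion is a design choice of this sketch: if the TTS output occasionally exceeds the [-1.0, 1.0] range, scaling alone would produce values outside what int16 can hold.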

Convert the wav array of values to bytes

Right after synthesis, you can convert the numpy array of wav samples to a bytes object and then encode it via base64.

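import base64
import io
from scipy.io.wavfile import write

# Write the WAV container into an in-memory buffer instead of a file on disk
byte_io = io.BytesIO()
write(byte_io, sr, wav)
# getvalue() returns the whole buffer regardless of the stream position
wav_bytes = byte_io.getvalue()

# Base64-encode so the bytes can be embedded in JSON or a data URI
audio_data = base64.b64encode(wav_bytes).decode('UTF-8')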

This can be used directly as the source of an HTML audio tag (with Flask):

<audio controls src="data:audio/wav;base64, {{ audio_data }}"></audio>

So, all you need is to convert wav and sr to audio_data representing the raw .wav file, and pass it as a parameter of render_template in your Flask app. (Solution without sending.)

Or, if you send audio_data, then in the .js file where you receive the response, use audio_data to construct the URL (it takes the place of the src attribute in the HTML):

// get audio_data from response

let snd = new Audio("data:audio/wav;base64, " + audio_data);
snd.play()

because:

Audio(url) Return value: A new HTMLAudioElement object, configured to be used for playing back the audio from the file specified by url. The new object's preload property is set to auto and its src property is set to the specified URL or null if no URL is given. If a URL is specified, the browser begins to asynchronously load the media resource before returning the new object.
