How to convert the Float32Array format of native HTML5 recorded audio to proper bytes for the Google Speech-to-Text service?

If you follow this tutorial: https://medium.com/ideas-at-igenius/delivering-a-smooth-cross-browser-speech-to-text-experience-b1e1f1f194a2 you end up creating a script processor and adding a listener to it:

scriptProcessor = inputPoint.context.createScriptProcessor(bufferSize, in_channels, out_channels)
//...
scriptProcessor.addEventListener('audioprocess', streamAudioData)

Inside the callback, calling callback_param.inputBuffer.getChannelData(0) returns a JavaScript Float32Array which, judging by the data, contains floats from -1.0 to +1.0.

Streaming this as-is to the backend, which in turn streams it to the Google Speech-to-Text service, returns nothing (as expected).

The Google Speech-to-Text service, at least in Python, expects streaming input as a byte string in WAV format containing the audio at the specified sample rate (e.g. 16000 Hz). Note that streaming a file from the backend works fine.

This conversion has failed: Float32Array -> Int16Array -> byte-string

Has anyone found the appropriate conversions for the above to work?

Alternatively, are you aware of a simpler, more robust path for: microphone in browser -> stream data via WebSocket to backend server -> stream data to Google Speech-to-Text service -> get responses as expected?


Edit: Adding the Python code for the RecognitionConfig of the Google Speech API:

config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code=self.language_code)

Ok, did some digging, found the actual documentation which has the proper information.

LINEAR16 - Uncompressed 16-bit signed little-endian samples (Linear PCM).

The key parts being:

  • 16-bits per sample
  • Signed
  • Little-endian

So, what you need to do is scale your floating point values (-1.0 ... 1.0) to integers between -32768 and 32767.

There isn't any built-in JavaScript method to do this for you. Your conversions between Float32Array and Int16Array don't work because the values are not scaled, so you just end up with values approximating -1, 0, and 1. The other reason you can't use Int16Array is that its endianness is platform dependent!
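Both problems are easy to see in a console. The snippet below is only an illustration: the sample values and the helper name isLittleEndian are made up for the demo and are not part of any API.

```javascript
// Made-up samples in the range Web Audio normally produces.
const floats = new Float32Array([0.5, -0.7, 1.0, -1.0]);

// Element-wise conversion does no scaling: the fractional part is dropped,
// so almost all of the signal collapses to 0.
const naive = Int16Array.from(floats);
console.log(Array.from(naive)); // [0, 0, 1, -1]

// Hypothetical helper: typed arrays use the platform's byte order, which
// can be probed by looking at the first byte of a known 16-bit value.
function isLittleEndian() {
  const probe = new Uint16Array([1]);
  return new Uint8Array(probe.buffer)[0] === 1;
}
console.log(isLittleEndian());
```

Most consumer hardware is little-endian, so the second issue rarely shows up in practice, but writing the bytes explicitly removes the dependency altogether.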

What you need to do is get cozy with ArrayBuffers and manipulate them with a DataView. Take each sample, do some math, write the bytes, move to the next sample. When you're done, both XHR and the Fetch API support sending an ArrayBuffer as the HTTP request body. Or, you can instantiate a new Blob with that ArrayBuffer and do other things with it.
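A minimal sketch of that sample-by-sample loop follows. The function name floatTo16BitPCM is my own, and the clamping and rounding choices are assumptions rather than anything the Google documentation prescribes.

```javascript
// Convert Web Audio samples (Float32Array, nominally -1.0..1.0) into
// 16-bit signed little-endian PCM, the layout LINEAR16 describes.
// floatTo16BitPCM is a hypothetical helper name, not a built-in.
function floatTo16BitPCM(samples) {
  const buffer = new ArrayBuffer(samples.length * 2); // 2 bytes per sample
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first: samples can overshoot the nominal range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Scale negatives by 32768 and positives by 32767 so both ends fit.
    const value = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
    // The third argument (true) forces little-endian on every platform.
    view.setInt16(i * 2, value, true);
  }
  return buffer;
}
```

Inside the audioprocess listener from the question this could be used as something like socket.send(floatTo16BitPCM(event.inputBuffer.getChannelData(0))), where socket is an assumed, already open WebSocket; WebSocket.send accepts an ArrayBuffer directly. One more thing worth checking: the browser's AudioContext usually runs at 44100 or 48000 Hz, so sample_rate_hertz in the RecognitionConfig has to match the context's sampleRate, or the audio has to be resampled to 16000 Hz before it is sent.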
