
AWS Transcribe Streaming BadRequestException: "Could not decode the audio stream..."

I'm building a Transcribe Streaming app in Dart/Flutter with websockets. When I stream the test audio (pulled from a mono, 16 kHz, 16-bit signed little-endian WAV file), I get:

BadRequestException: Could not decode the audio stream that you provided. Check that the audio stream is valid and try your request again.

As a test I'm using a file to stream the audio. I'm sending 32k data bytes every second (roughly simulating a realtime microphone stream). I get the error even if I stream all 0x00, all 0xFF, or random bytes. If I halve the chunk size to 16k and the interval to 0.5s, it gets one frame further before erroring out.

As for the data, I'm simply packing the bytes into the data portion of the Event Stream frame exactly as they are in the file. Clearly the Event Stream packaging is correct (the byte layout, the CRCs), or else I'd get an error indicating that, no?

What would indicate to AWS Transcribe that the audio is not decodable? Any other ideas on how to proceed with this?

Thanks for any help.

Here's the code that does the packing. The full version is here (if you dare; it's a bit of a mess at the moment): https://pastebin.com/PKTj5xM2

Uint8List createEventStreamFrame(Uint8List audioChunk) {
  final headers = [
    EventStreamHeader(":content-type", 7, "application/octet-stream"),
    EventStreamHeader(":event-type", 7, "AudioEvent"),
    EventStreamHeader(":message-type", 7, "event")
  ];
  final headersData = encodeEventStreamHeaders(headers);
 
  final int totalLength = 16 + audioChunk.lengthInBytes + headersData.lengthInBytes;
  // final prelude = [headersData.length, totalLength];
  // print("Prelude: " + prelude.toString());
 
  // Convert a 32b int to 4 bytes
  List<int> int32ToBytes(int i) { return [(0xFF000000 & i) >> 24, (0x00FF0000 & i) >> 16, (0x0000FF00 & i) >> 8, (0x000000FF & i)]; }
 
  final audioBytes = ByteData.sublistView(audioChunk);
  var offset = 0;
  var audioDataList = <int>[];
  while (offset < audioBytes.lengthInBytes) {
    audioDataList.add(audioBytes.getInt16(offset, Endian.little));
    offset += 2;
  }
 
  final crc = CRC.crc32();
  final messageBldr = BytesBuilder();
  messageBldr.add(int32ToBytes(totalLength));
  messageBldr.add(int32ToBytes(headersData.length));
 
  // Now we can calc the CRC. We need to do it on the bytes, not the Ints
  final preludeCrc = crc.calculate(messageBldr.toBytes());
 
  // Continue adding data
  messageBldr.add(int32ToBytes(preludeCrc));
  messageBldr.add(headersData.toList());
  // messageBldr.add(audioChunk.toList());
  messageBldr.add(audioDataList);
  final messageCrc = crc.calculate(messageBldr.toBytes().toList());
  messageBldr.add(int32ToBytes(messageCrc));
  final frame = messageBldr.toBytes();
  //print("${frame.length} == $totalLength");
  return frame;
}

BadRequestException, at least in my case, referred to the frame being encoded incorrectly rather than the audio data being wrong.

The details are in the AWS Event Stream Encoding documentation.

I had some issues with endianness and byte size. You need to be very bit-savvy with both the message encoding and the audio buffer. The audio needs to be 16-bit signed int, little-endian (see here). The length fields in the message wrapper, on the other hand, are 32-bit (4 bytes) BIG-endian. ByteData is your friend here in Dart. Here's a snippet from my updated code:

final messageBytes = ByteData(totalLength);

...

for (var i=0; i<audioChunk.length; i++) {
  messageBytes.setInt16(offset, audioChunk[i], Endian.little);
  offset += 2;
}

Notice that each 16-bit int takes up 2 byte positions. Always pass the endianness explicitly: Dart's ByteData methods default to Endian.big when you leave it out, which happens to suit the header length fields but scrambles the audio samples, which must be little-endian.
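To make the byte layout concrete, here is a minimal sketch of a complete AudioEvent frame encoder built on ByteData. It assumes the audio arrives as a list of signed 16-bit sample values (as in the snippet above) and follows the event stream layout as I understand it: a 12-byte prelude (total length, headers length, prelude CRC, all big-endian), the headers, the payload, and a trailing message CRC. The names crc32, encodeStringHeader and encodeAudioEventFrame are made up for this example, and the hand-rolled CRC-32 only stands in for whatever CRC package you already use.

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Bitwise CRC-32 (reflected polynomial 0xEDB88320), written out here only
/// so the sketch has no package dependencies.
int crc32(List<int> bytes) {
  var crc = 0xFFFFFFFF;
  for (final b in bytes) {
    crc ^= b & 0xFF;
    for (var k = 0; k < 8; k++) {
      crc = (crc & 1) != 0 ? (crc >> 1) ^ 0xEDB88320 : crc >> 1;
    }
  }
  return (crc ^ 0xFFFFFFFF) & 0xFFFFFFFF;
}

/// Encodes one string header: 1-byte name length, name, value type 7
/// (string), 2-byte big-endian value length, value.
Uint8List encodeStringHeader(String name, String value) {
  final n = utf8.encode(name);
  final v = utf8.encode(value);
  final out = ByteData(1 + n.length + 1 + 2 + v.length);
  var o = 0;
  out.setUint8(o++, n.length);
  for (final b in n) {
    out.setUint8(o++, b);
  }
  out.setUint8(o++, 7);
  out.setUint16(o, v.length, Endian.big);
  o += 2;
  for (final b in v) {
    out.setUint8(o++, b);
  }
  return out.buffer.asUint8List();
}

/// Builds one AudioEvent frame from signed 16-bit samples.
Uint8List encodeAudioEventFrame(List<int> samples) {
  final headerBytes = Uint8List.fromList([
    ...encodeStringHeader(':content-type', 'application/octet-stream'),
    ...encodeStringHeader(':event-type', 'AudioEvent'),
    ...encodeStringHeader(':message-type', 'event'),
  ]);
  // 12-byte prelude + headers + 2 bytes per sample + 4-byte message CRC.
  final totalLength = 16 + headerBytes.length + samples.length * 2;
  final msg = ByteData(totalLength);
  final view = msg.buffer.asUint8List();

  // Prelude: both lengths and the CRC are big-endian.
  msg.setUint32(0, totalLength, Endian.big);
  msg.setUint32(4, headerBytes.length, Endian.big);
  msg.setUint32(8, crc32(Uint8List.sublistView(view, 0, 8)), Endian.big);

  view.setRange(12, 12 + headerBytes.length, headerBytes);

  // Payload: each sample is two bytes, little-endian.
  var offset = 12 + headerBytes.length;
  for (final s in samples) {
    msg.setInt16(offset, s, Endian.little);
    offset += 2;
  }

  // Message CRC covers everything written so far.
  msg.setUint32(offset, crc32(Uint8List.sublistView(view, 0, offset)), Endian.big);
  return view;
}
```

With the three headers used in the question, the header block is 88 bytes, so five samples give a 114-byte frame. Comparing frame.length against the total length written into the prelude is a cheap first sanity check.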

The best way to make sure it is all correct is to write your decode functions, which you'll need for the AWS response anyway, then decode your own encoded frame and check that it comes out the same. Use test data for the audio like [-32000, -100, 0, 200, 31000] so you can verify that the endianness and sign handling are correct.
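A round-trip check along those lines might look like the sketch below. The helper names (samplesToPcm, pcmToSamples, extractPayload) are hypothetical, and extractPayload only reads the two length fields in the prelude; it does not verify either CRC, which a real decoder for the AWS responses should also do.

```dart
import 'dart:typed_data';

/// Packs signed 16-bit samples as little-endian PCM bytes.
Uint8List samplesToPcm(List<int> samples) {
  final data = ByteData(samples.length * 2);
  for (var i = 0; i < samples.length; i++) {
    data.setInt16(i * 2, samples[i], Endian.little);
  }
  return data.buffer.asUint8List();
}

/// Reads little-endian PCM bytes back into signed 16-bit samples.
List<int> pcmToSamples(Uint8List pcm) {
  final data = ByteData.sublistView(pcm);
  return [
    for (var i = 0; i + 1 < pcm.length; i += 2) data.getInt16(i, Endian.little),
  ];
}

/// Pulls the payload out of an event stream frame using the big-endian
/// total length and headers length from the prelude. CRCs are NOT checked.
Uint8List extractPayload(Uint8List frame) {
  final data = ByteData.sublistView(frame);
  final totalLength = data.getUint32(0, Endian.big);
  final headersLength = data.getUint32(4, Endian.big);
  if (totalLength != frame.length) {
    throw FormatException(
        'prelude says $totalLength bytes, frame has ${frame.length}');
  }
  // Skip the 12-byte prelude and the headers; stop before the 4-byte CRC.
  return Uint8List.sublistView(frame, 12 + headersLength, totalLength - 4);
}
```

Feeding your encoder's output through extractPayload and then pcmToSamples should give back exactly the samples you started with. A length mismatch in the prelude, or samples that come back byte-swapped, points at the kind of encoding problem described above.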

Here are my suggestions (too long to fit in a comment). Feel free to update me with more information so that I can think about it further.

First, could you use Wireshark to look at the data that is actually transmitted? (This is not strictly necessary; see the next paragraph for an alternative.) Examine it and check whether the data on the wire is valid. For example, manually record those data bytes and open them with an audio player.

Or, instead of using Wireshark, write the data that you would otherwise send through the websocket to a local file. Open that file and check whether it is valid audio. (PS: note that some audio players tolerate malformed files.)
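Raw PCM carries no header, so many players will refuse to open a plain dump of the audio bytes. One way around that, sketched below, is to wrap the captured bytes in a minimal 44-byte WAV header before saving. The function name wrapPcmAsWav is made up for this example, and the defaults assume the mono, 16 kHz, 16-bit format described in the question.

```dart
import 'dart:typed_data';

/// Wraps raw 16-bit little-endian PCM in a minimal 44-byte WAV header so an
/// audio player can open it. All multi-byte WAV fields are little-endian.
Uint8List wrapPcmAsWav(Uint8List pcm,
    {int sampleRate = 16000, int channels = 1}) {
  const bitsPerSample = 16;
  final blockAlign = channels * bitsPerSample ~/ 8; // bytes per sample frame
  final byteRate = sampleRate * blockAlign;

  final out = ByteData(44 + pcm.length);
  final bytes = out.buffer.asUint8List();
  void tag(int offset, String s) =>
      bytes.setRange(offset, offset + 4, s.codeUnits);

  tag(0, 'RIFF');
  out.setUint32(4, 36 + pcm.length, Endian.little); // file size minus 8
  tag(8, 'WAVE');
  tag(12, 'fmt ');
  out.setUint32(16, 16, Endian.little); // size of the fmt chunk
  out.setUint16(20, 1, Endian.little); // format 1 = uncompressed PCM
  out.setUint16(22, channels, Endian.little);
  out.setUint32(24, sampleRate, Endian.little);
  out.setUint32(28, byteRate, Endian.little);
  out.setUint16(32, blockAlign, Endian.little);
  out.setUint16(34, bitsPerSample, Endian.little);
  tag(36, 'data');
  out.setUint32(40, pcm.length, Endian.little);
  bytes.setRange(44, 44 + pcm.length, pcm);
  return bytes;
}
```

On a platform with dart:io you could then save the result with File.writeAsBytesSync and open it in a player. If it sounds like noise or plays at the wrong speed, the sample format or byte order of the payload is probably not what you think it is.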

Secondly, could you try putting all the bytes of that known-good WAV file into a single websocket packet? Does it work then, or does the error still happen?

Thirdly, streaming WAV may not be the best practice. WAV is uncompressed and quite large, so you may want a compressed format such as AAC or, more advanced, Opus, provided the service accepts the encoding you pick. Both work well for streaming; for example, AAC has a sub-format called ADTS that splits the audio into packets.


 