Why does storing a stream of base64 data not work?

Question

Using a React Native audio recording library, I'm trying to record a 3-second .wav audio file. During the recording, base64 data can be received with this 'connection'/'function', which is called every time it receives a chunk of data from the recording (not sure what you would call it):

AudioRecord.on('data', data => {
  // base64-encoded audio data chunks
});

I'm doing this within a function that is activated when a button is pressed. The problem appears when I'm trying to store all the data I receive in a variable like this:

var tempString = '';
AudioRecord.on('data', data => {
  tempString += data;
});

For some reason, when I console.log tempString after the recording is finished (using setTimeout), it seems to have stored only the data from the first time it received any. However, when I create a variable count that counts up every time data is received, it counts up normally.

When I console.log the data, it does print out all the data. I've tried pushing to an array and listening to when a variable is changing but everything I try results in it just storing the first piece of data I receive. How do I store all the data I receive in a variable? Is this even possible?

Answer 1

Background: Base64 Padding

In Base64, each output character represents 6 bits of the input (2⁶ = 64). When you're encoding data, the first step is to split the bits of the input into 6-bit chunks. Let's use as an example the input string "hello" (encoded to binary as ASCII or UTF-8). If we try to split its bits into 6-bit chunks, we'll realize that it doesn't divide evenly: the last chunk only has 4 bits.

h         e         l         l         o
01101000  01100101  01101100  01101100  01101111 
011010 000110 010101 101100 011011 000110 1111?? 
a      G      V      s      b      G      ?

We can pad out the input stream with 0s to fill the missing bits.

011010 000110 010101 101100 011011 000110 111100
a      G      V      s      b      G      8

This gives us "aGVsbG8", and a quick sanity test in JavaScript confirms that atob("aGVsbG8") === "hello". No problem yet.
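
The bit-splitting steps above can be sketched in a few lines. This is only an illustration of the idea, not how real encoders are written; the names ALPHABET and encodeUnpadded are made up for this example, and it assumes ASCII-only input.

```javascript
const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Hypothetical helper: encode an ASCII string to base64 WITHOUT "=" padding
// by splitting its bits into 6-bit groups.
function encodeUnpadded(text) {
  // Write every character as 8 bits.
  let bits = '';
  for (const ch of text) {
    bits += ch.charCodeAt(0).toString(2).padStart(8, '0');
  }
  // Fill the final group with 0 bits so the length is a multiple of 6.
  while (bits.length % 6 !== 0) {
    bits += '0';
  }
  // Map each 6-bit group to one base64 character.
  let out = '';
  for (let i = 0; i < bits.length; i += 6) {
    out += ALPHABET[parseInt(bits.slice(i, i + 6), 2)];
  }
  return out;
}

console.log(encodeUnpadded('hello')); // aGVsbG8
```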

This works if we're decoding this chunk on its own, because we know that once we reach the end of the chunk, the remaining two bits that we haven't decoded must be padding, and can be ignored. However, if this is just part of a stream, immediately followed by more base64 data, we can't tell that we're at the end of a chunk!

For example, let's try concatenating aGVsbG8 with itself, and decoding aGVsbG8aGVsbG8 as a single value.

a      G      V      s      b      G      8      a      G      V      s      b      G      8     
011010 000110 010101 101100 011011 000110 111100 011010 000110 010101 101100 011011 000110 111100
                                              ||- padding that should be ignored
01101000 01100101 01101100 01101100 01101111  00011010 00011001 01011011 00011011 00011011 1100????
h        e        l        l        o         \x1A     \x19     [        \x1B     \x1B     ?

The two padding bits cause the decoding stream to become misaligned, and the remaining data is mangled.
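
To reproduce the misalignment without relying on any particular library's behavior, here is a deliberately naive decoder written for this illustration (decodeNaive is a made-up name). It converts every character to 6 bits and reads off whole bytes, with no notion of where one encoded value ends.

```javascript
const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Naive decoder: turn each character into 6 bits, then read whole bytes.
// It cannot tell fill bits apart from real data in the middle of a stream.
function decodeNaive(encoded) {
  let bits = '';
  for (const ch of encoded) {
    bits += ALPHABET.indexOf(ch).toString(2).padStart(6, '0');
  }
  const bytes = [];
  for (let i = 0; i + 8 <= bits.length; i += 8) {
    bytes.push(parseInt(bits.slice(i, i + 8), 2));
  }
  return bytes;
}

const toHex = bytes => bytes.map(b => b.toString(16).padStart(2, '0')).join(' ');

console.log(toHex(decodeNaive('aGVsbG8')));
// 68 65 6c 6c 6f
console.log(toHex(decodeNaive('aGVsbG8aGVsbG8')));
// 68 65 6c 6c 6f 1a 19 5b 1b 1b
```

The second output matches the diagram: the first five bytes are "hello", and everything after them is shifted by two bits.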

In these cases, the standard solution is to add zero to two = padding characters after the encoded data. Each = stands in for six bits of padding. They mark the end of an encoded value, and they also keep the input and output aligned: with appropriate padding in the stream, every four-character chunk of encoded data can be unambiguously decoded into one to three bytes, without separate knowledge of the data alignment. Our example requires six bits of padding to maintain alignment, giving us aGVsbG8=. If we concatenate that with itself, we can see that decoding is now successful:

a      G      V      s      b      G      8      =      a      G      V      s      b      G      8      =
011010 000110 010101 101100 011011 000110 111100 PPPPPP 011010 000110 010101 101100 011011 000110 111100 PPPPPP
01101000 01100101 01101100 01101100 01101111  00PPPPPP 01101000 01100101 01101100 01101100 01101111  00PPPPPP
h        e        l        l        o         padding  h        e        l        l        o         padding
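
A padding-aware decoder can be sketched by processing the stream four characters at a time. The function decodeGroups is a made-up name for this illustration, and the sketch assumes well-formed input whose length is a multiple of four.

```javascript
const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Decode base64 four characters at a time. A group ending in "=" or "=="
// yields two bytes or one byte, and decoding continues with the next group.
function decodeGroups(encoded) {
  const bytes = [];
  for (let i = 0; i < encoded.length; i += 4) {
    const data = encoded.slice(i, i + 4).replace(/=+$/, '');
    let bits = '';
    for (const ch of data) {
      bits += ALPHABET.indexOf(ch).toString(2).padStart(6, '0');
    }
    // Keep whole bytes only; leftover bits are the zero fill and are dropped.
    for (let j = 0; j + 8 <= bits.length; j += 8) {
      bytes.push(parseInt(bits.slice(j, j + 8), 2));
    }
  }
  return String.fromCharCode(...bytes);
}

console.log(decodeGroups('aGVsbG8=aGVsbG8=')); // hellohello
```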

Problem: Incapable Decoders

With fully-capable encoders and decoders, your approach should work fine. Each chunk should include the appropriate padding, and the decoder should be able to skip over it and assemble the correct result.

Unfortunately, a lot of the most common base64 decoding libraries do not support this.

Node's Buffer just assumes that it's getting a single encoded value, so when it sees padding (possibly at the end of the first chunk) it assumes it has reached the end of the value and stops decoding, discarding the rest of your data.

> Buffer.from('aGVsbG8=', 'base64')
<Buffer 68 65 6c 6c 6f>
> Buffer.from('aGVsbG8=aGVsbG8=', 'base64')
<Buffer 68 65 6c 6c 6f>

The browser's atob throws an error instead of silently ignoring data:

> atob("aGVsbG8=")
"hello"
> atob("aGVsbG8=aGVsbG8=")
InvalidCharacterError: String contains an invalid character

Solutions

Manually Split on Padding

If we keep your approach of storing all of the data in a single string, we need to take responsibility for splitting on padding ourselves. (NB: Generally, repeatedly appending to strings can be problematic because it can get very slow if the JavaScript engine fails to optimize it. It may not be a problem in practice here, but it's often avoided.)

We can do this using a regular expression that matches a sequence of one or more = padding characters,

const input = "aGVsbG8=aGVsbG8=aGVsbG8=aGVsbG8=";
const delimiter = /=+/g;

splitting the string on that,

const pieces = input.split(delimiter);

decoding the pieces individually,

const decodedPieces = pieces.map(piece => Buffer.from(piece, 'base64'));

and then combining their outputs in a single step (more efficient than doing it incrementally).

const decoded = Buffer.concat(decodedPieces);
console.log(decoded.toString('ascii'));

'hellohellohellohello'
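
Put together as one reusable function, the steps above might look like the sketch below. The name decodeConcatenated is made up for this example, and it assumes a Node-style Buffer is available (in React Native that would typically come from a polyfill, which is an assumption about your setup). Empty pieces left over from trailing padding are filtered out before decoding.

```javascript
// Decode a string made of several padded base64 values glued together.
function decodeConcatenated(input) {
  // Split wherever padding appears and drop the empty trailing piece.
  const pieces = input.split(/=+/).filter(piece => piece.length > 0);
  // Decode each piece on its own, then join the bytes in a single step.
  return Buffer.concat(pieces.map(piece => Buffer.from(piece, 'base64')));
}

console.log(decodeConcatenated('aGVsbG8=aGVsbG8=').toString('ascii'));
// hellohello
```

Chunks that happen to need no padding simply stay attached to the chunk that follows them, which is harmless because they already end on a four-character boundary.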

Store Chunks Separately

However, in your case it might be simpler to just store the chunks in an array from the beginning, and skip the concatenation and splitting altogether.

const decodedPieces = [];
AudioRecord.on('data', data => {
  decodedPieces.push(Buffer.from(data, 'base64'));
});

// later, when you need to collect the data...
const decoded = Buffer.concat(decodedPieces);
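
To try this pattern without a device, here is a small simulation. The fakeRecorder object is a hypothetical stand-in for AudioRecord written only for this example; it is not part of any library, and the three text chunks stand in for audio data.

```javascript
// Hypothetical stand-in for AudioRecord: calls each listener once per chunk.
const fakeRecorder = {
  listeners: [],
  on(event, listener) {
    if (event === 'data') this.listeners.push(listener);
  },
  emit(chunk) {
    this.listeners.forEach(listener => listener(chunk));
  },
};

const decodedPieces = [];
fakeRecorder.on('data', data => {
  // Decode each chunk on its own, so its padding is interpreted correctly.
  decodedPieces.push(Buffer.from(data, 'base64'));
});

// Simulate three separately encoded (and separately padded) chunks.
['hello', ' ', 'world'].forEach(text => {
  fakeRecorder.emit(Buffer.from(text, 'ascii').toString('base64'));
});

const decoded = Buffer.concat(decodedPieces);
console.log(decoded.toString('ascii')); // hello world
// Re-encode once if a single valid base64 string is needed downstream.
console.log(decoded.toString('base64')); // aGVsbG8gd29ybGQ=
```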
