Why does storing a stream of base64 data not work?

Using a React Native audio recording library, I'm trying to record a 3-second .wav audio file. During the recording, base64 data can be received with this 'connection'/'function', which is activated every time it receives a chunk of data from the recording (not sure what you would call it):

AudioRecord.on('data', data => {
  // base64-encoded audio data chunks
});

I'm doing this within a function that is activated when a button is pressed. The problem appears when I try to store all the data I receive in a variable like this:

var tempString = '';
AudioRecord.on('data', data => {
  tempString += data;
});

For some reason, when I console.log tempString after the recording is finished (using setTimeout), it seems to have stored only the data from the first time it received any. Also, when I create a count variable that counts up every time data is received, it counts up normally.

When I console.log the data, it does print out all the data. I've tried pushing to an array and listening for when a variable changes, but everything I try results in it storing only the first piece of data I receive. How do I store all the data I receive in a variable? Is this even possible?

Background: Base64 Padding

In Base64, each output character represents 6 bits of the input (2^6 = 64). When you're encoding data, the first step is to split the bits of the input into 6-bit chunks. Let's use as an example the input string "hello" (encoded to binary as ASCII or UTF-8). If we try to split its bits into 6-bit chunks, we'll realize that it doesn't divide evenly: the last chunk only has 4 bits.

h         e         l         l         o
01101000  01100101  01101100  01101100  01101111 
011010 000110 010101 101100 011011 000110 1111?? 
a      G      V      s      b      G      ?     

We can pad out the input stream with 0s to fill the missing bits.

011010 000110 010101 101100 011011 000110 111100
a      G      V      s      b      G      8

This gives us "aGVsbG8", and a quick sanity test in JavaScript confirms that atob("aGVsbG8") === "hello". No problem yet.
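The bit-splitting steps above can be sketched in a few lines of JavaScript. This is only an illustration: `encodeUnpadded` is a hypothetical helper name, it assumes ASCII input, and it deliberately leaves out the `=` padding that a real encoder would add.

```javascript
// Standard base64 alphabet: index 0-63 maps to one output character.
const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Hypothetical helper: encode an ASCII string WITHOUT "=" padding.
function encodeUnpadded(text) {
  // Turn every character into its 8-bit binary representation.
  let bits = '';
  for (const ch of text) {
    bits += ch.charCodeAt(0).toString(2).padStart(8, '0');
  }
  // Zero-fill the tail so the total length is a multiple of 6.
  while (bits.length % 6 !== 0) bits += '0';
  // Map each 6-bit group to one character of the alphabet.
  let out = '';
  for (let i = 0; i < bits.length; i += 6) {
    out += ALPHABET[parseInt(bits.slice(i, i + 6), 2)];
  }
  return out;
}

console.log(encodeUnpadded('hello')); // aGVsbG8
```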

This works if we're decoding this chunk on its own, because we know that once we reach the end of the chunk, the remaining two bits that we haven't decoded must be padding and can be ignored. However, if this is just part of a stream, immediately followed by more base64 data, we can't tell that we're at the end of a chunk!

For example, let's try concatenating aGVsbG8 with itself, and decoding aGVsbG8aGVsbG8 as a single value.

a      G      V      s      b      G      8      a      G      V      s      b      G      8     
011010 000110 010101 101100 011011 000110 111100 011010 000110 010101 101100 011011 000110 111100
                                              ||- padding that should be ignored
01101000 01100101 01101100 01101100 01101111  00011010 00011001 01011011 00011011 00011011 1100????
h        e        l        l        o         \x1A     \x19     [        \x1B     \x1B     ?

The two padding bits cause the decoding stream to become misaligned, and the remaining data is mangled.
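To see the misalignment happen, here is a rough sketch of a decoder that treats its whole input as one continuous bit stream. `decodeAsOneStream` is a made-up name for illustration, not a library function, and it assumes the input contains only characters from the base64 alphabet.

```javascript
const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Hypothetical naive decoder: it has no notion of padding or chunk
// boundaries, so it just glues all 6-bit groups together.
function decodeAsOneStream(encoded) {
  let bits = '';
  for (const ch of encoded) {
    bits += ALPHABET.indexOf(ch).toString(2).padStart(6, '0');
  }
  let out = '';
  // Emit complete bytes only; leftover bits at the end are dropped.
  for (let i = 0; i + 8 <= bits.length; i += 8) {
    out += String.fromCharCode(parseInt(bits.slice(i, i + 8), 2));
  }
  return out;
}

// A single chunk decodes fine because the two filler bits fall off the end.
console.log(JSON.stringify(decodeAsOneStream('aGVsbG8')));
// Two chunks back to back: the filler bits of the first chunk shift
// everything after "hello" by two bits, so the second half is garbage.
console.log(JSON.stringify(decodeAsOneStream('aGVsbG8aGVsbG8')));
```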

In these cases, the standard solution is to add zero to two = padding characters after the encoded data. Each = represents six bits of padding data. These mark the end of an encoded value, but they also allow the alignment to be maintained between the input data and output data: with appropriate padding in the stream, every four-character chunk of encoded data can be unambiguously decoded into one to three bytes of decoded data, without separate knowledge of the data alignment. Our example requires six bits of padding to maintain alignment, giving us aGVsbG8=. If we concatenate that with itself, we can see that decoding is now successful:

a      G      V      s      b      G      8      =      a      G      V      s      b      G      8       =    
011010 000110 010101 101100 011011 000110 111100 PPPPPP 011010 000110 010101 101100 011011 000110 111100 PPPPPP
01101000 01100101 01101100 01101100 01101111  00PPPPPP 01101000 01100101 01101100 01101100 01101111  00PPPPPP
h        e        l        l        o         padding  h        e        l        l        o         padding  
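The diagram can be turned into a small padding-aware decoder that works on four-character groups. This is a sketch, not a production decoder: `decodePadded` is a hypothetical name, and it assumes well-formed input in which every chunk has been padded to a multiple of four characters.

```javascript
const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Hypothetical padding-aware decoder: each four-character group is decoded
// on its own, so "=" may appear in the middle of the input.
function decodePadded(encoded) {
  const bytes = [];
  for (let i = 0; i < encoded.length; i += 4) {
    // Drop the "=" characters; they carry no data.
    const data = encoded.slice(i, i + 4).replace(/=/g, '');
    let bits = '';
    for (const ch of data) {
      bits += ALPHABET.indexOf(ch).toString(2).padStart(6, '0');
    }
    // 4 data characters -> 3 bytes, 3 -> 2 bytes, 2 -> 1 byte.
    const byteCount = Math.floor(bits.length / 8);
    for (let b = 0; b < byteCount; b++) {
      bytes.push(parseInt(bits.slice(b * 8, b * 8 + 8), 2));
    }
  }
  return String.fromCharCode(...bytes);
}

console.log(decodePadded('aGVsbG8=aGVsbG8=')); // hellohello
```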

Problem: Incapable Decoders

With fully capable encoders and decoders, your approach should work fine. Each chunk should include the appropriate padding, and the decoder should be able to skip over it and assemble the correct result.

Unfortunately, a lot of the most common base64 decoding libraries do not support this.

Node's Buffer just assumes that it's getting a single encoded value, so when it sees padding (possibly at the end of the first chunk) it assumes it has reached the end of the value and stops decoding, discarding the rest of your data.

> Buffer.from('aGVsbG8=', 'base64')
<Buffer 68 65 6c 6c 6f>
> Buffer.from('aGVsbG8=aGVsbG8=', 'base64')
<Buffer 68 65 6c 6c 6f>

The browser's atob throws an error instead of silently ignoring data:

> atob("aGVsbG8=")
"hello"
> atob("aGVsbG8=aGVsbG8=")
InvalidCharacterError: String contains an invalid character

Solutions

Manually Split on Padding

If we keep your approach of storing all of the data in a single string, we need to take responsibility for splitting on padding ourselves. (NB: Generally, repeatedly appending to strings can be problematic because it can get very slow if the JavaScript engine fails to optimize it. It may not be a problem in practice here, but it's often avoided.)

We can do this using a regular expression that matches a sequence of one or more = padding characters,

const input = "aGVsbG8=aGVsbG8=aGVsbG8=aGVsbG8=";
const delimiter = /=+/g;

splitting the string on that,

const pieces = input.split(delimiter);

decoding the pieces individually,

const decodedPieces = pieces.map(piece => Buffer.from(piece, 'base64'));

and then combining their outputs in a single step (more efficient than doing it incrementally).

const decoded = Buffer.concat(decodedPieces);
console.log(decoded.toString('ascii')); // prints: hellohellohellohello
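For convenience, the same steps could be wrapped into one function. This is a sketch under the assumption that Node's Buffer (or a compatible polyfill) is available; `decodeConcatenated` is a hypothetical name, not part of any library.

```javascript
// Hypothetical helper: decode several base64 chunks that were appended
// into one string. Assumes Node's Buffer, which accepts unpadded input.
function decodeConcatenated(input) {
  const pieces = input
    .split(/=+/)                        // cut at every run of padding
    .filter(piece => piece.length > 0)  // drop the empty trailing piece
    .map(piece => Buffer.from(piece, 'base64'));
  return Buffer.concat(pieces);
}

console.log(decodeConcatenated('aGVsbG8=aGVsbG8=').toString('ascii')); // hellohello
```

A chunk that happens to need no padding simply stays joined to the chunk after it. That is harmless, because such a chunk ends on a byte boundary and therefore leaves the following data aligned.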

Store Chunks Separately

However, in your case it might be simpler to just store the chunks in an array from the beginning, and skip the concatenation and splitting altogether.

const decodedPieces = [];
AudioRecord.on('data', data => {
  decodedPieces.push(Buffer.from(data, 'base64'));
});

// later, when you need to collect the data...
const decoded = Buffer.concat(decodedPieces);
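To try this pattern outside the app, the recorder can be replaced with a stand-in. In the sketch below, `fakeRecorder` is a plain Node EventEmitter that only imitates the 'data' event; it is not the real AudioRecord object, and the three text chunks are made-up test data.

```javascript
const { EventEmitter } = require('events');

// Stand-in for the recorder (an assumption for this demo, NOT AudioRecord).
const fakeRecorder = new EventEmitter();

// Decode each chunk as it arrives and keep the raw bytes.
const decodedPieces = [];
fakeRecorder.on('data', data => {
  decodedPieces.push(Buffer.from(data, 'base64'));
});

// Simulate three separately encoded chunks arriving one after another.
for (const text of ['hello', ' ', 'world']) {
  fakeRecorder.emit('data', Buffer.from(text, 'ascii').toString('base64'));
}

// Combine everything once, after the last chunk has arrived.
const decoded = Buffer.concat(decodedPieces);
console.log(decoded.toString('ascii')); // hello world
```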
