
Streaming File Delta Encoding/Decoding

Here's the problem: I want to generate the delta of a binary file (> 1 MB in size) on a server and send the delta to a memory-constrained (low on RAM and no dynamic memory) embedded device over HTTP. Deltas are preferred (as opposed to sending the full binary file from the server) because of the high cost involved in transmitting data over the wire.

Trouble is, the embedded device does not have enough memory to decode a delta and build the contents of the new file in RAM. I have looked into various binary delta encoding/decoding algorithms like bsdiff, VCDiff, etc., but was unable to find libraries that support streaming.

Perhaps, rather than asking only whether suitable libraries exist, I should ask: are there alternative approaches I can take that still solve the original problem (send minimal data over the wire)? It would certainly help, though, if there are suitable delta libraries that support streaming decode (written in C or C++ without using dynamic memory).

Maintain a copy on the server of the current file as held by the embedded device. When you want to send an update, XOR the new version of the file with the old version and compress the resultant stream with any sensible compressor. (Algorithms that accept high-cost encoding in exchange for low-cost decoding would be particularly helpful here.) Send the compressed stream to the embedded device, which reads the stream, decompresses it on the fly, and XORs it directly into (a copy of) the target file.

If your updates are such that the file content changes little over time and retains a fixed structure, the XOR stream will be predominantly zeroes and will compress extremely well: the number of bytes transmitted will be small, the effort to decompress will be low, and the memory requirements on the embedded device will be minimal. The further your model is from these assumptions, the less this approach will gain you.
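The device side of this XOR scheme can be sketched roughly as follows. This is only an illustration under stated assumptions: the old and new files have the same length, the target is directly writable memory, and the decompressor (not shown) hands over decoded chunks of any size. The names `xor_patcher`, `xor_patcher_init`, and `xor_patcher_feed` are hypothetical, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical patch state: no heap, just a cursor into the target. */
typedef struct {
    uint8_t *target;     /* old file contents, patched in place */
    size_t   target_len; /* total size of the target in bytes */
    size_t   offset;     /* next byte of the target to patch */
} xor_patcher;

void xor_patcher_init(xor_patcher *p, uint8_t *target, size_t target_len)
{
    assert(p != NULL && target != NULL);
    p->target = target;
    p->target_len = target_len;
    p->offset = 0;
}

/* Apply one decompressed chunk of the XOR stream to the target.
 * Returns the number of bytes consumed, which is smaller than len
 * only when the end of the target has been reached. */
size_t xor_patcher_feed(xor_patcher *p, const uint8_t *chunk, size_t len)
{
    assert(p != NULL && chunk != NULL);
    size_t room = p->target_len - p->offset;
    size_t n = len < room ? len : room;
    for (size_t i = 0; i < n; i++) {
        /* old ^ (old ^ new) == new, so the target becomes the new file */
        p->target[p->offset + i] ^= chunk[i];
    }
    p->offset += n;
    return n;
}
```

Because each chunk is consumed as it arrives, the only RAM needed beyond the target itself is the decompressor's working buffer and this small struct.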

Since you said the delta could be arbitrarily random (from zero delta to a completely different file), compression of the delta may be a lost cause. No lossless compressor can shrink uniformly random binary data on average. Also, since the embedded device has limited memory anyway, using a sophisticated (and therefore computationally expensive) library for compression/decompression of the occasional "simple" delta will probably be infeasible.

I would recommend simply sending the new file to the device in raw byte format and overwriting the existing old file.

As Kevin mentioned, compressing random data should not be your goal. A few more comments about the type of data you're working with would be helpful. Context is key in compression.

You used the term image, which makes it sound like the classic video codec challenge. If you've ever seen weird video aliasing effects that impact the portion of the frame that has changed, and then suddenly everything clears up, you've likely witnessed a key frame arriving after a series of delta frames that were not properly applied.

In this model, the server decides which is cheaper to send:

  • complete key frame
  • delta commands

The delta commands are communicated as a series of write instructions that can overlay the client's existing buffer.

Example Format:

  • [Address][Length][Repeat][Delta Payload]
  • [Address][Length][Repeat][Delta Payload]
  • [Address][Length][Repeat][Delta Payload]
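A streaming decoder for commands of this shape might look something like the sketch below. The field widths and semantics are assumptions made for illustration, since the format above does not specify them: a 4-byte little-endian Address, a 2-byte Length, a 2-byte Repeat meaning "write the payload this many times back to back" (at least 1), followed by Length payload bytes. The names `delta_decoder`, `delta_decoder_init`, and `delta_decoder_feed` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { HEADER_SIZE = 8 }; /* assumed: 4 (address) + 2 (length) + 2 (repeat) */

typedef struct {
    uint8_t *target;              /* client's existing buffer */
    size_t   target_len;
    uint8_t  header[HEADER_SIZE]; /* header bytes collected so far */
    size_t   header_have;
    uint32_t address;
    uint16_t length;
    uint16_t repeat;
    uint16_t payload_have;        /* payload bytes written so far */
} delta_decoder;

void delta_decoder_init(delta_decoder *d, uint8_t *target, size_t target_len)
{
    assert(d != NULL && target != NULL);
    d->target = target;
    d->target_len = target_len;
    d->header_have = 0;
    d->address = 0;
    d->length = 0;
    d->repeat = 0;
    d->payload_have = 0;
}

/* Feed any number of stream bytes; commands may span calls.
 * Returns 0 on success, -1 on a malformed or out-of-range command
 * (the caller should then abandon the update). */
int delta_decoder_feed(delta_decoder *d, const uint8_t *data, size_t len)
{
    assert(d != NULL && data != NULL);
    for (size_t i = 0; i < len; i++) {
        uint8_t b = data[i];
        if (d->header_have < HEADER_SIZE) {
            d->header[d->header_have++] = b;
            if (d->header_have < HEADER_SIZE) {
                continue;
            }
            const uint8_t *h = d->header;
            d->address = (uint32_t)h[0] | ((uint32_t)h[1] << 8) |
                         ((uint32_t)h[2] << 16) | ((uint32_t)h[3] << 24);
            d->length = (uint16_t)(h[4] | (h[5] << 8));
            d->repeat = (uint16_t)(h[6] | (h[7] << 8));
            d->payload_have = 0;
            uint64_t end = (uint64_t)d->address +
                           (uint64_t)d->length * d->repeat;
            if (d->length == 0 || d->repeat == 0 || end > d->target_len) {
                d->header_have = 0;
                return -1;
            }
            continue;
        }
        /* Payload bytes go straight into the target: no staging buffer. */
        d->target[(size_t)d->address + d->payload_have++] = b;
        if (d->payload_have == d->length) {
            /* Extra repeats are copied from the first written instance. */
            for (size_t r = 1; r < d->repeat; r++) {
                for (size_t k = 0; k < d->length; k++) {
                    d->target[(size_t)d->address + r * d->length + k] =
                        d->target[(size_t)d->address + k];
                }
            }
            d->header_have = 0; /* ready for the next command */
        }
    }
    return 0;
}
```

Writing payload bytes directly into the target and replicating repeats from there keeps the decoder's state to a few dozen bytes, which suits a device with no dynamic memory.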

There are likely a variety of methods for computing these delta commands. A brute-force method would be:

  • Perform a Smith-Waterman alignment between the two images.
  • Compress the resulting transform into delta commands.
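As a much simpler stand-in for the alignment step (this is a plain byte-by-byte run scan, not Smith-Waterman, and it only handles images of equal length with no inserted or deleted bytes), the server could locate runs of differing bytes and estimate whether delta commands are cheaper than a key frame. The function names and the per-command `header_size` parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Find the next run of differing bytes at or after `start`.
 * Returns 1 and fills *run_start / *run_len if one exists, else 0. */
int next_diff_run(const uint8_t *old_img, const uint8_t *new_img, size_t len,
                  size_t start, size_t *run_start, size_t *run_len)
{
    assert(old_img != NULL && new_img != NULL);
    assert(run_start != NULL && run_len != NULL);
    size_t i = start;
    while (i < len && old_img[i] == new_img[i]) {
        i++;
    }
    if (i >= len) {
        return 0;
    }
    size_t j = i;
    while (j < len && old_img[j] != new_img[j]) {
        j++;
    }
    *run_start = i;
    *run_len = j - i;
    return 1;
}

/* Bytes needed if every differing run becomes one write command
 * consisting of a fixed-size header plus the run's new bytes. */
size_t delta_encoded_size(const uint8_t *old_img, const uint8_t *new_img,
                          size_t len, size_t header_size)
{
    size_t total = 0, pos = 0, run_start = 0, run_len = 0;
    while (next_diff_run(old_img, new_img, len, pos, &run_start, &run_len)) {
        total += header_size + run_len;
        pos = run_start + run_len;
    }
    return total;
}

/* Server-side choice: 1 if delta commands beat sending the full image. */
int delta_is_cheaper(const uint8_t *old_img, const uint8_t *new_img,
                     size_t len, size_t header_size)
{
    return delta_encoded_size(old_img, new_img, len, header_size) < len;
}
```

Each run found by `next_diff_run` maps onto one write command, so the same loop that estimates the size could also emit the commands; a real encoder would likely merge runs separated by only a few unchanged bytes to save on headers.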
