Decompression and extraction of files from streaming archive on the fly

I'm writing a browser plugin, similar to Flash and Java in that it starts downloading a file (.jar or .swf) as soon as it gets displayed. Java waits (I believe) until the entire jar file is loaded, but Flash does not. I want the same ability, but with a compressed archive file. I would like to access files in the archive as soon as the bytes necessary for their decompression are downloaded.

For example, I'm downloading the archive into a memory buffer, and as soon as the first file can be decompressed, I want to be able to decompress it (also to a memory buffer).

Are there any formats/libraries that support this?

EDIT: If possible, I'd prefer a single file format instead of separate ones for compression and archiving, like gz/bzip2 and tar.

There are 2 issues here:

  1. How to write the code.

  2. What format to use.

On the file format: you can't use the .ZIP format, because .ZIP puts the table of contents at the end of the file. That means you'd have to download the entire file before you can know what's in it. Zip has headers you can scan for, but those headers are not the official list of what's in the file.

Zip explicitly puts the table of contents at the end because that allows adding files quickly.

Assume you have a zip file that contains files 'a', 'b', and 'c'. You want to update 'c'. It's perfectly valid in zip to read the table of contents, append the new 'c', and write a new table of contents pointing to the new 'c', while the old 'c' is still in the file. If you scan for headers you'll end up seeing the old 'c', since it's still in the file.
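To make the stale-entry problem concrete, here is a minimal C++ sketch that builds such a file in memory and then walks its local file headers from the front, the way a streaming reader would have to. The names `appendStoredEntry` and `scanLocalHeaders` are hypothetical helpers, not part of any zip library. The sketch assumes entries are stored uncompressed, leaves the CRC at zero, and omits the table of contents (central directory) entirely, so it only illustrates what a header scan sees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

const std::uint32_t kLocalHeaderSig = 0x04034b50u;  // "PK\x03\x04", little-endian
const std::size_t kLocalHeaderLen = 30;             // fixed part of a local header

// Append the low n bytes of v in little-endian order.
static void putLE(Bytes& out, std::uint32_t v, int n) {
    assert(n >= 1 && n <= 4);
    for (int i = 0; i < n; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Read an n-byte little-endian integer starting at pos.
static std::uint32_t getLE(const Bytes& b, std::size_t pos, int n) {
    std::uint32_t v = 0;
    for (int i = 0; i < n; ++i) v |= static_cast<std::uint32_t>(b[pos + i]) << (8 * i);
    return v;
}

// Hypothetical helper: append one stored (uncompressed) entry.
// The CRC is left at zero because this sketch never verifies it.
void appendStoredEntry(Bytes& zip, const std::string& name, const std::string& data) {
    putLE(zip, kLocalHeaderSig, 4);
    putLE(zip, 20, 2);                                      // version needed
    putLE(zip, 0, 2);                                       // flags
    putLE(zip, 0, 2);                                       // method 0 = stored
    putLE(zip, 0, 2);                                       // mod time
    putLE(zip, 0, 2);                                       // mod date
    putLE(zip, 0, 4);                                       // crc-32 (not computed)
    putLE(zip, static_cast<std::uint32_t>(data.size()), 4); // compressed size
    putLE(zip, static_cast<std::uint32_t>(data.size()), 4); // uncompressed size
    putLE(zip, static_cast<std::uint32_t>(name.size()), 2); // file name length
    putLE(zip, 0, 2);                                       // extra field length
    zip.insert(zip.end(), name.begin(), name.end());
    zip.insert(zip.end(), data.begin(), data.end());
}

struct Entry {
    std::string name;
    std::string data;
};

// Walk local headers from the start of the buffer, without any table of contents.
std::vector<Entry> scanLocalHeaders(const Bytes& zip) {
    std::vector<Entry> found;
    std::size_t pos = 0;
    while (pos + kLocalHeaderLen <= zip.size() && getLE(zip, pos, 4) == kLocalHeaderSig) {
        std::size_t dataLen = getLE(zip, pos + 18, 4);
        std::size_t nameLen = getLE(zip, pos + 26, 2);
        std::size_t extraLen = getLE(zip, pos + 28, 2);
        std::size_t nameAt = pos + kLocalHeaderLen;
        std::size_t dataAt = nameAt + nameLen + extraLen;
        if (dataAt + dataLen > zip.size()) break;  // entry not fully present yet
        found.push_back({std::string(zip.begin() + nameAt, zip.begin() + nameAt + nameLen),
                         std::string(zip.begin() + dataAt, zip.begin() + dataAt + dataLen)});
        pos = dataAt + dataLen;
    }
    return found;
}
```

Appending 'a', 'b', 'c' and then an updated 'c' and scanning the result yields four entries, with both versions of 'c' present. Nothing in the headers alone says which 'c' is current; only the table of contents at the end does.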

This feature of appending was an explicit design goal of zip. It comes from the 1980s, when a zip could span multiple floppy discs. If you needed to add a file, it would suck to have to read all N discs just to rewrite the entire zip file. So instead the format just lets you append updated files to the end, which means it only needs the last disc. It just reads the old TOC, appends the new files, and writes a new TOC.

Gzipped tar files don't have this problem. Tar files are stored as header, file, header, file, and the compression is applied on top of that, so it's possible to decompress the file as it's downloaded and use the files as they become available. You can create gzipped tar files easily on Windows using WinRAR (commercial) or 7-Zip (free), and on Linux, OS X and Cygwin using the tar command.
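As a rough illustration of why the header, file, header, file layout streams well, here is a minimal C++ sketch of a reader that is fed already-decompressed bytes chunk by chunk and hands each file to a callback as soon as its last block has arrived. `TarStreamReader` and its methods are hypothetical names, not part of any library. It assumes plain ustar-style headers (name at offset 0, octal size at offset 124, type flag at offset 156) and skips checksum verification, the prefix field, and long-name extensions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical streaming reader for uncompressed tar data.
class TarStreamReader {
public:
    using FileCallback = std::function<void(const std::string& name, const std::string& data)>;

    explicit TarStreamReader(FileCallback onFile) : onFile_(std::move(onFile)) {}

    // Feed the next chunk of tar bytes; complete files are reported immediately.
    void feed(const char* bytes, std::size_t n) {
        assert(n == 0 || bytes != nullptr);
        buffer_.append(bytes, n);
        while (!done_ && buffer_.size() >= kBlock) {
            // An all-zero header block marks the end of the archive.
            if (buffer_.find_first_not_of('\0') >= kBlock) {
                done_ = true;
                break;
            }
            std::size_t size = parseOctal(buffer_.substr(124, 12));
            std::size_t padded = (size + kBlock - 1) / kBlock * kBlock;
            if (buffer_.size() < kBlock + padded) break;  // wait for more bytes

            std::string name = buffer_.substr(0, 100);
            std::size_t nul = name.find('\0');
            if (nul != std::string::npos) name.resize(nul);

            char type = buffer_[156];
            if (type == '0' || type == '\0') {  // regular file
                onFile_(name, buffer_.substr(kBlock, size));
            }
            buffer_.erase(0, kBlock + padded);  // drop header, data, and padding
        }
    }

    bool finished() const { return done_; }

private:
    static const std::size_t kBlock = 512;

    // Size fields are octal text, padded with spaces or NULs.
    static std::size_t parseOctal(const std::string& field) {
        std::size_t value = 0;
        for (char c : field) {
            if (c == ' ' && value == 0) continue;
            if (c < '0' || c > '7') break;
            value = value * 8 + static_cast<std::size_t>(c - '0');
        }
        return value;
    }

    FileCallback onFile_;
    std::string buffer_;
    bool done_ = false;
};
```

In a plugin, the output of the decompressor would be passed to `feed` as each network chunk arrives, and the callback would fire for the first file long before the rest of the archive has been downloaded. Buffering a whole file before reporting it keeps the sketch short; large files could instead be forwarded block by block.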

On the code to write:

O3D does this and is open source, so you can look at the code: http://o3d.googlecode.com

The decompression code is in o3d/import/cross/...

It targets NPAPI using some glue, which can be found in o3d/plugin/cross

Check out the Boost.Iostreams zlib filters. They make using zlib a snap.

Here's the sample from the Boost docs that will decompress a file and write it to the console:

#include <fstream>
#include <iostream>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/zlib.hpp>

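int main()
{
    using namespace std;
    using namespace boost::iostreams;  // filtering_streambuf, input, zlib_decompressor

    // Open the zlib-compressed input in binary mode.
    ifstream file("hello.z", ios_base::in | ios_base::binary);
    filtering_streambuf<input> in;
    in.push(zlib_decompressor());  // filter: inflates the bytes passing through
    in.push(file);                 // source: the compressed file
    boost::iostreams::copy(in, cout);
}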

Sure, zlib for example uses z_stream for incremental compression and decompression via the functions inflateInit, inflate, deflateInit, and deflate. libbzip2 has similar abilities.

For incremental extraction from the archive (as it gets inflated), look e.g. at the good old tar format.
