
Parse bytes of a zip file?

Original post: 2015-12-08 22:54:18. Tags: node.js, parsing, zip, compression, adm-zip

I am requesting a zip file from an API, and I'm trying to retrieve it by byte range (setting a Range header) and then parse each of the parts individually. After reading a bit about gzip and zip compression, I'm having a hard time figuring out:

Can I parse a portion out of a zip file?

I know that a gzip file usually compresses a single file, so you can decompress and parse it in parts, but what about zip files?

I am using Node.js and tried several libraries, such as adm-zip and zlib, but they don't appear to allow this.

1 answer

Zip files have a catalog (the central directory) at the end of the file, in addition to a local header carrying the same basic information before each item. The catalog lists each item's file name and its location in the zip file. With range requests, that means fetching the tail of the file first to read the catalog. Items are generally compressed with deflate, the same algorithm gzip uses; gzip simply puts its own header before the deflate stream.
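To make the catalog layout concrete, here is a minimal sketch of reading it from a Buffer in Node.js. The function names `findEndOfCentralDirectory` and `readCentralDirectory` are hypothetical, and the sketch assumes a plain archive (no ZIP64, no encryption, single disk) with UTF-8 or ASCII file names.

```javascript
// Sketch: list the entries recorded in a zip file's end-of-file catalog.
// Assumes a plain archive: no ZIP64, no encryption, not split across disks.
const EOCD_SIG = 0x06054b50; // end of central directory record
const CDFH_SIG = 0x02014b50; // central directory file header

// `buf` holds the tail of the archive (or the whole archive).
function findEndOfCentralDirectory(buf) {
  // The record is 22 bytes plus an optional comment, so scan backwards.
  for (let i = buf.length - 22; i >= 0; i--) {
    if (buf.readUInt32LE(i) === EOCD_SIG) {
      return {
        entryCount: buf.readUInt16LE(i + 10),
        directorySize: buf.readUInt32LE(i + 12),
        directoryOffset: buf.readUInt32LE(i + 16), // from the start of the archive
      };
    }
  }
  throw new Error('end of central directory record not found');
}

// `dir` must begin at the first central directory header.
function readCentralDirectory(dir, entryCount) {
  const entries = [];
  let pos = 0;
  for (let n = 0; n < entryCount; n++) {
    if (dir.readUInt32LE(pos) !== CDFH_SIG) throw new Error('bad directory entry');
    const nameLen = dir.readUInt16LE(pos + 28);
    const extraLen = dir.readUInt16LE(pos + 30);
    const commentLen = dir.readUInt16LE(pos + 32);
    entries.push({
      name: dir.toString('utf8', pos + 46, pos + 46 + nameLen),
      method: dir.readUInt16LE(pos + 10), // 0 = stored, 8 = deflate
      crc32: dir.readUInt32LE(pos + 16),
      compressedSize: dir.readUInt32LE(pos + 20),
      uncompressedSize: dir.readUInt32LE(pos + 24),
      localHeaderOffset: dir.readUInt32LE(pos + 42),
    });
    pos += 46 + nameLen + extraLen + commentLen;
  }
  return entries;
}
```

With ranged downloads, one plausible flow is to request the last few kilobytes, call `findEndOfCentralDirectory`, then request `directorySize` bytes starting at `directoryOffset` and pass them to `readCentralDirectory`.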

So yes, it's entirely feasible to extract the compressed byte stream for one item in a zip file and wrap it in a fabricated gzip container, so that you can decompress just that item by passing it to gunzip. The minimum gzip header is 10 bytes, and gunzip also expects an 8-byte trailer after the deflate stream holding the CRC-32 and the uncompressed size, both of which the zip catalog records for each item.
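The wrapping step might look like the following sketch. `wrapDeflateAsGzip` is a hypothetical helper name; it assumes `deflated` is the item's raw deflate data (compression method 8) and that `crc32` and `uncompressedSize` come from the zip catalog.

```javascript
const zlib = require('node:zlib');

// Sketch: wrap one zip item's raw deflate bytes in a minimal gzip container.
// `crc32` and `uncompressedSize` are the values the zip catalog records for the item.
function wrapDeflateAsGzip(deflated, crc32, uncompressedSize) {
  const header = Buffer.from([
    0x1f, 0x8b, // gzip magic number
    0x08,       // compression method: deflate
    0x00,       // flags: no optional fields
    0, 0, 0, 0, // modification time: not available
    0x00,       // extra flags
    0xff,       // operating system: unknown
  ]);
  const trailer = Buffer.alloc(8);
  trailer.writeUInt32LE(crc32 >>> 0, 0);            // CRC-32 of the uncompressed data
  trailer.writeUInt32LE(uncompressedSize >>> 0, 4); // size modulo 2^32
  return Buffer.concat([header, deflated, trailer]);
}

// Usage, with `entry` taken from the catalog and `deflated` sliced from the archive:
// const original = zlib.gunzipSync(
//   wrapDeflateAsGzip(deflated, entry.crc32, entry.uncompressedSize));
```

The result can also be written to disk and handed to the gunzip command-line tool, which checks the trailer against the decompressed data.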

If you want to write code to inflate the deflate stream yourself, I recommend making a different plan. I've done it, and it's really not fun. If you must decompress in your own code, use zlib; don't try to reimplement the decompression.
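In Node.js, the built-in zlib module can inflate a raw deflate stream directly, which may be simpler than fabricating a gzip container. The sketch below is one way to do that; `inflateEntry` is a hypothetical name, `chunk` is assumed to start at the item's local file header, and `entry` is assumed to carry `method` and `compressedSize` from the catalog.

```javascript
const zlib = require('node:zlib');

// Sketch: decompress one item given bytes that begin at its local file header.
function inflateEntry(chunk, entry) {
  if (chunk.readUInt32LE(0) !== 0x04034b50) {
    throw new Error('not a local file header');
  }
  // The name and extra-field lengths here can differ from the catalog's copy,
  // so read them from the local header itself.
  const nameLen = chunk.readUInt16LE(26);
  const extraLen = chunk.readUInt16LE(28);
  const start = 30 + nameLen + extraLen;
  const data = chunk.subarray(start, start + entry.compressedSize);
  if (entry.method === 0) return Buffer.from(data);         // stored, no compression
  if (entry.method === 8) return zlib.inflateRawSync(data); // deflate
  throw new Error(`unsupported compression method ${entry.method}`);
}
```

Because the local header's variable-length fields are not known in advance, a ranged request for one item probably needs to start at the item's local header offset and include some slack beyond `30 + compressedSize` bytes.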