
Git Packfile Entry Offset Calculations

I'm trying to interpret a packfile received from git-upload-pack. git-upload-pack doesn't send the accompanying index, because supposedly you can derive it from the original packfile, but I can't figure out how that's possible given the packfile's format.
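For reference, the only fixed structure is the 12-byte header at the front of the pack; everything after it is the sequence of entries the index has to be reconstructed from. A minimal sketch of reading that header (readPackHeader is my own naming):

```go
package pack

import (
	"encoding/binary"
	"fmt"
	"io"
)

// readPackHeader parses the fixed 12-byte header that starts every
// packfile: the magic bytes "PACK", then a big-endian version number
// (2 or 3) and a big-endian object count.
func readPackHeader(r io.Reader) (version, objects uint32, err error) {
	var hdr [12]byte
	if _, err = io.ReadFull(r, hdr[:]); err != nil {
		return 0, 0, err
	}
	if string(hdr[:4]) != "PACK" {
		return 0, 0, fmt.Errorf("not a packfile: %q", hdr[:4])
	}
	return binary.BigEndian.Uint32(hdr[4:8]), binary.BigEndian.Uint32(hdr[8:12]), nil
}
```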

The git technical documentation says each entry starts with a variable number of bytes indicating the entry size, but this is the uncompressed size of the entry, and the entry data itself is stored in the packfile zlib-compressed. Go's zlib implementation is greedy and reads past the end of the compressed data in the io.Reader I give it, meaning I can't trust it to leave the io.Reader pointer at the right place after decompressing the block.
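For concreteness, here's a sketch of decoding that variable-length entry header, following the format in the pack documentation (parseEntryHeader is my own naming). Note that the size it yields is the uncompressed size, which is exactly why it doesn't tell you where the next entry starts:

```go
package pack

import "io"

// parseEntryHeader decodes the variable-length header that precedes
// each compressed entry. Bits 6-4 of the first byte are the object
// type, its low 4 bits are the low bits of the uncompressed size, and
// every byte with the high bit set says another byte follows, each
// contributing 7 more size bits.
func parseEntryHeader(r io.ByteReader) (objType byte, size uint64, err error) {
	b, err := r.ReadByte()
	if err != nil {
		return 0, 0, err
	}
	objType = (b >> 4) & 0x07
	size = uint64(b & 0x0F)
	for shift := uint(4); b&0x80 != 0; shift += 7 {
		if b, err = r.ReadByte(); err != nil {
			return 0, 0, err
		}
		size |= uint64(b&0x7F) << shift
	}
	return objType, size, nil
}
```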

My first thought was to take a bookmark before reading the compressed block from the packfile with compress/zlib, reset to the bookmark after reading, recompress the uncompressed data with the same algorithm and compression level so that I'd know the length of the compressed data, and then seek forward that far to reach the correct offset for the next block.
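A sketch of that bookmark-and-recompress idea, assuming an io.ReadSeeker positioned at the start of a compressed entry (skipCompressedEntry is my own naming):

```go
package pack

import (
	"bytes"
	"compress/zlib"
	"io"
)

// skipCompressedEntry is the bookmark-and-recompress idea: remember
// where the compressed block started, decompress it (letting the zlib
// reader over-read), recompress the result, and seek to the bookmark
// plus the recompressed length. This only lands on the next entry if
// the recompressed stream is byte-identical to the original, which
// turned out not to hold.
func skipCompressedEntry(rs io.ReadSeeker) error {
	start, err := rs.Seek(0, io.SeekCurrent) // bookmark
	if err != nil {
		return err
	}
	zr, err := zlib.NewReader(rs)
	if err != nil {
		return err
	}
	raw, err := io.ReadAll(zr)
	if err != nil {
		return err
	}
	zr.Close()

	// Recompress with the default level to estimate the compressed length.
	var buf bytes.Buffer
	zw, err := zlib.NewWriterLevel(&buf, zlib.DefaultCompression)
	if err != nil {
		return err
	}
	zw.Write(raw)
	zw.Close()

	// Seek to where the next entry would begin if the lengths matched.
	_, err = rs.Seek(start+int64(buf.Len()), io.SeekStart)
	return err
}
```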

However, the recompressed data doesn't seem to be identical to the original compressed data. Why would the same data compressed with the same algorithm produce different results? And is there a better way to calculate the offsets of entries in a git packfile?

I've solved my problem in a different way: I modified compress/zlib to expose the digest from the zlib reader. After decompressing, I seek backwards in the original io.ReadSeeker to find the 4-byte Adler-32 digest that terminates the zlib stream, so that I know where the end of the compressed data was.
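A sketch of the same idea that avoids patching the standard library: the digest a modified compress/zlib would expose can be recomputed with hash/adler32, since the 4 trailing bytes of a zlib stream are the big-endian Adler-32 of the uncompressed data (RFC 1950). findStreamEnd is my own naming, and it scans forward from the entry's start rather than backwards, so treat it as a heuristic: the 4-byte pattern could in principle also occur inside the compressed bytes themselves.

```go
package pack

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"hash/adler32"
	"io"
)

// findStreamEnd recomputes the Adler-32 of the decompressed bytes and
// scans the raw pack data from the entry's start offset for the
// big-endian 4-byte trailer that ends every zlib stream (RFC 1950).
// It returns the offset just past the trailer, i.e. the next entry.
func findStreamEnd(rs io.ReadSeeker, start int64, decompressed []byte) (int64, error) {
	var trailer [4]byte
	binary.BigEndian.PutUint32(trailer[:], adler32.Checksum(decompressed))

	if _, err := rs.Seek(start, io.SeekStart); err != nil {
		return 0, err
	}
	raw, err := io.ReadAll(rs) // fine for a sketch; bound this in real code
	if err != nil {
		return 0, err
	}
	i := bytes.Index(raw, trailer[:])
	if i < 0 {
		return 0, fmt.Errorf("adler32 trailer %x not found", trailer)
	}
	return start + int64(i) + 4, nil
}
```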

I still don't have an answer for why git's zlib and Go's compress/zlib would produce different results at the same compression level, though.
