
How does bup (git-based image backup) compute hashes of stored objects?

There is a backup program called bup ( https://github.com/bup/bup ) that borrows some ideas and some functions from the git version control system for compact storage of virtual machine images.

bup has a bup ls subcommand which, when the -s option is passed, shows SHA-1-like hashes (hex strings of the same length) for objects stored inside the backup. man bup-ls says only " -s, --hash : show hash for each file/directory. " However, this SHA-1-like hash is not equal to the sha1sum output for the original file.

Plain git computes the SHA-1 hash of a file's data by prefixing the data with the string `blob NNN\0', where NNN is the size of the object in bytes written in decimal, according to "How does git compute file hashes?" and https://stackoverflow.com/a/28881708/

I tried the `blob NNN\0' prefix and still did not get the same SHA-1 sum.

What method does bup use to compute the hash of a file? Is it a linear SHA-1 or some tree-like variant such as a Merkle tree? What is the hash of a directory?

The source of bup's ls command is https://github.com/bup/bup/blob/master/lib/bup/ls.py , and there the hash is simply printed in hex, but where is the hash generated?

def node_info(n, name, ...):
    # ... (excerpt, rest of the function elided)
    if show_hash:
        result += "%s " % n.hash.encode('hex')

Is that hash generated when the bup backup is created (when the file is placed into the backup by the bup index + bup save commands) and merely printed by bup ls, or is it recomputed on every bup ls, so that it can be used as an integrity test of the bup backup?

bup stores all data in a bare git repository (located at ~/.bup by default). Therefore bup's hash computation method exactly replicates the one used by git.
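As a point of reference, the git blob rule mentioned in the question (SHA-1 over `blob NNN\0' followed by the data) can be sketched in a few lines of Python. The function name git_blob_hash is made up for this illustration; it is not part of bup or git.

```python
import hashlib


def git_blob_hash(data: bytes) -> str:
    """Return the SHA-1 that git assigns to a blob holding `data`.

    git hashes the header b"blob <size>\\0" followed by the raw bytes,
    where <size> is the length of the data in decimal.
    """
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # Same value as: printf 'hello world\n' | git hash-object --stdin
    print(git_blob_hash(b"hello world\n"))
```

For a file that bup stores as a single blob, this value is the one that both git hash-object and bup ls -s are expected to report.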

However, an important difference from git is that bup may split files into chunks. If bup decides to split a file into chunks, then the file is represented in the repository as a tree rather than as a blob. In that case bup's hash of the file coincides with git's hash of the corresponding tree.
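To make "git's hash of the corresponding tree" concrete, here is a minimal Python sketch of how git hashes a tree object: each entry is encoded as mode, space, name, NUL byte, and the 20 raw bytes of the child's SHA-1, and the whole body is prefixed with `tree NNN\0'. The name git_tree_hash is hypothetical, and the sketch only handles trees of blobs (git sorts directory entries slightly differently). The ENTRIES values are copied from the tree listing in the output further down.

```python
import hashlib


def git_tree_hash(entries):
    """Return git's SHA-1 for a tree made of (mode, name, hex_sha) entries.

    Assumes every entry is a blob, so plain byte-wise sorting by name
    matches git's ordering.
    """
    body = b""
    for mode, name, hex_sha in sorted(entries, key=lambda e: e[1].encode()):
        # "<mode> <name>\0" followed by the child's SHA-1 as 20 raw bytes
        body += mode.encode() + b" " + name.encode() + b"\x00"
        body += bytes.fromhex(hex_sha)
    header = b"tree %d\x00" % len(body)
    return hashlib.sha1(header + body).hexdigest()


# Chunk tree as listed by `git cat-file -p` in the output below.
ENTRIES = [
    ("100644", "00", "aa7770f6a52237f29a5d10b350fe877bf4626bd6"),
    ("100644", "61", "d00491fd7e5bb6fa28c517a0bb32b8b506539d4d"),
]

if __name__ == "__main__":
    print(git_tree_hash(ENTRIES))
```

If that listing is complete, the printed value should be the hash that bup ls -s reports for the split file (c95b4a1f... in the output below).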

The following script demonstrates this:

bup_hash_test

#!/bin/bash

bup init
BUPTEST=/tmp/bup_test
# Back up $BUPTEST, then compare bup's hash for it with git's blob hash
# and print the object that bup stored under that hash.
function test_bup_hash()
{
    bup index "$BUPTEST" &> /dev/null
    bup save -n buptest "$BUPTEST" &> /dev/null
    local buphash=$(bup ls -s "buptest/latest$BUPTEST" | cut -d' ' -f 1)
    echo "bup's hash: $buphash"
    echo "git's hash: $(git hash-object "$BUPTEST")"
    echo git --git-dir \~/.bup cat-file -p "$buphash"
    git --git-dir ~/.bup cat-file -p "$buphash"
}

cat > $BUPTEST <<'END'
    http://pkgsrc.se/sysutils/bup
    http://cvsweb.netbsd.org/bsdweb.cgi/pkgsrc/sysutils/bup/
END

test_bup_hash

echo
echo

echo " -1" >> $BUPTEST

echo "After appending ' -1' line:"
test_bup_hash

echo
echo

echo "After replacing '-' with '#':"
sed -i 's/-/#/' $BUPTEST
test_bup_hash

Output:

$ ./bup_hash_test
Initialized empty Git repository in ~/.bup/
bup's hash: b52baef90c17a508115ce05680bbb91d1d7bfd8d
git's hash: b52baef90c17a508115ce05680bbb91d1d7bfd8d
git --git-dir ~/.bup cat-file -p b52baef90c17a508115ce05680bbb91d1d7bfd8d
    http://pkgsrc.se/sysutils/bup
    http://cvsweb.netbsd.org/bsdweb.cgi/pkgsrc/sysutils/bup/


After appending ' -1' line:
bup's hash: c95b4a1fe1956418cb0e58e0a2c519622d8ce767
git's hash: b5bc4094328634ce6e2f4c41458514bab5f5cd7e
git --git-dir ~/.bup cat-file -p c95b4a1fe1956418cb0e58e0a2c519622d8ce767
100644 blob aa7770f6a52237f29a5d10b350fe877bf4626bd6    00
100644 blob d00491fd7e5bb6fa28c517a0bb32b8b506539d4d    61


After replacing '-' with '#':
bup's hash: cda9a69f1cbe66ff44ea6530330e51528563e32a
git's hash: cda9a69f1cbe66ff44ea6530330e51528563e32a
git --git-dir ~/.bup cat-file -p cda9a69f1cbe66ff44ea6530330e51528563e32a
    http://pkgsrc.se/sysutils/bup
    http://cvsweb.netbsd.org/bsdweb.cgi/pkgsrc/sysutils/bup/
 #1

As we can see, when bup's and git's hashes match, the corresponding object in the bup repository is a blob with the expected contents. When bup's and git's hashes do NOT match, the object with bup's hash is a tree. The contents of the blobs in that tree correspond to fragments of the full file:

$ git --git-dir ~/.bup cat-file -p aa7770f6a52237f29a5d10b350fe877bf4626bd6
    http://pkgsrc.se/sysutils/bup
    http://cvsweb.netbsd.org/bsdweb.cgi/pkgsrc/sysutils/bup/
 -$ git --git-dir ~/.bup cat-file -p d00491fd7e5bb6fa28c517a0bb32b8b506539d4d
1
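The split can be cross-checked without a bup repository. The sketch below is an illustration under two assumptions: the file contents are as written by the script (including the four leading spaces shown by cat-file), and the tree entry names 00 and 61 are hexadecimal byte offsets of the chunks, which is consistent with the first chunk being 0x61 = 97 bytes long. The helper names are hypothetical.

```python
import hashlib


def git_blob_hash(data: bytes) -> str:
    """SHA-1 of b"blob <size>\\0" + data, as git computes it for blobs."""
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()


def split_at_hex_offsets(data: bytes, names):
    """Cut `data` at the byte offsets encoded in hex by the chunk names."""
    offsets = [int(name, 16) for name in names] + [len(data)]
    return [data[start:end] for start, end in zip(offsets, offsets[1:])]


# Assumed file contents after the " -1" line was appended.
DATA = (
    b"    http://pkgsrc.se/sysutils/bup\n"
    b"    http://cvsweb.netbsd.org/bsdweb.cgi/pkgsrc/sysutils/bup/\n"
    b" -1\n"
)

chunks = split_at_hex_offsets(DATA, ["00", "61"])

if __name__ == "__main__":
    for chunk in chunks:
        # Compare with the blob hashes in the tree listing above.
        print(git_blob_hash(chunk), repr(chunk))
```

The second chunk comes out as "1" plus a newline, whose blob hash is d00491fd7e5bb6fa28c517a0bb32b8b506539d4d, the same as the second entry of the tree. Concatenating the chunks in name order gives back the original file.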
