
Multithreaded binary diff tool?

There are a lot of binary diff tools out there:

and so on. They are great, but single-threaded. Is it possible to split large files into chunks, find the diffs between chunks simultaneously, and then merge them into the final delta? Are there any other tools or libraries that can find the delta between very large files (hundreds of GB) in a reasonable amount of time and RAM? Maybe I could implement the algorithm myself, but I cannot find any papers about it.
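
For what it's worth, here is a minimal Python sketch of that chunk-and-merge idea, assuming both files are split at the same fixed offsets and each pair of chunks is diffed independently with the bsdiff4 bindings; the chunk size and the length-prefixed container format are illustrative assumptions, not an existing tool. The main weakness: aligned chunks cannot match data that shifts across a chunk boundary, so the merged delta can be far larger than a whole-file diff.

import os
import struct
from concurrent.futures import ProcessPoolExecutor

import bsdiff4  # pip install bsdiff4

CHUNK = 64 * 1024 * 1024  # 64 MiB per chunk; arbitrary choice

def read_chunk(path, index):
    with open(path, "rb") as f:
        f.seek(index * CHUNK)
        return f.read(CHUNK)

def diff_chunk(args):
    old_path, new_path, index = args
    # each worker process diffs one aligned pair of chunks
    return bsdiff4.diff(read_chunk(old_path, index),
                        read_chunk(new_path, index))

def diff_file(old_path, new_path, delta_path, workers=8):
    size = max(os.path.getsize(old_path), os.path.getsize(new_path))
    jobs = [(old_path, new_path, i) for i in range((size + CHUNK - 1) // CHUNK)]
    with ProcessPoolExecutor(workers) as pool, open(delta_path, "wb") as out:
        for patch in pool.map(diff_chunk, jobs):  # results arrive in chunk order
            out.write(struct.pack("<Q", len(patch)))  # 8-byte length prefix
            out.write(patch)

Applying such a delta would just walk the length-prefixed records and call bsdiff4.patch on each chunk in turn.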

ECMerge is multithreaded and is able to compare large files.

libraries to find the delta between very large files (hundreds of GB) in a reasonable amount of time and RAM?

Try HDiffPatch; it has been used on a 50GB game (not tested at 100GB): https://github.com/sisong/HDiffPatch
It runs fast on large files, but the differ is not multi-threaded.
Creating a patch: hdiffz -s-1k -c-zlib old_path new_path out_delta_file
Applying a patch: hpatchz old_path delta_file out_new_path
Diffing with -s-1k on 100GB inputs requires about 100GB * 16 / 1k ≈ 1.6GB of memory; diffing with -s-128k takes less time and less memory.
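
As a rough illustration of that memory trade-off, here is a small Python calculation; the 16-bytes-per-block constant is lifted from the figure quoted above and treated as an approximation, not HDiffPatch's exact accounting:

def hdiffz_index_gb(old_size_bytes, match_block_bytes):
    # ~16 bytes of match-index per block of the old file (assumption
    # derived from the "100GB * 16 / 1k" estimate above)
    return old_size_bytes * 16 / match_block_bytes / 2**30

for block in (1 * 1024, 128 * 1024):
    gb = hdiffz_index_gb(100 * 2**30, block)
    print(f"-s-{block // 1024}k on a 100GB old file: ~{gb:.3f} GB")
    # -s-1k   -> ~1.563 GB
    # -s-128k -> ~0.012 GB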

bsdiff can be changed into a multi-threaded differ:

  • the suffix array sorting can be replaced by msufsort, a multi-threaded suffix array construction algorithm;
  • the match function can be changed to a multi-threaded version that splits the new file by thread count (a toy sketch follows below);
  • the bzip2 compressor can be replaced with a multi-threaded version such as pbzip2, or with lzma2 ...

But this approach needs a very large amount of memory, so it is not suitable for very large files.
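
To make the match-partitioning bullet concrete, here is a toy Python sketch: one shared, read-only suffix array over the old data, with each worker scanning its own slice of the new data. The naive suffix-array construction, the 64-byte probe length, and all names here are illustrative placeholders; a real implementation would use msufsort and bsdiff's actual match/extend logic, in a language without CPython's GIL (the threads below only illustrate the partitioning).

from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor

def suffix_array(data):
    # naive construction; fine for a sketch, hopeless at 100GB
    return sorted(range(len(data)), key=lambda i: data[i:])

def longest_match(old, sa, probe):
    # binary-search the suffix array (Python 3.10+ for key=), then
    # extend the match at the two neighboring suffixes
    lo = bisect_left(sa, probe, key=lambda i: old[i:i + len(probe)])
    best = (0, 0)  # (old_pos, match_len)
    for i in sa[max(lo - 1, 0):lo + 1]:
        n = 0
        while n < len(probe) and i + n < len(old) and old[i + n] == probe[n]:
            n += 1
        if n > best[1]:
            best = (i, n)
    return best

def match_slice(old, sa, new, start, end):
    out, pos = [], start
    while pos < end:
        old_pos, n = longest_match(old, sa, new[pos:pos + 64])
        out.append((pos, old_pos, n))  # (new_pos, old_pos, length)
        pos += max(n, 1)
    return out

def parallel_match(old, new, threads=4):
    sa = suffix_array(old)  # built once, shared by every worker
    step = max(1, (len(new) + threads - 1) // threads)
    with ThreadPoolExecutor(threads) as pool:
        futures = [pool.submit(match_slice, old, sa, new, i, min(i + step, len(new)))
                   for i in range(0, len(new), step)]
        return [m for f in futures for m in f.result()]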
