
How to calculate the delta of a file, i.e. the changed portion of the file

I want to calculate the delta of a file, i.e. get only the changed parts of a file, the way applications like Dropbox or Google Drive do.

When a file in the watched folder changes, I want to know the offset of the affected bytes and the changed bytes themselves, so that I can send them to the file server.

I want to implement this on the Windows platform, so a C, C++ or C#/.NET solution is fine.

Update: For example, suppose I have a 10 MB file X (binary or text) in my local watched folder and I modify 1 MB of it. I want to fetch only the modified bytes (1 MB) together with the range at which to apply them to the copy on the file server. This is also known as delta sync.
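For the narrow case in the example, where bytes are overwritten in place and nothing is inserted or deleted, comparing the two versions block by block at the same offsets already yields (offset, bytes) ranges. The following is a minimal C++ sketch under stated assumptions: both versions fit in memory as `std::string`, and `ChangedRange` and `diffFixedBlocks` are hypothetical names. A real client would more likely keep per-block hashes of the last synced version than a full copy of the old file; raw bytes are compared here only to keep the sketch short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One changed region: where it starts in the file and the new bytes to write there.
struct ChangedRange {
    std::size_t offset;
    std::string bytes;
};

// Compares old and new contents block by block at identical offsets and merges
// adjacent differing blocks into one range. Only meaningful for in-place edits:
// an insertion or deletion shifts all later bytes, so every later block differs.
std::vector<ChangedRange> diffFixedBlocks(const std::string& oldData,
                                          const std::string& newData,
                                          std::size_t blockSize) {
    assert(blockSize > 0);
    std::vector<ChangedRange> ranges;
    for (std::size_t off = 0; off < newData.size(); off += blockSize) {
        std::size_t len = std::min(blockSize, newData.size() - off);
        bool same = off + len <= oldData.size() &&
                    oldData.compare(off, len, newData, off, len) == 0;
        if (same) continue;
        if (!ranges.empty() && ranges.back().offset + ranges.back().bytes.size() == off)
            ranges.back().bytes.append(newData, off, len);  // extend the previous range
        else
            ranges.push_back({off, newData.substr(off, len)});
    }
    return ranges;
}
```

The server would write each `bytes` at its `offset` and truncate the file to the new length if it shrank. The reported ranges are rounded out to block boundaries, so for the 10 MB / 1 MB example this sends the modified 1 MB plus at most part of one extra block on each side. As soon as the edit inserts or deletes bytes, this approach degrades to "everything after the edit changed", which is the problem the rolling-checksum approach below is designed to solve.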

There's a command in Linux/Unix called rsync that does essentially what you want. The idea is as follows. The original file is split into chunks (of, say, 512 bytes), and for each chunk two checksums are computed: a cheap weak checksum and an expensive strong one. The program then takes the first 512-byte window of the changed file, computes its weak checksum and compares it against the weak checksums of the original file's chunks. If there is no match, this window does not correspond to any unchanged chunk. If the weak checksum does match, the program computes the strong checksum of the window and compares that as well; if the strong checksums also match, we can be confident (with very high probability) that this chunk has not changed, and the program skips ahead by a whole chunk. Otherwise it moves forward by a single byte (not a chunk, a BYTE), takes the next window and repeats the procedure. The bytes that never fall inside a matched chunk are the changed data that has to be sent.

The most important part of this algorithm is the weak checksum, which is a so-called rolling checksum. It lets you compute the checksum of the window [k + 1, k + 513) from that of [k, k + 512) in O(1) time, instead of rereading all 512 bytes. You can check out this for the details of the algorithm.
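The scheme described above can be sketched in C++ as follows. This is a simplified, in-memory illustration under stated assumptions, not rsync's actual implementation or wire format: both file versions are held in `std::string`; the weak checksum uses rsync-style running sums `a` and `b` modulo 2^16; the strong hash is 64-bit FNV-1a, a dependency-free stand-in for the MD4/MD5 hashes rsync has used; and `RollingChecksum`, `computeDelta`, `applyDelta` and `DeltaOp` are hypothetical names. In real rsync the side holding the old file computes the chunk signatures and sends them to the other side; here both steps run in one process.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// rsync-style weak checksum: a = sum of bytes, b = position-weighted sum, both mod 2^16.
struct RollingChecksum {
    std::uint32_t a = 0, b = 0;
    std::size_t len = 0;
    void init(const std::string& d, std::size_t start, std::size_t n) {
        a = b = 0; len = n;
        for (std::size_t i = 0; i < n; ++i) {
            unsigned char c = static_cast<unsigned char>(d[start + i]);
            a += c;
            b += static_cast<std::uint32_t>(n - i) * c;  // earlier bytes weigh more
        }
        a &= 0xFFFF; b &= 0xFFFF;
    }
    // O(1) slide: drop byte `out` from the front, append byte `in` at the back.
    void roll(unsigned char out, unsigned char in) {
        a = (a - out + in) & 0xFFFF;
        b = (b - static_cast<std::uint32_t>(len) * out + a) & 0xFFFF;
    }
    std::uint32_t value() const { return a | (b << 16); }
};

// Stand-in strong hash (64-bit FNV-1a); rsync uses MD4/MD5. Not collision-resistant.
std::uint64_t strongHash(const std::string& d, std::size_t start, std::size_t n) {
    std::uint64_t h = 14695981039346656037ULL;
    for (std::size_t i = 0; i < n; ++i) {
        h ^= static_cast<unsigned char>(d[start + i]);
        h *= 1099511628211ULL;
    }
    return h;
}

// One delta instruction: copy block `blockIndex` of the old file, or insert `literal`.
struct DeltaOp {
    bool isCopy;
    std::size_t blockIndex;
    std::string literal;
};

std::vector<DeltaOp> computeDelta(const std::string& oldData, const std::string& newData,
                                  std::size_t blockSize) {
    assert(blockSize > 0);
    // Signature of the old file: weak checksum -> block index, plus per-block strong hashes.
    std::unordered_multimap<std::uint32_t, std::size_t> weakIndex;
    std::vector<std::uint64_t> strong;
    for (std::size_t i = 0; (i + 1) * blockSize <= oldData.size(); ++i) {
        RollingChecksum rc;
        rc.init(oldData, i * blockSize, blockSize);
        weakIndex.emplace(rc.value(), i);
        strong.push_back(strongHash(oldData, i * blockSize, blockSize));
    }
    std::vector<DeltaOp> ops;
    std::string pending;  // new bytes not covered by any matched block
    RollingChecksum rc;
    bool fresh = true;    // rc must be recomputed from scratch at pos
    std::size_t pos = 0;
    while (pos + blockSize <= newData.size()) {
        if (fresh) { rc.init(newData, pos, blockSize); fresh = false; }
        bool matched = false;
        auto range = weakIndex.equal_range(rc.value());
        for (auto it = range.first; it != range.second && !matched; ++it) {
            // A weak match is only a hint; confirm it with the strong hash.
            matched = strong[it->second] == strongHash(newData, pos, blockSize);
            if (matched) {
                if (!pending.empty()) ops.push_back({false, 0, pending});
                pending.clear();
                ops.push_back({true, it->second, {}});
            }
        }
        if (matched) {
            pos += blockSize;  // unchanged block: skip a whole block
            fresh = true;
            continue;
        }
        pending.push_back(newData[pos]);  // no match: one literal byte, slide by one byte
        if (pos + blockSize < newData.size()) rc.roll(newData[pos], newData[pos + blockSize]);
        ++pos;
    }
    pending.append(newData, pos, std::string::npos);  // tail shorter than one block
    if (!pending.empty()) ops.push_back({false, 0, pending});
    return ops;
}

// Server side: rebuild the new file from the old file plus the delta.
std::string applyDelta(const std::string& oldData, const std::vector<DeltaOp>& ops,
                       std::size_t blockSize) {
    std::string out;
    for (const DeltaOp& op : ops)
        out += op.isCopy ? oldData.substr(op.blockIndex * blockSize, blockSize) : op.literal;
    return out;
}
```

Unlike a fixed-offset comparison, this copes with insertions: if a few bytes are inserted into the middle of a file, only the chunk containing the insertion point plus the inserted bytes end up as literal data, and every other chunk is sent as a short "copy block i" instruction, even though its offset in the new file has shifted. The chunk size is the main tuning knob: smaller chunks give a finer-grained delta but a larger signature to store and exchange.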

That's not what Drive or Dropbox does: when they flag a file as changed, they re-upload the entire document. After all, when you save a document, who's to say the changes aren't scattered across many locations in the binary file?
