
How to work out if a file has been modified?

I'm writing a backup solution (of sorts). Put simply, it copies a file from location C:\ and pastes it to location Z:\.

To keep it fast, before copying it checks whether the file already exists. If it does, it performs a few 'calculations' to work out whether the copy should continue or whether the backup file is already up to date. It is these calculations I'm finding difficult.

Originally, I compared the file size, but this is not good enough, because a file can easily change and still be the same size (for example, a Notepad file containing the character C is the same size as one containing the character T).

So, I need to find out if the modified date differs. At the moment, I get the file info using the FileInfo class, but after reviewing all its fields, nothing appears to be suitable.

How can I check to ensure that I'm copying files which have been modified?

EDIT: I have seen suggestions on SO to use MD5 checksums, but I'm concerned this may be a problem, as some of the files I'm comparing will be up to 10 GB.

Going by modified date will be unreliable: the computer clock can go backwards when it synchronizes or when it is manually adjusted. Some programs also might not manage the modified date well when they modify or copy files.

Going by the archive bit might work in a controlled environment, but what happens if another piece of software that also uses the archive bit is running?

The Windows archive bit is evil and must be stopped

If you want (almost) complete reliability, store a hash value of the last backed-up version using a good hashing function like SHA1. If the hash value changes, upload the new copy.

Here is the SHA1 class along with a code sample on the bottom:

http://msdn.microsoft.com/en-us/library/system.security.cryptography.sha1.aspx

Just run the file bytes through it and store the hash value. Pass it a FileStream instead of loading your file into memory as a byte array; this reduces memory usage, especially for large files.

You can combine this with the modified date in various ways to tune your program for speed and reliability. For example, you can check modified dates for most backups and periodically run a hash checker while the system is idle to make sure nothing got missed. Sometimes the modified date will change but the file contents are still the same (i.e. the file was overwritten with the same data), in which case you can avoid resending the whole file once you recompute the hash and see that it is unchanged.
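As a rough illustration of one such combination, here is a minimal sketch. `BackupCheck` and `NeedsBackup` are hypothetical names, not part of any library, and the rule "same size and same last-write time means unchanged" is an assumption you may want to tighten. It also shows that FileInfo does expose the modified date, as `LastWriteTimeUtc`.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class BackupCheck
{
    // Hypothetical helper: returns true when 'source' should be copied over 'backup'.
    public static bool NeedsBackup(string source, string backup)
    {
        var src = new FileInfo(source);
        var dst = new FileInfo(backup);

        if (!dst.Exists) return true;               // nothing backed up yet
        if (src.Length != dst.Length) return true;  // different size means different content

        // Cheap check (assumption): same size and same last-write time is treated as unchanged.
        if (src.LastWriteTimeUtc == dst.LastWriteTimeUtc) return false;

        // Timestamps differ: hash both files to see whether the content really changed.
        return !Hash(source).SequenceEqual(Hash(backup));
    }

    private static byte[] Hash(string path)
    {
        using (var sha1 = SHA1.Create())
        using (var stream = File.OpenRead(path))   // streamed, so large files are not loaded into memory
            return sha1.ComputeHash(stream);
    }
}
```

This sketch re-reads the backup copy to hash it, which is slow over a network drive. Storing the hash of the last backed-up version, as suggested above, would avoid that second read.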

Most version control systems use some kind of combined approach with hashes and modified dates.

Unless you do a full backup and send all the data every time, your approach will involve some kind of risk management: a compromise between performance and reliability. For this reason, it's important to do "full backups" once in a while.

You can compare files by their hashes:

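// Requires: using System.IO; using System.Security.Cryptography;
private byte[] GetFileHash(string fileName)
{
    // SHA1.Create() names the algorithm explicitly; the parameterless
    // HashAlgorithm.Create() is obsolete in current .NET versions.
    using (SHA1 sha1 = SHA1.Create())
    using (FileStream stream = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    {
        return sha1.ComputeHash(stream); // reads the stream in chunks
    }
}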

If the content has changed, the hashes will be different.
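One detail to watch when comparing: in C#, `==` on two byte[] values compares references, not contents. The sketch below turns the hash into a hex string so it can be stored and compared later. `HashStore`, `ToHex`, and `HasChanged` are hypothetical names used only for illustration.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class HashStore
{
    // Hex form of the file's SHA1 hash, suitable for saving next to the backup.
    public static string ToHex(string fileName)
    {
        using (var sha1 = SHA1.Create())
        using (var stream = File.OpenRead(fileName))
            return BitConverter.ToString(sha1.ComputeHash(stream)).Replace("-", "");
    }

    // True when the file's current hash differs from the one stored at the last backup.
    public static bool HasChanged(string fileName, string storedHex)
    {
        return !string.Equals(ToHex(fileName), storedHex, StringComparison.OrdinalIgnoreCase);
    }
}
```

Where `storedHex` is kept (a sidecar file, a small database) is left open here; that choice is outside what the answers describe.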

You may like to check out the FileSystemWatcher class.

"This class lets you monitor a directory for changes and will fire an event when something is modified."

Your code can then handle the event and process the file.

Code source - MSDN:

// Create a new FileSystemWatcher and set its properties.
FileSystemWatcher watcher = new FileSystemWatcher();
watcher.Path = args[1];

/* Watch for changes in LastAccess and LastWrite times, and
   the renaming of files or directories. */
watcher.NotifyFilter = NotifyFilters.LastAccess | NotifyFilters.LastWrite
   | NotifyFilters.FileName | NotifyFilters.DirectoryName;

// Only watch text files.
watcher.Filter = "*.txt";

// Add event handlers.
watcher.Changed += new FileSystemEventHandler(OnChanged);
watcher.Created += new FileSystemEventHandler(OnChanged);
watcher.Deleted += new FileSystemEventHandler(OnChanged);
watcher.Renamed += new RenamedEventHandler(OnRenamed);

// Begin watching; no events are raised until this is set.
watcher.EnableRaisingEvents = true;
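The snippet above references OnChanged and OnRenamed without defining them. Here is a minimal sketch of what such handlers might do for a backup tool: record the affected paths so that a later pass copies them. `BackupQueue` and `Pending` are hypothetical names, and the case-insensitive path comparison assumes a Windows file system.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;

public static class BackupQueue
{
    // Paths waiting to be backed up. Keyed by path so repeated events for the
    // same file (a single save can raise several) collapse into one entry.
    public static readonly ConcurrentDictionary<string, byte> Pending =
        new ConcurrentDictionary<string, byte>(StringComparer.OrdinalIgnoreCase);

    public static void OnChanged(object source, FileSystemEventArgs e)
    {
        byte ignored;
        if (e.ChangeType == WatcherChangeTypes.Deleted)
            Pending.TryRemove(e.FullPath, out ignored); // nothing left to copy
        else
            Pending[e.FullPath] = 0;                    // created or changed
    }

    public static void OnRenamed(object source, RenamedEventArgs e)
    {
        byte ignored;
        Pending.TryRemove(e.OldFullPath, out ignored);  // the old name no longer exists
        Pending[e.FullPath] = 0;
    }
}
```

Keep in mind that the watcher only reports changes made while the program is running, so a periodic date or hash check is probably still needed to catch anything modified while it was stopped.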

Generally speaking, you'd let the OS take care of tracking whether a file has changed.

If you use:

File.GetAttributes

and check for the archive flag, this will tell you whether the file has changed since it was last archived. I believe XCOPY and similar tools reset this flag once the copy is done, but you may need to take care of this yourself.

You can easily test the flag in DOS using:

dir /aa yourfilename

Or just add the attributes column in Windows Explorer.

The file archive flag is normally used by backup programs to check whether a file needs backing up. When Windows modifies or creates a file, it sets the archive flag. Check whether the archive flag is set to decide whether the file needs backing up:

if ((File.GetAttributes(fileName) & FileAttributes.Archive) == FileAttributes.Archive)
{
    // Archive file.
}

After backing up the file, clear the archive flag:

File.SetAttributes(fileName, File.GetAttributes(fileName) & ~FileAttributes.Archive);

This assumes no other programs (e.g., system backup software) are clearing the archive flag.
