How to be sure a file has been successfully written?

I'm adding autosave functionality to a graphics application in Java. The application periodically autosaves the current document and also autosaves on exit. When the user starts the application, the autosave file is reloaded.

If the autosave file is corrupted in any way (I assume a power cut when the file is in the middle of being saved would do this?), the user will lose their work. How can I prevent such situations and do all I can to guarantee that the autosave document is in a consistent state?

To further complicate matters, to autosave the document I need to save one .xml file and several .png files. Also, the .png saving occurs in C code over JNI.

My current strategy is to write each .png with the extension .png.tmp, write the .xml file with the extension .xml.tmp, and then rename each file to remove the .tmp part, leaving the .xml until last. On startup, I only load the autosave document if I can find a .xml file, and I ignore .xml.tmp files. I also don't delete the previous autosave document until the .xml.tmp file for the new document has been renamed.

I guess my knowledge of what happens when you write to disk is poor. I know you can have software read/write buffers when using files, as well as OS and hardware buffers, and that all of these need to be flushed. I'm confused about how I can know for sure when something really has been written to disk and what I can do to protect myself. Does the renaming operation do anything to make sure buffers are flushed?

"If the autosave file is corrupted in any way (I assume a power cut when the file is in the middle of being saved would do this?), the user will lose their work. How can I prevent such situations and do all I can to guarantee that the autosave document is in a consistent state?"

To prevent loss of data due to a partially written autosave file, don't overwrite the autosave file. Instead, write to a new file each time, and then rename it once the file has been safely written.
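A minimal Java sketch of the write-then-rename idea follows. The class and method names (`SafeReplace`, `writeThenRename`) and the `.tmp` suffix are illustrative assumptions, not anything from the question's code. It also does not force the data to disk before renaming, which a robust version would do.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: the names are illustrative only.
final class SafeReplace {
    /** Writes data to a sibling temporary file, then renames it onto target. */
    static void writeThenRename(Path target, byte[] data) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        // Throws IOException on disk-full, I/O errors, etc.; target is untouched so far.
        Files.write(tmp, data);
        try {
            // Whether an existing target is replaced by an atomic move is
            // implementation specific; on POSIX systems it maps to rename().
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            // Fallback for file systems without atomic moves (not crash-safe).
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

If the write fails, the exception propagates and the previous autosave file is left as it was.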

To guard against failing to notice that an autosave file has not been written correctly:

  1. Pay attention to the exceptions thrown as the autosave file is written and closed, in case of a disc error, a full file system, etc.
  2. Keep a running checksum of the file as it is written and write it at the end of the file. Then when you load the autosave file, check that the checksum is there and is correct.
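The checksum step could look roughly like this in Java. It is only a sketch: the class name `ChecksummedFile`, the choice of CRC-32, and the 8-byte binary trailer are all assumptions made for illustration. For an XML file you would more likely store the checksum in a comment or a side file so the document stays well formed.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.zip.CRC32;
import java.util.zip.CheckedOutputStream;

// Hypothetical helper: payload bytes followed by an 8-byte CRC-32 trailer.
final class ChecksummedFile {
    static void write(Path file, byte[] payload) throws IOException {
        CRC32 crc = new CRC32();
        try (DataOutputStream out = new DataOutputStream(new CheckedOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)), crc))) {
            out.write(payload);           // the CRC is updated as bytes pass through
            long value = crc.getValue();  // checksum of the payload only
            out.writeLong(value);         // big-endian trailer at the end of the file
        }
    }

    /** Returns the payload, or throws if the trailer is missing or wrong. */
    static byte[] readVerified(Path file) throws IOException {
        byte[] all = Files.readAllBytes(file);
        if (all.length < 8) {
            throw new IOException("missing checksum trailer");
        }
        int payloadLength = all.length - 8;
        long stored = ByteBuffer.wrap(all, payloadLength, 8).getLong();
        CRC32 crc = new CRC32();
        crc.update(all, 0, payloadLength);
        if (crc.getValue() != stored) {
            throw new IOException("checksum mismatch");
        }
        return Arrays.copyOf(all, payloadLength);
    }
}
```

A truncated or partially written file then fails `readVerified` instead of being loaded as if it were complete.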

If the checkpointed state involves multiple files, make sure that you write the files in a well-known order (without overwriting!), and write the checksum on the autosave file after all of the other files have been safely closed. You might want to create a directory for each checkpoint.

FOLLOW UP

No. I'm not saying that rename always succeeds. However, it is atomic: it either succeeds (and completes) or the file system is not changed. So, if you do this:

  1. write "file.new" and close,
  2. delete "file",
  3. rename "file.new" to "file"

then, provided the first step succeeds, you are guaranteed to have the latest version safely on disc, either as "file" or as "file.new". And it is simple to add a couple of steps so that you have a backup of "file" at all times. (If the third step fails, you are left with "file.new" and no "file". This can be recovered manually, or automatically by the application the next time you run it.)
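The automatic recovery mentioned above might be sketched as follows. The class name `AutosaveRecovery` and the `.new` suffix handling are illustrative assumptions. The reasoning is that "file" is only deleted after "file.new" has been written and closed, so a "file.new" without a "file" is complete, while a "file.new" next to an existing "file" may be a partial write.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical startup recovery for the write / delete / rename sequence.
final class AutosaveRecovery {
    /** Returns the autosave file to load, if one exists after recovery. */
    static Optional<Path> recover(Path file) throws IOException {
        Path pending = file.resolveSibling(file.getFileName() + ".new");
        if (!Files.exists(file) && Files.exists(pending)) {
            // Step 2 ran but step 3 did not: finish the rename.
            Files.move(pending, file);
        } else {
            // Either nothing is pending, or the pending file may be incomplete
            // because the old file was never deleted: discard it.
            Files.deleteIfExists(pending);
        }
        return Files.exists(file) ? Optional.of(file) : Optional.empty();
    }
}
```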

Also, I'm not saying that writes always succeed, or that applications don't crash, or that the power never goes off. The point of the checksum is to allow you to detect the cases where these things have happened and the autosave file is incomplete.

Finally, it is a good idea to keep two autosaves in case your application gets itself into a state where its data structures are messed up and the last autosave is nonsensical as a result. (The checksum won't protect against this.) For the same reason, be cautious about autosaving when the application crashes.

As an aside, since you have several different files as part of this one document, consider using either a project directory to hold them all together, or some encapsulation format (like .zip) to put them all inside one file.

What you want to do is atomically replace the old backup files with new ones. Unfortunately, I don't believe that Java gives you enough control to do this directly. You also need to reason about what operations are atomic in the underlying operating system. I know Linux file systems, so my answer will be biased towards a Java program running on that system. I would be shocked if Windows didn't do the same thing, but I can't say for certain.

Most Linux file systems (e.g. the metadata-journaled ones) let you rename files atomically. If the system crashes halfway through a rename, when you restart it will be as if you never renamed the file in the first place. For this reason, a common way to atomically update an existing file F is to write your new data to a temporary file T and then rename T to F. Any system or application crash up to that rename will not affect F, so it will always be consistent.

Of course, before you rename, you need to make sure that your temporary file is consistent. Make sure that all stream buffers for the file are flushed to the OS ( OutputStream.flush() ) and that the OS buffers are flushed to the disk ( FileOutputStream.getFD().sync() or FileChannel.force(true) ). Even then, unless your OS disables the write cache on the hard disk itself (it probably doesn't), there's still a chance that your data can be corrupted. Add a checksum to the XML if you want to be really sure. If you're truly paranoid, you should flush the OS and hard disk buffer caches and re-read the file to verify that it is consistent. This is beyond any reasonable expectation for normal consumer applications.
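As a rough illustration of the two flush levels, here is a small Java sketch. The class and method names (`DurableWrite`, `writeAndSync`) are made up for the example; the calls themselves are standard `java.io` API.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Path;

// Hypothetical helper: write a file and push it through both buffer layers.
final class DurableWrite {
    static void writeAndSync(Path file, byte[] data) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(file.toFile());
             BufferedOutputStream out = new BufferedOutputStream(fos)) {
            out.write(data);
            out.flush();         // Java-side buffer -> operating system
            fos.getFD().sync();  // OS cache -> storage device (fsync on POSIX)
        }
    }
}
```

Only after `writeAndSync` returns normally would the temporary file be renamed over the real one. The drive's own write cache is outside the program's control, which is why a checksum is still worth having.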

But that's just how to atomically write a single file. Your problem is more complex: you have many files to update atomically. For example, I'll say that you have two files, img.png and main.xml. I'd do one of these:

  1. The easy solution is to make a per-savefile directory. You wouldn't need to worry about renaming each individual file, and you could still swap the new backup dir in for the old backup dir you're replacing with a rename. That is, if your old backup is bak/img.png and bak/main.xml, write bak.tmp/img.png and bak.tmp/main.xml and rename bak.tmp to bak. (Note that on POSIX systems a rename only replaces a directory that is empty, so in practice you would first move the old bak aside, or make bak a symbolic link and atomically replace the link.)
  2. Name the new auxiliary files something else and let them coexist with the old ones for a little while. That is, write img.2.png and main.xml.tmp (which should refer to img.2.png, not img.png) and only rename main.xml.tmp to main.xml. Then delete img.png.
  3. Addition: if you don't have atomic renames, the next best thing extends #2. Whenever you save the project, give it a new name (e.g. ver342.xml). When you load, just find the most recent XML that is consistent (i.e. its checksum verifies). Keep around 2 or 3 to be safe. Only delete an autosave if you have successfully restored from a more recent copy.
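For the versioned-name variant in #3, the load step could be sketched like this in Java. The `ver<N>.xml` pattern follows the example name above; the class `VersionedAutosave`, the method `newestConsistent`, and the `isConsistent` predicate (which would wrap whatever checksum verification you use) are hypothetical names for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical loader: picks the highest-numbered autosave that verifies.
final class VersionedAutosave {
    private static final Pattern NAME = Pattern.compile("ver(\\d{1,18})\\.xml");

    static Optional<Path> newestConsistent(Path dir, Predicate<Path> isConsistent)
            throws IOException {
        List<Path> candidates;
        try (Stream<Path> files = Files.list(dir)) {
            candidates = files
                    .filter(p -> NAME.matcher(p.getFileName().toString()).matches())
                    // Newest first, comparing version numbers numerically.
                    .sorted((a, b) -> Long.compare(version(b), version(a)))
                    .collect(Collectors.toList());
        }
        for (Path candidate : candidates) {
            if (isConsistent.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    private static long version(Path p) {
        Matcher m = NAME.matcher(p.getFileName().toString());
        if (!m.matches()) {
            throw new IllegalArgumentException("not an autosave name: " + p);
        }
        return Long.parseLong(m.group(1));
    }
}
```

Comparing the numbers rather than the file names matters here: sorted as strings, ver9.xml would come after ver10.xml.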
