
Inter-process file exchange: efficiency and race conditions

The story:
A few days ago I was thinking about inter-process communication based on file exchange. Say process A creates several files during its work and process B reads these files afterwards. To ensure that all files have been written completely, it would be convenient to create a special file whose existence signals that all operations are done.

Simple workflow:
process A creates file "file1.txt"
process A creates file "file2.txt"
process A creates file "processA.ready"

Process B waits until file "processA.ready" appears and then reads file1 and file2.
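
The workflow above can be sketched in Java roughly as follows. This is only a minimal illustration: the class name ReadyMarkerExchange, the method names, the file contents and the 50 ms polling interval are assumptions made for the example, and both "processes" are plain methods here so that the ordering is easy to follow.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class ReadyMarkerExchange {

    // Process A: write the data files first and create the marker last.
    public static void produce(Path dir) throws IOException {
        Files.write(dir.resolve("file1.txt"), "first".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("file2.txt"), "second".getBytes(StandardCharsets.UTF_8));
        // createFile throws if the marker already exists, so a stale marker is noticed.
        Files.createFile(dir.resolve("processA.ready"));
    }

    // Process B: poll until the marker exists, then read both data files.
    public static List<String> consume(Path dir, long timeoutMillis)
            throws IOException, InterruptedException {
        Path marker = dir.resolve("processA.ready");
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!Files.exists(marker)) {
            if (System.currentTimeMillis() > deadline) {
                throw new IOException("timed out waiting for " + marker);
            }
            Thread.sleep(50); // hypothetical polling interval
        }
        String a = new String(Files.readAllBytes(dir.resolve("file1.txt")), StandardCharsets.UTF_8);
        String b = new String(Files.readAllBytes(dir.resolve("file2.txt")), StandardCharsets.UTF_8);
        return Arrays.asList(a, b);
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("exchange");
        produce(dir);
        System.out.println(consume(dir, 1000));
    }
}
```

The only property the sketch relies on is that the marker is created after both writes have returned.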

Doubts:
File operations are performed by the operating system, specifically by its file subsystem. Since implementations can differ between Unix, Windows and macOS, I'm uncertain about the reliability of file-based inter-process communication. Even if the OS guarantees this consistency, there are things like the JIT compiler in Java, which can reorder program instructions.

Questions:
1. Are there any real specifications for file operations in operating systems?
2. Is the JIT really allowed to reorder file operations within a single program thread?
3. Is file exchange still a relevant option for inter-process communication nowadays, or is it unconditionally better to choose TCP/HTTP/etc.?

  1. You don't need to know OS details in this case. The Java IO API is documented well enough to tell whether a file was saved or not.
  2. The JVM can't reorder native calls. This is not stated explicitly in the JMM, but it is implied. The JVM can't know what the impact of a native call is, so reordering such calls could have arbitrary consequences.
  3. There are some disadvantages of using files as a way of communication:
    1. It uses IO, which is slow.
    2. It is difficult to split the processes across different machines should you need to (there are ways, using Samba for example, but they are quite platform-dependent).
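
As an illustration of point 1, one documented way to ask Java to push written data to the storage device is FileChannel.force. The sketch below is an assumption about how process A might write each data file before creating its marker; the class name DurableWrite and the method writeAndSync are made up for the example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DurableWrite {

    // Writes the text to the file and asks the OS to flush it to the device.
    public static void writeAndSync(Path file, String text) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE)) {
            ByteBuffer buffer = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
            // A single write call may be partial, so loop until the buffer is drained.
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            // true = also flush file metadata such as the size.
            channel.force(true);
        }
    }
}
```

Any failure surfaces as an IOException, so a normal return means the write and the flush were both accepted. The flush mainly matters for surviving a crash; a reader on the same machine normally sees the data as soon as the write has returned.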

  1. You could use a file watcher (WatchService) in Java to receive a signal when your .ready file appears.

  2. Reordering could apply, but it shouldn't hurt your application logic in this case; see the following link: https://assylias.wordpress.com/2013/02/01/java-memory-model-and-reordering/

  3. I don't know the size of your data, but I feel it would still be better to use a message queue (MQ) solution in this case. File IO is a relatively slow operation which could slow down the system.
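
A minimal sketch of the WatchService suggestion from point 1 might look like this. The class name ReadyFileWatcher and the method awaitReady are hypothetical; the sketch uses watch events only as a wake-up signal and then checks for the marker directly, which also covers overflow events and a marker created before the watcher was registered.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.TimeUnit;

public class ReadyFileWatcher {

    // Returns true once dir/markerName exists, false if the timeout elapses first.
    public static boolean awaitReady(Path dir, String markerName, long timeoutMillis)
            throws IOException, InterruptedException {
        Path marker = dir.resolve(markerName);
        try (WatchService watcher = dir.getFileSystem().newWatchService()) {
            dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            // Check after registering so a marker created earlier is not missed.
            if (Files.exists(marker)) {
                return true;
            }
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            while (true) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return Files.exists(marker);
                }
                WatchKey key = watcher.poll(remaining, TimeUnit.NANOSECONDS);
                if (key == null) {
                    return Files.exists(marker); // timed out
                }
                key.pollEvents(); // drain the events; they only serve as a wake-up
                if (Files.exists(marker)) {
                    return true;
                }
                if (!key.reset()) {
                    return false; // directory is no longer accessible
                }
            }
        }
    }
}
```

Process B would call something like ReadyFileWatcher.awaitReady(dir, "processA.ready", 60000) and read file1 and file2 only if it returns true. How quickly events arrive depends on the platform's WatchService implementation.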

I used a file exchange based approach on one of my projects. It is based on renaming a file's extension when a process is done with it, so that the next process can pick the file up by matching on its name.

  1. The FTP process downloads a file and gives its name the extension '.downloaded'.
  2. The main task processor searches the directory for files matching '*.downloaded'.
    Before starting, the job renames the file to '.processing'.
    When finished, it renames it to '.done'.
    In case of error, it creates a new supplementary file with an '.error' extension and writes the last processed line and the exception trace there. On retries, if this file exists, the job reads it and resumes from the correct position.
  3. The locator process searches for '.done' files and, according to its config, moves them to a backup folder or deletes them.
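
The renaming steps could be sketched in Java along these lines. This is not the code from the project described here; the class ExtensionStateMachine and its methods are hypothetical, the '.error' handling is left out, and the sketch assumes that every stage works in the same directory so that an atomic rename is possible.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

public class ExtensionStateMachine {

    // Replaces the last extension, e.g. data.csv.downloaded -> data.csv.processing.
    public static Path changeExtension(Path file, String newExtension) throws IOException {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot >= 0 ? name.substring(0, dot) : name;
        Path target = file.resolveSibling(base + newExtension);
        // ATOMIC_MOVE: other processes see either the old name or the new one.
        return Files.move(file, target, StandardCopyOption.ATOMIC_MOVE);
    }

    // Lists the files in dir whose names end with the given extension.
    public static List<Path> findByExtension(Path dir, String extension) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*" + extension)) {
            for (Path path : stream) {
                result.add(path);
            }
        }
        return result;
    }

    // Main task processor: claim each downloaded file, process it, mark it done.
    public static void processAll(Path dir) throws IOException {
        for (Path downloaded : findByExtension(dir, ".downloaded")) {
            Path processing = changeExtension(downloaded, ".processing");
            // ... the real work on 'processing' would happen here ...
            changeExtension(processing, ".done");
        }
    }
}
```

Renaming to '.processing' before the work starts is what keeps a second worker from picking up the same file, since it no longer matches '*.downloaded'.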

This approach works fine under a huge load in a mobile operator network.

One point to consider: using unique names for files is important, because the behaviour of a file move differs between operating systems.
E.g. Windows gives an error when a file with the same name already exists at the destination, whereas Unix overwrites it.
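
In Java, one way to avoid depending on the platform default is to use Files.move, which is documented to fail when the target exists unless REPLACE_EXISTING is passed. The sketch below is an illustration only: SafeMove, moveIfAbsent and uniqueName are hypothetical names, and the UUID suffix is just one possible way to get unique file names.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

public class SafeMove {

    // Moves source to target only if target does not exist yet.
    // Returns false instead of overwriting an existing target.
    public static boolean moveIfAbsent(Path source, Path target) throws IOException {
        try {
            Files.move(source, target); // no REPLACE_EXISTING: an existing target is an error
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    // Builds a name that is very unlikely to collide, e.g. report.<random UUID>.
    public static Path uniqueName(Path dir, String baseName) {
        return dir.resolve(baseName + "." + UUID.randomUUID());
    }
}
```

The Javadoc marks FileAlreadyExistsException as an optional specific exception, so a file system provider may report the conflict as a plain IOException instead; the sketch lets that propagate to the caller.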

