
POSIX/UNIX: How to reliably close a file descriptor

Problem:

After a close() syscall that fails with EINTR or EIO, it is unspecified whether the file has been closed ( http://pubs.opengroup.org/onlinepubs/9699919799/ ). In multi-threaded applications, retrying the close may close unrelated files opened by other threads. Not retrying the close may result in unusable open file descriptors piling up. A clean solution might involve invoking fstat() on the freshly closed file descriptor and a quite complex locking mechanism. Also, serializing all open/close/accept/... invocations with a single mutex may be an option.
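To make the hazard concrete, here is a minimal sketch (my own illustration, not from the question) of a close wrapper that deliberately does not retry on EINTR; the helper name close_once is hypothetical:

```cpp
#include <cerrno>
#include <unistd.h>

// Hypothetical helper: call close() exactly once and never retry.
// After a failed close() the descriptor's state is unspecified; in a
// multi-threaded program another thread may already have reused the
// number, so a retry could close an unrelated file. Treating EINTR as
// "descriptor is gone" matches Linux semantics, but is an assumption
// on other systems.
int close_once(int fd) {
    int r = close(fd);
    if (r == -1 && errno == EINTR)
        return 0;  // do NOT call close(fd) again; optionally log this
    return r;
}
```

This sidesteps the double-close race at the cost of possibly leaking a descriptor on systems where an EINTR-failed close leaves the descriptor open.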

These solutions do not take into account that library functions may open and close files on their own in an uncontrollable way, e.g., some implementations of std::thread::hardware_concurrency() open files in the /proc filesystem.

File streams, as in the [file.streams] section of the C++ standard, are not an option.

Is there a simple and reliable mechanism to close files in the presence of multiple threads?


Edits:

Regular files: While most of the time no unusable open file descriptors will accumulate, two conditions might trigger the problem: 1. signals emitted at high frequency by some malware, and 2. network file systems that lose their connection before caches are flushed.

Sockets: According to Stevens/Fenner/Rudoff, if the socket option SO_LINGER is set on a file descriptor referring to a connected socket, and during a close() the timer elapses before the FIN-ACK shutdown sequence completes, close() fails as part of the common procedure. Linux does not show this behavior; FreeBSD does, and also sets errno to EAGAIN. As I understand it, in this case it is unspecified whether the file descriptor is invalidated. C++ code to test the behavior: http://www.longhaulmail.de/misc/close.txt The test output there looks like a race condition in FreeBSD to me; if it's not, why not?
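As a hedged illustration of the SO_LINGER setup being discussed (the helper name set_linger is my own; the FreeBSD EAGAIN behavior is as reported above, not something this sketch can demonstrate):

```cpp
#include <sys/socket.h>

// Enable SO_LINGER so that close() on a connected socket blocks until
// the shutdown handshake completes or `seconds` elapse. If the timer
// expires first, close() may fail (FreeBSD reportedly sets EAGAIN;
// Linux does not show this behavior).
int set_linger(int sock, int seconds) {
    struct linger lg;
    lg.l_onoff = 1;          // enable lingering on close()
    lg.l_linger = seconds;   // timeout in seconds
    return setsockopt(sock, SOL_SOCKET, SO_LINGER,
                      &lg, sizeof lg);
}
```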

One might consider blocking signals during calls to close().
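A sketch of that idea, assuming POSIX threads (the wrapper name close_nointr is hypothetical): with every blockable signal masked around the call, close() cannot be interrupted by a signal, so the EINTR ambiguity never arises. This does nothing for EIO.

```cpp
#include <cerrno>
#include <csignal>
#include <unistd.h>

// Block all signals in the calling thread for the duration of close(),
// so the call cannot fail with EINTR. SIGKILL and SIGSTOP cannot be
// blocked, but they terminate or stop the process anyway.
int close_nointr(int fd) {
    sigset_t all, old;
    sigfillset(&all);
    pthread_sigmask(SIG_SETMASK, &all, &old);   // per-thread mask
    int r = close(fd);
    int saved = errno;            // restoring the mask may clobber errno
    pthread_sigmask(SIG_SETMASK, &old, nullptr);
    errno = saved;
    return r;
}
```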

This issue has been fixed in POSIX for the next issue; unfortunately it was too big a change to make it into the recent TC2. See the final accepted text for Austin Group Issue #529.

There's no practical solution for this problem, as POSIX doesn't address it at all.

Not retrying the close may result in unusable open file descriptors piling up.

As much as this sounds like a legitimate concern, I have never seen it happen due to failed close() calls.

A clean solution might involve invoking fstat() on the freshly closed file descriptor and a quite complex locking mechanism.

Not really. When close() fails, the state of the file descriptor is unspecified. So you can't reliably use it in a fstat() call: the file descriptor might have been closed already, in which case you are passing an invalid file descriptor to fstat(); another thread might have reused it, in which case you are passing the wrong file descriptor to fstat(); or the file descriptor might have been corrupted by the failed close() call.

When a process exits, all its open descriptors are flushed and closed anyway, so this isn't much of a practical concern. One could argue that it would be a problem in a long-running process in which close() fails too often. But I have not seen this happen in my experience, and POSIX doesn't provide any alternative either.

Basically, you can't do much about this except report that the problem occurred.

To mitigate any issues, explicitly sync the file:

  1. (If you are operating on a FILE*, first call fflush() on it to make sure user-space buffers are emptied to the kernel.)
  2. Call fsync() on the file descriptor to flush any kernel data and metadata about the file to disk.
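The two steps above can be sketched as follows (my own illustration; flush_and_close is a hypothetical helper, and it assumes f refers to a regular file):

```cpp
#include <cerrno>
#include <cstdio>
#include <unistd.h>

// Flush user-space buffers, then kernel buffers, then close. fflush()
// and fsync() can be retried on EINTR safely; once both succeed, a
// failing close() can no longer lose buffered data.
int flush_and_close(FILE* f) {
    int fd = fileno(f);
    while (fflush(f) == EOF) {          // stdio buffers -> kernel
        if (errno != EINTR) return -1;
    }
    while (fsync(fd) == -1) {           // kernel buffers -> disk
        if (errno != EINTR) return -1;
    }
    return fclose(f);                   // also closes the descriptor
}
```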

You can retry these on error without extra worries. After that, possibly leaking file descriptors or handles on an interrupted close is probably a minor issue on some OSes, especially if you check the behavior of the OSes which are important to you (I suspect there's no problem on most relevant OSes).

Also, once the file and data are flushed, the chance of being interrupted during close() is much smaller, as close() should then not actually touch the disk. If you do get EIO or EINTR anyway, just (optionally) log it and ignore it, because doing anything else probably does more harm than good. It's not a perfect world.
