
h5py: Do I need to flush() before I close() a file?

The title contains the question: in the Python HDF5 library h5py, do I need to flush() a file before I close() it?

Or does closing the file already make sure that any data that might still be in the buffers will be written to disk?

What exactly is the point of flushing? When would flushing be necessary?

No, you do not need to flush the file before closing. Flushing is done automatically by the underlying HDF5 C library when you close the file.
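For illustration, a minimal sketch of the usual pattern (the file name and dataset name below are just placeholders): everything written inside the with block ends up on disk once the file is closed, whether or not flush() is called explicitly.

    import h5py
    import numpy as np

    # Closing the file flushes the HDF5 library buffers for us.
    with h5py.File("example.h5", "w") as f:            # hypothetical file name
        f.create_dataset("values", data=np.arange(100))
        # f.flush()  # optional: not required before close
    # Leaving the "with" block closes the file, which flushes any remaining buffers.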


As to the point of flushing: file I/O is slow compared to things like memory or cache access. If programs had to wait until data was actually on the disk each time a write was performed, that would slow things down a lot. So the actual writing to disk is buffered by at least the OS, and in many cases also by the I/O library being used (e.g., the C standard I/O library). When you ask to write data to a file, it usually just means that the OS has copied your data to its own internal buffer, and will actually put it on the disk when it's convenient to do so.

Flushing overrides this buffering, at whatever level the call is made. So calling h5py.File.flush() will flush the HDF5 library buffers, but not necessarily the OS buffers. The point of this is to give the program some control over when data actually leaves a buffer.
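One situation where an explicit flush() can be useful is a long-running job that appends to a resizable dataset: flushing after each batch pushes the HDF5 library buffers out to the OS, so the data written so far is not lost if the process dies before close(). A minimal sketch, with hypothetical file and dataset names:

    import h5py
    import numpy as np

    f = h5py.File("log.h5", "w")                        # hypothetical file name
    dset = f.create_dataset("samples", shape=(0,), maxshape=(None,), dtype="f8")

    for step in range(10):
        batch = np.random.rand(1000)
        dset.resize(dset.shape[0] + batch.size, axis=0)
        dset[-batch.size:] = batch
        f.flush()    # push the HDF5 library buffers to the OS after each batch

    f.close()        # close() would have flushed anyway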

For example, writing to the standard output is usually line-buffered. But if you really want to see the output before a newline, you can call fflush(stdout). This might make sense if you are piping the standard output of one process into another: the downstream process can start consuming the input right away, without waiting for the OS to decide it's a good time.
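In Python the equivalent is sys.stdout.flush() or the flush=True argument of print(). A small sketch of a progress indicator whose dots should appear immediately, even when stdout is piped into another process:

    import time

    for i in range(5):
        # Without flush=True the dots may sit in the buffer until a newline
        # is printed (or until the buffer fills up when stdout is piped).
        print(".", end="", flush=True)
        time.sleep(1)
    print()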

Another good example is making a call to fork(2). This usually copies the entire address space of a process, which means the I/O buffers as well. That may result in duplicated output, unnecessary copying, etc. Flushing a stream guarantees that the buffer is empty before forking.
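A small, Unix-only sketch of that duplicated-output effect in Python. It assumes stdout is redirected to a file or pipe, so it is block-buffered rather than line-buffered:

    import os
    import sys

    # Assumes stdout is redirected (block-buffered), e.g. python script.py > out.txt
    sys.stdout.write("written before fork\n")

    # sys.stdout.flush()   # uncomment to avoid the duplicated line below

    pid = os.fork()
    # Parent and child each inherit a copy of the unflushed buffer, so
    # "written before fork" appears twice in the output when both exit.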

