
Java FileOutputStream consecutive close takes a long time

I'm facing a slightly weird situation.

I'm copying a file of around 500MB from a FileInputStream to a FileOutputStream. It goes pretty well (takes around 500ms). When I close this FileOutputStream the FIRST time, it takes about 1ms.

But here comes the catch: when I run this again, every consecutive close takes around 1500-2000ms! The duration drops back to 1ms when I delete the file.

Is there some essential java.io knowledge I'm missing?

It seems to be related to the OS. I'm running on ArchLinux (the same code run on Windows 7 keeps all the times under 20ms). Note that it doesn't matter whether it runs on OpenJDK or Oracle's JDK. The hard drive is a solid-state drive with an ext4 file system.

Here is my testing code:

public void copyMultipleTimes() throws IOException {
    copy();
    copy();
    copy();
    new File("/home/d1x/temp/500mb.out").delete();
    copy();
    copy();
    // Runtime.getRuntime().exec("sync") => same results
    // Thread.sleep(30000) => same results
    // combination of sync & sleep => same results
    copy();
}

private void copy() throws IOException {
    FileInputStream fis = new FileInputStream("/home/d1x/temp/500mb.in");
    FileOutputStream fos = new FileOutputStream("/home/d1x/temp/500mb.out");
    IOUtils.copy(fis, fos); // copyLarge => same results
    // copying always takes the same amount of time; only close() gets slower

    fis.close(); // closing the input stream is always fast
    // fos.flush(); // has no effect
    // fos.getFD().sync(); // solves the problem but takes ~2.5s

    long start = System.currentTimeMillis();
    fos.close();
    System.out.println("OutputStream close took " + (System.currentTimeMillis() - start) + "ms");
}

The output is then:

OutputStream close took 0ms
OutputStream close took 1951ms
OutputStream close took 1934ms
OutputStream close took 1ms
OutputStream close took 1592ms
OutputStream close took 1727ms

@Duncan proposed the following explanation:

The first call to close() returns quickly, yet the OS is still flushing data to disk. The subsequent calls to close() can't complete until the previous flushing is complete.

I think this is close to the mark, but not exactly correct.

I think that what is actually going on here is that the first copy is filling up the operating system's file buffer cache with large numbers of dirty pages. The internal daemon that flushes the dirty pages to disc may start working on them, but it is still going when you start the second copy.

When you do the second copy, the OS tries to acquire buffer cache pages for reading and writing. But since the buffer cache is full of dirty pages, the read and write calls are repeatedly blocked, waiting for free pages to become available. But before a dirty page can be recycled, the data in the page needs to be written to disc. The net result is that the copy slows down to the effective data write rate.


A 30 second pause may not be sufficient to complete flushing the dirty pages to disc.
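One way to sanity-check this theory (a minimal sketch, not from the original thread; the helper name dirtyKiB is made up) is to watch the kernel's Dirty counter in /proc/meminfo around each copy() call. If the explanation is right, it should show hundreds of MB waiting for write-back right before the slow close() calls:

private long dirtyKiB() throws IOException {
    // Linux only: /proc/meminfo contains a line like "Dirty:   512340 kB"
    for (String line : java.nio.file.Files.readAllLines(java.nio.file.Paths.get("/proc/meminfo"))) {
        if (line.startsWith("Dirty:")) {
            return Long.parseLong(line.replaceAll("\\D+", "")); // keep only the digits (kB)
        }
    }
    return -1; // field not found
}

Printing dirtyKiB() before each copy() and before each fos.close() should show the cache filling up after the first run.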

One thing you could try is to do an fsync(fd) or fdatasync(fd) between the copies. In Java, the way to do that is to call FileDescriptor.sync().

Now, I can't say if this is going to improve total copy throughput, but I'd expect a sync operation to be better at writing out (just) one file than relying on the page eviction algorithm to do it.
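As a rough sketch of that suggestion (same paths as in the question, hypothetical method name), the sync goes right before the close, so the write-back cost is paid explicitly instead of surfacing in a later close():

private void copyWithSync() throws IOException {
    try (FileInputStream fis = new FileInputStream("/home/d1x/temp/500mb.in");
         FileOutputStream fos = new FileOutputStream("/home/d1x/temp/500mb.out")) {
        IOUtils.copy(fis, fos);
        fos.getFD().sync(); // fsync(2): blocks until this file's dirty pages are on disc
    } // close() is cheap now, but the sync itself takes ~2.5s as noted above
}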

You seem to be on to something interesting. Under Linux, someone is allowed to keep holding a file handle to the original file when you open it anew, actually deleting the directory entry and starting afresh. This does not bother the original file (handle). On closing, maybe some disk directory work then happens.

Test it with IOUtils.copyLarge and Files.copy:

Path target = Paths.get("/home/d1x/temp/500mb.out");
Files.copy(fis, target, StandardCopyOption.REPLACE_EXISTING);

(I once saw an IOUtils.copy that just called copyLarge, but Files.copy should behave nicely.)
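For reference, a self-contained sketch of that variant (same file names as in the question, using the Path-to-Path overload). With REPLACE_EXISTING an existing target is replaced rather than overwritten in place, which is consistent with the delete-then-write observation below:

Path source = Paths.get("/home/d1x/temp/500mb.in");
Path target = Paths.get("/home/d1x/temp/500mb.out");
Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);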

Note that this question was asked because I was curious why this is happening; it was not meant to be a measurement of copy throughput.

To summarize:

As EJP noted, the whole thing is not connected to Java. The result is the same if multiple consecutive cp commands are run in a bash script.

The best answer to why this is happening is Stephen's: an fsync between the copy calls removes the issue (but the fsync itself takes ~2.5s).

The best way to solve this is to do it as Files.copy(I, o, REPLACE_EXISTING) (as in Joop's answer): first check whether the target file exists and, if so, delete it (instead of "overwriting" it). Then you can write and close the stream fast. A sketch of this follows below.
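A minimal sketch of that workaround with plain streams (hypothetical method name, same paths as above):

private void copyToFreshFile() throws IOException {
    File target = new File("/home/d1x/temp/500mb.out");
    if (target.exists()) {
        target.delete(); // drop the old file instead of overwriting it in place
    }
    try (FileInputStream fis = new FileInputStream("/home/d1x/temp/500mb.in");
         FileOutputStream fos = new FileOutputStream(target)) {
        IOUtils.copy(fis, fos);
    } // close() stays in the ~1ms range, as in the "after delete" run above
}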
