
Java Multiple Opening and Closing of Files for Writing

Below is a class I have which writes a ConcurrentMap<String, List<String>> to a file. The key in the map is the path, and the values in the map are written sequentially to the file. This Task<Void> gets called every time there are 1,000 values in the map:

public class MapWriter extends Task<Void> {

    private final ParsingProducerConsumerContext context;

    public MapWriter(ParsingProducerConsumerContext context) {
        this.context = context;
    }

    @Override
    protected Void call() throws Exception {
        if (!isCancelled() || !context.isEmpty()) {
            ConcurrentMap<String, List<String>> jsonObjectMap = context.fetchAndReset();

            jsonObjectMap.entrySet().forEach((t) -> {
                try {
                    FileUtils.writeLines(new File(context.getPath() + t.getKey() + "\\sorted.json"), t.getValue(), true);
                } catch (IOException ex) {
                    context.getLogger().log("Error writing to disk:");
                    context.getLogger().log(ex.toString());
                    context.stopEverything();
                }
            });

            context.getLogger().log(jsonObjectMap.values().stream().mapToInt(List::size).sum() + " schedules written to disk ");
        } else {
            context.getLogger().log("Nothing to write");
        }

        return null;
    }
}

All the while this task is running, there is a producer Task reading a ~2GByte file line by line, which gets processed by a consumer and placed into the ConcurrentMap<String, List<String>>.

Whilst this does work, it is very slow!

My research suggests that repeatedly opening and closing files carries enough overhead to impair performance, so I was wondering whether the following approach might be better:

Maintain a Map<String, File> of File objects which are open. If a key in the ConcurrentMap<String, List<String>> corresponds to an open file, use that File reference for writing. When all processing has finished, loop over the Map<String, File> values and close each file.

Does this sound like a sensible way to go? There would be approximately 100 files open, though.
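
If that route is taken, a minimal sketch of the idea might look like the code below. The WriterCache class, its method names, and the append-mode policy are assumptions for illustration only; they are not part of the existing code:

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WriterCache {

    // One open, append-mode writer per output path, reused across batches.
    private final Map<String, BufferedWriter> writers = new HashMap<>();
    private final String basePath;

    public WriterCache(String basePath) {
        this.basePath = basePath;
    }

    // Reuse the writer for this key, or open it on first use.
    public void write(String key, List<String> lines) throws IOException {
        BufferedWriter writer = writers.computeIfAbsent(key, k -> {
            try {
                return new BufferedWriter(new FileWriter(basePath + k + "\\sorted.json", true));
            } catch (IOException ex) {
                throw new UncheckedIOException(ex);
            }
        });
        for (String line : lines) {
            writer.write(line);
            writer.newLine();
        }
    }

    // Called once, after all processing has finished.
    public void closeAll() throws IOException {
        for (BufferedWriter writer : writers.values()) {
            writer.close();
        }
        writers.clear();
    }
}

With something like this, the MapWriter would call write(t.getKey(), t.getValue()) instead of FileUtils.writeLines, and closeAll() once the producer and consumer have both finished.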

EDIT: I did a simple benchmark using System.nanoTime(). The file being imported line by line by the producer is approx 2GB, and each line is between 6kb and 10kb (in the List<String>).

Also, an OutOfMemory error is encountered! I guess this is because the 2GByte file is effectively loaded into memory and not being written out quickly enough?

514 jsonObjects written to disk in 2258007ms
538 jsonObjects written to disk in 2525166ms
1372 jsonObjects written to disk in 169959ms
1690 jsonObjects written to disk in 720824ms
9079 jsonObjects written to disk in 5221168ms
22552 jsonObjects written to disk in 6943207ms
13392 jsonObjects written to disk in 6475639ms
0 jsonObjects written to disk in 6ms
0 jsonObjects written to disk in 5ms
0 jsonObjects written to disk in 5ms
40 jsonObjects written to disk in 23108ms
631 jsonObjects written to disk in 200269ms
3883 jsonObjects written to disk in 2054177ms
Producer failed with java.lang.OutOfMemoryError: GC overhead limit exceeded
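
One thing worth checking for the OutOfMemoryError is whether the queue between the producer and the consumer is bounded. The question does not show how context.getQueue() is constructed; if it is unbounded, the producer can push the whole 2GB file into memory faster than it is drained. With a bounded queue, put() blocks and applies backpressure. The class name and capacity below are arbitrary illustrations, not the project's actual configuration:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueConfig {

    // With a bounded queue, put() blocks once the capacity is reached, so the
    // producer cannot race ahead of the consumer and fill the heap.
    // The capacity of 10,000 lines is an arbitrary illustration.
    public static BlockingQueue<String> createLineQueue() {
        return new ArrayBlockingQueue<>(10_000);
    }
}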

For completeness, here is the Producer class:

public class NRODJsonProducer extends Task<Void> {

    private final ParsingProducerConsumerContext context;

    public NRODJsonProducer(ParsingProducerConsumerContext context) {
        this.context = context;
    }

    @Override
    protected Void call() throws Exception {
        context.getLogger().log("Producer created");

        LineIterator li = FileUtils.lineIterator(new File(context.getPath() + context.getFilterFile()));

        while (li.hasNext()) {
            try {
                context.getQueue().put(li.next());
            } catch (InterruptedException ex) {
                Logger.getLogger(NRODJsonProducer.class.getName()).log(Level.SEVERE, null, ex);
            }
        }

        LineIterator.closeQuietly(li);

        context.getLogger().log("Producer finished...");

        return null;
    }
}

I don't see why. This code writes out everything for a key to a file with the same name, then moves on to the next key. If the producer produces another entry for that key, it overwrites the previous entry, and this code will write the file again. Keeping files open won't help that.

The real problem seems to be that you keep writing the same data to the file, because you never remove a processed key from the map.
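
If fetchAndReset() hands back the live map rather than a detached snapshot, a minimal sketch of that fix is to drop each key once its values are on disk. The helper below is illustrative only; jsonObjectMap and basePath stand in for the fields on the question's context object:

import java.io.File;
import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentMap;

import org.apache.commons.io.FileUtils;

public class WriteAndRemove {

    static void writeAndRemove(ConcurrentMap<String, List<String>> jsonObjectMap,
                               String basePath) throws IOException {
        Iterator<Map.Entry<String, List<String>>> it = jsonObjectMap.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, List<String>> entry = it.next();
            // Append this key's values to its file, then remove the key so the
            // same data is never written again on a later pass.
            FileUtils.writeLines(new File(basePath + entry.getKey() + "\\sorted.json"),
                    entry.getValue(), true);
            it.remove();
        }
    }
}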

NB: Your condition is wrong. It should be:

if (!isCancelled() && !context.isEmpty())
