
Retryable pattern for file processing in Java

I need to process a large file in which every line has the same column format. Because the program may crash during processing, I need the processing to be retryable: after a crash, when I start the program again, it should continue from the line where it failed.

Is there any pattern I can follow or library I can use? Thank you!


Update:

The crashes are not only caused by OOM or other internal issues. They could also be caused by a timeout when talking to other components, or by the machine itself crashing. So try/catch can't handle this.


Another update:

Chunking the file is feasible in my case, but it is not as simple as it sounds. As I said, the file is formatted with several columns, and I could split it into hundreds of files based on one of the columns and then process those files one by one. But instead of doing this, I would like to learn more about common solutions for processing big files/data with retry support.

How I would do it (though I am not a pro):

  1. Create a LineProcessor that is called on every line in the file

      import com.google.common.collect.Lists;
      import com.google.common.io.LineProcessor;
      import java.io.IOException;
      import java.util.List;
      import java.util.Random;

      class Processor implements LineProcessor<List<String>> {

          private List<String> lines = Lists.newLinkedList();
          private int startFrom = 0;
          private int lineNumber = 0;

          public Processor(int startFrom) {
              this.startFrom = startFrom;
          }

          @Override
          public List<String> getResult() {
              return lines;
          }

          @Override
          public boolean processLine(String arg0) throws IOException {
              lineNumber++;
              if (lineNumber < startFrom) {
                  // Before the starting line: skip it.
              } else {
                  // Simulated random failure, for demonstration only.
                  if (new Random().nextInt() % 50000 == 0) {
                      throw new IOException("Randomly thrown Exception " + lineNumber);
                  }
                  // Do the hard work here
                  lines.add(arg0);
                  startFrom++;
              }
              return true;
          }
      }

  2. Create a Callable for reading the file that makes use of my LineProcessor

      import com.google.common.base.Charsets;
      import com.google.common.io.Files;
      import java.io.File;
      import java.util.List;
      import java.util.concurrent.Callable;

      class Reader implements Callable<List<String>> {

          private int startFrom;

          public Reader(int startFrom) {
              this.startFrom = startFrom;
          }

          @Override
          public List<String> call() throws Exception {
              // Every call (including each retry) builds a fresh Processor.
              return Files.readLines(new File("/etc/dictionaries-common/words"),
                      Charsets.UTF_8, new Processor(startFrom));
          }
      }

  3. Wrap the Callable in a Retryer and call it using an Executor

      public static void main(String[] args) throws InterruptedException, ExecutionException {
          BasicConfigurator.configure();
          ExecutorService executor = Executors.newSingleThreadExecutor();
          // Retry on IOException, giving up after 100 attempts.
          Future<List<String>> lines = executor.submit(RetryerBuilder
                  .<List<String>> newBuilder()
                  .retryIfExceptionOfType(IOException.class)
                  .withStopStrategy(StopStrategies.stopAfterAttempt(100))
                  .build()
                  .wrap(new Reader(100)));
          logger.debug(lines.get().size());
          executor.shutdown();
          logger.debug("Happily Ever After");
      }
      } // closes the enclosing class, whose declaration is not shown
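
For comparison, here is a minimal sketch of the same in-process retry idea written without any library, where the count of successfully handled lines is kept outside the individual attempt so that a retry resumes after the last good line. The names `ResumingRetry`, `LineHandler`, and `processWithRetry` are hypothetical and not part of the answer above. Like any in-memory approach, this only covers exceptions thrown while the JVM stays alive; it does not survive a machine crash.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: retry inside one JVM, resuming after the last line that succeeded.
class ResumingRetry {

    /** Hypothetical callback for the per-line work; may fail with IOException. */
    interface LineHandler {
        void handle(String line) throws IOException;
    }

    /** Returns the total number of lines handled, or throws once maxAttempts is exhausted. */
    static int processWithRetry(Path file, int maxAttempts, LineHandler handler) throws IOException {
        int done = 0;            // lines handled so far; kept across attempts
        IOException last = null; // most recent failure, reported if we give up
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                // Skip the lines that earlier attempts already handled.
                for (int i = 0; i < done; i++) {
                    reader.readLine();
                }
                String line;
                while ((line = reader.readLine()) != null) {
                    handler.handle(line);
                    done++; // count the line only after it succeeded
                }
                return done;
            } catch (IOException e) {
                last = e; // fall through to the next attempt
            }
        }
        throw new IOException("Gave up after " + maxAttempts + " attempts at line " + (done + 1), last);
    }
}
```

A call such as `ResumingRetry.processWithRetry(path, 100, line -> doWork(line))` would mirror the 100-attempt limit used above, with `doWork` standing in for the real per-line processing.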

You could maintain checkpoint/commit-style logic in your code, so that when the program runs again it starts from the last saved checkpoint.

You can use RandomAccessFile to read the file and use getFilePointer() as the checkpoint, which you persist. When you execute the program again, you resume from this checkpoint by calling seek(offset).
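
The checkpoint idea can be sketched roughly as follows. This is only an illustration under some assumptions: the class `OffsetCheckpointReader`, its methods, and the choice of a small side file for the offset are all hypothetical, and the offset is saved after every line for simplicity. Note that `RandomAccessFile.readLine()` is unbuffered and maps each byte directly to a char, so this sketch suits ASCII/Latin-1 data and favors clarity over speed.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

// Hypothetical helper: resumes reading from a byte offset saved in a side file.
class OffsetCheckpointReader {

    /** Reads the saved offset, or returns 0 if no checkpoint exists yet. */
    static long loadOffset(Path checkpoint) throws IOException {
        if (!Files.exists(checkpoint)) {
            return 0L;
        }
        String text = new String(Files.readAllBytes(checkpoint), StandardCharsets.UTF_8);
        return Long.parseLong(text.trim());
    }

    /** Overwrites the checkpoint file with the given offset. */
    static void saveOffset(Path checkpoint, long offset) throws IOException {
        Files.write(checkpoint, Long.toString(offset).getBytes(StandardCharsets.UTF_8));
    }

    /** Processes the remaining lines of data, starting at the saved offset. */
    static void process(Path data, Path checkpoint, Consumer<String> handler) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(data.toFile(), "r")) {
            raf.seek(loadOffset(checkpoint)); // resume where the last run stopped
            String line;
            while ((line = raf.readLine()) != null) {
                handler.accept(line);
                // The file pointer now sits just after the line that was handled.
                saveOffset(checkpoint, raf.getFilePointer());
            }
        }
        // Whole file handled: remove the checkpoint so the next run starts fresh.
        Files.deleteIfExists(checkpoint);
    }
}
```

Because the offset is written only after the handler returns, a crash between the two steps means that one line is handled again on restart, so the per-line work should tolerate being repeated.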

try/catch won't save you from an OOM error. You should process the file in chunks and, after every successful chunk, store the position in the filesystem, a database, or any other place where it stays persistent even if your program crashes. When you restart your software, you can read the last position from wherever you stored it. You must also clean up this information once the whole file has been processed.
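
A minimal sketch of this chunk-and-persist approach might look like the following. The class `ChunkedFileProcessor`, its method names, and the use of a plain text file holding the number of completed lines are assumptions made for illustration; a database row would serve the same purpose. The checkpoint is written to a temporary file and then moved into place, so that a crash during the write is less likely to leave a half-written checkpoint behind.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: process a file chunk by chunk, persisting progress after each chunk.
class ChunkedFileProcessor {

    /** Number of lines completed by earlier runs, or 0 if there is no checkpoint. */
    static long loadDoneLines(Path checkpoint) throws IOException {
        if (!Files.exists(checkpoint)) {
            return 0L;
        }
        String text = new String(Files.readAllBytes(checkpoint), StandardCharsets.UTF_8);
        return Long.parseLong(text.trim());
    }

    /** Writes the count to a temp file, then moves it over the checkpoint. */
    static void saveDoneLines(Path checkpoint, long doneLines) throws IOException {
        Path tmp = checkpoint.resolveSibling(checkpoint.getFileName() + ".tmp");
        Files.write(tmp, Long.toString(doneLines).getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, checkpoint,
                StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }

    static void process(Path data, Path checkpoint, int chunkSize,
                        Consumer<List<String>> chunkHandler) throws IOException {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        long done = loadDoneLines(checkpoint);
        try (BufferedReader reader = Files.newBufferedReader(data, StandardCharsets.UTF_8)) {
            // Skip the lines that previous runs already completed.
            for (long i = 0; i < done; i++) {
                if (reader.readLine() == null) {
                    break;
                }
            }
            List<String> chunk = new ArrayList<>(chunkSize);
            String line;
            while ((line = reader.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() == chunkSize) {
                    chunkHandler.accept(chunk);
                    done += chunk.size();
                    saveDoneLines(checkpoint, done); // commit only after the chunk succeeded
                    chunk = new ArrayList<>(chunkSize);
                }
            }
            if (!chunk.isEmpty()) {
                chunkHandler.accept(chunk); // final, shorter chunk
            }
        }
        // Everything is processed: clean up the stored position.
        Files.deleteIfExists(checkpoint);
    }
}
```

With this layout, a crash in the middle of a chunk causes that whole chunk to be handled again on restart, so a larger chunk size means fewer checkpoint writes but more repeated work after a failure.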
