
Retryable pattern for file processing in java

I need to process a large file (column-based, with every line in the same format). Since I have to account for the program crashing during processing, I need the processing to be retryable: after it crashes and I start the program again, it should continue processing the file from the line where it failed.

Is there any pattern I can follow or library I can use? Thank you!


Update:

Regarding the crash cases: it is not just OOM or other internal issues. A crash could also be caused by a timeout against other components, or by the machine itself going down, so try/catch alone can't handle this.


Another update:

Regarding chunking the file: it is feasible in my case, but not as simple as it sounds. As I said, the file has several columns, and I could split it into hundreds of files based on one of the columns and then process those files one by one. But rather than doing that, I would like to learn about the common solutions for processing big files/data with retry support.

How I would do it (though I am not a pro)

  1. Create a LineProcessor that is called on every line in the file

     // Guava LineProcessor that skips the lines already handled in a previous run
     class Processor implements LineProcessor<List<String>> {

         private List<String> lines = Lists.newLinkedList();
         private int startFrom = 0;
         private int lineNumber = 0;

         public Processor(int startFrom) {
             this.startFrom = startFrom;
         }

         @Override
         public List<String> getResult() {
             return lines;
         }

         @Override
         public boolean processLine(String arg0) throws IOException {
             lineNumber++;
             if (lineNumber < startFrom) {
                 // already processed in an earlier attempt -- skip it
             } else {
                 // simulate an occasional failure so the retry logic can be exercised
                 if (new Random().nextInt() % 50000 == 0) {
                     throw new IOException("Randomly thrown Exception " + lineNumber);
                 }
                 // do the hard work here
                 lines.add(arg0);
                 startFrom++;
             }
             return true;
         }
     }
  2. Create a Callable for reading the file that makes use of my LineProcessor

     // Callable that reads the file with Guava's Files.readLines, resuming at startFrom
     class Reader implements Callable<List<String>> {

         private int startFrom;

         public Reader(int startFrom) {
             this.startFrom = startFrom;
         }

         @Override
         public List<String> call() throws Exception {
             return Files.readLines(new File("/etc/dictionaries-common/words"),
                     Charsets.UTF_8, new Processor(startFrom));
         }
     }
  3. Wrap the Callable in a Retryer and call it using an Executor

     public static void main(String[] args) throws InterruptedException, ExecutionException {
         BasicConfigurator.configure();
         ExecutorService executor = Executors.newSingleThreadExecutor();
         // guava-retrying: retry on IOException, give up after 100 attempts
         Future<List<String>> lines = executor.submit(
                 RetryerBuilder.<List<String>>newBuilder()
                         .retryIfExceptionOfType(IOException.class)
                         .withStopStrategy(StopStrategies.stopAfterAttempt(100))
                         .build()
                         .wrap(new Reader(100)));
         logger.debug(lines.get().size());
         executor.shutdown();
         logger.debug("Happily Ever After");
     }

You could maintain checkpoint/commit-style logic in your code, so that when the program runs again it starts from the last checkpoint.

You can use RandomAccessFile to read the file and use getFilePointer() as the checkpoint, which you persist. When you run the program again, you resume from this checkpoint by calling seek(offset).
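
A minimal sketch of that idea, assuming the byte offset is persisted to a small side file after each processed line (the file names, the checkpoint format, and the process() helper are made up for illustration):

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class CheckpointedReader {

        // hypothetical locations; adjust to your environment
        private static final Path DATA = Paths.get("data.txt");
        private static final Path CHECKPOINT = Paths.get("data.txt.checkpoint");

        public static void main(String[] args) throws IOException {
            long offset = readCheckpoint();
            try (RandomAccessFile raf = new RandomAccessFile(DATA.toFile(), "r")) {
                raf.seek(offset);                          // resume where the last run stopped
                String line;
                while ((line = raf.readLine()) != null) {
                    process(line);                         // the actual per-line work
                    writeCheckpoint(raf.getFilePointer()); // persist progress after each line
                }
            }
            Files.deleteIfExists(CHECKPOINT);              // clean up once the whole file is done
        }

        private static long readCheckpoint() throws IOException {
            if (Files.exists(CHECKPOINT)) {
                return Long.parseLong(Files.readAllLines(CHECKPOINT, StandardCharsets.UTF_8).get(0).trim());
            }
            return 0L;                                     // no checkpoint yet: start at the beginning
        }

        private static void writeCheckpoint(long offset) throws IOException {
            Files.write(CHECKPOINT, Long.toString(offset).getBytes(StandardCharsets.UTF_8));
        }

        private static void process(String line) {
            // placeholder for the real processing
        }
    }

Persisting after every single line is the simplest correct version; in practice you would usually write the checkpoint less often, as the next answer suggests.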

Try/catch won't save you from an OOM error. You should process the file in chunks and, after every successful chunk, store the position in the filesystem, a database, or any other place that remains persistent even if your program crashes. Then, when you restart your software, you can read the previous position from wherever you stored it. You must also clean up this information once the whole file has been processed.
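
A rough sketch along these lines, assuming progress is committed as a processed-line count after every chunk of 1000 lines (the file names, chunk size, and process() helper are again just placeholders):

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class ChunkedProcessor {

        private static final Path DATA = Paths.get("data.txt");           // hypothetical input file
        private static final Path CHECKPOINT = Paths.get("progress.txt"); // persisted progress
        private static final int CHUNK_SIZE = 1000;

        public static void main(String[] args) throws IOException {
            // read the number of lines finished in a previous run, if any
            long done = Files.exists(CHECKPOINT)
                    ? Long.parseLong(Files.readAllLines(CHECKPOINT, StandardCharsets.UTF_8).get(0).trim())
                    : 0L;

            try (BufferedReader reader = Files.newBufferedReader(DATA, StandardCharsets.UTF_8)) {
                long lineNo = 0;
                long sinceCheckpoint = 0;
                String line;
                while ((line = reader.readLine()) != null) {
                    lineNo++;
                    if (lineNo <= done) {
                        continue;                          // skip lines finished in a previous run
                    }
                    process(line);                         // the real per-line work
                    if (++sinceCheckpoint == CHUNK_SIZE) {
                        // commit the chunk: persist how many lines are safely done
                        Files.write(CHECKPOINT, Long.toString(lineNo).getBytes(StandardCharsets.UTF_8));
                        sinceCheckpoint = 0;
                    }
                }
            }
            Files.deleteIfExists(CHECKPOINT);              // whole file done, remove the checkpoint
        }

        private static void process(String line) {
            // placeholder for the real per-line processing
        }
    }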
