ExecutorService with a huge number of tasks

Question

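I have a list of files and a list of analyzers that analyze those files. The number of files can be large (200,000), and so can the number of analyzers (1,000). So the total number of operations can be really large (200,000,000). Now I need to apply multithreading to speed things up. I followed this approach: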

ExecutorService executor = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
for (File file : listOfFiles) {
  for (Analyzer analyzer : listOfAnalyzers){
    executor.execute(() -> {
      boolean exists = file.exists();
      if(exists){
        analyzer.analyze(file);
      }
    });
  }
}
executor.shutdown();
executor.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);

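The problem with this approach is that it takes too much memory, and I guess there is a better way to do it. I'm still a beginner at Java and multithreading.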

Answer 1

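Where will 200 million tasks reside? Not in memory, I hope, unless you plan to implement your solution in a distributed fashion. Executors.newFixedThreadPool uses an unbounded queue, so every task you submit sits in memory until a thread picks it up. Instead, you need to instantiate an ExecutorService that does not accumulate a massive queue. Create the service with a bounded queue and the "caller runs policy" (ThreadPoolExecutor.CallerRunsPolicy). If you try to put another task in the queue when it is already full, you will end up executing it yourself, which is probably what you want.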
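To make the caller-runs idea concrete, here is a minimal sketch. The class name BoundedExecutorSketch, the pool size of 4, and the queue capacity of 100 are assumptions chosen for illustration, and the counter increment is only a stand-in for analyzer.analyze(file).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedExecutorSketch {

    /** Builds a fixed-size pool whose queue holds at most queueCapacity waiting tasks. */
    static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,                 // core and maximum pool size
                0L, TimeUnit.MILLISECONDS,        // keep-alive for threads above the core size
                new ArrayBlockingQueue<>(queueCapacity),
                // When the queue is full, the submitting thread runs the task itself,
                // which throttles submission instead of piling up tasks in memory.
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Submits taskCount trivial tasks and returns how many actually ran. */
    static int runTasks(int taskCount) {
        ThreadPoolExecutor executor = newBoundedPool(4, 100); // assumed sizes
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            executor.execute(completed::incrementAndGet); // stand-in for analyzer.analyze(file)
        }
        executor.shutdown();
        try {
            executor.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(runTasks(10_000));
    }
}
```

With this setup the nested loops from the question can stay as they are: once 100 tasks are waiting, the submitting loop slows down to the pace of the workers, so memory use stays roughly constant.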

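OTOH, now that I look at your question more carefully, why not analyze a single file concurrently? Then the queue is never larger than the number of analyzers. Frankly, that is what I would do, since I would like a readable log that has a message for each file, in the correct order, as it is loaded.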

Sorry for not being more helpful:

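List<Future<?>> futures = new ArrayList<>();
for (Analyzer analyst : analysts) futures.add(executor.submit(() -> analyst.analyze(file)));
for (Future<?> future : futures) future.get(); // blocks; throws InterruptedException, ExecutionException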

Basically, create a bunch of futures for one file, then wait for all of them before moving on to the next file.
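A fuller sketch of the per-file approach, including the outer loop over files, could look like this. Note that it swaps in CompletableFuture instead of Future.get so that no checked exceptions need handling. The Analyzer interface, the class name PerFileAnalysis, and the file names are hypothetical stand-ins, not the asker's real types.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PerFileAnalysis {

    /** Hypothetical stand-in for the question's Analyzer type. */
    interface Analyzer {
        void analyze(File file);
    }

    /** Runs all analyzers on one file at a time; returns the number of files handled. */
    static int analyzeAll(List<File> files, List<Analyzer> analyzers, Executor executor) {
        int handled = 0;
        for (File file : files) {
            CompletableFuture<?>[] pending = analyzers.stream()
                    .map(analyzer -> CompletableFuture.runAsync(() -> analyzer.analyze(file), executor))
                    .toArray(CompletableFuture[]::new);
            // join() blocks until every analyzer has finished with this file and
            // rethrows an analyzer failure as an unchecked CompletionException.
            CompletableFuture.allOf(pending).join();
            handled++;
        }
        return handled;
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        List<File> files = Arrays.asList(new File("a.txt"), new File("b.txt")); // hypothetical names
        List<Analyzer> analyzers = Arrays.asList(
                file -> System.out.println("first analyzer: " + file),
                file -> System.out.println("second analyzer: " + file));
        System.out.println(analyzeAll(files, analyzers, executor));
        executor.shutdown();
    }
}
```

Because the loop waits for each file before submitting the next one, at most analyzers.size() tasks are pending at any moment, and log output for different files does not interleave.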

Answer 2

One idea is to use the fork/join framework and group the items (files) into batches, so that each batch is processed separately.

My suggestion is the following:

First, filter out all files that do not exist; they take up resources unnecessarily.
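The filtering step could be sketched like this, assuming the files are held in a List<File>. The class name and the file names in main are made up for the example.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ExistingFilesFilter {

    /** Returns only the entries that exist on disk at the time of the call. */
    static List<File> existingOnly(List<File> candidates) {
        return candidates.stream()
                .filter(File::exists)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<File> candidates = Arrays.asList(
                new File("."),                                          // current directory, exists
                new File("no-such-directory-for-demo/missing.bin"));    // hypothetical, assumed absent
        System.out.println(existingOnly(candidates).size());
    }
}
```

Doing this once up front also removes the file.exists() check from every single task, which in the question runs once per analyzer rather than once per file.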

The following pseudo-code demonstrates an algorithm that might help you:

// Imports go at the top of the file; the class itself is nested inside your main class.
import java.io.File;
import java.util.concurrent.RecursiveTask;

public static class CustomRecursiveTask extends RecursiveTask<Integer> {
    private final Analyzer[] analyzers;
    private final int threshold;
    private final File[] files;
    private final int start;
    private final int end; // exclusive

    public CustomRecursiveTask(Analyzer[] analyzers, final int threshold, File[] files, int start, int end) {
        this.analyzers = analyzers;
        this.threshold = threshold;
        this.files = files;
        this.start = start;
        this.end = end;
    }

    @Override
    protected Integer compute() {
        final int fileCount = end - start;
        // Math.max(1, ...) guards against endless splitting when threshold is 0.
        if (fileCount <= Math.max(1, threshold)) {
            return processSequentially();
        }
        final int middle = (start + end) / 2;
        final CustomRecursiveTask left = new CustomRecursiveTask(analyzers, threshold, files, start, middle);
        // end is exclusive, so the right half starts at middle (middle + 1 would skip a file).
        final CustomRecursiveTask right = new CustomRecursiveTask(analyzers, threshold, files, middle, end);
        left.fork();                             // run the left half asynchronously
        final int rightResult = right.compute(); // run the right half in this thread
        return left.join() + rightResult;
    }

    private Integer processSequentially() {
        for (int i = start; i < end; i++) {
            File file = files[i];
            for (Analyzer analyzer : analyzers) {
                analyzer.analyze(file);
            }
        }
        return 1; // one batch processed
    }
}

Usage looks like this:

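public static void main(String[] args) {
    final Analyzer[] analyzers = new Analyzer[]{}; // fill in your analyzers
    final File[] files = new File[]{};             // fill in your (existing) files

    final int threshold = files.length / 5;

    // invoke() blocks until the whole task tree has finished; execute() returns
    // immediately, so main could exit before the work is done.
    ForkJoinPool.commonPool().invoke(
            new CustomRecursiveTask(
                    analyzers,
                    threshold,
                    files,
                    0,
                    files.length
            )
    );
}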

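Note that, depending on your constraints, you can adjust the task's constructor arguments so that the algorithm adapts to the number of files.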

For example, you could choose a different threshold depending on the number of files.

final int threshold;
if(files.length > 100_000) {
   threshold = files.length / 4;
} else {
   threshold = files.length / 8;
}

You could also set the number of worker threads in the ForkJoinPool depending on the size of the input.
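As a rough sketch of sizing the pool from the input, the ForkJoinPool(int parallelism) constructor can be used instead of the common pool. The cut-off of 100_000, the halving rule, and the CountTask class are arbitrary assumptions for illustration; CountTask only counts a range as a stand-in for the real analysis work.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SizedForkJoinPool {

    /** Picks a worker count from the input size; the 100_000 cut-off is an arbitrary example. */
    static int chooseParallelism(int fileCount) {
        int cores = Runtime.getRuntime().availableProcessors();
        return fileCount > 100_000 ? cores : Math.max(1, cores / 2);
    }

    /** Counts the elements in [start, end) by splitting the range; a stand-in for real work. */
    static class CountTask extends RecursiveTask<Integer> {
        private final int start;
        private final int end; // exclusive

        CountTask(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        protected Integer compute() {
            if (end - start <= 1_000) {
                return end - start;
            }
            int middle = (start + end) >>> 1;
            CountTask left = new CountTask(start, middle);
            CountTask right = new CountTask(middle, end);
            left.fork();
            return right.compute() + left.join();
        }
    }

    /** Runs the task in a dedicated pool sized for the input, then shuts the pool down. */
    static int countWithDedicatedPool(int fileCount) {
        ForkJoinPool pool = new ForkJoinPool(chooseParallelism(fileCount));
        try {
            return pool.invoke(new CountTask(0, fileCount));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(countWithDedicatedPool(250_000));
    }
}
```

A dedicated pool also keeps long-running analysis from occupying the common pool, which parallel streams and other library code share.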

Measure, adjust, and modify; you will solve the problem eventually.

Hope that helps.

UPDATE:

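If you do not need a result from the analysis, you can replace RecursiveTask with RecursiveAction. In the pseudo-code above, the Integer return value only adds auto-boxing overhead.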
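A RecursiveAction variant of the task might look like the sketch below. The Analyzer interface and the class name AnalyzeAction are hypothetical stand-ins, and the half-open range [start, end) and the Math.max(1, threshold) guard are choices made for this example.

```java
import java.io.File;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class AnalyzeAction extends RecursiveAction {

    /** Hypothetical stand-in for the question's Analyzer type. */
    interface Analyzer {
        void analyze(File file);
    }

    private final Analyzer[] analyzers;
    private final int threshold;
    private final File[] files;
    private final int start;
    private final int end; // exclusive

    AnalyzeAction(Analyzer[] analyzers, int threshold, File[] files, int start, int end) {
        this.analyzers = analyzers;
        this.threshold = threshold;
        this.files = files;
        this.start = start;
        this.end = end;
    }

    @Override
    protected void compute() {
        // Small enough (or cannot be split further): process the batch in this thread.
        if (end - start <= Math.max(1, threshold)) {
            for (int i = start; i < end; i++) {
                for (Analyzer analyzer : analyzers) {
                    analyzer.analyze(files[i]);
                }
            }
            return;
        }
        int middle = (start + end) >>> 1;
        // invokeAll forks both halves and returns once both have completed.
        invokeAll(new AnalyzeAction(analyzers, threshold, files, start, middle),
                  new AnalyzeAction(analyzers, threshold, files, middle, end));
    }

    public static void main(String[] args) {
        File[] files = { new File("a.txt"), new File("b.txt"), new File("c.txt") }; // hypothetical names
        Analyzer[] analyzers = { file -> System.out.println("analyzing " + file) };
        ForkJoinPool.commonPool().invoke(new AnalyzeAction(analyzers, 1, files, 0, files.length));
    }
}
```

Since compute() returns void here, nothing is boxed or summed on the way back up the task tree.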

Answer 1
4 accepted 2018-06-28 13:45:50

Answer 2
2 2018-06-28 14:37:13
