
Using ExecutorService with a tree of tasks to perform

We had a bit of a problem. :)

We want to ensure that only N threads are doing background tasks at any time. To do this, we used a fixed thread pool executor. It seemed to be working fine.

Then we found an issue. Suppose you have a class that uses the executor to do some parallel work, and then, while running on an executor thread, it calls some other class that also does some parallel work and intends to wait for it. Here's what happens:

  • Main thread calls the first-level method.
  • This method thinks it can parallelise into 16 tasks and splits up its work.
  • 16 tasks are submitted to the executor.
  • Main thread starts waiting for its tasks to complete.
  • Supposing there are four threads available, the first four tasks each get picked up and run. So there are 12 tasks left on the queue.
  • Now, one of these tasks calls some other method.
  • This new method thinks it can parallelise into 2 tasks. Let's say it's the first step in a parallel merge sort or something along those lines.
  • 2 tasks are submitted to the executor.
  • This thread now starts waiting for its tasks to complete.

Uh-oh. Once each of the four running tasks reaches this point, all four threads are waiting for queued tasks to complete. But those four waiting threads are the only threads the executor has, so nothing is left to actually run the queued tasks, and the pool is deadlocked.
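A minimal way to reproduce this is sketched below, assuming Java 8+ for the lambdas; the names `NestedSubmitDeadlock` and `outerTaskStarves` are hypothetical. It uses a pool of one thread so the outcome is deterministic, and a 500 ms timeout on the nested `get()` so the program reports the starvation instead of hanging forever.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class NestedSubmitDeadlock {

    // Returns true if a task that submits a child to its own (saturated) pool and then
    // waits for it gives up because the child never gets a thread.
    static boolean outerTaskStarves() throws Exception {
        // One worker keeps the demo deterministic; with N workers the same thing happens
        // once all N of them are blocked like this.
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            Future<Boolean> outer = pool.submit(() -> {
                Future<Integer> child = pool.submit(() -> 42); // queued behind us
                try {
                    child.get(500, TimeUnit.MILLISECONDS);
                    return false; // the child ran after all
                } catch (TimeoutException e) {
                    return true;  // the only worker is us, blocked waiting: starvation
                }
            });
            return outer.get();
        } finally {
            pool.shutdownNow(); // discard the child that never ran
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("outer task starved: " + outerTaskStarves());
    }
}
```

With a real timeout-free `get()`, the outer task would simply block forever, which is the situation described above.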

Solution 1 to this problem was as follows: when submitting a new task to the executor, if all our threads are already busy and we are already running on one of the executor threads, run the task inline. This worked fine for 10 months, but now we have hit a problem with it. If the new tasks being submitted are still relatively large, a task running inline blocks the submitting method from adding its remaining tasks to the queue, where the other worker threads could otherwise have picked them up. So you get periods of huge delays while a thread is processing the work inline.
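The inline-when-saturated rule of Solution 1 might look roughly like this. It is a sketch under assumptions: `InlineWhenSaturatedExecutor` is a hypothetical wrapper (not a JDK class), "all threads busy" is approximated with a counter of running tasks, and the check is racy, so it is a heuristic rather than a guarantee.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper illustrating "Solution 1"; not a JDK class.
public class InlineWhenSaturatedExecutor implements Executor {
    private final int poolSize;
    private final ExecutorService pool;
    private final AtomicInteger busyWorkers = new AtomicInteger();
    // true only on pool threads while they are running one of this executor's tasks
    private final ThreadLocal<Boolean> onWorker = ThreadLocal.withInitial(() -> false);

    public InlineWhenSaturatedExecutor(int poolSize) {
        this.poolSize = poolSize;
        this.pool = Executors.newFixedThreadPool(poolSize);
    }

    @Override
    public void execute(Runnable task) {
        // Called from a worker while every worker is busy: run inline instead of queueing.
        // The busy count can change right after we read it, so this is only a heuristic.
        if (onWorker.get() && busyWorkers.get() >= poolSize) {
            task.run();
            return;
        }
        pool.execute(() -> {
            onWorker.set(true);
            busyWorkers.incrementAndGet();
            try {
                task.run();
            } finally {
                busyWorkers.decrementAndGet();
                onWorker.set(false);
            }
        });
    }

    public void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        InlineWhenSaturatedExecutor exec = new InlineWhenSaturatedExecutor(1);
        FutureTask<Integer> outer = new FutureTask<>(() -> {
            FutureTask<Integer> child = new FutureTask<>(() -> 21);
            exec.execute(child);    // the only worker is busy (it is us), so this runs inline
            return child.get() * 2; // already finished, so no deadlock
        });
        exec.execute(outer);
        System.out.println(outer.get()); // 42
        exec.shutdown();
    }
}
```

The `task.run()` call inside `execute()` is exactly where the delay described above comes from: while a large child runs inline, the caller's loop that would have queued its remaining siblings is stalled.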

Is there a better solution to the core problem of executing a potentially unbounded tree of background tasks? I understand that .NET's equivalent of the executor service has some kind of built-in ability to steal from the queue, which prevents the original deadlock from occurring; as far as I can tell, that is an ideal solution. But what about over in Java land?

Java 7 has the concept of a ForkJoinPool that allows a task to "fork" off another task by submitting it to the same Executor. It then gives the task the option of later attempting to "help join" that task, by running it itself if it has not been run yet.
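In Java 7 and later, this fork / help-join pattern is what `ForkJoinPool` with `RecursiveTask` provides directly. A sketch of the Fibonacci example from this answer in that style (the class name `ForkJoinFib` is made up for illustration; the naive recursion is only there to generate a tree of tasks):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ForkJoinFib extends RecursiveTask<Integer> {
    private final int n;

    ForkJoinFib(int n) {
        this.n = n;
    }

    @Override
    protected Integer compute() {
        if (n <= 1) {
            return n;
        }
        ForkJoinFib f1 = new ForkJoinFib(n - 1);
        f1.fork();                                 // queue f1 on this worker; idle workers may steal it
        int r2 = new ForkJoinFib(n - 2).compute(); // do the other half ourselves
        // join() waits for f1; if no other worker has taken it, the current worker can
        // run it itself (or help with other tasks) instead of just blocking
        return r2 + f1.join();
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool(4); // parallelism level 4, comparable to a fixed pool of 4
        System.out.println(pool.invoke(new ForkJoinFib(20))); // 6765
        pool.shutdown();
    }
}
```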

I believe the same thing can be done in Java 6 by simply combining an Executor with FutureTask. Like so:

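import java.util.concurrent.Callable;
import java.util.concurrent.Executor;
import java.util.concurrent.FutureTask;

public class Fib implements Callable<Integer> {
    private final int n;
    private final Executor exec;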

    Fib(final int n, final Executor exec) {
        this.n = n;
        this.exec = exec;
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public Integer call() throws Exception {
        if (n == 0 || n == 1) {
            return n;
        }

        //Divide the problem
        final Fib n1 = new Fib(n - 1, exec);
        final Fib n2 = new Fib(n - 2, exec);

        //FutureTask only allows run to complete once
        final FutureTask<Integer> n2Task = new FutureTask<Integer>(n2);
        //Ask the Executor for help
        exec.execute(n2Task);

        //Do half the work ourselves
        final int partialResult = n1.call();

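        //Do the other half of the work if the Executor hasn't started it yet.
        //FutureTask.run() returns immediately if another thread is already running
        //(or has finished) n2Task, in which case get() below waits for that thread.
        n2Task.run();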

        //Return the combined result
        return partialResult + n2Task.get();
    }

}        
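The same trick can be pulled out of `Fib` into a pair of reusable helpers. The sketch below is hypothetical (`HelpJoin`, `fork`, `join` and `sum` are illustrative names, not library API) and relies only on the documented behavior that `FutureTask.run()` does nothing once the task has already started or completed. It deliberately uses a single-thread pool, the configuration that deadlocked in the question.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;

public class HelpJoin {

    // Hand a subtask to the executor, keeping a handle so we can help with it later.
    static <T> FutureTask<T> fork(Executor exec, Callable<T> work) {
        FutureTask<T> task = new FutureTask<>(work);
        exec.execute(task);
        return task;
    }

    // Run the subtask ourselves if no thread has started it yet (FutureTask.run() is a
    // no-op once the task has started or finished), then wait for its result.
    static <T> T join(FutureTask<T> task) throws InterruptedException, ExecutionException {
        task.run();
        return task.get();
    }

    // Sum of a[lo, hi): below the threshold, sum sequentially; otherwise fork the right
    // half, compute the left half here, then help-join the right half.
    static long sum(long[] a, int lo, int hi, Executor exec) throws Exception {
        if (hi - lo <= 1_000) {
            long s = 0;
            for (int i = lo; i < hi; i++) {
                s += a[i];
            }
            return s;
        }
        int mid = (lo + hi) >>> 1;
        FutureTask<Long> right = fork(exec, () -> sum(a, mid, hi, exec));
        long left = sum(a, lo, mid, exec);
        return left + join(right);
    }

    public static void main(String[] args) throws Exception {
        long[] a = new long[100_000];
        for (int i = 0; i < a.length; i++) {
            a[i] = i + 1;
        }
        // One worker thread: the configuration that deadlocked in the question.
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            System.out.println(sum(a, 0, a.length, pool)); // 5000050000
        } finally {
            pool.shutdown();
        }
    }
}
```

It terminates because a thread only ever blocks in `get()` on a task that some other thread is actively running; anything still sitting in the queue gets run by the waiter itself.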

You could make use of callbacks instead of having your thread wait for the tasks to complete. Your tasks themselves will need to be callbacks, since they submit more tasks.

E.g.:

interface Callback {
  // called once a task and all of its descendants have finished
  void taskCompleted();
}

public class ParallelTask implements Runnable, Callback {
  private final Callback mCB;
  private final int mNumChildTasks;
  private int mTimesCalledBack = 0;   // guarded by mLock
  private final Object mLock = new Object();
  private boolean mCompleted = false; // guarded by mLock

  public ParallelTask(Callback cb, int numChildTasks) {
    mCB = cb;
    // the number of direct child tasks you know this task will spawn
    // (only going down 1 generation)
    // of course you could figure this number out in the run method instead (the field would
    // then need to be non-final and volatile), just as long as it is set before submitting
    // any child tasks for execution
    mNumChildTasks = numChildTasks;
  }

  @Override
  public void run() {
    // do your stuff
    // and submit your child tasks (each one given this task as its Callback),
    // but don't wait on them to complete
    synchronized(mLock) {
      mCompleted = true;
      if (mNumChildTasks == mTimesCalledBack) {
        mCB.taskCompleted();
      }
    }
  }

  // Callback interface
  // taskCompleted is being called from the threads that this task's children are running in
  @Override
  public void taskCompleted() {
    synchronized(mLock) {
      mTimesCalledBack++;
      // only call our parent back if our direct children have all called us back
      // and our own task is done
      if (mCompleted && mTimesCalledBack == mNumChildTasks) {
        mCB.taskCompleted();
      }
    }
  }
}

In your main thread, you submit your root task and register some callback to be executed when the whole tree has finished.

Since no task reports completion until all of its own children have reported completion, your root callback shouldn't be called until everything is done.

I wrote this on the fly and haven't tested or compiled it, so there may be some errors.
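A compiled, self-contained version of this callback scheme is sketched below (Java 8+). The tree being processed, the names `CallbackTreeDemo`, `NodeTask` and `visitTree`, and the use of a `CountDownLatch` as the root callback are illustrative assumptions; the per-task bookkeeping (lock, completed flag, callback counter) follows `ParallelTask`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class CallbackTreeDemo {

    interface Callback {
        void taskCompleted();
    }

    // Hypothetical task: "processes" one node of a complete binary tree of the given depth.
    static class NodeTask implements Runnable, Callback {
        private final Callback parent;
        private final ExecutorService exec;
        private final int depth;
        private final AtomicInteger visited;
        private final int numChildren;     // fixed before any child is submitted
        private final Object lock = new Object();
        private int timesCalledBack = 0;   // guarded by lock
        private boolean completed = false; // guarded by lock

        NodeTask(Callback parent, ExecutorService exec, int depth, AtomicInteger visited) {
            this.parent = parent;
            this.exec = exec;
            this.depth = depth;
            this.visited = visited;
            this.numChildren = depth > 0 ? 2 : 0;
        }

        @Override
        public void run() {
            visited.incrementAndGet(); // "do your stuff"
            for (int i = 0; i < numChildren; i++) {
                // submit children with this task as their callback; never wait on them
                exec.execute(new NodeTask(this, exec, depth - 1, visited));
            }
            synchronized (lock) {
                completed = true;
                if (timesCalledBack == numChildren) {
                    parent.taskCompleted();
                }
            }
        }

        @Override
        public void taskCompleted() {
            synchronized (lock) {
                timesCalledBack++;
                if (completed && timesCalledBack == numChildren) {
                    parent.taskCompleted();
                }
            }
        }
    }

    // Processes the whole tree and returns how many nodes were visited.
    static int visitTree(int depth, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            AtomicInteger visited = new AtomicInteger();
            CountDownLatch done = new CountDownLatch(1);
            // The root callback: only the main thread blocks, on the latch, never a worker.
            pool.execute(new NodeTask(done::countDown, pool, depth, visited));
            done.await();
            return visited.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(visitTree(10, 2)); // 2047 nodes in a binary tree of depth 10
    }
}
```

Locks are only ever taken child first, then parent, so the nested `synchronized` calls up the tree cannot deadlock with each other.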

It seems like the issue is that the tasks also try to parallelize themselves, which makes it difficult to avoid resource constraints. Why do you need to do this? Why not always run the subtasks inline?

If you're already fully utilizing the CPU through parallelization, then you're not going to gain much in terms of overall work accomplished by dividing the work up again into smaller tasks.
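One reading of that suggestion, sketched with hypothetical names (`TopLevelOnly`, `parallelSum`): split only at the top level into one chunk per thread, and do everything inside a chunk sequentially. Since only the caller waits on the pool, no worker can ever block waiting for queued work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TopLevelOnly {

    // Sums 1..n by splitting only at the top level; each chunk runs sequentially.
    static long parallelSum(long n, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> chunks = new ArrayList<>();
            long chunkSize = (n + threads - 1) / threads;
            for (long start = 1; start <= n; start += chunkSize) {
                final long lo = start;
                final long hi = Math.min(n, start + chunkSize - 1);
                // Any further "parallelism" inside a chunk is simply done inline here.
                chunks.add(() -> {
                    long s = 0;
                    for (long i = lo; i <= hi; i++) {
                        s += i;
                    }
                    return s;
                });
            }
            long total = 0;
            // Only the calling thread waits, so no worker ever blocks on the pool.
            for (Future<Long> f : pool.invokeAll(chunks)) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parallelSum(1_000_000, 4)); // 500000500000
    }
}
```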

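Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce this page, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.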
