How to determine the proper work division threshold of a fork-join task

After looking at the Fork/Join Tutorial, I created a class for computing large factorials:

import java.math.BigInteger;
import java.util.concurrent.RecursiveTask;

public class ForkFactorial extends RecursiveTask<BigInteger> {

    final int end;
    final int start;
    private static final int THRESHOLD = 10;

    public ForkFactorial(int n) {
        this(1, n + 1);
    }

    private ForkFactorial(int start, int end) {
        this.start = start;
        this.end = end;
    }

    @Override
    protected BigInteger compute() {
        if (end - start < THRESHOLD) {
            return computeDirectly();
        } else {
            int mid = (start + end) / 2;
            ForkFactorial lower = new ForkFactorial(start, mid);
            lower.fork();
            ForkFactorial upper = new ForkFactorial(mid, end);
            BigInteger upperVal = upper.compute();
            return lower.join().multiply(upperVal);
        }
    }

    private BigInteger computeDirectly() {
        BigInteger val = BigInteger.ONE;
        BigInteger mult = BigInteger.valueOf(start);
        for (int iter = start; iter < end; iter++, mult = mult.add(BigInteger.ONE)) {
            val = val.multiply(mult);
        }
        return val;
    }
}

The question I have is: how do I determine the threshold at which I stop subdividing the task? I found a page on fork/join parallelism which states:

One of the main things to consider when implementing an algorithm using fork/join parallelism is choosing the threshold which determines whether a task will execute a sequential computation rather than forking parallel sub-tasks.

If the threshold is too large, then the program might not create enough tasks to fully take advantage of the available processors/cores.

If the threshold is too small, then the overhead of task creation and management could become significant.

In general, some experimentation will be necessary to find an appropriate threshold value.

So what experimentation would I need to do in order to determine the threshold?

PigeonHole estimation: set an arbitrary threshold and measure the computation time. Then raise and lower the threshold to see whether the computation time improves, and stop once lowering the threshold no longer yields any improvement.
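The trial-and-error loop described above could be sketched roughly as follows. This is only a minimal illustration, not the asker's class: ThresholdSweep, TunableFactorial, factorial and bestTimeNanos are hypothetical names, the threshold is passed as a constructor argument (the original uses a constant), and n = 20_000 and the power-of-two sweep are arbitrary choices. Wall-clock timing like this is noisy, so treat the numbers as a rough guide.

```java
import java.math.BigInteger;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical harness: times the same factorial at several thresholds.
class ThresholdSweep {

    // Same divide-and-conquer shape as ForkFactorial, but with the threshold
    // as a constructor argument so it can be varied between runs.
    static class TunableFactorial extends RecursiveTask<BigInteger> {
        final int start, end, threshold;

        TunableFactorial(int start, int end, int threshold) {
            this.start = start;
            this.end = end;
            this.threshold = threshold;
        }

        @Override
        protected BigInteger compute() {
            if (end - start < threshold) {
                // Small range: multiply sequentially.
                BigInteger val = BigInteger.ONE;
                for (int i = start; i < end; i++) {
                    val = val.multiply(BigInteger.valueOf(i));
                }
                return val;
            }
            int mid = (start + end) >>> 1;
            TunableFactorial lower = new TunableFactorial(start, mid, threshold);
            lower.fork();
            BigInteger upperVal = new TunableFactorial(mid, end, threshold).compute();
            return lower.join().multiply(upperVal);
        }
    }

    static BigInteger factorial(ForkJoinPool pool, int n, int threshold) {
        // With this splitting rule a threshold below 2 would never terminate.
        if (threshold < 2) {
            throw new IllegalArgumentException("threshold must be >= 2");
        }
        return pool.invoke(new TunableFactorial(1, n + 1, threshold));
    }

    // Best-of-several wall-clock time in nanoseconds.
    static long bestTimeNanos(ForkJoinPool pool, int n, int threshold, int runs) {
        long best = Long.MAX_VALUE;
        for (int r = 0; r < runs; r++) {
            long t0 = System.nanoTime();
            factorial(pool, n, threshold);
            best = Math.min(best, System.nanoTime() - t0);
        }
        return best;
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool();
        int n = 20_000; // arbitrary problem size for this sketch
        bestTimeNanos(pool, n, 16, 5); // warm-up so the JIT has seen the hot paths
        for (int threshold = 2; threshold <= 4096; threshold *= 2) {
            long nanos = bestTimeNanos(pool, n, threshold, 5);
            System.out.printf("threshold=%5d  best=%8.3f ms%n", threshold, nanos / 1e6);
        }
        pool.shutdown();
    }
}
```

For anything beyond a quick look, a proper benchmarking harness such as JMH would probably give more trustworthy timings than a hand-rolled loop.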

Choosing a threshold depends on many factors:

The actual computation should take a reasonable amount of time. If you're summing an array and the array is small, then it is probably better to do it sequentially. If the array length is 16M, then splitting it into smaller pieces and processing them in parallel should be worthwhile. Try it and see.

The number of processors should be sufficient. Doug Lea once documented his framework as needing 16+ processors to make it worthwhile. Even splitting an array in half and running on two threads will produce about a 1.3% gain in throughput. Now you have to consider the split/join overhead. Try running on many configurations to see what you get.

The number of concurrent requests should be small. If you have N processors and 8N concurrent requests, then using one thread per request is often more efficient for throughput. The logic here is simple: if you have N processors available and you split your work accordingly, but there are hundreds of other tasks ahead of you, then what's the point of splitting?

This is what experimenting means.
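As one way to "run on many configurations", the array-sum case could be timed under pools of different sizes. The sketch below is an assumption-laden illustration: ParallelismSweep, SumTask and sum are hypothetical names, and the array size (about 16M ints) and the threshold of 16384 elements are arbitrary values picked for the example.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical harness: sums one array under pools of different parallelism.
class ParallelismSweep {

    static class SumTask extends RecursiveTask<Long> {
        final int[] data;
        final int lo, hi, threshold;

        SumTask(int[] data, int lo, int hi, int threshold) {
            this.data = data;
            this.lo = lo;
            this.hi = hi;
            this.threshold = threshold;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= threshold) {
                // Small slice: sum sequentially.
                long sum = 0;
                for (int i = lo; i < hi; i++) {
                    sum += data[i];
                }
                return sum;
            }
            int mid = (lo + hi) >>> 1;
            SumTask left = new SumTask(data, lo, mid, threshold);
            left.fork();
            long right = new SumTask(data, mid, hi, threshold).compute();
            return left.join() + right;
        }
    }

    static long sum(ForkJoinPool pool, int[] data, int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        return pool.invoke(new SumTask(data, 0, data.length, threshold));
    }

    public static void main(String[] args) {
        int[] data = new int[1 << 24]; // about 16M elements
        Arrays.fill(data, 1);
        int threshold = 1 << 14;       // arbitrary value for this sketch
        int cores = Runtime.getRuntime().availableProcessors();

        for (int p = 1; p <= cores; p *= 2) {
            ForkJoinPool pool = new ForkJoinPool(p);
            sum(pool, data, threshold); // warm-up run
            long t0 = System.nanoTime();
            long total = sum(pool, data, threshold);
            long nanos = System.nanoTime() - t0;
            System.out.printf("parallelism=%2d  sum=%d  time=%.3f ms%n", p, total, nanos / 1e6);
            pool.shutdown();
        }
    }
}
```

Repeating the sweep with a few different thresholds and array sizes should show where splitting stops paying for itself on a given machine.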

Unfortunately, this framework doesn't come with the means for accountability. There is no way to see the load on each thread, the high-water mark in the deques, the total requests processed, the errors encountered, and so on.
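For what it's worth, ForkJoinPool does expose a handful of pool-wide counters that can serve as a rough substitute during experiments. They are aggregates rather than per-thread figures, so they only partly cover what is listed above. A minimal sketch, where PoolStats, snapshot and the Fib workload are hypothetical names invented for the example:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical helper: prints the aggregate counters that ForkJoinPool exposes.
class PoolStats {

    // Deliberately fine-grained workload so that some work stealing happens.
    static class Fib extends RecursiveTask<Integer> {
        final int n;

        Fib(int n) {
            this.n = n;
        }

        @Override
        protected Integer compute() {
            if (n < 2) {
                return n;
            }
            Fib f1 = new Fib(n - 1);
            f1.fork();
            return new Fib(n - 2).compute() + f1.join();
        }
    }

    // All values are pool-wide estimates; none of them is per-thread.
    static String snapshot(ForkJoinPool pool) {
        return String.format(
            "parallelism=%d poolSize=%d active=%d running=%d steals=%d queuedTasks=%d queuedSubmissions=%d",
            pool.getParallelism(),
            pool.getPoolSize(),
            pool.getActiveThreadCount(),
            pool.getRunningThreadCount(),
            pool.getStealCount(),
            pool.getQueuedTaskCount(),
            pool.getQueuedSubmissionCount());
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool();
        System.out.println("before: " + snapshot(pool));
        int result = pool.invoke(new Fib(25));
        System.out.println("fib(25) = " + result);
        System.out.println("after:  " + snapshot(pool));
        pool.shutdown();
    }
}
```

A steal count that stays near zero while the threshold is large may hint that too few tasks are being created, but that is an interpretation to check against timings, not a rule.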

Good luck.

Note that arithmetic with BigInteger is not constant time; its cost grows with the length of the inputs. The actual complexity of each operation is not readily at hand, though the futureboy implementation referenced in that Q/A does document what it expects to achieve under different circumstances.

Getting the work-estimating function right is important both when deciding how to partition the problem into smaller chunks and when determining whether a particular chunk is worth dividing again.

When using experimentation to determine your threshold, take care that you do not benchmark just one corner of the problem space.

As I understand it, this experiment is an optimization, so it should be applied only when there is a need.

You could experiment with different split strategies, i.e. one can split into two equal parts, or split by estimated multiplication cost, which depends on the length of the integers in decimal digits.
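A cost-based split might look something like the sketch below. It is an illustration under stated assumptions, not a tuned implementation: CostSplit, estimatedBits, balancedMid and CostFactorial are hypothetical names, and the sum of the factors' bit lengths is used as a crude stand-in for multiplication cost (bit length rather than decimal length, since it is cheaper to compute). The estimate itself is a linear scan, which should be cheap next to the BigInteger multiplications but is not free.

```java
import java.math.BigInteger;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical sketch: split by estimated result size instead of element count.
class CostSplit {

    // Upper bound on the bit length of the product of [start, end):
    // the sum of the bit lengths of the individual factors.
    static long estimatedBits(int start, int end) {
        long bits = 0;
        for (int i = start; i < end; i++) {
            bits += 32 - Integer.numberOfLeadingZeros(i);
        }
        return bits;
    }

    // Split point strictly inside (start, end) at which the lower part first
    // reaches half of the estimated bits. Requires end - start >= 2.
    static int balancedMid(int start, int end) {
        long half = estimatedBits(start, end) / 2;
        long acc = 0;
        for (int i = start; i < end - 1; i++) {
            acc += 32 - Integer.numberOfLeadingZeros(i);
            if (acc >= half) {
                return i + 1;
            }
        }
        return end - 1;
    }

    static class CostFactorial extends RecursiveTask<BigInteger> {
        final int start, end;
        final long thresholdBits; // stop splitting below this estimated size

        CostFactorial(int start, int end, long thresholdBits) {
            this.start = start;
            this.end = end;
            this.thresholdBits = thresholdBits;
        }

        @Override
        protected BigInteger compute() {
            if (end - start < 2 || estimatedBits(start, end) < thresholdBits) {
                BigInteger val = BigInteger.ONE;
                for (int i = start; i < end; i++) {
                    val = val.multiply(BigInteger.valueOf(i));
                }
                return val;
            }
            int mid = balancedMid(start, end);
            CostFactorial lower = new CostFactorial(start, mid, thresholdBits);
            lower.fork();
            BigInteger upperVal = new CostFactorial(mid, end, thresholdBits).compute();
            return lower.join().multiply(upperVal);
        }
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool();
        BigInteger f = pool.invoke(new CostFactorial(1, 20_001, 4096));
        System.out.println("20000! has " + f.bitLength() + " bits");
        pool.shutdown();
    }
}
```

The threshold is then expressed in estimated bits rather than in number of factors, so it can be swept in the same way as a count-based threshold and the two strategies compared.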

For each of the strategies, you could test as many threshold values as possible to get the full picture of how it behaves. If you are limited in CPU resources, then you could test, for example, every 5th or 10th value. From my experience, the first important thing here is to get the full picture of how your algorithm performs.

