
No worker threads when fork-join has work to do?

I've got a bug that's come up twice in production now where one of my fork/join pools stops working, even though it has work to do and more work is being added.

This is the conclusion I've come to so far to explain why the task queues are filling up and the flow of task results has stopped. I have thread dumps where my task producer threads are waiting for a fork/join submission to finish, but there is no ForkJoinPool worker thread doing anything about it.

"calc-scheduling-pool-4-thread-2" #65 prio=5 os_prio=0  tid=0x00000000102e39f0 nid=0x794a in Object.wait() [0x00002ad900a06000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.util.concurrent.ForkJoinTask.externalAwaitDone(ForkJoinTask.java:334)
    - locked <0x000000061ad08708> (a com.....Engine$Calculation)
    at java.util.concurrent.ForkJoinTask.doJoin(ForkJoinTask.java:391)
    at java.util.concurrent.ForkJoinTask.join(ForkJoinTask.java:719)
    at java.util.concurrent.ForkJoinPool.invoke(ForkJoinPool.java:2613)
    at com...Engine.calculateSinceLastBatch(Engine.java:141)

Regardless of what I'm doing, this shouldn't happen, right? The thread dump is from many hours after the initial condition was detected. I have two other ForkJoinPools in the runtime that are both running normally with many worker threads present.

The parallelism of this pool is 1 (I know that's stupid, but it shouldn't break the correctness of the fork/join pool). No errors or exceptions are detected until my task queue fills up and a thread dump reveals no worker.
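A minimal sketch of a diagnostic that could be logged next to the thread dump, to show what the pool itself reports about its workers and queues. The class name `PoolDiagnostics` and the method `describe` are hypothetical; the accessors it calls are the standard `ForkJoinPool` ones, whose counts are documented as approximate.

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical helper, not part of the production code above.
final class PoolDiagnostics {

    // Collects the pool's own view of its state into one loggable line.
    static String describe(ForkJoinPool pool) {
        return "parallelism=" + pool.getParallelism()
                + " poolSize=" + pool.getPoolSize()                   // worker threads started and not yet terminated
                + " active=" + pool.getActiveThreadCount()            // workers stealing or running tasks (estimate)
                + " running=" + pool.getRunningThreadCount()          // workers not blocked in a join (estimate)
                + " queuedSubmissions=" + pool.getQueuedSubmissionCount() // submitted but not yet started
                + " queuedTasks=" + pool.getQueuedTaskCount()         // forked tasks held in worker queues
                + " steals=" + pool.getStealCount()
                + " shutdown=" + pool.isShutdown();
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool(1);
        System.out.println(describe(pool));
        pool.shutdown();
    }
}
```

If the stalled pool reports a pool size of 0 while queued submissions stay above 0, that would support the "no worker was (re)started" reading of the thread dump.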

Has anyone else seen this? Either I'm missing something or there's a bug in fork/join that never (re)started a worker thread for me.

The runtime is Java 8.

Update with code

This is a reasonable simplification of how we're using fork/join in production. We have three engines, only one of which is configured with parallelism of 1.

import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.*;

public class Engine {

    BlockingQueue<Calculation> externalQueue = new LinkedBlockingQueue<>(100000);
    ScheduledExecutorService scheduling = Executors.newScheduledThreadPool(3);
    static ForkJoinPool forkJoin = new ForkJoinPool(1);

    public static void main(String[] args) {
        new Engine().start();
    }

    void start() {
        final AtomicInteger batch = new AtomicInteger(0);
        // data comes in from external systems
        scheduling.scheduleWithFixedDelay(
                () -> produceData(batch.getAndIncrement()),
                500,
                500,
                TimeUnit.MILLISECONDS);
        // internal scheduling processes data with a fixed delay
        scheduling.scheduleWithFixedDelay(
                this::calculate,
                1000,
                1000,
                TimeUnit.MILLISECONDS);
    }

    void produceData(final int batch) {
        System.out.println(Thread.currentThread().getName() + " => submitting data for batch " + batch);
        Stream<Integer> data = IntStream.range(0, 10).boxed();
        data.map((i) -> new Calculation(batch, i)).forEach(externalQueue::offer);
    }

    void calculate() {
        int available = externalQueue.size();
        List<Calculation> tasks = new ArrayList<>(available);
        externalQueue.drainTo(tasks);
        // invoke will block for the results to be calculated before continuing
        forkJoin.invoke(new CalculationTask(tasks, 0, tasks.size()));
        System.out.println("done with calculations at " + new Date());
    }

    static class CalculationTask extends RecursiveAction {

        static int MIN_CALCULATION_THRESHOLD = 3;

        List<Calculation> tasks;
        int start;
        int end;

        CalculationTask(List<Calculation> tasks, int start, int end) {
            this.tasks = tasks;
            this.start = start;
            this.end = end;
        }

        // if below a threshold, calculate here, else fork to new CalculationTasks
        @Override
        protected void compute() {
            int work = end - start;
            if (work <= threshold()) {
                for (int i = start; i < end; i++) {
                    Calculation calc = tasks.get(i);
                    calc.calculate();
                }
                return;
            }

            invokeNewActions();
        }

        int threshold() {
            return Math.max(tasks.size() / forkJoin.getParallelism() / 2, MIN_CALCULATION_THRESHOLD);
        }

        void invokeNewActions() {
            invokeAll(
                    new CalculationTask(tasks, start, middle()),
                    new CalculationTask(tasks, middle(), end));
        }

        int middle() {
            return (start + end) / 2;
        }
    }

    static class Calculation {

        int batch;
        int data;

        Calculation(int batch, int data) {
            this.batch = batch;
            this.data = data;
        }

        void calculate() {
            // does some work and pushes results to a listener
            System.out.println(Thread.currentThread().getName() + " => calculation complete on batch " + batch
                            + " for " + data);
        }
    }

}

The wait is at java.util.concurrent.ForkJoinTask.externalAwaitDone(ForkJoinTask.java:334)

This tells me that F/J may be using your submitting thread as a worker. Follow the code from invokeAll: after the task is submitted for execution, the code needs the Future, and it ends up at ((ForkJoinTask) futures.get(i)).quietlyJoin(); quietlyJoin then goes to doJoin.

There, the check (Thread.currentThread() instanceof ForkJoinWorkerThread) is not true for your submitting thread, even if the pool is using it as a worker, so the join ends up in externalAwaitDone().

The problem may be that your submitting thread will never wake up since it is not a real worker. There are many problems with using a submitting thread as a worker and this may be another one.
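The `instanceof` check described above can be observed directly. The following is a small sketch, not taken from the question's code: `JoinPathDemo`, `isWorker` and `computeThreadInfo` are hypothetical names. It reports whether the calling thread and the thread that runs `compute()` are `ForkJoinWorkerThread` instances.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinWorkerThread;
import java.util.concurrent.RecursiveTask;

// Hypothetical demo class for checking which kind of thread is involved.
final class JoinPathDemo {

    // The same test doJoin applies to decide between the worker path and the external wait path.
    static boolean isWorker() {
        return Thread.currentThread() instanceof ForkJoinWorkerThread;
    }

    // Runs a trivial task through invoke() and reports the thread that executed compute().
    static String computeThreadInfo(ForkJoinPool pool) {
        return pool.invoke(new RecursiveTask<String>() {
            @Override
            protected String compute() {
                return Thread.currentThread().getName() + " worker=" + isWorker();
            }
        });
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool(1);
        System.out.println("caller:  " + Thread.currentThread().getName() + " worker=" + isWorker());
        System.out.println("compute: " + computeThreadInfo(pool));
        pool.shutdown();
    }
}
```

A caller that is a plain thread, such as one from a scheduled executor, reports `worker=false`, which matches the `externalAwaitDone` frame in the thread dump.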

As @John-Vint said, without a test this answer is just a guess. Why not set the parallelism to more than 1 and be done with it?
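As a hedged sketch of that suggestion, the pool below is created with a parallelism of 2, and the blocking `invoke` is replaced by `submit` plus a timed `get`, so a stall would show up as a logged timeout instead of a thread parked forever. `GuardedInvoke` and `invokeWithTimeout` are hypothetical names, and the 5-second timeout is an arbitrary assumption, not a value from the question.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper showing a timed wait on a fork/join submission.
final class GuardedInvoke {

    // Returns true if the task completed within the timeout, false if it was still pending.
    static <T> boolean invokeWithTimeout(ForkJoinPool pool, ForkJoinTask<T> task,
                                         long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException {
        pool.submit(task);
        try {
            task.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            // ForkJoinPool.toString() includes size, active, tasks and submissions counts.
            System.err.println("task still pending after timeout, pool state: " + pool);
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        ForkJoinPool pool = new ForkJoinPool(2); // parallelism greater than 1, as suggested
        RecursiveAction noop = new RecursiveAction() {
            @Override
            protected void compute() {
                // stands in for the real calculation
            }
        };
        System.out.println("completed in time: " + invokeWithTimeout(pool, noop, 5, TimeUnit.SECONDS));
        pool.shutdown();
    }
}
```

The timed wait does not fix a missing worker; it only keeps the scheduling thread from blocking indefinitely and records the pool state at the moment the stall is noticed.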

