Thread-safe FIFO queue with unique items and thread pool

Question

I have to manage scheduled file replications in a system. The file replications are scheduled by users and I need to restrict the amount of system resources used during replication. The amount of time that each replication may take is not defined (i.e., a replication may be scheduled to run every 15 minutes and the previous run may still be running when the next run is due), and a replication should not be queued if it's already queued or running.

I have a scheduler that periodically checks for due file replications and, for each one, either (1) adds it to a blocking queue if it is neither queued nor running, or (2) drops it otherwise.

private final Object scheduledReplicationsLock = new Object();
private final BlockingQueue<Replication> replicationQueue = new LinkedBlockingQueue<>();
private final Set<Long> queuedReplicationIds = new HashSet<>();
private final Set<Long> runningReplicationIds = new HashSet<>();

public boolean add(Replication replication) {

    synchronized (scheduledReplicationsLock) {
        // If the replication job is either still executing or is already queued, do not add it.
        if (queuedReplicationIds.contains(replication.id) || runningReplicationIds.contains(replication.id)) {
            return false;
        }
        replicationQueue.add(replication);
        queuedReplicationIds.add(replication.id);
        return true;
    }
}

I also have a pool of threads that waits until there is a replication in the queue and executes it. Below is the main method of each thread in the thread pool:

public void run() {
    try {
        while (true) {
            Replication replication = null;
            synchronized (scheduledReplicationsLock) {
                // This will block until a replication job is ready to be run or the current thread is interrupted.
                replication = replicationQueue.take();

                // Move the ID value out of the queued set and into the active set
                Long replicationId = replication.getId();
                queuedReplicationIds.remove(replicationId);
                runningReplicationIds.add(replicationId);
            }
            executeReplication(replication);
        }
    } catch (InterruptedException e) {
        // Restore the interrupt flag and let the worker thread exit.
        Thread.currentThread().interrupt();
    }
}

This code deadlocks because the first thread in the thread pool acquires scheduledReplicationsLock and then blocks in take() while holding it, which prevents the scheduler from adding replications to the queue. Moving replicationQueue.take() out of the synchronized block eliminates the deadlock, but then an element can be removed from the queue without the hash sets being updated atomically with it, which could cause a replication to be incorrectly dropped.

Should I use BlockingQueue.poll(), releasing the lock and sleeping if the queue is empty, instead of BlockingQueue.take()?

Fixes to the current solution or other solutions that meet the requirements are welcome.

Answer 1

wait / notify

Keeping the same control flow, instead of blocking on the BlockingQueue instance while holding the mutex, you can wait for notifications on the lock object. Calling wait() forces the worker thread to release the lock while it waits, so the producer can acquire it.

Here is a reduced version of your producer:

private final List<Replication> replicationQueue = new LinkedList<>();
private final Set<Long> runningReplicationIds = new HashSet<>();

public boolean add(Replication replication) {
    synchronized (replicationQueue) {
        // If the replication job is either still executing or is already queued, do not add it.
        if (replicationQueue.contains(replication) || runningReplicationIds.contains(replication.id)) {
            return false;
        }
        replicationQueue.add(replication);
        // Wake up any workers waiting on the queue's monitor.
        replicationQueue.notifyAll();
        return true;
    }
}

The worker Runnable would then be updated as follows:

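public void run() {
    try {
        while (true) {
            Replication replication;
            synchronized (replicationQueue) {
                // Loop to guard against spurious wakeups; wait() releases the lock while waiting.
                while (replicationQueue.isEmpty()) {
                    replicationQueue.wait();
                }
                // remove(0) takes the head of the list, preserving FIFO order.
                replication = replicationQueue.remove(0);
                runningReplicationIds.add(replication.getId());
            }
            try {
                // Run outside the lock so the producer and the other workers are not blocked.
                executeReplication(replication);
            } finally {
                synchronized (replicationQueue) {
                    runningReplicationIds.remove(replication.getId());
                }
            }
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore the flag and let the worker exit
    }
}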
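
To make the wait / notify idea easier to try in isolation, here is a minimal, self-contained sketch. It is not your actual classes: the class name WaitNotifyReplicationQueue, the methods takeNext() and done(), and the use of plain Long ids instead of Replication objects are all assumptions made for illustration. It also merges the "queued" and "running" state into one set, which is a simplification of the two-set bookkeeping in the question.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical helper, not part of the original code.
public class WaitNotifyReplicationQueue {
    private final Object lock = new Object();
    private final Queue<Long> queued = new ArrayDeque<>();
    // Ids that are either waiting in the queue or currently running.
    private final Set<Long> queuedOrRunning = new HashSet<>();

    /** Returns false if the id is already queued or running. */
    public boolean add(long id) {
        synchronized (lock) {
            if (!queuedOrRunning.add(id)) {
                return false;
            }
            queued.add(id);
            lock.notifyAll(); // wake up waiting workers
            return true;
        }
    }

    /** Blocks until an id is available; the id stays marked as in flight. */
    public long takeNext() throws InterruptedException {
        synchronized (lock) {
            while (queued.isEmpty()) {
                lock.wait(); // releases the lock while waiting
            }
            return queued.remove();
        }
    }

    /** Call when the job has finished so that the id can be scheduled again. */
    public void done(long id) {
        synchronized (lock) {
            queuedOrRunning.remove(id);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        WaitNotifyReplicationQueue q = new WaitNotifyReplicationQueue();
        System.out.println(q.add(1L)); // true: newly queued
        System.out.println(q.add(1L)); // false: already queued
        long id = q.takeNext();
        System.out.println(q.add(1L)); // false: still running
        q.done(id);
        System.out.println(q.add(1L)); // true: may be queued again
    }
}
```

A worker would call takeNext(), run the replication outside the lock, and call done() in a finally block.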

BlockingQueue

Generally, you are better off using the BlockingQueue itself to coordinate your producer and the pool of replicating workers.

The BlockingQueue is, as the name implies, blocking by nature: it blocks the calling thread only when an item cannot be taken from or put into the queue.

Note, however, that you will have to rethink your running / enqueued state management, because you will be synchronizing only on the contents of the BlockingQueue and dropping the other constraints. Whether that is acceptable depends on the context.

This way, you drop all the other mutexes and use the BlockingQueue as your only synchronization state:

private final BlockingQueue<Replication> replicationQueue = new LinkedBlockingQueue<>();

public boolean add(Replication replication) throws InterruptedException {
    // not sure if this is the proper invariant to check, as at some point the replication would be neither queued nor running while still being processed
    // note: contains() followed by put() is not atomic, so this assumes a single scheduler thread
    if (replicationQueue.contains(replication)) {
        return false;
    }
    // use `put` instead of `add`, as it blocks waiting for free space when the queue is bounded
    replicationQueue.put(replication);
    return true;
}

The workers will then take indefinitely from the BlockingQueue :

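public void run() {
    try {
        while (true) {
            Replication replication = replicationQueue.take();
            executeReplication(replication);
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore the flag and let the worker exit
    }
}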
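
If you still need the "neither queued nor running" rule with this approach, one possible sketch is to keep a single concurrent set of in-flight ids next to the queue. This is an assumption on my side, not something taken from your code: the class UniqueReplicationQueue, the method runNext(), and the use of Long ids are hypothetical. It relies on Set.add() on a ConcurrentHashMap.newKeySet() being an atomic test-and-set, so no extra lock is held while take() blocks.

```java
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.LongConsumer;

// Hypothetical helper, not part of the original code.
public class UniqueReplicationQueue {
    private final BlockingQueue<Long> queue = new LinkedBlockingQueue<>();
    // Ids that are queued or running; an id leaves the set only when its job ends.
    private final Set<Long> inFlight = ConcurrentHashMap.newKeySet();

    /** Returns false if the id is already queued or running. */
    public boolean add(long id) {
        if (!inFlight.add(id)) { // atomic: only one caller can win for a given id
            return false;
        }
        queue.add(id); // the queue is unbounded, so add() does not reject
        return true;
    }

    /** Blocks for the next id, runs the job, then releases the id. */
    public void runNext(LongConsumer job) throws InterruptedException {
        long id = queue.take(); // no lock is held while blocking here
        try {
            job.accept(id);
        } finally {
            inFlight.remove(id); // the id can be scheduled again from now on
        }
    }

    public static void main(String[] args) throws InterruptedException {
        UniqueReplicationQueue q = new UniqueReplicationQueue();
        System.out.println(q.add(1L)); // true
        System.out.println(q.add(1L)); // false: already queued
        q.runNext(id -> System.out.println("running " + id));
        System.out.println(q.add(1L)); // true: previous run has finished
    }
}
```

Each worker thread would call runNext() in a loop. A request that arrives just before the id is released is dropped, which matches the rule that a replication is not queued while it is still running.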

Answer 2

You don't need any additional synchronized block if you are using a BlockingQueue.

Quote from the docs ( https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html ):

BlockingQueue implementations are thread-safe. All queuing methods achieve their effects atomically using internal locks or other forms of concurrency control.

Just use something like this:

public void run() {
    try {
        while (true) {
            // The thread waits here for the next element in the queue
            Replication replication = replicationQueue.take();
            Long replicationId = replication.getId();
            queuedReplicationIds.remove(replicationId);
            runningReplicationIds.add(replicationId);
            executeReplication(replication);
        }
    } catch (InterruptedException ex) {
        // Interrupted while waiting for the next element
        Thread.currentThread().interrupt();
    }
}

See the Javadoc: https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/LinkedBlockingQueue.html#take()

Alternatively, you can use BlockingQueue.poll() with a timeout.
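
As a rough illustration of the timed poll() variant, here is a small sketch. The class PollingWorker, its stop() method, and the String items are made up for the example; only BlockingQueue.poll(long, TimeUnit) comes from the JDK. poll() returns null when the timeout elapses, which gives the worker a chance to check a stop flag instead of blocking forever.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical worker, not part of the original code.
public class PollingWorker implements Runnable {
    private final BlockingQueue<String> queue;
    private final List<String> processed = new CopyOnWriteArrayList<>();
    private volatile boolean stopped = false;

    public PollingWorker(BlockingQueue<String> queue) {
        this.queue = queue;
    }

    public void stop() {
        stopped = true;
    }

    public List<String> processed() {
        return processed;
    }

    @Override
    public void run() {
        try {
            while (!stopped) {
                // Wait up to 100 ms for an item; null means the wait timed out.
                String item = queue.poll(100, TimeUnit.MILLISECONDS);
                if (item == null) {
                    continue; // nothing to do yet, re-check the stop flag
                }
                processed.add(item); // stands in for executeReplication(...)
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        PollingWorker worker = new PollingWorker(queue);
        Thread thread = new Thread(worker);
        thread.start();
        queue.put("a");
        queue.put("b");
        while (worker.processed().size() < 2) {
            Thread.sleep(10);
        }
        worker.stop();
        thread.join();
        System.out.println(worker.processed()); // [a, b]
    }
}
```

The timeout value is arbitrary here; a shorter one reacts to stop() faster at the cost of more wakeups.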

UPD: After the discussion, I extended LinkedBlockingQueue with two ConcurrentHashSets and added an afterTake() method to remove processed replications. You do not need any additional synchronization outside the queue. Just put the replication in the first thread, take it in another, and call afterTake() when the replication has finished. You will need to override the other queue methods (offer, poll and so on) if you want to use them.

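package ru.everytag;

import io.vertx.core.impl.ConcurrentHashSet;

import java.util.concurrent.LinkedBlockingQueue;

public class TwoPhaseBlockingQueue<E> extends LinkedBlockingQueue<E> {
    // Elements currently waiting in the queue
    private final ConcurrentHashSet<E> items = new ConcurrentHashSet<>();
    // Elements taken by a worker and not yet finished
    private final ConcurrentHashSet<E> taken = new ConcurrentHashSet<>();

    @Override
    public void put(E e) throws InterruptedException {
        // add() returns false if the element is already present, so the check and the insert happen in one step.
        // Note: put() does not consult `taken`, so an element that is still being processed can be queued again.
        if (items.add(e)) {
            super.put(e);
        }
    }

    @Override
    public E take() throws InterruptedException {
        // Delegate to the superclass; calling take() here would recurse forever
        E item = super.take();

        taken.add(item);
        items.remove(item);

        return item;
    }

    public void afterTake(E e) {
        if (taken.contains(e)) {
            taken.remove(e);
        } else if (items.contains(e)) {
            throw new IllegalArgumentException("Element still in the queue");
        }
    }
}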
