I am looking for something seemingly simple, a collection with a non-blocking version of "add" and "drain". Something like this:
List itemsToProcess = queue.addOrDrainAndAdd( item );
if ( itemsToProcess != null )
    process( itemsToProcess );
It seems to me that if I do these as separate "offer" and "drainTo" calls, "offer" could be called twice before I reach the first call to "drainTo". I would also need a loop such as "while ( !queue.offer( item ) )" so that the offer succeeds once the queue has been drained. I think that would also require me to check whether the drain returned an empty collection, because two threads might call drain at the same time. My naive implementation looks like this, but it doesn't seem optimal:
void addBatchItem( T item ) {
    while ( !batch.offer( item ) ) {
        List<T> batched = new ArrayList<>( batchSize );
        batch.drainTo( batched );
        process( batched );
    }
}
Then I thought maybe there is a better way and I just don't know it. Thanks!
EDIT:
Okay, so here's a solution (it is blocking, since it is based on ArrayBlockingQueue):
public void add( T batchItem ) {
    while ( !batch.offer( batchItem ) ) {
        flush();
    }
}
public void flush() {
    List<T> batched = new ArrayList<>( batchSize );
    batch.drainTo( batched, batchSize );
    if ( !batched.isEmpty() )
        executor.execute( new PhasedRunnable( batched ) );
}
I guess my question is: would the above be more efficient for this purpose than a solution based on ConcurrentLinkedQueue, given that the latter requires an object allocation for each node?
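For comparison, a ConcurrentLinkedQueue-based variant might look roughly like the sketch below. The class name LinkedBatch and the counter-based trigger are assumptions made up for illustration; they are not part of the JDK or of the code above. It avoids the lock that ArrayBlockingQueue takes internally, but every offer allocates a new node, and because the queue is unbounded a separate counter is needed to decide when a batch is due.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical comparison class; names are illustrative only.
class LinkedBatch<T> {
    private final int batchSize;
    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
    // Counts every item ever added; used only to decide when a batch is due.
    private final AtomicInteger added = new AtomicInteger();

    LinkedBatch( int batchSize ) {
        this.batchSize = batchSize;
    }

    // Returns a batch when this call completes a group of batchSize adds, otherwise null.
    List<T> add( T item ) {
        queue.offer( item ); // lock-free, but allocates one node per item
        if ( added.incrementAndGet() % batchSize != 0 ) {
            return null;
        }
        return drain();
    }

    // Polls up to batchSize items. Under concurrency this may return fewer
    // items if another thread drains at the same time.
    List<T> drain() {
        List<T> batched = new ArrayList<>( batchSize );
        T next;
        while ( batched.size() < batchSize && ( next = queue.poll() ) != null ) {
            batched.add( next );
        }
        return batched;
    }
}
```

Whether this is actually faster than the ArrayBlockingQueue version depends on contention and allocation cost, so it would need to be measured rather than assumed.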
EXAMPLE CLASS WITH USAGE:
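import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Phaser;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public abstract class Batcher<T> {
    private final int batchSize;
    private final ArrayBlockingQueue<T> batch;
    private final ExecutorService executor;
    // One party is registered up front for the caller of awaitDone().
    private final Phaser phaser = new Phaser( 1 );

    public Batcher( int batchSize, ExecutorService executor ) {
        this.batchSize = batchSize;
        this.executor = executor;
        this.batch = new ArrayBlockingQueue<>( batchSize );
    }

    // Retries the offer until it succeeds, flushing whenever the queue is full.
    public void add( T batchItem ) {
        while ( !batch.offer( batchItem ) ) {
            flush();
        }
    }

    // Drains up to batchSize items and hands them to the executor if any were drained.
    public void flush() {
        List<T> batched = new ArrayList<>( batchSize );
        batch.drainTo( batched, batchSize );
        if ( !batched.isEmpty() )
            executor.execute( new PhasedRunnable( batched ) );
    }

    public abstract void onFlush( List<T> batch );

    public void awaitDone() {
        phaser.arriveAndAwaitAdvance();
    }

    public void awaitDone( long duration, TimeUnit unit ) throws TimeoutException {
        try {
            phaser.awaitAdvanceInterruptibly( phaser.arrive(), duration, unit );
        }
        catch ( InterruptedException e ) {
            // Restore the interrupt flag for the caller.
            Thread.currentThread().interrupt();
        }
    }

    private class PhasedRunnable implements Runnable {
        private final List<T> batch;

        private PhasedRunnable( List<T> batch ) {
            this.batch = batch;
            // Register on construction so awaitDone() waits for this task.
            phaser.register();
        }

        @Override
        public void run() {
            try {
                onFlush( batch );
            }
            finally {
                phaser.arrive();
            }
        }
    }
}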
This is a simple example; a more complete example might be JPA entity updates or inserts. Also, I would like it to be possible to call #add concurrently.
@Test
public void testOddNumber() {
    Batcher<Integer> batcher = new Batcher<Integer>( 10, executor ) {
        @Override
        public void onFlush( List<Integer> batch ) {
            count.addAndGet( batch.size() );
        }
    };
    for ( int at = 0; at != 21; ++at ) {
        batcher.add( at );
    }
    batcher.flush();
    batcher.awaitDone();
    // JUnit's assertEquals takes the expected value first.
    assertEquals( 21, count.get() );
}
seemingly simple, a collection with a non-blocking but atomic version of "add" and "drain"
That is actually impossible. Non-blocking algorithms (on architectures with a single-word CAS) get their atomicity from operating on a single memory address, so draining an entire queue both atomically and without blocking is not possible.
Based on your edit, I would think that is probably the most efficient way to achieve what you are looking for.
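If blocking is acceptable, the "addOrDrainAndAdd" operation from the question can be made atomic by guarding both steps with one lock. The following is only a minimal sketch under that assumption; the class name LockedBatch and the drain method are made up for illustration and are not part of java.util.concurrent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; a sketch, not a library class.
class LockedBatch<T> {
    private final int batchSize;
    private List<T> items;

    LockedBatch( int batchSize ) {
        this.batchSize = batchSize;
        this.items = new ArrayList<>( batchSize );
    }

    // Adds the item. If the current batch was already full, the full batch is
    // returned and the item becomes the first element of a fresh batch.
    // Returns null when nothing needs processing yet.
    synchronized List<T> addOrDrainAndAdd( T item ) {
        List<T> full = null;
        if ( items.size() >= batchSize ) {
            full = items;
            items = new ArrayList<>( batchSize );
        }
        items.add( item );
        return full;
    }

    // Hands back whatever is pending (possibly an empty list) and starts a new batch.
    synchronized List<T> drain() {
        List<T> drained = items;
        items = new ArrayList<>( batchSize );
        return drained;
    }
}
```

Because the check, the swap, and the add all happen under the same monitor, two callers can never both receive the same batch, and a caller never receives an empty one from addOrDrainAndAdd. The cost is that concurrent callers serialize on the lock.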