简体   繁体   中英

Java and queues: saturation issues with multithreaded I/O

This question relates to the latest version of Java.

30 producer threads push strings to an abstract queue. One writer thread pops from the same queue and writes the string to a file that resides on a 5400 rpm HDD RAID array. The data is pushed at a rate of roughly 111 MBps, and popped/written at a rate of roughly 80MBps. The program lives for 5600 seconds, enough for about 176 GB of data to accumulate in the queue. On the other hand, I'm restricted to a total of 64GB of main memory.

My question is: What type of queue should I use?

Here's what I've tried so far.

1) ArrayBlockingQueue . The problem with this bounded queue is that, regardless of the initial size of the array, I always end up with liveness issues as soon as it fills up. In fact, a few seconds after the program starts, top reports only a single active thread. Profiling reveals that, on average, the producer threads spend most of their time waiting for the queue to free up. This is regardless of whether or not I use the fair-access policy (with the second argument in the constructor set to true).

2) ConcurrentLinkedQueue . As far as liveness goes, this unbounded queue performs better. Until I run out of memory, about seven hundred seconds in, all thirty producer threads are active. After I cross the 64GB limit, however, things become incredibly slow. I conjecture that this is because of paging issues, though I haven't performed any experiments to prove this.

I foresee two ways out of my situation.

1) Buy an SSD. Hopefully the I/O rate increases will help.

2) Compress the output stream before writing to file.

Is there an alternative? Am I missing something in the way either of the above queues are constructed/used? Is there a cleverer way to use them? The Java Concurrency in Practice book proposes a number of saturation policies (Section 8.3.3) in the case that bounded queues fill up faster than they can be exhausted, but unfortunately none of them---abort, caller runs, and the two discard policies---apply in my scenario.

Look for the bottleneck. You produce more then you consume, a bounded queue makes absolutely sense, since you don't want to run out of memory.

Try to make your consumer faster. Profile and look where the most time is spent. Since you write to a disk here some thoughts:

  • Could you use NIO for your problem? (maybe FileChannel#transferTo() )
  • Flush only when needed.
  • If you have enough CPU reserves, compress the stream? (as you already mentioned)
  • optimize your disks for speed (raid cache, etc.)
  • faster disks

As @Flavio already said, for the producer-consumer pattern, i see no problem there and it should be the way it is now. In the end the slowest party controls the speed.

I can't see the problem here. In a producer-consumer situation, the system will always go with the speed of the slower party. If the producer is faster than the consumer, it will be slowed down to the consumer speed when the queue fills up.

If your constraint is that you can not slow down the producer, you will have to find a way to speed up the consumer. Profile the consumer (don't start too fancy, a few System.nanoTime() calls often give enough information), check where it spends most of its time, and start optimizing from there. If you have a CPU bottleneck you can improve your algorithm, add more threads, etc. If you have a disk bottleneck try writing less (compression is a good idea), get a faster disk, write on two disks instead of one...

According to java " Queue implementation " there are other classes that should be right for you:

  • LinkedBlockingQueue
  • PriorityBlockingQueue
  • DelayQueue
  • SynchronousQueue
  • LinkedTransferQueue
  • TransferQueue

I don't know the performance of these classes or the memory usage but you can try by your self.

I hope that this helps you.

Why do you have 30 producers. Is that number fixed by the problem domain, or is it just a number you picked? If the latter, you should reduce the number of producers until they produce at total rate that is larger than the consumption by only a small amount, and use a blocking queue (as others have suggested). Then you will keep your consumer busy, which is the performance limiting part, while minimizing use of other resources (memory, threads).

you have only 2 ways out: make suppliers slower or consumer faster. Slower producers can be done in many ways, particullary, using bounded queues. To make consumer faster, try https://www.google.ru/search?q=java+memory-mapped+file . Look at https://github.com/peter-lawrey/Java-Chronicle .

Another way is to free writing thread from work of preparing write buffers from strings. Let the producer threads emit ready buffers, not strings. Use limited number of buffers, say, 2*threadnumber=60. Allocate all buffers at the start and then reuse them. Use a queue for empty buffers. Producing thread takes a buffer from that queue, fills it and puts into writing queue. Writing thread takes buffers from writing thread, writes to disk and puts into the empty buffers queue.

Yet another approach is to use asynchronous I/O . Producers initiate writing operation themselves, without special writing thread. Completion handler returns used buffer into tthe empty buffers queue.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM