
Non-blocking buffer in Java

In a high-volume, multi-threaded Java project I need to implement a non-blocking buffer.

In my scenario I have a web layer that receives ~20,000 requests per second. I need to accumulate some of those requests in a data structure (the desired buffer), and when it is full (let's assume it is full when it contains 1000 objects) those objects should be serialized to a file that will be sent to another server for further processing.

The implementation should be non-blocking. I examined ConcurrentLinkedQueue, but I'm not sure it fits the job.

I think I need to use two queues in such a way that once the first one fills up, it is replaced by a new one, and the full queue ("the first") is handed off for further processing. This is the basic idea I have at the moment, but I still don't know whether it is feasible, since I'm not sure I can switch pointers in Java (in order to swap out the full queue).
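
To make the "switch" part concrete, something like the sketch below is what I have in mind: keep the current queue behind an AtomicReference and swap it for a fresh one when it is full (class and method names are just for illustration). I realize a concurrent add might still land in the queue that was just swapped out, which is part of why I'm not sure this is feasible.

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicReference;

public class SwappableBuffer {
    // the "current" buffer; swapped out atomically when it is considered full
    private final AtomicReference<Queue<Object>> current =
            new AtomicReference<>(new ConcurrentLinkedQueue<>());

    public void add(Object request) {
        current.get().add(request);
    }

    // replace the full queue with a fresh one and return the old one for processing
    public Queue<Object> swap() {
        return current.getAndSet(new ConcurrentLinkedQueue<>());
    }
}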

Any advice?

Thanks

What I usualy do with requirements like this is create a pool of buffers at app startup and store the references in a BlockingQueue. 我通常对这样的要求做的是在app启动时创建一个缓冲池并将引用存储在BlockingQueue中。 The producer thread pops buffers, fills them and then pushes the refs to another queue upon which the consumers are waiting. 生产者线程弹出缓冲区,填充它们然后将refs推送到消费者正在等待的另一个队列。 When consumer/s are done, (data written to fine, in your case), the refs get pushed back onto the pool queue for re-use. 当消费者完成时(在你的情况下写入数据很好),refs会被推回到池队列中以便重新使用。 This provides lots of buffer storage, no need for expensive bulk copying inside locks, eliminates GC actions, provides flow-control, (if the pool empties, the producer is forced to wait until some buffers are returned), and prevents memory-runaway, all in one design. 这提供了大量的缓冲存储,不需要在锁内部进行昂贵的批量复制,消除GC操作,提供流控制(如果池清空,生产者被迫等待直到返回一些缓冲区),并防止内存失控,一体化设计。
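
A minimal sketch of that flow, assuming simple List buffers and two ArrayBlockingQueues (all names, sizes and the request/file stubs here are only examples, not a drop-in implementation):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BufferPoolDemo {
    static final int POOL_SIZE = 16;         // example: number of pooled buffers
    static final int BUFFER_CAPACITY = 1000;

    // pool of empty buffers, created once at startup
    static final BlockingQueue<List<Object>> pool = new ArrayBlockingQueue<>(POOL_SIZE);
    // full buffers waiting to be written out by the consumer thread(s)
    static final BlockingQueue<List<Object>> fullBuffers = new ArrayBlockingQueue<>(POOL_SIZE);

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < POOL_SIZE; i++) {
            pool.put(new ArrayList<>(BUFFER_CAPACITY));
        }

        // producer: take an empty buffer, fill it, hand it to the consumers
        Thread producer = new Thread(() -> {
            try {
                while (true) {
                    List<Object> buffer = pool.take();   // blocks if pool is empty -> flow control
                    for (int i = 0; i < BUFFER_CAPACITY; i++) {
                        buffer.add(nextRequest());       // hypothetical source of requests
                    }
                    fullBuffers.put(buffer);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // consumer: write the buffer out, then return it to the pool for re-use
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    List<Object> buffer = fullBuffers.take();
                    writeToFile(buffer);                 // hypothetical serialization step
                    buffer.clear();
                    pool.put(buffer);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
    }

    static Object nextRequest() { return new Object(); }
    static void writeToFile(List<Object> buffer) { /* serialize and send elsewhere */ }
}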

More: I've used such designs for many years in various other languages too (C++, Delphi), and they work well. I have an 'ObjectPool' class that contains the BlockingQueue and a 'PooledObject' class to derive the buffers from. PooledObject has an internal private reference to its pool (it gets initialized on pool creation), which allows a parameterless release() method. This means that, in complex designs with more than one pool, a buffer always gets released to the correct pool, reducing cockup-potential.
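
A rough outline of what those two classes can look like (this is a sketch of the idea, not an existing library class):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ObjectPool {
    private final BlockingQueue<PooledObject> queue = new LinkedBlockingQueue<>();

    void add(PooledObject obj) {
        obj.setPool(this);          // remember which pool the object belongs to
        queue.add(obj);
    }

    PooledObject acquire() throws InterruptedException {
        return queue.take();        // blocks when the pool is empty (flow control)
    }

    void release(PooledObject obj) {
        queue.add(obj);
    }
}

abstract class PooledObject {
    private ObjectPool pool;        // private reference back to the owning pool

    void setPool(ObjectPool pool) { this.pool = pool; }

    // parameterless release: the buffer always goes back to the correct pool
    public void release() {
        pool.release(this);
    }
}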

Most of my apps have a GUI, so I usually dump the pool level to a status bar on a timer, say every second. I can then see roughly how much load there is, whether any buffers are leaking (the number consistently goes down and the app eventually deadlocks on an empty pool), or whether I'm double-releasing (the number consistently goes up and the app eventually crashes).

It's also fairly easy to change the number of buffers at runtime, either by creating more and pushing them into the pool, or by waiting on the pool, removing buffers and letting GC destroy them.

I think you have a very good point with your solution. You would need two queues: the processingQueue would be the buffer size you want (1000 in your example), while the waitingQueue would be a lot bigger. Every time the processingQueue is full, it would write its contents to the specified file and then grab the next 1000 elements from the waitingQueue (or fewer, if the waiting queue has fewer than 1000).
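
A sketch of that arrangement, with a single worker thread draining batches from the waiting queue (names, sizes and the drainTo usage are only one way to do it):

import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class TwoQueueBuffer {
    private static final int BATCH_SIZE = 1000;

    private final Queue<Object> processingQueue = new ArrayDeque<>(BATCH_SIZE);
    private final BlockingQueue<Object> waitingQueue = new ArrayBlockingQueue<>(100_000);

    // called by the web layer for every incoming request
    void offer(Object request) throws InterruptedException {
        waitingQueue.put(request);
    }

    // called in a loop by the single worker thread that writes batches to a file
    void drainOnce() throws InterruptedException {
        // move up to BATCH_SIZE elements (or fewer) from waiting to processing
        processingQueue.add(waitingQueue.take());              // wait for at least one element
        waitingQueue.drainTo(processingQueue, BATCH_SIZE - 1);

        writeBatchToFile(processingQueue);                      // hypothetical serialization step
        processingQueue.clear();
    }

    private void writeBatchToFile(Queue<Object> batch) { /* serialize and send */ }
}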

My only concern is that you mention 20,000 per second and a buffer of 1000. I know the 1000 was an example, but if you don't make it bigger you might just be moving the problem to the waitingQueue rather than solving it: the waitingQueue will receive 1000 new elements faster than the processingQueue can process them, giving you a buffer overflow in the waitingQueue.

I might be getting something wrong, but you could use an ArrayList for this, since you don't need to poll individual elements from your queue. You just flush (create a copy and clear) the list in a synchronized section when its size reaches the limit and you need to send it. Adding to the list should also be synchronized with this flush operation.

Swapping your arrays might not be safe: if your sending is slower than your generation, buffers may soon start overwriting each other. And allocating arrays for 20,000 elements per second is almost nothing for the GC.

Object lock = new Object();

List<Object> list = ...;

// producer side: 'request' is the incoming request object
synchronized (lock) {
    list.add(request);
}

...

// This outer check is a quick, dirty performance shortcut;
// it is not valid outside the sync block.
// It costs well under a nanosecond and filters out 99.9% of the
// `synchronized(lock)` sections.
if (list.size() > 1000) {
    synchronized (lock) {          // this should take less than a microsecond
        if (list.size() > 1000) {  // this check is the valid one
            // make sure this is async (i.e. saved in a separate thread) or < 1 ms;
            // the new array allocation should be the slowest part here
            sendAsyncInASeparateThread(new ArrayList<>(list));
            list.clear();
        }
    }
}

UPDATE

Considering that sending is async, the slowest part here is new ArrayList(list), which should take around 1 microsecond for 1000 elements, i.e. roughly 20 microseconds of copying per second at your rate. I didn't measure that; I extrapolated from the proportion that about 1 million elements are allocated in ~1 ms.

If you still require a super-fast synchronized queue, you might want to have a look at MentaQueue.

Instead of putting each request object in a queue, allocate an array of size 1000, and when it is filled, put that array on a queue to the sender thread, which serializes and sends the whole array. Then allocate another array.

How are you going to handle the situation when the sender cannot work fast enough and its queue overflows? To avoid an out-of-memory error, use a queue of limited size.
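
A sketch of that approach, using a bounded ArrayBlockingQueue as the hand-off to the sender thread (sizes and the drop-on-overflow policy are only examples):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BatchSender {
    private static final int BATCH_SIZE = 1000;
    private static final int QUEUE_CAPACITY = 50;   // limits memory if the sender falls behind

    private final BlockingQueue<Object[]> sendQueue = new ArrayBlockingQueue<>(QUEUE_CAPACITY);
    private Object[] current = new Object[BATCH_SIZE];
    private int count = 0;

    // called from the producing thread for every request
    synchronized void add(Object request) {
        current[count++] = request;
        if (count == BATCH_SIZE) {
            // offer() does not block: if the sender cannot keep up,
            // the batch is dropped here instead of exhausting memory
            if (!sendQueue.offer(current)) {
                // handle overflow: log it, count it, apply back-pressure, etc.
            }
            current = new Object[BATCH_SIZE];
            count = 0;
        }
    }

    // sender thread loop: serialize and send whole arrays
    void senderLoop() throws InterruptedException {
        while (true) {
            Object[] batch = sendQueue.take();
            serializeAndSend(batch);                 // hypothetical serialization step
        }
    }

    private void serializeAndSend(Object[] batch) { /* write to file, ship to the other server */ }
}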

What do you mean by "switch pointers"? There are no pointers in Java (unless you're talking about references).

Anyway, as you probably saw from the Javadoc, ConcurrentLinkedQueue has a "problem" with its size() method: it is not a constant-time operation, because it has to traverse the queue. Still, you could use your original idea of two (or more) buffers that get switched. There are probably going to be some bottlenecks with the disk I/O anyway, so maybe the non-constant time of size() won't be a problem here either.
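
If size() does become an issue, one workaround (just a sketch, not something ConcurrentLinkedQueue provides itself) is to track the count yourself in an AtomicInteger:

import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

class CountedQueue<T> {
    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
    private final AtomicInteger size = new AtomicInteger();

    void add(T item) {
        queue.add(item);
        size.incrementAndGet();
    }

    T poll() {
        T item = queue.poll();
        if (item != null) {
            size.decrementAndGet();
        }
        return item;
    }

    int approximateSize() {
        return size.get();   // approximate under concurrency, but O(1)
    }
}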

Of course, if you want it to be non-blocking, you'd better have a lot of memory and a fast disk (and large/bigger buffers).
