Generic list concurrent access - clear part of list while data is getting stored

I have a generic List<T> in which live streaming data coming from a web socket is stored. I want to save the data from the list to the database and then clear the list, so that fresh streaming data can keep arriving without filling up my machine's memory.

If I enumerate over the list to send the data to the database, I get an exception, because items are still being added to the list while I enumerate or clear it. If I apply a lock on the list, the streaming data will pause, and that is not acceptable.

Please suggest how I can solve this problem.

Seems like a job for BatchBlock

It is completely thread safe and perfectly suited for data flows. There are a lot of classes in the TPL Dataflow library (System.Threading.Tasks.Dataflow), but the one that suits your situation is BatchBlock.

BatchBlock<T> collects items until the size threshold is met; once it is, the whole batch is emitted as a single result. You can obtain the result in different ways, for example with Receive / ReceiveAsync or TryReceiveAll. Another way is to link the batch output to another block, such as an ActionBlock<T[]>, which asynchronously invokes the supplied action every time it receives input from the source block (the BatchBlock in this case); so, basically, every time a batch fills up it is sent to the ActionBlock. The ActionBlock can take options such as MaxDegreeOfParallelism if you need to avoid concurrent database access, but it never blocks the BatchBlock in any way, so there is no waiting on the producer side: completed batches are simply placed in a thread-safe queue for the ActionBlock to process (see the sketch below).
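For example, here is a minimal sketch of such a pipeline; the variable names and the batch size of 100 are placeholders, not taken from the answer:

using System;
using System.Threading.Tasks.Dataflow;

// collect incoming items into batches of 100
var batchBlock = new BatchBlock<int>(100);

// process one batch at a time, so the database is never written to concurrently
var actionBlock = new ActionBlock<int[]>(
    batch => Console.WriteLine($"Saving {batch.Length} items..."),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1 });

// forward every full batch to the action block and propagate completion
batchBlock.LinkTo(actionBlock, new DataflowLinkOptions { PropagateCompletion = true });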

And do not worry: when a batch gets full, the BatchBlock does not stop receiving new items, so again there is no blocking. A beautiful solution.

One thing to watch out for: if the batch has not reached full size when you stop the application, the buffered items would be lost, so you can call TriggerBatch manually to send whatever items are currently in the batch to the ActionBlock. For example, you can call TriggerBatch in Dispose; that is up to you.

There are also two ways of feeding items into the BatchBlock: Post and SendAsync. Post is synchronous (I believe, although I am not sure about its exact blocking behavior), while SendAsync lets the message be postponed if the BatchBlock is busy.
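As a rough illustration of the two input styles (assuming a batchBlock variable like the one sketched above):

// synchronous: returns false if the block declines the item
bool posted = batchBlock.Post(item);

// asynchronous: the returned task completes once the block has accepted (or declined) the item
bool sent = await batchBlock.SendAsync(item);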

using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class ConcurrentCache<T> : IAsyncDisposable {
    private readonly BatchBlock<T>    _batchBlock;
    private readonly ActionBlock<T[]> _actionBlock;
    private readonly IDisposable      _linkedBlock;

    public ConcurrentCache(int cacheSize) {
        _batchBlock = new BatchBlock<T>(cacheSize);
        // action to run whenever the batch reaches max capacity;
        // the action could also be an async task (e.g. a database write)
        _actionBlock = new ActionBlock<T[]>(ReadBatchBlock);
        _linkedBlock = _batchBlock.LinkTo(_actionBlock);
    }

    public async Task SendAsync(T item) {
        await _batchBlock.SendAsync(item);
    }

    private void ReadBatchBlock(T[] items) {
        foreach (var item in items) {
            Console.WriteLine(item);
        }
    }

    public async ValueTask DisposeAsync() {
        // flush the last, possibly incomplete batch, then complete both blocks
        _batchBlock.TriggerBatch();
        _batchBlock.Complete();
        await _batchBlock.Completion;
        _actionBlock.Complete();
        await _actionBlock.Completion;
        _linkedBlock.Dispose();
    }
}

Usage example:

await using var cache = new ConcurrentCache<int>(5);

for (int i = 0; i < 12; i++) {
    await cache.SendAsync(i);
    await Task.Delay(200);
}

When the object is disposed, the remaining (partial) batch will be triggered and printed.


UPDATE

As @TheodorZoulias pointed out, if the batch does not fill up and no new messages arrive for a long time, the buffered messages would be stuck in the BatchBlock. The solution is to create a timer that periodically calls .TriggerBatch().
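A minimal sketch of that idea (the five-second interval and the variable names are arbitrary, not from the answer):

using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

var batchBlock = new BatchBlock<int>(100);
var actionBlock = new ActionBlock<int[]>(batch => Console.WriteLine($"flushed {batch.Length} items"));
batchBlock.LinkTo(actionBlock);

// flush whatever has accumulated every 5 seconds, even if the batch is not full
using var flushTimer = new Timer(_ => batchBlock.TriggerBatch(), null,
                                 TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(5));

for (int i = 0; i < 3; i++) batchBlock.Post(i);

// after ~5 seconds the timer pushes the partial batch of 3 items to the action block
await Task.Delay(TimeSpan.FromSeconds(6));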

If I apply the lock on the list, streaming data will pause and that's not allowed

You should only hold locks for as short a time as possible. In this case, that means holding the lock only while adding or removing an item from the list. You should not hold the lock while writing the data to the database, or during any other slow operation. Taking an uncontested lock is on the order of 25 ns, so it should only be a problem in very tight loops (see the sketch after this paragraph).
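A minimal sketch of that idea: take the lock just long enough to swap the buffered items out, then do the slow database write outside the lock (ListBuffer, Add and TakeAll are placeholder names, not from the answer):

using System.Collections.Generic;

class ListBuffer<T> {
    private readonly object _gate = new object();
    private List<T> _items = new List<T>();

    // called by the streaming thread; the lock is held only for a single Add
    public void Add(T item) {
        lock (_gate) {
            _items.Add(item);
        }
    }

    // called periodically by the consumer; swaps the list under the lock
    // so the slow database write can happen outside of it
    public List<T> TakeAll() {
        lock (_gate) {
            var snapshot = _items;
            _items = new List<T>();
            return snapshot;
        }
    }
}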

But a better option would be to use the built-in thread-safe collections, like BlockingCollection<T>. The latter is very convenient since it has methods like GetConsumingEnumerable and CompleteAdding. This lets your consumer use a regular foreach loop to consume items, and the producer can just call CompleteAdding to let the loop exit after all items have been processed (a sketch follows below).
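A minimal sketch of that pattern, with a hypothetical SaveToDatabase helper standing in for the slow part:

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

var queue = new BlockingCollection<string>();

// producer: the web-socket callback simply adds items
Task producer = Task.Run(() => {
    for (int i = 0; i < 100; i++) {
        queue.Add($"message {i}");
    }
    queue.CompleteAdding();   // tells the consumer that no more items will arrive
});

// consumer: waits while the queue is empty, exits once CompleteAdding has been called
Task consumer = Task.Run(() => {
    foreach (var item in queue.GetConsumingEnumerable()) {
        SaveToDatabase(item); // hypothetical slow operation, done outside any lock
    }
});

await Task.WhenAll(producer, consumer);

static void SaveToDatabase(string item) => Console.WriteLine($"saved {item}");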

You might also want to take a look at DataFlow. I have not used it myself, but it seems suitable for setting up concurrent processing pipelines.

However, before attempting any kind of concurrent processing you need to be fairly familiar with thread safety and the dangers involved. Thread safety is difficult, and you need to know what is and is not safe to do. You will not always be lucky enough to get an exception when you mess up; you might just end up with missing or incorrect data.

I think you should try Parallel.ForEach along with ConcurrentDictionary

var streamingDataList = new ConcurrentDictionary<int, StreamingDataModel>();
Parallel.ForEach(streamingDataBatch, streamingData =>
{
    streamingDataList.TryAdd(streamingData.Id, streamingData.Data);
});
