
C# multi-threading: wait for all tasks to complete when new tasks are constantly being added

I have a situation where new tasks are constantly being generated and added to a ConcurrentBag<Task>.

I need to wait for all tasks to complete.

Waiting for all the tasks in the ConcurrentBag via WaitAll is not enough, because the number of tasks may have grown by the time the previous wait completes.

At the moment I am waiting in the following way:

private void WaitAllTasks()
{
    while (true)
    {
        int countAtStart = _tasks.Count();
        Task.WaitAll(_tasks.ToArray());

        int countAtEnd = _tasks.Count();
        if (countAtStart == countAtEnd)
        {
            break;
        }

        #if DEBUG
        if (_tasks.Count() > 100)
        {
            tokenSource.Cancel();
            break;
        }
        #endif
    }
}

I am not very happy with the while(true) solution.

Can anyone suggest a better, more efficient way to do this (without having to poll the processor constantly with a while(true))?


Additional context information, as requested in the comments. I don't think it is relevant to the question, though.

This piece of code is used in a web crawler. The crawler scans page content and looks for two types of information: Data Pages and Link Pages. Data Pages are scanned and their data is collected; Link Pages are scanned and more links are collected from them.

As each task carries on its activities and finds more links, it adds the links to an EventList. The list has an OnAdd event (code below) that is used to trigger other tasks to scan the newly added URLs. And so forth.

The job is complete when there are no more running tasks (so no more links will be added) and all items have been processed.

public IEventList<ISearchStatus> CurrentLinks { get; private set; }
public IEventList<IDataStatus> CurrentData { get; private set; }
public IEventList<System.Dynamic.ExpandoObject> ResultData { get; set; }
private readonly ConcurrentBag<Task> _tasks = new ConcurrentBag<Task>();

private readonly CancellationTokenSource tokenSource = new CancellationTokenSource();
private readonly CancellationToken token;

public void Search(ISearchDefinition search)
{
    CurrentLinks.OnAdd += UrlAdded;
    CurrentData.OnAdd += DataUrlAdded;

    var status = new SearchStatus(search);

    CurrentLinks.Add(status);

    WaitAllTasks();

    _exporter.Export(ResultData as IList<System.Dynamic.ExpandoObject>);
}

private void DataUrlAdded(object o, EventArgs e)
{
    var item = o as IDataStatus;
    if (item == null)
    {
        return;
    }

    _tasks.Add(Task.Factory.StartNew(() => ProcessObjectSearch(item), token));
}

private void UrlAdded(object o, EventArgs e)
{
    var item = o as ISearchStatus;
    if (item==null)
    {
        return;
    }

    _tasks.Add(Task.Factory.StartNew(() => ProcessFollow(item), token));
    _tasks.Add(Task.Factory.StartNew(() => ProcessData(item), token));
}

public class EventList<T> : List<T>, IEventList<T>
{
    public EventHandler OnAdd { get; set; }
    private readonly object locker = new object();
    public new void Add(T item)
    {
        //lock (locker)
        {
            base.Add(item);
        }
        OnAdd?.Invoke(item, null);
    }

    public new bool Contains(T item)
    {
        //lock (locker) 
        {
            return base.Contains(item);
        }
    }
}

Why not write one function that yields your tasks as necessary, when they are created? This way you can just use Task.WhenAll to wait for them to complete. Or have I missed the point? See this working here.

using System;
using System.Threading.Tasks;
using System.Collections.Generic;

public class Program
{
    public static void Main()
    {
        try
        {
            Task.WhenAll(GetLazilyGeneratedSequenceOfTasks()).Wait();   
            Console.WriteLine("Finished.");
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex);  
        }   
    }

    public static IEnumerable<Task> GetLazilyGeneratedSequenceOfTasks()
    {
        var random =  new Random();
        var finished = false;
        while (!finished)
        {
            var n = random.Next(1, 2001);
            if (n < 50)
            {
                finished = true;
            }

            if (n > 499)
            {
                yield return Task.Delay(n);
            }

            Task.Delay(20).Wait();              
        }

        yield break;
    }
}

Alternatively, if your question is not as trivial as my answer may suggest, I'd consider a mesh with TPL Dataflow. The combination of a BufferBlock and an ActionBlock would get you very close to what you need. You could start here.
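As a rough illustration of that combination, here is a minimal sketch, assuming the work items are plain strings and that "processing" just records the item. The names BufferActionSketch, RunAsync, and the sample input are hypothetical and not from the question. On .NET Framework, TPL Dataflow comes from the System.Threading.Tasks.Dataflow NuGet package.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BufferActionSketch
{
    // Posts every item into a BufferBlock, processes them in an ActionBlock,
    // and returns the processed items once the pipeline has drained.
    public static async Task<string[]> RunAsync(string[] urls)
    {
        var processed = new ConcurrentQueue<string>();

        var buffer = new BufferBlock<string>();
        var worker = new ActionBlock<string>(
            url => processed.Enqueue(url), // stand-in for real page processing (assumption)
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

        // PropagateCompletion forwards buffer.Complete() to the worker once the buffer is empty.
        buffer.LinkTo(worker, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var url in urls)
        {
            buffer.Post(url);
        }

        buffer.Complete();       // no more input will be accepted
        await worker.Completion; // finishes after every posted item has been processed

        return processed.ToArray();
    }

    public static async Task Main()
    {
        var done = await RunAsync(new[] { "a", "b", "c" });
        Console.WriteLine($"Processed {done.Length} items.");
    }
}
```

Waiting on worker.Completion replaces the while(true) loop: the wait is a single await rather than repeated polling.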


Either way, I'd suggest you include a provision for accepting a CancellationToken or two.
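For the lazily generated sequence, one possible shape for that provision is sketched below. It assumes the generator simply stops yielding once cancellation is requested and passes the same token to each task. CancellableSequenceSketch, GetTasks, and the delay values are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class CancellableSequenceSketch
{
    // Yields up to maxTasks delay tasks, stopping early once cancellation is requested.
    public static IEnumerable<Task> GetTasks(int maxTasks, CancellationToken token)
    {
        for (var i = 0; i < maxTasks; i++)
        {
            if (token.IsCancellationRequested)
            {
                yield break; // stop producing new work
            }

            // Passing the token lets tasks that are already running be cancelled too.
            yield return Task.Delay(10, token);
        }
    }

    public static async Task Main()
    {
        using (var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5)))
        {
            try
            {
                await Task.WhenAll(GetTasks(20, cts.Token));
                Console.WriteLine("Finished.");
            }
            catch (OperationCanceledException)
            {
                Console.WriteLine("Cancelled.");
            }
        }
    }
}
```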

I think this can be done with the TPL Dataflow library and a very basic setup. You'll need a BufferBlock<ParseTask>, a TransformManyBlock<ParseTask, DataTask> (whose delegate returns an IEnumerable<DataTask>), and an ActionBlock (maybe more than one) for the actual data processing, like this:

// queue of new URLs to parse
var buffer = new BufferBlock<ParseTask>();

// parser itself, returns many data tasks from one url
// similar to LINQ.SelectMany method
var transform = new TransformManyBlock<ParseTask, DataTask>(task =>
{
    // get all the additional urls to parse
    var parsedLinks = GetLinkTasks(task);
    // get all the data to parse
    var parsedData = GetDataTasks(task);

    // setup additional links to be parsed
    foreach (var parsedLink in parsedLinks)
    {
        buffer.Post(parsedLink);
    }

    // return all the data to be processed
    return parsedData;
});

// actual data processing
var consumer = new ActionBlock<DataTask>(s => ProcessData(s));

After that you need to link the blocks to each other:

buffer.LinkTo(transform, new DataflowLinkOptions { PropagateCompletion = true });
transform.LinkTo(consumer, new DataflowLinkOptions { PropagateCompletion = true });

Now you have a nice pipeline that executes in the background. Once you know that everything you need has been parsed, you simply call the Complete method on a block so that it stops accepting new messages. After the buffer becomes empty, it propagates completion down the pipeline to the transform block, which propagates it to the consumer(s), and you then wait for the Completion task:

// no additional links would be accepted
buffer.Complete();
// after all the tasks are done, this will get fired
await consumer.Completion;

You can detect the moment of completion by checking, for example, whether buffer's Count property, transform's InputCount, and transform's CurrentDegreeOfParallelism (an internal property of TransformManyBlock) are all equal to 0.
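One way to detect that moment without reading block internals is to keep your own count of items that have been posted but not yet fully processed. The sketch below is only an illustration under simplifying assumptions: a work item is an int depth, every page above maxDepth "links" to exactly two more pages, and a single ActionBlock stands in for the transform and consumer blocks. PendingCounterSketch and CrawlAsync are hypothetical names.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class PendingCounterSketch
{
    // Crawls a hypothetical binary "link tree" and returns the number of pages processed.
    public static async Task<int> CrawlAsync(int maxDepth)
    {
        var pending = 1;        // items posted but not yet fully processed (the root counts as one)
        var processedCount = 0;

        var buffer = new BufferBlock<int>();
        var worker = new ActionBlock<int>(depth =>
        {
            Interlocked.Increment(ref processedCount);

            if (depth < maxDepth)
            {
                // Count the new items before posting them, so the counter
                // can never reach zero while work is still queued.
                Interlocked.Add(ref pending, 2);
                buffer.Post(depth + 1);
                buffer.Post(depth + 1);
            }

            // This item is done; if nothing else is pending, close the pipeline.
            if (Interlocked.Decrement(ref pending) == 0)
            {
                buffer.Complete();
            }
        },
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

        buffer.LinkTo(worker, new DataflowLinkOptions { PropagateCompletion = true });

        buffer.Post(0);          // root page at depth 0
        await worker.Completion; // completes once the counter has reached zero

        return processedCount;
    }

    public static async Task Main()
    {
        var count = await CrawlAsync(3);
        Console.WriteLine($"Processed {count} pages.");
    }
}
```

The important ordering is that each item registers its children before it decrements the counter for itself. If the decrement came first, the counter could briefly hit zero and complete the buffer while new links were still about to be posted.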

However, I suggest you implement some additional logic here to determine the current number of running transformers, as relying on internal state isn't a great solution. As for cancelling the pipeline, you can create each TPL block with a CancellationToken, either one shared by all blocks or a dedicated one per block, which gives you cancellation out of the box.
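A minimal sketch of the cancellation part, assuming a single ActionBlock and a short artificial delay per item. CancellablePipelineSketch, RunAsync, and the item counts are made up for illustration. The token is supplied through the block options, and a cancelled block surfaces as an OperationCanceledException when its Completion task is awaited.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class CancellablePipelineSketch
{
    // Runs a one-block pipeline bound to the token; returns true if it ended cancelled.
    public static async Task<bool> RunAsync(int itemCount, CancellationToken token)
    {
        // The same options object (and token) could be shared by every block in a pipeline.
        var options = new ExecutionDataflowBlockOptions { CancellationToken = token };
        var consumer = new ActionBlock<int>(async item => await Task.Delay(10), options);

        for (var i = 0; i < itemCount; i++)
        {
            consumer.Post(i); // returns false once the block has been cancelled
        }
        consumer.Complete();

        try
        {
            await consumer.Completion;
            return false;
        }
        catch (OperationCanceledException)
        {
            return true;
        }
    }

    public static async Task Main()
    {
        using (var cts = new CancellationTokenSource())
        {
            cts.CancelAfter(100); // cancel while items are still queued
            var cancelled = await RunAsync(1000, cts.Token);
            Console.WriteLine(cancelled ? "Cancelled." : "Finished.");
        }
    }
}
```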


 