
C# multi-threading: wait for all tasks to complete when new tasks are constantly being added

I have a situation where new tasks are constantly being generated and added to a ConcurrentBag<Task>.

I need to wait for all tasks to complete.

Waiting for all the tasks in the ConcurrentBag via WaitAll is not enough, because the number of tasks may have grown by the time the previous wait completes.

At the moment I am waiting for them in the following way:

private void WaitAllTasks()
{
    while (true)
    {
        int countAtStart = _tasks.Count();
        Task.WaitAll(_tasks.ToArray());

        int countAtEnd = _tasks.Count();
        if (countAtStart == countAtEnd)
        {
            break;
        }

        #if DEBUG
        if (_tasks.Count() > 100)
        {
            tokenSource.Cancel();
            break;
        }
        #endif
    }
}

I am not very happy with the while(true) solution.

Can anyone suggest a better, more efficient way to do this (without having to poll the processor constantly with a while(true))?


Additional context, as requested in the comments. I don't think it is relevant to the question, though.

This piece of code is used in a web crawler. The crawler scans page content and looks for two types of pages: Data Pages and Link Pages. Data Pages are scanned and data is collected from them; Link Pages are scanned and more links are collected from them.

As each task carries on its work and finds more links, it adds the links to an EventList. The list has an OnAdd event (code below) that is used to trigger other tasks to scan the newly added URLs, and so forth.

The job is complete when there are no more running tasks (so no more links will be added) and all items have been processed.

public IEventList<ISearchStatus> CurrentLinks { get; private set; }
public IEventList<IDataStatus> CurrentData { get; private set; }
public IEventList<System.Dynamic.ExpandoObject> ResultData { get; set; }
private readonly ConcurrentBag<Task> _tasks = new ConcurrentBag<Task>();

private readonly CancellationTokenSource tokenSource = new CancellationTokenSource();
private readonly CancellationToken token;

public void Search(ISearchDefinition search)
{
    CurrentLinks.OnAdd += UrlAdded;
    CurrentData.OnAdd += DataUrlAdded;

    var status = new SearchStatus(search);

    CurrentLinks.Add(status);

    WaitAllTasks();

    _exporter.Export(ResultData as IList<System.Dynamic.ExpandoObject>);
}

private void DataUrlAdded(object o, EventArgs e)
{
    var item = o as IDataStatus;
    if (item == null)
    {
        return;
    }

    _tasks.Add(Task.Factory.StartNew(() => ProcessObjectSearch(item), token));
}

private void UrlAdded(object o, EventArgs e)
{
    var item = o as ISearchStatus;
    if (item==null)
    {
        return;
    }

    _tasks.Add(Task.Factory.StartNew(() => ProcessFollow(item), token));
    _tasks.Add(Task.Factory.StartNew(() => ProcessData(item), token));
}

public class EventList<T> : List<T>, IEventList<T>
{
    public EventHandler OnAdd { get; set; }
    private readonly object locker = new object();
    public new void Add(T item)
    {
        //lock (locker)
        {
            base.Add(item);
        }
        OnAdd?.Invoke(item, null);
    }

    public new bool Contains(T item)
    {
        //lock (locker) 
        {
            return base.Contains(item);
        }
    }
}

Why not write one function that yields your tasks as necessary, when they are created? This way you can just use Task.WhenAll to wait for them to complete. Or have I missed the point? See this working here.

using System;
using System.Threading.Tasks;
using System.Collections.Generic;

public class Program
{
    public static void Main()
    {
        try
        {
            Task.WhenAll(GetLazilyGeneratedSequenceOfTasks()).Wait();   
            Console.WriteLine("Finished.");
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex);  
        }   
    }

    public static IEnumerable<Task> GetLazilyGeneratedSequenceOfTasks()
    {
        var random =  new Random();
        var finished = false;
        while (!finished)
        {
            var n = random.Next(1, 2001);
            if (n < 50)
            {
                finished = true;
            }

            if (n > 499)
            {
                yield return Task.Delay(n);
            }

            Task.Delay(20).Wait();              
        }

        yield break;
    }
}

Alternatively, if your question is not as trivial as my answer may suggest, I'd consider a mesh built with TPL Dataflow. The combination of a BufferBlock and an ActionBlock would get you very close to what you need. You could start here.


Either way, I'd suggest you want to include a provision for accepting a CancellationToken or two.
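To make the BufferBlock plus ActionBlock suggestion a little more concrete, here is a minimal sketch rather than a definitive implementation. It assumes the System.Threading.Tasks.Dataflow package is available; the class name BufferActionSketch, the method RunAsync, the int payload, and the Task.Delay standing in for real work are all hypothetical. It also shows one way to pass a CancellationToken to both blocks.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BufferActionSketch
{
    // Posts itemCount dummy items through a buffer into a worker block
    // and returns how many were processed.
    public static async Task<int> RunAsync(int itemCount, CancellationToken token)
    {
        int processed = 0;

        // Queue of work items; honours the caller's token.
        var buffer = new BufferBlock<int>(
            new DataflowBlockOptions { CancellationToken = token });

        // Worker that handles up to four items at a time.
        var worker = new ActionBlock<int>(
            async item =>
            {
                await Task.Delay(10, token);          // stand-in for real work
                Interlocked.Increment(ref processed);
            },
            new ExecutionDataflowBlockOptions
            {
                CancellationToken = token,
                MaxDegreeOfParallelism = 4
            });

        // Completion of the buffer flows on to the worker.
        buffer.LinkTo(worker, new DataflowLinkOptions { PropagateCompletion = true });

        for (int i = 0; i < itemCount; i++)
        {
            buffer.Post(i);
        }

        buffer.Complete();          // no more items will be accepted
        await worker.Completion;    // finishes once every queued item is done
        return processed;
    }
}
```

Calling `await BufferActionSketch.RunAsync(20, cts.Token)` waits for all twenty items without any polling loop. If the token is cancelled first, awaiting worker.Completion should throw an OperationCanceledException instead of returning a count.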

I think this task can be done with the TPL Dataflow library with a very basic setup. You'll need a TransformManyBlock<ParseTask, DataTask> and an ActionBlock (maybe more than one) for the actual data processing, like this:

// queue for new URLs to parse
var buffer = new BufferBlock<ParseTask>();

// parser itself, returns many data tasks from one url
// similar to LINQ.SelectMany method
var transform = new TransformManyBlock<ParseTask, DataTask>(task =>
{
    // get all the additional urls to parse
    var parsedLinks = GetLinkTasks(task);
    // get all the data to parse
    var parsedData = GetDataTasks(task);

    // setup additional links to be parsed
    foreach (var parsedLink in parsedLinks)
    {
        buffer.Post(parsedLink);
    }

    // return all the data to be processed
    return parsedData;
});

// actual data processing
var consumer = new ActionBlock<DataTask>(s => ProcessData(s));

After that you need to link the blocks to each other:

buffer.LinkTo(transform, new DataflowLinkOptions { PropagateCompletion = true });
transform.LinkTo(consumer, new DataflowLinkOptions { PropagateCompletion = true });

Now you have a nice pipeline that will execute in the background. Once you determine that everything you need has been parsed, you simply call the Complete method on a block so that it stops accepting new messages. Once the buffer becomes empty, it propagates completion down the pipeline to the transform block, which propagates it to the consumer(s), and you then wait for the Completion task:

// no additional links would be accepted
buffer.Complete();
// after all the tasks are done, this will get fired
await consumer.Completion;

You can detect the moment of completion by, for example, checking whether the buffer's Count property, the transform's InputCount, and the transform's CurrentDegreeOfParallelism (an internal property of TransformManyBlock) are all equal to 0.

However, I suggest you implement some additional logic of your own to track the number of items currently being transformed, as relying on internal state isn't a great solution. As for cancelling the pipeline, you can create each TPL block with a CancellationToken, either one shared by all blocks or a dedicated one per block, which gives you cancellation out of the box.
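One possible shape for that additional logic is a counter of outstanding items: increment it before every Post, decrement it when an item has been fully handled, and call Complete when it reaches zero. The following is only a sketch under stated assumptions. To keep it short it uses a single ActionBlock that posts back to itself instead of the buffer and transform pair above, and PendingCounterSketch, CrawlAsync, and GetLinks (a fake link extractor where page n links to pages n-1 and n-2) are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class PendingCounterSketch
{
    // Hypothetical stand-in for link extraction: page n links to n-1 and n-2.
    private static IEnumerable<int> GetLinks(int page)
    {
        if (page > 1)
        {
            yield return page - 1;
            yield return page - 2;
        }
    }

    // Processes startPage and everything reachable from it; returns the visit count.
    public static async Task<int> CrawlAsync(int startPage, CancellationToken token)
    {
        int pending = 0;   // items posted but not yet fully processed
        int visited = 0;

        ActionBlock<int> worker = null;
        worker = new ActionBlock<int>(page =>
        {
            Interlocked.Increment(ref visited);

            foreach (int link in GetLinks(page))
            {
                // Count the child before posting it, so the counter cannot
                // reach zero while work is still on its way into the block.
                Interlocked.Increment(ref pending);
                worker.Post(link);
            }

            // This item is finished. If nothing else is outstanding,
            // no further items can appear, so the block can be closed.
            if (Interlocked.Decrement(ref pending) == 0)
            {
                worker.Complete();
            }
        },
        new ExecutionDataflowBlockOptions
        {
            CancellationToken = token,
            MaxDegreeOfParallelism = 4
        });

        Interlocked.Increment(ref pending);
        worker.Post(startPage);

        // Completes when the counter has dropped to zero, or ends as
        // cancelled if the token fires first.
        await worker.Completion;
        return visited;
    }
}
```

Because a parent increments the counter for each child before decrementing for itself, the counter can only hit zero when every discovered item has been processed. The same idea should carry over to the buffer and transform pipeline by calling buffer.Complete() at the point where the counter reaches zero.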
