
Thread safe with Linq and Tasks on a Collection

Given some code like this:

public class CustomCollectionClass : Collection<CustomData> {}
public class CustomData
{
    // Public so the object initializer and the worker method can set them.
    public string name;
    public bool finished;
    public string result;
}

public async Task DoWorkInParallel(CustomCollectionClass collection)
{
    // collection can be retrieved from a DB, may not exist.
    if (collection == null)
    {
        collection = new CustomCollectionClass();
        foreach (var data in myData) 
        { 
            collection.Add(new CustomData()
            {
                name = data.Name
            });
        }
    }

    // This part doesn't feel safe. Not sure what to do here.
    var processTasks = myData.Select(o => 
        this.DoWorkOnItemInCollection(collection.Single(d => d.name == o.Name))).ToArray();

    await Task.WhenAll(processTasks);

    await SaveModifedCollection(collection);
}

public async Task DoWorkOnItemInCollection(CustomData data)
{
    await DoABunchOfWorkElsewhere();
    // This doesn't feel safe either. Lock here?
    data.finished = true;
    data.result = "Parallel";
}

As I noted in a couple of inline comments, the code above doesn't feel safe to me, but I'm not sure. I have a collection of elements, and I'd like to assign a unique element to each parallel task and let each task modify only that element based on the work it does. The end result is that I want to save the collection after the individual, different elements have been modified in parallel. If this isn't a safe way to do it, how best should I go about it?

Your code is the right way to do this, assuming starting DoABunchOfWorkElsewhere() multiple times is itself safe.

You don't need to worry about your LINQ query, because it doesn't actually run in parallel. All it does is invoke DoWorkOnItemInCollection() once per item. Those invocations may then run in parallel (or not, depending on your synchronization context and the implementation of DoABunchOfWorkElsewhere()), but the code you showed is safe.
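To make that point concrete, here is a minimal sketch of the same pattern. The names Item, Demo, ProcessAsync and SimulateWorkAsync are hypothetical stand-ins (they are not from the question), and Task.Delay only simulates DoABunchOfWorkElsewhere(). It assumes, as in the question, that each task receives a different element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for CustomData.
public class Item
{
    public string Name;
    public bool Finished;
}

public static class Demo
{
    // Hypothetical stand-in for DoABunchOfWorkElsewhere().
    private static Task SimulateWorkAsync() => Task.Delay(10);

    private static async Task ProcessAsync(Item item)
    {
        await SimulateWorkAsync();
        // Only this task was handed this item, so no lock is needed here.
        item.Finished = true;
    }

    public static async Task<List<Item>> RunAsync(IEnumerable<string> names)
    {
        var items = names.Select(n => new Item { Name = n }).ToList();

        // Select itself runs sequentially on the calling thread. Each call
        // returns at its first incomplete await and hands back a Task that
        // is still in flight.
        Task[] tasks = items.Select(i => ProcessAsync(i)).ToArray();

        // Once WhenAll completes, every task has finished writing to its item.
        await Task.WhenAll(tasks);
        return items;
    }
}
```

If two tasks could ever receive the same element, or if the tasks added to or removed from the collection itself, this reasoning would no longer hold and some synchronization would be needed.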

Your code above should work without issue: you hand one item to each task, so no two tasks touch the same element. I'm not so sure about the async modifier, though. You could instead return a Task directly from the method:

public Task DoWorkOnItemInCollection(CustomData data)
{
    return Task.Run(() => {
        DoABunchOfWorkElsewhere().Wait();
        data.finished = true;
        data.result = "Parallel";
    });
}

Be careful with a large number of items: each Task.Run call here blocks a thread-pool thread in Wait(), so many items can exhaust the pool. Work queued after that point has to wait for a free thread, which can be difficult to debug later.
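One common way to keep the number of concurrent operations bounded is a SemaphoreSlim gate. This is a different technique from the Task.Run version above, shown only as a sketch: the Throttle class, the ForEachAsync name and the maxConcurrency parameter are hypothetical, and it assumes the work is genuinely asynchronous so that waiting does not block a thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttle
{
    // Runs `work` for every item with at most `maxConcurrency` calls in flight.
    public static async Task ForEachAsync<T>(
        IEnumerable<T> items, int maxConcurrency, Func<T, Task> work)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = items.Select(async item =>
            {
                // Wait for a free slot without blocking a thread.
                await gate.WaitAsync();
                try { await work(item); }
                finally { gate.Release(); }
            }).ToList();

            // The semaphore is disposed only after every task has released it.
            await Task.WhenAll(tasks);
        }
    }
}
```

With the question's types, a call might look like `await Throttle.ForEachAsync(collection, 4, DoWorkOnItemInCollection);`, where 4 is an arbitrary limit chosen for illustration.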

I have done this before. Instead of handing the whole collection to some magic LINQ, it might be easier to treat it as a classic producer-consumer problem:

using System;
using System.Collections.Generic;
using System.Threading;

class ParallelWorker<T>
{
    private Action<T> Action;
    private Queue<T> Queue = new Queue<T>();
    private object QueueLock = new object();

    // Each worker thread pulls items until the queue is empty.
    private void DoWork() 
    {
        while(true)
        {
            T item;
            lock(this.QueueLock)
            {
                if(this.Queue.Count == 0) return; //exit thread
                item = this.Queue.Dequeue();
            }

            try { this.Action(item); }
            catch { /*...*/ }
        }
    }

    public void DoParallelWork(IEnumerable<T> items, int maxDegreesOfParallelism, Action<T> action)
    {
        this.Action = action;

        // Queue<T> has no AddRange, so enqueue the items one at a time.
        this.Queue.Clear();
        foreach(T item in items)
        {
            this.Queue.Enqueue(item);
        }

        List<Thread> threads = new List<Thread>();
        for(int i = 0; i < maxDegreesOfParallelism; i++)
        {
            // DoWork takes no argument, so ThreadStart is the matching delegate.
            ThreadStart threadStart = new ThreadStart(DoWork);
            Thread thread = new Thread(threadStart);
            thread.Start();
            threads.Add(thread);
        }

        // Block until every worker has drained the queue and exited.
        foreach(Thread thread in threads)
        {
            thread.Join();
        }
    }
}

This was written without an IDE, so there may be typos.

I'm going to make the suggestion that you use Microsoft's Reactive Framework (NuGet "Rx-Main") to do this task.

Here's the code:

public void DoWorkInParallel(CustomCollectionClass collection)
{
    var query =
        from x in collection.ToObservable()
        from r in Observable.FromAsync(() => DoWorkOnItemInCollection(x))
        select x;

    query.Subscribe(x => { }, ex => { }, async () =>
    {
        await SaveModifedCollection(collection);
    });
}

Done. That's it. Nothing more.

I have to say though, that when I tried to get your code to run it was full of bugs and issues. I suspect that the code you posted isn't your production code, but an example you wrote specifically for this question. I suggest that you try to make a running compilable example before posting.

Nevertheless, my suggestion should work for you with a little tweaking.

It is multi-threaded and thread-safe, and it cleanly saves the modified collection when done.
