TPL DataFlow Workflow

I have just started reading about TPL Dataflow and find it really confusing. I have read many articles on the topic, but I am unable to digest it easily. Maybe it is difficult, or maybe I just haven't grasped the idea yet.

The reason I started looking into this is that I wanted to implement a scenario where tasks run in parallel but in order, and I found that TPL Dataflow can be used for this.

I am practicing both TPL and TPL Dataflow and am at a very beginner level, so I need help from experts who can point me in the right direction. In my test method I have done the following:

private void btnTPLDataFlow_Click(object sender, EventArgs e)
    {
        Stopwatch watch = new Stopwatch();
        watch.Start();

        txtOutput.Clear();

        ExecutionDataflowBlockOptions execOptions = new ExecutionDataflowBlockOptions();
        execOptions.MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded;

        ActionBlock<string> actionBlock = new ActionBlock<string>(async v =>
        {
            await Task.Delay(200);
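            // 'scheduler' is not declared in this snippet; it is presumably a field holding
            // the UI thread's TaskScheduler (e.g. TaskScheduler.FromCurrentSynchronizationContext()).
            await Task.Factory.StartNew(
                () => txtOutput.Text += v + Environment.NewLine,
                CancellationToken.None,
                TaskCreationOptions.None,
                scheduler);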

        }, execOptions);

        for (int i = 1; i < 101; i++)
        {
            actionBlock.Post(i.ToString());
        }

        actionBlock.Complete();

        watch.Stop();
        lblTPLDataFlow.Text = Convert.ToString(watch.ElapsedMilliseconds / 1000);
    }

Now the procedure is both parallel and asynchronous (it does not freeze my UI), but the output is not in order, even though I have read that TPL Dataflow keeps the order of elements by default. My guess is that the Task I created is the culprit and that it is not outputting the strings in the correct order. Am I right?

If this is the case, how do I make this both asynchronous and in order?

I have tried to split the code into separate methods, but this attempt failed: only a string was output to the textbox and nothing else happened.

private async void btnTPLDataFlow_Click(object sender, EventArgs e)
    {
        Stopwatch watch = new Stopwatch();
        watch.Start();

        await TPLDataFlowOperation();

        watch.Stop();
        lblTPLDataFlow.Text = Convert.ToString(watch.ElapsedMilliseconds / 1000);
    }

    public async Task TPLDataFlowOperation()
    {
        var actionBlock = new ActionBlock<int>(async values => txtOutput.Text += await ProcessValues(values) + Environment.NewLine,
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded, TaskScheduler = scheduler });

        for (int i = 1; i < 101; i++)
        {
            actionBlock.Post(i);
        }

        actionBlock.Complete();
        await actionBlock.Completion;
    }

    private async Task<string> ProcessValues(int i)
    {
        await Task.Delay(200);
        return "Test " + i;
    }

I know I have written a bad piece of code, but this is the first time I am experimenting with TPL Dataflow.

"How do I make this asynchronous and in order?"

This is something of a contradiction. You can make concurrent tasks start in order, but you can't really guarantee that they will run or complete in order.

Let's examine your code and see what's happening.

First, you've set MaxDegreeOfParallelism to DataflowBlockOptions.Unbounded. This tells TPL Dataflow not to limit the number of tasks that it allows to run concurrently. Therefore, each of your tasks will start at more or less the same time, in order.
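For contrast, a block left at its default MaxDegreeOfParallelism of 1 handles one item at a time and only takes the next item once the previous asynchronous delegate has finished, so the order is kept at the cost of throughput. The following is a minimal sketch of that behavior; the class name SequentialBlockSketch and the RunAsync helper are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class SequentialBlockSketch
{
    // Hypothetical helper: posts 0..count-1 to an ActionBlock that uses the
    // default MaxDegreeOfParallelism (1) and returns the order in which the
    // items were handled.
    public static async Task<List<int>> RunAsync(int count, int delayMs)
    {
        var handled = new List<int>();

        // No ExecutionDataflowBlockOptions are passed, so the defaults apply.
        var block = new ActionBlock<int>(async i =>
        {
            await Task.Delay(delayMs);

            // No lock is needed here: only one item is in flight at a time.
            handled.Add(i);
        });

        for (int i = 0; i < count; i++)
        {
            block.Post(i);
        }

        block.Complete();
        await block.Completion;
        return handled;
    }
}
```

With 100 items and a 200 ms delay, a run like this takes roughly 20 seconds, because the delays happen one after another rather than concurrently.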

Your asynchronous operation begins with await Task.Delay(200). This causes your method to be suspended and then resumed after about 200 ms. However, this delay is not exact and will vary from one invocation to the next. The mechanism by which your code is resumed after the delay may also take a variable amount of time. Because of this random variation in the actual delay, the next bit of code no longer runs in order, which results in the discrepancy you're seeing.

You might find this example interesting. It's a console application to simplify things a bit.

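using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class Program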
{
    static void Main(string[] args)
    {
        OutputNumbersWithDataflow();
        OutputNumbersWithParallelLinq();

        Console.ReadLine();
    }

    private static async Task HandleStringAsync(string s)
    {
        await Task.Delay(200);
        Console.WriteLine("Handled {0}.", s);
    }

    private static void OutputNumbersWithDataflow()
    {
        var block = new ActionBlock<string>(
            HandleStringAsync,
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });

        for (int i = 0; i < 20; i++)
        {
            block.Post(i.ToString());
        }

        block.Complete();

        block.Completion.Wait();
    }

    private static string HandleString(string s)
    {
        // Perform some computation on s...
        Thread.Sleep(200);

        return s;
    }

    private static void OutputNumbersWithParallelLinq()
    {
        var myNumbers = Enumerable.Range(0, 20).AsParallel()
                                               .AsOrdered()
                                               .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
                                               .WithMergeOptions(ParallelMergeOptions.NotBuffered);

        var processed = from i in myNumbers
                        select HandleString(i.ToString());

        foreach (var s in processed)
        {
            Console.WriteLine(s);
        }
    }
}

The first set of numbers is calculated using a method rather similar to yours, with TPL Dataflow. The numbers are out of order.

The second set of numbers, output by OutputNumbersWithParallelLinq(), doesn't use Dataflow at all. It relies on the Parallel LINQ features built into .NET. This runs my HandleString() method on background threads, but keeps the data in order through to the end.

The limitation here is that PLINQ doesn't let you supply an async method. (Well, you could, but it wouldn't give you the desired behavior.) HandleString() is a conventional synchronous method; it just gets executed on a background thread.
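To make that limitation concrete, here is a rough sketch of what happens when an async method is passed to a PLINQ query. The query produces Task<string> objects rather than strings, and those tasks still have to be awaited separately. The class PlinqAsyncSketch and its async HandleStringAsync variant (one that returns the string) are assumptions made for this illustration.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class PlinqAsyncSketch
{
    // Hypothetical async counterpart of HandleString().
    public static async Task<string> HandleStringAsync(string s)
    {
        await Task.Delay(200);
        return s;
    }

    public static async Task<string[]> RunAsync()
    {
        // Select with an async method yields Task<string> elements, not
        // strings. PLINQ only parallelizes the synchronous part of each
        // call (everything before the first await), so the query itself
        // finishes almost immediately, long before the delays elapse.
        Task<string>[] tasks = Enumerable.Range(0, 20)
            .AsParallel()
            .AsOrdered()
            .Select(i => HandleStringAsync(i.ToString()))
            .ToArray();

        // Task.WhenAll returns the results in the order of the tasks
        // array, regardless of the order in which the tasks completed.
        return await Task.WhenAll(tasks);
    }
}
```

In this sketch PLINQ contributes very little: the ordering comes from Task.WhenAll, and the results only become available once every task has finished, so they cannot be streamed out one by one as in the synchronous PLINQ example.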

And here's a more complex Dataflow example that does preserve the correct order:

private static void OutputNumbersWithDataflowTransformBlock()
{
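    // Random is not thread-safe, and with unbounded parallelism the
    // delegate may run on several threads at once, so guard r with a lock.
    Random r = new Random();
    var transformBlock = new TransformBlock<string, string>(
        async s =>
        {
            int jitter;
            lock (r)
            {
                jitter = r.Next(80);
            }

            // Make the delay extra random, just to be sure.
            await Task.Delay(160 + jitter);
            return s;
        },
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });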

    // For a GUI application you should also set the
    // scheduler here to make sure the output happens
    // on the correct thread.
    var outputBlock = new ActionBlock<string>(
        s => Console.WriteLine("Handled {0}.", s),
        new ExecutionDataflowBlockOptions
            { 
                SingleProducerConstrained = true,
                MaxDegreeOfParallelism = 1
            });

    transformBlock.LinkTo(outputBlock, new DataflowLinkOptions { PropagateCompletion = true });

    for (int i = 0; i < 20; i++)
    {
        transformBlock.Post(i.ToString());
    }

    transformBlock.Complete();

    outputBlock.Completion.Wait();
}
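This works because a TransformBlock hands its results to the next block in the order the inputs were received, even when the delegate runs with unbounded parallelism, and the output block then handles them one at a time. For the GUI scenario in the question, the same pipeline might be sketched as follows, without blocking on Wait() and with the scheduler set on the output block. The names OrderedPipelineSketch, RunAsync, output and outputScheduler are hypothetical.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class OrderedPipelineSketch
{
    // Hypothetical helper: processes 1..count concurrently, then passes each
    // result to 'output' in the original order on 'outputScheduler'.
    public static async Task RunAsync(int count, Action<string> output, TaskScheduler outputScheduler)
    {
        // The slow work runs concurrently; TransformBlock still emits the
        // results in the order in which the inputs were posted.
        var transformBlock = new TransformBlock<int, string>(
            async i =>
            {
                await Task.Delay(200);
                return "Test " + i;
            },
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });

        // The output step runs one item at a time on the given scheduler,
        // for example the UI thread's scheduler in a WinForms application.
        var outputBlock = new ActionBlock<string>(
            output,
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = 1,
                TaskScheduler = outputScheduler
            });

        transformBlock.LinkTo(outputBlock, new DataflowLinkOptions { PropagateCompletion = true });

        for (int i = 1; i <= count; i++)
        {
            transformBlock.Post(i);
        }

        transformBlock.Complete();

        // Await instead of Wait() so a UI thread is not blocked.
        await outputBlock.Completion;
    }
}
```

From an async button click handler this could be called as await OrderedPipelineSketch.RunAsync(100, s => txtOutput.AppendText(s + Environment.NewLine), TaskScheduler.FromCurrentSynchronizationContext()), assuming the handler runs on the UI thread.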
