
Completion in TPL Dataflow Loops

I'm having trouble working out how to detect completion in a TPL Dataflow pipeline that contains a loop.

I have a feedback loop in part of a dataflow which is making GET requests to a remote server and processing data responses (transforming these with more dataflow then committing the results).

The data source splits its results into pages of 1000 records, and won't tell me how many pages it has available. I have to just keep reading until I get back fewer than a full page of data.

Usually the number of pages is 1, frequently it is up to 10, every now and again we have 1000s.

I have many requests to fetch at the start.
I want to use a pool of threads to deal with this. That part is fine: I can queue multiple requests for data and fetch them concurrently. If I stumble across a case where I need to fetch a large number of pages, I want all of my threads working on it; I don't want to be left with one thread churning away while the others have finished.

The issue I have is when I drop this logic into dataflow, such as:

//generate the initial requests for activity
var request = new TransformManyBlock<int, DataRequest>(cmp => QueueRequests(cmp));

//fetch the initial requests, feeding more requests back into our input buffer if we need to
TransformBlock<DataRequest, DataResponse> fetch = null;
fetch = new TransformBlock<DataRequest, DataResponse>(async req =>
{
    var resp = await Fetch(req);

    if (resp.Results.Count == 1000)
        await fetch.SendAsync(QueueAnotherRequest(req));

    return resp;
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10 });

//commit each type of request
var commit = new ActionBlock<DataResponse>(async resp => await Commit(resp));

request.LinkTo(fetch);
fetch.LinkTo(commit);

//when are we complete?

QueueRequests produces an IEnumerable<DataRequest>. I queue the next N page requests at once, accepting that this means I send slightly more calls than I need to. DataRequest instances share a LastPage counter to avoid needlessly making requests that we know are past the last page. All this is fine.
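As an aside, a QueueRequests along these lines might look like the following sketch. The question doesn't show the real types, so the shape of DataRequest, the LastPageMarker cell, and the page count of 4 are all invented here for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class QueueRequestsSketch
{
    // Mutable cell shared by all page requests for one source key.
    public class LastPageMarker { public int Value = int.MaxValue; }

    // Hypothetical stand-in for the question's DataRequest.
    public record DataRequest(int Key, int Page, LastPageMarker LastPage);

    // Speculatively queue the next n pages up front; every request holds
    // the same marker, so whichever fetch discovers the true last page can
    // record it and later fetches can skip pages beyond it.
    public static IEnumerable<DataRequest> QueueRequests(int cmp, int n = 4)
    {
        var marker = new LastPageMarker();   // one marker shared by all pages
        for (int page = 0; page < n; page++)
            yield return new DataRequest(cmp, page, marker);
    }

    static void Main()
    {
        var reqs = QueueRequests(42).ToList();
        Console.WriteLine(reqs.Count);                                        // 4
        Console.WriteLine(ReferenceEquals(reqs[0].LastPage, reqs[3].LastPage)); // True
    }
}
```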

The problem:
If I loop by feeding more requests back into fetch's input buffer as I've shown in this example, then I have a problem with how to signal (or even detect) completion. I can't set completion on fetch from request, because once completion is set I can't feed back any more requests.

I can monitor for the input and output buffers being empty on fetch, but I think I'd be risking fetch still being busy with a request when I set completion, thus preventing queuing requests for additional pages.

I could do with some way of knowing that fetch is busy (either has input or is busy processing an input).

Am I missing an obvious/straightforward way to solve this?

  • I could loop within fetch, rather than queuing more requests. The problem with that is I want to be able to use a set maximum number of threads to throttle what I'm doing to the remote server. Could a parallel loop inside the block share a scheduler with the block itself and the resulting thread count be controlled via the scheduler?

  • I could create a custom transform block for fetch to handle the completion signalling. Seems like a lot of work for such a simple scenario.
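On the first bullet: rather than sharing a scheduler, a simpler way to cap concurrent calls to the remote server while looping inside the block is a shared SemaphoreSlim. A self-contained sketch (Fetch is stubbed out, the types are invented, and the 1000-record page size is taken from the question; requires the System.Threading.Tasks.Dataflow package):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class LoopInsideDemo
{
    record DataRequest(int Key, int Page);
    record DataResponse(int Count);

    // Stub server: pages 0..2 are full (1000 records), page 3 is short.
    static Task<DataResponse> Fetch(DataRequest req) =>
        Task.FromResult(new DataResponse(req.Page < 3 ? 1000 : 17));

    // Shared throttle: at most 10 calls in flight against the server,
    // regardless of how many requests are looping concurrently.
    static readonly SemaphoreSlim Throttle = new SemaphoreSlim(10);

    public static async Task<int> RunAsync()
    {
        var fetchAll = new TransformManyBlock<DataRequest, DataResponse>(async req =>
        {
            var pages = new List<DataResponse>();
            for (var current = req; ; current = current with { Page = current.Page + 1 })
            {
                await Throttle.WaitAsync();
                DataResponse resp;
                try { resp = await Fetch(current); }
                finally { Throttle.Release(); }

                pages.Add(resp);
                if (resp.Count < 1000) break;   // short page => last page
            }
            return pages;
        }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10 });

        var total = 0;
        var commit = new ActionBlock<DataResponse>(r => Interlocked.Add(ref total, r.Count));
        fetchAll.LinkTo(commit, new DataflowLinkOptions { PropagateCompletion = true });

        fetchAll.Post(new DataRequest(1, 0));
        fetchAll.Complete();            // safe here: nothing feeds back into the block
        await commit.Completion;
        return total;                   // 3 * 1000 + 17 = 3017
    }

    static async Task Main() => Console.WriteLine(await RunAsync());
}
```

The drawback the question anticipates still applies, though: the pages of a single request are fetched sequentially, so one request with thousands of pages keeps only one loop busy.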

Many thanks for any help offered!

In TPL Dataflow, you can link the blocks with DataflowLinkOptions, specifying that completion should propagate from block to block:

request.LinkTo(fetch, new DataflowLinkOptions { PropagateCompletion = true });
fetch.LinkTo(commit, new DataflowLinkOptions { PropagateCompletion = true });

After that, you simply call the Complete() method for the request block, and you're done!

// the completion will be propagated to all the blocks
request.Complete();

The final thing you should use is the Completion task property of the last block:

commit.Completion.ContinueWith(t =>
    {
        /* check the status of the task and correctness of the requests handling */
    });
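For a pipeline without the feedback edge, that wiring can be sketched end to end as follows (the question's types are stubbed out with invented stand-ins). Note that PropagateCompletion on its own doesn't resolve the question's loop: completing fetch would stop it accepting the page requests it posts back to itself.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class PropagationDemo
{
    // Minimal stand-ins for the question's types.
    record DataRequest(int Id);
    record DataResponse(int Id);

    public static async Task<int> RunAsync()
    {
        var committed = 0;

        var request = new TransformManyBlock<int, DataRequest>(
            n => Enumerable.Range(0, n).Select(i => new DataRequest(i)));

        var fetch = new TransformBlock<DataRequest, DataResponse>(
            async req => { await Task.Yield(); return new DataResponse(req.Id); },
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10 });

        var commit = new ActionBlock<DataResponse>(_ => Interlocked.Increment(ref committed));

        var opts = new DataflowLinkOptions { PropagateCompletion = true };
        request.LinkTo(fetch, opts);
        fetch.LinkTo(commit, opts);

        request.Post(5);
        request.Complete();       // propagates through fetch to commit
        await commit.Completion;  // completes (or faults) once everything is done
        return committed;         // 5
    }

    static async Task Main() => Console.WriteLine(await RunAsync());
}
```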

For now I have added a simple busy state counter to the fetch block:-

int fetch_busy = 0;

TransformBlock<DataRequest, DataResponse> fetch = null;
fetch = new TransformBlock<DataRequest, DataResponse>(async req =>
    {
        Interlocked.Increment(ref fetch_busy);
        try
        {
            var resp = await Fetch(req);

            if (resp.Results.Count == 1000)
            {
                await fetch.SendAsync(QueueAnotherRequest(req));
            }

            return resp;
        }
        finally
        {
            Interlocked.Decrement(ref fetch_busy);
        }
    }
    , new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10 });

Which I then use to signal complete as follows:-

request.Completion.ContinueWith(async _ =>
    {
        while ( fetch.InputCount > 0 || fetch_busy > 0 )
        {
            await Task.Delay(100);
        }

        fetch.Complete();
    });

Which doesn't seem very elegant, but I think it should work.
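An alternative to polling InputCount and a busy flag is to count outstanding requests explicitly and call Complete() only when the count drops to zero: increment before every SendAsync (including the follow-up pages), decrement when processing finishes. A self-contained sketch under the question's assumptions (Fetch stubbed to return three full 1000-record pages then a short one; the real types are invented stand-ins):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class CounterCompletionDemo
{
    record DataRequest(int Page);
    record DataResponse(int Count);

    // Stub server: pages 0..2 are full (1000 records), page 3 is short.
    static Task<DataResponse> Fetch(DataRequest req) =>
        Task.FromResult(new DataResponse(req.Page < 3 ? 1000 : 42));

    public static async Task<int> RunAsync()
    {
        var outstanding = 0;              // requests queued but not yet finished
        TransformBlock<DataRequest, DataResponse> fetch = null;
        fetch = new TransformBlock<DataRequest, DataResponse>(async req =>
        {
            try
            {
                var resp = await Fetch(req);
                if (resp.Count == 1000)
                {
                    // Count the follow-up page *before* this one finishes,
                    // so the counter can never dip to zero prematurely.
                    Interlocked.Increment(ref outstanding);
                    await fetch.SendAsync(new DataRequest(req.Page + 1));
                }
                return resp;
            }
            finally
            {
                if (Interlocked.Decrement(ref outstanding) == 0)
                    fetch.Complete();     // nothing in flight, nothing can arrive
            }
        }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10 });

        var total = 0;
        var commit = new ActionBlock<DataResponse>(r => Interlocked.Add(ref total, r.Count));
        fetch.LinkTo(commit, new DataflowLinkOptions { PropagateCompletion = true });

        // Seed: count every initial request before posting it.
        Interlocked.Increment(ref outstanding);
        await fetch.SendAsync(new DataRequest(0));

        await commit.Completion;
        return total;                     // 3 * 1000 + 42 = 3042
    }

    static async Task Main() => Console.WriteLine(await RunAsync());
}
```

The key invariant is that the increment for a follow-up page happens before the current page's decrement, so the count can only reach zero when genuinely nothing is in flight and nothing more can be queued.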
