
Using multiple BlockingCollection<T> buffers to implement pipeline validation

I have a requirement to read each record (line) of a large data file and then apply various validation rules to each of those lines. Rather than run the validations sequentially, I decided to see if I could use some pipelining to speed things up. I need to apply the same set of business validation rules (5 at the moment) to all items in my collection. Since no validation step produces output that another step consumes, I don't need to worry about passing values from one validation routine to the next. I do, however, need to make the same data available to all of the validation steps, and to do this I came up with copying the same data (records) into 5 different buffers, one consumed by each validation stage.

Below is the code I have so far, but I have little confidence in this approach and wanted to know if there is a better way of doing this. I appreciate any help you can give. Thanks in advance.

    public static void LoadBuffers(List<BlockingCollection<FlattenedLoadDetail>> outputs,
        BlockingCollection<StudentDetail> students)
    {
        try
        {
            // GetConsumingEnumerable blocks until items arrive and removes them
            // as they are read; enumerating the collection directly would only
            // walk a snapshot of whatever happens to be buffered.
            foreach (var student in students.GetConsumingEnumerable())
                foreach (var stub in student.RecordYearDetails)
                    foreach (var buffer in outputs)
                        buffer.Add(stub);
        }
        finally
        {
            // Signal consumers that no more items will arrive,
            // even if the producer throws part-way through.
            foreach (var buffer in outputs)
                buffer.CompleteAdding();
        }
    }


    public void Process(BlockingCollection<StudentDetail> studentRecords)
    {
        // Validate the header record before proceeding.
        if (!IsHeaderRecordValid)
            throw new Exception("Invalid Header Record Found.");

        const int bufferSize = 20;
        var buffer1 = new BlockingCollection<FlattenedLoadDetail>(bufferSize);
        var buffer2 = new BlockingCollection<FlattenedLoadDetail>(bufferSize);
        var buffer3 = new BlockingCollection<FlattenedLoadDetail>(bufferSize);
        var buffer4 = new BlockingCollection<FlattenedLoadDetail>(bufferSize);
        var taskFactory = new TaskFactory(TaskCreationOptions.LongRunning,
            TaskContinuationOptions.NotOnCanceled);

        var loadUpStartBuffer = taskFactory.StartNew(() => LoadBuffers(
            new List<BlockingCollection<FlattenedLoadDetail>>
                { buffer1, buffer2, buffer3, buffer4 }, studentRecords));

        var recordCreateDateValidationStage =
            taskFactory.StartNew(() => ValidateRecordCreateDateActivity.Validate(buffer1));
        var uniqueStudentIDValidationStage =
            taskFactory.StartNew(() => ValidateUniqueStudentIDActivity.Validate(buffer2));
        var ssnNumberRangeValidationStage =
            taskFactory.StartNew(() => ValidateDocSequenceNumberActivity.Validate(buffer3));
        var ssnRecordNumberMatchValidationStage =
            taskFactory.StartNew(() => ValidateStudentSSNRecordNumberActivity.Validate(buffer4));

        Task.WaitAll(loadUpStartBuffer, recordCreateDateValidationStage,
            uniqueStudentIDValidationStage, ssnNumberRangeValidationStage,
            ssnRecordNumberMatchValidationStage);
    }

In fact, if I could tie the tasks together so that once one fails all the others stop, that would help me a lot, but I am new to this pattern and am still trying to figure out the best way to handle this problem. Should I instead have each validation step fill an output buffer that is passed on to the next task? Is that a better way to go?
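One idea I had for tying them together is a shared `CancellationTokenSource`, where any failing stage cancels all the others. This is only a sketch I have not tested; `RunStage` is a helper I made up, and `FlattenedLoadDetail` and the buffers are from my code above:

```csharp
var cts = new CancellationTokenSource();

// Wrap each validation stage so that a failure in any one of them
// cancels the shared token, which the other stages observe.
Task RunStage(Action<BlockingCollection<FlattenedLoadDetail>> validate,
              BlockingCollection<FlattenedLoadDetail> buffer)
{
    return Task.Factory.StartNew(() =>
    {
        try
        {
            validate(buffer);
        }
        catch
        {
            cts.Cancel();   // signal every other stage to stop
            throw;          // keep the original exception on this task
        }
    }, cts.Token, TaskCreationOptions.LongRunning, TaskScheduler.Default);
}

var stage1 = RunStage(ValidateRecordCreateDateActivity.Validate, buffer1);
```

For this to work, each `Validate` method would have to consume its buffer with `buffer.GetConsumingEnumerable(cts.Token)` so that a blocked consumer wakes up (with an `OperationCanceledException`) instead of waiting forever when another stage fails.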

The first question you need to answer for yourself is whether you want to improve latency or throughput.

The strategy you depicted takes a single item and performs parallel calculations on it. This means that an item is serviced very quickly, but at the expense of the other items left waiting for their turn to enter the pipeline.

Consider an alternative concurrent approach: treat the entire validation process as a sequential operation per item, but service more than one item in parallel.

It seems to me that in your case you will benefit more from the latter approach, especially for its simplicity, and because I am guessing that latency is not as important here.
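Concretely, the latter approach could look something like the sketch below. The validator names are taken from your code, but note the assumption: each `Validate` here takes a single item, whereas your current methods take a whole `BlockingCollection`, so you would need per-item overloads.

```csharp
// Sketch: run the full validation sequence on each record, but process
// many records in parallel. NoBuffering stops Parallel.ForEach from
// chunking the blocking source, so items are handed out as they arrive.
var partitioner = Partitioner.Create(
    students.GetConsumingEnumerable(),
    EnumerablePartitionerOptions.NoBuffering);

Parallel.ForEach(partitioner, student =>
{
    foreach (var stub in student.RecordYearDetails)
    {
        // Assumed per-item Validate overloads (hypothetical).
        ValidateRecordCreateDateActivity.Validate(stub);
        ValidateUniqueStudentIDActivity.Validate(stub);
        ValidateDocSequenceNumberActivity.Validate(stub);
        ValidateStudentSSNRecordNumberActivity.Validate(stub);
    }
});
```

This also removes the need to copy every record into five separate buffers, and a thrown validation exception will fault the `Parallel.ForEach` call and stop scheduling new items, which gives you the "one fails, all stop" behavior for free.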
