Using multiple BlockingCollection<T> buffers in implementing pipeline validation

I have a requirement to read each record (line) of a large data file and then apply various validation rules to each of these lines. Rather than just apply the validations sequentially, I decided to see if I could use some pipelining to help speed things up. I need to apply the same set of business validation rules (5 at the moment) to all items in my collection. As there is no need to return output from each validation process, I don't need to worry about passing values from one validation routine to the other. I do, however, need to make the same data available to all my validation steps, and to do this I came up with copying the same data (records) to 5 different buffers, one for each of the validation stages.

Below is the code I have so far. I have little confidence in this approach, though, and wanted to know if there is a better way of doing this. I appreciate any help you can give. Thanks in advance.

    public static void LoadBuffers(List<BlockingCollection<FlattenedLoadDetail>> outputs,
        BlockingCollection<StudentDetail> students)
    {
        try
        {
            foreach (var student in students)
            {
                foreach (var stub in student.RecordYearDetails)
                    foreach (var buffer in outputs)
                        buffer.Add(stub);
            }
        }
        finally
        {
            foreach (var buffer in outputs)
                buffer.CompleteAdding();
        }
    }


    public void Process(BlockingCollection<StudentRecordDetail> StudentRecords)
    {

        // Validate header record before proceeding
        if (!IsHeaderRecordValid)
            throw new Exception("Invalid Header Record Found.");
        const int buffersize = 20;
        var buffer1 = new BlockingCollection<FlattenedLoadDetail>(buffersize);
        var buffer2 = new BlockingCollection<FlattenedLoadDetail>(buffersize);
        var buffer3 = new BlockingCollection<FlattenedLoadDetail>(buffersize);
        var buffer4 = new BlockingCollection<FlattenedLoadDetail>(buffersize);
        var taskmonitor = new TaskFactory(TaskCreationOptions.LongRunning, TaskContinuationOptions.NotOnCanceled);

        using (var loadUpStartBuffer = taskmonitor.StartNew(() => LoadBuffers(
            new List<BlockingCollection<FlattenedLoadDetail>>
            {buffer1, buffer2, buffer3, buffer4}, StudentRecords)))
        {
            var recordcreateDateValidationStage = taskmonitor.StartNew(() => ValidateRecordCreateDateActivity.Validate(buffer1));
            var uniqueStudentIDValidationStage =
                taskmonitor.StartNew(() => ValidateUniqueStudentIDActivity.Validate(buffer2));
            var SSNNumberRangeValidationStage =
                taskmonitor.StartNew(() => ValidateDocSequenceNumberActivity.Validate(buffer3));
            var SSNRecordNumberMatchValidationStage =
                taskmonitor.StartNew(() => ValidateStudentSSNRecordNumberActivity.Validate(buffer4));

            Task.WaitAll(loadUpStartBuffer, recordcreateDateValidationStage, uniqueStudentIDValidationStage,
                SSNNumberRangeValidationStage, SSNRecordNumberMatchValidationStage);

        }
    }

In fact, if I could tie the tasks together in such a way that once one fails all the others stop, that would help me a lot, but I am a newbie to this pattern and am still trying to figure out the best way to handle this problem. Should I just throw caution to the wind and have each validation step load an output buffer to be passed on to the subsequent task? Is that a better way to go?

The first question you need to answer for yourself is whether you want to improve latency or throughput.

The strategy you depicted takes a single item and performs parallel calculations on it. This means that an item is serviced very fast, but at the expense of other items that are left waiting for their turn to enter.

Consider an alternative concurrent approach. You can treat the entire validation process as a sequential operation, but service more than one item in parallel.
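As a rough illustration of that alternative, here is a minimal sketch that runs every rule in order on each record while letting `Parallel.ForEach` keep several records in flight at once. The `Record` type, the two rules, and the `ParallelValidation` class are hypothetical stand-ins, not the types from the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for the question's FlattenedLoadDetail.
public sealed class Record
{
    public int StudentId { get; set; }
    public DateTime CreateDate { get; set; }
}

public static class ParallelValidation
{
    // Hypothetical rules: a description plus a predicate that is true when the record passes.
    private static readonly (string Name, Func<Record, bool> IsValid)[] Rules =
    {
        ("StudentId must be positive", r => r.StudentId > 0),
        ("CreateDate must not be in the future", r => r.CreateDate <= DateTime.UtcNow),
    };

    // Applies all rules sequentially to each record; records are processed in parallel.
    public static List<string> ValidateAll(IEnumerable<Record> records)
    {
        // ConcurrentBag is safe to add to from several threads at once.
        var errors = new ConcurrentBag<string>();

        Parallel.ForEach(records, record =>
        {
            foreach (var rule in Rules)
            {
                if (!rule.IsValid(record))
                    errors.Add($"Student {record.StudentId}: failed rule '{rule.Name}'");
            }
        });

        return errors.ToList();
    }
}
```

With this shape there is a single input and no per-rule buffers to keep in sync. If the records arrive through a `BlockingCollection<T>`, its `GetConsumingEnumerable()` could presumably be passed as the source instead of a list.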

It seems to me that in your case you will benefit more from the latter approach, especially from the perspective of simplicity, and since I am guessing that latency is not as important here.
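On the side question of stopping every stage once one of them fails: if you do keep one buffer per stage, one possible way is to share a `CancellationTokenSource` between the stages. The sketch below is only an assumption about how that could look. `FailFastStages` and `RunStage` are hypothetical names, `int` stands in for the record type, and `Task.Run` is used for brevity instead of the `TaskFactory` from the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class FailFastStages
{
    // Runs one validation stage over its own buffer.
    // If the stage throws, it cancels the shared token so the other stages stop too.
    public static Task RunStage(
        BlockingCollection<int> input,
        Action<int> validate,
        CancellationTokenSource cts)
    {
        return Task.Run(() =>
        {
            try
            {
                // The token makes a blocked consumer wake up when another stage fails.
                foreach (var item in input.GetConsumingEnumerable(cts.Token))
                    validate(item);
            }
            catch (OperationCanceledException)
            {
                // Another stage failed first; stop quietly.
            }
            catch
            {
                cts.Cancel(); // signal every other stage to stop
                throw;        // keep the original exception on this task
            }
        });
    }
}
```

`Task.WaitAll` over the stage tasks should then surface only the exception from the stage that actually failed. With bounded buffers, the producer would also need to pass the same token to `Add(item, token)` so that it does not stay blocked on a full buffer after the consumers have stopped.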
