
Mule batch commit and records failures

My current scenario:

I have 10000 records as input to the batch. As per my understanding, batch is only for record-by-record processing. Hence, I am transforming each record using a DataWeave component inside a batch step (note: I have not used any batch commit) and writing each record to a file. The reason for doing record-by-record processing is that if any particular record contains invalid data, only that record fails, while the rest are processed fine.
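A simplified sketch of what I mean (Mule 3 syntax; the job name, connector reference, and file paths are placeholders, not my real config):

```xml
<!-- Sketch: per-record transform inside a batch step, no batch:commit,
     so each record is transformed and written individually -->
<batch:job name="recordByRecordBatch" max-failed-records="-1">
    <batch:input>
        <!-- source of the 10000 records, e.g. a file or DB query -->
    </batch:input>
    <batch:process-records>
        <batch:step name="TransformAndWrite">
            <dw:transform-message doc:name="Transform single record">
                <dw:set-payload><![CDATA[%dw 1.0
%output application/csv
---
[payload]]]></dw:set-payload>
            </dw:transform-message>
            <file:outbound-endpoint path="/tmp/out" outputPattern="records.csv"
                                    connector-ref="fileConnector" doc:name="File"/>
        </batch:step>
    </batch:process-records>
</batch:job>
```

This way an invalid record only makes its own step execution fail.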

But in many of the blogs I see, they are using a batch commit (with streaming) together with a DataWeave component. As per my understanding, all the records will then be given to DataWeave in one shot, and if one record has invalid data, all 10000 records will fail (at DataWeave). The point of record-by-record processing is then lost. Is the above assumption correct, or am I thinking about it the wrong way?
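The pattern I see in those blogs looks roughly like this (a sketch, not their exact config; with a fixed commit size rather than streaming, DataWeave would receive the records in blocks of that size instead of all at once):

```xml
<!-- Sketch: inside batch:commit the payload is a List of records,
     so DataWeave maps over a block instead of a single record -->
<batch:step name="WriteInBlocks">
    <batch:commit size="100" doc:name="Batch Commit">
        <dw:transform-message doc:name="Transform block of records">
            <dw:set-payload><![CDATA[%dw 1.0
%output application/csv
---
payload map ((record) -> record)]]></dw:set-payload>
        </dw:transform-message>
        <file:outbound-endpoint path="/tmp/out" outputPattern="block.csv" doc:name="File"/>
    </batch:commit>
</batch:step>
```

With `streaming="true"` instead of `size`, the whole record set would be handed over in one go, which is the case I am worried about.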

That is the reason I am not using batch commit.

Now, as I said, I am sending each record to a file. Actually, I have the requirement of sending each record to 5 different CSV files. So, currently I am using a Scatter-Gather component inside my batch step to send it to five different routes.
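The step in question is structured roughly like this (paths and names are simplified placeholders for my real endpoints):

```xml
<!-- Sketch: one batch step fanning each record out to 5 file routes -->
<batch:step name="FanOutStep">
    <scatter-gather doc:name="Scatter-Gather">
        <file:outbound-endpoint path="/tmp/out1" outputPattern="file1.csv" doc:name="CSV 1"/>
        <file:outbound-endpoint path="/tmp/out2" outputPattern="file2.csv" doc:name="CSV 2"/>
        <!-- ...routes 3-5 write to the remaining CSV files -->
    </scatter-gather>
</batch:step>
```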

As you can see in the image, the input phase gives a collection of 10000 records, and each record is sent to 5 routes using Scatter-Gather.

Is the approach I am using fine, or is there a better design that can be followed?

Also, I have created a 2nd batch step to capture ONLY FAILED RECORDS. But with the current design, I am not able to capture failed records.

SHORT ANSWERS

Is the above assumption correct, or am I thinking about it the wrong way?

In short, yes, you are thinking about it the wrong way. Read my long explanation with an example to understand why; I hope you will find it useful.

Also, I have created a 2nd batch step to capture ONLY FAILED RECORDS. But with the current design, I am not able to capture failed records.

You probably forgot to set max-failed-records="-1" (unlimited) on the batch job. The default is 0: on the first failed record the batch will stop and not execute the subsequent steps.

Is the approach I am using fine, or is there a better design that can be followed?

I think it makes sense if performance is essential for you and you can't cope with the overhead created by doing this operation in sequence. If instead you can slow down a bit, it could make sense to do this operation in 5 different steps: you will lose parallelism, but you gain better control over failing records, especially if using batch commit.
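As a rough sketch of that alternative design (step names, sizes, and paths are illustrative only), each target file would get its own step with its own commit:

```xml
<!-- Sketch: one batch step per target file, each with its own commit,
     so failures are tracked per file instead of per scatter-gather route -->
<batch:step name="WriteFile1">
    <batch:commit size="100" doc:name="Commit file 1">
        <file:outbound-endpoint path="/tmp/out1" outputPattern="file1.csv" doc:name="CSV 1"/>
    </batch:commit>
</batch:step>
<batch:step name="WriteFile2">
    <batch:commit size="100" doc:name="Commit file 2">
        <file:outbound-endpoint path="/tmp/out2" outputPattern="file2.csv" doc:name="CSV 2"/>
    </batch:commit>
</batch:step>
<!-- ...steps 3-5 for the remaining files -->
```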

MULE BATCH JOB IN PRACTICE

I think the best way to explain how it works is through an example.

Take into consideration the following case: you have a batch job configured with max-failed-records="-1" (no limit).

<batch:job name="batch_testBatch" max-failed-records="-1">

In this process we input a collection composed of 6 strings.

    <batch:input>
        <set-payload value="#[['record1','record2','record3','record4','record5','record6']]" doc:name="Set Payload"/>
    </batch:input>

The processing is composed of 3 steps. The first step just logs the record being processed, and the second step also logs it but throws an exception on record3 to simulate a failure.

    <batch:step name="Batch_Step">
        <logger message="-- processing #[payload] in step 1 --" level="INFO" doc:name="Logger"/>
    </batch:step>
    <batch:step name="Batch_Step2">
        <logger message="-- processing #[payload] in step 2 --" level="INFO" doc:name="Logger"/>
        <scripting:transformer doc:name="Groovy">
            <scripting:script engine="Groovy"><![CDATA[
            // deliberately fail on the third record to simulate an error
            if (payload == "record3") {
                throw new java.lang.Exception();
            }
            payload;
            ]]>
            </scripting:script>
        </scripting:transformer>
    </batch:step>

The third step instead contains just the commit, with a commit size of 2.

<batch:step name="Batch_Step3">
    <batch:commit size="2" doc:name="Batch Commit">
        <logger message="-- committing #[payload] --" level="INFO" doc:name="Logger"/>
    </batch:commit>
</batch:step>

Now you can follow me through the execution of this batch processing:


On start, all 6 records will be processed by the first step, and the console log would look like this:

 -- processing record1 in step 1 --
 -- processing record2 in step 1 --
 -- processing record3 in step 1 --
 -- processing record4 in step 1 --
 -- processing record5 in step 1 --
 -- processing record6 in step 1 --
Step Batch_Step finished processing all records for instance d8660590-ca74-11e5-ab57-6cd020524153 of job batch_testBatch

Now things get more interesting in step 2: record3 will fail because we explicitly throw an exception, but despite this the step will continue processing the other records. Here is how the log would look:

-- processing record1 in step 2 --
-- processing record2 in step 2 --
-- processing record3 in step 2 --
com.mulesoft.module.batch.DefaultBatchStep: Found exception processing record on step ...
Stacktrace
....
-- processing record4 in step 2 --
-- processing record5 in step 2 --
-- processing record6 in step 2 --
Step Batch_Step2 finished processing all records for instance d8660590-ca74-11e5-ab57-6cd020524153 of job batch_testBatch

At this point, despite a failed record in this step, batch processing will continue, because the parameter max-failed-records is set to -1 (unlimited) and not to the default value of 0.

At this point all the successful records will be passed to step 3. This is because, by default, the accept-policy parameter of a step is set to NO_FAILURES (other possible values are ALL and ONLY_FAILURES).

Now step 3, which contains the commit phase with a size equal to 2, will commit the records two by two:

-- committing [record1, record2] --
-- committing [record4, record5] --
Step: Step Batch_Step3 finished processing all records for instance d8660590-ca74-11e5-ab57-6cd020524153 of job batch_testBatch
-- committing [record6] --

As you can see, this confirms that record3, which had failed, was not passed to the next step and therefore was not committed.

Starting from this example, I think you can imagine and test more complex scenarios. For example, after the commit you could have another step that processes only the failed records, to make an administrator aware of the failures with an email. Beyond that, you can always use external storage to store more advanced info about your records, as you can read in my answer to this other question.
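Such a failure-notification step could be sketched like this (the SMTP host and addresses are placeholders; the key part is accept-policy="ONLY_FAILURES", which makes the step see only the records that failed in earlier steps):

```xml
<!-- Sketch: a final step that processes only failed records
     and mails a notification to an administrator -->
<batch:step name="NotifyFailures" accept-policy="ONLY_FAILURES">
    <logger message="Record #[payload] failed" level="WARN" doc:name="Logger"/>
    <smtp:outbound-endpoint host="smtp.example.com" from="batch@example.com"
                            to="admin@example.com" subject="Batch record failure"
                            doc:name="SMTP"/>
</batch:step>
```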

Hope this helps.

