
How to return no matched row in Pentaho Data Integration (Kettle)?

I am looking for a way to do the equivalent of an SSIS lookup in Pentaho Data Integration. I'll try to explain with an example. I have two tables, A and B.

Data in table A: 1 2 3 4 5

Data in table B: 3 4 5 6 7

After my process:

All rows in A and not in B ==> will be inserted into B

All rows in B and not in A ==> will be deleted from B

So my final table B is: 3 4 5 1 2

Can someone help me, please?

There is indeed a step that does this, but it doesn't do it alone. It's the Merge rows (diff) step, and it has some requirements. In your case, A is the "compare" table and B is the "reference" table.

First of all, both inputs (rows from A and B in your case, Dev and Prod in mine) need to be sorted by a key value. In the step you specify the key fields to match on, and then the value fields to compare. The step adds a field to the output (by default called 'flagfield'). After comparing each row, this field is given one of four values: "new", "changed", "deleted", or "identical". Note that in my example below I have explicit sort steps. That's because the sorting scheme of my database is not compatible with PDI's, and for this step to work, your data must be in PDI's sort order. You may not need these.
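To make the flag values concrete, here is a minimal Python sketch of the kind of sorted-merge comparison described above. It is not PDI's actual implementation; the function name `merge_rows_diff` and its parameters are made up for illustration, and rows are modeled as plain dicts that are already sorted by the key.

```python
def merge_rows_diff(reference, compare, key, values=()):
    """Yield (row, flag) pairs for two lists of dicts sorted by `key`.

    Hypothetical helper that imitates the four flag values described above;
    both inputs must already be sorted by `key`.
    """
    i = j = 0
    while i < len(reference) or j < len(compare):
        if j >= len(compare):
            # Only reference rows are left: they are missing from compare.
            yield reference[i], "deleted"
            i += 1
        elif i >= len(reference):
            # Only compare rows are left: they are missing from reference.
            yield compare[j], "new"
            j += 1
        else:
            ref_row, cmp_row = reference[i], compare[j]
            if ref_row[key] < cmp_row[key]:
                yield ref_row, "deleted"
                i += 1
            elif ref_row[key] > cmp_row[key]:
                yield cmp_row, "new"
                j += 1
            else:
                # Same key on both sides: compare the value fields.
                same = all(ref_row[v] == cmp_row[v] for v in values)
                flag = "identical" if same else "changed"
                yield cmp_row, flag
                i += 1
                j += 1


# Data from the question: A is the "compare" input, B is the "reference" input.
table_a = [{"id": n} for n in (1, 2, 3, 4, 5)]
table_b = [{"id": n} for n in (3, 4, 5, 6, 7)]

flags = [(row["id"], flag)
         for row, flag in merge_rows_diff(table_b, table_a, key="id")]
print(flags)
```

With the question's data, rows 1 and 2 come out as "new", rows 3 to 5 as "identical", and rows 6 and 7 as "deleted". The walk only works because both lists are sorted the same way, which is why the sort order matters so much for the real step.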

You can follow this with a Synchronize after merge step to apply the identified changes. In this step you specify the flag field and the values that correspond to insert, update, and delete. FYI, these are specified on the "Advanced" tab, and they must be filled out for the step to work.
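As a rough illustration of what applying the flags amounts to, the sketch below uses Python's `sqlite3` module with an in-memory table. The table layout (`b` with columns `id` and `val`), the `val` column itself, and the function name `apply_flags` are assumptions made for the example; the flagged rows are written by hand to match the question's data.

```python
import sqlite3


def apply_flags(conn, flagged_rows):
    """Apply (id, val, flag) rows to table b (hypothetical helper)."""
    for row_id, val, flag in flagged_rows:
        if flag == "new":
            conn.execute("INSERT INTO b (id, val) VALUES (?, ?)", (row_id, val))
        elif flag == "changed":
            conn.execute("UPDATE b SET val = ? WHERE id = ?", (val, row_id))
        elif flag == "deleted":
            conn.execute("DELETE FROM b WHERE id = ?", (row_id,))
        # "identical" rows need no action.
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE b (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO b (id, val) VALUES (?, ?)",
                 [(n, f"v{n}") for n in (3, 4, 5, 6, 7)])

# Hand-written flags for the question's data (A = compare, B = reference).
flagged = [
    (1, "v1", "new"), (2, "v2", "new"),
    (3, "v3", "identical"), (4, "v4", "identical"), (5, "v5", "identical"),
    (6, "v6", "deleted"), (7, "v7", "deleted"),
]
apply_flags(conn, flagged)

final_ids = [r[0] for r in conn.execute("SELECT id FROM b ORDER BY id")]
print(final_ids)
```

After the call, table `b` holds ids 1 to 5, which matches the final table B the question asks for.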

For a very small table like your example, I would favor just a truncate and full load with a Table output step, but if your tables are large and the number of changes relatively small (<= ~25%) and replication is not available, this step is usually the way to go.
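For comparison, the truncate-and-full-load alternative boils down to something like the following Python sketch. It uses `sqlite3` with an in-memory table purely for illustration; SQLite has no TRUNCATE statement, so `DELETE FROM b` stands in for it, and the function name `truncate_and_load` is made up.

```python
import sqlite3


def truncate_and_load(conn, ids):
    """Empty table b and reload it from `ids` in one transaction (hypothetical helper)."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM b")  # stand-in for TRUNCATE TABLE
        conn.executemany("INSERT INTO b (id) VALUES (?)", [(n,) for n in ids])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE b (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO b (id) VALUES (?)", [(n,) for n in (3, 4, 5, 6, 7)])

truncate_and_load(conn, [1, 2, 3, 4, 5])  # the contents of table A
reloaded = [r[0] for r in conn.execute("SELECT id FROM b ORDER BY id")]
print(reloaded)
```

This rewrites every row whether or not it changed, which is why it suits small tables better than large ones with few changes.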

[Image: example transformation]

Pentaho has no single step that does this directly. There are several ways to do it.

=> Write SQL to achieve your solution. Written as SQL, it will also execute faster.

=> You can also achieve it using a filter step.
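For the SQL option, one possible pair of statements is sketched below, run through Python's `sqlite3` module so that it is self-contained. It assumes both tables live in the same database and share a key column named `id`; those names are illustrative, and the exact syntax may need adjusting for your database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE b (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO a (id) VALUES (?)", [(n,) for n in (1, 2, 3, 4, 5)])
conn.executemany("INSERT INTO b (id) VALUES (?)", [(n,) for n in (3, 4, 5, 6, 7)])

# Rows in A and not in B are inserted into B.
conn.execute("""
    INSERT INTO b (id)
    SELECT a.id FROM a
    WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.id = a.id)
""")

# Rows in B and not in A are deleted from B.
conn.execute("""
    DELETE FROM b
    WHERE NOT EXISTS (SELECT 1 FROM a WHERE a.id = b.id)
""")
conn.commit()

synced = [r[0] for r in conn.execute("SELECT id FROM b ORDER BY id")]
print(synced)
```

With the question's data, table `b` ends up with ids 1 to 5. This approach only applies when one connection can see both tables.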

Thank you.
