Multi step Database lookup Kettle / Pentaho

Posted 2016-09-05 13:48:09. Tags: mysql, database, pentaho, lookup, kettle

I am struggling with something in Pentaho and I am not entirely sure whether Pentaho can handle this problem. I will try to explain as best I can.

I have a column in my sales fact table called reference number, which I must use to look up an ID in the dimension table and return that ID. But if the lookup against the first dimension column returns null, I need to look up the same fact-table field against another column in the dimension, and then against a third column.

Is there a way in Pentaho to run through 3 different lookups and, if a match exists in one of those 3 columns, return the ID into the same column in the sales fact table?

I'm using MySQL as my database.

2 answers

This seems to be a fairly basic task for Pentaho Data Integration.

You could do this manually by performing three Database lookup or Dimension lookup/update steps (depending on the type of your dimension), each storing its lookup result in a different field.

Then use a Modified Java Script Value step to perform null coalescing, choosing the first non-null value. Finally, if needed, use a Select values step to remove the three lookup-result columns that are no longer needed.
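The null-coalescing part can be sketched in plain JavaScript as below. This is only an illustration, not the exact script from the transformation: the field names `id_by_col1`, `id_by_col2`, `id_by_col3` and `dim_id` are hypothetical stand-ins for the three lookup outputs and the final ID field, and the `row` object merely simulates one row after the lookups have run. Inside the actual step the incoming fields would normally be available as script variables, so only the last assignment would be needed there.

```javascript
// Returns the first argument that is neither null nor undefined,
// or null when every argument is empty.
function firstNonNull() {
  for (var i = 0; i < arguments.length; i++) {
    if (arguments[i] !== null && arguments[i] !== undefined) {
      return arguments[i];
    }
  }
  return null;
}

// Simulated row after the three lookups (hypothetical field names):
// only the second lookup found a match.
var row = {
  reference_number: "REF-42",
  id_by_col1: null,
  id_by_col2: 17,
  id_by_col3: null
};

// The coalesced ID that would be written back to the fact row.
var dim_id = firstNonNull(row.id_by_col1, row.id_by_col2, row.id_by_col3);
```

The order of the arguments sets the priority of the lookups, so the column that should win when several match goes first.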

Below is a screen with a simplified case, but I'm sure you can follow the logic behind it and implement it in your scenario, as I've mentioned the steps you could use to achieve the task:

It would be far faster to use a filter step. If the first looked-up value is null, filter on the null to split the stream: rows with nulls go on to a second lookup and rows with found data go to your "found" step. Rinse and repeat until you have what you want.

Then use a Multiway Merge Join to stitch your dataset back together. To be honest, the merge join step might not even be necessary if the resulting streams all have identical layouts, which you can achieve with some Select values steps if they don't. There is no need to look everything up at once; looking it all up once and then evaluating really does not take advantage of the parallel processing at all.
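To make the row flow concrete, here is a small plain-JavaScript simulation of the cascade, under assumptions: the `dimension` array, its columns `col_a`, `col_b`, `col_c`, and the sample reference numbers are all made up for illustration, and the functions stand in for Pentaho steps rather than call any Pentaho API. Each pass plays the role of one lookup followed by a filter on null.

```javascript
// Hypothetical dimension rows; col_a, col_b and col_c are the three
// candidate columns the reference number may match.
var dimension = [
  { id: 1, col_a: "A-100", col_b: "B-100", col_c: "C-100" },
  { id: 2, col_a: "A-200", col_b: "B-200", col_c: "C-200" }
];

// One lookup "step": returns the matching id, or null when nothing matches.
function lookup(column, ref) {
  for (var i = 0; i < dimension.length; i++) {
    if (dimension[i][column] === ref) return dimension[i].id;
  }
  return null;
}

// Cascade: after each lookup, matched rows go to "found" and only the
// rows that are still null are passed on to the next lookup.
function resolve(refs) {
  var found = [];
  var pending = refs;
  ["col_a", "col_b", "col_c"].forEach(function (column) {
    var stillNull = [];
    pending.forEach(function (ref) {
      var id = lookup(column, ref);
      if (id !== null) {
        found.push({ reference_number: ref, dim_id: id });
      } else {
        stillNull.push(ref);
      }
    });
    pending = stillNull;
  });
  // Rows that matched none of the three columns keep a null id.
  pending.forEach(function (ref) {
    found.push({ reference_number: ref, dim_id: null });
  });
  return found;
}

var result = resolve(["B-200", "A-100", "X-999", "C-100"]);
```

Because every stream ends up with the same two fields, the rows can simply be appended to one another, which is the case where the merge join is unnecessary. Note that the output order follows the order in which rows were matched, not the input order.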

Does that help?