
Pentaho PDI - XPath syntax for XML Join step

I am trying to join XML with Pentaho PDI in a transformation that uses an "Add XML" step, which adds some fields with the "Root XML Element" set to "Node" (as below), and an "XML Join" step.

I want to insert some fields with the same data into each and every "Node".

<Rootnode>
 <Node>
 <Node>
 <Node>
</Rootnode>

The problem is that, no matter what XPath expression I try, the fields I want to insert are only inserted into the first node. Expressions like "RootNode/Node" or "//Node" do not work.

This is the result I get:

<RootNode>
  <Node>
    <inserted field>
  <Node>
  <Node>
</RootNode>

This is what I want to get:

<RootNode>
  <Node>
    <inserted field>
  <Node>
    <inserted field>
  <Node>
    <inserted field>
</RootNode>

Question: can the XML Join step only join code into one explicitly specified node, or is there an XPath expression I can use in the XML Join step's "XPath Statement" input to insert the code into all nodes of my choice?

(I don't think a complex join with a comparison field is appropriate, because I don't have anything to compare with.)

Yes, you can read XML nodes in a recurring fashion. See the solution in "How to extract XML node values and from recurring nodes in pentaho?" for how to do it. You need to define the XPath properly in the Fields section. The use of "." (dot) is important here.
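To illustrate the idea outside PDI, here is a minimal Python sketch of reading recurring nodes: one path selects every repeated node, and each field is then addressed relative to the current node, where "." stands for that node itself. It assumes this is what the linked answer means by the dot. The `<id>` child elements and the function name are hypothetical and only there to have something to read.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <id> children are not in the question's source.xml.
SOURCE = """<Rootnode>
  <Node><id>1</id></Node>
  <Node><id>2</id></Node>
  <Node><id>3</id></Node>
</Rootnode>"""


def read_recurring_nodes(xml_text):
    """Return one row per <Node>, with fields read relative to that node."""
    root = ET.fromstring(xml_text)
    rows = []
    # "./Node" is evaluated relative to the root and matches every Node child.
    for node in root.findall("./Node"):
        rows.append({
            # "./id" is relative to the current Node, not to the document root.
            "id": node.findtext("./id"),
            # "." selects the current Node itself, so the whole node is kept.
            "node_xml": ET.tostring(node.find("."), encoding="unicode").strip(),
        })
    return rows


rows = read_recurring_nodes(SOURCE)
```

Each row carries both a field value and the full XML of its own node, which is what makes it possible to rebuild every node later instead of only the first one.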

Now, for your question, you need to apply the same logic as above, with a slight change. Check the image below:

[image]

What I have done here is:

  1. Firstly, get the Rootnode structure, which I will use for the join later (the first row of the .ktr in the image).

  2. Secondly, I read all the Node elements in a recurring fashion, as explained in the above link. Check the image below:

[image]

  3. Add a constant field. This is your new field node. I have used newField as the node name.

In the "Add XML" step, add this new field to the Node section.

  4. XML Join: in this step, give the joining condition as //Rootnode. This will join with the Rootnode coming in from point #1.

[image]

  5. Finally, generate the XML output (target.xml).

    [image]
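The steps above can be sketched in plain Python to show the logic of the transformation. This is not what PDI runs internally, only a rough equivalent under stated assumptions: the function name and the field value "constant" are hypothetical, and the source is reduced to three empty Node elements.

```python
import copy
import xml.etree.ElementTree as ET

# Reduced stand-in for source.xml from the question.
SOURCE = "<Rootnode><Node/><Node/><Node/></Rootnode>"


def add_field_to_every_node(xml_text, field_name="newField", field_value="constant"):
    """Rebuild every Node with an extra constant field, then join under the root."""
    source_root = ET.fromstring(xml_text)

    # Step 1: keep only the empty root structure as the join target.
    target_root = ET.Element(source_root.tag, dict(source_root.attrib))

    # Step 2: read every Node as its own row.
    for node in source_root.findall("./Node"):
        rebuilt = copy.deepcopy(node)
        rebuilt.tail = None  # drop surrounding whitespace from the source

        # Step 3: add the constant field to this Node (the "Add XML" part).
        field = ET.SubElement(rebuilt, field_name)
        field.text = field_value

        # Step 4: join the rebuilt Node under the root (the //Rootnode join).
        target_root.append(rebuilt)

    # Step 5: serialize the joined document (target.xml in the answer).
    return ET.tostring(target_root, encoding="unicode")


result = add_field_to_every_node(SOURCE)
```

The point of the design is that the new field is attached to each Node before the join, so the join itself only needs a single target (the root) rather than an XPath that matches many nodes.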

I have placed a gist of the .ktr as explained above. Please have a look.

Note: The source.xml file is the same as in the question.


Handling complex XML structure:

As highlighted in the comments below, the above approach fails when it comes to handling complex XML structures. To handle those, we need to use the 'XML Input Stream (StAX)' step, which uses the StAX parser to read complex XML structures easily.
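StAX is a Java streaming API, so the following is only a loose Python analogy of the same streaming idea, using `iterparse` from the standard library. It walks the document event by event and tracks the current path, so Node elements are found at any nesting depth. The `<Group>` wrapper, the `<id>` children, and the function name are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical nested input: Node elements sit one level deeper than in the question.
NESTED = b"""<Rootnode>
  <Group name="a"><Node><id>1</id></Node></Group>
  <Group name="b"><Node><id>2</id></Node><Node><id>3</id></Node></Group>
</Rootnode>"""


def stream_node_paths(xml_bytes):
    """Stream through the XML and return (path, id) for every Node element."""
    path = []  # stack of element names from the root to the current element
    rows = []
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))
    for event, elem in events:
        if event == "start":
            path.append(elem.tag)
        else:
            # On "end" the element is complete, so its children can be read.
            if elem.tag == "Node":
                rows.append(("/".join(path), elem.findtext("id")))
            path.pop()
    return rows


rows = stream_node_paths(NESTED)
```

Because each element is handled as its end event arrives, the nesting depth of Node does not matter, which is the property that makes a streaming parser attractive for complex structures.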

I have documented the same in this blog here, along with the gist. Please check it out! I assume this would work on your data set too.

Hope it helps :)
