XSLT: How to add an attribute to a parent node based on a specific child node's content

I'm new to XSLT. I need to aggregate some information from the contents of PDF files, converted to XML by pdf2txt.py. Some of the PDFs are large (over 100 MB), and their XML output is even larger. Hence, it seems more time-efficient to process everything in memory, piping the output through several xsltproc commands to prune unneeded content from the XML. Among other things, there is an XML node whose text content I would like to convert into an attribute of its parent node.

More specifically, I have the following input XML file structure:

<?xml version="1.0"?>
<pages>
  <page id="1">
    <text bbox="2831.881,1170.243,3124.184,1192.535">text11</text>
    <text bbox="3149.641,1291.323,3318.336,1313.615">sheet</text>
    <text bbox="3149.641,1291.323,3318.336,1313.615">P793</text>
  </page>
  <page id="2">
    <text bbox="2831.881,1170.243,3124.184,1192.535">text21</text>
    <text bbox="3149.641,1291.323,3318.336,1313.615">sheet:</text>
    <text bbox="3149.641,1291.323,3318.336,1313.615">S234</text>
  </page>
</pages>

and I would like to transform it into the following (notice the attribute added to each page element):

<?xml version="1.0"?>
<pages>
  <page id="1" sheet="P793">
    <text bbox="2831.881,1170.243,3124.184,1192.535">text11</text>
    <text bbox="3149.641,1291.323,3318.336,1313.615">sheet</text>
    <text bbox="3149.641,1291.323,3318.336,1313.615">P793</text>
  </page>
  <page id="2" sheet="S234">
    <text bbox="2831.881,1170.243,3124.184,1192.535">text21</text>
    <text bbox="3149.641,1291.323,3318.336,1313.615">sheet:</text>
    <text bbox="3149.641,1291.323,3318.336,1313.615">S234</text>
  </page>
</pages>

Following the example in "XSLT: Add Attribute to parent based on child attribute value containing a specific string", I have tried the following XSL stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="no" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:preserve-space elements="text"/>

<xsl:template match="/">
 <xsl:apply-templates/>
</xsl:template>

<xsl:template match="page">
   <xsl:apply-templates select="@*"/>
  <xsl:variable name="sheet" select="//text[contains(text(),'sheet')]/following::text[string-length()>3]"/>
  <xsl:attribute name="sheet"><xsl:copy-of select="$sheet" /></xsl:attribute>
   <xsl:apply-templates select="node()"/>
</xsl:template>

<xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>

However, I get no output. I tried replacing the variable trick with a for-each loop over the text nodes in order to define the new page attribute, but then I get an error saying that I'm trying to add an attribute after adding child nodes, which I don't quite understand.

Is it possible to "look ahead" for such a node value and use it to add an attribute to the parent node? How? Why doesn't my stylesheet give any output?

My final goal is to also remove the XML text lines corresponding to the sheet nodes and their labels, but this seems simpler to solve than the look-ahead attribute copy, and I'll deal with it later.

Thanks!

EDIT: I simplified my input case and XSL stylesheet. Actually, with the examples I provided here there is output, but it is an error:

runtime error: file test.xsl line 18 element copy
Attribute nodes must be added before any child nodes to an element.
runtime error: file test.xsl line 13 element attribute
xsl:attribute: Cannot add attributes to an element if children have been already added to the element.
no result for -

And this is an error I haven't figured out how to deal with yet. Googling didn't help.

The main problem is in the template matching page, where the first thing you do is create an attribute:

<xsl:template match="page">
    <xsl:apply-templates select="@*"/>

But you have not actually copied the page element first, so it will try to add the attributes, and the child text elements, onto the previous element that was created, namely pages. For the second page element matched it will try to do the same thing, but this fails because you cannot add attributes to an element that has already had child nodes added.

Try this template instead:

<xsl:template match="page">
    <xsl:copy>
       <xsl:apply-templates select="@*"/>
        <xsl:variable name="sheet" select="text[contains(text(),'sheet')]/following-sibling::text[string-length()>3]"/>
        <xsl:attribute name="sheet"><xsl:value-of select="$sheet" /></xsl:attribute>
        <xsl:apply-templates select="node()"/>
    </xsl:copy>
</xsl:template>
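
For reference, here is how that template might sit in a complete stylesheet. This is only a sketch that combines the corrected page template with the identity template and output settings from the question; the redundant template matching / is dropped, and nothing else is new.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="no" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:preserve-space elements="text"/>

  <!-- Identity template: copies every node and attribute unchanged. -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Override for page: create the element first, then its attributes,
       and only then its children. -->
  <xsl:template match="page">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Relative path: only looks at text children of the current page. -->
      <xsl:variable name="sheet"
          select="text[contains(text(),'sheet')]/following-sibling::text[string-length()>3]"/>
      <xsl:attribute name="sheet">
        <xsl:value-of select="$sheet"/>
      </xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Saved as test.xsl (the file name from the error output) and run with xsltproc against the sample input, this should give page elements carrying sheet="P793" and sheet="S234". In XSLT 1.0, xsl:value-of uses only the first node of the selected node-set, so if several siblings match, the first one in document order wins.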

Note the change in the expression for sheet. Previously you were starting it with //text, which selects text elements anywhere in the document, not just those in the page being processed. The // needs to be removed to make the expression relative to the current page node.

Additionally, note the use of following-sibling rather than following, so that the search is restricted to the sibling nodes under the current page element.

Finally, is it only the immediately following sibling you want to access? If so, you might need to add an extra condition to the expression:

<xsl:variable name="sheet" select="text[contains(text(),'sheet')]/following-sibling::text[1][string-length()>3]"/>

Or perhaps reverse the logic, and write it this way instead:

<xsl:variable name="sheet" select="text[string-length()>3][contains(preceding-sibling::text[1],'sheet')]"/>
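
As for the final goal mentioned in the question (also removing the sheet label and value lines), one possible approach is to add empty templates that match those text elements, so the identity template never copies them. The following is a sketch under assumptions, not a tested solution for real pdf2txt.py output: it assumes the value always immediately follows a label containing 'sheet', and it wraps the attribute in an xsl:if (my addition) so that pages without a sheet value get no empty attribute.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity template: copies everything not handled below. -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="page">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Reversed logic: a text longer than 3 characters whose
           immediately preceding sibling contains 'sheet'. -->
      <xsl:variable name="sheet"
          select="text[string-length()>3][contains(preceding-sibling::text[1],'sheet')]"/>
      <!-- Assumption: skip the attribute when the page has no sheet value. -->
      <xsl:if test="$sheet">
        <xsl:attribute name="sheet">
          <xsl:value-of select="$sheet"/>
        </xsl:attribute>
      </xsl:if>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: drops the label that announces the sheet value. -->
  <xsl:template
      match="page/text[contains(.,'sheet')][following-sibling::text[1][string-length()>3]]"/>

  <!-- Empty template: drops the sheet value itself. -->
  <xsl:template
      match="page/text[string-length()>3][contains(preceding-sibling::text[1],'sheet')]"/>
</xsl:stylesheet>
```

The two empty templates take precedence over the identity template because patterns with predicates have a higher default priority than node(). If ordinary text in your documents can also contain the word 'sheet' directly before a longer string, the predicates would need to be tightened.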
