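XSL Merging Multiple XML Records in the Same File
I have an XML file that contains multiple records. Each record has a key. I want to select all records by key and collapse the records that share a key into a single XML record. Some of the data in each XML record is duplicated, and there are empty elements. I also want to remove the duplicates and the empty tags.
Input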
<Data>
<Record>
<Key>12345</Key>
<Number>09095I</Number>
<Text_Field_1>Record 1: This is Text Field 1</Text_Field_1>
<Text_Field_2>This is Text Field 2</Text_Field_2>
<Author>A1</Author>
<Author>A2</Author>
<Author></Author>
<Author>A1</Author>
<Author>A2</Author>
<Author>A3</Author>
<Author></Author>
<Author>A1</Author>
<Date>10/12/2019</Date>
<Summary>Record 1: Summary 1 Text</Summary>
</Record>
<Record>
<Key>12345</Key>
<Number>09095I</Number>
<Text_Field_1>Record 2: This is Text Field 1</Text_Field_1>
<Text_Field_2>This is Text_Field_2</Text_Field_2>
<Author>A2</Author>
<Author></Author>
<Author>A1</Author>
<Author>A3</Author>
<Author></Author>
<Author>B2</Author>
<Author></Author>
<Author>B2</Author>
<Date>10/12/2019</Date>
<Summary>Record 2: Summary 1 Text</Summary>
</Record>
<Record>
<Key>23456</Key>
<Number>43095I</Number>
<Text_Field_1>Record 1: This is Text Field 1</Text_Field_1>
<Text_Field_2>This is Text_Field_2</Text_Field_2>
<Author>AA2</Author>
<Author></Author>
<Author>AA1</Author>
<Author>AA3</Author>
<Author></Author>
<Author>AA3</Author>
<Author>BB2</Author>
<Author></Author>
<Author>AA3</Author>
<Date>01/12/2020</Date>
<Summary>Record 1: Summary 1 Text</Summary>
</Record>
<Record>
<Key>23456</Key>
<Number>43095I</Number>
<Text_Field_1>Record 2: This is Text Field 1</Text_Field_1>
<Text_Field_2>This is Text_Field_2</Text_Field_2>
<Author>AA1</Author>
<Author>AA3</Author>
<Author></Author>
<Author>CC2</Author>
<Author></Author>
<Author>AA1</Author>
<Author>CC2</Author>
<Date>01/12/2020</Date>
<Summary>Record 2: Summary 1 Text</Summary>
</Record>
<Record>
<Key>23456</Key>
<Number>43095I</Number>
<Text_Field_1>Record 3: This is Text Field 1</Text_Field_1>
<Text_Field_2>This is Text_Field_2</Text_Field_2>
<Author>AA1</Author>
<Author>AA3</Author>
<Author></Author>
<Author>CC2</Author>
<Author></Author>
<Author>AA1</Author>
<Author>CC3</Author>
<Date>01/12/2020</Date>
<Summary>Record 3: Summary 1 Text</Summary>
</Record>
<Record>
<Key>778899</Key>
<Number>998822I</Number>
<Text_Field_1>Record 1: This is Text_Field_1</Text_Field_1>
<Text_Field_2>This is Text_Field_2</Text_Field_2>
<Author>A2</Author>
<Author></Author>
<Author>D1</Author>
<Author>D2</Author>
<Author></Author>
<Author>D3</Author>
<Author>D33</Author>
<Author></Author>
<Author>D33</Author>
<Date>10/12/2019</Date>
<Summary>Record 1: Summary 1 Text</Summary>
</Record>
</Data>
Desired output
<Data>
<Record>
<Key>12345</Key>
<Number>09095I</Number>
<Text_Field_1>Record 1: This is Text Field 1</Text_Field_1>
<Text_Field_1>Record 2: This is Text Field 1</Text_Field_1>
<Text_Field_2>This is Text Field 2</Text_Field_2>
<Author>A1</Author>
<Author>A2</Author>
<Author>A3</Author>
<Author>B2</Author>
<Date>10/12/2019</Date>
<Summary>Record 1: Summary 1 Text</Summary>
<Summary>Record 2: Summary 1 Text</Summary>
</Record>
<Record>
<Key>23456</Key>
<Number>43095I</Number>
<Text_Field_1>Record 1: This is Text Field 1</Text_Field_1>
<Text_Field_1>Record 2: This is Text Field 1</Text_Field_1>
<Text_Field_1>Record 3: This is Text Field 1</Text_Field_1>
<Text_Field_2>This is Text_Field_2</Text_Field_2>
<Author>AA1</Author>
<Author>AA2</Author>
<Author>AA3</Author>
<Author>BB2</Author>
<Author>CC2</Author>
<Author>CC3</Author>
<Date>01/12/2020</Date>
<Summary>Record 1: Summary 1 Text</Summary>
<Summary>Record 2: Summary 1 Text</Summary>
<Summary>Record 3: Summary 1 Text</Summary>
</Record>
<Record>
<Key>778899</Key>
<Number>998822I</Number>
<Text_Field_1>Record 1: This is Text_Field_1</Text_Field_1>
<Text_Field_2>This is Text_Field_2</Text_Field_2>
<Author>A2</Author>
<Author>D1</Author>
<Author>D2</Author>
<Author>D3</Author>
<Author>D33</Author>
<Date>10/12/2019</Date>
<Summary>Record 1: Summary 1 Text</Summary>
</Record>
</Data>
I have been using this code, but I am not sure it is the right path.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*" />
  <xsl:key name="key" match="Record" use="Key"/>
  <xsl:key name="kNamedSiblings" match="*"
           use="concat(generate-id(..), '+', name())"/>
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="key('kNamedSiblings',
                                       concat(generate-id(..), '+', name())
                                      )/node()" />
    </xsl:copy>
  </xsl:template>
  <xsl:template match="*[not(*) and . = '']" />
  <xsl:template match="*[generate-id() !=
                         generate-id(key('kNamedSiblings',
                                         concat(generate-id(..), '+', name()))[1]
                                    )]" />
</xsl:stylesheet>
Current output
<?xml version="1.0"?>
<Data>
<Record>
<Key>12345</Key>
<Number>09095I</Number>
<Text_Field_1>Record 1: This is Text Field 1</Text_Field_1>
<Text_Field_2>This is Text Field 2</Text_Field_2>
<Author>A1A2A1A2A3A1</Author>
<Date>10/12/2019</Date>
<Summary>Record 1: Summary 1 Text</Summary>
<Key>12345</Key>
<Number>09095I</Number>
<Text_Field_1>Record 2: This is Text Field 1</Text_Field_1>
<Text_Field_2>This is Text_Field_2</Text_Field_2>
<Author>A2A1A3B2B2</Author>
<Date>10/12/2019</Date>
<Summary>Record 2: Summary 1 Text</Summary>
<Key>23456</Key>
<Number>43095I</Number>
<Text_Field_1>Record 1: This is Text Field 1</Text_Field_1>
<Text_Field_2>This is Text_Field_2</Text_Field_2>
<Author>AA2AA1AA3AA3BB2AA3</Author>
<Date>01/12/2020</Date>
<Field_Text_1>This is the Text 1</Field_Text_1>
<Key>23456</Key>
<Number>43095I</Number>
<Text_Field_1>Record 2: This is Text Field 1</Text_Field_1>
<Text_Field_2>This is Text_Field_2</Text_Field_2>
<Author>AA1AA3CC2AA1CC2</Author>
<Date>01/12/2020</Date>
<Field_Text_1>This is the Text 1</Field_Text_1>
<Key>23456</Key>
<Number>43095I</Number>
<Text_Field_1>Record 3: This is Text Field 1</Text_Field_1>
<Text_Field_2>This is Text_Field_2</Text_Field_2>
<Author>AA1AA3CC2AA1CC3</Author>
<Date>01/12/2020</Date>
<Field_Text_1>This is the Text 1</Field_Text_1>
<Key>778899</Key>
<Number>998822I</Number>
<Text_Field_1>Record 1: This is Text_Field_1</Text_Field_1>
<Text_Field_2>This is Text_Field_2</Text_Field_2>
<Author>A2A3A3A3</Author>
<Date>10/12/2019</Date>
<Field_Text_1>This is the Text 1</Field_Text_1>
</Record>
</Data>
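My current code creates one large record instead of three separate records. In addition, the Author elements are not preserved; instead, a single element is created and the values are run together. I know this is a staged solution that involves:
- merging multiple records that share a key into one
- removing empty tags
- removing duplicate tags that have the same value
- maintaining the original XML structure
An explanation of the solution would also be a great help.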
Because your stylesheet indicates that you can use XSLT 2.0, you can simplify your approach from the complicated xsl:key one to a more straightforward xsl:for-each-group one:
<xsl:template match="Data">
  <xsl:copy>
    <xsl:for-each-group select="Record" group-by="Key">
      <xsl:copy>
        <xsl:for-each-group select="current-group()/*[normalize-space()]" group-by="concat(name(),.)">
          <xsl:sort select="name()" order="ascending" />
          <xsl:copy-of select="current-group()[1]" />
        </xsl:for-each-group>
      </xsl:copy>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:template>
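This template groups the Record elements by their Key element. It then groups the non-empty children of each resulting group by a string made up of the element's name and its content; the [normalize-space()] predicate is what drops the empty elements. The inner groups are sorted alphabetically by element name, so that elements with the same name end up together.
Finally, only the first element of each inner group is output, which discards the duplicates.
The output is: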
<?xml version="1.0" encoding="UTF-8"?>
<Data>
<Record>
<Author>A1</Author>
<Author>A2</Author>
<Author>A3</Author>
<Author>B2</Author>
<Date>10/12/2019</Date>
<Key>12345</Key>
<Number>09095I</Number>
<Summary>Record 1: Summary 1 Text</Summary>
<Summary>Record 2: Summary 1 Text</Summary>
<Text_Field_1>Record 1: This is Text Field 1</Text_Field_1>
<Text_Field_1>Record 2: This is Text Field 1</Text_Field_1>
<Text_Field_2>This is Text Field 2</Text_Field_2>
<Text_Field_2>This is Text_Field_2</Text_Field_2>
</Record>
<Record>
<Author>AA2</Author>
<Author>AA1</Author>
<Author>AA3</Author>
<Author>BB2</Author>
<Author>CC2</Author>
<Author>CC3</Author>
<Date>01/12/2020</Date>
<Key>23456</Key>
<Number>43095I</Number>
<Summary>Record 1: Summary 1 Text</Summary>
<Summary>Record 2: Summary 1 Text</Summary>
<Summary>Record 3: Summary 1 Text</Summary>
<Text_Field_1>Record 1: This is Text Field 1</Text_Field_1>
<Text_Field_1>Record 2: This is Text Field 1</Text_Field_1>
<Text_Field_1>Record 3: This is Text Field 1</Text_Field_1>
<Text_Field_2>This is Text_Field_2</Text_Field_2>
</Record>
<Record>
<Author>A2</Author>
<Author>D1</Author>
<Author>D2</Author>
<Author>D3</Author>
<Author>D33</Author>
<Date>10/12/2019</Date>
<Key>778899</Key>
<Number>998822I</Number>
<Summary>Record 1: Summary 1 Text</Summary>
<Text_Field_1>Record 1: This is Text_Field_1</Text_Field_1>
<Text_Field_2>This is Text_Field_2</Text_Field_2>
</Record>
</Data>
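Note that the output above orders the children of each Record alphabetically by element name, whereas the desired output in the question keeps the original element order (Key, Number, Text_Field_1, and so on). One possible variation, offered only as an untested sketch, is to group in two steps: first by name(), then by value. By default xsl:for-each-group processes groups in order of first appearance, so the outer grouping should keep the element names in document order, while the xsl:sort on the inner grouping orders the distinct values of each element (for example the Author values). The two-level nesting and the sort on the value are assumptions about what is wanted; they are not part of the original answer.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="Data">
    <xsl:copy>
      <!-- One output Record per distinct Key -->
      <xsl:for-each-group select="Record" group-by="Key">
        <xsl:copy>
          <!-- Outer grouping: non-empty children by element name.
               No xsl:sort here, so groups keep first-appearance order. -->
          <xsl:for-each-group select="current-group()/*[normalize-space()]"
                              group-by="name()">
            <!-- Inner grouping: distinct values for this element name
                 (assumption: values should be sorted as strings) -->
            <xsl:for-each-group select="current-group()" group-by="string(.)">
              <xsl:sort select="."/>
              <!-- Emit one representative per distinct value -->
              <xsl:copy-of select="current-group()[1]"/>
            </xsl:for-each-group>
          </xsl:for-each-group>
        </xsl:copy>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Grouping by name() and by value separately also avoids relying on concat(name(),.) to produce a unique string for every name/value pair. In the first record of the sample input the two Text_Field_2 values differ ("Text Field 2" versus "Text_Field_2"), so both would still be emitted, just as in the output above.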
In addition to zx485's fine XSLT 2.0 answer, here is an XSLT 1.0 stylesheet that groups on two keys:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*" />
  <xsl:key name="Record-by-Key" match="Record" use="Key"/>
  <xsl:key name="Record-by-Key-child-by-name-value" match="Record/*"
           use="concat(../Key,'+',name(),'+',.)"/>
  <xsl:template match="Data">
    <Data>
      <xsl:for-each
          select="*[generate-id()=generate-id(key('Record-by-Key',Key)[1])]">
        <Record>
          <xsl:for-each
              select="key('Record-by-Key',Key)
                      /*[generate-id()
                         =generate-id(
                            key('Record-by-Key-child-by-name-value',
                                concat(../Key,'+',name(),'+',.))[1])]">
            <xsl:sort select="name()"/>
            <xsl:copy-of select="self::*[node()]"/>
          </xsl:for-each>
        </Record>
      </xsl:for-each>
    </Data>
  </xsl:template>
</xsl:stylesheet>
Output:
<Data>
<Record>
<Author>A1</Author>
<Author>A2</Author>
<Author>A3</Author>
<Author>B2</Author>
<Date>10/12/2019</Date>
<Key>12345</Key>
<Number>09095I</Number>
<Summary>Record 1: Summary 1 Text</Summary>
<Summary>Record 2: Summary 1 Text</Summary>
<Text_Field_1>Record 1: This is Text Field 1</Text_Field_1>
<Text_Field_1>Record 2: This is Text Field 1</Text_Field_1>
<Text_Field_2>This is Text Field 2</Text_Field_2>
<Text_Field_2>This is Text_Field_2</Text_Field_2>
</Record>
<Record>
<Author>AA2</Author>
<Author>AA1</Author>
<Author>AA3</Author>
<Author>BB2</Author>
<Author>CC2</Author>
<Author>CC3</Author>
<Date>01/12/2020</Date>
<Key>23456</Key>
<Number>43095I</Number>
<Summary>Record 1: Summary 1 Text</Summary>
<Summary>Record 2: Summary 1 Text</Summary>
<Summary>Record 3: Summary 1 Text</Summary>
<Text_Field_1>Record 1: This is Text Field 1</Text_Field_1>
<Text_Field_1>Record 2: This is Text Field 1</Text_Field_1>
<Text_Field_1>Record 3: This is Text Field 1</Text_Field_1>
<Text_Field_2>This is Text_Field_2</Text_Field_2>
</Record>
<Record>
<Author>A2</Author>
<Author>D1</Author>
<Author>D2</Author>
<Author>D3</Author>
<Author>D33</Author>
<Date>10/12/2019</Date>
<Key>778899</Key>
<Number>998822I</Number>
<Summary>Record 1: Summary 1 Text</Summary>
<Text_Field_1>Record 1: This is Text_Field_1</Text_Field_1>
<Text_Field_2>This is Text_Field_2</Text_Field_2>
</Record>
</Data>
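Addendum: the order of the child elements can also be enforced by name...
Since we already have XSLT 1 and XSLT 2 solutions, here, for completeness, is an XSLT 3 solution using xsl:merge: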
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all"
    version="3.0">
  <xsl:output indent="yes"/>
  <xsl:mode on-no-match="shallow-copy"/>
  <xsl:template match="Data">
    <xsl:copy>
      <xsl:merge>
        <xsl:merge-source select="Record">
          <xsl:merge-key select="Key"/>
        </xsl:merge-source>
        <xsl:merge-action>
          <xsl:copy>
            <xsl:merge>
              <xsl:merge-source select="current-merge-group()/*[normalize-space()]" sort-before-merge="yes">
                <xsl:merge-key select="name()"/>
                <xsl:merge-key select="."/>
              </xsl:merge-source>
              <xsl:merge-action>
                <xsl:copy-of select="."/>
              </xsl:merge-action>
            </xsl:merge>
          </xsl:copy>
        </xsl:merge-action>
      </xsl:merge>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>