简体   繁体   English

XSL 合并同一文件中的多个 XML 记录

[英]XSL Merging Multiple XML Records in the Same File

I have a single XML file containing multiple records.我有一个包含多个记录一个XML文件。 Each record has a key.每条记录都有一个键。 I'd like to select all of the records by key and collapse each into one XML record.我想通过键选择所有记录并将每个记录折叠成一个 XML 记录。 Some of the data in each XML record is repeated and there are empty elements.每个 XML 记录中的一些数据是重复的,并且存在空元素。 I'd also like to remove duplicates and empty tags.我还想删除重复项和空标签。

Input输入

<Data>
    <Record>
        <Key>12345</Key>
        <Number>09095I</Number>
        <Text_Field_1>Record 1: This is Text Field 1</Text_Field_1>
        <Text_Field_2>This is Text Field 2</Text_Field_2>
        <Author>A1</Author>
        <Author>A2</Author>
        <Author></Author>
        <Author>A1</Author>
        <Author>A2</Author>
        <Author>A3</Author>
        <Author></Author>
        <Author>A1</Author>
        <Date>10/12/2019</Date>
        <Summary>Record 1: Summary 1 Text</Summary>
    </Record>
    <Record>
        <Key>12345</Key>
        <Number>09095I</Number>
        <Text_Field_1>Record 2: This is Text Field 1</Text_Field_1>
        <Text_Field_2>This is Text_Field_2</Text_Field_2>
        <Author>A2</Author>
        <Author></Author>
        <Author>A1</Author>
        <Author>A3</Author>
        <Author></Author>
        <Author>B2</Author>
        <Author></Author>
        <Author>B2</Author>
        <Date>10/12/2019</Date>
        <Summary>Record 2: Summary 1 Text</Summary>
    </Record>
    <Record>
        <Key>23456</Key>
        <Number>43095I</Number>
        <Text_Field_1>Record 1: This is Text Field 1</Text_Field_1>
        <Text_Field_2>This is Text_Field_2</Text_Field_2>
        <Author>AA2</Author>
        <Author></Author>
        <Author>AA1</Author>
        <Author>AA3</Author>
        <Author></Author>
        <Author>AA3</Author>
        <Author>BB2</Author>
        <Author></Author>
        <Author>AA3</Author>
        <Date>01/12/2020</Date>
        <Summary>Record 1: Summary 1 Text</Summary>
    </Record>
    <Record>
        <Key>23456</Key>
        <Number>43095I</Number>
        <Text_Field_1>Record 2: This is Text Field 1</Text_Field_1>
        <Text_Field_2>This is Text_Field_2</Text_Field_2>
        <Author>AA1</Author>
        <Author>AA3</Author>
        <Author></Author>
        <Author>CC2</Author>
        <Author></Author>
        <Author>AA1</Author>
        <Author>CC2</Author>
        <Date>01/12/2020</Date>
        <Summary>Record 2: Summary 1 Text</Summary>
    </Record>
    <Record>
        <Key>23456</Key>
        <Number>43095I</Number>
        <Text_Field_1>Record 3: This is Text Field 1</Text_Field_1>
        <Text_Field_2>This is Text_Field_2</Text_Field_2>
        <Author>AA1</Author>
        <Author>AA3</Author>
        <Author></Author>
        <Author>CC2</Author>
        <Author></Author>
        <Author>AA1</Author>
        <Author>CC3</Author>
        <Date>01/12/2020</Date>
        <Summary>Record 3: Summary 1 Text</Summary>
    </Record>
    <Record>
        <Key>778899</Key>
        <Number>998822I</Number>
        <Text_Field_1>Record 1: This is Text_Field_1</Text_Field_1>
        <Text_Field_2>This is Text_Field_2</Text_Field_2>
        <Author>A2</Author>
        <Author></Author>
        <Author>D1</Author>
        <Author>D2</Author>
        <Author></Author>
        <Author>D3</Author>
        <Author>D33</Author>
        <Author></Author>
        <Author>D33</Author>
        <Date>10/12/2019</Date>
        <Summary>Record 1: Summary 1 Text</Summary>
    </Record>
</Data>

Desired Output期望输出

<Data>
    <Record>
        <Key>12345</Key>
        <Number>09095I</Number>
        <Text_Field_1>Record 1: This is Text Field 1</Text_Field_1>
        <Text_Field_1>Record 2: This is Text Field 1</Text_Field_1>
        <Text_Field_2>This is Text Field 2</Text_Field_2>
        <Author>A1</Author>
        <Author>A2</Author>
        <Author>A3</Author>
        <Author>B2</Author>
        <Date>10/12/2019</Date>
        <Summary>Record 1: Summary 1 Text</Summary>
        <Summary>Record 2: Summary 1 Text</Summary>
    </Record>

    <Record>
        <Key>23456</Key>
        <Number>43095I</Number>
        <Text_Field_1>Record 1: This is Text Field 1</Text_Field_1>
        <Text_Field_1>Record 2: This is Text Field 1</Text_Field_1>
        <Text_Field_1>Record 3: This is Text Field 1</Text_Field_1>
        <Text_Field_2>This is Text_Field_2</Text_Field_2>
        <Author>AA1</Author>
        <Author>AA2</Author>
        <Author>AA3</Author>
        <Author>BB2</Author>
        <Author>CC2</Author>
        <Author>CC3</Author>
        <Date>01/12/2020</Date>
        <Summary>Record 1: Summary 1 Text</Summary>
        <Summary>Record 2: Summary 1 Text</Summary>
        <Summary>Record 3: Summary 1 Text</Summary>
    </Record>
    <Record>
        <Key>778899</Key>
        <Number>998822I</Number>
        <Text_Field_1>Record 1: This is Text_Field_1</Text_Field_1>
        <Text_Field_2>This is Text_Field_2</Text_Field_2>
        <Author>A2</Author>
        <Author>D1</Author>
        <Author>D2</Author>
        <Author>D3</Author>
        <Author>D33</Author>
        <Date>10/12/2019</Date>
        <Summary>Record 1: Summary 1 Text</Summary>
    </Record>
</Data>

I've used this code, but I'm not sure this it is the correct path.我已经使用过这段代码,但我不确定这是正确的路径。

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions">
    <xsl:output method="xml" indent="yes"/>
    <xsl:strip-space elements="*" />

    <xsl:key name="key" match="Record" use="Key"/>
    <xsl:key name="kNamedSiblings" match="*" 
           use="concat(generate-id(..), '+', name())"/>

    <xsl:template match="*">
      <xsl:copy>
        <xsl:apply-templates select="key('kNamedSiblings', 
                                         concat(generate-id(..), '+', name())
                                        )/node()" />
        </xsl:copy>
    </xsl:template>
    <xsl:template match="*[not(*) and . = '']" />
    <xsl:template match="*[generate-id() != 
                         generate-id(key('kNamedSiblings', 
                                         concat(generate-id(..), '+', name()))[1]
                                    )]" />
</xsl:stylesheet>

Current output电流输出

<?xml version="1.0"?>
<Data>
  <Record>

    <Key>12345</Key>
    <Number>09095I</Number>
    <Text_Field_1>Record 1: This is Text Field 1</Text_Field_1>
    <Text_Field_2>This is Text Field 2</Text_Field_2>
    <Author>A1A2A1A2A3A1</Author>
    <Date>10/12/2019</Date>
    <Summary>Record 1: Summary 1 Text</Summary>

    <Key>12345</Key>
    <Number>09095I</Number>
    <Text_Field_1>Record 2: This is Text Field 1</Text_Field_1>
    <Text_Field_2>This is Text_Field_2</Text_Field_2>
    <Author>A2A1A3B2B2</Author>
    <Date>10/12/2019</Date>
    <Summary>Record 2: Summary 1 Text</Summary>

    <Key>23456</Key>
    <Number>43095I</Number>
    <Text_Field_1>Record 1: This is Text Field 1</Text_Field_1>
    <Text_Field_2>This is Text_Field_2</Text_Field_2>
    <Author>AA2AA1AA3AA3BB2AA3</Author>
    <Date>01/12/2020</Date>
    <Field_Text_1>This is the Text 1</Field_Text_1>

    <Key>23456</Key>
    <Number>43095I</Number>
    <Text_Field_1>Record 2: This is Text Field 1</Text_Field_1>
    <Text_Field_2>This is Text_Field_2</Text_Field_2>
    <Author>AA1AA3CC2AA1CC2</Author>
    <Date>01/12/2020</Date>
    <Field_Text_1>This is the Text 1</Field_Text_1>

    <Key>23456</Key>
    <Number>43095I</Number>
    <Text_Field_1>Record 3: This is Text Field 1</Text_Field_1>
    <Text_Field_2>This is Text_Field_2</Text_Field_2>
    <Author>AA1AA3CC2AA1CC3</Author>
    <Date>01/12/2020</Date>
    <Field_Text_1>This is the Text 1</Field_Text_1>

    <Key>778899</Key>
    <Number>998822I</Number>
    <Text_Field_1>Record 1: This is Text_Field_1</Text_Field_1>
    <Text_Field_2>This is Text_Field_2</Text_Field_2>
    <Author>A2A3A3A3</Author>
    <Date>10/12/2019</Date>
    <Field_Text_1>This is the Text 1</Field_Text_1>
  </Record>
</Data>

My current code creates one large record, not three separate ones.我当前的代码创建了一个大记录,而不是三个单独的记录。 In addition, the Author elements aren't maintained.此外,不维护 Author 元素。 Instead, one element is created and the values are lumped together.相反,创建一个元素并将这些值集中在一起。 I understand this is a phased solution involving: - The merging of multiple records into one per key - The removal of empty tags - The removal of duplicate tags with the same value - Maintaining the original XML structure我知道这是一个分阶段的解决方案,涉及: - 将多个记录合并为一个键 - 删除空标签 - 删除具有相同值的重复标签 - 维护原始 XML 结构

Understanding the solution would also be a big help.了解解决方案也会有很大帮助。

Because your stylesheet indicates that you are able to use XSLT-2.0, you can simplify your approach from using the complicated xsl:key one to a more straightforward xsl:for-each-group one:因为您的样式表表明您可以使用 XSLT-2.0,所以您可以将您的方法从使用复杂的xsl:key one 简化为更直接的xsl:for-each-group one:

<xsl:template match="Data">
  <xsl:copy>
    <xsl:for-each-group select="Record" group-by="Key">
      <xsl:copy>
        <xsl:for-each-group select="current-group()/*[normalize-space()]" group-by="concat(name(),.)">
          <xsl:sort select="name()" order="ascending" />
          <xsl:copy-of select="current-group()[1]" />
        </xsl:for-each-group>
      </xsl:copy>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:template>

This template groups the Record elements by Key and then groups its result by a string consisting of the element name and its content.这个模板Record由元素Key由包括该元素的名称和内容的字符串,然后组它的结果。 The result of this is sorted alphabetically to group the elements with the same name.其结果按字母顺序排序以将具有相同名称的元素分组。
Then, the first (and so unique) element is output.然后,输出第一个(也是唯一的)元素。

The output is:输出是:

<?xml version="1.0" encoding="UTF-8"?>
<Data>
   <Record>
      <Author>A1</Author>
      <Author>A2</Author>
      <Author>A3</Author>
      <Author>B2</Author>
      <Date>10/12/2019</Date>
      <Key>12345</Key>
      <Number>09095I</Number>
      <Summary>Record 1: Summary 1 Text</Summary>
      <Summary>Record 2: Summary 1 Text</Summary>
      <Text_Field_1>Record 1: This is Text Field 1</Text_Field_1>
      <Text_Field_1>Record 2: This is Text Field 1</Text_Field_1>
      <Text_Field_2>This is Text Field 2</Text_Field_2>
      <Text_Field_2>This is Text_Field_2</Text_Field_2>
   </Record>
   <Record>
      <Author>AA2</Author>
      <Author>AA1</Author>
      <Author>AA3</Author>
      <Author>BB2</Author>
      <Author>CC2</Author>
      <Author>CC3</Author>
      <Date>01/12/2020</Date>
      <Key>23456</Key>
      <Number>43095I</Number>
      <Summary>Record 1: Summary 1 Text</Summary>
      <Summary>Record 2: Summary 1 Text</Summary>
      <Summary>Record 3: Summary 1 Text</Summary>
      <Text_Field_1>Record 1: This is Text Field 1</Text_Field_1>
      <Text_Field_1>Record 2: This is Text Field 1</Text_Field_1>
      <Text_Field_1>Record 3: This is Text Field 1</Text_Field_1>
      <Text_Field_2>This is Text_Field_2</Text_Field_2>
   </Record>
   <Record>
      <Author>A2</Author>
      <Author>D1</Author>
      <Author>D2</Author>
      <Author>D3</Author>
      <Author>D33</Author>
      <Date>10/12/2019</Date>
      <Key>778899</Key>
      <Number>998822I</Number>
      <Summary>Record 1: Summary 1 Text</Summary>
      <Text_Field_1>Record 1: This is Text_Field_1</Text_Field_1>
      <Text_Field_2>This is Text_Field_2</Text_Field_2>
   </Record>
</Data>

Besides zx485's good XSLT 2.0 answer , here is an XSLT 1.0 stylesheet with a double key grouping:除了zx485 很好的 XSLT 2.0 答案之外,这里还有一个带有双键分组的 XSLT 1.0 样式表:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>
    <xsl:strip-space elements="*" />
    <xsl:key name="Record-by-Key" match="Record" use="Key"/>
    <xsl:key name="Record-by-Key-child-by-name-value" match="Record/*" 
             use="concat(../Key,'+',name(),'+',.)"/>
    <xsl:template match="Data">
      <Data>
        <xsl:for-each 
             select="*[generate-id()=generate-id(key('Record-by-Key',Key)[1])]">
            <Record>
                <xsl:for-each  
                    select="key('Record-by-Key',Key)
                            /*[generate-id()
                                =generate-id(
                                    key('Record-by-Key-child-by-name-value',
                                        concat(../Key,'+',name(),'+',.))[1])]">
                    <xsl:sort select="name()"/>
                    <xsl:copy-of select="self::*[node()]"/>                
                </xsl:for-each>
            </Record>
        </xsl:for-each>  
      </Data>
    </xsl:template>
</xsl:stylesheet>

Output:输出:

<Data>
   <Record>
      <Author>A1</Author>
      <Author>A2</Author>
      <Author>A3</Author>
      <Author>B2</Author>
      <Date>10/12/2019</Date>
      <Key>12345</Key>
      <Number>09095I</Number>
      <Summary>Record 1: Summary 1 Text</Summary>
      <Summary>Record 2: Summary 1 Text</Summary>
      <Text_Field_1>Record 1: This is Text Field 1</Text_Field_1>
      <Text_Field_1>Record 2: This is Text Field 1</Text_Field_1>
      <Text_Field_2>This is Text Field 2</Text_Field_2>
      <Text_Field_2>This is Text_Field_2</Text_Field_2>
   </Record>
   <Record>
      <Author>AA2</Author>
      <Author>AA1</Author>
      <Author>AA3</Author>
      <Author>BB2</Author>
      <Author>CC2</Author>
      <Author>CC3</Author>
      <Date>01/12/2020</Date>
      <Key>23456</Key>
      <Number>43095I</Number>
      <Summary>Record 1: Summary 1 Text</Summary>
      <Summary>Record 2: Summary 1 Text</Summary>
      <Summary>Record 3: Summary 1 Text</Summary>
      <Text_Field_1>Record 1: This is Text Field 1</Text_Field_1>
      <Text_Field_1>Record 2: This is Text Field 1</Text_Field_1>
      <Text_Field_1>Record 3: This is Text Field 1</Text_Field_1>
      <Text_Field_2>This is Text_Field_2</Text_Field_2>
   </Record>
   <Record>
      <Author>A2</Author>
      <Author>D1</Author>
      <Author>D2</Author>
      <Author>D3</Author>
      <Author>D33</Author>
      <Date>10/12/2019</Date>
      <Key>778899</Key>
      <Number>998822I</Number>
      <Summary>Record 1: Summary 1 Text</Summary>
      <Text_Field_1>Record 1: This is Text_Field_1</Text_Field_1>
      <Text_Field_2>This is Text_Field_2</Text_Field_2>
   </Record>
</Data>

Addendum: it's also possible to enforce children order by name...附录:也可以按名称强制执行儿童顺序...

As we already have XSLT 1 and XSLT 2 solutions, for completeness here an XSLT 3 one using xsl:merge :由于我们已经有了 XSLT 1 和 XSLT 2 解决方案,为了完整起见,这里使用xsl:merge的 XSLT 3 解决方案:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all"
    version="3.0">

  <xsl:output indent="yes"/>

  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:template match="Data">
      <xsl:copy>
          <xsl:merge>
              <xsl:merge-source select="Record">
                  <xsl:merge-key select="Key"/>
              </xsl:merge-source>
              <xsl:merge-action>
                  <xsl:copy>
                      <xsl:merge>
                          <xsl:merge-source select="current-merge-group()/*[normalize-space()]" sort-before-merge="yes">
                              <xsl:merge-key select="name()"/>
                              <xsl:merge-key select="."/>
                          </xsl:merge-source>
                          <xsl:merge-action>
                              <xsl:copy-of select="."/>
                          </xsl:merge-action>
                      </xsl:merge>
                  </xsl:copy>
              </xsl:merge-action>
          </xsl:merge>
      </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

https://xsltfiddle.liberty-development.net/gWEaSv5 https://xsltfiddle.liberty-development.net/gWEaSv5

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM