Merge entries in XML

I have an XML file containing products, and I need to merge them into a single entry:

<SHOPITEM>
        <PRODUCT>POINT</PRODUCT>
        <FRAMESIZE>MD</FRAMESIZE>
        <CODE>029,00</CODE>
        <COLOR>black / yellow</COLOR>
</SHOPITEM>
<SHOPITEM>
        <PRODUCT>POINT</PRODUCT>
        <FRAMESIZE>LD</FRAMESIZE>
        <CODE>029,01</CODE>
        <COLOR>black / yellow</COLOR>
</SHOPITEM>
<SHOPITEM>
        <PRODUCT>POINT</PRODUCT>
        <FRAMESIZE>LD</FRAMESIZE>
        <CODE>029,03</CODE>
        <COLOR>green / white</COLOR>
</SHOPITEM>
<SHOPITEM>
        <PRODUCT>POINT</PRODUCT>
        <FRAMESIZE>MD</FRAMESIZE>
        <CODE>029,04</CODE>
        <COLOR>green / white</COLOR>
</SHOPITEM>

The <PRODUCT> is the same; what changes is the <FRAMESIZE>, <CODE> and <COLOR>.

Is there any way to get usable data out of this? Doing it in PHP would be best, but generating a new XML file that I can then process in PHP would also be fine:

<SHOPITEM>
        <PRODUCT>POINT</PRODUCT>
        <FRAMESIZE1>MD</FRAMESIZE1>
        <CODE1>029,00</CODE1>
        <COLOR1>black / yellow</COLOR1>
        <FRAMESIZE2>LD</FRAMESIZE2>
        <CODE2>029,01</CODE2>
        <COLOR2>black / yellow</COLOR2>
        <FRAMESIZE3>LD</FRAMESIZE3>
        <CODE3>029,03</CODE3>
        <COLOR3>green / white</COLOR3>
        <FRAMESIZE4>MD</FRAMESIZE4>
        <CODE4>029,04</CODE4>
        <COLOR4>green / white</COLOR4>
</SHOPITEM>

My XSLT-fu is weak, but this produces your desired output (after wrapping your sample XML with a root tag):

xmlstarlet sel -t -v '//SHOPITEM[1]/PRODUCT' -n -m '//SHOPITEM' -v FRAMESIZE -n -v CODE -n -v COLOR -n file.xml | 
awk '
  BEGIN {print "<SHOPITEM>"} 
  END   {print "</SHOPITEM>"}
  NR==1 {print "  <PRODUCT>" $0 "</PRODUCT>"; next} 
  {
    n++;     t="FRAMESIZE"; printf "  <%s%d>%s</%s%d>\n", t, n, $0, t, n
    getline; t="CODE";      printf "  <%s%d>%s</%s%d>\n", t, n, $0, t, n
    getline; t="COLOR";     printf "  <%s%d>%s</%s%d>\n", t, n, $0, t, n
  }
'
<SHOPITEM>
  <PRODUCT>POINT</PRODUCT>
  <FRAMESIZE1>MD</FRAMESIZE1>
  <CODE1>029,00</CODE1>
  <COLOR1>black / yellow</COLOR1>
  <FRAMESIZE2>LD</FRAMESIZE2>
  <CODE2>029,01</CODE2>
  <COLOR2>black / yellow</COLOR2>
  <FRAMESIZE3>LD</FRAMESIZE3>
  <CODE3>029,03</CODE3>
  <COLOR3>green / white</COLOR3>
  <FRAMESIZE4>MD</FRAMESIZE4>
  <CODE4>029,04</CODE4>
  <COLOR4>green / white</COLOR4>
</SHOPITEM>

In hindsight, this output format may be easier to process:

xmlstarlet ... file.xml | awk '
      BEGIN {print "<SHOPITEM>"; fmt="\t\t<%s>%s</%s>\n"} 
      END   {print "</SHOPITEM>"}
      NR==1 {print "\t<PRODUCT>" $0 "</PRODUCT>"; next} 
      {
        n++
        printf "\t<PRODUCT_ITEM id=\"%d\">\n", n
        t="FRAMESIZE"; printf fmt, t, $0, t; getline
        t="CODE";      printf fmt, t, $0, t; getline
        t="COLOR";     printf fmt, t, $0, t
        print "\t</PRODUCT_ITEM>"
      }
    '
<SHOPITEM>
    <PRODUCT>POINT</PRODUCT>
    <PRODUCT_ITEM id="1">
        <FRAMESIZE>MD</FRAMESIZE>
        <CODE>029,00</CODE>
        <COLOR>black / yellow</COLOR>
    </PRODUCT_ITEM>
    <PRODUCT_ITEM id="2">
        <FRAMESIZE>LD</FRAMESIZE>
        <CODE>029,01</CODE>
        <COLOR>black / yellow</COLOR>
    </PRODUCT_ITEM>
    <PRODUCT_ITEM id="3">
        <FRAMESIZE>LD</FRAMESIZE>
        <CODE>029,03</CODE>
        <COLOR>green / white</COLOR>
    </PRODUCT_ITEM>
    <PRODUCT_ITEM id="4">
        <FRAMESIZE>MD</FRAMESIZE>
        <CODE>029,04</CODE>
        <COLOR>green / white</COLOR>
    </PRODUCT_ITEM>
</SHOPITEM>
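
To illustrate why the nested format may be easier to consume, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The asker wants PHP, so treat this only as an outline of the idea; the helper name read_variants and the shortened two-item sample are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the nested PRODUCT_ITEM format shown above.
NESTED = """<SHOPITEM>
    <PRODUCT>POINT</PRODUCT>
    <PRODUCT_ITEM id="1">
        <FRAMESIZE>MD</FRAMESIZE>
        <CODE>029,00</CODE>
        <COLOR>black / yellow</COLOR>
    </PRODUCT_ITEM>
    <PRODUCT_ITEM id="2">
        <FRAMESIZE>LD</FRAMESIZE>
        <CODE>029,01</CODE>
        <COLOR>black / yellow</COLOR>
    </PRODUCT_ITEM>
</SHOPITEM>"""


def read_variants(xml_text):
    """Return (product name, list of per-variant dicts). Hypothetical helper."""
    shopitem = ET.fromstring(xml_text)
    product = shopitem.findtext("PRODUCT")
    variants = [
        # One dict per PRODUCT_ITEM: its id plus every child element's text.
        {"id": int(item.get("id")), **{child.tag: child.text for child in item}}
        for item in shopitem.findall("PRODUCT_ITEM")
    ]
    return product, variants


product, variants = read_variants(NESTED)
print(product)
for variant in variants:
    print(variant)
```

Because every variant uses the same element names, the consumer loops over PRODUCT_ITEM elements and never has to build names such as CODE1 or CODE2.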

I strongly recommend you figure out an XSLT solution - glenn jackman

I can only second that. So, here is your XSLT solution. However, the question is: Did you show a representative XML sample or are there several different PRODUCT elements in your real XML data?

Also, naming elements CODE1, CODE2 and so on can be done, but I would (again, strongly) recommend not doing it. I'm happy to add this detail, but please first clarify whether you really need this crippling naming convention or whether you can use attributes instead:

<CODE n="1"/>
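
As a rough sketch of what the attribute-based variant could look like, the following Python snippet (standard library only) merges all SHOPITEM elements into one and numbers the copied children with an n attribute. The function name merge_with_index_attribute and the two-item sample are assumptions for illustration, and the code assumes every SHOPITEM carries the same PRODUCT, as in the question's sample.

```python
import xml.etree.ElementTree as ET

# Two items from the question's sample, wrapped in a root element.
SOURCE = """<root>
    <SHOPITEM>
        <PRODUCT>POINT</PRODUCT>
        <FRAMESIZE>MD</FRAMESIZE>
        <CODE>029,00</CODE>
        <COLOR>black / yellow</COLOR>
    </SHOPITEM>
    <SHOPITEM>
        <PRODUCT>POINT</PRODUCT>
        <FRAMESIZE>LD</FRAMESIZE>
        <CODE>029,01</CODE>
        <COLOR>black / yellow</COLOR>
    </SHOPITEM>
</root>"""


def merge_with_index_attribute(xml_text):
    """Merge all SHOPITEMs into one, tagging copied children with n="1", n="2", ..."""
    items = ET.fromstring(xml_text).findall("SHOPITEM")
    merged = ET.Element("SHOPITEM")
    # Assumption: all items share one PRODUCT, so the first one is representative.
    ET.SubElement(merged, "PRODUCT").text = items[0].findtext("PRODUCT")
    for n, item in enumerate(items, start=1):
        for child in item:
            if child.tag == "PRODUCT":
                continue  # PRODUCT was already written once
            ET.SubElement(merged, child.tag, {"n": str(n)}).text = child.text
    return merged


merged = merge_with_index_attribute(SOURCE)
print(ET.tostring(merged, encoding="unicode"))
```

A consumer can then select a value by attribute, for example merged.find("CODE[@n='2']"), instead of constructing an element name.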

XML Input

As suggested by Glenn already, there must be a single outermost element to make your input well-formed XML.

<root>
    <SHOPITEM>
            <PRODUCT>POINT</PRODUCT>
            <FRAMESIZE>MD</FRAMESIZE>
            <CODE>029,00</CODE>
            <COLOR>black / yellow</COLOR>
    </SHOPITEM>
    <SHOPITEM>
            <PRODUCT>POINT</PRODUCT>
            <FRAMESIZE>LD</FRAMESIZE>
            <CODE>029,01</CODE>
            <COLOR>black / yellow</COLOR>
    </SHOPITEM>
    <SHOPITEM>
            <PRODUCT>POINT</PRODUCT>
            <FRAMESIZE>LD</FRAMESIZE>
            <CODE>029,03</CODE>
            <COLOR>green / white</COLOR>
    </SHOPITEM>
    <SHOPITEM>
            <PRODUCT>POINT</PRODUCT>
            <FRAMESIZE>MD</FRAMESIZE>
            <CODE>029,04</CODE>
            <COLOR>green / white</COLOR>
    </SHOPITEM>
</root>

XSLT Stylesheet (1.0)

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" omit-xml-declaration="yes" encoding="UTF-8" indent="yes" />

    <xsl:strip-space elements="*"/>

    <xsl:template match="/root">
        <SHOPITEM>
            <xsl:copy-of select="SHOPITEM[1]/PRODUCT"/>
            <xsl:copy-of select="SHOPITEM/*[not(self::PRODUCT)]"/>
        </SHOPITEM>
    </xsl:template>

</xsl:transform>

XML Output

<SHOPITEM>
   <PRODUCT>POINT</PRODUCT>
   <FRAMESIZE>MD</FRAMESIZE>
   <CODE>029,00</CODE>
   <COLOR>black / yellow</COLOR>
   <FRAMESIZE>LD</FRAMESIZE>
   <CODE>029,01</CODE>
   <COLOR>black / yellow</COLOR>
   <FRAMESIZE>LD</FRAMESIZE>
   <CODE>029,03</CODE>
   <COLOR>green / white</COLOR>
   <FRAMESIZE>MD</FRAMESIZE>
   <CODE>029,04</CODE>
   <COLOR>green / white</COLOR>
</SHOPITEM>

EDIT:

I also missed that there can be many different PRODUCT elements, as Mathias asked.

XML Input

A more reasonable sample for testing, with more than one PRODUCT:

<root>
    <SHOPITEM>
            <PRODUCT>POINT</PRODUCT>
            <FRAMESIZE>MD</FRAMESIZE>
            <CODE>029,00</CODE>
            <COLOR>black / yellow</COLOR>
    </SHOPITEM>
    <SHOPITEM>
            <PRODUCT>POINT</PRODUCT>
            <FRAMESIZE>LD</FRAMESIZE>
            <CODE>029,01</CODE>
            <COLOR>black / yellow</COLOR>
    </SHOPITEM>
    <SHOPITEM>
            <PRODUCT>OTHER</PRODUCT>
            <FRAMESIZE>LD</FRAMESIZE>
            <CODE>029,03</CODE>
            <COLOR>green / white</COLOR>
    </SHOPITEM>
    <SHOPITEM>
            <PRODUCT>OTHER</PRODUCT>
            <FRAMESIZE>MD</FRAMESIZE>
            <CODE>029,04</CODE>
            <COLOR>green / white</COLOR>
    </SHOPITEM>
</root>

Stylesheet

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" omit-xml-declaration="yes" encoding="UTF-8" indent="yes" />

    <xsl:strip-space elements="*"/>

    <xsl:key name="prod" match="SHOPITEM" use="PRODUCT"/>

    <xsl:template match="/root">
        <xsl:copy>
            <xsl:for-each select="SHOPITEM[generate-id() = generate-id(key('prod',PRODUCT)[1])]">
                <SHOPITEM>
                    <xsl:copy-of select="PRODUCT"/>
                    <xsl:copy-of select="key('prod', PRODUCT)/*[not(self::PRODUCT)]"/>
                </SHOPITEM>
            </xsl:for-each>
        </xsl:copy>
    </xsl:template>

</xsl:transform>

XML Output

<root>
   <SHOPITEM>
      <PRODUCT>POINT</PRODUCT>
      <FRAMESIZE>MD</FRAMESIZE>
      <CODE>029,00</CODE>
      <COLOR>black / yellow</COLOR>
      <FRAMESIZE>LD</FRAMESIZE>
      <CODE>029,01</CODE>
      <COLOR>black / yellow</COLOR>
   </SHOPITEM>
   <SHOPITEM>
      <PRODUCT>OTHER</PRODUCT>
      <FRAMESIZE>LD</FRAMESIZE>
      <CODE>029,03</CODE>
      <COLOR>green / white</COLOR>
      <FRAMESIZE>MD</FRAMESIZE>
      <CODE>029,04</CODE>
      <COLOR>green / white</COLOR>
   </SHOPITEM>
</root>
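
For readers who would rather do the grouping in application code, the key-based idea of the stylesheet above (look up all SHOPITEM elements that share a PRODUCT value) can be approximated with a dictionary. This is only a sketch in Python with the standard library, not a translation of the XSLT; the function name group_by_product is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# The four-item test sample from above.
SOURCE = """<root>
    <SHOPITEM><PRODUCT>POINT</PRODUCT><FRAMESIZE>MD</FRAMESIZE>
        <CODE>029,00</CODE><COLOR>black / yellow</COLOR></SHOPITEM>
    <SHOPITEM><PRODUCT>POINT</PRODUCT><FRAMESIZE>LD</FRAMESIZE>
        <CODE>029,01</CODE><COLOR>black / yellow</COLOR></SHOPITEM>
    <SHOPITEM><PRODUCT>OTHER</PRODUCT><FRAMESIZE>LD</FRAMESIZE>
        <CODE>029,03</CODE><COLOR>green / white</COLOR></SHOPITEM>
    <SHOPITEM><PRODUCT>OTHER</PRODUCT><FRAMESIZE>MD</FRAMESIZE>
        <CODE>029,04</CODE><COLOR>green / white</COLOR></SHOPITEM>
</root>"""


def group_by_product(xml_text):
    """Return a new <root> with one merged SHOPITEM per distinct PRODUCT."""
    root = ET.fromstring(xml_text)
    groups = {}  # PRODUCT text -> merged SHOPITEM; dicts preserve insertion order
    for item in root.findall("SHOPITEM"):
        name = item.findtext("PRODUCT")
        merged = groups.get(name)
        if merged is None:
            # First time this product is seen: start a new merged entry.
            merged = ET.Element("SHOPITEM")
            ET.SubElement(merged, "PRODUCT").text = name
            groups[name] = merged
        for child in item:
            if child.tag != "PRODUCT":
                ET.SubElement(merged, child.tag).text = child.text
    out = ET.Element("root")
    out.extend(list(groups.values()))
    return out


result = group_by_product(SOURCE)
print(ET.tostring(result, encoding="unicode"))
```

The dictionary plays the role of xsl:key here: each item is visited once and appended to the group for its product.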

Here is another solution in XSLT 1.0, which assumes that there can be multiple <SHOPITEM> elements.

I have added a root element (<root>) because your input XML was not well-formed. You can also see/test the solution here: http://xsltransform.net/pPqsHTk

Note that one template matches the first SHOPITEM for each PRODUCT and groups the data by the name of the PRODUCT. A second template matches all later occurrences of the same PRODUCT and simply outputs nothing.

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="root">
        <xsl:copy>
            <xsl:apply-templates />
        </xsl:copy>
    </xsl:template>

    <xsl:template match="SHOPITEM[not(PRODUCT = preceding::SHOPITEM/PRODUCT)]">
        <SHOPITEM>
            <xsl:copy-of select="*"/>
            <xsl:copy-of select="following-sibling::SHOPITEM[PRODUCT = current()/PRODUCT]/*[not(self::PRODUCT)]"/>
        </SHOPITEM>
    </xsl:template>

    <xsl:template match="SHOPITEM[PRODUCT = preceding::SHOPITEM/PRODUCT]"/>
</xsl:transform>

This is not the fastest solution, because each SHOPITEM is compared against the SHOPITEM elements before it, but it should work reasonably fast if your input XML is not too big.
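
The two-template logic can also be mirrored outside XSLT, which makes the repeated comparisons visible. The Python sketch below (standard library only) is an illustration under assumptions: the function name merge_first_occurrences is made up, and the sample items are reordered so that the two products interleave.

```python
import xml.etree.ElementTree as ET

# Sample values from the thread, reordered so the products interleave.
SOURCE = """<root>
    <SHOPITEM><PRODUCT>POINT</PRODUCT><FRAMESIZE>MD</FRAMESIZE>
        <CODE>029,00</CODE><COLOR>black / yellow</COLOR></SHOPITEM>
    <SHOPITEM><PRODUCT>OTHER</PRODUCT><FRAMESIZE>LD</FRAMESIZE>
        <CODE>029,03</CODE><COLOR>green / white</COLOR></SHOPITEM>
    <SHOPITEM><PRODUCT>POINT</PRODUCT><FRAMESIZE>LD</FRAMESIZE>
        <CODE>029,01</CODE><COLOR>black / yellow</COLOR></SHOPITEM>
    <SHOPITEM><PRODUCT>OTHER</PRODUCT><FRAMESIZE>MD</FRAMESIZE>
        <CODE>029,04</CODE><COLOR>green / white</COLOR></SHOPITEM>
</root>"""


def merge_first_occurrences(xml_text):
    """Merge SHOPITEMs per PRODUCT by scanning earlier and later siblings."""
    items = ET.fromstring(xml_text).findall("SHOPITEM")
    out = ET.Element("root")
    for i, item in enumerate(items):
        name = item.findtext("PRODUCT")
        # Like the empty template: skip items whose PRODUCT appeared earlier.
        if any(prev.findtext("PRODUCT") == name for prev in items[:i]):
            continue
        # Like the first template: copy this item's children, then the
        # non-PRODUCT children of every later item with the same PRODUCT.
        merged = ET.SubElement(out, "SHOPITEM")
        for child in item:
            ET.SubElement(merged, child.tag).text = child.text
        for later in items[i + 1:]:
            if later.findtext("PRODUCT") == name:
                for child in later:
                    if child.tag != "PRODUCT":
                        ET.SubElement(merged, child.tag).text = child.text
    return out


result = merge_first_occurrences(SOURCE)
print(ET.tostring(result, encoding="unicode"))
```

Both inner scans walk the sibling list again for each item, so the work grows roughly quadratically with the number of SHOPITEM elements. That is fine for small files and is the reason a key-based grouping tends to scale better.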
