
Transform CSV to XML using XSLT 1.0

I am following the approach described in: Converting CSV to hierarchical XML using XSLT

Now the raw input file can contain empty tokens, as below:

<root>
GroupName,GroupValue,SubGroupName,SubGroupValue,ItemName,ItemValue
,A,1,C,1,G
1,,1,C,2,H
1,A,2,D,1,I
</root>

The original XSLT 1.0 stylesheet provided there is:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
extension-element-prefixes="exsl">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:key name="k1" match="row" use="cell[1]"/>
<xsl:key name="k2" match="row" use="concat(cell[1], '|', cell[3])"/>

<xsl:template match="/">
    <!-- tokenize csv -->
    <xsl:variable name="rows">
        <xsl:call-template name="tokenize">
            <xsl:with-param name="text" select="root"/>
        </xsl:call-template>
    </xsl:variable>
    <xsl:variable name="data">
        <xsl:for-each select="exsl:node-set($rows)/row[position() > 1]">
            <row>
                <xsl:call-template name="tokenize">
                    <xsl:with-param name="text" select="."/>
                    <xsl:with-param name="delimiter" select="','"/>
                    <xsl:with-param name="name" select="'cell'"/>
                </xsl:call-template>
            </row>
        </xsl:for-each>
    </xsl:variable>
    <!-- output -->
    <document>
        <xsl:for-each select="exsl:node-set($data)/row[count(. | key('k1', cell[1])[1]) = 1]">
            <group>
                <name>
                    <xsl:value-of select="cell[1]"/>
                </name>
                <value>
                    <xsl:value-of select="cell[2]"/>
                </value>
                <xsl:for-each select="key('k1', cell[1])[count(. | key('k2', concat(cell[1], '|', cell[3]))[1]) = 1]">
                    <subgroup>
                        <name>
                            <xsl:value-of select="cell[3]"/>
                        </name>
                        <value>
                            <xsl:value-of select="cell[4]"/>
                        </value>
                        <items>
                            <xsl:for-each select="key('k2', concat(cell[1], '|', cell[3]))">
                                <item>
                                    <name>
                                        <xsl:value-of select="cell[5]"/>
                                    </name>
                                    <value>
                                        <xsl:value-of select="cell[6]"/>
                                    </value>
                                </item>
                            </xsl:for-each>
                        </items>
                    </subgroup>
                </xsl:for-each>
            </group>
        </xsl:for-each>
    </document>
</xsl:template>

<xsl:template name="tokenize">
    <xsl:param name="text"/>
    <xsl:param name="delimiter" select="'&#10;'"/>
    <xsl:param name="name" select="'row'"/>
    <xsl:variable name="token" select="substring-before(concat($text, $delimiter), $delimiter)" />
    <xsl:if test="$token">
        <xsl:element name="{$name}">
            <xsl:value-of select="$token"/>
        </xsl:element>
    </xsl:if>
    <xsl:if test="contains($text, $delimiter)">
        <!-- recursive call -->
        <xsl:call-template name="tokenize">
            <xsl:with-param name="text" select="substring-after($text, $delimiter)"/>
            <xsl:with-param name="delimiter" select="$delimiter"/>
            <xsl:with-param name="name" select="$name"/>
        </xsl:call-template>
    </xsl:if>
</xsl:template>

</xsl:stylesheet>

How can I tweak the XSLT so that it does NOT skip empty tokens in this scenario and produces the following XML output?

<?xml version="1.0" encoding="utf-8"?>
<Document>
  <data>
  <GroupName></GroupName>
  <GroupValue>A</GroupValue>
  ...
  </data>

  <data>
  <GroupName>1</GroupName>
  <GroupValue></GroupValue>
  ...
  </data>

  <data>
  <GroupName>1</GroupName>
  <GroupValue>A</GroupValue>
  ...
  </data>
</Document>
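
The empty cells disappear because the `tokenize` template wraps its output in `<xsl:if test="$token">`, and an empty string is false in XPath 1.0. The following is only a minimal sketch of one way around that, not the accepted fix: it assumes a hypothetical `keep-empty` parameter (not part of the original stylesheet) that is switched on for cells and left off for rows, so blank lines are still dropped. It relies on `exsl:node-set`, as the original does, and emits generic `<cell>` elements rather than named ones.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">
    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/">
        <!-- Split the text into rows; blank lines are still skipped here -->
        <xsl:variable name="rows">
            <xsl:call-template name="tokenize">
                <xsl:with-param name="text" select="root"/>
            </xsl:call-template>
        </xsl:variable>
        <Document>
            <!-- position() > 1 skips the header row -->
            <xsl:for-each select="exsl:node-set($rows)/row[position() &gt; 1]">
                <data>
                    <xsl:call-template name="tokenize">
                        <xsl:with-param name="text" select="."/>
                        <xsl:with-param name="delimiter" select="','"/>
                        <xsl:with-param name="name" select="'cell'"/>
                        <!-- hypothetical flag: emit empty cells as well -->
                        <xsl:with-param name="keep-empty" select="true()"/>
                    </xsl:call-template>
                </data>
            </xsl:for-each>
        </Document>
    </xsl:template>

    <xsl:template name="tokenize">
        <xsl:param name="text"/>
        <xsl:param name="delimiter" select="'&#10;'"/>
        <xsl:param name="name" select="'row'"/>
        <!-- hypothetical parameter, defaults to the original behaviour -->
        <xsl:param name="keep-empty" select="false()"/>
        <xsl:variable name="token" select="substring-before(concat($text, $delimiter), $delimiter)"/>
        <!-- emit the token if it is non-empty, or if empty tokens are wanted -->
        <xsl:if test="$keep-empty or $token != ''">
            <xsl:element name="{$name}">
                <xsl:value-of select="$token"/>
            </xsl:element>
        </xsl:if>
        <xsl:if test="contains($text, $delimiter)">
            <!-- recursive call on the remainder of the text -->
            <xsl:call-template name="tokenize">
                <xsl:with-param name="text" select="substring-after($text, $delimiter)"/>
                <xsl:with-param name="delimiter" select="$delimiter"/>
                <xsl:with-param name="name" select="$name"/>
                <xsl:with-param name="keep-empty" select="$keep-empty"/>
            </xsl:call-template>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>
```

With the sample input, each data row should then yield six `<cell>` children, including an empty one where the CSV has two adjacent commas or a leading comma.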

Try this (you can also check it at http://xsltransform.net/nb9MWt1/2):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl"
    version="1.0">
    <xsl:output indent="yes"/>

    <xsl:variable name="elements">
        <element>GroupName</element>
        <element>GroupValue</element>
        <element>SubGroupName</element>
        <element>SubGroupValue</element>
        <element>ItemName</element>
        <element>ItemValue</element>
    </xsl:variable>

    <xsl:template match="root">
        <Document>
            <xsl:call-template name="row">
                <xsl:with-param name="data" select="."/>
            </xsl:call-template>
        </Document>
    </xsl:template>

    <xsl:template name="row">
        <xsl:param name="data"/>
        <xsl:choose>
            <xsl:when test="contains($data, '&#xa;')">
                <xsl:if test="normalize-space(substring-before($data, '&#xa;')) != ''">
                    <data>
                        <xsl:call-template name="cell">
                            <xsl:with-param name="celldata" select="substring-before($data, '&#xa;')"/>
                            <xsl:with-param name="position" select="1"/>
                        </xsl:call-template>
                    </data>
                </xsl:if>

                <xsl:if test="normalize-space(substring-after($data, '&#xa;')) != ''">
                    <xsl:call-template name="row">
                        <xsl:with-param name="data" select="substring-after($data, '&#xa;')"/>
                    </xsl:call-template>
                </xsl:if>
            </xsl:when>
            <xsl:otherwise>
                <xsl:if test="normalize-space($data) != ''">
                    <data>
                        <xsl:call-template name="cell">
                            <xsl:with-param name="celldata" select="$data"/>
                            <xsl:with-param name="position" select="1"/>
                        </xsl:call-template>
                    </data>
                </xsl:if>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

    <xsl:template name="cell">
        <xsl:param name="celldata"/>
        <xsl:param name="position"/>
        <xsl:choose>
            <xsl:when test="contains($celldata, ',')">
                <xsl:element name="{exsl:node-set($elements)//element[position() = $position]}">
                    <xsl:value-of select="substring-before($celldata, ',')"/>
                </xsl:element>
                <xsl:call-template name="cell">
                    <xsl:with-param name="celldata" select="substring-after($celldata, ',')"/>
                    <xsl:with-param name="position" select="number($position) + 1"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:element name="{exsl:node-set($elements)//element[position() = $position]}">
                    <xsl:value-of select="$celldata"/>
                </xsl:element>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

</xsl:stylesheet>

Sorry for posting this as an answer rather than a comment; I am not sure how to format code in a comment. Thanks Rupesh, but what if 1) only a few of the fields should appear in the output file, and 2) some field names need to be changed to other names in the output?

For 1), I changed the XSL as below to output only 3 selected fields:

<xsl:variable name="elements">
    <element>SubGroupName</element>
    <element>ItemName</element>
    <element>ItemValue</element>
</xsl:variable>

However, this doesn't work.

For 2), let's say GroupValue is renamed to "Field1" and ItemName is renamed to "Field2" in the output, and those are the only 2 fields required:

<?xml version="1.0" encoding="utf-8"?>
<Document>
  <data>
  <Field1>A</Field1>
  <Field2>1</Field2>
  </data>

  <data>
  <Field1></Field1>
  <Field2>2</Field2>
  </data>

  <data>
  <Field1>A</Field1>
  <Field2>1</Field2>
  </data>

</Document>

Lastly, input fields might contain spaces, e.g.:

<root>
Group Name,Group Value,Sub Group Name,Sub Group Value,Item Name,Item Value
,A,1,C,1,G 
1,,1,C,2,H
1,A,2,D,1,I
</root>
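
Shortening the `elements` list does not select columns, because the `cell` template looks names up purely by position: the three names get attached to the first three columns, and the fourth column then asks for an element with an empty name. One possible way to handle all three points is to map column numbers to output names explicitly. The sketch below is only an illustration under stated assumptions: the `fields` variable, its `col` and `name` attributes, and the `rows` and `nth-cell` templates are hypothetical names introduced here, and `exsl:node-set` is assumed to be available as in the answer above.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">
    <xsl:output indent="yes"/>

    <!-- Hypothetical mapping: 1-based CSV column number -> output element name -->
    <xsl:variable name="fields">
        <field col="2" name="Field1"/>
        <field col="5" name="Field2"/>
    </xsl:variable>

    <xsl:template match="root">
        <Document>
            <xsl:call-template name="rows">
                <xsl:with-param name="text" select="."/>
            </xsl:call-template>
        </Document>
    </xsl:template>

    <!-- Walks the text line by line; the first non-blank line is treated as the header -->
    <xsl:template name="rows">
        <xsl:param name="text"/>
        <xsl:param name="header" select="true()"/>
        <xsl:variable name="line" select="substring-before(concat($text, '&#xa;'), '&#xa;')"/>
        <xsl:variable name="blank" select="normalize-space($line) = ''"/>
        <xsl:if test="not($blank) and not($header)">
            <data>
                <!-- one output element per mapped field, in mapping order -->
                <xsl:for-each select="exsl:node-set($fields)/field">
                    <xsl:element name="{@name}">
                        <xsl:call-template name="nth-cell">
                            <xsl:with-param name="line" select="$line"/>
                            <xsl:with-param name="n" select="@col"/>
                        </xsl:call-template>
                    </xsl:element>
                </xsl:for-each>
            </data>
        </xsl:if>
        <xsl:if test="contains($text, '&#xa;')">
            <xsl:call-template name="rows">
                <xsl:with-param name="text" select="substring-after($text, '&#xa;')"/>
                <!-- keep waiting for the header while only blank lines have been seen -->
                <xsl:with-param name="header" select="$header and $blank"/>
            </xsl:call-template>
        </xsl:if>
    </xsl:template>

    <!-- Returns the n-th comma-separated value of a line, trimmed; empty cells stay empty -->
    <xsl:template name="nth-cell">
        <xsl:param name="line"/>
        <xsl:param name="n"/>
        <xsl:choose>
            <xsl:when test="$n &gt; 1">
                <xsl:call-template name="nth-cell">
                    <xsl:with-param name="line" select="substring-after($line, ',')"/>
                    <xsl:with-param name="n" select="$n - 1"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="normalize-space(substring-before(concat($line, ','), ','))"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>
```

Because the columns are addressed by number, spaces in the header names do not matter, and the header line itself is never written to the output. Note that `normalize-space` trims leading and trailing whitespace but also collapses runs of internal whitespace to a single space; drop it if cell values must be preserved exactly. Quoted fields containing commas are not handled by this sketch.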
