簡體   English   中英

OpenXML表到CSV

[英]OpenXML table to csv

我正在嘗試使用xsltproc從OpenXML提取表。 有人可以為我提供以下幫助(我的願望清單)1.如何過濾非表格文本2.格式化表格(可能轉換為csv)3.將結果定向到多個輸出文件(每個文件中有一個表格),版本為1.0

我的嘗試:

 <?xml version="1.0"?>
 <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
 <xsl:output method="text"/>
 <xsl:template match="w:tbl">
 <xsl:apply-templates/><xsl:for-each select="w:tr"><xsl:text>
 </xsl:text>
  </xsl:for-each>
 <xsl:apply-templates/><xsl:for-each select="w:tcW"><xsl:text>  ;</xsl:text>
  </xsl:for-each>
 </xsl:template>
 </xsl:stylesheet>

產生輸出:

>  xsltproc new2 word/document.xml
Node Selection and Pattern MatchingIn XSLT stylesheets, template rules for node selection and pattern matching are applied via the select attribute of the xsl:apply-templates command and the match attribute of the xsl:template element, respectively. A specification can be created to determine how to resolve issues in the event that a multiple number of applicable template rules exist, or alternately, when there are no applicable template rules at all.Table1 headingTextbodyBlahblahBody2Blah2Blah2


Table1 headingTextbodyBlahblahBody2Blah2Blah2Node SelectionWith the select attribute of xsl:apply-templates command, an XPath description can be used to either (1) select a multiple number of nodes with identical names, or (2) select a multiple number of nodes with differing names. Under scenario (1), using XPath to designate "ProductList/ Product" results in the selection of two Product element nodes.Table1 headingCol1TextbodyBlahCol1 BlahblahBody2Blah2Col1 Blah2Blah2Body3Blah3Col1 Blah3Blah3



Table1 headingCol1TextbodyBlahCol1 BlahblahBody2Blah2Col1 Blah2Blah2Body3Blah3Col1 Blah3Blah3

預期產量:

Table1  ;heading    ;Text
body    ;Blah   ;blah
Body2   ;Blah2  ;Blah2

Table1  ;heading    ;Col1   ;Text
body    ;Blah   ;Col1 Blah  ;blah
Body2   ;Blah2  ;Col1 Blah2 ;Blah2
Body3   ;Blah3  ;Col1 Blah3 ;Blah3

我成功進行了一次粗略的格式化嘗試,但以下嘗試卻失敗了,但未能達到上述目標的2/3。

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<xsl:output method="text"/>

<xsl:template match="w:tr">
 <xsl:apply-templates/><xsl:text>
</xsl:text>
</xsl:template>

<xsl:template match="w:tcW">
 <xsl:apply-templates/><xsl:text>   ;</xsl:text>
</xsl:template>

<xsl:template match="w:p">
<xsl:apply-templates/><xsl:if test="position()!=last()"><xsl:text>
</xsl:text></xsl:if>
</xsl:template>

</xsl:stylesheet>

我的輸入XML:

   <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 wp14"><w:body><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="003404B0"><w:r><w:t>Node Selection and Pattern Matching</w:t></w:r></w:p><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="003404B0"><w:r><w:t>In XSLT stylesheets, template rules for node selection and pattern matching are applied via the select attribute of the xsl:apply-templates command and the match attribute of the xsl:template element, respectively. A specification can be created to determine how to resolve issues in the event that a multiple number of applicable template rules exist, or alternately, when there are no applicable template rules at all.</w:t></w:r></w:p><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0"/><w:tbl><w:tblPr><w:tblStyle w:val="TableGrid"/><w:tblW w:w="0" w:type="auto"/><w:tblLook w:val="04A0" w:firstRow="1" w:lastRow="0" w:firstColumn="1" w:lastColumn="0" w:noHBand="0" w:noVBand="1"/></w:tblPr><w:tblGrid><w:gridCol w:w="3116"/><w:gridCol w:w="3117"/><w:gridCol w:w="3117"/></w:tblGrid><w:tr w:rsidR="003404B0" w:rsidTr="003404B0"><w:tc><w:tcPr><w:tcW w:w="3116" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0"><w:r><w:t xml:space="preserve">Table1 </w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="3117" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0"><w:r><w:t>heading</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="3117" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0"><w:r><w:t>Text</w:t></w:r></w:p></w:tc></w:tr><w:tr w:rsidR="003404B0" w:rsidTr="003404B0"><w:tc><w:tcPr><w:tcW w:w="3116" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0"><w:r><w:t>body</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="3117" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0"><w:r><w:t>Blah</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="3117" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0"><w:r><w:t>blah</w:t></w:r></w:p></w:tc></w:tr><w:tr w:rsidR="003404B0" w:rsidTr="003404B0"><w:tc><w:tcPr><w:tcW w:w="3116" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0"><w:r><w:t>Body2</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="3117" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0"><w:r><w:t>Blah2</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="3117" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0"><w:r><w:t>Blah2</w:t></w:r></w:p></w:tc></w:tr></w:tbl><w:p w:rsidR="006C4C5A" w:rsidRDefault="006C4C5A"/><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="003404B0"><w:r><w:t>Node Selection</w:t></w:r></w:p><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="003404B0"><w:r><w:t>With the select attribute of xsl:apply-templates command, an XPath description can be used to either (1) select a multiple number of nodes with identical names, or (2) select a multiple number of nodes with differing names. Under scenario (1), using XPath to designate "ProductList/ Product" results in the selection of two Product element nodes.</w:t></w:r></w:p><w:tbl><w:tblPr><w:tblStyle w:val="TableGrid"/><w:tblW w:w="0" w:type="auto"/><w:tblLook w:val="04A0" w:firstRow="1" w:lastRow="0" w:firstColumn="1" w:lastColumn="0" w:noHBand="0" w:noVBand="1"/></w:tblPr><w:tblGrid><w:gridCol w:w="2383"/><w:gridCol w:w="2420"/><w:gridCol w:w="2194"/><w:gridCol w:w="2353"/></w:tblGrid><w:tr w:rsidR="003404B0" w:rsidTr="003404B0"><w:tc><w:tcPr><w:tcW w:w="2383" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="0042011C"><w:r><w:t xml:space="preserve">Table1 </w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="2420" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="0042011C"><w:r><w:t>heading</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="2194" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="0042011C"><w:r><w:t>Col1</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="2353" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="0042011C"><w:r><w:t>Text</w:t></w:r></w:p></w:tc></w:tr><w:tr w:rsidR="003404B0" w:rsidTr="003404B0"><w:tc><w:tcPr><w:tcW w:w="2383" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="0042011C"><w:r><w:t>body</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="2420" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="0042011C"><w:r><w:t>Blah</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="2194" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="0042011C"><w:r><w:t>Col1 Blah</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="2353" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="0042011C"><w:r><w:t>blah</w:t></w:r></w:p></w:tc></w:tr><w:tr w:rsidR="003404B0" w:rsidTr="003404B0"><w:tc><w:tcPr><w:tcW w:w="2383" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="0042011C"><w:r><w:t>Body2</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="2420" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="0042011C"><w:r><w:t>Blah2</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="2194" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="0042011C"><w:r><w:t>Col1 Blah2</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="2353" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="0042011C"><w:r><w:t>Blah2</w:t></w:r></w:p></w:tc></w:tr><w:tr w:rsidR="003404B0" w:rsidTr="003404B0"><w:tc><w:tcPr><w:tcW w:w="2383" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="003404B0"><w:r><w:t>B</w:t></w:r><w:r><w:t>ody3</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="2420" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="003404B0"><w:r><w:t>Blah3</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="2194" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="003404B0"><w:r><w:t>Col1 Blah3</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:w="2353" w:type="dxa"/></w:tcPr><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="003404B0"><w:r><w:t>Blah3</w:t></w:r><w:bookmarkStart w:id="0" w:name="_GoBack"/><w:bookmarkEnd w:id="0"/></w:p></w:tc></w:tr></w:tbl><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="003404B0"/><w:p w:rsidR="003404B0" w:rsidRDefault="003404B0" w:rsidP="003404B0"/><w:sectPr w:rsidR="003404B0"><w:pgSz w:w="12240" w:h="15840"/><w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720" w:gutter="0"/><w:cols w:space="720"/><w:docGrid w:linePitch="360"/></w:sectPr></w:body></w:document>

您不能隨意使用<xsl:apply-templates/> ,因為它還會應用默認模板-其中之一復制文本節點。 還要注意,在任何表之外,都有與模板匹配的節點。 嘗試這種方式:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"> 
<xsl:output method="text"/>

<xsl:template match="/">
    <xsl:apply-templates select="//w:tbl"/>
</xsl:template> 

<xsl:template match="w:tbl">
    <xsl:apply-templates select="w:tr"/>
    <xsl:text>&#10;</xsl:text>
</xsl:template>

<xsl:template match="w:tr">
    <xsl:apply-templates select="w:tc"/>
    <xsl:text>&#10;</xsl:text>
</xsl:template>

<xsl:template match="w:tc">
    <xsl:apply-templates select=".//w:t"/>
    <xsl:if test="position()!=last()">
        <xsl:text>&#9;</xsl:text>
    </xsl:if>
</xsl:template>

</xsl:stylesheet>

要么:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"> 
<xsl:output method="text"/>

<xsl:template match="/">
    <xsl:for-each select="//w:tbl">
        <xsl:for-each select="w:tr">
            <xsl:for-each select="w:tc">
                <xsl:value-of select=".//w:t"/>
                    <xsl:if test="position()!=last()">
                        <xsl:text>&#9;</xsl:text>
                    </xsl:if>
            </xsl:for-each>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
        <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet>

-
兩者都將產生制表符分隔的輸出。 注意,假定表單元格本身不包含制表符。

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM