Filter XML document using XPath and PHP

I am trying to extract XML data using PHP and XPath. Consider the following XML document:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <channel>
        <item>
            <title>My Second Great Title</title>
            <link>http://server.com/content/my-second-great-title</link>
            <tag>vuluptate</tag>
            <tag>id</tag>
            <tag>cras</tag>
            <tag>pretium</tag>
            <tag>conubia</tag>
            <tag>libero</tag>
            <description>This is a second great description</description>
            <publishedAt>Sat, 08 Nov 2015 10:00:52 +0000</publishedAt>
            <isVisible>true</isVisible>
            <content>Ut luctus auctor varius. Donec vitae erat felis. Nam ac erat vulputate, consequat elit id, dictum urna. Vestibulum dignissim eget felis vitae tempor. Suspendisse molestie lectus at est accumsan, et porta sapien elementum. Vivamus pretium imperdiet nisl id consequat. Sed gravida bibendum odio, et vehicula nibh hendrerit eget. Cras sit amet semper sem. Vivamus non lorem sed ex fringilla malesuada consequat non arcu. Etiam nec sodales tortor. In scelerisque massa vitae purus suscipit consectetur. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Cras ultrices eros tortor, eu sollicitudin eros pellentesque sit amet. Integer rutrum velit eget libero efficitur, non auctor lorem rutrum. Vivamus porta dolor ut enim dapibus, nec rutrum nisi sagittis.</content>
        </item>
        <item>
            <title>My Great Title</title>
            <link>http://server.com/content/my-great-title</link>
            <tag>lorem</tag>
            <tag>ipsum</tag>
            <tag>arcu</tag>
            <tag>sic</tag>
            <description>This is a great description</description>
            <publishedAt>Sat, 08 Nov 2015 10:00:52 +0000</publishedAt>
            <isVisible>true</isVisible>
            <content>Praesent consectetur, dolor non vehicula ultrices, nisl libero feugiat ligula, ut faucibus metus arcu et dui. Curabitur eleifend feugiat posuere. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Donec cursus blandit lorem, ullamcorper vestibulum massa molestie non. Maecenas erat enim, pretium eget velit dapibus, consequat placerat eros. Nam vulputate nisi at urna gravida accumsan. Fusce id ultrices nunc. Aenean varius quam in tincidunt cursus. Quisque sed arcu est. Etiam dignissim, neque at maximus feugiat, turpis nunc sollicitudin eros, et lobortis enim dui sed felis. Nulla rhoncus diam porttitor ullamcorper imperdiet.</content>
        </item>
        <item>
            <title>My Title</title>
            <link>http://server.com/content/my-title</link>
            <tag>auctor</tag>
            <tag>felis</tag>
            <description>This is a simple description</description>
            <publishedAt>Sat, 05 Nov 2015 16:07:23 +0000</publishedAt>
            <isVisible>true</isVisible>
            <content>Ut luctus auctor varius. Donec vitae erat felis. Nam ac erat vulputate, consequat elit id, dictum urna. Vestibulum dignissim eget felis vitae tempor. Suspendisse molestie lectus at est accumsan, et porta sapien elementum. Vivamus pretium imperdiet nisl id consequat. Sed gravida bibendum odio, et vehicula nibh hendrerit eget. Cras sit amet semper sem. Vivamus non lorem sed ex fringilla malesuada consequat non arcu. Etiam nec sodales tortor. In scelerisque massa vitae purus suscipit consectetur. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Cras ultrices eros tortor, eu sollicitudin eros pellentesque sit amet. Integer rutrum velit eget libero efficitur, non auctor lorem rutrum. Vivamus porta dolor ut enim dapibus, nec rutrum nisi sagittis.</content>
        </item>
    </channel>
</root>

So far I have been trying expressions like the following:

//root/channel/item/title|//root/channel/item/link|//root/channel/item/tag

Unfortunately, the <item> tags are lost after applying the expression. Is there a way to filter the data while keeping the item tags?

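Your XPath expression is correct. It gives the right output, which is exactly what you asked for. You are globally (//) selecting the title, link, and tag element nodes, and that is what this kind of expression returns. No item element nodes are selected.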
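A minimal sketch of what that union selects, written in PHP with the DOM extension's DOMXPath class. The inline document is a shortened, hypothetical version of the one in the question, and the variable names are illustrative only.

```php
<?php
// Shortened sample of the question's document (an assumption for illustration).
$xml = <<<XML
<root>
  <channel>
    <item>
      <title>My Title</title>
      <link>http://server.com/content/my-title</link>
      <tag>auctor</tag>
      <tag>felis</tag>
      <description>This is a simple description</description>
    </item>
  </channel>
</root>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// The union returns the title, link and tag elements themselves.
$nodes = $xpath->query(
    '//root/channel/item/title|//root/channel/item/link|//root/channel/item/tag'
);

$selected = [];
foreach ($nodes as $node) {
    // The parent <item> is still reachable from each node,
    // but it is not part of the result set.
    $selected[] = $node->nodeName . ' (parent: ' . $node->parentNode->nodeName . ')';
}

print_r($selected);
```

The result is a flat list of four leaf elements. Serializing those nodes on their own is what makes the item wrapper appear to be lost.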

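To filter each item node down to the three tags mentioned, you have to iterate over all item nodes and filter their children (and possibly rebuild the item elements). A single global selection of the three elements (// ... | // ... | // ...) cannot do that.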

Because you did not provide a PHP snippet, I will illustrate this in XSLT:

What you did:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
 <xsl:output method="xml" version="1.0" encoding="UTF-8"/>
  <xsl:template match="/">
   <xsl:copy-of select="//root/channel/item/title|//root/channel/item/link|//root/channel/item/tag" />
  </xsl:template>
 </xsl:stylesheet>

What you (probably) should do instead:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
 <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
 <xsl:strip-space elements="*"/>

  <xsl:template match="root">
     <xsl:element name="root">          
       <xsl:for-each select="channel">  <!-- iterating over 'channel'-nodes -->
         <xsl:element name="channel">   <!-- reconstruct 'channel'-node  -->             
          <xsl:for-each select="item">     <!-- iterating over 'item'-nodes -->
            <xsl:element name="item">      <!-- reconstruct 'item'-node -->
              <xsl:copy-of select="title|link|tag" />    <!-- filtering each for the three elements -->
            </xsl:element>      
          </xsl:for-each>              
         </xsl:element>
       </xsl:for-each>           
     </xsl:element>
  </xsl:template>

 </xsl:stylesheet>
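The same iterate-and-rebuild idea can also be sketched directly in PHP with DOMDocument and DOMXPath. This is only one possible approach, not code from the original answer: the function name `filterItems`, its parameters, and the shortened sample document are hypothetical.

```php
<?php
// Hypothetical helper: rebuilds root/channel/item and copies only the
// child elements whose names are listed in $keep.
function filterItems(string $xml, array $keep): string
{
    $source = new DOMDocument();
    $source->loadXML($xml);
    $xpath = new DOMXPath($source);

    $target = new DOMDocument('1.0', 'UTF-8');
    $target->formatOutput = true;
    $root = $target->appendChild($target->createElement('root'));

    // Iterate over the channel nodes and reconstruct each one.
    foreach ($xpath->query('/root/channel') as $channel) {
        $newChannel = $root->appendChild($target->createElement('channel'));

        // Iterate over the item nodes and reconstruct each one.
        foreach ($xpath->query('item', $channel) as $item) {
            $newItem = $newChannel->appendChild($target->createElement('item'));

            // Filter the children of this item.
            foreach ($item->childNodes as $child) {
                if ($child instanceof DOMElement && in_array($child->nodeName, $keep, true)) {
                    $newItem->appendChild($target->importNode($child, true));
                }
            }
        }
    }

    return $target->saveXML();
}

// Shortened sample input (an assumption for illustration).
$xml = <<<XML
<root>
  <channel>
    <item>
      <title>My Title</title>
      <link>http://server.com/content/my-title</link>
      <tag>auctor</tag>
      <tag>felis</tag>
      <description>This is a simple description</description>
      <isVisible>true</isVisible>
    </item>
  </channel>
</root>
XML;

$result = filterItems($xml, ['title', 'link', 'tag']);
echo $result;
```

Because each item element is recreated before its children are copied, the wrapper survives and only the title, link, and tag children are carried over.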

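Consider an XSLT solution whenever you need to restructure an entire XML document. Like other general-purpose languages, PHP maintains an XSLT processor. Essentially, you need to drop the nodes you do not want. The script below runs the identity transform to copy the whole document as is, then adds an empty template that matches the unwanted nodes so they are omitted from the output. I include two equivalent solutions.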

XSLT script (save as an .xsl or .xslt file)

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>

  <!-- Identity Transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- SOLUTION 1-->
  <!-- <xsl:template match="description|publishedAt|isVisible|content"/> -->

  <!-- SOLUTION 2-->
  <xsl:template match="item/*[not(name()='title' or name()='link' or name()='tag')]"/>

</xsl:transform>

PHP script

<?php

// Load the XML source and XSLT file
$doc = new DOMDocument();    
$doc->load('Input.xml');

$xsl = new DOMDocument;
$xsl->load('XSLTScript.xsl');

// Configure the transformer
$proc = new XSLTProcessor;
$proc->importStyleSheet($xsl); 

// Transform XML source
$newXml = $proc->transformToXML($doc);

// Save output to file
$xmlfile = 'Output.xml';
file_put_contents($xmlfile, $newXml);

?>

Output

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <channel>
    <item>
      <title>My Second Great Title</title>
      <link>http://server.com/content/my-second-great-title</link>
      <tag>vuluptate</tag>
      <tag>id</tag>
      <tag>cras</tag>
      <tag>pretium</tag>
      <tag>conubia</tag>
      <tag>libero</tag>
    </item>
    <item>
      <title>My Great Title</title>
      <link>http://server.com/content/my-great-title</link>
      <tag>lorem</tag>
      <tag>ipsum</tag>
      <tag>arcu</tag>
      <tag>sic</tag>
    </item>
    <item>
      <title>My Title</title>
      <link>http://server.com/content/my-title</link>
      <tag>auctor</tag>
      <tag>felis</tag>
    </item>
  </channel>
</root>
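For comparison, the same pruning can be sketched without XSLT by removing the unwanted children in place with DOMXPath. This is an alternative under stated assumptions, not part of the original answer: the predicate mirrors the idea of SOLUTION 2 but uses the self:: axis, and the shortened sample document is hypothetical.

```php
<?php
// Shortened sample input (an assumption for illustration).
$xml = <<<XML
<root>
  <channel>
    <item>
      <title>My Title</title>
      <link>http://server.com/content/my-title</link>
      <tag>auctor</tag>
      <tag>felis</tag>
      <description>This is a simple description</description>
      <isVisible>true</isVisible>
    </item>
  </channel>
</root>
XML;

$doc = new DOMDocument();
$doc->preserveWhiteSpace = false; // must be set before loading
$doc->formatOutput = true;
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);

// Select every child of an item that is not a title, link or tag.
$unwanted = $xpath->query('//item/*[not(self::title or self::link or self::tag)]');

// Copy the node list to an array first, then detach each node from its parent.
foreach (iterator_to_array($unwanted) as $node) {
    $node->parentNode->removeChild($node);
}

$pruned = $doc->saveXML();
echo $pruned;
```

This modifies the loaded document directly, so it suits cases where the original tree is no longer needed afterwards.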
