
Remove all nodes from XML but specific ones in PHP

I have an XML feed from Google with the following content:

<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
 <channel>
  <title>E-commerce's products.</title>
  <description><![CDATA[Clothing and accessories.]]></description>
  <link>https://www.ourwebsite.com/</link>
  <item>
   <title><![CDATA[Product #1 title]]></title>
   <g:brand><![CDATA[Product #1 brand]]></g:brand>
   <g:mpn><![CDATA[5643785645]]></g:mpn>
   <g:gender>Male</g:gender>
   <g:age_group>Adult</g:age_group>
   <g:size>Unica</g:size>
   <g:condition>new</g:condition>
   <g:id>fr_30763_06352</g:id>
   <g:item_group_id>fr_30763</g:item_group_id>
   <link><![CDATA[https://www.ourwebsite.com/product_1_url.htm?mid=62367]]></link>
   <description><![CDATA[Product #1 description]]></description>
   <g:image_link><![CDATA[https://data.ourwebsite.com/imgprodotto/product-1_big.jpg]]></g:image_link>
   <g:sale_price>29.25 EUR</g:sale_price>
   <g:price>65.00 EUR</g:price>
   <g:shipping_weight>0.5 kg</g:shipping_weight>
   <g:featured_product>y</g:featured_product>
   <g:product_type><![CDATA[Product #1 category]]></g:product_type>
   <g:availability>in stock</g:availability>
   <g:availability_date>2022-08-10T00:00-0000</g:availability_date>
   <qty>3</qty>
   <g:payment_accepted>Visa</g:payment_accepted>
   <g:payment_accepted>MasterCard</g:payment_accepted>
   <g:payment_accepted>CartaSi</g:payment_accepted>
   <g:payment_accepted>Aura</g:payment_accepted>
   <g:payment_accepted>PayPal</g:payment_accepted>
  </item>
  <item>
   <title><![CDATA[Product #2 title]]></title>
   <g:brand><![CDATA[Product #2 brand]]></g:brand>
   <g:mpn><![CDATA[573489547859]]></g:mpn>
   <g:gender>Unisex</g:gender>
   <g:age_group>Adult</g:age_group>
   <g:size>Unica</g:size>
   <g:condition>new</g:condition>
   <g:id>fr_47362_382936</g:id>
   <g:item_group_id>fr_47362</g:item_group_id>
   <link><![CDATA[https://www.ourwebsite.com/product_2_url.htm?mid=168192]]></link>
   <description><![CDATA[Product #2 description]]></description>
   <g:image_link><![CDATA[https://data.ourwebsite.com/imgprodotto/product-2_big.jpg]]></g:image_link>
   <g:sale_price>143.91 EUR</g:sale_price>
   <g:price>159.90 EUR</g:price>
   <g:shipping_weight>8.0 kg</g:shipping_weight>
   <g:product_type><![CDATA[Product #2 category]]></g:product_type>
   <g:availability>in stock</g:availability>
   <g:availability_date>2022-08-10T00:00-0000</g:availability_date>
   <qty>1</qty>
   <g:payment_accepted>Visa</g:payment_accepted>
   <g:payment_accepted>MasterCard</g:payment_accepted>
   <g:payment_accepted>CartaSi</g:payment_accepted>
   <g:payment_accepted>Aura</g:payment_accepted>
   <g:payment_accepted>PayPal</g:payment_accepted>
  </item>
  ...
 </channel>
</rss>

I need to generate an XML file in which every tag inside <item> is removed except <g:mpn>, <link>, <g:sale_price> and <qty>.

For the example above, the result should be:

<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
 <channel>
  <title>E-commerce's products.</title>
  <description><![CDATA[Clothing and accessories.]]></description>
  <link>https://www.ourwebsite.com/</link>
  <item>
   <g:mpn><![CDATA[5643785645]]></g:mpn>
   <link><![CDATA[https://www.ourwebsite.com/product_1_url.htm?mid=62367]]></link>
   <g:sale_price>29.25 EUR</g:sale_price>
   <qty>3</qty>
  </item>
  <item>
   <g:mpn><![CDATA[573489547859]]></g:mpn>
   <link><![CDATA[https://www.ourwebsite.com/product_2_url.htm?mid=168192]]></link>
   <g:sale_price>143.91 EUR</g:sale_price>
   <qty>1</qty>
  </item>
  ...
 </channel>
</rss>

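I have looked at the SimpleXML, DOMDocument and XPath documentation, but I couldn't find a way to exclude specific elements. I don't want to select by name the nodes I have to delete, because Google may add new nodes in the future and my script would then fail to remove them.

I also tried iterating over the namespaced elements with SimpleXML and unsetting each one that doesn't match a node I have to keep: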

$g = $element->children($namespaces['g']); //$element is the SimpleXMLElement of <item> tag
foreach ($g as $gchild) {
    if ($gchild->getName() != "mpn") {  //for example
        unset($gchild);
    }
}

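However, the code above does not remove the nodes other than <g:mpn>, for example.

PS: Keep in mind that the XML contains both namespaced and non-namespaced elements.

Thank you in advance.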

EDIT: I have managed to do it with the following code:

$elementsToKeep = array("mpn", "link", "sale_price", "qty");

$domdoc = new DOMDocument();
$domdoc->preserveWhiteSpace = FALSE;
$domdoc->formatOutput = TRUE;
$domdoc->loadXML($myXMLDocument->asXML());  //$myXMLDocument is the SimpleXML document related to the original XML
$xpath = new DOMXPath($domdoc);

foreach ($element->children() as $child) {
    $cname = $child->getName();
    if (!in_array($cname, $elementsToKeep)) {
        foreach($xpath->query('/rss/channel/item/'.$cname) as $node) {
            $node->parentNode->removeChild($node);
        }
    }
}

$g = $element->children($namespaces['g']);
foreach ($g as $gchild) {
    $gname = $gchild->getName();
    if (!in_array($gname, $elementsToKeep)) {
        foreach($xpath->query('/rss/channel/item/g:'.$gname) as $node) {
            $node->parentNode->removeChild($node);
        }
    }
}

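I used DOMDocument and DOMXPath with two loops, one over the non-namespaced tags and one over the namespaced tags, so that I could use DOMDocument's removeChild function.

Is there really no cleaner solution? Thanks again.

Slightly simpler: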

// $xpath and $elementsToKeep are the ones defined in the question's code
$items = $xpath->query('//item');
foreach ($items as $item) {
    // every element below this <item>
    $targets = $xpath->query('.//*', $item);
    foreach ($targets as $target) {
        if (!in_array($target->localName, $elementsToKeep)) {
            $target->parentNode->removeChild($target);
        }
    }
}
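
One caveat with comparing only localName: an element such as <g:link> would be kept alongside the plain <link>, because both share the local name "link". A minimal sketch of a namespace-aware variant follows, assuming a whitelist keyed by namespace URI. The sample feed, the <g:link> element and the variable names are hypothetical and only serve the illustration.

```php
<?php
// Hypothetical feed: <g:link> shares its local name with the plain <link>.
$xml = <<<XML
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
 <channel>
  <item>
   <g:mpn>5643785645</g:mpn>
   <link>https://www.ourwebsite.com/product_1_url.htm</link>
   <g:link>https://www.ourwebsite.com/other.htm</g:link>
   <g:brand>Product #1 brand</g:brand>
   <qty>3</qty>
  </item>
 </channel>
</rss>
XML;

$gNs = 'http://base.google.com/ns/1.0';

// Whitelist keyed by namespace URI ('' means "no namespace"), then local name.
$keep = [
    $gNs => ['mpn', 'sale_price'],
    ''   => ['link', 'qty'],
];

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Only direct children of <item> are inspected.
foreach ($xpath->query('//item/*') as $child) {
    // namespaceURI is null for non-namespaced elements, so cast it to ''.
    $ns = (string) $child->namespaceURI;
    if (!in_array($child->localName, $keep[$ns] ?? [], true)) {
        $child->parentNode->removeChild($child);
    }
}

echo $doc->saveXML(), "\n";
```

With this whitelist, <g:link> and <g:brand> are removed while <link> stays, since each name is only accepted in the namespace it is listed under.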

Use XPath to express all the child elements you want to remove.

Then use the library of your choice to remove those elements.

SimpleXMLElement example:

$sxe = simplexml_load_string($xml);
foreach ($sxe->xpath('//item/*[
    not(
           name() = "g:mpn" 
        or name() = "link" 
        or name() = "g:sale_price" 
        or name() = "qty"
    )
]') as $child) unset($child[0]);


echo $sxe->asXML(), "\n";
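
The unset($child[0]) form is what makes the removal take effect. Unsetting the loop variable itself, as in the question's loop, only discards the PHP variable and leaves the document untouched. The sketch below contrasts the two on a hypothetical two-element item; the sample XML and variable names are made up for the illustration.

```php
<?php
// Hypothetical miniature feed, for illustration only.
$xml = <<<XML
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
 <channel>
  <item>
   <g:mpn>5643785645</g:mpn>
   <g:brand>Some brand</g:brand>
  </item>
 </channel>
</rss>
XML;

$ns = 'http://base.google.com/ns/1.0';

// Attempt 1: unset the loop variable. Only the PHP variable goes away.
$sxe = simplexml_load_string($xml);
foreach ($sxe->channel->item->children($ns) as $gchild) {
    if ($gchild->getName() !== 'mpn') {
        unset($gchild);
    }
}
$afterVariableUnset = count($sxe->channel->item->children($ns));

// Attempt 2: unset offset 0 of the element, which detaches the node itself.
$sxe = simplexml_load_string($xml);
// Collect first, so the child list is not modified while it is iterated.
$toRemove = [];
foreach ($sxe->channel->item->children($ns) as $gchild) {
    if ($gchild->getName() !== 'mpn') {
        $toRemove[] = $gchild;
    }
}
foreach ($toRemove as $node) {
    unset($node[0]);
}
$afterSelfUnset = count($sxe->channel->item->children($ns));

// Number of g: children left after each attempt.
echo $afterVariableUnset, ' ', $afterSelfUnset, "\n";
```

Collecting the nodes before removing them is a defensive choice here, not something the original answer requires.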

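DOMDocument example:

This is essentially the same as the previous example, with the XPath expression modified to use the elements' namespace URIs explicitly. This prevents it from breaking if the namespace prefix changes (the same expression also works in the SimpleXMLElement example):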

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

foreach ($xpath->query('//item/*[
    not(
           (local-name() = "mpn"        and namespace-uri() = "http://base.google.com/ns/1.0") 
        or (local-name() = "link"       and namespace-uri() = "") 
        or (local-name() = "sale_price" and namespace-uri() = "http://base.google.com/ns/1.0") 
        or (local-name() = "qty"        and namespace-uri() = "")
    )
]') as $child) {
    $child->parentNode->removeChild($child);
}

echo $doc->saveXML(), "\n";
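
The same namespace safety can probably be had with a shorter expression by registering a prefix on the DOMXPath object and testing each element with the self:: axis. The following is a sketch under that assumption, not part of the original answer. The sample feed is a cut-down, hypothetical version of the one in the question, and the $keep list and the way the expression is assembled are illustrative.

```php
<?php
// Hypothetical miniature feed; element names mirror the question's feed.
$xml = <<<XML
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
 <channel>
  <link>https://www.ourwebsite.com/</link>
  <item>
   <title>Product #1 title</title>
   <g:mpn>5643785645</g:mpn>
   <g:brand>Product #1 brand</g:brand>
   <link>https://www.ourwebsite.com/product_1_url.htm</link>
   <g:sale_price>29.25 EUR</g:sale_price>
   <qty>3</qty>
  </item>
 </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
$doc->formatOutput = true;
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);
// Bind our own "g" prefix to the namespace URI, so the expression below
// does not depend on whichever prefix the feed itself happens to use.
$xpath->registerNamespace('g', 'http://base.google.com/ns/1.0');

// Qualified names to keep, written with the prefix registered above.
$keep = ['g:mpn', 'link', 'g:sale_price', 'qty'];

// Builds: //item/*[not(self::g:mpn or self::link or ...)]
$tests = array_map(function ($name) {
    return 'self::' . $name;
}, $keep);
$expr = '//item/*[not(' . implode(' or ', $tests) . ')]';

foreach ($xpath->query($expr) as $node) {
    $node->parentNode->removeChild($node);
}

echo $doc->saveXML(), "\n";
```

An unprefixed name such as self::link matches only elements in no namespace, so the channel-level and item-level <link> elements are treated the same way as in the expression above, and elements outside <item> are never selected.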
