Parsing XML with a namespace in PowerShell

I need some help understanding how to work with XML in PowerShell. I have several XML files like this:

<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="http://www.example.com/xml/catalog/2006-10-31">
    <product product-id="11210">
        ...
        <available-flag>true</available-flag>
        <online-flag>false</online-flag>
        <online-flag site-id="ru">true</online-flag>
        <online-flag site-id="fr">true</online-flag>
        <online-flag site-id="uk">false</online-flag>
        <online-flag site-id="de">true</online-flag>
        ...
    </product>
    <product product-id="50610">
        ...
        <available-flag>true</available-flag>
        <online-flag>true</online-flag>
        <online-flag site-id="ru">false</online-flag>
        <online-flag site-id="fr">true</online-flag>
        <online-flag site-id="uk">false</online-flag>
        <online-flag site-id="de">false</online-flag>
        ...
    </product>
    <product product-id="82929">
        ...
        <available-flag>true</available-flag>
        <online-flag>true</online-flag>
        <online-flag site-id="ru">false</online-flag>
        <online-flag site-id="fr">true</online-flag>
        <online-flag site-id="uk">false</online-flag>
        <online-flag site-id="de">true</online-flag>
        ...
    </product>
</catalog>

I need to get the values of two elements in PowerShell:

  • <online-flag> (without site-id attribute)
  • <online-flag site-id="ru">

for the product with product-id="50610" .

I have the following code:

$Path = "C:\Temp\0\2017-08-12_190211.xml"
$XPath = "/ns:catalog/ns:product[@product-id='50610']"

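# Keep files only (skip directories)
$files = Get-ChildItem $Path | Where-Object {-not $_.PSIsContainer}

if ($null -eq $files) {
    return
}

foreach ($file in $files) {
    # Use the full path so the file is found regardless of the current directory
    [xml]$xml = Get-Content $file.FullName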
    $namespace = $xml.DocumentElement.NamespaceURI
    $ns = New-Object System.Xml.XmlNamespaceManager($xml.NameTable)
    $ns.AddNamespace("ns", $namespace)
    $product = $xml.SelectSingleNode($XPath, $ns)
}

Several questions:

  1. With this code I am able to select the needed product node. PowerShell shows:

     online-flag : {true, online-flag, online-flag, online-flag...} 

    But how can I then select the values of the needed online-flag elements (ideally both ways: with XPath and with object notation)?

  2. Is it possible to select a node in the "object" way? Like this:

     $product = $xml.catalog.product | Where-Object {$_."product-id".value -eq "50610"} 
  3. If I have several files, what is the best way to select the filename, the global online-flag (without attributes), and a site-specific online-flag?

I was able to get the data I need with the "object" way:

$product = $xml.catalog.product | Where-Object {$_."product-id" -eq "50610"}
$of = $product."online-flag"
$glblsid = $of | Where-Object {$_ -is [System.String]}
$specsid = ($of | Where-Object {$_."site-id" -eq "ru"})."#text"

But I don't like the way I managed to do this. Is there a more convenient solution?

And the answer to the second question is yes: see the first line of the code above.
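The `-is [System.String]` filter above works because of how PowerShell's XML adapter exposes child elements: an element that has only text and no attributes comes back as a plain string, while an element with attributes comes back as an `XmlElement`. The following is a minimal sketch of that behaviour, using an inline sample trimmed from the question's catalog (the variable names are illustrative only):

```powershell
# Inline sample mirroring the question's catalog, trimmed to one product.
[xml]$xml = @'
<catalog xmlns="http://www.example.com/xml/catalog/2006-10-31">
    <product product-id="50610">
        <online-flag>true</online-flag>
        <online-flag site-id="ru">false</online-flag>
        <online-flag site-id="fr">true</online-flag>
    </product>
</catalog>
'@

# Dot notation ignores the namespace, so no namespace manager is needed here.
$product = $xml.catalog.product | Where-Object { $_.'product-id' -eq '50610' }
$of = $product.'online-flag'

# Type of each returned item: the attribute-free element is a String,
# the elements carrying a site-id attribute are XmlElement objects.
$types = $of | ForEach-Object { $_.GetType().Name }

# Global flag: the only item that is a plain string.
$glblsid = $of | Where-Object { $_ -is [string] }

# Site-specific flag: filter the XmlElement items by attribute, then read the text.
$specsid = ($of | Where-Object { $_ -isnot [string] -and $_.'site-id' -eq 'ru' }).'#text'

"$glblsid $specsid"
```

The type check is what makes this approach feel awkward: the selection depends on an adapter detail rather than on the structure of the document.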

Use two different XPath expressions:

  1. for selecting a node without a particular attribute:

     //ns:product[@product-id='50610']/ns:online-flag[not(@site-id)] 
  2. for selecting a node with a particular attribute value:

     //ns:product[@product-id='50610']/ns:online-flag[@site-id='ru'] 

You can select nodes relative to an already selected node by making the XPath expression relative to the current node ( . ):

$XPath = "/ns:catalog/ns:product[@product-id='50610']"
...
$product = $xml.SelectSingleNode($XPath, $ns)
$product.SelectSingleNode("./ns:online-flag[not(@site-id)]", $ns)
$product.SelectSingleNode("./ns:online-flag[@site-id='ru']", $ns)
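To try the relative expressions without a file on disk, here is a small self-contained sketch. The inline XML is a trimmed copy of the question's sample, and reading the value through `InnerText` is my choice for the sketch; `'#text'` as used elsewhere in this thread should work as well.

```powershell
# Inline sample with two products so the product-id predicate actually filters.
[xml]$xml = @'
<catalog xmlns="http://www.example.com/xml/catalog/2006-10-31">
    <product product-id="11210">
        <online-flag>false</online-flag>
        <online-flag site-id="ru">true</online-flag>
    </product>
    <product product-id="50610">
        <online-flag>true</online-flag>
        <online-flag site-id="ru">false</online-flag>
        <online-flag site-id="fr">true</online-flag>
    </product>
</catalog>
'@

# Map the document's default namespace to the prefix "ns" for use in XPath.
$ns = New-Object System.Xml.XmlNamespaceManager($xml.NameTable)
$ns.AddNamespace('ns', $xml.DocumentElement.NamespaceURI)

# Select the product once, then query relative to it.
$product = $xml.SelectSingleNode("/ns:catalog/ns:product[@product-id='50610']", $ns)

# Element without a site-id attribute (the global flag).
$globalFlag = $product.SelectSingleNode('./ns:online-flag[not(@site-id)]', $ns).InnerText

# Element whose site-id attribute equals "ru".
$ruFlag = $product.SelectSingleNode("./ns:online-flag[@site-id='ru']", $ns).InnerText

"$globalFlag $ruFlag"
```

Note that `SelectSingleNode` returns `$null` when nothing matches, so real code may want to check the result before reading a property from it.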

If you need result data consisting of the filename and the two node values I'd recommend building custom objects:

$files | ForEach-Object {
    [xml]$xml = Get-Content $_
    ...
    New-Object -Type PSObject -Property @{
        'Filename'  = $_
        'online'    = $product.SelectSingleNode("./ns:online-flag[not(@site-id)]", $ns).'#text'
        'ru_online' = $product.SelectSingleNode("./ns:online-flag[@site-id='ru']", $ns).'#text'
    }
}
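As an alternative sketch of the same idea, the built-in `Select-Xml` cmdlet accepts the namespace mapping as a hashtable, so no `XmlNamespaceManager` is needed. The function name `Get-OnlineFlag`, its parameters, and the temporary sample file are hypothetical and exist only for illustration; the namespace URI is the one from the question's sample.

```powershell
function Get-OnlineFlag {
    param(
        [Parameter(Mandatory = $true)] [string[]] $Path,
        [string] $ProductId = '50610',
        [string] $SiteId = 'ru'
    )
    # Prefix-to-URI mapping used by the XPath expressions below.
    $nsMap = @{ ns = 'http://www.example.com/xml/catalog/2006-10-31' }
    $base  = "/ns:catalog/ns:product[@product-id='" + $ProductId + "']/ns:online-flag"

    foreach ($p in $Path) {
        # Select-Xml returns match objects; the matched node is in .Node.
        $global = Select-Xml -LiteralPath $p -XPath ($base + '[not(@site-id)]') -Namespace $nsMap |
            Select-Object -First 1
        $site   = Select-Xml -LiteralPath $p -XPath ($base + "[@site-id='" + $SiteId + "']") -Namespace $nsMap |
            Select-Object -First 1

        # One result object per file: filename plus the two flag values.
        [pscustomobject]@{
            Filename  = Split-Path -Leaf $p
            online    = $global.Node.InnerText
            ru_online = $site.Node.InnerText
        }
    }
}

# Usage: write a small sample file to the temp directory and query it.
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'catalog-sample.xml'
@'
<catalog xmlns="http://www.example.com/xml/catalog/2006-10-31">
    <product product-id="50610">
        <online-flag>true</online-flag>
        <online-flag site-id="ru">false</online-flag>
    </product>
</catalog>
'@ | Set-Content -LiteralPath $sample -Encoding UTF8

$result = Get-OnlineFlag -Path $sample
$result
```

This version parses each file twice, once per expression, which is a reasonable trade for brevity on small files but probably not for large ones; loading the document once and calling `SelectSingleNode` as shown above avoids that.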

Using dot-notation and filtering via Where-Object should be possible, but I wouldn't recommend it. I find XPath far more efficient.

To complete this topic, I measured the performance of three methods: dot style, XPath on the file, and XPath on the node. There is no significant difference between them. Here are the details.

I parsed two files of 60 MB each, running every method twice.

  1. Object style (dot style)

     ...
     $StartTime = Get-Date
     foreach ($file in $files) {
         [xml]$xml = Get-Content $file
         #Object style
         $product = $xml.catalog.product | Where-Object {$_."product-id" -eq "50610"}
         $of = $product."online-flag"
         $glblsid = $of | Where-Object {$_ -is [System.String]}
         $specsid = ($of | Where-Object {$_."site-id" -eq "ru"})."#text"
         Write-Output "$($file.Name) $glblsid $specsid"
     }
     $EndTime = Get-Date
     $TimeSpan = New-TimeSpan -Start $StartTime -End $EndTime
     Write-Output $TimeSpan.TotalMilliseconds

    Results:

    PS> .\ParseXML2.ps1
    2017-08-10_190159.xml false false
    2017-08-11_190203.xml false true
    36269,535
    PS> .\ParseXML2.ps1
    2017-08-10_190159.xml false false
    2017-08-11_190203.xml false true
    36628,3304

  2. XPath on the file:

     ...
     $StartTime = Get-Date
     foreach ($file in $files) {
         [xml]$xml = Get-Content $file
         #XPath on the file
         $namespace = $xml.DocumentElement.NamespaceURI
         $ns = New-Object System.Xml.XmlNamespaceManager($xml.NameTable)
         $ns.AddNamespace("ns", $namespace)
         $glblsid = $xml.SelectSingleNode("/ns:catalog/ns:product[@product-id='50610']/ns:online-flag[not(@site-id)]", $ns).'#text'
         $specsid = $xml.SelectSingleNode("/ns:catalog/ns:product[@product-id='50610']/ns:online-flag[@site-id='ru']", $ns).'#text'
         Write-Output "$($file.Name) $glblsid $specsid"
     }
     $EndTime = Get-Date
     $TimeSpan = New-TimeSpan -Start $StartTime -End $EndTime
     Write-Output $TimeSpan.TotalMilliseconds

    Results:

    PS> .\ParseXML2.ps1
    2017-08-10_190159.xml false false
    2017-08-11_190203.xml false true
    36129,1368
    PS> .\ParseXML2.ps1
    2017-08-10_190159.xml false false
    2017-08-11_190203.xml false true
    38890,3014

  3. XPath on the node:

     ...
     $StartTime = Get-Date
     foreach ($file in $files) {
         [xml]$xml = Get-Content $file
         #XPath on the node
         $namespace = $xml.DocumentElement.NamespaceURI
         $ns = New-Object System.Xml.XmlNamespaceManager($xml.NameTable)
         $ns.AddNamespace("ns", $namespace)
         $product = $xml.SelectSingleNode("/ns:catalog/ns:product[@product-id='50610']", $ns)
         $glblsid = $product.SelectSingleNode("ns:online-flag[not(@site-id)]", $ns).'#text'
         $specsid = $product.SelectSingleNode("ns:online-flag[@site-id='ru']", $ns).'#text'
         Write-Output "$($file.Name) $glblsid $specsid"
     }
     $EndTime = Get-Date
     $TimeSpan = New-TimeSpan -Start $StartTime -End $EndTime
     Write-Output $TimeSpan.TotalMilliseconds

    Results:

    PS> .\ParseXML2.ps1
    2017-08-10_190159.xml false false
    2017-08-11_190203.xml false true
    33477,1708
    PS> .\ParseXML2.ps1
    2017-08-10_190159.xml false false
    2017-08-11_190203.xml false true
    34116,7626
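
The timings above include loading and parsing each file, which may well dominate the total for 60 MB documents (that is my guess, not something I measured separately). To compare only the lookup step, a sketch along the following lines could be used. It relies on `Measure-Command` instead of subtracting two `Get-Date` values; the inline document and the repetition count `$runs` are assumptions for illustration, and no timings are shown because they depend on the machine.

```powershell
# Small inline document; the real files in this thread are far larger.
[xml]$xml = @'
<catalog xmlns="http://www.example.com/xml/catalog/2006-10-31">
    <product product-id="50610">
        <online-flag>true</online-flag>
        <online-flag site-id="ru">false</online-flag>
    </product>
</catalog>
'@

$ns = New-Object System.Xml.XmlNamespaceManager($xml.NameTable)
$ns.AddNamespace('ns', $xml.DocumentElement.NamespaceURI)

$runs = 1000   # hypothetical repetition count, adjust as needed

# Dot style: filter the adapted objects with Where-Object.
$dotStyle = Measure-Command {
    foreach ($i in 1..$runs) {
        $p = $xml.catalog.product | Where-Object { $_.'product-id' -eq '50610' }
        $null = $p.'online-flag' | Where-Object { $_ -is [string] }
    }
}

# XPath style: one absolute expression against the already loaded document.
$xpathStyle = Measure-Command {
    foreach ($i in 1..$runs) {
        $null = $xml.SelectSingleNode(
            "/ns:catalog/ns:product[@product-id='50610']/ns:online-flag[not(@site-id)]", $ns)
    }
}

# Measure-Command returns a TimeSpan for each script block.
'dot style: {0:N1} ms, XPath: {1:N1} ms' -f $dotStyle.TotalMilliseconds, $xpathStyle.TotalMilliseconds
```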
