I need a little help understanding how to work with XML in PowerShell. I have several XML files like this:
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="http://www.example.com/xml/catalog/2006-10-31">
<product product-id="11210">
...
<available-flag>true</available-flag>
<online-flag>false</online-flag>
<online-flag site-id="ru">true</online-flag>
<online-flag site-id="fr">true</online-flag>
<online-flag site-id="uk">false</online-flag>
<online-flag site-id="de">true</online-flag>
...
</product>
<product product-id="50610">
...
<available-flag>true</available-flag>
<online-flag>true</online-flag>
<online-flag site-id="ru">false</online-flag>
<online-flag site-id="fr">true</online-flag>
<online-flag site-id="uk">false</online-flag>
<online-flag site-id="de">false</online-flag>
...
</product>
<product product-id="82929">
...
<available-flag>true</available-flag>
<online-flag>true</online-flag>
<online-flag site-id="ru">false</online-flag>
<online-flag site-id="fr">true</online-flag>
<online-flag site-id="uk">false</online-flag>
<online-flag site-id="de">true</online-flag>
...
</product>
</catalog>
I need to get the values of two elements in PowerShell:
<online-flag> (without the site-id attribute)
<online-flag site-id="ru">
for the product with product-id="50610".
I have the following code:
$Path = "C:\Temp\0\2017-08-12_190211.xml"
$XPath = "/ns:catalog/ns:product[@product-id='50610']"
$files = Get-ChildItem $Path | Where {-not $_.PSIsContainer}
# Put $null on the left so the comparison is not applied element-wise to an array
if ($null -eq $files) {
    return
}
foreach ($file in $files) {
    [xml]$xml = Get-Content $file
    # The document uses a default namespace, so XPath needs a prefix bound to it
    $namespace = $xml.DocumentElement.NamespaceURI
    $ns = New-Object System.Xml.XmlNamespaceManager($xml.NameTable)
    $ns.AddNamespace("ns", $namespace)
    $product = $xml.SelectSingleNode($XPath, $ns)
}
Several questions:
With this code I am able to select the needed product node. PowerShell shows:
online-flag : {true, online-flag, online-flag, online-flag...}
But how can I then select the values of the needed online-flag elements (both ways, if possible: the XPath way and the object way)?
Is it possible to select a node in the "object" way? Like this:
$product = $xml.catalog.product | Where-Object {$_."product-id".value -eq "50610"}
If I have several files, what is the best way to select the filename, the global online-flag (without attributes), and a specific online-flag?
I was able to get the data I need with the "object" way:
$product = $xml.catalog.product | Where-Object {$_."product-id" -eq "50610"}
$of = $product."online-flag"
$glblsid = $of | Where-Object {$_ -is [System.String]}
$specsid = ($of | Where-Object {$_."site-id" -eq "ru"})."#text"
But I don't like the way I managed to do this. Is there a more convenient solution?
And the answer to the second question is yes; see the first line of the snippet above.
Use two different XPath expressions:
for selecting a node without a particular attribute:
//ns:product[@product-id='50610']/ns:online-flag[not(@site-id)]
for selecting a node with a particular attribute value:
//ns:product[@product-id='50610']/ns:online-flag[@site-id='ru']
You can select nodes relative to an already selected node by making the XPath expression relative to the current node (.):
$XPath = "/ns:catalog/ns:product[@product-id='50610']"
...
$product = $xml.SelectSingleNode($XPath, $ns)
$product.SelectSingleNode("./ns:online-flag[not(@site-id)]", $ns)
$product.SelectSingleNode("./ns:online-flag[@site-id='ru']", $ns)
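To make the two relative expressions easier to try out, here is a minimal self-contained sketch. It assumes a trimmed inline copy of the catalog from the question instead of a file, and it reads the value through InnerText rather than '#text'; the variable names $globalFlag and $ruFlag are only illustrative.

```powershell
# Trimmed inline sample with the same namespace as the question's files.
[xml]$xml = @'
<catalog xmlns="http://www.example.com/xml/catalog/2006-10-31">
  <product product-id="11210">
    <online-flag>false</online-flag>
    <online-flag site-id="ru">true</online-flag>
  </product>
  <product product-id="50610">
    <online-flag>true</online-flag>
    <online-flag site-id="ru">false</online-flag>
    <online-flag site-id="fr">true</online-flag>
  </product>
</catalog>
'@

# Bind the prefix "ns" to the document's default namespace.
$ns = New-Object System.Xml.XmlNamespaceManager($xml.NameTable)
$ns.AddNamespace('ns', $xml.DocumentElement.NamespaceURI)

# Select the product once, then query relative to it.
$product    = $xml.SelectSingleNode("/ns:catalog/ns:product[@product-id='50610']", $ns)
$globalFlag = $product.SelectSingleNode("./ns:online-flag[not(@site-id)]", $ns).InnerText
$ruFlag     = $product.SelectSingleNode("./ns:online-flag[@site-id='ru']", $ns).InnerText

"global=$globalFlag ru=$ruFlag"   # prints: global=true ru=false
```

Note that SelectSingleNode returns $null when nothing matches, so on real files it may be worth checking $product before querying it further.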
If you need result data consisting of the filename and the two node values, I'd recommend building custom objects:
$files | ForEach-Object {
    [xml]$xml = Get-Content $_
    ...
    New-Object -Type PSObject -Property @{
        'Filename'  = $_
        'online'    = $product.SelectSingleNode("./ns:online-flag[not(@site-id)]", $ns).'#text'
        'ru_online' = $product.SelectSingleNode("./ns:online-flag[@site-id='ru']", $ns).'#text'
    }
}
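As a possible variation on the custom-object approach, the sketch below uses the built-in Select-Xml cmdlet, which accepts the namespace mapping as a hashtable so no XmlNamespaceManager is needed. This is not the code from the answer above: the temporary sample file, the names $sample, $nsMap and $rows, and the choice to split the matched elements with HasAttribute/GetAttribute are all assumptions made for illustration.

```powershell
# Hypothetical setup: write a small sample file so the sketch runs on its own.
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'catalog-sample.xml'
@'
<catalog xmlns="http://www.example.com/xml/catalog/2006-10-31">
  <product product-id="50610">
    <online-flag>true</online-flag>
    <online-flag site-id="ru">false</online-flag>
    <online-flag site-id="fr">true</online-flag>
  </product>
</catalog>
'@ | Set-Content -Path $sample -Encoding UTF8

# Prefix-to-URI mapping for Select-Xml's -Namespace parameter.
$nsMap = @{ ns = 'http://www.example.com/xml/catalog/2006-10-31' }
$xpath = "/ns:catalog/ns:product[@product-id='50610']/ns:online-flag"

$rows = Get-ChildItem -Path $sample | ForEach-Object {
    # One Select-Xml call per file; each result exposes the matched element as .Node.
    $flags = Select-Xml -Path $_.FullName -XPath $xpath -Namespace $nsMap |
        ForEach-Object { $_.Node }

    [pscustomobject]@{
        Filename  = $_.Name
        online    = ($flags | Where-Object { -not $_.HasAttribute('site-id') }).InnerText
        ru_online = ($flags | Where-Object { $_.GetAttribute('site-id') -eq 'ru' }).InnerText
    }
}

$rows
Remove-Item -Path $sample
```

Because each file yields one object, the result can be piped straight to Format-Table or Export-Csv.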
Using dot-notation and filtering via Where-Object should be possible, but I wouldn't recommend it. I find XPath far more efficient.
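For reference, here is a small sketch of what the dot-notation route looks like and why it is somewhat awkward. With PowerShell's XML adaptation, an element that has only text and no attributes is surfaced as a plain string, while an element with attributes stays an XmlElement, so the two kinds of online-flag have to be told apart by type. The inline sample and the variable names are assumptions for illustration.

```powershell
[xml]$xml = @'
<catalog xmlns="http://www.example.com/xml/catalog/2006-10-31">
  <product product-id="50610">
    <online-flag>true</online-flag>
    <online-flag site-id="ru">false</online-flag>
    <online-flag site-id="fr">true</online-flag>
  </product>
</catalog>
'@

# Dot-notation ignores the namespace; attributes are read like properties.
$product = $xml.catalog.product | Where-Object { $_.'product-id' -eq '50610' }
$flags   = $product.'online-flag'

# The attribute-less element arrives as a String, the others as XmlElement.
$typeNames = $flags | ForEach-Object { $_.GetType().Name }
$typeNames -join ', '   # String, XmlElement, XmlElement

$globalFlag = $flags | Where-Object { $_ -is [string] }
$ruFlag     = ($flags | Where-Object { $_.'site-id' -eq 'ru' }).'#text'
"global=$globalFlag ru=$ruFlag"   # prints: global=true ru=false
```

The type check is what makes this version fragile: if the global element ever gained an attribute, it would stop being a string and the filter would silently return nothing.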
To complete this topic, I measured the performance of three methods: dot style, XPath on the file, and XPath on the node. There is no significant difference between them. Here are the details.
I parsed two files of 60 MB each and ran each script twice.
Object style (dot style):
...
$StartTime = Get-Date
foreach ($file in $files) {
    [xml]$xml = Get-Content $file
    # Object style
    $product = $xml.catalog.product | Where-Object {$_."product-id" -eq "50610"}
    $of = $product."online-flag"
    $glblsid = $of | Where-Object {$_ -is [System.String]}
    $specsid = ($of | Where-Object {$_."site-id" -eq "ru"})."#text"
    Write-Output "$($file.Name) $glblsid $specsid"
}
$EndTime = Get-Date
$TimeSpan = New-TimeSpan -Start $StartTime -End $EndTime
Write-Output $TimeSpan.TotalMilliseconds
Results:
PS> .\ParseXML2.ps1
2017-08-10_190159.xml false false
2017-08-11_190203.xml false true
36269,535
PS> .\ParseXML2.ps1
2017-08-10_190159.xml false false
2017-08-11_190203.xml false true
36628,3304
XPath on the file:
...
$StartTime = Get-Date
foreach ($file in $files) {
    [xml]$xml = Get-Content $file
    # XPath on the file
    $namespace = $xml.DocumentElement.NamespaceURI
    $ns = New-Object System.Xml.XmlNamespaceManager($xml.NameTable)
    $ns.AddNamespace("ns", $namespace)
    $glblsid = $xml.SelectSingleNode("/ns:catalog/ns:product[@product-id='50610']/ns:online-flag[not(@site-id)]", $ns).'#text'
    $specsid = $xml.SelectSingleNode("/ns:catalog/ns:product[@product-id='50610']/ns:online-flag[@site-id='ru']", $ns).'#text'
    Write-Output "$($file.Name) $glblsid $specsid"
}
$EndTime = Get-Date
$TimeSpan = New-TimeSpan -Start $StartTime -End $EndTime
Write-Output $TimeSpan.TotalMilliseconds
Results:
PS> .\ParseXML2.ps1
2017-08-10_190159.xml false false
2017-08-11_190203.xml false true
36129,1368
PS> .\ParseXML2.ps1
2017-08-10_190159.xml false false
2017-08-11_190203.xml false true
38890,3014
XPath on the node:
...
$StartTime = Get-Date
foreach ($file in $files) {
    [xml]$xml = Get-Content $file
    # XPath on the node
    $namespace = $xml.DocumentElement.NamespaceURI
    $ns = New-Object System.Xml.XmlNamespaceManager($xml.NameTable)
    $ns.AddNamespace("ns", $namespace)
    $product = $xml.SelectSingleNode("/ns:catalog/ns:product[@product-id='50610']", $ns)
    $glblsid = $product.SelectSingleNode("ns:online-flag[not(@site-id)]", $ns).'#text'
    $specsid = $product.SelectSingleNode("ns:online-flag[@site-id='ru']", $ns).'#text'
    Write-Output "$($file.Name) $glblsid $specsid"
}
$EndTime = Get-Date
$TimeSpan = New-TimeSpan -Start $StartTime -End $EndTime
Write-Output $TimeSpan.TotalMilliseconds
Results:
PS> .\ParseXML2.ps1
2017-08-10_190159.xml false false
2017-08-11_190203.xml false true
33477,1708
PS> .\ParseXML2.ps1
2017-08-10_190159.xml false false
2017-08-11_190203.xml false true
34116,7626
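The timings above were taken by subtracting two Get-Date values. As a side note, the same kind of measurement can be sketched with the built-in Measure-Command cmdlet or a System.Diagnostics.Stopwatch. The tiny inline document below is only a placeholder workload, not the 60 MB files from the test, so the numbers it prints say nothing about the three methods.

```powershell
# Placeholder workload: parse a tiny document and run one XPath query.
$work = {
    [xml]$doc = '<catalog><product product-id="50610"><online-flag>true</online-flag></product></catalog>'
    $null = $doc.SelectSingleNode("/catalog/product[@product-id='50610']/online-flag")
}

# Measure-Command runs the script block and returns a TimeSpan.
# Output produced inside the block is discarded, so print results separately.
$elapsed = Measure-Command -Expression $work
Write-Output $elapsed.TotalMilliseconds

# Stopwatch alternative, useful when the timed code should still emit its output.
$sw = [System.Diagnostics.Stopwatch]::StartNew()
& $work
$sw.Stop()
Write-Output $sw.Elapsed.TotalMilliseconds
```

With files this large, most of the time in all three variants is probably spent loading the document, which would be consistent with the small differences measured above.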