
Parsing an XML file with PowerShell using a node path from a variable

Hello dear fellow PowerShell users,

I'm trying to parse XML files, which can differ in structure. Therefore, I want to access the node values based on a node structure received from a variable.

Example:

#XML file
$xml = [xml] @'
<node1>
    <node2>
        <node3>
            <node4>test1</node4>
        </node3>
    </node2>
</node1>
'@

Accessing the values directly works:

#access XML node directly -works-
$xml.node1.node2.node3.node4        # working <OK>

Accessing the values via node information from a variable does not work:

#access XML node via path from variable -does not work-
$testnodepath = 'node1.node2.node3.node4'

$xml.$testnodepath                  # NOT working
$xml.$($testnodepath)               # NOT working

Is there a way to access the XML node values directly, using node information received from a variable?

PS: I am aware that there is a way via SelectNodes, but I assume that is inefficient, since it basically searches for keywords.

#Working - but inefficient
$testnodepath = 'node1/node2/node3/node4'
$xml.SelectNodes($testnodepath)

I need a very efficient way of parsing the XML file, since I will need to parse huge XML files. Is there a way to directly access the node values in the form $xml.node1.node2.node3.node4 by receiving the node structure from a variable?

You might use $ExecutionContext.InvokeCommand.ExpandString() for this:

$ExecutionContext.InvokeCommand.ExpandString("`$(`$xml.$testnodepath)")
test1

If the node path ($testnodepath) comes from outside (e.g. a parameter), you might want to prevent malicious code injection by stripping any character that is not a word character or a dot (.):

$securenodepath = $testnodepath -Replace '[^\w\.]'
$ExecutionContext.InvokeCommand.ExpandString("`$(`$xml.$securenodepath)")
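The following sketch shows why the stripping step matters. It runs against the sample document from the question and uses a made-up untrusted input.

- Without sanitizing, the ';' lets a second statement run inside the subexpression that is built for ExpandString.
- After -Replace '[^\w\.]', the same input collapses into a harmless, non-existent property name.

This assumes the default (non-strict) mode, where a missing property simply yields $null.

```powershell
# Sample document from the question
$xml = [xml] '<node1><node2><node3><node4>test1</node4></node3></node2></node1>'

# Made-up untrusted input that smuggles a second statement into the path
$untrusted = 'node1.node2.node3.node4; Write-Output injected'

# Unsanitized: this expands "$($xml.node1.node2.node3.node4; Write-Output injected)",
# so Write-Output runs and both outputs are joined into 'test1 injected'
$unsafe = $ExecutionContext.InvokeCommand.ExpandString("`$(`$xml.$untrusted)")

# Sanitized: ';', '-' and spaces are removed, leaving only word characters and dots
$securenodepath = $untrusted -Replace '[^\w\.]'   # 'node1.node2.node3.node4WriteOutputinjected'

# The sanitized path is just a property that does not exist, so this expands to ''
$safe = $ExecutionContext.InvokeCommand.ExpandString("`$(`$xml.$securenodepath)")
```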

You can split the string containing the property path into individual names and then dereference them one by one:

# define path
$testnodepath = 'node1.node2.node3.node4'

# create a new variable, this will be our intermediary for keeping track of each node/level we've resolved so far
$target = $xml

# now we just loop through each node name in the path
foreach($nodeName in $testnodepath.Split('.')){
  # keep advancing down through the path, 1 node name at a time
  $target = $target.$nodeName
}

# this now resolves to the same value as `$xml.node1.node2.node3.node4`
$target
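The same loop can be wrapped in a small reusable function. Get-NodeByPath is a hypothetical name, not a built-in cmdlet. This sketch stops as soon as a step resolves to nothing, so a wrong path produces no output instead of carrying on.

```powershell
# Hypothetical helper (not a built-in cmdlet): resolve a dot-separated path
function Get-NodeByPath {
    param(
        [Parameter(Mandatory)] $InputObject,
        [Parameter(Mandatory)] [string] $Path
    )
    $target = $InputObject
    foreach ($nodeName in $Path.Split('.')) {
        # Stop as soon as a step resolves to nothing (no output)
        if ($null -eq $target) { return }
        $target = $target.$nodeName
    }
    $target
}

# Sample document from the question
$xml = [xml] '<node1><node2><node3><node4>test1</node4></node3></node2></node1>'

Get-NodeByPath -InputObject $xml -Path 'node1.node2.node3.node4'   # test1
Get-NodeByPath -InputObject $xml -Path 'node1.missing.node4'       # no output
```

Each step is an ordinary property access. If an element is repeated along the path, that step yields an array, and the next step enumerates its members. This is the same behavior you get when writing $xml.node1.node2... out literally.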

"I will need to parse huge XML files"

The following presents a memory-friendly streaming approach that doesn't require loading the whole XML document (DOM) into memory, so you can parse really huge XML files even if they don't fit into memory. It should also improve parsing speed, as we can simply skip elements that we are not interested in. To accomplish this, we use System.Xml.XmlReader to process XML elements on the fly, while they are read from the file.

I've wrapped the code in a reusable function:

Function Import-XmlElementText( [String] $FilePath, [String[]] $ElementPath ) {

    $stream = $reader = $null

    try {
        $stream = [IO.File]::OpenRead(( Convert-Path -LiteralPath $FilePath )) 
        $reader = [System.Xml.XmlReader]::Create( $stream )

        $curElemPath = ''  # The current location in the XML document

        # While XML nodes are read from the file
        while( $reader.Read() ) {
            switch( $reader.NodeType ) {
                ([System.Xml.XmlNodeType]::Element) {
                    if( -not $reader.IsEmptyElement ) {
                        # Start of a non-empty element -> add to current path
                        $curElemPath += '/' + $reader.Name
                    }
                }
                ([System.Xml.XmlNodeType]::Text) {
                    # Element text -> collect if path matches
                    if( $curElemPath -in $ElementPath ) {
                        [PSCustomObject]@{
                            Path  = $curElemPath
                            Value = $reader.Value
                        }
                    }
                }
                ([System.Xml.XmlNodeType]::EndElement) {
                    # End of element - remove current element from the path
                    $curElemPath = $curElemPath.Substring( 0, $curElemPath.LastIndexOf('/') ) 
                }
            }
        }
    }
    finally {
        if( $reader ) { $reader.Close() }
        if( $stream ) { $stream.Close() }
    }
}

Call it like this:

Import-XmlElementText -FilePath test.xml -ElementPath '/node1/node2a/node3a', '/node1/node2b'

Given this input XML:

<node1>
    <node2a>
        <node3a>test1</node3a>
        <node3b/>
        <node3c a='b'/>
        <node3d></node3d>
    </node2a>
    <node2b>test2</node2b>
</node1>

This output is produced:

Path                 Value
----                 -----
/node1/node2a/node3a test1
/node1/node2b        test2

The function actually outputs objects, which can be processed by pipeline commands as usual or stored in an array:

$foundElems = Import-XmlElementText -FilePath test.xml -ElementPath '/node1/node2a/node3a', '/node1/node2b'

$foundElems[1].Value  # Prints 'test2'
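Your paths may be stored in the dot notation used in the question. If so, they can be converted into the absolute, slash-separated form that Import-XmlElementText compares against. The one-line sketch below assumes every path starts at the document element.

```powershell
# Paths in the dot notation used in the question
$dotPaths = 'node1.node2a.node3a', 'node1.node2b'

# Prefix '/' and turn each dot into '/', e.g. 'node1.node2b' -> '/node1/node2b'
$elementPaths = $dotPaths | ForEach-Object { '/' + ($_ -replace '\.', '/') }
$elementPaths
```

The result can then be passed on as Import-XmlElementText -FilePath test.xml -ElementPath $elementPaths.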

Notes:

  • Convert-Path is used to convert a PowerShell path (aka PSPath), which might be relative, to an absolute path that can be used by .NET functions. This is required because .NET uses a different current directory than PowerShell, and a PowerShell path can be in a form that .NET doesn't even understand (e.g. Microsoft.PowerShell.Core\FileSystem::C:\something.txt).
  • When encountering the start of an element, we have to skip empty elements such as <node/>. For such elements we never enter the EndElement case branch, so adding them would leave the current path ($curElemPath) invalid: the element would never be removed from it again.
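The second note can be checked directly with a few lines that print every node XmlReader reports for a tiny document. This sketch reads from a string instead of a file. It shows that <b/> appears only as an Element with IsEmptyElement set, while <c></c> is followed by its own EndElement.

```powershell
# Tiny sample: one empty element (<b/>) and one with an explicit end tag (<c></c>)
$reader = [System.Xml.XmlReader]::Create([System.IO.StringReader]::new('<a><b/><c></c></a>'))
try {
    $events = while ($reader.Read()) {
        # Node type, name, and whether the element was written as <name/>
        '{0} {1} {2}' -f $reader.NodeType, $reader.Name, $reader.IsEmptyElement
    }
}
finally {
    $reader.Close()
}

# Element a False / Element b True / Element c False / EndElement c False / EndElement a False
$events
```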

I have a similar requirement; however, mine is to set values, referencing the nodes via a variable. We need this ability so that one script can reference different psd1 files and set the information correctly. Hard-coding the paths means we need multiple scripts to do the same thing. As you can imagine, this is a nightmare.

... The following works:

[XML]$doc = Get-Content $my_xml_file
$xml_cfg = Import-LocalizedData -FileName xml_information.psd1
$xml_path = "FinData.Header.Hdrinfo.From.CpnyId.Id.StoreId.Report.Id"
$doc.FinData.Header.Hdrinfo.From.CpnyId.Id.StoreId.Report.Id = $xml_cfg.from_id

However, this fails: $doc.$xml_path = $xml_cfg.from_id

ERROR: "The property 'FinData.Header.Hdrinfo.From.CpnyId.Id.StoreId.Report.Id' cannot be found on this object. Verify that the property exists and can be set."

...

It is a real shame PowerShell cannot handle variable references to objects. Referencing objects using variables works fine in Perl, and limitations like these prevent us from migrating all our code to PowerShell.
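For the assignment case above, the split-and-walk loop from the earlier answer can be reused with one change. Walk down to the parent of the last node, then assign the last name through a property name held in a variable. PowerShell accepts that form on the left-hand side of an assignment.

Set-NodeByPath is a hypothetical helper, and this is only a sketch with two assumptions:

- The path resolves to a single element with text-only content.
- Repeated elements are not handled. In that case the parent would be an array.

```powershell
# Hypothetical helper (not a built-in cmdlet): set the value at a dot-separated path
function Set-NodeByPath {
    param(
        [Parameter(Mandatory)] $InputObject,
        [Parameter(Mandatory)] [string] $Path,
        [string] $Value  # XML element properties can only be set to strings
    )
    $names = $Path.Split('.')
    $parent = $InputObject
    # Walk down to the parent of the last name in the path
    for ($i = 0; $i -lt $names.Count - 1; $i++) {
        $parent = $parent.($names[$i])
    }
    # Assign through a property name held in a variable
    $parent.($names[-1]) = $Value
}

# Made-up sample document; the real one would come from $my_xml_file
$doc = [xml] '<FinData><Header><Id>old</Id></Header></FinData>'
$xml_path = 'FinData.Header.Id'
Set-NodeByPath -InputObject $doc -Path $xml_path -Value 'new'
$doc.FinData.Header.Id   # new
```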

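Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.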

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM