
How can I parse an XML file and delete text between two tags using PowerShell?

I have a file that has multiple instances of the following:

<password encrypted="True">271NFANCMnd8BFdERjHoAwEA7BTuX</password>

But for each instance the password is different.

I would like the output to have the encrypted password deleted:

<password encrypted="True"></password>

What is the best way in PowerShell to loop through all instances of the pattern within the file and write the output to a new file?

Something like:

gc file1.txt | (regex here) > new_file.txt

where (regex here) is something like:

s/"True">.*<\/pass//

This is fairly easy to do with a regex, and you can do it that way, or you can parse the file as actual XML, which may be more appropriate. I'll demonstrate both ways. In each case, we'll start with this common bit:

$raw = @"
<xml>
    <something>
        <password encrypted="True">hudhisd8sd9866786863rt</password>
    </something>
    <another>
        <thing>
            <password encrypted="True">nhhs77378hd8y3y8y282yr892</password>
        </thing>
    </another>
    <test>
        <password encrypted="False">plain password here</password>
    </test>
</xml>
"@

Regex

$raw -ireplace '(<password encrypted="True">)[^<]+(</password>)', '$1$2'

or:

$raw -ireplace '(?<=<password encrypted="True">).+?(?=</password>)', ''

XML

$xml = [xml]$raw

foreach($password in $xml.SelectNodes('//password')) {
    $password.InnerText = ''
}

To replace only the encrypted passwords:

$xml = [xml]$raw

foreach($password in $xml.SelectNodes('//password[@encrypted="True"]')) {
    $password.InnerText = ''
}

Explanations

Regex 1

(<password encrypted="True">)[^<]+(</password>)


The first regex method uses 2 capture groups to capture the opening and closing tags, and replaces the entire match with those tags (so the middle is omitted).
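One detail worth checking with this method: the replacement '$1$2' is single-quoted on purpose. The minimal sketch below, assuming a one-line sample string, compares single quotes, double quotes, and backtick-escaped double quotes. In double quotes, PowerShell expands $1 and $2 as its own (normally undefined) variables before the regex engine ever sees them, so the tags are deleted along with the password.

```powershell
# One-line sample taken from the question
$sample  = '<password encrypted="True">271NFANCMnd8BFdERjHoAwEA7BTuX</password>'
$pattern = '(<password encrypted="True">)[^<]+(</password>)'

# Single quotes: the literal text $1$2 reaches the regex engine,
# which substitutes capture group 1 followed by capture group 2
$good = $sample -replace $pattern, '$1$2'

# Backtick-escaping the dollar signs inside double quotes has the same effect
$escaped = $sample -replace $pattern, "`$1`$2"

# Plain double quotes: PowerShell expands $1 and $2 as (undefined) variables,
# so the replacement is an empty string and the whole match disappears
$bad = $sample -replace $pattern, "$1$2"

"Single-quoted: $good"
"Escaped:       $escaped"
"Double-quoted: '$bad'"
```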

Regex 2

(?<=<password encrypted="True">).+?(?=</password>)


The second regex method uses a positive lookbehind and a positive lookahead. It lazily matches one or more characters (.+? takes as few as possible) that are preceded by the opening tag and followed by the closing tag. Since lookarounds are zero-width, the tags are not part of the match, so they don't get replaced.
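An edge case to keep in mind with the lookaround version: because . can match '<', a lazy .+? starting after an already-empty element can stretch across into the next element when both sit on the same line. The sketch below uses a hypothetical one-line input to show this, and compares a variant that swaps .+? for [^<]+ (borrowed from the first regex) so each match stays inside a single element. On the multi-line sample above both behave the same, since . does not cross line breaks.

```powershell
# Hypothetical input: an empty element followed by a populated one on the same line
$line = '<password encrypted="True"></password><password encrypted="True">abc</password>'

# Regex 2 as written: after the first opening tag, .+? must consume at least one
# character, so it expands over '</password><password encrypted="True">abc'
# until a closing tag follows, deleting the second opening tag as well
$lookaround = $line -replace '(?<=<password encrypted="True">).+?(?=</password>)', ''

# Variant: [^<]+ cannot cross a '<', so the empty element is skipped
# and only 'abc' is removed
$safer = $line -replace '(?<=<password encrypted="True">)[^<]+(?=</password>)', ''

"Lookaround with .+? : $lookaround"
"Lookaround with [^<]+: $safer"
```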

XML

Here we're using a simple XPath query to find all of the password nodes. We iterate through them with a foreach loop and set each node's InnerText to an empty string.
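To see what the loop produces, the sketch below runs it on a reduced copy of the sample and prints the serialized document. Setting InnerText to '' should leave each element with an empty text child, so OuterXml writes an explicit closing tag (<password encrypted="True"></password>, the form the question asks for) rather than a self-closing one. Note that this version also clears the encrypted="False" password.

```powershell
# Reduced copy of the sample document, so the sketch is self-contained
$xml = [xml]@"
<xml>
    <something>
        <password encrypted="True">hudhisd8sd9866786863rt</password>
    </something>
    <test>
        <password encrypted="False">plain password here</password>
    </test>
</xml>
"@

foreach ($password in $xml.SelectNodes('//password')) {
    # Empties the existing text node instead of removing it,
    # so the element keeps separate opening and closing tags
    $password.InnerText = ''
}

# Serialized result; both passwords are now empty
$xml.OuterXml
```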

The second version selects only the password nodes whose encrypted attribute is set to True and operates on those alone.
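One difference from the regex approach: -ireplace matches case-insensitively, but an XPath predicate compares attribute values exactly. The sketch below uses a hypothetical document where one element spells the value "true" in lowercase, and shows one way to accept both spellings by listing them in the predicate.

```powershell
# Hypothetical document: the third element uses lowercase "true"
$xml = [xml]'<xml><password encrypted="True">a1</password><password encrypted="False">plain</password><password encrypted="true">b2</password></xml>'

# Exact, case-sensitive comparison: only encrypted="True" is selected
$exact = @($xml.SelectNodes('//password[@encrypted="True"]')).Count

# Listing both spellings explicitly in the predicate
$either = @($xml.SelectNodes('//password[@encrypted="True" or @encrypted="true"]')).Count

"Exact match:     $exact node(s)"
"Either spelling: $either node(s)"
```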

Which to Choose

I personally think that the XML method is more appropriate, because you don't have to account for variations in XML syntax, such as attribute order, single versus double quotes, or extra whitespace inside a tag. You can also more easily handle additional attributes on the nodes or different attribute values.
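A small illustration of that point, assuming a hypothetical variant of the input that uses single quotes and an extra id attribute: the first regex no longer matches anything, while the XPath query still finds the element because it compares parsed attribute values rather than source text.

```powershell
# Hypothetical variant: single-quoted attributes and an extra attribute first
$variant = "<xml><password id='1' encrypted='True'>abc123</password></xml>"

# Regex 1 expects the exact text <password encrypted="True">, so nothing changes
$regexResult = $variant -ireplace '(<password encrypted="True">)[^<]+(</password>)', '$1$2'

# The XML method works on the parsed document, so the variation doesn't matter
$xml = [xml]$variant
foreach ($password in $xml.SelectNodes('//password[@encrypted="True"]')) {
    $password.InnerText = ''
}

"Regex: $regexResult"
"XML:   $($xml.OuterXml)"
```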

By using XPath, you have a lot more flexibility than with regex for processing XML.
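As one example of that flexibility, XPath can select nodes by their position in the tree, which is awkward to express with a regex. This sketch, using a reduced copy of the sample, clears only the encrypted passwords somewhere under the <another> element and leaves the one under <something> alone.

```powershell
# Reduced copy of the sample structure
$xml = [xml]@"
<xml>
    <something>
        <password encrypted="True">hudhisd8sd9866786863rt</password>
    </something>
    <another>
        <thing>
            <password encrypted="True">nhhs77378hd8y3y8y282yr892</password>
        </thing>
    </another>
</xml>
"@

# Only encrypted password elements that are descendants of <another>
foreach ($password in $xml.SelectNodes('//another//password[@encrypted="True"]')) {
    $password.InnerText = ''
}

# The password under <something> keeps its value
$xml.SelectSingleNode('//something/password').InnerText
```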

File operations

I noticed your sample reads the data with gc (short for Get-Content). Be aware that this reads the file line by line and returns an array of strings, one per line.
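The difference is easy to check. This sketch writes a hypothetical three-line temporary file (gc-demo.txt is an illustrative name) and compares what Get-Content returns with and without -Raw.

```powershell
# Create a small temporary file so the sketch is self-contained
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'gc-demo.txt'
Set-Content -Path $path -Value @('<xml>', '<password encrypted="True">abc</password>', '</xml>')

$lines = Get-Content -Path $path        # array of strings, one per line
$whole = Get-Content -Path $path -Raw   # the entire file as a single string

"Without -Raw: $($lines.GetType().Name) with $($lines.Count) items"
"With -Raw:    $($whole.GetType().Name)"

# Clean up the temporary file
Remove-Item -Path $path
```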

You can use this to get the raw content as a single string, for conversion to XML or processing by regex (-Raw requires PowerShell 3.0 or later):

$raw = Get-Content file1.txt -Raw

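You can write the result out to a new file pretty easily too:

$raw -ireplace '(<password encrypted="True">)[^<]+(</password>)', '$1$2' | Out-File new_file.txt

In Windows PowerShell, Out-File writes UTF-16 by default; add -Encoding UTF8 if whatever reads the file expects UTF-8.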
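Putting the pieces together, here is a minimal end-to-end sketch of both routes, assuming hypothetical file names in the temp directory (the sketch creates its own file1.txt). It uses Set-Content -NoNewline (PowerShell 5.0 or later) so no extra line break is appended. For the XML route it passes a full path to XmlDocument.Save, because .NET resolves relative paths against the process working directory, which may differ from the current PowerShell location.

```powershell
# Hypothetical paths for illustration
$dir  = [System.IO.Path]::GetTempPath()
$in   = Join-Path $dir 'file1.txt'
$outR = Join-Path $dir 'new_file_regex.txt'
$outX = Join-Path $dir 'new_file_xml.txt'

# Sample input file
Set-Content -Path $in -Value '<xml><password encrypted="True">271NFANCMnd8BFdERjHoAwEA7BTuX</password></xml>'

# Regex route: read the whole file as one string, replace, write a new file
$raw = Get-Content -Path $in -Raw
$raw -ireplace '(<password encrypted="True">)[^<]+(</password>)', '$1$2' |
    Set-Content -Path $outR -NoNewline

# XML route: parse, clear the encrypted passwords, save to a full path
$xml = [xml]$raw
foreach ($password in $xml.SelectNodes('//password[@encrypted="True"]')) {
    $password.InnerText = ''
}
$xml.Save($outX)
```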

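Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM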