
Find and replace in XML file using sed

I need to find and replace the value of a specific XML element. The conditions are as follows:

  • the value of the enabled element must be changed from 0 to 1;
  • enabled must be a child of a somenode element.

My test XML looks like this:

<somenode name="node1">
    <some></some>
    <enabled>0</enabled>
    <some></some>
</somenode>

<someothernode name="node2">
    <some></some>
    <enabled>0</enabled>
    <some></some>
</someothernode>

<somenode name="node3">
    <some></some>
    <enabled>0</enabled>
    <some></some>
</somenode>

I expect the first and third enabled elements to be changed. So far I have managed to write this sed command:

sed -n "1h;1!H;${;g;s|\(<somenode [^>]*>\)\(.*\)\(<enabled>\s*\)0\(\s*</enabled>\)\(.*</somenode>\)|\1\2\3 1 \4\5|g;p;}" test.xml

but it changes only the last one, and I believe this is due to greedy matching. Any help would be appreciated.

It is generally a poor idea to try to use regexes to parse XML. See previous discussions such as "Parsing XML with REGEX in Java". (Actually, your XML is not well-formed, since it does not have exactly one root element.) There are many free XML engines for parsing and manipulating XML in almost every language, and I'd recommend you use one of those.

Use xmlstarlet if possible:

echo '
<root>
<somenode name="node1">
   <some></some>
   <enabled>0</enabled>
   <some></some>
</somenode>

<someothernode name="node2">
   <some></some>
   <enabled>0</enabled>
   <some></some>
</someothernode>

<somenode name="node3">
   <some></some>
   <enabled>0</enabled>
   <some></some>
</somenode>
</root>
' > testfile.xml


xml val testfile.xml
xml el -v testfile.xml

xml ed --help

# version 1
xml ed -u "//somenode[1]/enabled" -v '1' \
       -u "//somenode[2]/enabled" -v '1' \
       testfile.xml  

# version 2  (-L for in-place editing; xmlstarlet v1.0.2)
xml ed -L -u "//somenode[@name='node1']/enabled" -v '1' \
          -u "//somenode[@name='node3']/enabled" -v '1' \
          testfile.xml  

Forget sed for complex multi-line processing. Seriously.

If you're not willing to use a proper XML tool, at least use a standard string processing tool that has proper branching statements :-)

If you can guarantee your file is formatted in the way you have it, you can use something like:

pax> echo '<somenode name="node1">
    <some></some>
    <enabled>0</enabled>
    <some></some>
</somenode>

<someothernode name="node2">
    <some></some>
    <enabled>0</enabled>
    <some></some>
</someothernode>

<somenode name="node3">
    <some></some>
    <enabled>0</enabled>
    <some></some>
</somenode>
' | awk '
    BEGIN {s = 0}
    /^<somenode / {s=1}
    /^<\/somenode>/ {s=0}
    /^    <enabled>0<\/enabled>/ {if (s==1) {$0="    <enabled>1</enabled>"}}
    {print}
'

to get:

<somenode name="node1">
    <some></some>
    <enabled>1</enabled>
    <some></some>
</somenode>

<someothernode name="node2">
    <some></some>
    <enabled>0</enabled>
    <some></some>
</someothernode>

<somenode name="node3">
    <some></some>
    <enabled>1</enabled>
    <some></some>
</somenode>

The trouble with that sort of method is that it doesn't handle what may be perfectly valid XML files. This particular version has certain limitations, such as:

  • the somenode start and end tags must be at the start of the line.
  • the enabled tag must be preceded by exactly four spaces.

You could work around these to make it a bit more flexible but, by the time you've written your script to handle any valid XML input, it'll be such a monstrosity that it would have been quicker to use an XML transformation tool.

That's why it's better to use a tool built specifically for the job. But, if you just want a quick hack and the file format is under your control, it's probably okay to use awk (or perl or python or your other quick-and-dirty scripting tool of choice).
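As a rough sketch of how far the awk approach can be stretched before it gets unwieldy, the variant below tolerates arbitrary indentation and optional whitespace around the 0. It still assumes that each tag sits on a single line and that somenode elements are not nested; the file names sample.xml and result.xml are made up for this example.

```shell
# Hypothetical sample with irregular indentation and spacing.
cat > sample.xml <<'EOF'
  <somenode name="node1">
      <enabled>0</enabled>
  </somenode>
<someothernode name="node2">
  <enabled>0</enabled>
</someothernode>
        <somenode name="node3">
        <enabled> 0 </enabled>
        </somenode>
EOF

awk '
    /<somenode[ >]/ { inside = 1 }   # a somenode start tag anywhere on the line
    /<\/somenode>/  { inside = 0 }   # the matching end tag
    # Only rewrite the value while inside a somenode element;
    # sub() replaces just the matched element, so indentation is kept.
    inside { sub(/<enabled>[ \t]*0[ \t]*<\/enabled>/, "<enabled>1</enabled>") }
    { print }
' sample.xml > result.xml
cat result.xml
```

Anything beyond this (attributes on enabled, tags split across lines, comments, CDATA) is where a real XML tool becomes the simpler option.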

Other people have already explained why it is generally not a good idea to process XML with regular expressions.

With all that in mind, here's the sed program to substitute text matching foo with bar between lines matching start and end (inclusive):

/start/,/end/s/foo/bar/
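Applied to the XML from the question, that range form might look like the sketch below. It assumes the somenode start and end tags are on separate lines, that each enabled element fits on one line, and that somenode elements are not nested; the file names sample.xml and result.xml are made up for this example.

```shell
# Hypothetical sample file, reduced from the question's test XML.
cat > sample.xml <<'EOF'
<somenode name="node1">
    <enabled>0</enabled>
</somenode>
<someothernode name="node2">
    <enabled>0</enabled>
</someothernode>
<somenode name="node3">
    <enabled>0</enabled>
</somenode>
EOF

# For every range of lines from a <somenode ...> start tag through the next
# </somenode> end tag, change <enabled>0</enabled> to <enabled>1</enabled>.
# "|" is used as the s-command delimiter so the "/" in </enabled> needs no escaping.
sed '/<somenode[ >]/,/<\/somenode>/ s|<enabled>0</enabled>|<enabled>1</enabled>|' sample.xml > result.xml
cat result.xml
```

Because the substitution runs line by line inside each range, there is no single greedy match spanning the whole file, so every somenode block is handled independently.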

You can use gawk:

awk -vRS= '/somenode/{ 
    $0=gensub("(.*<enabled>)([01])(</enabled>.*)", "\\11\\3","g",$0) 
}1'  file

Output:

$ ./shell.sh
<somenode name="node1">
    <some></some>
    <enabled>1</enabled>
    <some></some>
</somenode>
<someothernode name="node2">
    <some></some>
    <enabled>0</enabled>
    <some></some>
</someothernode>
<somenode name="node3">
    <some></some>
    <enabled>1</enabled>
    <some></some>
</somenode>

It seems you need to loop over something with sed:

http://www.rtfiber.com.tw/~changyj/sed/html/p.20070613a.html

I still can't figure it out myself, though; this is just for your information.

Your requirement is quite simple, as seen from your description, so there's no need to use XML parsers/tools if you don't want to. You can use just the shell (or other shell tools you may prefer):

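#!/bin/bash
# flag is 1 while reading lines that belong to a <somenode> element.
flag=0
# IFS= keeps leading indentation; -r keeps backslashes literal.
while IFS= read -r line
do
    case "$line" in
        *"<someothernode"* ) flag=0;;
        *"<somenode"* ) flag=1;;
    esac
    if [ "$flag" -eq 1 ]; then
        case "$line" in
            *"<enabled"* )
                # Replace the value only inside <somenode>.
                echo "${line/<enabled>0/<enabled>1}"
                ;;
            *) echo "$line";;
        esac
    else
        echo "$line"
    fi
done < "file"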
