
Shell script to extract recursive XML tags

I have an XML file of the form:

...
<element1>
<element2>
<group1>
<tag1>value</tag1>
<tag2>value</tag2>
</group1>
<group1>
<tag1>value</tag1>
<tag2>value</tag2>
</group1>
<element2>
...

I used

sed -n '/\<group1\>/,/<\/group1>/p' filename

to extract all content of the group1 tags, including all children. This is exactly what I want.

<group1>
<tag1>value</tag1>
<tag2>value</tag2>
</group1>
<group1>
<tag1>value</tag1>
<tag2>value</tag2>
</group1>

However, if the input XML is of the form

...
<element1>
<element2>
<group2>
     <group2>value</group2>
     <otherTag>value</otherTag>
</group2>
<element3>
<group2>
     <group2>value</group2>
     <otherTag>value</otherTag>
</group2>
...

And I tried to extract the following content

<group2>
     <group2>value</group2>
     <otherTag>value</otherTag>
</group2>
<group2>
     <group2>value</group2>
     <otherTag>value</otherTag>
</group2>

The sed command above just returns:

<group2>
     <group2>value</group2>

It matches the stop pattern </group2> and does no more extraction. I'm quite confused here. Why doesn't it continue extracting the next <group2>, as in the <group1> case? Is there any way to make it work with sed? And are there any other alternatives?

You can change the sed command like this:

sed -n '/\<group2\>/,/^<\/group2>/p' filename | grep -v 'element3'
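
The reason the original range stops early is that a sed range closes at the first line after its start that matches the end pattern, and the nested `<group2>value</group2>` line already contains `</group2>`. Anchoring the end pattern with `^`, as in the command above, skips the indented inner line. A small sketch that reproduces both behaviours; the sample file is made up for illustration, and the anchored form relies on the inner tags being indented:

```shell
# Build a throwaway sample resembling the second input (contents are an assumption).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<element2>
<group2>
     <group2>value</group2>
     <otherTag>value</otherTag>
</group2>
<element3>
<group2>
     <group2>value</group2>
     <otherTag>value</otherTag>
</group2>
EOF

# Unanchored end pattern: each range closes on the indented inner line.
unanchored=$(sed -n '/<group2>/,/<\/group2>/p' "$sample")

# Both patterns anchored to the start of the line: the range runs to the outer closing tag.
anchored=$(sed -n '/^<group2>/,/^<\/group2>/p' "$sample")

printf '%s\n' "--- unanchored ---" "$unanchored" "--- anchored ---" "$anchored"
```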

Far better to use XPath with a command-line XPath interpreter, like xpath, xmlstarlet, my xidel or xmllint.

All group1 elements on the third level:

/element1/*/group1

All group2 elements that do not contain a group2 child:

//group2[not(group2)]
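
As a rough sketch of how such an expression might be run from the shell, here is one way with xmllint (part of libxml2). The sample file is hypothetical and has been made well-formed, since an XPath tool needs parseable XML. The expression `//group2[group2]`, which picks the outer group2 elements that have a group2 child, is an assumption about what the question is after; it is the opposite of the `not(group2)` filter above.

```shell
# Hypothetical, well-formed sample (an XPath tool cannot parse the truncated snippet).
xml=$(mktemp)
cat > "$xml" <<'EOF'
<element1>
<element2>
<group2>
     <group2>value</group2>
     <otherTag>value</otherTag>
</group2>
</element2>
</element1>
EOF

# Select the outer group2 elements, i.e. those that have a group2 child.
if command -v xmllint >/dev/null 2>&1; then
    outer=$(xmllint --xpath '//group2[group2]' "$xml")
    printf '%s\n' "$outer"
else
    echo "xmllint not installed; skipping" >&2
fi
```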

Something like this?

awk '/^<group2>/,/^<\/group2>/' file
<group2>
     <group2>value</group2>
     <otherTag>value</otherTag>
</group2>
<group2>
     <group2>value</group2>
     <otherTag>value</otherTag>
</group2>

This works only if the nested tags are indented differently from the outer ones. If every tag is flush left, it will not work.
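
If indentation cannot be relied on, one alternative is to count nesting depth instead of using a range. A minimal sketch, assuming each tag sits on its own line as in the examples, `<group2>` carries no attributes, and there is no self-closing `<group2/>` (the function name `extract_group2` is made up for illustration):

```shell
# Print every outermost <group2>...</group2> block, regardless of indentation.
# Assumes line-oriented XML: tags are not split across lines and <group2> has no attributes.
extract_group2() {
    awk '
    {
        line = $0
        opens  = gsub(/<group2>/, "&", line)     # number of opening tags on this line
        closes = gsub(/<\/group2>/, "&", line)   # number of closing tags on this line
        if (depth > 0 || opens > 0) print         # inside a block, or a block starts here
        depth += opens - closes
        if (depth < 0) depth = 0                  # tolerate stray closing tags
    }' "$1"
}
```

It would be called as `extract_group2 file`. For anything beyond simple line-oriented input, a real XML parser remains the safer choice.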
