[英]How to select element from a list and go to parent and then sibling (by Xpath)
[英]xpath to select from child element to end of parent
我正在尝试使用lxml来执行此操作,但实际上这是关于正确的xpath的问题。 我想从<pgBreak>
元素中选择直到其父元素的末尾,在本例中为<p
>
<root>
<pgBreak pgId="1"/>
<p>
some text to fill out a para
<pgBreak pgId="2"/>
some more text
<quote> A quoted block </quote>
remainder of para
</p>
</root>
<root>
<pgBreak pgId="1"/>
<p>
some text to fill out a para
</p>
<pgBreak pgId="2"/>
<p>
some more text
<quote> A quoted block </quote>
remainder of para
</p>
</root>
您尝试做的事情并不简单:不仅要匹配“ pgBreak”元素和所有后续同级元素,还要将它们移出父级作用域并将同级元素包装在“ p”元素中。 好玩的东西。
以下代码应为您提供一个实现方法的想法(免责声明:仅用于示例,需要清理,可能不处理边缘情况)。 代码是故意取消注释的,因此您必须弄清楚它:)
我已经稍微修改了输入XML以更好地说明功能。
import lxml.etree
text = """
<root>
<pgBreak pgId="1"/>
<p>
some text to fill out a para
<pgBreak pgId="2"/>
some more text
<quote> A quoted block </quote>
remainder of para
<pgBreak pgId="3"/>
<p>
blurb
</p>
</p>
</root>
"""
root = lxml.etree.fromstring(text)
for pgbreak in root.xpath('//pgBreak'):
inner = pgbreak.getparent()
if inner == root:
continue
outer = inner.getparent()
pgbreak_index = inner.index(pgbreak)
inner_index = outer.index(inner) + 1
siblings = inner[pgbreak_index+1:]
inner.remove(pgbreak)
outer.insert(inner_index,pgbreak)
if siblings[0].tag != 'p':
p = lxml.etree.Element('p')
p.text = pgbreak.tail
pgbreak.tail = None
for node in siblings:
p.append(node)
outer.insert(inner_index+1,p)
else:
for node in siblings:
inner_index += 1
outer.insert(inner_index,node)
输出为:
<root>
<pgBreak pgId="1"/>
<p>
some text to fill out a para
</p>
<pgBreak pgId="2"/>
<p>
some more text
<quote> A quoted block </quote>
remainder of para
</p>
<pgBreak pgId="3"/>
<p>
blurb
</p>
</root>
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.