How to remove text not in a tag using lxml?
I have XML like the following:
<div>
    <p>the first paragraph</p>
    <p>the second paragraph</p>
    something others...
</div>
I want to remove the "something others..." text from the object content.
I know this text can be selected with content.xpath('.//text()[not(ancestor::p)]'), but there seems to be no good way to remove it directly from the object.
Update: I tried //p[last()]/following::*, but it does not work the way I want.
This text is stored in the tail attribute of the preceding sibling element, so to remove all of the "something others..." text, do:
# Clears every tail in the document, whitespace-only ones included.
for elem in document.iter():
    elem.tail = ''
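The loop above also wipes tails that hold only whitespace, such as the line breaks between tags. A minimal sketch of a narrower variant follows, assuming the sample XML from the question. The names xml, last_tail, and cleaned are illustrative, and the ElementTree fallback import is there only so the sketch runs where lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:  # assumption: stdlib fallback, same .tail semantics
    import xml.etree.ElementTree as etree

xml = """<div>
<p>the first paragraph</p>
<p>the second paragraph</p>
something others...
</div>"""

document = etree.fromstring(xml)

# The loose text is not a node of its own; it hangs off the preceding element.
last_tail = document.findall("p")[-1].tail

for elem in document.iter():
    # Drop only tails that carry real text; whitespace-only tails are kept.
    if elem.tail and elem.tail.strip():
        elem.tail = None

cleaned = etree.tostring(document, encoding="unicode")
```

Here the tail of the first p (a single line break) is left alone, while the tail of the second p, which holds the unwanted text, is cleared.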
Edit: to remove the tail text of every p that is the last sibling in the document:
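for elem in document.iter():
    # getnext() returns None when there is no following sibling. Compare with
    # "is None": an lxml element without children is itself falsy.
    if elem.tag == 'p' and elem.getnext() is None:
        elem.tail = ''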
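The XPath from the question can also drive the removal directly. This is a sketch rather than the answerer's method, and it assumes lxml is installed. In lxml, string results of an XPath text() query remember the element they came from (getparent()) and whether they were that element's text or tail (is_text, is_tail). The names xml and result are illustrative.

```python
from lxml import etree

xml = (
    "<div>"
    "<p>the first paragraph</p>"
    "<p>the second paragraph</p>"
    "something others..."
    "</div>"
)
content = etree.fromstring(xml)

# Select every text node that is not inside a <p>, then clear it on the
# element that owns it.
for text in content.xpath('.//text()[not(ancestor::p)]'):
    owner = text.getparent()
    if text.is_tail:
        owner.tail = None   # text that follows owner's closing tag
    elif text.is_text:
        owner.text = None   # text directly inside owner, before its first child

result = etree.tostring(content, encoding="unicode")
```

On pretty-printed input this also removes the whitespace between tags, because those whitespace runs are text nodes outside p as well.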