How do I remove text that is not inside a tag using lxml?

Question

Now I have XML like the following:

<div>
<p>the first paragraph</p>
<p>the second paragraph</p>
something others...
</div>

I want to remove the trailing "something others..." text from the object content.

I know I can select this text with content.xpath('.//text()[not(ancestor::p)]'), but there seems to be no good way to remove it from the object directly.

Update: I tried //p[last()]/following::*, but it does not work the way I want...
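The following::* attempt finds nothing because * matches element nodes only, and "something others..." is a text node. A minimal sketch of removing the text straight from the XPath result, assuming content is the <div> parsed with lxml.etree (the sample string below is illustrative): lxml returns text results as "smart strings", so each one can be traced back to its element with getparent() and is_tail / is_text, and cleared there.

```python
from lxml import etree

# Illustrative copy of the question's sample.
xml = """<div>
<p>the first paragraph</p>
<p>the second paragraph</p>
something others...
</div>"""
content = etree.fromstring(xml)

# following::* matches elements only, so the stray text is not found.
print(content.xpath('//p[last()]/following::*'))  # []

# Text nodes outside any <p>; lxml returns them as "smart strings".
for text in content.xpath('.//text()[not(ancestor::p)]'):
    owner = text.getparent()
    if text.is_tail:
        # Tail text is reported on the element just before it.
        owner.tail = None
    elif text.is_text:
        # Leading text inside an element (here: the newline before the first <p>).
        owner.text = None

print(etree.tostring(content))
# b'<div><p>the first paragraph</p><p>the second paragraph</p></div>'
```

This also drops the whitespace-only text nodes between tags, which changes only the formatting.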

Answer 1

That text is stored in the tail attribute of the element just before it, so to remove all of the "something others..." text, do:

for elem in document.iter():
    # Text after an element's closing tag is stored in its .tail
    elem.tail = ''
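A minimal sketch showing where lxml keeps the stray text and what the loop does to it, assuming document is the root element returned by lxml.etree.fromstring (the sample string and the name last_p are illustrative). Setting the tail to None serializes the same as ''.

```python
from lxml import etree

# Illustrative copy of the question's sample.
xml = """<div>
<p>the first paragraph</p>
<p>the second paragraph</p>
something others...
</div>"""
document = etree.fromstring(xml)

# The stray text has no element of its own: lxml stores it as the
# .tail of the <p> that precedes it.
last_p = document[-1]
print(repr(last_p.tail))  # '\nsomething others...\n'

# Clear every tail (the text after each element's closing tag).
for elem in document.iter():
    elem.tail = None

print(etree.tostring(document))
# b'<div>\n<p>the first paragraph</p><p>the second paragraph</p></div>'
```

The newline that survives after <div> is the div's .text, not a tail, so this loop leaves it alone.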

Edit:

To remove only the tail text of each p element that is the last sibling in its parent:

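for elem in document.iter('p'):
    # getnext() returns None when there is no following sibling;
    # "not elem.getnext()" would also be true for a childless sibling
    if elem.getnext() is None:
        elem.tail = ''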
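A sketch of the last-sibling loop on a hypothetical document with two <div> sections (the markup is an assumption, not from the question). It tests elem.getnext() is None rather than not elem.getnext(), because an lxml element's truth value reflects whether it has children, so a childless next sibling would also count as "no sibling". Text between two p elements is kept; only text after the last p in each parent is removed.

```python
from lxml import etree

# Hypothetical sample: stray text after and between <p> elements.
xml = """<body>
<div><p>one</p><p>two</p> stray A</div>
<div><p>three</p> stray B<p>four</p> stray C</div>
</body>"""
document = etree.fromstring(xml)

for elem in document.iter('p'):
    # getnext() returns the next sibling element, or None for the last one.
    if elem.getnext() is None:
        elem.tail = None

print(etree.tostring(document))
# b'<body>\n<div><p>one</p><p>two</p></div>\n<div><p>three</p> stray B<p>four</p></div>\n</body>'
```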

Answer 1 (score 2, accepted), 2014-12-23 10:47:05