lxml: ignoring <br> tags in HTML

Question

I wrote a tiny HTML parser in Python using lxml. It's very useful, but I have a problem.

I have the following code:

tags = doc.xpath('//table//tr/td[@align="right"]/b')
for x in tags:
    print(x.text.strip())

It works fine. But if there is a <br> tag inside a <b> element, like this:

<b> first-half <br>
    second-half </b>

this code prints only first-half, the text before the <br> tag.

How can I get all of the text in <b> even if there is a <br> tag?

Thanks.

Answer 1

Use text_content() to extract all of the non-markup text within a tag. Replace x.text with x.text_content().
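A minimal sketch of that replacement, assuming the document is parsed with lxml.html (the SAMPLE markup and the bold_texts helper are hypothetical, made up for illustration). In lxml, the text that follows a <br> is stored on the <br> element's tail rather than on the parent's text, which is why x.text stops at first-half.

```python
from lxml import html

# Hypothetical sample markup, modelled on the snippet in the question.
SAMPLE = """
<table>
  <tr><td align="right"><b> first-half <br>
      second-half </b></td></tr>
</table>
"""


def bold_texts(markup):
    """Return the whitespace-normalised text of each matching <b> element."""
    doc = html.fromstring(markup)
    tags = doc.xpath('//table//tr/td[@align="right"]/b')
    # text_content() joins every text node below the element, so the text
    # after the <br> (held in the <br> element's tail) is included.
    # split()/join() collapses the line break and indentation to one space.
    return [" ".join(x.text_content().split()) for x in tags]


texts = bold_texts(SAMPLE)
print(texts)
```

Note that text_content() adds no separator where the <br> was; the two halves stay apart here only because of the whitespace already present in the markup.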

Answer 1 (5, accepted, 2013-02-28 21:12:35)