Reading the text in an element with lxml.etree

Question

I am using the Python version of the lxml library. I am trying to parse the text from a table, but I have run into a problem: some of the text is inside links.

For example, one of the cells may look something like this:

<td>
    Can I kick it, <a>to all the people</a> who can quest like a <a>tribe</a> does
</td>

Say that after parsing the HTML, the td element is stored as foo. Then foo.text does not return the whole text, only the non-link text that comes before the first link. Moreover, if I collect the link text using [i.text for i in foo.getchildren()], I no longer know the order in which to put the non-link text and the link text.
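To illustrate why the order seems lost: lxml stores the text that follows a child element on that child's .tail attribute, not on the parent. The sketch below assumes the cell is wrapped in a minimal table and written on one line so the whitespace is predictable; the helper name ordered_text_parts is hypothetical, and it only handles links that sit directly inside the cell.

```python
from lxml import html

# Assumed markup: the cell from the question, wrapped in a minimal table.
MARKUP = (
    "<table><tr><td>Can I kick it, <a>to all the people</a>"
    " who can quest like a <a>tribe</a> does</td></tr></table>"
)

foo = html.fromstring(MARKUP).find(".//td")


def ordered_text_parts(element):
    """Collect text in document order for an element with one level of children."""
    # .text holds only the text before the first child element.
    parts = [element.text or ""]
    for child in element:
        # The child's own text, then the text that follows it (.tail).
        parts.append(child.text or "")
        parts.append(child.tail or "")
    return parts


print(repr(foo.text))
print("".join(ordered_text_parts(foo)))
```

This is only meant to show where the pieces live; nested markup inside the links would need a recursive walk.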

Is there an easy way to get around this?

Answer 1

Well, after searching for an hour, I found the solution within 2 minutes of posting this question.

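Use the method foo.text_content(). It returns the full text of the element, including the text of its child elements, in document order. Note that text_content() is available on elements parsed with lxml.html; plain lxml.etree elements do not have it.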
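A minimal sketch of this, assuming the cell is wrapped in a small table and written on one line. It also shows "".join(foo.itertext()), which is not part of the answer above but should give the same result when the document was parsed with lxml.etree. The variable names are illustrative.

```python
from lxml import etree, html

# Assumed markup: the cell from the question, wrapped in a minimal table.
MARKUP = (
    "<table><tr><td>Can I kick it, <a>to all the people</a>"
    " who can quest like a <a>tribe</a> does</td></tr></table>"
)

# Parsed with lxml.html: elements provide text_content().
html_cell = html.fromstring(MARKUP).find(".//td")
full_text = html_cell.text_content()

# Parsed with lxml.etree: join the pieces yielded by itertext() instead.
etree_cell = etree.fromstring(MARKUP).find(".//td")
joined_text = "".join(etree_cell.itertext())

print(full_text)
print(joined_text)
```

Both approaches walk the element in document order, so the link text and the surrounding text come out in the right sequence.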

Answer 1: accepted, 2013-09-23 00:38:16