Getting text from HTML with lxml

Question

I'm trying to get the list of celebrity names from this site using XPath with lxml, but I'm having trouble.

Here is the HTML:

<div class="lists">
            <dl> <dt>A</dt> <dd><a href="/people/adam_levine/" id="20608779">Adam Levine</a>    </dd>

I want to get the text "Adam Levine".

My Python code is:

celebs = tree.xpath('//dd[a]/following-sibling::node()')

But my result is <Element dd at 0x1084ad4c8>...

If anyone could help, that would be great. Thanks.

Answer 1

Extract the text with text() instead of following-sibling::node(). The following-sibling axis selects the nodes that come after each dd, which is why you get Element objects back rather than the link text. Like this:

from lxml import etree

# The HTML in the question is not well-formed, so the closing </dl> and </div>
# tags have been added here on purpose; etree.fromstring() needs well-formed markup.
s = '''<div class="lists">
            <dl> <dt>A</dt> <dd><a href="/people/adam_levine/" id="20608779">Adam Levine</a>    </dd></dl></div>'''

tree = etree.fromstring(s)

# Select the text node of every <a> directly inside a <dd>
print(tree.xpath('.//dd/a/text()'))  # ['Adam Levine']
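
If the page cannot be fixed up by hand, one option is to parse it with lxml.html instead of lxml.etree, since the HTML parser is tolerant of unclosed tags. The following is only a sketch under that assumption: the variable names fragment and names are made up for illustration, and the markup is the unclosed snippet from the question, not the full page.

```python
from lxml import html

# The snippet from the question, left unclosed on purpose (no </dl> or </div>)
fragment = '''<div class="lists">
            <dl> <dt>A</dt> <dd><a href="/people/adam_levine/" id="20608779">Adam Levine</a>    </dd>'''

# lxml.html uses a forgiving HTML parser that closes the open tags itself
tree = html.fromstring(fragment)

# Take every <a> directly inside a <dd> and read its text content
names = [a.text_content().strip() for a in tree.xpath('.//dd/a')]
print(names)
```

Using text_content() rather than text() also picks up text nested inside child tags of the link, should the real page have any.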

Answer 1 (2014-11-08 17:12:48)
