get text from html using lxml

Question

I'm trying to get the list of celebrity names from this site using XPath with lxml, but I'm having trouble.

Here is the HTML:

<div class="lists">
            <dl> <dt>A</dt> <dd><a href="/people/adam_levine/" id="20608779">Adam Levine</a>    </dd>

I want to get the text "Adam Levine".

My code in Python is:

celebs = tree.xpath('//dd[a]/following-sibling::node()')

But my result is <Element dd at 0x1084ad4c8>...

If anyone could help, that would be great. Thanks.

Answer 1

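Extract the text with text() instead of following-sibling::node(). The following-sibling axis selects the nodes that come after each dd element, not the text inside its a child. Like this: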

from lxml import etree

# The HTML in the question is incomplete, so I have added the closing </dl> and </div> tags.
s = '''<div class="lists">
            <dl> <dt>A</dt> <dd><a href="/people/adam_levine/" id="20608779">Adam Levine</a>    </dd></dl></div>'''

tree = etree.fromstring(s)

# Select the text node of every <a> that is a direct child of a <dd>.
print(tree.xpath('.//dd/a/text()'))  # ['Adam Levine']
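
If the page is fetched as-is, the markup will probably not be well-formed XML, and etree.fromstring would then raise a parse error. A minimal sketch of one way around that, assuming lxml's lenient lxml.html parser is acceptable here (the page string below is a hypothetical fragment modeled on the question's markup, left unclosed on purpose):

```python
from lxml import html

# Hypothetical fragment modeled on the question's markup; the closing
# </dl> and </div> tags are deliberately missing.
page = '''<div class="lists">
<dl> <dt>A</dt> <dd><a href="/people/adam_levine/" id="20608779">Adam Levine</a>    </dd>'''

# The HTML parser tolerates unclosed tags instead of raising an error.
tree = html.fromstring(page)

# Take the text of every link inside a <dd> and trim surrounding whitespace.
celebs = [name.strip() for name in tree.xpath('//dd/a/text()')]
print(celebs)
```

The same XPath expression should apply to the full page, where it would return one string per linked name.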
