I'm trying to get the list of celebrity names from this site using XPath with lxml, but I'm having trouble.
Here is the HTML:
<div class="lists">
<dl> <dt>A</dt> <dd><a href="/people/adam_levine/" id="20608779">Adam Levine</a> </dd>
I want to extract the text "Adam Levine".
My Python code is:
celebs = tree.xpath('//dd[a]/following-sibling::node()')
But my result is <Element dd at 0x1084ad4c8>... rather than the names.
If anyone could help, that would be great. Thanks!
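To see what that expression actually selects, here is a minimal reproduction on a hypothetical two-entry snippet. The second entry ("Jane Doe") is invented for illustration, on the assumption that the real page lists many <dd> entries in a row:

```python
from lxml import etree

# Hypothetical two-entry snippet; the second <dd> is invented for illustration.
s = '''<div class="lists">
<dl> <dt>A</dt> <dd><a href="/people/adam_levine/">Adam Levine</a> </dd><dd><a href="/people/jane_doe/">Jane Doe</a> </dd></dl></div>'''
tree = etree.fromstring(s)

# The question's expression: every node that comes AFTER a <dd> with an <a> child.
# Here that is only the second <dd> element itself, not any text.
siblings = tree.xpath('//dd[a]/following-sibling::node()')
print(siblings)  # a list of Element objects, e.g. [<Element dd at 0x...>]

# Selecting the text node inside each <a> returns the names themselves.
names = tree.xpath('//dd/a/text()')
print(names)  # ['Adam Levine', 'Jane Doe']
```

The following-sibling axis moves sideways to the next <dd> elements, which is why the result prints as Element objects.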
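Extract the text with text() rather than following-sibling::node(). The following-sibling axis selects the nodes that come after each matching <dd> (on the real page, the next <dd> elements), which is why you get Element objects instead of names. Select the text inside each <a> instead, like this: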
from lxml import etree
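# Your HTML snippet is not well-formed, so I have added the missing </dl> and </div> closing tags;
# etree.fromstring() parses strict XML and would raise an XMLSyntaxError without them.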
s = '''<div class="lists">
<dl> <dt>A</dt> <dd><a href="/people/adam_levine/" id="20608779">Adam Levine</a> </dd></dl></div>'''
tree = etree.fromstring(s)
print(tree.xpath('.//dd/a/text()'))
# ['Adam Levine']
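If you are scraping a real web page, you may not want to patch closing tags by hand. A minimal sketch, assuming the page is parsed with lxml.html instead of etree: its HTML parser is forgiving and closes unclosed tags itself, so the snippet exactly as posted in the question works unchanged. The (name, href) pairing at the end is only an illustration of how you might keep the profile links too:

```python
import lxml.html

# The snippet exactly as posted in the question, with </dl> and </div> missing.
s = '''<div class="lists">
<dl> <dt>A</dt> <dd><a href="/people/adam_levine/" id="20608779">Adam Levine</a> </dd>'''

# lxml.html uses libxml2's HTML parser, which closes the open <dl> and <div>
# instead of raising an XMLSyntaxError the way etree.fromstring() would.
root = lxml.html.fromstring(s)

names = root.xpath('.//dd/a/text()')
print(names)  # ['Adam Levine']

# Pair each name with its profile link, in case the URLs are also wanted.
people = [(a.text_content().strip(), a.get('href')) for a in root.xpath('.//dd/a')]
print(people)  # [('Adam Levine', '/people/adam_levine/')]
```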