I want to parse a large block of HTML from a website. I have already selected the div, and now I want the content inside it, for example:
<div id="lala"><p>I WANT</p> <ul><li>THIS</li></ul>. <p>All of them</p></div>
This is my code:
import requests
from lxml import html

patchpage = requests.get(href)
tree = html.fromstring(patchpage.content)
patch_message = tree.xpath('//div[@class="messageText"]')
for item in patch_message:
    await client.say(item.text.strip())  # This line raises an error
return await client.say(patch_message)
Right now, patch_message gives me:
[<Element div at 0x29c4be2fa98>]
Not really what I expected :/ Can someone tell me how to get the content of the div into Python?
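Assuming the error you get is AttributeError: 'NoneType' object has no attribute 'strip', you just need to skip elements whose text is None before calling strip().
In lxml, an element's .text is only the text between its opening tag and its first child, so for a div that starts directly with a child tag (like <div><p>...) it is None.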
for item in patch_message:
    if item.text:
        print(item.text.strip())
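A minimal sketch of why the guard is needed, using the snippet from the question. It uses the standard library's xml.etree.ElementTree instead of lxml so it runs without extra packages (the snippet happens to be well-formed XML); lxml elements follow the same .text rule.

```python
import xml.etree.ElementTree as ET

# The example markup from the question (well-formed, so ElementTree can parse it).
snippet = '<div id="lala"><p>I WANT</p> <ul><li>THIS</li></ul>. <p>All of them</p></div>'
div = ET.fromstring(snippet)

# .text holds only the text between <div> and its first child.
# This div opens straight into <p>, so there is none.
print(div.text)     # None
print(div[0].text)  # I WANT  (the first child, <p>)

# Guarding on .text skips elements that would otherwise make .strip() fail.
for element in [div, div[0]]:
    if element.text:
        print(element.tag, element.text.strip())
```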
text_content(): Returns the text content of the element, including the text content of its children, with no markup.
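text_content() belongs to lxml.html and does not exist on the standard library's ElementTree elements, but joining itertext() gives a comparable result. The sketch below defines a helper named text_content purely for illustration (it is not an ElementTree API) and also shows that indexing an element, as in div[0], selects its first child rather than the element itself.

```python
import xml.etree.ElementTree as ET

def text_content(element):
    """Illustrative helper: concatenate the element's own text, its
    descendants' text, and the text between children, dropping all markup."""
    return "".join(element.itertext())

snippet = '<div id="lala"><p>I WANT</p> <ul><li>THIS</li></ul>. <p>All of them</p></div>'
div = ET.fromstring(snippet)

print(text_content(div))     # I WANT THIS. All of them
print(text_content(div[0]))  # I WANT  (div[0] is only the first <p>)
```

With lxml.html, calling div.text_content() on the same snippet should produce the same string.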
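To extract all the text content from every div in your patch_message list, call item.text_content() on each item. Note that item[0] is the div's first child element, so item[0].text_content() would only return the text of that first child.
tree.xpath() returns a list of the matching elements, which is why printing patch_message shows a list of Element objects instead of their text.
patch_message = tree.xpath('//div[@class="messageText"]')
for item in patch_message:
    await client.say(item.text_content())
return await client.say(patch_message)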
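Putting it together, here is a minimal sketch of the loop as a standalone coroutine. Assumptions: say is a hypothetical stand-in for client.say (any coroutine that sends one string), and the page is parsed with ElementTree only so the example runs without lxml. Real web pages are rarely well-formed XML, so for them keep using html.fromstring() and item.text_content() as in the answer. Like tree.xpath(), findall() returns a plain list of elements, so each one is handled individually.

```python
import asyncio
import xml.etree.ElementTree as ET

async def send_message_texts(page_markup, say):
    """Send the text of every div with class="messageText", one message each.

    `say` is a hypothetical stand-in for client.say: any coroutine
    function that accepts a single string.
    """
    root = ET.fromstring(page_markup)
    # Like tree.xpath(), findall() returns a list of Element objects.
    for div in root.findall(".//div[@class='messageText']"):
        # Join all nested text, then collapse runs of whitespace.
        text = " ".join("".join(div.itertext()).split())
        if text:  # skip divs that contain no text at all
            await say(text)

async def main():
    sent = []

    async def fake_say(message):
        sent.append(message)

    page = ('<body>'
            '<div class="messageText"><p>I WANT</p> <ul><li>THIS</li></ul></div>'
            '<div class="other">ignored</div>'
            '<div class="messageText"> </div>'
            '<div class="messageText"><p>All of them</p></div>'
            '</body>')
    await send_message_texts(page, fake_say)
    return sent

print(asyncio.run(main()))  # ['I WANT THIS', 'All of them']
```

Sending one message per div keeps each message readable, and the empty-text check plays the same role as the if item.text guard in the first answer.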