lxml: parse the content inside a div tag from HTML

I want to parse a large block of HTML from a website. I have already selected the div, and now I want the content inside the tag, for example:

<div id="lala"><p>I WANT</p> <ul><li>THIS</li></ul>. <p>All of them</p></div>

This is my current code:

patchpage = requests.get(href)
tree = html.fromstring(patchpage.content)
patch_message = tree.xpath('//div[@class="messageText"]')
for item in patch_message:
    await client.say(item.text.strip())  # This line fails with an error
return await client.say(patch_message)

For now, patch_message gives me:

[<Element div at 0x29c4be2fa98>]

Not really what I expected. Can someone tell me how to get the div content into Python?

Assuming the error you get is AttributeError: 'NoneType' object has no attribute 'strip', you just need to skip elements whose text is None before calling strip():

for item in patch_message:
    if item.text:
        print(item.text.strip())
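
To illustrate why item.text can be None here: in lxml, .text holds only the text that appears before an element's first child, and text that follows a child element is stored on that child's .tail. The following is a minimal sketch using the sample markup from the question; the variable names are illustrative and not part of the original code.

```python
from lxml import html

# Sample markup taken from the question.
snippet = '<div id="lala"><p>I WANT</p> <ul><li>THIS</li></ul>. <p>All of them</p></div>'
div = html.fromstring(snippet)  # a single-element fragment comes back as that element

# The div starts directly with <p>, so it has no leading text of its own.
print(div.text)      # None

# The visible text lives on the children ...
print(div[0].text)   # I WANT

# ... and the ". " after </ul> is stored as the tail of the <ul> element.
print(repr(div[1].tail))
```

This is why the None check above avoids the error but prints nothing for this div: the text has to be collected from the children as well.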

text_content():

Returns the text content of the element, including the text content of its children, with no markup.

tree.xpath() returns a list of the matching elements, so each item in patch_message is already one of the div elements. To extract all the text inside every matching div, including the text of its children, call item.text_content() on each item. (item[0] would be the div's first child, so item[0].text_content() would return only the text of that child.)

patch_message = tree.xpath('//div[@class="messageText"]')
for item in patch_message:
    await client.say(item.text_content())
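
As a self-contained sketch of the same idea without the network request or the chat client, the snippet below runs text_content() on a small page. The page string is a hypothetical stand-in for patchpage.content, and collapsing whitespace with split()/join() is an optional extra step, not something the original answer requires.

```python
from lxml import html

# Hypothetical page source standing in for patchpage.content.
page = """
<html><body>
  <div class="messageText"><p>I WANT</p> <ul><li>THIS</li></ul>. <p>All of them</p></div>
  <div class="other">ignored</div>
</body></html>
"""

tree = html.fromstring(page)

# xpath() returns a list of elements; text_content() flattens each one
# (including its children) to plain text with no markup.
messages = [
    " ".join(div.text_content().split())  # collapse runs of whitespace
    for div in tree.xpath('//div[@class="messageText"]')
]

for message in messages:
    print(message)
```

Each entry in messages is an ordinary string, so it can be passed to client.say() or any other function that expects text.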
