Lxml parse DIV inside Tag from HTML

Question

I want to parse a large chunk of HTML from a website. I have already selected the div, and now I want the content inside that tag, for example:

<div id="lala"><p>I WANT</p> <ul><li>THIS</li></ul>. <p>All of them</p></div>

This is my current code:

patchpage = requests.get(href)
tree = html.fromstring(patchpage.content)
patch_message = tree.xpath('//div[@class="messageText"]')
for item in patch_message:
    await client.say(item.text.strip())  # This line raises an error
return await client.say(patch_message)

For now, patch_message gives me:

[<Element div at 0x29c4be2fa98>]

Not really what I expected :/ Can someone tell me how to get the div's content into Python?

Answer 1

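I'm assuming the error you get is AttributeError: 'NoneType' object has no attribute 'strip'.

In lxml, item.text holds only the text that comes before the element's first child, so it is None when the div opens directly with another tag (as in your example, where the div starts with <p>). You just need to skip the None values before calling strip().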

for item in patch_message:
    if item.text:
        print(item.text.strip())
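
To make the None case concrete, here is a minimal standalone sketch. The PAGE string and the leading_texts helper are made up for illustration: the first div copies the markup from the question (with the class from the XPath), and the second div is an invented one that starts with plain text.

```python
from lxml import html

# Hypothetical test page: the first div mirrors the question's markup,
# the second div (invented for this sketch) starts with plain text.
PAGE = (
    '<html><body>'
    '<div class="messageText"><p>I WANT</p> <ul><li>THIS</li></ul>. <p>All of them</p></div>'
    '<div class="messageText">Hello <b>world</b></div>'
    '</body></html>'
)

def leading_texts(markup):
    """Return the stripped .text of every matching div, skipping divs where it is None."""
    tree = html.fromstring(markup)
    divs = tree.xpath('//div[@class="messageText"]')
    return [div.text.strip() for div in divs if div.text]

divs = html.fromstring(PAGE).xpath('//div[@class="messageText"]')
print(divs[0].text)         # None: the div opens directly with <p>
print(leading_texts(PAGE))  # only the second div contributes leading text
```

Note that this check only avoids the exception. For a div like the one in the question it yields nothing, because all of the visible text lives in the child elements.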

Answer 2

text_content():

Returns the text content of the element, including the text content of its children, with no markup.

tree.xpath() returns a list of the matching elements, so each item in patch_message is already a div element.

To extract all the text from every div in your patch_message list, call item.text_content() on each item. Note that item[0] would be only the div's first child (the first <p> in your example), and sending patch_message itself just shows the list of Element objects.

patch_message = tree.xpath('//div[@class="messageText"]')
for item in patch_message:
    await client.say(item.text_content())
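
To see what text_content() returns on the sample markup from the question, here is a small standalone sketch. It compares calling it on the div itself with calling it on the div's first child (item[0]). The PAGE string and the div_texts helper are assumptions made up for this example, and collapsing whitespace with split()/join() is just one convenient choice, not something lxml does for you.

```python
from lxml import html

# Hypothetical page built from the question's sample markup.
PAGE = (
    '<html><body>'
    '<div class="messageText"><p>I WANT</p> <ul><li>THIS</li></ul>. <p>All of them</p></div>'
    '</body></html>'
)

def div_texts(markup, xpath='//div[@class="messageText"]'):
    """Return the whitespace-normalized text of every element matched by xpath."""
    tree = html.fromstring(markup)
    return [" ".join(el.text_content().split()) for el in tree.xpath(xpath)]

div = html.fromstring(PAGE).xpath('//div[@class="messageText"]')[0]
whole_div = div_texts(PAGE)[0]       # text of the div and all of its descendants
first_child = div[0].text_content()  # text of the first <p> only
print(whole_div)
print(first_child)
```

The first value should contain all three pieces of text from the div, while the second is limited to the first paragraph, so iterating over the xpath() result and calling text_content() on each element is usually what you want here.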

solution1
0 2017-11-07 15:25:45

solution2
0 2018-07-19 21:21:07
