How to get the inner HTML of an element using Scrapy

Question

This is my HTML document:

<div class='my-class'>
    <p>some text</p>
</div>

I want to get the inner HTML of the div.my-class element, which is:

<p>some text</p>

The inner HTML is not always a <p>; it could be some other element.

Here is what I have tried, but neither attempt gives the desired output:

res = response.css('div.my-class').get()

# result
<div class='my-class'>
 <p>some text</p>
</div>

# -------------------------------------------

res = response.css('div.my-class::text').get()

# result
some text

Answer 1

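Here is a way to get the child elements of the element with class my-class:

from scrapy.selector import Selector

html = "<div class='my-class'><p>some text</p></div>"
response = Selector(text=html, type="html")
# /* selects the element children of the matched node; .get() returns the first one
print(response.xpath('//*[@class="my-class"]/*').get())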

Answer 2

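The following CSS selector gets the expected output (* matches all descendant elements, and get() returns the first match). Note that ::text must not be used here: a pseudo-element is only valid at the end of a selector, so 'div.my-class::text *' raises a syntax error.

res = response.css('div.my-class *').get()

# result
<p>some text</p>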

Note that if you have multiple child elements, you need to use getall() to get the entire inner HTML. For example, suppose you have the following input:

<div class='my-class'>
    <h1>header</h1>
    <p>
        outer paragraph
        <p>
            inner paragraph
            <label>label</label>
        </p>
    </p>
</div>

Then you can get all the inner elements, and join them into a single string variable:

# get all immediate children and put them into a list
res_array = response.css('div.my-class > *').getall()

# join the list elements into res
res = " ".join(res_array)

*Note: if you don't include > before *, the selector also matches nested descendants, so the inner elements appear more than once in the list.

2 answers

solution1
2 ACCEPTED 2020-09-02 07:51:59

solution2
0 2020-09-02 08:11:52