Remove the first HTML tag using Python & Scrapy

Question

I have this HTML:

<div class="abc">
            <div class="xyz">
                <div class="needremove"></div>
                <p>text</p>
                <p>text</p>
                <p>text</p>
                <p>text</p>
            </div>
    </div>

I used: response.xpath('//div[contains(@class,"abc")]/div[contains(@class,"xyz")]').extract()

Result:

[u'<div class="xyz">
        <div class="needremove"></div>
        <p>text</p>
        <p>text</p>
        <p>text</p>
        <p>text</p>
    </div>']

I want to remove <div class="needremove"></div> from the result. Can you help me?

Answer 1

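You can select all the child elements except the div with class="needremove" (note that this predicate skips every div child, which is fine here because the unwanted element is the only div):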

response.xpath('//div[contains(@class, "abc")]/div[contains(@class, "xyz")]/*[local-name() != "div" and not(contains(@class, "needremove"))]').extract()

Demo from the shell:

$ scrapy shell index.html
In [1]: response.xpath('//div[contains(@class, "abc")]/div[contains(@class, "xyz")]/*[local-name() != "div" and not(contains(@class, "needremove"))]').extract()
Out[1]: [u'<p>text</p>', u'<p>text</p>', u'<p>text</p>', u'<p>text</p>']
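The XPath above returns the <p> children without their <div class="xyz"> wrapper. If the wrapper should be kept and only the unwanted child dropped, one option is to edit the parsed tree instead of filtering the selection. The following is a minimal sketch that uses the standard library's xml.etree.ElementTree rather than Scrapy's selectors, and it assumes the fragment is well-formed XML (true for the sample in the question, often not true for real pages). The helper name strip_children is hypothetical, and the [@class='...'] predicates match the attribute value exactly, unlike contains().

```python
import xml.etree.ElementTree as ET

# The fragment from the question; it happens to be well-formed XML.
HTML = """
<div class="abc">
    <div class="xyz">
        <div class="needremove"></div>
        <p>text</p>
        <p>text</p>
        <p>text</p>
        <p>text</p>
    </div>
</div>
"""


def strip_children(fragment, parent_path, child_path):
    """Return the parent element's markup with the matching children removed.

    Hypothetical helper for illustration; both paths use the limited
    XPath subset that ElementTree supports.
    """
    root = ET.fromstring(fragment)
    parent = root.find(parent_path)
    # findall() returns a list, so removing children while looping is safe.
    for child in parent.findall(child_path):
        parent.remove(child)
    return ET.tostring(parent, encoding="unicode")


cleaned = strip_children(HTML, "./div[@class='xyz']", "./div[@class='needremove']")
print(cleaned)
```

The printed markup is the <div class="xyz"> element with its four <p> children and no needremove div. For pages that are not well-formed XML, an HTML-aware parser would be needed instead.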
