Removing the first tag from HTML using Python & Scrapy

Question

I have this HTML:

<div class="abc">
            <div class="xyz">
                <div class="needremove"></div>
                <p>text</p>
                <p>text</p>
                <p>text</p>
                <p>text</p>
            </div>
    </div>

I used: response.xpath('//div[contains(@class, "abc")]/div[contains(@class, "xyz")]').extract()

Result:

[u'<div class="xyz">
        <div class="needremove"></div>
        <p>text</p>
        <p>text</p>
        <p>text</p>
        <p>text</p>
    </div>']

I want to remove <div class="needremove"></div> from the result. Can you help me?

Answer 1

You can select all the child elements except the div with class="needremove":

response.xpath('//div[contains(@class, "abc")]/div[contains(@class, "xyz")]/*[local-name() != "div" and not(contains(@class, "needremove"))]').extract()

Demo from the Scrapy shell:

$ scrapy shell index.html
In [1]: response.xpath('//div[contains(@class, "abc")]/div[contains(@class, "xyz")]/*[local-name() != "div" and not(contains(@class, "needremove"))]').extract()
Out[1]: [u'<p>text</p>', u'<p>text</p>', u'<p>text</p>', u'<p>text</p>']
