Scrapy: Excluding nodes/tags using CSS selectors

Question

The documentation and SO answers only show how to exclude CSS classes, using this notation:

response.css("div[id='content']:not([class*='infobox'])")

What I want to achieve, however, is to exclude a node, or even multiple nodes, such as <span> and <div> elements which are inside an <li> element.

Let me give you an example. Let's say I am scraping this HTML:

<li class="classA">
  <div class="classB">
    ..
  </div>

  <span class="classC">Whatever</span>

  This is the string I want to scrape
</li>

I am only interested in scraping the text "This is the string I want to scrape", so I want to skip both the <div> and <span> nodes. I tried the following inside the Scrapy shell, to no avail:

response.css(".classA:not(span|div)::text").extract()

but I am still getting the excluded nodes.

Answer 1

It's very easy:

1. Using a CSS selector

response.css('li.classA::text').extract_first()

2. Using an XPath selector

response.xpath('//li[@class = "classA"]/text()').extract_first()

Answer 2

Simple:

response.css('li::text').extract_first()

2 answers

Answer 1: score 2, accepted, 2019-03-24 06:30:13
Answer 2: score 1, 2019-01-24 08:41:59
