Scrapy: Using CSS Selectors to exclude a node/tag
The documentation and Stack Overflow posts only show how to exclude CSS classes, using this syntax:
response.css("div[id='content']:not([class*='infobox'])")
What I want to achieve, however, is to exclude a node, or even multiple nodes, such as the <span> and <div> elements inside an <li> element.
Let me give you an example. Let's say I am scraping this HTML:
<li class="classA">
<div class="classB">
..
</div>
<span class="classC">Whatever</span>
This is the string I want to scrape
</li>
and I am only interested in scraping the text "This is the string I want to scrape", so I want to skip both the <div> and <span> nodes. I tried the following in the Scrapy shell, to no avail:
response.css(".classA:not(span|div)::text").extract()
but I am still getting the excluded nodes.
response.css('li.classA::text').extract_first()
response.xpath('//li[@class = "classA"]/text()').extract_first()
Simpler:
response.css('li::text').extract_first()