Not getting expected output in scrapy

I'm doing web scraping, but I'm not getting the output I expected.

I'm learning web scraping and am still a beginner. The problem is that not all the quotes are being scraped.

import scrapy


class QuoteSpider(scrapy.Spider):
    name = 'Quotes'
    start_urls = [
        'http://quotes.toscrape.com/'
    ]

    def parse(self, response):
        for quotes in response.selector.xpath("//div[@class='quote']"):
            yield {
                'text': quotes.xpath("//span[@class='text']/text()").extract_first(),
                'author': quotes.xpath("//small[@class='author']/text()").extract_first(),
                'tags': quotes.xpath("//div[@class='tags']/child::a/text()").extract(),
            }

I expect all the quotes on the first page to be scraped. Instead, I get the same quote and author again and again, although it extracts all the tags every time. I'm still a beginner, so I'd appreciate the help.

This is a common mistake when using XPath on nested selectors.

When you call xpath on a selector you have already extracted and you want that selector to be the root of the new query, you need to start the expression with a dot (.//span rather than //span). If you don't, the expression is evaluated against the whole DOM, as it normally is. That is why extract_first() returns the first quote and author on the page on every iteration, while extract() returns every tag on the page.

So just change the final lines to:

{
    'text':quotes.xpath(".//span[@class='text']/text()").extract_first(),
    'author':quotes.xpath(".//small[@class='author']/text()").extract_first(),
    'tags':quotes.xpath(".//div[@class='tags']/child::a/text()").extract(),
}

 