Not getting expected output in Scrapy
I'm doing web scraping, but I'm not getting the output I expected. I'm learning web scraping and am still a beginner. The problem is that not all the quotes are being scraped.
import scrapy


class QuoteSpider(scrapy.Spider):
    name = 'Quotes'
    start_urls = [
        'http://quotes.toscrape.com/'
    ]

    def parse(self, response):
        for quotes in response.selector.xpath("//div[@class='quote']"):
            yield {
                'text': quotes.xpath("//span[@class='text']/text()").extract_first(),
                'author': quotes.xpath("//small[@class='author']/text()").extract_first(),
                'tags': quotes.xpath("//div[@class='tags']/child::a/text()").extract(),
            }
I expected all the quotes on the first page to be scraped. Instead, I get the same quote and author again and again, although it extracts all the tags every time. I'd appreciate any help.
This is a common mistake when using XPath on nested selectors. When you call xpath on a selector that you have already extracted, and you want that selector to act as the root of the new XPath expression, you need to start the expression with a dot (.). If you don't, the expression is evaluated against the whole DOM, as it normally is. That is why every iteration returns the first quote and author on the page, along with the tags of every quote.
So just change the final lines to:
yield {
    'text': quotes.xpath(".//span[@class='text']/text()").extract_first(),
    'author': quotes.xpath(".//small[@class='author']/text()").extract_first(),
    'tags': quotes.xpath(".//div[@class='tags']/child::a/text()").extract(),
}