
Scraping one page with scrapy

I have a long stream of URLs that I need to scrape and extract data from, and I want to use Scrapy for that.

Say I have a Twisted reactor set up and I create a spider:

from scrapy.crawler import CrawlerRunner

runner = CrawlerRunner(scrapy_settings)
d = runner.crawl(spider_cls)

Is there a way I can send URLs to the spider so it'll process them?

The crawl() method can accept additional arguments:

d = runner.crawl(spider_cls, start_urls=["url"])
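For context, a minimal end-to-end sketch is below; the UrlSpider class, its parse logic, and the example URL are assumptions for illustration. Keyword arguments passed to crawl() are forwarded to the spider's constructor, so a start_urls value passed this way takes the place of the class attribute.

import scrapy
from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging

class UrlSpider(scrapy.Spider):
    # Hypothetical spider used only to illustrate passing start_urls via crawl()
    name = "url_spider"

    def parse(self, response):
        # Extract whatever data you need from each page
        yield {"url": response.url, "title": response.css("title::text").get()}

configure_logging()
runner = CrawlerRunner()
d = runner.crawl(UrlSpider, start_urls=["https://example.com"])
d.addBoth(lambda _: reactor.stop())  # stop the reactor once the crawl finishes
reactor.run()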
