
Scraping one page with scrapy

I have a long stream of URLs that I need to scrape and extract data from, and I want to use Scrapy for that.

Say I have a Twisted reactor set up and I create a spider:

from scrapy.crawler import CrawlerRunner

runner = CrawlerRunner(scrapy_settings)
d = runner.crawl(spider_cls)

Is there a way I can send URLs to the spider so it'll process them?

The crawl() method can accept additional arguments:

d = runner.crawl(spider_cls, start_urls=["url"])
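For context, a minimal end-to-end sketch is below; the UrlSpider class, its parse logic, and the example URL are assumptions for illustration. Keyword arguments passed to crawl() are forwarded to the spider's constructor, so a start_urls value passed this way takes the place of the class attribute.

import scrapy
from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging

class UrlSpider(scrapy.Spider):
    # Hypothetical spider used only to illustrate passing start_urls via crawl()
    name = "url_spider"

    def parse(self, response):
        # Extract whatever data you need from each page
        yield {"url": response.url, "title": response.css("title::text").get()}

configure_logging()
runner = CrawlerRunner()
d = runner.crawl(UrlSpider, start_urls=["https://example.com"])
d.addBoth(lambda _: reactor.stop())  # stop the reactor once the crawl finishes
reactor.run()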
