
Python Scrapy - How to scrape from 2 different website at the same time?

Original post 2020-02-10 14:38:56 4 2 python/ scrapy

I need to scrape data for a list of domains given in an Excel file. The problem is that, for each domain, I need data from the original website (for example, https://www.lepetitballon.com ) and data from similartech ( https://www.similartech.com/websites/lepetitballon.com ).

I want both scrapes to run at the same time so that I can receive the results and format them once at the end; after that I'll just move on to the next domain.

In theory, should I just use 2 spiders running asynchronously with Scrapy?

2 answers

Ideally, you would keep spiders that scrape differently structured sites separate; that way your code will be a lot easier to maintain in the long run.

Theoretically, if for some reason you MUST parse them in the same spider, you could collect the URLs you want to scrape and, based on each URL's base path, invoke a different parser callback method. That said, I personally cannot think of a reason why you would have to do that. Even if the sites had the same structure, you could just reuse your scrapy.Item classes.
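The single-spider routing idea can be sketched with the standard library alone. This is a minimal sketch, not a full spider: the helper names `targets_for` and `callback_name_for`, the callback names `parse_site` and `parse_similartech`, and the assumption that every original site lives under a `www.` prefix (as in the question's example) are all hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from host name to the name of a parser callback.
# In a real spider these names would be methods on the Spider class.
CALLBACKS = {
    "www.similartech.com": "parse_similartech",
}
DEFAULT_CALLBACK = "parse_site"  # used for every other host


def targets_for(domain):
    """Return the two URLs to fetch for one domain from the input list.

    Assumes the original site is served under "www.<domain>".
    """
    return [
        f"https://www.{domain}",
        f"https://www.similartech.com/websites/{domain}",
    ]


def callback_name_for(url):
    """Pick a parser callback name based on the URL's host name."""
    host = urlsplit(url).hostname or ""
    return CALLBACKS.get(host, DEFAULT_CALLBACK)
```

Inside a spider, the returned name could be resolved with `getattr(self, callback_name_for(url))` and passed as the `callback` argument of `scrapy.Request`.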

Scrapy uses the Twisted networking library for its internal networking tasks, and it provides settings for controlling the number of concurrent requests.

Explained here: https://docs.scrapy.org/en/latest/topics/settings.html#concurrent-requests
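As a rough illustration, the concurrency settings go in the project's settings.py. The two setting names below are the ones described in the Scrapy settings documentation; the values are only illustrative and should be tuned to the target sites and your resources.

```python
# settings.py (fragment)

# Maximum number of requests Scrapy performs concurrently, across all domains.
CONCURRENT_REQUESTS = 16

# Maximum number of concurrent requests sent to any single domain.
CONCURRENT_REQUESTS_PER_DOMAIN = 8
```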

Or you could use multiple spiders that are independent of each other, which is already explained in the Scrapy docs; this might be what you are looking for.

By default, Scrapy runs a single spider per process when you run scrapy crawl. However, Scrapy supports running multiple spiders per process using the internal API.

https://docs.scrapy.org/en/latest/topics/practices.html#running-multiple-spiders-in-the-same-process

As for efficiency, you could choose either option A (concurrent requests within one spider) or option B (multiple spiders in the same process); it really depends on your resources and requirements. Option A can be good when resources are limited and still gives decent speed, while option B can be ideal for better speed at the cost of higher resource consumption than option A.
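With either option, items from the two sources arrive in no guaranteed order, so formatting "once at the end" usually means grouping them by domain after the crawl. A minimal sketch, assuming each scraped item is a dict with hypothetical `domain`, `source`, and `data` keys:

```python
from collections import defaultdict


def merge_by_domain(items):
    """Group scraped items by domain, keeping one entry per source."""
    merged = defaultdict(dict)
    for item in items:
        # Later items from the same domain and source overwrite earlier ones.
        merged[item["domain"]][item["source"]] = item["data"]
    return dict(merged)
```

The result maps each domain to a dict holding the data from the original site and from similartech side by side, ready to be formatted in one pass.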

How to scrape dynamic website - using python scrapy?

Python Scrapy: Login to a website then scrape

How to scrape the output of widgets on a website using python/scrapy?

How to scrape table with different xpath on the same level with Scrapy?

How to scrape all contents from infinite scroll website? scrapy

How do I scrape from this website using scrapy and splash?

How to scrape JavaScript rendered data from a website using Scrapy?

How to scrape data from multiple unrelated sections of a website (using Scrapy)

Python scrapy: how to scrape link by detecting class in the same level?

How to scrape 2 web page with same domain on scrapy using python?


The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any question please contact: yoyou2525@163.com.


solution1
1 2020-02-10 16:53:46

solution2
0 2020-02-10 17:42:38
