For one of my web projects I need to scrape data from different web sources. To keep it simple, I'll explain with an example.
Let's say I want to scrape data about mobiles listed on their manufacturers' sites:
http://www.somebrand1.com/mobiles/ . . http://www.somebrand3.com/phones/
I have a huge list of URLs. Every brand's page has its own way of presenting the HTML to the browser.
How can I write a normalized script that traverses the HTML of those listing pages and scrapes the data regardless of the format it is in?
Or do I need to write a separate scraping script for every pattern?
This is called broad crawling and, generally speaking, it is not easy to implement because of the different structures, representations, and loading mechanisms web sites use.
The general idea would be to have a generic spider and some sort of site-specific configuration that maps item fields to the XPath expressions or CSS selectors used to retrieve the field values from the page. In real life, things are not as simple as they seem: some fields would require post-processing, other fields would need to be extracted after sending a separate request, and so on. In other words, it is very difficult to stay generic and reliable at the same time.
The generic spider should receive the target site as a parameter, read the site-specific configuration, and crawl the site according to it.
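A minimal sketch of that idea in Python, using only the standard library so it runs without a network connection: the site names, field names, and selector expressions are all hypothetical stand-ins for a real per-site configuration, and `xml.etree.ElementTree` (which only accepts well-formed markup) stands in for a tolerant HTML parser such as lxml or the ones Scrapy uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-site configuration: one entry per brand site, mapping
# item fields to the (limited) XPath expressions ElementTree understands.
SITE_CONFIGS = {
    "somebrand1.com": {
        "listing": ".//div[@class='mobile']",   # one node per listed phone
        "fields": {
            "name": ".//h2",
            "price": ".//span[@class='price']",
        },
    },
    # "somebrand3.com": {...}  # same field names, different selectors
}

def scrape_listing(html, site):
    """Generic extractor: find every listing node, then pull each field
    out of it according to the site-specific configuration."""
    config = SITE_CONFIGS[site]
    root = ET.fromstring(html)
    items = []
    for node in root.findall(config["listing"]):
        item = {}
        for field, xpath in config["fields"].items():
            el = node.find(xpath)
            item[field] = el.text.strip() if el is not None and el.text else None
        items.append(item)
    return items

# Demo on a tiny well-formed sample page; a real spider would download the
# page first and route it to the right config by URL.
sample = """
<html><body>
  <div class="mobile"><h2>Phone A</h2><span class="price">199</span></div>
  <div class="mobile"><h2>Phone B</h2><span class="price">299</span></div>
</body></html>
"""
print(scrape_listing(sample, "somebrand1.com"))
# → [{'name': 'Phone A', 'price': '199'}, {'name': 'Phone B', 'price': '299'}]
```

Adding a new brand then means adding a config entry rather than a new script; the post-processing and extra-request cases mentioned above are exactly what this simple field-to-selector mapping cannot express.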