
How to make a web crawler for all e-commerce websites

I am new to Scrapy. As a personal experiment, I would like to build a web crawler that crawls the entire Internet and stores the URLs of e-commerce websites in my database. I have searched all over Google and found this one, along with many others that are almost the same.

But it has start_urls = ['http://brickset.com/sets/year-2016'], which I want to change so that it covers the whole Internet. Is this possible? If yes, please guide me toward the right approach.

Thanks in advance.

So let's approach this problem a bit differently. It is practically impossible to build a crawler that crawls every e-commerce website and brings you the results.

This leaves us with our best option: search engines. What you can do instead is crawl one of the search engines with your product query and gather the links that have a product listed for sale.
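As a rough illustration of the "gather links" step, here is a minimal sketch that pulls the distinct external domains out of a results page you have already downloaded. It uses only the standard library. The names `LinkCollector` and `extract_result_domains`, the `search.example` host, and the sample HTML are all made up for the example; real result pages are structured differently per search engine, and you should check the engine's terms of service (or use its official API) before fetching pages automatically.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_result_domains(html, search_host):
    """Return distinct external domains linked from a results page, in first-seen order."""
    parser = LinkCollector()
    parser.feed(html)
    domains = []
    for href in parser.links:
        parts = urlparse(href)
        host = parts.netloc.lower()
        # Skip relative links and non-HTTP schemes.
        if parts.scheme not in ("http", "https") or not host:
            continue
        # Skip links that point back to the search engine itself.
        if host == search_host or host.endswith("." + search_host):
            continue
        if host not in domains:
            domains.append(host)
    return domains


# Hypothetical results page; the hosts below are placeholders.
sample_html = """
<a href="/preferences">Settings</a>
<a href="https://shop.example.com/lego-set-1">LEGO set</a>
<a href="https://shop.example.com/lego-set-2">Another set</a>
<a href="https://www.search.example/about">About</a>
<a href="http://toys.example.org/item?id=7">Toy</a>
"""

print(extract_result_domains(sample_html, "search.example"))
# ['shop.example.com', 'toys.example.org']
```

The domains returned here are only candidates; you still have to decide which of them are actually shops.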

The second challenge you'll face is telling e-commerce sites apart from other sites. Tools like DiffBot would really help with that.
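If you want to experiment before reaching for a dedicated tool, a crude keyword heuristic gives you a baseline. To be clear, this is not DiffBot's API or anything like its classifier; it is just a sketch that counts a few typical shop markers in a page's HTML. The marker list, the function names, and the threshold of 2 are assumptions picked for the example, and you would need to tune them on real pages.

```python
import re

# Hypothetical markers that tend to appear on shop pages; extend as needed.
ECOMMERCE_PATTERNS = [
    re.compile(r"add to (cart|basket|bag)", re.IGNORECASE),
    re.compile(r"\bcheckout\b", re.IGNORECASE),
    re.compile(r"shopping cart", re.IGNORECASE),
    re.compile(r"schema\.org/Product", re.IGNORECASE),
]


def ecommerce_score(html):
    """Count how many of the marker patterns occur at least once in the page."""
    return sum(1 for pattern in ECOMMERCE_PATTERNS if pattern.search(html))


def looks_like_ecommerce(html, threshold=2):
    """Guess whether a page belongs to a shop; the threshold is an arbitrary assumption."""
    return ecommerce_score(html) >= threshold


# Made-up pages for illustration.
shop_page = """
<div itemscope itemtype="https://schema.org/Product">
  <h1>LEGO set</h1>
  <button>Add to cart</button>
</div>
"""
blog_page = "<article><h1>My trip to Denmark</h1><p>We visited a theme park.</p></article>"

print(looks_like_ecommerce(shop_page))
print(looks_like_ecommerce(blog_page))
```

A heuristic like this will misfire on review blogs and price-comparison pages, which is exactly why a purpose-built classifier is the better long-term choice.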

This needs to be done in real time, because obviously you won't be planning to build a humongous database of every product on every indexed site on the Internet.
