I am new to Scrapy. For a personal experiment, I would like to build a web crawler that crawls the entire Internet and stores the URLs of e-commerce websites in my database. I have searched all over Google and found this one, and many more that are almost the same.
But it has start_urls = ['http://brickset.com/sets/year-2016'],
which I want to modify so that it covers the whole Internet. Is this possible? If yes, please guide me to the right approach.
Thanks in advance.
So let's approach this problem a bit differently. It is practically impossible to build a crawler that crawls every e-commerce website on the Internet and brings you the results.
That leaves us with our best option: search engines. Instead of crawling the whole web, you can run your product query against any of the search engines and gather the links that list a product for sale.
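As a rough sketch of the "gather links" step, assuming you already have the HTML of a results page, the standard library's html.parser can pull out the outbound links and drop the engine's own navigation links. The function name, the example domains and the markup are hypothetical; a real results page will be structured differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ResultLinkParser(HTMLParser):
    """Collects absolute http(s) links from <a href> tags on a results page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base_url, href)  # resolve relative links
        if urlparse(absolute).scheme in ("http", "https"):
            self.links.append(absolute)


def extract_result_links(html, base_url):
    """Return unique links (in page order) that leave the search engine's domain."""
    parser = ResultLinkParser(base_url)
    parser.feed(html)
    parser.close()
    engine_host = urlparse(base_url).netloc
    seen = set()
    results = []
    for link in parser.links:
        if urlparse(link).netloc == engine_host:
            continue  # skip the engine's own pagination/navigation links
        if link not in seen:
            seen.add(link)
            results.append(link)
    return results


# Hypothetical results-page fragment.
sample = """
<a href="/search?q=next">Next page</a>
<a href="https://shop.example.com/item/42">Blue shoes</a>
<a href="https://blog.example.org/review">Review</a>
<a href="https://shop.example.com/item/42">Blue shoes (duplicate)</a>
"""
print(extract_result_links(sample, "https://search.example.net/search?q=shoes"))
# → ['https://shop.example.com/item/42', 'https://blog.example.org/review']
```

The collected links are only candidates; each one still has to be checked to see whether it is actually a shop.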
The second challenge you'll face is telling e-commerce sites apart from other sites. Tools like Diffbot would really help with that.
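Diffbot's own API is not shown here. As a crude, self-hosted stand-in, a page can be scored on a few independent signals: schema.org Product markup (microdata or JSON-LD), "add to cart"-style phrases, and something that looks like a price. The phrase list, the price regex and the threshold of 2 are assumptions chosen for illustration, not anything Diffbot does, and they will misclassify some pages.

```python
import json
import re
from html.parser import HTMLParser

# Hypothetical signal phrases and price pattern; tune them on your own data.
CART_PHRASES = ("add to cart", "add to basket", "buy now", "checkout")
PRICE_RE = re.compile(r"[$€£]\s?\d+(?:[.,]\d{2})?")


class _SignalParser(HTMLParser):
    """Collects visible text, JSON-LD blocks and schema.org Product microdata."""

    def __init__(self):
        super().__init__()
        self.product_microdata = False
        self.script_kind = None  # None outside <script>/<style>, else "ld" or "other"
        self.json_ld_blocks = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "schema.org/Product" in (attrs.get("itemtype") or ""):
            self.product_microdata = True
        if tag in ("script", "style"):
            if tag == "script" and attrs.get("type") == "application/ld+json":
                self.script_kind = "ld"
                self.json_ld_blocks.append("")
            else:
                self.script_kind = "other"

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.script_kind = None

    def handle_data(self, data):
        if self.script_kind == "ld":
            self.json_ld_blocks[-1] += data
        elif self.script_kind is None:
            self.text_parts.append(data)  # visible text only; other scripts skipped


def _mentions_product(node):
    """Recursively look for a JSON-LD object whose @type is (or includes) Product."""
    if isinstance(node, dict):
        kind = node.get("@type")
        if kind == "Product" or (isinstance(kind, list) and "Product" in kind):
            return True
        return any(_mentions_product(v) for v in node.values())
    if isinstance(node, list):
        return any(_mentions_product(v) for v in node)
    return False


def ecommerce_score(html):
    """Return a 0-3 score; each independent signal found adds one point."""
    parser = _SignalParser()
    parser.feed(html)
    parser.close()
    structured = parser.product_microdata
    for block in parser.json_ld_blocks:
        try:
            if _mentions_product(json.loads(block)):
                structured = True
        except json.JSONDecodeError:
            continue  # ignore malformed JSON-LD
    text = " ".join(parser.text_parts).lower()
    score = int(structured)
    score += any(phrase in text for phrase in CART_PHRASES)
    score += bool(PRICE_RE.search(text))
    return score


def looks_like_ecommerce(html, threshold=2):
    return ecommerce_score(html) >= threshold


shop = ('<div itemscope itemtype="https://schema.org/Product">'
        '<span>$19.99</span><button>Add to cart</button></div>')
blog = "<h1>My trip</h1><p>Lovely weather.</p>"
print(looks_like_ecommerce(shop), looks_like_ecommerce(blog))  # → True False
```

Requiring two signals rather than one is a deliberate trade-off: a blog post that merely mentions a price should not count as a shop.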
This needs to be done in real time, because you presumably won't be building a huge database of every product listed on every indexed site on the Internet.