I have a model Shop in my database which sums up everything I want to know about a shop (name, url, price).
I would like some advice on the best way to deal with my situation. Basically, I want to scrape websites (which don't have an API) to get the price displayed on the page.
For example, let's say that I want to get the price from this page every time a user visits page X, the price from this page every time he comes to page Y, and so on for 1000+ pages.
The Shops in my database would be:
Shop #1 : {:name => "Tshirt", :url => "XXXXX", :price => "PRICE_FROM_THE_URL"}
Shop #2 : {:name => "Veste", :url => "XXXXX", :price => "PRICE_FROM_THE_URL"}
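I see two options to update the price every time a user asks for it:
Option #1: store the scraping code in a code attribute of the Shop and do price = eval(Shop.code)
Option #2: write a scraping method for each shop in the model, selected by self.id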
I tried both options. Both work as expected, but my concern is that option #1 looks like the "ugly" one but is easier to maintain, while option #2 is not well suited if you have 1000+ shops to record and every one has a different scraping method. I will end up with thousands of lines of code and it will become impossible to understand.
Nokogiri lets you scrape content by CSS selectors. Knowing that, consider the following design guidelines:
Create a model holding the selectors for a given shop and name it ShopSelectorGroup (it can also be created as an ActiveRecord model, to store the selectors in the database).
class ShopSelectorGroup
  # One CSS selector per piece of data to extract, plus the shop's name.
  attr_accessor :price_selector, :other_selector, :shop_name
end
Then create a class Scraper which will be configured by injecting an instance of the ShopSelectorGroup class.
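require 'nokogiri'
require 'open-uri'

class Scraper
  def initialize(selector_group)
    @selector_group = selector_group
  end

  # Fetches the page and yields every node matching the price selector.
  def scrape(url)
    # URI.open is needed on Ruby 3.0+, where Kernel#open no longer opens URLs.
    URI.open(url) do |content|
      Nokogiri::HTML(content).css(@selector_group.price_selector).each do |data|
        yield data
      end
    end
  end
end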
Use it as follows:
selector_group = ShopSelectorGroup.new
selector_group.price_selector = 'span.price'
# or, when used as an ActiveRecord model:
# selector_group = ShopSelectorGroup.find_by(shop_name: 'MyShop')

scraper = Scraper.new(selector_group)
scraper.scrape(url) do |data|
  p data
  # or persist data in the database
end
Hope this helps!