How to speed up scraping in Rails

This code works fine locally, but on Heroku it takes more than 30 seconds, which triggers a request timeout:

if @url
  @arr = Array.new
  begin
    doc = Nokogiri::HTML(open(@url))
    doc.css(".new-cars-results-box").each do |item|
      hash = Hash.new

      type = item.at_css(".new-car-name").text
      link = "http://uae.yallamotor.com"+item.at_css(".new-car-name")[:href]
      @arr << [link,type]
    end
  rescue
  end

end

How can I speed this up?

You query the DOM twice for every result box, when a single query for all the '.new-car-name' elements is enough. You also create an unused hash on every iteration.

Try this:

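if @url
  @arr = []
  begin
    # URI.open needs `require 'open-uri'`; plain `open(url)` no longer fetches URLs on Ruby 3.0+
    doc = Nokogiri::HTML(URI.open(@url))
    url_prefix = 'http://uae.yallamotor.com'
    # One query for all the name links instead of two lookups per result box
    doc.css(".new-cars-results-box > .new-car-name").each do |item|
      type = item.text
      link = url_prefix + item[:href]
      @arr << [link, type]
    end
  rescue StandardError => e
    # Log the failure instead of silently swallowing it
    Rails.logger.warn("Scrape failed for #{@url}: #{e.message}")
  end
end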
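
The selector change trims parsing work, but the page fetch in the code above sets no explicit time limit, so a slow response from the remote site can still push the request past the 30-second limit mentioned in the question. A minimal sketch of bounding the fetch, assuming Ruby 2.5 or later (for URI.open); the helper name fetch_html and the 5 and 10 second limits are illustrative choices, not part of the original answer:

```ruby
require 'net/http'
require 'open-uri'

# Hypothetical helper: returns the page body as a String, or nil on failure.
# open_timeout bounds the connection attempt, read_timeout bounds each read.
def fetch_html(url, open_timeout: 5, read_timeout: 10)
  URI.open(url, open_timeout: open_timeout, read_timeout: read_timeout, &:read)
rescue OpenURI::HTTPError, Timeout::Error, SocketError, SystemCallError => e
  # Report the failure and let the caller decide how to respond
  warn "Fetch failed for #{url}: #{e.class}: #{e.message}"
  nil
end
```

In the controller this could be used as html = fetch_html(@url), followed by doc = Nokogiri::HTML(html) if html, so that a slow or unreachable site produces an empty result rather than a timed-out request.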

Also try replacing this line:

doc.css(".new-cars-results-box > .new-car-name").each do |item|

with this:

doc.css(".new-car-name").each do |item|
