
Too many connection resets Exception Error - Mechanize in Ruby

I am using Mechanize in Ruby and keep getting this exception:

C:/Ruby200/lib/ruby/2.0.0/net/protocol.rb:158:in `rescue in rbuf_fill': too many connection resets (due to Net::ReadTimeout - Net::ReadTimeout) after 0 requests on 37920120, last used 1457465950.371121 seconds ago (Net::HTTP::Persistent::Error)
    from C:/Ruby200/lib/ruby/2.0.0/net/protocol.rb:152:in `rbuf_fill'
    from C:/Ruby200/lib/ruby/2.0.0/net/protocol.rb:134:in `readuntil'
    from C:/Ruby200/lib/ruby/2.0.0/net/protocol.rb:144:in `readline'
    from C:/Ruby200/lib/ruby/2.0.0/net/http/response.rb:39:in `read_status_line'
    from C:/Ruby200/lib/ruby/2.0.0/net/http/response.rb:28:in `read_new'
    from C:/Ruby200/lib/ruby/2.0.0/net/http.rb:1406:in `block in transport_request'
    from C:/Ruby200/lib/ruby/2.0.0/net/http.rb:1403:in `catch'
    from C:/Ruby200/lib/ruby/2.0.0/net/http.rb:1403:in `transport_request'
    from C:/Ruby200/lib/ruby/2.0.0/net/http.rb:1376:in `request'
    from C:/Ruby200/lib/ruby/gems/2.0.0/gems/rest-client-1.6.7/lib/restclient/net_http_ext.rb:51:in `request'
    from C:/Ruby200/lib/ruby/gems/2.0.0/gems/net-http-persistent-2.9/lib/net/http/persistent.rb:986:in `request'
    from C:/Ruby200/lib/ruby/gems/2.0.0/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:259:in `fetch'
    from C:/Ruby200/lib/ruby/gems/2.0.0/gems/mechanize-2.7.3/lib/mechanize.rb:1281:in `post_form'
    from C:/Ruby200/lib/ruby/gems/2.0.0/gems/mechanize-2.7.3/lib/mechanize.rb:548:in `submit'
    from C:/Users/Feshaq/workspace/ERISScrap/eca_sample/eca_on_scraper.rb:152:in `<main>'

This is line 152:

#Click the form button
agent.page.forms[0].click_button

Alternatively, I tried the snippet below and keep getting the same exception:

#get the form
form = agent.page.form_with(:name => "AdvancedSearchForm")
# get the button you want from the form
button = form.button_with(:value => "Search")
# submit the form using that button
agent.submit(form, button)

Any help is appreciated.

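I have run into this issue many times. The way I deal with it is to wrap the block of code that runs the scraper in a rescue clause; if it errors out, I simply kill the connection and reset the agent and its headers. This has worked every time for me and has not caused any problems. The code then picks up where it left off. The example below is a scraper I run to go through a list of buildings, look up pages, and so on: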

def begin_scraping_list
  Building.all.each do |building_info|
    begin
      next if convert_boroughs_for_form(building_info) == :no_good
      fill_in_first_page_address_form_and_submit(building_info)
      get_to_proper_second_page
      go_to_page_we_want_for_scraping
      scrape_the_table(building_info)
    rescue
      puts "error happened"
      # Kill the connection and start over with a fresh agent
      @agent.shutdown
      @agent = Mechanize.new { |agent| agent.user_agent_alias = 'Windows Chrome' }
      @agent.request_headers
      sleep(5)
      # Re-run this iteration for the same building
      redo
    end
  end
end

So in your case, you would wrap the problem area you posted in a rescue block:

begin
  # get the form
  form = agent.page.form_with(:name => "AdvancedSearchForm")
  # get the button you want from the form
  button = form.button_with(:value => "Search")
  # submit the form using that button
  agent.submit(form, button)
rescue
  agent.shutdown
  agent = Mechanize.new { |a| a.user_agent_alias = 'Windows Chrome' }
  agent.request_headers
  sleep(2)
  # A fresh agent has no page loaded, so agent.page is nil at this point.
  # Load the search page again first; search_url is a placeholder for its
  # address, which the question does not give.
  agent.get(search_url)
  # get the form
  form = agent.page.form_with(:name => "AdvancedSearchForm")
  # get the button you want from the form
  button = form.button_with(:value => "Search")
  # submit the form using that button
  agent.submit(form, button)
end
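
The rescue-and-reset pattern above retries without any limit, so a site that never answers would keep the script looping. One way to cap the attempts is a small helper like the sketch below. The names `with_retries`, `max_attempts`, `wait`, and `on_failure` are made up for illustration and are not part of Mechanize; the commented usage at the bottom assumes the mechanize gem is installed and uses `search_url` as a placeholder for the page that holds the form.

```ruby
# Run the block, retrying up to max_attempts times in total.
# Between attempts, call on_failure (if given) and sleep for `wait` seconds.
# Hypothetical helper for illustration; not part of Mechanize.
def with_retries(max_attempts: 3, wait: 2, errors: [StandardError], on_failure: nil)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue *errors => e
    # Out of attempts: re-raise the last error so the caller sees it
    raise if attempts >= max_attempts
    # Give the caller a chance to reset state, e.g. rebuild the agent
    on_failure.call(e) if on_failure
    sleep(wait)
    retry
  end
end

# Usage sketch with Mechanize (assumes the gem is installed;
# search_url is a placeholder, not taken from the question):
#
#   require 'mechanize'
#
#   agent = Mechanize.new { |a| a.user_agent_alias = 'Windows Chrome' }
#   reset = lambda do |_error|
#     agent.shutdown
#     agent = Mechanize.new { |a| a.user_agent_alias = 'Windows Chrome' }
#   end
#
#   with_retries(errors: [Net::HTTP::Persistent::Error], on_failure: reset) do
#     agent.get(search_url)
#     form = agent.page.form_with(:name => "AdvancedSearchForm")
#     button = form.button_with(:value => "Search")
#     agent.submit(form, button)
#   end
```

Because the block reloads the page on every attempt, it still works after the agent has been replaced. Once the attempts are used up, the last error is raised to the caller instead of being swallowed.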

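Note: if Mechanize is causing you problems and you have not invested heavily in it yet, you will get a better scraping experience with Capybara and PhantomJS. That stack is much more mature and has more development behind it.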

