
Random Timeout::Error Exception in Ruby with the Mechanize Gem

I'm building an application in Ruby 1.9.3-p327 that fetches and parses some pages (scraping) and then, depending on certain values, inserts or updates columns in the database. To fetch and parse the pages, the app uses the Mechanize gem; access to the database (MySQL) goes through the activerecord gem.

The weird problem is that a Timeout::Error exception is raised at random. Sometimes it never happens, then it may happen again two days later, and with different types of records or pages. The exception log is:

/root/.rbenv/versions/1.9.3-p327/lib/ruby/1.9.1/net/protocol.rb:146:in `rescue in rbuf_fill': too many connection resets (due to Timeout::Error - Timeout::Error) after 0 requests on 21716860, last used 1378984537.2796552 seconds ago (Net::HTTP::Persistent::Error)
    from /root/.rbenv/versions/1.9.3-p327/lib/ruby/1.9.1/net/protocol.rb:140:in `rbuf_fill'
    from /root/.rbenv/versions/1.9.3-p327/lib/ruby/1.9.1/net/protocol.rb:122:in `readuntil'
    from /root/.rbenv/versions/1.9.3-p327/lib/ruby/1.9.1/net/protocol.rb:132:in `readline'
    from /root/.rbenv/versions/1.9.3-p327/lib/ruby/1.9.1/net/http.rb:2562:in `read_status_line'
    from /root/.rbenv/versions/1.9.3-p327/lib/ruby/1.9.1/net/http.rb:2551:in `read_new'
    from /root/.rbenv/versions/1.9.3-p327/lib/ruby/1.9.1/net/http.rb:1319:in `block in transport_request'
    from /root/.rbenv/versions/1.9.3-p327/lib/ruby/1.9.1/net/http.rb:1316:in `catch'
    from /root/.rbenv/versions/1.9.3-p327/lib/ruby/1.9.1/net/http.rb:1316:in `transport_request'
    from /root/.rbenv/versions/1.9.3-p327/lib/ruby/1.9.1/net/http.rb:1293:in `request'
    from /root/.rbenv/versions/1.9.3-p327/lib/ruby/gems/1.9.1/gems/net-http-persistent-2.9/lib/net/http/persistent.rb:986:in `request'
    from /root/.rbenv/versions/1.9.3-p327/lib/ruby/gems/1.9.1/gems/mechanize-2.7.2/lib/mechanize/http/agent.rb:257:in `fetch'
    from /root/.rbenv/versions/1.9.3-p327/lib/ruby/gems/1.9.1/gems/mechanize-2.7.2/lib/mechanize.rb:432:in `get'
    from /root/notificador-corte/lib/downloader.rb:10:in `fetch'
    from /root/notificador-corte/worker.rb:63:in `fetch_page'
    from /root/notificador-corte/worker.rb:49:in `process_causa'
    from /root/notificador-corte/worker.rb:41:in `block in worker_main_cycle'
    from /root/.rbenv/versions/1.9.3-p327/lib/ruby/gems/1.9.1/gems/activerecord-4.0.0/lib/active_record/relation/delegation.rb:13:in `each'
    from /root/.rbenv/versions/1.9.3-p327/lib/ruby/gems/1.9.1/gems/activerecord-4.0.0/lib/active_record/relation/delegation.rb:13:in `each'
    from /root/notificador-corte/worker.rb:39:in `worker_main_cycle'
    from /root/notificador-corte/worker.rb:26:in `run'
    from /root/notificador-corte/app.rb:12:in `<main>'

Line 10 of downloader.rb is part of the definition of the fetch method:

def fetch(url)
  begin
    @agent.get(url)
  rescue Errno::ETIMEDOUT, Timeout::Error => exception
    # The exception is swallowed here, so fetch returns nil on a timeout.
  end
end

Line 63 of worker.rb contains the call to the fetch method.

The documentation suggests setting the read_timeout and open_timeout properties on the Mechanize agent. I tried those, and also idle_timeout and keep_alive, but the error still occurs at random.

The content of the Gemfile is:

gem 'activerecord', "~> 4.0.0" 
gem 'mechanize', "~> 2.7.1"
gem 'mysql', '~> 2.9.1'
gem 'actionmailer', "~> 4.0.0" 
gem 'rspec', "~> 2.14.1"

I don't think it's necessarily a bug in either your code or Mechanize itself. Most likely it's a network issue.

I would instead implement a policy in that rescue clause, so that whenever this error occurs you retry at a later point.
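A minimal sketch of such a retry policy, assuming a hypothetical helper named with_retries (it is not part of Mechanize) that waits with exponential backoff between attempts. The parameter names and default values are illustrative only. It uses positional arguments so it should also run on Ruby 1.9.3.

```ruby
require 'timeout'

# Hypothetical helper, not part of Mechanize: runs the block and retries it
# when one of the listed error classes is raised, doubling the delay each time.
def with_retries(max_attempts = 3, base_delay = 1.0,
                 errors = [Timeout::Error, Errno::ETIMEDOUT])
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue *errors
    # Give up and re-raise the original error once the attempts are used up.
    raise if attempt >= max_attempts
    sleep(base_delay * (2**(attempt - 1)))
    retry
  end
end
```

In the question's fetch method this could be used as with_retries { @agent.get(url) }. Note that the stack trace in the question ends in Net::HTTP::Persistent::Error rather than a bare Timeout::Error, so it may be necessary to pass that class in the errors list as well.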

In your rescue clause you have

 Errno::ETIMEDOUT

Is this misspelled? Or is it something I'm unfamiliar with?

It's quite possible that your issue is caused by some bad websites or links. I've had all sorts of issues when scraping the internet at large. I found it's best to catch all errors, print the error message, and continue to the next possible operation. That way your scraper won't stop on the bad cases, and you can go back and resolve issues as they come up.
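A rough sketch of that catch, print, and continue approach, assuming a hypothetical helper named process_all; the name, the out parameter, and the returned list of failures are illustrative assumptions, not something from the question's code.

```ruby
# Hypothetical helper: yields each item to the block, prints any StandardError
# to the given IO, and moves on to the next item instead of stopping.
def process_all(items, out = $stderr)
  failures = []
  items.each do |item|
    begin
      yield item
    rescue StandardError => e
      out.puts "#{item}: #{e.class}: #{e.message}"
      failures << [item, e] # kept so the bad cases can be revisited later
    end
  end
  failures
end
```

For example, process_all(urls) { |url| @agent.get(url) } would attempt every URL and return the ones that failed together with their exceptions. Rescuing StandardError rather than Exception leaves signals such as Interrupt alone, so the scraper can still be stopped with Ctrl-C.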
