Too many connection resets Exception Error - Mechanize in Ruby
I am using Mechanize in Ruby and keep getting this exception:
C:/Ruby200/lib/ruby/2.0.0/net/protocol.rb:158:in `rescue in rbuf_fill': too many connection resets (due to Net::ReadTimeout - Net::ReadTimeout) after 0 requests on 37920120, last used 1457465950.371121 seconds ago (Net::HTTP::Persistent::Error)
from C:/Ruby200/lib/ruby/2.0.0/net/protocol.rb:152:in `rbuf_fill'
from C:/Ruby200/lib/ruby/2.0.0/net/protocol.rb:134:in `readuntil'
from C:/Ruby200/lib/ruby/2.0.0/net/protocol.rb:144:in `readline'
from C:/Ruby200/lib/ruby/2.0.0/net/http/response.rb:39:in `read_status_line'
from C:/Ruby200/lib/ruby/2.0.0/net/http/response.rb:28:in `read_new'
from C:/Ruby200/lib/ruby/2.0.0/net/http.rb:1406:in `block in transport_request'
from C:/Ruby200/lib/ruby/2.0.0/net/http.rb:1403:in `catch'
from C:/Ruby200/lib/ruby/2.0.0/net/http.rb:1403:in `transport_request'
from C:/Ruby200/lib/ruby/2.0.0/net/http.rb:1376:in `request'
from C:/Ruby200/lib/ruby/gems/2.0.0/gems/rest-client-1.6.7/lib/restclient/net_http_ext.rb:51:in `request'
from C:/Ruby200/lib/ruby/gems/2.0.0/gems/net-http-persistent-2.9/lib/net/http/persistent.rb:986:in `request'
from C:/Ruby200/lib/ruby/gems/2.0.0/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:259:in `fetch'
from C:/Ruby200/lib/ruby/gems/2.0.0/gems/mechanize-2.7.3/lib/mechanize.rb:1281:in `post_form'
from C:/Ruby200/lib/ruby/gems/2.0.0/gems/mechanize-2.7.3/lib/mechanize.rb:548:in `submit'
from C:/Users/Feshaq/workspace/ERISScrap/eca_sample/eca_on_scraper.rb:152:in `<main>'
This is line 152:
#Click the form button
agent.page.forms[0].click_button
Alternatively, I tried the following snippet and keep getting the same exception:
#get the form
form = agent.page.form_with(:name => "AdvancedSearchForm")
# get the button you want from the form
button = form.button_with(:value => "Search")
# submit the form using that button
agent.submit(form, button)
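Any help is appreciated.
I have run into this problem many times. The way I handle it is to wrap the block of code that runs the scraper in a rescue clause; if something goes wrong, I simply kill the connection and reset the agent and its headers. This has worked every time for me and has not caused any problems. I then continue from where I left off in the code. The example below is a scraper I run that iterates over a list of buildings, looks up pages, and so on: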
def begin_scraping_list
  Building.all.each do |building_info|
    begin
      next if convert_boroughs_for_form(building_info) == :no_good
      fill_in_first_page_address_form_and_submit(building_info)
      get_to_proper_second_page
      go_to_page_we_want_for_scraping
      scrape_the_table(building_info)
    rescue
      # On any error: close the connection, build a fresh agent, wait, then retry this building
      puts "error happened"
      @agent.shutdown
      @agent = Mechanize.new { |agent| agent.user_agent_alias = 'Windows Chrome' }
      @agent.request_headers
      sleep(5)
      redo
    end
  end
end
So in your case, you would want to wrap the problem area you posted in a rescue block:
begin
  # get the form
  form = agent.page.form_with(:name => "AdvancedSearchForm")
  # get the button you want from the form
  button = form.button_with(:value => "Search")
  # submit the form using that button
  agent.submit(form, button)
rescue
  # close the broken connection and start over with a fresh agent
  agent.shutdown
  agent = Mechanize.new { |a| a.user_agent_alias = 'Windows Chrome' }
  agent.request_headers
  sleep(2)
  # a new agent has no current page, so load the search page again first
  # (search_url is a placeholder for the URL of the page that contains the form)
  agent.get(search_url)
  form = agent.page.form_with(:name => "AdvancedSearchForm")
  button = form.button_with(:value => "Search")
  agent.submit(form, button)
end
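The `redo` in the first example retries without a limit, so it can loop forever if the site stays unreachable. A minimal sketch of a bounded variant follows. `with_retries`, its parameters, and the simulated failure are all hypothetical names made up for illustration; none of them are part of Mechanize, and the sketch uses plain Ruby only so it can be run on its own.

```ruby
# Hypothetical helper (not a Mechanize API): run the block, and if one of the
# listed errors is raised, call on_failure, wait, and try again, up to
# max_attempts times in total.
def with_retries(max_attempts: 3, wait: 2, errors: [StandardError], on_failure: nil)
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue *errors => e
    # Re-raise the original error once the attempt budget is spent.
    raise if attempt >= max_attempts
    # Let the caller clean up, e.g. shut down and rebuild its agent.
    on_failure.call(e, attempt) if on_failure
    sleep(wait)
    retry
  end
end

# Simulated usage: the block fails twice with IOError, then succeeds.
calls = 0
failures = []
result = with_retries(max_attempts: 3, wait: 0, errors: [IOError],
                      on_failure: ->(_error, n) { failures << n }) do |attempt|
  calls += 1
  raise IOError, "simulated connection reset" if attempt < 3
  "ok"
end
```

With Mechanize, the block would load the page and submit the form, `errors` would presumably list the two classes named in the stack trace (`Net::HTTP::Persistent::Error` and `Net::ReadTimeout`), and `on_failure` would call `shutdown` and build a new agent as in the snippets above. Rescuing only those classes, instead of using a bare `rescue`, keeps unrelated bugs from being silently retried.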
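Note: if Mechanize is causing you problems and you are not heavily invested in it yet, you will have a much better scraping experience with Capybara and PhantomJS. They are more mature and have more development behind them.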