I use the capybara-webkit
gem to scrape data from certain pages in my Rails application. I've noticed, what seems to be "random" / "sporadic", that the application will crash with the following error:
Capybara::Webkit::ConnectionError: /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/bin/webkit_server failed to start.
from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit/server.rb:56:in `parse_port'
from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit/server.rb:42:in `discover_port'
from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit/server.rb:26:in `start'
from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit/connection.rb:67:in `start_server'
from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit/connection.rb:17:in `initialize'
from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit/driver.rb:16:in `new'
from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit/driver.rb:16:in `initialize'
from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit.rb:15:in `new'
from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit.rb:15:in `block in <top (required)>'
from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-2.7.1/lib/capybara/session.rb:85:in `driver'
from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-2.7.1/lib/capybara/session.rb:233:in `visit'
It happens even after it's already connected and accessed a website multiple times before. Here's a code snippet of what I'm using currently...
if site.url.present?
begin
# Visit the URL
session = Capybara::Session.new(:webkit)
session.visit(site.url) # here is where the error occurs...
document = Nokogiri::HTML.parse(session.body)
# Load configuration options for Development Group
roster_table_selector = site.development_group.table_selector
header_row_selector = site.development_group.table_header_selector
row_selector = site.development_group.table_row_selector
row_offset = site.development_group.table_row_selector_offset
header_format_type = site.config_header_format_type
# Get the Table and Header Row for processing
roster_table = document.css(roster_table_selector)
header_row = roster_table.css(header_row_selector)
header_hash = retrieve_headers(header_row, header_format_type)
my_object = process_rows(roster_table, header_hash, site, row_selector, row_offset)
rescue ::Capybara::Webkit::ConnectionError => e
raise e
rescue OpenURI::HTTPError => e
if e.message == '404 Not Found'
raise "404 Page not found..."
else
raise e
end
end
end
I've even thought perhaps I don't find out why it's happening necessarily - but just recover when it does. So I was going to do a "retry" in the rescue block for the error but it appears the server is just down - so I get the same result when retrying. Perhaps someone knows of a way I can check if the server is down and restart it then perform a retry? Thanks for the help!
So after further investigating it appears that I was generating a new Capybara::Session
for each iteration of my loop. I moved it outside of the loop and also added Capybara.reset_sessions!
at the end of my loop. Not sure if that helps with anything -- but the issue seems to have been resolved. I'll monitor it for the next hour or so. Below is an example of my ActiveJob code now...
class ScrapeJob < ActiveJob::Base
queue_as :default
include Capybara::DSL
def perform(*args)
session = Capybara::Session.new(:webkit)
Site.where(config_enabled: 1).order(:code).each do |site|
process_roster(site, session)
Capybara.reset_sessions!
end
end
def process_roster(site, session)
if site.roster_url.present?
begin
# Visit the Roster URL
session.visit(site.roster_url)
document = Nokogiri::HTML.parse(session.body)
# processing code...
# pass the session that was created as the final parameter..
my_object = process_rows( ..., session)
rescue ::Capybara::Webkit::ConnectionError => e
raise e
rescue OpenURI::HTTPError => e
if e.message == '404 Not Found'
raise "404 Page not found..."
else
raise e
end
end
end
end
end
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.