
Why am I getting sporadic disconnects when using capybara-webkit gem in Rails 4?

I use the capybara-webkit gem to scrape data from certain pages in my Rails application. At seemingly random intervals, the application crashes with the following error:

Capybara::Webkit::ConnectionError: /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/bin/webkit_server failed to start.
    from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit/server.rb:56:in `parse_port'
    from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit/server.rb:42:in `discover_port'
    from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit/server.rb:26:in `start'
    from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit/connection.rb:67:in `start_server'
    from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit/connection.rb:17:in `initialize'
    from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit/driver.rb:16:in `new'
    from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit/driver.rb:16:in `initialize'
    from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit.rb:15:in `new'
    from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit.rb:15:in `block in <top (required)>'
    from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-2.7.1/lib/capybara/session.rb:85:in `driver'
    from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-2.7.1/lib/capybara/session.rb:233:in `visit'

It happens even after the session has already connected and visited a website several times. Here's a code snippet of what I'm currently using...

if site.url.present?

  begin
    # Visit the URL
    session = Capybara::Session.new(:webkit)
    session.visit(site.url)  # here is where the error occurs...
    document = Nokogiri::HTML.parse(session.body)

    # Load configuration options for Development Group
    roster_table_selector = site.development_group.table_selector
    header_row_selector   = site.development_group.table_header_selector
    row_selector          = site.development_group.table_row_selector
    row_offset            = site.development_group.table_row_selector_offset
    header_format_type    = site.config_header_format_type

    # Get the Table and Header Row for processing
    roster_table = document.css(roster_table_selector)
    header_row   = roster_table.css(header_row_selector)
    header_hash  = retrieve_headers(header_row, header_format_type)

    my_object = process_rows(roster_table, header_hash, site, row_selector, row_offset)

  rescue ::Capybara::Webkit::ConnectionError => e
    raise e

  rescue OpenURI::HTTPError => e
    if e.message == '404 Not Found'
      raise "404 Page not found..."
    else
      raise e
    end
  end
end

I've even considered not tracking down the cause and simply recovering when it happens. I tried a "retry" in the rescue block for the error, but the server appears to be down, so the retry fails the same way. Does anyone know how I can check whether the server is down, restart it, and then retry? Thanks for the help!
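One possible way to make a retry useful, sketched under assumptions: the stack trace shows webkit_server being started when a session's driver is first built, so a retry that builds a brand-new session should also start a new server instead of reusing the dead one. The helper name with_session_retry, the session_factory parameter, and the attempt count are hypothetical, and the error class is passed in so the sketch does not depend on capybara-webkit being loaded.

```ruby
# Hypothetical helper: runs the block with a freshly built session and,
# if the given error is raised, builds another session and tries again.
def with_session_retry(session_factory, error_class, attempts: 3, wait: 0)
  tries = 0
  begin
    tries += 1
    session = session_factory.call   # a new session for every attempt
    yield session
  rescue error_class
    raise if tries >= attempts       # re-raise once the attempts are used up
    sleep(wait) if wait > 0
    retry
  end
end

# Possible usage (assumes capybara-webkit is loaded and `site` is in scope):
#
#   body = with_session_retry(-> { Capybara::Session.new(:webkit) },
#                             Capybara::Webkit::ConnectionError) do |session|
#     session.visit(site.url)
#     session.body
#   end
```

This only masks the failure; if servers keep dying, the underlying cause still needs to be found.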

After further investigation, it turned out that I was creating a new Capybara::Session on every iteration of my loop. I moved the session creation outside the loop and added Capybara.reset_sessions! at the end of each iteration. I'm not sure whether the reset helps, but the issue seems to be resolved. I'll monitor it for the next hour or so. Below is my ActiveJob code now...

class ScrapeJob < ActiveJob::Base
  queue_as :default
  include Capybara::DSL

  def perform(*args)

    session = Capybara::Session.new(:webkit)

    Site.where(config_enabled: 1).order(:code).each do |site|
      process_roster(site, session)
      Capybara.reset_sessions!
    end

  end

  def process_roster(site, session)

    if site.roster_url.present?

      begin
        # Visit the Roster URL 
        session.visit(site.roster_url)
        document = Nokogiri::HTML.parse(session.body)

        # processing code...

        # Pass the shared session as the final argument
        my_object = process_rows( ..., session)

      rescue ::Capybara::Webkit::ConnectionError => e
        raise e

      rescue OpenURI::HTTPError => e
        if e.message == '404 Not Found'
          raise "404 Page not found..."
        else
          raise e
        end
      end
    end
  end
end
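A note on the reset, offered as a sketch rather than a confirmed fix: as far as I can tell, Capybara.reset_sessions! only resets the sessions Capybara keeps in its own pool (the ones used through Capybara::DSL), so a session built by hand with Capybara::Session.new may not be touched by it. Calling reset! on the session object itself is the more direct option. The helper name scrape_each and its return shape are hypothetical.

```ruby
# Hypothetical helper: visits every URL with one shared session and resets
# that same session after each page, even if the visit raised.
def scrape_each(session, urls)
  urls.each_with_object({}) do |url, bodies|
    begin
      session.visit(url)
      bodies[url] = session.body
    ensure
      session.reset!   # Capybara::Session#reset! clears this session's state
    end
  end
end

# Possible usage (assumes capybara-webkit is loaded):
#
#   session = Capybara::Session.new(:webkit)
#   bodies  = scrape_each(session, Site.where(config_enabled: 1).map(&:roster_url))
```

Keeping a single session means only one webkit_server is started for the whole job, which matches the change that made the error go away.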
