Ruby gem mechanize throwing an error: undefined method `<=>'

I am using the Ruby gem mechanize to scrape some HTML. When I first load my page and display the results, the page loads fine. After a reload, I get this error when executing "search_results = @agent.submit(search_form)":

undefined method `<=>' for {emptyelem <input name="hl" value="en" type="hidden">}:Hpricot::Elem

Before I post any code, does this ring any bells?

Thanks.

Code:

    start = Time.now

    # initial set up
    @agent = Mechanize.new
    Mechanize.html_parser = Hpricot
    page = @agent.get("http://www.google.com/")
    search_form = page.forms.first

    # conduct initial search
    @search_term = search_form.q = params[:search].to_s
    search_results = @agent.submit(search_form)

    # helper variables
    search_qs = ""; @page_number = 1; i = 0; @flag = false;

    # get the query string structure
    search_results.links.each { |li| search_qs = li.href if li.href.match(/.*search\?q=.*start=.*/) }

    # search through all paginated pages
    while (i < 500)
      search_qs = search_qs.gsub(/start=\d+/,"start=#{i}")
      @search_url = "http://google.com#{search_qs}"
      search_results = @agent.get(@search_url)
      search_results.links.each { |li| @flag = true if li.text.match("All Bout Texas Tailgating") }
      break if @flag
      i+=10; @page_number+=1
    end

    @execution_time = Time.now - start

    render :layout => false

VIEW:

<h2>Query results for "<%= @search_term %>" on Google</h2>

<% if @flag %>
    <p>What page is this keyword found: <b><%= @page_number %></b></p>
    <p><%= link_to  "Click to see page", "#{@search_url}", {:target => "_blank"} %></p>
    <p>How long did this query take to run?: <%= @execution_time %> seconds</p>
<% else %>
    <p>Keyword not found in Google search results</p>
<% end %>

STACK TRACE:

 NoMethodError (undefined method `<=>' for {emptyelem <input name="hl" value="en" type="hidden">}:Hpricot::Elem):
  mechanize (1.0.0) lib/mechanize/form/field.rb:30:in `<=>'
  mechanize (1.0.0) lib/mechanize/form.rb:171:in `sort'
  mechanize (1.0.0) lib/mechanize/form.rb:171:in `build_query'
  mechanize (1.0.0) lib/mechanize.rb:373:in `submit'
  app/controllers/admin/importer_controller.rb:24:in `check_page_rank'
  /opt/local/lib/ruby/1.8/webrick/httpserver.rb:104:in `service'
  /opt/local/lib/ruby/1.8/webrick/httpserver.rb:65:in `run'
  /opt/local/lib/ruby/1.8/webrick/server.rb:173:in `start_thread'
  /opt/local/lib/ruby/1.8/webrick/server.rb:162:in `start'
  /opt/local/lib/ruby/1.8/webrick/server.rb:162:in `start_thread'
  /opt/local/lib/ruby/1.8/webrick/server.rb:95:in `start'
  /opt/local/lib/ruby/1.8/webrick/server.rb:92:in `each'
  /opt/local/lib/ruby/1.8/webrick/server.rb:92:in `start'
  /opt/local/lib/ruby/1.8/webrick/server.rb:23:in `start'
  /opt/local/lib/ruby/1.8/webrick/server.rb:82:in `start'

Rendered rescues/_trace (98.4ms)
Rendered rescues/_request_and_response (1.2ms)
Rendering rescues/layout (internal_server_error)

If you look through the mechanize source in form.rb, submitting a form calls a method named build_query, which sorts the fields on the form. sort uses the <=> operator, and the stack trace shows the field comparison (field.rb:30) ending in a <=> call on an Hpricot::Elem. That class does not define <=>, so you get the exception.
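To illustrate the failure mode in isolation, here is a minimal sketch that does not use mechanize or Hpricot at all. FakeField, PlainNode and OrderedNode are hypothetical stand-ins, and the delegation in FakeField#<=> is an assumption based on the stack trace, not a copy of mechanize's code. PlainNode explicitly undefines <=> to mimic a class that lacks the operator, as Hpricot::Elem apparently does on Ruby 1.8.

```ruby
# Hypothetical stand-in for a form field that delegates ordering to its node.
class FakeField
  attr_reader :node

  def initialize(node)
    @node = node
  end

  # Assumption: ordering is delegated to the wrapped node.
  def <=>(other)
    node <=> other.node
  end
end

# Hypothetical node with no <=> at all (mimics the Hpricot::Elem case).
class PlainNode
  undef_method :<=>
end

# Hypothetical node that can be ordered by its position in the document.
class OrderedNode
  attr_reader :position

  def initialize(position)
    @position = position
  end

  def <=>(other)
    position <=> other.position
  end
end

# Sorts the fields and reports :sorted, or the error class if sorting fails.
def sort_outcome(fields)
  fields.sort
  :sorted
rescue NoMethodError => e
  e.class
end

ORDERED_RESULT = sort_outcome([FakeField.new(OrderedNode.new(2)),
                               FakeField.new(OrderedNode.new(1))])
PLAIN_RESULT = sort_outcome([FakeField.new(PlainNode.new),
                             FakeField.new(PlainNode.new)])
```

Sorting succeeds when the wrapped nodes implement <=> and raises NoMethodError when they do not, which matches the shape of the stack trace above.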

It seems that mechanize was built to use Nokogiri, and it may have unfixed bugs with other parser implementations. I did not dig deep into mechanize's source and don't want to blame anyone, but you may want to try switching to Nokogiri for this project, if possible. From this snippet, it doesn't look as if you rely heavily on Hpricot. It seems strange to me that mechanize throws an exception on a hidden form field parsed by Hpricot, but the stack trace is pretty clear in this regard.

Your other major option is to dig into the mechanize source and see whether you can fix it yourself (or file a bug on the mechanize GitHub project and hope somebody gets to it).
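If you do go the patching route, one possible direction is a comparison that tolerates nodes without <=>. The sketch below is only an illustration of that idea and is not mechanize's implementation: SketchField, OpaqueNode and order_fields are hypothetical names. It orders fields by their nodes when the nodes are comparable and otherwise falls back to the original order.

```ruby
# Hypothetical stand-in for a form field: a name plus the parsed node behind it.
SketchField = Struct.new(:name, :node)

# Hypothetical node class that has no <=> at all.
class OpaqueNode
  undef_method :<=>
end

# Orders fields by node when the nodes can be compared; ties and
# non-comparable nodes fall back to the original position in the list.
def order_fields(fields)
  fields.each_with_index.sort do |(a, i), (b, j)|
    cmp = a.node.respond_to?(:<=>) ? (a.node <=> b.node) : nil
    (cmp.nil? || cmp.zero?) ? (i <=> j) : cmp
  end.map(&:first)
end

# Comparable nodes (plain integers here) are reordered.
sorted_names = order_fields([SketchField.new("b", 2),
                             SketchField.new("a", 1)]).map(&:name)

# Nodes without <=> keep their original order instead of raising.
opaque_names = order_fields([SketchField.new("x", OpaqueNode.new),
                             SketchField.new("y", OpaqueNode.new)]).map(&:name)
```

Whether a fallback like this is appropriate inside mechanize depends on why build_query sorts the fields in the first place, so treat it as a starting point for a bug report or patch rather than a fix.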

Good luck.
