
Ruby HTTPNotFound Error with Mechanize

I have a Ruby on Rails application that accesses various links on Yahoo Sports, and for some pages it raises the error below. The failure is consistent: any link that fails always fails, so this is not an intermittent problem. The page does exist and loads fine, though, so I'm not sure why I get an error. Has anyone seen this behavior before, and if so, do you have any suggestions on how to get this to work?

404 => Net::HTTPNotFound for http://sports.yahoo.com/mlb/players/9893/ -- unhandled response

require 'mechanize'

@client = Mechanize.new
@client.request_headers = { "Accept-Encoding" => "" }
@client.ignore_bad_chunking = true

# This URL works:
# url = 'http://sports.yahoo.com/mlb/players/7307'

# This URL raises the 404 error:
url = 'http://sports.yahoo.com/mlb/players/9893'

result = @client.get(url)

I wasn't able to figure this out with Mechanize alone, but I was able to get the URL with HTTParty. If you rescue the Mechanize failure and retry with the redirect URI that HTTParty ends up at, you should be set:

require 'mechanize'
require 'httparty'

@client = Mechanize.new()

url = 'http://sports.yahoo.com/mlb/players/9893'

begin
  result = @client.get(url)
rescue Mechanize::ResponseCodeError => e
  redirect_url = HTTParty.get(url).request.last_uri.to_s
  result = @client.get(redirect_url)
end
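
In the snippet above, HTTParty is used only to discover the URL the request ends up at after redirects. If adding a gem just for that is undesirable, the same lookup could probably be done with the standard library. The following is a minimal sketch, not a tested fix for the Yahoo case: it assumes the failure really is caused by an HTTP redirect, and `resolve_final_url` along with its `limit` and `fetch` parameters are hypothetical names introduced here for illustration.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of Mechanize or HTTParty): follows HTTP
# redirects by hand and returns the final URL as a string.
# `fetch` is injectable so the logic can be exercised without network access;
# by default it performs a real GET with Net::HTTP.
def resolve_final_url(url, limit: 5, fetch: ->(uri) { Net::HTTP.get_response(uri) })
  uri = URI.parse(url)

  limit.times do
    response = fetch.call(uri)

    # Anything that is not a 3xx response ends the chain.
    return uri.to_s unless response.is_a?(Net::HTTPRedirection)

    location = response['location']
    # A 3xx without a Location header (e.g. 304) cannot be followed.
    return uri.to_s if location.nil?

    # Location may be relative, so resolve it against the current URI.
    uri = URI.join(uri.to_s, location)
  end

  raise ArgumentError, "too many redirects (limit #{limit}) for #{url}"
end
```

The returned string could then be passed to `@client.get` in the rescue branch in place of the HTTParty call. The redirect limit is there so a redirect loop raises instead of spinning forever.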

You need to handle the redirect. Mechanize offers a setting for that: follow_meta_refresh. Try adding it to your code. Example:

require 'mechanize'

@client = Mechanize.new()
@client.request_headers = { "Accept-Encoding" => "" }
@client.ignore_bad_chunking = true
@client.follow_meta_refresh = true
#works
#url = 'http://sports.yahoo.com/mlb/players/7307'

#doesn't work
url = 'http://sports.yahoo.com/mlb/players/9893'

result = @client.get(url)
pp result

The pp at the bottom pretty-prints the page so you can inspect it before crawling further. On my machine the output shows the correct content.
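
For context, a meta refresh is a redirect expressed in the HTML itself (a `<meta http-equiv="refresh">` tag) rather than in the HTTP status line, which is why a plain request can miss it. The sketch below only illustrates what such a tag carries. It is not how Mechanize implements follow_meta_refresh, and whether the Yahoo page uses this tag is an assumption; `meta_refresh_target` is a hypothetical helper name.

```ruby
# Hypothetical helper for illustration only: extracts the target URL from a
# <meta http-equiv="refresh" content="N; url=..."> tag, or returns nil.
# A regex is enough for a sketch; a real crawler should use an HTML parser.
def meta_refresh_target(html)
  tag = html[/<meta[^>]*http-equiv\s*=\s*["']?refresh["']?[^>]*>/i]
  return nil unless tag

  # The content attribute looks like "0; url=http://example.com/".
  content = tag[/content\s*=\s*"([^"]*)"/i, 1] || tag[/content\s*=\s*'([^']*)'/i, 1]
  return nil unless content

  url = content[/url\s*=\s*(.+)\z/i, 1]
  return nil unless url

  # Strip surrounding whitespace and any quotes wrapped around the URL.
  url.strip.gsub(/\A['"]|['"]\z/, '')
end
```

With `follow_meta_refresh = true`, Mechanize takes care of this kind of redirect itself, so a helper like this would only be useful for checking by hand whether a failing page contains such a tag.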
