How to get HTTP headers before downloading with Ruby's OpenURI

I am currently using OpenURI to download a file in Ruby. Unfortunately, it seems impossible to get the HTTP headers without downloading the full file:

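require 'open-uri'
require 'ruby-progressbar'

pbar = nil  # declared outside the lambdas so that both can see it
# URI.open on Ruby 2.5+; older versions use plain open(base_url, ...)
URI.open(base_url,
  :content_length_proc => lambda {|t|
    pbar = ProgressBar.create(:total => t) if t && 0 < t
  },
  :progress_proc => lambda {|s|
    pbar.progress = s if pbar
  }) {|io|
  puts io.size
  puts io.meta['content-disposition']
}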

Running the code above shows that it first downloads the full file and only then prints the header I need.

Is there a way to get the headers before the full file is downloaded, so I can cancel the download if the headers are not what I expect them to be?

You can use Net::HTTP for this, for example:

require 'net/http'

http = Net::HTTP.start('stackoverflow.com')

resp = http.head('/')
resp.each { |k, v| puts "#{k}: #{v}" }
http.finish

Another example, this time getting the headers of the wonderful book Object-Oriented Programming With ANSI-C:

require 'net/http'

http = Net::HTTP.start('www.planetpdf.com')

resp = http.head('/codecuts/pdfs/ooc.pdf')
resp.each { |k, v| puts "#{k}: #{v}" }
http.finish
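To turn the HEAD response into a download decision, something like the following could work. This is only a sketch: `worth_downloading?` and its `max_bytes` and `allowed_types` parameters are hypothetical names, not part of Net::HTTP. It relies on the `content_type` and `content_length` accessors that Net::HTTP responses provide.

```ruby
require 'net/http'

# Hypothetical helper: decide from a HEAD response whether the file
# is worth downloading. Not part of Net::HTTP itself.
def worth_downloading?(response, max_bytes:, allowed_types:)
  # Anything other than a 2xx answer is rejected outright.
  return false unless response.is_a?(Net::HTTPSuccess)

  type = response.content_type    # e.g. "application/pdf", or nil if absent
  size = response.content_length  # Integer, or nil if the header is absent
  return false unless allowed_types.include?(type)

  # A missing Content-Length is accepted here; tighten this if you need to.
  size.nil? || size <= max_bytes
end
```

With the example above, `worth_downloading?(resp, max_bytes: 5_000_000, allowed_types: ['application/pdf'])` would be called on the result of `http.head('/codecuts/pdfs/ooc.pdf')` before issuing the real GET.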

It seems that what I wanted is not possible to achieve using OpenURI, at least not, as I said, without loading the whole file first.

I was able to do what I wanted using Net::HTTP's request_get.

Here is an example:

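# http is a started Net::HTTP session; max_length is the size limit in bytes
http.request_get('/largefile.jpg') {|response|
  # header values are strings, so convert before comparing
  if response['content-length'].to_i < max_length
    response.read_body do |str|   # read body now
      # save to file
    end
  end
}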

Note that this only works when using a block. If you call it without one, like:

response = http.request_get('/largefile.jpg')

the body will already have been read.
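The header check and the "save to file" step can be pulled into a small helper. The following is a minimal sketch; `save_body_if_small` is a hypothetical name, and it assumes only that the response answers `[]` and yields chunks from `read_body`, as the response object in the block above does.

```ruby
# Hypothetical helper: copy the body into `io` only when the server
# declares a Content-Length that is within max_bytes.
# Returns true if the body was saved, false if it was skipped.
def save_body_if_small(response, io, max_bytes)
  declared = response['content-length']  # a String, or nil if absent
  return false if declared.nil? || declared.to_i > max_bytes

  # The body is only read here, chunk by chunk.
  response.read_body { |chunk| io.write(chunk) }
  true
end

# Usage inside the block form (not run here; the file name is a placeholder):
#   http.request_get('/largefile.jpg') do |response|
#     File.open('largefile.jpg', 'wb') do |file|
#       save_body_if_small(response, file, max_length)
#     end
#   end
```

One caveat worth testing on your Ruby version: Net::HTTP may still read an unread body once the block returns normally, so if the goal is to avoid the transfer entirely, leaving the block with an exception (and rescuing it outside) is one option.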

Rather than use Net::HTTP, which can be like digging a pool on the beach using a sand shovel, you can use one of the many HTTP clients for Ruby and clean up the code.

Here's a sample using HTTParty:

require 'httparty'

resp = HTTParty.head('http://example.org')
resp.headers
# => {"accept-ranges"=>["bytes"], "cache-control"=>["max-age=604800"], "content-type"=>["text/html"], "date"=>["Thu, 02 Mar 2017 18:52:42 GMT"], "etag"=>["\"359670651\""], "expires"=>["Thu, 09 Mar 2017 18:52:42 GMT"], "last-modified"=>["Fri, 09 Aug 2013 23:54:35 GMT"], "server"=>["ECS (oxr/83AB)"], "x-cache"=>["HIT"], "content-length"=>["1270"], "connection"=>["close"]}

At that point it's easy to check the size of the document:

resp.headers['content-length'] # => "1270"

Unfortunately, the HTTPd you're talking to might not know how big the content will be. In order to respond quickly, servers don't necessarily calculate the size of dynamically generated output, which would take almost as long and be almost as CPU intensive as actually sending it, so relying on the "content-length" value might be buggy.
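When the header is missing or cannot be trusted, one fallback is to count bytes while streaming and give up once a limit is passed. This is a rough sketch rather than a definitive implementation: `DownloadTooLarge` and `copy_with_limit` are hypothetical names, and the response is assumed to yield chunks from `read_body` the way a Net::HTTP response does in block form.

```ruby
# Hypothetical error raised when a download grows past the allowed size.
class DownloadTooLarge < StandardError; end

# Streams the body of `response` into `io`, counting bytes as they arrive.
# Raises DownloadTooLarge as soon as more than max_bytes have been received.
# Returns the number of bytes written otherwise.
def copy_with_limit(response, io, max_bytes)
  received = 0
  response.read_body do |chunk|
    received += chunk.bytesize
    if received > max_bytes
      raise DownloadTooLarge, "more than #{max_bytes} bytes received"
    end
    io.write(chunk)
  end
  received
end
```

The caller would rescue `DownloadTooLarge` around the request and delete the partial file. The limit is enforced on what actually arrives, so it does not depend on the "content-length" value at all.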

The issue with Net::HTTP is that it won't automatically handle redirects, so you have to add additional code. Granted, that code is supplied in the documentation, but the code keeps growing as you need to do more things, until you've ended up writing yet another HTTP client (YAHC). So, avoid that and use an existing wheel.
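For a sense of how much extra code that is, here is a sketch of a HEAD request that follows redirects with plain Net::HTTP. It is not the code from the documentation; `HEAD_FETCHER`, `head_following_redirects`, and the `limit` and `fetch` parameters are hypothetical names chosen for illustration.

```ruby
require 'net/http'
require 'uri'

# Default fetcher: one HEAD request per URL, so no body is transferred.
HEAD_FETCHER = lambda do |uri|
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.head(uri.request_uri)
  end
end

# Hypothetical helper: issue at most `limit` HEAD requests, following
# redirects, and return [final_uri, response].
def head_following_redirects(url, limit: 5, fetch: HEAD_FETCHER)
  uri = URI(url)
  limit.times do
    response = fetch.call(uri)
    location = response['location']
    # Stop at the first response that is not a redirect with a Location.
    return [uri, response] unless response.is_a?(Net::HTTPRedirection) && location

    uri = URI.join(uri.to_s, location)  # Location may be relative
  end
  raise "too many redirects (more than #{limit})"
end
```

The `fetch` parameter exists only so the loop can be exercised without a network connection. Even this small version already needs a redirect limit and relative-URL handling, which is the kind of growth described above.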

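Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.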
