How to get HTTP headers before downloading with Ruby's OpenURI

Question

I am currently using OpenURI to download a file in Ruby. Unfortunately, it seems impossible to get the HTTP headers without downloading the full file:

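require 'open-uri'
require 'ruby-progressbar'

pbar = nil # declared up front so both lambdas share the same variable

# base_url: URL of the file to download (defined elsewhere)
URI.open(base_url,
  :content_length_proc => lambda { |t|
    if t && 0 < t
      pbar = ProgressBar.create(:total => t)
    end
  },
  :progress_proc => lambda { |s|
    pbar.progress = s if pbar
  }) { |io|
    puts io.size
    puts io.meta['content-disposition']
  }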

Running the code above shows that it first downloads the full file and only then prints the header I need.

Is there a way to get the headers before the full file is downloaded, so I can cancel the download if the headers are not what I expect them to be?

Answer 1

You can use Net::HTTP for this, for example:

require 'net/http'

http = Net::HTTP.start('stackoverflow.com')

resp = http.head('/')
resp.each { |k, v| puts "#{k}: #{v}" }
http.finish

Another example, this time getting the headers of the wonderful book Object-Oriented Programming With ANSI-C:

require 'net/http'

http = Net::HTTP.start('www.planetpdf.com')

resp = http.head('/codecuts/pdfs/ooc.pdf')
resp.each { |k, v| puts "#{k}: #{v}" }
http.finish
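Both snippets above can be folded into a small helper. This is a sketch that assumes you want to pass a full URL (including `https://` ones, which need `use_ssl`) and get the headers back as a plain Hash; `fetch_head` and `header_hash` are hypothetical names. The block form of `Net::HTTP.start` closes the connection when the block ends, so no explicit `http.finish` is needed.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: send a HEAD request for a full URL and return the
# Net::HTTPResponse. For HEAD the server sends only the status line and
# headers, so no body is transferred.
def fetch_head(url)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.head(uri.request_uri) # connection is closed when this block returns
  end
end

# Hypothetical helper: flatten the response headers into a plain Hash with
# lowercase names, joining repeated headers with ", ".
def header_hash(response)
  response.to_hash.transform_values { |values| values.join(', ') }
end
```

Usage would be something like `header_hash(fetch_head('http://www.planetpdf.com/codecuts/pdfs/ooc.pdf'))['content-type']`.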

Answer 2

It seems what I wanted is not possible to achieve with OpenURI, at least not, as I said, without loading the whole file first.

I was able to do what I wanted using Net::HTTP's request_get.

Here is an example:

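require 'net/http'

max_length = 10 * 1024 * 1024  # example limit: 10 MB

Net::HTTP.start('example.com') do |http|  # hypothetical host
  http.request_get('/largefile.jpg') { |response|
    length = response['content-length']   # header values are Strings (or nil)
    if length && length.to_i < max_length
      File.open('largefile.jpg', 'wb') do |file|
        response.read_body { |str| file.write(str) }   # read body now
      end
    end
  }
end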

Note that this only works when using a block. If you do it like this:

response = http.request_get('/largefile.jpg')

the body will already have been read.
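One caveat with the block form, as far as I can tell from Net::HTTP's source: if the block returns without calling `read_body`, Net::HTTP reads the rest of the body itself afterwards to keep the connection usable, so simply skipping `read_body` does not cancel the transfer. Raising an exception inside the block does abandon it: Net::HTTP closes the connection and re-raises. The sketch below relies on that behaviour; `DownloadRejected`, `acceptable_length?` and `download_if_small` are hypothetical names.

```ruby
require 'net/http'
require 'uri'

# Hypothetical exception raised from inside the request_get block to abandon
# a response. It is a plain StandardError, not an IOError, because Net::HTTP
# may retry idempotent requests (like GET) after certain I/O errors.
class DownloadRejected < StandardError; end

# Hypothetical helper: decide from the headers alone whether to download.
# A missing Content-Length counts as "unknown size" and is rejected.
def acceptable_length?(response, max_length)
  length = response['content-length']
  !length.nil? && length.to_i <= max_length
end

# Hypothetical helper: stream url to path only if the headers look acceptable.
# Returns true if the file was saved, false if the response was rejected.
def download_if_small(url, path, max_length)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request_get(uri.request_uri) do |response|
      unless response.is_a?(Net::HTTPSuccess) && acceptable_length?(response, max_length)
        raise DownloadRejected # Net::HTTP closes the connection and re-raises
      end

      File.open(path, 'wb') do |file|
        response.read_body { |chunk| file.write(chunk) } # body is read only here
      end
    end
  end
  true
rescue DownloadRejected
  false
end
```

For example, `download_if_small('http://example.com/largefile.jpg', 'largefile.jpg', 10 * 1024 * 1024)` (hypothetical URL) should return false without fetching the body when the server reports more than 10 MB or no size at all.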

Answer 3

Rather than using Net::HTTP, which can be like digging a pool on the beach with a sand shovel, you can use one of the many HTTP clients available for Ruby and clean up the code.

Here's a sample using HTTParty:

require 'httparty'

resp = HTTParty.head('http://example.org')
resp.headers
# => {"accept-ranges"=>["bytes"], "cache-control"=>["max-age=604800"], "content-type"=>["text/html"], "date"=>["Thu, 02 Mar 2017 18:52:42 GMT"], "etag"=>["\"359670651\""], "expires"=>["Thu, 09 Mar 2017 18:52:42 GMT"], "last-modified"=>["Fri, 09 Aug 2013 23:54:35 GMT"], "server"=>["ECS (oxr/83AB)"], "x-cache"=>["HIT"], "content-length"=>["1270"], "connection"=>["close"]}

At that point it's easy to check the size of the document:

resp.headers['content-length'] # => "1270"
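Note that the value is a String (`"1270"`), so it needs converting before it is compared with a limit; `"1270" < 2000` raises an ArgumentError. A minimal sketch with a hypothetical `parse_content_length` helper, which also returns nil when the header is missing or malformed:

```ruby
# Hypothetical helper: turn a Content-Length header value into an Integer.
# Returns nil when the header is absent or is not a plain decimal number,
# so callers can treat "unknown size" explicitly.
def parse_content_length(value)
  return nil if value.nil?

  digits = value.to_s.strip
  digits.match?(/\A\d+\z/) ? Integer(digits, 10) : nil
end

max_length = 5_000_000 # example limit in bytes
length = parse_content_length("1270") # e.g. resp.headers['content-length']
puts(length && length <= max_length ? 'small enough' : 'too big or unknown')
```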

Unfortunately, the HTTPd you're talking to might not know how big the content will be. To respond quickly, servers don't necessarily calculate the size of dynamically generated output, since that would take almost as long, and be almost as CPU-intensive, as actually sending it. So the "content-length" header may be missing, and relying on its value can be buggy.
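Because "content-length" can be missing or wrong, one defensive option is to enforce the size limit on the bytes actually received while streaming and stop once it is exceeded. A minimal sketch; `LimitedWriter` is a hypothetical class that wraps anything with a `write` method (a File, a StringIO).

```ruby
require 'stringio'

# Hypothetical wrapper: forwards chunks to an IO-like object, but raises
# TooLarge (without writing the chunk) once the running total of bytes
# would exceed limit.
class LimitedWriter
  class TooLarge < StandardError; end

  attr_reader :written

  def initialize(io, limit)
    @io = io
    @limit = limit
    @written = 0
  end

  def write(chunk)
    @written += chunk.bytesize
    raise TooLarge, "body exceeds #{@limit} bytes" if @written > @limit

    @io.write(chunk)
  end
end

out = StringIO.new
writer = LimitedWriter.new(out, 10)
writer.write('hello')
begin
  writer.write('world!') # 11 bytes in total, one over the limit
rescue LimitedWriter::TooLarge => e
  puts e.message # prints "body exceeds 10 bytes"
end
puts out.string # prints "hello"
```

With Net::HTTP the writer would sit inside the streaming block, e.g. `response.read_body { |chunk| writer.write(chunk) }`; since `TooLarge` is raised inside the request block, the connection should be closed rather than drained.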

The issue with Net::HTTP is that it won't automatically follow redirects, so you have to add code for that. Granted, that code is supplied in the documentation, but it keeps growing as you need to do more things, until you've ended up writing yet another HTTP client (YAHC). So avoid that and use an existing wheel.
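For comparison, a redirect-following HEAD request in plain Net::HTTP might look like the sketch below, loosely modelled on the redirect-following `fetch` example in the Net::HTTP documentation. `redirect_target` and `head_following_redirects` are hypothetical names, and the default limit of 5 redirects is an arbitrary choice.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: resolve a Location header against the URL that
# returned it, so relative values like "/pdfs/ooc.pdf" work too.
def redirect_target(current_uri, location)
  URI.join(current_uri.to_s, location)
end

# Hypothetical helper: HEAD request that follows up to `limit` redirects
# and returns the final Net::HTTPResponse.
def head_following_redirects(url, limit = 5)
  raise ArgumentError, 'too many redirects' if limit.zero?

  uri = URI(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.head(uri.request_uri)
  end

  location = response['location']
  if response.is_a?(Net::HTTPRedirection) && location
    head_following_redirects(redirect_target(uri, location), limit - 1)
  else
    response
  end
end
```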


Question

3 answers

Answer 1
Score 11, 2013-07-03 18:47:27

Answer 2
Score 5, accepted, 2013-07-15 23:04:36

Answer 3
Score 2, 2017-03-02 18:57:36
