
Unable to pull complete header from external URL in Ruby on Rails app

Background:
The Rails app I'm working on opens articles from other websites in an iframe. But some publisher websites (like pitchfork.com, vox.com, medium.com) prevent themselves from being opened in iframes by setting "X-Frame-Options: SAMEORIGIN" in their response headers. So given the URL for an article, I'm trying to examine the headers and either open the article in the iframe (the default) or open the original site in a new tab (when I detect X-Frame-Options in the headers).


Problem:
The header I pull in Rails (and print to the console) with the following code is sometimes incomplete:

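require 'net/http'

# y holds the two parts of the article URL, for example:
#   y['site'] => "vox.com"
#   y['head'] => "/2016/1/25/10829662/obama-on-clinton-media"
puts y['site']
puts y['head']

http = Net::HTTP.start(y['site'])
resp = http.head(y['head'])             # HEAD request: response headers only
resp.each { |k, v| puts "#{k}: #{v}" }  # print every header that came back
http.finish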

Example: the header that Rails pulls for this vox.com article ( http://www.vox.com/2016/1/25/10829662/obama-on-clinton-media ) is as follows:

server: nginx/1.6.2
date: Fri, 29 Jan 2016 22:05:17 GMT
content-type: text/html
content-length: 184
connection: keep-alive
location: http://www.vox.com/2016/1/25/10829662/obama-on-clinton-media

But when I try to open it in an iframe, the Chrome console tells me that it cannot because X-Frame-Options is set to SAMEORIGIN. Upon further investigation in the Network tab, I am able to examine the full header, which is as follows:

HTTP/1.1 200 OK
Server: nginx
Content-Type: text/html; charset=utf-8
Status: 200 OK
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Cache-Control: max-age=0, must-revalidate
X-Request-Id: 693f75c9be4dde491ba3cd78232ac4870c4f82e2
X-Runtime: 0.404545
Content-Encoding: gzip
Via: 1.1 varnish-v4
Content-Length: 26450
Accept-Ranges: bytes
Date: Fri, 29 Jan 2016 22:10:47 GMT
Via: 1.1 varnish
Age: 106
Connection: keep-alive
X-Served-By: cache-jfk1034-JFK
X-Cache: MISS
X-Cache-Hits: 0
X-Timer: S1454105446.991771,VS0,VE12
Vary: Accept-Encoding, Origin, X-Forwarded-Proto

This problem doesn't occur for all sites. For example, the header I pull from pitchfork.com clearly indicates that it has x-frame-options set. But with sites like vox.com and medium.com, the header that I pull does not show x-frame-options (and many other fields are left out as well).

How can I pull the correct/complete header in my Rails controller in a way that will always detect whether a URL has X-Frame-Options in its header?

I tried this in an IRB console and noticed that the request to vox.com returns 301 Moved Permanently, with the new location sent in the header.

irb(main):001:0> y = {}
=> {}
irb(main):002:0> y['site'] = "vox.com"
=> "vox.com"
irb(main):003:0> y['head'] = "/2016/1/25/10829662/obama-on-clinton-media"
=> "/2016/1/25/10829662/obama-on-clinton-media"
irb(main):004:0> require 'net/http'
=> true
irb(main):005:0> http = Net::HTTP.start(y['site'])
=> #<Net::HTTP vox.com:80 open=true>
irb(main):006:0> resp = http.head(y['head'])
=> #<Net::HTTPMovedPermanently 301 Moved Permanently readbody=true> (HERE)
irb(main):007:0> resp.each { |k, v| puts "#{k}: #{v}" }
server: nginx/1.6.2
date: Fri, 29 Jan 2016 22:40:07 GMT
content-type: text/html
content-length: 184
connection: keep-alive
location: http://www.vox.com/2016/1/25/10829662/obama-on-clinton-media
=> {"server"=>["nginx/1.6.2"], "date"=>["Fri, 29 Jan 2016 22:40:07 GMT"], "content-type"=>["text/html"], "content-length"=>["184"], "connection"=>["keep-alive"], "location"=>["http://www.vox.com/2016/1/25/10829662/obama-on-clinton-media"]}
irb(main):008:0> http.finish
=> nil

The only difference between the URL you used and the location that the server sent for the redirect is the 'www'. Try the request with the 'www' host and see if it works.

You can improve your code to check the response code and, if it's 301, try again with the URL the server sent in the location header.
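As a rough sketch of that retry idea (not the asker's actual controller code), the helper below issues a HEAD request, follows redirects up to a limit, and then checks the final response for X-Frame-Options. The names final_head_response, blocks_framing?, and DEFAULT_HEAD, the limit of 5 hops, and the injectable fetch parameter are all assumptions made for illustration. It handles any 3xx response that carries a location header, not only 301.

```ruby
require 'net/http'
require 'uri'

# Hypothetical default fetcher: one HEAD request, no redirect handling.
DEFAULT_HEAD = lambda do |uri|
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.head(uri.request_uri)
  end
end

# Hypothetical helper: follow redirects until a non-redirect response arrives.
# `fetch` is injectable so the logic can be exercised without network access.
def final_head_response(url, limit: 5, fetch: DEFAULT_HEAD)
  uri = URI.parse(url)
  limit.times do
    resp = fetch.call(uri)
    return resp unless resp.is_a?(Net::HTTPRedirection) && resp['location']
    # The location header may be relative, so resolve it against the current URI.
    uri = URI.join(uri.to_s, resp['location'])
  end
  raise "Too many redirects for #{url}"
end

# True when the final response carries an X-Frame-Options header.
# Header lookup on Net::HTTP responses is case-insensitive.
def blocks_framing?(url, **opts)
  !final_head_response(url, **opts)['x-frame-options'].nil?
end
```

In a controller, something like blocks_framing?("http://vox.com/2016/1/25/10829662/obama-on-clinton-media") could then decide between the iframe and a new tab. Whether a given site returns the same headers for HEAD as for a browser's GET is not something this sketch verifies.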
