Ruby: How to escape url with square brackets [ and ]?

This url:

http://gawker.com/5953728/if-alison-brie-and-gillian-jacobs-pin-up-special-doesnt-get-community-back-on-the-air-nothing-will-[nsfw]

should be:

http://gawker.com/5953728/if-alison-brie-and-gillian-jacobs-pin-up-special-doesnt-get-community-back-on-the-air-nothing-will-%5Bnsfw%5D

But when I pass the first one into URI.encode, it doesn't escape the square brackets. I also tried CGI.escape, but that escapes every '/' as well.

What should I use to escape URLs properly? Why doesn't URI.encode escape square brackets?

You can escape [ with %5B and ] with %5D.

With the URL held in a string variable url, that is:

url.gsub("[", "%5B").gsub("]", "%5D")

I don't like that solution, but it works.
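As a small variation, the two chained calls can be collapsed into a single pass by giving gsub a character class and a replacement hash. This is only a sketch: the constant name BRACKET_ESCAPES and the shortened URL are made up for illustration.

```ruby
# Map each bracket to its percent-encoded form (hypothetical constant name).
BRACKET_ESCAPES = { "[" => "%5B", "]" => "%5D" }.freeze

# Shortened, illustrative version of the URL from the question.
url = "http://gawker.com/5953728/nothing-will-[nsfw]"

# gsub looks up every matched character in the hash and substitutes its value.
escaped = url.gsub(/[\[\]]/, BRACKET_ESCAPES)
puts escaped
```

Like the chained version, this replaces brackets everywhere in the string, including the host part.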

URI.encode doesn't escape brackets because it treats them as reserved characters: they delimit IPv6 host literals, so by default it leaves them alone everywhere in the URL, including the path.

If you want to escape characters other than just the "unsafe" ones, pass a second argument to encode. It should be a regex matching, or a string containing, every character you want encoded (including the characters the method would otherwise escape by default!).
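URI.encode was removed in Ruby 3.0, so on current Rubies the "regex of characters to encode" idea can be approximated with a few lines of plain Ruby. The helper below is a sketch, not a library API: the name percent_encode, the example regex, and the shortened URL are all assumptions made for illustration.

```ruby
# Hypothetical helper (not part of the standard library): percent-encode
# every character of str that matches the unsafe regex.
def percent_encode(str, unsafe)
  str.gsub(unsafe) do |match|
    # Encode byte by byte so multi-byte UTF-8 characters are handled too.
    match.bytes.map { |byte| format("%%%02X", byte) }.join
  end
end

# Shortened, illustrative version of the URL from the question.
url = "http://gawker.com/5953728/nothing-will-[nsfw]"

# Only brackets and spaces are listed, so '/' and ':' are left untouched.
escaped = percent_encode(url, /[\[\] ]/)
puts escaped
```

As with the second argument to encode, the regex must list every character you want escaped; anything it does not match passes through unchanged.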

If using a third-party gem is an option, try addressable.

require "addressable/uri"

url = Addressable::URI.parse("http://[::1]/path[]").normalize!.to_s
#=> "http://[::1]/path%5B%5D"

Note that normalize! not only escapes invalid characters but also case-folds the hostname, unescapes characters that were escaped unnecessarily, and so on:

uri = Addressable::URI.parse("http://Example.ORG/path[]?query[]=%2F").normalize!
url = uri.to_s #=> "http://example.org/path%5B%5D?query%5B%5D=/"

So, if you just want to normalize the path part, do as follows:

uri = Addressable::URI.parse("http://Example.ORG/path[]?query[]=%2F")
uri.path = uri.normalized_path
url = uri.to_s #=> "http://Example.ORG/path%5B%5D?query[]=%2F"

The IPv6 literal syntax allows URLs like this:

http://[1080:0:0:0:8:800:200C:417A]/index.html

Because of this, we should escape [ and ] only in the part of the URL that comes after the host:

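if url =~ %r{\[|\]}
  # Split into "http:", the host, and everything after the host.
  protocol, host, path = url.split(%r{/+}, 3)
  # path is nil when the URL has nothing after the host, hence to_s.
  path = path.to_s.gsub('[', '%5B').gsub(']', '%5D') # Or URI.escape(path, /[^\-_.!~*'()a-zA-Z\d;\/?:@&%=+$,]/)
  url = "#{protocol}//#{host}/#{path}"
end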
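To make the effect on an IPv6 host visible, here is the same approach wrapped in a method and applied to a bracketed host. This is a sketch under the assumption that the URL has the shape scheme://host/rest; the method name escape_brackets_after_host and the index[1].html path are made up for illustration.

```ruby
# Hypothetical helper: escape [ and ] only in the part after the host,
# leaving an IPv6 literal such as [::1] in the host untouched.
def escape_brackets_after_host(url)
  # Yields e.g. "http:", "[::1]", "path?query" for "http://[::1]/path?query".
  scheme, host, rest = url.split(%r{/+}, 3)
  # Nothing after the host: there is nothing to escape.
  return url if rest.nil?

  rest = rest.gsub("[", "%5B").gsub("]", "%5D")
  "#{scheme}//#{host}/#{rest}"
end

puts escape_brackets_after_host("http://[1080:0:0:0:8:800:200C:417A]/index[1].html")
puts escape_brackets_after_host("http://[::1]")
```

The first call keeps the brackets around the IPv6 address and escapes only those in the path; the second returns the URL unchanged because there is no path.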
