
How to parse this request params in Rails?

I receive incoming requests to my web server that look like s = "%u041D%u0430%u0434%u043E%u0435%u043B".

How can I decode this into a normal UTF-8 string in Rails? Thanks!

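This looks like the non-standard format produced by escape in JavaScript. If you can influence the code that sends this data, you should probably arrange for it to use encodeURI instead (which produces "normal" percent-encoding of UTF-8 encoded characters).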
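For comparison, here is a minimal sketch of what decoding looks like when the sender does use standard percent-encoding. It assumes the encoded bytes are UTF-8 (which is what encodeURI emits) and uses only `URI.decode_www_form_component` from Ruby's standard library; the variable names are illustrative.

```ruby
require 'uri'

# Standard percent-encoding of the UTF-8 bytes of the word in the question,
# i.e. what encodeURI would send instead of the %uXXXX form.
encoded = '%D0%9D%D0%B0%D0%B4%D0%BE%D0%B5%D0%BB'

# The second argument (the target encoding) defaults to UTF-8.
# Note that this method also turns '+' into a space.
decoded = URI.decode_www_form_component(encoded)

puts decoded           # the decoded Cyrillic word
puts decoded.encoding  # UTF-8
```

This covers only the byte-oriented form. The %uXXXX form from the question is not standard percent-encoding, so it needs custom handling.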

# Unescape percent encoding.
#
# The normal byte-oriented format ("%41") and the non-standard <em>%u</em>
# format ("%u0410") are both supported. The single-byte variant is decoded
# as if it represents bytes encoded with the same encoding as +str+. The
# two-byte <em>%u</em> variant is decoded as UTF-16BE and then re-encoded
# with the same encoding as +str+; surrogate pairs are supported.
#
# Since the resulting string will have the same encoding as +str+, all byte
# sequences resulting from the byte-oriented decoding must be valid sequences
# in the encoding of +str+. Correspondingly, the encoding of +str+ must
# be compatible with any extended characters that are decoded from the
# UTF-16BE <em>%u</em> encodings.

def unescape(str)
  hh = /[0-9a-f]{2}/i
  hhhh = /[0-9a-f]{4}/i
  str.gsub(/((?:%#{hh})+)|((?:%u#{hhhh})+)/) do
    if $1
      $1.scan(hh).map(&:hex).pack('C*').force_encoding(str.encoding)
    elsif $2
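      # 'n' packs 16-bit big-endian units, so the bytes are UTF-16BE on any
      # platform ('S' is native-endian and breaks on little-endian hosts).
      $2.scan(hhhh).map(&:hex).pack('n*').force_encoding(Encoding::UTF_16BE).
        encode!(str.encoding)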
    else
      raise 'unhandled match'
    end
  end
end


def all_same?(e)
  first = e.first
  e.drop(1).all? { |o| o.eql?(first) }
end

ss = [
  # %-encoded-UTF-16BE -> SJIS (just for something fun... UTF-8 works fine)
  '%u041D%u0430%u0434%u043E%u0435%u043B'.encode!(Encoding::SJIS),
  # %-encoded-ISO-8859-5 -> ISO-8859-5
  '%bd%d0%d4%de%d5%db'.encode!(Encoding::ISO8859_5),
  # %-encoded-UTF-8 -> UTF-8
  '%d0%9d%d0%b0%d0%b4%d0%be%d0%b5%d0%bb'.encode!(Encoding::UTF_8),
]

ss2 = [ # demonstrate non-decoded content and UTF-16BE surrogate pair decoding
  # %-encoded-UTF-16BE -> UTF-8
  'A%uD801%uDC10%u0410'.encode!(Encoding::UTF_8),
  # %-encoded-UTF-8 -> UTF-8
  '%41%f0%90%90%90%D0%90'.encode!(Encoding::UTF_8),
]

ss = ss.map { |s| s = unescape(s) }.tap { |ss| p ss.map { |s| s.encoding } }
all_same? ss.map { |s| s.encode(Encoding::UTF_8) }

ss2 = ss2.map { |s| s = unescape(s) }.tap { |ss| p ss.map { |s| s.encoding } }
all_same? ss2.map { |s| s.encode(Encoding::UTF_8) }

When run through irb:

ruby-1.9.2-head >   ss = ss.map { |s| s = unescape(s) }.tap { |ss| p ss.map { |s| s.encoding } }
[#<Encoding:Shift_JIS>, #<Encoding:ISO-8859-5>, #<Encoding:UTF-8>]
 => ["\x{844E}\x{8470}\x{8474}\x{8480}\x{8475}\x{847C}", "\xBD\xD0\xD4\xDE\xD5\xDB", "Надоел"] 
ruby-1.9.2-head > all_same? ss.map { |s| s.encode(Encoding::UTF_8) }
 => true 
ruby-1.9.2-head > 
ruby-1.9.2-head >   ss2 = ss2.map { |s| s = unescape(s) }.tap { |ss| p ss.map { |s| s.encoding } }
[#<Encoding:UTF-8>, #<Encoding:UTF-8>]
 => ["A𐐐А", "A𐐐А"] 
ruby-1.9.2-head > all_same? ss2.map { |s| s.encode(Encoding::UTF_8) }
 => true 
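If only the %uXXXX form matters and the result should always be UTF-8, as in the question, a more compact variant is possible. The following is a sketch under those assumptions, not part of the function above: `decode_percent_u` is a hypothetical helper name, and it deliberately leaves ordinary %XX sequences untouched.

```ruby
# Hypothetical helper: decode only the non-standard %uXXXX form and
# return a UTF-8 string. Ordinary %XX sequences are left as they are.
def decode_percent_u(str)
  str.encode(Encoding::UTF_8).gsub(/(?:%u\h{4})+/i) do |run|
    # Collect the 16-bit code units of one contiguous run, so that
    # surrogate pairs stay together when interpreted as UTF-16BE.
    units = run.scan(/\h{4}/).map(&:hex)
    units.pack('n*').force_encoding(Encoding::UTF_16BE).encode(Encoding::UTF_8)
  end
end

puts decode_percent_u('%u041D%u0430%u0434%u043E%u0435%u043B')
```

In a Rails controller this could be applied to the raw value, for example `decode_percent_u(params[:s])`, where `:s` is an assumed parameter name taken from the question's example.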

