
Convert matched string of UTF-8 values to UTF-8 characters in Ruby

I'm trying to convert the output of a rest_client GET so that the escape sequences in it become the characters they represent.

Input: ..."sub_id":"\u0d9c\u8138\u8134\u3f30\u8139\u2b71"...

(which I put in 'all_subs')

Match: m = /sub_id\\"\\:\\"([^\\"]+)\\"/.match(all_subs.to_str)[1]

Print: puts m.force_encoding("UTF-8").unpack('U*').pack('U*')

But it comes out exactly the way it went in, i.e. "\u0d9c\u8138\u8134\u3f30\u8139\u2b71"

However, if I convert a raw string literal of it:

puts "\u0d9c\u8138\u8134\u3f30\u8139\u2b71".unpack('U*').pack('U*')

The output is perfect: "ග脸脴㼰脹⭱"

What you actually get when you parse the input string is this:

m = "\\u0d9c\\u8138\\u8134\\u3f30\\u8139\\u2b71"

Which is not the same as:

"\u0d9c\u8138\u8134\u3f30\u8139\u2b71"

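The difference between the two literals can be checked directly. This is a small sketch, not part of the original answer; the variable names `escaped` and `decoded` are made up for illustration, and the strings are shortened to two code points.

```ruby
# Single-quoted literal: the backslash and "u" stay as literal characters,
# which mirrors what the regex capture returns from the response body.
escaped = '\u0d9c\u8138'

# Double-quoted literal: Ruby resolves each \uXXXX escape when it parses
# the source, so the string holds the real characters.
decoded = "\u0d9c\u8138"

puts escaped.length  # 12: six ASCII characters per escape sequence
puts decoded.length  # 2: one character per code point

# unpack/pack only round-trips the code points already in the string,
# so it cannot turn the escape text into characters.
puts escaped.unpack('U*').pack('U*') == escaped  # true
```

This is why the `unpack('U*').pack('U*')` call appears to work on the hand-typed literal: the conversion had already happened before the method was called.
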
Therefore one option is to eval the string so that Ruby interprets the escapes as code points:

puts eval("\"#{m}\"")
=> ග脸脴㼰脹⭱

However, note that running eval on data you do not control has security implications.
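
To make the risk concrete, here is a harmless sketch of what happens when the captured value is not a run of `\uXXXX` escapes. The value of `m` below is hypothetical; a real attacker would put something far less benign inside the interpolation.

```ruby
# Hypothetical hostile value: interpolation syntax instead of \uXXXX escapes.
# Single quotes keep it inert until eval runs.
m = '#{1 + 1}'

# eval re-parses the text as a double-quoted literal, so the embedded
# expression is executed rather than printed as-is.
result = eval("\"#{m}\"")
puts result
```

Note that a pattern such as `[^"]+` does not help here, because the interpolation needs no quote characters.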

If the string always looks like the one in your example, you could also do something like this, which is safe:

puts m.split("\\u")[1..-1].map { |c| c.to_i(16) }.pack("U*")
=> ග脸脴㼰脹⭱
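
If the value may contain ordinary text mixed in with the escapes, a `gsub`-based variant is one possible alternative. This is a sketch under assumptions: `decode_unicode_escapes` is a hypothetical helper name, it only handles four-digit `\uXXXX` escapes, and it makes no attempt to combine surrogate pairs.

```ruby
# Hypothetical helper, not from the original answer: replaces every \uXXXX
# escape with its character and leaves all other text untouched.
def decode_unicode_escapes(text)
  text.gsub(/\\u([0-9a-fA-F]{4})/) { [Regexp.last_match(1).hex].pack('U') }
end

# Single quotes keep the backslashes literal, like the regex capture.
m = '\u0d9c\u8138\u8134\u3f30\u8139\u2b71'

puts decode_unicode_escapes(m)
puts decode_unicode_escapes('id: \u0d9c!')
```

Unlike the `split` version, this does not depend on the string consisting of escapes only.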
