Unicode string:
string = "CEO Frye \u2013 response to Capitalism discussion in Davos: Vote aggressively with your wallet against firms without social conscience."
I tried (via "Is this the best way to unescape unicode escape sequences in Ruby?"):
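def unescape_unicode(s)
  # Replace each literal \uXXXX sequence (backslash, "u", four hex digits) with the character it names
  s.gsub(/\\u([\da-fA-F]{4})/) {|m| [$1].pack("H*").unpack("n*").pack("U*")}
end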
unescape_unicode(string) #=> CEO Frye \u2013 response to Capitalism discussion in Davos: Vote aggressively with your wallet against firms without social conscience.
But output (to file) is still identical to input! Any help would be appreciated.
Edit: Not using IRB, using RubyMine, and input is parsed from Twitter, hence the single "\u", not "\\u".
Edit 2:
Are you trying it from irb, or outputting the string with p? String#inspect (called from irb and p str) can transform Unicode characters into \uxxxx format so that the string can be printed anywhere. Also, when you type "CEO Frye \u2013 response to...", this is an escape sequence resolved by the Ruby parser. It is a Unicode character in the final string.
str1 = "a\u2013b"
str1.size #=> 3
str2 = "a\\u2013b"
str2.size #=> 8
unescape_unicode(str2) == str1 #=> true
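To check which of the two cases you are in, it may help to test whether the string still contains a literal backslash followed by "u". The sketch below is only an illustration: unescape_unicode_sketch is a hypothetical name, and it swaps the pack/unpack chain from the question for a shorter conversion via String#hex, which should behave the same for four-digit escapes.

```ruby
# Hypothetical helper (not from the question): same regex as unescape_unicode,
# but the four hex digits are converted to a code point with String#hex.
def unescape_unicode_sketch(s)
  s.gsub(/\\u([\da-fA-F]{4})/) { [$1.hex].pack("U") }
end

parsed  = "Frye \u2013 response"   # double quotes: the parser resolves the escape to a real en dash
literal = 'Frye \u2013 response'   # single quotes: backslash, "u" and four digits stay as typed

parsed.include?("\\u")    # false, there is nothing left to unescape
literal.include?("\\u")   # true, gsub will find a match

unescape_unicode_sketch(parsed)  == parsed   # true, the string is returned unchanged
unescape_unicode_sketch(literal) == parsed   # true, the literal escape became an en dash
```

If include?("\\u") is false for your input, the string already holds the real character, and any \u2013 you see is probably coming from how the string is displayed rather than from its contents.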