Ruby: convert smileys into UTF-8 encoding

Question

How can I convert this

string = "ok test body 😁😁😁\r\n-- \r\n test"

into this

"ok test body \\ud83d\\ude01\\ud83d\\ude01\\ud83d\\ude01\r\n-- \r\n test"

I have tried this:

string.encode('utf-16be','utf-8')

which converts it into this form:

#"ok test body \u{1F601}\u{1F601}\u{1F601}\r\n-- \r\n test"

I think I need a regular expression to solve this. Can anyone tell me how to do that? Thanks.

Answer 1

Using this previous answer, this code applies the 'U+1F601' to "\?\?" conversion only to the non-ASCII characters:

encoded_string = string.gsub(/[^[:ascii:]]/) do |non_ascii|
  non_ascii.force_encoding('utf-8')
           .encode('utf-16be')
           .unpack('H*').first
           .gsub(/(....)/,'\u\1')
end

For:

string = "ok test body 😁😁😁\r\n-- \r\n test"

it outputs:

"ok test body \\ud83d\\ude01\\ud83d\\ude01\\ud83d\\ude01\r\n-- \r\n test"

Answer 2

Quite similar to Eric Duminil's answer:

string.gsub(/[\u{10000}-\u{10FFFF}]/) { |m|
  '\u%s\u%s' % m.encode('UTF-16BE').unpack('H4H4')
}
#=> "ok test body \\ud83d\\ude01\\ud83d\\ude01\\ud83d\\ude01\r\n-- \r\n test"

The regular expression matches code points U+10000 to U+10FFFF, i.e. characters from the Supplementary Planes. In UTF-16, these are represented as so-called surrogate pairs.
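
For reference, a surrogate pair can also be computed arithmetically: subtract 0x10000 from the code point, then add the upper 10 bits of the result to 0xD800 and the lower 10 bits to 0xDC00. A minimal sketch of that calculation (the method name surrogate_pair is hypothetical, not part of Ruby):

```ruby
# Hypothetical helper: compute the UTF-16 surrogate pair for a
# supplementary-plane code point (U+10000..U+10FFFF).
def surrogate_pair(code_point)
  unless (0x10000..0x10FFFF).cover?(code_point)
    raise ArgumentError, 'not a supplementary code point'
  end

  offset = code_point - 0x10000      # 20-bit value
  high   = 0xD800 + (offset >> 10)   # upper 10 bits
  low    = 0xDC00 + (offset & 0x3FF) # lower 10 bits
  [high, low]
end

pair = surrogate_pair(0x1F601)
puts pair.map { |unit| format('\u%04x', unit) }.join
# prints: \ud83d\ude01
```

Encoding to UTF-16BE, as the answer does, lets Ruby perform this calculation itself.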

Each matched character is split via unpack into its high and low surrogate (the pattern H4 extracts 4 hexadecimal characters, i.e. 2 bytes or 16 bits):

'😁'.encode('UTF-16BE').unpack('H4H4')
#=> ["d83d", "de01"]

The result is formatted via %:

'\u%s\u%s' % ["d83d", "de01"]
#=> "\\ud83d\\ude01"
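
One way to sanity-check the escaping is a round trip: escape the string, decode the surrogate pairs again, and compare with the original. The sketch below wraps the gsub above in a helper and adds a decoder; both method names (escape_supplementary, unescape_surrogates) are hypothetical, and the decoder only recognises well-formed high/low pairs:

```ruby
# Hypothetical wrapper around the gsub from this answer.
def escape_supplementary(string)
  string.gsub(/[\u{10000}-\u{10FFFF}]/) do |m|
    '\u%s\u%s' % m.encode('UTF-16BE').unpack('H4H4')
  end
end

# Hypothetical inverse: turn \uD8xx\uDCxx style pairs back into characters.
def unescape_surrogates(string)
  string.gsub(/\\u(d[89ab]\h\h)\\u(d[c-f]\h\h)/i) do
    [$1.hex, $2.hex].pack('n2')          # two big-endian 16-bit units
                    .force_encoding('UTF-16BE')
                    .encode('UTF-8')
  end
end

original = "ok test body \u{1F601}\u{1F601}\u{1F601}\r\n-- \r\n test"
escaped  = escape_supplementary(original)
puts unescape_surrogates(escaped) == original
# prints: true
```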

Answer 1: score 1, accepted, 2017-01-17 13:20:25

Answer 2: score 0, 2017-01-17 14:36:53