What Unicode character (emoji) was it?
I have this string in my text file: ├░┬č┬Ź┬ć
What I do know is that it was an emoji, or at least some surrogate-pair character, i.e. something that a JavaScript string of length 2 or 4 would hold.
For some reason it ended up in this form. (It was read from a MySQL database with collation utf8_general_ci, through a Node.js mysql2 connection with charset latin1_swedish_ci.)
How can I find out which emoji it was? Is it even possible?
Other examples:
├░┬č┬ĺ┬Ž
├░┬č┬ś┬ł
├░┬č┬ą┬Á
An algorithm written in JS would be the best option.
It's double mojibake, as shown in the following Python code snippet (sorry, I cannot give a JavaScript equivalent):
print('🍆 💦 😈 🥵'
      .encode('utf-8').decode('latin1')  # 1st mojibake stage
      .encode('utf-8').decode('cp852')   # 2nd mojibake stage
)  # ├░┬č┬Ź┬ć ├░┬č┬ĺ┬Ž ├░┬č┬ś┬ł ├░┬č┬ą┬Á
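To see where each group of characters in the garbled string comes from, here is a stage-by-stage trace of the same chain for the first emoji only. It is a minimal sketch using Python's built-in codecs; the variable names are just illustrative, and bytes.hex(' ') needs Python 3.8 or newer.

```python
# Trace the double mojibake for a single emoji, one step at a time.
original = '\U0001F346'  # 🍆 AUBERGINE

utf8_bytes = original.encode('utf-8')      # 4 bytes: f0 9f 8d 86
stage1 = utf8_bytes.decode('latin1')       # 4 chars: 'ð' plus three C1 control chars
stage1_bytes = stage1.encode('utf-8')      # every char >= U+0080 becomes 2 bytes
stage2 = stage1_bytes.decode('cp852')      # 8 chars of box-drawing / Latin-2 letters

print(utf8_bytes.hex(' '))
print(stage1_bytes.hex(' '))
print(stage2)
```

Each original byte of 0x80 or above turns into two characters in the final string. The UTF-8 continuation bytes of an emoji are always in the range 0x80 to 0xBF, which re-encode as 0xC2 followed by the byte itself, and 0xC2 is ┬ in cp852. That is why every garbled emoji starts with ├░ (from 0xF0) and then alternates ┬ with one other letter.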
Possible repair (although prevention is better than cure):
print('├░┬č┬Ź┬ć ├░┬č┬ĺ┬Ž ├░┬č┬ś┬ł ├░┬č┬ą┬Á'
      .encode('cp852').decode('utf-8')   # fix 2nd mojibake stage
      .encode('latin1').decode('utf-8')  # fix 1st mojibake stage
)  # 🍆 💦 😈 🥵
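To answer "which emoji was it" for many strings at once, the two repair steps can be wrapped in a helper that also looks up each character's Unicode name. This is a sketch: fix_double_mojibake is a hypothetical name, and it assumes every input went through exactly the utf-8 → latin1 → utf-8 → cp852 chain from the snippet above. Anything that does not fit that chain raises UnicodeError.

```python
import unicodedata


def fix_double_mojibake(text: str) -> str:
    """Undo the cp852-over-latin1 double mojibake (hypothetical helper).

    Assumes text was produced by utf-8 -> latin1 -> utf-8 -> cp852;
    raises a UnicodeError subclass if it does not fit that chain.
    """
    stage1 = text.encode('cp852').decode('utf-8')    # fix 2nd mojibake stage
    return stage1.encode('latin1').decode('utf-8')   # fix 1st mojibake stage


samples = ['├░┬č┬Ź┬ć', '├░┬č┬ĺ┬Ž', '├░┬č┬ś┬ł', '├░┬č┬ą┬Á']
for garbled in samples:
    try:
        fixed = fix_double_mojibake(garbled)
    except UnicodeError:
        print(garbled, 'does not look like this kind of mojibake')
        continue
    # Report each recovered character with its code point and Unicode name.
    for ch in fixed:
        print(garbled, ch, f'U+{ord(ch):04X}', unicodedata.name(ch, '<unnamed>'))
```

Catching UnicodeError lets you skip strings that were stored correctly or mangled some other way, instead of aborting the whole batch.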
FYI, those emojis are listed below (column CodePoint contains the Unicode code point (U+hhhh) and the UTF-8 bytes; column Description contains the character name, followed by its UTF-16 surrogate pair in parentheses):
Char CodePoint Description
---- --------- -----------
🍆 {U+1F346, 0xF0,0x9F,0x8D,0x86} AUBERGINE (0xd83c,0xdf46)
💦 {U+1F4A6, 0xF0,0x9F,0x92,0xA6} SPLASHING SWEAT SYMBOL (0xd83d,0xdca6)
😈 {U+1F608, 0xF0,0x9F,0x98,0x88} SMILING FACE WITH HORNS (0xd83d,0xde08)
🥵 {U+1F975, 0xF0,0x9F,0xA5,0xB5} OVERHEATED FACE (0xd83e,0xdd75)
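The rows of this table can be reproduced from the characters themselves, which also shows why the question mentions JavaScript strings of length 2. The sketch below uses only the standard library; describe is a hypothetical helper name. It computes the surrogate pair with the standard UTF-16 rule: subtract 0x10000, then add the high 10 bits to 0xD800 and the low 10 bits to 0xDC00. A JavaScript string's .length counts these UTF-16 code units, so each of these emojis has length 2.

```python
import unicodedata


def describe(ch: str) -> str:
    """Build one table row: char, code point, UTF-8 bytes, name, surrogate pair."""
    cp = ord(ch)
    utf8 = ','.join(f'0x{b:02X}' for b in ch.encode('utf-8'))
    if cp > 0xFFFF:
        # Outside the BMP: split (cp - 0x10000) into two 10-bit halves.
        v = cp - 0x10000
        pair = f'(0x{0xD800 + (v >> 10):x},0x{0xDC00 + (v & 0x3FF):x})'
    else:
        pair = ''  # BMP characters are a single UTF-16 code unit
    name = unicodedata.name(ch, '<unnamed>')
    return f'{ch} {{U+{cp:04X}, {utf8}}} {name} {pair}'.rstrip()


def js_length(s: str) -> int:
    """Number of UTF-16 code units, i.e. what JavaScript's .length reports."""
    return len(s.encode('utf-16-le')) // 2


for ch in '🍆💦😈🥵':
    print(describe(ch), '| JS length:', js_length(ch))
```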