How do I decode Unicode characters in a string?

Question

I have the following string:

Conversely, companies that aren\u0019t sharp-eyed enough to see that their real Dumbwaiter Pitches are lame, tired, or just plain evil \u0014 well, they usually end up facing extinction.

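This string contains '\u0019t'. I can't decode it, because it is already a string. If I encode it first and then decode it, it still shows '\u0019t'. How do I get it to show a ' ?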

Answer 1

One option is to run it through ast.literal_eval:

import ast
s = r"Conversely, companies that aren\u0019t sharp-eyed enough to see that their real Dumbwaiter Pitches are lame, tired, or just plain evil \u0014 well, they usually end up facing extinction. \u2661"
r = ast.literal_eval(f'"{s}"')
print(r)

Output:

Conversely, companies that arent sharp-eyed enough to see that their real Dumbwaiter Pitches are lame, tired, or just plain evil  well, they usually end up facing extinction. ♡
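To see why the apostrophe and the dash still seem to be missing from that output, it can help to print the decoded string with ascii(). This is only a minimal sketch: decode_escapes and the shortened sample are illustrative names of mine, not part of the answer, and the input is assumed to contain no unescaped double quotes or line breaks.

```python
import ast

def decode_escapes(raw: str) -> str:
    """Interpret backslash escapes by evaluating raw as a Python string literal.

    Assumption: raw contains no unescaped double quotes or line breaks,
    otherwise the wrapped literal would not parse.
    """
    return ast.literal_eval(f'"{raw}"')

sample = r"aren\u0019t \u0014 \u2661"
decoded = decode_escapes(sample)

# ascii() makes the invisible characters visible again
print(ascii(decoded))  # prints 'aren\x19t \x14 \u2661'
```

The escapes were decoded exactly as written: \u0019 and \u0014 are the control characters U+0019 and U+0014, which most terminals do not render, while \u2661 becomes the heart.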

Answer 2

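For some reason, the Unicode escapes in the string are off by 2000 hex. The Unicode dash and apostrophe are:

Unicode character "EM DASH" (U+2014)

and

Unicode character "RIGHT SINGLE QUOTATION MARK" (U+2019)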
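The offset can be checked with the standard unicodedata module. A small sketch (the names broken and shifted are my own) that adds 0x2000 to the two code points found in the string and looks up the resulting character names:

```python
import unicodedata

# Code points as they appear in the broken escapes \u0019 and \u0014
broken = [0x0019, 0x0014]

# Add the suspected offset of 0x2000 and look up the official character name
shifted = {hex(cp): unicodedata.name(chr(cp + 0x2000)) for cp in broken}

for original, name in shifted.items():
    print(original, '->', name)
```

Both lookups land on the characters named above, which supports the guess that the escapes lost exactly 0x2000 somewhere upstream.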

So let's fix it anyway, even though the mistake is at the source (THEM) and not at the destination:

import re
text = r'Conversely, companies that aren\u0019t sharp-eyed enough to see that their real Dumbwaiter Pitches are lame, tired, or just plain evil \u0014 well, they usually end up facing extinction.'
pattern = r'\\u([0-9a-fA-F]{4})'

# used to indicate the end of the previous match
# to save the string parts that don't need character encoding
off = 0
# start with an empty string
s = ''
# find and iterate over all matches of \uHHHH where H is a hex digit
for u in re.finditer(pattern, text):
    # append anything up to the unicode escape
    s += text[off:u.start()]
    # fix encoding mistake, unicode escapes are 2000 hex off the mark
    # then append it
    s += chr(int(u.group(1), 16) + 0x2000)
    # set off to the end of the match
    off = u.end()
# append everything from the last match to the end of the line
s += text[off:]
print(s)

This prints:

Conversely, companies that aren’t sharp-eyed enough to see that their real Dumbwaiter Pitches are lame, tired, or just plain evil — well, they usually end up facing extinction.

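Note that I blithely ignore any \\u00xx that may be in the text (where the backslash itself is escaped); that is a problem I leave for you to solve. Of course, any correct Unicode escapes in the text will also be changed.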
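One possible way to handle both caveats is sketched below. It is not the answerer's code: fix_escapes and ESCAPE are hypothetical names, and it rests on the assumption that only escapes in the control range (below 0x20) lost their 0x2000, so anything else is decoded unchanged. The alternation consumes an escaped backslash first, so a \\u00xx sequence is left alone.

```python
import re

# Match either an escaped backslash (left untouched) or a \uHHHH escape
ESCAPE = re.compile(r'\\\\|\\u([0-9a-fA-F]{4})')

def fix_escapes(text: str) -> str:
    def replace(match):
        digits = match.group(1)
        if digits is None:
            # An escaped backslash: keep both characters as they are
            return match.group(0)
        code_point = int(digits, 16)
        # Assumption: only control-range values (< 0x20) lost their 0x2000
        if code_point < 0x20:
            code_point += 0x2000
        return chr(code_point)
    return ESCAPE.sub(replace, text)

print(fix_escapes(r'aren\u0019t \u0014 \u2661'))
print(fix_escapes(r'literal \\u0019'))
```

The first call yields the curly apostrophe, the dash, and the heart; the second returns its input unchanged. A genuine \u000a or \u0009 in the text would also be shifted under this assumption, so the threshold may need narrowing for other data.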

Answer 1
2 2020-01-14 15:31:51

Answer 2
0 2020-01-14 16:26:26
