
Unicode-ASCII mixed string in Python

I have a string that is stored in the DB as:

FB (\u30a8\u30a2\u30eb\u30fc)

When I load this row from Python code, I am unable to format it correctly.

# x = load that string
print repr(x)  # u'FB (\\u30a8\\u30a2\\u30eb\\u30fc)'

Notice the two backslashes ("\\"). They mess up the Unicode characters on the frontend: instead of showing the foreign characters, the HTML shows the escape sequences \u30a8\u30a2\u30eb\u30fc literally.
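To see why the output looks doubled, it helps to check what the string actually contains. Below is a minimal Python 3 sketch, assuming the DB value can be reproduced with a raw string literal (the names `stored` and `decoded` are illustrative, not from the question). `repr()` doubles each backslash for display, but the string itself holds single backslash characters followed by hex digits, not the katakana characters.

```python
# Python 3 sketch: what the DB value actually contains.
# The raw-string prefix keeps each "\u" as two literal characters
# (a backslash and a "u"), mimicking the value loaded from the DB.
stored = r'FB (\u30a8\u30a2\u30eb\u30fc)'

print(repr(stored))  # repr doubles each backslash: 'FB (\\u30a8\\u30a2\\u30eb\\u30fc)'
print(stored)        # print shows the raw text: FB (\u30a8\u30a2\u30eb\u30fc)
print(len(stored))   # 29: 4 for "FB (", 6 per escape sequence, 1 for ")"

# Compare with a string that holds the real characters.
decoded = 'FB (\u30a8\u30a2\u30eb\u30fc)'
print(len(decoded))  # 9: each katakana character is a single code point
```

The length difference is the quickest way to tell "escaped text" apart from "real characters" when debugging this kind of problem.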

However, if I append some characters to turn it into JSON and then load the JSON, I get the expected result.

import json
s = '{"a": "%s"}' % x  # wrap the raw text in a JSON string literal
json.loads(s)['a']      # returns u'FB (\u30a8\u30a2\u30eb\u30fc)'

Notice the difference between this result (which shows up correctly on the frontend) and directly printing x (which has the extra backslashes). So although this hacky solution works, I want a cleaner one. I played around a lot with x.encode('utf-8') etc., but nothing has worked yet.
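The JSON trick works because JSON string literals use the same \uXXXX escape syntax, so json.loads decodes the escapes while parsing. Here is a Python 3 sketch of the same idea; the helper name `decode_via_json` is hypothetical. It also shows why the trick is fragile: an unescaped double quote in the stored text ends the JSON string early and the parse fails.

```python
import json


def decode_via_json(text):
    """Hypothetical helper: decode \\uXXXX escapes by parsing text as a JSON string.

    This mirrors the question's '{"a": "%s"}' trick, minus the wrapping dict.
    """
    return json.loads('"%s"' % text)


stored = r'FB (\u30a8\u30a2\u30eb\u30fc)'
print(decode_via_json(stored))  # FB (エアルー)

# The trick is fragile: an unescaped double quote in the data breaks the JSON.
try:
    decode_via_json(r'say "hi" \u30a8')
except json.JSONDecodeError as exc:
    print('JSON trick failed:', exc)
```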

Thank you!

Since you already have a Unicode string, encode it back to ASCII and decode it with the unicode_escape codec:

>>> s = u'FB (\\u30a8\\u30a2\\u30eb\\u30fc)'
>>> s
u'FB (\\u30a8\\u30a2\\u30eb\\u30fc)'
>>> print s
FB (\u30a8\u30a2\u30eb\u30fc)
>>> s.encode('ascii').decode('unicode_escape')
u'FB (\u30a8\u30a2\u30eb\u30fc)'
>>> print s.encode('ascii').decode('unicode_escape')
FB (エアルー)
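The session above is Python 2. Assuming Python 3 instead, the same round trip works: str.encode returns bytes, and bytes.decode('unicode_escape') interprets the escapes. Below is a minimal sketch with a hypothetical helper name `unescape`. Two caveats: encode('ascii') raises UnicodeEncodeError if the stored text already contains non-ASCII characters, and unicode_escape also interprets other backslash sequences such as \n.

```python
def unescape(text):
    """Hypothetical helper: turn literal \\uXXXX sequences into real characters.

    Encoding to ASCII first guarantees the bytes handed to the unicode_escape
    codec contain no non-ASCII data; non-ASCII input raises UnicodeEncodeError.
    """
    return text.encode('ascii').decode('unicode_escape')


stored = r'FB (\u30a8\u30a2\u30eb\u30fc)'
print(repr(stored))      # 'FB (\\u30a8\\u30a2\\u30eb\\u30fc)'
print(unescape(stored))  # FB (エアルー)

# Caveat: other backslash sequences are decoded too.
print(repr(unescape(r'a\nb')))  # 'a\nb' (a real newline, length 3)
```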
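
# Python 2; assumes the string holds nothing but \uXXXX escapes
raw_string = r'\u30a8\u30a2\u30eb\u30fc'  # literal backslashes, as loaded from the DB
string = u''.join(unichr(int(r, 16)) for r in raw_string.split('\\u') if r)
print(string)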

A way to solve this while waiting for a better answer.
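The split-based approach assumes the whole string consists of \uXXXX escapes; on the full DB value, pieces such as "FB (" and "30fc)" reach int(..., 16) and raise ValueError. Below is a Python 3 sketch of the same idea that converts only the escape sequences and leaves the surrounding text alone, using re.sub with a replacement function. The helper name `replace_u_escapes` is hypothetical, and the sketch deliberately ignores surrogate pairs and other escape types.

```python
import re

# Matches a literal backslash, a "u", and exactly four hex digits.
U_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')


def replace_u_escapes(text):
    """Hypothetical helper: replace each \\uXXXX sequence with its character.

    Text outside the escapes (such as "FB (" and ")") is left untouched, and
    other backslash sequences like \\n are not interpreted.
    """
    return U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


stored = r'FB (\u30a8\u30a2\u30eb\u30fc)'
print(replace_u_escapes(stored))         # FB (エアルー)
print(repr(replace_u_escapes(r'a\nb')))  # 'a\\nb' (backslash kept)
```

Unlike the unicode_escape codec, this works on strings that already contain non-ASCII characters, because it never round-trips through bytes.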
