How to convert a Unicode-escaped string into normal text in Python

Suppose I have a string that looks like Unicode (not real Unicode text, but a string containing literal \uXXXX escape sequences), and I want to get the actual characters it stands for. How can I do this in Python? For example, if I have a string like:

title = "\\u10d8\\u10e1\\u10e0\\u10d0\\u10d4\\u10da\\u10d8 == \\u10d8\\u10d4\\u10e0\\u10e3\\u10e1\\u10d0\\u10da\\u10d8\\u10db\\u10d8"

How can I convert it so that I get the actual characters (Georgian letters):

ისრაელი == იერუსალიმი

To put it simply, I want code like:

title = "\\u10d8\\u10e1\\u10e0\\u10d0\\u10d4\\u10da\\u10d8 == \\u10d8\\u10d4\\u10e0\\u10e3\\u10e1\\u10d0\\u10da\\u10d8\\u10db\\u10d8"
utfTitle = title.TurnToUTF()
print(utfTitle)

And I want this code to output:

ისრაელი == იერუსალიმი

You can use the unicode-escape codec to turn the literal \uXXXX sequences (written with doubled backslashes in the source above) into the characters they represent.

Assuming that title is a str, you need to encode it to bytes first and then decode it back to str with that codec.

>>> t = title.encode('utf-8').decode('unicode-escape')
>>> t
'ისრაელი == იერუსალიმი'

If title is a bytes instance, you can decode it directly:

>>> t = title.decode('unicode-escape')
>>> t
'ისრაელი == იერუსალიმი'
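The two cases above can be folded into one function. This is only a minimal sketch for Python 3: the name unescape_unicode is hypothetical, and it assumes a str input contains nothing but ASCII characters (the escape sequences themselves are ASCII), which is why it encodes with 'ascii' rather than 'utf-8'.

```python
def unescape_unicode(value):
    """Turn literal \\uXXXX escape sequences into the characters they name.

    Hypothetical helper (not part of any library); accepts str or bytes.
    """
    if isinstance(value, str):
        # str has no decode() in Python 3, so convert to bytes first.
        # 'ascii' raises UnicodeEncodeError if the text is not pure ASCII.
        value = value.encode('ascii')
    # The unicode-escape codec interprets \uXXXX sequences in the bytes.
    return value.decode('unicode-escape')


title = "\\u10d8\\u10e1\\u10e0\\u10d0\\u10d4\\u10da\\u10d8 == \\u10d8\\u10d4\\u10e0\\u10e3\\u10e1\\u10d0\\u10da\\u10d8\\u10db\\u10d8"
result = unescape_unicode(title)
print(result)
```

Passing title.encode('ascii') instead of title should give the same result, since the bytes branch skips the encoding step.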

Here you go. Just use the decode method with the unicode_escape codec.

For Python 2.x:

title = "\\u10d8\\u10e1\\u10e0\\u10d0\\u10d4\\u10da\\u10d8 == \\u10d8\\u10d4\\u10e0\\u10e3\\u10e1\\u10d0\\u10da\\u10d8\\u10db\\u10d8"
utfTitle = title.decode('unicode_escape')
print(utfTitle)

# output: ისრაელი == იერუსალიმი

For Python 3.x:

title = "\\u10d8\\u10e1\\u10e0\\u10d0\\u10d4\\u10da\\u10d8 == \\u10d8\\u10d4\\u10e0\\u10e3\\u10e1\\u10d0\\u10da\\u10d8\\u10db\\u10d8"
print(title.encode('ascii').decode('unicode-escape'))
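The answers in this thread differ on whether to encode with 'ascii' or 'utf-8' before decoding. For the pure-ASCII title in the question both give the same result, but they behave differently once the string also contains real non-ASCII characters, because the unicode-escape codec reads any non-escape bytes as Latin-1. The sketch below illustrates this; the sample string mixed is made up for the demonstration.

```python
# A made-up sample: a real "é" followed by one literal \u10d8 escape.
mixed = "caf\u00e9 \\u10d8"

# UTF-8 encoding succeeds, but the two UTF-8 bytes of "é" are then
# decoded as two separate Latin-1 characters, so the text is garbled.
garbled = mixed.encode('utf-8').decode('unicode-escape')
print(repr(garbled))

# ASCII encoding refuses the input instead of silently garbling it.
try:
    mixed.encode('ascii').decode('unicode-escape')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
print(ascii_failed)
```

So 'ascii' is the stricter choice: it either works correctly or raises an error, whereas 'utf-8' can quietly return damaged text for mixed input.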

Let us assume the string is of type str, encode it to bytes, and then decode it with the unicode-escape codec:

title = "\\u10d8\\u10e1\\u10e0\\u10d0\\u10d4\\u10da\\u10d8 == \\u10d8\\u10d4\\u10e0\\u10e3\\u10e1\\u10d0\\u10da\\u10d8\\u10db\\u10d8"
res1 = title.encode('utf-8')
res2 = res1.decode('unicode-escape')
print(res2)
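If the input might mix literal \uXXXX escapes with characters that are already non-ASCII, one alternative to the codec is a regular-expression substitution. This is a different technique from the one in the answers above, offered only as a sketch: the name replace_u_escapes is hypothetical, and it handles only four-digit \uXXXX sequences, leaving other escapes such as \n or \UXXXXXXXX untouched.

```python
import re

# Matches a literal backslash, "u", and exactly four hex digits.
_U_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')


def replace_u_escapes(text):
    """Replace each literal \\uXXXX in text with the character it names.

    Hypothetical helper; characters outside the escapes are left as they are.
    """
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


title = "\\u10d8\\u10e1\\u10e0\\u10d0\\u10d4\\u10da\\u10d8 == \\u10d8\\u10d4\\u10e0\\u10e3\\u10e1\\u10d0\\u10da\\u10d8\\u10db\\u10d8"
converted = replace_u_escapes(title)
print(converted)

# A made-up mixed sample: the real "é" passes through unchanged.
mixed_converted = replace_u_escapes("caf\u00e9 \\u10d8")
print(mixed_converted)
```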
