How to convert a Unicode string into normal text in Python
Consider that I have a Unicode string (not real Unicode, but a string that looks like Unicode), and I want to get its UTF-8 variant.
How can I do it in Python?
For example, if I have a string like:
title = "\\u10d8\\u10e1\\u10e0\\u10d0\\u10d4\\u10da\\u10d8 == \\u10d8\\u10d4\\u10e0\\u10e3\\u10e1\\u10d0\\u10da\\u10d8\\u10db\\u10d8"
How can I convert it so that I get its UTF-8 variant (Georgian characters):
ისრაელი == იერუსალიმი
To put it simply, I want code like this:
title = "\\u10d8\\u10e1\\u10e0\\u10d0\\u10d4\\u10da\\u10d8 == \\u10d8\\u10d4\\u10e0\\u10e3\\u10e1\\u10d0\\u10da\\u10d8\\u10db\\u10d8"
utfTitle = title.TurnToUTF()
print(utfTitle)
And I want this code to output:
ისრაელი == იერუსალიმი
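Python's str has no TurnToUTF() method, and built-in types cannot be given new methods, so the closest equivalent is a plain function. The sketch below assumes Python 3 and that the input is pure ASCII (as in the example title); the name turn_to_utf is hypothetical, chosen only to mirror the wished-for call.

```python
def turn_to_utf(escaped: str) -> str:
    """Hypothetical stand-in for the wished-for title.TurnToUTF().

    Turns literal escape sequences such as the six characters \\u10d8
    into the characters they denote. Assumes pure-ASCII input.
    """
    # A Python 3 str has no decode(); go through bytes first, then let the
    # unicode-escape codec interpret the backslash sequences.
    return escaped.encode("ascii").decode("unicode-escape")


title = "\\u10d8\\u10e1\\u10e0\\u10d0\\u10d4\\u10da\\u10d8 == \\u10d8\\u10d4\\u10e0\\u10e3\\u10e1\\u10d0\\u10da\\u10d8\\u10db\\u10d8"
utf_title = turn_to_utf(title)  # 'ისრაელი == იერუსალიმი'
```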
You can use the unicode-escape codec to get rid of the doubled backslashes and use the string effectively.
Assuming that title is a str, you will need to encode the string to bytes first before decoding it back to Unicode (str):
>>> t = title.encode('utf-8').decode('unicode-escape')
>>> t
'ისრაელი == იერუსალიმი'
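One caveat: the unicode-escape codec reads every byte that is not part of an escape sequence as Latin-1. If the str already contains real non-ASCII characters next to the escapes, encode('utf-8') turns them into multi-byte sequences that come back as mojibake. A minimal sketch of an alternative, assuming only \uXXXX escapes (exactly four hex digits, no surrogate pairs) need converting, rewrites just those sequences with a regular expression; decode_u_escapes is a hypothetical helper, not a library function.

```python
import re

# A literal backslash, 'u', and exactly four hex digits, e.g. \u10d8.
_U_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_u_escapes(text: str) -> str:
    """Hypothetical helper: replace only \\uXXXX sequences, leave other text untouched."""
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


# First word is already real Georgian text, second word is still escaped.
mixed = "ისრაელი == \\u10d8\\u10d4\\u10e0\\u10e3\\u10e1\\u10d0\\u10da\\u10d8\\u10db\\u10d8"

# encode/decode route: the already-Georgian letters come back as Latin-1 garbage.
mangled = mixed.encode("utf-8").decode("unicode-escape")
# regex route: only the escape sequences are converted.
clean = decode_u_escapes(mixed)
```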
If title is a bytes instance, you can decode it directly:
>>> t = title.decode('unicode-escape')
>>> t
'ისრაელი == იერუსალიმი'
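The bytes case typically comes up when the escaped text is read from a file opened in binary mode or received over a network. The sketch below uses an in-memory io.BytesIO as a stand-in for such a source (an assumption for illustration). If actual UTF-8 bytes are what you need, for example to write them out, encode the decoded str with .encode('utf-8'); each Georgian letter then takes three bytes.

```python
import io

# Stand-in for a file opened with open(path, "rb"): raw bytes containing escapes.
source = io.BytesIO(b"\\u10d8\\u10e1\\u10e0\\u10d0\\u10d4\\u10da\\u10d8")

raw = source.read()                   # bytes: backslash, 'u', hex digits, ...
text = raw.decode("unicode-escape")   # str: the Georgian word
utf8_bytes = text.encode("utf-8")     # the real UTF-8 encoding of that str

# U+10D8 is encoded in UTF-8 as the three bytes E1 83 98.
assert utf8_bytes[:3] == b"\xe1\x83\x98"
```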
Here you go. Just use the decode method with the unicode_escape codec.
For Python 2.x:
title = "\\u10d8\\u10e1\\u10e0\\u10d0\\u10d4\\u10da\\u10d8 == \\u10d8\\u10d4\\u10e0\\u10e3\\u10e1\\u10d0\\u10da\\u10d8\\u10db\\u10d8"
utfTitle = title.decode('unicode_escape')
print(utfTitle)
# output: ისრაელი == იერუსალიმი
For Python 3.x:
title = "\\u10d8\\u10e1\\u10e0\\u10d0\\u10d4\\u10da\\u10d8 == \\u10d8\\u10d4\\u10e0\\u10e3\\u10e1\\u10d0\\u10da\\u10d8\\u10db\\u10d8"
print(title.encode('ascii').decode('unicode-escape'))
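Encoding with 'ascii' rather than 'utf-8' has a practical side effect: if the string unexpectedly contains real non-ASCII characters, encode('ascii') raises UnicodeEncodeError instead of letting unicode-escape silently produce mojibake. A minimal sketch of a guard built on that behavior; the function name and the choice to re-raise as ValueError are illustrative assumptions.

```python
def unescape_ascii_only(title: str) -> str:
    """Hypothetical helper: decode escape sequences, refusing non-ASCII input."""
    try:
        raw = title.encode("ascii")
    except UnicodeEncodeError as exc:
        # Real non-ASCII characters would be misread as Latin-1 by unicode-escape.
        raise ValueError(f"expected pure-ASCII escaped text: {exc}") from exc
    return raw.decode("unicode-escape")


ok = unescape_ascii_only("\\u10d8\\u10e1 == \\u10d8")

try:
    unescape_ascii_only("ისრაელი == \\u10d8")  # already-decoded text mixed in
    rejected = False
except ValueError:
    rejected = True
```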
Assuming the string is of type str, encode it to bytes and then convert it using decode with the unicode-escape codec:
title="\\u10d8\\u10e1\\u10e0\\u10d0\\u10d4\\u10da\\u10d8 == \\u10d8\\u10d4\\u10e0\\u10e3\\u10e1\\u10d0\\u10da\\u10d8\\u10db\\u10d8"
res1 = title.encode('utf-8')
res2 = res1.decode('unicode-escape')
print(res2)
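The conversion is reversible: encoding the decoded str with the same codec yields the escaped text again. The round-trip check below relies on Python writing \u escapes with lowercase hex digits, which matches the example title; input written with uppercase hex would decode fine but not compare equal after the round trip.

```python
title = "\\u10d8\\u10e1\\u10e0\\u10d0\\u10d4\\u10da\\u10d8 == \\u10d8\\u10d4\\u10e0\\u10e3\\u10e1\\u10d0\\u10da\\u10d8\\u10db\\u10d8"

res2 = title.encode("utf-8").decode("unicode-escape")  # escaped text -> Georgian str
back = res2.encode("unicode-escape").decode("ascii")   # Georgian str -> escaped text

assert back == title
```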
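Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.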