How to convert a byte string with a unicode character to normal text in Python?
How to convert unicode string into normal text in Python
Suppose I have a Unicode string (not real Unicode, but a string that looks like Unicode). I want to get its UTF-8 variant. How can I do that in Python? For example, if I have a string like this:
title = "\\u10d8\\u10e1\\u10e0\\u10d0\\u10d4\\u10da\\u10d8 == \\u10d8\\u10d4\\u10e0\\u10e3\\u10e1\\u10d0\\u10da\\u10d8\\u10db\\u10d8"
What should I do to get its UTF-8 variant (Georgian symbols):
ისრაელი==იერუსალიმი
Simply put, I would like to have code like this:
title = "\\u10d8\\u10e1\\u10e0\\u10d0\\u10d4\\u10da\\u10d8 == \\u10d8\\u10d4\\u10e0\\u10e3\\u10e1\\u10d0\\u10da\\u10d8\\u10db\\u10d8"
utfTitle = title.TurnToUTF()
print(utfTitle)
I want this code to output:
ისრაელი==იერუსალიმი
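You can use the unicode-escape codec to get rid of the doubled backslashes and use the string effectively. Assuming title is a str, you need to encode the string to bytes first, then decode it back to unicode (str).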
>>> t = title.encode('utf-8').decode('unicode-escape')
>>> t
'ისრაელი == იერუსალიმი'
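To make the two steps explicit, here is a minimal sketch using the title string from the question. The names as_bytes and decoded are only illustrative. The string contains nothing but ASCII characters (backslashes, the letter u, hex digits, spaces and equals signs), so encoding it to bytes does not change its content; the unicode-escape decoder then turns each six-character escape into one letter.

```python
title = "\\u10d8\\u10e1\\u10e0\\u10d0\\u10d4\\u10da\\u10d8 == \\u10d8\\u10d4\\u10e0\\u10e3\\u10e1\\u10d0\\u10da\\u10d8\\u10db\\u10d8"

# The str holds literal backslash-u sequences, not Georgian letters:
# 17 escapes of 6 characters each, plus " == ".
raw_length = len(title)                        # 106

# Step 1: str -> bytes. A Python 3 str has no decode() method,
# so the text has to become bytes before a codec can decode it.
as_bytes = title.encode("utf-8")

# Step 2: bytes -> str. unicode-escape interprets each \uXXXX sequence.
decoded = as_bytes.decode("unicode-escape")    # 'ისრაელი == იერუსალიმი'
decoded_length = len(decoded)                  # 21
```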
If title is a bytes instance, you can decode it directly:
>>> t = title.decode('unicode-escape')
>>> t
'ისრაელი == იერუსალიმი'
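A short sketch of the bytes case, assuming the escaped data already arrives as bytes (for example, from a file opened in binary mode). The name raw and its sample value are hypothetical.

```python
# Hypothetical input: escaped text that is already a bytes object.
raw = b"\\u10d8\\u10e1\\u10e0\\u10d0\\u10d4\\u10da\\u10d8"

# bytes has a decode() method, so no encode step is needed.
text = raw.decode("unicode-escape")    # 'ისრაელი'

is_text = isinstance(text, str)        # True: the result is a regular str
```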
Here you go. Just use the decode method with unicode_escape.
For Python 2.x
title = "\\u10d8\\u10e1\\u10e0\\u10d0\\u10d4\\u10da\\u10d8 == \\u10d8\\u10d4\\u10e0\\u10e3\\u10e1\\u10d0\\u10da\\u10d8\\u10db\\u10d8"
utfTitle = title.decode('unicode_escape')
print(utfTitle)
# output: ისრაელი == იერუსალიმი
For Python 3.x
title = "\\u10d8\\u10e1\\u10e0\\u10d0\\u10d4\\u10da\\u10d8 == \\u10d8\\u10d4\\u10e0\\u10e3\\u10e1\\u10d0\\u10da\\u10d8\\u10db\\u10d8"
print(title.encode('ascii').decode('unicode-escape'))
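A note on the choice of encode('ascii') in the Python 3 version, as a sketch with made-up sample strings: it works whenever the input consists only of escape sequences and other ASCII characters, and it raises UnicodeEncodeError if a real non-ASCII character is already present. That can serve as a cheap check that the input really is fully escaped.

```python
escaped = "\\u10d8\\u10e1"    # ASCII only: backslash, 'u', hex digits
mixed = "ი\\u10e1"            # one real Georgian letter followed by one escape

# Fully escaped input decodes as expected.
result = escaped.encode("ascii").decode("unicode-escape")    # 'ის'

# Input that already contains a real non-ASCII character is rejected.
try:
    mixed.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False          # this branch runs for the sample above
```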
Assuming the unicode string is of type str, convert it by calling encode and then decode with the unicode-escape codec:
title="\\u10d8\\u10e1\\u10e0\\u10d0\\u10d4\\u10da\\u10d8 == \\u10d8\\u10d4\\u10e0\\u10e3\\u10e1\\u10d0\\u10da\\u10d8\\u10db\\u10d8"
res1 = title.encode('utf-8')
res2 = res1.decode('unicode-escape')
print(res2)
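One caveat with encode('utf-8') followed by decode('unicode-escape'): the unicode-escape codec reads plain bytes as Latin-1, so real non-ASCII characters that were already in the string come back garbled. If the input may mix real characters with escape sequences, one possible workaround (not taken from the answers above) is to encode with latin-1 and the backslashreplace error handler. The helper name unescape and the sample strings are hypothetical.

```python
def unescape(value):
    """Expand literal backslash-u escapes in a str or bytes value (hypothetical helper)."""
    if isinstance(value, str):
        # Characters outside Latin-1 are rewritten as backslash escapes by
        # backslashreplace; Latin-1 characters become single bytes.
        value = value.encode("latin-1", "backslashreplace")
    # unicode-escape reads plain bytes as Latin-1 and expands the escapes.
    return value.decode("unicode-escape")


title = "\\u10d8\\u10e1\\u10e0\\u10d0\\u10d4\\u10da\\u10d8 == \\u10d8\\u10d4\\u10e0\\u10e3\\u10e1\\u10d0\\u10da\\u10d8\\u10db\\u10d8"
clean = unescape(title)               # 'ისრაელი == იერუსალიმი'
mixed = unescape("café \\u10d8")      # 'café ი'

# For comparison, the utf-8 round trip garbles the real 'é':
naive = "café \\u10d8".encode("utf-8").decode("unicode-escape")    # 'cafÃ© ი'
```

For input that is known to be fully escaped, as in the question, the simpler encode and decode pair is enough.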