Decoding a unicode string variable in Python

Question

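I am using an API in Python v2.7 to fetch a string whose content is unknown. The content can be English, German or French. The variable the returned string is assigned to is named "category". An example of a value returned in category is: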

"temp\\u00eate de poussi\\u00e8res"

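I have tried decoding the string (French, in the example above) with category.decode('utf-8'), but unfortunately it still returns the same value, and when I print the result there is an extra unicode 'u' at the beginning: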

u'"temp\\u00eate de poussi\\u00e8res"'

I have also tried category.encode('utf-8'), but it returns the same value (minus the 'u' in front of the string):

'"temp\\u00eate de poussi\\u00e8res"'

Any suggestions?
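A minimal Python 3 sketch (the question uses Python 2.7, where str and unicode are separate types) of why a UTF-8 round trip leaves this value unchanged: every character in it is plain ASCII, so the UTF-8 codec has nothing to convert. The value is copied from the example above.

```python
# Value as returned by the API in the question (the backslashes are real characters)
category = '"temp\\u00eate de poussi\\u00e8res"'

# Every character is ASCII, so UTF-8 encoding and decoding change nothing
print(all(ord(ch) < 128 for ch in category))  # True
roundtrip = category.encode('utf-8').decode('utf-8')
print(roundtrip == category)  # True
```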

Answer 1

I think your string contains literal backslashes, not unicode characters.

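That is, \u00ea is the unicode escape encoding of ê, but \\u00ea is actually a backslash (escaped) followed by the ordinary characters u, 0, 0, e and a.

The same goes for the quotes: your first and last characters are literal double quotes ".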
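To make the point about literal backslashes concrete, here is a small Python 3 sketch that inspects the characters of the example value. It shows that each escape sequence is six separate characters and that the string itself starts and ends with a double quote.

```python
x = '"temp\\u00eate de poussi\\u00e8res"'

# The escape sequence is six ordinary characters, not one accented letter
start = x.index('\\')
print(list(x[start:start + 6]))  # ['\\', 'u', '0', '0', 'e', 'a']
print(x[0], x[-1])               # " "
print('ê' in x)                  # False
```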

You can convert these backslash-plus-code-point sequences into the equivalent characters:

x = '"temp\\u00eate de poussi\\u00e8res"'
# Python 2: turn the literal \uXXXX sequences into real characters
d = x.decode("unicode_escape")
print d

The output is:

"tempête de poussières"
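In Python 3, str has no .decode() method, so a roughly equivalent sketch encodes to bytes first and then applies the same unicode_escape codec. This assumes the input is pure ASCII, as in the question: unicode_escape reads bytes that are not part of an escape as Latin-1, so genuine non-ASCII UTF-8 text mixed into the same string would come out garbled.

```python
x = '"temp\\u00eate de poussi\\u00e8res"'

# unicode_escape decodes bytes in Python 3; encoding as ASCII is safe here
# because the input contains only ASCII characters and \uXXXX escapes
d = x.encode('ascii').decode('unicode_escape')
print(d)  # "tempête de poussières"
```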

Note that to see the correct international characters you have to use print. If you just type d in the interactive Python shell, you get:

 u'"temp\xeate de poussi\xe8res"'

where \xea is equivalent to \u00ea, the escape sequence for ê.

Removing the quotes, if needed, is left as an exercise for the reader ;-).
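A short Python 3 check that \xea and \u00ea spell the same single character, while the doubled-backslash form is six characters. Unlike the Python 2 shell above, Python 3's repr() shows printable non-ASCII characters such as ê directly instead of escaping them as \xea.

```python
# Three spellings of the same single character U+00EA (e with circumflex)
print('\xea' == '\u00ea' == 'ê')  # True
print(len('\u00ea'))              # 1

# With a doubled backslash the text is six ordinary characters
print(len('\\u00ea'))             # 6
```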
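One possible take on the exercise, as a Python 3 sketch assuming the quotes only ever appear as a single pair around the whole value: check for them and slice them off after decoding. This is stricter than d.strip('"'), which would also eat a quote that legitimately starts or ends the text.

```python
x = '"temp\\u00eate de poussi\\u00e8res"'
d = x.encode('ascii').decode('unicode_escape')

# Remove exactly one pair of surrounding double quotes, if present
if len(d) >= 2 and d.startswith('"') and d.endswith('"'):
    d = d[1:-1]
print(d)  # tempête de poussières
```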

Answer 2

It looks like the API uses JSON. You can decode the value with the json module:

>>> import json
>>> json.loads('"temp\\u00eate de poussi\\u00e8res"')
u'temp\xeate de poussi\xe8res'
>>> print(json.loads('"temp\\u00eate de poussi\\u00e8res"'))
tempête de poussières
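The same approach in Python 3, as a sketch. Because the value is a JSON string literal, json.loads both removes the surrounding quotes and resolves the \uXXXX escapes; it also joins UTF-16 surrogate pairs such as \ud83d\ude00 into one character, which unicode_escape does not. The decode_category helper is hypothetical, for the case where the API might sometimes return text that is not a JSON string.

```python
import json

category = '"temp\\u00eate de poussi\\u00e8res"'

# The value is a JSON string literal: quotes and escapes are handled together
text = json.loads(category)
print(text)  # tempête de poussières


def decode_category(raw):
    """Hypothetical helper: decode a JSON string literal, else return raw unchanged."""
    try:
        value = json.loads(raw)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return raw
    # Only accept results that are strings (e.g. not numbers or lists)
    return value if isinstance(value, str) else raw


print(decode_category('plain text'))  # plain text
```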

2 answers

Answer 1
2 2018-12-04 15:17:38

Answer 2
1 2018-12-04 17:05:33
