Error using the unicode function in the gspread wrapper. Potential bug

Question

Calling the unicode function on the following string raises an error:

>>> unicode('All but Buitoni are using Pinterest buffers and Pratt & Lamber haven’t used it for a month so I’ll check on this.')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 68: ordinal not in range(128)

When I check position 68, it turns out to be the apostrophe ':

>>> str='All but Buitoni are using Pinterest buffers and Pratt & Lamber haven’t used it for a month so I’ll check on this.'
>>> str[62:75]
' haven\xe2\x80\x99t us'

Is there a way to work around this? I ran into this error in the gspread wrapper, at line 426 of models.py. These are the lines in question:

425 cell_elem = feed.find(_ns1('cell'))
426 cell_elem.set('inputValue', unicode(val))
427 uri = self._get_link('edit', feed).get('href')

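So as soon as I try to update a cell with a value (a string), the gspread wrapper tries to convert it to unicode, but fails because of the apostrophe. This is potentially a bug. How should I deal with it? Thanks for your help.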

Answer 1

There is no need to replace characters. Just decode the encoded string to unicode properly:

>>> s = 'All but Buitoni are using Pinterest buffers and Pratt & Lamber haven’t used it for a month so I’ll check on this.'
>>> s.decode('utf-8')
u'All but Buitoni are using Pinterest buffers and Pratt & Lamber haven\u2019t used it for a month so I\u2019ll check on this.'  # unicode object

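You need to tell Python which encoding your str object uses in order to convert it to unicode, rather than just calling unicode(some_str) directly, which falls back to the default ASCII codec. In this case your string is encoded as UTF-8. This approach scales better than trying to replace characters, because you don't need a special case for every unicode character that exists in the DB.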
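To make the failure concrete: in Python 2, unicode(some_str) with no encoding argument behaves roughly like some_str.decode('ascii'), so it stops at the first byte above 0x7f. Below is a minimal sketch of that, plus a small helper. The name to_text is hypothetical (it is not part of gspread), and the default of UTF-8 is an assumption about where your data comes from.

```python
# Byte string as it arrives from the external source. \xe2\x80\x99 is the
# UTF-8 encoding of the right single quotation mark (U+2019).
raw = b'Pratt & Lamber haven\xe2\x80\x99t used it for a month'


def to_text(val, encoding='utf-8'):
    """Hypothetical helper: decode byte strings, pass text through unchanged.

    The encoding is an assumption; use whatever your data source really emits.
    """
    if isinstance(val, bytes):
        return val.decode(encoding)
    return val


# Decoding as ASCII fails on the first byte outside the 0-127 range.
try:
    raw.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding with the real encoding succeeds.
text = to_text(raw)
```

If you pass to_text(val) rather than the raw byte string into the gspread call, the unicode(val) on line 426 should receive a unicode object and have nothing left to decode.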

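IMO, the best practice for dealing with unicode in Python is:

Decode strings coming from external sources (e.g. the DB) to unicode as early as possible.
Use them internally as unicode objects.
Encode them back to byte strings only when you need to send them to an external location (a file, the DB, a socket, etc.).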
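The three steps above can be sketched roughly as follows. The function names are made up for illustration, the in-memory streams stand in for a real DB, file or socket, and UTF-8 is assumed at both boundaries.

```python
import io


def read_text(stream, encoding='utf-8'):
    # Decode as early as possible: bytes from outside become text here.
    return stream.read().decode(encoding)


def shout(text):
    # Internal logic only ever sees text, never encoded bytes.
    return text.upper()


def write_text(stream, text, encoding='utf-8'):
    # Encode as late as possible: only at the boundary to the outside world.
    stream.write(text.encode(encoding))


source = io.BytesIO(b'haven\xe2\x80\x99t used it')  # stands in for a DB or file
sink = io.BytesIO()
write_text(sink, shout(read_text(source)))
```

Keeping the decode and encode calls at the edges means the code in between never has to guess which encoding a given string is in.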

I'd also recommend taking a look at this slide deck, which gives a good overview of how to handle unicode in Python.

Solution 1
0 2014-07-30 18:58:01