Error when using the unicode function in the gspread wrapper. Potential bug

Question

When using the unicode function with the following string, it gives an error:

>>> unicode('All but Buitoni are using Pinterest buffers and Pratt & Lamber haven’t used it for a month so I’ll check on this.')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 68: ordinal not in range(128)

When I check position 68, it appears to be the apostrophe ’:

>>> str='All but Buitoni are using Pinterest buffers and Pratt & Lamber haven’t used it for a month so I’ll check on this.'
>>> str[62:75]
' haven\xe2\x80\x99t us'
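For reference, the three bytes at that position can be inspected directly. This is only a small sketch, assuming the string is UTF-8 encoded (which the \xe2\x80\x99 byte sequence suggests); the variable names are illustrative, and the byte string is built from a unicode literal so the snippet behaves the same on Python 2 and 3:

```python
import unicodedata

# Shortened copy of the string from the question, built as text and then
# encoded so that the byte offsets match a UTF-8 encoded str.
text = u'All but Buitoni are using Pinterest buffers and Pratt & Lamber haven\u2019t used it'
raw = text.encode('utf-8')

# Locate the first non-ASCII byte, the one the traceback complains about.
pos = raw.find(b'\xe2')

# The character occupies three bytes in UTF-8; decode just those bytes.
char = raw[pos:pos + 3].decode('utf-8')
name = unicodedata.name(char)

print(pos, repr(char), name)
```

So the character is not the ASCII apostrophe but a typographic right single quotation mark, which is why the ASCII codec rejects it.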

Is there a way to deal with this issue? I found this bug in the gspread wrapper, in the file models.py on line 426. Here is the line:

425 cell_elem = feed.find(_ns1('cell'))
426 cell_elem.set('inputValue', unicode(val))
427 uri = self._get_link('edit', feed).get('href')

So once I try to update a cell with a value (a string in this case), the gspread wrapper tries to convert it to unicode but cannot do so because of the apostrophe. Potentially, it is a bug. How should I deal with this issue? Thanks for the help.

Answer 1

There's no need to replace the character. Just properly decode the encoded string to unicode:

>>> s = 'All but Buitoni are using Pinterest buffers and Pratt & Lamber haven’t used it for a month so I’ll check on this.'
>>> s.decode('utf-8')
u'All but Buitoni are using Pinterest buffers and Pratt & Lamber haven\u2019t used it for a month so I\u2019ll check on this.'  # unicode object

You need to tell Python what encoding your str object is using in order to convert it to unicode, rather than just calling unicode(some_str) directly. In this case, your string is encoded with UTF-8. This approach scales better than trying to replace characters, because you won't need a special case for every unicode character that exists in the DB.
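To make the difference concrete: in Python 2, unicode(some_str) with no encoding argument falls back to the default codec, which is normally ASCII. The sketch below imitates that by decoding with 'ascii' explicitly, so it also runs on Python 3; the names raw, failed_at and decoded are just for illustration:

```python
# UTF-8 bytes for u'haven\u2019t', as they would arrive from the DB.
raw = b'haven\xe2\x80\x99t'

# Decoding with the ASCII codec fails on the first byte >= 0x80.
try:
    raw.decode('ascii')
    failed_at = None
except UnicodeDecodeError as exc:
    failed_at = exc.start  # offset of the offending byte (0xe2)

# Naming the real encoding succeeds and yields one character for the
# three bytes of the quotation mark.
decoded = raw.decode('utf-8')

print(failed_at, len(raw), len(decoded))
```

The byte string is nine bytes long but decodes to seven characters, since the quotation mark takes three bytes in UTF-8.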

IMO, the best practice for dealing with unicode in Python is this:

Decode strings from external sources (like a DB) to unicode as early as possible.
Use them as unicode objects internally.
Encode them back to byte strings only when you need to send them to an external location (a file, a DB, a socket, etc.).
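The three steps above might look roughly like this. It is a minimal sketch, not tied to any particular DB or to gspread: io.BytesIO objects stand in for the external source and destination, and UTF-8 is assumed on both sides:

```python
import io

# Stand-ins for an external source and destination (assumed UTF-8).
source = io.BytesIO(b'haven\xe2\x80\x99t used it\n')
sink = io.BytesIO()

# 1. Decode as soon as the bytes enter the program.
text = source.read().decode('utf-8')

# 2. Work with unicode objects internally.
processed = text.upper()

# 3. Encode only at the point where the data leaves the program.
sink.write(processed.encode('utf-8'))

print(repr(sink.getvalue()))
```

Keeping the decode and encode calls at the edges means the code in the middle never has to care which encoding the outside world uses.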

I'd also recommend checking out this slide deck, which gives a really good overview of how to deal with unicode in Python.
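Applied to the case in the question, one option is to decode the value in your own code before handing it to the wrapper, so that the unicode(val) call on line 426 receives a unicode object and has nothing left to decode. The helper below is hypothetical (to_text is not part of gspread), and the UTF-8 default is an assumption about where your data comes from:

```python
def to_text(val, encoding='utf-8'):
    """Hypothetical helper: return byte strings decoded to unicode.

    Values that are not byte strings (unicode objects, numbers, ...)
    are returned unchanged.
    """
    # On Python 2, `bytes` is an alias for `str`, so this catches the
    # encoded str objects that trigger the UnicodeDecodeError.
    if isinstance(val, bytes):
        return val.decode(encoding)
    return val


value = to_text(b'haven\xe2\x80\x99t used it')
print(repr(value))
```

You would then pass the result of to_text(...) as the cell value instead of the raw str.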

Answer 1, 2014-07-30 18:58:01