Error when using the unicode function in the gspread wrapper. Potentially a bug
When I use the unicode function with the following string, it raises an error:
unicode('All but Buitoni are using Pinterest buffers and Pratt & Lamber haven’t used it for a month so I’ll check on this.')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 68: ordinal not in range(128)
When I check position 68, it appears to be the apostrophe (actually the typographic ’, U+2019):
>>> str='All but Buitoni are using Pinterest buffers and Pratt & Lamber haven’t used it for a month so I’ll check on this.'
>>> str[62:75]
' haven\xe2\x80\x99t us'
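For readers on Python 3, where str is already Unicode text, the same failure can be reproduced by encoding the sentence to UTF-8 bytes (what the Python 2 str above holds) and decoding them with the ASCII codec, which is the codec named in the traceback. This is a sketch; the variable names are illustrative only.

```python
# Python 3 reproduction: the sentence as UTF-8 bytes, like the Python 2 str above.
text = ("All but Buitoni are using Pinterest buffers and Pratt & Lamber "
        "haven\u2019t used it for a month so I\u2019ll check on this.")
data = text.encode("utf-8")

# The right single quotation mark U+2019 is stored as three bytes: e2 80 99.
print(data[62:75])  # b' haven\xe2\x80\x99t us'

bad_pos = None
try:
    data.decode("ascii")  # the 'ascii' codec from the traceback
except UnicodeDecodeError as err:
    # err.start is the index of the first byte the codec could not decode.
    bad_pos = err.start
print(bad_pos, hex(data[bad_pos]))  # 68 0xe2
```

The slice matches the Python 2 output exactly, which confirms that the string is UTF-8 encoded and that only the multi-byte character trips the ASCII codec.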
Is there a way to deal with this issue? I found this bug in the gspread wrapper, in the file models.py on line 426. Here is the line:
425 cell_elem = feed.find(_ns1('cell'))
426 cell_elem.set('inputValue', unicode(val))
427 uri = self._get_link('edit', feed).get('href')
So when I try to update a cell with a value (a string in this case), the gspread wrapper tries to convert it to unicode but cannot, because of the apostrophe. Potentially, it is a bug. How should I deal with this issue?
Thanks for the help.
There's no need to replace the character. Just properly decode the encoded string to unicode:
>>> s = 'All but Buitoni are using Pinterest buffers and Pratt & Lamber haven’t used it for a month so I’ll check on this.'
>>> s.decode('utf-8')
u'All but Buitoni are using Pinterest buffers and Pratt & Lamber haven\u2019t used it for a month so I\u2019ll check on this.' # unicode object
You need to tell Python which encoding your str object uses in order to convert it to unicode, rather than calling unicode(some_str) directly, which falls back to the default 'ascii' codec seen in the traceback. In this case, your string is encoded in UTF-8. This approach scales better than trying to replace characters, because you won't need a special case for every non-ASCII character that exists in the DB.
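A minimal sketch of that idea in Python 3 terms, assuming values may arrive either as UTF-8 bytes or as text: decode once with an explicit codec before handing the value to gspread, instead of special-casing individual characters. The helper name to_text and its UTF-8 default are illustrative assumptions, not part of gspread.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text (hypothetical helper, not a gspread API).

    Bytes are decoded with the given codec; anything else is converted
    with str(), similar to what unicode(val) does for numbers.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


raw = "Pratt & Lamber haven\u2019t used it".encode("utf-8")
print(ascii(to_text(raw)))                          # 'Pratt & Lamber haven\u2019t used it'
print(ascii(to_text("caf\u00e9".encode("utf-8"))))  # 'caf\xe9', no special case needed
print(to_text(42))                                  # 42
```

ascii() is used only so the printed output stays ASCII-safe on any terminal; the returned values are ordinary text. The same single decode call handles ’, é, or any other character the codec can represent.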
IMO, the best practice for dealing with unicode in Python is to use unicode objects internally. I'd also recommend checking out this slide deck, which gives a really good overview of how to deal with unicode in Python.
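One common way to put "unicode objects internally" into practice is to decode bytes at the input boundary, work only with text in between, and encode once on output. The sketch below illustrates that pattern in Python 3 under the assumption that input and output are UTF-8; the function names and the upper-casing step are hypothetical placeholders.

```python
def read_input(raw: bytes, encoding: str = "utf-8") -> str:
    # Boundary in: turn bytes from a file, socket, or DB driver into text.
    return raw.decode(encoding)


def process(text: str) -> str:
    # Placeholder internal logic; it only ever sees str, so len() and
    # slicing count characters rather than bytes.
    return text.upper()


def write_output(text: str, encoding: str = "utf-8") -> bytes:
    # Boundary out: encode exactly once, right before the data leaves.
    return text.encode(encoding)


raw = "I\u2019ll check on this.".encode("utf-8")
text = read_input(raw)
print(len(raw), len(text))  # 21 19: the ’ is three bytes but one character
result = write_output(process(text))
print(result)               # b'I\xe2\x80\x99LL CHECK ON THIS.'
```

Keeping the encode/decode calls at the edges means internal code never has to guess which codec a value uses, which is exactly the guess unicode(val) got wrong.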