Unicode Decode Error in Python
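import codecs
import csv

# db is assumed to be an existing pymongo database handle (its setup is not shown)
TDB = csv.reader(codecs.open('data/TDS.csv', 'rb', encoding='utf-8'), delimiter=',', quotechar='"')
ts = db.testCol
for row in TDB:
    print row[1]
    T = {"t": row[1],
         "s": row[0]}
    post_id = ts.insert(T)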
I'm not sure why I can't get the data into UTF-8. To insert it into the database, it has to be in UTF-8 format, but I get this error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf3 in position 36: invalid continuation byte
Before I added the encoding step, I got this error from pymongo:
bson.errors.InvalidStringData: strings in documents must be valid UTF-8
I guess this is the data it couldn't encode:
'compleja e intelectualmente retadora , el ladrÛn de orquÌdeas es uno de esos filmes que vale la pena ver precisamente por su originalidad . '
Does anyone know what I should do? Thanks.
OK, this might help.
There is a list of encodings here:
http://docs.python.org/2/library/codecs.html#standard-encodings
latin-1 is a common encoding for Western European languages.
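To see why the error message points at byte 0xf3, here is a minimal sketch in Python 3 syntax (the code in the question is Python 2). The sample bytes are an assumption: they spell "ladrón" the way latin-1 would store it, which seems consistent with the garbled "ladrÛn" in the question.

```python
# Assumed sample: "ladrón" stored as latin-1, where ó is the single byte 0xf3.
raw = b"ladr\xf3n"

try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    # In UTF-8, 0xf3 starts a multi-byte sequence, so the plain "n"
    # that follows it is rejected as an invalid continuation byte.
    utf8_error = exc

text = raw.decode("latin-1")       # latin-1 maps every byte to one character
utf8_bytes = text.encode("utf-8")  # valid UTF-8, safe to store

print(utf8_error)
print(text, utf8_bytes)
```

The same bytes that fail as UTF-8 decode cleanly as latin-1, and the resulting text can then be re-encoded as UTF-8.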
The basic flow for dealing with encodings is: read the raw bytes, decode them to unicode using the source encoding, then encode the result as UTF-8.
You can try encodings that seem plausible and see which ones don't raise an error:
enc = "latin-1"
with open("data/TDS.csv", "rb") as f:
    content = f.read()  # raw encoded bytes
u_content = content.decode(enc)  # decode from enc to unicode
utf8_content = u_content.encode("utf8")  # re-encode as UTF-8
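The trial-and-error approach could be wrapped in a small helper. This is only a sketch in Python 3 syntax; the function name `decode_with_first_match` and the candidate list are hypothetical and not part of the original answer. One caveat: latin-1 maps every possible byte to a character, so it never raises an error and should be tried last. A successful decode only shows the bytes are valid in that encoding, not that the guess is right, so it is still worth checking the decoded text by eye.

```python
def decode_with_first_match(raw, candidates=("utf-8", "latin-1")):
    """Return (encoding, text) for the first candidate that decodes raw bytes.

    Hypothetical helper: the candidate order is an assumption. Strict codecs
    go first; latin-1 accepts any byte sequence, so it acts as the fallback.
    """
    for enc in candidates:
        try:
            return enc, raw.decode(enc)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding, try the next
    raise ValueError("none of the candidate encodings could decode the data")


# Assumed sample bytes: a latin-1 encoded fragment like the one in the question.
raw = b"el ladr\xf3n de orqu\xeddeas"
enc, text = decode_with_first_match(raw)
utf8_content = text.encode("utf-8")  # UTF-8 bytes, ready to write or store
print(enc, text)
```

In Python 3 the CSV file could also be opened as text directly, for example `open("data/TDS.csv", encoding=enc, newline="")`, and passed to `csv.reader`, which then yields already-decoded strings.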