
UnicodeEncodeError with csv.writer

Pardon my ugly newb code, I'm learning. I'm pulling movie data from the OMDB API, but when I write it to CSV I get a UnicodeEncodeError for many films, likely because actor names have accents, for instance. I want to 1.) identify which films are problematic, 2.) skip them, and/or 3.) preferably correct the error. What I have currently just abandons the whole write when an error occurs. Looking for a simple fix, since I'm a novice.

import csv
import os
import json
import omdb

movie_list = ['A Good Year', 'A Room with a View', 'Anchorman', 'Amélie', 'Annie Hall', 'Before Sunrise']

data_list = []

textdoc = open('textdoc.txt','w')

for w in movie_list:
    x = omdb.request(t=w, fullplot=True, tomatoes=True, r='json')
    y = x.content
    z = json.loads(y)
    data_list.append([z["Title"], z["Year"], z["Actors"], z["Awards"], z["Director"], z["Genre"], z["Metascore"], z["Plot"], z["Rated"], z["Runtime"], z["Writer"], z["imdbID"], z["imdbRating"], z["imdbVotes"], z["tomatoRating"], z["tomatoReviews"], z["tomatoFresh"], z["tomatoRotten"], z["tomatoConsensus"], z["tomatoUserMeter"], z["tomatoUserRating"], z["tomatoUserReviews"]])

try:
    with open('Films.csv', 'w') as g:
        a = csv.writer(g, delimiter=',')
        a.writerow(["Title", "Year", "Actors", "Awards", "Director", "Genre", "Metascore", "Plot", "Rated", "Runtime", "Writer", "imdbID", "imdbRating", "imdbVotes", "tomatoRating", "tomatoReviews", "tomatoFresh", "tomatoRotten", "tomatoConsensus", "tomatoUserMeter", "tomatoUserRating", "tomatoUserReviews"])
        a.writerows(data_list)
except UnicodeEncodeError:
    print("fail")

Python 2.x: Instead of with open("Films.csv", 'w') as g: you could try using codecs to open the CSV output with UTF-8 encoding:

import codecs
with codecs.open('Films.csv', 'w', encoding='UTF-8') as g:
    # rest of code

Python 3.x: Try opening g with UTF-8 encoding:

with open('Films.csv', 'w', encoding='UTF-8') as g:
    # rest of code
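
For reference, here is a minimal self-contained sketch of the Python 3 approach. The file name films_demo.csv and the sample rows are made up for illustration and do not come from the OMDB response; newline='' is added as well, which the csv documentation recommends for files passed to csv.writer.

```python
import csv
import os
import tempfile

# Hypothetical sample rows containing non-ASCII characters.
rows = [
    ["Amélie", "2001", "Audrey Tautou, Mathieu Kassovitz"],
    ["Volver", "2006", "Penélope Cruz"],
]

# Hypothetical output path, placed in the temp directory for the demo.
path = os.path.join(tempfile.gettempdir(), "films_demo.csv")

# encoding='utf-8' makes the file object encode text as UTF-8 regardless of
# the platform default; newline='' lets the csv module control line endings.
with open(path, "w", encoding="utf-8", newline="") as g:
    writer = csv.writer(g)
    writer.writerow(["Title", "Year", "Actors"])
    writer.writerows(rows)

# Read the file back with the same encoding to confirm the round trip.
with open(path, encoding="utf-8", newline="") as g:
    read_back = list(csv.reader(g))

print(read_back[1][0])
```

Whatever reads the file afterwards needs to open it as UTF-8 too, otherwise the accented characters will appear garbled.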

Try out smart_str from Django:

from django.utils.encoding import smart_str
# 'element1' and 'element2' stand in for the real keys ("Title", "Year", ...)
data_list.append(list(map(smart_str, [z['element1'], z['element2']])))
# csv writer objects provide writerow() and writerows(), not write_row()/write_rows()
a.writerow(list(map(smart_str, ["Title", "Year", "Actors", "Awards", "Director", "Genre", "Metascore", "Plot", "Rated", "Runtime", "Writer", "imdbID", "imdbRating", "imdbVotes", "tomatoRating", "tomatoReviews", "tomatoFresh", "tomatoRotten", "tomatoConsensus", "tomatoUserMeter", "tomatoUserRating", "tomatoUserReviews"])))
a.writerows(data_list)

If using Python 2, csv.writer doesn't really support Unicode, but there is an example in the csv documentation that works around it. An example is in this answer.

If using Python 3, then make the following changes:

y = x.content.decode('utf8')

and

with open('Films.csv', 'w', encoding='utf8',newline='') as g:

With these changes, text is decoded to Unicode for processing within the Python script and encoded back to UTF-8 when written to a file. This is the recommended way to deal with Unicode.
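
The decode-on-input, encode-on-output idea can be sketched in isolation. The byte string below is a stand-in for what an HTTP response body might contain, not actual OMDB output, and the ASCII check is only there to reproduce the kind of error the question describes.

```python
# Stand-in for raw bytes received from a network response (UTF-8 encoded).
raw = "Amélie".encode("utf-8")

# Decode once at the boundary: bytes -> str (Unicode) for use inside the program.
text = raw.decode("utf-8")

# Encoding to a codec that cannot represent 'é' raises UnicodeEncodeError,
# which is what happens when a file is opened with such an encoding.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    bad_char = text[exc.start:exc.end]  # the character that could not be encoded

# Encoding back to UTF-8 at the output boundary always succeeds for this text.
encoded = text.encode("utf-8")

print(ascii_ok, bad_char)
```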

newline='' is the correct way to open a file for csv use. See this answer and the csv docs.
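
A small sketch of why newline='' matters: csv.writer terminates rows with \r\n by default, so the file object should not translate line endings a second time. Without newline='', text mode on Windows turns the \n into \r\n again, which gives \r\r\n and shows up as blank lines between rows. The file name newline_demo.csv is made up for the demo.

```python
import csv
import os
import tempfile

# Hypothetical output path for the demo.
path = os.path.join(tempfile.gettempdir(), "newline_demo.csv")

# newline='' disables newline translation in the file object, so the
# '\r\n' written by csv.writer reaches the file unchanged on every platform.
with open(path, "w", encoding="utf-8", newline="") as g:
    csv.writer(g).writerow(["a", "b"])

# Inspect the raw bytes to see the exact line terminator on disk.
with open(path, "rb") as g:
    raw = g.read()

print(raw)  # b'a,b\r\n'
```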

You can remove the try / except as well. It just suppresses useful tracebacks.
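
If the goal is to find out which films are problematic (points 1 and 2 of the question) rather than to hide the error, a per-row check is one option. This is only a sketch: the helper name split_rows_by_encodability and the sample rows are hypothetical, and "ascii" is used just to trigger the failure. In practice the encoding to test against would be whatever the output file is opened with.

```python
def split_rows_by_encodability(rows, encoding):
    """Return (ok_rows, bad_rows), where bad_rows contain at least one
    field that cannot be encoded with the given encoding."""
    ok_rows, bad_rows = [], []
    for row in rows:
        try:
            for field in row:
                str(field).encode(encoding)  # raises on unencodable characters
        except UnicodeEncodeError:
            bad_rows.append(row)
        else:
            ok_rows.append(row)
    return ok_rows, bad_rows


# Hypothetical sample data; the first column plays the role of the title.
rows = [["Anchorman", "Will Ferrell"], ["Amélie", "Audrey Tautou"]]

ok_rows, bad_rows = split_rows_by_encodability(rows, "ascii")
print([row[0] for row in bad_rows])
```

With a UTF-8 output file every row passes this check, which is why switching the file encoding is usually preferable to skipping rows.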

The solution that works for me is to add the following at the beginning of the export procedure (Python 2 only; reload is not a builtin and sys.setdefaultencoding does not exist in Python 3):

import sys
reload(sys)
sys.setdefaultencoding('utf8')

