Python: getting UnicodeEncodeError when trying to parse a list
I'm trying to pipe a list that I've scraped from http://www.ropeofsilicon.com/roger-eberts-great-movies-list/ through the API at http://www.omdbapi.com/ to grab their IMDb IDs.
I'm logging the movies that I can and can't find, as follows:
```python
import requests

OMDBPath = "http://www.omdbapi.com/"

movieFile = open("movies.txt")
foundLog = open("log_found.txt", 'w')
notFoundLog = open("log_not_found.txt", 'w')

####

for line in movieFile:
    name = line.split('(')[0].decode('utf8')
    print name
    year = False
    if line.find('(') != -1:
        year = line[line.find('(')+1 : line.find(')')].decode('utf8')
        OMDBQuery = {'t': name, 'y': year}
    else:
        OMDBQuery = {'t': name}
    req = requests.get(OMDBPath, params=OMDBQuery)
    if req.json()[u'Response'] == "False":
        if year:
            notFoundLog.write("Couldn't find " + name + " (" + year + ")" + "\n")
        else:
            notFoundLog.write("Couldn't find " + name + "\n")
    # else:
    #     print req.json()
    #     foundLog.write(req.text.decode('utf8').encode('latin1') + ",")

movieFile.close()
foundLog.close()
notFoundLog.close()
```
I've been reading a lot about Unicode encoding and decoding, and it looks like this is happening because I'm not encoding the file the right way? I'm not sure what's wrong here; I get an error when I reach "Caché":
```
Caché
Traceback (most recent call last):
  File "app.py", line 34, in <module>
    notFoundLog.write("Couldn't find " + name + " (" + year + ")" + "\n")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 18: ordinal not in range(128)
```
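To confirm where "position 18" comes from, here is a minimal sketch that encodes the same kind of message with the ASCII codec explicitly. The helper name `first_unencodable` is hypothetical (not a library function), and the sketch runs on Python 3 as well as Python 2.

```python
def first_unencodable(text, encoding='ascii'):
    """Return (index, char) of the first character `encoding` cannot encode, or None."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as err:
        # err.start is the index of the first offending character
        return err.start, text[err.start]
    return None


message = u"Couldn't find " + u"Caché"
print(first_unencodable(message))           # the é sits at index 18
print(first_unencodable(message, 'utf-8'))  # None: UTF-8 can encode this text
```

"Couldn't find " is 14 characters long, so the é in "Caché" lands at index 18, matching the traceback.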
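The error happens because `name` and `year` are unicode objects (you decoded them), so the concatenated message is unicode too. When you pass unicode to `write()` on a plain Python 2 file object, it is encoded with the default ASCII codec, which cannot represent é.

Here is a working solution that relies on the codecs module to provide transparent encoding/decoding to/from UTF-8 for the various files you open: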
```python
import requests
import codecs

OMDBPath = "http://www.omdbapi.com/"

with codecs.open("movies.txt", encoding='utf-8') as movieFile, \
        codecs.open("log_found.txt", 'w', encoding='utf-8') as foundLog, \
        codecs.open("log_not_found.txt", 'w', encoding='utf-8') as notFoundLog:
    for line in movieFile:
        # Lines are already unicode here; strip() drops the space before
        # "(" and the trailing newline so they don't end up in the query.
        name = line.split('(')[0].strip()
        print(name)
        year = False
        if line.find('(') != -1:
            year = line[line.find('(')+1 : line.find(')')]
            OMDBQuery = {'t': name, 'y': year}
        else:
            OMDBQuery = {'t': name}
        req = requests.get(OMDBPath, params=OMDBQuery)
        if req.json()[u'Response'] == "False":
            # codecs writers accept unicode and encode it to UTF-8 on write
            if year:
                notFoundLog.write(u"Couldn't find {} ({})\n".format(name, year))
            else:
                notFoundLog.write(u"Couldn't find {}\n".format(name))
        #else:
        #    print(req.json())
        #    foundLog.write(u"{},".format(req.text))
```
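To see what a codecs writer actually puts on disk, the sketch below writes one log line to a temporary file (a stand-in for log_not_found.txt, assumed only for illustration), then reads it back both as raw bytes and through a codecs reader. It should run on Python 2.7 and Python 3.

```python
import codecs
import os
import shutil
import tempfile

# Temporary stand-in for log_not_found.txt so the sketch leaves nothing behind.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "log_not_found.txt")

# Writer: accepts unicode text and encodes it to UTF-8 on the way to disk.
with codecs.open(path, 'w', encoding='utf-8') as log:
    log.write(u"Couldn't find {}\n".format(u"Caché"))

# Raw bytes: what actually landed in the file.
with open(path, 'rb') as raw:
    data = raw.read()

# Reader: decodes the UTF-8 bytes back into unicode text.
with codecs.open(path, encoding='utf-8') as log:
    text = log.read()

print(repr(data))
shutil.rmtree(tmpdir)
```

The é is stored as the two bytes `\xc3\xa9` and decodes back to a single character; codecs.open opens files in binary mode underneath, so the `\n` is written unchanged on every platform.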
Note that the codecs module is only required in Python 2.x. In Python 3.x, the built-in open function encodes and decodes text itself. By default it uses the locale's preferred encoding, which is not UTF-8 on every platform (many Windows setups, for example), so passing encoding='utf-8' explicitly keeps the behavior consistent.
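On Python 3 the same logging can be done with the built-in open and an explicit encoding. This is a sketch under assumptions: the helpers `parse_movie_line` and `log_not_found` are hypothetical, the sample lines stand in for movies.txt in an assumed "Title (Year)" format, and the OMDb request is left out so the example runs offline.

```python
import os
import tempfile


def parse_movie_line(line):
    """Split a 'Title (Year)' line into (title, year), with year None if absent."""
    title, paren, rest = line.partition('(')
    year = rest.split(')', 1)[0].strip() if paren else None
    return title.strip(), year


def log_not_found(log, title, year):
    # str is already Unicode in Python 3; the file object does the encoding.
    if year:
        log.write("Couldn't find {} ({})\n".format(title, year))
    else:
        log.write("Couldn't find {}\n".format(title))


path = os.path.join(tempfile.mkdtemp(), "log_not_found.txt")
sample_lines = ["Caché (2005)\n", "Some Title\n"]

# An explicit encoding avoids depending on the platform's default.
with open(path, 'w', encoding='utf-8') as log:
    for line in sample_lines:
        log_not_found(log, *parse_movie_line(line))

with open(path, encoding='utf-8') as log:
    lines = log.read().splitlines()
print(lines)
```

str.partition avoids calling find('(') several times, and strip() removes both the space before the parenthesis and the trailing newline.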