Python: getting UnicodeEncodeError when trying to parse a list

Question

I'm trying to pipe a list that I've scraped from http://www.ropeofsilicon.com/roger-eberts-great-movies-list/ through the API at http://www.omdbapi.com/ to grab the movies' IMDb IDs.

I'm logging the movies that I can and can't find as follows:

import requests

OMDBPath = "http://www.omdbapi.com/"

movieFile = open("movies.txt")
foundLog = open("log_found.txt", 'w')
notFoundLog = open("log_not_found.txt", 'w')

####

for line in movieFile:
    name = line.split('(')[0].decode('utf8')
    print name
    year = False
    if line.find('(') != -1:
        year = line[line.find('(')+1 : line.find(')')].decode('utf8')
        OMDBQuery = {'t': name, 'y': year}
    else:
        OMDBQuery = {'t': name}

    req = requests.get(OMDBPath, params=OMDBQuery)
    if req.json()[u'Response'] == "False":
        if year:
            notFoundLog.write("Couldn't find " + name + " (" + year + ")" + "\n")
        else:
            notFoundLog.write("Couldn't find " + name + "\n")
    # else:
    #     print req.json()
    #     foundLog.write(req.text.decode('utf8').encode('latin1') + ",")
movieFile.close()
foundLog.close()
notFoundLog.close()

I've been reading a lot about Unicode encoding and decoding, and it looks like this is happening because I'm not encoding the file in the right manner? I'm not sure what's wrong here. I hit a problem when I get to "Caché":

Caché
Traceback (most recent call last):
  File "app.py", line 34, in <module>
    notFoundLog.write("Couldn't find " + name + " (" + year + ")" + "\n")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 18: ordinal not in range(128)

Answer 1

Here is a working solution that relies on the codecs module to provide transparent encoding and decoding to and from UTF-8 for the files you open:

import requests
import codecs

OMDBPath = "http://www.omdbapi.com/"

with codecs.open("movies.txt", encoding='utf-8') as movieFile, \
     codecs.open("log_found.txt", 'w', encoding='utf-8') as foundLog, \
     codecs.open("log_not_found.txt", 'w', encoding='utf-8') as notFoundLog:
    for line in movieFile:
        name = line.split('(')[0]
        print(name)
        year = False
        if line.find('(') != -1:
            year = line[line.find('(')+1 : line.find(')')]
            OMDBQuery = {'t': name, 'y': year}
        else:
            OMDBQuery = {'t': name}

        req = requests.get(OMDBPath, params=OMDBQuery)
        if req.json()[u'Response'] == "False":
            if year:
                notFoundLog.write(u"Couldn't find {} ({})\n".format(name, year))
            else:
                notFoundLog.write(u"Couldn't find {}\n".format(name))
        #else:
            #print(req.json())
            #foundLog.write(u"{},".format(req.text))
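
As for why the original script failed: in Python 2, a file object from the plain built-in open expects byte strings, so writing a unicode string makes Python encode it implicitly with the ASCII codec, which cannot represent "é". The snippet below is only a small sketch that reproduces the same failure with an explicit encode call. The sample message is made up to match the traceback, and the snippet is written so that it also runs on Python 3.

```python
# -*- coding: utf-8 -*-
# Illustrative message modelled on the line from the traceback.
message = u"Couldn't find Caché (2005)\n"

error = None
try:
    # This is roughly what Python 2 does behind the scenes when a
    # unicode string is written to a file opened with plain open().
    message.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc
    print(exc)

# Encoding explicitly to UTF-8 works, because UTF-8 can represent U+00E9.
encoded = message.encode("utf-8")
print(repr(encoded))
```

The offending character sits at index 18 of the message, which matches the position reported in the traceback.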

Note that the codecs module is only needed in Python 2.x. In Python 3.x, the built-in open function returns text-mode files that encode and decode for you. Its default encoding depends on the platform locale, though, so passing encoding='utf-8' explicitly is the safer choice.
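
For illustration, here is a minimal sketch of writing and reading a log with an explicit encoding and without codecs. It uses io.open, which accepts an encoding argument on Python 2.7 and is the same function as the built-in open on Python 3. The temporary directory and the sample message are assumptions made for this example, not part of the original script.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical log location in a temporary directory, for illustration only.
log_path = os.path.join(tempfile.mkdtemp(), "log_not_found.txt")

# Text written to this file is encoded to UTF-8 on the way out.
with io.open(log_path, "w", encoding="utf-8") as not_found_log:
    not_found_log.write(u"Couldn't find Caché (2005)\n")

# Reading with the same encoding gives back the original text.
with io.open(log_path, encoding="utf-8") as not_found_log:
    contents = not_found_log.read()

print(contents)
```

On Python 3 the two io.open calls can be replaced by plain open with the same arguments.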

Answer 1
Accepted, 2014-07-08 04:08:27
