UnicodeEncodeError in a Python CSV manipulation script

Question

I have a script that was working earlier but now stops with a UnicodeEncodeError.

I am using Python 3.4.3.

The full error message is the following:

Traceback (most recent call last):
  File "R:/A/APIDevelopment/ScivalPubsExternal/Combine/ScivalPubsExt.py", line 58, in <module>
    outputFD.writerow(row)
  File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\x8a' in position 413: character maps to <undefined>

How can I fix this error?

The Python script is the following:

import pdb
import csv,sys,os
import glob
import os
import codecs

os.chdir('R:/A/APIDevelopment/ScivalPubsExternal/Combine')
joinedFileOut='ScivalUpdate'
csvSourceDir="R:/A/APIDevelopment/ScivalPubsExternal/Combine/AustralianUniversities"

# create dictionary from Codes file (Institution names and codes)
codes = csv.reader(open('Codes.csv'))
#rows of the file are stored as lists/arrays
InstitutionCodesDict = {}
InstitutionYearsDict = {}
for row in codes:
   #keys: instnames, #values: instcodes
    InstitutionCodesDict[row[0]] = row[1]
    #define year dictionary with empty values field
    InstitutionYearsDict[row[0]] = []

# create a file descriptor for the output file; 'wt' means write in text mode ('rt' and 'r' likewise both mean read in text mode)
with open(joinedFileOut,'wt') as csvWriteFD:
#write the file (it is still empty here)
   outputFD=csv.writer(csvWriteFD,delimiter=',')
#with closes the file at the end, if exception occurs then before that


   # open each scival file, create file descriptor (encoding needed) and then read it and print the name of the file
   if not glob.glob(csvSourceDir+"/*.csv"):
      print("CSV source files not found")
      sys.exit()

   for scivalFile in glob.glob(csvSourceDir+"/*.csv"):
       #with open(scivalFile,"rt", encoding="utf8") as csvInFD:
       with open(scivalFile,"rt", encoding="ISO-8859-1") as csvInFD:
          fileFD = csv.reader(csvInFD)
          print(scivalFile)

          #create condition for loop
          printon=False

          #reads all rows in file and creates lists/arrays of each row
          for row in fileFD:
              if len(row)>1:
                 #the next printon part is skipped when looping through the rows above the data because it is not set to true
                 if printon:
                    #inserts instcode and inst sequentially to each row where there is data and after the header row
                    row.insert(0, InstitutionCode)
                    row.insert(0, Institution)
                    if row[10].strip() == "-":
                       row[10] = " "
                    else:
                       p = row[10].zfill(8)
                       q = p[0:4] + '-' + p[4:]
                       row[10] = q
                    #writes output file
                    outputFD.writerow(row)
                 else:
                    if "Publications at" in row[1]:
                       #get institution name from cell B1
                       Institution=row[1].replace('Publications at the ', "").replace('Publications at ',"")
                       print(Institution)
                       #lookup institution code from dictionary
                       InstitutionCode=InstitutionCodesDict[Institution]
                     #printon is set to True once the header row is reached
                    if "Title" in row[0]: printon=True
                    if "Publication years" in row[0]:
                       #get the year to print it later to see which years were pulled
                       year=row[1]
                       #add year to institution in dictionary
                       if not year in InstitutionYearsDict[Institution]:
                          InstitutionYearsDict[Institution].append(year)


# Write a report showing the institution name followed by the years for
# which we have that institution's data.
with open("Instyears.txt","w") as instReportFD:
   for inst in (InstitutionYearsDict):
      instReportFD.write(inst)
      for yr in InstitutionYearsDict[inst]:
         instReportFD.write(" "+yr)
      instReportFD.write("\n")

Answer 1

The error is caused by an attempt to write a string containing a U+008A character using your system's default cp1252 encoding, which has no byte for that character. It is trivial to fix: just declare a latin1 (also known as ISO-8859-1) encoding for your output file. Latin-1 maps every code point from U+0000 to U+00FF to the byte with the same value, so it simply writes back the original byte without conversion:

with open(joinedFileOut,'wt', encoding='latin1') as csvWriteFD:
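A quick way to check this explanation is to try both encodings on the offending character directly. The helper `encodes` below is a hypothetical name used only for illustration; the behaviour it shows comes from the standard cp1252 and latin1 codecs:

```python
def encodes(text, encoding):
    """Return True if `text` can be encoded with `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

# U+008A has no byte in cp1252 (cp1252 uses byte 0x8A for 'Š', U+0160),
# which is the same failure as in the traceback.
print(encodes("\x8a", "cp1252"))  # False
print(encodes("\x8a", "latin1"))  # True

# Latin-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF, so any text
# that was decoded as ISO-8859-1 encodes back to exactly the original bytes.
all_bytes = bytes(range(256))
print(all_bytes.decode("latin1").encode("latin1") == all_bytes)  # True
```

This round-trip property is why the latin1 output "works": it reproduces the input bytes, whatever encoding they were really in.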

But this will only hide the real problem: where does this 0x8A character come from? My advice is to intercept the exception and dump the line where it occurs:

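try:
    outputFD.writerow(row)
except UnicodeEncodeError:
    # print the file being processed, its current line number and the row;
    # ascii() escapes non-ASCII characters so the print itself cannot fail
    print(scivalFile, fileFD.line_num, ascii(row))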
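Outside the script, the same idea can be sketched as a small function that writes rows through `csv.writer` into an in-memory cp1252 stream and records which rows fail. The function name `write_rows_reporting_failures` and the sample rows are hypothetical; the `object`, `start` and `end` attributes are standard on `UnicodeEncodeError` and locate the character that could not be encoded:

```python
import csv
import io


def write_rows_reporting_failures(rows, encoding="cp1252"):
    """Write rows with csv.writer; return (row_number, bad_char, row) for each failure.

    Hypothetical helper: the output goes to an in-memory byte buffer, so only
    the encoding behaviour is exercised.
    """
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding, newline="")
    writer = csv.writer(stream)
    failures = []
    for number, row in enumerate(rows, start=1):
        try:
            writer.writerow(row)
        except UnicodeEncodeError as exc:
            # exc.object is the CSV line being encoded; start/end bound the bad character
            bad_char = exc.object[exc.start:exc.end]
            failures.append((number, bad_char, row))
            print("row", number, "cannot be encoded:", ascii(row))
    return failures


failures = write_rows_reporting_failures([["ok", "1"], ["bad", "\x8a"], ["\u0160", "2"]])
```

Only the second row fails: 'Š' (U+0160) in the third row is fine, because cp1252 does have a byte for it, while U+008A does not.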

The 0x8A character most likely comes from an input file that is not actually ISO-8859-1 encoded; UTF-8 is the more probable encoding.
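Both plausible misreadings can be reproduced in an interpreter. This sketch uses 'Ê' and 'Š' purely as illustrative characters: a UTF-8 file decoded as ISO-8859-1 and a cp1252 (Windows-1252) file decoded as ISO-8859-1 both turn a 0x8A byte into U+008A:

```python
# Case 1: the file is really UTF-8. 'Ê' (U+00CA) is stored as b'\xc3\x8a',
# and decoding those bytes as ISO-8859-1 yields 'Ã' plus the control character U+008A.
utf8_bytes = "\u00ca".encode("utf-8")
print(utf8_bytes)                                # b'\xc3\x8a'
print(ascii(utf8_bytes.decode("iso-8859-1")))    # '\xc3\x8a'

# Case 2: the file is really cp1252. 'Š' (U+0160) is stored as the single byte 0x8A,
# which ISO-8859-1 decodes to U+008A instead of 'Š'.
cp1252_bytes = "\u0160".encode("cp1252")
print(cp1252_bytes)                              # b'\x8a'
print(ascii(cp1252_bytes.decode("iso-8859-1")))  # '\x8a'
print(cp1252_bytes.decode("cp1252") == "\u0160") # True
```

In either case, decoding with the file's real encoding removes the U+008A character before it ever reaches the writer.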

Answer 2

Make sure to use the correct encoding for your source and destination files. You open files in three places:

codes = csv.reader(open('Codes.csv'))
  :    :    :
with open(joinedFileOut,'wt') as csvWriteFD:
    outputFD=csv.writer(csvWriteFD,delimiter=',')
  :    :    :
with open(scivalFile,"rt", encoding="ISO-8859-1") as csvInFD:
    fileFD = csv.reader(csvInFD)

This should look something like:

# Use the correct encoding.  If you made this file on
# Windows it is likely Windows-1252 (also known as cp1252).
# The csv module documentation also recommends newline='' for CSV files.
with open('Codes.csv', encoding='cp1252', newline='') as f:
    codes = csv.reader(f)
    # Iterate over `codes` inside this block; the file is closed when it ends.
  :    :    :
# The output encoding can be anything you want.  UTF-8
# supports all Unicode characters.  Windows apps tend to like
# the files to start with a UTF-8 BOM if the file is UTF-8,
# so 'utf-8-sig' is an option.  newline='' prevents blank rows on Windows.
with open(joinedFileOut, 'w', encoding='utf-8-sig', newline='') as csvWriteFD:
    outputFD = csv.writer(csvWriteFD)
  :    :    :
# This file is probably the cause of your problem and is not ISO-8859-1.
# Maybe UTF-8 instead? 'utf-8-sig' will safely handle and remove a UTF-8 BOM
# if present.
with open(scivalFile, 'r', encoding='utf-8-sig', newline='') as csvInFD:
    fileFD = csv.reader(csvInFD)
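Before switching the reader to 'utf-8-sig', a quick check can show which source files really are valid UTF-8. In this sketch `find_non_utf8_files` is a hypothetical helper: it reads each file as raw bytes and reports the offset of the first byte that is not valid UTF-8. A UTF-8 BOM is itself valid UTF-8, so it is not flagged:

```python
def find_non_utf8_files(paths):
    """Return (path, byte_offset) for each file that is not valid UTF-8 (hypothetical helper)."""
    bad = []
    for path in paths:
        with open(path, "rb") as f:
            data = f.read()
        try:
            data.decode("utf-8")
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first byte that could not be decoded
            bad.append((path, exc.start))
    return bad

# Usage, assuming csvSourceDir from the question's script and `import glob`:
# for path, offset in find_non_utf8_files(glob.glob(csvSourceDir + "/*.csv")):
#     print(path, "is not UTF-8; first bad byte at offset", offset)
```

Files this reports would likely need a different encoding (cp1252 is the usual suspect for files produced on Windows) rather than 'utf-8-sig'.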

2 answers

Answer 1: score 0, posted 2016-08-18 09:09:20

Answer 2: score 0, accepted, posted 2016-08-20 18:15:24