
Python ASCII codec can't encode character error during write to CSV

I'm not entirely sure what I need to do about this error. I assumed that it had to do with needing to add .encode('utf-8'). But I'm not entirely sure if that's what I need to do, nor where I should apply this.

The error is:

line 40, in <module>
writer.writerows(list_of_rows)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 17: ordinal not in range(128)

This is the base of my python script.

import csv
import requests
from BeautifulSoup import BeautifulSoup

url = 'https://dummysite'

response = requests.get(url)

html = response.content

soup = BeautifulSoup(html)

table = soup.find('table', {'class': 'table'})

list_of_rows = []
for row in table.findAll('tr')[1:]:
    list_of_cells = []
    for cell in row.findAll('td'):
        text = cell.text.replace('[','').replace(']','')
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)

outfile = open("./test.csv", "wb")
writer = csv.writer(outfile)
writer.writerow(["Name", "Location"])
writer.writerows(list_of_rows)

The Python 2.x csv library is broken when it comes to Unicode. You have three options, in order of complexity:

  1. (Edit: see below.) Use the fixed library https://github.com/jdunck/python-unicodecsv ( pip install unicodecsv ). Use it as a drop-in replacement. Example:

     with open("myfile.csv", 'rb') as my_file:
         r = unicodecsv.DictReader(my_file, encoding='utf-8')

  2. Read the CSV manual regarding Unicode: https://docs.python.org/2/library/csv.html (see the examples at the bottom).

  3. Manually encode each item as UTF-8:

     for cell in row.findAll('td'):
         text = cell.text.replace('[','').replace(']','')
         list_of_cells.append(text.encode("utf-8"))
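
As a rough sketch of the manual-encoding option, the same idea can be applied to the finished list of rows just before it is written. The helper name encode_rows and the sample row are made up for illustration; the sketch assumes every cell is a unicode string, as BeautifulSoup returns.

```python
def encode_rows(rows, encoding="utf-8"):
    """Return a copy of rows with every text cell encoded to bytes."""
    return [[cell.encode(encoding) for cell in row] for row in rows]


# Hypothetical sample row containing U+2013 (en dash),
# the character named in the traceback.
list_of_rows = [[u"Smith \u2013 Jones", u"Paris"]]
encoded_rows = encode_rows(list_of_rows)

# Under Python 2, encoded_rows can then be passed to writer.writerows()
# in place of list_of_rows.
```

Encoding in one place, right before the write, keeps the scraping loop unchanged and leaves the data as unicode for any other processing.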

Edit: I found that python-unicodecsv is also broken when reading UTF-16. It complains about any 0x00 bytes.
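
To see where those 0x00 bytes come from, here is a small illustration (the sample text is made up): in UTF-16, every ASCII character is stored with a zero byte next to it, whereas UTF-8 stores it as a single non-zero byte.

```python
text = u"a,b"

utf16_bytes = text.encode("utf-16-le")  # two bytes per ASCII character
utf8_bytes = text.encode("utf-8")       # one byte per ASCII character

# Check each encoding for NUL (0x00) bytes.
utf16_has_nul = b"\x00" in utf16_bytes
utf8_has_nul = b"\x00" in utf8_bytes
```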

Instead, use https://github.com/ryanhiebert/backports.csv , which more closely resembles the Python 3 implementation and uses the io module.

Install:

pip install backports.csv

Usage:

from backports import csv
import io

with io.open(filename, encoding='utf-8') as f:
    r = csv.reader(f)

The issue lies with the csv library in Python 2. From the unicodecsv project page:

Python 2's csv module doesn't easily deal with unicode strings, leading to the dreaded “'ascii' codec can't encode characters in position …” exception.

If you can, just install unicodecsv

pip install unicodecsv

import unicodecsv

writer = unicodecsv.writer(csvfile)
writer.writerow(row)

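In addition to Alastair's excellent suggestions, I found that the simplest option is to use Python 3 instead of Python 2. All my script needed was to change wb in the open statement to a plain w, in line with Python 3 syntax.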
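
A minimal sketch of what that could look like under Python 3. The function name write_rows, the sample row and the temporary file path are made up for illustration. The encoding="utf-8" and newline="" arguments are not part of the change described above; they are added here on the assumption that the output should be UTF-8 regardless of the platform default, and because the Python 3 csv documentation recommends newline="" for files passed to csv.writer.

```python
import csv
import os
import tempfile


def write_rows(path, header, rows):
    """Write a header line and rows of text to a UTF-8 CSV file (Python 3)."""
    # Text mode ("w", not "wb"); newline="" lets the csv module
    # control the line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as outfile:
        writer = csv.writer(outfile)
        writer.writerow(header)
        writer.writerows(rows)


# Hypothetical sample row containing U+2013 (en dash),
# the character named in the traceback.
list_of_rows = [["Smith \u2013 Jones", "Paris"]]
path = os.path.join(tempfile.mkdtemp(), "test.csv")
write_rows(path, ["Name", "Location"], list_of_rows)

# Read the file back to confirm the en dash survived the round trip.
with open(path, encoding="utf-8", newline="") as infile:
    read_back = list(csv.reader(infile))
```

In Python 3 the csv module works on text rather than bytes, so no per-cell .encode() call is needed; the encoding is handled once, by the file object.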

