
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2730' in position 1: ordinal not in range(128)

Any idea how to fix this?

import csv
import re
import time
import urllib2
from urlparse import urljoin
from bs4 import BeautifulSoup

BASE_URL = 'http://omaha.craigslist.org/sys/'
URL = 'http://omaha.craigslist.org/sya/'
FILENAME = '/Users/mona/python/craigstvs.txt'

opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
soup = BeautifulSoup(opener.open(URL))

with open(FILENAME, 'a') as f:
    writer = csv.writer(f, delimiter=';')
    for link in soup.find_all('a', class_=re.compile("hdrlnk")):
        timeset = time.strftime("%m-%d %H:%M")

        item_url = urljoin(BASE_URL, link['href'])
        item_soup = BeautifulSoup(opener.open(item_url))

        # do smth with the item_soup? or why did you need to follow this link?

        writer.writerow([timeset, link.text, item_url])

In my experience, the csv module in Python 2 does not fully support Unicode, but you may find it useful to open the file this way:

import codecs
...
codecs.open('file.csv', 'r', 'UTF-8')

Alternatively, you may want to write the rows yourself instead of using the csv module.
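
Writing the rows without the csv module could look roughly like the sketch below. It is not from the original answers: `append_row`, the temporary path, and the sample values are hypothetical. It does no quoting, so a field that contains the delimiter would corrupt the row. `io.open` is used because it accepts an `encoding` argument on both Python 2 and Python 3.

```python
import io
import os
import tempfile


def append_row(path, fields, delimiter=u';'):
    """Append one delimiter-separated line of unicode fields to a UTF-8 file.

    Hypothetical helper for illustration; it does not quote or escape fields.
    """
    line = delimiter.join(fields) + u'\n'
    # io.open encodes the unicode text as UTF-8 when writing.
    with io.open(path, 'a', encoding='utf-8') as f:
        f.write(line)


# Demo with made-up values standing in for timeset, link.text and item_url.
demo_path = os.path.join(tempfile.mkdtemp(), 'craigstvs.txt')
append_row(demo_path, [u'01-01 12:00', u'\u2730 sample listing', u'http://example.com/item'])

# Read the file back to check that the non-ASCII character survived.
with io.open(demo_path, encoding='utf-8') as f:
    written = f.read()
```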

You just need to encode the text:

link.text.encode("utf-8")
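
To see why encoding helps, the sketch below reproduces the error with the character from the traceback and then encodes it explicitly. The sample title is made up for illustration; in the script it would be `link.text`.

```python
# Hypothetical title containing the character from the traceback (U+2730).
title = u'\u2730 sample listing'

# Encoding to ASCII fails, which is what happens implicitly when Python 2's
# csv writer receives a unicode object.
try:
    title.encode('ascii')
except UnicodeEncodeError as exc:
    error_message = str(exc)

# Encoding to UTF-8 succeeds and yields bytes that can be written to the file.
encoded = title.encode('utf-8')
```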

You can also use requests instead of urllib2:

import csv
import re
import time
from urlparse import urljoin

import requests
from bs4 import BeautifulSoup

BASE_URL = 'http://omaha.craigslist.org/sys/'
URL = 'http://omaha.craigslist.org/sya/'
FILENAME = 'craigstvs.txt'

soup = BeautifulSoup(requests.get(URL).content)

with open(FILENAME, 'a') as f:
    writer = csv.writer(f, delimiter=';')
    for link in soup.find_all('a', class_=re.compile("hdrlnk")):
        timeset = time.strftime("%m-%d %H:%M")
        item_url = urljoin(BASE_URL, link['href'])
        item_soup = BeautifulSoup(requests.get(item_url).content)
        # do something with item_soup here; otherwise there is no need to follow this link
        # encode the unicode title to UTF-8 bytes before handing it to the csv writer
        writer.writerow([timeset, link.text.encode("utf-8"), item_url])
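
The code above targets Python 2. On Python 3 the csv module works with text, so the manual `encode` call is not needed if the file is opened with an explicit encoding. A minimal sketch, assuming Python 3 and using a temporary path and made-up row values rather than the real scraped data:

```python
import csv
import os
import tempfile

# Hypothetical output path; in the script this would be FILENAME.
path = os.path.join(tempfile.mkdtemp(), 'craigstvs.txt')

# newline='' lets the csv module control line endings itself.
with open(path, 'a', encoding='utf-8', newline='') as f:
    writer = csv.writer(f, delimiter=';')
    # Made-up values standing in for timeset, link.text and item_url.
    writer.writerow(['01-01 12:00', '\u2730 sample listing', 'http://example.com/item'])

# Read the raw bytes back to confirm the row was stored as UTF-8.
with open(path, 'rb') as f:
    raw = f.read()
```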

