
Error writing csv with python, probably unicode/encode/decode issue

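I have been looking for answers elsewhere, but either I did not understand the explanations or the solutions did not work for my case.

So here is the situation:
1. The output characters are Chinese
2. The reading part works fine, only the writing fails
3. I am using Python 2.7.13

Please help!

By the way, I am pretty new to Python, so if you see anything that could be improved with better practice, please point it out! I would really appreciate it!

Thanks!

Here is the code: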

# -*- coding: utf-8 -*-
import csv
import urllib2
from bs4 import BeautifulSoup
import socket
import httplib
# import sys  <= this did not work
# reload(sys)
# sys.setdefaultencoding('utf-8')

with open('/users/Rachael/Desktop/BDnodes.csv', 'r') as readcsv, \
        open("/users/Rachael/Desktop/CheckTitle.csv", 'wb') as writecsv:
    writer = csv.writer(writecsv)
    for row in readcsv.readlines():
        opener = urllib2.build_opener()
        opener.addheaders = [('User-Agent',
                          'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1941.0 Safari/537.36')]
        urllib2.install_opener(opener)
        openpage = urllib2.urlopen(row).read()
        soup = BeautifulSoup(openpage, "lxml")
        # print "page results:"
        for child in soup.findAll("h3", {"class": "t"}):
            try:
                geturls = child.a.get('href')
                # print urllib2.urlopen(geturls).geturl()
                url_result = urllib2.urlopen(geturls).geturl()
                # print url_result
                try:
                    openitem = urllib2.urlopen(url_result).read()
                    gettitle = BeautifulSoup(openitem, 'lxml')
                    url_title = gettitle.title.text
                except urllib2.HTTPError:
                    url_title = 'passed http error'
                    pass
                except urllib2.URLError:
                    url_title = 'passed url error'
                    pass
                except socket.timeout:
                    url_title = 'passed timeout'
                    pass
                except httplib.BadStatusLine:
                    url_title = 'passed badstatus'
                    pass
                except:
                    url_title = 'unknown'
                    pass
            except urllib2.HTTPError as e:
                pass
            except urllib2.URLError:
                pass
            except socket.timeout:
                pass
            except httplib.BadStatusLine:
                pass
            writer.writerow([url_result, url_title])
            # writer.writerow([url_result, url_title.encode('utf-8')]) did not work either, even tried with 'utf-16'
writecsv.close()

The error is:

C:\Python27\python.exe C:/Users/Rachael/PycharmProjects/untitled1/OpenNGet.py
Traceback (most recent call last):
  File "C:/Users/Rachael/PycharmProjects/untitled1/OpenNGet.py", line 55, in <module>
    writer.writerow([url_result, url_title])
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

Process finished with exit code 1
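
For context, this message is what Python 2 produces when a unicode value is converted to a byte string implicitly, using the default 'ascii' codec. The following minimal sketch triggers the same exception explicitly; the sample title is a made-up value, not data from the question, and the helper name `ascii_encode_error` is hypothetical.

```python
# -*- coding: utf-8 -*-


def ascii_encode_error(text):
    """Return the UnicodeEncodeError raised by encoding text as ASCII, or None."""
    try:
        text.encode('ascii')
    except UnicodeEncodeError as exc:
        return exc
    return None


# Hypothetical page title: three Chinese characters followed by ASCII text.
sample_title = u'中文网 homepage'
error = ascii_encode_error(sample_title)

# The exception records which codec failed and which slice of the text it rejected.
print(error.encoding, error.start, error.end)
```

A title made only of ASCII characters returns `None` here, which is why the script can appear to work until the first Chinese title comes back.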

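You can pass an encoding parameter when opening the file. In Python 2.7 the built-in open does not accept one, but codecs.open does:

import codecs
with codecs.open("/users/Rachael/Desktop/CheckTitle.csv", 'wb', encoding='utf-8') as writecsv:
    writer = csv.writer(writecsv)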
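
To illustrate what an encoding-aware file object does, here is a small sketch that writes text and then inspects the raw bytes on disk. It uses `io.open`, which accepts an `encoding` argument on both Python 2.7 and Python 3; the temporary path, the sample text, and the helper names are assumptions made for the example. One caveat: the Python 2 documentation for the csv module states that it does not support Unicode input, so opening the file this way may not be enough on its own when the rows go through `csv.writer`.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def write_text_utf8(path, text):
    """Write text to path, letting the file object encode it as UTF-8."""
    with io.open(path, 'w', encoding='utf-8') as handle:
        handle.write(text)


def read_bytes(path):
    """Return the raw bytes stored at path."""
    with open(path, 'rb') as handle:
        return handle.read()


# Hypothetical location and content, used only for this demonstration.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
write_text_utf8(demo_path, u'中文')
raw = read_bytes(demo_path)
```

Each of the two characters is stored as three bytes, so `raw` holds the UTF-8 encoding of the text rather than the text itself.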

It may be that your original solution was right, but that the problem is in the 'result' variable and not in the title?

Try something like:

writer.writerow([url_result.encode('utf-8'), url_title.encode('utf-8')])
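
Extending that idea, every text cell can be encoded in one place before the row reaches the writer. The sketch below is one possible way to do that, not a definitive fix. The helper names (`encode_row`, `open_csv_for_writing`, `write_rows`), the temporary output path, and the sample row are all hypothetical. It encodes only unicode cells, because calling `.encode('utf-8')` on a Python 2 byte string that already contains non-ASCII bytes would first trigger an implicit ASCII decode. On Python 3 the cells are left alone and the file object does the encoding.

```python
# -*- coding: utf-8 -*-
import csv
import io
import os
import sys
import tempfile

PY2 = sys.version_info[0] == 2
TEXT_TYPE = type(u'')  # unicode on Python 2, str on Python 3


def encode_row(row, encoding='utf-8'):
    """Prepare one row for csv.writer.

    Python 2: encode text cells to bytes and leave other cells alone.
    Python 3: return the cells unchanged, since the file object encodes.
    """
    if not PY2:
        return list(row)
    return [cell.encode(encoding) if isinstance(cell, TEXT_TYPE) else cell
            for cell in row]


def open_csv_for_writing(path, encoding='utf-8'):
    """Open path the way the running Python version's csv module expects."""
    if PY2:
        return open(path, 'wb')
    return io.open(path, 'w', encoding=encoding, newline='')


def write_rows(path, rows, encoding='utf-8'):
    """Write all rows to path as UTF-8 encoded CSV."""
    with open_csv_for_writing(path, encoding) as handle:
        writer = csv.writer(handle)
        for row in rows:
            writer.writerow(encode_row(row, encoding))


# Hypothetical sample data standing in for [url_result, url_title].
out_path = os.path.join(tempfile.mkdtemp(), 'CheckTitle.csv')
write_rows(out_path, [[u'http://example.com/', u'中文标题']])
with io.open(out_path, 'r', encoding='utf-8', newline='') as handle:
    written = handle.read()
```

In the question's loop this would correspond to replacing the final `writer.writerow(...)` call with `writer.writerow(encode_row([url_result, url_title]))`.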

