
How to convert cp1252 to UTF-8 when exporting a CSV file using Python

I get a Unicode error when I try to export a CSV file (web scraping; I'm using BeautifulSoup and have imported both csv and BeautifulSoup). The code is used on Mac and Linux, which support UTF-8 well, but I'm using Windows. The error is:

> UnicodeEncodeError Traceback (most recent call last) in ()
>      71         'ranking_title': ranking_title,
> ---> 72         'ranking_category': ranking_category})
>      73
>
> ~\Anaconda3\lib\csv.py in writerow(self, rowdict)
>     154     def writerow(self, rowdict):
> --> 155         return self.writer.writerow(self._dict_to_list(rowdict))
>     156
>
> ~\Anaconda3\lib\encodings\cp1252.py in encode(self, input, final)
>      18     def encode(self, input, final=False):
> ---> 19         return codecs.charmap_encode(input,self.errors,encoding_table)[0]
>      20
>
> UnicodeEncodeError: 'charmap' codec can't encode characters in position 299-309: character maps to <undefined>

The original code that works on Mac is:

import urllib.request

def get_page(url):
    # Fetch the page and decode the response bytes as UTF-8
    request = urllib.request.Request(url)
    response = urllib.request.urlopen(request)
    mainpage = response.read().decode('utf8')
    return mainpage

I tried decoding as cp1252 and encoding as UTF-8 at the beginning of the worksheet:

def get_page(url):
    request = urllib.request.Request(url)
    response = urllib.request.urlopen(request)
    mainpage = response.read().decode('cp1252').encode('utf8')
    return mainpage

But it doesn't work. Please help.

The UnicodeEncodeError you are facing occurs when you write the data to the CSV output file. As the error message tells us, Python is using a "charmap" codec that doesn't support the characters contained in your data. This usually happens when you open a file on a Windows machine without specifying the encoding parameter.
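The failure can be reproduced without any file at all. The following is a minimal sketch; the Thai sample string is an assumption chosen only because it lies outside cp1252, and it is not taken from the original data.

```python
# Hypothetical sample text: Thai characters have no mapping in cp1252.
text = "ร้านอาหาร"

error = None
try:
    # This is roughly what happens inside the file object when it was
    # opened with cp1252 as its encoding.
    text.encode("cp1252")
except UnicodeEncodeError as exc:
    error = exc
    print(exc)

# A codec that covers all of Unicode encodes the same text without trouble.
encoded = text.encode("utf-8")
```

Decoding or re-encoding the downloaded page does not change this, because the error is raised by the codec attached to the output file.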

In the attached code document (comment link), snippet no. 10, we can see that this is the case. You wrote:

with open('wongnai.csv', 'w', newline='') as record:
    fieldnames = ...

In this case, Python uses a platform-dependent default encoding, which is usually some 8-bit encoding on Windows machines. Specify a codec that supports all of Unicode, and writing the file should succeed:

with open('wongnai.csv', 'w', newline='', encoding='utf16') as record:
    fieldnames = ...
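As a fuller sketch of the same idea, the example below first prints the encoding Python typically falls back to when none is given, then writes and re-reads a small CSV with an explicit encoding. The two field names come from the traceback; the row values, the temporary directory, and the file name wongnai_example.csv are assumptions made for illustration.

```python
import csv
import locale
import os
import tempfile

# The encoding open() typically uses when no encoding argument is passed.
# On many Windows setups this reports cp1252.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)

# Hypothetical data: one row containing characters outside cp1252.
fieldnames = ["ranking_title", "ranking_category"]
rows = [{"ranking_title": "ร้านอาหาร", "ranking_category": "café"}]

path = os.path.join(tempfile.mkdtemp(), "wongnai_example.csv")

# Passing encoding explicitly makes the result independent of the platform.
with open(path, "w", newline="", encoding="utf-8") as record:
    writer = csv.DictWriter(record, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

# Read the file back with the same encoding to confirm the round trip.
with open(path, newline="", encoding="utf-8") as record:
    rows_read = [dict(row) for row in csv.DictReader(record)]
```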

You can also use "utf8" or "utf32" instead of "utf16", of course. UTF-8 is very popular for saving files in Unix environments and on the Internet, but if you are planning to open the CSV file with Excel later on, you might have some trouble getting the application to display the data properly. A more Windows-proof (but technically non-standard) solution is to use "utf-8-sig", which adds a semi-magic character, the byte order mark (BOM), to the beginning of the file to help Windows programs understand that it's UTF-8.
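The marker that "utf-8-sig" writes is the UTF-8 byte order mark, which can be checked by reading the raw bytes of the file. This is a small sketch; the file name, the temporary directory, and the sample row are assumptions for illustration.

```python
import codecs
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "wongnai_bom_example.csv")

# Writing with utf-8-sig puts the BOM in front of the first row.
with open(path, "w", newline="", encoding="utf-8-sig") as record:
    writer = csv.writer(record)
    writer.writerow(["ranking_title"])
    writer.writerow(["ร้านอาหาร"])  # hypothetical sample value

# Inspect the first three raw bytes of the file.
with open(path, "rb") as raw:
    first_bytes = raw.read(3)

starts_with_bom = first_bytes == codecs.BOM_UTF8

# Reading with utf-8-sig strips the BOM again, so the header comes back clean.
with open(path, newline="", encoding="utf-8-sig") as record:
    header = next(csv.reader(record))
```

Reading such a file with plain "utf8" instead would leave the BOM attached to the first field name, so it is worth using the same codec on both sides.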

