Decoding from CSV - French and Spanish special characters

Question

I'm encoding my CSV table from a scraping process like this:

with open("Raw_table.csv", 'w',encoding="utf-8") as outfile:
   csv_writer = csv.writer(outfile, delimiter=';', quotechar='|', quoting=csv.QUOTE_MINIMAL,)

Usually, when I want to use these files, I read them with a CSV parser like this:

def parse_csv(content, delimiter = ';'):  
  csv_data = []
  for line in content.split('\n'):
    csv_data.append( [x.strip() for x in line.split( delimiter )] ) # strips spaces also
  return csv_data


list_raw=parse_csv(open('Raw_RC.csv','r',encoding="utf-8").read())

It works when I'm scraping US or UK websites. Here I have to deal with French, Spanish and German content, and I get the following error when trying to read from the CSV with parse_csv:

    csv_writer.writerow([k] + v)
'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128)

How can I fix this?

Subsidiary questions:

Should I encode the CSV differently, or scrape the site another way (e.g. configure BeautifulSoup differently), when the content is German or French?
Could this encoding problem be related to all the \xa0 I get from scraping? I don't think so, because I'm able to parse the UK and USA CSVs even though they are also full of them.

Every byte of your time spent solving this is appreciated! :)
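
For reference, the error quoted in the question can be reproduced in isolation. The sketch below is not the asker's code; it assumes the scraped text contains an accented character such as é, whose UTF-8 encoding starts with the byte 0xc3, and it shows that decoding those bytes as ASCII fails while decoding them as UTF-8 succeeds. The sample string is made up for illustration.

```python
# Illustrative only: "café" stands in for the scraped text.
text = "café"
raw = text.encode("utf-8")  # é becomes the two bytes 0xc3 0xa9

try:
    raw.decode("ascii")  # ASCII only covers bytes 0x00-0x7f
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

decoded = raw.decode("utf-8")  # decoding with the matching codec works

print(error_message)
print(decoded)
```

If this is what happens in the scraper, some step is decoding UTF-8 bytes with the default ASCII codec rather than with an explicitly named encoding.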

Answer 1

When working with French, German or Spanish characters (a website written in one of those languages), don't use encoding='utf-8'; use encoding='ISO-8859-1' instead.

So, for writing:

with open("Raw_table.csv", 'w',encoding="ISO-8859-1") as outfile:
   csv_writer = csv.writer(outfile, delimiter=';', quotechar='|', quoting=csv.QUOTE_MINIMAL,)

And for reading:

list_raw=parse_csv(open('Raw_RC.csv','r',encoding="ISO-8859-1").read())
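
The two snippets above can be combined into a minimal round-trip sketch. It assumes Python 3 and uses the standard csv module on the reading side as well, instead of the question's hand-written parse_csv, so that quoted fields containing the delimiter survive. The file name demo_latin1.csv, the sample rows and the helper names write_rows and read_rows are made up for illustration. Note that ISO-8859-1 only covers Western European characters; a character outside it (for example €) would raise UnicodeEncodeError on write.

```python
import csv
import os
import tempfile

ENCODING = "ISO-8859-1"  # the encoding suggested in this answer


def write_rows(path, rows, encoding=ENCODING):
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding=encoding, newline="") as outfile:
        writer = csv.writer(outfile, delimiter=";", quotechar="|",
                            quoting=csv.QUOTE_MINIMAL)
        writer.writerows(rows)


def read_rows(path, encoding=ENCODING):
    # Read with the same encoding, delimiter and quote character.
    with open(path, "r", encoding=encoding, newline="") as infile:
        reader = csv.reader(infile, delimiter=";", quotechar="|")
        return [row for row in reader]


# Hypothetical sample data with French, Spanish and German characters.
sample = [
    ["ville", "note"],
    ["Orléans", "déjà vu"],
    ["Málaga", "año"],
    ["Köln", "Straße; groß"],  # contains the delimiter, so it gets quoted
]

path = os.path.join(tempfile.mkdtemp(), "demo_latin1.csv")
write_rows(path, sample)
result = read_rows(path)
print(result == sample)
```

Whichever encoding is chosen, the point of the sketch is that the same one is named explicitly on both the writing and the reading side.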

The \xa0 problem is not related. Indeed, it occurs only in UTF-8, so my specific French/German typography wasn't the cause. To go further on this matter (which wasn't the core of the question), please see the following link suggested by tripleee.
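
On the \xa0 side question, here is a short sketch, assuming those values are non-breaking spaces (U+00A0) carried over from the scraped HTML. The helper clean_cell and the sample row are hypothetical; the helper swaps each non-breaking space for a regular space and then trims the cell.

```python
def clean_cell(cell):
    # Replace non-breaking spaces (U+00A0) with plain spaces, then trim.
    return cell.replace("\xa0", " ").strip()


# Made-up scraped values containing non-breaking spaces.
row = ["\xa0Prix\xa0: 12\xa0€ ", "Größe\xa0M"]
cleaned = [clean_cell(cell) for cell in row]
print(cleaned)
```

Applying such a cleanup before writing the rows keeps the stored CSV free of non-breaking spaces, independently of the file encoding.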

Answer 1 was accepted on 2015-08-10 16:47:50.
