Decoding from CSV - French and Spanish special characters
I'm writing the CSV table produced by my scraping process like this:
import csv

with open("Raw_table.csv", 'w', encoding="utf-8") as outfile:
    csv_writer = csv.writer(outfile, delimiter=';', quotechar='|', quoting=csv.QUOTE_MINIMAL)
Usually, when I want to use these files, I read them with a small CSV parser like this:
def parse_csv(content, delimiter=';'):
    csv_data = []
    for line in content.split('\n'):
        # split each line on the delimiter and strip surrounding whitespace
        csv_data.append([x.strip() for x in line.split(delimiter)])
    return csv_data

list_raw = parse_csv(open('Raw_RC.csv', 'r', encoding="utf-8").read())
It works when I'm scraping US and UK websites. Here I have to deal with French, Spanish and German content, and I get the following error when trying to read from the CSV with parse_csv:
csv_writer.writerow([k] + v)
'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128)
How can I fix this?
Subsidiary question: could this encoding problem be related to all the \xa0 I get from scraping? I don't think so, because I'm able to parse the UK and USA CSVs, and those are full of them too. Every byte of your time spent solving this is appreciated! :)
When working with French/German/Spanish characters (a website written in one of those languages), don't use encoding='utf-8'; use encoding='ISO-8859-1' instead.
So, writing:
with open("Raw_table.csv", 'w', encoding="ISO-8859-1") as outfile:
    csv_writer = csv.writer(outfile, delimiter=';', quotechar='|', quoting=csv.QUOTE_MINIMAL)
And reading:
list_raw = parse_csv(open('Raw_RC.csv', 'r', encoding="ISO-8859-1").read())
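To check that writing and reading agree, here is a minimal round-trip sketch. It is not the code from the question: the sample rows and the file name roundtrip_demo.csv are made up for illustration, and it assumes Python 3. It also swaps the hand-written parser for csv.reader with the same delimiter and quote character as the writer, so that a field containing ';' is read back as one cell.

```python
import csv
import os
import tempfile

ENCODING = "ISO-8859-1"  # assumption: the same encoding is used for writing and reading

# Made-up sample rows; every character here exists in ISO-8859-1.
rows = [["café", "niño"], ["Straße", "a;b"]]

# Hypothetical file name, created in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "roundtrip_demo.csv")

# newline="" lets the csv module control line endings itself.
with open(path, "w", encoding=ENCODING, newline="") as outfile:
    writer = csv.writer(outfile, delimiter=";", quotechar="|",
                        quoting=csv.QUOTE_MINIMAL)
    writer.writerows(rows)

with open(path, "r", encoding=ENCODING, newline="") as infile:
    reader = csv.reader(infile, delimiter=";", quotechar="|")
    # Strip surrounding whitespace from each cell, as parse_csv does.
    read_back = [[cell.strip() for cell in row] for row in reader]
```

The point of the sketch is only that the encoding passed to open() must match on both sides; a character outside ISO-8859-1 would raise UnicodeEncodeError at write time.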
The \xa0 problem is not related. In my case it only shows up with UTF-8, so it had nothing to do with the French/German typography. To go further on this matter (which wasn't the core of the question), please see the link suggested by tripleee.
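For readers who want to see what these characters look like as bytes, here is a small sketch, assuming Python 3. The sample string is made up, and replacing \xa0 with a plain space is just one possible way to normalize scraped text, not something the original code does.

```python
nbsp = "\xa0"  # U+00A0, the no-break space often found in scraped HTML

nbsp_utf8 = nbsp.encode("utf-8")         # two bytes in UTF-8
nbsp_latin1 = nbsp.encode("ISO-8859-1")  # a single byte in ISO-8859-1

# An accented letter such as "é" starts with byte 0xc3 when encoded as UTF-8.
e_acute_utf8 = "é".encode("utf-8")

# Decoding those bytes as ASCII fails, because 0xc3 is outside range(128).
try:
    e_acute_utf8.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

# One possible clean-up (an assumption, not part of the original code):
# replace no-break spaces with ordinary spaces before further processing.
sample = "prix\xa0: 10\xa0EUR"  # made-up sample text
cleaned = sample.replace("\xa0", " ")
```

The 0xc3 byte in the error message is consistent with UTF-8 encoded accented letters being decoded by an ASCII codec somewhere along the way.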