Encoding issue when reading a CSV file with Python
I have hit a roadblock when trying to read a CSV file with Python.
UPDATE: If you just want to skip the offending character or error, you can open the file like this:
with open(os.path.join(directory, file), 'r', encoding="utf-8", errors="ignore") as data_file:
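To illustrate what errors="ignore" does, here is a minimal sketch. The file name sample.csv and its contents are made up for the demonstration; the stray byte 0x85 is simply one example of a byte that is not valid UTF-8 on its own.

```python
import os
import tempfile

# Hypothetical sample data: ASCII text with one stray byte (0x85)
# that cannot be decoded as UTF-8.
raw = b"tall\x85,None\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    with open(path, "wb") as f:
        f.write(raw)

    # errors="ignore" silently drops any bytes that cannot be decoded.
    with open(path, "r", encoding="utf-8", errors="ignore") as data_file:
        text = data_file.read()

print(repr(text))
```

Note that on a file opened for reading, errors="ignore" only affects decoding of the file contents. It does not change how print encodes text for the console.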
So far I have tried:
for directory, subdirectories, files in os.walk(root_dir):
    for file in files:
        with open(os.path.join(directory, file), 'r') as data_file:
            reader = csv.reader(data_file)
            for row in reader:
                print(row)
The error I am getting is:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 224-225: character maps to <undefined>
I have tried:
with open(os.path.join(directory, file), 'r', encoding="UTF-8") as data_file:
Error:
UnicodeEncodeError: 'charmap' codec can't encode character '\u2026' in position 223: character maps to <undefined>
Now, if I just print data_file, it says the files are cp1252 encoded, but if I try:
with open(os.path.join(directory, file), 'r', encoding="cp1252") as data_file:
The error I get is:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 224-225: character maps to <undefined>
I also tried the recommended package.
The error I get is:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 224-225: character maps to <undefined>
The line I am trying to parse is:
2015-11-28 22:23:58,670805374291832832,479174464,"MarkCrawford15","RT @WhatTheFFacts: The tallest man in the world was Robert Pershing Wadlow of Alton, Illinois. He was slighty over 8 feet 11 inches tall.","None
Any thoughts or help are appreciated.
I would use csvkit, which automatically detects the appropriate encoding and handles the decoding, e.g.:
import csvkit
reader = csvkit.reader(data_file)
As discussed in the chat, the solution is:
import csv
import os

for directory, subdirectories, files in os.walk(root_dir):
    for file in files:
        with open(os.path.join(directory, file), 'r', encoding="utf-8") as data_file:
            reader = csv.reader(data_file)
            for row in reader:
                # Drop every character that cannot be represented in ASCII,
                # so that print() never has to encode one.
                data = [i.encode('ascii', 'ignore').decode('ascii') for i in row]
                print(data)
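Why this works: the error is a UnicodeEncodeError, not a decode error, so it is most likely raised when print encodes the row for the console rather than when the file is read. The sketch below reproduces that on a made-up row. The helper name to_ascii_row is hypothetical, and cp437 is only an assumed stand-in for a console code page that has no mapping for '\u2026'; the actual console encoding is not shown in the question.

```python
def to_ascii_row(row):
    """Return a copy of row with all non-ASCII characters dropped."""
    return [field.encode("ascii", "ignore").decode("ascii") for field in row]


# Made-up row containing an ellipsis (U+2026), as in the error message.
row = ["MarkCrawford15", "He was tall\u2026", "None"]

# Assumption: the console uses a charmap code page such as cp437,
# which has no mapping for U+2026.
try:
    row[1].encode("cp437")
    failed = False
except UnicodeEncodeError:
    failed = True

clean = to_ascii_row(row)

# After the ASCII round trip, every field encodes without error.
encoded = [field.encode("cp437") for field in clean]

print(failed, clean)
```

The trade-off is that characters such as the ellipsis are discarded rather than shown, which is acceptable when the goal is only to get the rows printed.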