
Encoding issue when reading a CSV file with Python

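I've hit a roadblock trying to read CSV files with Python.

UPDATE: If you just want to skip the offending characters or errors, you can open the file like this: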

with open(os.path.join(directory, file), 'r', encoding="utf-8", errors="ignore") as data_file:
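To make the effect of errors="ignore" concrete, here is a minimal sketch. The helper name read_rows_ignoring_errors and the sample bytes are hypothetical and not from the original post. It writes a small file containing a byte that is not valid UTF-8, then reads it back with the same open() arguments as above.

```python
import csv
import os
import tempfile


def read_rows_ignoring_errors(path):
    """Read a CSV file as UTF-8, silently dropping bytes that cannot be decoded."""
    with open(path, "r", encoding="utf-8", errors="ignore", newline="") as data_file:
        return list(csv.reader(data_file))


# Hypothetical demo data: 0xE9 is "é" in cp1252 but is not valid UTF-8 here.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.csv")
    with open(sample_path, "wb") as f:
        f.write(b"caf\xe9,1\n")
    rows = read_rows_ignoring_errors(sample_path)

print(rows)
```

Note that the undecodable byte is discarded without any warning, so this is only appropriate when losing those characters is acceptable.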

This is what I've tried so far.

import csv
import os

# root_dir is assumed to be defined elsewhere
for directory, subdirectories, files in os.walk(root_dir):
    for file in files:
        with open(os.path.join(directory, file), 'r') as data_file:
            reader = csv.reader(data_file)
            for row in reader:
                print(row)

The error I get is:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 224-225: character maps to <undefined>

I have tried

with open(os.path.join(directory, file), 'r', encoding="UTF-8") as data_file:

Error:

UnicodeEncodeError: 'charmap' codec can't encode character '\u2026' in position 223: character maps to <undefined>

Now, if I just print data_file, it reports the encoding as cp1252, but if I try

with open(os.path.join(directory, file), 'r', encoding="cp1252") as data_file:

The error I get is:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 224-225: character maps to <undefined>
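One possible reading of these tracebacks, offered as an assumption rather than something the thread confirms: UnicodeEncodeError is raised when text is encoded, not when it is decoded, so the failure may come from print() writing to a console with a limited code page rather than from open(). The sketch below uses cp437 as a stand-in for such a console encoding (an assumption) and a hypothetical helper try_encode to show that the same string encodes fine as UTF-8 but not as cp437.

```python
def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# Hypothetical text ending in U+2026 (horizontal ellipsis), the character
# named in one of the error messages above.
sample = "slightly over 8 feet\u2026"

utf8_result = try_encode(sample, "utf-8")   # UTF-8 can represent any character
cp437_result = try_encode(sample, "cp437")  # cp437 has no ellipsis, so this fails

print(utf8_result is not None, cp437_result is None)
```

If this is indeed the cause, changing the encoding argument of open() would not help, which would match the identical errors seen with both "UTF-8" and "cp1252".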

I have also tried the suggested package.

The error I get is:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 224-225: character maps to <undefined>

The line I am trying to parse is:

2015-11-28 22:23:58,670805374291832832,479174464,"MarkCrawford15","RT @WhatTheFFacts: The tallest man in the world was Robert Pershing Wadlow of Alton, Illinois. He was slighty over 8 feet 11 inches tall.","None

Any ideas or help would be appreciated.

I would use csvkit, which automatically detects the appropriate encoding and decodes accordingly. For example:

import csvkit
reader = csvkit.reader(data_file)

As discussed in chat, the solution is:

import csv
import os

# root_dir is assumed to be defined elsewhere
for directory, subdirectories, files in os.walk(root_dir):
    for file in files:
        with open(os.path.join(directory, file), 'r', encoding="utf-8") as data_file:
            reader = csv.reader(data_file)
            for row in reader:
                # Drop every non-ASCII character so that print() cannot fail
                data = [i.encode('ascii', 'ignore').decode('ascii') for i in row]
                print(data)
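The encode/decode round trip in that solution can be tried on its own. This is a minimal sketch: the helper name to_ascii and the sample row are hypothetical. It also shows the standard "backslashreplace" error handler as an alternative that keeps a visible escape instead of dropping the character.

```python
def to_ascii(text, errors="ignore"):
    """Re-encode text as ASCII, handling unsupported characters per `errors`."""
    return text.encode("ascii", errors).decode("ascii")


# Hypothetical row containing U+2026 (horizontal ellipsis).
row = ["MarkCrawford15", "RT @WhatTheFFacts: 8 feet\u2026"]

dropped = [to_ascii(field) for field in row]
escaped = [to_ascii(field, "backslashreplace") for field in row]

print(dropped)
print(escaped)
```

With "ignore" the ellipsis simply disappears. With "backslashreplace" it is printed as the literal text \u2026, which may be preferable when you need to see where characters were lost.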
