
Python can't parse CSV as a list (UTF-8 BOM)

I have two CSV files:

rtc_csv_file="csv_migration\\rtc-test.csv"
ads_csv_file="csv_migration\\ads-test.csv"

Here is the ads-test.csv file (the one causing issues): https://easyupload.io/bk1krp. According to the indicator in the bottom-right corner of VS Code, the file is UTF-8 with BOM.

I am trying to write a Python function that reads each row and converts it into a dictionary object.

My function works fine for the first file, rtc-test.csv, but for the second file, ads-test.csv, I get the error UTF-16 stream does not start with BOM when I use utf-16. So I tried utf-8 and utf-8-sig, but then each line is read only as a single string of comma-separated values. I can't split on commas, because some column values will contain commas.

My Python code correctly reads rtc-test.csv as lists of values. How can I read ads-test.csv as lists of values when the CSV is encoded as UTF-8 with BOM?

Code:

rtc_csv_file="csv_migration\\rtc-test.csv"
ads_csv_file="csv_migration\\ads-test.csv"

from csv import reader
import csv

# read in csv, convert to map organized by 'id' as index root parent value
def read_csv_as_map(csv_filename, id_format, encodingVar):
    print('filename: '+csv_filename+', id_format: '+id_format+', encoding: '+encodingVar)
    dict={}
    dict['rows']={}
    try:
        with open(csv_filename, 'r', encoding=encodingVar) as read_obj:
            csv_reader = reader(read_obj, delimiter='\t')
            csv_cols = None
            for row in csv_reader:
                if csv_cols is None:
                    csv_cols = row 
                    dict['csv_cols']=csv_cols
                    print('csv_cols=',csv_cols)
                else:
                    row_id_val = row[csv_cols.index(str(id_format))]
                    print('row_id_val=',row_id_val)
                    dict['rows'][row_id_val] = row
        print('done')
        return dict
    except Exception as e:
        print('err=',e)
        return {}

rtc_dict = read_csv_as_map(rtc_csv_file, 'Id', 'utf-16')
ads_dict = read_csv_as_map(ads_csv_file, 'ID', 'utf-16')

Console output:

filename: csv_migration\rtc-test.csv, id_format: Id, encoding: utf-16
csv_cols= ['Summary', 'Status', 'Type', 'Id', '12NC']
row_id_val= 262998
done
filename: csv_migration\ads-test.csv, id_format: ID, encoding: utf-16
err= UTF-16 stream does not start with BOM

If I try utf-16-le instead, I get a different error: 'utf-16-le' codec can't decode byte 0x22 in position 0: truncated data

If I try utf-16-be, I get this error: 'utf-16-be' codec can't decode byte 0x22 in position 0: truncated data

Why can't my Python code read this CSV file?
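
One way to check what a file really contains, rather than guessing codecs, is to look at its first few bytes. The following is a minimal sketch, not part of the original code: the helper names guess_encoding_from_bom and sniff_file_encoding are hypothetical, and the utf-8 fallback for files without a BOM is an assumption.

```python
import codecs

# Known BOM signatures. UTF-32 comes first because the UTF-32-LE BOM
# starts with the same two bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def guess_encoding_from_bom(raw, default="utf-8"):
    """Return a codec name based on the BOM at the start of raw (bytes).

    The default is only an assumption used when no BOM is present.
    """
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return default

def sniff_file_encoding(path, default="utf-8"):
    """Read the first four bytes of a file and guess its encoding."""
    with open(path, "rb") as f:
        return guess_encoding_from_bom(f.read(4), default)

# A UTF-8 BOM (EF BB BF) maps to the utf-8-sig codec, which strips it.
print(guess_encoding_from_bom(b"\xef\xbb\xbfTitle,State"))  # utf-8-sig
```

Calling sniff_file_encoding on each CSV path would then give a codec name to pass as the encoding argument of open().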

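Your CSV is encoded as UTF-8 (with a BOM), not UTF-16, so pass a UTF-8 codec as the encoding. Using utf-8-sig rather than utf-8 also strips the BOM, which would otherwise stay attached to the first column name. The file is comma-delimited as well, so the reader needs ',' as its delimiter instead of '\t'; with a tab delimiter, each line comes back as a single string:

ads_csv_file="ads-test.csv"

from csv import reader

# Read a CSV file into a map whose 'rows' entry is keyed by the value in the id_format column.
# The delimiter parameter defaults to ','; pass '\t' for tab-separated files.
def read_csv_as_map(csv_filename, id_format, encodingVar, delimiter=','):
    print('filename: '+csv_filename+', id_format: '+id_format+', encoding: '+encodingVar)
    result={}
    result['rows']={}
    try:
        # newline='' lets the csv module handle line endings itself
        with open(csv_filename, 'r', encoding=encodingVar, newline='') as read_obj:
            csv_reader = reader(read_obj, delimiter=delimiter)
            csv_cols = None
            for row in csv_reader:
                if csv_cols is None:
                    # the first row holds the column names
                    csv_cols = row
                    result['csv_cols']=csv_cols
                    print('csv_cols=',csv_cols)
                else:
                    row_id_val = row[csv_cols.index(str(id_format))]
                    print('row_id_val=',row_id_val)
                    result['rows'][row_id_val] = row
        print('done')
        return result
    except Exception as e:
        print('err=',e)
        return {}

ads_dict = read_csv_as_map(ads_csv_file, 'ID', 'utf-8-sig')  # <- encoding updated here; the comma delimiter is the default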
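
To see how the codec and the delimiter each affect the parsed header, here is a small self-contained sketch. It works on in-memory bytes rather than the real file: the raw value is an assumed reconstruction of the file's first line (a UTF-8 BOM followed by the header row shown in this post), and the helper name header is hypothetical.

```python
import csv
import io

# Assumed first line of the file: UTF-8 BOM followed by the header row.
raw = b"\xef\xbb\xbfTitle,State,Work Item Type,ID,12NC\r\n"

def header(raw_bytes, encoding, delimiter):
    """Decode the bytes and return the first row as parsed by csv.reader."""
    text = io.StringIO(raw_bytes.decode(encoding), newline="")
    return next(csv.reader(text, delimiter=delimiter))

# Tab delimiter on comma-separated data: the whole line is one column.
tab_header = header(raw, "utf-8-sig", "\t")
print(tab_header)    # ['Title,State,Work Item Type,ID,12NC']

# Plain utf-8 keeps the BOM as U+FEFF at the front of the first column name.
plain_header = header(raw, "utf-8", ",")
print(plain_header)  # ['\ufeffTitle', 'State', 'Work Item Type', 'ID', '12NC']

# utf-8-sig strips the BOM, so every column name is clean.
sig_header = header(raw, "utf-8-sig", ",")
print(sig_header)    # ['Title', 'State', 'Work Item Type', 'ID', '12NC']
```

The first case matches the symptom described in the question: a row that contains one long comma-separated string, so looking up 'ID' in the column list fails.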

Here is the CSV, for reference:

Title,State,Work Item Type,ID,12NC
"453560751251 TOOL, SQ-59 CORNER CLAMP","To Do","FRUPS","6034","453560751251"

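Since the goal in the question is a dictionary per row, csv.DictReader may be a convenient alternative to indexing into lists. This is only a sketch under assumptions: it parses an in-memory copy of the two reference lines above with a UTF-8 BOM prepended, and the helper name rows_by_id is hypothetical.

```python
import csv
import io

# In-memory stand-in for the file: UTF-8 BOM plus the two reference lines.
raw = b"\xef\xbb\xbf" + (
    "Title,State,Work Item Type,ID,12NC\r\n"
    '"453560751251 TOOL, SQ-59 CORNER CLAMP","To Do","FRUPS","6034","453560751251"\r\n'
).encode("utf-8")

def rows_by_id(stream, id_column):
    """Map each row's id_column value to a dict of column name -> value."""
    return {row[id_column]: row for row in csv.DictReader(stream)}

# utf-8-sig removes the BOM; newline="" leaves line endings to the csv module.
text = io.StringIO(raw.decode("utf-8-sig"), newline="")
rows = rows_by_id(text, "ID")

# The quoted comma inside Title is kept as part of the value.
print(rows["6034"]["Title"])  # 453560751251 TOOL, SQ-59 CORNER CLAMP
print(rows["6034"]["State"])  # To Do
```

With a real file, the same helper could be given the object returned by open(path, encoding='utf-8-sig', newline='').
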
