Problems reading a CSV file line by line in Python

Question

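Switching to utf-16 seems to have gotten me closer, but I have CSV values that contain commas, such as "an example value is a description, it is long, and it can contain commas and quotes".

So with my current code: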

filepath="csv_input/frups.csv"

rows = []
with open(filepath, encoding='utf-16') as f:
    for line in f:
        print('line=',line)
        formatted_line=line.strip().split(",")
        print('formatted_line=',formatted_line)
        rows.append(formatted_line)
        print('')

the rows are not split correctly:


line= "FRUPS"   "11111112"        "Paahou 11111112, 11111112,11111112"    "Bar, Achal"      "Iagress"   "Unassigned"    "Normal"        "GaWu , Suaair center will not be able to repair 3 couch part 11111112, 11111112,11111112 . Pleasa to repair .

formatted_line= ['"FRUPS"\t"11111112"\t"Parts not able to repair in Suzhou 11111112', ' 11111112', '11111112"\t"Baaaaaar', ' Acaaaal"\t"In Progress"\t"Unassigned"\t"Normal"\t"Got coaow Wu ', ' Suar cat 11111112', ' 11111112', '11111112. Pleasa to repair .']

line= 11111112

formatted_line= ['11111112']

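So in this example the fields in line are separated by long runs of whitespace, and splitting on commas is not a reliable way to read the data row by row.

I am trying to read a CSV line by line in Python, but every solution I try leads to a different error.

Using pandas: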

filepath="csv_input/frups.csv"
data = pd.read_csv(filepath, encoding='utf-16')
for thing in data:
    print(thing)
    print('')

read_csv cannot read the file and fails with: Error tokenizing data. C error: Expected 7 fields in line 16, saw 8

Using csv.reader:

# open file in read mode
with open(filepath, 'r') as read_obj:
    # pass the file object to reader() to get the reader object
    csv_reader = reader(read_obj)
    # Iterate over each row in the csv using reader object
    for row in csv_reader:
        # row variable is a list that represents a row in csv
        print(row)

This fails at for row in csv_reader with the error: line contains NUL

I tried to find out what these NUL characters are, but investigating them with code leads to yet another error:

data = open(filepath, 'rb').read()
print(data.find('\x00'))

error: argument should be integer or bytes-like object, not 'str'
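The call fails because the file was opened in binary mode ('rb'), so data is a bytes object and find needs a bytes pattern such as b'\x00'. The NUL bytes themselves are normal in UTF-16 data: every ASCII character is stored as two bytes, one of which is zero. A minimal sketch, assuming a small made-up in-memory sample in place of the real file:

```python
import codecs

# Hypothetical sample standing in for the first bytes of the file.
data = '"FRUPS"\t"11111112"\n'.encode("utf-16")

# Searching bytes requires a bytes pattern, not a str.
first_nul = data.find(b"\x00")
nul_count = data.count(b"\x00")

# A UTF-16 byte order mark at the start is a strong hint about the encoding.
has_utf16_bom = data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))

print("first NUL at byte", first_nul)
print("NUL bytes:", nul_count)
print("UTF-16 BOM present:", has_utf16_bom)

# Decoding with the matching codec yields text without any NUL characters.
text = data.decode("utf-16")
print("NUL left in decoded text:", "\x00" in text)
```

If the byte order mark is present and roughly every second byte is zero, opening the file with encoding='utf-16' is a reasonable next step.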

Another attempt, reading the file and stripping certain characters:


with open(filepath,'rb') as f:
    contents = f.read()
contents = contents.rstrip("\n").decode("utf-16")
contents = contents.split("\r\n")

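Error: TypeError: a bytes-like object is required, not 'str'

It seems my CSV contains some odd characters that make Python fail. I can open and view the CSV in Excel without problems, so how can I read it line by line? For example: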
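This TypeError comes from calling rstrip with a str argument on a bytes object. Decoding first and stripping afterwards avoids it. A minimal sketch, assuming made-up in-memory bytes in place of the file contents:

```python
# Hypothetical bytes standing in for f.read() on a UTF-16 file.
contents = "col1\tcol2\r\nval1\tval2\r\n".encode("utf-16")

# Decode to str first; only then use str methods with str arguments.
text = contents.decode("utf-16")
lines = text.rstrip("\r\n").split("\r\n")

print(lines)
```

Splitting on line endings still cuts through quoted fields that contain a line break, so this is mainly useful for inspecting the raw text.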

row[0]=['col1','col2','col3']
row[1]=['val1','val2','val3']
etc...

Answer 1

You can always read the file manually to build a structure like this:

rows = []
with open(filepath) as f:
    for line in f:
        rows.append(line.strip().split(","))
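One caveat with this approach: a plain split on the delimiter also cuts through quoted values that contain that delimiter, which is what happens to the description field in the question. A small sketch of the difference, using a made-up comma-separated line and the standard csv module:

```python
import csv
import io

# Hypothetical line with quoted values that contain commas.
line = '"FRUPS","Bar, Achal","part 1, part 2"\n'

# Naive splitting ignores the quotes and produces too many pieces.
naive = line.strip().split(",")

# csv.reader honours the quotes and keeps each value intact.
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive), naive)
print(len(parsed), parsed)
```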

Answer 2

The hints are in what you show for line and formatted_line:

your file is UTF-16 encoded
it uses tabs ( \t ) as the delimiter

So you should use:

With the csv module:

from csv import reader

# open file in read mode
with open(filepath, 'r', encoding='utf-16') as read_obj:
    # pass the file object to reader() to get the reader object
    csv_reader = reader(read_obj, delimiter='\t')
    # Iterate over each row in the csv using reader object
    for row in csv_reader:
        # row variable is a list that represents a row in csv
        print(row)
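The sample output in the question also suggests that some quoted fields contain line breaks, since a bare 11111112 shows up as a line of its own. A self-contained sketch of the same approach, assuming a made-up two-row file written to a temporary path; newline='' is the setting the csv documentation recommends so that line breaks inside quoted fields are left for the reader to handle:

```python
import csv
import os
import tempfile

# Made-up sample: tab-delimited, quoted, with a comma and a line break inside fields.
sample = (
    '"col1"\t"col2"\t"col3"\n'
    '"FRUPS"\t"Bar, Achal"\t"unable to repair part\n11111112"\n'
)

# Write the sample as UTF-16 so the example mirrors the file in the question.
fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
with open(path, "w", encoding="utf-16", newline="") as f:
    f.write(sample)

rows = []
with open(path, "r", encoding="utf-16", newline="") as read_obj:
    for row in csv.reader(read_obj, delimiter="\t"):
        rows.append(row)

os.remove(path)

for i, row in enumerate(rows):
    print(i, row)
```

Here rows[0] holds the header and rows[1] the first record, with the embedded comma and line break kept inside their fields.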

With pandas:

import pandas as pd

data = pd.read_csv(filepath, encoding='utf-16', sep='\t')
# iterating a DataFrame yields its column labels
for thing in data:
    print(thing)
    print('')
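Since iterating a DataFrame yields column labels, getting one list per row as asked in the question takes one more step. One option is to combine the header with data.values.tolist(). A sketch, assuming a made-up file written to a temporary path:

```python
import os
import tempfile

import pandas as pd

# Made-up tab-delimited sample with a quoted value that contains a comma.
sample = '"col1"\t"col2"\t"col3"\n"val1"\t"Bar, Achal"\t"val3"\n'

# Write the sample as UTF-16 so the example mirrors the file in the question.
fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
with open(path, "w", encoding="utf-16", newline="") as f:
    f.write(sample)

data = pd.read_csv(path, encoding="utf-16", sep="\t")
os.remove(path)

# First the header, then one list per data row.
rows = [data.columns.tolist()] + data.values.tolist()

for row in rows:
    print(row)
```

For large files, data.itertuples(index=False) avoids building the whole list at once.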

2 answers

Answer 1
0 2022-07-21 19:10:03

Answer 2
0 Accepted 2022-07-21 19:40:30
