
Issues reading CSV line by line in Python

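Edit: using utf-16 seems to get me closer to the right direction, but I have CSV values that contain commas, e.g. "an example value is a description, it is long and can contain commas and quotes".

So with my current code: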

filepath="csv_input/frups.csv"

rows = []
with open(filepath, encoding='utf-16') as f:
    for line in f:
        print('line=',line)
        formatted_line=line.strip().split(",")
        print('formatted_line=',formatted_line)
        rows.append(formatted_line)
        print('')

the lines are not formatted correctly:


line= "FRUPS"   "11111112"        "Paahou 11111112, 11111112,11111112"    "Bar, Achal"      "Iagress"   "Unassigned"    "Normal"        "GaWu , Suaair center will not be able to repair 3 couch part 11111112, 11111112,11111112 . Pleasa to repair .

formatted_line= ['"FRUPS"\t"11111112"\t"Parts not able to repair in Suzhou 11111112', ' 11111112', '11111112"\t"Baaaaaar', ' Acaaaal"\t"In Progress"\t"Unassigned"\t"Normal"\t"Got coaow Wu ', ' Suar cat 11111112', ' 11111112', '11111112. Pleasa to repair .']

line= 11111112

formatted_line= ['11111112']

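So in this example, line is separated by long runs of whitespace, and splitting on commas is not a reliable way to read the data correctly line by line.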


I am trying to read a CSV line by line in Python, but each solution I try leads to a different error.

  1. Using pandas:
filepath="csv_input/frups.csv"
data = pd.read_csv(filepath, encoding='utf-16')
for thing in data:
    print(thing)
    print('')

read_csv fails to read the file, with Error tokenizing data. C error: Expected 7 fields in line 16, saw 8

  2. Using csv_reader:
# open file in read mode
with open(filepath, 'r') as read_obj:
    # pass the file object to reader() to get the reader object
    csv_reader = reader(read_obj)
    # Iterate over each row in the csv using reader object
    for row in csv_reader:
        # row variable is a list that represents a row in csv
        print(row)

This fails at for row in csv_reader with the error: line contains NUL

I tried to figure out what these NUL characters are, but investigating with code leads to a different error:

data = open(filepath, 'rb').read()
print(data.find('\x00'))

error: argument should be integer or bytes-like object, not 'str'
  3. Another read solution that tries to strip certain characters:

with open(filepath,'rb') as f:
    contents = f.read()
contents = contents.rstrip("\n").decode("utf-16")
contents = contents.split("\r\n")

Error: TypeError: a bytes-like object is required, not 'str'

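It seems my CSV has some strange characters that make Python fail. I can open and view the CSV fine in Excel. How can I read it line by line, so that I end up with something like this?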

row[0]=['col1','col2','col3']
row[1]=['val1','val2','val3']
etc...

You can always read the file manually to build a structure like this:

rows = []
with open(filepath) as f:
    for line in f:
        rows.append(line.strip().split(","))
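One caveat with the manual approach: a plain split cuts through quoted fields that contain the delimiter, which is what the output in the question shows. A minimal sketch of the difference, using a made-up sample line (not data from the question's file) and the standard csv module:

```python
import csv
import io

# Hypothetical sample: three comma-separated fields, the last one quoted
# and containing commas of its own.
sample = 'col1,col2,"a long description, with commas, inside"\n'

# Naive splitting ignores the quotes and yields too many pieces.
naive = sample.strip().split(",")

# csv.reader honours the quotes and keeps the field intact.
parsed = next(csv.reader(io.StringIO(sample)))

print(len(naive), naive)
print(len(parsed), parsed)
```

The same applies to any single-character delimiter, so csv.reader is usually the safer choice whenever fields may be quoted.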

The output you show for line and formatted_line hints that:

  • your file is utf-16 encoded
  • it uses the tab character ( \t ) as the delimiter
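Both hints can be checked on the raw bytes. The sketch below works on an in-memory sample rather than the real file, so treat the values as assumptions. Text encoded with Python's utf-16 codec starts with a byte-order mark and stores each ASCII character as two bytes, one of which is NUL, which would explain the line contains NUL error when the file is opened with the wrong encoding. Note that bytes.find needs a bytes argument (b'\x00'), not a str.

```python
import codecs

# Hypothetical sample standing in for the first bytes of the file,
# e.g. open(filepath, 'rb').read(200) on the real data.
raw = '"FRUPS"\t"11111112"\n'.encode('utf-16')

# Output of Python's 'utf-16' codec starts with a byte-order mark.
has_bom = raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))

# Searching bytes requires a bytes pattern; every ASCII character
# contributes one NUL byte in UTF-16.
first_nul = raw.find(b'\x00')
nul_count = raw.count(b'\x00')

# After decoding, the delimiter is visible as a tab character.
text = raw.decode('utf-16')
print(has_bom, first_nul, nul_count, '\t' in text)
```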

So you should use:

  1. With the csv module:

     from csv import reader

     # open file in read mode
     with open(filepath, 'r', encoding='utf-16') as read_obj:
         # pass the file object to reader() to get the reader object
         csv_reader = reader(read_obj, delimiter='\t')
         # Iterate over each row in the csv using reader object
         for row in csv_reader:
             # row variable is a list that represents a row in csv
             print(row)
  2. With pandas:

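     import pandas as pd

     data = pd.read_csv(filepath, encoding='utf-16', sep='\t')
     # iterating a DataFrame directly yields its column labels,
     # so use itertuples() to walk over the rows
     for thing in data.itertuples(index=False):
         print(thing)
         print('')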
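To see the csv variant end to end, here is a self-contained sketch. It writes a small made-up UTF-16, tab-delimited file (the file name and values are assumptions for illustration) whose quoted fields contain commas and a line break, then reads it back into a list of rows in the shape the question asks for. Passing newline='' to open follows the csv module documentation and lets the reader handle line breaks inside quoted fields.

```python
import csv
import os
import tempfile

# Made-up sample data: quoted fields with commas and an embedded newline.
content = (
    '"col1"\t"col2"\t"col3"\n'
    '"val1"\t"Bar, Achal"\t"first line, with commas\nsecond line"\n'
)

# Write the sample as UTF-16 to a temporary file (hypothetical name).
tmp_dir = tempfile.mkdtemp()
filepath = os.path.join(tmp_dir, "sample.csv")
with open(filepath, "w", encoding="utf-16", newline="") as f:
    f.write(content)

# Read it back: UTF-16 encoding, tab delimiter, default double-quote quoting.
rows = []
with open(filepath, "r", encoding="utf-16", newline="") as f:
    for row in csv.reader(f, delimiter="\t"):
        rows.append(row)

for i, row in enumerate(rows):
    print(i, row)
```

Because the quoted field spans two physical lines, iterating over the file object and splitting each line by hand would break that record in two, whereas the reader returns it as a single row.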

