Issues reading CSV line by line in Python
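EDIT: Using utf-16 seems to get me closer to the right direction, but I have CSV values that contain commas, for example "an example value is a description, it is long and can contain commas and quotes".
So with my current code: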
filepath = "csv_input/frups.csv"
rows = []
with open(filepath, encoding='utf-16') as f:
    for line in f:
        print('line=', line)
        formatted_line = line.strip().split(",")
        print('formatted_line=', formatted_line)
        rows.append(formatted_line)
        print('')
The rows are not formatted correctly:
line= "FRUPS" "11111112" "Paahou 11111112, 11111112,11111112" "Bar, Achal" "Iagress" "Unassigned" "Normal" "GaWu , Suaair center will not be able to repair 3 couch part 11111112, 11111112,11111112 . Pleasa to repair .
formatted_line= ['"FRUPS"\t"11111112"\t"Parts not able to repair in Suzhou 11111112', ' 11111112', '11111112"\t"Baaaaaar', ' Acaaaal"\t"In Progress"\t"Unassigned"\t"Normal"\t"Got coaow Wu ', ' Suar cat 11111112', ' 11111112', '11111112. Pleasa to repair .']
line= 11111112
formatted_line= ['11111112']
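So in this example, line appears to be separated by long runs of whitespace, but splitting on commas is not a reliable way to read the data correctly line by line.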
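A minimal sketch of why splitting on commas goes wrong, using only the standard csv module and an invented one-line sample (the string below is hypothetical, not taken from frups.csv): split(",") cuts inside the quoted field, while csv.reader applies the CSV quoting rules.

```python
import csv
import io

# Hypothetical comma-delimited record: the middle field is quoted and contains
# commas and escaped (doubled) quotes.
sample = 'id,"a description, long, with ""quotes"", and commas",status\n'

# Naive approach: split on every comma, ignoring quoting rules.
naive = sample.strip().split(",")

# csv.reader follows the quoting rules, so commas inside quotes stay in the field
# and doubled quotes are collapsed to a single quote character.
parsed = next(csv.reader(io.StringIO(sample)))

print(len(naive), naive)    # 6 pieces, the quoted field is cut apart
print(len(parsed), parsed)  # 3 fields, as intended
```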
I'm trying to read a CSV line by line in Python, but every solution I try leads to a different error.
import pandas as pd

filepath = "csv_input/frups.csv"
data = pd.read_csv(filepath, encoding='utf-16')
for thing in data:
    print(thing)
    print('')
read_csv cannot read the file and fails with:
Error tokenizing data. C error: Expected 7 fields in line 16, saw 8
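The error means one row was split into more fields than the header row has. To find such rows yourself, here is a sketch of a small diagnostic built on the standard csv module; the helper name find_ragged_rows and the sample text are hypothetical, and it assumes the file contents have already been decoded to a str.

```python
import csv
import io


def find_ragged_rows(text, delimiter=","):
    """Return (line_number, field_count) pairs for rows whose width differs from the header.

    `text` is assumed to be already-decoded file contents. line_number is
    csv.reader's 1-based physical line counter, i.e. the line on which the
    offending row ends.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    expected = None
    ragged = []
    for row in reader:
        if expected is None:
            expected = len(row)  # the first row (header) sets the expected width
        elif len(row) != expected:
            ragged.append((reader.line_num, len(row)))
    return ragged


# Hypothetical sample: the third line has an unquoted comma inside a value.
sample = "name,team,status\nAchal,Bar,open\nAchal,Bar, Achal,open\n"
print(find_ragged_rows(sample))  # [(3, 4)]
```

Running it with the wrong delimiter reports many ragged rows; running it with the right one should report none or only genuinely malformed lines.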
from csv import reader

# open file in read mode
with open(filepath, 'r') as read_obj:
    # pass the file object to reader() to get the reader object
    csv_reader = reader(read_obj)
    # Iterate over each row in the csv using reader object
    for row in csv_reader:
        # row variable is a list that represents a row in csv
        print(row)
The line for row in csv_reader raises the error: line contains NUL
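One likely explanation, assuming the file really is UTF-16 (which the EDIT above suggests): in UTF-16 every ASCII character is stored as two bytes, one of them zero, so decoding the file with a single-byte encoding leaves NUL characters in the text. The sample bytes below are hypothetical, and latin-1 is only a stand-in for whatever wrong default encoding open() picked.

```python
import codecs

# Hypothetical sample: the text "a,b\n" stored as UTF-16 little-endian with a BOM.
raw = codecs.BOM_UTF16_LE + "a,b\n".encode("utf-16-le")
print(raw)  # each ASCII character is followed by a zero byte

# Decoding with a single-byte codec keeps those zero bytes as NUL characters,
# which is what csv.reader then complains about.
wrong = raw.decode("latin-1")
print("\x00" in wrong)  # True

# Decoding as UTF-16 consumes the BOM and pairs the bytes up correctly.
right = raw.decode("utf-16")
print(repr(right))  # 'a,b\n'
```

In other words, passing encoding='utf-16' to open() should make the NUL characters disappear before csv.reader ever sees them.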
I tried to figure out what these NUL characters are, but investigating them with code leads to different errors:
data = open(filepath, 'rb').read()
print(data.find('\x00'))
error: argument should be integer or bytes-like object, not 'str'
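The error here is because open(filepath, 'rb').read() returns bytes, and bytes.find() needs a bytes argument, so the search value must be b'\x00' rather than '\x00'. A sketch, with hypothetical sample bytes standing in for the file contents:

```python
import codecs

# Hypothetical stand-in for: with open(filepath, "rb") as f: data = f.read()
data = codecs.BOM_UTF16_LE + "x,y\n".encode("utf-16-le")

# data is bytes, so the value to search for must be bytes too.
first_nul = data.find(b"\x00")
print(first_nul)              # 3 (index 0-1 is the BOM, index 2 is "x")
print(data.count(b"\x00"))    # 4, one per ASCII character
```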
with open(filepath, 'rb') as f:
    contents = f.read()
contents = contents.rstrip("\n").decode("utf-16")
contents = contents.split("\r\n")
Error: TypeError: a bytes-like object is required, not 'str'
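That TypeError comes from calling rstrip("\n") with a str argument on a bytes object. Decoding first and then using str methods avoids it. A sketch with hypothetical sample bytes in place of the real file:

```python
import codecs

# Hypothetical stand-in for the bytes returned by f.read() in 'rb' mode.
contents = codecs.BOM_UTF16_LE + "col1\tcol2\r\nval1\tval2\r\n".encode("utf-16-le")

# Decode bytes to str first; after that, str methods accept str arguments.
text = contents.decode("utf-16")

# splitlines() handles "\r\n" and "\n" and does not leave an empty last element.
lines = text.splitlines()
print(lines)  # ['col1\tcol2', 'val1\tval2']
```

Splitting into physical lines like this still cuts apart any quoted field that contains a line break, so it is only a way to inspect the file, not to parse its rows.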
It seems my CSV has some strange characters that make Python throw errors. I can open and view the CSV fine in Excel, so how can I read it line by line? For example:
row[0]=['col1','col2','col3']
row[1]=['val1','val2','val3']
etc...
You can always read the file manually to build a structure like this:
rows = []
with open(filepath) as f:
    for line in f:
        rows.append(line.strip().split(","))
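Reading physical lines by hand has a second limitation visible in the question's output: a quoted field that contains a line break gets split into two "rows", and the quote characters stay in the values. A sketch comparing the manual approach with csv.reader on a hypothetical tab-separated sample (newline="" is what the csv documentation recommends for files passed to csv.reader):

```python
import csv
import io

# Hypothetical tab-separated sample whose last field is quoted and contains a
# line break, similar to the record that was cut in two in the question's output.
sample = '"FRUPS"\t"Normal"\t"please repair\n11111112"\r\n"NEXT"\t"Normal"\t"ok"\r\n'

# Manual approach: one list per physical line; quote characters stay in the values.
manual = [line.strip().split("\t") for line in io.StringIO(sample, newline="")]

# csv.reader keeps reading until the quoted field closes, then strips the quotes.
parsed = list(csv.reader(io.StringIO(sample, newline=""), delimiter="\t"))

print(len(manual), manual)  # 3 rows, the middle one is just ['11111112"']
print(len(parsed), parsed)  # 2 rows, the line break is kept inside the field
```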
The hint is in what you show for line and formatted_line: the file uses tab (\t) as the delimiter, so you should use:
Using the csv module:
from csv import reader

# open file in read mode
with open(filepath, 'r', encoding='utf-16') as read_obj:
    # pass the file object to reader() to get the reader object
    csv_reader = reader(read_obj, delimiter='\t')
    # Iterate over each row in the csv using reader object
    for row in csv_reader:
        # row variable is a list that represents a row in csv
        print(row)
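To check the csv-module approach end to end without the real file, here is a sketch that writes a hypothetical UTF-16, tab-separated, fully quoted sample to a temporary file and reads it back. The sample rows and temp path are invented; newline="" is added per the csv module documentation's recommendation and is not part of the answer above.

```python
import csv
import os
import tempfile

# Hypothetical rows mirroring the question: commas inside fields.
rows_in = [
    ["col1", "col2", "col3"],
    ["FRUPS", "Bar, Achal", "repair 11111112, 11111112"],
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")  # stand-in for csv_input/frups.csv

    # Write a UTF-16 (BOM included), tab-separated file with every field quoted.
    with open(path, "w", encoding="utf-16", newline="") as f:
        csv.writer(f, delimiter="\t", quoting=csv.QUOTE_ALL).writerows(rows_in)

    # Read it back with the matching encoding and delimiter.
    with open(path, encoding="utf-16", newline="") as f:
        rows = list(csv.reader(f, delimiter="\t"))

print(rows[0])  # ['col1', 'col2', 'col3']
print(rows[1])  # ['FRUPS', 'Bar, Achal', 'repair 11111112, 11111112']
```

Because the delimiter is a tab, the commas inside the values no longer matter, and rows[0], rows[1], ... have the shape asked for in the question.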
With pandas:
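import pandas as pd

data = pd.read_csv(filepath, encoding='utf-16', sep='\t')
# note: iterating over a DataFrame yields its column labels, not its rows;
# use data.itertuples() or data.iterrows() to walk the rows
for thing in data:
    print(thing)
    print('')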
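If the delimiter is not known in advance, the standard library's csv.Sniffer can guess it from a sample of the decoded text. This is a technique not mentioned in the answer, and it is heuristic, so it is worth checking the result. The sample string below is hypothetical; with a real file you might read a few kilobytes, sniff them, then seek(0) before parsing.

```python
import csv

# Hypothetical first lines of a decoded file: quoted fields, tab-separated,
# with a comma inside one value.
sample = '"col1"\t"col2"\t"col3"\r\n"a, b"\t"c"\t"d"\r\n'

# Restrict the candidates to a few common delimiters.
dialect = csv.Sniffer().sniff(sample, delimiters="\t,;")
print(repr(dialect.delimiter))  # '\t'

# The detected dialect can then be passed straight to csv.reader.
rows = list(csv.reader(sample.splitlines(), dialect))
print(rows[1])  # ['a, b', 'c', 'd']
```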