Pandas: CSV header and data row size mismatch
Is it possible to instruct pandas to ignore data columns that go beyond the size of the header?
import pandas

with open('test.csv', mode='w') as csv_file:
    csv_file.write("datetime,A\n")
    csv_file.write("2018-10-09 18:00:07, 123\n")

df = pandas.read_csv('test.csv')
print(df)
This gives:
              datetime    A
0  2018-10-09 18:00:07  123
However, loading a CSV file whose data row has more columns than the header defines:
with open('test.csv', mode='w') as csv_file:
    csv_file.write("datetime,A\n")
    csv_file.write("2018-10-09 18:00:07, 123, ABC, XYZ\n")

df = pandas.read_csv('test.csv')
print(df)
yields:
datetime A
2018-10-09 18:00:07 123 ABC XYZ
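pandas shifts the header over the rightmost columns of the data.
I need different behavior: I want pandas to ignore the data columns that go beyond the header.
Note: I cannot enumerate the columns, because this is a generic use case. For reasons unrelated to my code, there is sometimes more data than expected. I want to ignore the extra data.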
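pandas seems to notice that there are more data columns than header entries, and assumes that the first two (data) columns are a (multi)index.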
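The behavior described above can be checked directly. The following is a minimal sketch, assuming the same two-line sample as in the question; reading it from an in-memory `io.StringIO` buffer instead of a file is a choice made only for this illustration. It inspects the index that pandas builds when the data row has more fields than the header.

```python
import io

import pandas

# Same content as the question's test.csv: 2 header fields, 4 data fields.
csv_text = "datetime,A\n2018-10-09 18:00:07, 123, ABC, XYZ\n"

df = pandas.read_csv(io.StringIO(csv_text))

# The two surplus leading fields are consumed as index levels,
# so the header names end up on the last two data columns.
print(type(df.index).__name__)
print(df.index.nlevels)
print(list(df.columns))
print(df.shape)
```

If this matches what you see on your pandas version, it explains why the header appears shifted to the right: the header still names exactly two columns, but they are the last two fields of each row.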
Use the usecols parameter of read_csv to specify which data columns to read:
import pandas

with open('test.csv', mode='w') as csv_file:
    csv_file.write("datetime,A\n")
    csv_file.write("2018-10-09 18:00:07, 123, ABC, XYZ\n")

df = pandas.read_csv('test.csv', usecols=[0,1])
print(df)
yields:
              datetime    A
0  2018-10-09 18:00:07  123
Now, here is code that shows the answer to the question.
import pandas

with open('test.csv', mode='w') as csv_file:
    csv_file.write("datetime,A\n")
    csv_file.write("2018-10-09 18:00:07, 123, ABC, XYZ\n")

# Count the fields in the header line and in the first data line.
with open("test.csv") as csv_file:
    for i, line in enumerate(csv_file):
        if i == 0:
            headerCount = line.count(",") + 1
            colCount = headerCount
        elif i == 1:
            dataCount = line.count(",") + 1
        elif i > 1:
            break

if headerCount < dataCount:
    print("Warning: Header and data size mismatch. Columns beyond header size will be removed.")
    colCount = headerCount

# Read only the columns that the header covers.
df = pandas.read_csv('test.csv', usecols=range(colCount))
print(df)
This produces:
Warning: Header and data size mismatch. Columns beyond header size will be removed.
              datetime    A
0  2018-10-09 18:00:07  123
For completeness, here is code that does the trick when the header has more columns than the data:
import pandas

with open('test.csv', mode='w') as csv_file:
    csv_file.write("datetime,A, B, C\n")
    csv_file.write("2018-10-09 18:00:07, 123\n")

# Count the fields in the header line and in the first data line.
with open("test.csv") as csv_file:
    for i, line in enumerate(csv_file):
        if i == 0:
            headerCount = line.count(",") + 1
        elif i == 1:
            dataCount = line.count(",") + 1
            if headerCount != dataCount:
                print("Warning: Header and data size mismatch. Columns beyond header size will be removed.")
        elif i > 1:
            break

# Read only as many columns as the first data line contains.
df = pandas.read_csv('test.csv', usecols=range(dataCount))
print(df)
This gives the correct pandas object:
Warning: Header and data size mismatch. Columns beyond header size will be removed.
              datetime    A
0  2018-10-09 18:00:07  123
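One caveat with counting commas via line.count(",") is that a quoted field containing a comma would be counted as two fields. The sketch below is one possible variation, not the answerers' method: it uses the standard csv module to count fields, and takes the smaller of the header and first-data-row counts so that both mismatch directions are covered. The helper name usable_column_count is hypothetical, and the sketch assumes that the first data row is representative of the rest of the file and that the file has at least a header line.

```python
import csv
import io


def usable_column_count(csv_file):
    """Return how many leading columns appear in both the header and the first data row.

    Hypothetical helper for illustration; it only inspects the first two rows.
    """
    # skipinitialspace lets a quote that follows ", " still open a quoted field.
    reader = csv.reader(csv_file, skipinitialspace=True)
    header = next(reader)           # assumes the file is not empty
    first_row = next(reader, None)  # None when there is only a header line
    if first_row is None:
        return len(header)
    return min(len(header), len(first_row))


# More data fields than header fields, including a quoted field with a comma.
long_rows = io.StringIO('datetime,A\n2018-10-09 18:00:07, 123, "ABC, DEF", XYZ\n')
# More header fields than data fields.
short_rows = io.StringIO("datetime,A, B, C\n2018-10-09 18:00:07, 123\n")

n_long = usable_column_count(long_rows)
n_short = usable_column_count(short_rows)
print(n_long, n_short)
```

The returned count could then be passed as usecols=range(n) to pandas.read_csv, as in the answers above. If the file really does contain quoted fields preceded by a space, passing skipinitialspace=True to read_csv as well would probably be needed so that pandas splits the rows the same way.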