Loading a CSV file using pandas in Python
Here is my sample data:
2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"
2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"
2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"
I tried loading the data with pandas:
data = pd.read_csv("sample.csv",header = None)
My output is:
0 1 2
0 2017-11-27T00:29:37.698-06:00 NaN 42,00,00,00,3E,51,1B,D7,42,1C,00,00,40
1 2017-11-27T00:29:37.698-06:00 NaN 42,00,00,00,3E,51,1B,D7,42,1C,00,00,40
2 2017-11-27T00:29:37.698-06:00 NaN 42,00,00,00,3E,51,1B,D7,42,1C,00,00,40
I want to split each value in column 2 into its own column, keeping the first column as the timestamp.
My expected output is:
0 1 2 3 4....
0 2017-11-27T00:29:37.698-06:00 42 00 00 00
1 2017-11-27T00:29:37.698-06:00 42 00 00 00
2 2017-11-27T00:29:37.698-06:00 42 00 00 00
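Pass a regular expression as the sep argument, then clean up the data afterwards. A regex separator makes the parser ignore quoting, so the first and last values of the quoted field keep a stray double quote that has to be stripped.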
df = pd.read_csv(
'file.csv',
sep='"*,', # separator
header=None, # no headers
engine='python', # allows a regex with multiple characters
index_col=[0] # specify timestamp as the index
)
df.iloc[:, 1] = df.iloc[:, 1].str.strip('"').astype(int)
df.iloc[:, -1] = df.iloc[:, -1].str.strip('"').astype(int)
df
1 2 3 4 5 6 7 8 9 10 11 12 \
0
2017-11-27T00:29:37.698-06:00 NaN 42 0 0 0 3E 51 1B D7 42 1C 0
2017-11-27T00:29:37.698-06:00 NaN 42 0 0 0 3E 51 1B D7 42 1C 0
2017-11-27T00:29:37.698-06:00 NaN 42 0 0 0 3E 51 1B D7 42 1C 0
13 14
0
2017-11-27T00:29:37.698-06:00 0 40
2017-11-27T00:29:37.698-06:00 0 40
2017-11-27T00:29:37.698-06:00 0 40
To drop the columns that contain only NaNs, use dropna:
df.dropna(how='all', axis=1, inplace=True)
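The regex-separator approach above can be tried without a file on disk. The following is a minimal sketch that feeds the question's sample rows through `io.StringIO`; the `SAMPLE` name and the `dtype=str` argument are assumptions added here for illustration (the latter keeps values such as `00` as text instead of letting them be read as the integer 0), not part of the original answer.

```python
import io
import pandas as pd

# Three rows copied from the sample data in the question.
SAMPLE = (
    '2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"\n'
) * 3

df = pd.read_csv(
    io.StringIO(SAMPLE),
    sep='"*,',        # regex: optional double quotes followed by a comma
    header=None,
    engine='python',  # required for a multi-character regex separator
    index_col=0,      # timestamp becomes the index
    dtype=str,        # assumption: keep every field as text ('00' stays '00')
)

# The regex separator bypasses quote handling, so the first and last
# byte columns (labels 2 and 14) still carry a stray double quote.
df[2] = df[2].str.strip('"')
df[14] = df[14].str.strip('"')

# Column 1 came from the empty second field and holds only NaN.
df = df.dropna(how='all', axis=1)
print(df.shape)
```

Addressing the columns by label (`df[2]`, `df[14]`) rather than by position is a design choice made for this sketch; it avoids positional assignment and makes it explicit which fields are being cleaned.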
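First add the parameter parse_dates=[0] to parse the first column as datetime. Then split column 2, join the result to the original DataFrame, drop columns 1 and 2, and finally rename the new columns by adding 1 to each label: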
df = pd.read_csv("sample.csv",header = None, parse_dates=[0])
df = (df.drop([1,2], axis=1)
.join(df[2].str.split(',', expand=True)
.rename(columns = lambda x: x+1))
)
print (df)
0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 2017-11-27 06:29:37.698 42 00 00 00 3E 51 1B D7 42 1C 00 00 40
1 2017-11-27 06:29:37.698 42 00 00 00 3E 51 1B D7 42 1C 00 00 40
2 2017-11-27 06:29:37.698 42 00 00 00 3E 51 1B D7 42 1C 00 00 40
Details:
print (df[2].str.split(',', expand=True))
0 1 2 3 4 5 6 7 8 9 10 11 12
0 42 00 00 00 3E 51 1B D7 42 1C 00 00 40
1 42 00 00 00 3E 51 1B D7 42 1C 00 00 40
2 42 00 00 00 3E 51 1B D7 42 1C 00 00 40
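As a self-contained variation on the split-and-join idea, the sketch below reads the sample rows from memory, expands column 2, and parses the timestamp explicitly. The names `SAMPLE`, `byte_0` ... `byte_12` and `timestamp` are hypothetical labels chosen for this example, and `utc=True` is an assumption about how the offset should be handled.

```python
import io
import pandas as pd

# Three rows copied from the sample data in the question.
SAMPLE = (
    '2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"\n'
) * 3

# The default parser honours the quotes, so column 2 is one string per row.
raw = pd.read_csv(io.StringIO(SAMPLE), header=None)

# Expand the comma-separated string into one column per value.
parts = raw[2].str.split(',', expand=True)
parts.columns = [f'byte_{i}' for i in range(parts.shape[1])]  # hypothetical names

# utc=True converts the -06:00 offset to UTC explicitly.
timestamps = pd.to_datetime(raw[0], utc=True).rename('timestamp')

result = pd.concat([timestamps, parts], axis=1)
print(result.dtypes)
```

Named columns avoid the integer-label arithmetic of the rename step, at the cost of deviating from the 0, 1, 2, ... layout shown in the expected output.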
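If needed, you can write your own CSV parser, for example:
import csv
import pandas as pd

def read_my_csv(filename):
    # newline='' is the mode the csv module expects ('rU' no longer works in Python 3.11+)
    with open(filename, newline='') as f:
        # build csv reader
        reader = csv.reader(f)
        # row[0] is the timestamp, row[1] is empty, row[2] is the quoted value list
        for row in reader:
            yield [row[0]] + row[2].split(',')

df = pd.DataFrame(read_my_csv('csvfile.csv'))
print(df)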
0 1 2 3 4 5 6 7 8 9 10 \
0 2017-11-27T00:29:37.698-06:00 42 00 00 00 3E 51 1B D7 42 1C
1 2017-11-27T00:29:37.698-06:00 42 00 00 00 3E 51 1B D7 42 1C
2 2017-11-27T00:29:37.698-06:00 42 00 00 00 3E 51 1B D7 42 1C
11 12 13
0 00 00 40
1 00 00 40
2 00 00 40
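If the values are hexadecimal bytes, which the sample suggests but the question does not state, the custom parser is a convenient place to convert them. The sketch below is one possible variation under that assumption; `iter_rows` and `SAMPLE` are hypothetical names, and the input is read from memory so the example runs on its own.

```python
import csv
import io
import pandas as pd

# Three rows copied from the sample data in the question.
SAMPLE = (
    '2017-11-27T00:29:37.698-06:00,,"42,00,00,00,3E,51,1B,D7,42,1C,00,00,40"\n'
) * 3

def iter_rows(lines):
    """Yield [timestamp, byte0, byte1, ...] for each CSV line."""
    # csv.reader accepts any iterable of strings, not only file objects.
    for row in csv.reader(lines):
        # Assumption: every token is a two-digit hexadecimal byte.
        yield [row[0]] + [int(token, 16) for token in row[2].split(',')]

df = pd.DataFrame(iter_rows(io.StringIO(SAMPLE)))
print(df.head())
```

Converting inside the generator means the resulting columns are integers from the start, so no later cleanup pass is needed; if the values are not hexadecimal, `int(token, 16)` should be dropped or replaced.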