Filter a CSV table down to just 2 columns (Python pandas)
I have a CSV file that contains rows like these:
result,table,_start,_stop,_time,_value,_field,_measurement,device
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:35Z,44.61,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:40Z,17.33,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:45Z,41.2,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:51Z,33.49,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:56Z,55.68,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:57Z,55.68,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:02Z,25.92,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:08Z,5.71,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
I need them to look like this:
time value
0 2022-10-24T12:12:35Z 44.61
1 2022-10-24T12:12:40Z 17.33
2 2022-10-24T12:12:45Z 41.20
3 2022-10-24T12:12:51Z 33.49
4 2022-10-24T12:12:56Z 55.68
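My anomaly detection code needs the data in this shape, so that I don't have to delete columns by hand, or at least not all of them. I can't do this with the program that works with the machine collecting the wattage data. I tried the following, but it wasn't enough: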
df = pd.read_csv('coffee_machine_2022-11-22_09_22_influxdb_data.csv')
df['_time'] = pd.to_datetime(df['_time'], format='%Y-%m-%dT%H:%M:%SZ')
df = pd.pivot(df, index = '_time', columns = '_field', values = '_value')
df.interpolate(method='linear')  # not necessary
It gives this output:
0
9 83.908
10 80.342
11 79.178
12 75.621
13 72.826
... ...
73522 10.726
73523 5.241
This is the canonical way to project down to a subset of columns in the pandas ecosystem:
df = df[['_time', '_value']]
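As a minimal sketch of that selection, the snippet below reads two of the sample rows from an in-memory string (the `sample` variable stands in for the real file and is an assumption for illustration), keeps only the two columns, and renames them to match the desired `time` / `value` header.

```python
import io
import pandas as pd

# Two rows copied from the question; this in-memory string replaces the real CSV file.
sample = """result,table,_start,_stop,_time,_value,_field,_measurement,device
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:35Z,44.61,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:40Z,17.33,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
"""

df = pd.read_csv(io.StringIO(sample))

# Keep only the two columns of interest, then drop the leading underscores.
subset = df[["_time", "_value"]].rename(columns={"_time": "time", "_value": "value"})
print(subset)
```

The rename step is optional; it is only there because the question shows `time` and `value` as the target column names.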
You can simply use the usecols parameter of pandas.read_csv:
df = pd.read_csv('coffee_machine_2022-11-22_09_22_influxdb_data.csv', usecols=["_time", "_value"])
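A small sketch of the same idea on in-memory data, so it can be run without the original file. The `sample` string is an assumption standing in for the CSV, and `parse_dates` is an optional extra (not part of the answer above) that converts `_time` to datetimes while reading.

```python
import io
import pandas as pd

# Two rows copied from the question; this in-memory string replaces the real CSV file.
sample = """result,table,_start,_stop,_time,_value,_field,_measurement,device
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:35Z,44.61,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:40Z,17.33,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
"""

# usecols limits parsing to the listed columns; parse_dates converts _time on the fly.
df = pd.read_csv(
    io.StringIO(sample),
    usecols=["_time", "_value"],
    parse_dates=["_time"],
)

print(df.dtypes)
print(df)
```

Because the other seven columns are never loaded, this tends to be the lighter option when the file is large and only these two columns are needed.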
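Note: if you need to read the entire (.csv) file and only then select a subset of columns, the pandas core developers suggest using pandas.DataFrame.loc. Otherwise, with the df = df[subset_of_cols] syntax, you may get a warning once you start performing operations on the (new?) sub-DataFrame: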
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
So, in your case, you can use:
df = pd.read_csv('coffee_machine_2022-11-22_09_22_influxdb_data.csv')
df = df.loc[:, ["_time", "_value"]]  # instead of df[["_time", "_value"]]
Another option is pandas.DataFrame.copy:
df = pd.read_csv('coffee_machine_2022-11-22_09_22_influxdb_data.csv')
df = df[["_time", "_value"]].copy()
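To illustrate why an explicit copy helps, here is a minimal sketch on a small hand-built DataFrame (the frame is hypothetical; it only reuses the question's column names). It modifies the copied subset and checks that the original frame is left untouched.

```python
import pandas as pd

# Small hypothetical frame with the same column names as the question's CSV.
df = pd.DataFrame({
    "_time": ["2022-10-24T12:12:35Z", "2022-10-24T12:12:40Z"],
    "_value": [44.61, 17.33],
    "_field": ["power", "power"],
})

# An explicit copy is an independent object, so later assignments are unambiguous.
subset = df[["_time", "_value"]].copy()
subset["_value"] = subset["_value"] * 2

print(subset)
print(df["_value"].tolist())  # the original values are unchanged
```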
.read_csv has a usecols parameter to specify which columns you want in the DataFrame.
df = pd.read_csv(f, header=0, usecols=['_time', '_value'])  # f is the CSV path or an open file object
print(df)
_time _value
0 2022-10-24T12:12:35Z 44.61
1 2022-10-24T12:12:40Z 17.33
2 2022-10-24T12:12:45Z 41.20
3 2022-10-24T12:12:51Z 33.49
4 2022-10-24T12:12:56Z 55.68
5 2022-10-24T12:12:57Z 55.68
6 2022-10-24T12:13:02Z 25.92
7 2022-10-24T12:13:08Z 5.71