簡體   English   中英

FIlrer csv 表只有 2 列。 Python pandas PD.PD

[英]FIlrer csv table to have just 2 columns. Python pandas pd .pd

我得到了 csv 文件,其中包含如下行:

result,table,_start,_stop,_time,_value,_field,_measurement,device
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:35Z,44.61,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:40Z,17.33,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:45Z,41.2,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:51Z,33.49,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:56Z,55.68,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:57Z,55.68,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:02Z,25.92,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:08Z,5.71,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0

我需要讓它們看起來像這樣:

                   time  value
0  2022-10-24T12:12:35Z  44.61
1  2022-10-24T12:12:40Z  17.33
2  2022-10-24T12:12:45Z  41.20
3  2022-10-24T12:12:51Z  33.49
4  2022-10-24T12:12:56Z  55.68

我的異常檢測代碼需要它,這樣我就不必手動刪除列等。 至少不是全部。 我無法使用與收集瓦數信息的機器配合使用的程序來做到這一點。 我試過了,但它不夠用:

df = pd.read_csv('coffee_machine_2022-11-22_09_22_influxdb_data.csv')
df['_time'] = pd.to_datetime(df['_time'], format='%Y-%m-%dT%H:%M:%SZ')
df = pd.pivot(df, index = '_time', columns = '_field', values = '_value')
df.interpolate(method='linear') # not neccesary

它給出了這個 output:

            0
9      83.908
10     80.342
11     79.178
12     75.621
13     72.826
...       ...
73522  10.726
73523   5.241

這是向下投影到 pandas 生態系統中列子集的規范方法。

df = df[['_time', '_value']]

您可以簡單地使用pandas.read_csv usecols

df = pd.read_csv('coffee_machine_2022-11-22_09_22_influxdb_data.csv', usecols=["_time", "_value"])

注意:如果您需要讀取 ( .csv ) 的全部數據,並且只有 select 列的子集, Pandas核心開發人員建議您使用pandas.DataFrame.loc 否則,通過使用df = df[subset_of_cols] synthax,當您開始對(新的? )子數據幀執行一些操作時,您將收到警告:

設置復制警告:
試圖在 DataFrame 的切片副本上設置一個值。
嘗試使用 .loc[row_indexer,col_indexer] = value 代替

因此,在您的情況下,您可以使用:

df = pd.read_csv('coffee_machine_2022-11-22_09_22_influxdb_data.csv')
df = df.loc[:, ["_time", "_value"]] #instead of df[["_time", "_value"]]

另一種選擇是pandas.DataFrame.copy

df = pd.read_csv('coffee_machine_2022-11-22_09_22_influxdb_data.csv')
df = df[["_time", "_value"]].copy()

.read_csv有一個usecols參數來指定你想要在 DataFrame 中的哪些列。

df = pd.read_csv(f,header=0,usecols=['_time','_value'] )
print(df)

                  _time  _value
0  2022-10-24T12:12:35Z   44.61
1  2022-10-24T12:12:40Z   17.33
2  2022-10-24T12:12:45Z   41.20
3  2022-10-24T12:12:51Z   33.49
4  2022-10-24T12:12:56Z   55.68
5  2022-10-24T12:12:57Z   55.68
6  2022-10-24T12:13:02Z   25.92
7  2022-10-24T12:13:08Z    5.71

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM