Filter a CSV table down to just 2 columns with Python pandas

Question

I have a CSV file that contains rows like these:

result,table,_start,_stop,_time,_value,_field,_measurement,device
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:35Z,44.61,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:40Z,17.33,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:45Z,41.2,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:51Z,33.49,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:56Z,55.68,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:57Z,55.68,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:02Z,25.92,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:08Z,5.71,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0

I need them to look like this:

                   time  value
0  2022-10-24T12:12:35Z  44.61
1  2022-10-24T12:12:40Z  17.33
2  2022-10-24T12:12:45Z  41.20
3  2022-10-24T12:12:51Z  33.49
4  2022-10-24T12:12:56Z  55.68

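My anomaly detection code needs the data in this form, so that I don't have to delete columns manually, or at least not all of them. I can't do this with the program that works with the machine collecting the wattage information. I tried the following, but it isn't enough: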

import pandas as pd

df = pd.read_csv('coffee_machine_2022-11-22_09_22_influxdb_data.csv')
df['_time'] = pd.to_datetime(df['_time'], format='%Y-%m-%dT%H:%M:%SZ')
df = pd.pivot(df, index='_time', columns='_field', values='_value')
df.interpolate(method='linear')  # not necessary

It gives this output:

            0
9      83.908
10     80.342
11     79.178
12     75.621
13     72.826
...       ...
73522  10.726
73523   5.241

Answer 1

This is the canonical way to project down to a subset of columns in the pandas ecosystem:

df = df[['_time', '_value']]
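The desired output in the question uses the column names time and value rather than _time and _value. A minimal sketch of selecting the two columns and renaming them in one step is shown below; the SAMPLE string is an assumption made so the snippet runs on its own (two rows copied from the question), and the real data would come from the CSV file instead.

```python
import io

import pandas as pd

# Assumption: two rows copied from the question stand in for the real CSV file.
SAMPLE = """result,table,_start,_stop,_time,_value,_field,_measurement,device
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:35Z,44.61,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:40Z,17.33,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
"""

df = pd.read_csv(io.StringIO(SAMPLE))

# Keep only the two columns of interest, then drop the leading underscores.
subset = df[["_time", "_value"]].rename(columns={"_time": "time", "_value": "value"})
print(subset)
```

rename returns a new DataFrame, so df itself keeps all nine original columns.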

Answer 2

You can simply use the usecols parameter of pandas.read_csv:

df = pd.read_csv('coffee_machine_2022-11-22_09_22_influxdb_data.csv', usecols=["_time", "_value"])
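Since the question also converts _time to datetimes, the two steps can probably be combined at read time. The sketch below assumes the timestamps are ISO 8601 strings ending in Z, as in the sample rows; SAMPLE is a stand-in for the real file, introduced only to keep the snippet self-contained.

```python
import io

import pandas as pd

# Assumption: two rows copied from the question stand in for the real CSV file.
SAMPLE = """result,table,_start,_stop,_time,_value,_field,_measurement,device
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:35Z,44.61,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:40Z,17.33,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
"""

# usecols limits which columns are loaded; parse_dates converts _time while reading.
df = pd.read_csv(
    io.StringIO(SAMPLE),
    usecols=["_time", "_value"],
    parse_dates=["_time"],
)
print(df.dtypes)
```

Because the strings end in Z, the parsed column is timezone-aware (UTC), unlike the naive timestamps produced by the explicit format string in the question.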

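Note: if you need to read all of the data in the .csv file and only then select a subset of columns, the pandas core developers suggest using pandas.DataFrame.loc. Otherwise, with the df = df[subset_of_cols] syntax, you may get a warning once you start performing operations on the (new?) sub-dataframe:

SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

So, in your case, you can use: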

df = pd.read_csv('coffee_machine_2022-11-22_09_22_influxdb_data.csv')
df = df.loc[:, ["_time", "_value"]]  # instead of df[["_time", "_value"]]

Another option is pandas.DataFrame.copy:

df = pd.read_csv('coffee_machine_2022-11-22_09_22_influxdb_data.csv')
df = df[["_time", "_value"]].copy()
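To illustrate why an explicit copy helps, the sketch below adds a column to the copied subset and checks that the original frame is untouched. The SAMPLE data and the is_high column (with its threshold of 40) are hypothetical, chosen only for illustration; whether the warning appears without .copy() depends on the pandas version, so the example does not rely on it.

```python
import io

import pandas as pd

# Assumption: two rows copied from the question stand in for the real CSV file.
SAMPLE = """result,table,_start,_stop,_time,_value,_field,_measurement,device
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:35Z,44.61,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:40Z,17.33,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
"""

df = pd.read_csv(io.StringIO(SAMPLE))

# An explicit copy makes the subset an independent DataFrame.
subset = df[["_time", "_value"]].copy()

# Hypothetical follow-up step: flag readings above an arbitrary 40 W threshold.
subset["is_high"] = subset["_value"] > 40

print(subset)
print("is_high" in df.columns)  # the original frame did not gain the column
```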

Answer 3

.read_csv has a usecols parameter for specifying which columns you want in the DataFrame.

# f is the path to the CSV file (or an open file-like object)
df = pd.read_csv(f, header=0, usecols=['_time', '_value'])
print(df)

                  _time  _value
0  2022-10-24T12:12:35Z   44.61
1  2022-10-24T12:12:40Z   17.33
2  2022-10-24T12:12:45Z   41.20
3  2022-10-24T12:12:51Z   33.49
4  2022-10-24T12:12:56Z   55.68
5  2022-10-24T12:12:57Z   55.68
6  2022-10-24T12:13:02Z   25.92
7  2022-10-24T12:13:08Z    5.71

Solution 1
1 2022-12-06 00:49:50

Solution 2
0 2022-12-06 01:33:43

Solution 3
0 2022-12-06 01:45:21
