
Pandas: how to shift a column by a datetime offset onto datetimes not in the index

I want to shift a pandas column by a time offset and reindex the dataframe to accommodate the shift. Take the following dataframe:

import pandas as pd

df = pd.DataFrame({"Col1": [10, 20, 15, 30, 45],
                   "Col2": [13, 23, 18, 33, 48],
                   "Col3": [17, 27, 22, 37, 52]},
                  index=pd.date_range("11:00", "13:00", freq="30min"))

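I want to shift Col1 by 15 minutes and update the dataframe's datetime index to make room for the new values. However, if I shift Col1 by 15 minutes, the shifted timestamps no longer line up with the index, so the whole column is simply set to NaN:

df["Col1"] = df["Col1"].shift(15, freq="min")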
print(df)


                      Col1  Col2  Col3
2021-03-25 11:00:00   NaN    13    17
2021-03-25 11:30:00   NaN    23    27
2021-03-25 12:00:00   NaN    18    22
2021-03-25 12:30:00   NaN    33    37
2021-03-25 13:00:00   NaN    48    52
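
The all-NaN column above comes from index alignment on assignment. The following is a minimal sketch of that mechanism, assuming a fixed date in place of today's date and only two of the columns; the names `shifted`, `common` and `aligned` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"Col1": [10, 20, 15, 30, 45],
                   "Col2": [13, 23, 18, 33, 48]},
                  index=pd.date_range("2021-03-25 11:00", "2021-03-25 13:00", freq="30min"))

# With freq given, shift moves the index labels and leaves the values untouched.
shifted = df["Col1"].shift(15, freq="min")
print(shifted.index[0])  # the first label is now 11:15 instead of 11:00

# Column assignment aligns on index labels. None of the shifted labels
# exist in df.index, so there is nothing to carry over.
common = shifted.index.intersection(df.index)
print(len(common))

# This is effectively what df["Col1"] = shifted stores.
aligned = shifted.reindex(df.index)
print(aligned.isna().all())
```

So the values are not lost by the shift itself; they are dropped when the result is aligned back onto an index that has no 15-minute labels.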

I would like the dataframe to look like this:

                     Col1  Col2  Col3
2021-03-25 11:00:00   NaN  13.0  17.0
2021-03-25 11:15:00  10.0   NaN   NaN
2021-03-25 11:30:00   NaN  23.0  27.0
2021-03-25 11:45:00  20.0   NaN   NaN
2021-03-25 12:00:00   NaN  18.0  22.0
2021-03-25 12:15:00  15.0   NaN   NaN
2021-03-25 12:30:00   NaN  33.0  37.0
2021-03-25 12:45:00  30.0   NaN   NaN
2021-03-25 13:00:00   NaN  48.0  52.0
2021-03-25 13:15:00  45.0   NaN   NaN

(which I created with the following code:)

df = pd.DataFrame({"Col1": [float('nan'), 10, float('nan'), 20, float('nan'), 15, float('nan'), 30, float('nan'), 45],
                   "Col2": [13, float('nan'),  23, float('nan'), 18, float('nan'), 33, float('nan'), 48, float('nan')],
                   "Col3": [17, float('nan'), 27, float('nan'), 22, float('nan'), 37, float('nan'), 52, float('nan')]},
                  index=pd.date_range("11:00", "13:15", freq="15min"))

Any suggestions would be much appreciated!

Try concat:

out = pd.concat([df.pop("Col1").shift(15, freq="min"), df], axis=1)
print(out)

                     Col1  Col2  Col3
2021-03-24 11:00:00   NaN  13.0  17.0
2021-03-24 11:15:00  10.0   NaN   NaN
2021-03-24 11:30:00   NaN  23.0  27.0
2021-03-24 11:45:00  20.0   NaN   NaN
2021-03-24 12:00:00   NaN  18.0  22.0
2021-03-24 12:15:00  15.0   NaN   NaN
2021-03-24 12:30:00   NaN  33.0  37.0
2021-03-24 12:45:00  30.0   NaN   NaN
2021-03-24 13:00:00   NaN  48.0  52.0
2021-03-24 13:15:00  45.0   NaN   NaN
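
Note that `df.pop("Col1")` removes the column from `df` in place. If the original frame is still needed afterwards, a non-mutating variant might look like the sketch below. It assumes a fixed date for reproducibility, and the explicit `sort_index()` is a precaution in case the installed pandas version does not sort the combined datetime index on its own.

```python
import pandas as pd

df = pd.DataFrame({"Col1": [10, 20, 15, 30, 45],
                   "Col2": [13, 23, 18, 33, 48],
                   "Col3": [17, 27, 22, 37, 52]},
                  index=pd.date_range("2021-03-25 11:00", "2021-03-25 13:00", freq="30min"))

# Shift a copy of the column; df itself is left unchanged.
shifted = df["Col1"].shift(15, freq="min")

# axis=1 concat takes the union of both indexes (outer join),
# which creates the missing 15-minute rows.
out = pd.concat([shifted, df.drop(columns="Col1")], axis=1).sort_index()
print(out)
```

The trade-off is memory: `drop` builds a new frame without Col1, whereas `pop` avoids that copy.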

BENY's answer works, but I found it slow on my very large dataframe. So I did the following instead, which works for me and is faster:

# Build the dense 15-minute index first, then shift Col1 along it
dt_index2 = pd.date_range(df.index[0], df.index[-1], freq="15min")
df = df.reindex(dt_index2)
df["Col1"] = df["Col1"].shift(15, freq="min")
print(df)

                     Col1  Col2  Col3
2021-03-25 11:00:00   NaN  13.0  17.0
2021-03-25 11:15:00  10.0   NaN   NaN
2021-03-25 11:30:00   NaN  23.0  27.0
2021-03-25 11:45:00  20.0   NaN   NaN
2021-03-25 12:00:00   NaN  18.0  22.0
2021-03-25 12:15:00  15.0   NaN   NaN
2021-03-25 12:30:00   NaN  33.0  37.0
2021-03-25 12:45:00  30.0   NaN   NaN
2021-03-25 13:00:00   NaN  48.0  52.0
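
The output above stops at 13:00, so the last shifted value (45 at 13:15) falls outside the new index and is dropped. If that row is wanted, one possible adjustment is to extend the dense index by the shift amount. This is a sketch under the assumption of a fixed date; `step`, `dense_index` and `out` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"Col1": [10, 20, 15, 30, 45],
                   "Col2": [13, 23, 18, 33, 48],
                   "Col3": [17, 27, 22, 37, 52]},
                  index=pd.date_range("2021-03-25 11:00", "2021-03-25 13:00", freq="30min"))

step = pd.Timedelta(minutes=15)

# Extend the end of the dense index by one step so the last shifted label fits.
dense_index = pd.date_range(df.index[0], df.index[-1] + step, freq=step)

# Shift the original (short) column, then let assignment align it by label.
shifted = df["Col1"].shift(freq=step)
out = df.reindex(dense_index)
out["Col1"] = shifted
print(out)
```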

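Edit:

The reason I wanted to do this is that each column needs to be shifted by its position (so Col1 by 1 second, Col2 by 2 seconds, and so on) and then everything merged into a single column. For that, BENY's answer combined with mine works better, because it keeps memory usage to a minimum. In that case you should do the following: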

dt_index2 = pd.date_range(df.index[0], df.index[-1], freq="s")
# A single float column on the dense one-second index
df2 = pd.DataFrame(columns=["concentration"], index=dt_index2, dtype=float)
# pop() removes each column from df as it is consumed
df2["concentration"] = df2["concentration"].add(df.pop("Col1").shift(1, freq="s"), fill_value=0)
df2["concentration"] = df2["concentration"].add(df.pop("Col2").shift(2, freq="s"), fill_value=0)
df2["concentration"] = df2["concentration"].add(df.pop("Col3").shift(3, freq="s"), fill_value=0)

Doing it this way means only one column ever lives on the dense index, whereas reindexing the original dataframe would leave you with 3 columns on it.
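
For more than a handful of columns, the same idea can be written as a loop. This is a sketch rather than the code used above: it assumes a fixed date, extends the index by the largest shift so that no value falls off the end, and uses `pd.Timedelta` offsets instead of frequency strings.

```python
import pandas as pd

df = pd.DataFrame({"Col1": [10, 20, 15, 30, 45],
                   "Col2": [13, 23, 18, 33, 48],
                   "Col3": [17, 27, 22, 37, 52]},
                  index=pd.date_range("2021-03-25 11:00", "2021-03-25 13:00", freq="30min"))

one_second = pd.Timedelta(seconds=1)
max_shift = one_second * len(df.columns)

# Dense one-second index, extended so the largest shift still fits.
dense_index = pd.date_range(df.index[0], df.index[-1] + max_shift, freq=one_second)
concentration = pd.Series(float("nan"), index=dense_index)

# The n-th column (counting from 1) is shifted by n seconds.
for offset, col in enumerate(list(df.columns), start=1):
    shifted = df.pop(col).shift(freq=offset * one_second)
    # fill_value=0 treats a missing value on one side as 0;
    # timestamps missing on both sides stay NaN.
    concentration = concentration.add(shifted, fill_value=0)

concentration = concentration.rename("concentration")
print(concentration.dropna())
```

As in the answer above, `pop` empties `df` as the loop runs, so only the single dense column remains at the end.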

Another option is to use resample:

df = df.resample("15min").first()

                     Col1  Col2  Col3
2021-03-24 11:00:00  10.0  13.0  17.0
2021-03-24 11:15:00   NaN   NaN   NaN
2021-03-24 11:30:00  20.0  23.0  27.0
2021-03-24 11:45:00   NaN   NaN   NaN
2021-03-24 12:00:00  15.0  18.0  22.0
2021-03-24 12:15:00   NaN   NaN   NaN
2021-03-24 12:30:00  30.0  33.0  37.0
2021-03-24 12:45:00   NaN   NaN   NaN
2021-03-24 13:00:00  45.0  48.0  52.0

Then you can simply shift Col1:

df.Col1 = df.Col1.shift(1)

                     Col1  Col2  Col3
2021-03-24 11:00:00   NaN  13.0  17.0
2021-03-24 11:15:00  10.0   NaN   NaN
2021-03-24 11:30:00   NaN  23.0  27.0
2021-03-24 11:45:00  20.0   NaN   NaN
2021-03-24 12:00:00   NaN  18.0  22.0
2021-03-24 12:15:00  15.0   NaN   NaN
2021-03-24 12:30:00   NaN  33.0  37.0
2021-03-24 12:45:00  30.0   NaN   NaN
2021-03-24 13:00:00   NaN  48.0  52.0

Edit: this appears to be comparable in speed to @Recessive's answer:

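def resampling(df):
    # Upsample to 15-minute bins, then shift Col1 down by one row
    df = df.resample("15min").first()
    df.Col1 = df.Col1.shift(1)
    return df

def reindexing(df):
    # Reindex onto a dense 15-minute index, then shift Col1 by 15 minutes
    dt_index2 = pd.date_range(df.index[0], df.index[-1], freq="15min")
    df = df.reindex(dt_index2)
    df["Col1"] = df["Col1"].shift(15, freq="min")
    return df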

%timeit resampling(df)
1.37 ms ± 10.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit reindexing(df)
1.11 ms ± 31.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
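
As a sanity check that the two approaches being timed really produce the same frame, something like the following could be run. It is a sketch assuming a fixed date, with both functions returning their result. Note that `shift(1)` in the resample version is positional: it equals a 15-minute shift only because the resample bin width is also 15 minutes.

```python
import pandas as pd

df = pd.DataFrame({"Col1": [10, 20, 15, 30, 45],
                   "Col2": [13, 23, 18, 33, 48],
                   "Col3": [17, 27, 22, 37, 52]},
                  index=pd.date_range("2021-03-25 11:00", "2021-03-25 13:00", freq="30min"))

def resampling(frame):
    out = frame.resample("15min").first()
    out["Col1"] = out["Col1"].shift(1)  # one row == one 15-minute bin
    return out

def reindexing(frame):
    dense = pd.date_range(frame.index[0], frame.index[-1], freq="15min")
    out = frame.reindex(dense)
    out["Col1"] = out["Col1"].shift(15, freq="min")
    return out

a = resampling(df)
b = reindexing(df)

# Raises AssertionError if values, dtypes or index labels differ.
pd.testing.assert_frame_equal(a, b, check_freq=False)
print(a)
```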

