Pandas: how to shift a column by a time offset onto timestamps not in the index
I want to shift a pandas column by a time offset and have the dataframe reindexed to accommodate the shift. Take the following dataframe:
import pandas as pd

df = pd.DataFrame({"Col1": [10, 20, 15, 30, 45],
                   "Col2": [13, 23, 18, 33, 48],
                   "Col3": [17, 27, 22, 37, 52]},
                  index=pd.date_range("11:00", "13:00", freq="30min"))
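I want to shift Col1 by 15 minutes and have the dataframe's datetime index updated to make room for the new values. However, if I shift Col1 by 15 minutes, the shifted timestamps no longer line up with the index, so the whole column is simply set to NaN: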
df["Col1"] = df["Col1"].shift(15, freq="T")
print(df)
                     Col1  Col2  Col3
2021-03-25 11:00:00   NaN    13    17
2021-03-25 11:30:00   NaN    23    27
2021-03-25 12:00:00   NaN    18    22
2021-03-25 12:30:00   NaN    33    37
2021-03-25 13:00:00   NaN    48    52
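To make the cause of the NaN column concrete, here is a minimal sketch. It assumes a fixed date in the index (the original uses today's date) and uses the "min" alias in place of "T". shift with freq moves the index labels rather than the values, and column assignment then aligns on labels that no longer exist in df.index.

```python
import pandas as pd

df = pd.DataFrame(
    {"Col1": [10, 20, 15, 30, 45], "Col2": [13, 23, 18, 33, 48]},
    index=pd.date_range("2021-03-25 11:00", "2021-03-25 13:00", freq="30min"),
)

# With freq given, shift moves the index labels and leaves the values alone.
shifted = df["Col1"].shift(15, freq="min")

# None of the shifted labels (11:15, 11:45, ...) occur in the original index.
overlap = shifted.index.intersection(df.index)

# Column assignment aligns on the index, which is equivalent to this reindex,
# so every row of the original index receives NaN.
aligned = shifted.reindex(df.index)
print(aligned)
```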
I would like the dataframe to look like this:
                     Col1  Col2  Col3
2021-03-25 11:00:00   NaN  13.0  17.0
2021-03-25 11:15:00  10.0   NaN   NaN
2021-03-25 11:30:00   NaN  23.0  27.0
2021-03-25 11:45:00  20.0   NaN   NaN
2021-03-25 12:00:00   NaN  18.0  22.0
2021-03-25 12:15:00  15.0   NaN   NaN
2021-03-25 12:30:00   NaN  33.0  37.0
2021-03-25 12:45:00  30.0   NaN   NaN
2021-03-25 13:00:00   NaN  48.0  52.0
2021-03-25 13:15:00  45.0   NaN   NaN
(which I created with the following code:)
df = pd.DataFrame({"Col1": [float('nan'), 10, float('nan'), 20, float('nan'), 15, float('nan'), 30, float('nan'), 45],
"Col2": [13, float('nan'), 23, float('nan'), 18, float('nan'), 33, float('nan'), 48, float('nan')],
"Col3": [17, float('nan'), 27, float('nan'), 22, float('nan'), 37, float('nan'), 52, float('nan')]},
index=pd.date_range("11:00", "13:15", freq="15min"))
Any suggestions would be greatly appreciated!
Try concat:
out = pd.concat([df.pop("Col1").shift(15, freq="T"),df],axis=1)
Out[478]:
                     Col1  Col2  Col3
2021-03-24 11:00:00   NaN  13.0  17.0
2021-03-24 11:15:00  10.0   NaN   NaN
2021-03-24 11:30:00   NaN  23.0  27.0
2021-03-24 11:45:00  20.0   NaN   NaN
2021-03-24 12:00:00   NaN  18.0  22.0
2021-03-24 12:15:00  15.0   NaN   NaN
2021-03-24 12:30:00   NaN  33.0  37.0
2021-03-24 12:45:00  30.0   NaN   NaN
2021-03-24 13:00:00   NaN  48.0  52.0
2021-03-24 13:15:00  45.0   NaN   NaN
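Note that df.pop removes Col1 from the original dataframe. If df has to stay intact, a variant along the following lines may work. It is only a sketch: the fixed date, the use of drop instead of pop, and the trailing sort_index are my additions rather than part of the answer above. The sort_index call is there so that the row order does not depend on how concat orders the union of the two indexes.

```python
import pandas as pd

df = pd.DataFrame(
    {"Col1": [10, 20, 15, 30, 45],
     "Col2": [13, 23, 18, 33, 48],
     "Col3": [17, 27, 22, 37, 52]},
    index=pd.date_range("2021-03-25 11:00", "2021-03-25 13:00", freq="30min"),
)

# Shift the labels of Col1 by 15 minutes without touching df itself.
shifted = df["Col1"].shift(15, freq="min")

# Outer-join the shifted column with the remaining columns on the index.
out = pd.concat([shifted, df.drop(columns="Col1")], axis=1).sort_index()
print(out)
```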
BENY's answer works, but I found it slow on my very large dataframe. So I did the following instead, which works for me and is faster:
dt_index2 = pd.date_range(df.index[0], df.index[-1], freq="15min")
df = df.reindex(dt_index2)
df["Col1"] = df["Col1"].shift(15, freq="T")
print(df)
                     Col1  Col2  Col3
2021-03-25 11:00:00   NaN  13.0  17.0
2021-03-25 11:15:00  10.0   NaN   NaN
2021-03-25 11:30:00   NaN  23.0  27.0
2021-03-25 11:45:00  20.0   NaN   NaN
2021-03-25 12:00:00   NaN  18.0  22.0
2021-03-25 12:15:00  15.0   NaN   NaN
2021-03-25 12:30:00   NaN  33.0  37.0
2021-03-25 12:45:00  30.0   NaN   NaN
2021-03-25 13:00:00   NaN  48.0  52.0
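One thing to watch in the output above: it stops at 13:00, so the last shifted value (45 at 13:15) falls outside the new index and is dropped. If that row matters, the dense index can be extended by the offset. The following is a sketch under my own assumptions (a fixed date, a pd.Timedelta offset, and a separate out frame), not the code from the answer:

```python
import pandas as pd

df = pd.DataFrame(
    {"Col1": [10, 20, 15, 30, 45],
     "Col2": [13, 23, 18, 33, 48],
     "Col3": [17, 27, 22, 37, 52]},
    index=pd.date_range("2021-03-25 11:00", "2021-03-25 13:00", freq="30min"),
)

offset = pd.Timedelta(minutes=15)

# Dense index that runs one offset past the last original timestamp.
dense = pd.date_range(df.index[0], df.index[-1] + offset, freq=offset)

out = df.reindex(dense)
# Every shifted label now exists in the index, so alignment keeps all values.
out["Col1"] = df["Col1"].shift(freq=offset)
print(out)
```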
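Edit:
The reason I wanted to do this is that each column needs to be offset according to its position (so Col1 is offset by 1 second, Col2 by 2 seconds, and so on), and then everything is merged into a single column. For that, BENY's answer combined with mine works better, because it keeps memory usage to a minimum. In that case, you should do the following: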
dt_index2 = pd.date_range(df.index[0], df.index[-1], freq="S")
df2 = pd.DataFrame(columns=["concentration"], index=dt_index2)
df2["concentration"] = df2["concentration"].add(df.pop("Col1").shift(1, freq="S"), fill_value=0)
df2["concentration"] = df2["concentration"].add(df.pop("Col2").shift(2, freq="S"), fill_value=0)
df2["concentration"] = df2["concentration"].add(df.pop("Col3").shift(3, freq="S"), fill_value=0)
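Doing it this way ensures that only one column carries the dense index, whereas reindexing the other dataframe would leave you with 3 columns on the dense index.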
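If the dense one-second index is not needed at all, a sparser variant might be enough. This is a sketch of a different technique than the add calls above: it concatenates the shifted columns into one series. The fixed date and the pieces and combined names are my own, and it assumes the shifted timestamps never collide (true here, since the rows are 30 minutes apart and the offsets are 1 to 3 seconds).

```python
import pandas as pd

df = pd.DataFrame(
    {"Col1": [10, 20, 15, 30, 45],
     "Col2": [13, 23, 18, 33, 48],
     "Col3": [17, 27, 22, 37, 52]},
    index=pd.date_range("2021-03-25 11:00", "2021-03-25 13:00", freq="30min"),
)

# Shift the k-th column by k seconds (Col1 by 1 s, Col2 by 2 s, Col3 by 3 s).
pieces = [
    df[col].shift(freq=pd.Timedelta(seconds=k))
    for k, col in enumerate(df.columns, start=1)
]

# Stack the shifted columns into one series ordered by timestamp.
combined = pd.concat(pieces).sort_index().rename("concentration")
print(combined.head(6))
```

If timestamps could collide, something like combined.groupby(level=0).sum() would be needed to merge the duplicates, and the result can still be reindexed against a dense pd.date_range afterwards.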
Another option is to use resample:
df = df.resample("15T").first()
                     Col1  Col2  Col3
2021-03-24 11:00:00  10.0  13.0  17.0
2021-03-24 11:15:00   NaN   NaN   NaN
2021-03-24 11:30:00  20.0  23.0  27.0
2021-03-24 11:45:00   NaN   NaN   NaN
2021-03-24 12:00:00  15.0  18.0  22.0
2021-03-24 12:15:00   NaN   NaN   NaN
2021-03-24 12:30:00  30.0  33.0  37.0
2021-03-24 12:45:00   NaN   NaN   NaN
2021-03-24 13:00:00  45.0  48.0  52.0
You can then simply shift Col1:
df.Col1 = df.Col1.shift(1)
                     Col1  Col2  Col3
2021-03-24 11:00:00   NaN  13.0  17.0
2021-03-24 11:15:00  10.0   NaN   NaN
2021-03-24 11:30:00   NaN  23.0  27.0
2021-03-24 11:45:00  20.0   NaN   NaN
2021-03-24 12:00:00   NaN  18.0  22.0
2021-03-24 12:15:00  15.0   NaN   NaN
2021-03-24 12:30:00   NaN  33.0  37.0
2021-03-24 12:45:00  30.0   NaN   NaN
2021-03-24 13:00:00   NaN  48.0  52.0
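The positional shift(1) works here because the 15-minute offset is exactly one step of the resampled grid. For other offsets, the number of rows to shift can be derived from the offset and the step, as in the sketch below (the fixed date and the step, offset and periods names are mine). As in the output above, the value pushed past the last row (45) is dropped.

```python
import pandas as pd

df = pd.DataFrame(
    {"Col1": [10, 20, 15, 30, 45],
     "Col2": [13, 23, 18, 33, 48],
     "Col3": [17, 27, 22, 37, 52]},
    index=pd.date_range("2021-03-25 11:00", "2021-03-25 13:00", freq="30min"),
)

step = pd.Timedelta(minutes=15)    # spacing of the resampled grid
offset = pd.Timedelta(minutes=15)  # how far Col1 should move

# A positional shift is only valid if the offset is a whole number of steps.
if offset % step != pd.Timedelta(0):
    raise ValueError("offset must be a multiple of the resampling step")
periods = offset // step

out = df.resample(step).first()
out["Col1"] = out["Col1"].shift(periods)
print(out)
```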
Edit: this seems comparable in speed to @Recessive's answer:
def resampling(df):
    df = df.resample("15T").first()
    df.Col1 = df.Col1.shift(1)
    return df

def reindexing(df):
    dt_index2 = pd.date_range(df.index[0], df.index[-1], freq="15min")
    df = df.reindex(dt_index2)
    df["Col1"] = df["Col1"].shift(15, freq="T")
    return df
%timeit resampling(df)
1.37 ms ± 10.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit reindexing(df)
1.11 ms ± 31.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)