Calculate the actual duration of a pandas rolling offset window
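Pandas has a rolling() function for performing calculations over windows of Series and DataFrame objects. If the index is datetime-like (or you use the on parameter to refer to a datetime column), you can call rolling() with a time offset (e.g. 2 seconds or 7 days) instead of a fixed number of rows.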
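As a minimal sketch of what an offset window contains (assuming a sorted DatetimeIndex and the default closed='right', so the window ending at time t covers (t - 5s, t]), counting the rows in each '5s' window over the question's timestamps shows that windows hold different numbers of rows and therefore span different actual durations:

```python
import pandas as pd

# Timestamps from the question; offset windows need them sorted ascending.
tm = pd.to_datetime([
    "2013-01-01 09:00:00", "2013-01-01 09:00:02", "2013-01-01 09:00:03",
    "2013-01-01 09:00:05", "2013-01-01 09:00:06", "2013-01-01 09:00:10",
    "2013-01-01 09:00:12", "2013-01-01 09:00:16", "2013-01-01 09:00:19",
    "2013-01-01 09:00:20",
])
s = pd.Series(range(len(tm)), index=tm)

# Each row's window is the half-open interval (t - 5s, t] ending at that row.
counts = s.rolling("5s").count()
# Rows per window: 1, 2, 3, 3, 4, 2, 2, 2, 2, 3
print(counts.tolist())
```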
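I want to calculate the actual duration of each window, rather than the offset. The best approach I could come up with is to copy the timestamp column, use one copy as the index (in the code below, via on='Tm'), and then use rolling() to get the minimum and maximum of the other copy. However, the new timestamp column is dropped after calling rolling().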
import pandas as pd
df = pd.DataFrame({'B': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                   'Tm': [pd.Timestamp('20130101 09:00:00'),
                          pd.Timestamp('20130101 09:00:02'),
                          pd.Timestamp('20130101 09:00:03'),
                          pd.Timestamp('20130101 09:00:05'),
                          pd.Timestamp('20130101 09:00:06'),
                          pd.Timestamp('20130101 09:00:10'),
                          pd.Timestamp('20130101 09:00:12'),
                          pd.Timestamp('20130101 09:00:16'),
                          pd.Timestamp('20130101 09:00:19'),
                          pd.Timestamp('20130101 09:00:20')]})
df['t'] = df['Tm']
print(df)
max_times = df.rolling('5s', on='Tm').max()
min_times = df.rolling('5s', on='Tm').min()
print(max_times)
print((max_times - min_times).astype('timedelta64[s]'))
Output:
B Tm t
0 0 2013-01-01 09:00:00 2013-01-01 09:00:00
1 1 2013-01-01 09:00:02 2013-01-01 09:00:02
2 2 2013-01-01 09:00:03 2013-01-01 09:00:03
3 3 2013-01-01 09:00:05 2013-01-01 09:00:05
4 4 2013-01-01 09:00:06 2013-01-01 09:00:06
5 5 2013-01-01 09:00:10 2013-01-01 09:00:10
6 6 2013-01-01 09:00:12 2013-01-01 09:00:12
7 7 2013-01-01 09:00:16 2013-01-01 09:00:16
8 8 2013-01-01 09:00:19 2013-01-01 09:00:19
9 9 2013-01-01 09:00:20 2013-01-01 09:00:20
B Tm
0 0.0 2013-01-01 09:00:00
1 1.0 2013-01-01 09:00:02
2 2.0 2013-01-01 09:00:03
3 3.0 2013-01-01 09:00:05
4 4.0 2013-01-01 09:00:06
5 5.0 2013-01-01 09:00:10
6 6.0 2013-01-01 09:00:12
7 7.0 2013-01-01 09:00:16
8 8.0 2013-01-01 09:00:19
9 9.0 2013-01-01 09:00:20
B Tm
0 00:00:00 0.0
1 00:00:01 0.0
2 00:00:02 0.0
3 00:00:02 0.0
4 00:00:03 0.0
5 00:00:01 0.0
6 00:00:01 0.0
7 00:00:01 0.0
8 00:00:01 0.0
9 00:00:02 0.0
Surely there must be a more elegant (and practical) technique?
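One way to keep the min/max idea from the question working is to make the timestamp copy numeric, because rolling aggregations such as max() and min() operate on numeric data and do not aggregate a datetime column. A minimal sketch, assuming sorted timestamps; storing them as float seconds relative to the first timestamp is an illustrative choice that keeps the numbers small:

```python
import pandas as pd

# Same timestamps as in the question, used as a sorted DatetimeIndex.
tm = pd.to_datetime([
    "2013-01-01 09:00:00", "2013-01-01 09:00:02", "2013-01-01 09:00:03",
    "2013-01-01 09:00:05", "2013-01-01 09:00:06", "2013-01-01 09:00:10",
    "2013-01-01 09:00:12", "2013-01-01 09:00:16", "2013-01-01 09:00:19",
    "2013-01-01 09:00:20",
])

# Numeric copy of the timestamps: seconds elapsed since the first one.
secs = pd.Series((tm - tm[0]).total_seconds().to_numpy(), index=tm)

win = secs.rolling("5s")
# Actual duration = latest minus earliest timestamp inside each window.
durations = win.max() - win.min()
print(durations.tolist())  # [0.0, 2.0, 3.0, 3.0, 4.0, 4.0, 2.0, 4.0, 3.0, 4.0]
```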
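I achieved this by defining a function that takes a window (the slice of data that rolling() passes to it), converts the window's index to integers, and returns the difference between the maximum and minimum of that index array. I then call rolling() followed by apply(), which lets you specify the function to run on each window. The documentation for apply() is here: https://pandas.pydata.org/docs/reference/api/pandas.core.window.rolling.Rolling.apply.html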
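The relevant apply() parameter here is raw: with raw=False each window reaches the function as a Series that keeps its slice of the DatetimeIndex, whereas raw=True passes a bare NumPy array with no index. A minimal sketch that relies on this, assuming a sorted index so that the first and last labels are the earliest and latest timestamps (window_span is a hypothetical helper name):

```python
import numpy as np
import pandas as pd

tm = pd.to_datetime([
    "2013-01-01 09:00:00", "2013-01-01 09:00:02", "2013-01-01 09:00:03",
    "2013-01-01 09:00:05", "2013-01-01 09:00:06", "2013-01-01 09:00:10",
    "2013-01-01 09:00:12", "2013-01-01 09:00:16", "2013-01-01 09:00:19",
    "2013-01-01 09:00:20",
])
s = pd.Series(np.zeros(len(tm)), index=tm)

def window_span(x: pd.Series) -> float:
    # With raw=False, x is a Series whose index holds this window's timestamps.
    # On a sorted index, last minus first equals max minus min.
    return (x.index[-1] - x.index[0]).total_seconds()

spans = s.rolling("5s").apply(window_span, raw=False)
print(spans.tolist())  # [0.0, 2.0, 3.0, 3.0, 4.0, 4.0, 2.0, 4.0, 3.0, 4.0]
```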
Example:
import pandas as pd
df = pd.DataFrame({'B': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                   'Tm': [pd.Timestamp('20130101 09:00:00'),
                          pd.Timestamp('20130101 09:00:02'),
                          pd.Timestamp('20130101 09:00:03'),
                          pd.Timestamp('20130101 09:00:05'),
                          pd.Timestamp('20130101 09:00:06'),
                          pd.Timestamp('20130101 09:00:10'),
                          pd.Timestamp('20130101 09:00:12'),
                          pd.Timestamp('20130101 09:00:16'),
                          pd.Timestamp('20130101 09:00:19'),
                          pd.Timestamp('20130101 09:00:20')]})
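def duration(X):
    # X is one window as a Series (raw=False), so its DatetimeIndex is available.
    ind = pd.to_numeric(X.index) * 10**-9  # Convert from nanoseconds to seconds.
    return ind.max() - ind.min()

df = df.set_index("Tm")
print(df)
durations = df.rolling("5s").apply(duration, raw=False)
df = df.reset_index()  # Restore Tm as a regular column.
print(durations)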
Output:
B
Tm
2013-01-01 09:00:00 0
2013-01-01 09:00:02 0
2013-01-01 09:00:03 0
2013-01-01 09:00:05 0
2013-01-01 09:00:06 0
2013-01-01 09:00:10 0
2013-01-01 09:00:12 0
2013-01-01 09:00:16 0
2013-01-01 09:00:19 0
2013-01-01 09:00:20 0
B
Tm
2013-01-01 09:00:00 0.0
2013-01-01 09:00:02 2.0
2013-01-01 09:00:03 3.0
2013-01-01 09:00:05 3.0
2013-01-01 09:00:06 4.0
2013-01-01 09:00:10 4.0
2013-01-01 09:00:12 2.0
2013-01-01 09:00:16 4.0
2013-01-01 09:00:19 3.0
2013-01-01 09:00:20 4.0
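If apply() is too slow on large frames, a vectorized alternative (a different technique from the answer above, not part of it) is to locate each window's first row with NumPy's searchsorted. A sketch, assuming sorted timestamps and the default closed='right' window, so a row at time t covers (t - offset, t]:

```python
import numpy as np
import pandas as pd

tm = pd.to_datetime([
    "2013-01-01 09:00:00", "2013-01-01 09:00:02", "2013-01-01 09:00:03",
    "2013-01-01 09:00:05", "2013-01-01 09:00:06", "2013-01-01 09:00:10",
    "2013-01-01 09:00:12", "2013-01-01 09:00:16", "2013-01-01 09:00:19",
    "2013-01-01 09:00:20",
])
t = tm.to_numpy()              # datetime64[ns] array, assumed sorted ascending
offset = np.timedelta64(5, "s")

# The first row inside (t - offset, t] is the first timestamp strictly
# greater than t - offset, which searchsorted(side="right") returns.
start = np.searchsorted(t, t - offset, side="right")

# Duration = current timestamp minus the earliest timestamp in its window.
durations = pd.Series((t - t[start]) / np.timedelta64(1, "s"), index=tm)
print(durations.tolist())  # [0.0, 2.0, 3.0, 3.0, 4.0, 4.0, 2.0, 4.0, 3.0, 4.0]
```

This reproduces the apply() output above without calling a Python function per window; it only holds for right-closed windows on a monotonic index.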