pandas dataframe window using date index
I have this data in pandas:
data.tail(15)
open high low close vwap
date
2018-11-20 18:45:00 176.73 176.95 176.54 176.89 176.582983
2018-11-20 18:46:00 176.89 177.02 176.81 176.81 176.603020
2018-11-20 18:47:00 176.80 176.80 176.43 176.43 176.612706
2018-11-20 18:48:00 176.45 176.46 176.21 176.21 176.599967
2018-11-20 18:49:00 176.22 176.32 176.14 176.26 176.586624
2018-11-20 18:50:00 176.26 176.38 176.23 176.28 176.577114
2018-11-20 18:51:00 176.31 176.43 176.20 176.20 176.562641
2018-11-20 18:52:00 176.22 176.25 176.15 176.18 176.544664
2018-11-20 18:53:00 176.19 176.19 175.97 176.00 176.506937
2018-11-20 18:54:00 176.00 176.30 175.97 176.30 176.493768
2018-11-20 18:55:00 176.29 176.92 176.11 176.91 176.518353
2018-11-20 18:56:00 176.92 177.03 176.67 176.76 176.554964
2018-11-20 18:57:00 176.78 176.89 176.74 176.76 176.566201
2018-11-20 18:58:00 176.77 176.87 176.56 176.65 176.571326
2018-11-20 18:59:00 176.65 177.17 176.59 176.94 176.681413
I need to split it into sub-dataframes of 5 rows each, for example:
1:
2018-11-20 18:45:00 176.73 176.95 176.54 176.89 176.582983
2018-11-20 18:46:00 176.89 177.02 176.81 176.81 176.603020
2018-11-20 18:47:00 176.80 176.80 176.43 176.43 176.612706
2018-11-20 18:48:00 176.45 176.46 176.21 176.21 176.599967
2018-11-20 18:49:00 176.22 176.32 176.14 176.26 176.586624
2:
2018-11-20 18:46:00 176.89 177.02 176.81 176.81 176.603020
2018-11-20 18:47:00 176.80 176.80 176.43 176.43 176.612706
2018-11-20 18:48:00 176.45 176.46 176.21 176.21 176.599967
2018-11-20 18:49:00 176.22 176.32 176.14 176.26 176.586624
2018-11-20 18:50:00 176.26 176.38 176.23 176.28 176.577114
The shift between consecutive windows is 1 minute.
n:
2018-11-20 18:55:00 176.29 176.92 176.11 176.91 176.518353
2018-11-20 18:56:00 176.92 177.03 176.67 176.76 176.554964
2018-11-20 18:57:00 176.78 176.89 176.74 176.76 176.566201
2018-11-20 18:58:00 176.77 176.87 176.56 176.65 176.571326
2018-11-20 18:59:00 176.65 177.17 176.59 176.94 176.681413
How can I do this? I tried rolling and groupby without success.
pandas 0.23.4
Python 3.6.3
Thanks
What about simply iterating based on the required sequence length?
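# Take seq_len consecutive rows for each sub-dataframe
seq_len = 5
# Stop early enough that every window has exactly seq_len rows
for i in range(len(data) - seq_len + 1):
    # .iloc slices by position; .ix is deprecated
    subdata = data.iloc[i:i + seq_len]
    print(subdata)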
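The same positional iteration can be wrapped in a generator so the windows can be collected or passed on instead of only printed. This is a minimal sketch, not the answerer's code: `sliding_windows` is a hypothetical helper name, and the small frame reuses only the first seven `close` values from the question. It assumes the rows are already sorted and evenly spaced, since it counts rows rather than minutes.

```python
import pandas as pd


def sliding_windows(frame, seq_len):
    """Yield every full window of seq_len consecutive rows (hypothetical helper)."""
    for i in range(len(frame) - seq_len + 1):
        # Positional slice: rows i .. i + seq_len - 1
        yield frame.iloc[i:i + seq_len]


# First seven "close" values from the question, one row per minute
index = pd.date_range("2018-11-20 18:45:00", periods=7, freq="min")
data = pd.DataFrame(
    {"close": [176.89, 176.81, 176.43, 176.21, 176.26, 176.28, 176.20]},
    index=index,
)

windows = list(sliding_windows(data, seq_len=5))
for window in windows:
    print(window)
    print()
```

If the index has gaps (missing minutes), row-based windows like these will span more than 5 minutes, so a time-based selection may be the safer choice in that case.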
The following produces the requested output (pandas 0.22.0, Python 3.6.7):
import pandas as pd
from datetime import timedelta

# Width of the time window: 5min
dt = timedelta(minutes=5)
# Step of the sliding window: 1min
step = timedelta(minutes=1)

start = df.index[0]
stop = df.index[-1]
while start <= (stop-dt+step):
    idx = (start <= df.index) & (df.index < start+dt)
    start += step
    print(df[idx])
    print()
Two parameters can be specified: the width of the time window, dt, and the step by which the "sliding window" advances, step.
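The advantage of this approach is that the window is selected from the index alone, which avoids unnecessary duplicate copies of the data (although I would bet that Python/pandas already handles this well, should someone find another way to do the job).
I tested with the following dataframe: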
df = pd.DataFrame([["2018-11-20 18:45:00", 176.73, 176.95, 176.54, 176.89, 176.582983],
["2018-11-20 18:46:00", 176.89, 177.02, 176.81, 176.81, 176.603020],
["2018-11-20 18:47:00", 176.80, 176.80, 176.43, 176.43, 176.612706],
["2018-11-20 18:48:00", 176.45, 176.46, 176.21, 176.21, 176.599967],
["2018-11-20 18:49:00", 176.22, 176.32, 176.14, 176.26, 176.586624],
["2018-11-20 18:50:00", 176.26, 176.38, 176.23, 176.28, 176.577114],
["2018-11-20 18:51:00", 176.31, 176.43, 176.20, 176.20, 176.562641],
["2018-11-20 18:52:00", 176.22, 176.25, 176.15, 176.18, 176.544664],
["2018-11-20 18:53:00", 176.19, 176.19, 175.97, 176.00, 176.506937],
["2018-11-20 18:54:00", 176.00, 176.30, 175.97, 176.30, 176.493768],
["2018-11-20 18:55:00", 176.29, 176.92, 176.11, 176.91, 176.518353],
["2018-11-20 18:56:00", 176.92, 177.03, 176.67, 176.76, 176.554964],
["2018-11-20 18:57:00", 176.78, 176.89, 176.74, 176.76, 176.566201],
["2018-11-20 18:58:00", 176.77, 176.87, 176.56, 176.65, 176.571326],
["2018-11-20 18:59:00", 176.65, 177.17, 176.59, 176.94, 176.681413],],
columns=["date", "open", "high", "low", "close", "vwap"])
df = df.set_index("date")
df.index = pd.to_datetime(df.index)
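The while loop with dt and step could also be packaged as a generator, so each window is handed to the caller rather than printed. The following is only a sketch under stated assumptions: `time_windows`, `width`, and `sample` are hypothetical names, the stopping condition is copied from the loop above (so it assumes step matches the sampling interval of the data), and the sample frame keeps just the `close` column of the test dataframe.

```python
import pandas as pd
from datetime import timedelta


def time_windows(frame, width, step):
    """Yield the rows with timestamps in [start, start + width), moving start by step.

    Hypothetical helper; assumes a sorted DatetimeIndex.
    """
    start = frame.index[0]
    # Same stopping condition as the while loop in the answer
    last_start = frame.index[-1] - width + step
    while start <= last_start:
        mask = (frame.index >= start) & (frame.index < start + width)
        yield frame[mask]
        start += step


# The "close" column of the test dataframe, one row per minute
index = pd.date_range("2018-11-20 18:45:00", periods=15, freq="min")
sample = pd.DataFrame(
    {"close": [176.89, 176.81, 176.43, 176.21, 176.26, 176.28, 176.20, 176.18,
               176.00, 176.30, 176.91, 176.76, 176.76, 176.65, 176.94]},
    index=index,
)

result = list(time_windows(sample, timedelta(minutes=5), timedelta(minutes=1)))
print(len(result))
print(result[-1])
```

Because the selection is based on timestamps, a window simply contains fewer rows when minutes are missing from the index, instead of silently stretching over a longer time span.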