
pandas dataframe window using date index

I have this data in pandas:

data.tail(15)
                       open    high     low   close        vwap
date                                                           
2018-11-20 18:45:00  176.73  176.95  176.54  176.89  176.582983
2018-11-20 18:46:00  176.89  177.02  176.81  176.81  176.603020
2018-11-20 18:47:00  176.80  176.80  176.43  176.43  176.612706
2018-11-20 18:48:00  176.45  176.46  176.21  176.21  176.599967
2018-11-20 18:49:00  176.22  176.32  176.14  176.26  176.586624
2018-11-20 18:50:00  176.26  176.38  176.23  176.28  176.577114
2018-11-20 18:51:00  176.31  176.43  176.20  176.20  176.562641
2018-11-20 18:52:00  176.22  176.25  176.15  176.18  176.544664
2018-11-20 18:53:00  176.19  176.19  175.97  176.00  176.506937
2018-11-20 18:54:00  176.00  176.30  175.97  176.30  176.493768
2018-11-20 18:55:00  176.29  176.92  176.11  176.91  176.518353
2018-11-20 18:56:00  176.92  177.03  176.67  176.76  176.554964
2018-11-20 18:57:00  176.78  176.89  176.74  176.76  176.566201
2018-11-20 18:58:00  176.77  176.87  176.56  176.65  176.571326
2018-11-20 18:59:00  176.65  177.17  176.59  176.94  176.681413

I need to split it into sub-dataframes of 5 rows each, for example:

1: 
2018-11-20 18:45:00  176.73  176.95  176.54  176.89  176.582983
2018-11-20 18:46:00  176.89  177.02  176.81  176.81  176.603020
2018-11-20 18:47:00  176.80  176.80  176.43  176.43  176.612706
2018-11-20 18:48:00  176.45  176.46  176.21  176.21  176.599967
2018-11-20 18:49:00  176.22  176.32  176.14  176.26  176.586624

2: 
2018-11-20 18:46:00  176.89  177.02  176.81  176.81  176.603020
2018-11-20 18:47:00  176.80  176.80  176.43  176.43  176.612706
2018-11-20 18:48:00  176.45  176.46  176.21  176.21  176.599967
2018-11-20 18:49:00  176.22  176.32  176.14  176.26  176.586624
2018-11-20 18:50:00  176.26  176.38  176.23  176.28  176.577114

Each window is shifted by 1 minute from the previous one.

n: 
2018-11-20 18:55:00  176.29  176.92  176.11  176.91  176.518353
2018-11-20 18:56:00  176.92  177.03  176.67  176.76  176.554964
2018-11-20 18:57:00  176.78  176.89  176.74  176.76  176.566201
2018-11-20 18:58:00  176.77  176.87  176.56  176.65  176.571326
2018-11-20 18:59:00  176.65  177.17  176.59  176.94  176.681413

How can I do this? I tried rolling and groupby without success.

pandas 0.23.4
Python 3.6.3

Thanks

How about simply iterating based on the required sequence length?

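# take 5 rows for each sub-dataframe
seq_len = 5
# stop early so that every window has exactly seq_len rows
for i in range(len(data) - seq_len + 1):
    # .ix is deprecated (removed in pandas 1.0); .iloc slices by row position
    subdata = data.iloc[i:i + seq_len]
    print(subdata)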
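
The row-count iteration can also be wrapped in a small generator, so the windows can be collected or processed instead of only printed. This is a minimal sketch, not part of the original answer: the helper name `row_windows` and the 8-row sample frame are hypothetical, and it assumes the rows are evenly spaced, so that 5 rows correspond to 5 minutes.

```python
import pandas as pd


def row_windows(frame, seq_len):
    """Yield every full window of `seq_len` consecutive rows, advancing one row at a time."""
    for i in range(len(frame) - seq_len + 1):
        # iloc slices by position, so this works whatever the index type is
        yield frame.iloc[i:i + seq_len]


# Hypothetical sample data: 8 one-minute bars starting at 18:45
idx = pd.date_range("2018-11-20 18:45", periods=8, freq="min")
sample = pd.DataFrame({"close": range(8)}, index=idx)

windows = list(row_windows(sample, 5))
for w in windows:
    print(w.index[0], "->", w.index[-1])
```

If the frame has fewer rows than `seq_len`, the generator simply yields nothing.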

The following produces the requested output (pandas 0.22.0, Python 3.6.7):

import pandas as pd
from datetime import timedelta

# Width of the time window: 5min
dt = timedelta(minutes=5)
# Step of the sliding window: 1min
step = timedelta(minutes=1)

start = df.index[0]
stop = df.index[-1]
while start <= (stop-dt+step):
    idx = (start <= df.index) & (df.index < start+dt)
    start += step
    print(df[idx])
    print()

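Two parameters can be specified: the width of the time window, dt, and the step by which the sliding window advances.

The advantage of this approach is that it works only with the index, avoiding unnecessary repeated copies of the data (although I bet python / pandas is good at avoiding that anyway, in case someone finds another way of doing the job).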
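
As a variation on the same idea, the window boundaries can be located with `DatetimeIndex.searchsorted` and the rows taken with a positional slice instead of a boolean mask. This is a sketch under stated assumptions, not the answer's own method: it requires the index to be sorted, and the helper name `time_windows` and the sample frame are hypothetical.

```python
import pandas as pd


def time_windows(frame, width, step):
    """Yield sub-frames covering [start, start + width), advancing start by `step`.

    Assumes `frame` has a sorted DatetimeIndex.
    """
    index = frame.index
    start = index[0]
    # Last start for which a full-width window still ends at the final timestamp
    last_start = index[-1] - width + step
    while start <= last_start:
        lo = index.searchsorted(start, side="left")
        hi = index.searchsorted(start + width, side="left")
        yield frame.iloc[lo:hi]
        start += step


# Hypothetical sample data: 15 one-minute bars, 18:45 through 18:59
idx = pd.date_range("2018-11-20 18:45", periods=15, freq="min")
sample = pd.DataFrame({"close": range(15)}, index=idx)

windows = list(time_windows(sample, pd.Timedelta(minutes=5), pd.Timedelta(minutes=1)))
print(len(windows))
```

Because the windows are defined by time rather than by row count, a window may hold fewer than 5 rows if some minutes are missing from the data.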

I tested with the following dataframe:

df = pd.DataFrame([["2018-11-20 18:45:00",  176.73,  176.95,  176.54,  176.89,  176.582983],
                   ["2018-11-20 18:46:00",  176.89,  177.02,  176.81,  176.81,  176.603020],
                   ["2018-11-20 18:47:00",  176.80,  176.80,  176.43,  176.43,  176.612706],
                   ["2018-11-20 18:48:00",  176.45,  176.46,  176.21,  176.21,  176.599967],
                   ["2018-11-20 18:49:00",  176.22,  176.32,  176.14,  176.26,  176.586624],
                   ["2018-11-20 18:50:00",  176.26,  176.38,  176.23,  176.28,  176.577114],
                   ["2018-11-20 18:51:00",  176.31,  176.43,  176.20,  176.20,  176.562641],
                   ["2018-11-20 18:52:00",  176.22,  176.25,  176.15,  176.18,  176.544664],
                   ["2018-11-20 18:53:00",  176.19,  176.19,  175.97,  176.00,  176.506937],
                   ["2018-11-20 18:54:00",  176.00,  176.30,  175.97,  176.30,  176.493768],
                   ["2018-11-20 18:55:00",  176.29,  176.92,  176.11,  176.91,  176.518353],
                   ["2018-11-20 18:56:00",  176.92,  177.03,  176.67,  176.76,  176.554964],
                   ["2018-11-20 18:57:00",  176.78,  176.89,  176.74,  176.76,  176.566201],
                   ["2018-11-20 18:58:00",  176.77,  176.87,  176.56,  176.65,  176.571326],
                   ["2018-11-20 18:59:00",  176.65,  177.17,  176.59,  176.94,  176.681413],],
                  columns=["date", "open", "high", "low", "close", "vwap"])
df = df.set_index("date")
df.index = pd.to_datetime(df.index)

