
Loop through a Pandas Dataframe in blocks

Given the following dataframe:

      open    high     low   close    volume
0     74.090  74.144  74.089  74.136  0.000012
1     74.110  74.143  74.009  74.072  0.000419
2     74.074  74.190  74.063  74.081  0.000223
3     74.100  74.244  74.085  74.182  0.000429
4     74.194  74.222  74.164  74.199  0.000090
5     74.198  74.265  74.181  74.213  0.000071
6     74.223  74.244  74.120  74.174  0.000124
7     74.181  74.229  74.132  74.161  0.000087
8     74.164  74.337  74.126  74.324  0.000299
9     74.303  74.407  74.302  74.400  0.000185
10    74.408  74.440  74.373  74.409  0.000163
11    74.437  74.438  74.399  74.418  0.000208
12    74.428  74.464  74.385  74.385  0.000231

How can I efficiently loop through the entire dataframe and get, in a new dataframe, the previous five rows including the current row?
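A plain-pandas baseline makes the question concrete: slide a five-row window down the frame with `iloc` and stack the windows with `pd.concat`. This is a minimal sketch; `iter_blocks` is a hypothetical helper name, and the frame is toy data standing in for the OHLCV table above. It loops in Python, so it will be slower than a vectorized approach on large frames, but it is easy to read and check.

```python
import numpy as np
import pandas as pd


def iter_blocks(df: pd.DataFrame, size: int = 5):
    """Hypothetical helper: yield (label, block), where block holds the
    `size` rows ending at (and including) the row with that label."""
    # The first complete window ends at position size - 1.
    for end in range(size - 1, len(df)):
        yield df.index[end], df.iloc[end - size + 1 : end + 1]


# Toy stand-in for the question's frame: 13 rows, 5 columns.
df = pd.DataFrame(
    np.arange(65, dtype=float).reshape(13, 5),
    columns=["open", "high", "low", "close", "volume"],
)

# Renumber each block 0..size-1 so the layout is (window end, position in window).
blocks = {label: block.reset_index(drop=True) for label, block in iter_blocks(df)}
df5 = pd.concat(blocks, names=["major", "minor"])
print(df5.head(10))
```

Inside the loop each `block` is an ordinary DataFrame, so any per-block processing can happen there instead of collecting the blocks.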

If you want efficiency, use NumPy strides:

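import pandas as pd
import numpy as np
from numpy.lib.stride_tricks import as_strided as stride

v = df.values  # df is the frame built in the setup code below
sr, sc = v.strides  # byte steps between rows and between columns

# data[i, k, j] == v[i + k, j]: window i covers rows i..i+4, k is the row within it.
# as_strided does no bounds checking, so the shape must not reach past the last row.
data = stride(v, (v.shape[0] - 4, 5, v.shape[1]), (sr, sr, sc))

# pd.Panel has been removed from pandas; a MultiIndex DataFrame holds the same layout.
idx = pd.MultiIndex.from_product([df.index[4:], pd.RangeIndex(5)], names=['major', 'minor'])
df5 = pd.DataFrame(data.reshape(-1, v.shape[1]), index=idx, columns=df.columns)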

df5.head(10)

               open    high     low   close    volume
major minor                                          
4     0      74.090  74.144  74.089  74.136  0.000012
      1      74.110  74.143  74.009  74.072  0.000419
      2      74.074  74.190  74.063  74.081  0.000223
      3      74.100  74.244  74.085  74.182  0.000429
      4      74.194  74.222  74.164  74.199  0.000090
5     0      74.110  74.143  74.009  74.072  0.000419
      1      74.074  74.190  74.063  74.081  0.000223
      2      74.100  74.244  74.085  74.182  0.000429
      3      74.194  74.222  74.164  74.199  0.000090
      4      74.198  74.265  74.181  74.213  0.000071
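On NumPy 1.20 or later, `numpy.lib.stride_tricks.sliding_window_view` builds the same overlapping windows while computing the shape and strides itself, and it returns a read-only view by default, which avoids the out-of-bounds reads a wrong `as_strided` shape can cause. The sketch below assumes such a NumPy version and uses toy data; it reproduces the major/minor layout shown above.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

# Toy stand-in for the question's frame: 13 rows, 5 columns.
df = pd.DataFrame(
    np.arange(65, dtype=float).reshape(13, 5),
    columns=["open", "high", "low", "close", "volume"],
)
size = 5
v = df.to_numpy()

# Shape (n - size + 1, ncols, size); the window axis is appended last,
# so windows[i, j, k] == v[i + k, j].
windows = sliding_window_view(v, size, axis=0)

# Reorder to (window, row-in-window, column), then flatten to 2-D rows
# (this reshape copies, since the view is not contiguous).
rows = windows.transpose(0, 2, 1).reshape(-1, v.shape[1])

idx = pd.MultiIndex.from_product(
    [df.index[size - 1:], pd.RangeIndex(size)], names=["major", "minor"]
)
df5 = pd.DataFrame(rows, index=idx, columns=df.columns)
print(df5.head(10))
```

If the blocks only need to be processed as arrays, `windows` can be used directly and the DataFrame construction skipped.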

Example processing (keep the last two rows of each block):

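def process(df):
    # df is one block (all rows sharing a 'major' label); df.name is that label.
    # Dropping the 'major' level, keep the last two rows of the block.
    return df.loc[df.name].tail(2)

print(df5.groupby(level=0).apply(process))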

               open    high     low   close    volume
major minor                                          
4     3      74.100  74.244  74.085  74.182  0.000429
      4      74.194  74.222  74.164  74.199  0.000090
5     3      74.194  74.222  74.164  74.199  0.000090
      4      74.198  74.265  74.181  74.213  0.000071
6     3      74.198  74.265  74.181  74.213  0.000071
      4      74.223  74.244  74.120  74.174  0.000124
7     3      74.223  74.244  74.120  74.174  0.000124
      4      74.181  74.229  74.132  74.161  0.000087
8     3      74.181  74.229  74.132  74.161  0.000087
      4      74.164  74.337  74.126  74.324  0.000299
9     3      74.164  74.337  74.126  74.324  0.000299
      4      74.303  74.407  74.302  74.400  0.000185
10    3      74.303  74.407  74.302  74.400  0.000185
      4      74.408  74.440  74.373  74.409  0.000163
11    3      74.408  74.440  74.373  74.409  0.000163
      4      74.437  74.438  74.399  74.418  0.000208
12    3      74.437  74.438  74.399  74.418  0.000208
      4      74.428  74.464  74.385  74.385  0.000231
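When the per-block processing is a reduction (a max, min, sum, or a custom function of one column), `DataFrame.rolling` may be enough on its own: each window covers the five rows ending at the current row, so the blocks never need to be materialized. This is a sketch on toy data; the column choices and statistics are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the question's frame: 13 rows, 5 columns.
df = pd.DataFrame(
    np.arange(65, dtype=float).reshape(13, 5),
    columns=["open", "high", "low", "close", "volume"],
)
size = 5

# Built-in rolling reductions over the `size` rows ending at each row.
stats = pd.DataFrame({
    "high_max": df["high"].rolling(size).max(),
    "low_min": df["low"].rolling(size).min(),
    "volume_sum": df["volume"].rolling(size).sum(),
}).dropna()  # the first size - 1 rows have incomplete windows (NaN)

# Arbitrary per-block logic: raw=True passes each window as a NumPy array.
close_range = df["close"].rolling(size).apply(lambda w: w.max() - w.min(), raw=True)

print(stats.head())
print(close_range.dropna().head())
```

Built-in reductions such as `max` and `sum` run in compiled code; `rolling(...).apply` still calls the Python function once per window.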

Setup

df = pd.DataFrame([
        [74.09, 74.14399999999999, 74.089, 74.13600000000001, 1.2e-05],
        [74.11, 74.143, 74.009, 74.072, 0.00041900000000000005],
        [74.074, 74.19, 74.063, 74.081, 0.000223],
        [74.1, 74.244, 74.085, 74.182, 0.000429],
        [74.194, 74.222, 74.164, 74.199, 9e-05],
        [74.19800000000001, 74.265, 74.181, 74.21300000000001, 7.099999999999999e-05],
        [74.223, 74.244, 74.12, 74.17399999999999, 0.000124],
        [74.181, 74.229, 74.132, 74.161, 8.7e-05],
        [74.164, 74.337, 74.126, 74.324, 0.000299],
        [74.303, 74.407, 74.30199999999999, 74.4, 0.000185],
        [74.408, 74.44, 74.373, 74.40899999999999, 0.00016299999999999998],
        [74.437, 74.438, 74.399, 74.418, 0.00020800000000000001],
        [74.428, 74.464, 74.385, 74.385, 0.000231]
    ], columns=['open', 'high', 'low', 'close', 'volume'])
