Loop through a Pandas Dataframe in blocks
Given the following DataFrame:
open high low close volume
0 74.090 74.144 74.089 74.136 0.000012
1 74.110 74.143 74.009 74.072 0.000419
2 74.074 74.190 74.063 74.081 0.000223
3 74.100 74.244 74.085 74.182 0.000429
4 74.194 74.222 74.164 74.199 0.000090
5 74.198 74.265 74.181 74.213 0.000071
6 74.223 74.244 74.120 74.174 0.000124
7 74.181 74.229 74.132 74.161 0.000087
8 74.164 74.337 74.126 74.324 0.000299
9 74.303 74.407 74.302 74.400 0.000185
10 74.408 74.440 74.373 74.409 0.000163
11 74.437 74.438 74.399 74.418 0.000208
12 74.428 74.464 74.385 74.385 0.000231
How can I efficiently loop through the entire DataFrame and get (in a new DataFrame) the previous five rows, including the current row?
If you want efficiency, use NumPy strides.
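import pandas as pd
import numpy as np
from numpy.lib.stride_tricks import as_strided as stride

# df is the DataFrame defined at the end of this answer
v = df.values
sr, sc = v.strides  # byte steps between rows and between columns
# Zero-copy view of shape (n_columns, n_windows, 5): one 5-row window per end row
data = stride(v, (v.shape[1], v.shape[0] - 4, 5), (sc, sr, sr))
# Note: pd.Panel was removed in pandas 0.25, so this step needs an older pandas
pn5 = pd.Panel(data, df.columns, df.index[4:], pd.RangeIndex(5))
df5 = pn5.to_frame()
df5.head(10)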
open high low close volume
major minor
4 0 74.090 74.144 74.089 74.136 0.000012
1 74.110 74.143 74.009 74.072 0.000419
2 74.074 74.190 74.063 74.081 0.000223
3 74.100 74.244 74.085 74.182 0.000429
4 74.194 74.222 74.164 74.199 0.000090
5 0 74.110 74.143 74.009 74.072 0.000419
1 74.074 74.190 74.063 74.081 0.000223
2 74.100 74.244 74.085 74.182 0.000429
3 74.194 74.222 74.164 74.199 0.000090
4 74.198 74.265 74.181 74.213 0.000071
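Because pd.Panel is gone from current pandas releases, here is a rough sketch of how the same stacked result might be built today. It is not part of the original answer. It assumes NumPy 1.20 or later (for sliding_window_view), and the helper name rolling_blocks and the small demo frame are made up for illustration.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def rolling_blocks(df, window=5):
    """Stack every `window`-row block of `df` into one MultiIndexed frame.

    Hypothetical helper: the outer index level ("major") is the label of the
    last row in each block, the inner level ("minor") is the position 0..window-1.
    """
    v = df.to_numpy()
    # Read-only view of shape (n_windows, n_columns, window)
    windows = sliding_window_view(v, window, axis=0)
    # Reorder to (n_windows, window, n_columns), then flatten to 2-D (this copies)
    stacked = windows.transpose(0, 2, 1).reshape(-1, v.shape[1])
    index = pd.MultiIndex.from_product(
        [df.index[window - 1:], range(window)], names=["major", "minor"]
    )
    return pd.DataFrame(stacked, index=index, columns=df.columns)


# Small made-up frame, only to show the call
demo = pd.DataFrame(np.arange(21.0).reshape(7, 3), columns=list("abc"))
blocks = rolling_blocks(demo)
```

With the frame from this answer, rolling_blocks(df) should give the same layout as df5 above, and blocks.loc[4] selects the five rows that end at label 4.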
Example processing
def process(df):
    # df.name is the group key (the "major" label); keep the last two rows of the block
    return df.loc[df.name].tail(2)

print(df5.groupby(level=0).apply(process))
open high low close volume
major minor
4 3 74.100 74.244 74.085 74.182 0.000429
4 74.194 74.222 74.164 74.199 0.000090
5 3 74.194 74.222 74.164 74.199 0.000090
4 74.198 74.265 74.181 74.213 0.000071
6 3 74.198 74.265 74.181 74.213 0.000071
4 74.223 74.244 74.120 74.174 0.000124
7 3 74.223 74.244 74.120 74.174 0.000124
4 74.181 74.229 74.132 74.161 0.000087
8 3 74.181 74.229 74.132 74.161 0.000087
4 74.164 74.337 74.126 74.324 0.000299
9 3 74.164 74.337 74.126 74.324 0.000299
4 74.303 74.407 74.302 74.400 0.000185
10 3 74.303 74.407 74.302 74.400 0.000185
4 74.408 74.440 74.373 74.409 0.000163
11 3 74.408 74.440 74.373 74.409 0.000163
4 74.437 74.438 74.399 74.418 0.000208
12 3 74.437 74.438 74.399 74.418 0.000208
4 74.428 74.464 74.385 74.385 0.000231
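If raw speed matters less than readability, the blocks can also be walked with a plain generator. The following is only a sketch and not part of the original answer: iter_blocks is a hypothetical name and the demo data is invented. It will likely be slower than the strided view on large frames.

```python
import pandas as pd


def iter_blocks(df, window=5):
    """Yield (label, block) pairs; each block is the `window` rows ending at `label`."""
    for end in range(window, len(df) + 1):
        yield df.index[end - 1], df.iloc[end - window:end]


# Made-up data, only to show the call
demo = pd.DataFrame({"close": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})

# Same idea as process(): keep the last two rows of every block
last_two = pd.concat({label: block.tail(2) for label, block in iter_blocks(demo)})
```

One difference from the output above: the inner index level here keeps the original row labels rather than positions 0 to 4.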
Setup
df = pd.DataFrame([
[74.09, 74.14399999999999, 74.089, 74.13600000000001, 1.2e-05],
[74.11, 74.143, 74.009, 74.072, 0.00041900000000000005],
[74.074, 74.19, 74.063, 74.081, 0.000223],
[74.1, 74.244, 74.085, 74.182, 0.000429],
[74.194, 74.222, 74.164, 74.199, 9e-05],
[74.19800000000001, 74.265, 74.181, 74.21300000000001, 7.099999999999999e-05],
[74.223, 74.244, 74.12, 74.17399999999999, 0.000124],
[74.181, 74.229, 74.132, 74.161, 8.7e-05],
[74.164, 74.337, 74.126, 74.324, 0.000299],
[74.303, 74.407, 74.30199999999999, 74.4, 0.000185],
[74.408, 74.44, 74.373, 74.40899999999999, 0.00016299999999999998],
[74.437, 74.438, 74.399, 74.418, 0.00020800000000000001],
[74.428, 74.464, 74.385, 74.385, 0.000231]
], columns=['open', 'high', 'low', 'close', 'volume'])