
Python: resample on a rolling basis

I have a DataFrame as follows:

import pandas as pd

data = [[99330,12,122],
   [1123,1230,1287],
   [123,101,812739],
   [1143,12301230,252],
   [234,342,4546],
   [2445,3453,3457],
   [7897,8657,5675],
   [46,5675,453],
   [76,484,3735],
   [363,93,4568],
   [385,568,367],
   [458,846,4847],
   [574,45747,658468],
   [57457,46534,4675]]
df1 = pd.DataFrame(data, index=['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04',
                           '2022-01-05', '2022-01-06', '2022-01-07', '2022-01-08',
                           '2022-01-09', '2022-01-10', '2022-01-11', '2022-01-12',
                           '2022-01-13', '2022-01-14'], 
              columns=['col_A', 'col_B', 'col_C'])
df1.index = pd.to_datetime(df1.index)
df1.resample('1D').last().rolling(7).last()

The last line gives me the following error: AttributeError: 'Rolling' object has no attribute 'last'

What I want to do is resample the data on a rolling basis (for 7, 30, 90 days).

Is there a way to do this without using many loops?

You apply last as if rolling gave you a DataFrame; it doesn't. What rolling returns is a Rolling object, which is "incomplete" until you apply an aggregation to it, as you can see below.

A general tip is that you can grab whatever you get from the previous step and call help on it. In this case:

x = df1.resample('1D').last().rolling(7)
help(x)

which gives you a very extensive manual.
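To make the point above concrete, here is a minimal sketch. It uses a small made-up frame (the index and values are assumptions for illustration, not the data from the question) and shows that rolling hands back a window object that only becomes a DataFrame once an aggregation is applied.

```python
import pandas as pd

# Small illustrative frame (made-up values, not the question's data).
idx = pd.date_range("2022-01-01", periods=5, freq="D")
df = pd.DataFrame({"col_A": [1, 2, 3, 4, 5]}, index=idx)

x = df.rolling(3)

# rolling() returns a window object, not a DataFrame.
print(type(x).__name__)
print(isinstance(x, pd.DataFrame))

# Applying an aggregation such as mean() is what produces a DataFrame.
result = x.mean()
print(isinstance(result, pd.DataFrame))

# List the public names on the object to see which aggregations exist.
public_names = [name for name in dir(x) if not name.startswith("_")]
print(public_names)
```

Scanning the printed names is a quicker alternative to reading the full help output when you only want to know which aggregations are available in your pandas version.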

What's missing from your question is that you haven't precisely specified what you want to compute over each rolling window. Do you want a rolling mean? If so, that is a hint to use .mean() on the rolled data.
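As a rough illustration of choosing an aggregation, the sketch below applies a few of them to a small made-up frame (the values are assumptions, not the question's data). The last line is one possible way to express "the last value in each window" through apply; it is shown only as an example, not as the intended solution.

```python
import pandas as pd

# Small illustrative frame (made-up values, not the question's data).
idx = pd.date_range("2022-01-01", periods=5, freq="D")
df = pd.DataFrame({"col_A": [1, 2, 3, 4, 5]}, index=idx)

window = df.rolling(3)  # integer window: the last 3 rows

rolling_mean = window.mean()  # NaN, NaN, 2.0, 3.0, 4.0
rolling_sum = window.sum()    # NaN, NaN, 6.0, 9.0, 12.0
rolling_max = window.max()    # NaN, NaN, 3.0, 4.0, 5.0

# One way to take the last value of each window: a custom function via apply.
# With raw=True the function receives a NumPy array for each window.
rolling_last = window.apply(lambda w: w[-1], raw=True)

print(rolling_mean)
print(rolling_last)
```

With an integer window the first two rows are NaN because a full window of 3 rows is not yet available.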

More specifically, in this case it's probably most helpful to use a timedelta window rather than an integer count: it makes the code clearer and helps with corner cases (for example, gaps in the dates).

import datetime
rolled_df = df1.rolling(datetime.timedelta(days=7)).mean()
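The difference between an integer window and a time-based window shows up when dates are missing. The sketch below uses a small made-up series with a two-day gap (an assumption for illustration, not the question's data), and then computes the 7, 30 and 90 day windows from the question in a single comprehension. Using mean as the aggregation is also an assumption; swap in whichever one you need.

```python
import pandas as pd

# Illustrative series with a gap: 2022-01-04 and 2022-01-05 are missing.
idx = pd.to_datetime(
    ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-06", "2022-01-07"]
)
df = pd.DataFrame({"col_A": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=idx)

# Integer window: always the last 3 rows, regardless of their dates.
by_rows = df.rolling(3).mean()

# Time-based window: only rows that fall within the last 3 calendar days.
by_days = df.rolling("3D").mean()

print(by_rows.loc["2022-01-06", "col_A"])  # 3.0, averages rows from 01-02, 01-03, 01-06
print(by_days.loc["2022-01-06", "col_A"])  # 4.0, only 01-06 lies inside the window

# Several window lengths at once, without writing a loop over the rows.
windows = ["7D", "30D", "90D"]
rolled = pd.concat({w: df.rolling(w).mean() for w in windows}, axis=1)
print(rolled)
```

The result of the concat has two column levels (window length, then original column name), so a single window can be pulled out with, for example, rolled["30D"]. Time-based windows require a sorted datetime index.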

