
How to find the mean of the n previous rows in a pandas column based on date criteria?

I have a dataset that looks like this:

value1  value2  value3  date
17      21      22      2005-04-01 12:05:00
19      20      24      2005-04-01 12:06:00
16      26      23      2005-04-01 12:07:00

I need to transform it so that the values in each row whose date ends in :05:00 (the 5th minute of each hour) equal the average of the previous 60 rows.

I tried using groupby on the datetime. It does give the average for each hour (minutes 00-59), but I need to adjust the windows for my case.
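As a sketch of how that hourly grouping could be adjusted, the bucket edges can be shifted from minute 00 to minute 05, so the bucket labelled 12:05 averages the rows from 11:06 through 12:05. This assumes pandas 1.1 or later (for the `offset` argument of `pd.Grouper`) and one row per minute; the sample data is made up for illustration and is not from the question.

```python
import numpy as np
import pandas as pd

# Made-up minute-level sample data (not from the question): 3 hours of rows
dates = pd.date_range("2005-04-01 11:00", periods=180, freq="min")
df = pd.DataFrame({
    "value1": np.arange(180, dtype=float),
    "value2": np.arange(180, dtype=float) * 2,
    "value3": np.ones(180),
    "date": dates,
})

# A plain hourly grouping buckets minutes 00-59 of each clock hour.
# offset="5min" moves the bucket edges to HH:05; closed="right" and
# label="right" make each bucket cover (previous HH:05, HH:05] and
# label it with its right edge, HH:05.
hourly = df.groupby(
    pd.Grouper(key="date", freq="60min", closed="right", label="right", offset="5min")
).mean()
print(hourly)
```

Because the sample data starts at 11:00, the first bucket (labelled 11:05) only contains six rows; the later buckets each contain 60.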

In the end I would like to have something like this:

value1  value2  value3  date
17      21      22      2005-04-01 12:05:00
19      20      24      2005-04-01 13:05:00
16      26      23      2005-04-01 14:05:00

where, for instance, 17 is the average of the 60 previous values in the value1 column.
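If "previous 60 rows" is meant literally as a row count rather than a time span, a count-based window is one way to sketch it. This assumes the rows are already sorted by date; the sample data is made up, and only value1 is shown for brevity.

```python
import numpy as np
import pandas as pd

# Made-up minute-level sample data (not from the question)
df = pd.DataFrame({
    "value1": np.arange(180, dtype=float),
    "date": pd.date_range("2005-04-01 11:00", periods=180, freq="min"),
})

# Count-based window over the last 60 rows, including the current row.
# min_periods defaults to the window size, so the first 59 results are NaN.
rolled = df[["value1"]].rolling(60).mean()
rolled["date"] = df["date"]

# Keep only rows stamped at minute 05 of the hour, dropping incomplete windows
result = rolled[rolled["date"].dt.minute == 5].dropna()
print(result)
```

A count-based window and a 60-minute time window only agree when there is exactly one row per minute with no gaps. If "previous" should exclude the current row, shifting the rolling mean by one row with `.shift(1)` before filtering is one option.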

This creates a rolling mean over 60-minute windows (make sure the date column has datetime64[ns] dtype and is sorted in ascending order; if not, convert and sort it beforehand). You can then select the necessary rows with .loc[]:

df.rolling('60min', on='date').mean().loc[lambda x: x['date'].dt.minute == 5]
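For reference, here is a self-contained run of the one-liner above on made-up minute-level data (three hours starting at 11:00). It spells the window as "60min", which has the same length as "H"; pandas 2.2 deprecated the uppercase "H" alias in favour of "h".

```python
import numpy as np
import pandas as pd

# Made-up minute-level sample data (not from the question)
n = 180
df = pd.DataFrame({
    "value1": np.arange(n, dtype=float),
    "value2": np.arange(n, dtype=float) * 2,
    "value3": np.full(n, 3.0),
    "date": pd.date_range("2005-04-01 11:00", periods=n, freq="min"),
})

# The column passed as `on` must be datetime-like and sorted ascending
df = df.sort_values("date")

# Time-based window: each row gets the mean of the rows in (t - 60min, t];
# the `on` column is carried through to the result unchanged
hourly = df.rolling("60min", on="date").mean()

# Keep only the rows stamped at minute 05 of each hour
result = hourly.loc[lambda x: x["date"].dt.minute == 5]
print(result)
```

For time-based windows, min_periods defaults to 1, so the 11:05 row here is averaged over only the six rows available since 11:00. Passing min_periods=60 to rolling() would turn such incomplete windows into NaN instead.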

See the pandas documentation for further details on .rolling() and .loc[].
