

Pandas previous group min/max

In pandas I have a dataset like this:

                     Value
2005-08-03 23:15:00   10.5
2005-08-03 23:30:00   10.0
2005-08-03 23:45:00   10.0
2005-08-04 00:00:00   10.5
2005-08-04 00:15:00   10.5
2005-08-04 00:30:00   11.0
2005-08-04 00:45:00   10.5
2005-08-04 01:00:00   11.0
...
2005-08-04 23:15:00   14.0
2005-08-04 23:30:00   13.5
2005-08-04 23:45:00   13.0
2005-08-05 00:00:00   13.5
2005-08-05 00:15:00   14.0
2005-08-05 00:30:00   14.0
2005-08-05 00:45:00   14.5

First I wanted to group the data by date and store each group's max value in a new column. I used the following code for this task:

df['ValueMaxInGroup'] = df.groupby(pd.Grouper(freq='D'))['Value'].transform(max)

Now I want to create another column to store the previous group's max value, so the desired data frame would look like:

                     Value  ValueMaxInGroup  ValueMaxInPrevGroup
2005-08-03 23:15:00   10.5             10.5                  NaN
2005-08-03 23:30:00   10.0             10.5                  NaN
2005-08-03 23:45:00   10.0             10.5                  NaN
2005-08-04 00:00:00   10.5             14.0                 10.5
2005-08-04 00:15:00   10.5             14.0                 10.5
2005-08-04 00:30:00   11.0             14.0                 10.5
2005-08-04 00:45:00   10.5             14.0                 10.5
2005-08-04 01:00:00   11.0             14.0                 10.5
...
2005-08-04 23:15:00   14.0             14.0                 10.5
2005-08-04 23:30:00   13.5             14.0                 10.5
2005-08-04 23:45:00   13.0             14.0                 10.5
2005-08-05 00:00:00   13.5             14.5                 14.0
2005-08-05 00:15:00   14.0             14.5                 14.0
2005-08-05 00:30:00   14.0             14.5                 14.0
2005-08-05 00:45:00   14.5             14.5                 14.0

To simply get the previous row's value, I used

df['ValueInPrevRow'] = df.shift(1)['Value']

Is there any way to get another group's min/max/f(x)? I tried

df['ValueMaxInPrevGroup'] = df.groupby(pd.Grouper(freq='D')).shift(1)['Value'].transform(max)

but it didn't work.

You could get the desired result by using groupby/agg, shift and merge:

import numpy as np
import pandas as pd
df = pd.DataFrame(
    {'Value': [10.5, 10.0, 10.0, 10.5, 10.5, 11.0, 10.5, 11.0,
               14.0, 13.5, 13.0, 13.5, 14.0, 14.0, 14.5]},
    index=['2005-08-03 23:15:00', '2005-08-03 23:30:00', '2005-08-03 23:45:00',
           '2005-08-04 00:00:00', '2005-08-04 00:15:00', '2005-08-04 00:30:00',
           '2005-08-04 00:45:00', '2005-08-04 01:00:00', '2005-08-04 23:15:00',
           '2005-08-04 23:30:00', '2005-08-04 23:45:00', '2005-08-05 00:00:00',
           '2005-08-05 00:15:00', '2005-08-05 00:30:00', '2005-08-05 00:45:00'])
df.index = pd.DatetimeIndex(df.index)

# This is equivalent to
# df['group'] = pd.to_datetime(df.index.date)
# when freq='D', but the version below works with any freq string, not just `'D'`.
# pd.Grouper(freq=...) replaces pd.TimeGrouper, which has been removed from pandas.
grouped = df.groupby(pd.Grouper(freq='D'))
labels, uniqs, ngroups = grouped.grouper.group_info
df['group'] = grouped.grouper.binlabels[labels]

result = grouped[['Value']].agg(max)
result = result.rename(columns={'Value':'Max'})
result['PreviouMax'] = result['Max'].shift(1)

df = pd.merge(df, result, left_on=['group'], right_index=True)
print(df)

yields

                     Value      group   Max  PreviouMax
2005-08-03 23:15:00   10.5 2005-08-03  10.5         NaN
2005-08-03 23:30:00   10.0 2005-08-03  10.5         NaN
2005-08-03 23:45:00   10.0 2005-08-03  10.5         NaN
2005-08-04 00:00:00   10.5 2005-08-04  14.0        10.5
2005-08-04 00:15:00   10.5 2005-08-04  14.0        10.5
2005-08-04 00:30:00   11.0 2005-08-04  14.0        10.5
2005-08-04 00:45:00   10.5 2005-08-04  14.0        10.5
2005-08-04 01:00:00   11.0 2005-08-04  14.0        10.5
2005-08-04 23:15:00   14.0 2005-08-04  14.0        10.5
2005-08-04 23:30:00   13.5 2005-08-04  14.0        10.5
2005-08-04 23:45:00   13.0 2005-08-04  14.0        10.5
2005-08-05 00:00:00   13.5 2005-08-05  14.5        14.0
2005-08-05 00:15:00   14.0 2005-08-05  14.5        14.0
2005-08-05 00:30:00   14.0 2005-08-05  14.5        14.0
2005-08-05 00:45:00   14.5 2005-08-05  14.5        14.0

The main idea here is to use groupby/agg instead of groupby/transform so that we may obtain

result = grouped[['Value']].agg(max)
result = result.rename(columns={'Value':'Max'})
result['PreviouMax'] = result['Max'].shift(1)
#              Max  PreviouMax
# group                       
# 2005-08-03  10.5         NaN
# 2005-08-04  14.0        10.5
# 2005-08-05  14.5        14.0
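
As a small illustration of why agg is the convenient starting point, the sketch below contrasts the two calls on a three-row toy frame. The data and the names toy, by_day, per_row, per_group and prev_group_max are made up for this example and are not part of the original answer; it also assumes a pandas version that provides pd.Grouper.

```python
import pandas as pd

# Toy data (made up for illustration): one row on the first day, two on the second.
idx = pd.to_datetime(['2005-08-03 23:45:00',
                      '2005-08-04 00:00:00',
                      '2005-08-04 00:15:00'])
toy = pd.DataFrame({'Value': [10.0, 10.5, 11.0]}, index=idx)

by_day = toy.groupby(pd.Grouper(freq='D'))['Value']

per_row = by_day.transform('max')   # one value per original row (3 values)
per_group = by_day.agg('max')       # one value per day (2 values)

# Shifting the per-group series moves whole groups, not individual rows.
prev_group_max = per_group.shift(1)

print(per_row, per_group, prev_group_max, sep='\n\n')
```

Because per_group has exactly one entry per day, shift(1) lines each day up with the day before it, which is not possible on the row-aligned output of transform.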

Then the desired DataFrame can be expressed as the result of merging df with result on the group date.
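
The same idea can also be sketched without the intermediate group column and merge, under the assumption that the grouping is by calendar day. This is a variation on the approach above, not the original answer's code: it looks up each row's day with DatetimeIndex.normalize() and Series.reindex(), and the names daily_max, prev_max and day are chosen here for illustration.

```python
import pandas as pd

index = pd.to_datetime([
    '2005-08-03 23:15:00', '2005-08-03 23:30:00', '2005-08-03 23:45:00',
    '2005-08-04 00:00:00', '2005-08-04 00:15:00', '2005-08-04 00:30:00',
    '2005-08-04 00:45:00', '2005-08-04 01:00:00', '2005-08-04 23:15:00',
    '2005-08-04 23:30:00', '2005-08-04 23:45:00', '2005-08-05 00:00:00',
    '2005-08-05 00:15:00', '2005-08-05 00:30:00', '2005-08-05 00:45:00'])
df = pd.DataFrame(
    {'Value': [10.5, 10.0, 10.0, 10.5, 10.5, 11.0, 10.5, 11.0,
               14.0, 13.5, 13.0, 13.5, 14.0, 14.0, 14.5]},
    index=index)

# One entry per calendar day: the group maximum.
daily_max = df.groupby(pd.Grouper(freq='D'))['Value'].max()
# Shift by one group so each day is paired with the previous day's maximum.
prev_max = daily_max.shift(1)

# Each row's day (timestamp floored to midnight) is the lookup key.
day = df.index.normalize()
df['ValueMaxInGroup'] = daily_max.reindex(day).to_numpy()
df['ValueMaxInPrevGroup'] = prev_max.reindex(day).to_numpy()
print(df)
```

Replacing .max() with .min() or .agg(f) should give the other per-group statistics asked about in the question. If the data can skip whole days, it is worth checking whether "previous group" should mean the previous calendar day or the previous day that actually has rows, since the shift here operates on the daily index.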
