Pandas groupby - Expanding mean by column value

I'm new to Pandas and somewhat lost on what to do here. I have a dataframe imported from a CSV, which (heavily simplified) looks like this:

import pandas as pd

date = ['2013-08-10','2013-08-10','2013-08-10','2013-08-10','2013-08-10',
        '2013-08-10','2013-08-10','2013-08-10','2013-08-10','2013-08-10']
event = ['213','213','213','213','214','214','214','215','215','215']
side = ['A','B','B','B','A','B','A','B','A','B',]
value = [0.193,0.193,0.092,0.027,0.027,0.058,0.027,0.079,0.193,0.159]

# list(...) materialises the zip iterator so this also works on Python 3
df = pd.DataFrame(list(zip(event,date,side,value)),
                  columns=['event','date','side','value'])

  event        date side  value
0   213  2013-08-10    A  0.193
1   213  2013-08-10    B  0.193
2   213  2013-08-10    B  0.092
3   213  2013-08-10    B  0.027
4   214  2013-08-10    A  0.027
5   214  2013-08-10    B  0.058
6   214  2013-08-10    A  0.027
7   215  2013-08-10    B  0.079
8   215  2013-08-10    A  0.193
9   215  2013-08-10    B  0.159

What I want is to sum the values corresponding to each side for every event. This I have achieved with groupby:

# select 'value' explicitly so the string 'date' column is not summed as well
groupby = df.groupby(['event','side'])[['value']].sum()

            value
event side       
213   A     0.193
      B     0.312
214   A     0.054
      B     0.058
215   A     0.193
      B     0.238

But I also want to add a new column with the expanding mean for each side, like this:

            value  roll_mean
event side
213   A     0.193      0
      B     0.312      0
214   A     0.054      0.193
      B     0.058      0.312
215   A     0.193      0.124
      B     0.238      0.185

Note that every event has two sides, but it's not always A and B. What I want is something like Excel's AVERAGEIF function: for each row, the mean of all values in the previous rows that have the same side as the current row. Any help on this would be appreciated.
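To pin down the expected numbers, here is a minimal plain-Python sketch of that rule. The helper name `prior_means` is made up for illustration, and it returns None (rather than the 0 shown in the table above) where a side has no earlier rows:

```python
from collections import defaultdict

# (event, side, summed value) rows in event order, copied from the groupby output
rows = [
    ('213', 'A', 0.193), ('213', 'B', 0.312),
    ('214', 'A', 0.054), ('214', 'B', 0.058),
    ('215', 'A', 0.193), ('215', 'B', 0.238),
]

def prior_means(rows):
    """For each row, mean of all earlier values with the same side (None if none)."""
    totals = defaultdict(float)  # running sum of earlier values per side
    counts = defaultdict(int)    # number of earlier rows per side
    out = []
    for _event, side, value in rows:
        # compute the mean BEFORE adding the current value, so it is excluded
        out.append(totals[side] / counts[side] if counts[side] else None)
        totals[side] += value
        counts[side] += 1
    return out

means = prior_means(rows)
print(means)
```

Because the running totals are keyed by side, the same loop works when the sides differ from event to event.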

I think you're actually looking for an expanding mean, not a rolling mean. An expanding mean considers every previous value. I'll start where you left off:

In [63]: res = df.groupby(['event','side'])[['value']].sum()
In [64]: res
Out[64]: 
            value
event side       
213   A     0.193
      B     0.312
214   A     0.054
      B     0.058
215   A     0.193
      B     0.238
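As a quick aside on the rolling versus expanding distinction, here is a small sketch on a toy Series (the values are made up for illustration and are not from your data):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

# Expanding: the window grows, so each result averages everything seen so far.
expanding = s.expanding().mean()

# Rolling: the window has a fixed size (2 here), so older values drop out.
rolling = s.rolling(2).mean()

print(pd.DataFrame({'value': s, 'expanding': expanding, 'rolling_2': rolling}))
```

The expanding column comes out as 1.0, 1.5, 2.0, 2.5, while the rolling column is NaN, 1.5, 2.5, 3.5.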

Now we want to group by side and take the expanding mean:

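In [65]: # pd.expanding_mean no longer exists in current pandas; .expanding().mean() replaces it
    ...: res['expanding_mean'] = res.groupby(level='side')['value'].transform(
    ...:     lambda s: s.expanding().mean()).shift(2)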
In [66]: res
Out[66]: 
            value  expanding_mean
event side                       
213   A     0.193             NaN
      B     0.312             NaN
214   A     0.054          0.1930
      B     0.058          0.3120
215   A     0.193          0.1235
      B     0.238          0.1850

Your result needs to be shifted by 2 since you want the mean to include all previous values, and not the current one (make sure this is what you actually want; it seems a bit unusual). You can replace the 2 in shift(2) with len(res.index.levels[1]) to make it a bit more general in case you have more than 2 sides, provided every event contains every side.
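If the sides differ from event to event, a fixed global shift no longer lines up. One possible alternative, sketched here on a hypothetical frame (the events, sides, and values below are invented for illustration), is to shift by one row inside each side group:

```python
import pandas as pd

# Hypothetical summed values per (event, side); sides vary between events.
res = pd.DataFrame(
    {'value': [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]},
    index=pd.MultiIndex.from_tuples(
        [('1', 'A'), ('1', 'B'), ('2', 'A'),
         ('2', 'C'), ('3', 'B'), ('3', 'C')],
        names=['event', 'side']),
)

# Within each side: expanding mean, then shift by one row so the
# current event is excluded and only earlier events contribute.
res['expanding_mean'] = (
    res.groupby(level='side')['value']
       .transform(lambda s: s.expanding().mean().shift(1))
)
print(res)
```

Here the first occurrence of each side gets NaN, and later rows get the mean of the earlier rows for that side only.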

I added more 'sides' to your dataframe, so it works when the sides aren't just 'A' or 'B'. Is this what you want?

import pandas as pd
import numpy as np
date = ['2013-08-10','2013-08-10','2013-08-10','2013-08-10','2013-08-10',
        '2013-08-10','2013-08-10','2013-08-10','2013-08-10','2013-08-10']
event = ['213','213','213','213','214','214','214','215','215','215']
side = ['A','B','A','B','C','A','C','A','C','A',]
value = [0.193,0.193,0.092,0.027,0.027,0.058,0.027,0.079,0.193,0.159]

df = pd.DataFrame(list(zip(event,date,side,value)),
                columns=['event','date','side','value'])
print(df)

  event        date side  value
0   213  2013-08-10    A  0.193
1   213  2013-08-10    B  0.193
2   213  2013-08-10    A  0.092
3   213  2013-08-10    B  0.027
4   214  2013-08-10    C  0.027
5   214  2013-08-10    A  0.058
6   214  2013-08-10    C  0.027
7   215  2013-08-10    A  0.079
8   215  2013-08-10    C  0.193
9   215  2013-08-10    A  0.159


ds = df.groupby(['event','side'])[['value']].sum()
print(ds)

            value
event side       
213   A     0.285
      B     0.220
214   A     0.058
      C     0.054
215   A     0.238
      C     0.193

ds.reset_index(inplace=True)
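ds['exp_mean'] = np.nan
for s in ds.side.unique():
    # rows belonging to the current side
    ndx = ds[ds.side==s].index
    # .loc and .expanding().mean() replace the removed .ix and pd.expanding_mean;
    # shift(1) excludes the current row from its own mean
    ds.loc[ndx,'exp_mean'] = ds.loc[ndx,'value'].expanding().mean().shift(1)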
ds.set_index(['event', 'side'], inplace=True, drop=True)
print(ds)

            value  exp_mean
event side                 
213   A     0.285       NaN
      B     0.220       NaN
214   A     0.058    0.2850
      C     0.054       NaN
215   A     0.238    0.1715
      C     0.193    0.0540

See this pandas commit (lines 60-78): https://github.com/pandas-dev/pandas/commit/699424027fb657192541bcd0c3d9f9b7d26f2300

You can now use ``.rolling(..)`` and ``.expanding(..)`` as methods on groupbys.
These return another deferred object (similar to what ``.rolling()`` and
``.expanding()`` do on ungrouped pandas objects). You can then operate
on these ``RollingGroupby`` objects in a similar manner.

Previously you would have to do this to get a rolling window mean per-group:

.. ipython:: python

    df = pd.DataFrame({'A': [1] * 20 + [2] * 12 + [3] * 8,
                       'B': np.arange(40)})
    df

.. ipython:: python

    df.groupby('A').apply(lambda x: x.rolling(4).B.mean())

Now you can do:

.. ipython:: python

    df.groupby('A').rolling(4).B.mean()
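Applied to the question, a sketch using the groupby `.expanding()` method might look like the following. The frame is typed in by hand from the summed values in the previous answer, and the extra index handling is my own assumption about how to line the result back up with the original rows:

```python
import pandas as pd

ds = pd.DataFrame({
    'event': ['213', '213', '214', '214', '215', '215'],
    'side':  ['A', 'B', 'A', 'C', 'A', 'C'],
    'value': [0.285, 0.220, 0.058, 0.054, 0.238, 0.193],
})

# groupby(...).expanding() prepends the group key as an extra index level;
# drop it and restore the original row order so the result aligns with ds.
exp = (
    ds.groupby('side')['value']
      .expanding()
      .mean()
      .reset_index(level=0, drop=True)
      .sort_index()
)

# Shift within each side so each row only sees earlier events.
ds['exp_mean'] = exp.groupby(ds['side']).shift(1)
print(ds.set_index(['event', 'side']))
```

This should give the same exp_mean column as the loop in the previous answer, without iterating over the sides explicitly.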

