
Python Pandas add column to multi-index GroupBy DataFrame

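I am trying to add a column to a Pandas GroupBy DataFrame that has a multi-index. The column is the difference between the max and the mean of a common key after grouping.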

Here is the input DataFrame:

   Main  Reads  Test  Subgroup
0     1      5    54         1
1     2      2    55         1
2     1     10    56         2
3     2     20    57         3
4     1      7    58         3

Here is the code:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Main': [1, 2, 1, 2, 1], 'Reads': [5, 2, 10, 20, 7],
                   'Test': range(54, 59), 'Subgroup': [1, 1, 2, 3, 3]})

result = df.groupby(['Main','Subgroup']).agg({'Reads':[np.max,np.mean]})

Here is the variable result before the diff calculation:

              Reads     
               amax mean
Main Subgroup           
1    1            5    5
     2           10   10
     3            7    7
2    1            2    2
     3           20   20

Next, I calculate the diff column with:

result['Reads']['diff'] = result['Reads']['amax'] - result['Reads']['mean']

But this is the output:

/home/userd/test.py:9: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/
...stable/indexing.html#indexing-view-versus-copy
...result['Reads']['diff'] = result['Reads']['amax'] - result['Reads']['mean']

I would like the diff column to be at the same level as amax and mean.

Is there a way to add a column to the innermost (bottom) column index of a multi-index GroupBy() object in Pandas?
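For context, a likely reason the chained form has no effect is that result['Reads'] produces a separate intermediate DataFrame, so the new column lands on that object and not on result. The sketch below makes the intermediate object explicit. It assumes the string aggregators 'max' and 'mean' in place of np.max and np.mean, so the column is labeled max here and not amax.

```python
import pandas as pd

df = pd.DataFrame({'Main': [1, 2, 1, 2, 1], 'Reads': [5, 2, 10, 20, 7],
                   'Test': range(54, 59), 'Subgroup': [1, 1, 2, 3, 3]})

# Assumption: string aggregators, so the inner labels are 'max' and 'mean'.
result = df.groupby(['Main', 'Subgroup']).agg({'Reads': ['max', 'mean']})

# Selecting the top-level key gives a separate frame with single-level columns.
# The explicit .copy() makes that visible and avoids the warning.
sub = result['Reads'].copy()
sub['diff'] = sub['max'] - sub['mean']

sub_columns = list(sub.columns)        # the new column exists here
result_columns = list(result.columns)  # but the original frame is unchanged
```

The new column shows up in sub only; result still has just its two original columns.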

You can access the multi-index using tuples:

result[('Reads','diff')] = result[('Reads','amax')] - result[('Reads','mean')]

You get:

                    Reads
                    amax    mean    diff
Main    Subgroup            
1       1           5       5       0
        2          10      10       0
        3           7       7       0
2       1           2       2       0
        3          20      20       0
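The tuple approach can be tried end to end with the sketch below. It assumes the string aggregators 'max' and 'mean' in place of np.max and np.mean, so the tuple key is ('Reads', 'max') and not ('Reads', 'amax'); adjust the key to whatever label your pandas version shows.

```python
import pandas as pd

df = pd.DataFrame({'Main': [1, 2, 1, 2, 1], 'Reads': [5, 2, 10, 20, 7],
                   'Test': range(54, 59), 'Subgroup': [1, 1, 2, 3, 3]})

# Assumption: string aggregators, so the inner labels are 'max' and 'mean'.
result = df.groupby(['Main', 'Subgroup']).agg({'Reads': ['max', 'mean']})

# A single tuple key addresses one column of the MultiIndex directly,
# so the assignment happens on result itself (no intermediate object).
result[('Reads', 'diff')] = result[('Reads', 'max')] - result[('Reads', 'mean')]

columns = list(result.columns)
all_zero = bool((result[('Reads', 'diff')] == 0).all())
```

Every diff is 0 with this sample data because each (Main, Subgroup) pair contains exactly one row, so its max equals its mean.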

Try this:

In [8]: result = df.groupby(['Main','Subgroup']).agg({'Reads':[np.max,np.mean, lambda x: x.max()-x.mean()]})

In [9]: result
Out[9]:
              Reads
               amax mean <lambda>
Main Subgroup
1    1            5    5        0
     2           10   10        0
     3            7    7        0
2    1            2    2        0
     3           20   20        0

In [10]: result = result.rename(columns={'<lambda>':'diff'})

In [11]: result
Out[11]:
              Reads
               amax mean diff
Main Subgroup
1    1            5    5    0
     2           10   10    0
     3            7    7    0
2    1            2    2    0
     3           20   20    0
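A variation on this idea that avoids the rename step is to pass a named function, since pandas takes the column label from the function's name. This is a sketch, not part of the original answer: the helper diff is hypothetical, and the string aggregators 'max' and 'mean' are assumed in place of np.max and np.mean. It may also be more robust, because the label pandas gives an anonymous function can differ between versions.

```python
import pandas as pd

df = pd.DataFrame({'Main': [1, 2, 1, 2, 1], 'Reads': [5, 2, 10, 20, 7],
                   'Test': range(54, 59), 'Subgroup': [1, 1, 2, 3, 3]})


def diff(x):
    """Hypothetical helper: max minus mean of one group's values."""
    return x.max() - x.mean()


# The function's __name__ ('diff') becomes the inner column label.
result = df.groupby(['Main', 'Subgroup']).agg({'Reads': ['max', 'mean', diff]})

columns = list(result.columns)
```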
# You can use a lambda to build diff directly.
df.groupby(['Main','Subgroup']).agg({'Reads':[np.max,np.mean,lambda x: np.max(x)-np.mean(x)]}).rename(columns={'<lambda>':'diff'})
Out[2360]: 
              Reads          
               amax mean diff
Main Subgroup                
1    1            5    5    0
     2           10   10    0
     3            7    7    0
2    1            2    2    0
     3           20   20    0
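Because every group in the question's data holds a single row, all the diffs above are 0, which makes it hard to see that the calculation works. The sketch below uses made-up data (the frame toy is hypothetical, not from the question) in which some groups hold two rows, and checks that the lambda-based and tuple-based calculations agree.

```python
import pandas as pd

# Hypothetical data: groups (1, 1) and (2, 1) each contain two rows.
toy = pd.DataFrame({'Main': [1, 1, 1, 2, 2],
                    'Subgroup': [1, 1, 2, 1, 1],
                    'Reads': [5, 7, 10, 2, 20]})

grouped = toy.groupby(['Main', 'Subgroup'])

# Tuple-based: aggregate first, then subtract the two columns.
result = grouped.agg({'Reads': ['max', 'mean']})
result[('Reads', 'diff')] = result[('Reads', 'max')] - result[('Reads', 'mean')]

# Lambda-based: compute max minus mean inside each group.
direct = grouped['Reads'].agg(lambda x: x.max() - x.mean())

diff_values = result[('Reads', 'diff')].tolist()
direct_values = direct.tolist()
```

Here group (1, 1) has max 7 and mean 6, group (1, 2) has one row, and group (2, 1) has max 20 and mean 11, so both approaches give 1, 0 and 9.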
