
How to sum negative and positive values separately when using groupby in pandas?

How can I sum positive and negative values separately in pandas and put the results in, say, positive and negative columns?

I have a dataframe like the one below:

import numpy as np
import pandas

df = pandas.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C' : np.random.randn(8), 'D' : np.random.randn(8)})

The output is as below:

df
     A      B         C         D
0  foo    one  0.374156  0.319699
1  bar    one -0.356339 -0.629649
2  foo    two -0.390243 -1.387909
3  bar  three -0.783435 -0.959699
4  foo    two -1.268622 -0.250871
5  bar    two -2.302525 -1.295991
6  foo    one -0.968840  1.247675
7  foo  three  0.482845  1.004697

I used the code below to get the negative sums:

df['negative'] = df.groupby('A')['C'].apply(lambda x: x[x<0].sum()).reset_index()

The problem is that when I try to assign the result to a dataframe column called negative, it raises an error:

ValueError: Wrong number of items passed 2, placement implies 1

I understand what the message means: the groupby expression returned more than one column, so it cannot be assigned to df['negative']. I just don't know how to solve this part of the problem. I also need a positive column.

The desired outcome would be:

    A      Positive   Negative
0  foo     0.374156  -0.319699
1  bar     0.356339  -0.629649

What is the right solution to this problem?

In [14]:
df.groupby(df['A'])['C'].agg([('negative', lambda x: x[x < 0].sum()), ('positive', lambda x: x[x > 0].sum())])
Out[14]:
     negative   positive
A       
bar -1.418788   2.603452
foo -0.504695   2.880512
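
The same aggregation can also be written with keyword-style named aggregation, which I believe is available in pandas 0.25 and later. This is only a sketch: the small frame and its values are made up so the result is reproducible, and they are not the data from the question.

```python
import pandas as pd

# Hypothetical, deterministic data (not the random values from the question).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'C': [1.0, -2.0, -3.0, 4.0],
})

# Each keyword becomes an output column; each lambda sums one sign only.
result = df.groupby('A')['C'].agg(
    negative=lambda x: x[x < 0].sum(),
    positive=lambda x: x[x > 0].sum(),
).reset_index()

print(result)
```

Calling reset_index() at the end turns the group key A back into an ordinary column, which matches the layout asked for in the question.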

You can group by both A and df['C'] > 0, and then unstack the result:

>>> right = df.groupby(['A', df['C'] > 0])['C'].sum().unstack()
>>> right = right.rename(columns={True:'positive', False:'negative'})
>>> right
C    negative  positive
A                      
bar   -3.4423       NaN
foo   -2.6277     0.857

The NaN value appears because all the A == bar rows have a negative value for C.

If you want to add these sums to the original frame, matched on the groupby key A, that requires a left join:
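
If a zero is preferable to NaN for a group that has no values of one sign, one option is to pass fill_value to unstack. A minimal sketch, using made-up values in which every bar row is negative:

```python
import pandas as pd

# Hypothetical data: all 'bar' rows are negative, so 'bar' has no positive group.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'C': [1.0, -2.0, -3.0, -4.0],
})

right = (
    df.groupby(['A', df['C'] > 0])['C']
      .sum()
      .unstack(fill_value=0)  # missing (group, sign) combinations become 0
      .rename(columns={True: 'positive', False: 'negative'})
)

print(right)
```

Note that this only fills gaps for columns that exist. If no group at all has a positive value, the positive column would still be absent and would have to be added separately.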

>>> df.join(right, on='A', how='left')
     A      B       C       D  negative  positive
0  foo    one  0.3742  0.3197   -2.6277     0.857
1  bar    one -0.3563 -0.6296   -3.4423       NaN
2  foo    two -0.3902 -1.3879   -2.6277     0.857
3  bar  three -0.7834 -0.9597   -3.4423       NaN
4  foo    two -1.2686 -0.2509   -2.6277     0.857
5  bar    two -2.3025 -1.2960   -3.4423       NaN
6  foo    one -0.9688  1.2477   -2.6277     0.857
7  foo  three  0.4828  1.0047   -2.6277     0.857
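
As an alternative to the join, the per-group sums can be broadcast straight onto the rows with transform. This is a sketch of a different technique from the one above, again on made-up values; it masks out the unwanted sign with where before summing, so groups with no positive values get 0 rather than NaN.

```python
import pandas as pd

# Hypothetical data, not the random values from the question.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'C': [1.0, -2.0, -3.0, -4.0],
})

# Keep only one sign (replace the rest with 0), then sum within each A group.
# transform('sum') returns a result aligned with the original rows.
df['negative'] = df['C'].where(df['C'] < 0, 0).groupby(df['A']).transform('sum')
df['positive'] = df['C'].where(df['C'] > 0, 0).groupby(df['A']).transform('sum')

print(df)
```

Because transform keeps the original index, the result can be assigned to a new column directly, which is what the assignment in the question was trying to do.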

