How to sum negative and positive values separately when using groupby in pandas?
How can I sum positive and negative values separately in pandas and put them in, say, positive and negative columns?
I have a dataframe like the one below:
import numpy as np
import pandas

df = pandas.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                       'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                       'C': np.random.randn(8), 'D': np.random.randn(8)})
The output is:
df
     A      B         C         D
0  foo    one  0.374156  0.319699
1  bar    one -0.356339 -0.629649
2  foo    two -0.390243 -1.387909
3  bar  three -0.783435 -0.959699
4  foo    two -1.268622 -0.250871
5  bar    two -2.302525 -1.295991
6  foo    one -0.968840  1.247675
7  foo  three  0.482845  1.004697
I used the code below to get the negative sums:
df['negative'] = df.groupby('A')['C'].apply(lambda x: x[x < 0].sum()).reset_index()
But the problem is that when I try to assign the result to a dataframe column called negative, it raises an error:
ValueError: Wrong number of items passed 2, placement implies 1
I understand what it says: the groupby result has more than one column, so it cannot be assigned to df['negative']. But I don't know how to solve this part of the problem. I need a positive column too.
The desired outcome would be:
     A  Positive  Negative
0  foo  0.374156 -0.319699
1  bar  0.356339 -0.629649
What is the right solution to the problem?
In [14]:
df.groupby(df['A'])['C'].agg([('negative', lambda x: x[x < 0].sum()), ('positive', lambda x: x[x > 0].sum())])
Out[14]:
     negative  positive
A
bar -1.418788  2.603452
foo -0.504695  2.880512
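On recent pandas versions, the same aggregation can also be written with keyword-style named aggregation. The following is only a minimal sketch, not part of the original answer; the small fixed frame and its values are made up for illustration so that the result is reproducible.

```python
import pandas as pd

# Hypothetical fixed data (not the random frame from the question).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo'],
    'C': [1.0, -2.0, -3.0, 4.0, 0.5],
})

# Named aggregation: each keyword becomes one output column.
sums = df.groupby('A')['C'].agg(
    positive=lambda x: x[x > 0].sum(),
    negative=lambda x: x[x < 0].sum(),
)
print(sums)
```

Here foo should end up with 1.5 in positive and -3.0 in negative, and bar with 4.0 and -2.0. Calling sums.reset_index() turns A back into an ordinary column, which is close to the layout asked for in the question.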
You may groupby on A and df['C'] > 0, and unstack the result:
>>> right = df.groupby(['A', df['C'] > 0])['C'].sum().unstack()
>>> right = right.rename(columns={True:'positive', False:'negative'})
>>> right
C    negative  positive
A
bar   -3.4423       NaN
foo   -2.6277     0.857
The NaN value appears because all the A == bar rows have a negative value for C, so bar has no positive group to sum.
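If a zero is preferable to NaN for a group that has no values of one sign, unstack accepts a fill_value argument. A minimal sketch, assuming a small made-up frame in which bar again has only negative values:

```python
import pandas as pd

# Hypothetical data: every 'bar' row is negative, as in the example above.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'C': [1.5, -2.0, -0.5, -1.0],
})

right = (
    df.groupby(['A', df['C'] > 0])['C']
      .sum()
      .unstack(fill_value=0)  # missing (A, sign) combinations become 0
      .rename(columns={True: 'positive', False: 'negative'})
)
print(right)
```

Whether 0 or NaN is the better marker depends on whether "no positive values" should be distinguishable from "positive values that sum to zero" further down the line.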
If you want to add these to the original frame, matched on the values of the groupby key (i.e. A), it requires a left join:
>>> df.join(right, on='A', how='left')
     A      B       C       D  negative  positive
0  foo    one  0.3742  0.3197   -2.6277     0.857
1  bar    one -0.3563 -0.6296   -3.4423       NaN
2  foo    two -0.3902 -1.3879   -2.6277     0.857
3  bar  three -0.7834 -0.9597   -3.4423       NaN
4  foo    two -1.2686 -0.2509   -2.6277     0.857
5  bar    two -2.3025 -1.2960   -3.4423       NaN
6  foo    one -0.9688  1.2477   -2.6277     0.857
7  foo  three  0.4828  1.0047   -2.6277     0.857
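As an alternative to the join (a different technique from the one in the answer above, shown only as a sketch), the per-row columns can be built directly: clip C at zero to isolate each sign, then broadcast the group sums back to the rows with transform. The frame below is made-up illustration data.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'C': [1.5, -2.0, -0.5, -1.0],
})

# clip(upper=0) turns positive values into 0, leaving only the negatives;
# clip(lower=0) does the opposite. transform('sum') returns one value per row.
df['negative'] = df['C'].clip(upper=0).groupby(df['A']).transform('sum')
df['positive'] = df['C'].clip(lower=0).groupby(df['A']).transform('sum')
print(df)
```

Note that this variant yields 0.0 rather than NaN for a group with no values of a given sign, which differs from the join result shown above.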