
Doing mathematical operations in a pandas DataFrame if a condition is met

I am new to pandas.

My DataFrame looks like this:

    a1  b1   c1  d1  e1 
A   10  10   1   2   0   
B   20  20   2   1   1
C   30  30   3   1   0
D   40  40   4   1   1
E   40  40   4   1   2
F   40  40   4   1   1

I want to do math operations only across rows where e1 has the same value.

For example: ( a1A + a1C ) / ( c1A + c1C ) for rows A and C, which share the same e1 value. So I would end up with a DataFrame like this:

    a1  b1   c1  d1  e1     result
A   10  10   1   2   0      (a1A + a1C) / (c1A + c1C)
B   20  20   2   1   1      (a1B + a1D + a1F) / (c1B + c1D + c1F)
C   30  30   3   1   0      Do not calculate it because it's already calculated
D   40  40   4   1   1      Do not calculate it because it's already calculated
E   40  40   4   1   2      (a1E / c1E)
F   40  40   4   1   1      Do not calculate it because it's already calculated

I do not know how to apply a condition to the calculations, or how to skip a calculation that has already been done for the same e1 value.

Thank you for your suggestions.

First aggregate the sums per group, then keep only the first occurrence of each e1 value with Series.drop_duplicates, and finally use Series.map to look up the ratio of the two sums:

s = df.groupby('e1')[['a1','c1']].sum()

df['new'] = df['e1'].drop_duplicates().map(s.a1 / s.c1)
print (df)
   a1  b1  c1  d1  e1   new
A  10  10   1   2   0  10.0
B  20  20   2   1   1  10.0
C  30  30   3   1   0   NaN
D  40  40   4   1   1   NaN
E  40  40   4   1   2  10.0
F  40  40   4   1   1   NaN
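
For reference, here is a self-contained sketch of this first approach that can be run as-is. It rebuilds the DataFrame from the values shown in the question and selects the two columns with a list, which is the form current pandas versions expect. The variable name `ratio` is only for illustration and is not part of the original answer.

```python
import pandas as pd

# Rebuild the example frame (values copied from the question's table).
df = pd.DataFrame(
    {
        "a1": [10, 20, 30, 40, 40, 40],
        "b1": [10, 20, 30, 40, 40, 40],
        "c1": [1, 2, 3, 4, 4, 4],
        "d1": [2, 1, 1, 1, 1, 1],
        "e1": [0, 1, 0, 1, 2, 1],
    },
    index=list("ABCDEF"),
)

# Sum a1 and c1 within each e1 group; a list selects several columns.
s = df.groupby("e1")[["a1", "c1"]].sum()

# One ratio per e1 value (the Series is indexed by e1).
ratio = s["a1"] / s["c1"]

# Keep only the first row of each e1 value, then look up its group ratio.
# Assigning back aligns on the index, so the skipped rows become NaN.
df["new"] = df["e1"].drop_duplicates().map(ratio)
print(df)
```

Rows C, D and F stay NaN because drop_duplicates removes them before the lookup, which matches the "do not calculate it again" requirement in the question.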

Also, I think mapping by unique values is usually not necessary in pandas. More commonly GroupBy.transform is used, which returns the group sums aligned to the original rows, so the new column is filled for every row:

df2 = df.groupby('e1')[['a1','c1']].transform('sum')
print (df2)
    a1  c1
A   40   4
B  100  10
C   40   4
D  100  10
E   40   4
F  100  10

df['new'] = df2.a1 / df2.c1
print (df)
   a1  b1  c1  d1  e1   new
A  10  10   1   2   0  10.0
B  20  20   2   1   1  10.0
C  30  30   3   1   0  10.0
D  40  40   4   1   1  10.0
E  40  40   4   1   2  10.0
F  40  40   4   1   1  10.0
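
If the transform approach is preferred but the result should still appear only on the first row of each e1 value, one possible variation (not part of the original answer) is to blank out the repeated rows with Series.duplicated. The names `per_row` and `first_only` below are hypothetical.

```python
import pandas as pd

# Same example frame as in the question.
df = pd.DataFrame(
    {
        "a1": [10, 20, 30, 40, 40, 40],
        "b1": [10, 20, 30, 40, 40, 40],
        "c1": [1, 2, 3, 4, 4, 4],
        "d1": [2, 1, 1, 1, 1, 1],
        "e1": [0, 1, 0, 1, 2, 1],
    },
    index=list("ABCDEF"),
)

# Group sums broadcast back to every row of the original frame.
sums = df.groupby("e1")[["a1", "c1"]].transform("sum")
per_row = sums["a1"] / sums["c1"]

# duplicated() is False for the first occurrence of each e1 value,
# so where() keeps the ratio there and puts NaN on the repeats.
first_only = per_row.where(~df["e1"].duplicated())

df["new"] = first_only
print(df)
```

This keeps the calculation in a single vectorized pass and makes the "first row only" rule a separate, explicit step.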
