Pandas Groupby Lambda function multiple conditions/columns
I am trying to create a new column that groups df by Deal and Month, and applies a percentage (9%) to the Amount column. If all the Amount values for a particular Deal in a particular month add up to 20,000 or more, then apply the percentage to the Amount; otherwise, if the TYPE is MONTHLY and the individual Amount is at least 1500, apply the percentage to the Amount; failing that, multiply by 0.
df.groupby(['Deal', 'Month'])["Amount"].apply(
lambda x: x.sum() * 0.09 if x.sum() >= 20000 else (
x * 0.09 if x >= 1500 and x['TYPE'] == 'MONTHLY' else 0
)
)
This is what I tried, but I keep getting errors such as
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
or
KeyError: ('TYPE', u'occurred at index 0')
etc. I've also tried using transform instead of apply. I would greatly appreciate any help.
This is what my grouped DF looks like, plus the desired column:
Deal TYPE Month Amount Desired Column
0 Com A ANNUAL April 10021.34 0
1 Com A MONTHLY April 35.86 0
2 Com B MONTHLY April 11150.05 1,003.50
3 Com B ANNUAL July 661.65 0
4 Com B ANNUAL August 303.63 0
5 Com C ANNUAL April 25624.59 2,306.21
6 Com D ANNUAL June 27309.26 2,457.83
7 Com D ANNUAL July 0.00 0
8 Com D ANNUAL August 0.00 0
9 Com E ANNUAL April 10.65 0
10 Com E MONTHLY May 0.00 0
11 Com E ANNUAL May 18716.70 1,684.5
12 Com E MONTHLY June 0.00 0
13 Com E ANNUAL June 606.49 0
14 Com E MONTHLY July 0.00 0
15 Com E MONTHLY July 8890.17 800.11
16 Com E MONTHLY August 4000 0
17 Com E ANNUAL August 16000 1,800
18 Com E ANNUAL September 2157.34 0
19 Com E ANNUAL October 3025.24 0
You don't need a groupby in this case. There are a couple of ways to do it; conceptually, the easiest is to first calculate the threshold based on whether it is a monthly amount or an annual one:
df['Threshold'] = (df.TYPE=='ANNUAL')*20000 + (df.TYPE=='MONTHLY')*1500
Then you can calculate the amount based on whether the threshold has been met:
df['Desired Amount'] = (df.Amount>df.Threshold)*0.09*df.Amount
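A minimal, self-contained sketch of the two lines above. The sample frame is an assumption: four rows copied from the question's table, just enough to show one row of each kind.

```python
import pandas as pd

# Hypothetical sample: four rows copied from the question's table
df = pd.DataFrame({
    "Deal": ["Com A", "Com B", "Com C", "Com E"],
    "TYPE": ["ANNUAL", "MONTHLY", "ANNUAL", "MONTHLY"],
    "Month": ["April", "April", "April", "August"],
    "Amount": [10021.34, 11150.05, 25624.59, 4000.00],
})

# Boolean masks multiply as 0/1, so each row picks up exactly one threshold
df["Threshold"] = (df.TYPE == "ANNUAL") * 20000 + (df.TYPE == "MONTHLY") * 1500

# The comparison is again 0/1 per row, which zeroes out rows at or below the threshold
df["Desired Amount"] = (df.Amount > df.Threshold) * 0.09 * df.Amount

print(df)
```

Note that the comparison is a strict `>`, so an amount exactly equal to the threshold gets 0; switch to `>=` if "at least" is what you want.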
But this works here because you don't have multiple rows for the same deal, month, and type. If you did, you would first need the groupby to aggregate over all of these:
df = df.groupby(['Deal','Month','TYPE']).sum()
df.reset_index(inplace=True)
Then you can proceed as above.
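A rough sketch of the aggregate-then-threshold variant. The sample rows are hypothetical (two MONTHLY rows for the same deal and month, so there is something to aggregate), and `as_index=False` is used here in place of the separate `reset_index` call; the result should be the same.

```python
import pandas as pd

# Hypothetical sample: two MONTHLY rows share the same Deal and Month
df = pd.DataFrame({
    "Deal": ["Com E", "Com E", "Com E"],
    "TYPE": ["MONTHLY", "MONTHLY", "ANNUAL"],
    "Month": ["July", "July", "August"],
    "Amount": [1000.00, 8890.17, 16000.00],
})

# Collapse to one row per (Deal, Month, TYPE), keeping the keys as columns
agg = df.groupby(["Deal", "Month", "TYPE"], as_index=False)["Amount"].sum()

# Same threshold logic as before, now applied to the aggregated amounts
agg["Threshold"] = (agg.TYPE == "ANNUAL") * 20000 + (agg.TYPE == "MONTHLY") * 1500
agg["Desired Amount"] = (agg.Amount > agg.Threshold) * 0.09 * agg.Amount

print(agg)
```

The result has one row per group rather than one per original row, so merge it back onto the original frame if you need the value on every row.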
I tried to translate your description into this:
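import numpy as np

# Total Amount per (Deal, Month), repeated on every row of the group
df['Sum'] = df.groupby(['Deal', 'Month'])['Amount'].transform('sum')
# Group total over 20000: 9% of the total; else MONTHLY rows >= 1500: 9% of the row; else 0
df['Desired Column'] = np.where(
    df['Sum'] > 20000,
    df['Sum'] * 0.09,
    np.where((df['Amount'] >= 1500) & (df['TYPE'] == 'MONTHLY'), df['Amount'] * 0.09, 0),
)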
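The same branching can also be written with `np.select`, which some find easier to read than nested `np.where` calls. This is a swapped-in alternative, not the code above, and the sample frame is an assumption: five rows taken from the question's table, with the row 17 amount as 14000.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: a subset of the question's rows
df = pd.DataFrame({
    "Deal": ["A", "A", "C", "E", "E"],
    "TYPE": ["ANNUAL", "MONTHLY", "ANNUAL", "MONTHLY", "ANNUAL"],
    "Month": ["April", "April", "April", "August", "August"],
    "Amount": [10021.34, 35.86, 25624.59, 4000.00, 14000.00],
})

# Group total per (Deal, Month), aligned with the original rows
df["Sum"] = df.groupby(["Deal", "Month"])["Amount"].transform("sum")

# Conditions are checked in order; the first one that holds picks the value
conditions = [
    df["Sum"] > 20000,
    (df["Amount"] >= 1500) & (df["TYPE"] == "MONTHLY"),
]
choices = [
    df["Sum"] * 0.09,
    df["Amount"] * 0.09,
]
df["Desired Column"] = np.select(conditions, choices, default=0)

print(df)
```

Because the conditions are evaluated in order, the group-total rule takes priority over the MONTHLY rule, matching the nested `np.where` version.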
I did find some differences between the result I generated and the "Desired Column" you posted. For example, row 16 is MONTHLY and has an amount greater than 1500, so the result should have been 0.09 * 4000 = 360; I'm not sure how you got 0. Either you made a mistake during the manual calculation or I misunderstood your description. Please feel free to explain so that I can update my script, but the general idea should solve your problem.
PS: the result df after running my script:
Deal TYPE Month Amount Sum Desired Column
0 A ANNUAL April 10021.34 10057.20 0.0000
1 A MONTHLY April 35.86 10057.20 0.0000
2 B MONTHLY April 11150.05 11150.05 1003.5045
3 B ANNUAL July 661.65 661.65 0.0000
4 B ANNUAL August 303.63 303.63 0.0000
5 C ANNUAL April 25624.59 25624.59 2306.2131
6 D ANNUAL June 27309.26 27309.26 2457.8334
7 D ANNUAL July 0.00 0.00 0.0000
8 D ANNUAL August 0.00 0.00 0.0000
9 E ANNUAL April 10.65 10.65 0.0000
10 E MONTHLY May 0.00 18716.70 0.0000
11 E ANNUAL May 18716.70 18716.70 0.0000
12 E MONTHLY June 0.00 606.49 0.0000
13 E ANNUAL June 606.49 606.49 0.0000
14 E MONTHLY July 0.00 8890.17 0.0000
15 E MONTHLY July 8890.17 8890.17 800.1153
16 E MONTHLY August 4000.00 18000.00 360.0000
17 E ANNUAL August 14000.00 18000.00 0.0000
18 E ANNUAL September 2157.34 2157.34 0.0000
19 E ANNUAL October 3025.24 3025.24 0.0000
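For reference, the errors in the question most likely come from two things: the lambda receives a whole Series of Amount values per group, so `x >= 1500` is a Series of booleans rather than a single True/False, and that Series contains no TYPE values to look up. A small sketch, using a hypothetical two-row Series in place of one group:

```python
import pandas as pd

# Hypothetical stand-in for the Amount values of one (Deal, Month) group
amounts = pd.Series([4000.00, 14000.00], name="Amount")

# A comparison on a Series returns one boolean per row, not a single bool
mask = amounts >= 1500
print(mask.tolist())

# Python's `if` needs a single truth value, so this raises ValueError
error_message = None
try:
    if amounts >= 1500:
        pass
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# The Series only holds Amount values, so 'TYPE' is not a label in its index
key_missing = False
try:
    amounts["TYPE"]
except KeyError:
    key_missing = True
print(key_missing)
```

This is why row-wise conditions are easier to express with vectorized masks (`&`, `np.where`) than with `if`/`else` inside a groupby lambda.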