Pandas Groupby Lambda function multiple conditions/columns

I am trying to create a new column that groups df by Deal and Month and applies a percentage (9%) to the Amount column. If all the Amount values for a particular Deal in a particular month add up to at least 20,000, apply the percentage to the Amount; otherwise, if the TYPE is MONTHLY and the individual Amount is at least 1500, apply the percentage to the Amount; failing that, multiply by 0.

df.groupby(['Deal', 'Month'])["Amount"].apply(
    lambda x: x.sum() * 0.09 if x.sum() >= 20000 else (
        x * 0.09 if x >= 1500 and x['TYPE'] == 'MONTHLY' else 0
    )
)

This is what I tried, but I keep getting errors such as ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). or KeyError: ('TYPE', u'occurred at index 0'). I've tried using transform instead of apply too. I would greatly appreciate any help.

This is what my grouped DF looks like, plus the desired column:

   Deal         TYPE    Month        Amount   Desired Column
0   Com A   ANNUAL  April   10021.34   0
1   Com A   MONTHLY April   35.86      0
2   Com B   MONTHLY April   11150.05   1,003.50
3   Com B   ANNUAL  July    661.65     0
4   Com B   ANNUAL  August  303.63     0
5   Com C   ANNUAL  April   25624.59   2,306.21
6   Com D   ANNUAL  June    27309.26   2,457.83  
7   Com D   ANNUAL  July    0.00       0
8   Com D   ANNUAL  August  0.00       0
9   Com E   ANNUAL  April   10.65      0
10  Com E   MONTHLY May     0.00       0
11  Com E   ANNUAL  May     18716.70   1,684.5
12  Com E   MONTHLY June    0.00       0
13  Com E   ANNUAL  June    606.49     0
14  Com E   MONTHLY July    0.00       0
15  Com E   MONTHLY July    8890.17    800.11
16  Com E   MONTHLY August  4000       0
17  Com E   ANNUAL  August  16000      1,800
18  Com E   ANNUAL  September 2157.34  0
19  Com E   ANNUAL  October 3025.24    0

You don't need a groupby in this case. There are a couple of ways to do it; conceptually, the easiest is to first calculate the threshold based on whether it is a monthly amount or an annual one:

df['Threshold'] = (df.TYPE=='ANNUAL')*20000 + (df.TYPE=='MONTHLY')*1500

Then you can calculate the amount based on whether the threshold has been met:

df['Desired Amount'] = (df.Amount>df.Threshold)*0.09*df.Amount
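
As a minimal, self-contained sketch of the two steps above, the following builds a small frame from four rows of the question's table (the sample frame is an assumption for illustration, not the asker's full data) and applies the same threshold logic. Note that it uses a strict `>` comparison, exactly as written above.

```python
import pandas as pd

# Hypothetical sample: four rows copied from the question's table
df = pd.DataFrame({
    "Deal": ["Com A", "Com A", "Com B", "Com C"],
    "TYPE": ["ANNUAL", "MONTHLY", "MONTHLY", "ANNUAL"],
    "Month": ["April", "April", "April", "April"],
    "Amount": [10021.34, 35.86, 11150.05, 25624.59],
})

# Boolean masks times a number give the per-row threshold
df["Threshold"] = (df.TYPE == "ANNUAL") * 20000 + (df.TYPE == "MONTHLY") * 1500

# Rows above their threshold get 9% of Amount, all others get 0
df["Desired Amount"] = (df.Amount > df.Threshold) * 0.09 * df.Amount

print(df)
```

Multiplying a boolean mask by a number is a compact way to avoid an explicit if/else per row, which is what triggers the "truth value of a Series is ambiguous" error in the question.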

But this works here only because you don't have multiple rows for the same deal, month, and type. If you did, you would first need a groupby to aggregate by all of these:

df = df.groupby(['Deal','Month','TYPE']).sum()
df.reset_index(inplace=True)

Then you can proceed as above.
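
A rough sketch of that aggregate-first variant, assuming a small hypothetical frame in which two rows share the same deal, month, and type. It uses `as_index=False` instead of a separate `reset_index` call, which is a stylistic substitution and not part of the answer above.

```python
import pandas as pd

# Hypothetical sample with two rows for the same (Deal, Month, TYPE)
df = pd.DataFrame({
    "Deal": ["Com E", "Com E", "Com E"],
    "TYPE": ["MONTHLY", "MONTHLY", "ANNUAL"],
    "Month": ["July", "July", "August"],
    "Amount": [0.00, 8890.17, 16000.00],
})

# Collapse duplicates: one row per (Deal, Month, TYPE) with the summed Amount
agg = df.groupby(["Deal", "Month", "TYPE"], as_index=False)["Amount"].sum()

# Same threshold logic as before, applied to the aggregated rows
agg["Threshold"] = (agg.TYPE == "ANNUAL") * 20000 + (agg.TYPE == "MONTHLY") * 1500
agg["Desired Amount"] = (agg.Amount > agg.Threshold) * 0.09 * agg.Amount

print(agg)
```

The result has one row per group, so it would need to be merged back onto the original frame if a value per original row is required.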

I tried to translate your description into this:

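import numpy as np

df['Sum'] = df.groupby(['Deal','Month'])['Amount'].transform('sum')

# 9% of the group total if it exceeds 20000; else 9% of Amount for MONTHLY rows of at least 1500; else 0
df['Desired Column'] = np.where(df['Sum'] > 20000, df['Sum'] * 0.09, np.where((df['Amount'] >= 1500) & (df['TYPE'] == 'MONTHLY'), df['Amount'] * 0.09, 0))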

I did find some differences between the result I generated and the "Desired Column" you posted. For example, row 16 is MONTHLY and has an amount greater than 1500, so the result should have been 0.09 * 4000 = 360; I am not sure how you got 0. I guess either you made a mistake during the manual calculation or I misunderstood your description. Please feel free to explain it so that I can update my script, but I think the general idea should solve your problem.
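
For readers who want to run the idea end to end, here is a minimal sketch on a handful of rows taken from the result table in this answer (the five-row frame is an assumption for illustration). It swaps the nested `np.where` for `np.select`, which evaluates the conditions in order and is an equivalent alternative here, not the code posted above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: five rows copied from this answer's result table
df = pd.DataFrame({
    "Deal": ["E", "E", "E", "E", "C"],
    "TYPE": ["MONTHLY", "MONTHLY", "MONTHLY", "ANNUAL", "ANNUAL"],
    "Month": ["July", "July", "August", "August", "April"],
    "Amount": [0.00, 8890.17, 4000.00, 14000.00, 25624.59],
})

# Total Amount per (Deal, Month), aligned to the original rows
group_sum = df.groupby(["Deal", "Month"])["Amount"].transform("sum")

# Conditions are checked in order; the first one that is True wins
big_group = group_sum > 20000
big_monthly = (df["Amount"] >= 1500) & (df["TYPE"] == "MONTHLY")

df["Desired Column"] = np.select(
    [big_group, big_monthly],
    [group_sum * 0.09, df["Amount"] * 0.09],
    default=0.0,
)

print(df)
```

Because `transform("sum")` returns a Series aligned with the original rows, every comparison is element-wise and no Python-level `if` is applied to a whole Series. If a group total of exactly 20,000 should also qualify, as the `>=` in the question's own attempt suggests, `big_group` would need `>=` instead of `>`.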

P.S. The resulting df after running my script:

   Deal     TYPE      Month    Amount       Sum  Desired Column
0     A   ANNUAL      April  10021.34  10057.20          0.0000
1     A  MONTHLY      April     35.86  10057.20          0.0000
2     B  MONTHLY      April  11150.05  11150.05       1003.5045
3     B   ANNUAL       July    661.65    661.65          0.0000
4     B   ANNUAL     August    303.63    303.63          0.0000
5     C   ANNUAL      April  25624.59  25624.59       2306.2131
6     D   ANNUAL       June  27309.26  27309.26       2457.8334
7     D   ANNUAL       July      0.00      0.00          0.0000
8     D   ANNUAL     August      0.00      0.00          0.0000
9     E   ANNUAL      April     10.65     10.65          0.0000
10    E  MONTHLY        May      0.00  18716.70          0.0000
11    E   ANNUAL        May  18716.70  18716.70          0.0000
12    E  MONTHLY       June      0.00    606.49          0.0000
13    E   ANNUAL       June    606.49    606.49          0.0000
14    E  MONTHLY       July      0.00   8890.17          0.0000
15    E  MONTHLY       July   8890.17   8890.17        800.1153
16    E  MONTHLY     August   4000.00  18000.00        360.0000
17    E   ANNUAL     August  14000.00  18000.00          0.0000
18    E   ANNUAL  September   2157.34   2157.34          0.0000
19    E   ANNUAL    October   3025.24   3025.24          0.0000
