
Pandas GroupBy apply all

I've got an involved situation. Let's say I have the following example dataframe of loans:

import pandas as pd

test_df = pd.DataFrame({'name': ['Jack','Jill','John','Jack','Jill'],
                   'date': ['2016-08-08','2016-08-08','2016-08-07','2016-08-08','2016-08-08'],
                   'amount': [1000.0,1500.0,2000.0,2000.0,3000.0],
                   'return_amount': [5000.0,2000.0,3000.0,0.0,0.0],
                   'return_date': ['2017-08-08','2017-08-08','2017-08-07','','2017-08-08']})

test_df.head()

    amount  date        name    return_amount   return_date
0   1000.0  2016-08-08  Jack    5000.0          2017-08-08
1   1500.0  2016-08-08  Jill    2000.0          2017-08-08
2   2000.0  2016-08-07  John    3000.0          2017-08-07
3   2000.0  2016-08-08  Jack    0.0
4   3000.0  2016-08-08  Jill    0.0             2017-08-08

There are a few operations I need to perform after grouping this dataframe by name (grouping loans by person):

1) return_amount needs to be allocated in proportion to each loan's share of the person's total amount.

2) If return_date is missing for ANY loan of a given person, then all of that person's return_dates should be converted to empty strings ''.
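Written out, rule 1 says that for a loan i belonging to person g, the allocated amount is that loan's fraction of g's total amount, multiplied by g's total return amount. For Jill in the example data this gives 1500 / (1500 + 3000) × (2000 + 0) ≈ 666.67 and 3000 / 4500 × 2000 ≈ 1333.33:

$$
\text{allocated}_i = \frac{\text{amount}_i}{\sum_{j \in g} \text{amount}_j} \cdot \sum_{j \in g} \text{return\_amount}_j
$$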

I already have a function that I use to allocate the proportional return amount:

def allocate_return_amount(group):
    loan_amount = group['amount']
    return_amount = group['return_amount']
    # Totals over this person's loans
    sum_amount = loan_amount.sum()
    sum_return_amount = return_amount.sum()
    # Each loan's share of the total amount, times the total return amount
    group['allocated_return_amount'] = (loan_amount/sum_amount) * sum_return_amount
    return group

And I use grouped_test_df = grouped_test_df.apply(allocate_return_amount) to apply it.
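For comparison, here is a minimal sketch of the same allocation done without apply, by broadcasting each person's totals back to every row with GroupBy.transform. This is a swapped-in technique, not the asker's code; it writes to the same allocated_return_amount column that allocate_return_amount creates.

```python
import pandas as pd

test_df = pd.DataFrame({'name': ['Jack', 'Jill', 'John', 'Jack', 'Jill'],
                        'date': ['2016-08-08', '2016-08-08', '2016-08-07', '2016-08-08', '2016-08-08'],
                        'amount': [1000.0, 1500.0, 2000.0, 2000.0, 3000.0],
                        'return_amount': [5000.0, 2000.0, 3000.0, 0.0, 0.0],
                        'return_date': ['2017-08-08', '2017-08-08', '2017-08-07', '', '2017-08-08']})

grouped = test_df.groupby('name')

# transform('sum') returns one value per row (the person's total), aligned to test_df's index
sum_amount = grouped['amount'].transform('sum')
sum_return_amount = grouped['return_amount'].transform('sum')

# Each loan gets its share of the person's total return amount
test_df['allocated_return_amount'] = test_df['amount'] / sum_amount * sum_return_amount
print(test_df)
```

Because every intermediate Series shares test_df's index, the division and multiplication line up row by row without any explicit loop over groups.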

What I am struggling with is the second operation: checking whether any of a person's loans is missing a return_date and, if so, changing all return_dates for that person to ''.

I've found GroupBy.all in the pandas documentation, but I haven't figured out how to use it yet. Does anyone have experience with this?
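One way GroupBy.all can be used here, sketched under the assumption that a missing return date is always stored as an empty string '': build a boolean "has a date" Series, group it by name, and ask whether all values in each group are True. Calling .all() gives one result per person, while transform('all') broadcasts that result back to every row so it can be used as a mask with loc.

```python
import pandas as pd

test_df = pd.DataFrame({'name': ['Jack', 'Jill', 'John', 'Jack', 'Jill'],
                        'date': ['2016-08-08', '2016-08-08', '2016-08-07', '2016-08-08', '2016-08-08'],
                        'amount': [1000.0, 1500.0, 2000.0, 2000.0, 3000.0],
                        'return_amount': [5000.0, 2000.0, 3000.0, 0.0, 0.0],
                        'return_date': ['2017-08-08', '2017-08-08', '2017-08-07', '', '2017-08-08']})

# True where a loan has a return date (assumes missing dates are '')
has_date = test_df['return_date'] != ''

# GroupBy.all: one boolean per person, True only if every loan has a date
all_dated = has_date.groupby(test_df['name']).all()
print(all_dated)

# Same test, broadcast back to one boolean per row of test_df
row_all_dated = has_date.groupby(test_df['name']).transform('all')

# Blank every return date for people with at least one missing date
test_df.loc[~row_all_dated, 'return_date'] = ''
print(test_df)
```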

Since this example might be a bit hard to follow, here's my ideal output for this example:

ideal_test_df.head()

    amount  date        name    return_amount   return_date
0   1000.0  2016-08-08  Jack    0.0             ''
1   1500.0  2016-08-08  Jill    666.67          2017-08-08
2   2000.0  2016-08-07  John    3000.0          2017-08-07
3   2000.0  2016-08-08  Jack    0.0             ''
4   3000.0  2016-08-08  Jill    1333.33         2017-08-08

Hopefully this makes sense, and thank you in advance to any pandas expert who takes the time to help me out!

Answer: You can do it by iterating through the groups, testing the condition with any, and then writing back to the original dataframe with loc:

import pandas as pd

test_df = pd.DataFrame({'name': ['Jack','Jill','John','Jack','Jill'],
                   'date': ['2016-08-08','2016-08-08','2016-08-07','2016-08-08','2016-08-08'],
                   'amount': [1000.0,1500.0,2000.0,2000.0,3000.0],
                   'return_amount': [5000.0,2000.0,3000.0,0.0,0.0],
                   'return_date': ['2017-08-08','2017-08-08','2017-08-07','','2017-08-08']})

grouped = test_df.groupby('name')

for name, group in grouped:
    # Does any loan in this person's group lack a return date?
    if (group['return_date'] == '').any():
        # Blank the return dates of every loan in the group
        test_df.loc[group.index, 'return_date'] = ''

And if you also want to reset return_amount, and don't mind the additional overhead, just add this line inside the if block, right after the return_date assignment:

test_df.loc[group.index, 'return_amount'] = 0
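Both rules can also be combined without an explicit loop. The sketch below assumes the ideal output means that people with any missing return date get a return_amount of 0, and that the allocated amount replaces return_amount itself (as in the ideal table) rather than going into a new column. It blanks and zeroes first, then allocates, so Jack's rows end up at 0.0.

```python
import pandas as pd

test_df = pd.DataFrame({'name': ['Jack', 'Jill', 'John', 'Jack', 'Jill'],
                        'date': ['2016-08-08', '2016-08-08', '2016-08-07', '2016-08-08', '2016-08-08'],
                        'amount': [1000.0, 1500.0, 2000.0, 2000.0, 3000.0],
                        'return_amount': [5000.0, 2000.0, 3000.0, 0.0, 0.0],
                        'return_date': ['2017-08-08', '2017-08-08', '2017-08-07', '', '2017-08-08']})

# Per-row flag: does this row's person have ANY loan without a return date?
any_missing = (test_df['return_date'] == '').groupby(test_df['name']).transform('any')

# Rule 2 (plus the optional reset): blank dates and zero return amounts for those people
test_df.loc[any_missing, 'return_date'] = ''
test_df.loc[any_missing, 'return_amount'] = 0.0

# Rule 1: allocate each person's total return amount in proportion to loan amount
grouped = test_df.groupby('name')
share = test_df['amount'] / grouped['amount'].transform('sum')
test_df['return_amount'] = share * grouped['return_amount'].transform('sum')

print(test_df.round(2))
```

The order matters: zeroing before allocating is what makes a person with a missing date end up with 0.0 on every loan.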

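Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.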
