Pandas GroupBy apply all
I've got an involved situation. Let's say I have the following example dataframe of loans:
import pandas as pd

test_df = pd.DataFrame({'name': ['Jack','Jill','John','Jack','Jill'],
                        'date': ['2016-08-08','2016-08-08','2016-08-07','2016-08-08','2016-08-08'],
                        'amount': [1000.0,1500.0,2000.0,2000.0,3000.0],
                        'return_amount': [5000.0,2000.0,3000.0,0.0,0.0],
                        'return_date': ['2017-08-08','2017-08-08','2017-08-07','','2017-08-08']})
test_df.head()
   amount        date  name  return_amount return_date
0  1000.0  2016-08-08  Jack         5000.0  2017-08-08
1  1500.0  2016-08-08  Jill         2000.0  2017-08-08
2  2000.0  2016-08-07  John         3000.0  2017-08-07
3  2000.0  2016-08-08  Jack            0.0
4  3000.0  2016-08-08  Jill            0.0  2017-08-08
There are a few operations I need to perform after grouping this dataframe by name (grouping loans by person):
1) return_amount needs to be allocated proportionally by the sum of amount.
2) If return_date is missing for ANY loan for a given person, then all return_dates for that person should be converted to empty strings ''.
I already have a function that I use to allocate the proportional return amount:
def allocate_return_amount(group):
    loan_amount = group['amount']
    return_amount = group['return_amount']
    sum_amount = loan_amount.sum()
    sum_return_amount = return_amount.sum()
    group['allocated_return_amount'] = (loan_amount/sum_amount) * sum_return_amount
    return group
And I use grouped_test_df = grouped_test_df.apply(allocate_return_amount) to apply it.
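For reference, the same proportional allocation can probably be written without apply. The following is a minimal sketch, assuming the grouping key is name and using a trimmed copy of the example data; it relies on transform('sum'), which returns each row's group total aligned with the original index:

```python
import pandas as pd

# Trimmed copy of the example data (only the columns the allocation needs).
test_df = pd.DataFrame({
    'name': ['Jack', 'Jill', 'John', 'Jack', 'Jill'],
    'amount': [1000.0, 1500.0, 2000.0, 2000.0, 3000.0],
    'return_amount': [5000.0, 2000.0, 3000.0, 0.0, 0.0],
})

grouped = test_df.groupby('name')

# transform('sum') yields one value per original row: that row's group total.
sum_amount = grouped['amount'].transform('sum')
sum_return_amount = grouped['return_amount'].transform('sum')

# Each loan's share of its person's total loan amount, times the person's total return.
test_df['allocated_return_amount'] = test_df['amount'] / sum_amount * sum_return_amount
```

Because the result is aligned row by row, it can be assigned straight back as a new column, and it should match what allocate_return_amount produces per group.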
What I am struggling with is the second operation I need to perform: checking if any of the loans to a person are missing a return_date, and if so, changing all return_dates for that person to ''.
I've found GroupBy.all in the pandas documentation, but I haven't figured out how to use it yet. Does anyone have experience with this?
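As a small illustration of what GroupBy.all does here: it reduces each group to a single boolean that is True only when every value in the group is truthy. The sketch below assumes a missing date is stored as an empty string, as in the example data; the names has_date and all_dates_present are just illustrative:

```python
import pandas as pd

test_df = pd.DataFrame({
    'name': ['Jack', 'Jill', 'John', 'Jack', 'Jill'],
    'return_date': ['2017-08-08', '2017-08-08', '2017-08-07', '', '2017-08-08'],
})

# True where the loan has a return date, False where it is the empty string.
has_date = test_df['return_date'] != ''

# One boolean per person: True only if ALL of that person's loans have a date.
all_dates_present = has_date.groupby(test_df['name']).all()
print(all_dates_present)
```

The result has one row per name rather than one row per loan, so it still needs to be mapped back onto the original rows before it can be used to overwrite return_date.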
Since this example might be a bit hard to follow, here's my ideal output for this example:
ideal_test_df.head()
   amount        date  name  return_amount return_date
0  1000.0  2016-08-08  Jack           0.00          ''
1  1500.0  2016-08-08  Jill         666.67  2017-08-08
2  2000.0  2016-08-07  John        3000.00  2017-08-07
3  2000.0  2016-08-08  Jack           0.00          ''
4  3000.0  2016-08-08  Jill        1333.33  2017-08-08
Hopefully this makes sense, and thank you in advance to any pandas expert who takes the time to help me out!
You can do it by iterating through the groups, testing the condition using any, then setting back to the original dataframe using loc:
import pandas as pd

test_df = pd.DataFrame({'name': ['Jack','Jill','John','Jack','Jill'],
                        'date': ['2016-08-08','2016-08-08','2016-08-07','2016-08-08','2016-08-08'],
                        'amount': [1000.0,1500.0,2000.0,2000.0,3000.0],
                        'return_amount': [5000.0,2000.0,3000.0,0.0,0.0],
                        'return_date': ['2017-08-08','2017-08-08','2017-08-07','','2017-08-08']})

grouped = test_df.groupby('name')
for name, group in grouped:
    # If any loan for this person has no return date...
    if (group['return_date'] == '').any():
        # ...blank out the return date on all of that person's rows.
        test_df.loc[group.index, 'return_date'] = ''
And if you also want to reset return_amount, and don't mind the additional overhead, just add this line inside the if block, right after the return_date assignment:
test_df.loc[group.index, 'return_amount'] = 0
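If the explicit loop becomes a bottleneck, a vectorized variant is possible. This is a minimal sketch, not the approach from the answer above: it assumes missing dates are empty strings and uses transform('any') to broadcast each person's result back onto every one of their rows (the names missing and person_has_missing are illustrative):

```python
import pandas as pd

test_df = pd.DataFrame({
    'name': ['Jack', 'Jill', 'John', 'Jack', 'Jill'],
    'date': ['2016-08-08', '2016-08-08', '2016-08-07', '2016-08-08', '2016-08-08'],
    'amount': [1000.0, 1500.0, 2000.0, 2000.0, 3000.0],
    'return_amount': [5000.0, 2000.0, 3000.0, 0.0, 0.0],
    'return_date': ['2017-08-08', '2017-08-08', '2017-08-07', '', '2017-08-08'],
})

# Row-level flag: this loan has no return date.
missing = test_df['return_date'] == ''

# Row-level flag: this loan's person has at least one loan with no return date.
person_has_missing = missing.groupby(test_df['name']).transform('any')

# Blank the dates and zero the returns for every loan of such a person.
test_df.loc[person_has_missing, 'return_date'] = ''
test_df.loc[person_has_missing, 'return_amount'] = 0.0
```

Running this reset before the proportional allocation from the question should leave Jack with 0.0 on both loans while Jill and John are unaffected, which is consistent with the ideal output in the question.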