[英]Pandas groupby apply is taking too much time
I am having the following code.我有以下代码。
pd.DataFrame({'user_wid': {0: 3305613, 1: 57, 2: 80, 3: 31, 4: 38, 5: 12, 6: 35, 7: 25, 8: 42, 9: 16}, 'user_name': {0: 'Ter', 1: 'Am', 2: 'Wi', 3: 'Ma', 4: 'St', 5: 'Ju', 6: 'De', 7: 'Ri', 8: 'Ab', 9: 'Ti'}, 'user_age': {0: 41, 1: 34, 2: 45, 3: 47, 4: 70, 5: 64, 6: 64, 7: 63, 8: 32, 9: 24}, 'user_gender': {0: 'Male', 1: 'Female', 2: 'Male', 3: 'Male', 4: 'Male', 5: 'Female', 6: 'Female', 7: 'Female', 8: 'Female', 9: 'Female'}, 'sale_date': {0: '2018-05-15', 1: '2020-02-28', 2: '2020-04-02', 3: '2020-05-09', 4: '2020-11-29', 5: '2020-12-14', 6: '2020-04-21', 7: '2020-06-15', 8: '2020-07-03', 9: '2020-08-10'}, 'days_since_first_visit': {0: 426, 1: 0, 2: 0, 3: 8, 4: 126, 5: 283, 6: 0, 7: 189, 8: 158, 9: 270}, 'visit': {0: 4, 1: 1, 2: 1, 3: 2, 4: 4, 5: 3, 6: 1, 7: 2, 8: 4, 9: 2}, 'num_user_visits': {0: 4, 1: 2, 2: 1, 3: 2, 4: 10, 5: 7, 6: 1, 7: 4, 8: 4, 9: 2}, 'product': {0: 13, 1: 2, 2: 2, 3: 2, 4: 5, 5: 5, 6: 1, 7: 8, 8: 5, 9: 4}, 'sale_price': {0: 10.0, 1: 0.0, 2: 41.3, 3: 41.3, 4: 49.95, 5: 74.95, 6: 49.95, 7: 5.0, 8: 0.0, 9: 0.0}, 'whether_member': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0}})
def f(x):
d = {}
d['user_name'] = x['user_name'].max()
d['user_age'] = x['user_age'].max()
d['user_gender'] = x['user_gender'].max()
d['last_visit_date'] = x['sale_date'].max()
d['days_since_first_visit'] = x['days_since_first_visit'].max()
d['num_visits_window'] = x['visit'].max()
d['num_visits_total'] = x['num_user_visits'].max()
d['products_used'] = x['product'].max()
d['user_total_sales'] = (x['sale_price'].sum()).round(2)
d['avg_spend_visit'] = (x['sale_price'].sum() / x['visit'].max()).round(2)
d['membership'] = x['whether_member'].max()
return pd.Series(d)
users = xactions.groupby('user_wid').apply(f).reset_index()
It is taking too much time to execute, I want to optimize the following function.执行时间太长,我想优化以下功能。 Any suggestions would be appreciated.
任何建议,将不胜感激。
Thanks in advance.提前致谢。
Try:尝试:
users2 = xactions.groupby("user_wid", as_index=False).agg(
user_name=("user_name", "max"),
user_age=("user_age", "max"),
user_gender=("user_gender", "max"),
last_visit_date=("sale_date", "max"),
days_since_first_visit=("days_since_first_visit", "max"),
num_visits_window=("visit", "max"),
num_visits_total=("num_user_visits", "max"),
products_used=("product", "max"),
user_total_sales=("sale_price", "sum"),
membership=("whether_member", "max"),
)
users2["avg_spend_visit"] = (
users2["user_total_sales"] / users2["num_visits_window"]
).round(2)
print(users2)
Prints:印刷:
user_wid user_name user_age user_gender last_visit_date days_since_first_visit num_visits_window num_visits_total products_used user_total_sales membership avg_spend_visit
0 12 Ju 64 Female 2020-12-14 283 3 7 5 74.95 0 24.98
1 16 Ti 24 Female 2020-08-10 270 2 2 4 0.00 0 0.00
2 25 Ri 63 Female 2020-06-15 189 2 4 8 5.00 0 2.50
3 31 Ma 47 Male 2020-05-09 8 2 2 2 41.30 0 20.65
4 35 De 64 Female 2020-04-21 0 1 1 1 49.95 0 49.95
5 38 St 70 Male 2020-11-29 126 4 10 5 49.95 0 12.49
6 42 Ab 32 Female 2020-07-03 158 4 4 5 0.00 0 0.00
7 57 Am 34 Female 2020-02-28 0 1 2 2 0.00 0 0.00
8 80 Wi 45 Male 2020-04-02 0 1 1 2 41.30 0 41.30
9 3305613 Ter 41 Male 2018-05-15 426 4 4 13 10.00 0 2.50
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.