Combining 2 groupby outputs with lambda using pandas python
Table (df):
customer_id Order_date
1 2015-01-16
1 2015-01-19
2 2014-12-21
2 2015-01-10
1 2015-01-10
3 2018-01-18
3 2017-03-04
4 2019-11-05
4 2010-01-01
3 2019-02-03
3 2019-01-01
3 2018-01-01
Output I want:
I need code that uses groupby to count the order dates for each customer ID, keeping only customers with at least 3 transactions, and that also reports each customer's most recent transaction date.
Customer_id No_order_date Most recent order date
1 3 2015-01-19
3 5 2019-02-03
Code tried so far:
freq = 3
(df.groupby('customer_id')['Order_date'].nunique()
   .loc[lambda x: x >= freq]
   .reset_index()
   .rename(columns={'Order_date': 'No_Order_Dates'}))
Customer_id No_Order_Dates
1 3
3 5
(df.groupby('customer_id')['Order_date'].max()
   .reset_index()
   .rename(columns={'Order_date': 'Most recent order Date'}))
Customer_id Most recent order date
1 2015-01-19
3 2019-02-03
How can I combine the two groupby outputs? I need both in a single table. Is there a way to join them without using concatenate or merge, or do I have to use concatenate/merge?
You can compute both values in one groupby with named aggregation, then filter the result with .loc[]:
(df.groupby('customer_id').agg(No_transactions=('Order_date','nunique'),
Most_recent_order_date = ('Order_date', 'max'))
.loc[lambda x: x['No_transactions']>=3])
Or with query:
(df.groupby('customer_id').agg(No_transactions=('Order_date','nunique'),
Most_recent_order_date = ('Order_date', 'max'))
.query("No_transactions>=3"))
No_transactions Most_recent_order_date
customer_id
1 3 2015-01-19
3 5 2019-02-03
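For anyone who wants to reproduce this end to end, here is a minimal sketch that rebuilds the table from the question and applies the named aggregation shown above. The pd.to_datetime conversion, the freq variable reused from the question, and the final reset_index() are assumptions added for illustration. The original answer does not convert the column, and string dates in ISO format would also compare correctly under 'max'.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 1, 3, 3, 4, 4, 3, 3, 3],
    "Order_date": [
        "2015-01-16", "2015-01-19", "2014-12-21", "2015-01-10",
        "2015-01-10", "2018-01-18", "2017-03-04", "2019-11-05",
        "2010-01-01", "2019-02-03", "2019-01-01", "2018-01-01",
    ],
})

# Assumption: parse the strings so 'max' compares real dates.
df["Order_date"] = pd.to_datetime(df["Order_date"])

freq = 3  # minimum number of distinct order dates per customer

result = (
    df.groupby("customer_id")
      .agg(No_order_date=("Order_date", "nunique"),
           Most_recent_order_date=("Order_date", "max"))
      .loc[lambda x: x["No_order_date"] >= freq]  # keep frequent customers only
      .reset_index()  # turn customer_id back into a regular column
)
print(result)
```

Because both aggregates come out of a single groupby, no merge or concat step is needed. Named aggregation requires pandas 0.25 or later.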