
How to get most recent order date?

I am working on an external exercise with a dataset of customers' purchases.

I have the following columns: customer_id, date, gender, value (purchase value). One part of the exercise is to create a new column named most_recent_order_date. How should I go about accomplishing this?

I tried

df['most_recent_order_date']=df.sort_values('customer_id',ascending=False)['date']

but this only returns the dates of all purchases in ascending order. I need the result to be specific to each customer_id, since a customer_id might have multiple purchases.

Another part of the exercise is to create an order_count column, which is the last column in the output below.

data= pd.read_csv('screening_exercise_orders_v201810.csv')
df=pd.DataFrame(data)

df['most_recent_order_date']= 'default value'
df['order_count']= 'default value'

df['date'] = pd.to_datetime(df['date'])
df['most_recent_order_date']=df.sort_values('customer_id',ascending=False)['date']
df['order_count']= df.groupby(['customer_id']).transform('count')
df.head(10)

I expect something like:

0   1000    0   2017-01-01 00:11:31 198.50  1   2017-02-10 00:11:   1
1   1001    0   2017-01-01 00:29:56 338.00  1   2017-11-01 00:29:56 1
2   1002    1   2017-01-01 01:30:31 733.00  1   2017-06-11 01:30:31 3
3   1003    1   2017-01-01 01:34:22 772.00  1   2017-05-14 01:34:22 4
4   1004    0   2017-01-01 03:11:54 508.00  1   2017-01-01 03:11:54 1

But what I actually get is:

0   1000    0   2017-01-01 00:11:31 198.50  1   2017-01-01 00:11:31 1
1   1001    0   2017-01-01 00:29:56 338.00  1   2017-01-01 00:29:56 1
2   1002    1   2017-01-01 01:30:31 733.00  1   2017-01-01 01:30:31 3
3   1003    1   2017-01-01 01:34:22 772.00  1   2017-01-01 01:34:22 4
4   1004    0   2017-01-01 03:11:54 508.00  1   2017-01-01 03:11:54 1

For the most recent date, use groupby.transform with max:

df['date'] = pd.to_datetime(df['date'])
df['most_recent_order_date'] = df.groupby(['customer_id'])['date'].transform('max')
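
As a rough sketch of why this works, here is a small example on made-up rows (the sample data and the `attempt` column are hypothetical, not taken from the exercise file). Assigning a sorted Series back to the frame realigns it on the index, so the sort in the question has no effect. `transform('max')` instead returns one value per original row, namely the latest date within that row's customer_id group.

```python
import pandas as pd

# Hypothetical sample rows standing in for the exercise CSV.
df = pd.DataFrame({
    "customer_id": [1000, 1001, 1000, 1002, 1001],
    "date": pd.to_datetime([
        "2017-01-01 00:11:31",
        "2017-01-01 00:29:56",
        "2017-02-10 00:11:31",
        "2017-01-01 01:30:31",
        "2017-11-01 00:29:56",
    ]),
})

# The attempt from the question: the sorted Series is realigned on the
# original index during assignment, so every row gets its own date back.
df["attempt"] = df.sort_values("customer_id", ascending=False)["date"]

# transform('max') broadcasts each customer's latest date to all of
# that customer's rows, keeping the original row order.
df["most_recent_order_date"] = (
    df.groupby("customer_id")["date"].transform("max")
)

print(df)
```

In this sketch the `attempt` column ends up identical to `date`, while `most_recent_order_date` holds the same value on every row belonging to a given customer.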

For the count, use groupby.cumcount:

df['order_count'] = df.groupby(['customer_id']).cumcount().add(1)
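
Note that cumcount gives a running count: it numbers each customer's orders 1, 2, 3, ... in row order. If the exercise instead wants the total number of orders per customer repeated on every row (an assumption about the intended output), `transform('size')` is one possible alternative. A minimal sketch on hypothetical data, with the illustrative column names `order_number` and `order_total`:

```python
import pandas as pd

# Hypothetical customer ids; customer 1000 has three orders.
df = pd.DataFrame({"customer_id": [1000, 1001, 1000, 1002, 1000]})

# Running count: the n-th order of each customer, in row order.
df["order_number"] = df.groupby("customer_id").cumcount().add(1)

# Total count: the number of orders per customer, repeated on each row.
df["order_total"] = (
    df.groupby("customer_id")["customer_id"].transform("size")
)

print(df)
```

Which of the two fits depends on whether order_count is meant to be the order's position in the customer's history or the customer's overall number of orders.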


 