Sorry if this is naive. I have the following data and I want to engineer some features from its columns, but I don't know how to do multiple operations on the same data frame. One thing to mention: I have multiple entries for each customer, so in the end I want aggregated values (i.e. one entry for each customer).
   customer_id  purchase_amount  date_of_purchase  days_since
0          760             25.0        06-11-2009        2395
1          860             50.0        09-28-2012        1190
2         1200            100.0        10-25-2005        3720
3         1420             50.0        09-07-2009        2307
4         1940             70.0        01-25-2013        1071
customer_purchases['amount'] = customer_purchases.groupby(['customer_id'])['purchase_amount'].agg('min')
customer_purchases['frequency'] = customer_purchases.groupby(['customer_id'])['days_since'].agg('count')
customer_purchases['recency'] = customer_purchases.groupby(['customer_id'])['days_since'].agg('mean')
   customer_id  purchase_amount  date_of_purchase  days_since  recency  frequency      amount  first_purchase
0          760             25.0        06-11-2009        2395     1273          5   38.000000            3293
1          860             50.0        09-28-2012        1190      118         10   54.000000            3744
2         1200            100.0        10-25-2005        3720     1192          9  102.777778            3907
3         1420             50.0        09-07-2009        2307      142         34   51.029412            3825
4         1940             70.0        01-25-2013        1071      686         10   47.500000            3984
One solution:
I can think of three separate operations, one for each needed column, and then joining the results to get a new data frame. I know it's not efficient, but it would be enough for what I need.
df_1 = customer_purchases.groupby('customer_id', sort = False)["purchase_amount"].min().reset_index(name ='amount')
df_2 = customer_purchases.groupby('customer_id', sort = False)["days_since"].count().reset_index(name ='frequency')
df_3 = customer_purchases.groupby('customer_id', sort = False)["days_since"].mean().reset_index(name ='recency')
However, I either get an error or a data frame that does not contain the correct data. Your help and patience will be appreciated.
Finally, I found the solution:
import pandas as pd

def f(x):
    # x holds all rows that belong to one customer
    recency = x['days_since'].min()
    frequency = x['days_since'].count()
    monetary_value = x['purchase_amount'].mean()
    # one label per returned value
    c = ['recency', 'frequency', 'monetary_value']
    return pd.Series([recency, frequency, monetary_value], index=c)

df1 = customer_purchases.groupby('customer_id').apply(f)
print(df1)
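For comparison, the same three per-customer values can probably be obtained without a custom function by using named aggregation, which `groupby(...).agg` accepts in pandas 0.25 and later. This is only a sketch: the sample rows below are made up for illustration, and the column names are taken from the question.

```python
import pandas as pd

# Hypothetical sample rows; the real data has many purchases per customer.
customer_purchases = pd.DataFrame({
    "customer_id": [760, 760, 860, 860, 860],
    "purchase_amount": [25.0, 51.0, 50.0, 60.0, 52.0],
    "days_since": [2395, 1273, 1190, 118, 500],
})

# One row per customer: each keyword names an output column and is
# given as (source column, aggregation function).
rfm = (
    customer_purchases
    .groupby("customer_id")
    .agg(
        recency=("days_since", "min"),
        frequency=("days_since", "count"),
        monetary_value=("purchase_amount", "mean"),
    )
    .reset_index()
)
print(rfm)
```

The result has one row per `customer_id`, which matches the "one entry for each customer" requirement, and it avoids calling a Python function once per group.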
Use transform instead:
customer_purchases.groupby('customer_id')['purchase_amount'].transform(lambda x : x.min())
transform returns one value for each row of the original data frame, whereas agg returns one row per group.
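The difference between the two can be seen on a small example. This is a minimal sketch with made-up rows (not the asker's data), using the built-in `"min"` aggregation by name rather than a lambda.

```python
import pandas as pd

# Hypothetical sample rows for illustration only.
customer_purchases = pd.DataFrame({
    "customer_id": [760, 760, 860],
    "purchase_amount": [25.0, 51.0, 50.0],
})

grouped = customer_purchases.groupby("customer_id")["purchase_amount"]

# agg: one value per customer, indexed by customer_id.
per_customer = grouped.agg("min")

# transform: one value per original row, aligned with the original index,
# so it can be assigned back as a new column.
customer_purchases["amount"] = grouped.transform("min")

print(per_customer)
print(customer_purchases)
```

Because the `agg` result is indexed by `customer_id` rather than by the original row labels, assigning it directly to a column of the original frame aligns on the wrong index, which would explain the unexpected values in the question's first attempt.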