Pandas: concat multiple new columns to an existing data-frame based on the value of one of the columns
Feature engineered multiple columns of pandas data frame (add new columns based on existing ones)
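Sorry if this is a naive question. I have the following data and want to feature-engineer some of its columns, but I don't know how to perform multiple operations on the same data frame. Note that there are multiple entries per customer, so in the end I want aggregated values (i.e. 1 entry per customer).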
   customer_id  purchase_amount  date_of_purchase  days_since
0          760             25.0        06-11-2009        2395
1          860             50.0        09-28-2012        1190
2         1200            100.0        10-25-2005        3720
3         1420             50.0        09-07-2009        2307
4         1940             70.0        01-25-2013        1071
customer_purchases['amount'] = customer_purchases.groupby(['customer_id'])['purchase_amount'].agg('min')
customer_purchases['frequency'] = customer_purchases.groupby(['customer_id'])['days_since'].agg('count')
customer_purchases['recency'] = customer_purchases.groupby(['customer_id'])['days_since'].agg('mean')
   customer_id  purchase_amount  date_of_purchase  days_since  recency  frequency      amount  first_purchase
0          760             25.0        06-11-2009        2395     1273          5   38.000000            3293
1          860             50.0        09-28-2012        1190      118         10   54.000000            3744
2         1200            100.0        10-25-2005        3720     1192          9  102.777778            3907
3         1420             50.0        09-07-2009        2307      142         34   51.029412            3825
4         1940             70.0        01-25-2013        1071      686         10   47.500000            3984
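One possible solution:
I could run 3 separate operations, one for each column I need, and then join the results into a new data frame. I know this is inefficient for what I need.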
df_1 = customer_purchases.groupby('customer_id', sort = False)["purchase_amount"].min().reset_index(name ='amount')
df_2 = customer_purchases.groupby('customer_id', sort = False)["days_since"].count().reset_index(name ='frequency')
df_3 = customer_purchases.groupby('customer_id', sort = False)["days_since"].mean().reset_index(name ='recency')
However, either I get an error or the resulting data frame does not contain the correct data. Your help and patience would be appreciated.
In the end I found a solution:
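import pandas as pd

def f(x):
    # x holds all rows belonging to a single customer_id
    recency = x['days_since'].min()
    frequency = x['days_since'].count()
    monetary_value = x['purchase_amount'].mean()
    # one label per returned value
    c = ['recency', 'frequency', 'monetary_value']
    return pd.Series([recency, frequency, monetary_value], index=c)

# one row per customer, one column per computed feature
df1 = customer_purchases.groupby('customer_id').apply(f)
print(df1)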
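As a possible alternative to the apply-based function above, the same three per-customer values can be sketched with a single named aggregation (available in pandas 0.25 and later). The sample data below is made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
customer_purchases = pd.DataFrame({
    "customer_id": [760, 760, 860, 860, 860],
    "purchase_amount": [25.0, 35.0, 50.0, 60.0, 40.0],
    "days_since": [2395, 1273, 1190, 500, 118],
})

# One groupby pass; each keyword is an output column defined as
# (input column, aggregation function).
rfm = (
    customer_purchases
    .groupby("customer_id")
    .agg(
        recency=("days_since", "min"),
        frequency=("days_since", "count"),
        monetary_value=("purchase_amount", "mean"),
    )
    .reset_index()
)
print(rfm)
```

This yields one row per customer without a Python-level function call per group, which tends to matter on larger data.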
Use this instead:
customer_purchases.groupby('customer_id')['purchase_amount'].transform(lambda x : x.min())
transform returns one output value for each row of the original dataframe, rather than one value per group as agg does.
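A minimal sketch of that difference, using made-up sample rows (only the column names come from the question). The agg result is indexed by customer_id, so assigning it straight to a new column aligns on those labels instead of on the original rows, which is probably why the first attempt in the question did not give the intended values.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
customer_purchases = pd.DataFrame({
    "customer_id": [760, 760, 860],
    "purchase_amount": [25.0, 35.0, 50.0],
})

grouped = customer_purchases.groupby("customer_id")["purchase_amount"]

# agg: one value per customer, indexed by customer_id
per_customer = grouped.agg("min")

# transform: one value per original row, keeping the original index
per_row = grouped.transform("min")

# Safe to assign, because per_row shares the data frame's index
customer_purchases["amount"] = per_row

print(per_customer)
print(customer_purchases)
```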