
How to summarise data with two data frames in R

I have two data frames. The first one has the client's id, name, and address. The second one has all of the client's transactions (values, date of purchase, cash or credit card ...).

str(data.frame_1)

Classes ‘data.table’ and 'data.frame':  201917 obs. of  5 variables:
 $ clie_id           : chr  "C_ID_97" "C_ID_3f" "C_ID_dd" "C_ID_11" ...
 $ address_1         : int  5 4 2 4 1 4 3 3 2 2 ...
 $ salary            : int  2 1 2 3 3 2 2 2 1 2 ...
 $ gender            : int  1 0 0 0 0 0 1 1 0 0 ...
 $ have_kids         : num  -0.82 0.393 0.688 0.142 -0.16 ...


str(data.frame_2)

 $ clie_id             : chr  "C_ID_00007093c1" "C_ID_00007093c1" "C_ID_00007093c1" "C_ID_00007093c1" ...
 $ city                : int  -1 -1 -1 -1 76 76 76 76 76 244 ...
 $ purchase_date       : Date, format: "2012-06-14" "2013-08-01" "2013-09-08" "2013-10-28" ...
 $ state               : int  -1 -1 -1 -1 2 2 2 2 2 2 ...
 $ sector              : int  8 8 8 8 33 33 33 33 1 34 ...
 $ category            : chr  "Y" "Y" "Y" "Y" ...
 $ purchase_amount     : num  -0.729 -0.709 -0.721 -0.672 -0.672 ...

Variables that I need to add to data frame 1: the oldest purchase date, the lowest purchase value, the highest purchase value, the average purchase value, and the number of purchases (in this case, the number of rows for each id in the second data frame).

I tried to create a third data frame and then merge the columns of the first data frame with those of the third, using clie_id as the key. So I did this:

total_data_summarise_by_id <- data.frame_2 %>% 
                                  group_by(clie_id) %>%
                                  summarise(first_date = min(purchase_date),
                                            min_purchase_amount = min(purchase_amount),
                                            max_purchase_amount = max(purchase_amount),
                                            mean_purchase_amount = mean(purchase_amount))

However, R returned only a single row. It did not summarise for each id.

How can I do this join?

Success

total_data_summarise_by_id <- data.frame_2 %>% 
                                  group_by(clie_id) %>%
                                  summarise(first_date = min(purchase_date),
                                            min_purchase_amount = min(purchase_amount),
                                            max_purchase_amount = max(purchase_amount),
                                            mean_purchase_amount = mean(purchase_amount),
                                            total = n())
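
The post does not show the join itself, so here is a minimal sketch of one way to do it, assuming dplyr is installed. The toy data frames `clients` and `purchases`, and the result names `per_client` and `clients_with_stats`, are made up for illustration; they only mimic the columns of data.frame_1 and data.frame_2. The post also does not say why the first attempt returned a single row. One commonly reported cause is that plyr was loaded after dplyr, so that `summarise` resolves to `plyr::summarise`, which ignores grouping. That is an assumption here, not something the source confirms, and the explicit `dplyr::` prefixes below are just a way to rule it out.

```r
library(dplyr)

# Toy stand-ins for data.frame_1 and data.frame_2 (values invented for illustration).
clients <- data.frame(
  clie_id = c("C_ID_a", "C_ID_b", "C_ID_c"),
  salary  = c(2L, 1L, 3L),
  stringsAsFactors = FALSE
)
purchases <- data.frame(
  clie_id         = c("C_ID_a", "C_ID_a", "C_ID_b"),
  purchase_date   = as.Date(c("2012-06-14", "2013-08-01", "2013-09-08")),
  purchase_amount = c(-0.729, -0.709, -0.721),
  stringsAsFactors = FALSE
)

# One row per client id. The dplyr:: prefix guards against another package
# (for example plyr) masking group_by() or summarise().
per_client <- purchases %>%
  dplyr::group_by(clie_id) %>%
  dplyr::summarise(
    first_date           = min(purchase_date),
    min_purchase_amount  = min(purchase_amount),
    max_purchase_amount  = max(purchase_amount),
    mean_purchase_amount = mean(purchase_amount),
    total                = dplyr::n()
  )

# Attach the per-client statistics to the client table, keeping every client.
clients_with_stats <- dplyr::left_join(clients, per_client, by = "clie_id")
```

With `left_join`, a client with no rows in the transaction table (here `C_ID_c`) is kept and gets `NA` in the summary columns; `inner_join` would drop such clients instead.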

Thanks a lot for the help.
