
How do I split values into equal ranges in one column and sum the associated value of another column in R?

I have a data frame named Cust_Amount that looks like this:

Age    Amount_Spent
25       20
43       15
32       27
37       10
45       17
29       10
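
To make the sample reproducible, the table above can be rebuilt as a data frame. This is only a sketch: the name Cust_Amount comes from the question, while the answers below refer to the same data as df1 or mydf, so those two aliases are assumptions added here for convenience.

```r
# Sample data from the question
Cust_Amount <- data.frame(
  Age          = c(25, 43, 32, 37, 45, 29),
  Amount_Spent = c(20, 15, 27, 10, 17, 10)
)

# Aliases matching the names used in the answers (assumed to be the same data)
df1  <- Cust_Amount
mydf <- Cust_Amount
```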

I want to split it into equal-sized age groups and sum the amount spent for each group, like this:

Age_Group  Total_Amount
 20-30     30
 30-40     37
 40-50     32

We can use cut to bin 'Age' into groups and then get the sum of 'Amount_Spent' by that grouping variable:

library(data.table)
setDT(df1)[,.(Total_Amount = sum(Amount_Spent)) , 
       by = .(Age_Group = cut(Age, breaks = c(20, 30, 40, 50)))]

Or with dplyr:

library(dplyr)
df1 %>%
    group_by(Age_Group = cut(Age, breaks = c(20, 30, 40, 50))) %>%
    summarise(Total_Amount = sum(Amount_Spent))
#     Age_Group Total_Amount
#      <fctr>        <int>
#1   (20,30]           30
#2   (30,40]           37
#3   (40,50]           32
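
One detail worth checking is how cut treats values that fall exactly on a break. By default the intervals are closed on the right, which is why the labels read (20,30]. The sketch below uses a made-up vector of boundary ages (not from the question) to show the default behaviour and the right = FALSE alternative.

```r
# Hypothetical ages sitting on the break points
ages   <- c(20, 25, 30, 40, 50)
breaks <- c(20, 30, 40, 50)

# Default (right = TRUE): intervals are (20,30], (30,40], (40,50]
default_bins <- cut(ages, breaks = breaks)
# 20 is outside every interval and becomes NA; 30 lands in (20,30]

# right = FALSE: intervals are [20,30), [30,40), [40,50)
left_closed <- cut(ages, breaks = breaks, right = FALSE)
# 30 now lands in [30,40); 50 is outside every interval and becomes NA

data.frame(ages, default_bins, left_closed)
```

If an age equal to the lowest (or, with right = FALSE, the highest) break should still be kept, include.lowest = TRUE closes that outer edge.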

Here is a base R solution that uses cut and aggregate, then setNames to name the result columns:

mydf$Age_Group <- cut(mydf$Age, breaks = seq(20,50, by = 10))
with(mydf, setNames(aggregate(Amount_Spent ~ Age_Group, FUN = sum), 
                    c('Age_Group', 'Total_Spent')))

  Age_Group Total_Spent
1   (20,30]          30
2   (30,40]          37
3   (40,50]          32

We can go a step further and use gsub to match your desired output (note that I am not a regex expert):

# Strip '(' and ']' from the interval labels, then replace ',' with ' - '
mydf$Age_Group <- 
    gsub(pattern = ',',
         x = gsub(pattern = ']', 
                  x = gsub(pattern = '(', x = mydf$Age_Group, replacement = '', fixed = TRUE),
                  replacement = '', fixed = TRUE),
         replacement = ' - ', fixed = TRUE)
with(mydf, setNames(aggregate(Amount_Spent ~ Age_Group, FUN = sum), 
                  c('Age_Group', 'Total_Spent')))

  Age_Group Total_Spent
1   20 - 30          30
2   30 - 40          37
3   40 - 50          32
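
As a possible alternative to rewriting the labels afterwards, cut accepts a labels argument, so the "20-30" style names can be set at binning time. The following is a minimal sketch that rebuilds the question's sample data as mydf; the labels are derived from the breaks with paste, which is one choice among several.

```r
# Sample data from the question
mydf <- data.frame(
  Age          = c(25, 43, 32, 37, 45, 29),
  Amount_Spent = c(20, 15, 27, 10, 17, 10)
)

# Break points and matching "lower-upper" labels: "20-30" "30-40" "40-50"
breaks <- seq(20, 50, by = 10)
labels <- paste(head(breaks, -1), tail(breaks, -1), sep = "-")

# Bin the ages with custom labels, then sum the amounts per bin
mydf$Age_Group <- cut(mydf$Age, breaks = breaks, labels = labels)
result <- aggregate(Amount_Spent ~ Age_Group, data = mydf, FUN = sum)
names(result) <- c("Age_Group", "Total_Amount")
result
#   Age_Group Total_Amount
# 1     20-30           30
# 2     30-40           37
# 3     40-50           32
```

This keeps Age_Group as a factor with its levels in break order, so the groups stay sorted numerically rather than alphabetically.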
