How do I split values into equal ranges in one column and sum the associated value of another column in R?
I have a data frame named Cust_Amount that looks like this:
Age Amount_Spent
25 20
43 15
32 27
37 10
45 17
29 10
I want to break it into equal-sized age groups and sum the amount spent within each group, like this:
Age_Group Total_Amount
20-30 30
30-40 37
40-50 32
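We can use cut to group 'Age' and then get the sum of 'Amount_Spent' by that grouping variable (df1 below stands for the data frame from the question).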
library(data.table)
setDT(df1)[,.(Total_Amount = sum(Amount_Spent)) ,
by = .(Age_Group = cut(Age, breaks = c(20, 30, 40, 50)))]
Or with dplyr:
library(dplyr)
df1 %>%
group_by(Age_Group = cut(Age, breaks = c(20, 30, 40, 50))) %>%
summarise(Total_Amount = sum(Amount_Spent))
# Age_Group Total_Amount
# <fctr> <int>
#1 (20,30] 30
#2 (30,40] 37
#3 (40,50] 32
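The labels (20,30], (30,40] and (40,50] show that cut closes each interval on the right by default, so an age sitting exactly on a break point falls into the lower group, and the lowest break itself is excluded. A minimal sketch of that behaviour, using hypothetical boundary ages (20, 30, 40) that are not in the question's data:

```r
# Hypothetical ages that sit exactly on the break points (not from the question)
ages   <- c(20, 30, 40)
breaks <- c(20, 30, 40, 50)

# Default: right = TRUE, intervals look like (20,30]; age 20 falls outside and becomes NA
default_groups <- cut(ages, breaks = breaks)

# right = FALSE gives left-closed intervals like [20,30)
left_closed <- cut(ages, breaks = breaks, right = FALSE)

print(as.character(default_groups))
print(as.character(left_closed))
```

If an age of exactly 20 should count toward the first group, either right = FALSE or include.lowest = TRUE may be what you want, depending on how the other boundaries should behave.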
Here is a base R solution that uses cut and aggregate, then setNames to name the result columns:
mydf$Age_Group <- cut(mydf$Age, breaks = seq(20,50, by = 10))
with(mydf, setNames(aggregate(Amount_Spent ~ Age_Group, FUN = sum),
c('Age_Group', 'Total_Spent')))
Age_Group Total_Spent
1 (20,30] 30
2 (30,40] 37
3 (40,50] 32
We can go one step further and use gsub to match your desired output (note that I am not a regex expert):
mydf$Age_Group <-
  gsub(pattern = ',',
       x = gsub(pattern = ']',
                x = gsub(pattern = '(', x = mydf$Age_Group, replacement = '', fixed = TRUE),
                replacement = '', fixed = TRUE),
       replacement = ' - ', fixed = TRUE)
with(mydf, setNames(aggregate(Amount_Spent ~ Age_Group, FUN = sum),
c('Age_Group', 'Total_Spent')))
Age_Group Total_Spent
1 20 - 30 30
2 30 - 40 37
3 40 - 50 32
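As a possible alternative to the nested gsub calls, cut accepts a labels argument, so the group names can be set when the groups are created. The sketch below rebuilds the question's data under the name mydf and constructs the labels from the break points; the helper variables breaks, labels and result are illustrative names, not part of the original answers.

```r
# Sample data copied from the question
mydf <- data.frame(
  Age          = c(25, 43, 32, 37, 45, 29),
  Amount_Spent = c(20, 15, 27, 10, 17, 10)
)

breaks <- seq(20, 50, by = 10)

# Pair each break with the next one to build "20-30", "30-40", "40-50"
labels <- paste(head(breaks, -1), tail(breaks, -1), sep = "-")

# Assign the labels directly instead of cleaning up "(20,30]" afterwards
mydf$Age_Group <- cut(mydf$Age, breaks = breaks, labels = labels)

result <- setNames(
  aggregate(Amount_Spent ~ Age_Group, data = mydf, FUN = sum),
  c("Age_Group", "Total_Amount")
)
print(result)
```

This keeps Age_Group as a factor with levels in break order, whereas the gsub approach converts it to character.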