R: Sum of columns based on range of column name
I have a data frame that contains counts of values for each year:
V1 11 14 15 17 19 20 21 22 24 26 28 29 3 31 32 35 37 39 47 5 51 54 55 6 63 66 7 9 91
2008 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
2013 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2014 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
2015 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 2 0 1 0 0 0 0 0 0 0
2016 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
2017 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
2018 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0 1 0 0 1 0
2019 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0
2020 0 0 0 0 1 2 1 0 0 0 0 0 0 2 1 1 0 0 0 0 1 0 0 0 1 0 2 1 0
2021 1 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 3 0 1
2022 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0
I need to sum the numbers by ranges of the header values, so the result should look like this:
V1 0-10 11-20 21-30 31-40 41-50 51-60 61-70 71-80 81-90 91-100
2008 0 0 0 0 0 0 0 0 0 0
2011 0 0 0 0 0 0 1 0 0 0
2013 1 0 0 0 0 0 0 0 0 0
2014 1 0 0 0 0 0 0 0 0 0
2015 2 0 1 0 1 1 0 0 0 0
2016 1 0 2 1 0 0 0 0 0 0
2017 1 0 2 0 0 0 0 0 0 0
2018 1 0 3 2 0 1 1 0 0 0
2019 1 1 0 0 0 1 0 0 0 0
2020 3 1 1 4 0 1 1 0 0 0
2021 4 2 1 1 0 0 0 0 0 1
2022 1 2 0 0 0 1 0 0 0 0
I tried this code, but it sums the numbers by the second digit of the header value instead of by the first digit, which is what I need:
xtabs(values~., transform(cbind(df[1],stack(df[-1])), ind = sub('.','',ind)))
How can I get the result table I need? Thanks in advance.
Convert the data to long format:
library(reshape2)
d.m = melt(dat, id.vars=1)
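Now group the values (the former column names are stored as a factor, so convert them to numbers first):
d.m$group = cut(as.numeric(as.character(d.m$variable)), seq(0,100,10))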
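A minimal sketch of what this grouping step does, on a few made-up header values rather than the full data. It assumes the column names are plain digit strings (that is, the data was read with check.names = FALSE), and it shows that cut() needs numeric input and produces right-closed intervals, so a header of 20 lands in (10,20].

```r
# Column names as they come out of melt(): a factor of digit strings
# (these five values are illustrative, not the full header)
variable <- factor(c("3", "11", "20", "21", "91"))

# Convert factor -> character -> numeric so the real values are used,
# not the internal factor codes
x <- as.numeric(as.character(variable))

# Breaks at 0, 10, ..., 100 give right-closed bins (0,10], (10,20], ...
group <- cut(x, breaks = seq(0, 100, 10))

print(data.frame(variable, group))
```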
Sum the values within each year and group:
dm2 = aggregate(d.m$value, by=list(V1=d.m$V1, group=d.m$group), FUN=sum)
Use dcast to reshape back to wide format:
dcast(dm2, V1~group, value.var="x")
V1 (0,10] (10,20] (20,30] (30,40] (40,50] (50,60] (60,70] (90,100]
1 2008 0 1 0 0 0 0 0 0
2 2011 0 0 0 0 0 0 1 0
3 2013 1 0 0 0 0 0 0 0
4 2014 1 0 0 0 0 0 0 0
5 2015 2 0 1 0 1 1 0 0
6 2016 1 0 2 1 0 0 0 0
7 2017 1 0 2 0 0 0 0 0
8 2018 1 0 3 2 0 1 1 0
9 2019 1 1 0 0 0 1 0 0
10 2020 3 3 1 4 0 1 1 0
11 2021 4 2 1 1 0 0 0 1
12 2022 1 2 0 0 0 1 0 0
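The table above has no columns for bins that contain no headers (70 to 80 and 80 to 90), and its labels are interval labels rather than the "0-10", "11-20" style headers from the question. The following is a base R sketch of one possible way to get both, not the answerer's method. It uses a small illustrative subset of the data, and the names hdr, labs, grp, ind and res are made up for this example.

```r
# Small illustrative subset of the question's data (not the full table)
dat <- read.table(text = "V1 3 11 20 21 91
2008 0 0 1 0 0
2013 1 0 0 0 0
2021 0 1 0 0 1", header = TRUE, check.names = FALSE)

# Numeric version of the column headers (everything except V1)
hdr <- as.numeric(names(dat)[-1])

# Labels "0-10", "11-20", ..., "91-100" for the ten right-closed bins
labs <- c("0-10", paste(seq(11, 91, 10), seq(20, 100, 10), sep = "-"))
grp <- cut(hdr, breaks = seq(0, 100, 10), labels = labs, include.lowest = TRUE)

# Indicator matrix: one row per original column, one column per bin
# (bins without any header stay as all-zero columns)
ind <- 1 * outer(as.character(grp), levels(grp), "==")
colnames(ind) <- levels(grp)

# Matrix product sums the original columns that fall into each bin
res <- data.frame(V1 = dat$V1, as.matrix(dat[-1]) %*% ind, check.names = FALSE)
print(res)
```

Because every level of grp gets a column in ind, empty bins such as 71-80 appear in the result filled with zeros.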
There is probably a cleaner way to do this with dplyr.
You could do:
library(tidyverse)
d <- read.table(text = " V1 11 14 15 17 19 20 21 22 24 26 28 29 3 31 32 35 37 39 47 5 51 54 55 6 63 66 7 9 91
2008 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
2013 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2014 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
2015 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 2 0 1 0 0 0 0 0 0 0
2016 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
2017 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
2018 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0 1 0 0 1 0
2019 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0
2020 0 0 0 0 1 2 1 0 0 0 0 0 0 2 1 1 0 0 0 0 1 0 0 0 1 0 2 1 0
2021 1 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 3 0 1
2022 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0",
header = TRUE, check.names = FALSE)
d <- d %>%
  pivot_longer(cols = -V1, names_to = "category", values_to = "val") %>%
  mutate(new_cat = cut(as.numeric(category), seq(0, 100, by = 10)))
xtabs(val ~ V1 + new_cat, d)
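xtabs() returns a contingency table rather than a data frame. If a data frame is needed, one option is as.data.frame.matrix(), sketched below on a tiny hand-made long table (the values are illustrative, and the names tab and wide are assumptions for this example). By default xtabs() keeps unused factor levels, so empty bins are retained as zero columns.

```r
# Tiny long-format table in the same shape as `d` after pivot_longer/mutate
d <- data.frame(
  V1      = c(2008, 2008, 2013, 2013),
  new_cat = cut(c(3, 20, 3, 20), seq(0, 100, by = 10)),
  val     = c(0, 1, 1, 0)
)

# Cross-tabulate: rows are years, columns are all ten bins
tab <- xtabs(val ~ V1 + new_cat, d)

# Convert the 2-D table to a wide data frame (years become row names)
wide <- as.data.frame.matrix(tab)
print(wide)
```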