Summing rows by combining levels in R
I would like to ask for help with this data frame. I do not know how to aggregate() rows by summing across combined levels of the "year" factor. Here is a sample of my df:
   placette year SP1 SP2 ... SPX
1         1 2013  43   4 ...
2         2 2013  30   0 ...
3         3 2013  23   3 ...
4         1 2014   0   0 ...
5         2 2014   2   2 ...
6         3 2014   5   0
7         1 2015  16   3
8         2 2015  16   1
9         3 2015  20   0
10        1 2016  54   4
11        2 2016  51   2
12        3 2016  51   0
I need to sum the values of the variables SP1, SP2, ... SPX by period, for example over 2 years (2013+2014), over 3 years (2013+2014+2015), and so on.
I expect this:
  placette period           SP1 SP2 ... SPX
1        1 2 (2013+2014)     43   4 ... ...
2        1 3 (13+14+15)      59   7 ... ...
3        1 4 (13+14+15+16)  113  11 ... ...
4        2 2 (13+14)         32   2 ... ...
5      ... ...              ... ... ... ...
while keeping my "site" factor (placette).
I am working in R and would like to solve this there.
Thank you very much for your help.
Regards, Thomas.
Using dplyr, we arrange by 'placette' and 'year', group by 'placette', and take the cumsum of the variables whose names start with 'SP' (selected with starts_with). The sample data is given after the output.
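library(dplyr)
data %>%
  arrange(placette, year) %>%                    # sort years within each placette
  group_by(placette) %>%                         # cumulate separately per placette
  mutate_at(vars(starts_with("SP")), cumsum)     # running total of every SP* column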
# A tibble: 12 x 4
# Groups: placette [3]
# placette year SP1 SP2
# <int> <int> <int> <int>
# 1 1 2013 43 4
# 2 1 2014 43 4
# 3 1 2015 59 7
# 4 1 2016 113 11
# 5 2 2013 30 0
# 6 2 2014 32 2
# 7 2 2015 48 3
# 8 2 2016 99 5
# 9 3 2013 23 3
#10 3 2014 28 3
#11 3 2015 48 3
#12 3 2016 99 3
data <- structure(list(placette = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L), year = c(2013L, 2013L, 2013L, 2014L, 2014L, 2014L,
2015L, 2015L, 2015L, 2016L, 2016L, 2016L), SP1 = c(43L, 30L,
23L, 0L, 2L, 5L, 16L, 16L, 20L, 54L, 51L, 51L), SP2 = c(4L, 0L,
3L, 0L, 2L, 0L, 3L, 1L, 0L, 4L, 2L, 0L)), class = "data.frame",
row.names = c(NA,
-12L))
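In dplyr 1.0.0 and later, mutate_at() is superseded by across(). The following is a minimal sketch of the same cumulative sum written that way, not the answerer's original code. The `period` column (number of years accumulated so far) and the final filter that drops the single-year rows are assumptions added here to bring the result closer to the layout requested in the question.

```r
library(dplyr)

# Same sample data as above (only SP1 and SP2 are included).
data <- data.frame(
  placette = rep(1:3, times = 4),
  year     = rep(2013:2016, each = 3),
  SP1      = c(43L, 30L, 23L, 0L, 2L, 5L, 16L, 16L, 20L, 54L, 51L, 51L),
  SP2      = c(4L, 0L, 3L, 0L, 2L, 0L, 3L, 1L, 0L, 4L, 2L, 0L)
)

result <- data %>%
  arrange(placette, year) %>%
  group_by(placette) %>%
  mutate(
    across(starts_with("SP"), cumsum),  # running total of each SP* column
    period = row_number()               # assumed name: years accumulated so far
  ) %>%
  ungroup() %>%
  filter(period >= 2)                   # assumed: keep only periods of 2+ years

result
```

Here the row with period 2 for a given placette holds the 2013+2014 totals, period 3 holds 2013+2014+2015, and so on. This relies on every placette having one row per year, as in the sample data.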
Here is an approach using data.table.
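library(data.table)
setDT(data)  # convert to a data.table by reference
sp_cols <- setdiff(names(data), c("placette", "year"))
# Sort by year, then per placette return a year counter followed by the
# cumulative sums, so that N.Years comes right after placette in the result.
data[order(year)][,
  c(list(N.Years = seq_len(.N)), lapply(.SD, cumsum)),
  by = "placette",
  .SDcols = sp_cols]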
# placette N.Years SP1 SP2
# 1: 1 1 43 4
# 2: 1 2 43 4
# 3: 1 3 59 7
# 4: 1 4 113 11
# 5: 2 1 30 0
# 6: 2 2 32 2
# 7: 2 3 48 3
# 8: 2 4 99 5
# 9: 3 1 23 3
#10: 3 2 28 3
#11: 3 3 48 3
#12: 3 4 99 3
Data
data <- structure(list(placette = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L), year = c(2013L, 2013L, 2013L, 2014L, 2014L, 2014L,
2015L, 2015L, 2015L, 2016L, 2016L, 2016L), SP1 = c(43L, 30L,
23L, 0L, 2L, 5L, 16L, 16L, 20L, 54L, 51L, 51L), SP2 = c(4L, 0L,
3L, 0L, 2L, 0L, 3L, 1L, 0L, 4L, 2L, 0L)), class = "data.frame", row.names = c(NA,
-12L))
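Since the question mentions aggregate(), a base R variant may also be useful. aggregate() collapses each group to a single row, so a per-group running total is easier to get with ave(). The following is a sketch under that assumption rather than part of either answer; the names `out`, `sp_cols` and `period` are illustrative.

```r
# Same sample data as above (only SP1 and SP2 are included).
data <- data.frame(
  placette = rep(1:3, times = 4),
  year     = rep(2013:2016, each = 3),
  SP1      = c(43L, 30L, 23L, 0L, 2L, 5L, 16L, 16L, 20L, 54L, 51L, 51L),
  SP2      = c(4L, 0L, 3L, 0L, 2L, 0L, 3L, 1L, 0L, 4L, 2L, 0L)
)

# Sort so that years are in increasing order within each placette.
out <- data[order(data$placette, data$year), ]

# Columns to accumulate: every column whose name starts with "SP".
sp_cols <- grep("^SP", names(out), value = TRUE)

# ave() applies cumsum within each placette and keeps the original row layout.
out[sp_cols] <- lapply(out[sp_cols], function(x) ave(x, out$placette, FUN = cumsum))

# Number of years accumulated so far within each placette.
out$period <- ave(out$year, out$placette, FUN = seq_along)

rownames(out) <- NULL
out
```

This needs no packages beyond base R and gives the same cumulative values as the dplyr and data.table answers.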