R - build unique groups based on consecutive rows and factor level
In general, how would I group rows that share an identical factor level, but only as long as those rows are consecutive in the data frame? For example, from test below I would like to get the desired good_output.
library(dplyr)
test <- data.frame(time = 1:10, letter = c("a","a","a","b","a","a","a","b","b","b"))
# grouping by letter alone merges the non-consecutive runs
bad_output <- test %>% group_by(letter) %>% summarize(mean_time = mean(time))
bad_output
# A tibble: 2 x 2
#   letter mean_time
#   <fct>      <dbl>
# 1 a         4
# 2 b         7.75
good_output <- data.frame(letter=c("a","b","a","b"), id=c(1,1,2,2), mean_time=c(2,4,6,9))
good_output
#   letter id mean_time
# 1      a  1         2
# 2      b  1         4
# 3      a  2         6
# 4      b  2         9
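The grouping asked for here amounts to giving every run of consecutive identical values its own label. As a minimal base-R illustration (the helper name run_id is hypothetical, not part of the question or of any package), rle() can produce such a label, similar to what data.table's rleid() returns:

```r
# Hypothetical helper: label each run of consecutive identical values 1, 2, 3, ...
run_id <- function(x) {
  # rle() needs an atomic vector, so convert factors to character first
  r <- rle(as.character(x))
  # repeat each run's number once for every row in that run
  rep(seq_along(r$lengths), r$lengths)
}

letter <- c("a", "a", "a", "b", "a", "a", "a", "b", "b", "b")
run_id(letter)
#> [1] 1 1 1 2 3 3 3 4 4 4
```

Grouping by this label together with the letter itself is what keeps the two runs of "a" (rows 1-3 and rows 5-7) apart.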
We can group by 'letter' and by the run-length id of 'letter' (rleid from data.table), summarise to get the mean of 'time', create the sequence column 'id' with row_number(), and then select out the 'grp' column.
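library(dplyr)
library(data.table)  # for rleid()
test %>%
  # grp numbers each run of consecutive identical letters: 1,1,1,2,3,3,3,4,4,4
  group_by(letter, grp = rleid(letter)) %>%
  # one row per run; the result stays grouped by letter
  summarise(mean_time = mean(time), .groups = "drop_last") %>%
  # number the runs within each letter: 1, 2, ...
  mutate(id = row_number()) %>%
  ungroup() %>%
  select(-grp)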
# A tibble: 4 x 3
# letter mean_time id
# <fct> <dbl> <int>
#1 a 2 1
#2 a 6 2
#3 b 4 1
#4 b 9 2
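If you would rather not load data.table, a dplyr-only sketch is possible, assuming dplyr >= 1.0.0 (for the .groups argument). It swaps rleid() for a cumsum() over "letter changed from the previous row", and it summarises by run first so the rows stay in their original order (a, b, a, b), matching good_output exactly instead of being sorted by letter:

```r
library(dplyr)

test <- data.frame(time = 1:10,
                   letter = c("a","a","a","b","a","a","a","b","b","b"))

out <- test %>%
  # a new run starts on row 1 (lag() is NA there) or whenever the letter changes
  mutate(grp = cumsum(coalesce(letter != lag(letter), TRUE))) %>%
  # one row per run, ordered by run number
  group_by(grp, letter) %>%
  summarise(mean_time = mean(time), .groups = "drop") %>%
  # number the runs of each letter in order of appearance
  group_by(letter) %>%
  mutate(id = row_number()) %>%
  ungroup() %>%
  select(letter, id, mean_time)

out
```

Because group_by(letter) does not reorder rows, row_number() counts each letter's runs in the order they occur in the data.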
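Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.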