Suppose we have this tibble:
group item
x 1
x 2
x 2
y 3
z 2
x 2
x 2
z 1
I want to perform a group_by on group, but only grouping elements that are adjacent. In my example, that gives two separate 'x' groups (rows 1-3 and rows 6-7), each summing its 'item' values. The result would be something like:
group item
x 5
y 3
z 2
x 4
z 1
I know how to solve this problem with 'for' loops, but that is slow and not very straightforward. I'd rather use a dplyr or tidyverse function with simple logic.
This question is not a duplicate. I know there is already a question about rle on SO, but mine is more general: I'm asking for general solutions.
If you want to use only base R and the tidyverse, this code exactly replicates your desired result:
library(dplyr)
mydf <- tibble(group = c("x", "x", "x", "y", "z", "x", "x", "z"),
item = c(1, 2, 2, 3, 2, 2, 2, 1))
mydf
# A tibble: 8 × 2
group item
<chr> <dbl>
1 x 1
2 x 2
3 x 2
4 y 3
5 z 2
6 x 2
7 x 2
8 z 1
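# rle() returns the length of each run of identical, adjacent values
runs <- rle(mydf$group)
mydf %>%
  # number the runs 1, 2, 3, ... and repeat each number over its run's rows
  mutate(run_id = rep(seq_along(runs$lengths), runs$lengths)) %>%
  group_by(group, run_id) %>%
  summarise(item = sum(item)) %>%
  arrange(run_id) %>%
  select(-run_id)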
Source: local data frame [5 x 2]
Groups: group [3]
group item
<chr> <dbl>
1 x 5
2 y 3
3 z 2
4 x 4
5 z 1
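If you would rather build the run identifier inside the pipeline instead of calling rle() up front, one common dplyr idiom is to flag each row whose group differs from the previous row and take a cumulative sum of those flags. The sketch below assumes dplyr 1.0.0 or later (for the `.groups` argument of summarise()); dplyr 1.1.0 and later also provide consecutive_id(), which computes the same kind of run number directly.

```r
library(dplyr)

mydf <- tibble(group = c("x", "x", "x", "y", "z", "x", "x", "z"),
               item  = c(1, 2, 2, 3, 2, 2, 2, 1))

result <- mydf %>%
  # TRUE wherever group differs from the previous row (first row compares to itself);
  # cumsum() over these change points numbers the runs 1, 2, 3, ...
  mutate(run_id = cumsum(group != lag(group, default = first(group))) + 1) %>%
  # grouping by run_id first keeps the runs in their original order
  group_by(run_id, group) %>%
  summarise(item = sum(item), .groups = "drop") %>%
  select(-run_id)

result
```

Because the runs are numbered in row order, grouping by run_id before group also makes the arrange() step unnecessary.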
You can construct group identifiers with rle, but the easier route is to use data.table::rleid, which does it for you:
library(dplyr)
# df holds the question's data (columns group and item)
df %>%
  group_by(group,
           group_run = data.table::rleid(group)) %>%
  summarise_all(sum)
#> # A tibble: 5 x 3
#> # Groups: group [?]
#> group group_run item
#> <fctr> <int> <int>
#> 1 x 1 5
#> 2 x 4 4
#> 3 y 2 3
#> 4 z 3 2
#> 5 z 5 1
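For comparison, the same result can be obtained in base R alone, without dplyr or data.table. The sketch below uses a hypothetical helper, run_id(), that mimics what data.table::rleid() does for a single vector by expanding the run numbers from rle(); the helper name is only an illustration, not part of any package. It converts the input with as.character() because rle() rejects factors.

```r
# Hypothetical helper: base-R stand-in for data.table::rleid() on one vector
run_id <- function(x) {
  r <- rle(as.character(x))             # rle() does not accept factors
  rep(seq_along(r$lengths), r$lengths)  # run number repeated over each run
}

df <- data.frame(group = c("x", "x", "x", "y", "z", "x", "x", "z"),
                 item  = c(1, 2, 2, 3, 2, 2, 2, 1))

ids <- run_id(df$group)
result <- data.frame(
  group = df$group[!duplicated(ids)],         # label of the first row in each run
  item  = as.vector(tapply(df$item, ids, sum)) # per-run sums, ordered by run number
)

result
```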