How to perform a group_by with elements that are contiguous in R and dplyr

Question

Suppose we have this tibble:

 group item
 x     1
 x     2
 x     2
 y     3
 z     2
 x     2
 x     2
 z     1

I want to perform a group_by on group, but only over rows that are adjacent. In my example, that gives two separate 'x' groups (and two 'z' groups), each summing its 'item' values. The result would be something like:

group item
x 5
y 3
z 2
x 4
z 1

I know how to solve this problem using 'for' loops. However, this is not fast and doesn't sound straightforward. I'd rather use some dplyr or tidyverse function with an easy logic.

This question is not a duplicate. I know there is already a question about rle on SO, but mine is more general: I am asking for general solutions.

Answer 1

If you want to use only base R and the tidyverse, this code reproduces your desired result exactly:

library(dplyr)

mydf <- tibble(group = c("x", "x", "x", "y", "z", "x", "x", "z"),
               item = c(1, 2, 2, 3, 2, 2, 2, 1))

mydf

# A tibble: 8 × 2
  group  item
  <chr> <dbl>
1     x     1
2     x     2
3     x     2
4     y     3
5     z     2
6     x     2
7     x     2
8     z     1

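# rle() finds runs of identical adjacent values in the group column
runs <- rle(mydf$group)

mydf %>% 
  # label each row with the index of the run it belongs to
  mutate(run_id = rep(seq_along(runs$lengths), runs$lengths)) %>% 
  group_by(group, run_id) %>% 
  summarise(item = sum(item)) %>% 
  # restore the original run order, then drop the helper column
  arrange(run_id) %>% 
  select(-run_id)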

Source: local data frame [5 x 2]
Groups: group [3]

  group  item
  <chr> <dbl>
1     x     5
2     y     3
3     z     2
4     x     4
5     z     1
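To make the rle step less opaque, here is a minimal base R sketch of how the run identifiers are built and then used for the sums. It uses no packages; the plain vectors group and item and the names run_id and result are only illustrative stand-ins for the columns of mydf.

```r
# Same data as in the question, as plain vectors
group <- c("x", "x", "x", "y", "z", "x", "x", "z")
item  <- c(1, 2, 2, 3, 2, 2, 2, 1)

# rle() returns one value and one length per run of identical adjacent elements
runs <- rle(group)

# Repeat each run's index as many times as the run is long,
# so every row gets the index of the run it belongs to
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Sum item within each run; runs$values already lists the runs in order
result <- data.frame(
  group = runs$values,
  item  = as.vector(tapply(item, run_id, sum))
)

result
```

Because run_id only changes when the value of group changes, two 'x' runs separated by other values get different identifiers and are never merged.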

Answer 2

You can construct the group identifiers with rle, but the easier route is to use data.table::rleid, which does it for you. Here df is assumed to hold the data from the question:

library(dplyr)

df %>% 
    group_by(group, 
             group_run = data.table::rleid(group)) %>% 
    summarise_all(sum)
#> # A tibble: 5 x 3
#> # Groups:   group [?]
#>    group group_run  item
#>   <fctr>     <int> <int>
#> 1      x         1     5
#> 2      x         4     4
#> 3      y         2     3
#> 4      z         3     2
#> 5      z         5     1
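Note that the output above is sorted by group first, so the runs no longer appear in their original order. If that order matters and you would rather not depend on data.table, one possible variant is sketched below. It assumes dplyr 1.1.0 or later, where consecutive_id() is available; the names run_id and result are illustrative, and df is rebuilt here so the snippet runs on its own.

```r
library(dplyr)

# Data from the question, rebuilt so the example is self-contained
df <- tibble(group = c("x", "x", "x", "y", "z", "x", "x", "z"),
             item  = c(1, 2, 2, 3, 2, 2, 2, 1))

result <- df %>%
  # consecutive_id() (assumes dplyr >= 1.1.0) increments whenever group changes;
  # listing run_id first makes the summary come out in run order
  group_by(run_id = consecutive_id(group), group) %>%
  summarise(item = sum(item), .groups = "drop") %>%
  # the helper column is no longer needed once the sums are computed
  select(-run_id)

result
```

Putting the run identifier before group in group_by is a design choice: summarise orders its output by the grouping columns, so the leading run_id keeps the rows in the order the runs occur in the data.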

2 answers

solution1
3 ACCEPTED 2017-06-21 03:35:01

solution2
2 2017-06-20 15:33:09
