How to perform a group_by with elements that are contiguous in R and dplyr

Suppose we have this tibble:

 group item
 x     1
 x     2
 x     2
 y     3
 z     2
 x     2
 x     2
 z     1

I want to perform a group_by on group, but only rows that are adjacent should end up in the same group. For example, in my case the 'x' rows would form two separate groups (rows 1-3 and rows 6-7), and 'item' would be summed within each run. The result would be something like:

group item
x 5
y 3
z 2
x 4
z 1

I know how to solve this problem using 'for' loops. However, that is slow and not very straightforward. I'd rather use a dplyr or tidyverse function with simple logic.

This question is not a duplicate. I know there's already a question about rle on SO, but my question is more general than that: I'm asking for general solutions.

If you want to use only base R + tidyverse, this code exactly replicates your desired results:

library(dplyr)  # provides tibble() and %>%

mydf <- tibble(group = c("x", "x", "x", "y", "z", "x", "x", "z"),
               item = c(1, 2, 2, 3, 2, 2, 2, 1))

mydf

# A tibble: 8 × 2
  group  item
  <chr> <dbl>
1     x     1
2     x     2
3     x     2
4     y     3
5     z     2
6     x     2
7     x     2
8     z     1

runs <- rle(mydf$group)

mydf %>% 
  mutate(run_id = rep(seq_along(runs$lengths), runs$lengths)) %>% 
  group_by(group, run_id) %>% 
  summarise(item = sum(item)) %>% 
  arrange(run_id) %>% 
  select(-run_id) 

Source: local data frame [5 x 2]
Groups: group [3]

  group  item
  <chr> <dbl>
1     x     5
2     y     3
3     z     2
4     x     4
5     z     1
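
To make the run_id step easier to follow, here is a small base R sketch of what rle() returns for the question's group column and how rep() expands it into one id per row. The cumsum() variant at the end is an alternative idiom added for illustration, not part of the answer above.

# Same group column as in the question
group <- c("x", "x", "x", "y", "z", "x", "x", "z")

# rle() collapses the vector into runs of identical adjacent values
runs <- rle(group)
runs$values    # "x" "y" "z" "x" "z"
runs$lengths   # 3 1 1 2 1

# Repeating each run's index by its length gives one id per row
run_id <- rep(seq_along(runs$lengths), runs$lengths)
run_id         # 1 1 1 2 3 4 4 5

# Alternative (illustrative): start a new id whenever a value
# differs from the one directly before it
run_id_alt <- cumsum(c(TRUE, group[-1] != group[-length(group)]))
run_id_alt     # 1 1 1 2 3 4 4 5

Because the id increases with each new run, grouping by it keeps the two 'x' runs apart even though they share the same group value.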

You can construct group identifiers with rle, but the easier route is to just use data.table::rleid, which does it for you (here df holds the data from the question):

library(dplyr)

df %>% 
    group_by(group, 
             group_run = data.table::rleid(group)) %>% 
    summarise_all(sum)
#> # A tibble: 5 x 3
#> # Groups:   group [?]
#>    group group_run  item
#>   <fctr>     <int> <int>
#> 1      x         1     5
#> 2      x         4     4
#> 3      y         2     3
#> 4      z         3     2
#> 5      z         5     1
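
Note that the output above is sorted by group, so the runs are not in their original order. If the order of the question's desired result matters, one possible variant is to put the run id first in group_by(). The sketch below assumes dplyr 1.1.0 or later, which provides consecutive_id() as a dplyr counterpart to data.table::rleid(); the names mydf, run_id and result are only illustrative.

library(dplyr)  # assumes dplyr >= 1.1.0 for consecutive_id()

mydf <- tibble(group = c("x", "x", "x", "y", "z", "x", "x", "z"),
               item  = c(1, 2, 2, 3, 2, 2, 2, 1))

result <- mydf %>%
  # run_id comes first, so summarise() returns rows in run order
  group_by(run_id = consecutive_id(group), group) %>%
  summarise(item = sum(item), .groups = "drop") %>%
  # the helper id is no longer needed once the runs are summed
  select(-run_id)

result

With older dplyr versions, data.table::rleid(group) can be swapped in for consecutive_id(group) in the same position.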
