R tidyverse: create groups based on index column

Question

I have this tibble

# Data
library(tibble)
set.seed(1)
x <- tibble(values = round(rnorm(20, 10, 10), 0),
            index = c(0,0,1,1,1,0,1,0,1,1,1,1,1,1,0,
                      1,1,0,0,0))
x
#> # A tibble: 20 x 2
#>    values index
#>     <dbl> <dbl>
#>  1      4     0
#>  2     12     0
#>  3      2     1
#>  4     26     1
#>  5     13     1
#>  6      2     0
#>  7     15     1
#>  8     17     0
#>  9     16     1
#> 10      7     1
#> 11     25     1
#> 12     14     1
#> 13      4     1
#> 14    -12     1
#> 15     21     0
#> 16     10     1
#> 17     10     1
#> 18     19     0
#> 19     18     0
#> 20     16     0

I'd like to create groups from the runs of consecutive ones in the index column. The final aim is to compute the sum of values for each group.

The expected tibble is something like:

# A tibble: 20 x 3
   values index group
    <dbl> <dbl> <chr>
 1      4     0 NA   
 2     12     0 NA   
 3      2     1 A    
 4     26     1 A    
 5     13     1 A    
 6      2     0 NA   
 7     15     1 B    
 8     17     0 NA   
 9     16     1 C    
10      7     1 C    
11     25     1 C    
12     14     1 C    
13      4     1 C    
14    -12     1 C    
15     21     0 NA   
16     10     1 D    
17     10     1 D    
18     19     0 NA   
19     18     0 NA   
20     16     0 NA

Thank you in advance for your advice.

Answer 1

You could use cumsum() on the runs identified by rle(), replacing the values where index is zero with NA. Because the IDs are drawn from LETTERS, this handles at most 26 groups; with more than 26 IDs it will need a minor modification.

library(dplyr)

x2 <- x %>%
  mutate(id = LETTERS[replace(with(rle(index),
                                   rep(cumsum(values), lengths)), index == 0, NA)])

Giving:

# A tibble: 20 x 3
   values index id   
    <dbl> <dbl> <chr>
 1      4     0 NA   
 2     12     0 NA   
 3      2     1 A    
 4     26     1 A    
 5     13     1 A    
 6      2     0 NA   
 7     15     1 B    
 8     17     0 NA   
 9     16     1 C    
10      7     1 C    
11     25     1 C    
12     14     1 C    
13      4     1 C    
14    -12     1 C    
15     21     0 NA   
16     10     1 D    
17     10     1 D    
18     19     0 NA   
19     18     0 NA   
20     16     0 NA
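The one-liner above packs several steps together. As a rough sketch of what happens inside it, here is the same computation unrolled in base R on the index vector from the question (the intermediate names r, run_id and id are only for illustration):

```r
# Index column from the question
index <- c(0,0,1,1,1,0,1,0,1,1,1,1,1,1,0,1,1,0,0,0)

# rle() compresses the vector into runs: `values` holds the repeated value,
# `lengths` holds how long each run is
r <- rle(index)
r$values   # 0 1 0 1 0 1 0 1 0
r$lengths  # 2 3 1 1 1 6 1 2 3

# Because the values are 0/1, the running total goes up by one
# at the start of every run of ones
run_id <- cumsum(r$values)   # 0 1 1 2 2 3 3 4 4

# Expand back to one entry per row, then blank out the rows where index is 0
id <- rep(run_id, r$lengths)
id[index == 0] <- NA

# Numeric IDs 1, 2, 3, 4 become "A", "B", "C", "D"; NA stays NA
LETTERS[id]
```

Note that this relies on index containing only 0 and 1; with other values the cumsum() step would no longer count runs.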

To sum the values:

x2 %>%
  group_by(id) %>%
  summarise(sv = sum(values))

# A tibble: 5 x 2
  id       sv
* <chr> <dbl>
1 A        41
2 B        15
3 C        54
4 D        20
5 NA      109
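The summary above still contains an NA row for the values that belong to no group, and LETTERS caps the number of groups at 26. One possible way around both, sketched here with a different technique than the answer's rle() call, is to count the 0-to-1 transitions with lag() and keep the IDs as integers. The column name run and the result name sums are made up for this example, and the values are typed in by hand from the question's output so the result does not depend on the random seed.

```r
library(dplyr)

# Same data as in the question, written out explicitly
x <- tibble(values = c(4, 12, 2, 26, 13, 2, 15, 17, 16, 7,
                       25, 14, 4, -12, 21, 10, 10, 19, 18, 16),
            index  = c(0,0,1,1,1,0,1,0,1,1,1,1,1,1,0,1,1,0,0,0))

sums <- x %>%
  mutate(
    # TRUE on every row where a run of ones starts (a 0 followed by a 1);
    # the running count of those starts numbers the runs 1, 2, 3, ...
    run = cumsum(index == 1 & lag(index, default = 0) == 0),
    # integer IDs instead of LETTERS, so there is no 26-group limit
    id  = if_else(index == 1, run, NA_integer_)
  ) %>%
  filter(!is.na(id)) %>%   # drop the rows that belong to no group
  group_by(id) %>%
  summarise(sv = sum(values))

sums
```

Groups 1 to 4 here correspond to A to D in the output above.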

Answer 2

An option with data.table

library(data.table)
setDT(x)[, group := LETTERS[as.integer(factor((NA^!index) * rleid(index)))]]
x
#    values index group
# 1:      4     0  <NA>
# 2:     12     0  <NA>
# 3:      2     1     A
# 4:     26     1     A
# 5:     13     1     A
# 6:      2     0  <NA>
# 7:     15     1     B
# 8:     17     0  <NA>
# 9:     16     1     C
#10:      7     1     C
#11:     25     1     C
#12:     14     1     C
#13:      4     1     C
#14:    -12     1     C
#15:     21     0  <NA>
#16:     10     1     D
#17:     10     1     D
#18:     19     0  <NA>
#19:     18     0  <NA>
#20:     16     0  <NA>
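For readers puzzled by the NA^!index part, here is a small base R sketch of what the expression does. The run counter built with diff() is only a stand-in for data.table's rleid(), and the names mask, run and group are chosen for illustration.

```r
index <- c(0,0,1,1,1,0,1,0,1,1,1,1,1,1,0,1,1,0,0,0)

# !index is TRUE where index is 0. In R, NA^0 is 1 and NA^1 is NA,
# so NA^!index is NA on the zero rows and 1 on the one rows.
mask <- NA^!index

# Base R stand-in for rleid(): a counter that increases by one
# every time the value differs from the previous row
run <- cumsum(c(TRUE, diff(index) != 0))

# Multiplying keeps the run number on the one rows and gives NA elsewhere;
# factor() then renumbers the surviving run numbers (2, 4, 6, 8) as 1, 2, 3, 4
group <- LETTERS[as.integer(factor(mask * run))]
group
```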

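Or similar logic in dplyr (rleid() still comes from data.table, so that package must be loaded)

library(dplyr)
library(data.table)  # for rleid()
x %>% 
  mutate(group = LETTERS[as.integer(factor((NA^!index) * rleid(index)))])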
# A tibble: 20 x 3
#   values index group
#    <dbl> <dbl> <chr>
# 1      4     0 <NA> 
# 2     12     0 <NA> 
# 3      2     1 A    
# 4     26     1 A    
# 5     13     1 A    
# 6      2     0 <NA> 
# 7     15     1 B    
# 8     17     0 <NA> 
# 9     16     1 C    
#10      7     1 C    
#11     25     1 C    
#12     14     1 C    
#13      4     1 C    
#14    -12     1 C    
#15     21     0 <NA> 
#16     10     1 D    
#17     10     1 D    
#18     19     0 <NA> 
#19     18     0 <NA> 
#20     16     0 <NA>
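If you would rather not load data.table just for rleid(), a possible dplyr-only variant is sketched below. It assumes dplyr 1.1.0 or later, where consecutive_id() is available; the helper column run and the result name res are made up for this example.

```r
library(dplyr)

# Same data as in the question, written out explicitly
x <- tibble(values = c(4, 12, 2, 26, 13, 2, 15, 17, 16, 7,
                       25, 14, 4, -12, 21, 10, 10, 19, 18, 16),
            index  = c(0,0,1,1,1,0,1,0,1,1,1,1,1,1,0,1,1,0,0,0))

res <- x %>%
  mutate(
    # consecutive_id() numbers the runs of identical values, like rleid()
    run   = consecutive_id(index),
    # keep the run number only where index is 1, then relabel as A, B, C, ...
    group = LETTERS[as.integer(factor(if_else(index == 1, run, NA_integer_)))]
  ) %>%
  select(-run)

res
```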
