
The Most Efficient Way of Forming Groups using R

I have a tibble dt given as follows:

library(tidyverse) 

dt <- tibble(x=as.integer(c(0,0,1,0,0,0,1,1,0,1))) %>% 
  mutate(grp = as.factor(c(rep("A",3), rep("B",4), rep("C",1), rep("D",2))))
dt


As one can observe, each group either:

  1. starts with 0 and ends with 1 (e.g., groups A, B, D), or
  2. consists of a single 1 (e.g., group C)

Problem: Given a tibble with an integer column x of 0s and 1s that starts with 0 and ends with 1, what is the most efficient way to obtain this grouping in R? (Any grouping symbols or factors are acceptable.)

We can take the cumulative sum of 'x' (assuming it is binary), lag it by one position, and add 1 to get a group index, then use that index to look up a label in LETTERS. (Note that LETTERS is used only to match the expected output. It has just 26 elements, so it covers at most 26 groups.)

library(dplyr)
dt %>% 
   mutate(grp2 = LETTERS[lag(cumsum(x), default = 0)+ 1])

Output:

# A tibble: 10 x 3
       x grp   grp2 
   <int> <fct> <chr>
 1     0 A     A    
 2     0 A     A    
 3     1 A     A    
 4     0 B     B    
 5     0 B     B    
 6     0 B     B    
 7     1 B     B    
 8     1 C     C    
 9     0 D     D    
10     1 D     D    
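
To make the lagged cumulative sum easier to follow, here is a minimal base R sketch that computes the same index step by step on the example vector. The intermediate names (ones_so_far, ones_before, grp_id) are illustrative and not part of the original answer. Keeping the integer index instead of mapping it to LETTERS also avoids the 26-group ceiling.

```r
# Example vector from the question
x <- as.integer(c(0, 0, 1, 0, 0, 0, 1, 1, 0, 1))

# Running count of 1s seen so far, including the current row
ones_so_far <- cumsum(x)
# 0 0 1 1 1 1 2 3 3 4

# Shift right by one position so each row sees the count of 1s *before* it
# (this is what lag(cumsum(x), default = 0) does in the dplyr version)
ones_before <- c(0L, head(ones_so_far, -1))
# 0 0 0 1 1 1 1 2 3 3

# Integer group id: a new group starts on the row after each 1
grp_id <- ones_before + 1L
# 1 1 1 2 2 2 2 3 4 4
```

If a factor is needed, as.factor(grp_id) should work for any number of groups.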

Although the strategy proposed by Akrun is fantastic, the same grouping can also be produced with accumulate():

library(tidyverse) 

dt <- tibble(x=as.integer(c(0,0,1,0,0,0,1,1,0,1))) %>% 
  mutate(grp = as.factor(c(rep("A",3), rep("B",4), rep("C",1), rep("D",2))))

dt %>%
  mutate(GRP = accumulate(lag(x, default = 0), .init = 1,
                          ~ if (.y != 1) .x else .x + 1)[-1])
#> # A tibble: 10 x 3
#>        x grp     GRP
#>    <int> <fct> <dbl>
#>  1     0 A         1
#>  2     0 A         1
#>  3     1 A         1
#>  4     0 B         2
#>  5     0 B         2
#>  6     0 B         2
#>  7     1 B         2
#>  8     1 C         3
#>  9     0 D         4
#> 10     1 D         4

Created on 2021-06-13 by the reprex package (v2.0.0)
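
As a quick sanity check, the sketch below compares the cumsum approach with the accumulate approach on a longer vector. The test vector is hypothetical (random 0s and 1s with a 1 appended so that it ends in 1), and the seed is arbitrary.

```r
library(dplyr)
library(purrr)

set.seed(42)  # arbitrary seed, only for reproducibility

# Hypothetical test vector: 98 random 0/1 values, forced to end in 1
x <- c(sample(0:1, 98, replace = TRUE), 1L)

# Vectorised approach: lagged cumulative sum plus 1
by_cumsum <- lag(cumsum(x), default = 0) + 1

# Element-wise approach: bump the counter whenever the previous value was 1
by_accumulate <- accumulate(lag(x, default = 0), .init = 1,
                            ~ if (.y != 1) .x else .x + 1)[-1]

# Both should yield the same group index for every row
all.equal(by_cumsum, by_accumulate)
```

Since cumsum() is vectorised while accumulate() calls an R function once per element, the cumsum version is likely the faster of the two on long vectors, though this has not been benchmarked here.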
