Add group counter on data frame based on column

Question

Say I have a sorted data frame with a distance variable d that gives the distance between consecutive measures in variable a.

library(dplyr)
set.seed(1)
df <- 
  data.frame(a = sort(sample(2:20, 8))) %>% 
  mutate(d = a - lag(a))  # gap to the previous value of a; NA in the first row

This gives:

> df
   a  d
1  5 NA
2  7  2
3  8  1
4  9  1
5 11  2
6 14  3
7 15  1
8 16  1

I am trying to add a counter/grouping variable g driven by whether d is larger than, say, 2. g could take values like g1, g2, and so on. In other words, I would like g to "increase" (g1 to g2, g2 to g3, ...) every time d > 2. For the data above we would get:

> df
   a  d  g
1  5 NA g1
2  7  2 g1
3  8  1 g1
4  9  1 g1
5 11  2 g1
6 14  3 g2
7 15  1 g2
8 16  1 g2

I thought of using a function with a global side effect (and yes, this is generally a bad idea, but I could not think of anything else):

f <- function(x){
  if(x)
    g <<- g +1
  return(paste0('g', g))
}

And then do:

g=0
df %>% 
  mutate(g = ifelse(is.na(d)|d>2, f(T), f(F)))

But g is not incremented row by row inside mutate (or sapply). In real-world data I might have thousands of g groups.
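One likely reason the mutate() attempt above puts the same label on every row: ifelse() evaluates its yes and no arguments once each, as whole vectors, not once per element, so f(T) and f(F) each run only a single time. A small base R check that makes this visible, using a hypothetical counting function tag() in place of f():

```r
# Count how many times the branch expressions of ifelse() are evaluated
calls <- 0
tag <- function(label) {
  calls <<- calls + 1  # global side effect, like g <<- g + 1 in the question's f()
  label
}

res <- ifelse(c(TRUE, FALSE, TRUE, FALSE), tag("yes"), tag("no"))
print(res)   # "yes" "no" "yes" "no"
print(calls) # 2: one evaluation per branch, not one per element
```

Because the counter only moves once per branch, a per-row running count has to come from a vectorized function such as cumsum() rather than from side effects.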

Answer 1

You can try:

with(df, paste0('g', cumsum(replace(d, is.na(d), 0) > 2)+1))
#[1] "g1" "g1" "g1" "g1" "g1" "g2" "g2" "g2"
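The cumsum() idea generalizes directly: every row where d exceeds the threshold adds 1 to a running count, and the label is "g" followed by (count + 1). A minimal base R sketch, assuming d is numeric with NA only in the first row; the function name add_group_counter() and its threshold/prefix arguments are hypothetical, chosen only for illustration:

```r
# Hypothetical helper (not taken from either answer): label rows "g1", "g2", ...
# and start a new group each time the gap d exceeds `threshold`.
add_group_counter <- function(d, threshold = 2, prefix = "g") {
  # A missing gap (the first row, where the lag is NA) never starts a new group
  starts_new <- !is.na(d) & d > threshold
  # cumsum() over the logical flags counts the boundaries passed so far
  paste0(prefix, cumsum(starts_new) + 1)
}

# a values copied from the table in the question; d rebuilt with diff()
df <- data.frame(a = c(5, 7, 8, 9, 11, 14, 15, 16))
df$d <- c(NA, diff(df$a))
df$g <- add_group_counter(df$d)
print(df)

# Consecutive large gaps each start their own group
print(add_group_counter(c(NA, 3, 3, 1)))
```

In the question's dplyr pipeline it could be called as mutate(g = add_group_counter(d)). Testing !is.na(d) & d > threshold avoids choosing a fill value for the first row; for this data it gives the same labels as replace(d, is.na(d), 0) > 2.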

Answer 2

A solution using dplyr and data.table. df2 is the final output.

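library(dplyr)
library(data.table)

df2 <- df %>%
  # flag rows whose gap exceeds 2; all other rows (including the first) are NA
  mutate(Large2 = ifelse(d > 2, 1, NA)) %>%
  # run id: consecutive identical values, including runs of NA, share an id
  mutate(RunID = rleid(Large2)) %>%
  # even ids are flagged runs; merge each one with the NA run that follows it
  # (consecutive rows with d > 2 form a single run, so they start only one new group)
  mutate(ID = ifelse(RunID %% 2 == 0, RunID + 1, RunID)) %>%
  mutate(g = paste0("g", group_indices(., ID))) %>%
  select(a, d, g)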

2 answers

solution1: 2 votes, accepted, 2017-08-24 14:36:37

solution2: 0 votes, 2017-08-24 14:55:01