简体   繁体   中英

How do I change subsequent row values if some condition is met with multiple groups?

I have a dataframe that looks like this:

ID  value   condition
A   0         0
A   3         0
A   0         1
A   7         1
A   5         0
A   5         0
A   5         0
A   7         0
B   6         0
B   2         1
B   7         0
B   10        1
B   0         0
B   6         0

I want to change the ID name when the condition is met and change the name of the ID that follows too. The condition can be met multiple times per ID so I'd like to modify it each time.

The result would change the original ID or just add a new column:

ID  value   condition   newID
A   0              0    A
A   3              0    A
A   0              1    A1
A   7              1    A1
A   5              0    A2
A   5              0    A2
A   5              0    A2
A   7              0    A2
B   6              0    B
B   2              1    B1
B   7              0    B2
B   10             1    B3
B   0              0    B4
B   6              0    B4

One option after grouping by 'ID', create the index with rleid (from data.table ) and change it to paste the 'ID' based on the condition with case_when

library(dplyr)
library(data.table)
df1 %>% 
   group_by(ID) %>%
   mutate(newID = rleid(condition)-1,
          newID = case_when(newID == 0 ~ first(ID), TRUE ~ paste0(first(ID), newID)))
# A tibble: 14 x 4
# Groups:   ID [2]
#   ID    value condition newID
#   <chr> <int>     <int> <chr>
# 1 A         0         0 A    
# 2 A         3         0 A    
# 3 A         0         1 A1   
# 4 A         7         1 A1   
# 5 A         5         0 A2   
# 6 A         5         0 A2   
# 7 A         5         0 A2   
# 8 A         7         0 A2   
# 9 B         6         0 B    
#10 B         2         1 B1   
#11 B         7         0 B2   
#12 B        10         1 B3   
#13 B         0         0 B4   
#14 B         6         0 B4   

data

df1 <- structure(list(ID = c("A", "A", "A", "A", "A", "A", "A", "A", 
 "B", "B", "B", "B", "B", "B"), value = c(0L, 3L, 0L, 7L, 5L, 
 5L, 5L, 7L, 6L, 2L, 7L, 10L, 0L, 6L), condition = c(0L, 0L, 1L, 
 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L)), class = "data.frame", 
 row.names = c(NA, -14L))

Same idea as @akrun but using only data.table

library(data.table)
setDT(df)

df[, newID := paste0(ID, gsub('^0$', '', rleid(condition) - 1)), ID]
df
#     ID value condition newID
#  1:  A     0         0     A
#  2:  A     3         0     A
#  3:  A     0         1    A1
#  4:  A     7         1    A1
#  5:  A     5         0    A2
#  6:  A     5         0    A2
#  7:  A     5         0    A2
#  8:  A     7         0    A2
#  9:  B     6         0     B
# 10:  B     2         1    B1
# 11:  B     7         0    B2
# 12:  B    10         1    B3
# 13:  B     0         0    B4
# 14:  B     6         0    B4

Could also do:

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(newID = cumsum(c(0, (condition != lag(condition))[-1])),
         newID = ifelse(newID != 0, paste0(ID, newID), ID))

Output:

# A tibble: 14 x 4
# Groups:   ID [2]
   ID    value condition newID
   <chr> <int>     <int> <chr>
 1 A         0         0 A    
 2 A         3         0 A    
 3 A         0         1 A1   
 4 A         7         1 A1   
 5 A         5         0 A2   
 6 A         5         0 A2   
 7 A         5         0 A2   
 8 A         7         0 A2   
 9 B         6         0 B    
10 B         2         1 B1   
11 B         7         0 B2   
12 B        10         1 B3   
13 B         0         0 B4   
14 B         6         0 B4  

If I understand correctly, the OP wants to create subgroups within each ID for each contiguous streak of condition .

Unfortunately, the OP has requested to name the subgroups in a special way which makes the solutions overly complicated. By OP's request, the subgroups are to be named, eg, A, A1, A2 which means that subgroup numbering and subgroup name is off by one, eg, the second subgroup is named A1 , the third one A2 , etc.

In case a more simplified naming scheme would be acceptable, we can directly benefit from the prefix parameter of the rleid() function. Then, the first subgroup of group A will be named A1 , the second A2 , etc.

dplyr

library(dplyr)
df %>% 
  group_by(ID) %>% 
  mutate(newID = data.table::rleid(condition, prefix = first(ID)))
 # A tibble: 14 x 4 # Groups: ID [2] ID value condition newID <chr> <int> <int> <chr> 1 A 0 0 A1 2 A 3 0 A1 3 A 0 1 A2 4 A 7 1 A2 5 A 5 0 A3 6 A 5 0 A3 7 A 5 0 A3 8 A 7 0 A3 9 B 6 0 B1 10 B 2 1 B2 11 B 7 0 B3 12 B 10 1 B4 13 B 0 0 B5 14 B 6 0 B5 

data.table

library(data.table)
setDT(df)[, newID := rleid(condition, prefix = ID), ID][]
  ID value condition newID 1: A 0 0 A1 2: A 3 0 A1 3: A 0 1 A2 4: A 7 1 A2 5: A 5 0 A3 6: A 5 0 A3 7: A 5 0 A3 8: A 7 0 A3 9: B 6 0 B1 10: B 2 1 B2 11: B 7 0 B3 12: B 10 1 B4 13: B 0 0 B5 14: B 6 0 B5 

Data

library(data.table)
df <- fread("ID  value   condition
A   0         0
A   3         0
A   0         1
A   7         1
A   5         0
A   5         0
A   5         0
A   7         0
B   6         0
B   2         1
B   7         0
B   10        1
B   0         0
B   6         0")

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM