How to remove rows after the first occurrence when the following row matches

Working in R, I am trying to remove all rows following a change. A business is open for 3 years and then closes, and the closed flag stays in the table for the following years. I want to remove the 2 extra years, keeping only the data for the year it closed. Some locations close and reopen in the same year; those should not be changed.

I've tried slice on the minimum date where status = "close", but this does not work because of the locations that reopen.
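To illustrate the problem, here is a minimal base R sketch. It assumes the attempt cut each ID at its earliest "close" row (the exact code that was tried is not shown), and the names id3, first_close and truncated are made up for the illustration. For a location that reopens, cutting there also discards the later legitimate rows:

```r
# Rows for a location that closes and reopens (same values as ID 3 in the sample data).
id3 <- data.frame(
  date   = c("2015", "2016", "2017", "2018", "2018", "2019", "2019"),
  status = c("open", "open", "open", "close", "open", "close", "open"),
  stringsAsFactors = FALSE
)

# Position of the earliest "close" row.
first_close <- which(id3$status == "close")[1]

# Keeping everything up to and including that row ...
truncated <- id3[seq_len(first_close), ]

# ... loses the reopening and everything after it.
nrow(id3) - nrow(truncated)
# [1] 3
```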

Sample data

date <- c("2014","2015","2016","2017","2018","2019","2016","2017","2018","2019","2015","2016","2017","2018","2018","2019","2019")
ID <- c("1","1","1","1","1", "1","2","2","2","2","3","3","3","3","3","3", "3")
status <- c("open", "open", "open", "close", "close", "close", "open", "open","open","open","open", "open", "open","close", "open", "close", "open")


start <- data.frame(date, ID, status)

In the data above, I want to remove the 2018 and 2019 rows for ID = 1, which gives:

date <- c("2014","2015","2016","2017","2016","2017","2018","2019","2015","2016","2017","2018","2018","2019","2019")
ID <- c("1","1","1","1","2","2","2","2","3","3","3","3","3","3", "3")
status <- c("open", "open", "open", "close", "open", "open","open","open","open", "open", "open","close", "open", "close", "open")


ideal_outcome <- data.frame(date, ID, status)

One way, using rleid from data.table, is to group_by ID and consecutive runs of status, keep only the first row of each run where status = "close", and keep all rows of each "open" run.

library(dplyr)
library(data.table)

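start %>%
  # one group per ID and per consecutive run of identical status values
  group_by(ID, group = rleid(status)) %>%
  # keep every row of an "open" run, only the first row of a "close" run
  slice(if (first(status) == "open") seq_len(n()) else 1L) %>%
  # relabel the remaining "close" rows
  mutate(status = replace(as.character(status),
                          status == "close", "permanently_closed")) %>%
  ungroup() %>%
  select(-group)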

# A tibble: 15 x 3
#   date  ID    status
#   <fct> <fct> <chr> 
# 1 2014  1     open  
# 2 2015  1     open  
# 3 2016  1     open  
# 4 2017  1     permanently_closed 
# 5 2016  2     open  
# 6 2017  2     open  
# 7 2018  2     open  
# 8 2019  2     open  
# 9 2015  3     open  
#10 2016  3     open  
#11 2017  3     open  
#12 2018  3     permanently_closed 
#13 2018  3     open  
#14 2019  3     permanently_closed 
#15 2019  3     open  

However, you don't really need to import data.table just for one function; the behaviour of rleid can be replicated with base rle as well:

start %>%
  group_by(ID, group = with(rle(as.character(status)),
                            rep(seq_along(values), lengths))) %>%
  slice(if (first(status) == "open") seq_len(n()) else 1L) %>%
  ungroup() %>%
  select(-group)
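To see what the rle expression produces, here is a small standalone sketch. The helper name run_id and the vector x are hypothetical, introduced only for this illustration; they are not part of dplyr or data.table:

```r
# Hypothetical helper: number the consecutive runs of identical values,
# using only base R's rle().
run_id <- function(x) {
  r <- rle(as.character(x))
  rep(seq_along(r$values), r$lengths)
}

x <- c("open", "open", "close", "close", "open")
run_id(x)
# [1] 1 1 2 2 3
```

Each change of value starts a new run number, so the second "open" stretch gets its own id instead of being merged with the first.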

Another way to create the groups, as suggested by @Sotos, uses factor, diff and cumsum:

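# Compute the run ids within each ID so that runs never span two locations.
start %>%
  group_by(ID) %>%
  mutate(grp = cumsum(c(TRUE, diff(as.numeric(as.factor(status))) != 0))) %>%
  group_by(ID, grp) %>%
  slice(if (first(status) == "open") seq_len(n()) else 1L) %>%
  ungroup() %>%
  select(-grp)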
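For comparison, the same rule can be written in base R without explicit run ids: drop a row only when it is "close" and the previous row of the same ID is also "close". This is a sketch, not one of the approaches above. It assumes the rows are already ordered by date within each ID, and the names prev_status, drop and result are made up for the illustration:

```r
# Same sample data as in the question, rebuilt so the snippet runs on its own.
date <- c("2014", "2015", "2016", "2017", "2018", "2019",
          "2016", "2017", "2018", "2019",
          "2015", "2016", "2017", "2018", "2018", "2019", "2019")
ID <- c(rep("1", 6), rep("2", 4), rep("3", 7))
status <- c("open", "open", "open", "close", "close", "close",
            "open", "open", "open", "open",
            "open", "open", "open", "close", "open", "close", "open")
start <- data.frame(date, ID, status, stringsAsFactors = FALSE)

# Status of the previous row within the same ID (NA for the first row of an ID).
prev_status <- ave(start$status, start$ID,
                   FUN = function(s) c(NA, head(s, -1)))

# A row is redundant when it repeats a "close" from the row before it.
drop <- start$status == "close" & !is.na(prev_status) & prev_status == "close"

result <- start[!drop, ]
nrow(result)
# [1] 15
```

Only the 2018 and 2019 rows of ID 1 are dropped; the close and reopen rows of ID 3 are kept because each "close" there follows an "open".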
