How to remove rows after the first occurrence when the following row matches
Working in R, I am trying to remove all rows following a change. A business is open for 3 years and then closes, and the closed flag stays in the table for the following years. I want to remove the 2 extra years, keeping only the data for the year it closed. Some locations close and reopen in the same year; they should not be changed.
I've tried slice on the min date when status = "close", but this will not work because of the locations that reopen.
Sample data
date <- c("2014","2015","2016","2017","2018","2019","2016","2017","2018","2019","2015","2016","2017","2018","2018","2019","2019")
ID <- c("1","1","1","1","1", "1","2","2","2","2","3","3","3","3","3","3", "3")
status <- c("open", "open", "open", "close", "close", "close", "open", "open","open","open","open", "open", "open","close", "open", "close", "open")
start <- data.frame(date, ID, status)
In the data above, I want to remove the 2018 and 2019 rows for ID = 1, giving:
date <- c("2014","2015","2016","2017","2016","2017","2018","2019","2015","2016","2017","2018","2018","2019","2019")
ID <- c("1","1","1","1","2","2","2","2","3","3","3","3","3","3", "3")
status <- c("open", "open", "open", "close", "open", "open","open","open","open", "open", "open","close", "open", "close", "open")
ideal_outcome <- data.frame(date, ID, status)
One way, using rleid from data.table, is to group_by ID and consecutive runs of status, keep only the first row of each group where status = "close", and keep all rows of each "open" group.
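library(dplyr)
library(data.table)

start %>%
  # one group per ID and per consecutive run of identical status values
  group_by(ID, group = rleid(status)) %>%
  # keep every row of an "open" run, but only the first row of a "close" run
  slice(if (first(status) == "open") seq_len(n()) else 1L) %>%
  # relabel the remaining "close" rows (not part of the question's ideal_outcome)
  mutate(status = replace(as.character(status),
                          status == "close", "permanently_closed")) %>%
  ungroup() %>%
  select(-group)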
# A tibble: 15 x 3
# date ID status
# <fct> <fct> <chr>
# 1 2014 1 open
# 2 2015 1 open
# 3 2016 1 open
# 4 2017 1 permanently_closed
# 5 2016 2 open
# 6 2017 2 open
# 7 2018 2 open
# 8 2019 2 open
# 9 2015 3 open
#10 2016 3 open
#11 2017 3 open
#12 2018 3 permanently_closed
#13 2018 3 open
#14 2019 3 permanently_closed
#15 2019 3 open
However, you don't really need to import data.table just for one function; the behaviour of rleid can be replicated with base rle as well:
start %>%
  group_by(ID, group = with(rle(as.character(status)),
                            rep(seq_along(values), lengths))) %>%
  slice(if (first(status) == "open") seq_len(n()) else 1L) %>%
  ungroup() %>%
  select(-group)
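To make the grouping variable concrete, here is a small sketch of the run ids that the rle expression above produces. The helper name run_id and the vector x are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper: number consecutive runs of equal values with base rle
run_id <- function(x) {
  r <- rle(as.character(x))
  rep(seq_along(r$values), r$lengths)
}

# Illustrative vector: two "open", two "close", then a reopen and a close
x <- c("open", "open", "close", "close", "open", "close")
run_id(x)
# [1] 1 1 2 2 3 4
```

Each run of identical values gets its own id, so the repeated "close" rows share an id and slice can keep just the first of them.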
Another way to create the groups, as suggested by @Sotos, uses factor, diff and cumsum:
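start %>%
  # start a new group id whenever status differs from the previous row
  mutate(grp = as.numeric(as.factor(status)),
         grp = cumsum(c(TRUE, diff(grp) != 0))) %>%
  # also group by ID so that a run never spans two IDs
  group_by(ID, grp) %>%
  slice(if (first(status) == "open") seq_len(n()) else 1L) %>%
  ungroup()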
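For comparison, the same filtering can be sketched in base R without any grouping. This is a different technique from the answers above, not the author's method: it drops a row when it is "close" and the previous row of the same ID is also "close". It assumes the rows are already sorted by ID and date, as in the sample data; the names prev_status, prev_id, repeat_close and result are illustrative.

```r
# Sample data from the question (kept as character, not factor)
date <- c("2014","2015","2016","2017","2018","2019","2016","2017","2018",
          "2019","2015","2016","2017","2018","2018","2019","2019")
ID <- c("1","1","1","1","1","1","2","2","2","2","3","3","3","3","3","3","3")
status <- c("open","open","open","close","close","close","open","open","open",
            "open","open","open","open","close","open","close","open")
start <- data.frame(date, ID, status, stringsAsFactors = FALSE)

# Status and ID of the previous row (NA for the very first row)
n <- nrow(start)
prev_status <- c(NA, start$status[-n])
prev_id <- c(NA, start$ID[-n])

# A row is a repeated close if it and the previous row of the same ID are "close"
repeat_close <- start$status == "close" &
  !is.na(prev_status) & prev_status == "close" & prev_id == start$ID

result <- start[!repeat_close, ]
rownames(result) <- NULL
nrow(result)
# [1] 15
```

Because only a "close" directly following another "close" is dropped, the close and reopen rows for ID 3 are left untouched.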