Remove ID if any consecutive values do not meet threshold
My data frame looks like this:
id year value
1 2000 23
1 2001 40
1 2003 93
2 1998 90
2 1999 91
2 2002 92
3 2015 12
3 2016 13
3 2017 14
I want to remove an ID if any two consecutive values both fall below the threshold of 90. Note: "consecutive" here just means one recorded year after the next for the same ID; the years do not have to be exactly one year apart. (Example: 2001 and 2003 for ID 1 are consecutive years.)
The output should be just id 2. If id 2 had any instance where two consecutive values were <90, it would also be removed.
id year value
2 1998 90
2 1999 91
2 2002 92
Could also do:
library(dplyr)
df %>%
  group_by(id) %>%
  # lag() is NA on each group's first row; na.rm = TRUE keeps any() from returning NA
  filter(!any(value < 90 & lag(value) < 90, na.rm = TRUE))
Output:
# A tibble: 3 x 3
# Groups: id [1]
id year value
<int> <int> <int>
1 2 1998 90
2 2 1999 91
3 2 2002 92
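A note on lag-based comparisons: the first row of each group has no predecessor, so the lagged value is `NA` there. The base R sketch below mimics a one-step lag with a hypothetical helper `shift1` (an illustration, not a dplyr function) and uses made-up values to show how that `NA` can travel through `&` and `any()`.

```r
# Hypothetical helper mimicking a one-step lag: shift right, pad with NA
shift1 <- function(x) c(NA, head(x, -1))

value <- c(80, 95, 95)            # made-up values: first one below 90, no low pair
below_pair <- value < 90 & shift1(value) < 90
below_pair
# NA FALSE FALSE

any(below_pair)                   # NA: a TRUE cannot be ruled out
any(below_pair, na.rm = TRUE)     # FALSE: no consecutive pair below 90
```

Inside `filter()`, a condition that evaluates to `NA` drops the rows, so passing `na.rm = TRUE` to `any()` is one way to keep a group like this from being removed by accident.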
This solution uses package dplyr.
library(dplyr)
df1 %>%
  group_by(id) %>%
  # Pair each value with its predecessor; keep the id only if every pair has a value >= 90
  filter(all(value[-1] >= 90 | value[-n()] >= 90))
## A tibble: 3 x 3
## Groups: id [1]
# id year value
# <int> <int> <int>
#1 2 1998 90
#2 2 1999 91
#3 2 2002 92
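To see how the negative indexing pairs each value with its predecessor, here is a small base R sketch on the values of id 1 from the example data. The variable names `later`, `earlier` and `pair_ok` are chosen for illustration only.

```r
value <- c(23, 40, 93)         # id 1 from the example data
n <- length(value)

later   <- value[-1]           # 40 93 (every value except the first)
earlier <- value[-n]           # 23 40 (every value except the last)

# A pair passes when at least one of its two values reaches the threshold
pair_ok <- later >= 90 | earlier >= 90
pair_ok                        # FALSE TRUE
all(pair_ok)                   # FALSE, so id 1 is removed
```

Because both vectors have length n - 1, element i of `pair_ok` describes rows i and i + 1, and no `NA` padding is involved.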
Data.
df1 <- read.table(text = "
id year value
1 2000 23
1 2001 40
1 2003 93
2 1998 90
2 1999 91
2 2002 92
3 2015 12
3 2016 13
3 2017 14
", header = TRUE)
Using dplyr you can first identify the values that are smaller than 90. Then you can count how many entries in a row are smaller than 90. After that you can keep just the ids in which you do not observe 2 consecutive values smaller than 90.
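library(dplyr)
df %>%
  mutate(value_90 = value < 90) %>%
  group_by(id) %>%
  # Count sub-90 values in a row; the count restarts after each value >= 90
  mutate(n_cons = ave(as.integer(value_90), cumsum(!value_90), FUN = cumsum)) %>%
  filter(!any(n_cons >= 2)) %>%
  select(id, year, value)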
# A tibble: 3 x 3
# Groups: id [1]
id year value
<dbl> <dbl> <dbl>
1 2 1998 90
2 2 1999 91
3 2 2002 92
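When counting entries "in a row", it matters whether the counter resets. A plain `cumsum()` keeps a running total of all values below 90, while a counter that restarts after each value of 90 or more tracks consecutive runs only. The sketch below contrasts the two on made-up values (not from the example data); `block` and `run_count` are illustrative names.

```r
value <- c(80, 95, 80)         # hypothetical values: two low values, not adjacent
below <- value < 90            # TRUE FALSE TRUE

cumsum(below)                  # 1 1 2: running total, reaches 2 without a low pair

# Start a new block at each value that reaches 90, then count within blocks
block <- cumsum(!below)        # 0 1 1
run_count <- ave(as.integer(below), block, FUN = cumsum)
run_count                      # 1 0 1
any(run_count >= 2)            # FALSE: no two consecutive values below 90
```

With the plain running total, this id would look as if it had two consecutive low values; the resetting counter keeps it.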
Using dplyr and rle...
library(dplyr)
DT %>%
  mutate(test = value < 90) %>%
  group_by(id) %>%
  filter(with(rle(test), !any(lengths >= 2 & values))) %>%
  select(-test)
# A tibble: 3 x 3
# Groups: id [1]
id year value
<int> <int> <int>
1 2 1998 90
2 2 1999 91
3 2 2002 92
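For readers unfamiliar with `rle()`, this base R sketch shows what it returns for the values of id 1 from the example data and how the `lengths >= 2 & values` test reads. The names `runs` and `has_low_run` are illustrative.

```r
value <- c(23, 40, 93)         # id 1 from the example data
runs <- rle(value < 90)        # run-length encoding of TRUE TRUE FALSE

runs$values                    # TRUE FALSE
runs$lengths                   # 2 1

# A run of TRUE with length >= 2 means two or more consecutive values below 90
has_low_run <- any(runs$lengths >= 2 & runs$values)
has_low_run                    # TRUE, so id 1 is removed
```

The `& values` part matters: without it, a run of two or more values at or above 90 would also trigger the test.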