
Remove ID if any two consecutive values do not meet a threshold

My data frame looks like this:

id        year        value
1         2000        23
1         2001        40
1         2003        93
2         1998        90
2         1999        91
2         2002        92
3         2015        12
3         2016        13
3         2017        14

I want to remove an ID if it has any two consecutive values that do not meet the threshold of 90 (that is, both values are below 90). Note: "consecutive" here just means one recorded year followed by the next recorded year for that ID; the years do not have to be exactly 1 apart. (Example: 2001 and 2003 for ID 1 are consecutive years.)

The output should be just ID 2. If ID 2 had any instance of two consecutive values below 90, it would be removed as well.

id        year        value
2         1998        90
2         1999        91
2         2002        92
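To make the rule concrete, here is a minimal base-R sketch (no packages) that rebuilds the example data and applies the rule per ID. The helper name `has_low_pair()` and its `threshold = 90` default are illustrative choices, not part of the question. It assumes "consecutive" means adjacent rows once each ID is sorted by year, so it sorts first.

```r
# Example data from the question
df1 <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  year  = c(2000, 2001, 2003, 1998, 1999, 2002, 2015, 2016, 2017),
  value = c(23, 40, 93, 90, 91, 92, 12, 13, 14)
)

# Hypothetical helper: TRUE if any two adjacent values are both below threshold
has_low_pair <- function(v, threshold = 90) {
  low <- v < threshold
  # low[-1] covers rows 2..n and low[-length(low)] rows 1..n-1,
  # so position i compares row i with row i + 1
  any(low[-1] & low[-length(low)])
}

# "Consecutive" means adjacent years within an id, so sort by id, then year
df1 <- df1[order(df1$id, df1$year), ]

# One TRUE/FALSE per id, named by id ("1", "2", "3")
drop_id <- vapply(split(df1$value, df1$id), has_low_pair, logical(1))

# Keep only the rows whose id is not flagged
result <- df1[!drop_id[as.character(df1$id)], ]
result
```

An ID with a single year has no pairs, so `any()` of the empty comparison is `FALSE` and that ID is kept.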

You could also do:

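library(dplyr)

df %>%                         # df: the data frame from the question
  arrange(id, year) %>%        # make sure "consecutive" follows year order
  group_by(id) %>%
  # lag() is NA in each id's first row; na.rm = TRUE stops an NA from
  # any() making filter() silently drop the whole id
  filter(!any(value < 90 & lag(value) < 90, na.rm = TRUE))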

Output:

# A tibble: 3 x 3
# Groups:   id [1]
     id  year value
  <int> <int> <int>
1     2  1998    90
2     2  1999    91
3     2  2002    92
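The lag() idea can also be written in base R with ave(), which applies a function within each id and returns a vector aligned with the original rows. This is a sketch that assumes rows are already sorted by year within each id; the function name `drop_low_pairs()` is hypothetical. Filling each id's first "previous value" with `Inf` instead of `NA` means the first row can never complete a low pair, which avoids an `NA` result from the comparison.

```r
# Hypothetical helper: drop every id that has two consecutive values below threshold.
# Assumes rows are already ordered by year within each id.
drop_low_pairs <- function(df, threshold = 90) {
  # Previous value within the same id; Inf for the first row of each id
  prev_value <- ave(df$value, df$id, FUN = function(v) c(Inf, head(v, -1)))
  # TRUE where this value and the one before it are both below the threshold
  low_pair <- df$value < threshold & prev_value < threshold
  bad_ids <- unique(df$id[low_pair])
  df[!(df$id %in% bad_ids), ]
}

df1 <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  year  = c(2000, 2001, 2003, 1998, 1999, 2002, 2015, 2016, 2017),
  value = c(23, 40, 93, 90, 91, 92, 12, 13, 14)
)
drop_low_pairs(df1)
```

With `NA` in place of `Inf`, an id whose first value is below 90 but which has no low pair (for example 80, 95, 96) would give `any(c(NA, FALSE, FALSE))`, which is `NA` rather than `FALSE`.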

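This solution uses the dplyr package. Within each id, value[-1] drops the first value and value[-n()] drops the last one, so element i of the two vectors forms the pair (value[i], value[i + 1]); the id is kept only if every such pair has at least one value >= 90.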

library(dplyr)

df1 %>%
  group_by(id) %>%
  filter(all(value[-1] >= 90 | value[-n()] >= 90))
## A tibble: 3 x 3
## Groups:   id [1]
#     id  year value
#  <int> <int> <int>
#1     2  1998    90
#2     2  1999    91
#3     2  2002    92

Data:

df1 <- read.table(text = "
id        year        value
1         2000        23
1         2001        40
1         2003        93
2         1998        90
2         1999        91
2         2002        92
3         2015        12
3         2016        13
3         2017        14                  
", header = TRUE)

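To see what the value[-1] / value[-n()] comparison does, here is the same check in base R on a single id's values, with length(v) standing in for dplyr's n(). The function name `all_pairs_ok()` is hypothetical.

```r
# Hypothetical helper: TRUE if no two adjacent values are both below threshold
all_pairs_ok <- function(v, threshold = 90) {
  later   <- v[-1]          # values 2..n
  earlier <- v[-length(v)]  # values 1..n-1, aligned so pair i is (v[i], v[i + 1])
  all(later >= threshold | earlier >= threshold)
}

v1 <- c(23, 40, 93)   # id 1 from the question
v1[-1]                # 40 93
v1[-length(v1)]       # 23 40

all_pairs_ok(v1)              # FALSE: pair (23, 40) fails, so id 1 is dropped
all_pairs_ok(c(90, 91, 92))   # TRUE: id 2 is kept
all_pairs_ok(50)              # TRUE: a single value has no pairs
```

Because `all()` of an empty vector is `TRUE`, an id with only one year is kept, which matches the question's rule.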
Using dplyr, you can first flag the values that are below 90, then count the length of each run of consecutive flagged values, and finally keep only the ids that never have a run of 2 or more.

library(dplyr)
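df %>%                                   # df: the data frame from the question
  mutate(value_90 = value < 90) %>%
  group_by(id) %>%
  # cumsum() alone counts every value below 90 so far, not just consecutive
  # ones; subtracting the count at the last value >= 90 resets each run
  mutate(n_cons = cumsum(value_90) - cummax(cumsum(value_90) * !value_90)) %>%
  filter(!any(n_cons >= 2)) %>%
  select(id, year, value)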

# A tibble: 3 x 3
# Groups:   id [1]
     id  year value
  <dbl> <dbl> <dbl>
1     2  1998    90
2     2  1999    91
3     2  2002    92
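The key step in counting consecutive values is resetting the counter whenever a value meets the threshold; a plain cumsum() keeps counting across the gap. Below is a base-R sketch of a resetting counter under the hypothetical name `run_length_below()`: at each position it subtracts the running count recorded at the most recent value >= threshold.

```r
# Hypothetical helper: length of the current run of values below threshold
# (0 at any value that meets the threshold)
run_length_below <- function(v, threshold = 90) {
  low <- v < threshold
  total <- cumsum(low)                 # all values below threshold so far
  at_last_ok <- cummax(total * !low)   # total as of the last value >= threshold
  total - at_last_ok
}

v <- c(80, 95, 80)       # two low values, but not next to each other
cumsum(v < 90)           # 1 1 2: a plain cumsum reaches 2 anyway
run_length_below(v)      # 1 0 1: the run resets at 95, so no run reaches 2

run_length_below(c(23, 40, 93))  # 1 2 0: id 1 has a run of 2
```

Since `total` never decreases, `cummax()` of `total * !low` always holds the count at the latest value that met the threshold (or 0 before any such value).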

Using dplyr and rle():

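library(dplyr)
DT %>%                                   # DT: the data frame from the question
  mutate(test = value < 90) %>%          # flag values below 90
  group_by(id) %>%
  # drop the id if any run of TRUE flags is at least 2 long
  filter(with(rle(test), !any(lengths >= 2 & values))) %>%
  select(-test)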

# A tibble: 3 x 3
# Groups:   id [1]
     id  year value
  <int> <int> <int>
1     2  1998    90
2     2  1999    91
3     2  2002    92
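rle() is part of base R, so the run-based check also works without dplyr. A minimal sketch, assuming rows are sorted by year within each id; `low_run_ids()` is a hypothetical name. rle() returns `lengths` and `values` for each run, so `lengths >= 2 & values` marks runs of two or more values below the threshold.

```r
r <- rle(c(23, 40, 93) < 90)   # runs in id 1: TRUE TRUE | FALSE
r$lengths                      # 2 1
r$values                       # TRUE FALSE

# Hypothetical helper: ids that contain a run of >= 2 values below threshold
low_run_ids <- function(df, threshold = 90) {
  has_run <- vapply(
    split(df$value, df$id),
    function(v) with(rle(v < threshold), any(lengths >= 2 & values)),
    logical(1)
  )
  names(has_run)[has_run]
}

df1 <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  year  = c(2000, 2001, 2003, 1998, 1999, 2002, 2015, 2016, 2017),
  value = c(23, 40, 93, 90, 91, 92, 12, 13, 14)
)
bad <- low_run_ids(df1)                     # "1" "3" (split() names, so character)
df1[!(as.character(df1$id) %in% bad), ]     # only id 2 remains
```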
