Subset lagged values in R
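For the data table below, I want to keep, per Unique_ID, only the groups whose Difference column contains a value of 2 or greater, without removing the NA rows.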
My_data_table <- structure(list(Unique_ID = structure(c(1L, 1L, 2L, 2L, 3L,
3L, 3L, 4L, 4L, 4L), .Label = c("1AA", "3AA", "5AA", "6AA"),
class = "factor"), Distance.km. = c(1, 2.05, 2, 4, 2, 4, 7,
8, 9, 10), Difference = c(NA, 1.05, NA, 2, NA, 2, 3, NA, 1, 1)),
.Names = c("Unique_ID", "Distance.km.", "Difference"),
class = "data.frame", row.names = c(NA, -10L))
My_data_table
Unique_ID  Distance.km.  Difference
1AA        1             NA
1AA        2.05          1.05
3AA        2             NA
3AA        4             2
5AA        2             NA
5AA        4             2
5AA        7             3
6AA        8             NA
6AA        9             1
6AA        10            1
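The title mentions lagged values: in the sample data, Difference looks like each row's Distance.km. minus the previous row's value within the same Unique_ID, with NA for the first row of each group. Assuming that is how the column was produced (the question does not say), a minimal base-R sketch that recomputes it with ave() and diff():

```r
# Rebuild the sample data from the question, without the Difference column
My_data_table <- data.frame(
  Unique_ID = factor(c("1AA", "1AA", "3AA", "3AA", "5AA", "5AA", "5AA",
                       "6AA", "6AA", "6AA")),
  Distance.km. = c(1, 2.05, 2, 4, 2, 4, 7, 8, 9, 10)
)

# Assumption: Difference = current distance minus the previous distance
# within each Unique_ID; the first row of a group has no lag, so it is NA.
# ave() applies FUN per group and returns a vector aligned with the rows.
My_data_table$Difference <- ave(
  My_data_table$Distance.km.,
  My_data_table$Unique_ID,
  FUN = function(x) c(NA, diff(x))
)

My_data_table
```

When checking the result, all.equal() is safer than ==, because 2.05 - 1 can differ from the literal 1.05 in the last bits of a double.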
This is the result I am looking for:
My_data_table
Unique_ID  Distance.km.  Difference
3AA        2             NA
3AA        4             2
5AA        2             NA
5AA        4             2
5AA        7             3
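After converting the data to a 'data.table' (setDT), group by 'Unique_ID'. If the sum of the logical vector Difference >= 2 is greater than 0, i.e. the group contains at least one such value, take the subset of the data.table (.SD) where 'Difference' is NA or (|) greater than or equal to 2.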
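library(data.table)
# setDT() converts My_data_table to a data.table in place;
# groups with no Difference >= 2 return NULL and are dropped
setDT(My_data_table)[, if (sum(Difference >= 2, na.rm = TRUE) > 0)
  .SD[is.na(Difference) | Difference >= 2], by = Unique_ID]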
# Unique_ID Distance.km. Difference
#1: 3AA 2 NA
#2: 3AA 4 2
#3: 5AA 2 NA
#4: 5AA 4 2
#5: 5AA 7 3
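For readers without data.table, the same two-step logic (a group-level test, then a row-level filter) can be sketched in base R. This is an illustrative equivalent, not part of the original answer; the names keep_group, keep_row and result are hypothetical.

```r
My_data_table <- data.frame(
  Unique_ID = factor(c("1AA", "1AA", "3AA", "3AA", "5AA", "5AA", "5AA",
                       "6AA", "6AA", "6AA")),
  Distance.km. = c(1, 2.05, 2, 4, 2, 4, 7, 8, 9, 10),
  Difference = c(NA, 1.05, NA, 2, NA, 2, 3, NA, 1, 1)
)

# Group-level test: TRUE on every row whose Unique_ID has at least one
# non-NA Difference >= 2 (plays the role of sum(...) > 0 above)
hit <- !is.na(My_data_table$Difference) & My_data_table$Difference >= 2
keep_group <- ave(hit, My_data_table$Unique_ID, FUN = any)

# Row-level filter: keep rows where Difference is NA or >= 2
keep_row <- is.na(My_data_table$Difference) | My_data_table$Difference >= 2

result <- My_data_table[keep_group & keep_row, ]
result
```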
A dplyr solution:
library(dplyr)
# Keep every row of each Unique_ID group that has at least one
# non-NA Difference >= 2
My_data_table %>%
  group_by(Unique_ID) %>%
  filter(any(Difference >= 2 & !is.na(Difference)))
# # A tibble: 5 x 3
# # Groups: Unique_ID [2]
# Unique_ID Distance.km. Difference
# <fctr> <dbl> <dbl>
# 1 3AA 2 NA
# 2 3AA 4 2
# 3 5AA 2 NA
# 4 5AA 4 2
# 5 5AA 7 3
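The two answers are not strictly equivalent: the data.table version also drops rows inside a qualifying group whose Difference is below 2, whereas the dplyr version keeps every row of such a group. The sample data contains no such row, so both print the same result. A base-R sketch with one hypothetical extra row (5AA at 8 km, Difference 1, not in the question's data) makes the difference visible:

```r
# Sample data plus one hypothetical extra row: 5AA at 8 km, Difference 1
dat <- data.frame(
  Unique_ID = factor(c("1AA", "1AA", "3AA", "3AA", "5AA", "5AA", "5AA", "5AA",
                       "6AA", "6AA", "6AA")),
  Distance.km. = c(1, 2.05, 2, 4, 2, 4, 7, 8, 8, 9, 10),
  Difference = c(NA, 1.05, NA, 2, NA, 2, 3, 1, NA, 1, 1)
)

hit <- !is.na(dat$Difference) & dat$Difference >= 2
keep_group <- ave(hit, dat$Unique_ID, FUN = any)

# dplyr-style semantics: keep every row of a qualifying group
group_only <- dat[keep_group, ]

# data.table-style semantics: additionally drop rows with Difference < 2
group_and_row <- dat[keep_group & (is.na(dat$Difference) | dat$Difference >= 2), ]

nrow(group_only)     # 6
nrow(group_and_row)  # 5
```

Which behaviour is wanted depends on whether "keep the group" or "keep the matching rows" is the real requirement.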
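Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.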