Subsetting panel data based on two variables in R
library(dplyr)
id <- c(rep(1, 4), rep(2, 3), rep(3, 4))
missing <- c(rep(0, 4), rep(0, 3), 1, 0, 0, 0)
wave <- c(1:4, 1:3, 1:4)
df <- as.data.frame(cbind(id, missing, wave))
df
   id missing wave
1   1       0    1
2   1       0    2
3   1       0    3
4   1       0    4
5   2       0    1
6   2       0    2
7   2       0    3
8   3       1    1
9   3       0    2
10  3       0    3
11  3       0    4
I am trying to drop every case (id) that has missing = 1 in any wave or that is missing any of waves 1 to 4. For example, ID = 3 should be dropped because it has missing = 1 at wave = 1, and ID = 2 should be dropped because it only has waves 1, 2, and 3.
I tried to use dplyr's group_by and filter functions, but this removes all cases. I only want to end up with the rows for ID = 1.
df <- df %>% group_by(id) %>% filter(missing==0, wave==1, wave==2, wave==3, wave==4)
df
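The attempt above returns zero rows because filter() combines comma-separated conditions with AND, row by row, and no single row can have wave equal to 1, 2, 3 and 4 at the same time. A minimal alternative sketch, assuming the rule is "keep an id only if none of its rows has missing == 1 and each of waves 1 to 4 appears at least once", wraps each condition in all() so that it is evaluated once per id; complete_ids is just an illustrative name.

```r
library(dplyr)

# Same toy panel as in the question
df <- data.frame(
  id      = c(rep(1, 4), rep(2, 3), rep(3, 4)),
  missing = c(rep(0, 4), rep(0, 3), 1, 0, 0, 0),
  wave    = c(1:4, 1:3, 1:4)
)

# After group_by(), each all() collapses one id to a single TRUE/FALSE,
# which filter() recycles to every row of that id.
complete_ids <- df %>%
  group_by(id) %>%
  filter(all(missing == 0), all(1:4 %in% wave)) %>%
  ungroup()

complete_ids
```

Note that all(1:4 %in% wave) only requires waves 1 to 4 to be present; unlike an exact comparison with 1:4, it would also keep an id that has an extra wave such as 5.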
Try this. We first group_by id, and then create a list column with the sorted unique values of wave for each id. Then we check that this list equals 1:4. We also create a missing_check variable, which is just the max of missing for each id. Finally, we filter on both missing_check and wave_list_check.
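df %>%
  group_by(id) %>%
  # list column: every row holds its id's sorted unique waves
  mutate(wave_list = I(list(sort(unique(wave))))) %>%
  # all rows of an id hold the same vector, so test the first one;
  # setequal() avoids the recycling warning that == 1:4 gives for id 2
  mutate(wave_list_check = setequal(wave_list[[1]], 1:4),
         # 1 if any row of the id has missing == 1, else 0
         missing_check = max(missing)) %>%
  filter(missing_check == 0, wave_list_check) %>%
  select(id:wave)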
     id missing  wave
  <dbl>   <dbl> <dbl>
1     1       0     1
2     1       0     2
3     1       0     3
4     1       0     4
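If you would rather not depend on dplyr, the same rule can be sketched in base R. This is an alternative to the approach above, not part of it: split() breaks the data into one data frame per id, and setequal() plays the role of the "equals 1:4" check (it ignores order and repeated waves). The names keep_id and kept are illustrative.

```r
# Same toy panel as in the question
df <- data.frame(
  id      = c(rep(1, 4), rep(2, 3), rep(3, 4)),
  missing = c(rep(0, 4), rep(0, 3), 1, 0, 0, 0),
  wave    = c(1:4, 1:3, 1:4)
)

# One logical per id: TRUE when no row has missing == 1 and the set of
# observed waves is exactly {1, 2, 3, 4}
keep_id <- sapply(split(df, df$id), function(d) {
  all(d$missing == 0) && setequal(d$wave, 1:4)
})

# The names of keep_id are the id values as character strings
kept <- df[df$id %in% as.numeric(names(keep_id)[keep_id]), ]
kept
```

Because setequal() compares sets exactly, an id with an extra wave (for example 5) would be dropped here, matching the stricter check in the dplyr answer.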