I have a dataset from a time series study. Because some participants didn't show up on certain days, they have NA values in the rest of the data frame for those days. Certain study days were crucial, so I am trying to subset my data to participants who are not missing those crucial days. My real dataset is very large, but here's the general structure:
fakedat <- data.frame(
  ID = c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C",
         "D", "D", "D", "D", "E", "E", "E", "E", "F", "F", "F", "F"),
  StudyDay = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4,
               1, 2, 3, 4),
  Ab = c(10, NA, 15, 10, 10, 20, 10, NA, 10, 10, NA, 30, NA, NA, 15, NA, 10, 20,
         10, 30, NA, 10, NA, 20))
Now let's say it was crucial that they show up on days 2 and 4. I tried subsetting with dplyr filtering like this:
library(dplyr)  # provides %>%
fakedat2 <- fakedat %>%
  dplyr::group_by(ID) %>%
  dplyr::filter(StudyDay %in% c(2, 4) & !is.na(Ab)) %>%
  dplyr::ungroup()
EDIT: But the output of this is only the list of IDs that have a day 2 or day 4 value that's not NA. I need to find (in my real data) subjects who have NA Ab values at 4 specific study days. The answer I accepted below works, but I'm still curious about performing conditional filtering. In SAS, for example, you could code "IF Ab.=NA at (StudyDay=2 AND StudyDay=4) THEN ID..." or something like that.
Maybe this will achieve your goal. If all participants have all StudyDay timepoints, and you just want to check that Ab is not missing on day 2 or day 4, you can test the Ab values at those time points directly in your filter. In this case, an ID is omitted only if Ab is NA on both days 2 and 4 (in this example, "D").
Alternatively, if you want to require that both values are available for days 2 and 4, you can use & (AND) instead of | (OR).
library(dplyr)
fakedat %>%
group_by(ID) %>%
filter(!is.na(Ab[StudyDay == 2]) | !is.na(Ab[StudyDay == 4]))
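For comparison, here is a minimal sketch of the stricter & (AND) variant mentioned above. It rebuilds the question's toy data in a more compact form (with rep, which is my own shorthand, not the asker's code) and assumes, as above, that every ID has exactly one row for day 2 and one for day 4. The name both_days is just illustrative.

```r
library(dplyr)

# Same toy data as in the question, built more compactly
fakedat <- data.frame(
  ID       = rep(c("A", "B", "C", "D", "E", "F"), each = 4),
  StudyDay = rep(1:4, times = 6),
  Ab       = c(10, NA, 15, 10,  10, 20, 10, NA,  10, 10, NA, 30,
               NA, NA, 15, NA,  10, 20, 10, 30,  NA, 10, NA, 20)
)

# Keep an ID only if Ab is present on day 2 AND on day 4
both_days <- fakedat %>%
  group_by(ID) %>%
  filter(!is.na(Ab[StudyDay == 2]) & !is.na(Ab[StudyDay == 4])) %>%
  ungroup()

unique(both_days$ID)
# "C" "E" "F"
```

A and B drop out here because each is missing one of the two required days, while the | version keeps them.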
If you have multiple days that must not be missing, you can use all and check for NA values where StudyDay is %in% a vector of required days, as follows:
required_vals <- c(2, 4)
fakedat %>%
group_by(ID) %>%
filter(all(!is.na(Ab[StudyDay %in% required_vals])))
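Output (from the first filter, using |; the all version additionally drops A and B, which are each missing one of the two required days)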
ID StudyDay Ab
<chr> <dbl> <dbl>
1 A 1 10
2 A 2 NA
3 A 3 15
4 A 4 10
5 B 1 10
6 B 2 20
7 B 3 10
8 B 4 NA
9 C 1 10
10 C 2 10
11 C 3 NA
12 C 4 30
13 E 1 10
14 E 2 20
15 E 3 10
16 E 4 30
17 F 1 NA
18 F 2 10
19 F 3 NA
20 F 4 20
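One caveat: the filters above rely on every ID having a row for each required day. If a missed visit in the real data can also show up as an absent row (that is an assumption on my part, the example data has no such case), the %in% version would only check the rows that exist and would not notice the gap. A possible sketch that treats "row absent" and "Ab is NA" the same way is to ask whether every required day appears among the days with a non-missing Ab. The data frame dat_gaps is a hypothetical variation of the example, made up here for illustration.

```r
library(dplyr)

# Same toy data as in the question, built more compactly
fakedat <- data.frame(
  ID       = rep(c("A", "B", "C", "D", "E", "F"), each = 4),
  StudyDay = rep(1:4, times = 6),
  Ab       = c(10, NA, 15, 10,  10, 20, 10, NA,  10, 10, NA, 30,
               NA, NA, 15, NA,  10, 20, 10, 30,  NA, 10, NA, 20)
)

# Hypothetical variation: E's day-4 row is absent rather than NA
dat_gaps <- fakedat[!(fakedat$ID == "E" & fakedat$StudyDay == 4), ]

required_vals <- c(2, 4)

# Keep an ID only if every required day has a row with non-missing Ab
complete_ids <- dat_gaps %>%
  group_by(ID) %>%
  filter(all(required_vals %in% StudyDay[!is.na(Ab)])) %>%
  ungroup()

unique(complete_ids$ID)
# "C" "F"
```

On the original fakedat, where all rows are present, this gives the same IDs as the all(!is.na(...)) filter.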
In base R, we can do
subset(fakedat, ID %in% ID[StudyDay %in% c(2, 4) & !is.na(Ab)])
-output
# ID StudyDay Ab
#1 A 1 10
#2 A 2 NA
#3 A 3 15
#4 A 4 10
#5 B 1 10
#6 B 2 20
#7 B 3 10
#8 B 4 NA
#9 C 1 10
#10 C 2 10
#11 C 3 NA
#12 C 4 30
#17 E 1 10
#18 E 2 20
#19 E 3 10
#20 E 4 30
#21 F 1 NA
#22 F 2 10
#23 F 3 NA
#24 F 4 20
Or a similar option in dplyr
library(dplyr)
fakedat %>%
filter(ID %in% ID[StudyDay %in% c(2, 4) & !is.na(Ab)])
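On the follow-up in the question about SAS-style conditional logic ("IF Ab is NA at these days THEN ..."), one way to sketch it in dplyr is to build per-ID flags first and then pick IDs by whichever condition is needed. The column names n_missing, missing_all and missing_any are made up for this example, and the toy data is rebuilt compactly with rep.

```r
library(dplyr)

# Same toy data as in the question, built more compactly
fakedat <- data.frame(
  ID       = rep(c("A", "B", "C", "D", "E", "F"), each = 4),
  StudyDay = rep(1:4, times = 6),
  Ab       = c(10, NA, 15, 10,  10, 20, 10, NA,  10, 10, NA, 30,
               NA, NA, 15, NA,  10, 20, 10, 30,  NA, 10, NA, 20)
)

required_vals <- c(2, 4)

# One row per ID, describing missingness on the required days only
missing_flags <- fakedat %>%
  filter(StudyDay %in% required_vals) %>%
  group_by(ID) %>%
  summarise(
    n_missing   = sum(is.na(Ab)),   # how many required days are NA
    missing_all = all(is.na(Ab)),   # NA on every required day
    missing_any = any(is.na(Ab))    # NA on at least one required day
  )

# IDs that are NA on all required days
missing_flags$ID[missing_flags$missing_all]
# "D"

# IDs that are NA on at least one required day
missing_flags$ID[missing_flags$missing_any]
# "A" "B" "D"
```

The resulting ID vectors can then be used with ID %in% ... (or its negation) in filter or subset, in the same way as the snippets above.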