I have a df in R as follows:
ID Age Score1 Score2
2 22 12 NA
3 19 11 22
4 20 NA NA
1 21 NA 20
Now I want to remove only the rows where both Score1 and Score2 are missing (i.e. the 3rd row).
You can filter it like this:
df <- read.table(head=T, text="ID Age Score1 Score2
2 22 12 NA
3 19 11 22
4 20 NA NA
1 21 NA 20")
df[!(is.na(df$Score1) & is.na(df$Score2)), ]
# ID Age Score1 Score2
# 1 2 22 12 NA
# 2 3 19 11 22
# 4 1 21 NA 20
I.e. take the rows where it is not the case (!) that Score1 is missing and (&) Score2 is missing.
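By De Morgan's law, "not (A and B)" is the same as "(not A) or (not B)", so the same filter can also be read as "keep rows where at least one score is present". A small check of that equivalence, rebuilding the example data with data.frame() instead of read.table():

```r
df <- data.frame(
  ID     = c(2L, 3L, 4L, 1L),
  Age    = c(22L, 19L, 20L, 21L),
  Score1 = c(12L, 11L, NA, NA),
  Score2 = c(NA, 22L, NA, 20L)
)

# Condition from the answer: drop rows where Score1 AND Score2 are both NA
keep_and <- !(is.na(df$Score1) & is.na(df$Score2))

# De Morgan equivalent: keep rows where at least one score is not NA
keep_or <- !is.na(df$Score1) | !is.na(df$Score2)

# Both logical vectors select exactly the same rows
stopifnot(identical(keep_and, keep_or))
df[keep_or, ]
```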
One option is rowSums, which counts the missing Score values in each row and keeps the rows with fewer than two:
df1[ rowSums(is.na(df1[grep("Score", names(df1))])) < 2,]
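The < 2 above hard-codes the fact that there are exactly two Score columns. A sketch that compares against the number of matched columns instead, so it keeps working if more Score columns are added (same example data, built with data.frame()):

```r
df1 <- data.frame(
  ID     = c(2L, 3L, 4L, 1L),
  Age    = c(22L, 19L, 20L, 21L),
  Score1 = c(12L, 11L, NA, NA),
  Score2 = c(NA, 22L, NA, 20L)
)

score_cols <- grep("Score", names(df1))          # column positions 3 and 4
n_missing  <- rowSums(is.na(df1[score_cols]))    # NA count per row: 1 0 2 1

# Drop a row only when every Score column is NA
df1[n_missing < length(score_cols), ]
```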
Another base R option, using Reduce to combine the per-column is.na() results with &:
df1[!Reduce(`&`, lapply(df1[grep("Score", names(df1))], is.na)),]
df1 <- structure(list(ID = c(2L, 3L, 4L, 1L), Age = c(22L, 19L, 20L,
21L), Score1 = c(12L, 11L, NA, NA), Score2 = c(NA, 22L, NA, 20L
)), class = "data.frame", row.names = c(NA, -4L))
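To see what the Reduce version does, it helps to look at the intermediate values: lapply() produces one logical vector per Score column, and Reduce(`&`, ...) folds them pairwise into a single "all missing" vector, which ! then inverts. A step-by-step sketch on the same data:

```r
df1 <- structure(list(ID = c(2L, 3L, 4L, 1L), Age = c(22L, 19L, 20L,
21L), Score1 = c(12L, 11L, NA, NA), Score2 = c(NA, 22L, NA, 20L
)), class = "data.frame", row.names = c(NA, -4L))

# One logical vector per Score column
na_list <- lapply(df1[grep("Score", names(df1))], is.na)
# Score1: FALSE FALSE TRUE TRUE
# Score2: TRUE FALSE TRUE FALSE

# Pairwise AND across the list; with three columns this is (c1 & c2) & c3
all_na <- Reduce(`&`, na_list)
# all_na: FALSE FALSE TRUE FALSE

df1[!all_na, ]
```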
Here are two versions with dplyr, which extend to any number of columns with the prefix "Score".
Using filter_at
library(dplyr)
df %>% filter_at(vars(starts_with("Score")), any_vars(!is.na(.)))
# ID Age Score1 Score2
#1 2 22 12 NA
#2 3 19 11 22
#3 1 21 NA 20
and filter_if
df %>% filter_if(startsWith(names(.),"Score"), any_vars(!is.na(.)))
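filter_at() and filter_if() are marked as superseded in current dplyr. Assuming dplyr 1.0.4 or newer (where if_any() is available), the same "at least one Score is not NA" filter can be written with plain filter():

```r
suppressPackageStartupMessages(library(dplyr))

df <- data.frame(
  ID     = c(2L, 3L, 4L, 1L),
  Age    = c(22L, 19L, 20L, 21L),
  Score1 = c(12L, 11L, NA, NA),
  Score2 = c(NA, 22L, NA, 20L)
)

# Keep a row if any column starting with "Score" is not NA
res <- df %>% filter(if_any(starts_with("Score"), ~ !is.na(.x)))
res
```

As with filter_at(), the result has its row names reset to 1, 2, 3.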
A base R version with apply
df[apply(!is.na(df[startsWith(names(df),"Score")]), 1, any), ]
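If this comes up repeatedly, the base R logic can be wrapped in a small function. The name drop_all_na_rows and its prefix argument are hypothetical, chosen only for illustration. It uses a vectorised rowSums() instead of apply(), which avoids calling a function once per row:

```r
# Hypothetical helper: drop rows where every column whose name starts with
# `prefix` is NA. Columns without the prefix (e.g. ID, Age) are ignored.
drop_all_na_rows <- function(data, prefix) {
  cols <- startsWith(names(data), prefix)
  if (!any(cols)) stop("no columns start with '", prefix, "'")
  # TRUE for rows with at least one non-missing value in the selected columns
  keep <- rowSums(!is.na(data[cols])) > 0
  data[keep, , drop = FALSE]
}

df <- data.frame(
  ID     = c(2L, 3L, 4L, 1L),
  Age    = c(22L, 19L, 20L, 21L),
  Score1 = c(12L, 11L, NA, NA),
  Score2 = c(NA, 22L, NA, 20L)
)

drop_all_na_rows(df, "Score")
```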