
How can I remove rows with NAs only when both columns are missing?

I have a df in R as follows:

ID    Age   Score1     Score2      
2      22    12           NA
3      19    11           22
4      20    NA           NA
1      21    NA           20

I want to remove only the rows where both Score1 and Score2 are missing (i.e., the 3rd row).

You can filter it like this:

df <- read.table(header = TRUE, text = "ID    Age   Score1     Score2
2      22    12           NA
3      19    11           22
4      20    NA           NA
1      21    NA           20")
df[!(is.na(df$Score1) & is.na(df$Score2)), ]
#   ID Age Score1 Score2
# 1  2  22     12     NA
# 2  3  19     11     22
# 4  1  21     NA     20

That is, keep the rows where it is not (!) the case that Score1 is missing and (&) Score2 is missing.
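The same condition can also be read through De Morgan's law: !(a & b) is equivalent to !a | !b, i.e. keep a row if at least one score is present. A small sketch checking that both forms agree, with the question's data rebuilt inline so it runs on its own:

```r
# Rebuild the example data from the question
df <- data.frame(
  ID     = c(2L, 3L, 4L, 1L),
  Age    = c(22L, 19L, 20L, 21L),
  Score1 = c(12L, 11L, NA, NA),
  Score2 = c(NA, 22L, NA, 20L)
)

# Original condition: NOT (both scores missing)
keep_a <- !(is.na(df$Score1) & is.na(df$Score2))

# De Morgan equivalent: at least one score is present
keep_b <- !is.na(df$Score1) | !is.na(df$Score2)

identical(keep_a, keep_b)
# [1] TRUE

df[keep_b, ]
#   ID Age Score1 Score2
# 1  2  22     12     NA
# 2  3  19     11     22
# 4  1  21     NA     20
```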

One option is rowSums(), which counts the missing scores in each row:

df1[ rowSums(is.na(df1[grep("Score", names(df1))])) < 2,]
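The < 2 above is tied to there being exactly two Score columns. A sketch that compares the count against ncol() instead, so the same line keeps working if more columns matching "Score" are added (df1 is rebuilt inline here):

```r
df1 <- data.frame(
  ID     = c(2L, 3L, 4L, 1L),
  Age    = c(22L, 19L, 20L, 21L),
  Score1 = c(12L, 11L, NA, NA),
  Score2 = c(NA, 22L, NA, 20L)
)

# Every column whose name contains "Score"
score_cols <- df1[grep("Score", names(df1))]

# rowSums(is.na(...)) counts missing scores per row; a row is dropped
# only when that count equals the number of Score columns
keep <- rowSums(is.na(score_cols)) < ncol(score_cols)
df1[keep, ]
```

Note that keep carries the data frame's row names, because is.na() on a data frame returns a matrix with row names; this does not affect the indexing.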

Another base R option uses Reduce():

df1[!Reduce(`&`, lapply(df1[grep("Score", names(df1))], is.na)),]
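To see what the Reduce() version computes, a sketch that splits it into steps (df1 rebuilt inline). lapply() produces one logical vector per Score column, and Reduce() folds & across that list, so the result is TRUE only where every Score column is NA:

```r
df1 <- data.frame(
  ID     = c(2L, 3L, 4L, 1L),
  Age    = c(22L, 19L, 20L, 21L),
  Score1 = c(12L, 11L, NA, NA),
  Score2 = c(NA, 22L, NA, 20L)
)

# One logical vector per Score column: TRUE where that score is NA
na_flags <- lapply(df1[grep("Score", names(df1))], is.na)

# Fold `&` across the list; with two columns this equals
# na_flags$Score1 & na_flags$Score2
all_na <- Reduce(`&`, na_flags)
all_na
# [1] FALSE FALSE  TRUE FALSE

# Keep the rows that are not entirely missing
df1[!all_na, ]
```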

Data used above:

df1 <- structure(list(ID = c(2L, 3L, 4L, 1L), Age = c(22L, 19L, 20L, 
 21L), Score1 = c(12L, 11L, NA, NA), Score2 = c(NA, 22L, NA, 20L
 )), class = "data.frame", row.names = c(NA, -4L))

Here are two versions with dplyr that extend to any number of columns with the prefix "Score".

Using filter_at():

library(dplyr)

df %>% filter_at(vars(starts_with("Score")), any_vars(!is.na(.)))

#  ID Age Score1 Score2
#1  2  22     12     NA
#2  3  19     11     22
#3  1  21     NA     20

and filter_if():

df %>% filter_if(startsWith(names(.),"Score"), any_vars(!is.na(.)))
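Since dplyr 1.0.0, the scoped verbs such as filter_at() and filter_if() are superseded. A sketch of the same filter with if_any(), assuming dplyr 1.0.4 or newer (the release that introduced if_any()):

```r
library(dplyr)

df <- data.frame(
  ID     = c(2L, 3L, 4L, 1L),
  Age    = c(22L, 19L, 20L, 21L),
  Score1 = c(12L, 11L, NA, NA),
  Score2 = c(NA, 22L, NA, 20L)
)

# Keep a row if any column starting with "Score" is non-missing;
# rows where every Score column is NA are dropped
res <- df %>% filter(if_any(starts_with("Score"), ~ !is.na(.x)))
res
```

An equivalent formulation is filter(!if_all(starts_with("Score"), is.na)), which states "not all Score columns are missing" directly.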

A base R version with apply():

df[apply(!is.na(df[startsWith(names(df),"Score")]), 1, any), ]
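The apply() approach generalizes to any column prefix. A sketch wrapping it in a small function; the name drop_rows_all_na() and its prefix argument are hypothetical, chosen only for illustration:

```r
# Hypothetical helper (not from any package): drop rows in which every
# column whose name starts with `prefix` is NA
drop_rows_all_na <- function(data, prefix) {
  cols <- startsWith(names(data), prefix)
  if (!any(cols)) stop("no columns start with '", prefix, "'")
  # TRUE for rows with at least one non-missing value in the matched columns
  keep <- apply(!is.na(data[cols]), 1, any)
  # drop = FALSE keeps a data frame even if `data` has a single column
  data[keep, , drop = FALSE]
}

df <- data.frame(
  ID     = c(2L, 3L, 4L, 1L),
  Age    = c(22L, 19L, 20L, 21L),
  Score1 = c(12L, 11L, NA, NA),
  Score2 = c(NA, 22L, NA, 20L)
)

drop_rows_all_na(df, "Score")
```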
