
Select columns that don't contain any NA value in R

How can I select only the columns that don't contain any NA values in R? If a column contains at least one NA, I want to exclude it. What's the best way to do this? I have been trying to use sum(is.na(x)), but without success.

A second R question: is it possible to exclude columns whose values are all the same? For example,

       column1  column2
row1   a        b
row2   a        c
row3   a        c

My purpose is to exclude column1 from my matrix so the final result is:

       column2
row1   b
row2   c
row3   c

The related question "Remove columns from dataframe where ALL values are NA" covers the case where every value in a column is NA.

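For a matrix, you can use colSums(is.na(x)) to count the NA values in each column.

Given a matrix x,

x[, !colSums(is.na(x)), drop = FALSE]

keeps only the columns without any NA. A count of zero is treated as FALSE, so negating it with ! gives TRUE exactly for the NA-free columns, and drop = FALSE keeps the result a matrix even if only one column remains.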
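As a minimal sketch of the matrix approach, here is a small made-up matrix (the values and column names are assumptions for illustration only) in which just one column contains an NA:

```r
# Hypothetical 3 x 3 matrix, filled column by column; only column "b" has an NA.
x <- matrix(c(1, 2, 3,
              4, NA, 6,
              7, 8, 9),
            nrow = 3,
            dimnames = list(NULL, c("a", "b", "c")))

# Count the NA values per column: a = 0, b = 1, c = 0.
na_counts <- colSums(is.na(x))

# !na_counts is TRUE where the count is 0, i.e. for NA-free columns.
x_complete <- x[, !na_counts, drop = FALSE]

colnames(x_complete)  # "a" "c"
```

Writing na_counts == 0 instead of !na_counts selects the same columns and may read more clearly.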

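For a data.frame, it will be more efficient to use lapply or sapply with the function anyNA, because each column is checked separately and the full logical matrix that is.na(xdf) would build is never created: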

xdf[, sapply(xdf, Negate(anyNA)), drop = FALSE]
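To illustrate the data.frame version, here is a short sketch on a made-up data frame (the column names and values are hypothetical):

```r
# Hypothetical data frame; only "score" contains an NA.
xdf <- data.frame(id = 1:3,
                  score = c(0.5, NA, 1.2),
                  label = c("x", "y", "z"))

# anyNA() is TRUE for a column with at least one NA; Negate() flips that,
# so keep is TRUE for the columns we want to retain.
keep <- sapply(xdf, Negate(anyNA))

xdf_complete <- xdf[, keep, drop = FALSE]

names(xdf_complete)  # "id" "label"
```

If you prefer a stricter return type, vapply(xdf, Negate(anyNA), logical(1)) should give the same logical vector.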

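Alternatively, if 'mat1' is the matrix:

# column indices (second column of the arr.ind result) that hold at least one NA
indx <- unique(which(is.na(mat1), arr.ind = TRUE)[, 2])
# setdiff() keeps every column when indx is empty; select = -indx would then return no columns
subset(mat1, select = setdiff(seq_len(ncol(mat1)), indx))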
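The intermediate result of which(..., arr.ind = TRUE) may not be obvious, so here is a small sketch on a made-up matrix (values are assumptions) that shows the row/column positions before they are reduced to column indices. The last step uses plain bracket indexing with setdiff() rather than a negative index, an illustrative choice that also behaves sensibly when no NA is found:

```r
# Hypothetical matrix, filled column by column:
# column 1 = 1, NA, 3; column 2 = 4, 5, 6; column 3 = NA, NA, 9
mat1 <- matrix(c(1, NA, 3,
                 4, 5, 6,
                 NA, NA, 9),
               nrow = 3)

# One row per NA, giving its (row, col) position.
pos <- which(is.na(mat1), arr.ind = TRUE)

# Columns that contain at least one NA: 1 and 3.
indx <- unique(pos[, 2])

# Keep the remaining columns; setdiff() returns all columns if indx is empty.
keep_cols <- setdiff(seq_len(ncol(mat1)), indx)
mat1_complete <- mat1[, keep_cols, drop = FALSE]
```

Here mat1_complete is a 3 x 1 matrix holding the original second column.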

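You could also do

new.df <- df[, colSums(is.na(df)) == 0, drop = FALSE]

This approach lets you subset on the number of NA values per column: replace == 0 with another condition (for example <= 2) to tolerate a few missing values. drop = FALSE keeps the result a data frame even when only one column is left.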
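As a sketch of subsetting on the NA count rather than on "no NA at all", the following uses a made-up data frame and a hypothetical threshold max_na (both are assumptions for illustration):

```r
# Hypothetical data frame: "a" has 0 NAs, "b" has 1, "c" has 3.
df <- data.frame(a = c(1, 2, 3, 4),
                 b = c(NA, 2, 3, 4),
                 c = c(NA, NA, NA, 4))

# Number of NA values in each column.
na_counts <- colSums(is.na(df))

# Hypothetical threshold: tolerate at most one NA per column.
max_na <- 1
new.df <- df[, na_counts <= max_na, drop = FALSE]

names(new.df)  # "a" "b"
```

Setting max_na to 0 reproduces the strict "no NA" filter.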

