How to remove NAs from a select group of columns?

Very novice R user here. I have a data set and want to avoid shrinking it by a significant amount: if I use na.omit or complete.cases, it deletes ALL of the rows that contain NAs, which massively reduces my data set. I only want to remove the NAs in the columns that are directly relevant to my project. Let's say column 1 and column 2 are relevant. I've tried foo2 <- na.omit(foo1[-3:-4]), but I'm met with the message "Warning, in 3:4, numeric expression has 2 elements, only first will be used".

I'd like to go from this:

       column 1    column 2   column 3   column 4

 1         NA         4           3         9
 2         5          NA          NA        10
 3         8          10          NA        4
 4         11         6           2         NA

To this:

        column 1    column 2   column 3   column 4

    3      8          10          NA        4
    4      11         6           2         NA

So instead of removing every row that contains an NA, it only removes rows 1 and 2.

Thank you in advance.

We can use complete.cases on the subset of columns to get a logical vector (TRUE if a row has no NA in those columns, FALSE if it has any) and use that as the row index to subset the full dataset:

df1[complete.cases(df1[c('column1', 'column2')]),]
#    column1 column2 column3 column4
#3       8      10      NA       4
#4      11       6       2      NA

na.omit applied to the subset of columns returns only that subset with its NA rows dropped, not the full dataset.

Data

df1 <- data.frame(column1=c(NA,5,8,11),column2=c(4,NA,10,6),
         column3=c(3,NA,NA,2),column4=c(9,10,4,NA))
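
To illustrate the na.omit remark above, here is a minimal sketch. It assumes df1 as defined in the data block and default row names; the names cols, sub and kept are only illustrative and not part of the original answer.

```r
df1 <- data.frame(column1 = c(NA, 5, 8, 11), column2 = c(4, NA, 10, 6),
                  column3 = c(3, NA, NA, 2), column4 = c(9, 10, 4, NA))

cols <- c("column1", "column2")   # columns that must not contain NA (assumption)

# na.omit on the subset drops rows 1 and 2, but column3 and column4 are gone too
sub <- na.omit(df1[cols])
names(sub)

# The surviving row names can still be used to index the full data frame
kept <- df1[rownames(sub), ]
kept
```

Indexing by row name like this relies on the row names being unique, so the complete.cases form is usually the more direct choice.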

With df defined as your data frame, you can find the rows where neither column1 nor column2 is NA and then select them:

df <- data.frame(column1=c(NA,5,8,11),column2=c(4,NA,10,6),column3=c(3,NA,NA,2),column4=c(9,10,4,NA))

df[with(df, !is.na(column1) & !is.na(column2)),]
  column1 column2 column3 column4
3       8      10      NA       4
4      11       6       2      NA
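
If more than a couple of columns are relevant, writing one !is.na() term per column gets tedious. A possible variation, sketched under the assumption that the relevant column names are held in a character vector (cols, keep and result are hypothetical names), counts the NAs per row instead:

```r
df <- data.frame(column1 = c(NA, 5, 8, 11), column2 = c(4, NA, 10, 6),
                 column3 = c(3, NA, NA, 2), column4 = c(9, 10, 4, NA))

cols <- c("column1", "column2")   # relevant columns (assumption)

# is.na() on a data frame gives a logical matrix;
# a row sum of 0 means the row has no NA in any of the chosen columns
keep <- rowSums(is.na(df[cols])) == 0

result <- df[keep, ]
result
```

This selects the same rows as the with() expression above, since a row sum of zero is equivalent to every !is.na() test being TRUE.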

You can use drop_na from the tidyr package:

library(tidyr) # alternatively, you can load it from the tidyverse package

df <- tibble(
  col1 = c(NA_real_, 5, 8, 11),
  col2 = c(4, NA_real_, 10, 6),
  col3 = c(3, NA_real_, NA_real_, 2),
  col4 = c(9, NA_real_, 4, NA_real_)
)

drop_na(df, col1, col2)

# # A tibble: 2 x 4
#    col1  col2  col3  col4
#   <dbl> <dbl> <dbl> <dbl>
# 1     8    10    NA     4
# 2    11     6     2    NA
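
When the relevant column names are stored in a variable rather than typed out, drop_na should also accept tidyselect helpers. A short sketch, assuming the same tibble as above and a hypothetical character vector named required:

```r
library(tidyr)

df <- tibble(
  col1 = c(NA_real_, 5, 8, 11),
  col2 = c(4, NA_real_, 10, 6),
  col3 = c(3, NA_real_, NA_real_, 2),
  col4 = c(9, NA_real_, 4, NA_real_)
)

required <- c("col1", "col2")   # hypothetical vector of relevant columns

# all_of() selects exactly the columns named in `required`;
# rows with an NA in any of them are dropped
result <- drop_na(df, all_of(required))
result
```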
