
R splitting a data frame based on NA values

I want to split a dataset in R based on the NA values of a variable, for example:

 var1 var2
   1    21
   2    NA
   3    NA
   4    10 

and make it like this:

  var1 var2
   1    21
   4    10  




 var1 var2
   2    NA
   3    NA 


Most statistical functions (e.g., lm()) have an na.action argument, which applies to the model as a whole, not to individual variables. na.fail() returns the object (the dataset) if it contains no NA values; otherwise it signals an error, stopping the analysis. na.pass() returns the data object unchanged whether or not it has NA values, which is useful if the function deals with NA values internally. na.omit() returns the object with entire observations (rows) omitted if any of the variables used in the model is NA for that observation. na.exclude() is the same as na.omit(), except that functions using naresid() or napredict() pad residuals and predictions with NA so that they match the original number of rows. You can think of na.action as a function applied to your data object, the result being the data that lm() actually uses. The lm() function accepts the na.action as an argument:

lm(y ~ a + b + c, data = na.omit(dataset))
lm(y ~ a + b + c, data = dataset, na.action = na.omit) # equivalent when NAs occur only in the model variables; the more common usage
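To make the difference between na.omit and na.exclude concrete, here is a minimal sketch. The toy `dataset`, with columns `y` and `a`, is invented for illustration and is not from the question.

```r
# Hypothetical toy data: y has one missing value (row 2).
dataset <- data.frame(
  y = c(1.2, NA, 2.9, 4.1, 5.2),
  a = c(1, 2, 3, 4, 5)
)

fit_omit    <- lm(y ~ a, data = dataset, na.action = na.omit)
fit_exclude <- lm(y ~ a, data = dataset, na.action = na.exclude)

# Both fits use the same four complete rows, so the coefficients agree.
same_coef <- isTRUE(all.equal(coef(fit_omit), coef(fit_exclude)))

# na.omit drops the incomplete row from the residuals;
# na.exclude pads the residuals back to the original length with NA.
n_omit    <- length(residuals(fit_omit))
n_exclude <- length(residuals(fit_exclude))
```

The padded residuals are convenient when you want to attach them back to the original data frame as a new column.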

You can set the default handling of missing values with

options(na.action = "na.omit")

You could just subset the data frame using is.na():

df1 <- df[!is.na(df$var2), ]
df2 <- df[is.na(df$var2), ]
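As a quick check of this approach, the following sketch rebuilds the data frame from the question (the construction of `df` and the name `missing_var2` are assumptions for illustration) and confirms that every row ends up in exactly one of the two pieces.

```r
# Data from the question, rebuilt by hand for a reproducible example.
df <- data.frame(var1 = 1:4, var2 = c(21, NA, NA, 10))

# Compute the logical index once and reuse it for both subsets.
missing_var2 <- is.na(df$var2)

df1 <- df[!missing_var2, ]  # rows where var2 is present
df2 <- df[missing_var2, ]   # rows where var2 is NA

# The two pieces together account for every original row.
covers_all <- nrow(df1) + nrow(df2) == nrow(df)
```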

Demo here: Rextester

Hi, try this to keep the rows that contain at least one NA:

new_DF <- DF[rowSums(is.na(DF)) > 0,]

Or, in case you want to check a particular column, you can also use

new_DF <- DF[is.na(DF$Var),]

In case you have NA stored as character values (the string 'NA'), first run

DF[DF == 'NA'] <- NA

to replace them with missing values.
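A small sketch of that conversion, assuming a hypothetical data frame `DF` in which `var2` was read in as text:

```r
# Hypothetical input: var2 holds the literal string "NA", not a missing value.
DF <- data.frame(
  var1 = 1:4,
  var2 = c("21", "NA", "NA", "10"),
  stringsAsFactors = FALSE
)

n_before <- sum(is.na(DF$var2))  # the strings are not seen as missing

# Replace the literal "NA" strings with real missing values.
DF[DF == "NA"] <- NA

n_after <- sum(is.na(DF$var2))

# The column is still character; convert it if numbers are expected.
DF$var2 <- as.numeric(DF$var2)
```

If the data comes from a file, the na.strings argument of read.table() or read.csv() can do the same conversion at import time.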

The split function comes in handy in this case.

data <- read.table(text="var1 var2
   1    21
   2    NA
   3    NA
   4    10", header=TRUE)

split(data, is.na(data$var2))
# 
# $`FALSE`
# var1 var2
# 1    1   21
# 4    4   10
# 
# $`TRUE`
# var1 var2
# 2    2   NA
# 3    3   NA
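The list elements are named `FALSE` and `TRUE`, which is awkward to index. One possible variation, sketched below, passes a labelled factor to split() instead; the labels "complete" and "missing" are made up for this example.

```r
data <- data.frame(var1 = 1:4, var2 = c(21, NA, NA, 10))

# Turn the logical index into a factor with readable labels.
# Fixing the levels keeps both groups in the result even if one is empty.
grp <- factor(is.na(data$var2),
              levels = c(FALSE, TRUE),
              labels = c("complete", "missing"))

parts <- split(data, grp)

complete_rows <- parts$complete  # rows where var2 is present
missing_rows  <- parts$missing   # rows where var2 is NA
```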

An alternative and more general approach is to use the complete.cases() function. It identifies rows that have no missing values (no NAs) and returns a logical vector of TRUE/FALSE values, one per row.

dt = data.frame(var1 = c(1,2,3,4),
                var2 = c(21,NA,NA,10))

dt1 = dt[complete.cases(dt),]
dt2 = dt[!complete.cases(dt),]

dt1

#   var1 var2
# 1    1   21
# 4    4   10

dt2

#   var1 var2
# 2    2   NA
# 3    3   NA
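Note that complete.cases() on the whole data frame checks every column. If only one variable should drive the split, it can be applied to a subset of columns. The sketch below uses a modified toy data frame (an assumption, with an extra NA in var1) to show the difference.

```r
# Hypothetical variant of the data with an NA in var1 as well.
dt <- data.frame(var1 = c(1, 2, NA, 4),
                 var2 = c(21, NA, 5, 10))

# Checks all columns: rows 2 and 3 are incomplete.
ok_all <- complete.cases(dt)

# Checks only var2: just row 2 is incomplete.
ok_var2 <- complete.cases(dt["var2"])

dt_with_var2    <- dt[ok_var2, ]
dt_without_var2 <- dt[!ok_var2, ]
```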
