Delete NA values from a data frame with R

I have a large data frame (501 rows by 42844 columns) that contains "?_?" values. Using R, I have already replaced them with NA using the code below:

data[data == "?_?"] <- NA

So I have NA values now and I want to omit them from the data frame, but something is going wrong. When I run the command below:

data_na_rm <- na.omit(data)

I get an object with 0 rows and 42844 columns as a result.

dim(data_na_rm) #gives me 0 42844
data_na_rm[1,2] #gives me NA
data_na_rm[5,3] #gives me NA
############################
data_na_rm[2]   #gives me the title of the second column 
data_na_rm[5]   #gives me the title of the fifth column

What should I do? I've spent too many hours on this. I would appreciate it if anyone could take some time to help me.

As JackStat said in the comments, you might have NAs in every row, in which case na.omit() removes all of them. Maybe you should test for that:

# Some data. All rows have an NA, but not all columns

df <- data.frame(col1 = c(NA, 2, 3, 4),
                 col2 = c(1, NA, 3, 4),
                 col3 = c(1, 2, NA, 4),
                 col4 = c(1, 2, 3, NA),
                 col5 = c(1, 2, 3, 4))

# test whether an NA is present in each row

apply(df, 1, function(x) {sum(is.na(x)) > 0})
[1] TRUE TRUE TRUE TRUE
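
The same check can probably be done without apply(), which may matter on a frame with 42844 columns. The following is only a sketch on the toy data from this answer; the names na_per_row and row_has_na are made up for illustration. It also shows why na.omit() leaves nothing behind when every row has an NA:

```r
# Toy data from the answer: every row holds exactly one NA
df <- data.frame(col1 = c(NA, 2, 3, 4),
                 col2 = c(1, NA, 3, 4),
                 col3 = c(1, 2, NA, 4),
                 col4 = c(1, 2, 3, NA),
                 col5 = c(1, 2, 3, 4))

# Number of NAs in each row (vectorised, no apply)
na_per_row <- rowSums(is.na(df))

# TRUE for every row that contains at least one NA
row_has_na <- na_per_row > 0

# na.omit() drops each row that has an NA, so here no rows survive
df_omit <- na.omit(df)
dim(df_omit)
```

If all(row_has_na) is TRUE on the real data, a 0-row result from na.omit() is expected behaviour rather than a bug.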

This will help you find which columns are contributing the most NAs. It counts the NAs in each column:

apply(df, 2, function(x) {sum(is.na(x))})
col1 col2 col3 col4 col5 
   1    1    1    1    0
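
One possible follow-up, which is not part of the original answer: once the per-column counts are known, you could drop the columns with too many NAs first and only then remove the incomplete rows. This is a sketch under assumptions; the cutoff max_na_share is a hypothetical value you would need to choose for your own data, and the other variable names are made up as well:

```r
# Toy data from the answer
df <- data.frame(col1 = c(NA, 2, 3, 4),
                 col2 = c(1, NA, 3, 4),
                 col3 = c(1, 2, NA, 4),
                 col4 = c(1, 2, 3, NA),
                 col5 = c(1, 2, 3, 4))

# NA count per column, named by column
na_per_col <- colSums(is.na(df))

# Hypothetical cutoff: keep columns with at most 20% NAs
max_na_share <- 0.2
keep_col <- na_per_col / nrow(df) <= max_na_share

# Drop the NA-heavy columns, keeping the result a data frame
df_cols <- df[, keep_col, drop = FALSE]

# Then drop any rows that still contain an NA
df_clean <- df_cols[complete.cases(df_cols), , drop = FALSE]
dim(df_clean)
```

On the toy data each of col1 to col4 is 25% NA, so only col5 passes the cutoff and all four rows are kept. Whether losing columns is acceptable depends on what the data is for.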
