
Using is.na with the sapply function in R

Can anyone tell me what the line of code below does?

sapply(X, function(x) sum(is.na(x))) / nrow(airports) * 100

My understanding is that it drops the NAs when it applies the sum function but keeps them in the matrix.

Any help is appreciated.

Thank you

Enough comments, time for an answer:

sapply(X,      # apply to each item of X (each column, if X is a data frame)
  function(x)  # this function:
    sum(is.na(x))  # count the NAs
) / nrow(airports) * 100  # then divide the result by the number of rows in the airports object
  # and multiply by 100

In words, it counts the number of missing values in each column of X, then divides the result by the number of rows in airports and multiplies by 100. This calculates the percentage of missing values in each column, assuming X has the same number of rows as airports.
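A minimal sketch of that calculation on a toy data frame. The data and column names here are made up for illustration and are not the asker's real airports object:

```r
# Hypothetical toy data standing in for the real `airports` object
airports <- data.frame(
  name = c("A", "B", NA, "D"),
  lat  = c(1.5, NA, NA, 4.0),
  lon  = c(10, 20, 30, 40)
)

# Count the missing values in each column
na_counts <- sapply(airports, function(x) sum(is.na(x)))

# Turn the counts into percentages of the number of rows
na_pct <- na_counts / nrow(airports) * 100
print(na_pct)
```

With four rows, name has one NA (25%), lat has two (50%), and lon has none (0%).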

It's strange to mix and match the columns of X with nrow(airports); I would expect those to be the same object (that is, either sapply(airports, ...) / nrow(airports) or sapply(X, ...) / nrow(X)).

As I mentioned in comments, nothing is being "dropped". If you wanted to do a sum ignoring the NA values, you would do sum(foo, na.rm = TRUE). Instead, here, what is being summed is is.na(x), that is, we are summing whether or not each value is missing: counting missing values. sum(is.na(foo)) is the idiomatic way to count the number of NA values in foo.
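To make the distinction concrete, here is a small sketch on a made-up vector (foo and its values are illustrative only):

```r
# Hypothetical vector with two missing values
foo <- c(3, NA, 5, NA, 2)

plain_sum <- sum(foo)                # NA: one missing value makes the whole sum missing
total     <- sum(foo, na.rm = TRUE)  # 10: adds up only the non-missing values
n_missing <- sum(is.na(foo))         # 2: is.na() gives TRUE/FALSE, and TRUE counts as 1
```

The first two lines sum the data itself; the last one sums the logical vector returned by is.na(), so the values in foo never enter the total.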

In this case, where the goal is a percent not a count, we can simplify by using mean() instead of sum() / n:

# slightly simpler, consistent object
sapply(airports, function(x) mean(is.na(x))) * 100
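As a quick check of why this works, the mean of a logical vector is the proportion of TRUE values, so the two forms agree. The vector x below is a made-up example column:

```r
# Hypothetical column with two NAs out of four values
x <- c(NA, 1, NA, 4)

pct_via_sum  <- sum(is.na(x)) / length(x) * 100  # count of NAs divided by n
pct_via_mean <- mean(is.na(x)) * 100             # proportion of TRUE values

stopifnot(isTRUE(all.equal(pct_via_sum, pct_via_mean)))
```

For a column of a data frame, length(x) equals the number of rows, which is why mean() can replace the division by nrow().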

We could also use is.na() on the entire data so we don't need the "anonymous function":

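# rearrange for more simplicity: is.na() on a data frame returns a logical matrix,
# so use colMeans() to average each column (sapply() would visit every cell)
colMeans(is.na(airports)) * 100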
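One caveat worth checking when calling is.na() on a whole data frame: the result is a logical matrix, and sapply() walks a matrix cell by cell rather than column by column. A small sketch on a made-up data frame (df is hypothetical) shows the difference and uses colMeans() as one way to get per-column percentages:

```r
# Hypothetical two-column data frame
df <- data.frame(a = c(1, NA), b = c(NA, NA))

m <- is.na(df)                # logical matrix, not a data frame
per_cell <- sapply(m, mean)   # one value per cell: length 4, not one per column
col_pct  <- colMeans(m) * 100 # one value per column: a = 50, b = 100

print(is.matrix(m))
print(length(per_cell))
print(col_pct)
```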


 