Using the is.na and sapply functions in R

Question

Can anyone tell me what the line of code below does?

sapply(X, function(x) sum(is.na(x))) / nrow(airports) * 100

My understanding is that it drops the NAs when it applies the sum function but keeps them in the matrix.

Any help is appreciated.

Thank you

Answer 1

Enough comments, time for an answer:

sapply(X,      # apply to each item of X (each column, if X is a data frame)
  function(x)  # this function:
    sum(is.na(x))  # count the NAs
) / nrow(airports) * 100  # then divide the result by the number of rows in the airports object
  # and multiply by 100

In words, it counts the number of missing values in each column of X, then divides the result by the number of rows in airports and multiplies by 100. This calculates the percentage of missing values in each column, assuming X has the same number of rows as airports.
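To make this concrete, here is a minimal sketch on made-up data. The toy `airports` data frame and its column names are hypothetical, and `X` is simply assumed to be the same object, since the question does not say what `X` is.

```r
# Hypothetical toy data standing in for the real `airports` object
airports <- data.frame(
  code      = c("AAA", "BBB", NA, "DDD"),
  elevation = c(10, NA, NA, 250),
  city      = c("A", "B", "C", "D")
)
X <- airports  # assumption: X has the same rows as airports

na_counts  <- sapply(X, function(x) sum(is.na(x)))  # missing values per column
na_percent <- na_counts / nrow(airports) * 100      # as a percentage of the rows
print(na_percent)
```

With four rows, a column with one NA comes out as 25 and a column with two NAs as 50.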

It's strange to mix the columns of X with nrow(airports); I would expect those to be the same object, that is, either sapply(airports, ...) / nrow(airports) or sapply(X, ...) / nrow(X).

As I mentioned in the comments, nothing is being "dropped". If you wanted to do a sum ignoring the NA values, you would use sum(foo, na.rm = TRUE). Instead, here, what is being summed is is.na(x); that is, we are summing whether or not each value is missing, which counts the missing values. sum(is.na(foo)) is the idiomatic way to count the number of NA values in foo.
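The difference between ignoring NAs in a sum and counting them can be seen on a small vector. The vector `foo` below is a made-up example, not data from the question.

```r
foo <- c(3, NA, 5, NA, 7)  # hypothetical example vector

sum(foo)                # NA: a missing value propagates through the sum
sum(foo, na.rm = TRUE)  # 15: sums the non-missing values, ignoring the NAs
is.na(foo)              # FALSE TRUE FALSE TRUE FALSE
sum(is.na(foo))         # 2: TRUE counts as 1, so this counts the NAs
```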

In this case, where the goal is a percentage rather than a count, we can simplify by using mean() instead of sum() / n, because the mean of a logical vector is the proportion of TRUE values:

# slightly simpler, consistent object
sapply(airports, function(x) mean(is.na(x))) * 100

We could also use is.na() on the entire data so we don't need the "anonymous function":

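# rearrange for more simplicity
# is.na() on a data frame returns a logical matrix, and sapply() over a matrix
# visits every element, so take the column means with colMeans() instead
colMeans(is.na(airports)) * 100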
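As a quick sanity check, the sketch below compares the per-column forms on made-up data. The toy `airports` data frame is hypothetical. Note that is.na() applied to a whole data frame returns a logical matrix, so the whole-data version here uses colMeans() to get one value per column.

```r
# Hypothetical toy data standing in for the real `airports` object
airports <- data.frame(
  code      = c("AAA", "BBB", NA, "DDD"),
  elevation = c(10, NA, NA, 250)
)

by_sum   <- sapply(airports, function(x) sum(is.na(x))) / nrow(airports) * 100
by_mean  <- sapply(airports, function(x) mean(is.na(x))) * 100
by_whole <- colMeans(is.na(airports)) * 100  # column means of the logical matrix

# Compare the three results; each is a named vector with one entry per column
print(all.equal(by_sum, by_mean))
print(all.equal(by_mean, by_whole))
```

If the three forms agree, both comparisons should report TRUE.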
