Using is.na with the sapply function in R
Can anyone tell me what the line of code written below does?
sapply(X, function(x) sum(is.na(x))) / nrow(airports) * 100
What I understand is that it drops the NAs when it applies the sum function but keeps them in the matrix.
Any help is appreciated.
Thank you
Enough comments, time for an answer:
sapply(X,                 # apply to each item of X (each column, if X is a data frame)
  function(x)             # this function:
    sum(is.na(x))         # count the NAs
) / nrow(airports) * 100  # then divide the result by the number of rows in the airports object
                          # and multiply by 100
In words, it counts the number of missing values in each column of X, then divides the result by the number of rows in airports and multiplies by 100. That gives the percentage of missing values in each column, assuming X has the same number of rows as airports.
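As a small worked example of that calculation, here is a sketch on a made-up data frame. The column names and values are hypothetical, and the same object is used on both sides of the division so the row counts agree.

```r
# Hypothetical data standing in for `airports`; the values are invented.
airports <- data.frame(
  code      = c("AAA", "BBB", NA, "DDD"),
  elevation = c(10, NA, NA, 250),
  city      = c("A", "B", "C", "D")
)

# Count the NAs in each column, then express the count as a percentage of rows.
na_pct <- sapply(airports, function(x) sum(is.na(x))) / nrow(airports) * 100
print(na_pct)
#      code elevation      city
#        25        50         0
```

The data frame itself is unchanged afterwards: the NAs are only counted, not removed.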
It's strange to mix and match the columns of X with nrow(airports). I would expect those to be the same object, that is, either sapply(airports, ...) / nrow(airports) or sapply(X, ...) / nrow(X).
As I mentioned in comments, nothing is being "dropped". If you wanted to do a sum ignoring the NA values, you would use sum(foo, na.rm = TRUE). Instead, here, what is being summed is is.na(x); that is, we are summing whether or not each value is missing, which counts the missing values. sum(is.na(foo)) is the idiomatic way to count the number of NA values in foo.
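To make the distinction concrete, here is a short sketch on a hypothetical vector foo (the values are made up). It contrasts summing the values themselves with summing the logical vector returned by is.na().

```r
foo <- c(4, NA, 7, NA, 1)  # hypothetical vector with two missing values

total_raw   <- sum(foo)                # NA: one missing value makes the total unknown
total_clean <- sum(foo, na.rm = TRUE)  # 12: the NAs are ignored while summing
missing     <- is.na(foo)              # FALSE TRUE FALSE TRUE FALSE
n_missing   <- sum(missing)            # 2: TRUE counts as 1, FALSE as 0
```

Only the second call ignores missing values; the last one counts them.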
In this case, where the goal is a percentage rather than a count, we can simplify by using mean() instead of sum() / n:
# slightly simpler, consistent object
sapply(airports, function(x) mean(is.na(x))) * 100
We could also use is.na() on the entire data so we don't need the "anonymous function":
# rearrange for more simplicity; is.na() on a data frame returns a logical matrix,
# so take the column means of that matrix
colMeans(is.na(airports)) * 100
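A quick way to check that these variants agree is to run them on the same data. The sketch below uses a made-up data frame (hypothetical names and values). One caveat: is.na() on a data frame returns a logical matrix, and sapply() walks a matrix element by element rather than column by column, so a column-wise function such as colMeans() is the one to use on that matrix.

```r
# Hypothetical data standing in for `airports`.
airports <- data.frame(
  code      = c("AAA", "BBB", NA, "DDD"),
  elevation = c(10, NA, NA, 250)
)

by_sum  <- sapply(airports, function(x) sum(is.na(x))) / nrow(airports) * 100
by_mean <- sapply(airports, function(x) mean(is.na(x))) * 100
by_col  <- colMeans(is.na(airports)) * 100

# All three give the per-column percentage of missing values.
stopifnot(isTRUE(all.equal(by_sum, by_mean)))
stopifnot(isTRUE(all.equal(by_mean, by_col)))

# sapply() over the logical matrix visits every cell, not every column.
n_cells <- length(sapply(is.na(airports), mean))  # 8 cells for a 4 x 2 matrix
```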