
Percentage of NA values (data frame and variables) in R

I would like to calculate the percentage of NA values in a data frame, both overall and for individual variables.

For the whole data frame I get this:

mean(is.na(dataframe))
# 0.03354

How should I read this result? Is it 0.033% NA? I don't understand.

For the individual variables, I counted the NAs as follows:

sapply(DATAFRAME, function(x) sum(is.na(x)))

Then, for the percentage of NA values in a single variable, I tried:

colMeans(is.na(VARIABLEX)) 

This doesn't work; I get the following error:

"x must be an array of at least two dimension"

Why does this error occur? Anyway, afterwards I tried the following:

mean(is.na(VariableX))
# 0.1188

Should I interpret this as 0.11% NA values?

I'd just divide the number of rows containing NAs by the total number of rows:

df <- data.frame(data = c(NA, NA, NA, NA, 2, 4, NA, 7, NA))

percent_NA <- NROW(df[is.na(df$data),])/NROW(df)

Which gives:

> percent_NA
[1] 0.6666667

Which means 66.67% of the values in my data frame are NA.
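Note that counting rows and counting cells only agree when the data frame has a single column. A minimal sketch of the difference, assuming a made-up two-column data frame (`df2`, `row_share` and `cell_share` are illustrative names, not from the question):

```r
# Toy data, invented for illustration: 3 NA cells spread over 3 of the 4 rows.
df2 <- data.frame(a = c(1, NA, 3, NA),
                  b = c(NA, 2, 3, 4))

# Share of rows with at least one NA (complete.cases() is TRUE for NA-free rows).
row_share <- mean(!complete.cases(df2))

# Share of individual cells that are NA.
cell_share <- mean(is.na(df2))

print(100 * row_share)   # 75
print(100 * cell_share)  # 37.5
```

Which of the two you want depends on whether "percentage of NAs" means affected rows or missing cells.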

I don't understand the issue you are trying to solve. It all works as expected.
First, a dataset, since you haven't provided one.

set.seed(6180)  # make it reproducible
dat <- data.frame(x = sample(c(1:4, NA), 100, TRUE),
                  y = sample(c(1:5, NA), 100, TRUE))

Now the code for the NA counts.

s <- sapply(dat, function(x) sum(is.na(x)))
s
# x  y 
#18 13
sum(s)
#[1] 31
sum(is.na(dat))
#[1] 31

colSums(is.na(dat))
# x  y 
#18 13
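The counts agree because is.na() on a data frame returns a logical matrix of the same shape, and sum() and colSums() treat TRUE as 1. A small sketch with a made-up data frame (`toy` and `m` are illustrative names, not the data above):

```r
# Toy data, invented for illustration: 1 NA in x, 2 NAs in y.
toy <- data.frame(x = c(1, NA, 3),
                  y = c(NA, NA, 6))

m <- is.na(toy)          # logical matrix, one cell per data frame cell
print(dim(m))            # 3 2

total   <- sum(m)        # counts every TRUE in the matrix
per_col <- colSums(m)    # counts TRUEs column by column

# The column-wise loop gives the same counts as colSums().
per_col_loop <- sapply(toy, function(col) sum(is.na(col)))
print(per_col)
```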

The same goes for means, be it mean or colMeans.
EDIT.
Here is the code to get the mean of is.na per column/variable, that is, the proportion of NA values on a scale from 0 to 1, and the overall proportion.

sapply(dat, function(x) mean(is.na(x)))
#   x    y 
#0.18 0.13
colMeans(is.na(dat))   # Same result, faster
#   x    y 
#0.18 0.13
mean(is.na(dat))       # overall mean
#[1] 0.155
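To address the two points of confusion in the question: the results are proportions, so 0.03354 reads as 3.354% and 0.1188 as 11.88%; and colMeans fails on a single variable because a column extracted on its own is a plain vector with no dimensions. A minimal sketch, assuming a made-up data frame (`dat2` and the other names are illustrative, not the asker's data):

```r
# Toy data, invented for illustration (not the asker's data).
dat2 <- data.frame(x = c(1, NA, 3, NA),
                   y = c(NA, 2, 3, 4))

# mean() of a logical vector is the share of TRUEs, on a 0-1 scale.
prop_x <- mean(is.na(dat2$x))
pct_x  <- 100 * prop_x           # convert the proportion to a percentage

# A single column is a vector without a dim attribute, so colMeans() rejects it.
print(dim(is.na(dat2$x)))        # NULL
err <- tryCatch(colMeans(is.na(dat2$x)),
                error = function(e) conditionMessage(e))

# Keeping the column as a one-column data frame makes colMeans() work.
one_col <- colMeans(is.na(dat2["x"]))

# Percentages for every column at once, rounded to two decimals.
pct_all <- round(100 * colMeans(is.na(dat2)), 2)
print(pct_all)
```

For a single variable, mean(is.na(...)) is the simpler choice; colMeans is only needed when working on several columns at once.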

