Sum NA values in R
I am using a data frame that has multiple NA values, so I was thinking about sorting the attributes based on their NA counts. I was trying to use a for loop, and this is what I have so far:
> data <- read.csv("C:/Users/Nikita/Desktop/first1k.csv")
> for (i in 1:length(data) ) {
+ temp <- c(sum(is.na(data[i])))}
> temp
[1] 0
It is the first time I am using a for loop in R, so I am sure it is just a silly syntax problem, but I can't work out which one exactly.
Ultimately, I need a list that shows the name of each attribute and its NA count. This way I could sort the list and get the desired information. Here is some mock data to make it easier.
data <- data.frame(A = c(500, 600, 700, 1000),
B = c(500, 600, 700, NA),
C = c(NA, NA, 500, 700),
D = c(800, NA, 933, NA),
E = c(NA, NA, NA, NA))
Edit: Thank you all for the help. All three solutions worked for me. I do wonder, though, whether there is a one-line way to sort those attributes before I export them to a file. Like I mentioned before, I am quite new to R, so I am not sure if it is possible.
Edit 2: When I run the sort, it gives me the following error:
temp <- sort(temp)
Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) :
'x' must be atomic
Any idea why?
Here is a quick answer using is.na and colSums:
colSums(is.na(data))
returning:
A B C D E
0 1 2 2 4
for your data above.
Thanks to @akrun for pointing out my surplus apply.
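As a possible follow-up to the one-line sorting question in the edit, the named vector returned by colSums can be passed straight to sort. This is a minimal sketch using the mock data from the question; the name na_counts is only illustrative.

```r
# Mock data from the question
data <- data.frame(A = c(500, 600, 700, 1000),
                   B = c(500, 600, 700, NA),
                   C = c(NA, NA, 500, 700),
                   D = c(800, NA, 933, NA),
                   E = c(NA, NA, NA, NA))

# Count NAs per column, then sort the named vector, highest count first
na_counts <- sort(colSums(is.na(data)), decreasing = TRUE)
na_counts
```

Because colSums keeps the column names, the sorted result still shows which attribute each count belongs to.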
The idiomatic way to write iterative code in R is to avoid explicit for loops. Use apply (and company) instead. @jeremycg gave you the right R-ish answer. Regarding your code, it needs a few edits to work.
temp <- c()
for (i in 1:length(data)){
temp[names(data)[i]] <- sum(is.na(data[i]))
}
You were overwriting temp at each iteration. Moreover, you didn't write the labels of your variables into temp. Hence the output you see is the number of NAs in the last column of your dataset.
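For comparison with the loop, the apply-family version mentioned above could look like the sketch below. It uses sapply on the mock data from the question; the anonymous function and its argument name col are illustrative choices, not part of the original answer.

```r
# Mock data from the question
data <- data.frame(A = c(500, 600, 700, 1000),
                   B = c(500, 600, 700, NA),
                   C = c(NA, NA, 500, 700),
                   D = c(800, NA, 933, NA),
                   E = c(NA, NA, NA, NA))

# sapply visits each column in turn; the result keeps the column names
temp <- sapply(data, function(col) sum(is.na(col)))
temp
# A B C D E
# 0 1 2 2 4
```

No preallocation or explicit naming is needed here, since sapply collects the per-column results into a named vector.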
Regarding OP's edit:
temp <- sort(temp) # pass decreasing = TRUE as an argument in case
# you want reversed order
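The 'x' must be atomic error from Edit 2 suggests that temp was a list rather than an atomic vector when sort was called. That would happen if, for example, temp had been built with lapply; this is an assumption, since the question does not show how temp was created at that point. A sketch of that situation and one way around it:

```r
# Mock data from the question
data <- data.frame(A = c(500, 600, 700, 1000),
                   B = c(500, 600, 700, NA),
                   C = c(NA, NA, 500, 700),
                   D = c(800, NA, 933, NA),
                   E = c(NA, NA, NA, NA))

# Assumed cause: lapply returns a list, which is not an atomic vector
temp_list <- lapply(data, function(col) sum(is.na(col)))
is.atomic(temp_list)   # FALSE

# unlist() flattens the list into a named atomic vector that sort() accepts
temp <- sort(unlist(temp_list))
temp
```

If temp is already a named numeric or integer vector, as in the loop above, sort(temp) should work without the unlist step.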
This answer shows how to make the for loop work.
temp <- integer(ncol(data))            # preallocate one slot per column
for (i in seq_along(data)) {
  temp[i] <- sum(is.na(data[, i]))     # NA count for column i
}
names(temp) <- colnames(data)          # label each count with its column name
temp
# A B C D E
# 0 1 2 2 4
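Since the question asks for a list of attribute names with their NA counts that can be sorted and exported, one possible way to finish is to put the counts into a two-column data frame. This is only a sketch: it uses colSums for brevity instead of the loop, and the names na_table, attribute, na_count and the output file na_counts.csv are placeholders, not anything from the question.

```r
# Mock data from the question
data <- data.frame(A = c(500, 600, 700, 1000),
                   B = c(500, 600, 700, NA),
                   C = c(NA, NA, 500, 700),
                   D = c(800, NA, 933, NA),
                   E = c(NA, NA, NA, NA))

temp <- colSums(is.na(data))

# Two-column table: attribute name and its NA count
na_table <- data.frame(attribute = names(temp), na_count = unname(temp))

# Order rows by NA count, highest first
na_table <- na_table[order(na_table$na_count, decreasing = TRUE), ]

# Write the table to a file; the path is a placeholder in the session temp directory
out_file <- file.path(tempdir(), "na_counts.csv")
write.csv(na_table, out_file, row.names = FALSE)
```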