How to count elements in each column of a table in R
I have a data set which looks like this (it actually has more than 50 columns):
data <- read.csv("sample.csv")
subject gender age type satisfation agree
1       f      22  a    yes         yes
2       f      23  b    no          yes
3       f      21  b                no
4       m      24  c    yes         yes
5       f      22  b    no          yes
6       m          a    yes         yes
7              25  c    yes         no
8       m      21  b    no          yes
9       f      23  c    yes         yes
I would like to count the elements in each column (without counting NAs) and export the result in the layout below:
subject gender age type satisfation agree
9       8      8   9    8           9
I wrote a script to do the counting:
counting <- function(x) {
for(i in 1:length(data)) {
data <- length(which(!is.na(x$i)))
print(data)
}
return(data)
}
counting(data)
It didn't work; it gave 0 for every column.
dput(head(data, 9))
structure(list(subject = 1:9, gender = structure(c(2L, 2L, 2L,
3L, 2L, 3L, 1L, 3L, 2L), .Label = c("", "f", "m"), class = "factor"),
age = c(22L, 23L, 21L, 24L, 22L, NA, 25L, 21L, 23L), type = structure(c(1L,
2L, 2L, 3L, 2L, 1L, 3L, 2L, 3L), .Label = c("a", "b", "c"
), class = "factor"), satisfation = structure(c(3L, 2L, 1L,
3L, 2L, 3L, 3L, 2L, 3L), .Label = c("", "no", "yes"), class = "factor"),
agree = structure(c(2L, 3L, 1L, 3L, 2L, 3L, 1L, 3L, 2L), .Label = c("no",
"yes", "yes "), class = "factor"), time = c(23L, 54L, 67L,
324L, 87L, 12L, 756L, 34L, 98L), day = c(1L, 3L, 2L, 5L,
7L, 4L, 3L, 1L, 4L)), .Names = c("subject", "gender", "age",
"type", "satisfation", "agree", "time", "day"), row.names = c(NA,
9L), class = "data.frame")
Are there any recommendations for the script, please?
Thank you all in advance!
Assuming you have already handled the NAs, simply use colSums:
colSums(!is.na(df))
# subject gender age type satisfation agree time day
#       9      9   8    9           9     9    9   9
Adding @DavidArenburg's suggestion to get around any NA trouble (blank strings that were not read in as NA):
colSums(!is.na(df) & df != "", na.rm = TRUE)
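To see how blank strings affect the count, here is a minimal sketch on a made-up data frame (the column values are invented for illustration and are not the asker's file). It assumes a cell should count only when it is neither NA nor an empty string, so the two tests are combined with `&`.

```r
# Hypothetical toy data: one blank string in 'gender', one NA in 'age'
df <- data.frame(
  subject = 1:4,
  gender  = c("f", "", "m", "f"),
  age     = c(22L, NA, 21L, 23L),
  stringsAsFactors = FALSE
)

# Counting only non-NA cells still counts the blank string
naive_counts <- colSums(!is.na(df))
print(naive_counts)
# subject 4, gender 4, age 3

# Require a cell to be both non-NA and non-empty
present <- !is.na(df) & df != ""
counts <- colSums(present)
print(counts)
# subject 4, gender 3, age 3
```

Because `FALSE & NA` evaluates to `FALSE`, the combined matrix contains no NA values here, so `na.rm = TRUE` is harmless but not strictly needed.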
When I load your table into R, there are just blank fields instead of NAs. So when you read your .csv file, specify how NAs are coded. It looks like they are coded as "" or maybe " ".
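As a rough illustration of specifying the NA coding at read time, the sketch below parses a small inline CSV instead of sample.csv. The CSV contents are invented, and using both "" and " " in `na.strings` together with `strip.white = TRUE` is an assumption about how the blanks are stored in the real file.

```r
# Hypothetical CSV text standing in for sample.csv
csv_text <- paste(
  c("subject,gender,age,agree",
    "1,f,22,yes",
    "2,,23,yes ",
    "3,m,,no",
    "4,f,21, "),
  collapse = "\n"
)

# Treat empty and single-space fields as NA; strip.white also trims
# trailing spaces from unquoted character fields
dat <- read.csv(text = csv_text,
                na.strings = c("", " "),
                strip.white = TRUE)

# Count the non-missing entries in each column
counts <- sapply(dat, function(x) sum(!is.na(x)))
print(counts)
```

With the real file, the same arguments would be passed to `read.csv("sample.csv", ...)`.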
After you get the NAs, you can run this code. Assume your table is called df.
counts <- apply(df, 2, function(x) length(na.omit(x)))
Or, as @JasonAizkalns suggests:
data <- read.csv("sample.csv", na.strings = "")
sapply(data, function(x) sum(!is.na(x)))
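As for why the original loop gave 0 every time: `x$i` looks for a column literally named "i" rather than the i-th column, so it returns NULL, and `length(which(!is.na(NULL)))` is 0. The loop also overwrites `data` with that count. A corrected loop might look like the sketch below; the function name `count_non_missing` and the toy data frame are made up for illustration.

```r
# Hypothetical rewrite of the loop: index columns with [[i]] instead of $i
count_non_missing <- function(x) {
  counts <- integer(length(x))   # one slot per column
  names(counts) <- names(x)
  for (i in seq_along(x)) {
    counts[i] <- sum(!is.na(x[[i]]))
  }
  counts
}

# Toy data with one NA in each of two columns
toy <- data.frame(
  subject = 1:3,
  age     = c(22L, NA, 21L),
  type    = c("a", "b", NA),
  stringsAsFactors = FALSE
)

result <- count_non_missing(toy)
print(result)
# subject 3, age 2, type 2
```

This gives the same counts as the `sapply` one-liner. It only skips values that are already NA, so blank strings still need to be converted when the file is read.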