Function to count NA values at each level of a factor
I have this dataframe:
set.seed(50)
data <- data.frame(age=c(rep("juv", 10), rep("ad", 10)),
                   sex=c(rep("m", 10), rep("f", 10)),
                   size=c(rep("large", 10), rep("small", 10)),
                   length=rnorm(20),
                   width=rnorm(20),
                   height=rnorm(20))
data$length[sample(1:20, size=8, replace=FALSE)] <- NA
data$width[sample(1:20, size=8, replace=FALSE)] <- NA
data$height[sample(1:20, size=8, replace=FALSE)] <- NA
age sex size length width height
1 juv m large NA -0.34992735 0.10955641
2 juv m large -0.84160374 NA -0.41341885
3 juv m large 0.03299794 -1.58987765 NA
4 juv m large NA NA NA
5 juv m large -1.72760411 NA 0.09534935
6 juv m large -0.27786453 2.66763339 0.49988990
7 juv m large NA NA NA
8 juv m large -0.59091244 -0.36212039 -1.65840096
9 juv m large NA 0.56874633 NA
10 juv m large NA 0.02867454 -0.49068623
11 ad f small 0.29520677 0.19902339 NA
12 ad f small 0.55475223 -0.85142228 0.33763747
13 ad f small NA NA -1.96590570
14 ad f small 0.19573384 0.59724896 -2.32077461
15 ad f small -0.45554055 -1.09604786 NA
16 ad f small -0.36285547 0.01909655 1.16695158
17 ad f small -0.15681338 NA NA
18 ad f small NA NA NA
19 ad f small NA 0.40618657 -1.33263085
20 ad f small -0.32342568 NA -0.13883976
I'm trying to write a function that counts the number of NA values in each of length, width and height at each level of the three factors in the dataframe. I've tried this:
exploreMissingValues <- function(dataframe, factors, variables){
  library(plyr)
  Variables <- list(variables)
  llply(Variables, function(x) ddply(dataframe, .(factors),
                                     summarise,
                                     number.of.NA=length(x[is.na(x)])))
}
exploreMissingValues(data,
                     c("age", "sex", "size"),
                     c("length", "width", "height"))
...but this gives an error. How can I get this function to return the number of NA values at each level of each factor in the dataframe?
Are you looking for something like this?
library(doBy)
summaryBy(length+width+height~age+sex+size,
data=data,
FUN=function(x) sum(is.na(x)),
keep.names=TRUE)
age sex size length width height
1 ad f small 3 4 4
2 juv m large 5 4 4
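If you would rather not load an extra package, base R's aggregate can produce a similar table through its formula interface. The following is only a sketch on a small made-up data frame (toy is hypothetical, not the data from the question). The detail that matters is na.action = na.pass: the default, na.omit, drops every row containing an NA before the counting function ever sees it.

```r
# Hypothetical toy data with known NA positions (not the question's data).
toy <- data.frame(
  age    = c("juv", "juv", "juv", "ad", "ad", "ad"),
  sex    = c("m", "m", "m", "f", "f", "f"),
  length = c(NA, 1.2, NA, 0.3, NA, 0.5),
  width  = c(0.1, NA, 0.4, 0.2, 0.9, 0.7)
)

# Count NAs per measure for each age/sex combination.
# na.pass keeps the rows with missing values so they can be counted.
na_counts <- aggregate(cbind(length, width) ~ age + sex,
                       data = toy,
                       FUN = function(x) sum(is.na(x)),
                       na.action = na.pass)
print(na_counts)
```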
Use aggregate:
nacheck <- function(var, factor)
aggregate(var, list(factor), function(x) sum(is.na(x)))
nacheck(data$length, data$age)
nacheck(data$length, data$sex)
nacheck(data$length, data$size)
You could also apply this to your dataframe, one factor at a time, to get NA counts for all of the dimension measures at each level of that factor.
apply(data[,c("length","width","height")], 2, nacheck, factor=data$age)
apply(data[,c("length","width","height")], 2, nacheck, factor=data$sex)
apply(data[,c("length","width","height")], 2, nacheck, factor=data$size)
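The apply calls above return a list of small data frames, one per measure. If a single table per factor is easier to read, a possible alternative (a sketch, using a made-up toy data frame and a hypothetical name counts) is to let tapply count the NAs per level and have sapply bind the results into a matrix with one row per level and one column per measure.

```r
# Hypothetical toy data with known NA positions (not the question's data).
toy <- data.frame(
  age    = c("juv", "juv", "juv", "ad", "ad", "ad"),
  length = c(NA, 1.2, NA, 0.3, NA, 0.5),
  width  = c(0.1, NA, 0.4, 0.2, 0.9, 0.7)
)

# For each measure column, sum the is.na() flags within each level of age.
# sapply() simplifies the per-column results into a level-by-measure matrix.
counts <- sapply(toy[c("length", "width")],
                 function(v) tapply(is.na(v), toy$age, sum))
print(counts)
```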
To do this all in one function, nest nacheck inside a wrapper and then lapply over the factors:
exploreNA <- function(df, factors){
  nacheck <- function(var, factor)
    aggregate(var, list(factor), function(x) sum(is.na(x)))
  lapply(factors, function(x) apply(df, 2, nacheck, factor=x))
}
exploreNA(data[,c("length","width","height")], list(data$age, data$sex, data$size))
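If you prefer to pass column names as strings, as in the function from the question, something along these lines should work. It is a base R sketch rather than a repair of the plyr version, and the toy data frame is made up for illustration. It returns one data frame per factor, named after that factor.

```r
# Sketch: same argument order as the question's function, column names as strings.
exploreMissingValues <- function(dataframe, factors, variables) {
  out <- lapply(factors, function(f) {
    # is.na() on the selected columns gives a logical matrix;
    # summing it within each level of factor f yields the NA counts.
    aggregate(is.na(dataframe[variables]), by = dataframe[f], FUN = sum)
  })
  names(out) <- factors
  out
}

# Hypothetical toy data with known NA positions (not the question's data).
toy <- data.frame(
  age    = c("juv", "juv", "juv", "ad", "ad", "ad"),
  sex    = c("m", "m", "m", "f", "f", "f"),
  length = c(NA, 1.2, NA, 0.3, NA, 0.5),
  width  = c(0.1, NA, 0.4, 0.2, 0.9, 0.7)
)

res <- exploreMissingValues(toy, c("age", "sex"), c("length", "width"))
print(res)
```

With the dataframe from the question, the call would be exploreMissingValues(data, c("age", "sex", "size"), c("length", "width", "height")).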
A data.table approach:
library(data.table)
DT <- data.table(data)
DT[, lapply(.SD, function(x) sum(is.na(x))) , by = list(age,sex,size)]
## age sex size length width height
## 1: juv m large 5 4 4
## 2: ad f small 3 4 4
and the plyr equivalent, using colwise and ddply:
library(plyr)
ddply(data, .(age,sex,size), colwise(.fun = function(x) sum(is.na(x))))
## age sex size length width height
## 1 ad f small 3 4 4
## 2 juv m large 5 4 4
You could also use a vector of column names for the by components:
by.cols <- c('age', 'sex' ,'size')
# then the following will work....
DT[, lapply(.SD, function(x) sum(is.na(x))), by = by.cols]
ddply(data, by.cols, colwise(.fun = function(x) sum(is.na(x))))