Ignoring NA values in function

I'm writing my own function to calculate the mean of a column in a data set and then applying it with apply(), but it only returns the first column's mean. Below is my code:

mymean <- function(cleaned_us){
  column_total = sum(cleaned_us)
  column_length = length(cleaned_us)
  return (column_total/column_length)
}

Average_2 <- apply(numeric_clean_usnews,2,mymean,na.rm=T)

We need to use na.rm = TRUE inside sum(). Passing it through apply() is not going to work, because apply() forwards extra arguments to mymean, and mymean doesn't have that argument:

mymean <- function(cleaned_us){
  column_total = sum(cleaned_us, na.rm = TRUE)  # change: skip NAs when summing
  column_length = sum(!is.na(cleaned_us))       # change: count only non-NA values
  return(column_total/column_length)
}
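The failure described above can be reproduced with a short sketch. The matrix `m` and the name `mymean_original` are made up for illustration and are not from the asker's data; the function body mirrors the question's version, which has no na.rm argument.

```r
# The question's function, renamed for this sketch: it takes no na.rm argument
mymean_original <- function(cleaned_us) {
  sum(cleaned_us) / length(cleaned_us)
}

# Hypothetical 3 x 2 matrix with one NA
m <- matrix(c(1, 2, NA, 4, 5, 6), nrow = 3)

# apply() forwards na.rm = TRUE to the function, which rejects it;
# tryCatch() captures the error message instead of stopping the script
msg <- tryCatch(
  apply(m, 2, mymean_original, na.rm = TRUE),
  error = function(e) conditionMessage(e)
)
print(msg)  # an "unused argument" error message
```

With the corrected function above, the call becomes apply(numeric_clean_usnews, 2, mymean), without na.rm.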

Note that colMeans() can be used to get the mean of each column.
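As a minimal sketch of that alternative: the base R function colMeans() accepts na.rm directly, so it should agree with the hand-written function. The small matrix `m` below is invented example data, not the asker's numeric_clean_usnews.

```r
# Hypothetical example data: columns are (1, 2, NA) and (4, NA, 8)
m <- matrix(c(1, 2, NA,
              4, NA, 8), nrow = 3,
            dimnames = list(NULL, c("a", "b")))

# NA-aware mean, equivalent to the corrected function in this answer
mymean <- function(cleaned_us) {
  sum(cleaned_us, na.rm = TRUE) / sum(!is.na(cleaned_us))
}

by_apply    <- apply(m, 2, mymean)        # column-by-column with apply()
by_colmeans <- colMeans(m, na.rm = TRUE)  # built-in equivalent

print(by_apply)     # a = 1.5, b = 6
print(by_colmeans)  # same values
```

For a plain column mean, colMeans() is usually the simpler choice; the custom function is mainly useful as an exercise or when the computation needs to differ from a plain mean.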

In order to pass an na.rm parameter to the function you defined, you need to make it a parameter of the function. The sum() function has an na.rm parameter, but length() doesn't. So to write the function you are trying to write, you could say:

# include `na.rm` as a parameter of the function
# (defaulting to FALSE, as `sum()` does)
mymean <- function(cleaned_us, na.rm = FALSE){

  # pass it to `sum()`
  column_total = sum(cleaned_us, na.rm=na.rm)

  # if `na.rm` is set to `TRUE`, then don't count `NA`s
  if (na.rm){
    column_length = length(cleaned_us[!is.na(cleaned_us)])

  # but if it's `FALSE`, just use the full length
  } else {
    column_length = length(cleaned_us)
  }

  return (column_total/column_length)
}

Then your call should work:

Average_2 <- apply(numeric_clean_usnews, 2, mymean, na.rm=TRUE)
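To illustrate what the na.rm switch changes, here is a small sketch on invented data (the matrix `m` is hypothetical). The function is a condensed restatement of the one above so that the snippet runs on its own.

```r
# Condensed version of the function above
mymean <- function(cleaned_us, na.rm) {
  column_total <- sum(cleaned_us, na.rm = na.rm)
  column_length <- if (na.rm) sum(!is.na(cleaned_us)) else length(cleaned_us)
  column_total / column_length
}

# Hypothetical example data: columns are (1, 2, NA) and (4, 5, 6)
m <- matrix(c(1, 2, NA,
              4, 5, 6), nrow = 3)

with_na_rm    <- apply(m, 2, mymean, na.rm = TRUE)
without_na_rm <- apply(m, 2, mymean, na.rm = FALSE)

print(with_na_rm)     # 1.5 and 5: the NA is dropped from sum and count
print(without_na_rm)  # NA and 5: the NA propagates through sum()
```

With na.rm = FALSE, any column containing an NA yields NA, which matches how the built-in mean() behaves by default.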

Use na.omit():

set.seed(1)
m <- matrix(sample(c(1:9, NA), 100, replace=TRUE), 10)

mymean <- function(cleaned_us, na.rm){
    if (na.rm) cleaned_us <- na.omit(cleaned_us)
    column_total = sum(cleaned_us)
    column_length = length(cleaned_us)
    column_total/column_length
}

apply(m, 2, mymean, na.rm=TRUE)

# [1] 5.000 5.444 4.111 5.700 6.500 4.600 5.000 6.222 4.700 6.200
