
mean calculation with data.frame in R

I am a beginner in R. I am trying to calculate the between-groups variance using the following code.

calcBetweenGroupsVariance <- function(variable,groupvariable)
{
 # find out how many values the group variable can take
 groupvariable2 <- as.factor(groupvariable[[1]])
 levels <- levels(groupvariable2)
 numlevels <- length(levels)
 # calculate the overall grand mean:
 grandmean <- mean(variable)
 # get the mean and standard deviation for each group:
 numtotal <- 0
 denomtotal <- 0
 for (i in 1:numlevels)
 {
    leveli <- levels[i]
    levelidata <- variable[groupvariable==leveli,]
    levelilength <- length(levelidata)
    # get the mean and standard deviation for group i:
    meani <- mean(levelidata)
    sdi <- sd(levelidata)
    numi <- levelilength * ((meani - grandmean)^2)
    denomi <- levelilength
    numtotal <- numtotal + numi
    denomtotal <- denomtotal + denomi
 }
 # calculate the between-groups variance
 Vb <- numtotal / (numlevels - 1)
 Vb <- Vb[[1]]
 return(Vb)
}

However, I get the following warning when I call this function:

calcBetweenGroupsVariance (data[3],data[2])

Warning message: In mean.default(variable) : argument is not numeric or logical: returning NA

I understand something is going wrong while using the mean function.

Here is the output of str(data)

'data.frame':   45 obs. of  11 variables:
 $ V1 : int  2 3 3 2 3 2 2 2 3 2 ...
 $ V2 : num  1.3243 -2.4546 0.1352 0.0676 -1.1901 ...
 $ V3 : num  0.913 -2.644 0.663 1.217 -0.409 ...  
 $ V4 : num  -1.863 1.965 -0.698 -0.945 0.617 ...
 $ V5 : num  -0.574 1.031 -0.308 -0.574 0.354 ...
 $ V6 : num  -0.8963 2.5702 0.0736 -1.3671 0.9045 ...
 $ V7 : num  0.2276 0.0624 0.5945 0.6194 0.5473 ...
 $ V8 : num  1.304 -1.624 0.408 0.368 -0.559 ...
 $ V9 : num  -0.1827 -0.9748 -0.5158 -0.0191 -0.3053 ...  
 $ V10: num  -0.964 0.67 -0.12 0.789 0.711 ...  
 $ V11: num  -0.833 -0.833 -0.833 -0.0539 -0.0539 ...

Kindly suggest how to get rid of this error.

Thanks and regards

There are multiple errors in your script, related to the dimensions of the objects and the difference between a vector and a list.
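
To make the vector/list difference concrete, here is a minimal sketch on a small made-up data frame (the name df and its values are illustrative assumptions, not the data from the question). Single brackets keep the data.frame wrapper, which is a list of columns, while double brackets or the [, j] form extract the underlying vector:

```r
# Hypothetical toy data frame; the values are made up for illustration.
df <- data.frame(V1 = c(1L, 1L, 2L, 2L), V2 = c(1.0, 3.0, 5.0, 7.0))

one_col_df <- df[2]    # single brackets: still a data.frame (a list of columns)
vec_a      <- df[[2]]  # double brackets: the numeric vector itself
vec_b      <- df[, 2]  # matrix-style indexing: also a numeric vector

is.data.frame(one_col_df)  # TRUE
is.numeric(vec_a)          # TRUE
identical(vec_a, vec_b)    # TRUE

# mean() on the one-column data frame returns NA and raises the warning
# quoted in the question; suppressWarnings() only keeps this demo quiet.
na_result <- suppressWarnings(mean(one_col_df))
vec_mean  <- mean(vec_a)   # 4
```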

Let's assume the arguments variable, groupvariable of your function should be vectors / 1d-arrays.

  • The line groupvariable2 <- as.factor(groupvariable[[1]]) should be groupvariable2 <- as.factor(groupvariable), because groupvariable is a vector, not a list, and you are interested in all of its elements, not just the first.

  • The line levelidata <- variable[groupvariable==leveli,] should be levelidata <- variable[groupvariable==leveli], because variable has only one dimension (it is not a matrix).

  • The call to your function should be calcBetweenGroupsVariance(data[[3]], data[[2]]) (with double brackets [[ ]]) or, alternatively, calcBetweenGroupsVariance(data[, 3], data[, 2]). Otherwise you pass a one-column data frame (a list) instead of a vector to the function, which is why mean() returns NA.
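
Putting the three fixes together, the function could look roughly like the sketch below. It assumes both arguments are plain vectors of equal length; the vectors group and value are made-up example data, not the data from the question, and the unused sd and denominator bookkeeping from the original has been dropped for brevity.

```r
# Corrected sketch of the function from the question. Both arguments are
# assumed to be plain vectors of equal length.
calcBetweenGroupsVariance <- function(variable, groupvariable) {
  # use all elements of the grouping vector, not just groupvariable[[1]]
  groupvariable2 <- as.factor(groupvariable)
  levels <- levels(groupvariable2)
  numlevels <- length(levels)
  # overall grand mean
  grandmean <- mean(variable)
  numtotal <- 0
  for (i in seq_len(numlevels)) {
    leveli <- levels[i]
    # one-dimensional subsetting: no trailing comma
    levelidata <- variable[groupvariable2 == leveli]
    # group size times squared distance of the group mean from the grand mean
    numtotal <- numtotal + length(levelidata) * (mean(levelidata) - grandmean)^2
  }
  # between-groups variance
  numtotal / (numlevels - 1)
}

# Hypothetical example data: three groups with two observations each.
group <- c(1, 1, 2, 2, 3, 3)
value <- c(1, 2, 3, 4, 5, 6)
vb <- calcBetweenGroupsVariance(value, group)

# Sanity check: this quantity is the between-groups mean square of a one-way ANOVA.
ms_between <- anova(lm(value ~ factor(group)))[["Mean Sq"]][1]
```

With the data frame from the question, the call would then be calcBetweenGroupsVariance(data[[3]], data[[2]]).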
