
How to loop through column names in R, perform operations, and store the results in a list of unknown row size

I am a new R programmer and am trying to loop through a large number of columns to weight data by a certain metric.

I have a large data set of variables (some factors, some numerics). I want to loop through my columns, determine which ones are factors, and for each factor use tapply to do some weighting and return a mean. I have written a function that does this for one column at a time:

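weight.by.mean <- function(metric, by, x, funct = sum) {
  # funct must be a function (sum), not the result of calling it (sum())
  if (is.factor(x)) {
    a <- tapply(metric, x, funct)  # aggregate the metric within each level of x
    b <- tapply(by, x, funct)      # aggregate the weights within each level of x
    return(a / b)
  }
  # a non-factor x falls through and returns NULL
}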

I pass in the metric that I want to weight, and the by argument is what I am weighting the metric by. x is simply a factor variable that I would like to group by.

Example: I have 5 donut types (my argument x) and I would like to see the mean dough (my argument metric) used by donut type, but I need to weight the dough used by the amount (argument by) of dough used for that donut type.

In other words, I am trying to avoid skewing my means by not weighting different donut types more than others (maybe I use a lot of normal dough for glazed donuts but don't use as much special dough for cream-filled donuts). I hope this makes sense!
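To make the donut example concrete, here is a minimal sketch with made-up numbers. The data frame `donuts` and its columns `type`, `dough`, and `amount` are hypothetical, and the function is written with the default `funct = sum` (an assumption here, since a default of `sum()` would pass a number rather than a function to tapply). Within each level of x, the result is the aggregated metric divided by the aggregated by values.

```r
# Same logic as weight.by.mean, with the default given as the function `sum`
weight.by.mean <- function(metric, by, x, funct = sum) {
  if (is.factor(x)) {
    a <- tapply(metric, x, funct)  # per-level total of the metric
    b <- tapply(by, x, funct)      # per-level total of the weights
    return(a / b)
  }
}

# Hypothetical data: two donut types, two rows each
donuts <- data.frame(
  type   = factor(c("Glazed", "Glazed", "Cream", "Cream")),
  dough  = c(10, 30, 6, 18),
  amount = c(2, 4, 3, 3)
)

# One value per level of `type`, named by level
res <- weight.by.mean(donuts$dough, donuts$amount, donuts$type)
print(res)
```

The result is a named one-dimensional array with one entry per factor level, so its length changes with the factor passed in.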

This is the function I am working on to loop through a large data set with many possible different factor variables such as "donut type" in my prior example. It is not yet functional because I am not sure what else to add. Thank you for any assistance you can provide for me. I have been using R for less than a month so please keep that in mind.

My end goal is to output a matrix or data frame of all these different means, but each factor may have anywhere from 5 to 50 levels, so the number of rows depends on the number of levels of each factor.

weight.matrix <- function(df, metric, by, funct = sum) {
  n <- ncol(df)  ## Number of columns to iterate through
  ColNames <- as.matrix(names(df))
  OutputMatrix <- matrix(1, ,3,nrow=, ncol=3)

  for (i in 1:n) {
    if (is.factor(paste("df$", ColNames[i], sep = ""))) {
      a[[i]] <- tapply(metric, df[, i], funct)
      b[[i]] <- tapply(by, df[, i], funct)
    }
    OutputMatrix <- (a[[i]] / b[[i]])
  }
}

If each of your factors has different levels, then it would make more sense to use a long data frame instead of a wide one. For example:

Metric      Value        Mean
DonutType   Glazed       3.0
DonutType   Chocolate    5.2
DonutSize   Small        1.2
DonutSize   Medium       2.3
DonutSize   Large        3.6

Data frames are not meant for vectors of different lengths. If you want to store your data in a data frame, you need to organize it so all the vector lengths are the same. gather() and spread() are functions from the tidyr package (part of the tidyverse) that you can use to convert between long and wide data frames; newer versions of tidyr provide pivot_longer() and pivot_wider() for the same purpose.
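The long layout can also be built directly while looping over the factor columns. Below is a minimal sketch in base R, not a definitive implementation: the function name `weight.long` and the sample data frame `donuts` are hypothetical, and `metric` and `by` are assumed to be numeric vectors with one entry per row of `df`. Note that is.factor() has to be given the column itself (for example `df[[col]]`); a string pasted together from the column name is never a factor.

```r
# Returns one row per (factor column, level) pair in long format
weight.long <- function(df, metric, by, funct = sum) {
  # Test the columns themselves, not their names
  factor.cols <- names(df)[vapply(df, is.factor, logical(1))]

  pieces <- lapply(factor.cols, function(col) {
    a <- tapply(metric, df[[col]], funct)  # per-level aggregate of the metric
    b <- tapply(by, df[[col]], funct)      # per-level aggregate of the weights
    data.frame(
      Metric = col,                # which factor the row belongs to
      Value  = names(a),           # the factor level
      Mean   = as.numeric(a / b),  # weighted result for that level
      stringsAsFactors = FALSE
    )
  })

  # Stack the per-factor pieces; each may have a different number of rows.
  # If df has no factor columns, this returns NULL.
  do.call(rbind, pieces)
}

# Hypothetical data with two factor columns and two numeric columns
donuts <- data.frame(
  DonutType = factor(c("Glazed", "Glazed", "Chocolate", "Chocolate")),
  DonutSize = factor(c("Small", "Large", "Small", "Large")),
  dough     = c(10, 30, 6, 18),
  amount    = c(2, 4, 3, 3)
)

long <- weight.long(donuts, donuts$dough, donuts$amount)
print(long)
```

Because every factor contributes as many rows as it has levels, the number of rows does not need to be known in advance. If separate results per factor are preferred, the list returned by lapply() could be kept as is instead of being combined with rbind().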
